What is feature encoding?
Feature values are encoded for two reasons: (1) data compatibility and (2) improved model performance. For data compatibility, your modeling algorithm may require, for example, that non-numeric features be converted into numerical values or that inputs be resized to a fixed size. For model performance, many deep-learning models perform better when their numerical input features are standardized, that is, rescaled to have a mean of zero and a standard deviation of one.
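As a minimal sketch of standardization, the snippet below rescales a small, made-up array of values with NumPy. The array and variable names are illustrative assumptions, not taken from any particular dataset.

```python
import numpy as np

# Hypothetical numerical feature values (illustrative only).
values = np.array([2.0, 4.0, 6.0, 8.0])

# Standardize: subtract the mean, then divide by the standard deviation.
# np.std uses the population standard deviation (ddof=0) by default.
standardized = (values - values.mean()) / values.std()

# The result has a mean of (approximately) zero and a standard deviation of one.
print(standardized.mean(), standardized.std())
```

In practice, the mean and standard deviation are usually computed on the training data only and then reused to transform validation and test data.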
Do I need feature encoding?
Probably, but it depends on your modeling algorithm. Deep learning models generally require categorical features to be encoded, and so does XGBoost unless you use its optional native categorical support. CatBoost, however, does not require encoding categorical features. XGBoost works fine without encoding numerical features, because tree-based models are largely insensitive to feature scale. Deep learning models, in contrast, usually need their numerical features scaled to perform well.
Example of categorical and numerical encoding in scikit-learn
In this example, the data contains both categorical and numerical features. The categorical features are one-hot encoded using scikit-learn's OneHotEncoder, which creates one binary column for each distinct stringified category. The numerical features are standardized using scikit-learn's StandardScaler, which subtracts the mean and divides by the standard deviation of each feature. Finally, the encoded and scaled features are concatenated into a single feature matrix X, which can be used as input to a machine learning model.
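The steps described above can be sketched roughly as follows. The DataFrame, its column names, and its values are hypothetical and chosen only for illustration. OneHotEncoder and StandardScaler are the scikit-learn classes named in the text.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data with two categorical and two numerical features.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["S", "M", "L", "M"],
    "price": [10.0, 20.0, 30.0, 40.0],
    "weight": [1.0, 2.0, 3.0, 4.0],
})
categorical_cols = ["color", "size"]
numerical_cols = ["price", "weight"]

# One-hot encode the stringified categories. The encoder returns a sparse
# matrix by default, so convert it to a dense array before concatenating.
encoder = OneHotEncoder(handle_unknown="ignore")
X_cat = encoder.fit_transform(df[categorical_cols].astype(str)).toarray()

# Standardize the numerical features: subtract the mean and divide by the
# standard deviation, column by column.
scaler = StandardScaler()
X_num = scaler.fit_transform(df[numerical_cols])

# Concatenate the encoded and scaled features into one feature matrix.
X = np.hstack([X_cat, X_num])
print(X.shape)
```

To avoid leaking information from held-out data, one would typically fit the encoder and scaler on the training split only and then call transform on the validation and test splits.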