All About Normalization In Machine Learning….

Pranati Maity Roy
6 min read · Feb 7, 2021


Hello everyone! Here I am to give you a brief and clear idea of what normalization actually is and why we use it. I hope it helps you in your study of machine learning. So here we start…

What is Normalization?

Normalization is basically a technique that scales down the features in your dataset to a common range, and that range is usually taken as 0 to 1.

In other words, it adjusts values measured on different scales to a notionally common scale.

Data with min value 0 and max value 1

Methods of Normalization:

The most basic technique used for normalization is Min-Max scaling.

x′ = (X − Xmin) / (Xmax − Xmin)

Here, Xmax and Xmin are the maximum and minimum values of the feature respectively, X is the value we want to normalize, and x′ is the normalized value we get.

Clipping: if x > max, then x′ = max; if x < min, then x′ = min.

Besides these, we also use Log Scaling: x′ = log(x).

But min-max scaling is the technique used most often, as shown in the sketch below.
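As a quick illustration, here is a minimal NumPy sketch of the three techniques above. The array values are made up purely for demonstration:

```python
import numpy as np

x = np.array([1000.0, 25000.0, 55000.0, 120000.0])  # hypothetical feature values

# Min-max scaling: squeeze all values into the 0-1 range
x_minmax = (x - x.min()) / (x.max() - x.min())

# Clipping: cap values at chosen min/max bounds
x_clipped = np.clip(x, a_min=5000, a_max=100000)

# Log scaling: useful when values span several orders of magnitude
x_log = np.log(x)

print(x_minmax)   # [0.     0.2017  0.4538  1.    ] (approximately)
print(x_clipped)  # [  5000.  25000.  55000. 100000.]
print(x_log)
```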

Why is it being Used?

When we do web scraping or retrieve data from different sources, there will be a lot of features (by features we basically mean the data columns). The data mainly consists of two types of features: 1. independent features and 2. the dependent feature (our target variable).

Every feature has two properties: 1. a unit and 2. a magnitude. Since the independent features differ from one another, they are measured in different units, and under one feature heading there will be different values of that feature, i.e. its different magnitudes in different scenarios.

A Dataset with Features

In the picture above, the green columns are the independent variables, based on which the red column, our target variable “Expected Position Level”, is to be calculated.

Now look at one independent feature, “Salary”: its unit will be, say, rupees or dollars, and its range generally lies within 1,000 to 1,000,000. Another independent feature, “Year of Experience”, is measured in years, and its range will be within 0 to 60 years.

So try to understand that the two independent features contain values in different ranges, and those values carry very different significance. For example, suppose one person's salary is 1,000, which is considered a low salary, while another person's experience is 40 years, which should be considered highly significant. Yet compared to the value 1,000, the machine will give less importance to 40, because the machine only understands numbers and assigns importance based on raw magnitude, without considering the unit.

The model initializes weights on each independent feature based only on magnitude, irrespective of the unit in which it was measured. It does not understand the number's actual significance within that feature, so the initial weights are wrong, and to reach the optimal weights (the right weight associated with each feature) more back-propagation iterations are needed. That means more computation time and extra complexity. So we apply normalization here to bring all the values between 0 and 1, regardless of the unit in which they were measured.

The most basic technique used for normalization is min-max scaling. We use the following formula to get the min-max normalized value:

x′ = (x − Xmin) / (Xmax − Xmin)

Here, Xmax and Xmin are the maximum and minimum values of the feature respectively, and x′ is the normalized value we get.

For the above dataset, the “Salary” column has a maximum value of 120,000 and a minimum value of 32,000. Suppose we want the normalized value of 55,000.

Then the normalized value of 55,000 = (55,000 − 32,000) / (120,000 − 32,000) = 23,000 / 88,000 ≈ 0.26.

For the “Year of Experience” column, the maximum value is 15 and the minimum value present in the dataset is 5. Suppose we want the normalized value of 6.

Then the normalized value of 6 = (6 − 5) / (15 − 5) = 1/10 = 0.1. Thus the machine gets all the numbers in the 0-to-1 range with comparable significance, and it can easily assign weights according to their importance in the dataset.
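To make this concrete, here is a small sketch using pandas and scikit-learn's MinMaxScaler on a toy table built only from the minimum, maximum, and example values mentioned above (not the full dataset from the picture):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy dataset built only from the values discussed above
df = pd.DataFrame({
    "Salary":            [32000, 55000, 120000],
    "YearsOfExperience": [5,     6,     15],
})

scaler = MinMaxScaler()  # defaults to the 0-1 range
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print(df_scaled)
#      Salary  YearsOfExperience
# 0  0.000000                0.0
# 1  0.261364                0.1
# 2  1.000000                1.0
```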

There is another reason for this. We already do outlier treatment, missing-value treatment, and feature engineering to obtain all the useful features for our machine learning algorithm. But if any outlier is still present in the data after that, the model will take the outlier into account when drawing the best-fit line through the data, which generally gives wrong predictions. If we normalize the data, the points fall within a small range and are more uniformly spread, which helps in finding the best-fit line.

The best-fit line is the one that makes the vertical distance from the data points to the regression line as small as possible. For the red line, each distance is greater than it is for the black line (which is where it should be), so we would get wrong predictions compared to the original.

When we apply normalization to the features, the data points are distributed close to each other within a small range. Hence we can get the correct best-fit line easily.

So where can we use Normalization?

In KNN and k-means clustering, where the Euclidean distance formula is used, we can apply normalization so that large-valued features do not dominate the calculation (see the sketch below).
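For instance, here is a rough sketch of why this matters for Euclidean distance. The two people and their (salary, experience) values are hypothetical, reusing the ranges from the earlier example:

```python
import numpy as np

# Two people described by (salary, years of experience) -- hypothetical values
a = np.array([55000.0, 6.0])
b = np.array([60000.0, 15.0])

# Without scaling, the distance is dominated almost entirely by salary
print(np.linalg.norm(a - b))  # ~5000.0 (the 9-year experience gap barely matters)

# After min-max scaling both features to 0-1 (using the ranges from the example),
# the experience difference contributes on a comparable scale
a_scaled = np.array([(55000 - 32000) / 88000, (6 - 5) / 10])
b_scaled = np.array([(60000 - 32000) / 88000, (15 - 5) / 10])
print(np.linalg.norm(a_scaled - b_scaled))  # ~0.90, driven mostly by experience
```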

The same goes for linear and logistic regression, and for all the deep learning algorithms (ANN, CNN) where gradient descent comes in and weights need to be initialized. There is a real need for normalization there.

But in decision trees and random forests, we just extract the important features based on split calculations and divide the dataset into small trees whose leaf nodes contain values with little variation. There is no need for scaling, so normalization is not needed here.

So these are the facts related to normalization: it is basically used for scaling data into a range. There is another method used for scaling down called standardization. It also helps us scale down our data, but based on the standard normal distribution, where the mean = 0 and the standard deviation = 1. The two do the same job but on different principles. There is a misconception that both are the same, but they work in different ways; scaling down the features is the only thing they have in common.
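As a quick side-by-side sketch, here are scikit-learn's MinMaxScaler and StandardScaler applied to some made-up salary values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical salary values just to contrast the two scalers
salary = np.array([[32000.0], [55000.0], [60000.0], [120000.0]])

# Normalization: rescales values into the 0-1 range
print(MinMaxScaler().fit_transform(salary).ravel())

# Standardization: rescales values to mean 0 and standard deviation 1
print(StandardScaler().fit_transform(salary).ravel())
```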

Left: normal distribution; right: standard normal distribution

So here it is. I hope you all got the idea of what normalization is and why it is applied in ML.
