SoFunction
Updated on 2024-11-19

Implementing Several Normalization Methods in Python

Data normalization is an important issue in data mining when expressing feature vectors. When different features are listed together, differences in how the features are expressed can cause data with small absolute values to be "eaten" by data with large absolute values. What we need to do in this case is normalize the extracted feature vectors, so that each feature is treated equally by the classifier. Below I describe several common normalization methods and provide the corresponding Python implementations (which are actually very simple):

1. (0,1) Normalization:

This is the simplest method and the easiest to think of: iterate over each value in the feature vector, record the Max and Min, and normalize the data using Max − Min as the denominator (so that Min maps to 0 and Max maps to 1):


LaTeX:{x}_{normalization}=\frac{x-Min}{Max-Min}

Python implementation:

def MaxMinNormalization(x, Max, Min):
	# Scale x into [0, 1] using the feature's observed extremes
	x = (x - Min) / (Max - Min)
	return x

To find Max and Min, just use NumPy's max() and min() directly on the array, and try not to use Python's built-in max() and min() unless you like managing numbers in a plain list.
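As a quick sanity check, here is a minimal self-contained sketch (using NumPy, which the original snippet does not import) that applies the function element-wise to an array:

```python
import numpy as np

def MaxMinNormalization(x, Max, Min):
	# Scale x into [0, 1] using the feature's extremes
	return (x - Min) / (Max - Min)

data = np.array([10.0, 20.0, 30.0, 50.0])
# Min maps to 0, Max maps to 1, everything else falls in between
normalized = MaxMinNormalization(data, data.max(), data.min())
print(normalized)  # [0.   0.25 0.5  1.  ]
```

Note that if Max equals Min (a constant feature), the denominator is zero, so such features should be filtered out or handled separately beforehand.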

2. Z-score standardization:

This method standardizes the data using the mean and standard deviation of the original data. The processed data has zero mean and unit standard deviation, and follows a standard normal distribution if the original data was normally distributed. The key point here is conformance to the standard normal distribution; I personally believe this changes the distribution of the features to a certain extent. Discussion of practical experience is welcome, as I am not very familiar with this kind of standardization. The transformation function is:


LaTeX:{x}_{normalization}=\frac{x-\mu }{\sigma }

Python implementation:

def Z_ScoreNormalization(x, mu, sigma):
	# Shift by the mean and scale by the standard deviation
	x = (x - mu) / sigma
	return x

Same here: just use NumPy's mean() for mu (the mean) and std() for sigma (the standard deviation).
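The properties claimed above are easy to verify: after standardization the data should have mean 0 and standard deviation 1. A minimal sketch using NumPy (which the original snippet does not import):

```python
import numpy as np

def Z_ScoreNormalization(x, mu, sigma):
	# Shift by the mean, scale by the standard deviation
	return (x - mu) / sigma

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
standardized = Z_ScoreNormalization(data, np.mean(data), np.std(data))

# The result is centered at 0 with unit spread
print(standardized.mean())  # ~0.0
print(standardized.std())   # ~1.0
```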

3. Sigmoid function:

The Sigmoid function has an S-shaped curve and makes a good threshold function. It is centrally symmetric about (0, 0.5), has a relatively steep slope near x = 0, and as the input tends to positive or negative infinity the mapped value converges to 1 or 0, respectively. It is a personal favorite "normalization method". The quotation marks are because I think the Sigmoid function also performs very well for threshold segmentation: by shifting the formula, you can change the segmentation threshold. Used here as a normalization method, we only consider the case where the point (0, 0.5) is the segmentation threshold:


LaTeX:{x}_{normalization}=\frac{1}{1+{e}^{-x}}

Python implementation:

import math

def sigmoid(X, useStatus):
	if useStatus:
		# Map X into (0, 1) through the logistic function
		return 1.0 / (1 + math.exp(-float(X)))
	else:
		# Pass the value through unchanged
		return float(X)

Here, useStatus controls whether the sigmoid is applied at all, which is convenient for debugging.
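The boundary behavior described above can be checked directly. A minimal self-contained sketch (note the math.exp call, which the logistic formula requires but the original snippet omitted):

```python
import math

def sigmoid(X, useStatus):
	# Apply the logistic function when useStatus is True,
	# otherwise pass the value through unchanged
	if useStatus:
		return 1.0 / (1 + math.exp(-float(X)))
	return float(X)

print(sigmoid(0, True))    # 0.5  (the center of symmetry)
print(sigmoid(0, False))   # 0.0  (pass-through for debugging)
print(sigmoid(10, True))   # close to 1
print(sigmoid(-10, True))  # close to 0
```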

This is the whole content of this article.