In machine learning, data often needs to be normalized during preprocessing. This article introduces (0, 1) normalization, also known as min-max normalization. Put simply, it "compresses" the value range of the data into the interval [0, 1] according to a fixed linear relationship.
The usual (0, 1) normalization is given by:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
That is, subtract the minimum value from the value of the sample point, then divide by the difference between the largest and smallest sample point values. The formula is that simple.
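As a quick worked example of this rule, the sketch below applies it to three made-up numbers in plain Python. The function name `min_max_scale` and the sample values are illustrative assumptions, not part of the original article.

```python
def min_max_scale(values):
    """Rescale a list of numbers with (value - min) / (max - min)."""
    lo = min(values)
    hi = max(values)
    span = hi - lo  # assumed non-zero here: the values are not all equal
    return [(v - lo) / span for v in values]


sample = [2.0, 4.0, 10.0]  # hypothetical data
scaled = min_max_scale(sample)
print(scaled)  # [0.0, 0.25, 1.0]
```

The smallest value maps to 0, the largest to 1, and everything else lands proportionally in between.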
Here is an implementation in Python:
import numpy as np
import matplotlib.pyplot as plt


def noramlization(data):
    # Column-wise minimum and maximum (axis 0 runs over the samples)
    minVals = data.min(0)
    maxVals = data.max(0)
    ranges = maxVals - minVals
    normData = np.zeros(np.shape(data))
    m = data.shape[0]  # number of samples
    # Subtract the minimum of each column, then divide by its range
    normData = data - np.tile(minVals, (m, 1))
    normData = normData / np.tile(ranges, (m, 1))
    return normData, ranges, minVals


x = np.array([
    [78434.0829, 26829.86612], [78960.4042, 26855.13451],
    [72997.8308, 26543.79201], [74160.2849, 26499.56629],
    [75908.5746, 26220.11996], [74880.6989, 26196.03995],
    [74604.7169, 27096.87862], [79547.6796, 25986.68579],
    [74997.7791, 24021.50132], [74487.4915, 26040.18441],
    [77134.2636, 24647.274], [74975.2792, 24067.31441],
    [76013.5305, 24566.02273], [79191.518, 26840.29867],
    [80653.4589, 25937.22248], [79185.9935, 26996.18228],
    [74426.881, 24227.71439], [73246.4295, 26561.59268],
    [77963.1478, 25580.05298], [74469.8778, 26082.15448],
    [81372.3787, 26649.69232], [76826.8262, 24549.77367],
    [77774.2608, 25999.96037], [79673.1361, 25229.04353],
    [75251.7951, 24902.72185], [78458.073, 23924.15117],
    [82247.5439, 29671.33493], [82041.2247, 27903.34268],
    [80083.2029, 28692.35517], [80962.0043, 28519.81002],
    [79799.8328, 28740.27736], [80743.9947, 28862.75402],
    [80888.449, 29724.53706], [81768.4638, 30180.20618],
    [80283.8783, 30417.55057], [79460.7078, 29092.52867],
    [75514.1202, 28071.73721], [80595.5945, 30292.25917],
    [80750.4876, 29651.32254], [80020.662, 30023.70025],
    [82992.3395, 29466.83067], [80185.5946, 29943.15481],
    [81854.6163, 29846.18257], [81526.4017, 30218.27078],
    [79174.5312, 29960.69999], [78112.3051, 26467.57545],
    [80262.4121, 29340.23218], [81284.9734, 28257.71529],
    [81928.9905, 28752.84811], [80739.2727, 29288.85126],
    [83135.3435, 30223.4974], [83131.8223, 29049.10112],
    [82549.9076, 28910.15209], [81574.0822, 28326.55367],
    [80507.399, 28553.56851], [82956.2103, 29157.62372],
    [81909.7132, 29359.24497], [80893.5603, 29326.64155],
    [82520.1272, 30424.96703], [82829.8548, 31062.24418],
    [80532.1495, 29198.10407], [80112.7963, 29143.47905],
    [81175.0882, 28443.10574],
])

newgroup, _, _ = noramlization(x)
newdata = newgroup

# Plot the normalized points so the axes show the rescaled range
plt.scatter(newdata[:, 0], newdata[:, 1], marker='*', c='r', s=24)
plt.show()

print(len(x[:, 0]))
print(len(x[:, 1]))
print(newdata)
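The same column-wise computation can also be written with NumPy broadcasting, which makes the explicit `np.tile` copies unnecessary. The following is a minimal sketch of that alternative, not the article's own code: the name `min_max_normalize`, the demo array, and the guard for constant columns (range 0, which would otherwise cause a division by zero) are all assumptions added for illustration.

```python
import numpy as np


def min_max_normalize(data):
    """Column-wise min-max normalization using broadcasting (illustrative sketch)."""
    data = np.asarray(data, dtype=float)
    min_vals = data.min(axis=0)
    ranges = data.max(axis=0) - min_vals
    # Assumption: a constant column has range 0; divide by 1 instead so that
    # the column maps to 0 rather than producing NaN.
    safe_ranges = np.where(ranges == 0, 1.0, ranges)
    # Shapes (m, n) and (n,) broadcast row by row, so no tiling is needed.
    norm = (data - min_vals) / safe_ranges
    return norm, ranges, min_vals


# Hypothetical demo data: the second column is constant on purpose.
demo = np.array([[1.0, 5.0],
                 [3.0, 5.0],
                 [5.0, 5.0]])
norm, ranges, min_vals = min_max_normalize(demo)
print(norm)
```

Here the first column becomes 0, 0.5, 1, while the constant second column becomes all zeros instead of raising a warning.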
After normalizing the data, the processed points were drawn as a scatter plot with matplotlib. The resulting distribution is as follows:
You can see that all values of the data now lie in the range [0, 1]: in each column, the minimum maps to 0 and the maximum maps to 1.
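The range can also be checked numerically rather than by eye. The sketch below does this on the first three sample points from the data set, and additionally shows how the returned range and minimum values could be used to map normalized values back to the original scale. The helper name `normalize` and the idea of inverting the transform are illustrative assumptions, not something the article states.

```python
import numpy as np


def normalize(data):
    """Compact column-wise min-max normalization (illustrative helper)."""
    min_vals = data.min(axis=0)
    ranges = data.max(axis=0) - min_vals
    return (data - min_vals) / ranges, ranges, min_vals


# First three sample points from the data set above
points = np.array([[78434.0829, 26829.86612],
                   [78960.4042, 26855.13451],
                   [72997.8308, 26543.79201]])

norm, ranges, min_vals = normalize(points)

# Each column should have minimum 0 and maximum 1
print(norm.min(axis=0))
print(norm.max(axis=0))

# Inverting the transform recovers the original values (up to rounding)
restored = norm * ranges + min_vals
print(np.allclose(restored, points))
```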
This is the whole content of this article.