Introduction to Histograms
A histogram, also known as a quality distribution chart, is a statistical chart made up of a series of vertical bars of varying heights that represent the distribution of data. The horizontal axis generally shows the data values, grouped into intervals, and the vertical axis shows how many observations fall into each interval.
A histogram is a graphical representation of the distribution of numerical data. It gives an estimate of the probability distribution of a continuous (quantitative) variable and was first introduced by Karl Pearson. It is a kind of bar chart.
To construct a histogram, the first step is to segment the range of values, i.e., divide the entire range into a series of intervals (bins), and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of the variable. They must be contiguous and are usually (but not necessarily) of equal size.
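The binning and counting steps can be sketched with `numpy.histogram`, which does the same job as a plotting call but only returns the numbers. The sample values and the choice of five bins over the range 0 to 10 are assumptions made for illustration.

```python
import numpy as np

# Hypothetical sample: 12 measurements between 0 and 10
values = np.array([1.2, 2.5, 2.7, 3.1, 4.8, 5.0, 5.5, 6.1, 6.3, 7.9, 8.4, 9.6])

# Split the range [0, 10] into 5 contiguous, equal-width bins and count
counts, edges = np.histogram(values, bins=5, range=(0, 10))

# Every bin is half-open [left, right) except the last, which also includes its right edge
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {count}")
```

Five bins need six edges, and the counts add up to the number of values because every value lies inside the chosen range.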
The histogram can also be normalized to show relative frequencies. It then shows the proportion of cases that fall into each interval, with the bar heights summing to 1.
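A small sketch of the two common normalizations, using random data as a stand-in for real measurements (an assumption). Relative frequencies make the bar heights sum to 1, while `density=True` makes the bar areas sum to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=1000)  # illustrative data, not from the article

counts, edges = np.histogram(values, bins=20)

# Relative frequency: each height is the share of values in that bin
relative = counts / counts.sum()

# Density: heights are scaled so that height * bin width sums to 1
density, _ = np.histogram(values, bins=edges, density=True)
area = (density * np.diff(edges)).sum()

print(relative.sum(), area)
```

The two coincide only when every bin has width 1, so it is worth checking which one a plot actually shows before labelling the vertical axis.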
Parameters for plotting histograms (plt.hist())
Generally speaking, there are many ways to draw a histogram: you can use matplotlib directly, the plotting interface built into pandas, or another Python statistical plotting module. If you want an attractive figure you have to configure it yourself, so a good template matters. But copying and borrowing a template without understanding the principles behind it rarely gives good results.
Example of connecting to a database for histogram plotting
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import pymysql

plt.rcParams['font.sans-serif'] = ['SimHei']  # Display Chinese characters
plt.rcParams['axes.unicode_minus'] = False    # Display the minus sign correctly

# Connect to the MySQL database
v1 = []
v2 = []
db = pymysql.connect(host='127.0.0.1', port=3306, database='mydb', user='root', password='root')
cursor = db.cursor()
# Read the orders table and total the profit for each day
sql_str = "SELECT order_date,ROUND(SUM(profit)/10000,2) FROM orders WHERE FY=2019 GROUP BY order_date"
cursor.execute(sql_str)
result = cursor.fetchall()
for res in result:
    v1.append(res[0])         # order_date
    v2.append(float(res[1]))  # daily profit amount, converted from Decimal to float
cursor.close()
db.close()

plt.figure(figsize=(10, 5))  # Set the figure size
# plt.hist returns the array of counts, the array of bin edges, and the bar objects
cs, bs, bars = plt.hist(v2, bins=20, density=False, facecolor="cyan", edgecolor="black", alpha=0.7)
width = bs[1] - bs[0]
for i, c in enumerate(cs):
    plt.text(bs[i] + width / 3, c, round(c))  # Label each bar with its count
# Horizontal axis label
plt.xlabel("Interval", fontdict={'family': 'Fangsong', 'fontsize': 15})
# Vertical axis label
plt.ylabel("Frequency", fontdict={'family': 'Fangsong', 'fontsize': 15})
# Figure title
plt.title("Histogram of the distribution of profit amounts", fontdict={'family': 'Fangsong', 'fontsize': 20})
plt.show()
Plotting using the plot function inside a dataframe (universal template)
In practice we mostly visualize tabular data that we import, and rarely plot standalone, hand-entered data. For that kind of data many people turn to plotting software such as Origin. The biggest advantage of plotting in code is that there is no need to export results from one tool and import them into another, which saves a great deal of time and improves efficiency.
# Plotting using DataFrame's plot function
import matplotlib.pyplot as plt
import pandas as pd

plt.rcParams['font.sans-serif'] = ['KaiTi', 'SimHei']  # Chinese fonts: KaiTi first, SimHei as fallback
plt.rcParams['axes.unicode_minus'] = False             # Display the minus sign correctly
plt.rcParams['figure.dpi'] = 130                       # DataFrame.plot creates its own figure, so set dpi globally

datafile = r'../data/'  # path as given in the source; the file name is cut off
data = pd.read_csv(datafile).query("FY==2019").groupby('ORDER_DATE')[['PROFIT']].sum()
data.plot(kind='hist', bins=20, figsize=(15, 5), color='y', alpha=0.5, edgecolor='c', histtype='bar')
plt.xlabel("Interval", fontdict={'family': 'Fangsong', 'fontsize': 15})
plt.ylabel("Frequency", fontdict={'family': 'Fangsong', 'fontsize': 15})
plt.title("Histogram of the distribution of profit amounts", fontdict={'family': 'Fangsong', 'fontsize': 20}, y=1.03)
# Set the various titles on the graph
plt.suptitle('Histogram case', size=22, y=1.05)
plt.title("Drawing Date: 2022 Nickname: Wang Xiao Wang-123", loc='right', size=12, y=1.03)
plt.title("Home page: /weixin_47723732", loc='left', size=12, y=1.03)
plt.show()
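Because the template reads from a file whose name is not given, here is a minimal self-contained variant that can be run as-is. The synthetic order table, the random profit values, and the output file name are all assumptions; only the column names `ORDER_DATE` and `PROFIT` and the `groupby(...).sum()` followed by `plot(kind='hist')` pattern come from the template.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the order table: three orders per day for one year
rng = np.random.default_rng(42)
dates = pd.date_range("2019-01-01", periods=365, freq="D")
orders = pd.DataFrame({
    "ORDER_DATE": dates.repeat(3),
    "PROFIT": rng.normal(500, 200, size=365 * 3),
})

# Total profit per day, then a 20-bin histogram of those daily totals
daily = orders.groupby("ORDER_DATE")[["PROFIT"]].sum()
ax = daily.plot(kind="hist", bins=20, figsize=(15, 5), color="y", alpha=0.5, edgecolor="c")
ax.set_xlabel("Daily profit")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of daily profit (synthetic data)")

ax.get_figure().savefig("daily_profit_hist.png")  # hypothetical output file name
```

`DataFrame.plot` returns the matplotlib `Axes`, so labels and titles can be set on that object rather than through the `plt` state machine.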
Plotting Multiple Subplots (Multiple Subplot Histogram Case Template)
plt.tight_layout() # automatic compact layout to avoid overlapping elements
This call is very important and is usually placed at the end of the plotting code, after all subplots, labels, and titles have been set.
import matplotlib.pyplot as plt
import pandas as pd

datafile = r'../data/'  # path as given in the source; the file name is cut off
data = pd.read_csv(datafile).query("FY==2019").groupby('ORDER_DATE')[['PROFIT']].sum()
fig = plt.figure(figsize=(10, 5), dpi=130)  # Create the canvas
# Create subplot 1
ax1 = plt.subplot(121)  # 1st of 1 row and 2 columns
plt.title("CSDN Blogging Expert.", loc='left', size=12, y=1.03)  # add a note
# Create subplot 2
ax2 = plt.subplot(122)  # 2nd of 1 row and 2 columns
plt.title("Wang Xiao Wang-123.", loc='right', size=12, y=1.03)  # add a note
# DataFrame.plot creates a new figure by default;
# the ax parameter tells it which existing subplot to draw into.
data.plot(kind='hist', bins=20, color='c', alpha=0.5, edgecolor='c', histtype='bar', ax=ax1)  # draw into ax1
# plt.xlabel("interval", fontdict={'family':'Fangsong','fontsize':15})
ax1.set_xlabel("Interval", fontdict={'family': 'Fangsong', 'fontsize': 15})
# plt.ylabel("frequency", fontdict={'family':'Fangsong','fontsize':15})
ax1.set_ylabel("Frequency", fontdict={'family': 'Fangsong', 'fontsize': 15})
ax1.set_title("cyan")
# print(ax1.get_xticks())
data.plot(kind='hist', bins=20, color='y', alpha=0.5, edgecolor='y', histtype='bar', ax=ax2)  # draw into ax2
# The plt.* functions act on the "current" subplot (plt.gca()), so be careful where you call them!
plt.xlabel("Interval", fontdict={'family': 'Fangsong', 'fontsize': 15})
plt.ylabel("Frequency", fontdict={'family': 'Fangsong', 'fontsize': 15})
plt.title("yellow")  # Title of the subplot
# suptitle takes text properties as keywords rather than a fontdict
plt.suptitle("Histogram of the distribution of profit amounts", family='Fangsong', size=22)  # Title of the figure
plt.tight_layout()  # Automatic compact layout to avoid overlapping elements
plt.show()
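The same two-panel layout can also be written with `plt.subplots`, which returns the figure and both axes at once and avoids relying on the "current" subplot. This is an alternative sketch rather than the template's own approach; the random data and the output file name are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
data = pd.DataFrame({"PROFIT": rng.normal(0, 1, 500)})  # illustrative data

# One row, two columns; each Axes is addressed explicitly
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
data.plot(kind="hist", bins=20, color="c", alpha=0.5, ax=ax1, legend=False)
data.plot(kind="hist", bins=20, color="y", alpha=0.5, ax=ax2, legend=False)

ax1.set_title("cyan")
ax2.set_title("yellow")
fig.suptitle("Histogram of the distribution of profit amounts", size=16)

fig.tight_layout()  # called last, once every label and title exists
fig.savefig("two_histograms.png")  # hypothetical output file name
```

Calling `tight_layout` after all titles and labels are in place lets it account for their sizes when it spaces the subplots.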
Histograms of probability distributions (statistical graphics)
# -*- coding:utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt

# Histograms of probability distributions
# Gaussian distribution
mean = 0   # The mean is 0
sigma = 1  # Standard deviation of 1; it reflects how concentrated or spread out the data are
x = mean + sigma * np.random.randn(10000)
fig, (ax0, ax1) = plt.subplots(nrows=2, figsize=(9, 6))
# The second argument is the number of bins: the larger it is, the narrower and denser the bars
ax0.hist(x, 40, density=True, histtype='bar', facecolor='yellowgreen', alpha=0.75)
# histtype='bar' draws conventional bars
# pdf: probability density, showing how the 10,000 numbers are spread over the intervals
ax0.set_title('pdf')
ax1.hist(x, 20, density=True, histtype='stepfilled', facecolor='pink', alpha=0.75, cumulative=True, rwidth=0.8)
# histtype='stepfilled' draws a filled step outline; cumulative=True accumulates the values
# cdf: cumulative distribution function, e.g. for reading off the probability that a value is below some threshold
ax1.set_title("cdf")
fig.subplots_adjust(hspace=0.4)
plt.show()
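What `cumulative=True` combined with `density=True` computes can be reproduced numerically: each bin contributes its density times its width, and the running total ends at 1. The sketch below assumes the same kind of standard normal sample, generated here with a fixed seed.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10000)  # sample with mean 0 and standard deviation 1

# Density histogram: bar areas sum to 1
density, edges = np.histogram(x, bins=20, density=True)

# Running total of the bar areas gives the empirical cumulative distribution
cdf = np.cumsum(density * np.diff(edges))

print(cdf[-1])  # the last value is 1 up to floating-point rounding
```

The cumulative values never decrease, which is why the second panel rises in steps from left to right.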
Overlaying a distribution curve on the histogram
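import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

plt.rcParams['font.sans-serif'] = ['KaiTi', 'SimHei']  # Chinese fonts: KaiTi first, SimHei as fallback
plt.rcParams['axes.unicode_minus'] = False             # Display the minus sign correctly

np.random.seed(10680801)  # Fix the seed so the figure is reproducible
mu = 100
sigma = 15
x = mu + sigma * np.random.randn(500)
num_bins = 60
# Create the figure and axes in one call so no empty extra figure is left behind
fig, ax = plt.subplots(figsize=(17, 8), dpi=120)
# fig, ax = plt.subplots(ncols=2)
# ax1 = ax[0]
# ax2 = ax[1]
n, bins, patches = ax.hist(x, num_bins, density=True)
# Normal probability density evaluated at the bin edges
y = norm.pdf(bins, mu, sigma)
ax.plot(bins, y, '--')
ax.set_xlabel('IQ')
ax.set_ylabel('Probability density')
ax.set_title(r'Histogram of the distribution of IQ')
fig.tight_layout()
plt.show()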
Histogram of stacked area
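import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

crime = pd.read_csv(r"/")  # path as given in the source; the file name is cut off
fig, ax = plt.subplots()
# Note: stacked=True only stacks datasets passed in a single hist call;
# with two separate calls the histograms are overlaid, which is why alpha is used.
ax.hist(crime["robbery"], bins=12, histtype="bar", alpha=0.6, label="robbery", stacked=True)
ax.hist(crime["aggravated_assault"], bins=12, histtype="bar", alpha=0.6, label="aggravated_assault", stacked=True)
ax.legend()
ax.set_xticks(np.arange(0, 721, 60))
ax.set_xlim(0, 720)
ax.set_yticks(np.arange(0, 21, 4))
plt.show()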
Plotting the distribution of values of various types of crime data in different subplots
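import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

crime = pd.read_csv(r"/")  # path as given in the source; the file name is cut off
crime = crime.query("state!='United States'").query("state!='District of Columbia'")
plt.figure(figsize=(10, 5), dpi=120)
nrows = 2
ncols = 4
n = np.arange(nrows * ncols) + 1  # subplot indices 1..8, also used as column positions
for i in n:
    ax = plt.subplot(nrows, ncols, i)
    ax.hist(crime.iloc[:, i])        # column 0 is skipped; columns 1..8 are plotted
    ax.set_title(crime.columns[i])
plt.suptitle("Numerical distribution of data on various types of crime", y=1.02)
plt.tight_layout()
plt.show()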
Other cases
Frequency histogram of passenger age distribution
# Import third-party libraries
import pandas as pd
import matplotlib.pyplot as plt

# Set a font that can display Chinese
plt.rcParams['font.sans-serif'] = ['SimHei']
# Create the figure
plt.figure(figsize=(20, 8), dpi=80)
# Prepare data (read the Titanic dataset)
# The file name is cut off in the source; append the CSV file name before running
titanic = pd.read_csv(r'E:\PythonData\exercise_data\')
# Check for missing ages
any(titanic.Age.isnull())
# Delete observations with missing ages
titanic.dropna(subset=['Age'], inplace=True)
# Plot: frequency histogram of passenger age
plt.hist(titanic.Age,          # Data to plot
         bins=20,              # Use 20 bars
         color='steelblue',    # Fill color
         edgecolor='k',        # Border color of the bars
         label='Histogram')    # Label for the histogram
# Tick settings
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
# Add descriptive information
plt.xlabel('Age (years)', fontsize=20)
plt.ylabel('Number of passengers', fontsize=20)
plt.title('Age distribution of passengers', fontsize=20)
# Display the figure
plt.show()
Histogram of male and female passengers (2D data)
Setting the bin width and other parameters
# Import libraries
import matplotlib.pyplot as plt
import numpy as np

# Set the font
plt.rcParams['font.sans-serif'] = ['SimHei']
# Create the figure
plt.figure(figsize=(20, 8), dpi=80)
# Extract age data by gender (titanic is the DataFrame loaded in the previous example)
age_female = titanic.Age[titanic.Sex == 'female']
age_male = titanic.Age[titanic.Sex == 'male']
# Bin edges with a width of 2; the stop is extended so the maximum age is not dropped
bins = np.arange(titanic.Age.min(), titanic.Age.max() + 2, 2)
# Age histogram of male passengers
plt.hist(age_male, bins=bins, label='Male', edgecolor='k', color='steelblue', alpha=0.7)
# Age histogram of female passengers
plt.hist(age_female, bins=bins, label='Female', edgecolor='k', alpha=0.6, color='r')
# Adjust the ticks
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
# Set the title and axis labels
plt.title('Age histograms of male and female passengers', fontsize=20)
plt.xlabel('Age', fontsize=20)
plt.ylabel('Number of people', fontsize=20)
# Remove the ticks from the top and right borders of the figure
plt.tick_params(top=False, right=False)
# Show the legend
plt.legend(loc='best', fontsize=20)
# Display the figure
plt.show()
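One detail worth checking when bin edges are built with `np.arange` is that the stop value is excluded, so values above the last edge are silently left out of the histogram. The sketch below uses a handful of made-up ages (an assumption) to show the difference between stopping at the maximum and stopping one bin width past it.

```python
import numpy as np

ages = np.array([0.42, 22, 38, 26, 35, 54, 2, 27, 14, 80])  # hypothetical ages
width = 2

# Stop at the maximum: the last edge is below 80, so the oldest value is dropped
edges_short = np.arange(ages.min(), ages.max(), width)
counts_short, _ = np.histogram(ages, bins=edges_short)

# Stop one bin width later: the last edge now lies above the maximum
edges_full = np.arange(ages.min(), ages.max() + width, width)
counts_full, _ = np.histogram(ages, bins=edges_full)

print(counts_short.sum(), counts_full.sum())
```

`plt.hist` treats explicit bin edges the same way as `np.histogram`, so extending the stop by one bin width is a simple safeguard.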
Histogram of movie duration distribution
# Import library
import matplotlib.pyplot as plt

# Set the font
plt.rcParams['font.sans-serif'] = ['SimHei']
# Create the figure
plt.figure(figsize=(20, 8), dpi=80)
# Prepare data: movie lengths in minutes
time = [131,98,125,131,124,139,131,117,128,108,135,138,131,102,107,114,119,128,121,142,127,130,124,101,110,116,117,110,128,128,115,99,136,126,
        134,95,138,117,111,78,132,124,113,150,110,117,86,95,144,105,126,130,126,130,126,116,123,106,112,138,123,86,101,99,136,123,117,119,105,
        137,123,128,125,104,109,134,125,127,105,120,107,129,116,108,132,103,136,118,102,120,114,105,115,132,145,119,121,112,139,125,138,109,
        132,134,156,106,117,127,144,139,139,119,140,83,110,102,123,107,143,115,136,118,139,123,112,118,125,109,119,133,112,114,122,109,106,
        123,116,131,127,115,118,112,135,115,146,137,116,103,144,83,123,111,110,111,
        100,154,136,100,118,119,133,134,106,129,126,110,111,109,
        141,120,117,106,149,122,122,110,118,127,121,114,125,126,114,140,103,130,141,117,106,114,121,114,133,137,92,121,112,146,97,137,105,98,
        117,112,81,97,139,113,134,106,144,110,137,137,111,104,117,100,111,101,110,105,129,137,112,120,113,133,112,83,94,146,
        133,101,131,116,
        111,
        84,137,115,122,106,144,109,123,116,111,111,133,150]
# Set the bin width and derive the number of bins from it
bins = 2
groups = int((max(time) - min(time)) / bins)
# Plot the histogram; density=True scales the bars so that their areas sum to 1
plt.hist(time, groups, color='b', edgecolor='k', density=True)
# Adjust the ticks: one tick every 2 minutes, including the maximum
plt.xticks(list(range(min(time), max(time) + 1))[::2], fontsize=15)
plt.yticks(fontsize=15)
# Add descriptive information
plt.xlabel('Movie length (minutes)', fontsize=20)
plt.ylabel('Density of movies', fontsize=20)
# Add a grid
plt.grid(True, linestyle='--', alpha=1)
# Add a title
plt.title('Histogram of movie duration distribution', fontsize=20)
plt.show()
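With `density=True` the bar heights are densities, not shares: with a bin width of 2 each height is half the share of movies in that bin. If the vertical axis should read as a proportion, one option is to pass per-observation weights of 1/n instead. The sketch below uses only the first ten durations from the list as a small sample (an assumption made to keep it short).

```python
import numpy as np

durations = np.array([131, 98, 125, 131, 124, 139, 131, 117, 128, 108])  # first ten values only
width = 2
edges = np.arange(durations.min(), durations.max() + width, width)

# Density: bar areas sum to 1, so heights depend on the bin width
density, _ = np.histogram(durations, bins=edges, density=True)

# Proportion: each observation counts as 1/n, so bar heights sum to 1
weights = np.ones(len(durations)) / len(durations)
proportion, _ = np.histogram(durations, bins=edges, weights=weights)

print(proportion.sum())
```

The same `weights` argument is accepted by `plt.hist`, so a call such as `plt.hist(values, bins=edges, weights=weights)` draws the proportion version directly.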