SoFunction
Updated on 2024-12-11

Basic configuration of matplotlib for plotting histograms (universal template example)

Introduction to Histograms

A histogram, also known as a quality distribution chart, is a statistical chart that uses a series of vertical bars of varying heights to represent the distribution of data. The horizontal axis generally shows the data values, divided into intervals, and the vertical axis shows how many observations fall into each interval.

A histogram is an accurate graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous (quantitative) variable and was first introduced by Karl Pearson. Visually, it is a kind of bar chart.

To construct a histogram, the first step is to bin the range of values, i.e., divide the entire range into a series of intervals, and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of the variable. They must be contiguous and are usually (but not necessarily) of equal size.
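
The binning step described above can be sketched with NumPy alone. This is a minimal illustration, assuming a small made-up sample and five equal-width bins; neither comes from the original examples.

```python
import numpy as np

# Made-up sample values (an assumption for illustration only)
values = np.array([1.2, 2.5, 2.7, 3.1, 4.8, 5.0, 5.5, 7.9, 8.3, 9.6])

# Split the full range [min, max] into 5 consecutive, equal-width intervals
edges = np.linspace(values.min(), values.max(), 6)

# Count how many values fall into each interval.
# Every bin is half-open [left, right) except the last one,
# which also includes its right edge.
counts, returned_edges = np.histogram(values, bins=edges)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {count}")
```

Because the bins are contiguous and cover the whole range, every value is counted exactly once.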

The histogram can also be normalized to show "relative" frequencies. It then shows the proportion of cases that fall into each of several categories, with the heights of all bars summing to 1.
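
Two kinds of normalization are easy to confuse, so here is a small sketch that may help. It assumes randomly generated data and 20 bins. Relative frequencies sum to 1 across the bars, whereas a density histogram (what `density=True` produces) makes the total area equal to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=1000)  # made-up data

counts, edges = np.histogram(values, bins=20)
widths = np.diff(edges)

# Relative frequency: each bar is the share of observations in that bin
relative = counts / counts.sum()

# The same result using weights, which also works with plt.hist(..., weights=...)
weighted, _ = np.histogram(
    values, bins=edges, weights=np.full(values.size, 1.0 / values.size)
)

# Density: bar height times bin width sums to 1
density, _ = np.histogram(values, bins=edges, density=True)

print(relative.sum())
print((density * widths).sum())
```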

Parameters for plotting histograms (plt.hist())

Generally speaking, there are many ways to draw a histogram: with matplotlib, with the plotting functions built into pandas, or with other statistical plotting modules in Python. In short, if you want an attractive figure you have to configure it yourself. Templates are therefore important, but if you copy and borrow them without understanding the underlying principles, the results will not be very good.
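
As a rough orientation, the sketch below calls matplotlib's `hist` with some of its commonly used parameters and unpacks the three return values. The data, the parameter values, and the variable names are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=10, size=500)  # made-up data

fig, ax = plt.subplots(figsize=(8, 4))
counts, edges, patches = ax.hist(
    data,
    bins=15,            # number of equal-width bins, or a sequence of bin edges
    range=(20, 80),     # lower and upper limit of the bins; values outside are ignored
    density=False,      # False: raw counts; True: total area normalized to 1
    histtype="bar",     # 'bar', 'barstacked', 'step' or 'stepfilled'
    color="cyan",       # fill color
    edgecolor="black",  # border color of each bar
    alpha=0.7,          # transparency
    label="sample",     # text shown in the legend
)
ax.legend()
```

The call returns the bar heights, the bin edges (one more than the number of bins), and a container holding the drawn bars.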

Example of connecting to a database for histogram plotting

# -*- coding: utf-8 -*-
 
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
plt.rcParams['font.sans-serif']=['SimHei']     # Display Chinese
plt.rcParams['axes.unicode_minus']=False       # Normal display of negative sign
 
import pymysql
 
 
# Connecting to MySQL databases
v1 = []
v2 = []
db = pymysql.connect(host='127.0.0.1', port=3306, database='mydb',user='root',password='root')
cursor = db.cursor()

# Read the order table and compute the total profit per day
sql_str = "SELECT order_date,ROUND(SUM(profit)/10000,2) FROM orders WHERE FY=2019 GROUP BY order_date"
cursor.execute(sql_str)
result = cursor.fetchall()
for res in result:
    v1.append(res[0])         # order_date
    v2.append(float(res[1]))  # Daily profit amount, converted to float in case the driver returns Decimal
 
plt.figure(figsize=(10,5))         # Set the figure size
# hist returns an array of counts, an array of bin edges, and the bar objects
cs,bs,bars = plt.hist(v2, bins=20, density=False, facecolor="cyan", edgecolor="black", alpha=0.7)
width = bs[1]-bs[0]
# Write the count above each bar
for i,c in enumerate(cs):
    plt.text(bs[i]+width/3,c,round(c))

# Show horizontal axis label
plt.xlabel("Interval",fontdict={'family':'Fangsong','fontsize':15})
# Show vertical axis label
plt.ylabel("Frequency",fontdict={'family':'Fangsong','fontsize':15})
# Show figure title
plt.title("Histogram of the distribution of profit amounts",fontdict={'family':'Fangsong','fontsize':20})
plt.show()
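
The example above needs a running MySQL server. To try the same query-then-plot flow without one, the following sketch swaps in Python's built-in `sqlite3` module with an in-memory database. The table contents are made up, and the use of SQLite instead of MySQL is an assumption for illustration only.

```python
import sqlite3
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt

# In-memory stand-in for the MySQL "orders" table (made-up rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, profit REAL, FY INTEGER)")
rows = [(f"2019-01-{day:02d}", 1000.0 * day, 2019) for day in range(1, 31)]
rows += [(f"2019-01-{day:02d}", 500.0, 2019) for day in range(1, 31)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Same aggregation as in the MySQL example: total profit per day
sql_str = ("SELECT order_date, ROUND(SUM(profit) / 10000, 2) "
           "FROM orders WHERE FY = 2019 GROUP BY order_date")
result = conn.execute(sql_str).fetchall()
conn.close()

dates = [row[0] for row in result]
profits = [float(row[1]) for row in result]

fig, ax = plt.subplots(figsize=(10, 5))
counts, edges, bars = ax.hist(profits, bins=5, facecolor="cyan",
                              edgecolor="black", alpha=0.7)

# Label each bar with its count, centered above the bar
for count, bar in zip(counts, bars):
    ax.text(bar.get_x() + bar.get_width() / 2, count, str(int(count)),
            ha="center", va="bottom")
```

Reading the bar position from the bar object itself avoids the manual `width/3` offset and keeps the label centered for any bin width.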

Plotting using the plot function inside a dataframe (universal template)

In general, the data we import is most likely tabular, and the visualization is built on that table; standalone, hand-entered data is rarely used for plotting. For that kind of data many people would turn to plotting software such as Origin. The biggest advantage of plotting in code is that results do not have to be exported and imported again, which saves a great deal of time and improves work efficiency.

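# Plotting using DataFrame's plot function
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
plt.rcParams['font.sans-serif']=['SimHei']     # Display Chinese
plt.rcParams['font.sans-serif'] = 'KaiTi'      # Set the global font to KaiTi (overrides the line above)
plt.rcParams['axes.unicode_minus']=False       # Normal display of negative sign
plt.rcParams['figure.dpi'] = 130               # Resolution of every new figure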
datafile = r'../data/'
data = pd.read_csv(datafile).query("FY==2019").groupby('ORDER_DATE')[['PROFIT']].sum()
data.plot(kind='hist',bins=20,figsize=(15,5),color='y',alpha=0.5,edgecolor='c',histtype='bar')


plt.xlabel("Interval",fontdict={'family':'Fangsong','fontsize':15})
plt.ylabel("Frequency",fontdict={'family':'Fangsong','fontsize':15})
plt.title("Histogram of the distribution of profit amounts",fontdict={'family':'Fangsong','fontsize':20},y=1.03)

# Set the figure-level title and the left/right annotations
plt.suptitle('Histogram case',size=22,y=1.05)
plt.title("Drawing Date: 2022 Nickname: Wang Xiao Wang-123", loc='right',size=12,y=1.03)

plt.title("Home page: /weixin_47723732", loc='left',size=12,y=1.03)

plt.show()
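
Since the CSV file used above is not available here, the following sketch reproduces the same pipeline on a generated table. The column names follow the example (`ORDER_DATE`, `FY`, `PROFIT`), but the values are random and purely an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Made-up order table: 200 days with 3 orders per day
orders = pd.DataFrame({
    "ORDER_DATE": pd.date_range("2019-01-01", periods=200, freq="D").repeat(3),
    "FY": 2019,
    "PROFIT": rng.normal(loc=500, scale=150, size=600),
})

# Same steps as above: filter the year, then sum the profit per day
daily = orders.query("FY == 2019").groupby("ORDER_DATE")[["PROFIT"]].sum()

# Passing ax= makes pandas draw into an existing subplot instead of a new figure
fig, ax = plt.subplots(figsize=(15, 5))
daily.plot(kind="hist", bins=20, color="y", alpha=0.5, edgecolor="c", ax=ax)
ax.set_xlabel("Interval")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of the distribution of profit amounts")
```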

Plotting Multiple Subplots (Multiple Subplot Histogram Case Template)

plt.tight_layout() applies an automatic compact layout so that subplots, labels, and titles do not overlap. It is a very important call and is usually placed at the end, just before the figure is shown.

import pandas as pd
import matplotlib.pyplot as plt
 
datafile = r'../data/'
data = pd.read_csv(datafile).query("FY==2019").groupby('ORDER_DATE')[['PROFIT']].sum()
 
fig = plt.figure(figsize=(10,5),dpi=130)  # Generate canvas

# Generate subplot 1
ax1 = plt.subplot(121)  # Position 1 in a grid of 1 row and 2 columns
plt.title("CSDN Blogging Expert.", loc='left',size=12,y=1.03) # Add a note

# Generate subplot 2
ax2 = plt.subplot(122)  # Position 2 in a grid of 1 row and 2 columns

# Add a note to subplot 2
plt.title("Wang Xiao Wang-123.", loc='right',size=12,y=1.03) # Add a note


# DataFrame.plot creates a new figure by default; the ax parameter specifies which subplot to draw into.
data.plot(kind='hist',bins=20,color='c',alpha=0.5,edgecolor='c',histtype='bar',ax=ax1)  # Draw into ax1
#plt.xlabel("interval",fontdict={'family':'Fangsong','fontsize':15})
ax1.set_xlabel("Interval",fontdict={'family':'Fangsong','fontsize':15})
#plt.ylabel("frequency",fontdict={'family':'Fangsong','fontsize':15})
ax1.set_ylabel("Frequency",fontdict={'family':'Fangsong','fontsize':15})
ax1.set_title("cyan")
#print(ax1.get_xticks())

data.plot(kind='hist',bins=20,color='y',alpha=0.5,edgecolor='y',histtype='bar',ax=ax2)  # Draw into ax2
# plt.xlabel() is equivalent to plt.gca().set_xlabel(): it acts on the "current" subplot, so be careful where you call it!
plt.xlabel("Interval",fontdict={'family':'Fangsong','fontsize':15})
plt.ylabel("Frequency",fontdict={'family':'Fangsong','fontsize':15})
plt.title("yellow")                                                        # Title of subplot

plt.suptitle("Histogram of the distribution of profit amounts",fontdict={'family':'Fangsong','size':22})  # Title of figure
plt.tight_layout() # Automatic compact layout to avoid occlusion
plt.show()
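
The template above mixes calls on the "current" subplot (`plt.xlabel`, `plt.title`) with calls on explicit axes objects. One possible alternative, sketched here with made-up data, is to address every subplot through its own axes object so the result does not depend on call order.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
values = rng.normal(loc=0, scale=1, size=300)  # made-up data

# One row, two columns; each subplot is returned as its own axes object
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 5))

for ax, color, name in [(ax_left, "c", "cyan"), (ax_right, "y", "yellow")]:
    ax.hist(values, bins=20, color=color, alpha=0.5, edgecolor=color)
    ax.set_xlabel("Interval")
    ax.set_ylabel("Frequency")
    ax.set_title(name)

fig.suptitle("Histogram of the distribution of profit amounts")
fig.tight_layout()  # called last so that it accounts for all labels and titles
```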

Histograms of probability distributions (statistical graphics)

# -*- coding:utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt

# Histograms of probability distributions
# Gaussian distribution
# The mean is 0
mean = 0
# Standard deviation of 1; it reflects whether the data are concentrated or spread out
sigma = 1
x=mean+sigma*np.random.randn(10000)
fig,(ax0,ax1) = plt.subplots(nrows=2,figsize=(9,6))
# The second parameter is the number of bins: the larger it is, the narrower and denser the bars
ax0.hist(x,40,density=True,histtype='bar',facecolor='yellowgreen',alpha=0.75)  # histtype='bar' draws conventional bars
# pdf: probability density, i.e. what share of the 10,000 numbers falls into each interval
ax0.set_title('pdf')
ax1.hist(x,20,density=True,histtype='stepfilled',facecolor='pink',alpha=0.75,cumulative=True,rwidth=0.8) # 'stepfilled' draws a filled step outline; cumulative=True accumulates the values
# cdf: cumulative distribution function, e.g. the probability that a number is less than a given value
ax1.set_title("cdf")
fig.subplots_adjust(hspace=0.4)
plt.show()
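
To make the relationship between the two panels concrete, the sketch below recomputes the cumulative histogram by hand: with `density=True`, each cumulative value is the running sum of density times bin width. The random data and the bin count are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10000)  # made-up standard normal sample

# Density histogram: bar height times bin width is the share of values in the bin
density, edges = np.histogram(x, bins=20, density=True)

# Running sum of those shares gives the empirical cumulative distribution
cdf = np.cumsum(density * np.diff(edges))

# The same quantity as computed by matplotlib
fig, ax = plt.subplots()
tops, _, _ = ax.hist(x, bins=edges, density=True, cumulative=True,
                     histtype="stepfilled", facecolor="pink", alpha=0.75)

print(cdf[-1])  # the last value covers the whole sample
```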

Display of line graph distribution within the histogram

import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif']=['SimHei']     # Display Chinese
plt.rcParams['font.sans-serif'] = 'KaiTi'      # Set the global font to KaiTi (overrides the line above)
plt.rcParams['axes.unicode_minus']=False       # Normal display of negative sign

import numpy as np
from scipy.stats import norm
np.random.seed(10680801)                       # Fix the seed so the figure is reproducible
mu=100
sigma=15
x=mu+sigma*np.random.randn(500)
num_bins=60
fig,ax=plt.subplots(figsize=(17,8),dpi=120)
#fig,ax=plt.subplots(ncols=2)
#ax1 = ax[0]
#ax2 = ax[1]
n,bins,patches=ax.hist(x,num_bins,density=True)
# Normal probability density evaluated at the bin edges
y=norm.pdf(bins,mu,sigma)
ax.plot(bins,y,'--')
ax.set_xlabel('IQ')
ax.set_ylabel('Probability density')
ax.set_title(r'Histogram of the distribution of IQ')
fig.tight_layout()
plt.show()
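
The dashed curve in this example is the normal probability density. If SciPy is not installed, the same curve can be computed directly from the textbook formula; the sketch below does so with NumPy. The helper name `normal_pdf` is hypothetical, not part of any library.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2 * np.pi))

mu, sigma = 100, 15

# Evaluate the curve on a grid spanning five standard deviations on each side
grid = np.linspace(mu - 5 * sigma, mu + 5 * sigma, 2001)
curve = normal_pdf(grid, mu, sigma)

# Trapezoidal rule: the area under a density should be very close to 1
step = grid[1] - grid[0]
area = ((curve[:-1] + curve[1:]) / 2).sum() * step
print(area)
```

Because the histogram is drawn with `density=True`, its bars are on the same scale as this curve, which is why the two can share one vertical axis.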

Histogram of stacked area

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
crime=pd.read_csv(r"/")
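fig,ax=plt.subplots()

# Note: stacked=True only stacks multiple datasets passed in a single hist call;
# with two separate calls the semi-transparent histograms overlap instead.
ax.hist(crime["robbery"],bins=12,histtype="bar",alpha=0.6,label="robbery",stacked=True)
ax.hist(crime["aggravated_assault"],bins=12,histtype="bar",alpha=0.6,label="aggravated_assault",stacked=True)
ax.legend()
ax.set_xticks(np.arange(0,721,60))
ax.set_xlim(0,720)
ax.set_yticks(np.arange(0,21,4))
plt.show()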
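
In matplotlib, `stacked=True` takes effect only when several datasets are passed to a single `hist` call. The following sketch shows that variant with made-up numbers standing in for the two crime columns; the value ranges are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
robbery = rng.uniform(0, 300, size=50)   # made-up values
assault = rng.uniform(0, 720, size=50)   # made-up values

fig, ax = plt.subplots()

# Passing a list of datasets lets matplotlib stack them bin by bin
tops, edges, _ = ax.hist(
    [robbery, assault],
    bins=12,
    range=(0, 720),   # shared bin range for both datasets
    histtype="bar",
    alpha=0.6,
    label=["robbery", "aggravated_assault"],
    stacked=True,
)
ax.legend()

# tops[0] is the first dataset; tops[1] is the running total of both
print(tops[-1].sum())
```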

Plotting the distribution of values of various types of crime data in different subplots

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
crime=pd.read_csv(r"/")
 
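# Drop the national total and the District of Columbia rows
crime = crime.query("state!='United States'").query("state!='District of Columbia'")

plt.figure(figsize=(10,5),dpi=120)
nrows=2
ncols=4
n = np.arange(nrows*ncols)+1   # Subplot positions 1..8, also the column positions of the crime types
for i in n:
    ax = plt.subplot(nrows,ncols,i)
    ax.hist(crime.iloc[:,i])            # Column 0 holds the state name, so the data start at column 1
    ax.set_title(crime.columns[i])

plt.suptitle("Numerical distribution of data on various types of crime",y=1.02)
plt.tight_layout()
plt.show()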
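
The same grid can also be built with `plt.subplots`, which returns all axes at once. The sketch below assumes a made-up table with one name column followed by eight numeric columns; the column names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Made-up table: a "state" column followed by 8 numeric columns
columns = [f"crime_type_{k}" for k in range(1, 9)]  # hypothetical names
table = pd.DataFrame(rng.gamma(2.0, 50.0, size=(50, 8)), columns=columns)
table.insert(0, "state", [f"state_{k}" for k in range(50)])

# A 2 x 4 grid of subplots, returned as a 2-D array of axes
fig, axes = plt.subplots(2, 4, figsize=(10, 5))

# axes.flat walks the grid row by row; skip the first (non-numeric) column
for ax, name in zip(axes.flat, table.columns[1:]):
    ax.hist(table[name])
    ax.set_title(name)

fig.suptitle("Numerical distribution of data on various types of crime", y=1.02)
fig.tight_layout()
```

Iterating over column names rather than positions avoids the off-by-one bookkeeping between subplot indices and column indices.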

Other cases

Frequency histogram of passenger age distribution

# Import third-party libraries
import pandas as pd
import matplotlib.pyplot as plt

# Set a font that can display Chinese
plt.rcParams['font.sans-serif'] = ['SimHei']

# Create the figure
plt.figure(figsize=(20,8),dpi=80)
 
# Prepare data (read Titanic dataset)
titanic = pd.read_csv(r'E:\PythonData\exercise_data\')
 
# Check for missing ages
print(titanic.Age.isnull().any())

# Delete observations containing missing ages
titanic.dropna(subset=['Age'], inplace=True)

# Plot: Frequency histogram of passenger age
plt.hist(titanic.Age, # Data to plot
        bins = 20, # Specify the number of bars in the histogram as 20
        color = 'steelblue', # Specify the fill color
        edgecolor = 'k', # Set the histogram border color
        label = 'Histogram' # Label shown for the histogram in a legend
        )

# Tick settings
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)

# Add descriptive information
plt.xlabel('Age (years)',fontsize=20)
plt.ylabel('Number of passengers',fontsize=20)
plt.title('Age distribution of passengers',fontsize=20)

# Display the figure
plt.show()

 

Histogram of male and female passengers (2D data)

Setting the bin width and other parameters

# Import library
import matplotlib.pyplot as plt
import numpy as np

# Set the font
plt.rcParams['font.sans-serif'] = ['SimHei']

# Create the figure
plt.figure(figsize=(20,8),dpi=80)

# Extract the ages by gender (titanic is the DataFrame loaded in the previous example)
age_female = titanic.Age[titanic.Sex == 'female']
age_male = titanic.Age[titanic.Sex == 'male']

# Set the bin edges with a bin width of 2; the upper limit is extended so the maximum age is included
bins = np.arange(titanic.Age.min(), titanic.Age.max() + 2, 2)

# Male passenger age histogram
plt.hist(age_male, bins = bins, label = 'Male',edgecolor = 'k', color = 'steelblue', alpha = 0.7)

# Histogram of age of female passengers
plt.hist(age_female, bins = bins, label = 'Female',edgecolor = 'k', alpha = 0.6,color='r')

# Adjust ticks
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)

# Set the title and axis labels
plt.title('Age histograms of male and female passengers',fontsize=20)
plt.xlabel('Age',fontsize=20)
plt.ylabel('Number of people',fontsize=20)

# Remove the ticks from the top and right borders of the figure
plt.tick_params(top=False, right=False)

# Show legend
plt.legend(loc='best',fontsize=20)

# Display the figure
plt.show()
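
Two histograms are only comparable bar by bar if they share the same bin edges. The sketch below builds such shared edges from a fixed bin width and checks that no value is dropped; the ages are randomly generated stand-ins, not the Titanic data.

```python
import numpy as np

rng = np.random.default_rng(5)
age_male = rng.uniform(1, 80, size=300)     # made-up ages
age_female = rng.uniform(1, 70, size=200)   # made-up ages

bin_width = 2
all_ages = np.concatenate([age_male, age_female])

# Start at or below the minimum and extend one bin width past the maximum,
# so that the last edge is never smaller than the largest age
bins = np.arange(np.floor(all_ages.min()), all_ages.max() + bin_width, bin_width)

# Counting both groups on the same edges makes them directly comparable
male_counts, _ = np.histogram(age_male, bins=bins)
female_counts, _ = np.histogram(age_female, bins=bins)

print(male_counts.sum(), female_counts.sum())
```

The resulting `bins` array can be passed unchanged to `plt.hist(..., bins=bins)` for both groups.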

Histogram of movie duration distribution

# Import library
import matplotlib.pyplot as plt

# Set the font
plt.rcParams['font.sans-serif'] = ['SimHei']

# Create the figure
plt.figure(figsize=(20,8),dpi=80)
 
# Prepare data
time=[131,98,125,131,124,139,131,117,128,108,135,138,131,102,107,114,119,128,121,142,127,130,124,101,110,116,117,110,128,128,115,99,136,126,
   134,95,138,117,111,78,132,124,113,150,110,117,86,95,144,105,126,130,126,130,126,116,123,106,112,138,123,86,101,99,136,123,117,119,105,
   137,123,128,125,104,109,134,125,127,105,120,107,129,116,108,132,103,136,118,102,120,114,105,115,132,145,119,121,112,139,125,138,109,
   132,134,156,106,117,127,144,139,139,119,140,83,110,102,123,107,143,115,136,118,139,123,112,118,125,109,119,133,112,114,122,109,106,
   123,116,131,127,115,118,112,135,115,146,137,116,103,144,83,123,111,110,111, 100,154,136,100,118,119,133,134,106,129,126,110,111,109,
   141,120,117,106,149,122,122,110,118,127,121,114,125,126,114,140,103,130,141,117,106,114,121,114,133,137,92,121,112,146,97,137,105,98,
   117,112,81,97,139,113,134,106,144,110,137,137,111,104,117,100,111,101,110,105,129,137,112,120,113,133,112,83,94,146, 133,101,131,116,
   111, 84,137,115,122,106,144,109,123,116,111,111,133,150]
# Set the bin width
bins=2

# Number of bins = data range / bin width
groups = int((max(time)-min(time))/bins)

# Plot the histogram
plt.hist(time,groups,color='b',
            edgecolor = 'k',   # Specify the border color of the histogram
        density = True)        # Normalize so that the total area equals 1

# Adjust ticks: one tick every 2 minutes, including the maximum
plt.xticks(list(range(min(time),max(time)+1))[::2],fontsize=15)
plt.yticks(fontsize=15)

# Add descriptive information
plt.xlabel('Movie length (minutes)',fontsize=20)
plt.ylabel('Proportion of movies (density)',fontsize=20)

# Add grid lines
plt.grid(True,linestyle='--',alpha=1)

# Add title
plt.title('Histogram of movie duration distribution',fontsize=20)

plt.show()
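
Deriving the number of bins with `int(range / width)` gives bins of exactly the intended width only when the range is a multiple of that width, as it happens to be for the movie data above. For other data, a possible alternative is to pass explicit bin edges. The sketch below illustrates the difference with a short made-up list of durations.

```python
import numpy as np

durations = [78, 95, 101, 110, 117, 123, 131, 144, 157]  # made-up values
bin_width = 2

# Approach used above: derive the number of bins from the range.
# Here the range is 79, so int(79 / 2) = 39 bins that are slightly wider than 2.
groups = int((max(durations) - min(durations)) / bin_width)
_, auto_edges = np.histogram(durations, bins=groups)

# Alternative: explicit edges, each bin exactly bin_width wide,
# extended far enough to include the maximum
edges = np.arange(min(durations), max(durations) + bin_width, bin_width)
counts, _ = np.histogram(durations, bins=edges)

print(groups, np.diff(auto_edges)[0])   # bin count and actual width, approach 1
print(len(edges) - 1, np.diff(edges)[0])  # bin count and actual width, approach 2
```

The `edges` array can be passed as the second argument of `plt.hist` in place of `groups`.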

That concludes this article on the basic configuration of matplotlib histograms (universal template examples). For more matplotlib histogram content, please search my previous posts or browse the related articles below. I hope you will continue to support me in the future!