SoFunction
Updated on 2024-11-20

Python Data Visualization and Exploration: Worked Examples

I. Data visualization and exploratory graphs

Data visualization is the presentation of data in graphical or tabular form. Graphs clearly present the nature of the data and the relationships between data points or attributes, and they are easy to interpret. Exploratory graphs let users understand the characteristics of the data, look for trends, and lower the barrier to understanding the data.

II. Examples of common charts

In this chapter we draw charts mainly with Pandas rather than with the Matplotlib module directly. Pandas has already integrated Matplotlib's plotting methods into DataFrame, so in practice users rarely need to call Matplotlib's drawing functions themselves to complete the plotting work.
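The snippets in this chapter use a DataFrame named df_iris whose construction is not shown here. The following is a minimal sketch, assuming it was built from scikit-learn's bundled iris dataset (the column names match, but this origin is an assumption). It also illustrates the integration described above: the plot call on a DataFrame returns an ordinary Matplotlib Axes object.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.axes
from sklearn.datasets import load_iris

# Assumption: df_iris is the iris data as a DataFrame with a 'target' column.
iris = load_iris(as_frame=True)
df_iris = iris.frame  # four feature columns plus 'target'

# DataFrame.plot() delegates to Matplotlib and hands back a Matplotlib Axes.
ax = df_iris[['sepal length (cm)']].plot()
print(isinstance(ax, matplotlib.axes.Axes))
print(list(df_iris.columns))
```

Because the return value is a regular Axes, any Matplotlib customization (labels, limits, annotations) can be applied to it afterwards.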

1. Line Chart

Line charts are the most basic charts and can be used to present the relationship between successive values in different columns. They are drawn with the plot.line() method, which accepts parameters such as color and line style. Because the method is inherited from Matplotlib, the program must still call plt.show() at the end to display the chart, as shown in Figure 8.4.

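import matplotlib.pyplot as plt

# df_iris is assumed to hold the iris data as a DataFrame
df_iris[['sepal length (cm)']].plot.line()
plt.show()
# Same chart with a custom color, title, and dashed line style
ax = df_iris[['sepal length (cm)']].plot.line(color='green', title="Demo", style='--')
ax.set(xlabel="index", ylabel="length")
plt.show()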
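As a self-contained variation on the line chart, the sketch below plots two columns at once and gives each its own line style. The data is randomly generated for illustration only, and the column names merely mirror the iris naming.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display windows
import numpy as np
import pandas as pd

# Illustrative random data, not the real iris measurements.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'sepal length (cm)': rng.normal(5.8, 0.8, 50),
    'sepal width (cm)': rng.normal(3.0, 0.4, 50),
})

# One style string per column: dashed for the first, solid for the second.
ax = demo.plot.line(style=['--', '-'], title="Demo")
ax.set(xlabel="index", ylabel="length (cm)")
```

In an interactive session, calling plt.show() from matplotlib.pyplot afterwards displays the figure.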

2. Scatterplot

Scatter charts are used to examine the relationship between discrete data points in two different columns. They are drawn with plot.scatter(), as shown in Figure 8.5.

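df = df_iris
# Basic scatter chart
df.plot.scatter(x='sepal length (cm)', y='sepal width (cm)')
# plt.get_cmap is used here because recent Matplotlib releases no longer provide cm.get_cmap
import matplotlib.pyplot as plt
cmap = plt.get_cmap('Spectral')
# Marker size follows petal length; marker color follows the target class
df.plot.scatter(x='sepal length (cm)',
                y='sepal width (cm)',
                s=df['petal length (cm)'] * 20,
                c=df['target'],
                cmap=cmap,
                title='different circle size by petal length (cm)')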
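When the color column is categorical, an alternative to a colormap is to draw each class separately on the same Axes so that the legend names the classes. The sketch below shows that approach with a handful of illustrative rows; the color choices and legend labels are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display windows
import pandas as pd

# A few illustrative rows in the shape of the iris data.
demo = pd.DataFrame({
    'sepal length (cm)': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepal width (cm)': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'target': [0, 0, 1, 1, 2, 2],
})
colors = {0: 'tab:blue', 1: 'tab:orange', 2: 'tab:green'}

ax = None
for label, part in demo.groupby('target'):
    # Passing ax= reuses the Axes from the previous iteration.
    ax = part.plot.scatter(x='sepal length (cm)', y='sepal width (cm)',
                           color=colors[label], label=f"class {label}", ax=ax)
```

The first iteration creates the figure because ax is None; later iterations add their points to it.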

3. Histograms, bar charts

A histogram is usually used to present the distribution of continuous data in a single column. A similar chart, the bar chart, is used to review the counts of discrete values in a single column, as shown in Figure 8.6.

df[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']].plot.hist()
df['target'].value_counts().plot.bar()
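To make the difference between the two charts concrete, the sketch below draws both side by side on explicitly created Axes. The bar chart shows one bar per category from value_counts(), while the histogram bins a continuous column. The data values are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display windows
import matplotlib.pyplot as plt
import pandas as pd

demo = pd.DataFrame({
    'target': [0, 0, 0, 1, 1, 2],
    'sepal length (cm)': [5.1, 4.9, 4.7, 7.0, 6.4, 6.3],
})

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: counts of each discrete value, ordered by category.
counts = demo['target'].value_counts().sort_index()
counts.plot.bar(ax=ax_bar, title='count per class')

# Histogram: a continuous column split into three equal-width bins.
demo[['sepal length (cm)']].plot.hist(bins=3, ax=ax_hist, title='sepal length')
```

Passing ax= explicitly keeps each chart on its own Axes, which avoids a Series plot being drawn onto whatever figure happens to be current.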

4. Pie charts, box plots

A pie chart shows the proportion of each category within a single column, while a box plot shows the distribution of a single column or compares the distributions of different columns, as shown in Figure 8.7.

df['target'].value_counts().plot.pie(legend=True)
df.boxplot(column=['target'], figsize=(10, 5))
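The sketch below illustrates both uses described above on a small made-up table: a pie chart of category shares, and a box plot that compares the distributions of two numeric columns. The percentage format and figure size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display windows
import matplotlib.pyplot as plt
import pandas as pd

demo = pd.DataFrame({
    'target': [0, 0, 0, 1, 1, 2],
    'sepal length (cm)': [5.1, 4.9, 4.7, 7.0, 6.4, 6.3],
    'sepal width (cm)': [3.5, 3.0, 3.2, 3.2, 3.2, 3.3],
})

fig, (ax_pie, ax_box) = plt.subplots(1, 2, figsize=(10, 5))

# Pie chart: share of each category (normalize=True turns counts into fractions).
shares = demo['target'].value_counts(normalize=True).sort_index()
shares.plot.pie(ax=ax_pie, autopct='%.0f%%', legend=True)

# Box plot: one box per listed column, drawn on a shared scale.
demo.boxplot(column=['sepal length (cm)', 'sepal width (cm)'], ax=ax_box)
```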

Data Exploration in Practice

This section uses two real datasets to demonstrate several data exploration techniques.

III. American Community Survey

Each year, the American Community Survey (ACS) asks about 3.5 million households detailed questions about who they are and how they live. The survey covers many topics, including ancestry, education, work, transportation, Internet use, and residence.

Data name: 2013 American Community Survey.

Begin by observing what the data look like and their characteristics, as well as the meaning, type, and range that each column represents.

import pandas as pd

# Read the data
df = pd.read_csv("./")
# Number of rows and columns
df.shape
# (756065, 231)
# Range of field values
df.describe()
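Since the survey file itself is not included here, the sketch below runs the same first-look steps on a tiny made-up table. The column names follow the fields discussed in this section, but every value is hypothetical. It also shows that describe() skips missing values when counting.

```python
import pandas as pd

# Hypothetical rows; the real survey has hundreds of columns.
demo = pd.DataFrame({
    'SCHL': [16, 21, 21, 22],
    'PINCP': [20000.0, 60000.0, 80000.0, None],  # one missing income
    'ESR': [1, 1, 6, 1],
})

print(demo.shape)    # (rows, columns)
print(demo.dtypes)   # type of each column

summary = demo.describe()    # count, mean, std, min, quartiles, max
missing = demo.isna().sum()  # missing values per column
print(summary)
print(missing)
```

Comparing the count row of describe() with the row count from shape is a quick way to spot columns that contain missing values.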

First, the two data files are concatenated. The data contains a total of 300,000 entries in three fields: SCHL (School Level), PINCP (Income), and ESR (Work Status).

pusa = pd.read_csv("")
pusb = pd.read_csv("")
# Concatenate the two datasets
col = ['SCHL', 'PINCP', 'ESR']
ac_survey = pd.concat([pusa[col], pusb[col]], axis=0)

The data are grouped by educational attainment to observe how people are distributed across education levels, and the average income of each group is then calculated.

group = ac_survey.groupby(by=['SCHL'])
# Number of people at each education level
print('Educational distribution:', group.size())
# Mean personal income at each education level
print('Average income:', group['PINCP'].mean())
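The concatenate-then-group workflow can be tried without the survey files. The sketch below uses two small made-up tables in place of the real ones; the SCHL codes and incomes are hypothetical and chosen only so the results are easy to verify by hand.

```python
import pandas as pd

# Hypothetical stand-ins for the two survey files.
pusa_demo = pd.DataFrame({'SCHL': [16, 21, 21],
                          'PINCP': [20000, 60000, 80000],
                          'ESR': [1, 1, 6]})
pusb_demo = pd.DataFrame({'SCHL': [16, 22],
                          'PINCP': [30000, 90000],
                          'ESR': [1, 1]})

col = ['SCHL', 'PINCP', 'ESR']
# Stack the rows of both tables; ignore_index renumbers the rows 0..n-1.
ac_demo = pd.concat([pusa_demo[col], pusb_demo[col]], axis=0, ignore_index=True)

group = ac_demo.groupby('SCHL')
share = group.size() / len(ac_demo)   # proportion of people per education level
mean_income = group['PINCP'].mean()   # average income per education level
print(share)
print(mean_income)
```

Dividing the group sizes by the total row count turns raw counts into proportions, which are easier to compare across datasets of different sizes.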

IV. Boston Housing Data Set

The Boston House Price Dataset contains information about homes in the Boston area, including 506 data samples and 13 feature dimensions.

Data name: Boston House Price Dataset.

Begin by observing what the data look like and their characteristics, as well as the meaning, type, and range that each column represents.

The distribution of house prices (MEDV) can be plotted as a histogram, as shown in Figure 8.8.

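import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("./")
# Number of rows and columns
df.shape
# (506, 14)
# Range of column values
df.describe()
# Histogram of house prices
df[['MEDV']].plot.hist()
plt.show()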

Note: The English text in the figures comes from the names specified in the code or in the data; in practice, readers can replace it with whatever text they need.

The next thing to know is which dimensions are significantly related to "house prices". Let's start with a scatterplot, as shown in Figure 8.9.

# draw scatter chart
df.plot.scatter(x='MEDV', y='RM')
plt.show()

Finally, the correlation coefficients are calculated and visually presented using a clustered heatmap, as shown in Figure 8.10.

# compute pearson correlation
corr = df.corr()
# draw heatmap
import seaborn as sns
sns.heatmap(corr)
plt.show()

Red indicates a positive relationship, blue a negative relationship, and white no relationship. The correlation between RM and house price leans toward red, a positive relationship; LSTAT and PTRATIO lean toward dark blue, a negative relationship; and CRIM, RAD, and AGE lean toward white, indicating no relationship.
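The red, white, and blue reading described above depends on a diverging colormap centered at zero, which may not be what seaborn picks by default. The sketch below requests one explicitly. It runs on randomly generated data in which RM drives MEDV upward and LSTAT downward; the coefficients and noise levels are arbitrary assumptions, not estimates from the Boston data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display windows
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic columns named after Boston fields; values are made up.
rng = np.random.default_rng(0)
rm = rng.normal(6.3, 0.7, 100)
demo = pd.DataFrame({
    'RM': rm,
    'LSTAT': 40 - 4 * rm + rng.normal(0, 2, 100),
    'MEDV': 9 * rm - 34 + rng.normal(0, 3, 100),
})

corr = demo.corr()  # Pearson correlation by default

# RdBu_r maps -1 to blue, 0 to white, and +1 to red.
ax = sns.heatmap(corr, cmap='RdBu_r', vmin=-1, vmax=1, center=0, annot=True)
```

Fixing vmin and vmax to -1 and 1 keeps the color scale comparable between datasets, and annot=True prints each coefficient in its cell.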
