Today I'm going to talk about Altair, a visualization module for Python, and use it to draw some common charts. With Altair, we spend less energy on the complexity of the visualization process itself and more time on understanding the data and what it means.
What's Altair?
Altair is known as a statistical visualization library because it gives us a complete way to explore, understand, and analyze data through grouping and aggregation, data transformation, interactivity, chart composition, and more. Installation is simple; just run the following pip commands:
pip install altair
pip install vega_datasets
pip install altair_viewer
If you use the conda package manager instead, install Altair with:
conda install -c conda-forge altair vega_datasets
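After installing, it can help to confirm which versions actually landed in the environment. The snippet below is a small sketch using only the standard library's importlib.metadata; the helper name installed_versions is our own, not part of Altair.

```python
from importlib import metadata

# The three packages installed by the pip commands above
PACKAGES = ["altair", "vega_datasets", "altair_viewer"]


def installed_versions(names):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions(PACKAGES))
```

A None value means that package still needs to be installed.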
First Steps with Altair
Let's start with a simple bar chart. First, create a DataFrame dataset:
import pandas as pd

df = pd.DataFrame({"brand": ["iPhone", "Xiaomi", "HuaWei", "Vivo"],
                   "profit(B)": [200, 55, 88, 60]})
Next, plot the bar chart:
import altair as alt
import pandas as pd
import altair_viewer

chart = alt.Chart(df).mark_bar().encode(x="brand:N", y="profit(B):Q")
# Display the chart by calling altair_viewer's display() function
altair_viewer.display(chart, inline=True)
Looking at the overall syntax: first, alt.Chart() specifies the dataset to use; then a mark_*() method such as mark_bar() sets the chart type; finally, encode() specifies the data shown on the X and Y axes. You may be wondering what the N and Q after the column names stand for. They are abbreviations for the variable type: Altair needs to know the type of each variable involved in a chart, and only then will it draw the chart the way we expect.
Here N stands for a nominal variable (Nominal): a category name with no inherent order, such as the phone brands above. Q stands for a numeric variable (Quantitative), which may be discrete or continuous. In addition, there are time-series variables (Temporal), abbreviated T, and ordinal variables (Ordinal), abbreviated O, for example the 1-5 star ratings that shoppers give merchants when buying online.
Saving charts
To save the final chart, just call the save() method. For example, to save it as an HTML file:
chart.save("chart.html")  # the file name is only an example
It can also be saved as a JSON file, and the code is almost identical:
chart.save("chart.json")  # the file name is only an example
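Of course, we can also save the chart in an image format such as PNG or SVG, for example chart.save("chart.png"). This needs an extra export backend: recent Altair versions use the vl-convert-python package, while older versions relied on altair_saver.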
Altair Advanced Operations
Let's build on the example above. For instance, to draw a horizontal bar chart, we simply swap the data on the X and Y axes:
import altair as alt

chart = alt.Chart(df).mark_bar().encode(x="profit(B):Q", y="brand:N")
chart.save("chart_bar_h.html")  # the file name is only an example
Let's also draw a line chart by calling the mark_line() method:
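import altair as alt
import numpy as np
import pandas as pd

## Create a new set of data with the date as the row index
np.random.seed(29)
value = np.random.randn(365)
data = np.cumsum(value)  # the cumulative sum turns random noise into a random walk
date = pd.date_range(start="20220101", end="20221231")
df = pd.DataFrame({"num": data}, index=date)
# reset_index() moves the dates into a column named "index", used as the temporal X axis
line_chart = alt.Chart(df.reset_index()).mark_line().encode(x="index:T", y="num:Q")
line_chart.save("chart_line.html")  # the file name is only an example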
We can also draw a Gantt chart, which is often used in project management: the X axis shows dates, while the Y axis lists the projects, so each bar shows how far a project runs from its start to its end date. The code is as follows:
import altair as alt

project = [{"project": "Proj1", "start_time": "2022-01-16", "end_time": "2022-03-20"},
           {"project": "Proj2", "start_time": "2022-04-12", "end_time": "2022-11-20"},
           # ......
           ]
df = alt.Data(values=project)
chart = alt.Chart(df).mark_bar().encode(
    # Each bar starts at start_time ...
    alt.X("start_time:T",
          axis=alt.Axis(format="%x", formatType="time", tickCount=3),
          scale=alt.Scale(domain=[alt.DateTime(year=2022, month=1, date=1),
                                  alt.DateTime(year=2022, month=12, date=1)])),
    # ... and ends at end_time
    alt.X2("end_time:T"),
    alt.Y("project:N",
          axis=alt.Axis(labelAlign="left", labelFontSize=15, labelOffset=0, labelPadding=50)),
    color=alt.Color("project:N",
                    legend=alt.Legend(labelFontSize=12, symbolOpacity=0.7, titleFontSize=15)))
chart.save("chart_gantt.html")
In the chart above, we can see that the team is working on several projects, each at a different stage of progress and each spanning a different length of time; the chart makes these differences easy to see at a glance.
Next, let's draw a scatter plot by calling the mark_circle() method:
import altair as alt
from vega_datasets import data

df = data.cars()
## Keep only the passenger cars whose region (Origin) is "USA", i.e. the United States
df_1 = alt.Chart(df).transform_filter(alt.datum.Origin == "USA")
chart = df_1.mark_circle().encode(
    alt.X("Horsepower:Q"),
    alt.Y("Miles_per_Gallon:Q")
)
chart.save("chart_dots.html")
Of course, we can go further and make the chart look nicer by adding some color:
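# Fill each point with a radial gradient, white at the center and red at the edge
chart = df_1.mark_circle(
    color=alt.RadialGradient(
        gradient="radial",
        stops=[alt.GradientStop(color="white", offset=0.0),
               alt.GradientStop(color="red", offset=1.0)]),
    size=160
).encode(
    # zero=False: the axes do not have to start at 0
    alt.X("Horsepower:Q", scale=alt.Scale(zero=False, padding=20)),
    alt.Y("Miles_per_Gallon:Q", scale=alt.Scale(zero=False, padding=20))
)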
We can also vary the size of the points, so that different sizes represent different values (here, the Acceleration column):
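chart = df_1.mark_circle(
    color=alt.RadialGradient(
        gradient="radial",
        stops=[alt.GradientStop(color="white", offset=0.0),
               alt.GradientStop(color="red", offset=1.0)]),
    size=160
).encode(
    alt.X("Horsepower:Q", scale=alt.Scale(zero=False, padding=20)),
    alt.Y("Miles_per_Gallon:Q", scale=alt.Scale(zero=False, padding=20)),
    # Point size now follows each car's Acceleration value
    size="Acceleration:Q"
)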
That wraps up this detailed look at Altair, the Python visualization module. For more on Altair, see my other related articles.