Visualization is valuable because it is intuitive: almost any large data set becomes more informative once it is presented as a picture. In some cases, though, a few textual cues and labels are essential. The most basic annotations are axis titles and plot titles, but annotations can be much more than that. Let's visualize some data and see how annotations can help convey information more effectively.
First import some of the functions you need to use for drawing:
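    import matplotlib.pyplot as plt
    import matplotlib as mpl
    # On Matplotlib 3.6 and later this style is named 'seaborn-v0_8-whitegrid'
    plt.style.use('seaborn-whitegrid')
    import numpy as np
    import pandas as pd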
1 Case: The Impact of Holidays on the U.S. Birth Rate
The data can be downloaded from the jakevdp/data-CDCbirths repository. Clean the data first, then plot the result.
Statistical chart of average daily number of births
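    # The file path is elided in the source: point this at the CSV file
    # from the data-CDCbirths download
    births = pd.read_csv('')

    # Robust sigma-clipping: 0.74 * IQR approximates a Gaussian standard deviation
    quartiles = np.percentile(births['births'], [25, 50, 75])
    mu, sig = quartiles[1], 0.74 * (quartiles[2] - quartiles[0])
    births = births.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)')

    # The 'day' column must be an integer before dates can be built from it
    births['day'] = births['day'].astype(int)

    # Index the rows by date
    births.index = pd.to_datetime(10000 * births.year +
                                  100 * births.month +
                                  births.day, format='%Y%m%d')

    # Average births per (month, day), re-indexed onto the leap year 2012
    # (pd.Timestamp is one way to build these dates)
    births_by_date = births.pivot_table('births',
                                        [births.index.month, births.index.day])
    births_by_date.index = [pd.Timestamp(2012, month, day)
                            for (month, day) in births_by_date.index]

    fig, ax = plt.subplots(figsize=(12, 4))
    births_by_date.plot(ax=ax);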
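The cleaning step above is a robust form of sigma-clipping: the median stands in for the mean, and 0.74 times the interquartile range stands in for the standard deviation. Here is a minimal sketch of that idea on its own. The helper name `sigma_clip` and the toy numbers are made up for illustration, and a boolean mask is used in place of `query`; it is not the article's exact code.

```python
import numpy as np
import pandas as pd


def sigma_clip(df, column, n_sigma=5):
    """Keep rows whose `column` lies within n_sigma robust sigmas of the median."""
    q25, q50, q75 = np.percentile(df[column], [25, 50, 75])
    mu = q50                   # the median is the robust center
    sig = 0.74 * (q75 - q25)   # robust estimate of the standard deviation
    mask = (df[column] > mu - n_sigma * sig) & (df[column] < mu + n_sigma * sig)
    return df[mask]


# Hypothetical toy data: seven plausible daily counts and one data-entry error
toy = pd.DataFrame({'births': [4500, 4600, 4650, 4700, 4750, 4800, 4900, 99999]})
clean = sigma_clip(toy, 'births')
print(len(toy), '->', len(clean))
```

Because the quartiles barely move when a single extreme value is present, the outlier does not inflate its own rejection threshold, which is the reason for preferring this estimate over a plain mean and standard deviation.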
Adding annotations to the plot of average daily births
When you use a plot like this to make a point, annotating it helps the reader. You can add such annotations manually with the plt.text / ax.text commands, which place text at a given x / y coordinate:
    fig, ax = plt.subplots(figsize=(12, 4))
    births_by_date.plot(ax=ax)

    # Add text labels to the plot
    style = dict(size=10, color='gray')
    ax.text('2012-1-1', 3950, "New Year's Day", **style)
    ax.text('2012-7-4', 4250, "Independence Day", ha='center', **style)
    ax.text('2012-9-4', 4850, "Labor Day", ha='center', **style)
    ax.text('2012-10-31', 4600, "Halloween", ha='right', **style)
    ax.text('2012-11-25', 4450, "Thanksgiving", ha='center', **style)
    ax.text('2012-12-25', 3850, "Christmas ", ha='right', **style)

    # Set the plot title and the y-axis label
    ax.set(title='USA births by day of year (1969-1988)',
           ylabel='average daily births')

    # Format the x axis so that month labels are centered under each month
    ax.xaxis.set_major_locator(mpl.dates.MonthLocator())
    ax.xaxis.set_minor_locator(mpl.dates.MonthLocator(bymonthday=15))
    ax.xaxis.set_major_formatter(plt.NullFormatter())
    ax.xaxis.set_minor_formatter(mpl.dates.DateFormatter('%h'));
The ax.text method takes an x position, a y position, a string, and then optional keywords that set the color, size, style, alignment, and other properties of the text. Here we used ha='right' and ha='center', where ha is short for horizontal alignment. For more information on the available options, see the docstrings of plt.text() and mpl.text.Text().
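To make the effect of ha concrete, here is a small sketch that places three labels at the same x position with different alignments and then measures where each one ends up on the canvas. The axis limits, label strings, and variable names are arbitrary choices for illustration, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.axis([0, 10, 0, 10])
ax.axvline(5, color='lightgray')  # reference line at the shared x position

# Same x position, three different horizontal alignments
labels = {}
for y, ha in [(2, 'left'), (5, 'center'), (8, 'right')]:
    labels[ha] = ax.text(5, y, f"ha='{ha}'", ha=ha, size=10, color='gray')

# Draw once so that the text extents are known, then read them in pixels
fig.canvas.draw()
extents = {ha: t.get_window_extent() for ha, t in labels.items()}
anchor_x = ax.transData.transform((5, 0))[0]

for ha, box in extents.items():
    print(ha, round(box.x0 - anchor_x, 1), round(box.x1 - anchor_x, 1))
```

With ha='left' the text starts at the anchor, with ha='right' it ends there, and with ha='center' it straddles it, which is why 'right' suits labels near the right-hand edge of a plot.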
2 Coordinate transformation and text position
The previous example anchored the text to data locations. Sometimes it is preferable to anchor the text to a position on the axes or the figure, independent of the data. In Matplotlib, this is done by modifying the transform.
Any graphics display framework needs some scheme for translating between coordinate systems. For example, a data point at (x, y) = (1, 1) has to be represented at a certain location on the figure, which in turn has to be represented in pixels on the screen. Mathematically, such coordinate transformations are straightforward, and Matplotlib has a well-developed set of tools for performing them (these tools live in the matplotlib.transforms submodule).
Although the average user rarely needs to worry about the details of these transforms, knowing about them is very helpful when placing text on a figure. Three predefined transforms are useful here:
- ax.transData: the transform associated with data coordinates.
- ax.transAxes: the transform associated with the axes (in units of axes dimensions).
- fig.transFigure: the transform associated with the figure (in units of figure dimensions).
By default, text is aligned above and to the left of the specified coordinates; in the example that follows, the "." at the start of each string marks the approximate coordinate location. The transData coordinates are the usual data coordinates associated with the x-axis and y-axis labels. The transAxes coordinates are measured from the lower-left corner of the axes (the white box in the figure), as a fraction of the axes size. The transFigure coordinates are similar, but are measured from the lower-left corner of the figure (the gray box in the figure), as a fraction of the figure size.
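One way to check what these fractions mean is to ask each transform for pixel positions directly. The sketch below assumes an arbitrary 6 x 4 inch figure at 100 dpi and the non-interactive Agg backend; the variable names are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(6, 4), dpi=100)  # a 600 x 400 pixel canvas
ax.axis([0, 10, 0, 10])

# Figure fraction (0.5, 0.5) is the middle of the canvas, in pixels
fig_center_px = fig.transFigure.transform((0.5, 0.5))

# Axes fraction (0, 0) is the lower-left corner of the axes
axes_corner_px = ax.transAxes.transform((0, 0))

# With these limits, the data point (0, 0) sits on that same corner
data_corner_px = ax.transData.transform((0, 0))

print(fig_center_px, axes_corner_px, data_corner_px)
```

Each transform maps its own coordinate system to display (pixel) coordinates, so two positions given in different systems can be compared once both have been transformed.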
Comparing the three coordinate systems of Matplotlib (1)
Here's an example that draws text at different positions using the three transforms:

    fig, ax = plt.subplots(facecolor='lightgray')
    ax.axis([0, 10, 0, 10])

    # transform=ax.transData is the default, but we set it explicitly anyway
    ax.text(1, 5, ". Data: (1, 5)", transform=ax.transData)
    ax.text(0.5, 0.1, ". Axes: (0.5, 0.1)", transform=ax.transAxes)
    ax.text(0.2, 0.2, ". Figure: (0.2, 0.2)", transform=fig.transFigure);
Comparing the three coordinate systems of Matplotlib (2)
Note that if you change the axes limits, only the transData coordinates are affected; the other two coordinate systems remain unchanged.

    ax.set_xlim(0, 2)
    ax.set_ylim(-6, 6)
    fig  # in a notebook, this displays the figure again
Changing the axes limits yourself makes this behavior easier to see.
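The same behavior can be checked numerically rather than by eye. This sketch records the pixel position of one point in each coordinate system, changes the limits, and records the positions again. The helper `pixel_positions` is a hypothetical name introduced for this illustration, and the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(facecolor='lightgray')
ax.axis([0, 10, 0, 10])


def pixel_positions():
    """Pixel location of one sample point in each coordinate system."""
    return {
        'data': ax.transData.transform((1, 5)),
        'axes': ax.transAxes.transform((0.5, 0.1)),
        'figure': fig.transFigure.transform((0.2, 0.2)),
    }


before = pixel_positions()
ax.set_xlim(0, 2)
ax.set_ylim(-6, 6)
after = pixel_positions()

for name in before:
    moved = not np.allclose(before[name], after[name])
    print(f"{name}: moved = {moved}")
```

Only the data-coordinate point moves, which is why labels that should stay put (a corner note, a figure-level caption) are better anchored with transAxes or transFigure.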
3 Arrows and Notes
Along with tick marks and text, the simple arrow is another useful annotation mark.
Drawing arrows in Matplotlib is often harder than you would expect. Although there is a plt.arrow() function, I don't recommend it: the arrows it creates are SVG objects that are subject to the varying aspect ratio of the plot, so the result is rarely what the user intended. Instead, I recommend the plt.annotate() function. It creates both the text and the arrow, and the arrows it creates can be configured very flexibly.
Graphical annotations
Here's a demonstration using some of annotate's configuration options:

    fig, ax = plt.subplots()

    x = np.linspace(0, 20, 1000)
    ax.plot(x, np.cos(x))
    ax.axis('equal')

    ax.annotate('local maximum', xy=(6.28, 1), xytext=(10, 4),
                arrowprops=dict(facecolor='black', shrink=0.05))

    ax.annotate('local minimum', xy=(5 * np.pi, -1), xytext=(2, -6),
                arrowprops=dict(arrowstyle="->",
                                connectionstyle="angle3,angleA=0,angleB=-90"));
The arrow style is controlled by the arrowprops dictionary, which has many options available. These options are described in detail in the official Matplotlib documentation, so rather than repeating them here I will only demonstrate a few of the possibilities.
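As a small illustration of what arrowprops does, the sketch below creates one annotation without it and two with it, then inspects the returned Annotation objects. The label strings, positions, and variable names are arbitrary, and the Agg backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import FancyArrowPatch

fig, ax = plt.subplots()
x = np.linspace(0, 20, 1000)
ax.plot(x, np.cos(x))
ax.axis('equal')

peak = (2 * np.pi, 1)  # a local maximum of cos(x)

# Without arrowprops, annotate only places text and draws no arrow
plain = ax.annotate('text only', xy=peak, xytext=(8, 3))

# arrowstyle selects the arrow head; connectionstyle shapes the connecting path
straight = ax.annotate('straight', xy=peak, xytext=(10, 5),
                       arrowprops=dict(arrowstyle='->'))
curved = ax.annotate('curved', xy=peak, xytext=(1, 5),
                     arrowprops=dict(arrowstyle='->',
                                     connectionstyle='arc3,rad=0.3'))

for ann in (plain, straight, curved):
    print(ann.get_text(), type(ann.arrow_patch).__name__)
```

The arrow is stored on the annotation as arrow_patch, so it can still be restyled after creation if a figure needs fine-tuning.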
Annotated average number of births per day
Let's use the earlier plot of U.S. births to demonstrate some arrow annotations:

    fig, ax = plt.subplots(figsize=(12, 4))
    births_by_date.plot(ax=ax)

    # Add arrow labels to the plot
    ax.annotate("New Year's Day", xy=('2012-1-1', 4100), xycoords='data',
                xytext=(50, -30), textcoords='offset points',
                arrowprops=dict(arrowstyle="->",
                                connectionstyle="arc3,rad=-0.2"))

    ax.annotate("Independence Day", xy=('2012-7-4', 4250), xycoords='data',
                bbox=dict(boxstyle="round", fc="none", ec="gray"),
                xytext=(10, -40), textcoords='offset points', ha='center',
                arrowprops=dict(arrowstyle="->"))

    ax.annotate('Labor Day', xy=('2012-9-4', 4850), xycoords='data',
                ha='center', xytext=(0, -20), textcoords='offset points')
    ax.annotate('', xy=('2012-9-1', 4850), xytext=('2012-9-7', 4850),
                xycoords='data', textcoords='data',
                arrowprops={'arrowstyle': '|-|,widthA=0.2,widthB=0.2', })

    ax.annotate('Halloween', xy=('2012-10-31', 4600), xycoords='data',
                xytext=(-80, -40), textcoords='offset points',
                arrowprops=dict(arrowstyle="fancy", fc="0.6", ec="none",
                                connectionstyle="angle3,angleA=0,angleB=-90"))

    ax.annotate('Thanksgiving', xy=('2012-11-25', 4500), xycoords='data',
                xytext=(-120, -60), textcoords='offset points',
                bbox=dict(boxstyle="round4,pad=.5", fc="0.9"),
                arrowprops=dict(arrowstyle="->",
                                connectionstyle="angle,angleA=0,angleB=80,rad=20"))

    ax.annotate('Christmas', xy=('2012-12-25', 3850), xycoords='data',
                xytext=(-30, 0), textcoords='offset points',
                size=13, ha='right', va="center",
                bbox=dict(boxstyle="round", alpha=0.1),
                arrowprops=dict(arrowstyle="wedge,tail_width=0.5", alpha=0.1));

    # Set the plot title and the y-axis label
    ax.set(title='USA births by day of year (1969-1988)',
           ylabel='average daily births')

    # Format the x axis so that month labels are centered under each month
    ax.xaxis.set_major_locator(mpl.dates.MonthLocator())
    ax.xaxis.set_minor_locator(mpl.dates.MonthLocator(bymonthday=15))
    ax.xaxis.set_major_formatter(plt.NullFormatter())
    ax.xaxis.set_minor_formatter(mpl.dates.DateFormatter('%h'));

    ax.set_ylim(3600, 5400);
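Most of the annotations in the births plot use textcoords='offset points', which measures xytext as an offset from the annotated point in typographic points (1/72 inch) instead of in data units. The sketch below works out where such a label should land in pixels. The 100 dpi figure, the sample point, and the variable names are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(dpi=100)
ax.axis([0, 10, 0, 10])

ann = ax.annotate('label', xy=(5, 5), xycoords='data',
                  xytext=(50, -30), textcoords='offset points',
                  arrowprops=dict(arrowstyle='->'))

# Where the arrow points, in pixels
target_px = ax.transData.transform((5, 5))

# The offset is given in points; one point is 1/72 inch, so scale by dpi / 72
offset_px = np.array(ann.xyann) * fig.dpi / 72

# Expected anchor of the text, in pixels
expected_text_px = target_px + offset_px
print(ann.anncoords, ann.xyann, offset_px)
```

Because the offset is expressed in physical units, the label keeps the same distance from its target when the axes limits change, which keeps labels readable on a plot whose data range may be adjusted later.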
As you may have noticed, the arrow and text box options are very detailed, which lets you create almost any arrow style you want. The price of that flexibility is complexity: producing a publication-quality graphic by hand can take a lot of time. Finally, note that the mix of styles used above is not a best practice for presenting data; it is there only to demonstrate some of the available options.