—1—
If you're interested in the code in this article, you can find it on GitHub (link at the end of the article). The first time you run it, it throws an error (I haven't found a workaround yet), but if you run it a second time, it works fine.
This isn't a typical data science article, but it does touch on data science and business intelligence applications. Python's Matplotlib is one of the most commonly used libraries for charting and data visualization. We're all familiar with line charts, bar charts, and heat maps, but did you know that Matplotlib can also make simple animations?
Here's an example of an animation made with Matplotlib. It shows John Conway's Game of Life. This was a programming challenge from Metis (a data science bootcamp), and it also gave me the chance to make my first Python animation. Check out the resulting animation:
The focus of this article is mainly on how to create animations in Python using Matplotlib.
But in case you're not familiar with the Game of Life (it's really a simulation you watch rather than a game you play), here are the rules:
- Start with an N×N grid (I used 50×50 in my animation);
- Randomly populate the grid with "small cells" (initially, 1500 of the 2500 squares are chosen at random);
- A small cell with 1 or fewer neighboring cells dies;
- A small cell with 4 or more neighbors also dies;
- A small cell with exactly 2 or 3 neighbors survives;
- An empty square with exactly 3 neighbors grows a new "small cell".
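To make the rules concrete, here is a minimal Python sketch of the decision each square faces in every generation. The function name next_state is hypothetical and is not taken from the author's GitHub code.

```python
def next_state(alive: bool, neighbors: int) -> bool:
    """Apply the Game of Life rules to one square (hypothetical helper).

    alive: whether the square currently holds a small cell
    neighbors: how many of its 8 surrounding squares hold small cells
    """
    if alive:
        # 2 or 3 neighbors: survive; 1 or fewer, or 4 or more: die
        return neighbors in (2, 3)
    # An empty square comes alive only with exactly 3 neighbors
    return neighbors == 3


# Spot-check each rule
for alive, n in [(True, 1), (True, 2), (True, 3), (True, 4), (False, 3), (False, 2)]:
    print(alive, n, "->", next_state(alive, n))
```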
—2—
Setting up the grid
We start by importing the required libraries.
```python
import random
import time

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython import display
```
We'll use the FuncAnimation() function from Matplotlib's animation module. FuncAnimation() produces an animation by calling a function you supply over and over, redrawing the figure once per call (one frame at a time). Let's build this up step by step.
But first, we need to initialize our grid. The following lines of code store our input parameters:
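As a warm-up, here is a minimal sketch of the FuncAnimation() pattern on its own, using a sine wave instead of the Game of Life grid (the wave and the name update are only for illustration). FuncAnimation() calls update() with the current frame number, and update() changes the plotted data in place.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
(line,) = ax.plot(x, np.sin(x))

def update(frame):
    # FuncAnimation passes the frame number; use it to shift the wave
    line.set_ydata(np.sin(x + frame / 10))
    return (line,)

# Call update() for 100 frames, 50 ms apart
anim = animation.FuncAnimation(fig, update, frames=100, interval=50)
```

The same three ingredients (a figure, an update function, and a frame count) carry over directly to the grid animation.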
- We want a 50×50 grid;
- The pad variable makes it easier to count neighbors. Adding a border of empty squares around the grid means we don't need extra logic for cells on the edge. Our 50×50 grid is surrounded by a one-square-wide ring of empty squares, which makes the actual NumPy array 52×52;
- The initial_cells variable sets how many "small cells" the grid starts with. They are placed at random positions on the grid.
```python
# Input variables for the board
boardsize = 50        # board will be X by X where X = boardsize
pad = 2               # padded border, do not change this!
initial_cells = 1500  # this number of initial cells will be placed
                      # in randomly generated positions
```
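To see why the padded border helps, here is a small sketch that counts a square's neighbors by summing the 3×3 window around it. count_neighbors is a hypothetical helper, not from the author's code. Because the border is always empty, the window around any playable square stays inside the array, even for corner squares.

```python
import numpy as np

boardsize = 50
pad = 2  # one empty row/column on each side
board = np.zeros((boardsize + pad, boardsize + pad), dtype=int)

# Playable squares have indices 1..boardsize; rows/columns 0 and boardsize+1 stay empty
board[1, 1] = 1  # top-left playable corner
board[1, 2] = 1
board[2, 1] = 1

def count_neighbors(board, r, c):
    """Count live neighbors of square (r, c) on a padded board.

    The empty border guarantees the 3x3 window never runs off the array,
    so no boundary checks are needed.
    """
    window = board[r - 1:r + 2, c - 1:c + 2]
    return int(window.sum() - board[r, c])  # exclude the square itself

print(count_neighbors(board, 1, 1))  # corner square: neighbors at (1, 2) and (2, 1)
```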
Next, we randomly generate a series of initial coordinates for the "small cells" (we chose 1500 above). Store these coordinates in the pos_list variable.
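```python
# Get a list of random coordinates so that we can initialize
# board with randomly placed organisms
pos_list = []
for _ in range(initial_cells):
    # randint includes both endpoints, so rows/cols fall in 1..boardsize (inside the border)
    pos_list.append([random.randint(1, boardsize), random.randint(1, boardsize)])
```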
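Because each coordinate is drawn independently, the same square can be picked more than once, so the board may start with slightly fewer than initial_cells distinct cells. If you want exactly that many, one alternative is to sample flat indices without replacement using NumPy's Generator.choice(). This is a swapped-in approach, not the author's.

```python
import numpy as np

boardsize = 50
initial_cells = 1500

rng = np.random.default_rng(seed=0)  # seeded only so runs are reproducible
# Draw distinct flat indices 0 .. boardsize*boardsize - 1 without replacement
flat = rng.choice(boardsize * boardsize, size=initial_cells, replace=False)
# Convert to (row, col), then shift by 1 to skip the padded border
rows, cols = np.divmod(flat, boardsize)
pos_list = [[int(r) + 1, int(c) + 1] for r, c in zip(rows, cols)]

print(len(pos_list), len({tuple(p) for p in pos_list}))
```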
Then it's time to initialize the grid. We'll use a NumPy array called my_board to represent it. We start with a 52×52 array of zeros (larger than 50×50 because of the empty border) and then call the init_board() function to populate the grid with "small cells" based on the coordinates in pos_list. I won't go into the details of the helper functions, but I've put them on my GitHub.
```python
# Initialize the board
my_board = np.zeros((boardsize + pad, boardsize + pad))
my_board = init_board(pos_list, my_board)
```
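The author's init_board() is on GitHub and isn't shown here. A plausible minimal version, which is hypothetical and inferred only from how it is called above, writes a 1 at each coordinate in pos_list:

```python
import numpy as np

def init_board(pos_list, board):
    """Hypothetical sketch of the helper: mark each listed
    (row, col) position as a live cell (1) and return the board."""
    for row, col in pos_list:
        board[row, col] = 1
    return board

boardsize, pad = 50, 2
my_board = np.zeros((boardsize + pad, boardsize + pad))
my_board = init_board([[1, 1], [25, 30], [50, 50]], my_board)
print(int(my_board.sum()))  # 3
```

Duplicate coordinates just set the same square to 1 twice, so they don't cause errors.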
—3—
Creating grid animations
This is the part we've been waiting for: the animation! First, we need to set up a few things. The following lines of code create the Matplotlib figure that will display our animation.
```python
# Required line for plotting the animation
%matplotlib notebook
# Initialize the plot of the board that will be used for animation
fig = plt.figure()
```
Let's create our first frame. Matplotlib's imshow() function takes a NumPy matrix and renders it as an image. Pretty cool, huh?
```python
# Show first image - which is the initial board
im = plt.imshow(my_board)
plt.show()
```
The variable passed into imshow() is our initial grid, my_board. The resulting image looks like this:
Now we need to write a helper function for FuncAnimation() to call. The animate() function takes a frame number as input, which acts as a counter. This frame counter links FuncAnimation() and animate(): at each point in time (i.e., each frame), FuncAnimation() calls animate() once. animate() then advances the grid by one generation using the helper function update_board(). Finally, set_data() updates the image with the new grid, and that's it.
```python
# Helper function that updates the board and returns a new image of
# the updated board. animate is the function that FuncAnimation calls
def animate(frame):
    im.set_data(update_board(my_board))
    return im,
```
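update_board() is also on GitHub and not shown. animate() passes my_board in and never reassigns it, so for each frame to build on the previous one, update_board() needs to change the board in place. Here is a hypothetical vectorized sketch under that assumption. It counts neighbors by summing eight shifted slices of the padded array, applies the rules, and writes the result back into the interior.

```python
import numpy as np

def update_board(board):
    """Hypothetical sketch: advance a padded board by one generation, in place.

    board is a 2-D array whose outer ring is an always-empty border; live
    squares hold 1 and empty squares hold 0. Returns the same array.
    """
    h, w = board.shape
    inner = board[1:h - 1, 1:w - 1]  # view of the playable squares
    # Count neighbors: add up the 8 copies of the interior shifted by one square
    neighbors = sum(
        board[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    alive = inner == 1
    survive = alive & ((neighbors == 2) | (neighbors == 3))  # 2 or 3 neighbors
    born = ~alive & (neighbors == 3)                          # exactly 3 neighbors
    inner[:] = survive | born  # write back through the view; border stays 0
    return board


# Usage: a "blinker" flips between horizontal and vertical each generation
demo = np.zeros((7, 7))
demo[3, 2:5] = 1  # horizontal bar of three cells
update_board(demo)
print(demo[1:-1, 1:-1].astype(int))  # now a vertical bar
```

The neighbor counts are computed into a new array before anything is written back, so every square is judged on the same generation.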
All good! We're ready to call the FuncAnimation() function. Note the input parameters:
- fig is the figure we created earlier to hold our animation;
- animate is the function FuncAnimation() calls for each frame. FuncAnimation() passes the frame counter in automatically, so we don't need to supply it ourselves;
- frames is how many frames the animation lasts. Here we want 200 frames;
- interval is the delay between frames in milliseconds. We want 50 milliseconds between frames.
```python
# This line creates the animation
anim = animation.FuncAnimation(fig, animate, frames=200, interval=50)
```
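To keep a copy of the animation outside the notebook, Animation.save() can write it to a GIF with PillowWriter. The sketch below is self-contained, so it animates a small random placeholder grid rather than the Game of Life. With the article's code, you would call anim.save(...) on the anim created above. The fps value of 20 matches interval=50 (1000 ms / 50 ms).

```python
from pathlib import Path

import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; we only want the file
import matplotlib.pyplot as plt
import matplotlib.animation as animation

rng = np.random.default_rng(0)
board = rng.integers(0, 2, size=(20, 20))  # placeholder grid, not the Game of Life

fig = plt.figure()
im = plt.imshow(board)

def animate(frame):
    # Placeholder update: refill the grid with fresh random 0/1 values
    board[:] = rng.integers(0, 2, size=board.shape)
    im.set_data(board)
    return (im,)

anim = animation.FuncAnimation(fig, animate, frames=20, interval=50)

# interval=50 ms per frame is 1000 / 50 = 20 frames per second
out_path = Path("life.gif")
anim.save(str(out_path), writer=animation.PillowWriter(fps=20))
```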
It's that simple! Not that hard, is it? To celebrate our successful animation, here's one more:
—4—
Summary
I hope this article has been helpful. Before I wrap up, here are a few more ideas for applying today's animation techniques to data science:
- Plot Monte Carlo simulation results one batch at a time to watch the final distribution take shape;
- Step through time series data sequentially to see how your model or data behaves from a new angle;
- Show how your algorithm partitions the data as you change input parameters such as the number of clusters;
- Generate correlation heat maps over time or across different subsets of the data to see how different samples affect your model's estimated parameters.
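As an illustration of the first idea, here is a sketch of a Monte Carlo histogram that fills in as samples arrive. The normally distributed samples stand in for real simulation output, and the batch size and bins are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

rng = np.random.default_rng(42)
samples = rng.normal(loc=0.0, scale=1.0, size=5000)  # stand-in for simulation output
bins = np.linspace(-4, 4, 41)
batch = 100  # samples revealed per frame

fig, ax = plt.subplots()
counts, _ = np.histogram(samples[:batch], bins=bins)
bars = ax.bar(bins[:-1], counts, width=np.diff(bins), align="edge")

def update(frame):
    # Histogram of the first (frame + 1) * batch samples
    n = (frame + 1) * batch
    counts, _ = np.histogram(samples[:n], bins=bins)
    for rect, h in zip(bars, counts):
        rect.set_height(h)
    ax.set_ylim(0, counts.max() * 1.1)  # rescale as the bars grow
    ax.set_title(f"{n} samples")
    return bars

anim = animation.FuncAnimation(fig, update, frames=len(samples) // batch, interval=50)
```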
That's all for this article.