Waterfall plots are a very useful tool for plotting certain types of data. Not surprisingly, we can create a repeatable waterfall plot using Pandas and matplotlib.
Before moving on, I want to be clear about what type of chart I'm referring to. I'm going to create the 2D waterfall chart described in the Wikipedia article.
A typical use of such a chart is to show the positive and negative values that act as a "bridge" between a start and an end value. For this reason, finance people sometimes refer to it as a bridge. As with the other examples I have used, this type of chart is not easy to generate in Excel. There are certainly ways to do it, but they are not easy to remember.
The key point to remember about the waterfall chart is that it is essentially a stacked bar chart, but with the peculiarity that it has a blank bottom bar, so that the top bar "hovers" in the air. So, let's get started.
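To make the "hovering" idea concrete before bringing in pandas, here is a minimal sketch using plain matplotlib. The labels and numbers are made up for illustration and are not the data used in the rest of this article. Instead of drawing an invisible bottom bar, it passes the running total to the `bottom` argument of `bar`, which has the same visual effect.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Illustrative values only (assumption, not the article's data)
labels = ["start", "loss", "gain"]
changes = [100, -30, 20]

# Each bar starts at the running total *before* its own change is applied
bottoms = []
running = 0
for change in changes:
    bottoms.append(running)
    running += change

fig, ax = plt.subplots()
# bottom= lifts each bar off the axis, so it appears to float
bars = ax.bar(labels, changes, bottom=bottoms)
plt.close(fig)
```

The second bar starts at 100 and extends down to 70, and the third starts at 70 and extends up to 90, which is exactly the bridge effect described above.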
Creating Charts
First, run the standard imports and make sure IPython displays matplotlib plots inline.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline

Set up the data we want to chart as a waterfall and load it into a DataFrame.
The data needs to start with your starting value, but you leave out the final total. We will calculate it below.

    index = ['sales','returns','credit fees','rebates','late charges','shipping']
    data = {'amount': [350000,-30000,-7500,-25000,95000,-7000]}
    trans = pd.DataFrame(data=data,index=index)

I am using the handy display function in IPython to make it easier to control what gets displayed.

    from IPython.display import display
    display(trans)

The big trick with waterfall charts is figuring out what is in the bottom stacked bar. I learned a lot about this from discussions on *.
First, we get the cumulative sum.

    display(trans.amount.cumsum())

    sales           350000
    returns         320000
    credit fees     312500
    rebates         287500
    late charges    382500
    shipping        375500
    Name: amount, dtype: int64

This looks good, but we need to shift the data one place to the right, so that each bar starts at the running total that precedes it.

    blank = trans.amount.cumsum().shift(1).fillna(0)
    display(blank)

    sales                0
    returns         350000
    credit fees     320000
    rebates         312500
    late charges    287500
    shipping        382500
    Name: amount, dtype: float64

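As a quick sanity check on why the shift is needed, the sketch below uses only the first three amounts from the data above (the variable names `running_total` and `top_of_bar` are my own). It confirms that stacking each amount on top of its shifted "blank" value lands exactly on the running total.

```python
import pandas as pd

# First three rows of the article's data, kept short for illustration
amounts = pd.Series([350000, -30000, -7500],
                    index=["sales", "returns", "credit fees"])

running_total = amounts.cumsum()

# The invisible bottom bar is the running total *before* each row
blank = running_total.shift(1).fillna(0)

# Visible bar stacked on the blank bar ends at the running total
top_of_bar = blank + amounts
matches = bool((top_of_bar == running_total).all())
```

Here `matches` is `True`, which is the property that makes the visible bars line up into a bridge.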
We need to add a net total to the trans and blank data frames.

    total = trans.sum().amount
    trans.loc["net"] = total
    blank.loc["net"] = total
    display(trans)
    display(blank)

    sales                0
    returns         350000
    credit fees     320000
    rebates         312500
    late charges    287500
    shipping        382500
    net             375500
    Name: amount, dtype: float64

Create the steps we use to show changes.

    step = blank.reset_index(drop=True).repeat(3).shift(-1)
    step[1::3] = np.nan
    display(step)

    0         0
    0       NaN
    0    350000
    1    350000
    1       NaN
    1    320000
    2    320000
    2       NaN
    2    312500
    3    312500
    3       NaN
    3    287500
    4    287500
    4       NaN
    4    382500
    5    382500
    5       NaN
    5    375500
    6    375500
    6       NaN
    6       NaN
    Name: amount, dtype: float64

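The step series is compact, so here is a small sketch of what it encodes, using three made-up bar bottoms rather than the article's data. Each bar position is repeated three times, the values are shifted up by one, and every middle entry is set to NaN so that the plotted line breaks there. I use `.iloc` for the positional assignment, which is my own choice and not taken from the original code.

```python
import numpy as np
import pandas as pd

# Illustrative bar bottoms (assumption, not the article's numbers)
blank = pd.Series([0.0, 100.0, 70.0])

# Repeat each position three times, then pull the values up by one slot
step = blank.reset_index(drop=True).repeat(3).shift(-1)

# NaN in every middle slot breaks the line between vertical neighbours
step.iloc[1::3] = np.nan

# (x, y) pairs that a line plot would receive
points = list(zip(step.index, step.values))
```

Reading `points` in order, the only consecutive non-NaN pairs are (0, 100) to (1, 100) and (1, 70) to (2, 70), so the line plot draws one horizontal connector between each pair of neighbouring bars at the level where the next bar begins.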
For the "net" row, we need to make sure that the blank value is 0 so that the total is not stacked on top of itself and counted twice.

    blank.loc["net"] = 0

Then, plot it and see what it looks like.

    my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None, title="2014 Sales Waterfall")
    my_plot.plot(step.index, step.values, 'k')

That looks pretty good, but let's try to format the y-axis to make it more readable. To do this, we use FuncFormatter and some Python 2.7+ syntax to truncate decimals and add a comma to the format.

    from matplotlib.ticker import FuncFormatter

    def money(x, pos):
        'The two args are the value and tick position'
        return "${:,.0f}".format(x)

    formatter = FuncFormatter(money)

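To see what the format string does on its own, here is a small sketch that calls an equivalent function directly, outside matplotlib. The default `pos=None` is my addition so the function can be called with a single argument; the sample values are illustrative.

```python
def money(x, pos=None):
    """Format a number as whole dollars with thousands separators."""
    # ',' inserts thousands separators and '.0f' drops the decimals
    return "${:,.0f}".format(x)

sales_label = money(350000)      # "$350,000"
rounded_label = money(1234.56)   # "$1,235"
negative_label = money(-7500)    # "$-7,500"
```

Note that the sign of a negative value ends up after the dollar sign. That does not matter for this chart because the y-axis starts at 0.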
Then, put it all together.

    my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None, title="2014 Sales Waterfall")
    my_plot.plot(step.index, step.values, 'k')
    my_plot.set_xlabel("Transaction Types")
    my_plot.yaxis.set_major_formatter(formatter)

Full Script
The basic graphic works fine, but I want to add some labels and make some minor formatting changes. Here is my final script:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from matplotlib.ticker import FuncFormatter

    #Use python 2.7+ syntax to format currency
    def money(x, pos):
        'The two args are the value and tick position'
        return "${:,.0f}".format(x)
    formatter = FuncFormatter(money)

    #Data to plot. Do not include a total, it will be calculated
    index = ['sales','returns','credit fees','rebates','late charges','shipping']
    data = {'amount': [350000,-30000,-7500,-25000,95000,-7000]}

    #Store data and create a blank series to use for the waterfall
    trans = pd.DataFrame(data=data,index=index)
    blank = trans.amount.cumsum().shift(1).fillna(0)

    #Get the net total number for the final element in the waterfall
    total = trans.sum().amount
    trans.loc["net"] = total
    blank.loc["net"] = total

    #The steps graphically show the levels as well as used for label placement
    step = blank.reset_index(drop=True).repeat(3).shift(-1)
    step[1::3] = np.nan

    #When plotting the last element, we want to show the full bar,
    #Set the blank to 0
    blank.loc["net"] = 0

    #Plot and label
    my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None,
                         figsize=(10, 5), title="2014 Sales Waterfall")
    my_plot.plot(step.index, step.values, 'k')
    my_plot.set_xlabel("Transaction Types")

    #Format the axis for dollars
    my_plot.yaxis.set_major_formatter(formatter)

    #Get the y-axis position for the labels
    y_height = trans.amount.cumsum().shift(1).fillna(0)

    #Get an offset so labels don't sit right on top of the bar
    #(take the scalar maximum of the amount column and avoid shadowing max())
    max_amount = trans['amount'].max()
    neg_offset = max_amount / 25
    pos_offset = max_amount / 50
    plot_offset = int(max_amount / 15)

    #Start label loop
    loop = 0
    for index, row in trans.iterrows():
        # For the last item in the list, we don't want to double count
        if row['amount'] == total:
            y = y_height.iloc[loop]
        else:
            y = y_height.iloc[loop] + row['amount']
        # Determine if we want a neg or pos offset
        if row['amount'] > 0:
            y += pos_offset
        else:
            y -= neg_offset
        my_plot.annotate("{:,.0f}".format(row['amount']), (loop, y), ha="center")
        loop += 1

    #Scale up the y axis so there is room for the labels
    my_plot.set_ylim(0, blank.max() + int(plot_offset))
    #Rotate the labels
    my_plot.set_xticklabels(trans.index, rotation=0)
    #Supply an output file name in the empty string below
    my_plot.get_figure().savefig("", dpi=200, bbox_inches='tight')

Running the script will generate the nice chart below:
Final thoughts
If you weren't familiar with waterfall charts before, hopefully this example shows you how useful they can be. I suppose some people might feel a little bad about needing so much script code for one chart. In some ways, I agree with that thought. If you're making a single waterfall chart and won't touch it again, then you might as well keep using the methods in Excel.
However, what if the waterfall chart is really useful and you need to replicate it for 100 clients? What will you do next? Using Excel at this point would be a challenge, whereas using the script in this article to create 100 different charts would be fairly easy. Again, the real value of this approach is that it gives you an easily repeatable process when you need to scale the solution.
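As a rough sketch of what that repetition could look like, the data preparation from this article can be wrapped in a function and applied to each client in turn. The function name `waterfall_frames`, the client names, and the second client's numbers are all hypothetical and only serve to illustrate the idea.

```python
import pandas as pd

def waterfall_frames(amounts):
    """Hypothetical helper: build the visible bars and their bottoms.

    amounts is a Series of signed values whose first entry is the
    starting value and which contains no total.
    """
    trans = pd.DataFrame({"amount": amounts})
    # Bottom of each bar is the running total before that row
    blank = trans["amount"].cumsum().shift(1).fillna(0)
    total = trans["amount"].sum()
    # The net bar shows the full total, so its bottom stays at 0
    trans.loc["net"] = total
    blank.loc["net"] = 0
    return trans, blank

# Made-up per-client data (assumption)
clients = {
    "client_a": pd.Series([350000, -30000, 95000],
                          index=["sales", "returns", "late charges"]),
    "client_b": pd.Series([120000, -15000, -5000],
                          index=["sales", "returns", "shipping"]),
}

results = {name: waterfall_frames(amounts)
           for name, amounts in clients.items()}
```

Each `(trans, blank)` pair in `results` could then be handed to the same plotting and labeling code used in the full script, with the client name as the title.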
I really enjoyed learning more about Pandas, matplotlib and IPython. I hope this approach helps you, and that others can learn something from it and apply it to their daily work.