A tutorial on drawing a waterfall plot of data using Python


Waterfall plots are a very useful tool for plotting certain types of data. Not surprisingly, we can create a repeatable waterfall plot using Pandas and matplotlib.

Before moving on, I want to be clear about what type of chart I'm referring to. I'm going to create the 2D waterfall chart described in the Wikipedia article.

A typical use of such a chart is to show the positive and negative values that act as a "bridge" between a start value and an end value. For this reason, finance people sometimes call it a bridge chart. As with other examples I have covered, this type of chart is not easy to generate in Excel. There are certainly ways to do it, but they are not easy to remember.

The key thing to remember about a waterfall chart is that it is essentially a stacked bar chart whose bottom bar is blank, so the top bar "floats" in the air. So, let's get started.
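To make the "floating" idea concrete, here is a minimal sketch using matplotlib's `bottom` argument to `Axes.bar`. The labels and numbers are made up for illustration and are not the sales data used in this article.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Illustrative values only: a starting amount followed by two changes.
labels = ["start", "loss", "gain"]
heights = [100, -30, 50]  # size of each visible bar (negative bars point down)
bottoms = [0, 100, 70]    # where each visible bar begins (the blank part)

fig, ax = plt.subplots()
bars = ax.bar(labels, heights, bottom=bottoms)
ax.set_title("Floating bars (sketch)")
```

The "loss" bar starts at 100 and extends down to 70, and the "gain" bar starts at 70 and extends up to 120. Nothing is drawn below each bar's starting point, which is what makes it appear to hover.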

Creating Charts

First, do the standard imports and make sure IPython displays matplotlib plots inline.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
 
%matplotlib inline

Set up the data we want to plot in the waterfall chart and load it into a DataFrame.

The data needs to begin with your starting value, but you do not need to supply the final total. We will calculate it below.

index = ['sales','returns','credit fees','rebates','late charges','shipping']
data = {'amount': [350000,-30000,-7500,-25000,95000,-7000]}
trans = pd.DataFrame(data=data,index=index)

I am using the handy display function in IPython to make it easier to control what gets displayed.

from IPython.display import display
display(trans)

              amount
sales         350000
returns       -30000
credit fees    -7500
rebates       -25000
late charges   95000
shipping       -7000

The big trick with a waterfall chart is figuring out what goes in the bottom stacked bar. I learned a lot about this from online discussions.

First, we get the cumulative sum.

display(trans.amount.cumsum())
sales           350000
returns         320000
credit fees     312500
rebates         287500
late charges    382500
shipping        375500
Name: amount, dtype: int64

This looks good, but we need to shift the data one place to the right, so that each bar starts where the previous running total ended.

blank = trans.amount.cumsum().shift(1).fillna(0)
display(blank)
 
sales                0
returns         350000
credit fees     320000
rebates         312500
late charges    287500
shipping        382500
Name: amount, dtype: float64
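As a quick sanity check on why the shift works, the sketch below rebuilds the same numbers as a standalone Series (the names `amounts`, `running_total` and `bottom` are mine, not part of the script) and confirms that each bar's bottom plus its own amount lands exactly on the running total.

```python
import pandas as pd

amounts = pd.Series(
    [350000, -30000, -7500, -25000, 95000, -7000],
    index=["sales", "returns", "credit fees", "rebates", "late charges", "shipping"],
    name="amount",
)

running_total = amounts.cumsum()
# The blank portion of each bar is the running total *before* that row.
bottom = running_total.shift(1).fillna(0)

# Each visible bar starts at `bottom` and extends by its own amount,
# so it should end exactly at the running total for that row.
assert ((bottom + amounts) == running_total).all()
print(bottom.astype(int).to_dict())
```

A negative amount simply makes the bar extend downward from its bottom, so the same calculation covers both increases and decreases.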

We need to add a net total row to both the trans DataFrame and the blank Series.

total = trans.sum().amount
trans.loc["net"] = total
blank.loc["net"] = total
display(trans)
display(blank)

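              amount
sales         350000
returns       -30000
credit fees    -7500
rebates       -25000
late charges   95000
shipping       -7000
net           375500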

sales                0
returns         350000
credit fees     320000
rebates         312500
late charges    287500
shipping        382500
net             375500
Name: amount, dtype: float64

Create the steps we use to show the changes. These become the horizontal lines that connect one bar to the next.

step = blank.reset_index(drop=True).repeat(3).shift(-1)
step[1::3] = np.nan
display(step)
 
0         0
0       NaN
0    350000
1    350000
1       NaN
1    320000
2    320000
2       NaN
2    312500
3    312500
3       NaN
3    287500
4    287500
4       NaN
4    382500
5    382500
5       NaN
5    375500
6    375500
6       NaN
6       NaN
Name: amount, dtype: float64
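The repeat, shift and NaN combination is compact, so here is a small sketch of what it produces, using three made-up levels instead of the real data. matplotlib breaks a line wherever it meets a NaN, so only consecutive non-NaN points get joined. The `connectors` list is just my way of spelling out which segments would be drawn, and I use `.iloc` here to make the positional slicing explicit.

```python
import numpy as np
import pandas as pd

levels = pd.Series([0.0, 100.0, 70.0])  # illustrative bar bottoms

# Triple every level, slide the values up one slot, then blank every
# middle slot so the plotted line is broken between bars.
step = levels.reset_index(drop=True).repeat(3).shift(-1)
step.iloc[1::3] = np.nan

idx = step.index.to_numpy()
vals = step.to_numpy()

# Only neighbouring points that are both non-NaN are joined by a line.
connectors = [
    ((int(idx[i]), float(vals[i])), (int(idx[i + 1]), float(vals[i + 1])))
    for i in range(len(vals) - 1)
    if not np.isnan(vals[i]) and not np.isnan(vals[i + 1])
]
print(connectors)
```

Each surviving segment is a horizontal line from one bar position to the next at the level where the next bar begins.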

For the "net" row, we need to make sure the blank value is 0 so that we do not double stack.

blank.loc["net"] = 0

Then plot it and see what it looks like.

my_plot = trans.plot(kind='bar', stacked=True, bottom=blank,legend=None, title="2014 Sales Waterfall")
my_plot.plot(step.index, step.values,'k')

[Figure: the "2014 Sales Waterfall" bar chart with connecting step lines]

That looks pretty good, but let's format the y-axis to make it more readable. To do this, we use FuncFormatter and some Python 2.7+ format syntax to round off the decimals and add a comma as the thousands separator.

def money(x, pos):
  'The two args are the value and tick position'
  return "${:,.0f}".format(x)
 
from matplotlib.ticker import FuncFormatter
formatter = FuncFormatter(money)
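To see what the format string does without drawing anything, the sketch below calls the same `money` function directly on a few sample values of my own choosing. matplotlib passes the tick position as the second argument, so I pass `None` here since the function ignores it.

```python
def money(x, pos):
    """Format a tick value as whole dollars; `pos` is unused."""
    return "${:,.0f}".format(x)

samples = [0, 7500, 350000, 1234567.89, -30000]
formatted = [money(value, None) for value in samples]
print(formatted)
# ['$0', '$7,500', '$350,000', '$1,234,568', '$-30,000']
```

Note that a negative value comes out as `$-30,000`. That does not matter for this chart because the y-axis starts at 0, but it is worth knowing if you reuse the formatter elsewhere.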

Then, put it all together.

my_plot = trans.plot(kind='bar', stacked=True, bottom=blank,legend=None, title="2014 Sales Waterfall")
my_plot.plot(step.index, step.values,'k')
my_plot.set_xlabel("Transaction Types")
my_plot.yaxis.set_major_formatter(formatter)

[Figure: the same chart with the y-axis formatted as dollars and the x-axis labeled "Transaction Types"]

Full Script

The basic graphic works fine, but I want to add some labels and make some minor formatting changes. Here is my final script:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
 
#Use python 2.7+ syntax to format currency
def money(x, pos):
  'The two args are the value and tick position'
  return "${:,.0f}".format(x)
formatter = FuncFormatter(money)
 
#Data to plot. Do not include a total, it will be calculated
index = ['sales','returns','credit fees','rebates','late charges','shipping']
data = {'amount': [350000,-30000,-7500,-25000,95000,-7000]}
 
#Store data and create a blank series to use for the waterfall
trans = pd.DataFrame(data=data,index=index)
blank = trans.amount.cumsum().shift(1).fillna(0)
 
#Get the net total number for the final element in the waterfall
total = trans.sum().amount
trans.loc["net"] = total
blank.loc["net"] = total
 
#The steps graphically show the levels as well as used for label placement
step = blank.reset_index(drop=True).repeat(3).shift(-1)
step[1::3] = np.nan
 
#When plotting the last element, we want to show the full bar,
#so set the blank to 0
blank.loc["net"] = 0
 
#Plot and label
my_plot = trans.plot(kind='bar', stacked=True, bottom=blank,legend=None, figsize=(10, 5), title="2014 Sales Waterfall")
my_plot.plot(step.index, step.values,'k')
my_plot.set_xlabel("Transaction Types")
 
#Format the axis for dollars
my_plot.yaxis.set_major_formatter(formatter)
 
#Get the y-axis position for the labels
y_height = trans.amount.cumsum().shift(1).fillna(0)
 
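#Get an offset so labels don't sit right on top of the bar
#(named max_amount to avoid shadowing the built-in max)
max_amount = trans.amount.max()
neg_offset = max_amount / 25
pos_offset = max_amount / 50
plot_offset = int(max_amount / 15)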
 
#Start label loop
for loop, (label, row) in enumerate(trans.iterrows()):
  # For the last item in the list, we don't want to double count
  if row['amount'] == total:
    y = y_height.iloc[loop]
  else:
    y = y_height.iloc[loop] + row['amount']
  # Determine if we want a neg or pos offset
  if row['amount'] > 0:
    y += pos_offset
  else:
    y -= neg_offset
  # Place the formatted amount at bar position `loop`, height `y`
  my_plot.annotate("{:,.0f}".format(row['amount']),(loop,y),ha="center")
 
#Scale up the y axis so there is room for the labels
my_plot.set_ylim(0,blank.max()+int(plot_offset))
#Rotate the labels
my_plot.set_xticklabels(trans.index,rotation=0)
my_plot.get_figure().savefig("",dpi=200,bbox_inches='tight')

Running the script will generate the nice chart below:

[Figure: the final labeled "2014 Sales Waterfall" chart]

Final thoughts

If you weren't familiar with waterfall charts before, hopefully this example shows you how useful they can be. I can imagine some people feeling that this is a lot of scripting for a single chart, and in some ways I agree. If you are making just one waterfall chart and will never touch it again, you might as well stick with an Excel solution.

However, what if the waterfall chart turns out to be really useful and you need to replicate it for 100 clients? What would you do then? Doing that in Excel would be a challenge, whereas using the script in this article to create 100 different charts would be fairly straightforward. Once again, the real value of this approach is that it gives you a repeatable process that is easy to scale when you need to.
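As a rough sketch of what that replication could look like, the script can be wrapped in a function and called once per client. Everything here is an assumption for illustration: the `waterfall` helper, the client names and the client numbers are all made up. I also use matplotlib's `Axes.bar` and `Axes.hlines` directly instead of `DataFrame.plot` and the step Series, which keeps the function short but is a swapped-in variation on the approach above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter


def waterfall(amounts, title):
    """Draw a waterfall chart for a Series of signed amounts (hypothetical helper)."""
    running = amounts.cumsum()
    bottoms = running.shift(1).fillna(0)

    # Append the net bar, which starts at zero and shows the full total.
    heights = pd.concat([amounts, pd.Series({"net": amounts.sum()})])
    bottoms = pd.concat([bottoms, pd.Series({"net": 0})])

    fig, ax = plt.subplots(figsize=(10, 5))
    positions = np.arange(len(heights))
    ax.bar(positions, heights.values, bottom=bottoms.values)
    ax.set_xticks(positions)
    ax.set_xticklabels(heights.index)

    # Connect each bar to the next one at the running total.
    x = np.arange(len(running))
    ax.hlines(running.to_numpy(), x, x + 1, colors="k")

    ax.yaxis.set_major_formatter(FuncFormatter(lambda v, pos: "${:,.0f}".format(v)))
    ax.set_title(title)
    return fig, ax


# Made-up client data, purely for illustration.
clients = {
    "Client A": pd.Series({"sales": 350000, "returns": -30000, "shipping": -7000}),
    "Client B": pd.Series({"sales": 120000, "rebates": -15000, "late charges": 4000}),
}
figures = {
    name: waterfall(amounts, "{} Waterfall".format(name))
    for name, amounts in clients.items()
}
```

From here, each figure could be written out with `fig.savefig(...)` using whatever file naming scheme suits your reports.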

I really enjoyed learning more about Pandas, matplotlib and IPython. I hope this approach helps you, and that others can learn something from it and apply it to their daily work.