Reading Time: 5mins, First Published: Sun, May 6, 2018
Facebook Prophet
Time-Series
Open-Data
Data-Science


In this post we will use Facebook’s Prophet time series analysis API to forecast daily cycle rentals in London.

Time Series Modelling


Time series analysis is an important discipline that any good data scientist should be aware of. Time series modelling typically involves looking at data points recorded at successive time intervals; a share price is one example:

[Figure: Kromek share price]

This post is intended to provide a quick and practical introduction to the features of the Prophet Python API. During the tutorial we will build a model capable of forecasting cycle hires, using real-world data.

Prophet is a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily periodicity data with at least one year of historical data. Prophet is robust to missing data, shifts in the trend, and large outliers.
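
To make the word "additive" concrete, here is a minimal sketch that builds a toy daily series as the sum of a trend, a weekly cycle, a yearly cycle, and noise. All coefficients are made up for illustration, and the components are deliberately simpler than the ones Prophet fits (a plain straight line and single sine waves); the point is only that the pieces add up.

```python
import numpy as np

# Toy additive series: y(t) = trend + weekly + yearly + noise.
# Every number below is an illustrative assumption, not a fitted value.
rng = np.random.default_rng(0)
t = np.arange(730)  # two years of daily observations

trend = 5000 + 3.0 * t                          # slow linear growth
weekly = 400 * np.sin(2 * np.pi * t / 7)        # 7-day cycle
yearly = 2000 * np.sin(2 * np.pi * t / 365.25)  # annual cycle
noise = rng.normal(0, 200, size=t.size)         # random day-to-day variation

y = trend + weekly + yearly + noise
```

A forecasting model of this kind tries to recover the separate components from y alone, which is what later lets us look at trend and seasonality individually.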

Setup


Installing Facebook Prophet

Installing Facebook’s Prophet is straightforward using Anaconda: the following command will install the fbprophet package along with the pystan dependency.

conda install -c conda-forge fbprophet

Getting started

For best results, please follow along in a Jupyter notebook.

We start by importing pandas, numpy, matplotlib, and fbprophet. If you are using a Jupyter notebook, use the %matplotlib inline command to ensure you have inline plotting.

import pandas as pd
import numpy as np
from fbprophet import Prophet
import matplotlib.pyplot as plt
%matplotlib inline

We will utilise the London cycle hire data set for this tutorial: it’s a data set I come back to frequently when experimenting with new time series models. You can download a copy from the London data store, where you can also find many other interesting public data sets.

https://data.london.gov.uk/

The data set is available at the following URL. We use pandas’ read_excel function to pull the data from the URL, load it into a pandas DataFrame, and select the two columns we need.

cycle_url = 'https://files.datapress.com/london/dataset/number-bicycle-hires/2017-05-09T13:54:35.44/tfl-daily-cycle-hires.xls'

# Recent pandas versions spell this keyword sheet_name (sheetname is deprecated)
cycle_hires = pd.read_excel(cycle_url, sheet_name="Data")[["Day", "Number of Bicycle Hires"]]

Next we clean the column headers by setting the names to lowercase and replacing any whitespace with underscores. This step utilises a list comprehension.

cycle_hires.columns = [name.lower().replace(" ", "_") for name in cycle_hires.columns]
cycle_hires.set_index("day", inplace=True)

We also set the index to the “day” column. Using a datetime index makes it straightforward to plot the data using pandas.
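
If you want to try the header clean-up without downloading the spreadsheet, the following sketch applies the same two steps to a small stand-in frame. The frame name toy is hypothetical; the three rows are copied from the sample shown later in this post.

```python
import pandas as pd

# Small stand-in for the spreadsheet (values copied from the sample rows in this post).
toy = pd.DataFrame({
    "Day": pd.to_datetime(["2010-07-30", "2010-07-31", "2010-08-01"]),
    "Number of Bicycle Hires": [6897, 5564, 4303],
})

# Lowercase each header and replace spaces with underscores.
toy.columns = [name.lower().replace(" ", "_") for name in toy.columns]

# Move the date column into the index so pandas plots against time.
toy = toy.set_index("day")

print(list(toy.columns))  # ['number_of_bicycle_hires']
```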

cycle_hires.plot(figsize=(15, 5))
plt.title("London Cycle Hires")
plt.ylabel("Hires per day")
plt.savefig("hires.png")

From the plot above we can see that the data set exhibits a high degree of seasonality. We might also imagine that there are further weekly trends within the data.

As this is intended to be a quick tutorial, we will not go into an extensive Exploratory Data Analysis (EDA) phase. Instead, let’s let the Prophet library do the hard work!

I also want to illustrate the “holidays” feature of the Prophet library. We can see that certain days have spikes of activity; perhaps these are bank holidays. Prophet allows us to factor any holiday periods into our time series analysis, but first we need the dates of UK bank holidays:

bank_holidays_url = "http://www.dmo.gov.uk/docs/giltsmarket/formulae/UKbankholidays.xls"
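# sheet_name and skiprows are the pandas keyword spellings; adjust skiprows if the header row differs
bank_holidays = pd.read_excel(bank_holidays_url, sheet_name="UK Bank Hols", skiprows=1)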

Data Modelling

Great, now we are ready to begin. Prophet expects the input dataframe to have two columns named ds and y:

cycle_hires.reset_index(inplace=True)
cycle_hires.columns = ["ds", "y"]

The dataframe should look similar to this, where ds is the date stamp and y is the target variable, in this case the number of cycle hires per day:

   ds          y
0  2010-07-30  6897
1  2010-07-31  5564
2  2010-08-01  4303
3  2010-08-02  6642
4  2010-08-03  7966
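
Before fitting, it can be worth confirming that the frame really has this shape. The helper below is a hypothetical convenience (check_prophet_frame is not part of Prophet), sketched on a small stand-in frame under the assumption that ds should hold unique datetimes and y should be numeric.

```python
import pandas as pd

# Stand-in frame with a datetime index, mirroring cycle_hires before the reset.
toy = pd.DataFrame(
    {"number_of_bicycle_hires": [6897, 5564, 4303]},
    index=pd.to_datetime(["2010-07-30", "2010-07-31", "2010-08-01"]),
)
toy.index.name = "day"

prophet_input = toy.reset_index()
prophet_input.columns = ["ds", "y"]


def check_prophet_frame(df):
    """Hypothetical sanity check: two columns, datetime ds, numeric y, no repeated dates."""
    return (
        list(df.columns) == ["ds", "y"]
        and pd.api.types.is_datetime64_any_dtype(df["ds"])
        and pd.api.types.is_numeric_dtype(df["y"])
        and df["ds"].is_unique
    )


print(check_prophet_frame(prophet_input))  # True
```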

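Now we are ready to fit the model. We create a Prophet instance, fit it to the data, and then ask for a forecast covering the next 365 days.

m = Prophet()
m.fit(cycle_hires)
future = m.make_future_dataframe(periods=365)
forecast = m.predict(future)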

If you get an error complaining about datetime not being understood, you probably need to update pandas.
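
To see what the future dataframe holds, here is a rough sketch of the idea using plain pandas: keep the historical dates and append the requested number of extra days. The function sketch_future_frame is a hypothetical stand-in written for illustration, not Prophet's own implementation.

```python
import pandas as pd

# Five historical dates, matching the sample rows shown earlier.
history = pd.DataFrame({"ds": pd.date_range("2010-07-30", periods=5, freq="D")})


def sketch_future_frame(history, periods):
    """Illustrative stand-in: historical dates followed by `periods` extra days."""
    last = history["ds"].max()
    extra = pd.date_range(last + pd.Timedelta(days=1), periods=periods, freq="D")
    combined = pd.concat([history["ds"], pd.Series(extra)], ignore_index=True)
    return pd.DataFrame({"ds": combined})


future_sketch = sketch_future_frame(history, periods=3)
print(len(future_sketch))  # 8
```

The forecast is then produced for every row of this frame, which is why it covers the history as well as the new dates.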

That was easy, and we have a great looking plot too! Let’s factor in those bank holidays. The bank holidays data contains a column of dates named “UK BANK HOLIDAYS”.

Again, Prophet requires a certain format for the holidays:

holidays = pd.DataFrame({
 'holiday': "bank_holiday",
 'ds': bank_holidays["UK BANK HOLIDAYS"],
 'lower_window':0,
 'upper_window':0
})

# pandas numbers weekdays from Monday=0 to Sunday=6
# Monday bank holiday: include the preceding weekend
holidays.lower_window = np.where(holidays.ds.dt.dayofweek == 0, -2, 0)

# Friday bank holiday (dayofweek 4): include the following weekend
holidays.upper_window = np.where(holidays.ds.dt.dayofweek == 4, 2, 0)

# New Year's Day: include the preceding two days
holidays.lower_window = np.where(holidays.ds.dt.dayofyear == 1, -2,
                                 holidays.lower_window)

The lower and upper windows denote the number of days on either side of the holiday that you wish to include. I have included the preceding two days for a bank holiday falling on a Monday or on the first day of the year, and the following weekend for a bank holiday falling on a Friday.

   ds          holiday       lower_window  upper_window
0  1998-01-01  bank_holiday  -2            0
1  1998-04-10  bank_holiday   0            2
2  1998-04-13  bank_holiday  -2            0
3  1998-05-04  bank_holiday  -2            0
4  1998-05-25  bank_holiday  -2            0
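
The window logic is easy to get wrong because pandas counts weekdays from Monday=0, which makes Friday 4 rather than 5. The sketch below applies the rules described above to three of the dates from the table, using a hypothetical frame called toy_holidays, so you can check the weekday numbering for yourself.

```python
import numpy as np
import pandas as pd

# New Year's Day 1998 (a Thursday), Good Friday 1998, Easter Monday 1998.
dates = pd.to_datetime(["1998-01-01", "1998-04-10", "1998-04-13"])
toy_holidays = pd.DataFrame({"holiday": "bank_holiday", "ds": dates})

dow = toy_holidays["ds"].dt.dayofweek            # Monday=0 ... Friday=4 ... Sunday=6
is_new_year = toy_holidays["ds"].dt.dayofyear == 1

# Two days before a Monday holiday or New Year's Day; two days after a Friday holiday.
toy_holidays["lower_window"] = np.where((dow == 0) | is_new_year, -2, 0)
toy_holidays["upper_window"] = np.where(dow == 4, 2, 0)

print(toy_holidays)
```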

Pass in the holidays dataframe when instantiating the Prophet model.

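m = Prophet(holidays=holidays)
m.fit(cycle_hires)
forecast = m.predict(future)
m.plot(forecast)             # forecast with uncertainty interval
m.plot_components(forecast)  # trend, holidays, and seasonal components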

The plot looks very similar to the first one. Prophet also provides a function to view the plot components, which lets us see the contribution of the trend and of repeating patterns within the data. We can see that there seems to be an overall upward trend in cycle hires. Mid-week sees the greatest number of hires, presumably coinciding with the busiest commuting periods. We also see that demand is seasonal: if you’ve cycled in London in January you know why!
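
As a rough cross-check on a weekly pattern, you can average the observations by day of the week and compare each average with the overall mean. This is only a crude analogue of a weekly component, not how Prophet estimates it, and the sketch runs on made-up numbers (the weekday_effect values are assumptions, not taken from the cycle hire data).

```python
import pandas as pd

# Made-up weekday effects for Monday..Sunday; they sum to zero by construction.
weekday_effect = [200, 500, 600, 500, 100, -900, -1000]

ds = pd.date_range("2016-01-04", periods=28, freq="D")  # four weeks, starting on a Monday
toy = pd.DataFrame({
    "ds": ds,
    "y": [20000 + weekday_effect[d.dayofweek] for d in ds],
})

# Mean hires per weekday, expressed as a deviation from the overall mean.
by_weekday = toy.groupby(toy["ds"].dt.dayofweek)["y"].mean()
deviation = by_weekday - toy["y"].mean()

print(deviation)
```

On the real data the same groupby gives a quick feel for which days are busiest before you look at the fitted components.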

That concludes our brief tour of Prophet. Clearly Prophet is an impressive and easy-to-use tool for time series analysis.

In a future tutorial we may look into some more advanced uses of the library, including getting under the hood to extract the Monte Carlo simulations which feed into the uncertainty estimates.

If you want to learn more about Prophet, check out the following references:

https://facebookincubator.github.io/prophet/static/prophet_paper_20170113.pdf

https://facebookincubator.github.io/prophet/

If you are interested in learning more about time series analysis, a good place to start is with ARIMA and ARMA modelling. Also try experimenting with other approaches: as an exercise, try using random forests on this data, using shifted features and categorical encoding!
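
As a starting point for that exercise, here is a minimal sketch of how shifted (lag) features and a one-hot encoded day of the week could be built with pandas. The first five y values come from the sample rows earlier in the post; the remaining five are made up so that a seven-day lag has something to work with. The choice of lags (1 and 7 days) is an assumption, not a recommendation.

```python
import pandas as pd

toy = pd.DataFrame({
    "ds": pd.date_range("2010-07-30", periods=10, freq="D"),
    # First five values from the post; the last five are invented for illustration.
    "y": [6897, 5564, 4303, 6642, 7966, 7000, 7100, 6900, 5600, 4400],
})

# Shifted features: yesterday's value and the value one week earlier.
lags = pd.DataFrame({
    "lag_1": toy["y"].shift(1),
    "lag_7": toy["y"].shift(7),
})

# Categorical encoding: one indicator column per day of the week.
dow = pd.get_dummies(toy["ds"].dt.dayofweek, prefix="dow")

# Drop the leading rows where the lags are undefined.
X = pd.concat([lags, dow], axis=1).dropna()
y = toy.loc[X.index, "y"]

print(X.shape)  # (3, 9)
```

X and y could then be passed to a tree-based regressor such as scikit-learn's RandomForestRegressor, taking care to split the data by time rather than at random.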