Reading Time: 21mins, First Published: Sat, Oct 6, 2018
Data Science

This post explores the application of gradient boosted algorithms to time-series modelling, as we try to predict the number of London Fire Brigade Call outs using seasonal patterns, and 3rd party weather data.

Key Topics:

  • Lagged or shifted variables
  • Capturing seasonality and “holiday”/“event” patterns in categorical features
  • Building in 3rd party data
  • Persistence models and performance benchmarking
  • Ensembles

It will be helpful for readers to a foundational to intermediate knowledge of data science techniques, in order to follow along with this post.


The London Fire Brigade

The London Fire Brigade (LFB) is the statutory fire and rescue service for London. It was formed by the Metropolitan Fire Brigade Act of 1865, under the leadership of superintendent Eyre Massey Shaw.

It is the second-largest of all the fire services in the United Kingdom, after the national Scottish Fire and Rescue Service and the fifth-largest in the world, after the Tokyo Fire Department, New York City Fire Department, Paris Fire Brigade and the Scottish service, with 5,992 staff, including 5,096 operational firefighters and officers based at 102 fire stations (plus one river station).

In 2013 to 14 the LFB handled 171,067 999 emergency calls. Of the calls it actually mobilised to, 20,934 were fires, including 10,992 that were of a serious nature, making it one of the busiest fire brigades in the world. In the same 12-month period, it received 3,172 hoax calls, the highest number of any UK fire service, but crews were mobilised to only 1,424 of them. In 2015 to 16 the LFB received 171,488 emergency calls. These consisted of: 20,773 fires, 30,066 special service callouts, and 48,696 false alarms.

As well as firefighting, the LFB also responds to road traffic collisions, floods, trapped-in-lift releases, and other incidents such as those involving hazardous materials or major transport accidents. It also conducts emergency planning and performs fire safety inspections and education. It does not provide an ambulance service as this function is performed by the London Ambulance Service as an independent NHS trust, although all LFB firefighters are trained in first aid and all of its fire engines carry first aid equipment. Since 2016, the LFB responds to provide first aid for some life-threatening medical emergencies (e.g. cardiac or respiratory arrest).

The Objective

Predicting the number of hourly call-outs for the next week

In this analysis we will build a series of machine learning models capable of predicting the number of hourly call-outs for different types incident.

The various call-out types that the London Fire Brigade respond to are classified in to a number of sub-categories:

  • Special Services
  • False Alarms
  • Fire

Different call-out types are likely to have different causes. So we will look at two approaches to the problem of modelling call frequencies.

  1. A single model that predicts the total number of call-outs regardless of type
  2. Separate models for different callout types, which could be aggregated or “ensembled” to predict the total number of call-outs:
    • Special services
    • False alarms
    • Fire

The Data

Fire Brigade Call-out Information

Call-out information for the London Fire Brigade is available from the London Data Store.

The data used in this analysis is 2013 onwards, but data is available back to 2009.

Weather Data

Weather is one factor that we might expect to influence the number of call-outs, particularly in the case of flood incidents, but we could also speculate that weather is likely to effect the number of traffic collisions, and potentially fires.

To test this theory we will supplement the call-out information with hourly weather data. As I was tight on time when putting together this post, I opted to purchase a batch of hourly weather data from OpenWeatherMap for $10; however, there may be freely available alternatives.

We assume that similar data would be available via forecasts.

Data Modelling

The Algorithm: Gradient Boosted Modelling

This post will ulilise the LightGBM implementation of Gradient Boosting, you can read more about LightGBM here. I won’t go in the details of the implementation in this post, but a key advantage of this algorithm over the XGBoost is that the LightGBM algorithm trains faster and provides very similar results in terms of performance.

For this project we will be making use of the Python programming language, and we will utilise the following libraries and tool kits.


# Gradient Boost Regression Model
import lightgbm
from lightgbm.sklearn import LGBMRegressor

# Visualisation tools
import matplotlib.pyplot as plt
import matplotlib as mpl
import missingno as msno
import seaborn as sns

# Numerical libraries and data analysis
import numpy as np
import pandas as pd
from scipy import stats

# Machine learning libraries
from sklearn import model_selection, metrics

%matplotlib inline"bmh")

pd.options.display.max_columns = 500

Load London Fire Brigade Dataset

Large datasets will load faster from pickles than from Excel, so to save time when we re-run the notebook let’s pickle our dataframes, and load from the pickles next time.

    lfb_df_2017 = pd.read_pickle("LFB Incident data from January 2017.pickle")
    lfb_df_2013_2016 = pd.read_pickle("LFB Incident data from January 2013 to December 2016.pickle")

except FileNotFoundError:
    lfb_df_2017 = pd.ExcelFile("LFB Incident data from January 2017.xlsx").parse(0)
    lfb_df_2013_2016 = pd.ExcelFile("LFB Incident data from January 2013 to December 2016.xlsx").parse(0)

    lfb_df_2017.to_pickle("LFB Incident data from January 2017.pickle")
    lfb_df_2013_2016.to_pickle("LFB Incident data from January 2013 to December 2016.pickle")

Let’s take a quick look at the London Fire Brigade dataset.


There’s quite a lot of data here, and there’s certainly a lot of scope for investigating business questions outside of predicting call-outs, for example arrival times, which is something I’ve looked at before.

IncidentNumber DateOfCall CalYear TimeOfCall HourOfCall IncidentGroup StopCodeDescription SpecialServiceType PropertyCategory PropertyType AddressQualifier Postcode_full Postcode_district IncGeo_BoroughCode IncGeo_BoroughName ProperCase IncGeo_WardCode IncGeo_WardName IncGeo_WardNameNew Easting_m Northing_m Easting_rounded Northing_rounded FRS IncidentStationGround FirstPumpArriving_AttendanceTime FirstPumpArriving_DeployedFromStation SecondPumpArriving_AttendanceTime SecondPumpArriving_DeployedFromStation NumStationsWithPumpsAttending NumPumpsAttending
0 1131 2013-01-01 2013 00:02:06 0 False Alarm AFA NaN Other Residential Boarding House/B&B for homeless/asylum seekers Correct incident location NW6 1PG NW6 E09000007 CAMDEN Camden E05000145 West Hampstead West Hampstead 525424.0 184894.0 525450 184850 London West Hampstead 167.0 West Hampstead NaN NaN 1.0 1.0
1 4131 2013-01-01 2013 00:02:09 0 False Alarm AFA NaN Non Residential Single shop Correct incident location UB6 0HY UB6 E09000009 EALING Ealing E05000183 North Greenford North Greenford 515405.0 185445.0 515450 185450 London Northolt 236.0 Northolt NaN NaN 1.0 1.0
2 5131 2013-01-01 2013 00:02:54 0 False Alarm AFA NaN Non Residential Other cultural venue Correct incident location W6 0RF W6 E09000013 HAMMERSMITH AND FULHAM Hammersmith And fulham E05000261 Ravenscourt Park Ravenscourt Park 522456.0 178647.0 522450 178650 London Hammersmith 218.0 Hammersmith NaN NaN 1.0 1.0
3 2131 2013-01-01 2013 00:03:02 0 Fire Secondary Fire NaN Outdoor Structure Small refuse/rubbish container In street close to gazetteer location W1H 7DX W1H E09000033 WESTMINSTER Westminster E05000632 Bryanston and Dorset Square Bryanston and Dorset Square 527814.0 181016.0 527850 181050 London Soho 426.0 Knightsbridge NaN NaN 1.0 1.0
4 3131 2013-01-01 2013 00:03:03 0 False Alarm AFA NaN Non Residential Purpose built office Within same building EC3R 7QQ EC3R E09000001 CITY OF LONDON City Of london E05009310 Tower Tower 533338.0 180736.0 533350 180750 London Whitechapel 346.0 Dowgate NaN NaN 2.0 2.0

Libraries like missingno offer a simple method of visualising the completeness of data







Generate Call-outs per Day Target Variables

The original dataset contains a lot of information about each call-out. Unfortunately much of this information is not usable feature information for the purposes of this project, as this data would not be known before the incident has occurred.

The columns that we will use are the DateOfCall, and Incident group information. We will use this data to generate the target variables for the project.

Target Variable: Number of call-outs per hour

def get_target_variables(lfb_df_2013_2016, lfb_df_2017):
    call_outs_2013_to_2016 = _get_categorised_call_out_table(lfb_df_2013_2016)
    call_outs_2017_onwards = _get_categorised_call_out_table(lfb_df_2017)
    combined_call_out_data = pd.concat([call_outs_2013_to_2016, call_outs_2017_onwards], axis=0)
    call_out_per_hour = _get_call_outs_per_hour(combined_call_out_data)
    return call_out_per_hour

def _get_categorised_call_out_table(df):
    df.dropna(subset=["IncidentGroup"], inplace=True)
    time_stamp_str = df.DateOfCall.astype(str) +" " + df.TimeOfCall.astype(str)
    call_time = pd.to_datetime(time_stamp_str).to_frame(name="call_ds")
    dummy_incident_groups = pd.get_dummies(df.IncidentGroup)
    call_time = pd.concat([dummy_incident_groups, call_time], axis=1, verify_integrity=True)
    call_time["all_calls"] = call_time.sum(axis=1)
    return call_time

def _get_call_outs_per_hour(call_out_df):
    call_out_df.call_ds = pd.to_datetime(call_out_df.call_ds)
    call_out_per_hour = call_out_df.set_index("call_ds").resample("h").agg(np.count_nonzero)
    return call_out_per_hour

call_out_per_hour = get_target_variables(lfb_df_2013_2016, lfb_df_2017)

The table below illustrates the basic data structure of callouts per hour by type.

False Alarm Fire Special Service all_calls
2013-01-01 00:00:00 11 6 3 20
2013-01-01 01:00:00 5 3 9 17
2013-01-01 02:00:00 0 2 11 13
2013-01-01 03:00:00 7 1 5 13
2013-01-01 04:00:00 4 1 2 7

If we calculate the correlation between the different causes we can see that the correlation between the different incident types is less than 0.4.

False Alarm Fire Special Service all_calls
False Alarm 1.000000 0.287677 0.377027 0.808032
Fire 0.287677 1.000000 0.277343 0.622155
Special Service 0.377027 0.277343 1.000000 0.763787
all_calls 0.808032 0.622155 0.763787 1.000000
False Alarm        280392
Fire               114889
Special Service    175070
all_calls          570351
dtype: int64

Feature Engineering

On first observation the call_out_per_hour DataFrame does not have much scope for feature engineering, but there’s actually a few simple techniques to generate some features that can be used with the majority of time series datasets.

Seasonal Features

We can try and capture seasonal trends in the data by engineering a set of categorical features based on the time of call information. Below we create features to capture: hourly, weekly, monthly, and annual trends.

Both cyclical and “holiday” event trends can be captured in this manner.

def get_seasonal_features(call_out_per_hour):
    call_out_per_hour["hour"] = call_out_per_hour.index.hour
    call_out_per_hour["day_of_week"] = call_out_per_hour.index.dayofweek
    call_out_per_hour["month"] = call_out_per_hour.index.month
    call_out_per_hour["year"] = call_out_per_hour.index.year
    call_out_per_hour["new_years"] = call_out_per_hour.index.dayofyear == 1
    call_out_per_hour["guy_fawkes"] = (call_out_per_hour.index.month == 11) & ( == 5)
    return call_out_per_hour

call_out_per_hour = get_seasonal_features(call_out_per_hour)
False Alarm Fire Special Service all_calls hour day_of_week month year new_years guy_fawkes
2013-01-01 00:00:00 11 6 3 20 0 1 1 2013 True False
2013-01-01 01:00:00 5 3 9 17 1 1 1 2013 True False
2013-01-01 02:00:00 0 2 11 13 2 1 1 2013 True False
2013-01-01 03:00:00 7 1 5 13 3 1 1 2013 True False
2013-01-01 04:00:00 4 1 2 7 4 1 1 2013 True False
def plot_seasonal_violin_patterns(df, x, y):
    fig, axes = plt.subplots(figsize=(15, 5))
    sns.violinplot(x=x, y=y, data=df, ax=axes)
    axes.set_title(f"Violin plot {y} by {x}")

plot_seasonal_violin_patterns(call_out_per_hour, "hour", "all_calls")


Shifted Target Features

Another source of feature information is what I would call shifted target information or lagged features; that is we can look a recent, or relevant, observation of the thing we are trying to predict, and use this as a feature in the model.

The important principle is that your model data must not contain information that would not be available at the time of prediction so a one day shift is only available one day before. So your model is only predicting one day ahead. If you wanted to forecast calls for the entire year upfront, you would not have access to the one day shifted information, and so you would need to make appropriate modifications to the shift i.e. 365 day shift.

In the introduction we formulated the business problem as trying to predict seven days ahead so I have only included shifted targets that would be available 7 days before.

We will also examine the impact on model performance of leaving these features out of the model, as leaving out the shifted variables will make the model’s use less restricted.

shifted_targets = call_out_per_hour[["False Alarm", "Fire", "Special Service", "all_calls"]].shift(24 * 7)
shifted_targets.columns = [f"s7_{name}" for name in shifted_targets.columns]
s7_False Alarm s7_Fire s7_Special Service s7_all_calls
2018-07-31 19:00:00 5.0 14.0 7.0 26.0
2018-07-31 20:00:00 7.0 9.0 10.0 26.0
2018-07-31 21:00:00 4.0 5.0 6.0 15.0
2018-07-31 22:00:00 10.0 1.0 3.0 14.0
2018-07-31 23:00:00 3.0 4.0 3.0 10.0
call_out_per_hour = pd.concat([call_out_per_hour, shifted_targets], axis=1)

# Drop the rows where we could not generated the shifted features

A simple analysis of correlation between the number of call-outs in a given hour, and the number of call-outs per hour 7 days prior reveals a reasonable amount of correlation.

np.corrcoef(call_out_per_hour.s7_all_calls, call_out_per_hour.all_calls)
array([[1.        , 0.53896767],
       [0.53896767, 1.        ]])

And this is confirmed if we take a look at a small sample of 7 day shifted values

def plot_seven_day_shifted_calls(df):
    fig, axes = plt.subplots(figsize=(15, 5))
    df[["s7_all_calls", "all_calls"]]["2018-01-01": "2018-01-7"].plot(ax=axes, marker="o", alpha=0.8)
    axes.set_ylabel("Calls Per Hour")



Exploratory Data Analysis: Target Data

Let’s do a very brief Exploratory Data Analysis (EDA) of our target data. We can start by plotting the average number of daily call-outs per week for each call type, and all call types.

We can observe some periodic features in the data, more call-outs seem to occur during the summer months. We also note some large spikes in the data.

def plot_average_daily_calls_per_period(call_out_df, period, title="", start=None, stop=None):
    fig, axes = plt.subplots(figsize=(15, 5))
    avg_call_out = call_out_df[["False Alarm", "Fire", "Special Service", "all_calls"]].resample(period).mean()
    avg_call_out[start: stop].plot(ax=axes)
    axes.legend(loc="upper left")
    axes.set_xlabel("Date Stamp")
    axes.set_ylabel(f"Average calls per {period}");

plot_average_daily_calls_per_period(call_out_per_hour, "W", title="Average call-outs per day (weekly sampling)")


Hourly call-outs over a seven day period

When we look at hourly resolution data we find that there appears to be cyclical trends in the dataset

plot_average_daily_calls_per_period(call_out_per_hour, "h", title="Call-outs per hour",
                                    start="2017-01-01", stop="2017-01-7")


This pattern is more clearly seen if we average the observations over a 3 hour period.

plot_average_daily_calls_per_period(call_out_per_hour, "3h", title="Average call-outs per 2 hour window",
                                    start="2017-01-01", stop="2017-01-7")


3rd Party Data: Weather Data

Many of the call outs seemed to be weather related, this is particularly true of some of the spikes in the call out dataset. For this reason I thought it would be interesting to introduce some third-party weather data into the analysis.

The weather data used in the model was purchased from the OpenWeatherMap website at a cost of $10, although you may be able to find free datasets in the public domain, this seemed liked a reasonable cost.

As the dataset is proprietary I will not be including any graphics which would give away the full content of the dataset, however the code below details the fields from the data which I have selected for use as features in the model.

Weather data used for modelling

weather = pd.read_csv("dd2ab270f5df36b94f37178fbabf8c73.csv")
weather.dt_iso = pd.to_datetime(weather.dt_iso.str.replace("\+0000 UTC", ""))

owm_cols = ['dt_iso', 'temp', 'temp_min', 'temp_max', 'pressure', 'humidity', 'wind_speed', 'wind_deg',
            'rain_1h', 'rain_3h', 'rain_24h','rain_today', 'snow_1h', 'snow_3h', 'snow_24h', 'snow_today',
            'clouds_all', 'weather_id']
owm_london = weather[owm_cols]

Make Model Dataset

Merge Callout and Weather Dataset

Now that we have our seasonal, shifted, and weather features we just to need to join the datasets together, here we make use of the pandas merge_asof function.

We also fill any missing values with -999.

def get_model_data(call_out_per_hour, owm_london):
    model_data = pd.merge_asof(call_out_per_hour.reset_index(), owm_london, left_on="call_ds", right_on="dt_iso")
    model_data = model_data.drop(["dt_iso"], axis=1)
    model_data.fillna(-999, inplace=True)
    model_data.set_index("call_ds", inplace=True)
    return model_data

model_data = get_model_data(call_out_per_hour, owm_london)
False Alarm Fire Special Service all_calls hour day_of_week month year new_years guy_fawkes s7_False Alarm s7_Fire s7_Special Service s7_all_calls temp temp_min temp_max pressure humidity wind_speed wind_deg rain_1h rain_3h rain_24h rain_today snow_1h snow_3h snow_24h snow_today clouds_all weather_id

Generate Dummy Categorical Variables

LightGBM has a nice feature to allow you to declare columns as categorical and avoid one-hot encoding the categorical variables, however I received a number of warning messages when using this feature in the setup below, so I have opted to utilise the pandas get_dummies as a “quick and dirty” alternative.

dummified_cols = []
categorical_vars = ["hour", "day_of_week", "month", "year", "weather_id"]
for cat in categorical_vars:
    dummified_cols.append(pd.get_dummies(model_data[cat], prefix=cat+"_"))
model_data = pd.concat([model_data.drop(categorical_vars, axis=1)] + dummified_cols, axis=1)

The final model data

(48744, 113)
False Alarm Fire Special Service all_calls new_years guy_fawkes s7_False Alarm s7_Fire s7_Special Service s7_all_calls temp temp_min temp_max pressure humidity wind_speed wind_deg rain_1h rain_3h rain_24h rain_today snow_1h snow_3h snow_24h snow_today clouds_all hour__0 hour__1 hour__2 hour__3 hour__4 hour__5 hour__6 hour__7 hour__8 hour__9 hour__10 hour__11 hour__12 hour__13 hour__14 hour__15 hour__16 hour__17 hour__18 hour__19 hour__20 hour__21 hour__22 hour__23 day_of_week__0 day_of_week__1 day_of_week__2 day_of_week__3 day_of_week__4 day_of_week__5 day_of_week__6 month__1 month__2 month__3 month__4 month__5 month__6 month__7 month__8 month__9 month__10 month__11 month__12 year__2013 year__2014 year__2015 year__2016 year__2017 year__2018 weather_id__200 weather_id__201 weather_id__202 weather_id__211 weather_id__300 weather_id__301 weather_id__302 weather_id__310 weather_id__311 weather_id__312 weather_id__500 weather_id__501 weather_id__502 weather_id__503 weather_id__511 weather_id__520 weather_id__521 weather_id__522 weather_id__600 weather_id__601 weather_id__602 weather_id__611 weather_id__612 weather_id__615 weather_id__616 weather_id__620 weather_id__621 weather_id__622 weather_id__701 weather_id__711 weather_id__721 weather_id__741 weather_id__771 weather_id__800 weather_id__801 weather_id__802 weather_id__803 weather_id__804

Machine Learning

Train test splits

For time-series data it is sensible to have an “out of time” test set. Here we hold out the 2018 as a test set, and we use the 2017 and prior data as a training set.

We also generate a validation set from the training data. The validation set will be used in conjunction with the early stopping procedure, described below.

targets = ["all_calls", "False Alarm", "Fire", "Special Service"]
X, y = model_data.drop(targets, axis=1), model_data[targets]
X_train, X_test, y_train, y_test = X[:"2017"], X["2018":], y[:"2017"], y["2018":]

# Early stopping
X_train, X_val, y_train, y_val = X_train[:"2016"], X_train["2017"], y_train[:"2016"], y_train["2017"]

Fit Gradient Boosted Model

We create an instance of the LGBMRegressor class, which provides an sklearn style interface for LightGBM, we initiate the model with n_estimators @ 50,000 but we also set the early stopping kwarg in the fit parameters dictionary, which allows the the model to choose to stop early if no improvement on the validation set is observed after 50 boosting rounds.

We initiate the model with a fixed learning rate, a lower learning rate will result in a slower training time as the optimal number of estimators is inversely proportional to learning rate, and a greater number of estimators will result in slower training times.

The code also includes a basic grid-search to tune the hyper-parameters. This an area of the modelling that could easily be improved. We have also used a time-series split rather than a standard k-fold split.

Basic Modelling script

The make_model function houses all of our model training code, as we are going to create multiple machine learning models: All Call-outs, Fire, Special Services, and False Alarm, we want to wrap this code in a function to avoid repeating ourselves.

def make_model(X_train, X_val, y_train, y_val, target_name):
    # Select target
    print(f"\nTraining {target_name} model")
    y_train, y_val = y_train[target_name], y_val[target_name]

    # Early Stopping
    base_learning_rate = 0.1

    ts_cv = model_selection.TimeSeriesSplit(5)

    param_grid = {
        "max_depth": [1, 2, 3, 4, 5],

    fit_params = {
        "early_stopping_rounds": 50,
        "eval_set": [(X_val, y_val)],

    LGBM_base = LGBMRegressor(
    LGBM_model = model_selection.GridSearchCV(estimator=LGBM_base, param_grid=param_grid, cv=ts_cv), y_train, verbose=0, **fit_params)

    # Adjust boosting rounds for larger dataset
    n_stopping_rounds = LGBM_model.best_estimator_.best_iteration_
    n_sr_adjustment = len(X_train) / (len(X_train) + len(X_val))
    n_stopping_rounds *= 1 / n_sr_adjustment
    n_stopping_rounds = int(n_stopping_rounds)
    print(f"\n  n_boosting rounds selected: {n_stopping_rounds}")

    # Refit on validation set and training set
    X_train_val = pd.concat([X_train, X_val], axis=0)
    y_train_val = pd.concat([y_train, y_val], axis=0)

    print("\tRetrain on full training dataset")
    LGBM_base = LGBMRegressor(objective="poisson", n_estimators=n_stopping_rounds, learning_rate=base_learning_rate)
    LGBM_model = model_selection.GridSearchCV(estimator=LGBM_base, param_grid=param_grid, cv=ts_cv), y_train, verbose=0)

    # Print model params
    print("Best Params:")
    print(LGBM_model.best_estimator_.get_params(), "\n", "-" * 80)
    return LGBM_model

Train All Models

We then train a model for each of the targets previously specified.

['all_calls', 'False Alarm', 'Fire', 'Special Service']
models = {}
for target in targets:
    LGBM_model = make_model(X_train, X_val, y_train, y_val, target_name=target)
    models[target] = LGBM_model
Training all_calls model

  n_boosting rounds selected: 3864
    Retrain on full training dataset
Best Params:
{'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': 1, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 3864, 'n_jobs': -1, 'num_leaves': 31, 'objective': 'poisson', 'random_state': None, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'silent': True, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0}

Training False Alarm model

  n_boosting rounds selected: 2309
    Retrain on full training dataset
Best Params:
{'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': 1, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 2309, 'n_jobs': -1, 'num_leaves': 31, 'objective': 'poisson', 'random_state': None, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'silent': True, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0}

Training Fire model

  n_boosting rounds selected: 1765
    Retrain on full training dataset
Best Params:
{'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': 2, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 1765, 'n_jobs': -1, 'num_leaves': 31, 'objective': 'poisson', 'random_state': None, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'silent': True, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0}

Training Special Service model

  n_boosting rounds selected: 1297
    Retrain on full training dataset
Best Params:
{'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': 2, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 1297, 'n_jobs': -1, 'num_leaves': 31, 'objective': 'poisson', 'random_state': None, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'silent': True, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0}

Model Evaluation

We have now trained our machine learning models. Let’s analyse the model performance on the training, and test datasets.

Generate Predicted Vs Actual Performance

We start by generating a table of predicted vs actual call-outs for each of the models. We keep the train and test sets in separate DataFrames.

pred_vs_actual_train = y_train.copy()
for target, model in models.items():
    pred_vs_actual_train["pred_" + target] = model.predict(X_train)

pred_vs_actual_test = y_test.copy()
for target, model in models.items():
    pred_vs_actual_test["pred_" + target] = model.predict(X_test)
all_calls False Alarm Fire Special Service pred_all_calls pred_False Alarm pred_Fire pred_Special Service
2018-01-01 00:00:00 33 12 12 9 9.123232 4.237914 2.442630 3.347101
2018-01-01 01:00:00 33 6 22 5 7.569961 3.491113 2.207699 2.740476
2018-01-01 02:00:00 13 4 3 6 6.176294 2.965301 1.759040 1.861252
2018-01-01 03:00:00 10 2 3 5 4.792316 2.414905 1.291232 1.218071
2018-01-01 04:00:00 12 5 1 6 4.310484 2.143858 1.105697 1.199398

An initial plot of the predicted vs actual performance shows that the models seem to pick up the daily seasonality.

def plot_predicted_vs_actual(results_df):
    pred_cols = [col for col in results_df.columns if "pred" in col]
    actual_cols = [col for col in results_df.columns if "pred" not in col and "s7" not in col]

    fig, axes = plt.subplots(len(pred_cols), figsize=(15, 7))

    results_df["2018-06-01": "2018-06-07"][pred_cols].plot(ax=axes, subplots=True)
    results_df["2018-06-01": "2018-06-07"][actual_cols].plot(ax=axes, subplots=True, color="darkgrey");

    for i, ax in enumerate(axes):
        ax.set_ylabel(pred_cols[i], fontsize=10)

    axes[0].set_title("Predicted Vs Actual")



Baseline Models: Persistence Model

It is a very good idea when investigating machine learning problems to establish a simple base line model to compare the trained model’s performance against. Baseline models should be simple to implement and understand, and easily reproducible.

In this case a baseline model could be to use the “shifted” by 7 days features as predictions, the models performance can then be measured as the improvement in mean squared error relative to the base line model predictions.

In time series analysis this sort of baseline model is sometimes referred to as a persistence model. Persistence is a term used in time series analysis to capture the notion that observation is closely related to another earlier observation.

pred_vs_actual_test = pred_vs_actual_test.merge(X_test[[col for col in X_test.columns if "s7" in col]],
                                                left_index=True, right_index=True)

pred_vs_actual_train = pred_vs_actual_train.merge(X_train[[col for col in X_train.columns if "s7" in col]],
                                                 left_index=True, right_index=True)

Model performance vs baseline

The code below generates some baseline performance comparisons for each of four individual models, plus an ensemble model which combines the individual call-out types and is evaluated against the “all_calls” target.

def get_summary_metrics(pred_vs_actual_test):
    summary_metrics = []
    for i, target in enumerate(targets):
        persistence_mse = metrics.mean_squared_error(pred_vs_actual_test[target],
                                                     pred_vs_actual_test["s7_" + target])
        evaluation_mse = metrics.mean_squared_error(pred_vs_actual_test[target],
                                                     pred_vs_actual_test["pred_" + target])

        target_metrics = {
            "target": target,
            "persistence_mse": persistence_mse,
            "evaluation_mse": evaluation_mse,

        if target == "all_calls":
            ensemble = pred_vs_actual_test[['pred_False Alarm', 'pred_Fire', 'pred_Special Service']].sum(axis=1)
            ensemble_mse = metrics.mean_squared_error(pred_vs_actual_test["all_calls"], ensemble)
            target_metrics.update({"ensemble_mse": ensemble_mse})


    summary_metrics = pd.DataFrame(summary_metrics).set_index("target")

    summary_metrics["pc_mse_delta_indiv_vs_persist"] = (summary_metrics.evaluation_mse
                                                    / summary_metrics.persistence_mse - 1)
    summary_metrics["pc_mse_delta_ensemble_vs_persist"] = (summary_metrics.ensemble_mse
                                                  / summary_metrics.persistence_mse - 1)
    return summary_metrics

The metrics suggest that the new model delivers a ca. 50% reduction in mean squared error relative to the the persistence model.

Interestingly the ensemble model out performances the single all_claims model: four models are indeed better than one.

ensemble_mse evaluation_mse persistence_mse pc_mse_delta_indiv_vs_persist pc_mse_delta_ensemble_vs_persist
all_calls 24.182435 24.468096 47.337854 -0.483118 -0.489152
False Alarm NaN 7.166391 13.850825 -0.482602 NaN
Fire NaN 2.967179 5.750393 -0.484004 NaN
Special Service NaN 11.180652 22.030660 -0.492496 NaN

Performance Visualisations

The code below generates scatter plots of predicted vs actual call-outs, and computes the mean squared error, and Pearson’s correlation coefficient for the dataset.

def plot_scatter_plots(pred_vs_actual_test, pred_vs_actual_train, targets, persistence_flag=None,
    fig, axes = plt.subplots(len(targets), 2, figsize=(15, 25))

    for i, target in enumerate(targets):
        relevent_cols = [name for name in pred_vs_actual_test.columns if target in name]

        pred_col = "pred_" + target if persistence_flag is None else persistence_flag + "_" + target

        # Test set results
        lim = max(pred_vs_actual_test[target])
        axes[i][1].scatter(pred_vs_actual_test[target], pred_vs_actual_test[pred_col], alpha=0.25, color=color)
        axes[i][1].set_ylabel("Predicted: Calls per day")
        axes[i][1].set_xlabel("Actual: Calls per day")
        axes[i][1].set_xlim(0, lim)
        axes[i][1].set_ylim(0, lim)
        axes[i][1].plot([0, lim], [0, lim], color="r", linestyle="--")
        pearsons_r = stats.pearsonr(pred_vs_actual_test[target], pred_vs_actual_test[pred_col])[0]
        mse = metrics.mean_squared_error(pred_vs_actual_test[target], pred_vs_actual_test[pred_col])
        axes[i][1].set_title(f"Test Set: {target} Pearsons: {pearsons_r:,.3f}, MSE: {mse:,.2f}")

        # Train set results
        lim = max(pred_vs_actual_test[target])
        axes[i][0].scatter(pred_vs_actual_train[target], pred_vs_actual_train[pred_col], alpha=0.25, color=color)
        axes[i][0].set_ylabel("Predicted: Calls per day")
        axes[i][0].set_xlabel("Actual: Calls per day")
        axes[i][0].set_xlim(0, lim)
        axes[i][0].set_ylim(0, lim)
        axes[i][0].plot([0, lim], [0, lim], color="r", linestyle="--")
        pearsons_r = stats.pearsonr(pred_vs_actual_train[target], pred_vs_actual_train[pred_col])[0]
        mse = metrics.mean_squared_error(pred_vs_actual_train[target], pred_vs_actual_train[pred_col])
        axes[i][0].set_title(f"Train Set: {target}: {pearsons_r:,.3f}, MSE: {mse:,.2f}")

plot_scatter_plots(pred_vs_actual_test, pred_vs_actual_train, targets)


Results for the persistence model

The persistence model results are plotted below, for reference/comparison.

plot_scatter_plots(pred_vs_actual_test, pred_vs_actual_train, targets, persistence_flag="s7", color="darkgrey")


Feature Importance

Plotting the feature importances for each model we find that the feature importance varies between models. The shifted variables seem consistently important but interesting weather related features are also important in the case of fire and special service call-outs.

def plot_feature_importances(models, cols):
    no_models = len(models)
    fig, axes = plt.subplots(no_models, figsize=(15, 6 * no_models))

    for i, (title, model) in enumerate(models.items()):
        feature_importances = pd.Series(model.best_estimator_.feature_importances_, index=cols)
        feature_importances.sort_values(inplace=True, ascending=False)
        feature_importances[:50][i], color="dodgerblue")
        axes[i].set_title(f"Feature Importance: {title.title()}")
        axes[i].set_xlabel("Feature Names")

plot_feature_importances(models, cols=X_train.columns)



Thanks for taking the time to read the post. There’s still a lot we can do to improve the model, particularly the hyper-parameter tuning and further feature engineering, but hopefully this post illustrated some interesting practical approaches to modelling time series with gradient boosted models.

Regards, James