Reading Time: 17mins, First Published: Sat, Oct 6, 2018
Time-Series
Machine-Learning
Weather
Data Science


This post explores the application of gradient boosted algorithms to time-series modelling, as we try to predict the number of London Fire Brigade call-outs using seasonal patterns and 3rd-party weather data.

Key Topics:

  • Lagged or shifted variables
  • Capturing seasonality and “holiday”/“event” patterns in categorical features
  • Building in 3rd party data
  • Persistence models and performance benchmarking
  • Ensembles

A foundational to intermediate knowledge of data science techniques will be helpful for following along with this post.

Introduction


The London Fire Brigade


The London Fire Brigade (LFB) is the statutory fire and rescue service for London. It was formed by the Metropolitan Fire Brigade Act of 1865, under the leadership of superintendent Eyre Massey Shaw.

It is the second-largest of all the fire services in the United Kingdom, after the national Scottish Fire and Rescue Service and the fifth-largest in the world, after the Tokyo Fire Department, New York City Fire Department, Paris Fire Brigade and the Scottish service, with 5,992 staff, including 5,096 operational firefighters and officers based at 102 fire stations (plus one river station).

In 2013/14 the LFB handled 171,067 emergency (999) calls. Of the calls it actually mobilised to, 20,934 were fires, including 10,992 that were of a serious nature, making it one of the busiest fire brigades in the world. In the same 12-month period, it received 3,172 hoax calls, the highest number of any UK fire service, but crews were mobilised to only 1,424 of them. In 2015/16 the LFB received 171,488 emergency calls. These consisted of: 20,773 fires, 30,066 special service call-outs, and 48,696 false alarms.

As well as firefighting, the LFB also responds to road traffic collisions, floods, trapped-in-lift releases, and other incidents such as those involving hazardous materials or major transport accidents. It also conducts emergency planning and performs fire safety inspections and education. It does not provide an ambulance service as this function is performed by the London Ambulance Service as an independent NHS trust, although all LFB firefighters are trained in first aid and all of its fire engines carry first aid equipment. Since 2016, the LFB responds to provide first aid for some life-threatening medical emergencies (e.g. cardiac or respiratory arrest).

https://en.wikipedia.org/wiki/London_Fire_Brigade

The Objective


Predicting the number of hourly call-outs for the next week

In this analysis we will build a series of machine learning models capable of predicting the number of hourly call-outs for different types of incident.

The various call-out types that the London Fire Brigade responds to are classified into a number of sub-categories:

  • Special Services
  • False Alarms
  • Fire

Different call-out types are likely to have different causes. So we will look at two approaches to the problem of modelling call frequencies.

  1. A single model that predicts the total number of call-outs regardless of type
  2. Separate models for different call-out types, which could be aggregated or “ensembled” to predict the total number of call-outs (a minimal sketch of this idea follows the list):
    • Special services
    • False alarms
    • Fire
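As a rough illustration of the second approach, the sketch below shows how per-type predictions could be summed to estimate the total. This is a minimal sketch only: type_models is a hypothetical dictionary of already-trained per-type regressors, and X stands for a feature matrix of the kind we build later in the post.

# Minimal sketch of the "ensemble" approach.
# `type_models` is a hypothetical dict of already-trained per-type regressors,
# e.g. {"Fire": fire_model, "False Alarm": fa_model, "Special Service": ss_model}.
import numpy as np

def ensemble_all_calls(type_models, X):
    # Estimate total call-outs as the sum of the per-type model predictions.
    per_type_preds = [model.predict(X) for model in type_models.values()]
    return np.sum(per_type_preds, axis=0)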

The Data


Fire Brigade Call-out Information

Call-out information for the London Fire Brigade is available from the London Data Store.

The data used in this analysis is from 2013 onwards, but data is available back to 2009.

Weather Data

Weather is one factor that we might expect to influence the number of call-outs, particularly in the case of flood incidents, but we could also speculate that weather is likely to affect the number of traffic collisions, and potentially fires.

To test this theory we will supplement the call-out information with hourly weather data. As I was tight on time when putting together this post, I opted to purchase a batch of hourly weather data from OpenWeatherMap for $10; however, there may be freely available alternatives.

We assume that similar data would be available via forecasts.

Data Modelling


The Algorithm: Gradient Boosted Modelling

This post will utilise the LightGBM implementation of gradient boosting; you can read more about LightGBM here. I won’t go into the details of the implementation in this post, but a key advantage of LightGBM over XGBoost is that it trains faster while delivering very similar performance.

For this project we will be making use of the Python programming language, and we will utilise the following libraries and toolkits.

Libraries


# Gradient Boost Regression Model
import lightgbm
from lightgbm.sklearn import LGBMRegressor

# Visualisation tools
import matplotlib.pyplot as plt
import matplotlib as mpl
import missingno as msno
import seaborn as sns

# Numerical libraries and data analysis
import numpy as np
import pandas as pd
from scipy import stats

# Machine learning libraries
from sklearn import model_selection, metrics

%matplotlib inline
plt.style.use("bmh")

pd.options.display.max_columns = 500

Load London Fire Brigade Dataset


Large datasets will load faster from pickles than from Excel, so to save time when we re-run the notebook let’s pickle our dataframes, and load from the pickles next time.

try:
    lfb_df_2017 = pd.read_pickle("LFB Incident data from January 2017.pickle")
    lfb_df_2013_2016 = pd.read_pickle("LFB Incident data from January 2013 to December 2016.pickle")

except FileNotFoundError:
    lfb_df_2017 = pd.ExcelFile("LFB Incident data from January 2017.xlsx").parse(0)
    lfb_df_2013_2016 = pd.ExcelFile("LFB Incident data from January 2013 to December 2016.xlsx").parse(0)

    lfb_df_2017.to_pickle("LFB Incident data from January 2017.pickle")
    lfb_df_2013_2016.to_pickle("LFB Incident data from January 2013 to December 2016.pickle")

Let’s take a quick look at the London Fire Brigade dataset.

lfb_df_2013_2016.head()

There’s quite a lot of data here, and there’s certainly a lot of scope for investigating business questions outside of predicting call-outs, for example arrival times, which is something I’ve looked at before.


Libraries like missingno offer a simple way of visualising the completeness of a dataset.

msno.matrix(lfb_df_2013_2016);

[Figure: missingno matrix for the 2013–2016 incident data]

msno.matrix(lfb_df_2017);

[Figure: missingno matrix for the 2017-onwards incident data]

msno.bar(lfb_df_2013_2016);

[Figure: missingno bar chart for the 2013–2016 incident data]

msno.bar(lfb_df_2017);

[Figure: missingno bar chart for the 2017-onwards incident data]

Generate Call-outs per Day Target Variables


The original dataset contains a lot of information about each call-out. Unfortunately much of this information cannot be used as features for the purposes of this project, as it would not be known before the incident has occurred.

The columns that we will use are DateOfCall, TimeOfCall, and IncidentGroup. We will use this data to generate the target variables for the project.

Target Variable: Number of call-outs per hour

def get_target_variables(lfb_df_2013_2016, lfb_df_2017):
    call_outs_2013_to_2016 = _get_categorised_call_out_table(lfb_df_2013_2016)
    call_outs_2017_onwards = _get_categorised_call_out_table(lfb_df_2017)
    combined_call_out_data = pd.concat([call_outs_2013_to_2016, call_outs_2017_onwards], axis=0)
    call_out_per_hour = _get_call_outs_per_hour(combined_call_out_data)
    return call_out_per_hour

def _get_categorised_call_out_table(df):
    df.dropna(subset=["IncidentGroup"], inplace=True)
    time_stamp_str = df.DateOfCall.astype(str) +" " + df.TimeOfCall.astype(str)
    call_time = pd.to_datetime(time_stamp_str).to_frame(name="call_ds")
    dummy_incident_groups = pd.get_dummies(df.IncidentGroup)
    call_time = pd.concat([dummy_incident_groups, call_time], axis=1, verify_integrity=True)
    call_time["all_calls"] = call_time.sum(axis=1)
    return call_time

def _get_call_outs_per_hour(call_out_df):
    call_out_df.call_ds = pd.to_datetime(call_out_df.call_ds)
    call_out_per_hour = call_out_df.set_index("call_ds").resample("h").agg(np.count_nonzero)
    return call_out_per_hour

call_out_per_hour = get_target_variables(lfb_df_2013_2016, lfb_df_2017)

The table below illustrates the basic data structure of call-outs per hour by type.

call_out_per_hour.head()

If we calculate the correlation between the different incident types, we can see that it is less than 0.4 in every case.

call_out_per_hour.corr()

call_out_per_hour.sum()
False Alarm        280392
Fire               114889
Special Service    175070
all_calls          570351
dtype: int64

Feature Engineering


On first observation the call_out_per_hour DataFrame does not have much scope for feature engineering, but there are actually a few simple techniques for generating features that can be applied to the majority of time-series datasets.

Seasonal Features


We can try and capture seasonal trends in the data by engineering a set of categorical features based on the time of call information. Below we create features to capture: hourly, weekly, monthly, and annual trends.

Both cyclical and “holiday” event trends can be captured in this manner.

def get_seasonal_features(call_out_per_hour):
    call_out_per_hour["hour"] = call_out_per_hour.index.hour
    call_out_per_hour["day_of_week"] = call_out_per_hour.index.dayofweek
    call_out_per_hour["month"] = call_out_per_hour.index.month
    call_out_per_hour["year"] = call_out_per_hour.index.year
    call_out_per_hour["new_years"] = call_out_per_hour.index.dayofyear == 1
    call_out_per_hour["guy_fawkes"] = (call_out_per_hour.index.month == 11) & (call_out_per_hour.index.day == 5)
    return call_out_per_hour

call_out_per_hour = get_seasonal_features(call_out_per_hour)
call_out_per_hour.head()

def plot_seasonal_violin_patterns(df, x, y):
    fig, axes = plt.subplots(figsize=(15, 5))
    sns.violinplot(x=x, y=y, data=df, ax=axes)
    axes.set_title(f"Violin plot {y} by {x}")

plot_seasonal_violin_patterns(call_out_per_hour, "hour", "all_calls")

[Figure: violin plot of all_calls by hour]

Shifted Target Features


Another source of feature information is what I would call shifted target information, or lagged features; that is, we can take a recent, or otherwise relevant, observation of the thing we are trying to predict and use it as a feature in the model.

The important principle is that your model data must not contain information that would not be available at the time of prediction: a one-day shift is only available one day in advance, so a model using it can only predict one day ahead. If you wanted to forecast calls for an entire year upfront, you would not have access to the one-day-shifted information, and you would need to modify the shift accordingly, e.g. to a 365-day shift.
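As a minimal sketch of that idea (not used in the rest of this post, and assuming the hourly call_out_per_hour DataFrame built above), a year-ahead shifted feature would look like this:

# Minimal sketch (not used later in this post): a 365-day shift, suitable for a
# year-ahead forecasting horizon, based on the hourly call_out_per_hour index.
s365_all_calls = call_out_per_hour["all_calls"].shift(24 * 365)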

In the introduction we formulated the business problem as predicting seven days ahead, so I have only included shifted targets that would be available 7 days in advance.

We will also examine the impact on model performance of leaving these features out, as omitting the shifted variables makes the model less restricted in how it can be used.

shifted_targets = call_out_per_hour[["False Alarm", "Fire", "Special Service", "all_calls"]].shift(24 * 7)
shifted_targets.columns = [f"s7_{name}" for name in shifted_targets.columns]
shifted_targets.tail()

call_out_per_hour = pd.concat([call_out_per_hour, shifted_targets], axis=1)

# Drop the rows where we could not generate the shifted features
call_out_per_hour.dropna(inplace=True)

A simple analysis of correlation between the number of call-outs in a given hour, and the number of call-outs per hour 7 days prior reveals a reasonable amount of correlation.

np.corrcoef(call_out_per_hour.s7_all_calls, call_out_per_hour.all_calls)
array([[1.        , 0.53896767],
       [0.53896767, 1.        ]])

This is confirmed if we take a look at a small sample of 7-day-shifted values.

def plot_seven_day_shifted_calls(df):
    fig, axes = plt.subplots(figsize=(15, 5))
    df[["s7_all_calls", "all_calls"]]["2018-01-01": "2018-01-7"].plot(ax=axes, marker="o", alpha=0.8)
    axes.set_ylabel("Calls Per Hour")

plot_seven_day_shifted_calls(call_out_per_hour)

[Figure: all_calls vs 7-day-shifted all_calls, first week of January 2018]

Exploratory Data Analysis: Target Data


Let’s do a very brief Exploratory Data Analysis (EDA) of our target data. We can start by plotting the weekly average call-out rate for each call type, and for all call types combined.

We can observe some periodic features in the data: more call-outs seem to occur during the summer months. We also note some large spikes in the data.

def plot_average_daily_calls_per_period(call_out_df, period, title="", start=None, stop=None):
    fig, axes = plt.subplots(figsize=(15, 5))
    avg_call_out = call_out_df[["False Alarm", "Fire", "Special Service", "all_calls"]].resample(period).mean()
    avg_call_out[start: stop].plot(ax=axes)
    axes.set_title(title)
    axes.legend(loc="upper left")
    axes.set_xlabel("Date Stamp")
    axes.set_ylabel(f"Average calls per {period}");

plot_average_daily_calls_per_period(call_out_per_hour, "W", title="Average call-outs per day (weekly sampling)")

[Figure: average call-outs per day (weekly sampling)]

Hourly call-outs over a seven day period


When we look at the data at hourly resolution, we find that there appear to be cyclical trends in the dataset.

plot_average_daily_calls_per_period(call_out_per_hour, "h", title="Call-outs per hour",
                                    start="2017-01-01", stop="2017-01-7")

[Figure: call-outs per hour, 1–7 January 2017]

This pattern is more clearly seen if we average the observations over a 3 hour period.

plot_average_daily_calls_per_period(call_out_per_hour, "3h", title="Average call-outs per 3 hour window",
                                    start="2017-01-01", stop="2017-01-7")

[Figure: average call-outs per 3-hour window, 1–7 January 2017]

3rd Party Data: Weather Data


Many of the call-outs seemed to be weather related; this is particularly true of some of the spikes in the call-out dataset. For this reason I thought it would be interesting to introduce some third-party weather data into the analysis.

The weather data used in the model was purchased from the OpenWeatherMap website at a cost of $10; although you may be able to find free datasets in the public domain, this seemed like a reasonable cost.

As the dataset is proprietary I will not be including any graphics that would give away its full content; however, the code below details the fields I have selected for use as features in the model.

Weather data used for modelling


weather = pd.read_csv("dd2ab270f5df36b94f37178fbabf8c73.csv")
weather.dt_iso = pd.to_datetime(weather.dt_iso.str.replace(r"\+0000 UTC", "", regex=True))

owm_cols = ['dt_iso', 'temp', 'temp_min', 'temp_max', 'pressure', 'humidity', 'wind_speed', 'wind_deg',
            'rain_1h', 'rain_3h', 'rain_24h','rain_today', 'snow_1h', 'snow_3h', 'snow_24h', 'snow_today',
            'clouds_all', 'weather_id']
owm_london = weather[owm_cols]

Make Model Dataset


Merge Callout and Weather Dataset


Now that we have our seasonal, shifted, and weather features, we just need to join the datasets together; here we make use of the pandas merge_asof function.

We also fill any missing values with -999.

def get_model_data(call_out_per_hour, owm_london):
    model_data = pd.merge_asof(call_out_per_hour.reset_index(), owm_london, left_on="call_ds", right_on="dt_iso")
    model_data = model_data.drop(["dt_iso"], axis=1)
    model_data.fillna(-999, inplace=True)
    model_data.set_index("call_ds", inplace=True)
    return model_data

model_data = get_model_data(call_out_per_hour, owm_london)
model_data.head(0)

Generate Dummy Categorical Variables


LightGBM has a nice feature that allows you to declare columns as categorical and avoid one-hot encoding them; however, I received a number of warning messages when using this feature in the setup below, so I have opted to use the pandas get_dummies function as a “quick and dirty” alternative.
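For reference, here is a minimal sketch of the native categorical route described above (it relies on the LightGBM sklearn API’s default categorical_feature="auto" behaviour, and it is not what we use below):

# Minimal sketch of LightGBM's native categorical handling (not used below).
# Casting the columns to the pandas "category" dtype lets LGBMRegressor pick them
# up as categoricals (categorical_feature="auto") without one-hot encoding.
native_cat_data = model_data.copy()
for cat in ["hour", "day_of_week", "month", "year", "weather_id"]:
    native_cat_data[cat] = native_cat_data[cat].astype("category")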

dummified_cols = []
categorical_vars = ["hour", "day_of_week", "month", "year", "weather_id"]
for cat in categorical_vars:
    dummified_cols.append(pd.get_dummies(model_data[cat], prefix=cat+"_"))
model_data = pd.concat([model_data.drop(categorical_vars, axis=1)] + dummified_cols, axis=1)

The final model data


model_data.shape
(48744, 113)
model_data.head(0)


Machine Learning


Train test splits


For time-series data it is sensible to have an “out of time” test set. Here we hold out the 2018 data as a test set, and use 2017 and earlier data as the training set.

We also generate a validation set from the training data. The validation set will be used in conjunction with the early stopping procedure, described below.

targets = ["all_calls", "False Alarm", "Fire", "Special Service"]
X, y = model_data.drop(targets, axis=1), model_data[targets]
X_train, X_test, y_train, y_test = X[:"2017"], X["2018":], y[:"2017"], y["2018":]

# Early stopping
X_train, X_val, y_train, y_val = X_train[:"2016"], X_train["2017"], y_train[:"2016"], y_train["2017"]

Fit Gradient Boosted Model


We create an instance of the LGBMRegressor class, which provides an sklearn-style interface for LightGBM. We initialise the model with n_estimators set to 50,000, but we also set the early-stopping kwarg in the fit parameters dictionary, which allows the model to stop early if no improvement on the validation set is observed for 50 boosting rounds.

We initialise the model with a fixed learning rate. A lower learning rate results in slower training, as the optimal number of estimators is roughly inversely proportional to the learning rate, and a greater number of estimators means longer training times.

The code also includes a basic grid-search to tune the hyper-parameters. This is an area of the modelling that could easily be improved. We have also used a time-series split rather than a standard k-fold split.

Basic Modelling script


The make_model function houses all of our model training code. As we are going to create multiple machine learning models (All Call-outs, Fire, Special Services, and False Alarm), we wrap this code in a function to avoid repeating ourselves.

def make_model(X_train, X_val, y_train, y_val, target_name):
    # Select target
    print(f"\nTraining {target_name} model")
    y_train, y_val = y_train[target_name], y_val[target_name]

    # Early Stopping
    base_learning_rate = 0.1

    ts_cv = model_selection.TimeSeriesSplit(5)

    param_grid = {
        "max_depth": [1, 2, 3, 4, 5],
    }

    fit_params = {
        "early_stopping_rounds": 50,
        "eval_set": [(X_val, y_val)],
    }

    LGBM_base = LGBMRegressor(
        objective="poisson",
        n_estimators=50000,
        learning_rate=base_learning_rate,
        importance_type="gain"
    )
    LGBM_model = model_selection.GridSearchCV(estimator=LGBM_base, param_grid=param_grid, cv=ts_cv)
    LGBM_model.fit(X_train, y_train, verbose=0, **fit_params)

    # Adjust boosting rounds for larger dataset
    n_stopping_rounds = LGBM_model.best_estimator_.best_iteration_
    n_sr_adjustment = len(X_train) / (len(X_train) + len(X_val))
    n_stopping_rounds *= 1 / n_sr_adjustment
    n_stopping_rounds = int(n_stopping_rounds)
    print(f"\n  n_boosting rounds selected: {n_stopping_rounds}")

    # Refit on validation set and training set
    X_train_val = pd.concat([X_train, X_val], axis=0)
    y_train_val = pd.concat([y_train, y_val], axis=0)

    print("\tRetrain on full training dataset")
    LGBM_base = LGBMRegressor(objective="poisson", n_estimators=n_stopping_rounds, learning_rate=base_learning_rate)
    LGBM_model = model_selection.GridSearchCV(estimator=LGBM_base, param_grid=param_grid, cv=ts_cv)
    LGBM_model.fit(X_train_val, y_train_val, verbose=0)

    # Print model params
    print("Best Params:")
    print(LGBM_model.best_estimator_.get_params(), "\n", "-" * 80)
    return LGBM_model

Train All Models


We then train a model for each of the targets previously specified.

print(targets)
['all_calls', 'False Alarm', 'Fire', 'Special Service']
models = {}
for target in targets:
    LGBM_model = make_model(X_train, X_val, y_train, y_val, target_name=target)
    models[target] = LGBM_model
Training all_calls model

  n_boosting rounds selected: 3864
    Retrain on full training dataset
Best Params:
{'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': 1, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 3864, 'n_jobs': -1, 'num_leaves': 31, 'objective': 'poisson', 'random_state': None, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'silent': True, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0}
 --------------------------------------------------------------------------------

Training False Alarm model

  n_boosting rounds selected: 2309
    Retrain on full training dataset
Best Params:
{'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': 1, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 2309, 'n_jobs': -1, 'num_leaves': 31, 'objective': 'poisson', 'random_state': None, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'silent': True, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0}
 --------------------------------------------------------------------------------

Training Fire model

  n_boosting rounds selected: 1765
    Retrain on full training dataset
Best Params:
{'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': 2, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 1765, 'n_jobs': -1, 'num_leaves': 31, 'objective': 'poisson', 'random_state': None, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'silent': True, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0}
 --------------------------------------------------------------------------------

Training Special Service model

  n_boosting rounds selected: 1297
    Retrain on full training dataset
Best Params:
{'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': 2, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 1297, 'n_jobs': -1, 'num_leaves': 31, 'objective': 'poisson', 'random_state': None, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'silent': True, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0}
 --------------------------------------------------------------------------------

Model Evaluation


We have now trained our machine learning models. Let’s analyse the model performance on the training, and test datasets.

Generate Predicted Vs Actual Performance


We start by generating a table of predicted vs actual call-outs for each of the models. We keep the train and test sets in separate DataFrames.

pred_vs_actual_train = y_train.copy()
for target, model in models.items():
    pred_vs_actual_train["pred_" + target] = model.predict(X_train)

pred_vs_actual_test = y_test.copy()
for target, model in models.items():
    pred_vs_actual_test["pred_" + target] = model.predict(X_test)
pred_vs_actual_test.head()

An initial plot of the predicted vs actual performance shows that the models seem to pick up the daily seasonality.

def plot_predicted_vs_actual(results_df):
    pred_cols = [col for col in results_df.columns if "pred" in col]
    actual_cols = [col for col in results_df.columns if "pred" not in col and "s7" not in col]

    fig, axes = plt.subplots(len(pred_cols), figsize=(15, 7))

    results_df["2018-06-01": "2018-06-07"][pred_cols].plot(ax=axes, subplots=True)
    results_df["2018-06-01": "2018-06-07"][actual_cols].plot(ax=axes, subplots=True, color="darkgrey");

    for i, ax in enumerate(axes):
        ax.set_ylabel(pred_cols[i], fontsize=10)

    axes[0].set_title("Predicted Vs Actual")

plot_predicted_vs_actual(pred_vs_actual_test)

[Figure: predicted vs actual call-outs for each model, 1–7 June 2018]

Baseline Models: Persistence Model


It is a very good idea when investigating machine learning problems to establish a simple baseline model to compare the trained model’s performance against. Baseline models should be simple to implement and understand, and easily reproducible.

In this case a baseline model could use the 7-day-shifted features as predictions; each model’s performance can then be measured as the improvement in mean squared error relative to the baseline predictions.

In time-series analysis this sort of baseline is sometimes referred to as a persistence model; the term captures the notion that an observation is closely related to an earlier observation.
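As a minimal sketch (using the X_test and y_test splits created earlier), scoring the persistence baseline for total call-outs is a one-liner; the full comparison across all targets is generated below.

# Minimal sketch: score the persistence baseline for total call-outs.
# The "prediction" is simply the observation from 7 days (168 hours) earlier,
# i.e. the s7_all_calls shifted feature.
persistence_mse_all = metrics.mean_squared_error(y_test["all_calls"], X_test["s7_all_calls"])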

pred_vs_actual_test = pred_vs_actual_test.merge(X_test[[col for col in X_test.columns if "s7" in col]],
                                                left_index=True, right_index=True)

pred_vs_actual_train = pred_vs_actual_train.merge(X_train[[col for col in X_train.columns if "s7" in col]],
                                                 left_index=True, right_index=True)

Model performance vs baseline


The code below generates baseline performance comparisons for each of the four individual models, plus an ensemble model which combines the individual call-out types and is evaluated against the “all_calls” target.

def get_summary_metrics(pred_vs_actual_test):
    summary_metrics = []
    for i, target in enumerate(targets):
        persistence_mse = metrics.mean_squared_error(pred_vs_actual_test[target],
                                                     pred_vs_actual_test["s7_" + target])
        evaluation_mse = metrics.mean_squared_error(pred_vs_actual_test[target],
                                                     pred_vs_actual_test["pred_" + target])

        target_metrics = {
            "target": target,
            "persistence_mse": persistence_mse,
            "evaluation_mse": evaluation_mse,
        }

        if target == "all_calls":
            ensemble = pred_vs_actual_test[['pred_False Alarm', 'pred_Fire', 'pred_Special Service']].sum(axis=1)
            ensemble_mse = metrics.mean_squared_error(pred_vs_actual_test["all_calls"], ensemble)
            target_metrics.update({"ensemble_mse": ensemble_mse})


        summary_metrics.append(
            target_metrics
        )

    summary_metrics = pd.DataFrame(summary_metrics).set_index("target")

    summary_metrics["pc_mse_delta_indiv_vs_persist"] = (summary_metrics.evaluation_mse
                                                    / summary_metrics.persistence_mse - 1)
    summary_metrics["pc_mse_delta_ensemble_vs_persist"] = (summary_metrics.ensemble_mse
                                                  / summary_metrics.persistence_mse - 1)
    return summary_metrics

The metrics suggest that the new models deliver a ca. 50% reduction in mean squared error relative to the persistence model.

Interestingly, the ensemble model outperforms the single all_calls model: four models are indeed better than one.

get_summary_metrics(pred_vs_actual_test)

Performance Visualisations


The code below generates scatter plots of predicted vs actual call-outs, and computes the mean squared error, and Pearson’s correlation coefficient for the dataset.

def plot_scatter_plots(pred_vs_actual_test, pred_vs_actual_train, targets, persistence_flag=None,
                       color="dodgerblue"):
    fig, axes = plt.subplots(len(targets), 2, figsize=(15, 25))

    for i, target in enumerate(targets):

        pred_col = "pred_" + target if persistence_flag is None else persistence_flag + "_" + target

        # Test set results
        lim = max(pred_vs_actual_test[target])
        axes[i][1].scatter(pred_vs_actual_test[target], pred_vs_actual_test[pred_col], alpha=0.25, color=color)
        axes[i][1].set_title(target)
        axes[i][1].set_ylabel("Predicted: Calls per day")
        axes[i][1].set_xlabel("Actual: Calls per day")
        axes[i][1].set_xlim(0, lim)
        axes[i][1].set_ylim(0, lim)
        axes[i][1].plot([0, lim], [0, lim], color="r", linestyle="--")
        pearsons_r = stats.pearsonr(pred_vs_actual_test[target], pred_vs_actual_test[pred_col])[0]
        mse = metrics.mean_squared_error(pred_vs_actual_test[target], pred_vs_actual_test[pred_col])
        axes[i][1].set_title(f"Test Set: {target} Pearsons: {pearsons_r:,.3f}, MSE: {mse:,.2f}")

        # Train set results
        lim = max(pred_vs_actual_test[target])
        axes[i][0].scatter(pred_vs_actual_train[target], pred_vs_actual_train[pred_col], alpha=0.25, color=color)
        axes[i][0].set_title(target)
        axes[i][0].set_ylabel("Predicted: Calls per day")
        axes[i][0].set_xlabel("Actual: Calls per day")
        axes[i][0].set_xlim(0, lim)
        axes[i][0].set_ylim(0, lim)
        axes[i][0].plot([0, lim], [0, lim], color="r", linestyle="--")
        pearsons_r = stats.pearsonr(pred_vs_actual_train[target], pred_vs_actual_train[pred_col])[0]
        mse = metrics.mean_squared_error(pred_vs_actual_train[target], pred_vs_actual_train[pred_col])
        axes[i][0].set_title(f"Train Set: {target}: {pearsons_r:,.3f}, MSE: {mse:,.2f}")


plot_scatter_plots(pred_vs_actual_test, pred_vs_actual_train, targets)

[Figure: predicted vs actual scatter plots for each model, train and test sets]

Results for the persistence model


The persistence model results are plotted below, for reference/comparison.

plot_scatter_plots(pred_vs_actual_test, pred_vs_actual_train, targets, persistence_flag="s7", color="darkgrey")

[Figure: persistence model predicted vs actual scatter plots, train and test sets]

Feature Importance


Plotting the feature importances for each model, we find that they vary between models. The shifted variables seem consistently important, but interestingly, weather-related features are also prominent in the case of fire and special service call-outs.

def plot_feature_importances(models, cols):
    no_models = len(models)
    fig, axes = plt.subplots(no_models, figsize=(15, 6 * no_models))

    for i, (title, model) in enumerate(models.items()):
        feature_importances = pd.Series(model.best_estimator_.feature_importances_, index=cols)
        feature_importances.sort_values(inplace=True, ascending=False)
        feature_importances[:50].plot.bar(ax=axes[i], color="dodgerblue")
        axes[i].set_title(f"Feature Importance: {title.title()}")
        axes[i].set_ylabel("Importance")
        axes[i].set_xlabel("Feature Names")
    fig.tight_layout()

plot_feature_importances(models, cols=X_train.columns)

[Figure: top 50 feature importances for each model]

Conclusion


Thanks for taking the time to read the post. There’s still a lot we can do to improve the model, particularly the hyper-parameter tuning and further feature engineering, but hopefully this post illustrated some interesting practical approaches to modelling time series with gradient boosted models.

Regards, James