Reading Time: 211mins, First Published: Mon, May 7, 2018
Folium
Geographic-Data
Data-Science


As far as analytic techniques go, visualising geographic information using maps has stood the test of time. Map making was once the art of the skilled cartographer, but modern technologies mean that rich interactive visualisations are only a few lines of code away.

In this post we will develop an interactive mapping tool using the Folium Python library to assist in analysing geographic risk.

What will we create in this tutorial?


In this tutorial we will use the Folium Python library to generate an interactive map of vehicle accidents in the United Kingdom. The map will split the UK into a grid and provide aggregate accident figures for each grid section. We will also colour code each section of the grid, based on the number of accidents that have occurred within that section. Finally, each individual accident will be represented by a marker detailing some basic accident statistics, and these markers will be clustered using Folium’s marker clustering tool.

The finished map will look like this (full screen):

What is Folium?


Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualise it on a Leaflet map via Folium.

Installing Folium


Folium can be easily installed via pip:

pip install folium

Getting the data


Data for this tutorial is taken from Road Safety Data (data.gov.uk)

These files provide detailed road safety data about the circumstances of personal injury road accidents in GB from 1979, the types (including Make and Model) of vehicles involved and the consequential casualties. The statistics relate only to personal injury accidents on public roads that are reported to the police, and subsequently recorded, using the STATS19 accident reporting form.

Following along

If you would like to follow the code in this tutorial, Jupyter Notebook provides inline plotting of Folium maps, so it is an ideal choice for playing around with Folium.

Loading the libraries


The key libraries used in this tutorial are pandas, for data manipulation, and of course Folium, for generating the interactive maps. We will also make use of NumPy to compute the grid break points, the json library to encode Python objects in JavaScript Object Notation (JSON), a lightweight data-interchange format, and matplotlib’s colour maps and RGB-to-hex functionality.

import json

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import folium
from folium.plugins import MarkerCluster

Your first map


Generating a basic map using Folium is very straightforward. The following code generates a Google Maps style interactive map.

m = folium.Map(location=[51.5, -0.1], zoom_start=10)

Adding map markers


To add a marker to the map, you first create an instance of the Folium Marker class. We can include an optional “popup”, which will display when the user clicks on the marker, and we can also specify the icon type. We then add the marker to the map using the add_child method.

m = folium.Map(location=[51.5, -0.1], zoom_start=10)

popup = "London"
london_marker = folium.Marker([51.5, -0.1], popup=popup)

m.add_child(london_marker)

m

Generating marker clusters


A marker cluster groups nearby individual markers into a single cluster icon.

Before we create our first marker cluster, we need to load the accident data. To do this we use the pandas read_csv() function, which reads the accident data from the csv file into a pandas dataframe.

There are tens of thousands of accidents in the data set, so for the purposes of this tutorial I have trimmed it down to 500 accident examples. The pandas library offers all the tools we need to take a random subset of the dataframe: we simply call the sample() method, specifying the number of samples required, n. Finally, we drop any rows with missing lat, lon values from the dataframe using the dropna() method. Because rows are dropped after sampling, the final dataframe may contain slightly fewer than 500 rows.

accident_data = pd.read_csv("Accidents_2015.csv")
accident_data = accident_data.sample(n=500, random_state=42)
accident_data.dropna(subset=["Latitude", "Longitude"], inplace=True)

In order to generate the marker cluster we use the MarkerCluster object, which can be imported from folium.plugins. We instantiate the MarkerCluster using a list of locations: simply the latitude and longitude of each marker. Optionally, a list of icons for each marker in the cluster and a list of popups can also be provided.

The code below:

  • initiates a folium Map as m
  • generates a list of locations by zipping together the lat, lon columns of the accidents dataframe, and converting the zip object to a list of tuples
  • uses a list comprehension to generate a list of folium icons, one for each accident; within the list comprehension we customise the image on the icon by passing in prefix="fa" to indicate that we want to use one of the Font Awesome icons, and icon="car" to denote a car icon
  • creates a cluster instance, passing in the locations and icons
  • finally adds the cluster to the map using the add_child method

First marker cluster


m = folium.Map(location=[51.5, -0.1], zoom_start=10)

locations = list(zip(accident_data.Latitude, accident_data.Longitude))
icons = [folium.Icon(icon="car", prefix="fa") for _ in range(len(locations))]

cluster = MarkerCluster(locations=locations, icons=icons)
m.add_child(cluster)
m

Choropleth maps


A choropleth map (from Greek χῶρος (“area/region”) + πλῆθος (“multitude”)) is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income. (Wikipedia, 2017)

[Image: example choropleth map, from Wikipedia]

Folium can generate choropleth maps by binding GeoJSON files, which define geographic features, with pandas data frame objects. You can see more examples of choropleth maps in the folium documentation.

Unless I am mistaken, the choropleth map functionality in Folium lacks the ability to provide a separate popup for each region of the map whilst using a single GeoJSON file. In our model we want the user to be able to click on a given region of the map and receive a summary of the exposures within that particular region. I also want a more generic solution, one that can generate a basic choropleth style map without the need for third party GeoJSON files.

So rather than use a prebuilt GeoJSON file, we will construct our own series of GeoJSON objects which will form a grid. We will also build in popups which will give the user summary statistics relating to the region in question, in this case the number of vehicles involved in accidents and the number of casualties.

Generating a grid heatmap


GeoJSON is an open standard format designed for representing simple geographical features, along with their non-spatial attributes, based on JavaScript Object Notation. (Wikipedia, 2017)
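
To make the format concrete, here is a minimal sketch of a single GeoJSON feature built as a Python dictionary and encoded with the json library. The coordinates and the "name" property are illustrative values of my own, not taken from the accident data. Two details are worth noticing: GeoJSON positions are written as [longitude, latitude], and a polygon ring has to end on the point it started from.

```python
import json

# One square cell as a GeoJSON Feature (illustrative coordinates).
# Positions are [longitude, latitude]; the ring is closed by repeating
# the first position at the end.
ring = [
    [-1.0, 52.0],  # upper left
    [0.0, 52.0],   # upper right
    [0.0, 51.0],   # lower right
    [-1.0, 51.0],  # lower left
    [-1.0, 52.0],  # back to the start to close the ring
]

feature = {
    "type": "Feature",
    "properties": {"name": "example cell"},  # non-spatial attributes
    "geometry": {"type": "Polygon", "coordinates": [ring]},
}

# Encode to a JSON string and decode it again.
encoded = json.dumps(feature)
decoded = json.loads(encoded)
print(decoded["geometry"]["type"])
```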

I have written the following code which generates a grid of GeoJSON objects. The user simply defines the lat, lon of the upper right and lower left corners of the grid, and the number of rows and columns in the grid, n.

In terms of mechanics, the function simply starts at the lower left hand corner and, for each “rectangle”/“square” in the grid, generates a GeoJSON style Python object to represent its geographical features. Each object is appended to a list called all_boxes, which is returned to the user.

I have also added docstrings to the function to aid interpretability and reusability of the code.

def get_geojson_grid(upper_right, lower_left, n=6):
    """Returns a grid of geojson rectangles covering the given bounding box.

    Parameters
    ----------
    upper_right: array_like
        The upper right hand corner [lat, lon] of the "grid of grids".

    lower_left: array_like
        The lower left hand corner [lat, lon] of the "grid of grids".

    n: integer
        The number of rows/columns in the (n,n) grid.

    Returns
    -------

    list
        List of "geojson style" dictionary objects   
    """

    all_boxes = []

    lat_steps = np.linspace(lower_left[0], upper_right[0], n+1)
    lon_steps = np.linspace(lower_left[1], upper_right[1], n+1)

    lat_stride = lat_steps[1] - lat_steps[0]
    lon_stride = lon_steps[1] - lon_steps[0]

    for lat in lat_steps[:-1]:
        for lon in lon_steps[:-1]:
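            # Define dimensions of box in grid as [lon, lat] pairs (GeoJSON order).
            # These names reuse the argument names, which are no longer needed here.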
            upper_left = [lon, lat + lat_stride]
            upper_right = [lon + lon_stride, lat + lat_stride]
            lower_right = [lon + lon_stride, lat]
            lower_left = [lon, lat]

            # Define json coordinates for polygon
            coordinates = [
                upper_left,
                upper_right,
                lower_right,
                lower_left,
                upper_left
            ]

            geo_json = {"type": "FeatureCollection",
                        "properties":{
                            "lower_left": lower_left,
                            "upper_right": upper_right
                        },
                        "features":[]}

            grid_feature = {
                "type":"Feature",
                "geometry":{
                    "type":"Polygon",
                    "coordinates": [coordinates],
                }
            }

            geo_json["features"].append(grid_feature)

            all_boxes.append(geo_json)

    return all_boxes
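
To see the corner arithmetic in isolation, the following is a dependency-free sketch of the same traversal. The helper name grid_cell_bounds is my own and is not part of the code above; it only computes the bounds of each cell, in the same order as get_geojson_grid (row by row, starting from the lower left), without building any GeoJSON.

```python
def grid_cell_bounds(lower_left, upper_right, n):
    """Return (lat_min, lon_min, lat_max, lon_max) for each cell of an n x n grid.

    lower_left and upper_right are [lat, lon] pairs, as passed to
    get_geojson_grid.
    """
    lat_stride = (upper_right[0] - lower_left[0]) / n
    lon_stride = (upper_right[1] - lower_left[1]) / n

    cells = []
    for row in range(n):          # south to north
        for col in range(n):      # west to east
            lat_min = lower_left[0] + row * lat_stride
            lon_min = lower_left[1] + col * lon_stride
            cells.append((lat_min, lon_min,
                          lat_min + lat_stride, lon_min + lon_stride))
    return cells


# A 2 x 2 grid over an illustrative bounding box.
cells = grid_cell_bounds([50.0, -8.0], [58.0, 2.0], n=2)
for cell in cells:
    print(cell)
```

An (n, n) grid always yields n * n cells, so n=6 produces 36 rectangles.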

We can generate a heat-map effect using matplotlib’s colour maps. A matplotlib colour map maps numeric values to colours, in this case mapping a number between 0 and 1 to a colour between white and red. The colour map function’s output is an RGBA colour, but we can easily convert it to a hex representation using the to_hex function in matplotlib.

Matplotlib has a wide variety of color maps to choose from. Typically simple colour maps aid interpretability, but there are circumstances where more complex maps are useful.
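
As a rough illustration of what “a number between 0 and 1 mapped to a hex colour” means, here is a plain Python sketch using a simple linear ramp from white to red. This is not matplotlib’s Reds colour map, which uses its own colour stops, and the function name white_to_red_hex is hypothetical.

```python
def white_to_red_hex(value):
    """Map a number in [0, 1] to a hex colour on a linear white-to-red ramp."""
    value = min(max(value, 0.0), 1.0)  # clamp out-of-range input
    # Red stays at full intensity; green and blue fade from 255 down to 0.
    fade = round(255 * (1.0 - value))
    return "#{:02x}{:02x}{:02x}".format(255, fade, fade)


print(white_to_red_hex(0.0))
print(white_to_red_hex(1.0))
```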

In the visualisation below the colour represents the order in which the GeoJSON objects are plotted, white being the earliest, red the latest. Notice that the style function is a lambda function.

This is the code required to generate and render the grid, colour the grid sections, and add a popup for each section.


lower_left = [49.68, -7.669]
upper_right = [59.145, 2.77]
m = folium.Map(zoom_start = 5, location=[55, 0])
grid = get_geojson_grid(upper_right, lower_left , n=6)

for i, geo_json in enumerate(grid):

    color = plt.cm.Reds(i / len(grid))
    color = mpl.colors.to_hex(color)

    gj = folium.GeoJson(geo_json,
                        style_function=lambda feature, color=color: {
                                                                        'fillColor': color,
                                                                        'color':"black",
                                                                        'weight': 2,
                                                                        'dashArray': '5, 5',
                                                                        'fillOpacity': 0.55,
                                                                    })
    popup = folium.Popup("example popup {}".format(i))
    gj.add_child(popup)

    m.add_child(gj)
m
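
The color=color default argument in the style function above is worth a second look. A lambda created in a loop looks up the loop variable when it is called, not when it is defined, so if it is called after the loop has finished it sees only the last value. Binding the value as a default argument guards against that. The sketch below shows the difference in plain Python, using made-up colour values and hypothetical helper names.

```python
def make_late_bound(colors):
    """Each lambda looks up `color` when called, after the loop has ended."""
    funcs = []
    for color in colors:
        funcs.append(lambda feature: {"fillColor": color})
    return funcs


def make_default_bound(colors):
    """Each lambda keeps the value of `color` from its own iteration."""
    funcs = []
    for color in colors:
        funcs.append(lambda feature, color=color: {"fillColor": color})
    return funcs


colors = ["#ffffff", "#ff8080", "#ff0000"]
late = [f(None)["fillColor"] for f in make_late_bound(colors)]
bound = [f(None)["fillColor"] for f in make_default_bound(colors)]
print(late)   # every entry is the last colour
print(bound)  # one colour per lambda
```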

Pulling everything together


So far we have covered:

  • Generating a simple map
  • Adding markers, and popups to the map
  • Generating marker clusters
  • Creating custom GeoJSON objects and rendering them on the map

The code below pulls together all of the previous ideas we have covered, and in addition calculates the total number of casualties and vehicles involved in accidents within each section of the grid.

Whilst this code looks a little more complex, hopefully the individual components should look familiar.

m = folium.Map(zoom_start = 5, location=[55, 0])

# Generate GeoJson grid from the [lat, lon] corners of the bounding box
top_right = [58, 2]
bottom_left = [49, -8]

grid = get_geojson_grid(top_right, bottom_left, n=6)

# Calculate exposures in grid
popups = []
regional_counts = []

for box in grid:
    upper_right = box["properties"]["upper_right"]
    lower_left = box["properties"]["lower_left"]

    # Box corners are stored as [lon, lat], so index 1 is latitude and index 0 is longitude
    mask = (
        (accident_data.Latitude <= upper_right[1]) & (accident_data.Latitude >= lower_left[1]) &
        (accident_data.Longitude <= upper_right[0]) & (accident_data.Longitude >= lower_left[0])
           )

    region_incidents = len(accident_data[mask])
    regional_counts.append(region_incidents)

    total_vehicles = accident_data[mask].Number_of_Vehicles.sum()
    total_casualties = accident_data[mask].Number_of_Casualties.sum()
    content = "total vehicles {:,.0f}, total casualties {:,.0f}".format(total_vehicles, total_casualties)
    popup = folium.Popup(content)
    popups.append(popup)

worst_region = max(regional_counts)

# Add GeoJson to map
for i, box in enumerate(grid):
    geo_json = json.dumps(box)

    color = plt.cm.Reds(regional_counts[i] / worst_region)
    color = mpl.colors.to_hex(color)

    gj = folium.GeoJson(geo_json,
                        style_function=lambda feature, color=color: {
                                                                        'fillColor': color,
                                                                        'color':"black",
                                                                        'weight': 2,
                                                                        'dashArray': '5, 5',
                                                                        'fillOpacity': 0.55,
                                                                    })

    gj.add_child(popups[i])
    m.add_child(gj)

# Marker clusters
locations = list(zip(accident_data.Latitude, accident_data.Longitude))
icons = [folium.Icon(icon="car", prefix="fa") for _ in range(len(locations))]

# Create popups
popup_content = []
for incident in accident_data.itertuples():
    number_of_vehicles = "Number of vehicles: {} ".format(incident.Number_of_Vehicles)
    number_of_casualties = "Number of casualties: {}".format(incident.Number_of_Casualties)
    content = number_of_vehicles + number_of_casualties
    popup_content.append(content)

popups = [folium.Popup(content) for content in popup_content]

cluster = MarkerCluster(locations=locations, icons=icons, popups=popups)
m.add_child(cluster)

m.save("car_accidents.html")
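
The per-region aggregation in the code above boils down to a bounding-box test followed by a couple of sums. Here is that step on its own as a plain Python sketch, without pandas. The accident records are made-up values and the helper name in_box is hypothetical; the box corners are [lon, lat] pairs, matching the "properties" stored by get_geojson_grid.

```python
def in_box(lat, lon, lower_left, upper_right):
    """True if (lat, lon) lies inside the box; corners are [lon, lat] pairs."""
    return (lower_left[1] <= lat <= upper_right[1]
            and lower_left[0] <= lon <= upper_right[0])


# Made-up records: (latitude, longitude, vehicles, casualties)
accidents = [
    (51.5, -0.1, 2, 1),
    (51.6, -0.2, 1, 1),
    (55.9, -3.2, 3, 2),
]

lower_left = [-1.0, 51.0]   # [lon, lat]
upper_right = [0.0, 52.0]   # [lon, lat]

inside = [a for a in accidents
          if in_box(a[0], a[1], lower_left, upper_right)]
total_vehicles = sum(a[2] for a in inside)
total_casualties = sum(a[3] for a in inside)
print(len(inside), total_vehicles, total_casualties)
```

Because both comparisons are inclusive, an accident lying exactly on a shared edge is counted in both neighbouring boxes. For a coarse grid like this one that is unlikely to matter, but it is worth knowing if the regional totals are ever summed.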