
Creating an Incident Reporter with Python

This notebook contains the code necessary to create the SFPD Incident and Crime Rate Reporter.

Importing Packages

Setting up the Client and Querying the Database

Via data.sfgov.org, we can access many databases, including the SFPD database, which records all incidents occurring in the different police districts. To query it, we first need to instantiate our client:

We can now get the latest 15,000 reports:

wg3w-h783 is the database ID.
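
A minimal sketch of the client setup and query, assuming the `sodapy` package serves as the Socrata client (the original import cell is not shown). The function name `fetch_incidents` and the ordering column `incident_datetime` are assumptions made for illustration; passing `None` as the app token sends unauthenticated, rate-limited requests.

```python
DOMAIN = "data.sfgov.org"
DATASET_ID = "wg3w-h783"  # database ID given in the text


def fetch_incidents(limit=15000):
    """Return the most recent `limit` incident reports as a list of dicts."""
    # Imported lazily so this module still loads where sodapy is not installed.
    from sodapy import Socrata

    client = Socrata(DOMAIN, None)  # None = no app token (assumption)
    try:
        # Assumed column name: incident_datetime (newest reports first).
        return client.get(DATASET_ID, limit=limit, order="incident_datetime DESC")
    finally:
        client.close()
```

The returned records can then be loaded into a dataframe with `pandas.DataFrame.from_records(fetch_incidents())`.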

Data Processing

District Specific Information

Having loaded the 15,000 incidents into our dataframe, we can group the data according to police districts. This will allow us to create a crime rate graph for each district.
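
As a rough illustration of the grouping step, the sketch below counts incidents per district and day on a tiny synthetic dataframe. The column names `incident_date` and `police_district` are assumptions and may differ from the ones used in the notebook.

```python
import pandas as pd

# Tiny synthetic stand-in for the incident dataframe (column names assumed).
incidents = pd.DataFrame(
    {
        "incident_date": pd.to_datetime(
            ["2021-03-01", "2021-03-01", "2021-03-01", "2021-03-02"]
        ),
        "police_district": ["Mission", "Mission", "Central", "Mission"],
    }
)

# Count incidents per day and district, then pivot to one column per district.
district_daily = (
    incidents.groupby(["incident_date", "police_district"])
    .size()
    .unstack(fill_value=0)
)
```

Each column of `district_daily` is then a daily time series that can be plotted on its own.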

Total Crime Rate

Similarly, we can group the main database by date to get a sense of the total crime rate per day.
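
One possible way to get the daily totals, shown on synthetic data. The column names are assumptions; `resample("D")` is used here instead of a plain `groupby` so that days without reports appear with a count of zero.

```python
import pandas as pd

# Synthetic incidents (column names assumed); note that 2021-03-02 has no reports.
incidents = pd.DataFrame(
    {
        "incident_date": pd.to_datetime(["2021-03-01", "2021-03-01", "2021-03-03"]),
        "incident_id": [101, 102, 103],
    }
)

# One row per calendar day, counting the reports that fall on that day.
daily_total = (
    incidents.set_index("incident_date").resample("D").size().rename("incidents")
)
```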

Getting the Rolling Average

Having our total crime rate, it is now possible to get the rolling mean. In this case we select a five-day window.
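
A short sketch of the five-day rolling mean using pandas, with made-up daily totals. By default `rolling(window=5)` requires five observations, so the first four days come out as NaN.

```python
import pandas as pd

# Made-up daily totals for six consecutive days.
daily_total = pd.Series(
    [10, 20, 30, 40, 50, 60],
    index=pd.date_range("2021-03-01", periods=6, freq="D"),
    name="incidents",
)

# Five-day window: each value is the mean of that day and the four days before it.
rolling_avg = daily_total.rolling(window=5).mean()
```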

Adding two more days for ML inference

In order to predict the future, rows containing future dates must be created:

Similarly, we need to add null values to the rolling average:
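
One way to add the future rows, sketched under the assumption that the series is indexed by date: reindexing against a date range that is two days longer leaves the new rows as NaN. The same call would work for the rolling average series.

```python
import pandas as pd

# Made-up series indexed by date.
daily_total = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.date_range("2021-03-01", periods=3, freq="D"),
)

# Extend the index by two days; the new dates have no observations yet (NaN).
future_index = pd.date_range(
    daily_total.index[0], periods=len(daily_total) + 2, freq="D"
)
extended = daily_total.reindex(future_index)
```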

Updating the template.html file to create the new index.html

There are various ways to achieve this. JavaScript would arguably be a better fit, but for the purposes of this project we will stick with Python:
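
A minimal sketch of one such approach, assuming the template marks insertion points with placeholder strings. The helper name `render_template` and the placeholder syntax are hypothetical, not taken from the project.

```python
from pathlib import Path


def render_template(template_path, output_path, replacements):
    """Copy the template to output_path, substituting each placeholder string."""
    html = Path(template_path).read_text(encoding="utf-8")
    for placeholder, value in replacements.items():
        html = html.replace(placeholder, value)
    Path(output_path).write_text(html, encoding="utf-8")
    return html
```

A call might look like `render_template("template.html", "index.html", {"{{ last_updated }}": "2021-03-01"})`, where `{{ last_updated }}` is an illustrative placeholder.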

Taking the Five Latest Reports and Creating a Bootstrap Table

We can now insert the HTML table into the updated index.html file:

For aesthetics, we add the thead-dark class to our table head.
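
The table step could look roughly like the sketch below, which uses synthetic reports and assumed column names. `DataFrame.to_html` produces the markup, and a plain string replacement adds the `thead-dark` class to the table head.

```python
import pandas as pd

# Synthetic reports; column names are assumptions.
reports = pd.DataFrame(
    {
        "incident_date": ["2021-03-01", "2021-03-02", "2021-03-03",
                          "2021-03-04", "2021-03-05", "2021-03-06"],
        "incident_category": ["Assault", "Burglary", "Larceny Theft",
                              "Robbery", "Vandalism", "Arson"],
    }
)

# Keep the five most recent reports.
latest_five = reports.sort_values("incident_date", ascending=False).head(5)

# Render as an HTML table with the Bootstrap "table" class, then style the head.
table_html = latest_five.to_html(index=False, classes="table")
table_html = table_html.replace("<thead>", '<thead class="thead-dark">')
```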

Creating the Choropleth Map

To create the choropleth map showing the latest weekly crime rate, we need to resample the dataframe.

Having grouped the dataset by week, we can extract the second-to-last row. The dataset is updated daily, but incidents only appear once they have been approved, which does not always happen promptly. The most recent week is therefore usually incomplete, which is why the second-to-last entry is taken.
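
A small sketch of the weekly resampling on synthetic per-district daily counts (the district columns are illustrative). With pandas, `resample("W")` produces weeks ending on Sunday, and `iloc[-2]` picks the second-to-last week.

```python
import pandas as pd

# 21 days of synthetic counts; 2021-03-01 is a Monday, so this spans three full weeks.
daily = pd.DataFrame(
    {"Mission": range(21), "Central": [1] * 21},
    index=pd.date_range("2021-03-01", periods=21, freq="D"),
)

weekly = daily.resample("W").sum()  # one row per week, labelled by its Sunday
week_for_map = weekly.iloc[-2]      # second-to-last week
```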

Loading and Processing GeoJSON File

The GeoJSON file contains the coordinates needed to plot the different police districts in the San Francisco area. Map plotting libraries such as folium and plotly need this file to draw the district boundaries. In this notebook we use the latter.

Creating Plotly Choropleth Map
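
As an illustration of the kind of processing involved, the sketch below parses a minimal in-memory GeoJSON document and gives every feature an `id` equal to its district name, since plotly by default matches locations against the feature `id`. The `district` property name, the upper-case spelling, and the coordinates are assumptions; the real file may be structured differently.

```python
import json

# Minimal stand-in for the districts file (property name and values assumed).
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"district": "MISSION"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-122.43, 37.75], [-122.40, 37.75],
                         [-122.40, 37.77], [-122.43, 37.75]]]
      }
    },
    {
      "type": "Feature",
      "properties": {"district": "CENTRAL"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-122.42, 37.79], [-122.40, 37.79],
                         [-122.40, 37.81], [-122.42, 37.79]]]
      }
    }
  ]
}
"""

districts = json.loads(geojson_text)

# Use the title-cased district name as the feature id so it can be joined
# against the district names in the dataframe.
for feature in districts["features"]:
    feature["id"] = feature["properties"]["district"].title()

district_ids = [feature["id"] for feature in districts["features"]]
```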

The plotly map is saved as an html file and inserted into our updated index.html file.

Creating Incident Category Distribution Plotly Bar Plot

We can also create a bar plot to visualize the distribution of the latest 15,000 incident categories. For this we group the data by category rather than by date or district.
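
The grouping behind the bar plot might look like the following sketch, using synthetic categories and an assumed column name `incident_category`. Only the counting step is shown; the resulting series can be handed to a plotting library.

```python
import pandas as pd

# Synthetic incidents (column name assumed).
incidents = pd.DataFrame(
    {
        "incident_category": [
            "Larceny Theft", "Assault", "Larceny Theft",
            "Burglary", "Larceny Theft", "Assault",
        ]
    }
)

# Number of incidents per category, most frequent first.
category_counts = (
    incidents.groupby("incident_category").size().sort_values(ascending=False)
)
```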

Machine Learning

In this section we create a simple Decision Tree model to predict the crime rate for the next two days. At the beginning we extracted only the latest 15,000 incidents. To train our model we need more data, so we query the database for 500,000 rows, which is more than the available data points.

First we remove any rows with missing data:

We want to predict the daily crime rate so we group the response by date. In other words, we resample the dataframe into one day bins.
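
A rough sketch of the cleaning and binning steps on synthetic data. The column names are assumptions, and `pd.Grouper(freq="D")` is one of several ways to place timestamps into one-day bins.

```python
import pandas as pd

# Synthetic raw rows; two of them contain missing values.
raw = pd.DataFrame(
    {
        "incident_datetime": pd.to_datetime(
            ["2021-03-01 08:15", "2021-03-01 21:40", "2021-03-02 10:05", None]
        ),
        "police_district": ["Mission", "Mission", None, "Central"],
    }
)

# Drop every row that has at least one missing value.
clean = raw.dropna()

# Count incidents per one-day bin and district.
daily = (
    clean.groupby([pd.Grouper(key="incident_datetime", freq="D"), "police_district"])
    .size()
    .reset_index(name="incidents")
)
```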

Feature Engineering

Here, we one-hot encode the police district feature and create a feature for both the day and month:
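
The feature engineering could be sketched as follows, assuming a long-format table with one row per date and district. The column names and the `district` prefix are illustrative, and "day" is interpreted here as the day of the month.

```python
import pandas as pd

# Synthetic daily counts per district (column names assumed).
daily = pd.DataFrame(
    {
        "date": pd.to_datetime(["2021-03-01", "2021-03-01", "2021-04-15"]),
        "police_district": ["Central", "Mission", "Mission"],
        "incidents": [12, 30, 25],
    }
)

# One indicator column per police district.
features = pd.get_dummies(daily, columns=["police_district"], prefix="district")

# Calendar features derived from the date.
features["day"] = features["date"].dt.day
features["month"] = features["date"].dt.month
```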

Given that the latest incidents will only be approved at a later date, we cannot use the last few days as training data.
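
Excluding the most recent days can be done with a simple date cutoff, as in this sketch. The value of `HOLDOUT_DAYS` is hypothetical; the source does not state how many days are dropped.

```python
import pandas as pd

# Ten days of synthetic feature rows.
features = pd.DataFrame(
    {
        "date": pd.date_range("2021-03-01", periods=10, freq="D"),
        "incidents": range(10),
    }
)

HOLDOUT_DAYS = 3  # hypothetical number of recent days to exclude

# Keep only rows on or before the cutoff date for training.
cutoff = features["date"].max() - pd.Timedelta(days=HOLDOUT_DAYS)
train = features[features["date"] <= cutoff]
```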

Training a DT Regressor

Here we use the scikit-learn implementation to train a DT model.
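
A minimal training sketch with scikit-learn's `DecisionTreeRegressor` on a handful of synthetic rows. The feature layout and the default hyperparameters are assumptions; the notebook's actual settings are not shown.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic features: [district_Central, district_Mission, day, month]
X = np.array(
    [
        [1, 0, 1, 3],
        [0, 1, 1, 3],
        [1, 0, 2, 3],
        [0, 1, 2, 3],
    ]
)
y = np.array([12.0, 30.0, 15.0, 28.0])  # daily incident counts

# random_state fixes tie-breaking between equally good splits.
model = DecisionTreeRegressor(random_state=0)
model.fit(X, y)

predictions = model.predict(X)
```

With no depth limit the tree reproduces the training targets on this toy data, so a held-out set is needed to judge how well it generalizes.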

Building a Data Inference Pipeline

We will get the last eight days for inference plus the future two days:
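
The inference window can be built as a date range, sketched below. The helper `inference_dates` is hypothetical, and it assumes the eight past days include the current day.

```python
import pandas as pd


def inference_dates(today, past_days=8, future_days=2):
    """Return the last `past_days` days (ending today) plus `future_days` ahead."""
    today = pd.Timestamp(today).normalize()
    start = today - pd.Timedelta(days=past_days - 1)
    end = today + pd.Timedelta(days=future_days)
    return pd.date_range(start, end, freq="D")
```

For example, `inference_dates("2021-03-10")` covers 2021-03-03 through 2021-03-12, ten dates in total.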

The following function allows us to get the total crime rate for each queried date. Since our model is built to predict crime in each police district, we need to sum all predictions to get the total crime rate.
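
The summation itself can be sketched as below. The function name `total_crime_rate` and the column names `date` and `predicted` are assumptions; the original function is not shown.

```python
import pandas as pd


def total_crime_rate(per_district):
    """Sum the per-district predictions into one total per date."""
    return per_district.groupby("date")["predicted"].sum()


# Synthetic per-district predictions for two dates.
per_district = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2021-03-11", "2021-03-11", "2021-03-12", "2021-03-12"]
        ),
        "police_district": ["Central", "Mission", "Central", "Mission"],
        "predicted": [12.0, 30.0, 15.0, 28.0],
    }
)

totals = total_crime_rate(per_district)
```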

Gathering and Formatting the Results

Writing out the Predictions to our index.html file