
Machine Learning: K-Nearest Neighbors, Part 1

According to Wikipedia:

A Terry stop in the United States allows the police to briefly detain a person based on reasonable suspicion of involvement in criminal activity. Reasonable suspicion is a lower standard than probable cause which is needed for arrest. When police stop and search a pedestrian, this is commonly known as a stop and frisk.

The City of Seattle, where Maderas Partners is based, has publicly released data on each Terry stop made by Seattle police officers since 2015. The data include details of the stop (date, location, reason) and demographic information about both the officer and the person stopped. The data also record whether the subject was frisked during the stop, whether a weapon was found, and whether the stop resulted in an arrest.

In this exercise, Maderas Partners sought to build a model to predict whether a person would be arrested or frisked during a stop based on the other information provided by the city. To accomplish this goal, we employed a commonly used machine learning technique called K-Nearest Neighbors (KNN).
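For readers unfamiliar with the technique, the core idea of KNN can be sketched in a few lines of Python: a new observation is assigned the label most common among the k closest training observations. This is an illustrative sketch only, not the model described in this series; the function name `knn_predict`, the two numeric features, and the toy points are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy points with two made-up numeric features (not the Seattle data).
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["not frisked", "not frisked", "not frisked",
           "frisked", "frisked", "frisked"]

print(knn_predict(train_X, train_y, (0.5, 0.5)))
print(knn_predict(train_X, train_y, (5.5, 5.5)))
```

In practice the features would need to be numeric and on comparable scales, since the distance calculation treats every feature equally.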

Before building the model, we wanted to get a better sense of the data to determine whether KNN would be a good fit. First, we looked at the number of stops, arrests, frisks, and weapons found, broken out by race.

According to 2010 Census estimates, Whites account for 66.3% of the population and Blacks account for 7.7%. It therefore appears that Blacks were disproportionately subjected to Terry stops, whereas Asians, who make up 13.7% of the population, were underrepresented in the number of stops. However, using the number of Terry stops as the baseline, each of the three racial groups was arrested at approximately the same rate (between 25% and 29%), while Whites were frisked less often (18.7% of stops, compared with 25.5% for Asians and 26.3% for Blacks).
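The per-group rates above use each group's own number of stops as the denominator. A minimal sketch of that calculation follows. The records are hypothetical toy rows rather than the Seattle data, and the function name `rates_by_group` is an assumption made for the example.

```python
from collections import defaultdict

# Hypothetical toy records, NOT the Seattle data:
# (subject_race, arrested, frisked)
stops = [
    ("White", True, False),
    ("White", False, False),
    ("White", False, True),
    ("White", False, False),
    ("Black", True, True),
    ("Black", False, False),
    ("Asian", False, True),
    ("Asian", False, False),
]

def rates_by_group(records):
    """Return {group: (n_stops, arrest_rate, frisk_rate)}, with stops as the baseline."""
    counts = defaultdict(lambda: [0, 0, 0])  # [stops, arrests, frisks]
    for race, arrested, frisked in records:
        group = counts[race]
        group[0] += 1
        group[1] += int(arrested)
        group[2] += int(frisked)
    return {race: (n, a / n, f / n) for race, (n, a, f) in counts.items()}

rates = rates_by_group(stops)
for race, (n, arrest_rate, frisk_rate) in sorted(rates.items()):
    print(f"{race}: stops={n}, arrest rate={arrest_rate:.1%}, frisk rate={frisk_rate:.1%}")
```

Dividing by stops rather than by population separates two questions: who gets stopped, and what happens once a stop is under way.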

Next, we ran a pair of OLS linear regressions to determine whether there was any statistical relationship among the primary variables in question. We first examined whether being arrested during a stop was correlated with any of the following variables:

  • Being frisked during the stop
  • Whether the subject was carrying a weapon
  • The officer’s age at the time of the stop
  • The officer’s gender
  • The officer’s race
  • The subject’s gender
  • The subject’s race

Unsurprisingly, we found that undergoing a frisk and possessing a weapon were both correlated with being arrested. Also, male officers tended to make more arrests than female or non-binary officers, while older officers were less likely to make arrests than younger ones. However, the race variables did not provide much insight because there was not much variation among the different races.
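A regression of this kind can be sketched with NumPy by coding each categorical variable as 0/1 dummy columns, dropping one reference level, and solving the least-squares problem. The sketch below is an assumption-laden illustration, not the analysis reported here. It uses only a subset of the variables listed above, the rows are hypothetical toy data, and `design_matrix` and the choice of "White" as the reference level are made up for the example.

```python
import numpy as np

# Hypothetical toy records, NOT the Seattle data.
# Each row: (frisked, weapon_found, officer_age, subject_race)
rows = [
    (1, 1, 27, "Black"),
    (0, 0, 44, "White"),
    (1, 0, 31, "White"),
    (0, 0, 52, "Asian"),
    (1, 1, 29, "White"),
    (0, 0, 38, "Black"),
    (0, 1, 35, "Asian"),
    (1, 0, 41, "Black"),
    (0, 0, 48, "White"),
    (1, 0, 33, "Asian"),
]
# Outcome: 1 if the stop ended in an arrest, else 0 (also made up).
arrested = np.array([1, 0, 0, 0, 1, 0, 1, 0, 0, 1], dtype=float)

def design_matrix(rows, race_levels, reference):
    """Build columns [intercept, frisked, weapon, officer_age, race dummies]."""
    # One 0/1 column per race level except the reference, which is absorbed
    # by the intercept; each dummy coefficient is relative to that level.
    dummies = [level for level in race_levels if level != reference]
    names = ["intercept", "frisked", "weapon", "officer_age"]
    names += [f"race_{d}" for d in dummies]
    X = np.array(
        [[1.0, f, w, age] + [1.0 if race == d else 0.0 for d in dummies]
         for f, w, age, race in rows]
    )
    return names, X

names, X = design_matrix(rows, ["White", "Black", "Asian"], reference="White")

# Ordinary least squares: choose beta to minimize ||X @ beta - arrested||^2.
beta, _, rank, _ = np.linalg.lstsq(X, arrested, rcond=None)

for name, coef in zip(names, beta):
    print(f"{name:>12}: {coef:+.3f}")
```

With a 0/1 outcome, each coefficient reads as a change in the estimated probability of arrest, holding the other columns fixed.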

Replacing arrest with frisking as the dependent variable proved more interesting. (The R-squared, while not ideal, was also much higher.) The age, gender, and weapon variables remained consistent with the findings above, but the race variables also showed significance. Black officers stood out as the only group with a statistically significant positive correlation with frisking, suggesting that Black officers tended to frisk more often than officers of other races. Turning to the subjects of the stops, being male was positively correlated with being frisked (but not with being arrested). Most interestingly, being Black was positively correlated with being frisked, while being White was negatively correlated; both were statistically significant at the 5% level. This suggests that in a Terry stop, Blacks were more likely to be frisked than all other racial groups, while Whites were less likely to be frisked than all other racial groups.
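The two summary quantities mentioned here, R-squared and significance at the 5% level, can be computed directly from an OLS fit. The following is a minimal sketch under stated assumptions. The data are hypothetical toy values, `ols_summary` is a name invented for the example, and the 5% cutoff uses the large-sample normal approximation (about 1.96), which is too lenient for a sample as small as this toy one.

```python
import numpy as np
from statistics import NormalDist

def ols_summary(X, y):
    """Fit OLS and return (beta, std_err, t_stat, r_squared)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_resid = resid @ resid
    # Unbiased residual variance, then the classical coefficient covariance.
    sigma2 = ss_resid / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    std_err = np.sqrt(np.diag(cov))
    t_stat = beta / std_err
    # Share of the variance in y explained by the fitted values.
    r_squared = 1.0 - ss_resid / np.sum((y - y.mean()) ** 2)
    return beta, std_err, t_stat, r_squared

# Hypothetical toy data, NOT the Seattle data: one dummy regressor.
subject_black = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=float)
frisked = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0], dtype=float)
X = np.column_stack([np.ones_like(subject_black), subject_black])

beta, std_err, t_stat, r_squared = ols_summary(X, frisked)

# Two-sided 5% cutoff under the normal approximation (an assumption; small
# samples call for a t-distribution critical value instead).
critical = NormalDist().inv_cdf(0.975)
for name, b, t in zip(["intercept", "subject_black"], beta, t_stat):
    verdict = "significant" if abs(t) > critical else "not significant"
    print(f"{name:>14}: coef={b:+.3f}, t={t:+.2f} ({verdict} at 5%)")
print(f"R-squared: {r_squared:.3f}")
```

With a single dummy regressor, the slope equals the difference between the two groups' frisk rates, which makes the sign of the coefficient easy to interpret.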

Given these preliminary findings, we decided to move forward with the analysis. Part II will outline the process and outcome of our predictive model.
