Chicago Food Inspections

Food Inspections in Chicago

Abstract


Chicago is a notoriously unequal city: according to The Economist, passengers travelling on its red metro line have life expectancies that vary by 30 years depending on their stop. We want to investigate how geographical socio-economic divisions in Chicago are reflected in food inspections in two ways: the results of food inspections and the number of food inspections. By analysing inspection results, we want to find out which areas perform better and which perform worse, and compare this to measures of the socio-economic divide. By examining the number of inspections per area, we hope to gain insight into a potential bias in the choice of inspected establishments: do the chances of an establishment getting inspected depend only on its performance, or also on its location?



Introduction


The city of Chicago

With approximately 2.7 million inhabitants, Chicago is one of the biggest cities in the US. The city is divided into 77 community areas, which are displayed on the map to the left. The notorious socio-economic divide between community areas in Chicago will become clearer throughout our data story, and makes this city particularly interesting to analyse. The city center lies in the center-east of the displayed areas, next to Lake Michigan, and is composed of three areas: Near North Side, Near South Side and Loop. You can see the name of a community area by hovering over it.
These community areas are grouped into 9 regions, but we will first focus on each individual area before diving into its relationship with other areas of the same region.


Food inspections in Chicago

With over 7000 restaurants, Chicago is known for its diverse food culture. For example, did you know that there is a Chicago-style pizza? Or that Chicago hosts the world's biggest annual food festival, called "Taste of Chicago"?
To investigate this vast number of culinary establishments, we used the Chicago Food Inspections dataset provided by Kaggle and the city of Chicago. It contains information on each establishment and the results of every food inspection carried out since January 1st 2010. The maps on the right give a raw overview of some of this data. Select below which data to display.

Explanation

The heatmap shows the number of inspections across the area we are focusing on. The closer we get to the city center of Chicago, the higher the number of inspections. Curiously, the number of inspections is also higher in the north of Chicago than in the south. This could be due to a higher number of establishments in the north; we check this in the other map.


Socio-economics in Chicago

As mentioned before, Chicago is also notoriously unequal: life expectancy varies between 69 and 85 years depending on the community area.
To capture some of these socio-economic factors, we used the Census Data and the Public Health Statistics datasets provided by the city of Chicago. They provide information on some important socio-economic factors for each community area in Chicago. The maps on the left give a raw overview of some of the data from those sets. Select below which data to display.

Explanation

Community areas around the center have the highest per capita income, but even there the differences are noticeable. Near North Side, Lincoln Park and Loop have the largest average per capita incomes, at approximately $88,700, $71,500 and $65,500 respectively. Near North Side thus has an average per capita income roughly a third higher than Loop's (88,700 / 65,500 ≈ 1.35), even though the two are neighbours! Slightly to the south, between Near South Side and its neighbour Armour Square, the difference is even larger: the first has an average per capita income of $59,000, while the second has only around $16,000.



Our Story


Analysing food inspections

To start, we want to take a look at the relative number of inspections, as well as their results. The maps to the right show the number of food inspections per establishment, the average risk, and the average inspection result (0 for a fail, 1 for a pass with conditions and 2 for a pass). The richer center and north seem to have more inspections per establishment than the poorer west. Perhaps more surprisingly, the facilities in these richer areas seem to have a higher risk on average while also obtaining better results on average. Why is it that facilities in rich areas seem to perform better in inspections while carrying, on average, a higher risk than facilities in poorer areas?

Explanation

The map on the right displays the average number of inspections per establishment in each area. We can see that establishments in the east, north and south tend to get inspected more often. This loosely matches the areas with higher per capita income (see map above for comparison). The most inspections per establishment happen in Burnside (the only red area on the map). However, it is important to note, for this and further analysis, that only 2 distinct establishments from this area appear in our dataset, so its average rests on very little data and may not be representative.
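The per-area quantities behind these maps can be sketched as a simple aggregation. This is only an illustration: the tuple layout, the result labels ("Fail", "Pass w/ Conditions", "Pass") and the sample rows are assumptions made for the example, not the exact columns or values of the dataset. Inspections with any other outcome are skipped here, which is also an assumption. The average risk could be computed in the same way from a numeric risk level.

```python
from collections import defaultdict

# Encoding described in the text: fail = 0, pass with conditions = 1, pass = 2.
# The label strings themselves are assumed for this sketch.
RESULT_SCORE = {"Fail": 0, "Pass w/ Conditions": 1, "Pass": 2}


def summarise_by_area(inspections):
    """Aggregate (area, establishment_id, result) tuples per community area."""
    counts = defaultdict(int)          # scored inspections per area
    establishments = defaultdict(set)  # distinct establishments per area
    score_sums = defaultdict(int)      # sum of result scores per area
    for area, establishment, result in inspections:
        if result not in RESULT_SCORE:
            continue  # skip outcomes that carry no score in this sketch
        counts[area] += 1
        establishments[area].add(establishment)
        score_sums[area] += RESULT_SCORE[result]
    return {
        area: {
            "inspections_per_establishment": counts[area] / len(establishments[area]),
            "average_result": score_sums[area] / counts[area],
            "n_establishments": len(establishments[area]),
        }
        for area in counts
    }


# Hypothetical rows, invented for illustration only.
sample = [
    ("Burnside", "A", "Pass"),
    ("Burnside", "A", "Fail"),
    ("Burnside", "B", "Pass"),
    ("Loop", "C", "Pass w/ Conditions"),
    ("Loop", "C", "Out of Business"),
]
summary = summarise_by_area(sample)
```

Keeping `n_establishments` next to the averages makes it easy to flag areas such as Burnside, where an average rests on only a couple of establishments.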


Facilities in rich areas: More risky, but better performing?

Indeed, the intuition we got from the map shows up in the correlation values as well:
The correlation between every good social indicator and the average number of inspections is positive. The same can be said for good social indicators and the risk of a facility. Conversely, the correlation between every bad social indicator and the average number of inspections or the risk is negative. If you look at the average inspection result, the sign is the opposite. Some of these correlations, both positive and negative, are weak, with absolute values between 0.1 and 0.3.
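As a minimal sketch of how such a correlation value can be obtained, the snippet below computes a Pearson correlation coefficient across community areas. The text does not say which correlation measure was used, so Pearson is an assumption, and the per-area numbers are invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for four community areas (not taken from the datasets).
per_capita_income = [10, 20, 30, 40]              # a "good" indicator
households_below_poverty = [40, 30, 20, 10]       # a "bad" indicator
inspections_per_establishment = [1.0, 1.5, 1.4, 2.0]

r_good = pearson(per_capita_income, inspections_per_establishment)
r_bad = pearson(households_below_poverty, inspections_per_establishment)
```

With these toy numbers `r_good` is positive and `r_bad` is negative, which is the sign pattern described above. With only 77 community areas, weak coefficients should be read with caution.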

Will you get inspected in 2018?

We use machine learning to predict, based on a restaurant's previous history and some of its characteristics, whether it will get inspected in 2018. Using logistic regression and Monte Carlo cross-validation with 1000 iterations, we estimate a validation set accuracy of 61%. The classes are balanced: we chose the period so that approximately half of the facilities would get inspected during it, and half would not. However, the interest of this exercise lies not in the accuracy but in the weights the model assigns to each input variable, both categorical (after one-hot encoding) and numerical (to which we apply MinMaxScaler).
The top 5 variables contributing positively are the number of inspections, the average risk, and whether the facility is a daycare, a special event, or a convenience store. The 5 variables contributing most negatively are whether the facility is a liquor store, a banquet, a paleteria or a long-term care facility, and the average result. This shows that some establishment types are slightly more likely to be inspected than others. Unsurprisingly, we can also see that better-performing facilities (w.r.t. their result) are less likely to be inspected. We already saw in the description of risk that establishments with a risk of adversely affecting the public's health should get inspected more frequently. It therefore makes sense that, for example, daycare centers or hospitals (which are among the top 10 positive contributors) would be inspected more frequently.
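The pipeline described above (min-max scaling, one-hot encoding, logistic regression, Monte Carlo cross-validation) can be sketched as follows. This is not the code used for the analysis: it is a toy version written with the standard library only, on invented data, with a hand-rolled gradient-descent logistic regression and 20 random splits instead of 1000. In practice, scikit-learn's `LogisticRegression`, `MinMaxScaler`, `OneHotEncoder` and `ShuffleSplit` would be the natural tools. As a simplification, features are scaled once before splitting rather than inside each split.

```python
import math
import random


def min_max_scale(column):
    """Rescale a numeric column to [0, 1], as MinMaxScaler does."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # avoid dividing by zero for constant columns
    return [(v - lo) / span for v in column]


def one_hot(values):
    """Return the sorted category names and one 0/1 column per category."""
    categories = sorted(set(values))
    return categories, [[1.0 if v == c else 0.0 for c in categories] for v in values]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Plain batch gradient descent on the logistic loss (no regularisation)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) - target
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return w, b


def accuracy(w, b, X, y):
    """Share of rows whose predicted class (score > 0) matches the label."""
    hits = sum((b + sum(wi * xi for wi, xi in zip(w, row)) > 0) == bool(t)
               for row, t in zip(X, y))
    return hits / len(y)


def monte_carlo_cv(X, y, iterations, test_fraction=0.25, seed=0):
    """Average test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, int(len(X) * test_fraction))
    scores = []
    for _ in range(iterations):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(w, b, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / len(scores)


# Hypothetical toy data: 40 facilities, inspected next year iff they had at
# least 20 past inspections; the facility type carries no signal here.
n_inspections = list(range(40))
facility_type = ["restaurant" if i % 2 == 0 else "store" for i in range(40)]
inspected = [1 if n >= 20 else 0 for n in n_inspections]

scaled = min_max_scale(n_inspections)
categories, type_columns = one_hot(facility_type)
X = [[s] + cols for s, cols in zip(scaled, type_columns)]
feature_names = ["n_inspections"] + [f"type={c}" for c in categories]

mean_accuracy = monte_carlo_cv(X, inspected, iterations=20)
w, b = fit_logistic(X, inspected)
weights = dict(zip(feature_names, w))  # sign and size show each feature's pull
```

Because every input lies in [0, 1] after scaling and encoding, the learned weights are on a comparable scale, which is what makes ranking them by sign and magnitude meaningful.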

Comparing Restaurant/Store results with Poverty

As the tendency to inspect some facilities more often than others might distort our results, we decided to focus only on the 2 most common facility types in this section: restaurants and stores. Indeed, we can see that there is a correlation between the percentage of failed inspections of restaurants and stores and the percentage of households below the poverty line in areas of Chicago. The three areas with the highest percentage of failing restaurants and stores, Oakland, Washington Park and Riverdale, all have over 40% of failed inspections. The percentage of households below the poverty line is also around 40% for the first two areas and over 56% for the last (!). It is curious that neighbouring areas sometimes perform much better in both domains. For example Douglas, the northern neighbour of Oakland, has only around 21% failed inspections in restaurants and stores, and 30% of households below the poverty line. These numbers aren't great, but they are a lot better than those of its southern neighbour.
Of course, the relation highlighted here is not a causal dependence, but both features are, at first glance, indicators of how attractive a community can be. A community where more than a third of inspections fail might not be the most appealing, and the same goes for a community with a high percentage of households below the poverty line.

Poverty and per capita income

To further analyse the indicator of households below the poverty line, we will now compare it to the average per capita income within the same community areas. As a plot depicting the poverty indices of community areas would be hard to interpret geographically, we now introduce the notion of region as defined by the city of Chicago and explained on this Wikipedia page. There are 9 regions in Chicago, and each one of them contains a subset of community areas. It is easier to visualize the households below the poverty line w.r.t. the average per capita income within those regions than to display all 77 areas at once. On the plot below, you can click on the region names in the legend (below the plot) to highlight the community areas of the selected region. Hover over a point in the plot or an area on the map to see the name of the community area.
The legend consists of all regions of Chicago, with All displaying all regions and Chicago displaying the average of those indicators over the entire city. Clearly, the Central region has the highest per capita incomes and few households below the poverty line (around 15%). The North Side region also seems to do very well on those measures. Its community areas Avondale and Logan Square have much lower per capita incomes (around half those of the other areas in the same region), yet they have roughly the same percentage of households below the poverty line. In the southernmost regions, there seems to be a greater percentage of households under the poverty line. Indeed, in Southwest Side, South Side, Far Southwest Side and Far Southeast Side, community areas have around 20-30% of households below the poverty line (although Far Southwest Side is slightly better), and the average per capita income rarely exceeds 30k and is usually around 20k, which is not a lot.
Displaying those two poverty metrics for different regions, one can see that the community areas within a region usually share similar results. However, there are some exceptions, which continue to show the economic divide between areas. This can probably be explained by the fact that most high-paying jobs are in the city center, as we could see on the previous per capita income maps, but many wealthy people live in quiet neighbourhoods outside the center, where jobs don't pay as much.




Conclusion


Our goal was to investigate how geographical socio-economic divisions in Chicago are reflected in food inspections. To do that, we analysed the results and the number of food inspections. By crossing this information with socio-economic factors, such as the per capita incomes and life expectancies of all community areas, we were able to gain useful insight into the subject.
The food inspection data show that establishments in the richer areas are inspected more often than those in the poorer ones, and present a higher average risk while obtaining better inspection results. Conversely, establishments in poorer areas tend to have worse results. However, we also find that facilities with higher risk tend to perform better, as can be seen in the correlation between the two variables. So it could be that high-risk facilities (hospitals, schools, ...), which are more common in rich areas, usually perform better, or that those establishments tend to be particularly careful.
Finally, we used machine learning to predict whether or not a facility would get inspected in 2018, and determined which features contribute positively to the prediction. As expected, the risk contributes positively, and good results negatively, to the probability of a facility getting inspected.
As a final conclusion, areas with better socio-economic indicators have more high-risk facilities (such as schools, hospitals, hotels, nursing homes, etc.) and also perform better in inspections than areas with lower socio-economic status: the socio-economic divide of Chicago also manifests itself in the results of food inspections.