This post continues from my previous blog post, Revisiting NYC’s OpenData on 311 Service Requests, where I created a function to retrieve multiple days’ worth of information from NYC’s OpenData. Using that function, I retrieved data for the entire month of August 2019. In this post I will start Exploratory Data Analysis on this data.
The first thing I do is import the libraries I will be using into Python.
import pandas as pd
import seaborn as sns
import time
import matplotlib.pyplot as plt
A total of 176,015 service requests were made in August. My data was saved as a CSV file and loaded into a pandas dataframe named aug.
aug = pd.read_csv('311_8-01_to_8-31function.csv')
aug.columns returns the features that are in the dataframe. len(aug.columns) will tell me how many features I have in the dataframe.
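As a minimal sketch of these two calls, here is a toy dataframe standing in for aug. The column names are borrowed from this post, but the values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for aug: column names come from the post, values are invented.
toy = pd.DataFrame({
    "agency": ["NYPD", "DOT", "HPD"],
    "address_type": ["ADDRESS", None, "INTERSECTION"],
    "created_date": ["2019-08-01T00:00:00", "2019-08-02T13:50:00", "2019-08-03T09:15:00"],
})

column_names = list(toy.columns)  # the feature names, in order
n_features = len(toy.columns)     # how many features the dataframe has

print(column_names)
print(n_features)
```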

aug.info() shows how many non-null entries each feature has and its data type: an object (typically a string of text) or a number (float or integer). In the output, agency has 176,015 entries and is classified as an object. Compare this with address_type, which has a value for only 66,346 of the 176,015 entries.
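To see the same non-null counts as numbers you can work with, count() and isna().sum() are one option. This is a sketch on a made-up toy dataframe, not the real aug data.

```python
import pandas as pd

# Toy stand-in for aug with some missing address_type values (invented data).
toy = pd.DataFrame({
    "agency": ["NYPD", "DOT", "HPD", "NYPD"],
    "address_type": ["ADDRESS", None, "INTERSECTION", None],
})

toy.info()                  # prints non-null counts and dtypes per column

non_null = toy.count()      # non-null entries per column, as a Series
missing = toy.isna().sum()  # missing entries per column

print(non_null)
print(missing)
```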

aug.describe() shows the count, mean, standard deviation, minimum, maximum, and quartiles (25%, 50%, 75%) of any features that are floats or integers.
One feature of particular interest to me is created_date. However, in this dataframe, created_date holds a string (text) such as “2019-08-01T00:00:00”. To make this more usable, we convert it into a timestamp. We imported time earlier, but the conversion itself is done by pandas’ to_datetime function, so the time module is not needed for this step.
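Here is a small sketch of what describe() returns, using a toy dataframe. The latitude column and its values are assumptions made for illustration. With default settings, the text column is left out of the summary.

```python
import pandas as pd

# Toy data: one numeric column and one text column (values are invented).
toy = pd.DataFrame({
    "latitude": [40.70, 40.75, 40.80, 40.85],
    "agency": ["NYPD", "DOT", "HPD", "NYPD"],
})

summary = toy.describe()  # by default, summarizes numeric columns only
print(summary)
```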
aug['created_date'] = pd.to_datetime(aug['created_date'])
aug['day_of_week'] = aug['created_date'].dt.day_name()
aug['created_date'] = pd.to_datetime(aug['created_date']) converts each row of created_date to a Timestamp and replaces the original column. With that one line, we converted 176,015 rows from strings to Timestamps.
Let’s break down what aug['day_of_week'] = aug['created_date'].dt.day_name() does. aug['day_of_week'] creates a new feature/column in our dataframe. Each row of this new feature is filled with whatever is on the right of the = sign. aug['created_date'].dt.day_name() looks at the created_date for each row and converts it to the name of the day of the week.
For example, aug['created_date'][2000] returns Timestamp('2019-08-02 13:50:00') and aug['day_of_week'][2000] returns 'Friday'. Documentation for how pandas handles time/date functionality can be found in the time series section of the pandas user guide.
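The conversion can be checked on a couple of rows. This sketch uses a two-row toy dataframe in place of aug, with date strings in the same format as created_date.

```python
import pandas as pd

# Two-row stand-in for aug, with date strings in the same format as created_date.
toy = pd.DataFrame({"created_date": ["2019-08-01T00:00:00", "2019-08-02T13:50:00"]})

# Convert the strings to Timestamps, replacing the original column.
toy["created_date"] = pd.to_datetime(toy["created_date"])

# Derive the weekday name for each row.
toy["day_of_week"] = toy["created_date"].dt.day_name()

print(toy["created_date"][1])  # Timestamp for 2 August 2019, 13:50
print(toy["day_of_week"][1])   # Friday
```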
Now we can check which day of the week is busiest for complaints made.
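One way to get these counts is value_counts() on the new column. This is a minimal sketch on a toy dataframe with invented dates. With the real data, the same call would be made on aug['day_of_week'].

```python
import pandas as pd

# Toy stand-in for aug: four invented requests across three different days.
toy = pd.DataFrame({"created_date": pd.to_datetime([
    "2019-08-01 08:00", "2019-08-02 09:30", "2019-08-02 13:50", "2019-08-05 10:00",
])})
toy["day_of_week"] = toy["created_date"].dt.day_name()

# Count requests per weekday, sorted from most to fewest.
counts = toy["day_of_week"].value_counts()
busiest = counts.idxmax()  # weekday with the highest count

print(counts)
print(busiest)
```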

The counts show that in August 2019, NYC had the most service requests on Friday, with a total of 30,036 requests made. This was followed by Thursday with 29,316 and Monday with 24,483.
Next week, we will create a function that will expand on our day_of_week feature and take a look at other features for each specific borough.