fheldimaano

Creating Predictive Models Using SKLearn

We have acquired our data, performed exploratory data analysis, created charts and map visualizations, and engineered features from our data. Now it is time to use machine learning to create predictive models. The first order of business is to import the libraries we are going to use. From there we set our X and…

Feature Engineering and the Creation of Dummy Variables

In Logistic Regression, classification or categorical variables are present in the data. An example of a classification variable is the borough in New York City. When a 311 Service Request is made, it is made in the Bronx, Brooklyn, Manhattan, Queens, Staten Island, or an unspecified borough. This is different from a continuous variable, which…
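As a rough illustration of what a dummy variable is, here is a minimal sketch in plain Python. It is not the code from the post: the `make_dummies` helper, the `borough_` column prefix, and the exact spelling of the borough labels are all assumptions made for this example. In practice a library function such as `pandas.get_dummies` is the usual way to do this.

```python
# Hypothetical category labels; the real dataset may spell these differently.
BOROUGHS = ["Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island", "Unspecified"]


def make_dummies(values, categories, prefix="borough"):
    """Turn each categorical value into a dict of 0/1 indicator columns.

    Each category gets its own column; the column matching the value is 1
    and every other column is 0.
    """
    return [
        {f"{prefix}_{category}": int(value == category) for category in categories}
        for value in values
    ]


# Two made-up service requests, one from Queens and one from the Bronx.
rows = make_dummies(["Queens", "Bronx"], BOROUGHS)
for row in rows:
    print(row)
```

Each request ends up with exactly one column set to 1, which is what lets a model such as logistic regression work with a variable that has no natural numeric order.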

Using F1 As A Metric For Classification

When using predictive classification models in machine learning, we need a metric to determine if the predictions the model is making are “good.” Here is a confusion matrix I created from one of the machine learning models: This can also be read and interpreted as: The top left number of 116 is a true negative…
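To make the link between a confusion matrix and F1 concrete, here is a minimal sketch in plain Python. Only the true-negative count of 116 comes from the excerpt above; the other three counts are hypothetical placeholders chosen for illustration, not the values from the post's model. The `f1_from_counts` helper is likewise an assumption for this example.

```python
def f1_from_counts(tp, fp, fn):
    """Compute F1 from confusion-matrix counts.

    F1 is the harmonic mean of precision and recall. With no true
    positives the score is returned as 0.0 by convention.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)


# tn = 116 is from the excerpt; fp, fn, and tp are made-up placeholder values.
tn, fp, fn, tp = 116, 20, 10, 54

score = f1_from_counts(tp, fp, fn)
print(f"F1 = {score:.3f}")
```

Note that the true-negative count never enters the calculation. That is the main design difference from accuracy: F1 only rewards the model for how it handles the positive class.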

EDA – Creating Map Visualizations with Folium

Folium is a powerful Python library that is used to create several types of Leaflet maps. Here are some examples: Using Folium, I created some map visualizations showing the Positive and Negative Resolutions for different features in the 311 Service Requests dataset. Let’s go over the code to show how this is done. so we…

EDA cont. Borough Info

Part of Exploratory Data Analysis (EDA) is creating visualizations. Continuing from my previous blog, I wanted to look at how certain features are affected by borough. I will take a look at the top five service requests, the top five agencies that handle the most requests, whether these requests are made by phone, online, or mobile, and a…

Starting Exploratory Data Analysis

This post continues from my previous blog about Revisiting NYC’s OpenData on 311 Service Requests, where I created a function to retrieve multiple days’ worth of information from NYC’s OpenData. Using that function, I retrieved data for the entire month of August 2019. In this blog I will start Exploratory Data Analysis on this data. First thing I…

My First Blog Post

Be yourself; Everyone else is already taken. — Oscar Wilde. This is the first post on my new blog. I’m just getting this new blog going, so stay tuned for more. Subscribe below to get notified when I post new updates.

Revisiting NYC’s OpenData on 311 Service Requests

https://opendata.cityofnewyork.us/ While studying in an immersive Data Science bootcamp program, one of the projects I worked on had to specifically use classification variables. An example of a classification variable is hot dog vs not hot dog. This is different from continuous variables such as predicting the sale price of your car based on make, model,…