Big Apple Bunkdowns

skills learned: sentiment analysis, data preprocessing, model training & evaluation, spatial mapping

This project explores the intersection of Airbnb listings and crime statistics in New York City. Using NYPD crime data and Airbnb reviews, we developed a safety index that assigns a score to each listing. Combining nearest-neighbor search over crime incidents, a random forest classifier, and sentiment analysis of reviews, we uncovered patterns and correlations that give visitors and locals a data-driven resource for making informed accommodation choices. The project aims to improve the safety and experience of Airbnb users and to contribute to the broader conversation on urban safety in New York City.

In New York City, the advent of Airbnb has ushered in a new era of accommodation choices, offering visitors and locals alike a diverse array of lodging options. Many people prefer Airbnb apartments to hotels and motels. Staying in an Airbnb gives travelers experiences beyond what traditional hotels have to offer, and Airbnbs are often the more economical option, especially for larger families. Despite these advantages, it can be difficult for travelers to find the listing that best fits their needs, and with more than 40,000 Airbnb rental units available in New York City, the choice is even harder. One significant factor that most people consider before choosing an Airbnb is safety. This is especially important in a new, unfamiliar city, where one may not know how safe different neighborhoods are. Further, this information isn’t easily or directly available on the Airbnb website, where customers reserve their rental space. We wanted to solve that problem.
The target audience includes travelers seeking safe and informed accommodation choices in New York City, as well as locals looking for secure lodging options within their own neighborhoods. The project is also intended for urban planners, policymakers, and researchers interested in the intersection of crime statistics and urban living spaces.
K-Nearest Neighbors (KNN) Search
Method: Haversine distance metric
Configuration: the 1,500 nearest crime incidents within a 5-mile radius of each listing
Output: Average weighted offense level per listing, translating to a safety score (1: very safe, 5: very unsafe)
Visualization:
Clusters of nearby crime incidents
Safety score distribution across listings and neighborhoods
Interactive map with color-coded safety scores and listing URLs
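The scoring step above can be sketched as a brute-force search in plain Python. This is a minimal illustration, not the project's actual code: the function names, the (lat, lon, offense_level) tuple layout, the 1-5 offense weights, and returning None when no incident is in range are all assumptions. At the scale of 40,000 listings, a library-backed spatial index would be preferable to this linear scan.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def safety_score(listing, crimes, k=1500, radius_miles=5.0):
    """Average offense level of the k nearest incidents inside the radius.

    `listing` is a (lat, lon) pair. `crimes` is a list of
    (lat, lon, offense_level) tuples, where offense_level is a hypothetical
    severity weight on the 1-5 scale. Returns None if nothing is in range.
    """
    lat, lon = listing
    nearby = []
    for c_lat, c_lon, level in crimes:
        distance = haversine_miles(lat, lon, c_lat, c_lon)
        if distance <= radius_miles:
            nearby.append((distance, level))
    if not nearby:
        return None
    nearby.sort(key=lambda pair: pair[0])  # closest incidents first
    nearest = nearby[:k]
    return sum(level for _, level in nearest) / len(nearest)
```

Because the result is an average of 1-5 offense levels, it stays on the same 1 (very safe) to 5 (very unsafe) scale as the safety score described above.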
Predictive Modeling
Approach: Random Forest Classifier with Recursive Feature Elimination (RFECV)
Features:
Numerical: latitude, longitude, price, review scores
Categorical: neighborhood, property type
Safety Labels: 1-2 (safe), 2-3.5 (moderately safe), 3.5-5 (unsafe)
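A minimal sketch of this modeling step with scikit-learn follows. It runs on synthetic stand-in data rather than the project's dataset, uses only the numerical features (the categorical ones would first need an encoding such as one-hot), and makes its own choices for hyperparameters and cross-validation. The boundary handling in safety_label is also an assumption: scores of exactly 2 and 3.5 are placed in the safer bin.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold


def safety_label(score):
    """Map a 1-5 safety score to a class label (boundary choice is assumed)."""
    if score <= 2.0:
        return "safe"
    if score <= 3.5:
        return "moderately safe"
    return "unsafe"


# Synthetic stand-in data: the score depends on latitude only, so the
# selector has one clearly informative feature and three noise features.
rng = np.random.default_rng(0)
n = 300
latitude = rng.uniform(40.5, 40.9, n)
longitude = rng.uniform(-74.25, -73.70, n)
price = rng.uniform(40, 500, n)
review_score = rng.uniform(3.0, 5.0, n)
score = 1 + 4 * (latitude - 40.5) / 0.4
labels = np.array([safety_label(s) for s in score])

feature_names = ["latitude", "longitude", "price", "review_score"]
X = np.column_stack([latitude, longitude, price, review_score])

# Recursive feature elimination with cross-validation around a random forest.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    step=1,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    scoring="accuracy",
)
selector.fit(X, labels)

selected = [name for name, keep in zip(feature_names, selector.support_) if keep]
print("Selected features:", selected)
```

After fitting, selector.predict(X_new) classifies new listings using only the retained features, and selector.ranking_ shows the order in which features were eliminated.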
Sentiment Analysis
Tool: NLTK Sentiment Intensity Analyzer
Results:
Overall positive sentiments in reviews
Neighborhoods scored based on review sentiment
Frequency analysis of negative words showed limited impact on overall sentiment
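The neighborhood scoring above could look roughly like the following sketch. The function names and the (neighborhood, review_text) input layout are hypothetical, and using VADER's compound score as the per-review value is an assumption. The NLTK import is kept inside make_vader_scorer so that the aggregation can be exercised without the lexicon installed.

```python
from collections import defaultdict


def make_vader_scorer():
    """Return a function mapping review text to VADER's compound score."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # needed once per environment
    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]


def neighborhood_sentiment(reviews, score_text):
    """Average a per-review sentiment score for each neighborhood.

    `reviews` is an iterable of (neighborhood, review_text) pairs and
    `score_text` is any callable that maps text to a number.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for neighborhood, text in reviews:
        totals[neighborhood] += score_text(text)
        counts[neighborhood] += 1
    return {name: totals[name] / counts[name] for name in totals}
```

A typical call would be neighborhood_sentiment(reviews, make_vader_scorer()), which yields one average compound score per neighborhood in the range -1 (most negative) to 1 (most positive).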
Tools: Python, Pandas, NumPy, Geopy, NLTK, Scikit-Learn, Geopandas, Shapely, Folium, Matplotlib, Seaborn, Jupyter Notebook, Git, LaTeX.