Big Apple Bunkdowns

skills learned: sentiment analysis, data preprocessing, model training & evaluation, spatial mapping

This project explores the intersection of Airbnb listings and crime statistics in New York City. Using NYPD crime data together with Airbnb listings and reviews, we developed a safety index that assigns a score to each listing. By combining nearest-neighbor search over nearby crime incidents, predictive modeling, and sentiment analysis of reviews, we uncovered patterns and correlations and built a data-driven resource to help visitors and locals make informed accommodation choices. Beyond helping Airbnb users choose safer stays, the project contributes to the broader conversation on urban safety in the Big Apple.

  • In New York City, the advent of Airbnb has ushered in a new era of accommodation choices, offering visitors and locals alike a diverse array of lodging options. Many people prefer Airbnb apartments to hotels and motels for a variety of reasons. Staying in an Airbnb lets travelers have experiences beyond what traditional hotels offer, and Airbnbs are often more economical, especially for larger families. Despite these advantages, it can be difficult for travelers to find the listing that best fits their needs, and with more than 40,000 Airbnb rental units available in New York City, the choice is even harder. One significant factor most people need to weigh before choosing an Airbnb is safety. This is especially important in a new, unfamiliar city, where one may not know how safe different neighborhoods are. Moreover, this information isn’t easily or directly available on the Airbnb website, where customers make their reservations. We wanted to solve that problem.

  • The target audience includes travelers seeking safe, informed accommodation choices in New York City, as well as locals looking for secure lodging within their own neighborhoods. The project is also intended for urban planners, policymakers, and researchers interested in the intersection of crime statistics and urban living spaces.

  • K-Nearest Neighbors (KNN) Search

    Method: Haversine distance metric

    Configuration: 1500 neighbors within a 5-mile radius

    Output: Average weighted offense level per listing, translating to a safety score (1: very safe, 5: very unsafe)

    Visualization:

    Clusters of nearby crime incidents

    Safety score distribution across listings and neighborhoods

    Interactive map with color-coded safety scores and listing URLs
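The KNN step above could look roughly like the following plain-Python sketch; it is not the project's code. The offense levels (felony 3, misdemeanor 2, violation 1, keyed on NYPD law categories), the inverse-distance weighting, and the linear rescaling onto the 1 to 5 scale are all assumptions made for illustration. Over hundreds of thousands of incidents, a spatial index such as scikit-learn's `BallTree` with `metric="haversine"` (coordinates in radians) would replace this brute-force loop.

```python
import heapq
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius in miles

# Hypothetical severity levels for NYPD law categories (assumption, not the project's values).
OFFENSE_LEVEL = {"FELONY": 3, "MISDEMEANOR": 2, "VIOLATION": 1}


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def safety_score(listing, crimes, k=1500, radius_miles=5.0):
    """Score one listing from 1 (very safe) to 5 (very unsafe).

    listing: (lat, lon); crimes: iterable of (lat, lon, law_category).
    """
    # Keep incidents inside the radius, then the k nearest of those.
    in_radius = []
    for lat, lon, category in crimes:
        d = haversine_miles(listing[0], listing[1], lat, lon)
        if d <= radius_miles:
            in_radius.append((d, OFFENSE_LEVEL[category]))
    nearest = heapq.nsmallest(k, in_radius)
    if not nearest:
        return 1.0  # no incidents nearby: treated as safest (assumption)

    # Inverse-distance weights (assumption); the 0.01-mile offset avoids division by zero.
    weights = [1.0 / (d + 0.01) for d, _ in nearest]
    avg_level = sum(w * level for w, (_, level) in zip(weights, nearest)) / sum(weights)

    # Linearly map the offense-level range onto the 1-5 safety scale (assumption).
    lo, hi = min(OFFENSE_LEVEL.values()), max(OFFENSE_LEVEL.values())
    return 1 + 4 * (avg_level - lo) / (hi - lo)


# Toy incidents for illustration only (not project data).
crimes = [
    (40.7580, -73.9855, "FELONY"),
    (40.7590, -73.9845, "MISDEMEANOR"),
    (40.6413, -73.7781, "VIOLATION"),  # JFK Airport, well over 5 miles from Midtown
]
print(safety_score((40.7585, -73.9850), crimes))
```

In this toy run the airport incident falls outside the 5-mile radius and the two Midtown incidents are almost equidistant from the listing, so the weighted level sits near 2.5 and the score lands close to 4.0.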

  • Predictive Modeling

    Approach: Random Forest Classifier with Recursive Feature Elimination (RFECV)

    Features:

    Numerical: latitude, longitude, price, review scores

    Categorical: neighborhood, property type

    Safety Labels: 1-2 (safe), 2-3.5 (moderately safe), 3.5-5 (unsafe)
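The safety labels above turn the continuous score into classification targets. A minimal sketch of that binning follows; because the stated ranges share their endpoints (2 and 3.5), assigning each endpoint to the safer class is an assumption.

```python
def safety_label(score):
    """Map a 1-5 safety score to one of the three classifier labels.

    The stated ranges (1-2, 2-3.5, 3.5-5) share their endpoints; giving each
    endpoint to the safer class is an assumption.
    """
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    if score <= 2:
        return "safe"
    if score <= 3.5:
        return "moderately safe"
    return "unsafe"


print([safety_label(s) for s in (1.4, 2.7, 4.2)])
# ['safe', 'moderately safe', 'unsafe']
```

These labels could then serve as the target `y` for a scikit-learn `RandomForestClassifier` wrapped in `RFECV`, which repeatedly drops the least important features and keeps the subset with the best cross-validated score; the categorical features (neighborhood, property type) would need encoding first, for example one-hot encoding.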

  • Sentiment Analysis

    Tool: NLTK Sentiment Intensity Analyzer

    Results:

    Overall positive sentiment across reviews

    Neighborhoods scored based on review sentiment

    Frequency analysis of negative words showed limited impact on overall sentiment
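The sentiment results above can be outlined once each review has a compound score from NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, which requires the `vader_lexicon` resource to be downloaded. The sketch below takes those scores as given, labels them with the conventional VADER cutoffs of ±0.05, averages them per neighborhood, and counts a hypothetical list of negative words; the word list and the input tuple format are assumptions.

```python
import re
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical negative words to track (assumption, not the project's list).
NEGATIVE_WORDS = {"dirty", "noisy", "loud", "unsafe", "broken", "smelly"}


def vader_label(compound):
    """Label a VADER compound score using the conventional +/-0.05 cutoffs."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def summarize(reviews):
    """Aggregate (neighborhood, text, compound) tuples.

    Returns the mean compound score per neighborhood, counts of
    positive/neutral/negative reviews, and negative-word frequencies.
    """
    scores_by_hood = defaultdict(list)
    label_counts = Counter()
    negative_counts = Counter()
    for hood, text, compound in reviews:
        scores_by_hood[hood].append(compound)
        label_counts[vader_label(compound)] += 1
        tokens = re.findall(r"[a-z']+", text.lower())
        negative_counts.update(t for t in tokens if t in NEGATIVE_WORDS)
    hood_means = {hood: mean(s) for hood, s in scores_by_hood.items()}
    return hood_means, label_counts, negative_counts


# Toy reviews with made-up compound scores, for illustration only.
reviews = [
    ("Harlem", "Great host, lovely and clean apartment", 0.85),
    ("Harlem", "A bit noisy at night but otherwise great", 0.60),
    ("Midtown", "Dirty bathroom and noisy street", -0.40),
]
hood_means, label_counts, negative_counts = summarize(reviews)
print(hood_means)
print(label_counts.most_common())
print(negative_counts.most_common())
```

The second toy review shows why negative-word frequency alone can overstate dissatisfaction: it mentions "noisy" yet still carries a positive compound score.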

  • Python, pandas, NumPy, geopy, NLTK, scikit-learn, GeoPandas, Shapely, Folium, Matplotlib, Seaborn, Jupyter Notebook, Git, LaTeX.
