Sources

San Francisco Crime Report Data


The San Francisco Police Department has compiled a .csv file containing every incident report it received in the past five years, from 2018 to 2023, and displays the data through visualizations on its website. The file records the police district, the type of crime, the month, day, and time of the incident, and other police-related details about each incident. The data is compiled by the police department and originates directly from calls and incident files. Given that the police department aims to protect and serve citizens and local communities, the information is public and anyone can access it. The website also presents the data in Tableau, which lets users filter by year, district, and type of crime.

The incident reports leave out more specific details: whether the victims were residents of the district or from out of town, and whether the incident occurred in a residence or in a public location. Consequently, this data alone cannot tell us whether neighborhoods with higher Airbnb concentrations have higher rates of crimes such as larceny and theft, since the reports do not indicate whether incidents occurred on private property such as short-term rental units. Districts such as downtown and the Tenderloin have higher crime rates, but it is difficult to draw conclusions from this because they are more densely populated than their suburban counterparts. While this dataset is composed solely of hard data filed as incident reports, it is hard to draw conclusions from it without the context provided by the Airbnb data.
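The year, district, and crime-type filters described above can also be reproduced directly on the .csv file. The following is a minimal sketch in Python, not the police department's own tooling. The column names (`Incident Year`, `Police District`, `Incident Category`) are assumptions about the file's header row, and the sample rows are made up purely for illustration.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only. The column names are assumptions
# about the incident report file and may differ in the real download.
SAMPLE_CSV = """Incident Year,Police District,Incident Category
2019,Tenderloin,Larceny Theft
2019,Tenderloin,Assault
2020,Richmond,Larceny Theft
2019,Central,Larceny Theft
2019,Tenderloin,Larceny Theft
"""


def count_by_district(rows, year=None, category=None):
    """Count incident reports per police district, optionally filtered."""
    counts = Counter()
    for row in rows:
        # csv.DictReader yields every value as a string, so compare as text.
        if year is not None and row["Incident Year"] != str(year):
            continue
        if category is not None and row["Incident Category"] != category:
            continue
        counts[row["Police District"]] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
larceny_2019 = count_by_district(rows, year=2019, category="Larceny Theft")

for district, total in larceny_2019.most_common():
    print(district, total)
```

To run this on the real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle. Note that these are raw counts per district: as discussed above, they say nothing about population density or whether an incident happened inside a rental unit.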

Download the Dataset

Inside Airbnb Dataset


The data is compiled from the Airbnb website and originates directly from the site and its users. It includes each listing's availability calendar for the next 365 days and its reviews. Airbnb gathers and organizes this information so that it is available on its platform, allowing users to explore and book accommodations.

How the data was generated

Inside Airbnb took data from Airbnb's website, such as the availability calendar and the reviews for each listing. The data was then verified, cleansed, analyzed, and aggregated. To convey a clear story, Inside Airbnb presents only a snapshot of the listings available at a particular time. In addition, Airbnb's neighborhood names were often inaccurate, so the neighborhood name for each listing was determined by comparing the listing's geographic coordinates with the city's own definition of its neighborhoods. To estimate how often a listing was being rented out and to approximate its income, Inside Airbnb developed its own occupancy model, called the “San Francisco Model”. This model uses a Review Rate of 50% to convert reviews into estimated bookings. An average length of stay, where available, is configured for each city, and this figure, multiplied by the estimated bookings for each listing over a period, gives the occupancy rate. Lastly, the metrics and filters that classify a listing as “high availability” and/or “frequently rented,” based on the number of nights booked or available per year, follow the city's short-term rental laws, which are designed to protect residential housing.
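The arithmetic behind the occupancy estimate described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Inside Airbnb's actual code: the function names are hypothetical, the example inputs (10 reviews, a 3-night average stay) are invented, and dividing the booked nights by the length of the period to express occupancy as a fraction is our reading of the description.

```python
def estimated_bookings(review_count, review_rate=0.5):
    """Convert a number of reviews into an estimated number of bookings.

    A review rate of 0.5 means that half of all stays are assumed
    to leave a review, so each review stands for two bookings.
    """
    if not 0 < review_rate <= 1:
        raise ValueError("review_rate must be greater than 0 and at most 1")
    return review_count / review_rate


def occupancy_rate(review_count, avg_stay_nights, period_days=365, review_rate=0.5):
    """Estimate the share of nights in the period that a listing was booked."""
    booked_nights = estimated_bookings(review_count, review_rate) * avg_stay_nights
    # Assumption: occupancy is expressed as booked nights over nights in the period.
    return booked_nights / period_days


# Hypothetical listing: 10 reviews in a year, 3-night average stay.
example_rate = occupancy_rate(review_count=10, avg_stay_nights=3)
print(f"Estimated occupancy: {example_rate:.1%}")
```

Because the result depends entirely on the review rate and the average length of stay, a listing whose guests rarely leave reviews would be underestimated, which is worth keeping in mind when comparing listings. The sketch also applies no upper limit, so a heavily reviewed listing could come out above 100%.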

Who or what organization funded the creation of the dataset?

Inside Airbnb is a mission-driven project that provides data and advocacy about Airbnb's impact on residential communities. It works towards a vision in which communities are empowered with data and information to understand, decide, and control the role of renting residential homes to tourists. Inside Airbnb was founded by Murray Cox, an artist, activist, and technologist who conceived the project, compiled and analyzed the data, and built the site. John Morris, a designer and artist, designed and directed the user experience.

What information is left out of the spreadsheet?

Information about the people who stay at these lodgings is left out, which in turn means the data does not reveal why a location needs Airbnbs (for tourism, long-term housing, retreats, etc.). Some regions of the world are also left out. While the website aims to highlight cities around the globe, portions of the world are inherently missing, and more rural regions are less represented on Airbnb and therefore in this spreadsheet. Finally, the average reviews from the past 12 months do not seem to be included, though the number of reviews per month is.

You should also give your account of the ideological effects of the way in which your sources have been divided into data (your dataset's ontology). If your dataset were your only source, what information would be left out?

Extensive data is included across multiple datasets to shed light on how Airbnb and its usage have become integrated into places all over the world. In addition to numerical data describing the rental property, prices, length of stay, accommodations, etc., the datasets include the hosts' own “about me” descriptions, reviews left by guests, and information about the neighborhood's attractions and appeal to customers. In terms of the dataset's ontology, the information is focused on generating an image of the communities through the provided data. The data attempts to paint a picture of how Airbnbs are used by both hosts and guests in different areas of cities across the world.

Nevertheless, it is important to note that beyond the hard numerical values, the information we get about hosts and guests is written entirely by the hosts and guests themselves. So while we may have the gender, marital status, and occupation of one host, we may not have the same information about another. Furthermore, the reviews are written entirely by the guests, according to what they value and look for in a rental. There may also be cases where a host or guest omits this information altogether. In addition, Inside Airbnb seems to have no data on the guests themselves, so its data is mainly focused on the host, the property, and the guests' reviews of the rental. Consequently, while the data is able to highlight communities and Airbnb use in these particular areas, we have limited information about hosts and guests that might tell us more about what they are looking for or why they choose an Airbnb rental.

Download the Dataset

Annotated Bibliography

Go see our Bibliography