Super bizarre data sets you might not know exist

    Knowledge is power. That was true even before Sir Francis Bacon coined the phrase (in Latin) back in 1597. In today’s age of readily accessible information, data is a commodity used by everyone from scientists researching cancer cures to fantasy football fanatics looking for an edge in their league. The internet is not only built on data; it’s also filled with unique data sets that are available to anyone with a keyboard and that cover topics ranging from municipal bike share programs to spam text messages.

    To celebrate data in all its wonderful forms, Stacker put together this list of super bizarre data sets you might not know existed. Obviously “super bizarre” is in the eye of the beholder, but these data sets span the spectrum from pop culture to public health and everything in between. To be included in the list, the data set had to be free and available to researchers and journalists, which eliminates a wide swath of data sets that are only accessible via a subscription or one-time payment.

    Read on to explore the wonderful wide world of incredibly specific data.

  • Wine Quality Data Set

    Wine fans can use this data set to converse confidently about all things Portuguese wine. The data set covers 12 attributes, including fixed acidity, pH, and alcohol content, drawn from samples of northern Portuguese red and white Vinho Verde wine. The University of Minho in Portugal puts everything but the bouquet under your nose with this data set of almost 5,000 variants.

  • Amazon product data

    This massive data set takes 142.8 million Amazon reviews and parses them into searchable details. Everything is broken down into consumable data sets, whether by category or just by product name. The reviews and metadata span nearly 20 years from 1996 to 2014 and were put together by Julian McAuley of the University of California San Diego.

  • Every available Reddit comment

    Reddit can be a bizarre place in and of itself, but what happens when someone aggregates every comment that’s ever been made on the platform? That’s what Reddit user Stuck_In_The_Matrix aimed to find out when creating this data set that tracks over 1.7 billion comments. The comments are categorized by author, comment, score, subreddit, and more using Reddit’s application program interface. With so many comments on such a vast number of topics, anything could be hiding in this data set.

  • Million Song Dataset

    Many people know the feeling of having a song on the tip of the tongue when the name just won’t come to mind. To assist in the self-Shazam, the Million Song Dataset is a collection of features describing over a million contemporary popular music tracks. Even though it doesn’t include audio, it does break the songs down by their features, and it comes with a set of smaller community-contributed data sets that analyze lyric data and cover songs.

  • Bob Ross Elements by Episode

    Television painting master Bob Ross calmed millions with his easy-going attitude about "The Joy of Painting" on PBS, and this data set from the statistical analysis artists at FiveThirtyEight analyzes the types of paintings Ross taught in each episode. Broken down by elements like trees, mountains, and water, the data set can be used by Bob Ross aficionados to create an accurate picture of the art teacher’s work or by a novice painter looking for inspiration.

  • Speed Dating Experiment

    Speed daters can start looking for love in all the right places now that Professors Ray Fisman and Sheena Iyengar of Columbia Business School have put together this Speed Dating Experiment. Using data collected from 2002 to 2004, they have broken down the information into categories such as dating habits and the qualities people found valuable in a mate.

  • Are dog size and intelligence linked?

    Find out if a 60-pound dog is more or less intelligent than a heavier canine with this data set that pits dogs’ weight against their IQ. It explores the correlation between dog size and intelligence using data derived from the Intelligence of Dogs data set. The project is based on research by Stanley Coren, professor of canine psychology at the University of British Columbia.

  • UFO Reports

    The search for intelligent life in the universe gets a big upgrade with the ufo-reports data set, which tracks over 80,000 sightings from the National UFO Reporting Center over the last century. The data includes geolocation and standardized times, making it easy for those studying extraterrestrial contact to compare sightings.

  • Last 20 Games Major League Baseball Standings

    Every fantasy league manager wants to know who’s hot and who’s not. This data set lets a manager view data from the last 20 games by individual categories like batting average or home runs. It also lets baseball fans break down data about the top 10 players or dive into the massive data set to examine the complete raw data about players on every team.

  • List of cats in movies

    Felines in films have been around since 1903, according to this data set, which was compiled on OpenDataSoft, a portal for over 13,000 public data sets. The list can be sorted by director, producer, and year, and can be used to find out which decade was the most feline-friendly in film.

