Exploring Netflix Data with ChatGPT

Testing the Analytic Capabilities of ChatGPT with Netflix Data - 5/19/24

by Jake Lender

I recently wanted to test the analytic capabilities of ChatGPT. After watching some YouTube videos that highlighted the advancements in ChatGPT's data analysis abilities, I decided to give it a try myself. I uploaded a dataset directly to ChatGPT and asked it to run exploratory data analysis (EDA) on Netflix data.

Within a minute, ChatGPT delivered a comprehensive analysis with detailed graphs. It was impressive to see how quickly it could generate insightful visualizations. I was able to ask for specific visualizations, and ChatGPT provided them accurately. For instance, it created a line graph showing the counts of titles by date added, identified the top-billed cast in TV shows and movies, displayed the distribution of ratings in a pie chart, and provided a table of the top 10 most recent titles by date added.

Here are some of the key findings from the analysis:

  1. Distribution of Title Types: There are significantly more movies (6,131) than TV shows (2,676) on Netflix.

  2. Trends Over Time: The number of titles added to Netflix has increased substantially over the years, peaking in 2019.

  3. Top Countries by Number of Titles: The United States leads with 2,818 titles, followed by India (972) and the United Kingdom (419).

  4. Common Ratings: The most common rating is TV-MA, followed by TV-14 and TV-PG.

  5. Popular Genres: "International Movies" is the most popular genre, followed by "Dramas" and "Comedies".
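For anyone who wants to check findings like these by hand, here is a minimal pandas sketch of the same kind of summary. It is not the code ChatGPT ran. The column names (`type`, `date_added`, `country`, `rating`, `listed_in`) and the date format are assumptions about how the dataset is laid out, and the few rows below are placeholders rather than real Netflix data.

```python
import pandas as pd

# Placeholder rows for illustration only; swap in the real file with pd.read_csv.
# Column names are assumed, not confirmed by the article.
sample = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "date_added": ["September 25, 2021", "January 1, 2019",
                   "July 6, 2019", "March 3, 2020"],
    "country": ["United States", "India, United States",
                "United Kingdom", None],
    "rating": ["TV-MA", "TV-14", "TV-MA", "TV-PG"],
    "listed_in": ["Dramas, International Movies",
                  "Comedies, International Movies",
                  "British TV Shows", "Dramas"],
})


def count_multi_valued(column: pd.Series) -> pd.Series:
    """Count each value in a comma-separated column separately."""
    return (
        column.dropna()
        .str.split(",")
        .explode()
        .str.strip()
        .value_counts()
    )


def summarize(df: pd.DataFrame) -> dict:
    """Build the basic counts behind the five findings listed above."""
    # Assumed date format, e.g. "September 25, 2021"; unparseable dates become NaT.
    added = pd.to_datetime(
        df["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    return {
        "type_counts": df["type"].value_counts(),
        "titles_per_year": added.dt.year.value_counts().sort_index(),
        "top_countries": count_multi_valued(df["country"]).head(10),
        "rating_counts": df["rating"].value_counts(),
        "top_genres": count_multi_valued(df["listed_in"]).head(10),
    }


summary = summarize(sample)
print(summary["type_counts"])
print(summary["top_countries"])
```

One design choice matters here: whether a multi-country entry such as "India, United States" is split into separate countries or counted as a single string. The two approaches give different totals, so a choice like this is one possible reason two analyses of the same file might not match.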

I also ran the analysis myself and got many of the same results as ChatGPT. However, some of its visualizations differed from mine, which made me question the validity of the results being presented. Having my calculations challenged this way only makes me a better data analyst, because it pushes me to investigate why the differences occur.

In addition to EDA, I asked ChatGPT to run a regression forecast to predict the future number of movie uploads for Netflix. It provided a graph predicting an increase in the number of movies added each year through 2024.
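I don't know which regression model ChatGPT used, so the sketch below assumes the simplest option: a straight-line trend fitted to yearly counts with NumPy. The function name `forecast_linear` is my own, and the yearly counts are made-up placeholders, not figures from the dataset.

```python
import numpy as np


def forecast_linear(years, counts, future_years):
    """Fit a straight line to yearly counts and extrapolate to future years."""
    years = np.asarray(years, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Center the years on the first one so the fit is numerically well behaved.
    base = years[0]
    slope, intercept = np.polyfit(years - base, counts, deg=1)
    return {int(y): float(slope * (y - base) + intercept) for y in future_years}


# Placeholder counts for illustration only; replace with real yearly totals.
years = [2016, 2017, 2018, 2019, 2020]
counts = [100, 200, 300, 400, 500]

forecast = forecast_linear(years, counts, [2021, 2022, 2023, 2024])
for year, predicted in forecast.items():
    print(year, round(predicted))
```

A straight line can only ever continue the overall direction of the data, so a forecast like this is best treated as a rough starting point and compared against the actual counts.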

While ChatGPT does acknowledge that it can make mistakes, the speed of the analysis and the ability to reuse prompts to get fresh results on different datasets are very exciting. Work that would take hours can now yield a starting point for exploring our data within minutes. The potential for this tool to make data analysis more efficient is enormous.

In conclusion, my first experience testing ChatGPT's analytic capabilities has been overwhelmingly positive. The ability to quickly generate accurate and insightful data visualizations makes it a valuable tool for any data analyst. I am going to keep experimenting with ChatGPT on some personal projects to better understand its reliability and capabilities. As AI continues to evolve, it will be interesting to see how it affects my daily work and data jobs more broadly.
