The write-up below isn't exactly the same as the one found in the Data Sketches book. For the book, we've tightened up our wording, added more explanations and extra images, separated out the lessons, and more.
Anyone who's ever talked to me for more than a few minutes will know that I am absolutely horrible with pop culture references. And while I like to blame the fact that I grew up in various non-English-speaking countries for half of my childhood years (and spent the other half with my head buried in schoolwork), I know it's also because I just didn't watch that many movies growing up.
So I was pretty excited to do movies for our first month: how many big summer blockbusters have I watched in my lifetime?
Like Nadieh, I found the OMDb API with a simple Google search. Unfortunately, it only let me search for movies by title or IMDb ID, and I needed a way to get the top summer movies by year. Fortunately, IMDb has an advanced search feature that is beautifully parameterized, so it didn't take me long to put together a search query.
I got the top US-grossing movies for June 1st through August 30th of each year I have been alive, and took the top 5 movie IDs for each year. I then fed those IDs into the OMDb API, which gave me back a detailed set of information for each movie, including ratings and genres.
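For anyone retracing this step, here is a minimal Python sketch of the OMDb half of the pipeline. It assumes the IMDb search results have already been saved as rows with hypothetical `year`, `imdb_id`, and `gross` keys (the IMDb query itself isn't reproduced here). It keeps the five highest-grossing IDs per year, builds an OMDb lookup URL with the `i` (IMDb ID) and `apikey` parameters, and converts OMDb's string fields into numbers. Function names such as `top_ids_per_year` and `parse_movie` are made up for illustration; this is not the original project's code.

```python
import json
from collections import defaultdict
from urllib.parse import urlencode
from urllib.request import urlopen

OMDB_BASE = "https://www.omdbapi.com/"


def top_ids_per_year(rows, n=5):
    """Group (hypothetical) search rows by year and keep the n highest-grossing IMDb IDs."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row)
    return {
        year: [r["imdb_id"] for r in sorted(group, key=lambda r: r["gross"], reverse=True)[:n]]
        for year, group in by_year.items()
    }


def omdb_url(imdb_id, api_key):
    """Build an OMDb lookup-by-ID URL (the `i` parameter) with your own API key."""
    return OMDB_BASE + "?" + urlencode({"i": imdb_id, "apikey": api_key})


def to_number(value):
    """OMDb sends numbers as strings like "7.5" or "1,234,567", and "N/A" when missing."""
    if value in (None, "", "N/A"):
        return None
    return float(value.replace(",", ""))


def parse_movie(record):
    """Keep only the fields the flowers need, converted to usable types."""
    genre = record.get("Genre") or ""
    return {
        "id": record.get("imdbID"),
        "title": record.get("Title"),
        "year": record.get("Year"),
        # OMDb lists genres as one comma-separated string, e.g. "Action, Adventure"
        "genres": [g.strip() for g in genre.split(",") if g.strip() not in ("", "N/A")],
        "rating": to_number(record.get("imdbRating")),
        "votes": to_number(record.get("imdbVotes")),
    }


def fetch_movie(imdb_id, api_key):
    """Network call: fetch one movie's JSON from OMDb and parse it."""
    with urlopen(omdb_url(imdb_id, api_key)) as resp:
        return parse_movie(json.load(resp))
```

Calling `fetch_movie` needs network access and an OMDb API key; the other functions are pure, so they can be checked offline before spending any API quota.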
Looking through the data, I found that the most popular genres for summer movies were Action and Adventure, with Comedy in third. I also learned that (unsurprisingly) I have only watched 35 of the 140 movies, a rate of 25%. Of the movies I watched, the top genres were Adventure, Action, Comedy, and Animation. And of those 35 movies, I saw only 12 in theatres, all of them from the summer of 2007 onward (aka the summer before my senior year of high school).
On the bright side, out of the 20 best-rated movies, I have watched 15: a whopping 75%, so I guess I'm doing something right 😁
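The tallies above boil down to counting and simple ratios. A rough Python sketch, assuming each movie is a dict with hypothetical `id`, `genres`, and `rating` keys, and that the movies I've watched are a hand-made set of IDs:

```python
from collections import Counter


def genre_counts(movies):
    """How many movies list each genre; a movie counts once toward each of its genres."""
    return Counter(g for m in movies for g in m["genres"])


def watched_stats(movies, watched_ids):
    """Return (number watched, fraction watched)."""
    seen = sum(1 for m in movies if m["id"] in watched_ids)
    return seen, seen / len(movies)


def watched_among_top_rated(movies, watched_ids, n=20):
    """How many of the n best-rated movies (ignoring unrated ones) were watched."""
    rated = [m for m in movies if m["rating"] is not None]
    top = sorted(rated, key=lambda m: m["rating"], reverse=True)[:n]
    return sum(1 for m in top if m["id"] in watched_ids)


if __name__ == "__main__":
    # Tiny made-up sample, only to show the calls.
    sample = [
        {"id": "a", "genres": ["Action", "Adventure"], "rating": 7.0},
        {"id": "b", "genres": ["Action", "Comedy"], "rating": 6.0},
        {"id": "c", "genres": ["Animation"], "rating": 8.0},
    ]
    print(genre_counts(sample).most_common(2))
    print(watched_stats(sample, {"a", "c"}))
    print(watched_among_top_rated(sample, {"a", "c"}, n=2))
```

With the post's numbers, the ratios work out to 35 / 140 = 0.25 and 15 / 20 = 0.75. Because `genre_counts` credits a movie to every genre it lists, the genre totals add up to more than the number of movies.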
Code for getting movie data
Bl.ock exploring the movie data
Since I had pulled summertime movies for my dataset, I decided I also wanted something summery for my visualization.
My first inspiration came while I was standing at the Berkeley Marina on July 4th. As the fireworks went off above our heads, I couldn't help but think: what's more summery than fireworks? I wanted to create a different firework for each movie, colored by its genre and sized by its box office sales. I wanted the fireworks to boom and animate according to their release dates, and each one's descent to last in proportion to the number of weeks the movie spent in theatres.
The second inspiration came when I thought of sunflowers, and how fun it'd be to create summer flowers for each movie.
I was also enamored with the idea of Delaunay triangulation, which Ian Johnson (@enjalot) introduced me to, but couldn't quite make it work:
Though in the beginning I was much more excited about the fireworks, I ultimately decided that the flowers were much prettier and more fun to code. They were also more straightforward: I wasn't as sure about the math the fireworks would require (I thought maybe I could do something with the gravity function of the force layout over time), and Nadieh agreed that I could file the idea away for another month.
The toughest part of coding the flowers was the beginning, when I had to refresh my memory of SVG paths, and the cubic Bézier curve command (C) in particular. At first, I had to draw out the petal shapes to work out the commands:
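As a refresher on what the cubic curve command draws: in an SVG path, `C x1,y1 x2,y2 x,y` draws a cubic Bézier curve from the current point to `(x, y)`, bent toward the two control points. The small Python sketch below (illustrative only, not from the project) evaluates the standard cubic Bézier formula, which can help when working out control points for a petal on paper.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Point at t in [0, 1] on a cubic Bezier curve.

    In an SVG path, "M x0,y0 C x1,y1 x2,y2 x3,y3" draws exactly this curve:
    it starts at p0 (the current point), is pulled toward the control points
    p1 and p2, and ends at p3.
    B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3 p3
    """
    u = 1 - t
    w0, w1, w2, w3 = u ** 3, 3 * u ** 2 * t, 3 * u * t ** 2, t ** 3
    x = w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0]
    y = w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1]
    return x, y


def sample_curve(p0, p1, p2, p3, steps=10):
    """steps + 1 evenly spaced (in t) points along the curve, handy for eyeballing a petal edge."""
    return [cubic_bezier(p0, p1, p2, p3, i / steps) for i in range(steps + 1)]
```

Note that the curve passes through its endpoints but generally not through its control points; at t = 0.5 it sits partway between them.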
But after a while, I was able to just code the petal shapes directly:
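A petal can be written directly as two mirrored cubic curves. The sketch below is one hypothetical parameterization (the 0.25 and 0.75 control-point heights are arbitrary choices, not taken from the project) that returns a path string for a petal pointing up from the flower's center:

```python
def fmt(v):
    """Compact number formatting for path strings (10.0 -> "10")."""
    return f"{v:g}"


def petal_path(length, width):
    """SVG path for a symmetric, leaf-like petal pointing up from the origin (0, 0).

    SVG's y axis points down, so the tip sits at (0, -length). The right edge is
    one cubic curve from the center to the tip; the left edge mirrors it back.
    """
    w = width / 2
    return (
        f"M 0,0 "
        f"C {fmt(w)},{fmt(-length * 0.25)} {fmt(w)},{fmt(-length * 0.75)} 0,{fmt(-length)} "
        f"C {fmt(-w)},{fmt(-length * 0.75)} {fmt(-w)},{fmt(-length * 0.25)} 0,0 Z"
    )


if __name__ == "__main__":
    print(petal_path(40, 20))
```

The returned string goes into a `<path>` element's `d` attribute. Because every petal starts at the origin, rotating copies of it around (0, 0) is enough to form a flower.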
The most inspired shape, I think, is the cherry blossom 🌸 that Nadieh suggested:
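One way to get a cherry-blossom-like notch, offered as an assumption rather than a reconstruction of the shape in the post, is to stop the two curves short of the center line and join them with two straight `L` segments that dip down to a notch point. The function name and the `notch` and `tip` parameters are hypothetical:

```python
def notched_petal_path(length, width, notch=0.15, tip=0.4):
    """SVG path for a petal with a small V-shaped notch at its tip.

    Like a plain petal (two mirrored cubic curves), but the curves end at the
    tip corners (+/- tip * width / 2, -length), and two straight lines (L) dip
    down to the notch point (0, -length * (1 - notch)) between them.
    """
    w = width / 2
    tx = w * tip  # x offset of each tip corner
    return (
        f"M 0,0 "
        f"C {w:g},{-length * 0.3:g} {w:g},{-length * 0.7:g} {tx:g},{-length:g} "
        f"L 0,{-length * (1 - notch):g} "
        f"L {-tx:g},{-length:g} "
        f"C {-w:g},{-length * 0.7:g} {-w:g},{-length * 0.3:g} 0,0 Z"
    )


if __name__ == "__main__":
    print(notched_petal_path(100, 40))
```

Setting `notch=0` collapses the notch into a flat, squared-off tip, which is an easy way to see what the two `L` segments contribute.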
After that, the code came naturally. I mapped the number of petals and the size of each flower to the data: specifically, to the movie's number of IMDb votes and its rating, respectively.
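Mapping data to petals and size is essentially two linear scales plus a rotation per petal. A minimal Python sketch, assuming (consistent with the tiny Batman & Robin flower described below) that rating drives size and vote count drives the number of petals. The output ranges (0.1x to 1x size, 5 to 10 petals) and all function names are hypothetical:

```python
def linear_scale(domain, out_range):
    """Clamped linear mapping from domain to out_range, similar to a clamped d3 linear scale."""
    (d0, d1), (r0, r1) = domain, out_range

    def scale(x):
        t = (x - d0) / (d1 - d0)
        t = min(1.0, max(0.0, t))  # clamp so outliers stay inside the range
        return r0 + t * (r1 - r0)

    return scale


def flower_layout(rating, votes, max_votes):
    """Return (scale factor, petal count, petal angles in degrees) for one movie.

    Hypothetical ranges: size goes from 0.1x to 1x across ratings 0-10,
    and petal count from 5 to 10 across 0..max_votes IMDb votes.
    """
    size = linear_scale((0, 10), (0.1, 1.0))(rating)
    petals = round(linear_scale((0, max_votes), (5, 10))(votes))
    angles = [i * 360 / petals for i in range(petals)]  # evenly spaced around the center
    return size, petals, angles


def flower_svg(rating, votes, max_votes, petal_d):
    """One <path> per petal, each rotated around the flower's center and scaled."""
    size, _, angles = flower_layout(rating, votes, max_votes)
    return "\n".join(
        f'<path d="{petal_d}" transform="rotate({a:g}) scale({size:g})"/>' for a in angles
    )
```

In D3 the same mapping would typically use `d3.scaleLinear().domain(...).range(...).clamp(true)`; clamping keeps a movie with an outsized vote count from sprouting an absurd number of petals. Also note that Python's `round` sends exact halves to the nearest even integer.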
The most fun part was adding the colors (and again, I have Nadieh to thank for her two brilliant tutorials: 1, 2), while the most dreaded was adding the legend (which took a whole afternoon).
I was incredibly happy with the result; I gushed about it to anyone who would listen for a good few days. My favorite find was definitely Batman & Robin from 1997, which had an unfortunate 3.7/10 rating on IMDb and is the most adorably tiny speck of a flower. Others, like Inception, were marvelously large and complex. But of all the flowers, my favorite is definitely Harry Potter 7.2:
I also had a little fun plugging a different dataset, all the top films from the last decade, into the visualization:
Summer film flowers
All film flowers