The write-up below isn't exactly the same as the one found in the Data Sketches book. For the book we've tightened up our wording, added more explanations, extra images, separated out lessons and more.
Alright I'll be honest, I conned Nadieh into doing Nostalgia for this month just so I could finally have a month for Harry Potter. Harry Potter (and Toy Story) will always have a special place in my heart: when the first movie came out, I was also 11 waiting for my Hogwarts letter. I'm the same age as Emma and Daniel, and I grew up alongside them and their movies. When the final movie came out in 2011, and the screen went to black and the theatre got up in a standing ovation, I had the distinct feeling that my childhood had ended.
Having said all that (sentimental stuff), I went through a few different ideas before settling on my current one. At first, I was thinking of doing something on Harry Potter and the Cursed Child, since I had just seen it in December, but then realized that there were probably copyright issues with the script. I then bought a Marauder's Map hoping to visualize that somehow, but got stuck trying to figure out what kind of data I could extract from it. And then one day I suddenly remembered my original November Harry Potter idea to scrape fan fiction and graph fan reactions to the movies and books.
I first got into fanfiction in middle school, when my friend showed me the first six episodes of Fruits Basket. We couldn't get our hands on the rest of the episodes, I needed a fix, and I found fanfiction.net instead. I then went on to Cardcaptor Sakura (SxS 4EVA), and finally to Harry Potter in my high school years. (The year I realized that I couldn't get into any of the new fanfiction being written because the grammar and plot were just too horrible, I realized I had "grown up" and wished dearly for my childhood back.)
For this data, I decided to scrape harrypotterfanfiction.com instead, because it had fewer stories (~80k vs. fanfiction.net's ~730k) and because it had better metadata (not only genres and characters, but also era and pairing). I used html2json to turn the page response into JSON, and wrote a (really brittle) parser to get each story and its metadata. I was worried that the hpff servers might time me out for requesting ~3300 pages from them, but it seemed that my (brittle, synchronous) code took just enough seconds between each page that I had no such problems.
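For anyone curious what "one page at a time" looks like, here's a minimal sketch, not my actual scraper. The names `pageNumbers`, `sleep`, and `scrapeSequentially` are made up for illustration, `fetchPage` and `parseStories` are placeholders the caller has to supply (the real parsing went through html2json, which isn't shown here), and the one-second default pause is an assumption rather than something I measured.

```javascript
// Hypothetical helper: the page numbers to request, 1..totalPages.
function pageNumbers(totalPages) {
  return Array.from({ length: totalPages }, (_, i) => i + 1);
}

// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Request pages strictly one after another, pausing between requests.
// `fetchPage(page)` returns (a promise of) the page body and
// `parseStories(body)` returns an array of story objects; both are
// assumptions supplied by the caller, not part of any real library.
async function scrapeSequentially(totalPages, fetchPage, parseStories, delayMs = 1000) {
  const stories = [];
  for (const page of pageNumbers(totalPages)) {
    const body = await fetchPage(page); // wait for this page before the next
    stories.push(...parseStories(body));
    await sleep(delayMs); // be polite to the server
  }
  return stories;
}
```

Because each request waits for the previous one to finish, the server never sees more than one request from the scraper at a time, which is probably why slow code turned out to be a feature here.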
A quick first look at the data already proved interesting:
These are numbers by story publish year, which means that Harry Potter fanfiction spiked in 2007 (when the final book and 5th movie came out) but has been dwindling to seriously sad numbers since. Here's another one:
Draco/Hermione is the top non-canon ship, which doesn't surprise me at all. (I have this theory that the whole Draco/Hermione thing really took off after the third movie, when Hermione punched Draco in the face. Great scene.)
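Both of those first looks are just tallies. Here's a rough sketch of how counts like these could be computed, assuming each scraped story is an object with a `published` date string and a `pairing` label. Those field names, the helpers `countBy` and `topEntries`, and the three sample rows are all made up for illustration.

```javascript
// Made-up sample rows; the real data set had ~80k stories.
const stories = [
  { published: "2007-08-01", pairing: "Draco/Hermione" },
  { published: "2007-12-24", pairing: "Harry/Ginny" },
  { published: "2005-03-10", pairing: "Draco/Hermione" },
];

// Count items by whatever key `keyFn` extracts from each one.
function countBy(items, keyFn) {
  const counts = new Map();
  for (const item of items) {
    const key = keyFn(item);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// The `n` largest entries, as [key, count] pairs, biggest first.
function topEntries(counts, n) {
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);
}

// "2007-08-01" -> "2007"
const byYear = countBy(stories, (s) => s.published.slice(0, 4));
const byPairing = countBy(stories, (s) => s.pairing);
const topShip = topEntries(byPairing, 1)[0];
```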
I started this month's sketches the way I try to for most months since the disastrous lesson of September: by writing down all the data I had gathered. From there, I identified genres, eras, and pairings as potentially the most interesting, and jotted down my questions for the data.
The question that really stuck was: which pairings were the most popular, and which pairings did they co-occur with? Who did a character get paired with the most? So the first idea I had was to do a timeline of stories with the specific character, and then overlap whoever the character was paired with on top of the timeline; that's what this awkward part of the previous sketch was:
It was around then that The Pudding came out with their Shape of Slavery piece, and I really loved how they used both the size and color of circles to encode two dimensions (population, and percent of population that were slaves) over a map:
And I was inspired to try something similar:
So the circles would encode number of stories up to 100, and the color would indicate the number of reviews for those 100 stories. I wanted to put the circles over a timeline, and also in matrix form:
The timeline looked quite good, but the matrix (which basically had characters on both axes, with circles indicating stories that had that pairing) didn't work as well. So I decided to keep the timeline, but was back to the drawing board in terms of what other visuals I wanted to use to explore the data.
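The circle encoding is easier to see in code than in words. This is only a sketch of one way to read it: a group of stories (say, everything for a pairing in one month) gets split into circles of up to 100 stories each, with the story count driving the size and the summed reviews driving the color. The `reviews` field, the function names, and the choice to scale the radius so that area tracks the count are my assumptions here, not a record of the final code.

```javascript
// Split a group of stories into circles of at most `maxPerCircle` stories.
// Each circle records how many stories it holds and their total reviews.
function toCircles(stories, maxPerCircle = 100) {
  const circles = [];
  for (let i = 0; i < stories.length; i += maxPerCircle) {
    const chunk = stories.slice(i, i + maxPerCircle);
    circles.push({
      count: chunk.length,
      reviews: chunk.reduce((sum, s) => sum + s.reviews, 0),
    });
  }
  return circles;
}

// Radius for a circle so that its area is proportional to the story count
// (an assumed design choice); a full circle gets `maxRadius`.
function radiusFor(count, maxRadius = 10, maxPerCircle = 100) {
  return maxRadius * Math.sqrt(count / maxPerCircle);
}
```

The `reviews` total would then be fed into whatever color scale the chart uses, so a small dark circle and a big pale one tell different stories at a glance.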
Two things got me unstuck. The first was that RJ gave me my angle for the month: how do canon and non-canon ships compare in popularity? So I re-thought all of my visuals to be around that (that's RJ's scribbles at the bottom left, hehe):
And this is the part of the sketch I decided to use:
And the second was that Catherine drew me these gorgeous illustrations of the top 20 Harry Potter characters:
(And she was so amazingly fast and I'm so in love with them, they're so pretty and she did them in only a few hours...!!)
I think this month's hard part was mainly in the ideation, and after that, the coding was pretty smooth going. It only took me as long as it did (2 weeks after finally settling on a good layout) because I cared too much. What I mean by that is that I'm so invested in Harry Potter that I really wanted to make a good visualization that would do it justice.
The first thing I did was add book and film dates to my timeline (the above is that of Harry/Ginny stories), as well as gifs to give context (though let's be real, it's mostly just because I saw Nadieh add gifs and I wanted to procrastinate by looking for Harry Potter gifs). The book and film dates did give great context though, especially for the very interesting spikes between the 6th and 7th books. My first guess for those spikes was that there was a lot of anticipation for the last book, but upon closer inspection, there's always a spike in the number of stories published around December and January because of the holidays. (I couldn't figure out the March 2006 spike though...)
I then started looking into how I wanted to depict several pairings for a character:
The above is a set of area charts showing the stories for Hermione and each of her pairings by publish date. (Believe it or not, that's the first time I've ever used d3.area, though I've used d3.line many times, and I quite liked the look.) There are 6 charts overlapping each other in total, which, while pretty, made it hard to distinguish any trends. I then tried stacking the area charts, and found it to be slightly easier to read:
But it was still hard to make out the topmost areas: the pairings with the fewest stories, which blended into each other for most of the months. I played around with a few of the curve options (how to interpolate between the points in an area/line chart), and ultimately landed on d3.curveStep:
Though it has its downside (when the lines are too close together, it's hard to tell where one ends and the next starts), I still liked it the most for how clean it looked.
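To make "stacked" and "step" a bit more concrete, here's a dependency-free sketch of the two ideas. It is not the d3 code behind the chart; d3's stack and area generators handle all of this for you. `stackLayers` and `stepVertices` are hypothetical names, and the step logic mirrors how d3.curveStep behaves, with the y-value changing at the midpoint between adjacent x-values.

```javascript
// Stack layers on top of each other.
// monthlyCounts[k][i] is the story count for pairing k in month i.
// Returns, per pairing and month, the [bottom, top] of its band.
function stackLayers(monthlyCounts) {
  const months = monthlyCounts.length ? monthlyCounts[0].length : 0;
  const baseline = new Array(months).fill(0);
  return monthlyCounts.map((series) =>
    series.map((value, i) => {
      const y0 = baseline[i];
      baseline[i] = y0 + value; // the next layer starts where this one ends
      return [y0, baseline[i]];
    })
  );
}

// Turn [x, y] points into the corner vertices of a step line whose
// y-value changes halfway between neighboring x-values.
function stepVertices(points) {
  if (points.length === 0) return [];
  const vertices = [points[0]];
  for (let i = 1; i < points.length; i++) {
    const [xPrev, yPrev] = points[i - 1];
    const [x, y] = points[i];
    const xMid = (xPrev + x) / 2;
    vertices.push([xMid, yPrev], [xMid, y]); // flat run, then vertical jump
  }
  if (points.length > 1) vertices.push(points[points.length - 1]);
  return vertices;
}
```

The flat runs are what make the chart look clean, and also why two thin layers with similar values are hard to tell apart: their horizontal edges sit almost on top of each other.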
Other than the stacked area chart timeline, I also had to figure out how to show the genres for each pairing. The genres were important to me, because I wanted to see if specific pairings leaned heavier towards specific genres, and it turns out that they do; for example, Hermione/Ron stories tend to lean towards Drama, Fluff, and Humor, while top Hermione/Draco genres are Drama and Angst, with two or three times more in the Horror/Dark genre than all of Hermione's other pairings. James/Lily's top genre is actually Humor, closely followed by Drama, while Harry/Ginny's top genres are Drama and Action/Adventure.
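Comparing genres across pairings only works if you account for how many stories each pairing has, so shares are more useful than raw counts. Here's a small sketch of that comparison, assuming each story carries a `pairing` label and a `genres` array (a story can have several genres). The field names, the `genreShares` helper, and the sample rows are invented for illustration.

```javascript
// For each pairing, the fraction of its stories tagged with each genre.
function genreShares(stories) {
  const totals = new Map(); // pairing -> number of stories
  const tagged = new Map(); // pairing -> Map(genre -> number of stories)
  for (const { pairing, genres } of stories) {
    totals.set(pairing, (totals.get(pairing) || 0) + 1);
    if (!tagged.has(pairing)) tagged.set(pairing, new Map());
    const counts = tagged.get(pairing);
    // A Set guards against the same genre being listed twice on one story.
    for (const genre of new Set(genres)) {
      counts.set(genre, (counts.get(genre) || 0) + 1);
    }
  }
  const shares = {};
  for (const [pairing, counts] of tagged) {
    shares[pairing] = {};
    for (const [genre, n] of counts) {
      shares[pairing][genre] = n / totals.get(pairing);
    }
  }
  return shares;
}

// Made-up sample rows, just to show the shape of the input.
const sampleStories = [
  { pairing: "Hermione/Draco", genres: ["Drama", "Angst"] },
  { pairing: "Hermione/Draco", genres: ["Angst"] },
  { pairing: "Hermione/Ron", genres: ["Humor", "Fluff"] },
];
const shares = genreShares(sampleStories);
```

Since a story can carry more than one genre, the shares for a pairing can add up to more than 1; each number answers "what fraction of this pairing's stories have this genre?"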
To show all this, I at first considered showing the cumulative number of stories for each genre, colored and placed according to how dark/serious or light/fluffy it was:
But I was also interested in seeing if specific genres rose or fell because of occurrences in the books or films, and in the end decided to make it a timeline also (though I do wonder if my darker/lighter spectrum idea would have been more interesting):
And I lined them up with my original timeline for the pairing colored by reviews:
Finally, I put in the graph of characters linked by their pairings, which was also the way for users to select which character's stories to explore (it defaulted to Hermione who had the most Romance stories written for her):
Some legends, intros, and footers later, my Harry Potter visualization was finally ready:
I really like the look of the whole piece and how it turned out (no doubt made much better by Catherine's character illustrations), but I feel mixed about the fact every visual was a timeline. On one hand, I like that they were different takes on the same timeline, but I also wonder if I should have changed up the visuals to break up the monotony. But if there's one thing I'm for sure happy about, it's that I managed to do this whole piece with gradients of only two colors: pink and purple. I've been noticing that I'm quite over-reliant on color in my visuals, and had been wondering how I could remedy that, and I really like how this piece turned out in that context.
Alright, I'll be honest; I conned Nadieh into doing “Nostalgia” just so I could finally do a project exploring the Harry Potter universe. The Harry Potter book series (and Toy Story) will always have a special place in my heart: when the first movie came out, I was eleven years old and eagerly waiting for my Hogwarts letter. I'm the same age as actors Emma Watson and Daniel Radcliffe, who played Hermione Granger and Harry Potter, respectively. When the end credits rolled for the final movie in 2011, I had the distinct feeling that my childhood had ended.
Having said all that (sentimental stuff), I still went through a few different potential angles before settling on my final idea. At first, I wanted to do something on Harry Potter and the Cursed Child, but realized there were probably copyright issues with the script. I then bought a Marauder's Map hoping to visualize that somehow, but got stuck trying to figure out what kind of data I could extract from it.
Then one day, I suddenly remembered my original “Books” idea: to use Harry Potter fanfiction as a proxy for fan reactions to the movies and books.