June

Fun

Nadieh
amsterdam
Cardcaptor Sakura | Fifty chapters of adorable cuteness

week 1 | data

Yes, I know it's been 6 months since I finished up the April month, and yes, I know I'm supposed to do May before June. However, I'm still not quite sure about my angle for May and I did have an idea in mind for June 😅

I am starting to understand that the topics that I can get really enthusiastic about are a bit of a niche. That not many other people truly know it. I can only hope that for the people who are fans, as I am, this month will be a joy to explore. The topic that I've chosen for our month of fun is "Cardcaptor Sakura"!

Cardcaptor Sakura

A magic-girl manga (i.e. Japanese comic) from about 20 years ago. It was the first manga I owned, when manga was practically unknown in the Netherlands (still is actually 😕 ). I even had to travel all the way to the biggest city of our province to buy a new volume. I'm still dreadfully jealous of how perfect and cute practically each panel of the manga looks 😍 I almost chose this subject for our January Nostalgia month, but eventually went with Dragonball Z then. Nevertheless, after 20 years a new "arc" of Cardcaptor Sakura has started again recently. Therefore, while thinking of what to do for our fun month, I just couldn't shake the feeling of wanting to do something with Cardcaptor Sakura (CCS).

One of my favorite things about CCS is how beautiful the writers, CLAMP, make each page. Especially the covers of each chapter, which are tiny works of art (the image above is the cover for chapter 23). I therefore wanted to investigate the covers through data somehow. And I've never before done any kind of analysis based on image data. I therefore thought that creating a visualization that would abstract the colors of each cover into 3 - 8 colors would be fun and new for me.

There are 50 chapters in the original CCS manga, divided into two "arcs". I went through all 12 CCS volumes to see what image was on the cover of each chapter (btw, you can also read CCS online, for example here). In my volumes the covers are printed in black and white. However, all of those chapter covers have since been published in full-color in several CCS art books. I therefore searched for and downloaded the corresponding color image from the CCS Wiki page.

All 12 manga volumes of Card Captor Sakura

Using the imager package in R, I loaded the images into R where each image was transformed into a multidimensional array of RGBA pixel values. I converted that complex array into a simpler data frame of (number of pixels) * 3 (for r, g, and b value) size. To figure out which algorithm would cluster the pixel values into decent color groups I tried several things. First I experimented with using different clustering techniques: from the standard K-means, to hierarchical clustering and even t-SNE. But I also converted the RGB values of each pixel into other color spaces (where colors have different "distances" to each other and can thus lead to different clustering results), using, amongst others, the colorspace package.
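To illustrate why the color space matters for clustering, here is a minimal sketch of an sRGB to CIELAB ("Lab") conversion. It is written in JavaScript rather than R and is not the colorspace package's code; the function names are made up for this example, and it assumes 8-bit sRGB input and the D65 white point.

```javascript
// Undo the sRGB gamma curve for one 0-255 channel value.
function srgbToLinear(c) {
  const v = c / 255;
  return v <= 0.04045 ? v / 12.92 : Math.pow((v + 0.055) / 1.055, 2.4);
}

// Convert [r, g, b] (0-255 each) to [L, a, b] in CIELAB, assuming D65.
function rgbToLab([r, g, b]) {
  const [lr, lg, lb] = [r, g, b].map(srgbToLinear);
  // Linear RGB -> CIE XYZ (sRGB primaries)
  const x = 0.4124564 * lr + 0.3575761 * lg + 0.1804375 * lb;
  const y = 0.2126729 * lr + 0.7151522 * lg + 0.0721750 * lb;
  const z = 0.0193339 * lr + 0.1191920 * lg + 0.9503041 * lb;
  // Non-linear compression used by CIELAB
  const f = (t) =>
    t > 216 / 24389 ? Math.cbrt(t) : ((24389 / 27) * t + 16) / 116;
  // Divide by the D65 reference white before compressing
  const fx = f(x / 0.95047);
  const fy = f(y / 1.0);
  const fz = f(z / 1.08883);
  return [116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)];
}

// Euclidean distance in Lab, the "distance" a clustering algorithm would see.
function labDistance(p, q) {
  return Math.hypot(p[0] - q[0], p[1] - q[1], p[2] - q[2]);
}
```

Two colors that sit close together in RGB can end up further apart in Lab (and the other way around), which is why the same clustering algorithm can return different groups per color space.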

I often converted the results of each test into a bar chart such as below to see the color groups found. Eventually, I found that using K-means together with the colors converted to "Lab" visually gave the best fitting results.

Color distribution of the first CCS chapter using K-means

However, one of the tricky things with K-means is to figure out how many clusters should be used to create groups. I first tried a combination with hierarchical clustering, but eventually I decided to use something that was probably a better judge (but more time consuming), my own eyes! For each chapter I created a graph as below, that shows me the color distribution for 3 color clusters, up to 11 (I didn't want too many). I then compared the actual cover to these groups and chose the best fitting one; a balance between capturing all the colors and having a good blend of distinct colors. I saved the hex colors and %s (the height of the bars) of the best clustering into a json.
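The "try several cluster counts, then export colors and percentages" step can be sketched roughly as below. This is an illustrative JavaScript version of plain K-means, not the R code used for the project: it clusters raw RGB triples for brevity (the text settled on Lab), uses a simple deterministic start, and the toy `pixels` array and all function names are invented for the example.

```javascript
// Squared Euclidean distance between two 3-component colors.
const dist2 = (p, q) =>
  (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2;

function kmeans(pixels, k, iterations = 20) {
  // Deterministic start: take k pixels spread evenly through the array.
  let centers = Array.from({ length: k }, (_, i) =>
    pixels[Math.floor((i * pixels.length) / k)].slice());
  let labels = [];
  for (let it = 0; it < iterations; it++) {
    // Assignment step: each pixel joins its nearest center.
    labels = pixels.map((p) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (dist2(p, centers[c]) < dist2(p, centers[best])) best = c;
      }
      return best;
    });
    // Update step: move each center to the mean of its pixels.
    const sums = Array.from({ length: k }, () => [0, 0, 0, 0]);
    pixels.forEach((p, i) => {
      const s = sums[labels[i]];
      s[0] += p[0]; s[1] += p[1]; s[2] += p[2]; s[3] += 1;
    });
    centers = sums.map((s, c) =>
      s[3] ? [s[0] / s[3], s[1] / s[3], s[2] / s[3]] : centers[c]);
  }
  return { centers, labels };
}

const toHex = (rgb) =>
  "#" + rgb.map((v) => Math.round(v).toString(16).padStart(2, "0")).join("");

// One entry per non-empty cluster: hex color plus share of pixels (bar height).
function summarize(pixels, k) {
  const { centers, labels } = kmeans(pixels, k);
  const counts = new Array(k).fill(0);
  labels.forEach((l) => counts[l]++);
  return centers
    .map((c, i) => ({ color: toHex(c), percentage: (100 * counts[i]) / pixels.length }))
    .filter((d) => d.percentage > 0)
    .sort((p, q) => q.percentage - p.percentage);
}

// Toy data standing in for one cover's pixels.
const pixels = [
  [250, 10, 10], [255, 0, 0], [245, 5, 5],
  [10, 10, 250], [0, 0, 255], [255, 255, 255],
];

// One candidate summary per cluster count, to compare by eye.
const candidates = {};
for (let k = 3; k <= 5; k++) candidates[k] = summarize(pixels, k);

// The hand-picked winner is what would be saved as JSON.
const json = JSON.stringify(candidates[3]);
```

For real covers the loop would run over the 3 to 11 range mentioned above, with the final pick still made by comparing each candidate against the actual image.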

Color distributions of several K-means results with different numbers of clusters

To complement this data about the chapter covers, I also wanted to gather information about which characters appeared in each chapter and which "card" was captured in which chapter (CCS is about Sakura collecting so-called magical Clow cards). The CCS Wiki page on each chapter seemed like just the resource I needed. But sadly, only the first 8 chapters contain information. Well, nothing else to do but to read all chapters again myself while slowly filling an Excel file with the info I needed 😜

Due to the "layered" visual result of this month, I eventually sliced and diced all of this information (who is in each chapter/cover image, totals per character/chapter, relations between characters, color distributions, and more) into about 7 separate small files throughout the creation process. I prefer to prepare all data beforehand in R. I find that the easiest and as a bonus, it doesn't clutter my JavaScript code.

week 2 | sketch

Figuring out the design for this month came slowly. It was more a domino effect. A more concrete idea for one aspect led to a vague idea for the next part of the data which I then explored. I started with how to visualize the colors. Having a cluster of small colored circles per chapter seemed like a logical/interesting step. Placing the color clusters in a radial layout was also an obvious choice after that. Although at first I wanted to do a semicircle, with the color clusters to the right and character info to the left. However, with 50 chapters, I really needed as much space as I could get. So that's why I went with the layered approach of a small circle for all characters and around that another circle with all chapter color clusters. These two circles would then be connected by lines to show which characters appeared in which chapters.

Sketch of the overall design for the CCS visualization

I've always been fascinated by the CMYK dot printing process; where you can see the separate dots when you're looking at it up close, but move farther back and the bigger picture comes into view (I'm guessing I'm not the only one who could sometimes be found with her nose literally touching an old magazine, or old TV (for the RGB stripes), right....?). Recreating this CMYK dot technique for a visualization about a (printed) manga seemed like a proper style, and challenge. And challenge it was! I won't go into the details here (you can read a bit more in the code section), but below on the right page you can see some scribbles I made to understand how to recreate the CMYK effect (it has to do with rotations...).

Sketch of trigonometric functions to figure out CMYK dots

Another mathematical challenge this month was for something that I didn't use in the end... At first I wanted to connect the inner circle of characters with the outer ring of chapters with swirling lines (to say it technically, two connected SVG Cubic Bézier Curves). Making sure that these lines would always flow around the inner circle and look good, took a lot of time and notes in my sketchbook! I always prefer to draw the approximate SVG path shape I have in mind to then try and figure out where new points and anchor points should be placed. The really hard part is to then understand how these points and anchor points change when the data changes; how to create a "formula" that works for all instances.

Sketch of the swirling lines between characters and chapters

The page below shows specifically how to handle the calculation of a tangent line to a circle for different circumstances. I needed this information for those swirly lines from above. But even though I ended up with different lines, I could thankfully use part of the things I'd figured out on these two pages to easily convert the lines to what became the final result (the more circular running lines).
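As a rough illustration of the geometry involved, the sketch below computes the two points where lines from an outside point touch a circle. It is a generic textbook construction, not the formulas from the sketchbook pages, and the function name and argument order are made up for this example.

```javascript
// Tangent points on a circle (center cx, cy; radius r) for lines drawn
// from an external point (px, py). Returns null if the point is inside.
function tangentPoints(cx, cy, r, px, py) {
  const dx = px - cx;
  const dy = py - cy;
  const d = Math.hypot(dx, dy);
  if (d < r) return null; // no tangent exists from inside the circle
  // Angle of the line from the center to the external point
  const base = Math.atan2(dy, dx);
  // The radius to a tangent point is perpendicular to the tangent line,
  // so the angle between center->point and center->tangent is acos(r / d).
  const offset = Math.acos(r / d);
  return [base + offset, base - offset].map((angle) => ({
    x: cx + r * Math.cos(angle),
    y: cy + r * Math.sin(angle),
  }));
}
```

For a unit circle at the origin and a point at (2, 0), the offset is 60 degrees, so the tangent points land at roughly (0.5, 0.87) and (0.5, -0.87). Which of the two points is used would depend on whether a line has to pass the circle clockwise or counterclockwise.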

Sketch of figuring out tangent lines

week 3 & 4 | code

I first focused on getting the ring of cover color circles on my screen. Mostly because I wanted to see how that CMYK idea would look as soon as possible. And thanks to this excellent example of multiple points of gravity by Shirley, that was actually a piece of cake! But damn, those circles had to become quite small to make room for all 50 chapters. I was starting to have my doubts whether the CMYK effect would work as well in this particular design as I'd hoped...

All the main colors of the 50 chapters clustered

Nevertheless, I first went to one of Veltman's amazing blocks in which he already neatly coded up a CMYK dot effect as SVG patterns (btw, I've started collecting my favorite d3.js blocks in a Pinterest board, so I have a visual 'bookmark' for each, which makes for easier retrieval). Rewrote that to create a separate pattern for each color and I had myself a ring full of CMYK based colored circles. But on closer inspection I found something I didn't quite like. Although the circles on the inside looked exactly how I wanted them, they were still SVGs. So they had been perfectly clipped into a circle. Truly like a pattern that you cut off. But I wanted my CMYK dots to smoothly fade out, not abruptly. But I also wanted to play with the idea of partially overlapping the circles, and having the colors mix even further, which wasn't possible with this technique.

A selection of CMYK circles

I therefore did a wide search online looking for other examples. I already expected that using HTML5 Canvas was probably the way to go. And I did find two interesting options here and here that took me a good 3 - 4 hours to wrap my head around and combine into one. It took a lot of testing and tweaking...

Testing my code to create canvas based CMYK effect

But eventually I got the visual options and look I was going for. First, smooth edges. By which I mean that the dots get smaller around the sides, but all the CMYK dots are still full dots. But also that I could plot the circles on top of each other with the CMYK effects of both circles visible.

Final canvas CMYK circles that have smooth edges and can overlap

And then I applied it to all the circles from the 50 chapters and..... as expected the circles were so small that there wasn't "enough CMYK" going on. It was sometimes a bit hard to actually get a feeling for the true color of a circle because it contained only a few CMYK dots in itself 😖

Well, that's how (dataviz) design sometimes works, hours of work on something that never makes it to the end. So I converted a simple version into a block to perhaps use for another time and took a closer look at the original SVG version again. What to do about those crisp outer edges? ... Hmmm ... What about adding a thick stroke? And yup, that fixed it enough for me, haha 😅

I thought the lines between the inner and outer circles would probably be the next most difficult thing to tackle, but to properly do that I first needed my inner circle. Whipping up the thin donut chart from my sketch was straightforward with d3's arc and pie layout. Still, I felt I first wanted to see if I could get the connections/relations between the characters "visually working" on the inside before I moved to the outer lines. Because if it didn't work I might have to think of a different general layout.

Time to write some custom SVG paths again! Below you can see the progress from the simplest approach, straight lines in the top left, to the final version (shape wise), in the bottom right. The final version is made up of circles, using the SVG arc command, re-using code that I had written for the small arcs in my November piece about Fantasy books.

Figuring out the SVG paths of the inner relationship lines

I then colored the lines according to the type of connection (e.g. family, love) which made me see that there weren't too many lines in there to get insights from, no visual overload, pfew. Alright, then it was really time to dive into those outer lines...

The most extreme lines that I could create would run from a character to a chapter that's on the other side of the circle. The line would then have to swirl around the inner circle, without touching any of the other characters' names. I thought I could probably pull that off by combining 2 Cubic Bezier curves. But making 1 of those curves act as you want, depending on the data, can be a hassle. And I found out that 2 was more than twice the hassle o_O

With difficult SVG paths I always start out with placing small circles along the line path itself (the red one in the center below) + the anchor points (the blue, green and yellow-orange one. The pink one I placed for another reason that's too technical for me to explain here, hehe)

Placing the SVG path anchor points to understand the line movement

After some manual tweaking with fixed numbers I had a shape for the longer line that I liked. I saved those settings and did another one, a short line. I then inspected how all of the settings changed between the two options. This gives me hints on how to infer several formulas that will hopefully create nice looking lines, no matter the start and end points. But, like I said, that wasn't as easy this time as I'd hoped...

Understanding how the SVG path anchor points move for a shorter line

Ugh, I don't even want to really think back on what journey eventually led me to have the lines I needed. Most of the notes in my sketches section are about this part. Because slightly different things need to happen when the line moves counterclockwise instead of clockwise. And if you mess it all up completely, well...

Lines going all over the place

Over many, many hours of testing, drawing, thinking and fiddling did I inch closer to having all the lines at least sort of going around the center. Although here the finer details of the lines were still quite.... odd...

Weird looking, but generally correct lines

I didn't make a note of how long this particular section of "creating the lines" took, but my guess is somewhere between 8 - 10 hours. After which I was left with the following when I visualized all of the lines (they represent the chapters that each character appears in):

All the lines, correctly and at once

Awesome, that looked like one big mess! There were too many lines in there to glean any insight. That would mean I'd have to create some sort of hover interaction that only shows you a subset of the lines when you hover over a character or a chapter.

Ugh, enough time spent on those lines for now. I therefore moved on to adding the chapter and volume numbers. The inner donut chart inspired me to try something similar for the chapters as well. A donut chart with 50 equally sized, rounded-off sections, in which I could place a number. I was happy with the end result and could quickly move on.

For the volumes (typically a collection of ±4 chapters) I started out with the same idea; a donut chart, but made even thinner, which I placed outside the ring of circles. Hmmmm, wasn't as happy with that, but in the meantime something else was bothering me even more. Now that I'd placed more elements on the page and all of it seemed to get a sense of "consistency", those inner lines just felt way off 😕

Perhaps they should have more "body", by making them tapered, as I did in my January visualization about Dragonball Z? That again took more time tweaking my line formulas... Although I think it did improve things over the same-width lines I had before:

Adding the chapter and volume indicators

And in and of itself it had something nice going on when I implemented the hover interactions. Such as seeing who was in a particular chapter...

Tapered central lines for all characters that appear in the hovered chapter

...or when hovering over a character to see the chapters they appeared in...

Tapered central lines for all chapters that a character appears in

... However, I just felt that it didn't fit the rest of the visual in terms of design. Ugh! What to do about it!? And all those hours of work! 😭

And suddenly, very quickly after knowing that my swirly lines just weren't right, I had a new idea for the lines. I don't even remember what inspired me, it seemed to come out of nowhere (although that's never truly the case). Instead of making them swirl around, I could also make them run along circular paths. A bit like subway lines or piping in a home, but then transformed to some radial layout.

This idea was actually quite easy to execute. I could loop over each line to be drawn, create a tiny array of [radius,angle] points and feed that to d3's radialLine function. Together with setting an interpolation function to curve the edges just a bit. Calculating the small array of radii and angles to feed to the d3.radialLine was a walk in the park compared to my previous cubic bezier curve shenanigans! (though actually enjoying solving geometry puzzles does help). Naturally, all that didn't go right on the first try. And thankfully I could use some of the work I'd done with the swirly lines. The screenshots below were all made within 1 hour, not bad in terms of progress 😉
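To make the "tiny array of [radius, angle] points" idea concrete, here is a dependency-free sketch. It does not call d3; it only mimics the convention d3's radial lines use (angles in radians, 0 at 12 o'clock, increasing clockwise) and joins the points with straight segments instead of a curve interpolator. The `routePoints` helper and its track radius are hypothetical, not the routing rules of the final piece.

```javascript
// [radius, angle] -> [x, y], with angle 0 at 12 o'clock, clockwise positive.
function polarToXY([radius, angle]) {
  return [radius * Math.sin(angle), -radius * Math.cos(angle)];
}

// Hypothetical route: step out from the inner ring to a circular "track",
// follow the track to the target angle, then step out to the outer ring.
function routePoints(startAngle, endAngle, innerR, trackR, outerR, steps = 8) {
  const points = [[innerR, startAngle], [trackR, startAngle]];
  for (let i = 1; i < steps; i++) {
    const t = i / steps;
    points.push([trackR, startAngle + (endAngle - startAngle) * t]);
  }
  points.push([trackR, endAngle], [outerR, endAngle]);
  return points;
}

// Build an SVG path string from the polar points.
function toPath(points) {
  return points
    .map(polarToXY)
    .map(([x, y], i) => `${i ? "L" : "M"}${x.toFixed(2)},${y.toFixed(2)}`)
    .join("");
}

// A line from a character at the top to a chapter a quarter turn clockwise.
const examplePath = toPath(routePoints(0, Math.PI / 2, 100, 150, 200));
```

With d3 itself, the same point array could be handed to a radial line generator with radius and angle accessors set to match the [radius, angle] order, plus a curve to round off the corners.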

Different steps in the process of creating the 2nd iteration of lines

O, and then I converted all those lines to HTML5 canvas by using the extremely useful .context option that is available in many of d3's drawing functions (such as d3.radialLine). That made things run more smoothly on the hovers!

Final look of the eventual lines between the two circles

With that change of lines, I felt that, when hovering over a character or chapter, the resulting lines fitted perfectly with the "straight-roundedness" of the rest (I hope that made some sense, hehe). And as an added benefit, no more lines were overlapping!

The observant person will notice that I've actually implemented two slightly different line drawing "types". One is drawn by default when you're not hovering, but also when you hover over a character or color circle group. But another is used when you hover over a chapter. Try it for yourself and see what changes 😉

The possible interactions in the final visualization

Since part of the visualization was about the covers of the chapters. And because I knew most people that would land on the page would probably not know about CCS, I wanted to incorporate some of its imagery. And I just so happened to have a nice large circular area in the center :) It took a while to manually "cut-out" a good looking square image from all 50 chapter covers. But at that time I was in an airport and on a plane anyway (coming back from a great night at the Information is Beautiful Awards where data sketches won GOLD!! Woohoo!).

Hovering over Sakura reveals an image of her

With all these elements and layers of information in the visualization, I really needed a legend. After having written lines and lines of code to create custom legends in two recent client projects (such as this one for Article 19), I wasn't in the mood to do that again. Therefore, I created my legends in Illustrator instead. That saved a lot of time over creating them through code.

Legends explaining how to read the visualization

I initially placed these below the visualization. However (as I was getting used to in this month) these were not the final legends...

But let's not get ahead of myself. Now that the chart itself was nearly finished, I focused on general page layout and annotations. For the layout I had a lot of trouble coming up with something that looked even remotely interesting. To be honest, I'm still not that happy with the final result, but I just can't really design a webpage in itself, just data visualizations, hehe 😅

But then having to make that layout work on both mobile and desktop.... 😫 Not something I enjoy or want to recount here. Just know it took effort and time.

Early iterations of the general layout and annotations

While reading through all the chapters I took some notes of interesting story points that I wanted to highlight. Using Susie Lu's excellent d3-annotation library, adding these around the circle was quite straightforward. Especially with the super handy editmode, that let me drag the annotations around, see where I wanted them and then add those locations hardcoded back into my code.

But after I had placed all the annotations (see the right visual in the image above) I wasn't happy enough with the result. Typically I love the lines that run from the point that you want to annotate all the way under the lines with text. But here that was getting too much visual weight.

I therefore wanted short lines that would radiate outward from the main circle, and place the annotation around those, typically centered. And although the annotation library gives you a lot of freedom, that particular design isn't in there. So instead, I created my own lines, and then used the editmode (see the small dotted circles below, those you can drag around) to position the annotations exactly where I wanted them 😁

Using the editmode to place annotations

After I felt the visualization was ready enough, I shared it with some friends to ask for feedback and got great suggestions to improve on the interactivity understanding. But one also gave me a great example of a better legend, one that would show the visualization with its rings and explain what each ring truly meant. That was much better than those 3 separate ones that I had before. So I drew a new legend in Illustrator.

And after all that time and effort I finally had a visualization to share with everybody :) which can be found here: Cardcaptor Sakura - Fifty chapters of adorable cuteness

Cardcaptor Sakura - Fifty chapters of adorable cuteness

This month took me 86 hours to create. However, a lot of that time went into things that were not used in the final result. Such as ±5 hours on a CMYK canvas based dot effect, or ±15 hours on swirly lines and also ±6 hours on a page layout that isn't even really visible on the final page (except when you press "read more"), or ±2 hours on a stupid Chrome bug about horizontal scrolling (and how to come up with a "fix").

Still, I'm quite happy with how the visualization turned out :) I feel I haven't quite seen a similar radial visualization like it before. And data sketches is all about experimenting with new ideas, so it's always a joy when one turns out good ^_^ I hope you enjoy interacting with the visualization, even if you've never heard of Cardcaptor Sakura before (it's great! Trust me ;) )

Shirley
san francisco

Ok buckle up, kids (sorry I know you're not kids, I just feel like that phrase sounds better with "kids"), because it's another wild one. And probably a long one. *Heaves heavy sigh*. I'm sorry.

I guess before we get into all the technical nitty-gritty, I should talk about feelings first. I don't like talking about feelings here because I like to keep these write-ups purely about process, about the data and the sketching and the coding. But oftentimes in my projects, feelings play a big part. And for this particular one, the feelings were HUGE.

In particular, huge feelings of *INSECURITY*.

You see, I never expected Hamilton to do as well as it did. And I'm super grateful that it did so well, but for a few projects after that I was paralyzed with the fear that none of them would live up to it. And indeed, none of them has. But after a few months, I learned to be ok with that. I learned to internalize that external validation is never a good motivator, that it just will lead me into dangerous pits of despair when projects I pour my soul into don't get an equivalent amount of likes and retweets (and it's really a super entitled way of thinking: "I put a lot of work into this, so you all MUST like it!!!"). So instead, I tried to do projects that I would enjoy, to create dataviz that I'd personally want to read and consume - because then I'd have at least one consumer of the visualization. And if other people liked it, that's great. But if not, at least I liked it.

This mantra worked for most of my projects. But for some reason, for this month of June - this month of "fun", this last month of Data Sketches - I just couldn't internalize it.

I cared too much.

I decided on the dataset back in summer 2017. The new Taylor Swift single ("Look What You Made Me Do") had just come out, and it hit me: what would be more fun than doing something with the entire collection of Taylor Swift songs? Taylor Swift songs, like Hamilton, are one of my great obsessions. I feel like she's heralded me through my young adulthood, and her songs are attached to some of my greatest memories. Whenever she released a new album, I'd have it on repeat for months.

So, I cared too much.

I kept feeling like it needed to live up to the quality, the attention to detail, the obsession, the analysis and the story I had for Hamilton. I kept feeling like I needed to create a second Hamilton.

The problem was, Hamilton the Musical was a self-contained work rich with nuance and literary gold. It was purposefully full of interesting patterns and insights; it was a masterpiece. Taylor Swift's six albums (soon to be seven), though rich with easter eggs, are almost like a series of diary entries - chronicles of her dreams, her romances, her breakups, her feuds, her opinions, her views.

So of course there wasn't Hamilton-level analysis to be done. It wasn't going to be another Hamilton. And it's taken me two years to be ok with that.

Thank you for reading this far about my feelings. I'm still working on my insecurities. I don't think they'll ever go away, and maybe it's better that they don't; they're one of my primary motivations for striving to be better. And in the grand scheme of things, this little project, amongst the billions on the internet, is just another project. A blip in people's memories, if it ever makes it in there. That eases my insecurities by a lot.

Ok, now back to your regularly scheduled programming of data, sketch, and code.

year 1 | first version

When I first decided on Taylor Swift songs, I decided to look at the colors in her music videos instead of analyzing her lyrics. I had just seen Vox's analysis of how color had changed throughout Game of Thrones' seven seasons, and wanted to do something similar. I also wanted to try a different angle from the usual literary analysis.

So I went off to get my data, helped by knowledge from my October Obamas project (in fact, I'm pretty sure I just copy and pasted the first two scripts, yay for reusability!):

  1. Use the Wikipedia page for Taylor Swift's videography to compile a list of her music videos, with additional metadata including album, year, director, bpm, and YouTube ID.
  2. Loop through the list and download the corresponding YouTube videos (downloadYoutube.js).
  3. Take screenshots every 5 seconds for every music video (takeScreenshot.js).
  4. For every image, grab the RGB information for every 5 pixels. Plug all the colors into clusterfck, a Node package for hierarchical clustering (no longer maintained). Save each cluster color and size (getClusterColors.js).
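The pixel-sampling part of step 4 might look roughly like the sketch below. It is an assumption-laden illustration, not the contents of getClusterColors.js: it reads "every 5 pixels" as "every fifth pixel", assumes the screenshot has already been decoded into a flat RGBA buffer (4 bytes per pixel, as canvas image data is laid out), and leaves out the hierarchical clustering call itself.

```javascript
// Collect [r, g, b] triples from a flat RGBA buffer, taking one pixel
// out of every `step` pixels to keep the clustering input small.
function samplePixels(data, step = 5) {
  const colors = [];
  for (let pixel = 0; pixel * 4 + 3 < data.length; pixel += step) {
    const i = pixel * 4;
    if (data[i + 3] === 0) continue; // skip fully transparent pixels
    colors.push([data[i], data[i + 1], data[i + 2]]);
  }
  return colors;
}

// Toy buffer: 10 opaque pixels whose red channel equals the pixel index.
const buffer = new Uint8ClampedArray(10 * 4);
for (let p = 0; p < 10; p++) {
  buffer[p * 4] = p;       // red
  buffer[p * 4 + 3] = 255; // alpha
}

const sampled = samplePixels(buffer); // pixels 0 and 5
```

The resulting array of colors is the kind of input a clustering package would take, after which each cluster's representative color and size can be saved.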

This was the result scene by scene (the x-axis is the hue, and the y-axis is the lightness), which was pretty cool:

And by video:

I liked being able to see all the colors by video, but I didn't like that there was so much overlap. So I thought of having a histogram of all the hues in a video, and a heatmap underneath so that I could see the spread of the colors scene by scene:

The histogram turned out to be very aesthetically pleasing:

But I'm not sure if it told me anything other than that her MVs have a lot of oranges and blues. Which makes sense, since most of the oranges are various lightness/saturations of her skin tone, and the blues are commonly used to dramatically complement the orange. Nothing new or mind-blowing there.

I tried to add in the heatmap for every song, where the x-axis is still the hue, but the y-axis is the time in the music video. For some of the songs, like "Fifteen", I could see distinct scene changes:

But for most of the songs there wasn't anything too interesting to comment on, and the visualizations for each song took up too much screenspace. (I did like this hover interaction I built for the heatmap though.)

I wasn't quite sure how to go forward with what I had, and for some reason I thought that if I wasn't getting anything (that I thought was) interesting from the detailed view, then I had to dig even deeper. So I decided to build a tool where I could filter each image by hue or saturation. It was so extra. To accomplish it, I read the image data using Canvas and sent that image data (and the range of hue and saturation to filter by) to a Web Worker that looped through each pixel in the image and decided whether it could be kept or not. I also tried using the GPU since each of the pixel calculations was self-contained, but (I think I was doing something wrong) it turned out to be slower than just using the Web Worker. At one point, I even tried to send the data to the GPU from within the Web Worker, but found out that it wasn't supported in most browsers (this was early 2018). It was all so unnecessary. I think I just wanted to do it because it was technically fun, and it had nothing to do with whether the end result would be a good user experience.
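The per-pixel decision such a worker makes can be sketched as below. This is a guess at the general shape rather than the actual tool: the function names and the options object are invented, hue is in degrees and saturation in the 0 to 1 range (HSL), and rejected pixels are simply made transparent. In a browser this loop would sit inside the worker's message handler; here it is a plain function so it runs anywhere.

```javascript
// Convert 0-255 RGB to HSL: hue in degrees [0, 360), s and l in [0, 1].
function rgbToHsl(r, g, b) {
  r /= 255; g /= 255; b /= 255;
  const max = Math.max(r, g, b);
  const min = Math.min(r, g, b);
  const l = (max + min) / 2;
  const d = max - min;
  if (d === 0) return [0, 0, l]; // gray: hue is undefined, report 0
  const s = d / (1 - Math.abs(2 * l - 1));
  let h;
  if (max === r) h = ((g - b) / d) % 6;
  else if (max === g) h = (b - r) / d + 2;
  else h = (r - g) / d + 4;
  h *= 60;
  if (h < 0) h += 360;
  return [h, s, l];
}

// Return a copy of the RGBA buffer in which every pixel outside the
// requested hue and saturation ranges has its alpha set to 0.
function filterByHueSat(data, { hue: [h0, h1], saturation: [s0, s1] }) {
  const out = new Uint8ClampedArray(data);
  for (let i = 0; i < out.length; i += 4) {
    const [h, s] = rgbToHsl(out[i], out[i + 1], out[i + 2]);
    // A range like [340, 20] wraps around 0 degrees (the reds).
    const hueOk = h0 <= h1 ? h >= h0 && h <= h1 : h >= h0 || h <= h1;
    if (!(hueOk && s >= s0 && s <= s1)) out[i + 3] = 0;
  }
  return out;
}

// Toy image: one pure red pixel followed by one pure blue pixel.
const image = new Uint8ClampedArray([255, 0, 0, 255, 0, 0, 255, 255]);
const redsOnly = filterByHueSat(image, { hue: [340, 20], saturation: [0.5, 1] });
```

Because each pixel is judged independently of its neighbors, the same loop is a natural candidate for a worker or the GPU, as described above.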

But it did give some interesting results, especially when filtering down to just red colors: in 2006, she wore a red babydoll dress surrounded by red roses, the epitome of innocence. In 2017, her red dress is deep cut, her lips are deep red, and she stands in front of a bright magenta background, parodying the way media portrays her.

Scenes from "Our Song", 2006

Scenes from "Look What You Made Me Do", 2017

This made me wonder if she ever repeated certain colors with certain objects across songs. I went through every song and filtered by every hue of the rainbow: I found that she frequently had red cars and trucks throughout her country songs, and red lips through her later pop songs. That was cool, but it didn't feel like enough.

(NOTHING FELT LIKE ENOUGH 😭😱)

So I decided to try another approach: since going deep in a song didn't give me enough interesting things, I decided to go broad and see if there were interesting trends across the songs. I went back to the histograms and grouped them by album:

But the songs were still too far apart, and I wanted a way to quickly scan through the songs and see if there were any changes in hue, saturation, and lightness. I sketched out two ideas: one that was a combination of beeswarm and line chart that would show saturation, and another that would arrange the colors into a block or color wheel to show hue. The saturation went all over the place (maybe looking at lightness would have been more interesting?):

And as for hue, there were some interesting songs, like "Love Song" that stayed almost all yellow and green, or the three songs in 1989 that were mostly orange and blue. But for most other songs, there weren't any clear delineations in hue that I could see that would be worth writing about.

By now, I was really discouraged. I had to admit to myself that I really didn't know much about color and especially color use in music videos, and that I should scrap the idea and try something completely different.

year 2 | data

That something new turned out to be emotions. I was asking myself what I liked the most about Taylor Swift songs (something I should have done from the very beginning), and after some brainstorming, realized it was because I liked the stories she told. They were easy to relate to (at least her earlier ones were...), and they were usually about daydreams or relationships or breakups - emotional things.

So I set about gathering all of her lyrics, and I found a GitHub repo with all of her songs until 1989. It also had some metadata for each line of a song, including title, album, and year.

I had been streaming my code progress on Twitch, and for figuring out the emotions in the lyrics, someone suggested I use the NRC Word-Emotion Association Lexicon. It was a dataset of ~14,000 common words and whether they were associated with a variety of emotions: anger, anticipation, disgust, fear, joy, sadness, surprise, trust. I went through each line in each song, and for every word that had a match in the lexicon, I noted the emotion. I then divided the count for each emotion by the total number of words, and got a "percentage" for each emotion in a song (parseLyrics.js).
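The counting step described above could be sketched like this. It is not the contents of parseLyrics.js: the three-word lexicon is a made-up stand-in for the real NRC data, the tokenizer is a simple lowercase letters-and-apostrophes split, and the function name is invented.

```javascript
// Hypothetical slice of a word -> emotions lookup (not real NRC entries).
const lexicon = new Map([
  ["love", ["joy", "trust"]],
  ["cry", ["sadness"]],
  ["afraid", ["fear"]],
]);

// For one song: count lexicon hits per emotion, then divide each count
// by the total number of words in the song.
function emotionShares(lines, lexicon) {
  const counts = {};
  let totalWords = 0;
  for (const line of lines) {
    const words = line.toLowerCase().match(/[a-z']+/g) || [];
    totalWords += words.length;
    for (const word of words) {
      for (const emotion of lexicon.get(word) || []) {
        counts[emotion] = (counts[emotion] || 0) + 1;
      }
    }
  }
  const shares = {};
  for (const [emotion, count] of Object.entries(counts)) {
    shares[emotion] = count / totalWords;
  }
  return shares;
}

// Two toy lines: six words in total, two of them found in the lexicon.
const shares = emotionShares(["I love you", "Don't cry tonight"], lexicon);
```

Since one word can carry several emotions, the shares of a song do not have to add up to 1, which is why "percentage" is best read loosely here.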

I threw the newly acquired dataset into Observable (an online notebook for visualizations), and used Vega-Lite (a composable charting library) to draw a line chart of the emotions over the years. (I really like this combination of Observable and Vega-Lite, which lets me quickly explore the data - a much needed step up from when I would build out a whole visualization just to explore the data.)

The problem, though, was that because the lexicon only contained the most common 14k words, a lot of the words in the lyrics were unaccounted for. I was concerned about the accuracy of the dataset, and started to look for an alternative. I looked for a sentiment analysis package, but all the ones I found only told me whether a phrase was positive or negative. Then I found IBM Watson's Tone Analyzer. I wrote a script to pass every line of every song into the Tone Analyzer, and to record the set of emotions Watson found in that line as well as its confidence that each emotion was present (toneAnalyzer.js). Out of all the tones Watson came back with, I decided to concentrate on just the five Inside Out emotions: joy, sadness, fear, disgust, anger. I wrote another script to reduce the data file to just those five emotions for each lyric (getTopTones.js) to get my (almost) final dataset.
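The reduction step might look something like the sketch below. It is not the actual getTopTones.js, and the record shape (`lyric`, `tones`, `tone`, `score`) is a made-up stand-in for illustration, not Watson's real response format.

```javascript
// The five Inside Out emotions to keep.
const INSIDE_OUT = new Set(["joy", "sadness", "fear", "disgust", "anger"]);

// Assumed input: one record per lyric line, each with a list of
// { tone, score } pairs (hypothetical field names).
// Returns new records with every other tone dropped.
function keepInsideOutTones(records) {
  return records.map((record) => ({
    ...record,
    tones: record.tones.filter((t) => INSIDE_OUT.has(t.tone)),
  }));
}

// Made-up sample data, only to show the shape going in and out.
const sample = [
  {
    lyric: "an example line",
    tones: [
      { tone: "joy", score: 0.8 },
      { tone: "analytical", score: 0.6 },
    ],
  },
  { lyric: "another line", tones: [] },
];

console.log(JSON.stringify(keepInsideOutTones(sample), null, 2));
```

Lines with no matching tones are kept with an empty list, so the lyrics stay complete and in order.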

year 2 | sketch & code

I was trying to figure out a visual representation/metaphor for these emotions, when on a random drive home, it hit me: emotions are gooey! They're messy and amorphous and gooey (for some reason I thought of lava lamps). And so I immediately plugged the data in, drew each emotion as a circle, and applied the "gooey effect" - a set of SVG filters that blurs the circles and then ups the contrast (CSS Tricks has a great explainer on how this all works).

Each of the emotions is mapped to a color: joy is yellow, sadness is blue, disgust is green, fear is purple, and anger is red (just like Inside Out!). They are sized by Watson's confidence that the emotion is present. I also tried to position the emotions by when they appear in the song, starting at the top and going clockwise. I did this so that I could have consistency between the overview and the detailed view of each song, where the gooey emotions animate to the center when their corresponding lyric is played.

I liked this idea, but I didn't like how quickly I had to animate through each line (lest the animation take too long), and how hard it was to associate the lyric with the emotion before it disappeared. So I brainstormed and sketched out some other ideas. For some reason, while thinking of the confidence level of emotions across time, I thought of the lines that lie detectors draw, and of overlapping the emotion lines around the gooey. While thinking of an order in which to overlap the emotions, I also thought of vertically sorting the emotions in the gooey, so that joy - the "lightest", "airiest" emotion - floated at the top, and anger - the "heaviest" emotion - sat at the bottom.
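The "starting at the top and going clockwise" placement of the emotion circles can be sketched as follows. The function name and parameters are hypothetical, and it assumes screen/SVG coordinates where y grows downward and a song with at least one line.

```javascript
// Place the emotion for line `lineIndex` (0-based) of `totalLines` on a
// circle around (cx, cy). An angle of -PI/2 is the top of the circle, and
// because y grows downward on screen, increasing the angle moves clockwise.
function clockwisePosition(lineIndex, totalLines, cx, cy, radius) {
  const angle = -Math.PI / 2 + (2 * Math.PI * lineIndex) / totalLines;
  return {
    x: cx + radius * Math.cos(angle),
    y: cy + radius * Math.sin(angle),
  };
}

// A 4-line "song" around the origin: top, right, bottom, left.
for (let i = 0; i < 4; i++) {
  console.log(i, clockwisePosition(i, 4, 0, 0, 10));
}
```

Mapping position to the line's place in the song means the first lyric sits at 12 o'clock and the song reads around the circle like a clock face.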

Here's the sketch implemented. Because the lines were too messy and hard to track around a circle, I filled them in and also used a multiply blend mode so that I could see the overlapping emotions. I also added the lyrics on the right, and marked any lines that had emotions with a mini version of the circles - that way I wouldn't have to keep going back and forth between the visualization and the lyrics to match the emotions with the words.

I liked this version, but it was hard to see which emotions were most prominent across the whole song, since it's really hard to compare heights when those heights wrap around a circle. So I tried a different layout next. But while I could now easily scan and see which emotions had the highest peaks, I really disliked the aesthetic - it reminded me of a sun over a bunch of little mountains, and that was definitely not a look I was going for.

So I went back to the area chart around the circle, but this time I used annotations to indicate the highest emotions. It also let me bring the lyrics in underneath the visualization, and thus make each visualization more compact (which I always like). I was able to stay pretty faithful to my sketch in my implementation, and even added an extra annotation to show the cutoff for where Watson was confident (or not confident) in its analysis. But I didn't like how spiky the area chart got when there were longer lyrics, so I tried a rectangular shape instead, and also a rounded version.

At this point, I was pretty satisfied with the look of the detail view, so I decided to work on the title page - and this is where I ran into performance issues trying to animate the gooeys.

Shirley