Presidents & Royals
October 2016

Putting Emojis on the President’s Face

The write-up below isn't exactly the same as the one found in the Data Sketches book. For the book we've tightened up our wording, added more explanations and extra images, separated out lessons, and more.


Originally Nadieh and I had October slotted for the elections, so that we could potentially have something ready by voting day. But as October approached, I realized more and more that I wanted absolutely nothing to do with the election other than not-Trump. So as we were musing what to do instead, I realized: a future president getting voted in also means a current president leaving the White House.

And we may not all agree on what the Obama administration has or hasn't done politically, but I think most of us can agree: the Obamas are one damn cool couple. Like, the slightly dorky parents I wouldn't mind associating with if they were my friends' parents. Like that time last year when Mr. President recorded a video of himself talking to a mirror, or when Madame First Lady recently Carpool Karaoke'd with James Corden and was so hilariously relatable. And then I realized that, weirdly, perhaps I was going to miss them being our POTUS and FLOTUS.

So I wanted to do something silly and light-hearted to thank them for their eight years in office. And the first thing I could think of was all of their appearances on late-night talk shows that I'd watched on YouTube (that Slow Jam the News was so good). I started digging around to see if there was a list of talk show appearances for both Barack and Michelle - and it was IMDb to the rescue again! Both of them had their own IMDb pages, and after going through Barack Obama's 214 credits as self and Michelle's 123, while cross-referencing the Wikipedia article on the list of late-night American network TV programs, I was able to get this list:

(Full list of appearances)

Using that information, I was able to use the YouTube Search API to search for the keyword "obama", filtered down by channelId and a publish date within 5 days of the interview date (code). Because I wasn't sure how many videos were published for each interview (if any), I set maxResults to 15. This meant I would get back videos within those days that had nothing to do with the Obamas, and I had to manually go through all of them to weed out the irrelevant ones (of 244 videos, 186 were ultimately filtered out).
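The query above can be sketched out as a small URL builder. The parameters (`q`, `channelId`, `publishedAfter`, `publishedBefore`, `maxResults`) are the real YouTube Data API v3 search parameters, but the helper itself and its names are my own illustration, not the linked script:

```javascript
// Build a YouTube Search API URL for videos published within 5 days of an
// interview date. Hypothetical helper; parameter names are from the v3 API.
const DAY_MS = 24 * 60 * 60 * 1000;

function buildSearchUrl(interviewDate, channelId, apiKey) {
  const t = new Date(interviewDate).getTime();
  const params = new URLSearchParams({
    part: 'snippet',
    q: 'obama',
    channelId,
    // RFC 3339 timestamps bounding the 5-day window around the interview
    publishedAfter: new Date(t - 5 * DAY_MS).toISOString(),
    publishedBefore: new Date(t + 5 * DAY_MS).toISOString(),
    maxResults: '15',
    key: apiKey,
  });
  return 'https://www.googleapis.com/youtube/v3/search?' + params;
}
```

Fetching that URL (with a real channel ID and API key) returns up to 15 search results per interview date, which is where the manual weeding came in.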

Here is the final list (it loads quite slowly because of all the embedded videos, for which I apologize):

(Full list of videos)

The list is unfortunately incomplete, since there were interviews with past shows/hosts, like the Late Show with David Letterman and the Tonight Show with Jay Leno, that weren't on YouTube. I chose to leave them out rather than trying to find the videos on other websites or on unofficial channels, in the hope that this would make for cleaner data.


D3.unconf was on the 16th and 17th of this month, and the second day was dedicated to hacking - a perfect opportunity to work on datasketches. But I was feeling a little lost on what to do with the data and mentioned this to both Ian and Erik.

Which, in retrospect, was probably a big mistake. Because Ian was like, "you should get the captions for each video and do something with the words!" and Erik was like, "wouldn't it be cool if you could run facial detection on the video and correlate their emotions with what they're saying?". And I was like, "GUYS you know I only have a month to do this?!"

But alas, like a poor version of Inception, the idea was planted and it took root.

I couldn't find an affordable way to pass whole videos into any facial recognition software, but thankfully I was at the very resourceful D3.unconf. Within a few minutes an alternative was suggested: take screenshots of the video and pass the images into Google's Vision API (provided by Google Cloud, who just happened to be one of the sponsors for the unconference). Here's what I did:

  1. Take the list of all videos, and download them (and their captions, if available) with youtube-dl (code: downloadVideos.js)
  2. Use vtt-to-json to convert the captions into JSON and get the timestamp of every time someone talked, and use that timestamp to screenshot the video with fluent-ffmpeg (code: screenshotVideos.js)
  3. Upload each screenshot to the Google Vision API, which gave back data on any faces it found in the image, the bounds of the face and head, and much more, like what it thinks the picture is of and what famous locations it might contain (code: annotateImages.js)
  4. Save the videos' captions into one JSON file (code: getSubtitles.js), then join them with the annotated image data from the Vision API (code: annotationSubtitles.js)
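The screenshot step above hinges on turning each caption cue's WebVTT timestamp into a seek offset in seconds. A minimal sketch of that conversion (my own helper, not the linked screenshotVideos.js):

```javascript
// Convert a WebVTT cue timestamp ("HH:MM:SS.mmm" or "MM:SS.mmm") into
// seconds, usable as a seek offset for an ffmpeg screenshot command.
function vttToSeconds(stamp) {
  const parts = stamp.split(':').map(Number); // e.g. "00:01:23.500" -> [0, 1, 23.5]
  // Fold the parts left to right: each colon boundary is a factor of 60.
  return parts.reduce((total, part) => total * 60 + part, 0);
}
```

Each resulting offset can then be fed to fluent-ffmpeg (e.g. its `screenshots` method takes a list of timestamps) to capture one frame per caption.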

This whole process took a few days, and once I finally had all the data cleaned I got started on the sketching. I learned my lesson from the last month, and made sure to explore the basic shape of the data, as well as all the different types of data I had on hand:

Once I had all that figured out, I started thinking about what I wanted to convey. I knew it was ambitious, but I wanted to make full use of such a rich dataset, and also get across why, even though there were 44 appearances (after I did this, the number increased to 46), I only had screenshots for 29 videos. I'd also been wanting to try my hand at scrollytelling for a while to test my technical skills, so I mapped out my first few sections:

After the previous sketch, I took a few days to actually build out those first few sections. This also helped me get a better feel for the video data, which I hadn't explored in as much detail as the hosts and appearances data. I had originally envisioned all the videos lined up in rows, with a timeline showing a bar graph of word frequency based on the captions, plus all the times that the POTUS and FLOTUS laughed. But I soon realized that was WAY too much data to try to show on one screen. So I came up with the following sequence to introduce the videos and only show the selected video's timeline of captions and laughs:

I was originally even more ambitious and had planned on showing details for each caption, not just the screenshot at the time of the caption with the emojis on the faces. For each word in the caption, I wanted to show the emojis and hosts most frequently associated with that word, in a timeline of its own. I'm so glad I didn't do that, because it would have just been overkill. (Sometimes, even if we have the data, we don't gotta.)


So as soon as I figured out that the Google Vision API gave me the bounds of faces and heads, I knew I had to put emojis on the faces:

And boy, do I not regret it; it brought me so much joy. The rest of the visualization is basically just a build-up to the end tool for exploring all the photos with emojis on the faces.

But before I get into the details of the implementation, let me just say: I am NEVER getting this ambitious for a one-month *side* project ever again. This month's implementation brought me through quite a few first-time technical challenges:

And because I was just trying to get it all done, I didn't take much time to take screenshots of my progress (which I regret ☹️).

The first thing I wanted to do was implement animation tied to scrolling, like Tony's A Visual Introduction to Machine Learning - it always seemed like a fun technical challenge. The key is to interpolate between two sections, and I worked out the logic to be:

So when the user got halfway down the section (I actually ended up using 25% instead), I would calculate the elements' positions for the current section and for the next section, and interpolate between them. From there until the user reached the top of the next section, I would pass the amount scrolled into the interpolation to get the x/y positions (and, for the videos, even the radius). Here is the code, which took quite a bit of fiddling.
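That scroll logic boils down to mapping the scroll position within a section to an interpolation progress between 0 and 1. A sketch of the idea, with my own (hypothetical) naming rather than the linked code:

```javascript
// Map scrollY to an interpolation progress t in [0, 1] for one section:
// start interpolating once the reader is 25% of the way through the section,
// and finish when the top of the next section reaches the viewport top.
function interpolationProgress(scrollY, sectionTop, sectionHeight) {
  const start = sectionTop + 0.25 * sectionHeight; // 25% into the section
  const end = sectionTop + sectionHeight;          // next section's top
  const t = (scrollY - start) / (end - start);
  return Math.max(0, Math.min(1, t));              // clamp to [0, 1]
}
```

At t = 0 the elements sit at their positions for the current section, at t = 1 at their positions for the next, and in between each x/y (or radius) is `a + t * (b - a)`, e.g. via a d3 interpolator.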

(It's much smoother on the website - I ran the Chrome Dev Tools Profiler religiously to weed out any unperformant code!) One of my favorite parts is when I scroll to the first timeline, and the links animate between the host and the Obamas to connect them.

Another interaction that made me really happy was hovering over the hosts to see the corresponding guest appearances. A very simple implementation, but it helped so much in making the tangle of links easier to navigate:

Another favorite: this one shows all the videos with captions, where the filled circle's radius is the number of views and the outer ring's radius is the duration of the video. It's fun to see that some of the videos got a lot of views despite being shorter in duration, or vice versa. The small dots on the ring mark every time the Google Vision API said there were expressions of joy, so you can see whether the laughter was spread evenly throughout the video or concentrated in specific blocks. My favorite part, though, is actually in the description, where I calculated the number of laughs for the POTUS and FLOTUS and found that the FLOTUS got significantly more laughter:
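Tallying those laughs can be sketched as a reduce over the annotated screenshots. `joyLikelihood` is the Vision API's real per-face field, but counting LIKELY / VERY_LIKELY faces as a "laugh" is my own illustrative threshold, not necessarily the one used in the piece:

```javascript
// Count "laughs" across a list of annotated screenshots, where each entry
// may carry the Vision API's faceAnnotations with a joyLikelihood enum.
function countLaughs(annotatedImages) {
  const joyful = new Set(['LIKELY', 'VERY_LIKELY']); // assumed threshold
  return annotatedImages.reduce((count, image) =>
    count + (image.faceAnnotations || [])
      .filter(face => joyful.has(face.joyLikelihood)).length, 0);
}
```

Running this separately over the POTUS and FLOTUS screenshots is the kind of comparison that surfaced the FLOTUS's laugh advantage.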

And here is the timeline with the emojis on their faces:

And as soon as I saw that timeline, I knew I needed to have a fisheye effect. I had seen the New York Times' implementation of a horizontal fisheye for navigating fashion week images, and Nadieh had adjusted that code a few months ago when we were considering using the effect for the front page of datasketches. But that code was for canvas, and I was already committed to SVG; it took a bit of digging to find an SVG equivalent for cropping images (preserveAspectRatio, with the key being to set the viewBox width to the image width). It took a few more passes through Nadieh's code and the original fisheye code to realize that all I needed was the little bit of code that mapped an x-coordinate to its new position based on the distortion. From there, it was just a matter of passing in the hovered x-position and re-rendering.
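That little bit of x-mapping code looks roughly like the Cartesian distortion from d3's fisheye plugin (the Sarkar & Brown formula): points near the hovered focus spread apart, points far away compress, and the endpoints of the range stay fixed. A standalone sketch, not the exact code from the piece:

```javascript
// Remap x toward/away from the hovered focus within [min, max].
// distortion > 0 controls how strongly the focus area is magnified.
function fisheyeX(x, focus, min, max, distortion) {
  const left = x < focus;
  let m = left ? focus - min : max - focus;
  if (m === 0) m = max - min; // guard against a zero-length side
  return (left ? -1 : 1) * m * (distortion + 1) /
    (distortion + m / Math.abs(x - focus)) + focus;
}
```

On hover, every thumbnail's x is re-rendered through this mapping with the pointer's x as the focus, which is what produces the scrubbing magnification.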

What I'm happiest about with this fisheye, though, is that I made it work on mobile! When I used touchmove by itself, scrubbing was extremely buggy and finicky on mobile. After spending the whole day thinking of a different interaction I could have with the timeline on mobile, I realized that d3 already had a great touch implementation in d3.drag(); the interaction on mobile using d3 drag for the fisheye was so smooth it was magical butter 😍

And finally, here is the (intro of the) finished piece:

This month was definitely killer, but I'm also super happy I stuck it out. I'm really proud of all that I was able to figure out, and that I now have a completed scrollytelling piece under my belt. The most surprising thing, though, is my newly acquired sense of fearlessness: I really feel that if there's something I want to implement, I'll be able to figure out how to do it, given enough time and the right Google searches.

Read the full write-up in our book "Data Sketches"