The write-up below isn't exactly the same as the one found in the Data Sketches book. For the book we've tightened up our wording, added more explanations, extra images, separated out lessons and more.
Just like Nadieh, it took me forever to figure out a dataset and angle for this month. So even though it says May, I didn't finish this until December 2018 (wow I'm bad), and I'm writing this in January 2019 (wow I'm SUPER bad).
And just like Nadieh says, we thought this would be a super project with RJ and Mara about all the different tellings of Cinderella. But after a few months of working on it on the side, the project quietly fizzled out (probably mostly because of everyone's busy schedules). After that, I was quite honestly at a loss. We had chosen this topic, myths & legends, because it had sounded like a cool topic with a lot of potential. But when I actually sat down to think about it, the ideas I came up with either didn't excite me, or would have been logistically difficult to get data for. I bounced from Chinese and Asian mythology, to classic Chinese literature, to Mythbusters episodes.
Then, the idea came to me after watching Crazy Rich Asians. I loved Michelle Yeoh in the movie, but it wasn't until I went and read up more about her that I learned she was accomplished and legendary and badass. It made me wonder about all the legendary women that I have no idea about, and the idea took shape from there.
Once I had the idea about legendary women, the next question was how to go about gathering the data. Pretty early on I decided to use Wikipedia, and I thought I might try to scrape it for the "top" women on there. I didn't have any idea how to define "top", though (page views? page length? links into page?) and even though I saw a Pudding article that scraped Wikipedia, I still wasn't sure how I wanted to go about it. (I was also on a bit of a time crunch.)
Then it hit me that perhaps I should be looking for a definitive list, and I somehow ended up on a page of the 51 female Nobel Laureates. I hadn't heard of most of them, and I wanted to learn more about them. It was perfect.
(Part of the Wikipedia page of the 51 women.)
From there, all I had to research was the different ways to access the Wikipedia API. It was a little hard to navigate (I wasn't sure if I should be using Wikidata or MediaWiki which is what comes up when I search "Wikipedia API", for example), but thankfully Pudding had me covered. A quick dig through their wiki-billboard-data repo led me to their script using wikijs, and I could just interface with that node package instead.
From there, all I had to do was copy and paste the table of female Nobel Laureates into a spreadsheet, and do some light formatting/cleaning. (This is probably one of the most important lessons I learned about data gathering from Nadieh, which is that spreadsheets are great for cleaning data. The unenlightened me used to try and write code to scrape it, and it'd take me so much longer. When I gathered data for Hamilton, I literally typed out the commas between each cell in a row in my CSV. I was so stupid.) I then exported the spreadsheet as CSV and used an online converter to get the data into JSON format.
After that, I wrote a simple script using wikijs to grab some more information about each woman, including the number of links in to their page ("backlinks") and the number of sources at the bottom to get the final dataset.
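As a rough sketch of what such a script could look like (this is not the original script): `getPage` is a hypothetical stand-in for whatever returns a page object, which with wikijs would presumably be something like `name => wiki().page(name)`. The sketch assumes the page object offers `backlinks()` and `references()` methods that each resolve to an array, which should be verified against the wikijs documentation before relying on it.

```javascript
// Reduce the raw lists for one page to the two counts used in the dataset.
function toCounts(name, backlinks, references) {
  return { name, backlinks: backlinks.length, references: references.length };
}

// Assumption: getPage(name) resolves to an object whose backlinks() and
// references() methods each resolve to an array (hypothetical interface,
// modeled on what wikijs is believed to expose).
async function gatherStats(names, getPage) {
  const rows = [];
  for (const name of names) {
    // Fetch one page at a time to stay gentle on the API.
    const page = await getPage(name);
    const [backlinks, references] = await Promise.all([
      page.backlinks(),
      page.references(),
    ]);
    rows.push(toCounts(name, backlinks, references));
  }
  return rows;
}
```

Passing `getPage` in as an argument keeps the counting logic separate from the network calls, so it can be tried out with a stub before pointing it at Wikipedia.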
Even though the idea took forever to come up with, the actual data gathering only took a few hours 😅
I'll be honest, I've been thinking on and off about this project for so long that I can't even remember all the ideas I had come up with. All I remember is that I was going through the Information is Beautiful Awards shortlist when I realized I should be pinning the entries I really like. Which made me realize that I should clean up and organize my Pinterest board better, since I used to have all my dataviz inspiration in one board and I should put them into even more descriptive categories like "radial", "spatial", "network", "scrolly", etc.
And as I was going through and cleaning my board, I came across this gorgeous painting of crystals by artist Rebecca Chaperon that I had pinned years back:
And immediately, I knew I wanted to programmatically recreate them. Because how beautiful would it be if I could represent these legendary women as bright, colorful crystals?
Maybe I had other ideas before this (I probably did), but I can't remember them anymore - this felt so immediately, compellingly right.
It wasn't long before I had come up with the other details: the size of each crystal would represent the woman's influence, measured by the number of links into her Wikipedia page. The number of faces on the crystal would be mapped to the number of sources at the bottom of her page (because she's "multi-faceted" hehehe get it), and the color would come from her prize category. The only thing that evaded me for a while was how to position the crystals; for the longest time, I could only think to lay them out in two dimensions, in x/y positions, and have the reader scroll through them.
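The encoding described above can be sketched as a small mapping function. Everything concrete here is an assumption for illustration: the field names (`backlinks`, `references`, `category`), the output ranges for size and face count, and the sample numbers are hypothetical, not taken from the project.

```javascript
// Linear map from an input range to an output range, clamped at both ends.
function scaleLinear(value, [d0, d1], [r0, r1]) {
  const t = d1 === d0 ? 0 : (value - d0) / (d1 - d0);
  return r0 + Math.min(1, Math.max(0, t)) * (r1 - r0);
}

// Turn one row of the dataset into the visual parameters of a crystal.
// `extent` holds the [min, max] of each field across the whole dataset.
function crystalParams(woman, extent) {
  return {
    // More links into her page -> bigger crystal (range is hypothetical).
    size: scaleLinear(woman.backlinks, extent.backlinks, [0.5, 2]),
    // More sources on her page -> more faces (range is hypothetical).
    faces: Math.round(scaleLinear(woman.references, extent.references, [4, 12])),
    // The prize category picks the color later on.
    colorKey: woman.category,
  };
}

// Usage with made-up numbers:
const example = crystalParams(
  { backlinks: 50, references: 10, category: "Physics" },
  { backlinks: [0, 100], references: [0, 20] }
);
// example -> { size: 1.25, faces: 8, colorKey: "Physics" }
```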
And then I took Matt DesLauriers's Creative Coding workshop on Frontend Masters, where he taught canvas, three.js, and WebGL. The workshop opened my mind up to the third dimension, and I knew immediately that I was going to use the z-axis for the date they received their award; the closer they were, the more recent the award. RJ further suggested that I should have threads connecting those that collaborated with each other, and have their positions be affected by those pulls also (I didn't have time to implement this).
All of this came to me so quickly and naturally, that I didn't do a single sketch. (I went through my notebook, and there aren't any at all.)
This was a great month for code.
For the last few years, I've been wanting to make a physical something, and last year (2018) I finally decided to make it a goal to be part of a physical installation. But for the majority of the year, I didn't know what I wanted to do for a physical installation. Every time I thought about it, I'd just get stuck thinking in terms of 2D projections on the walls and not know how to take advantage of all the floor space.
And then one day, it hit me (I'm not sure what triggered it) that of course I don't know how to think in 3D physical spaces, because I work digitally in 2D all day long. So if I could teach myself to work in 3D digitally, then it should follow that I could think in 3D physical spaces also. I put three.js and WebGL at the top of my to-learn list.
In late October, I took Matt's Creative Coding workshop, and learned the basics of three.js and an intro to fragment and vertex shaders. I learned the right-hand rule: use the thumb for x-axis (increases going right), index finger for y-axis (increases going up, which is the opposite of SVG and canvas), and the middle finger for z-axis (increases out the screen and coming towards us). I learned that the WebGL world doesn't operate in pixels, but rather units (that we can think of as feet or meters or whatever we like).
(Notes from the workshop, and from the Book of Shaders.)
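To make the flipped y-axis concrete, here is a minimal sketch that converts a canvas/SVG pixel position (origin top-left, y increasing downward) into WebGL-style units (y increasing upward). It assumes the world origin sits at the center of the canvas, and the `unitsPerPixel` factor is an arbitrary, hypothetical choice, since the units can mean whatever we like.

```javascript
// Convert a pixel position on a width x height canvas into world units.
// Assumptions: the world origin is the canvas center, and unitsPerPixel
// is an arbitrary scale chosen for this example.
function pixelToWorld(px, py, width, height, unitsPerPixel = 0.01) {
  return {
    x: (px - width / 2) * unitsPerPixel,  // x still increases to the right
    y: (height / 2 - py) * unitsPerPixel, // flipped: up is positive
    z: 0,                                 // on the screen plane; +z points at the viewer
  };
}
```

With these defaults, the top-left pixel of a 200 x 100 canvas lands at roughly x = -1, y = 0.5: left of center and above it.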
Then in mid-November, David Ronai asked me if I was interested in participating in Christmas Experiments, an annual WebGL advent calendar. I was really hesitant to accept, since I had never worked in WebGL before, but David encouraged me to give it a try and said he'd put me later in the month so that I'd have more time. I agreed, knowing that the deadline would give me the motivation I needed to complete the project.
I started on December 1st, and made it a goal to do a little bit each weekday until I could get to something presentable on the 23rd (the slotted date for my Christmas XP).
I started by reading the first two chapters of WebGL Programming Guide that Misaki Nakano recommended for me, which taught me how WebGL was set up. I then re-took just the three.js section of Matt's workshop so that I could see what heavy lifting three.js was doing for us. After the workshop, I started an Observable notebook to figure out the minimum amount of setup it took to draw something in three.js (a renderer, a camera, and a scene, and then call renderer.render(scene, camera) to draw). I always like understanding how something works at its base, so this was really helpful to figure out.
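That minimum setup can be sketched roughly as follows. This is not the notebook's code: the function name `drawOnce`, the cube, the camera numbers, and the choice to pass `THREE` in as an argument (so the sketch doesn't assume how the library is loaded) are all illustrative assumptions.

```javascript
// Minimal three.js drawing: a renderer, a camera, a scene, one render call.
// THREE is passed in so the sketch makes no assumption about how the
// library is loaded (script tag, bundler, notebook import).
function drawOnce(THREE, canvas) {
  const renderer = new THREE.WebGLRenderer({ canvas });

  // Field of view in degrees, aspect ratio, near and far clipping planes.
  const camera = new THREE.PerspectiveCamera(45, canvas.width / canvas.height, 0.1, 100);
  camera.position.z = 5; // step back along +z so the origin is in view

  const scene = new THREE.Scene();
  // MeshNormalMaterial colors faces by their normals, so no lights are needed.
  const mesh = new THREE.Mesh(new THREE.BoxGeometry(1, 1, 1), new THREE.MeshNormalMaterial());
  scene.add(mesh);

  renderer.render(scene, camera);
  return { renderer, camera, scene, mesh };
}
```

In a browser this would be called with the real library and a canvas element, for example `drawOnce(THREE, document.querySelector("canvas"))`.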
After setting up the notebook, I wanted to create a crystal shape. I decided to use the PolyhedronGeometry because I could just define a set of vertices and then specify the vertices that would make up a triangular face. On the first day, I only managed to create one triangle:
And on the second day, a crystal (which took two attempts because the first had incorrect math):
And then eventually the crystal shape I had in my mind:
Even though I later realized there were better (and far less tedious) ways to do what I wanted than PolyhedronGeometry, I'm really glad for the practice it gave me in thinking through WebGL's x/y/z coordinates.
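To show what "define a set of vertices and then specify which vertices make up each triangular face" involves, here is a sketch that builds the two flat arrays for a simple double-pointed crystal (a bipyramid). The shape, the function name `crystalArrays`, and its parameters are illustrative assumptions, not the crystal from the project.

```javascript
// Build vertex and face-index arrays for a bipyramid: a ring of points
// around the y-axis, plus one tip above and one below.
function crystalArrays(sides = 6, radius = 1, halfHeight = 2) {
  const vertices = []; // flat list: x0, y0, z0, x1, y1, z1, ...
  for (let i = 0; i < sides; i++) {
    const angle = (i / sides) * Math.PI * 2;
    vertices.push(Math.cos(angle) * radius, 0, Math.sin(angle) * radius);
  }
  const top = sides;        // index of the upper tip
  const bottom = sides + 1; // index of the lower tip
  vertices.push(0, halfHeight, 0);
  vertices.push(0, -halfHeight, 0);

  const indices = []; // every three entries name the corners of one triangle
  for (let i = 0; i < sides; i++) {
    const next = (i + 1) % sides;
    // Counter-clockwise when seen from outside, so the faces point outward.
    indices.push(top, next, i);
    indices.push(bottom, i, next);
  }
  return { vertices, indices };
}
```

These arrays use the flat layout that `new THREE.PolyhedronGeometry(vertices, indices, radius, detail)` accepts. As far as I know, that class projects every vertex onto a sphere of the given radius, so the elongation would have to be reintroduced afterwards (for example by scaling the mesh), which is part of why it is a tedious fit for this shape.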
Once I was satisfied with creating shapes, I moved on to learning how to color the shapes. For this, I went back through Matt's section on vertex and fragment shaders, and played around with his shader code:
The next goal was to use the fragment shader to color the crystal shape. I took a bit of a detour here, because I couldn't figure out how to use glslify (a node-style module system for GLSL, the language shaders are written in, that Matt's code used to load in the noise function) in Observable. Instead, I started to explore different bundlers/build tools so that I could eventually deploy my code to the web. In the end, I decided to go with Parcel (instead of the Vue CLI that I've been regularly using for the last half year), because it had built-in support for both Vue and GLSL.
Here's the crystal with that same noise pattern applied:
I wasn't a fan, so I decided I needed to learn more about shaders and using colors in shaders. This is when I turned more heavily to Patricio Gonzalez Vivo's Book of Shaders, and in particular his chapter on Shaping Functions. I learned about sines, cosines, polynomials and exponentials - functions that can take a number (or a set of numbers) and output another. I also learned about mixing colors, and how we could take two colors and not just get a new color between those two, but also to mix colors at the RGB levels and make completely new colors. And once we combine that with the shaping functions, we can get gradients and shapes:
(These were accomplished by mixing blue and yellow, and tweaking the RGB values at each position with powers, sines, absolutes, steps, and smoothsteps.)
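The idea of shaping functions driving a color mix can be sketched outside of a shader. The snippet below ports GLSL's built-in `mix` and `smoothstep` to JavaScript and mixes blue and yellow with a different shaping function per RGB channel. The particular combination in `shade` is an arbitrary example, not the one used for the images above.

```javascript
// JavaScript ports of two GLSL built-ins.
function mix(a, b, t) {
  return a * (1 - t) + b * t; // linear blend: t = 0 gives a, t = 1 gives b
}
function smoothstep(edge0, edge1, x) {
  const t = Math.min(1, Math.max(0, (x - edge0) / (edge1 - edge0)));
  return t * t * (3 - 2 * t); // eases in and out between the two edges
}

// Mix two RGB colors. `t` may be one number, or one factor per channel,
// which is what produces colors that are not simply "between" the two.
function mixColor(c1, c2, t) {
  const ts = Array.isArray(t) ? t : [t, t, t];
  return c1.map((channel, i) => mix(channel, c2[i], ts[i]));
}

const blue = [0, 0, 1];
const yellow = [1, 1, 0];

// Example: shape each channel differently along x in [0, 1].
function shade(x) {
  return mixColor(blue, yellow, [
    smoothstep(0.2, 0.8, x),        // red ramps up in the middle
    Math.abs(Math.sin(x * Math.PI)), // green peaks at the center
    Math.pow(x, 2),                  // blue falls off slowly, then quickly
  ]);
}
```

Sampling `shade(x)` across a row of pixels gives a gradient, and swapping the shaping functions changes its character without touching the two base colors.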
Once I felt happy with the potential colors, I switched gears and plugged in the data: number of faces for number of references at bottom of Wikipedia page, size for number of Wikipedia backlinks, color for the category of their Nobel Prize.
But I really didn't like this output for two reasons:
It was around this time that I came across a demo for Bloom (a post-processing effect to give glow), and in that example there were round, gem-like objects that looked quite close to the crystals I wanted:
When I looked at the code, I learned that they were using SphereGeometry but with flatShading on - so instead of rendering a smooth surface it showed each distinct face. And that was when I realized: just like how I can manipulate and mix colors to get new colors completely different from the original, I can actually manipulate the settings on a geometry (like the number of faces) and get a shape that looks quite different from what the geometry's name suggests.
So I swapped out the PolyhedronGeometry for SphereGeometry, set the height segments to 4 and the width segments to a value driven by the data, stretched out the shape by setting the vertical scale to twice the horizontal scale, added jitter to each vertex, and I had much more interesting shapes:
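The stretch-and-jitter step can be sketched as a pure function over a flat array of positions. This is an assumption-laden illustration rather than the project's code: it works on an `[x, y, z, x, y, z, ...]` array such as the position attribute of a geometry created with `new THREE.SphereGeometry(radius, widthSegments, heightSegments)`, and the option names and default amounts are hypothetical.

```javascript
// Stretch a shape vertically and nudge every vertex by a small random amount.
// `positions` is a flat [x, y, z, x, y, z, ...] array; a new array is returned.
function stretchAndJitter(positions, { stretchY = 2, jitter = 0.1, random = Math.random } = {}) {
  const out = new Float32Array(positions.length);
  for (let i = 0; i < positions.length; i += 3) {
    // (random() - 0.5) * jitter gives an offset in [-jitter / 2, +jitter / 2).
    out[i] = positions[i] + (random() - 0.5) * jitter;
    out[i + 1] = positions[i + 1] * stretchY + (random() - 0.5) * jitter;
    out[i + 2] = positions[i + 2] + (random() - 0.5) * jitter;
  }
  // Caveat: if the same corner appears more than once in `positions`
  // (as it can along a sphere's seam), each copy is jittered independently.
  return out;
}
```

Passing in the `random` function makes the result repeatable, which helps when comparing shapes while tweaking the amounts.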
Now that I had the shape solved, I went back to the color. This time, I used two colors and mixed them with the shaping functions:
(I love how much the first one looks like sweet potatoes 😆)
Because the shader by default just wraps around the shape (at least that's how I like to think about it), I lost the edges. Thankfully, Matt taught me how to get the definition back: by calling computeFlatVertexNormals() on the geometry, reading the normals in the vertex shader, passing them to the fragment shader, and adding them to the color. This not only made the edges really apparent, but also gave a fake sense of light and shadow:
(Code for adding face normals)
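What "adding the normal to the color" does can be illustrated on the CPU. The sketch below computes one flat normal per triangle and adds it to a base color, so faces pointing in different directions end up with visibly different tints. It is a JavaScript illustration of the idea, not the shader code, and the `strength` factor is a hypothetical knob.

```javascript
// Unit normal of the triangle a, b, c (each an [x, y, z] array),
// using the counter-clockwise winding convention.
function faceNormal(a, b, c) {
  const u = [b[0] - a[0], b[1] - a[1], b[2] - a[2]];
  const v = [c[0] - a[0], c[1] - a[1], c[2] - a[2]];
  const n = [
    u[1] * v[2] - u[2] * v[1],
    u[2] * v[0] - u[0] * v[2],
    u[0] * v[1] - u[1] * v[0],
  ];
  const length = Math.hypot(n[0], n[1], n[2]) || 1; // avoid dividing by zero
  return n.map((component) => component / length);
}

// Add the normal to an RGB color (channels in 0..1), clamping the result.
// Every pixel of a flat face shares one normal, so each face gets its own
// constant shift, which is what makes the edges between faces show up.
function shadeFace(baseColor, normal, strength = 0.3) {
  return baseColor.map((channel, i) =>
    Math.min(1, Math.max(0, channel + normal[i] * strength))
  );
}
```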
From there, I played around with two sets of gradients: one for "humanities" (Peace, Literature, Economic Sciences), and another for "natural sciences" (Physics, Chemistry, Medicine).
Next came background. I created a "floor" by using a PlaneGeometry and randomly jittering the y-position of each vertex (inspired by this article from codrops), and the "sky" by creating a huge sphere around the scene. I experimented with three different kinds of lights: hemisphere and ambient to give the "sky" a nice sunset/sunrise sort of glow, and directional to cast shadows from the crystals to the "floor".
To finish up the piece, I added "stars" that represent all the men who have won the prize in the same time period, as well as annotations for each crystal and the decades. It was a fun lesson trying to get the text in, mainly because with all the text I had, using TextGeometry was completely unperformant. The solution I found upon Googling was to render the text within an HTML5 Canvas, create a PlaneGeometry, and use that Canvas as a texture to fill the PlaneGeometry with - what an interesting approach. (Here is the code.)
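The canvas-as-texture approach can be sketched like this. It is not the linked code: the function name `makeTextLabel`, the font, the sizes, and passing `THREE` and `doc` in as arguments (in a browser, the three.js module and `document`) are assumptions made for the example.

```javascript
// Draw text onto a 2D canvas, then use that canvas as the texture of a plane.
function makeTextLabel(THREE, doc, text, { fontPx = 48, worldHeight = 1 } = {}) {
  const canvas = doc.createElement("canvas");
  const ctx = canvas.getContext("2d");
  const font = `${fontPx}px sans-serif`;

  // Measure first so the canvas is just big enough for the text.
  ctx.font = font;
  canvas.width = Math.ceil(ctx.measureText(text).width);
  canvas.height = Math.ceil(fontPx * 1.4);

  // Resizing a canvas resets its context state, so set the font again.
  ctx.font = font;
  ctx.textBaseline = "middle";
  ctx.fillStyle = "#fff";
  ctx.fillText(text, 0, canvas.height / 2);

  const texture = new THREE.CanvasTexture(canvas);
  const material = new THREE.MeshBasicMaterial({ map: texture, transparent: true });

  // Keep the plane's proportions equal to the canvas so the text isn't squashed.
  const aspect = canvas.width / canvas.height;
  const geometry = new THREE.PlaneGeometry(worldHeight * aspect, worldHeight);
  return new THREE.Mesh(geometry, material);
}
```

Each label is then a single textured rectangle with two triangles, rather than a mesh with geometry for every letter.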
My favorite part was that I was able to find a good reason to use that third dimension: the decade of their awards. So the closer that they are to the front, the more recent their award, and the further back they are, the further back in history. But I don't reveal those decades unless the user "flies up" to view the crystals from above. If they are "walking through" the crystals at the ground level, I only show information about each woman - because I want the visitor to concentrate on only the women as they "walk through".
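The depth encoding and the "fly up to see the decades" behavior could be expressed along these lines. The reference year, the spacing per decade, and the camera height threshold are all hypothetical numbers chosen for the sketch.

```javascript
// Place an award along the z-axis: the most recent year sits at z = 0
// (closest to the viewer) and older awards recede into negative z.
function zForYear(year, { latestYear = 2018, unitsPerDecade = 10 } = {}) {
  return ((year - latestYear) / 10) * unitsPerDecade;
}

// Only reveal the decade annotations once the camera is high enough.
// `threshold` is an arbitrary height in world units.
function showDecades(cameraY, threshold = 20) {
  return cameraY > threshold;
}
```

With these defaults, an award from 1903 would sit at z = -115, far behind one from 2009 at z = -9.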
Here is the final (I'm sorry for the poor quality of video, I compressed it down to <1MB):
This month was super fun, and I'm so proud that I was able to finish it in 3 weeks - which I haven't been able to achieve since my September project about travel. I was also able to teach myself three.js and a little bit of GLSL, which I've been wanting to do for a long time - it's really given me the confidence to go forward with more 3D projects and I'm excited to do more with it in the future. But most importantly, I'm so glad I chose this dataset of women Nobel Laureates - it's taught me a lot, and has given me a whole set of topics I'm interested in exploring and building out as dataviz.
Just like Nadieh, it took me forever to decide on a good dataset and angle for this project. We chose “Myths & Legends” because it sounded like a great topic with a lot of potential, but the ideas I came up with either didn’t excite me much, or were difficult from a data gathering perspective. I wanted to do something related to my Chinese background and bounced from Chinese and Asian mythology to classic Chinese literature to Mythbusters episodes.
Then, the idea came to me after watching Crazy Rich Asians. I loved Michelle Yeoh in the movie, but it wasn't until I read more about her that I learned how accomplished and legendary she was. It made me wonder about all of the legendary women across history that I’ve never heard of, and the idea took shape from there.