The write-up below isn't exactly the same as the one found in the Data Sketches book. For the book we've tightened up our wording, added more explanations, extra images, separated out lessons and more.
We originally wanted to do something with the US elections, but with the way that turned out... So Shirley said she would instead love to look into Obama. As a non-American, though, that topic didn't really interest me. But perhaps how Americans feel about the presidential family is similar to how Europeans feel about their royal families. So that's how we ended up with our current theme.
The first thing that came to mind when I thought about the royals was their bloodlines. Due to all of that intermarriage I was sure most royals in Europe would be relatives of each other. The question was then, how close of a relative? As always I was pretty sure there would be loads of nice datasets to find about the royal bloodlines. But, as in the previous projects, that turned out differently. Sure, many people had drawn images of the closest relatives of some royals in Europe. But I really didn't want to do too much manual data gathering this time.
Luckily I found one file that seems to be the main source of information on royal genealogy. However, it was made in 1992, so it is missing the recent additions to the royal families. Also, when I was looking for "interesting" royals to highlight, such as the last king of Italy, or Franz Ferdinand (whose assassination sparked WWI), I found that these lines often stopped a few generations before them. I therefore manually added about 150 people to the file, which was a bit of a challenge because the file is in a strange format, a GEDCOM file. That took at least 5 hours over the two weeks...
I did the data preparation in R as usual. Family trees are something for a network layout, so the odd format (see the screenshot below, it contained lines on individuals & families) had to be transformed into a nodes & links file that d3's force layout knows how to handle. Not too difficult, but it still required quite some lines of code to eventually be able to run through the whole file and get the info out and into some data sets.
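To make that transformation a bit more concrete, here is a minimal sketch in Python (not the R script used for this project). It assumes the usual GEDCOM line layout of `level tag value`, with `INDI` records for individuals and `FAM` records listing `HUSB`, `WIFE` and `CHIL` references. The sample records, the function name `gedcom_to_network` and the output field names are made up for illustration.

```python
import re

# Made-up sample records in GEDCOM's "level tag value" line format.
SAMPLE = """\
0 @I1@ INDI
1 NAME Anna /Example/
1 BIRT
2 DATE 24 MAY 1819
0 @I2@ INDI
1 NAME Bernard /Example/
0 @I3@ INDI
1 NAME Clara /Example/
1 BIRT
2 DATE 1841
0 @F1@ FAM
1 HUSB @I2@
1 WIFE @I1@
1 CHIL @I3@
"""


def gedcom_to_network(text):
    """Turn GEDCOM text into a list of nodes and a list of links."""
    people = {}    # individual id -> node dict
    families = {}  # family id -> spouses and children
    record = None  # the INDI or FAM record currently being read
    kind = None    # "INDI", "FAM", or None for records we skip
    event = None   # last level-1 tag, so "2 DATE" knows its event

    for line in text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, second = parts[0], parts[1]
        rest = parts[2] if len(parts) == 3 else ""

        if level == "0":
            # Level-0 lines look like "0 @I1@ INDI": id first, then type.
            kind, event = (rest if rest in ("INDI", "FAM") else None), None
            if kind == "INDI":
                record = people[second] = {"id": second, "name": None, "birth_year": None}
            elif kind == "FAM":
                record = families[second] = {"spouses": [], "children": []}
        elif level == "1" and kind is not None:
            event = second
            if kind == "INDI" and second == "NAME":
                record["name"] = " ".join(rest.replace("/", " ").split())
            elif kind == "FAM" and second in ("HUSB", "WIFE"):
                record["spouses"].append(rest)
            elif kind == "FAM" and second == "CHIL":
                record["children"].append(rest)
        elif level == "2" and kind == "INDI" and event == "BIRT" and second == "DATE":
            year = re.search(r"\b(\d{3,4})\b", rest)
            if year:
                record["birth_year"] = int(year.group(1))

    # One spouse link per couple, one parent link per parent-child pair.
    links = []
    for family in families.values():
        spouses, children = family["spouses"], family["children"]
        if len(spouses) == 2:
            links.append({"source": spouses[0], "target": spouses[1], "type": "spouse"})
        for parent in spouses:
            for child in children:
                links.append({"source": parent, "target": child, "type": "parent"})
    return list(people.values()), links


nodes, links = gedcom_to_network(SAMPLE)
```

The `source` and `target` keys mirror what a force layout typically expects from a links file; everything else about the output shape is an assumption.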
And that was all I did beforehand, just getting the node & link files in good shape. But while wrangling my way through d3's force layout I thought about new things to add or values to pre-calculate to get more insights from the structure of the network.
For example, I wanted to do something with the birth year of a person, but this wasn't known for everybody. I therefore wrote a small script that tries to find a good guess: either by checking whether the death date is known (and then subtracting about 60 years), or by looking at the birth or death date of the spouse, children or parents, in that order (again subtracting or adding some years to account for generational differences). Not perfect, but I didn't need perfect dates for the idea that I had in mind.
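The fallback order described above could look roughly like the Python sketch below. Only the 60-year lifespan comes from the text; the 25-year generation gap, the dictionary layout and the function name `estimate_birth_year` are assumptions made for illustration, not the values from the original script.

```python
LIFESPAN = 60    # from the text: subtract about 60 years from a death year
GENERATION = 25  # assumed gap between parent and child, purely illustrative


def estimate_birth_year(person_id, people):
    """Return a known or guessed birth year, or None if nothing is known."""
    person = people[person_id]
    if person.get("birth") is not None:
        return person["birth"]
    if person.get("death") is not None:
        return person["death"] - LIFESPAN

    # Relatives in the order from the text: spouse, children, parents.
    # The shift moves the relative's (estimated) birth year to this person's.
    for relation, shift in (("spouses", 0), ("children", -GENERATION), ("parents", GENERATION)):
        for relative_id in person.get(relation, []):
            relative = people[relative_id]
            if relative.get("birth") is not None:
                return relative["birth"] + shift
            if relative.get("death") is not None:
                return relative["death"] - LIFESPAN + shift
    return None


# Tiny made-up example: each person is missing a birth year in a different way.
people = {
    "a": {"death": 1700},
    "b": {"spouses": ["c"]},
    "c": {"birth": 1800},
    "d": {"children": ["e"]},
    "e": {"death": 1900},
    "f": {"parents": ["g"]},
    "g": {"birth": 1850},
    "h": {},
}
estimates = {pid: estimate_birth_year(pid, people) for pid in people}
```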
My final addition to the data was to calculate who was the royal each person was most closely related to (and within how many steps). I wanted to have this pre-calculated so that I had access to this beforehand in the visualization. Saves a lot of calculation time!
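One way to pre-calculate this is a breadth-first search that starts from all royals at once, so every person is reached first by the royal nearest to them. The Python sketch below is an illustration under that assumption, not the original R code; `closest_royal` and the adjacency dictionary are hypothetical names.

```python
from collections import deque


def closest_royal(adjacency, royals):
    """Map each reachable person to (closest royal, number of steps)."""
    result = {royal: (royal, 0) for royal in royals}
    queue = deque(royals)
    while queue:
        current = queue.popleft()
        royal, steps = result[current]
        for neighbor in adjacency.get(current, ()):
            if neighbor not in result:  # first visit is the shortest route
                result[neighbor] = (royal, steps + 1)
                queue.append(neighbor)
    return result


# Made-up chain: r1 - a - b - c - r2, plus one person with no links at all.
adjacency = {
    "r1": ["a"],
    "a": ["r1", "b"],
    "b": ["a", "c"],
    "c": ["b", "r2"],
    "r2": ["c"],
    "loner": [],
}
closest = closest_royal(adjacency, ["r1", "r2"])
```

When a person is equally far from two royals, this sketch simply keeps whichever royal reached them first, which depends on the order of the `royals` list. People with no path to any royal are left out of the result.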
I truly didn't know what kind of network the data was going to give me, so for this project I didn't really sketch on paper, but sketched with code in some way I guess. Getting to know d3v4's new force functions and trying out different settings for practically all variables.
My first result was a major network that blew up far beyond my humble 1000 x 1000px SVG.
Increasing the gravity made a useless hairball...
I eventually found some settings that seemed to reveal a bit of structure, but I needed to increase the SVG's size to 6000 x 6000px; I couldn't get it to fit into a smaller size... By now I saw that I really needed some more context to find insights. Therefore, I decided to add birth year to my data, since it is the best underlying variable that brings sense to the connections in this network, and then color the nodes by that (dark blue = long ago, yellow = recent).
Because birth year wasn't known for about 40% of the people in the network I first tried a lot of different things to somehow turn off the gravity for these nodes, so that they would be held in place through the links they had to other nodes for which the birth year was known. Like little springs. I asked around for help, looked into d3v4's underlying code of the force functions, but eventually there was no way around it.
So I instead decided to estimate the birth year. Finally, there was some form of network that was getting slightly insightful.
But I still felt like it wasn't really getting anywhere. I continued again on another day and I was so depressed with the network and its potential that I didn't even take screenshots of my attempts for quite some time. I was thinking of just giving up on this network and trying to use the data in a different way (or using some other data; how much the royals are getting from the taxpayers' money is always a hot topic, for example).
But I wanted to try one final thing to see if it turned out better, looking more at the design itself; the colors and such. The 12 current royal hereditary leaders were most important so I made them big and noticeable. I also spread them out evenly along the vertical axis in an attempt to pull apart the network a bit more, taking into account which were closely connected (placing these together) and the most loosely connected royal leaders on the outside.
The connected nodes reminded me of constellations of stars in the sky, so I switched to a dark theme where the nodes were colored yellow and added a touch of glow. And with the initial idea of connections between current royal leaders I calculated the distance from each node to a royal and then adjusted their opacities to depend on their closest distance to a royal. So the people closest to a royal would be very visible and this would diminish to almost completely transparent for those more than 6 steps away from all of the 12 royal leaders.
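As a small illustration of that opacity rule, the Python sketch below fades linearly from fully visible at the royal itself to nearly transparent at 6 steps and beyond. The linear fade, the 0.05 floor and the name `node_opacity` are assumptions; the text only says that visibility diminishes to almost completely transparent past 6 steps.

```python
def node_opacity(steps, max_steps=6, min_opacity=0.05, max_opacity=1.0):
    """Opacity for a node that is `steps` away from its closest royal."""
    if steps is None or steps >= max_steps:
        # Unconnected or far-away people are barely visible.
        return min_opacity
    fraction = steps / max_steps
    return max_opacity - fraction * (max_opacity - min_opacity)


# Opacity for 0 through 7 steps away from the nearest royal.
opacities = [node_opacity(steps) for steps in range(8)]
```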
And that's when I finally started to see some potential! I decided that I had to focus on that connection to a current royal leader more and see where that would take me. The visual still needed about 2500px in width to have a nice spread of the nodes, but horizontal scrolling is not preferred, so I turned it 90 degrees so it would be a vertical scroll. To pull apart the network even more I calculated the (average) royal to which each node was closest. I used that info to generate a slight horizontal pull of the nodes to their closest royal descendants. As you can see in the image below right, where I heavily increased the strength of the vertical gravitational pull, this genealogy is mostly focused on the English bloodline.
I wasn't really interested in the people born more than about 300 years ago, because it was already apparent that the current royals are most highly connected in the last 200 years, so I decided to remove everybody with an (estimated) birth date from before 1000. I also squished the vertical date scale even more towards older times to make the total visual less high and more focused on the last 2 centuries.
And that's when I finally ended up with a network structure that I was happy with, one that even in a static state already showed some insight into how connected the current royal leaders are. Now I could finally move on from my digital sketching and think about the details & interactivity to make the closest connection between people more apparent.
There's definitely a very vague line between this project's sketching and coding, since the sketching already took quite a bit of coding. But after getting the layout in the right position I focused on getting the interactions in there. There were two main things I wanted to achieve. The first was showing how far "6-degrees of separation" (i.e. 6 generations back & 6 generations forward) would spread from a hovered-over person into the network: "To which other current royal leaders is the King of Spain connected within 6 steps?" for example. The second thing was showing the shortest path. When I had a basic hover working I saw that some people had very big networks within 6 steps, so it was difficult to really see how two people were connected. I therefore wanted to be able to click on a person and then click on some other person in the network to see the shortest path between those people.
I usually start out very simple, so to look for 6 steps beyond a person I just wrote a while loop that ran 6 times and looked at the people connected to the selected persons from the previous loop. Not very smart or performant. But it's easy and fast to set up, and then I could determine if the result on the visual was as I'd hoped. Only after I saw that the hover of 6 steps had potential did I look into using key-value maps to quickly request all the people connected to one person, which made the calculation almost instant.
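The key-value map idea can be sketched as follows, in Python rather than the JavaScript that runs in the actual visualization. The map goes from each person to the set of people they are directly linked to, so each of the 6 rounds only has to look up the neighbors of the previous round. The function names and the link format are assumptions for illustration.

```python
from collections import defaultdict


def build_neighbor_map(links):
    """Build a person -> set of directly connected people lookup."""
    neighbors = defaultdict(set)
    for source, target in links:
        neighbors[source].add(target)
        neighbors[target].add(source)
    return neighbors


def within_steps(neighbors, start, max_steps=6):
    """Return {person: steps} for everyone within max_steps of start."""
    steps = {start: 0}
    frontier = [start]
    for step in range(1, max_steps + 1):
        next_frontier = []
        for person in frontier:
            for other in neighbors.get(person, ()):
                if other not in steps:  # keep the first (shortest) step count
                    steps[other] = step
                    next_frontier.append(other)
        if not next_frontier:
            break
        frontier = next_frontier
    return steps


# Made-up chain of nine people: p0 - p1 - ... - p8.
links = [(f"p{i}", f"p{i + 1}") for i in range(8)]
neighbor_map = build_neighbor_map(links)
reach = within_steps(neighbor_map, "p0")
```

Keeping the step count per person, rather than just a set of reached people, is what lets each ring of the network get its own color.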
I'm sure some people think my choice of color for the hovers is a non-optimal one, going from blue to white to red; not very intuitive. But this was a very deliberate choice. Since I was seeing the network as stars & constellations I thought it would be a nice analogy to color the "stars" in the colors that actual stars shine in. The very hottest, biggest stars are blue, going to white, yellow, orange and then red stars, which are the coolest (& smallest) stars. Due to my astronomy background this color scheme is straightforward to me (and I build these projects as experiments for myself foremost).
More or less by mistake I ended up with a version in which you see the steps grow, from 1 step away from a person to 2, 3 and so on. This gave such a better intuitive feel for what these colors represented that I actually put it into the final version. I'm still stumped that I didn't think of doing that from the start.
Because the nodes are a bit small I found that hovering over a node exactly was a bit of a chore. So I used the Voronoi technique (I wrote a blog post about this a year ago) in which I attach bigger, but clipped, circles underneath the nodes that capture the mouseover event. In the image below I've made them visible while I was investigating an error (the straight vertical line in there was wrong).
At first I was afraid that, with almost 3000 nodes in the network, a shortest path calculation would take too much time. I nevertheless wanted to first actually make it work and only then see if it was something that might survive until the final version. I knew about Dijkstra's algorithm, but there was no need to reinvent the wheel, so I found a script on GitHub that I adjusted to my needs. Surprisingly enough it performed amazingly well; it returned a path before I could blink. Personally, I very much like the small dotted rings that rotate around the two selected people.
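For readers who haven't seen Dijkstra's algorithm, here is a minimal Python sketch of it on a family network where every link counts as one step. This is not the GitHub script mentioned above; the function name, the adjacency format and the tiny example graph are all made up for illustration.

```python
import heapq


def shortest_path(neighbors, start, goal):
    """Dijkstra with unit link weights; returns a list of people or None."""
    distances = {start: 0}
    previous = {}          # person -> the person we reached them from
    queue = [(0, start)]   # priority queue of (distance so far, person)
    visited = set()

    while queue:
        distance, person = heapq.heappop(queue)
        if person in visited:
            continue
        visited.add(person)
        if person == goal:
            break
        for other in neighbors.get(person, ()):
            new_distance = distance + 1  # every link is one step
            if new_distance < distances.get(other, float("inf")):
                distances[other] = new_distance
                previous[other] = person
                heapq.heappush(queue, (new_distance, other))

    if goal not in distances:
        return None  # the two people are not connected at all

    # Walk back from the goal to the start to rebuild the path.
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]


# Made-up network with a long route (a-b-c-d) and a short one (a-e-d).
family = {
    "a": ["b", "e"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["a", "d"],
    "z": [],
}
route = shortest_path(family, "a", "d")
```

Because all links weigh the same here, a plain breadth-first search would find the same path; Dijkstra only becomes necessary once links get different weights.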
What caused me the most headaches during this project was performance in general. How to make sure that something like a mouseover would only trigger when somebody was hovering over a node instead of just moving their mouse over the grid of nodes. How to only run a mouseout when the 6-degrees-animation had kicked in and not on every switch from node to node (and thus performing a transition from an already yellow end state to the same end state). And even more of these things, all to make sure the browser only did things when a visual change was needed (and then only on the "affected" parts of the visual). This did cause the end result to work very smoothly on my 0.5 year old MacBook in Chrome, but strangely enough it is taking significantly longer to do the visual updates in both Firefox and Safari (if somebody can explain why please let me know!). Of course it's also slower on my personal 6 year old MacBook, but not too bad.
The animated gif below shows the interactions available; hover, click and 2nd click (It's a bit slow because I have to capture gifs on my 6 year old Mac and my browser slows down during a capture).
Which brings me to the final section. Like the Olympics project, I again did this project in half the allotted time. This time because I'm going on vacation for 3 weeks (yay!). But... if I had had 2 more weeks: I would try to create a version in canvas (and figure out how to do the interactions in there) that would hopefully be more performant. I would want to create a search function where you can find any royal in the data (although the names in there aren't always spelled perfectly, it would be a start). And finally I would want to look into making the network even more squished so that you would be able to see more of the total height at once.
But I'm happy with what I managed in 2 weeks (while also having other obligations in my free time). It may not be as smooth in terms of interactions as I had hoped, but it does do exactly what I wanted: showing how interconnected the European royal families really are, who the "linking pins" are in this network and what the shortest path is between two persons. And I learned a heck of a lot more about d3v4's force options. You can find the final result here.
Talk about timing. Our original project idea was going to be around the US elections, but with the way that turned out, we decided to look for something different. As a European, focusing on American presidents didn’t interest me much, but I felt that the European royal families would be a good equivalent to how Americans might see the presidential families.
The first thing that came to mind when I thought about the royals was their bloodlines. Due to all of the intermarriages in the past, I was sure most royals in Europe were related. The question then was: how close of a relative? Were they all cousins twice removed, or perhaps even closer than that?