Tidy Tuesday: Simpsons Guest Stars

Posted by Amelia E. Barber on Friday, August 30, 2019

Tidy Tuesday time, y’all

It’s been months since I did a Tidy Tuesday, so I set aside an hour and told myself I had to post SOMETHING at the end of it. This week’s dataset covers Simpsons guest stars over the 30 (!!!) years the show has been on the air. I was initially tempted to look at the most “famous” guest stars, or at how connected the cast and guest stars are through projects outside the show, but both of those require additional data and more time than I had. So I went with the next thing that popped into my head: simply mapping the most common guest stars and their number of appearances in each season.


Step one, load the packages we’ll be using and grab the data.

library(tidyverse)
library(cowplot)

simpsons <- readr::read_delim("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-08-27/simpsons-guests.csv", delim = "|", quote = "")

Looking at the most commonly appearing guest stars, we see that Marcia Wallace (the voice of Edna Krabappel) is by far the most frequent guest star on the show with 175 appearances. The list drops off pretty quickly, with Glenn Close rounding out the top 10 with nine appearances over the 30 seasons.

simpsons %>% 
    count(guest_star, name = "total_n", sort = TRUE)
## # A tibble: 795 x 2
##    guest_star               total_n
##    <chr>                      <int>
##  1 Marcia Wallace               175
##  2 Phil Hartman                  52
##  3 Joe Mantegna                  29
##  4 Maurice LaMarche              25
##  5 Frank Welker                  21
##  6 Kelsey Grammer                21
##  7 Jon Lovitz                    19
##  8 Kevin Michael Richardson      18
##  9 Jackie Mason                  11
## 10 Glenn Close                    9
## # … with 785 more rows

To make a cleaner visualization, I’ll focus on the 8 most frequently appearing guest stars, so let’s make a dataset with just these stars and their total number of appearances.

top_stars <- simpsons %>% 
    count(guest_star, name = "total_n", sort = TRUE) %>% 
    head(8) 

We’ll use this as our filtering dataset for plotting: joining against it keeps only these eight stars and simultaneously adds each star’s total number of appearances, which we’ll use to order our data for plotting (and incorporate as text labels into our final plot).
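To make the "join as filter" idea concrete, here is a minimal sketch on toy data. The star names and counts are made up for illustration and are not from the real dataset; only the column names `guest_star`, `n`, and `total_n` mirror the ones used in this post.

```r
library(dplyr)

# Toy per-season counts (hypothetical values, not the real data)
per_season <- tibble(
  season     = c("1", "1", "2", "2"),
  guest_star = c("Star A", "Star B", "Star A", "Star C"),
  n          = c(3L, 1L, 2L, 1L)
)

# Lookup table of the stars we want to keep, with their overall totals
top <- tibble(
  guest_star = c("Star A", "Star C"),
  total_n    = c(5L, 1L)
)

# inner_join() drops rows without a match in `top` (here, Star B)
# and attaches total_n to every row that survives
joined <- per_season %>%
  inner_join(top, by = "guest_star")

joined
```

A `semi_join()` would do the filtering alone, but it would not carry `total_n` along, which is why an inner join is handy here.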

Ok, let’s get to plotting!

simpsons %>%
    count(season, guest_star) %>%
    inner_join(top_stars, by = "guest_star") %>% 
    filter(season != "Movie") %>% # editorial choice; it's not a season
    mutate(guest_star = fct_reorder(guest_star, total_n),
           season = fct_inseq(season)) %>% 
    ggplot(aes(season, guest_star)) +
    geom_point(aes(size = n)) +
    scale_size("# appearances")

Not a bad start using the default aesthetics, but it leaves some room for improvement in the prettiness department, and we can make it cleaner so it’s easier to understand.
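A quick aside on the two forcats calls in that pipeline, since they do all the ordering work. This is a small sketch on hand-typed vectors: the season labels are illustrative, and the three totals are copied from the table earlier in the post.

```r
library(forcats)

# Seasons are stored as text, so the default factor order is alphabetical
season <- factor(c("1", "2", "10", "3"))
levels(season)                      # "1" "10" "2" "3"

# fct_inseq() reorders the levels by their numeric value instead
season_ordered <- fct_inseq(season)
levels(season_ordered)              # "1" "2" "3" "10"

# fct_reorder() sorts the levels of one variable by the values of another
star  <- factor(c("Marcia Wallace", "Phil Hartman", "Joe Mantegna"))
total <- c(175, 52, 29)
star_ordered <- fct_reorder(star, total)
levels(star_ordered)                # "Joe Mantegna" "Phil Hartman" "Marcia Wallace"
```

Because ggplot2 draws the first factor level at the bottom of a discrete y axis, ordering by ascending total puts the most frequent guest star at the top of the plot.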


Looking at the raw data, some guest stars always reprise the same role. For example, Marcia Wallace always plays Mrs. Krabappel and Joe Mantegna always voices Fat Tony. Other guest stars rarely voice the same character twice, as we can see for Maurice LaMarche below:

## # A tibble: 25 x 2
##    episode_title               role                                             
##    <chr>                       <chr>                                            
##  1 A Star Is Burns             George C. Scott; Hannibal Lecter; Captain James …
##  2 The Seemingly Never-Ending… Commander McBragg                                
##  3 Treehouse of Horror XVII    Orson Welles                                     
##  4 G.I. (Annoyed Grunt)        Recruiter #2; Cap'n Crunch                       
##  5 The Wife Aquatic            First Mate Billy; Oceanographer                  
##  6 Stop or My Dog Will Shoot   Farmer; Horn Stuffer                             
##  7 You Kent Always Say What Y… Fox announcer                                    
##  8 Treehouse of Horror XVIII   Government Official                              
##  9 Husbands and Knives         Jock                                             
## 10 Dangerous Curves            Toucan Sam; Cap'n Crunch; Trix Rabbit            
## 11 No Loan Again, Naturally    Dwight D. Eisenhower                             
## 12 Waverly Hills 9-0-2-1-D'oh  City Inspector                                   
## 13 Once Upon a Time in Spring… Nuclear Power Plant Guard                        
## 14 Chief of Hearts             David Starsky                                    
## 15 Angry Dad: The Movie        Anthony Hopkins                                  
## # … with 10 more rows
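The code behind that table isn't shown in the post, but a filter plus a select along these lines would produce similar output. This is a sketch, not necessarily the exact code used: `simpsons_demo` is a hypothetical stand-in for the real `simpsons` data, built from three rows copied out of the table above plus one placeholder row for contrast.

```r
library(dplyr)

# Stand-in for the real data: three rows copied from the output above,
# plus one placeholder row (the episode title is hypothetical)
simpsons_demo <- tibble(
  guest_star    = c("Maurice LaMarche", "Maurice LaMarche",
                    "Maurice LaMarche", "Marcia Wallace"),
  episode_title = c("Treehouse of Horror XVII", "Husbands and Knives",
                    "Chief of Hearts", "Placeholder Episode"),
  role          = c("Orson Welles", "Jock",
                    "David Starsky", "Edna Krabappel")
)

# Keep one actor's rows and only the two columns of interest
lamarche_roles <- simpsons_demo %>%
  filter(guest_star == "Maurice LaMarche") %>%
  select(episode_title, role)

# n = 15 asks the tibble printer for up to 15 rows instead of the default 10
print(lamarche_roles, n = 15)
```

Swapping `simpsons_demo` for the full `simpsons` data frame loaded at the top of the post should give the 25-row version.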

Let’s incorporate the names of the main characters each actor voiced into our plot. It would also be nice if our plot listed the total number of guest appearances by each actor or actress. Once we make these changes and add a more neutral theme, we get this:
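For anyone who wants to try those changes, here is one way they could be sketched. Several things here are assumptions rather than the code behind the figure: the per-season counts are toy numbers, `main_roles` is a hypothetical hand-written lookup covering only the two characters named in this post, and `theme_minimal()` stands in for whatever neutral theme you prefer. The totals (175 and 29) come from the table earlier in the post.

```r
library(dplyr)
library(ggplot2)

# Toy per-season counts (hypothetical) with the real overall totals attached
plot_data <- tibble(
  season     = factor(c("1", "2", "3", "2", "3"), levels = c("1", "2", "3")),
  guest_star = c("Marcia Wallace", "Marcia Wallace", "Marcia Wallace",
                 "Joe Mantegna", "Joe Mantegna"),
  n          = c(2L, 5L, 4L, 1L, 2L),
  total_n    = c(175L, 175L, 175L, 29L, 29L)
)

# Hypothetical lookup of each star's main character
main_roles <- tibble(
  guest_star = c("Marcia Wallace", "Joe Mantegna"),
  main_role  = c("Edna Krabappel", "Fat Tony")
)

# Build one axis label per star: name, main role, and total appearances,
# then order the labels by total so the busiest star sits at the top
labelled <- plot_data %>%
  left_join(main_roles, by = "guest_star") %>%
  mutate(
    label = paste0(guest_star, " (", main_role, "), ", total_n, " appearances"),
    label = forcats::fct_reorder(label, total_n)
  )

p <- ggplot(labelled, aes(season, label)) +
  geom_point(aes(size = n)) +
  scale_size("# appearances") +
  labs(x = "Season", y = NULL) +
  theme_minimal()

p
```

Putting the role and total into the axis label keeps everything in one layer; adding the totals with `geom_text()` instead would also work if you want them on the right-hand side of the plot.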

Now we’re getting somewhere! The last thing we’ll do is add some Simpsons-themed colors and font, both of which are easily accessible through the tvthemes package by Ryo.

And…huzzah!

(Ok, the whole thing took me more than an hour by the time I went back and added the blog text to the markdown, but this was still a useful exercise for me, since I typically spend too much time messing around with tiny details.)

View the rest of my Tidy Tuesday contributions here.