Movie Pairs

Have you ever wanted to get a list of movies that a particular pair of people have worked on? For example, what movies have Tina Fey and Amy Poehler been in together? I’ve not seen a search for movie pairs addressed elsewhere, so I thought I’d give it a shot.

I looked for an R package that would give me free access to a movie API. I found TMDb, which provides an R interface to The Movie Database API. First, I had to sign up for an account. Then I had to log in and apply for an API key. I store the key in an environment variable so that the functions below can pick it up by default.

Sys.setenv(TMDB_API_KEY = "YOUR_KEY_HERE")
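A missing key is easy to overlook, since Sys.getenv() quietly returns an empty string when the variable is not set. Here is a minimal sketch of a guard. The helper name get_tmdb_key() is my own invention for illustration and is not part of the TMDb package.

```r
# Hypothetical helper (not part of the TMDb package): read the key from the
# environment and fail early with a clear message if it is missing.
get_tmdb_key <- function(var = "TMDB_API_KEY") {
  key <- Sys.getenv(var)
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.")
  }
  key
}
```

Putting the key in ~/.Renviron instead of calling Sys.setenv() in a script would also keep it out of any code I share.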

The first thing I want to do is find all of the movies associated with a given person. I wrote functions to handle this in two steps: finding the ID of a person with name2id(), and finding the movies associated with an ID with id2movies().

Finding the ID of a person was a little tricky because sometimes the search_person() function yields more than one person for a provided name. In this case, I show a data frame of names and the movies they’re known for. The user then has to rerun the call, indicating which person from this data frame they mean. I tested it on a fragment of a name, a nonexistent name, and a name shared by more than one person.

library(tidyverse)
library(TMDb)

name2id <- function(who, which=NA, mykey=Sys.getenv("TMDB_API_KEY")) {
  peeps <- search_person(mykey, who)
  if(peeps$total_results<1) {
    warning(paste("No", who, "found."))
    return(NA)
  } else {
    peeps2 <- peeps$results
    if(peeps$total_results>1) {
      if(is.na(which)) {
        peeps3 <- cbind(name=peeps2$name, 
          works=sapply(peeps2$known_for, function(df) 
            paste(df$title, collapse=", ")))
        print(peeps3)
        warning("Multiple people found.  Rerun setting the argument 'which' to the number of the person you mean.")
        return(NA)
      } else {
        peeps2 <- peeps2[which, ]
      }
    }
  }
  return(peeps2$id)
}

name2id("sigour")
## [1] 10205
name2id("xlty")
## [1] NA
name2id("Ron Howard")
##      name           
## [1,] "Ron Howard"   
## [2,] "Ronald Howard"
##      works                                                                 
## [1,] "A Beautiful Mind, The Da Vinci Code, Rush"                           
## [2,] "Murder She Said, The Browning Version, The Curse of the Mummy's Tomb"
## [1] NA
name2id("Ron Howard", 1)
## [1] 6159
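To see the disambiguation step without calling the API, here is a small sketch on a mocked response. The structure (a results data frame with a known_for list-column) is only what I infer from the fields name2id() reads, and the second ID is made up.

```r
# Mocked response with the fields name2id() reads from search_person().
# Names and titles are copied from the output above; the second ID is made up.
mock <- list(
  total_results = 2,
  results = data.frame(name = c("Ron Howard", "Ronald Howard"),
                       id = c(6159, 99999),
                       stringsAsFactors = FALSE)
)
# known_for is a list-column: one small data frame of titles per person.
mock$results$known_for <- list(
  data.frame(title = c("A Beautiful Mind", "The Da Vinci Code", "Rush")),
  data.frame(title = c("Murder She Said", "The Browning Version"))
)

# Collapse each person's titles into a single string, as name2id() does.
works <- sapply(mock$results$known_for, function(df)
  paste(df$title, collapse = ", "))
choices <- cbind(name = mock$results$name, works = works)
print(choices)

# Passing which = 1 amounts to keeping the first row.
picked_id <- mock$results[1, ]$id
```

The row numbers printed alongside the table are what the which argument refers to.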

Getting the movies for a given ID is also a bit tricky, because the discover_movie() function returns only one page of 20 movies at a time. So I first ask how many pages there are, then request each page and stack the results.

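id2movies <- function(id, mykey=Sys.getenv("TMDB_API_KEY")) {
  # First request is only used to learn how many pages of results exist
  init <- discover_movie(mykey, with_people=id)
  npages <- init$total_pages
  # Fetch every page (20 movies each) and stack them into one data frame
  movie.list <- lapply(seq_len(npages), function(p)
    discover_movie(mykey, with_people=id, page=p)$results)
  movies <- bind_rows(movie.list) %>%
    mutate(released=as.Date(release_date)) %>%
    # Movies with no release date sort first, then newest to oldest
    arrange(!is.na(released), desc(released)) %>%
    select(id, title, released)
  return(movies)
}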

library(DT)
datatable(id2movies(name2id("Emma Stone")))
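The paging pattern is easier to check offline. In this sketch, fake_discover() is a hypothetical stand-in for discover_movie() that serves 45 made-up movies in pages of 20; it only mimics the two fields I rely on, total_pages and results.

```r
# Hypothetical stand-in for discover_movie(): 45 fake movies, 20 per page.
fake_discover <- function(page = 1, per_page = 20) {
  all_movies <- data.frame(id = 1:45, title = paste("Movie", 1:45),
                           stringsAsFactors = FALSE)
  first <- (page - 1) * per_page + 1
  last <- min(page * per_page, nrow(all_movies))
  list(total_pages = ceiling(nrow(all_movies) / per_page),
       results = all_movies[first:last, ])
}

# Ask for page 1 to learn the page count, then fetch every page and stack them.
init <- fake_discover()
pages <- lapply(seq_len(init$total_pages),
                function(p) fake_discover(page = p)$results)
movies <- do.call(rbind, pages)
nrow(movies)
```

I use seq_len() here because 1:n counts backwards (1, 0) when n is zero, which would request pages that do not exist.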


Now I can apply these functions to a pair of names, then combine their movies to see which ones overlap. (Hover over points to identify the release year and movie title.)

library(plotly)

moviepair <- function(People, Which=c(NA, NA)) {
  Person1 <- id2movies(name2id(People[1], Which[1])) %>%
    mutate(person1 = -1)
  Person2 <- id2movies(name2id(People[2], Which[2])) %>%
    mutate(person2 = 1)
  pair <- full_join(Person1, Person2, by=c("id", "title", "released")) %>%
    mutate(
      person1 = replace(person1, is.na(person1), 0),
      person2 = replace(person2, is.na(person2), 0),
      Who=factor(recode(person1+person2+2, 
        `1`=People[1], `2`="Both", `3`=People[2]),
        levels=c(People[1], "Both", People[2]))
    )
  if(sum(pair$Who=="Both") < 1) stop("No movies in common.")
  p <- ggplot(pair, 
    aes(Who, released, text=paste0(substring(released, 1, 4), ": ", title))) +
    geom_point() +
    geom_text(data=pair[pair$Who=="Both", ], aes(Who, released, 
      label=paste("\n", title))) +
    labs(x="", y="") +
    theme_bw()
  ggplotly(p, tooltip="text")
}

moviepair(c("Tina Fey", "Amy Poehler"))
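The overlap logic in moviepair() rests on a small arithmetic trick: one person's movies are flagged -1, the other's 1, and missing flags become 0 after the join. Here is a sketch on toy data with made-up IDs and titles; the labels "First only" and "Second only" are placeholders for the two names.

```r
library(dplyr)

# Toy filmographies (made-up IDs and titles), flagged the way moviepair() does.
first <- data.frame(id = c(1, 2, 3), title = c("A", "B", "C"), person1 = -1,
                    stringsAsFactors = FALSE)
second <- data.frame(id = c(2, 3, 4), title = c("B", "C", "D"), person2 = 1,
                     stringsAsFactors = FALSE)

pair <- full_join(first, second, by = c("id", "title")) %>%
  mutate(
    person1 = replace(person1, is.na(person1), 0),
    person2 = replace(person2, is.na(person2), 0),
    # -1 + 0 + 2 = 1 (first only), -1 + 1 + 2 = 2 (both), 0 + 1 + 2 = 3 (second only)
    code = person1 + person2 + 2,
    who = c("First only", "Both", "Second only")[code]
  )

count(pair, who)
```

Movies B and C appear in both toy lists, so they are the ones labelled "Both".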

I’d like to make this blog post interactive, so anyone could put in a pair of names and see the resulting plot of movies. A Shiny app would do it. Perhaps another day.