Using gganimate with ggmap

Combining gganimate with ggmap makes it possible to animate geographic data. Here are a few examples using data from Nashville Open Data.

Maximilian Rohde https://maximilianrohde.com
01-06-2021
library(tidyverse)
library(ggmap)
library(gganimate)

# ggplot2 themes
library(cowplot)

# Formatting of HTML tables
library(kableExtra)

Overview

Animating your ggplot2 visualizations is easy using the gganimate package. But did you also know that gganimate can be used with the ggmap package to animate geographic data? Using data from Nashville Open Data, we’ll create an animation to visualize the spatial evolution of parks in Nashville over time.

Data cleaning and exploration

First, let’s load in the data. We can load the data directly from the URL using read_csv().

# Read in data
df <- read_csv("https://data.nashville.gov/resource/74d7-b74t.csv")

Let’s take a look at the first couple rows.

head(df, 2) %>% 
  # table styling for HTML output
  kable(format = "html") %>%
  kable_styling("striped") %>%
  scroll_box(width = "100%")
park_name acres year_established community_center nature_center playground ada_accessible restrooms_available picnic_shelters picnic_shelters_quantity dog_park baseball_fields basketball_courts volleyball soccer_fields football_multi_purpose_fields tennis_courts disc_golf skate_park swimming_pool spray_park golf_course walk_jog_paths hiking_trails horse_trails mountain_bike_trails boat_launch camping_available_by_permit fishing_by_permit lake canoe_launch community_garden historic_features notes mapped_location
Paradise Ridge Park 98.41 2013 Yes No Yes Yes Yes Yes 1 No No Yes No No No No No No No No No Yes No No No No No No No No No No community meeting space, gym, playground, walking trail, 1 reservable picnic shelter, outdoor basketball court 3000 Morgan Rd Joelton, TN 37080 (36.335583, -86.85995)
Crooked Branch Park 84.00 2012 NA NA No No No No 0 No No No No No No No No No No No No No No No No No No No No No No No Natural area with walking trails 116D Ray Ave Lakewood, TN 37138 (36.241016, -86.640834)

A few basic questions I have about the dataset are the following:

Are parks duplicated?

df %>%
  filter(duplicated(park_name)) %>%
  kable(format = "html") %>%
  kable_styling("striped") %>%
  scroll_box(width = "100%")
park_name acres year_established community_center nature_center playground ada_accessible restrooms_available picnic_shelters picnic_shelters_quantity dog_park baseball_fields basketball_courts volleyball soccer_fields football_multi_purpose_fields tennis_courts disc_golf skate_park swimming_pool spray_park golf_course walk_jog_paths hiking_trails horse_trails mountain_bike_trails boat_launch camping_available_by_permit fishing_by_permit lake canoe_launch community_garden historic_features notes mapped_location
Riverfront Park 5.29 1983 No No No No Yes No 0 No No No No No No No No No No No No No No No No Yes No No No No Yes No Amphitheater, Pleasure/Commercial Boat Docking 100 1st Ave South Nashville, TN (36.162279, -86.774364)
Riverfront Park 12.54 2015 No No No Yes No No 0 Yes No Yes No No Yes No No No No No No Yes No No No No No No No No No No Outdoor fitness equipment, dog park, Ascend Amphitheater 310 1st Ave S Nashville, TN (36.159082, -86.772376)
Riverfront Park 3.49 1977 No No No No No No 0 No No No No No No No No No No No No No No No No No No No No No No No James Robertson Statue, ,small stair amphitheater 170 1st Ave N Nashville, TN (36.164111, -86.77555)

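Several rows share the name Riverfront Park. Note that duplicated() flags only the second and later occurrences of a name, so the first Riverfront Park row is not shown in this table. Judging by year_established and mapped_location, these are different parks.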
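One caveat with this check: duplicated() marks only the repeats of a value, not its first occurrence. The sketch below shows one way to keep every row whose name appears more than once. It uses a small made-up tibble; `parks` and the `n_same_name` column are illustrative names, not part of the Nashville data.

```r
library(dplyr)

# Hypothetical toy data, not the Nashville dataset
parks <- tibble(
  park_name = c("A Park", "B Park", "A Park", "A Park"),
  year_established = c(1977, 1990, 1983, 2015)
)

# duplicated() flags only the second and later occurrences of a name
repeats_only <- parks %>%
  filter(duplicated(park_name))

# add_count() attaches the number of rows per name,
# so filtering on it keeps the first occurrence as well
all_shared <- parks %>%
  add_count(park_name, name = "n_same_name") %>%
  filter(n_same_name > 1)
```

Here `repeats_only` has two rows while `all_shared` has all three "A Park" rows, which makes it easier to compare the entries side by side.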

How many parks are there?

nrow(df)
[1] 123

123 parks in Nashville – not bad!

Which years are represented in the data?

range(df$year_established)
[1]    0 2015

Strange, it looks like the oldest park was established in year “0”. This must be a mistake.

# Establishment years of the 6 newest parks
sort(df$year_established) %>% tail()
[1] 2013 2013 2014 2014 2015 2015
# Establishment years of the 6 oldest parks
sort(df$year_established) %>% head()
[1]    0 1901 1903 1907 1909 1910

The true range of the data is 1901 to 2015, so the park recorded as established in year zero is a data-entry error. Let’s remove it from the dataset.

df <-
  df %>%
  filter(year_established != 0)
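One thing to keep in mind with this step: filter() keeps only rows where the condition evaluates to TRUE, so any row with a missing year_established would be dropped along with the zero. Whether the Nashville data contains such rows is not checked here. The sketch below uses a made-up tibble (`toy` is a hypothetical name) to show the behavior and one way to keep missing values if that is what you want.

```r
library(dplyr)

# Hypothetical toy data with one bad value and one missing value
toy <- tibble(year_established = c(0, 1901, NA, 2015))

# The comparison NA != 0 evaluates to NA, so the NA row is dropped too
kept <- toy %>%
  filter(year_established != 0)

# Explicitly keep missing years while still removing the zero
kept_with_na <- toy %>%
  filter(is.na(year_established) | year_established != 0)
```

Comparing nrow() before and after the filter is a quick way to confirm that only the intended row was removed.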

Now let’s take a look at the distribution of the years when parks were established. First, we’ll make a histogram.

df %>%
  ggplot() +
  aes(year_established) +
  geom_histogram(bins=30, color="black", fill="grey") +
  labs(
    title = "Number of parks established in Nashville per year",
    subtitle = "",
    x= "Year Established",
    y= "Frequency") +
  cowplot::theme_cowplot(font_family = "Source Sans Pro",
                         font_size = 12)

The rate of new park development looks to be increasing over time. An ECDF plot supports this observation.

df %>%
  ggplot()+
  aes(year_established) +
  stat_ecdf() +
  labs(
    title = "Cumulative distribution of Nashville parks over time",
    subtitle = "",
    x= "Year Established",
    y= "ECDF") +
  cowplot::theme_cowplot(font_family = "Source Sans Pro",
                         font_size = 12)
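To read specific values off an ECDF, base R's ecdf() returns a function that gives the proportion of observations at or below a given value. A small sketch with made-up years (the `toy_years` vector is hypothetical, not the Nashville data):

```r
# Hypothetical establishment years
toy_years <- c(1901, 1950, 1990, 2005, 2010, 2015)

# ecdf() returns a step function over the observed values
park_ecdf <- ecdf(toy_years)

# Proportion of parks established in or before 1950 (2 of 6 here)
share_by_1950 <- park_ecdf(1950)

# Proportion established after 2000 (3 of 6 here)
share_after_2000 <- 1 - park_ecdf(2000)
```

A curve that steepens toward recent years, as in the plot above, corresponds to a large share of parks being established late in the period.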

Creating the Animation

Now that we’ve cleaned and explored the data, let’s create an animation with gganimate and ggmap. I would like to visualize the spatial evolution of Nashville’s parks by year.

First, let’s try to just plot the locations of all the parks. We see that location is stored in the mapped_location column.

head(df$mapped_location)
[1] "3000 Morgan Rd\nJoelton, TN 37080\n(36.335583, -86.85995)"                 
[2] "116D Ray Ave\nLakewood, TN 37138\n(36.241016, -86.640834)"                 
[3] "21 Joelton Community Center Rd\nJoelton, TN 37080\n(36.316932, -86.870111)"
[4] "1266 Stones River Road\nHermitage, TN 37076\n(36.189647, -86.652059)"      
[5] "Grand Ave at 14th Ave\nNashville, TN 37212\n(36.1461984, -86.789001)"      
[6] "Snell Blvd at Panorama Dr\nNashville, TN \n(36.1797136, -86.8395478)"      

Luckily, the (latitude, longitude) coordinates are provided, but we need to extract them from the text. We can do this with a regular expression and the stringr package.

# Get a matrix of matched latitudes and longitudes
match <- str_match(df$mapped_location, "\\((.*), (.*)\\)")

# Add the latitudes and longitudes to the data frame
# data is recorded as character, so need to convert to numeric
df$lat <- match[,2] %>% as.numeric()
df$long <- match[,3] %>% as.numeric()

Let’s verify we extracted the coordinates correctly.

# View first 6 coordinates
df %>%
  select(lat, long) %>%
  head() %>%
  kable(format = "html") %>%
  kable_styling("striped") %>%
  scroll_box(width = "100%")
lat long
36.33558 -86.85995
36.24102 -86.64083
36.31693 -86.87011
36.18965 -86.65206
36.14620 -86.78900
36.17971 -86.83955
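As a cross-check on the extraction, here is a sketch of a slightly stricter pattern that only accepts digits, dots, and an optional minus sign inside the parentheses. The first two strings are copied from the output above; the third is a made-up entry without coordinates, added to show that non-matching rows become NA rather than raising an error. The variable names are illustrative.

```r
library(stringr)

locations <- c(
  "3000 Morgan Rd\nJoelton, TN 37080\n(36.335583, -86.85995)",
  "Snell Blvd at Panorama Dr\nNashville, TN \n(36.1797136, -86.8395478)",
  "No coordinates listed"  # hypothetical entry
)

# Capture group 1 is the latitude, capture group 2 is the longitude
coord_pattern <- "\\((-?[0-9.]+),\\s*(-?[0-9.]+)\\)"
coords <- str_match(locations, coord_pattern)

# Column 1 holds the full match; columns 2 and 3 hold the capture groups
lat <- as.numeric(coords[, 2])
long <- as.numeric(coords[, 3])

# Count rows where extraction failed
n_missing <- sum(is.na(lat) | is.na(long))
```

Checking `n_missing` on the real data frame would confirm that every park received a coordinate pair before plotting.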

Now we can make the plot. We start with the qmplot() function from ggmap, which is a shortcut for plotting on maps, just like qplot() in ggplot2. We pass in the latitude and longitude coordinates and the data frame. The argument maptype = "toner-lite" indicates the type of basemap to use as the background. We also specify alpha=0.5 so we can see when the points overlap. I would like larger parks to be represented by larger circles, so we can map size to acreage by aes(size=acres). Then we add the extra theming using the cowplot package.

qmplot(long, lat, data = df,
       maptype = "toner-lite", alpha=0.5) + 
  aes(size=acres) +
  labs(
    title = "Nashville Parks",
    x= "Longitude",
    y= "Latitude",
    caption = "Area corresponds to acreage \n Data available from Nashville Open Data") +
  cowplot::theme_cowplot(font_family = "Source Sans Pro",
                         font_size = 12) +
  theme(legend.position = "none")

Looks pretty good already. Now let’s make this animated!

We will add the transition_states() function from gganimate and specify that each state of the animation is determined by year_established. We also set subtitle = "Year: {closest_state}" to display the year of the current frame.

qmplot(long, lat, data = df,
       maptype = "toner-lite", alpha=0.5) + 
  aes(size=acres, group=year_established) +
  labs(
    title = "Nashville Parks",
    subtitle = "Year: {closest_state}",
    x= "Longitude",
    y= "Latitude",
    caption = "Area corresponds to acreage \n Data available from Nashville Open Data") +
  cowplot::theme_cowplot(font_family = "Source Sans Pro",
                         font_size = 12) +
  theme(legend.position = "none") +
  transition_states(year_established)

It’s animated now, but there are two problems.

First, the points are disappearing after each year. We can add shadow_mark(color="black") to have the points stay on the plot. We specify that the old points are colored black so that we can color the current points red, to highlight which points were just displayed.

Second, the passage of time is not constant. We want each step of the animation to advance by exactly one year, but in our current animation the states jump directly from one year present in the data to the next. To fix this, we convert year_established to a factor and fill in the missing years as extra levels.

df$year_established <-
  df$year_established %>%
  # convert to factor
  as.factor() %>%
  # add extra years
  fct_expand(1900:2019 %>% as.character) %>%
  # sort years
  fct_relevel(1900:2019 %>% as.character)
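To see what this does to the factor levels, here is a small sketch on a made-up vector of three years (`toy_years` and `all_years` are illustrative names). It also shows a base R alternative, factor() with an explicit levels argument, which should give the same result in a single step.

```r
library(forcats)

toy_years <- c(1903, 1901, 1905)      # hypothetical establishment years
all_years <- as.character(1900:1906)  # every year the animation should step through

# Same steps as above: convert, add the missing years, then sort the levels
f <- as.factor(toy_years)
f <- fct_expand(f, all_years)   # new levels are appended at the end
f <- fct_relevel(f, all_years)  # levels are put in chronological order

# Base R alternative: set the full range of levels directly
f_base <- factor(toy_years, levels = 1900:1906)

levels(f)
levels(f_base)
```

Both factors keep the original three values but carry seven levels, so an animation keyed on the levels can step through years that have no parks.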

Now that we’ve made those changes, let’s try again.

qmplot(long, lat, data = df,
       maptype = "toner-lite", alpha=0.5, color="red") + 
  aes(size=acres, group=year_established) +
  labs(
    title = "Nashville Parks",
    subtitle = "Year: {closest_state}",
    x= "Longitude",
    y= "Latitude",
    caption = "Area corresponds to acreage \n Data available from Nashville Open Data") +
  cowplot::theme_cowplot(font_family = "Source Sans Pro",
                         font_size = 12) +
  theme(legend.position = "none") +
  transition_states(year_established) +
  shadow_mark(color="black")

Looks good! But wait a second… the animation only goes to 1950. Wasn’t it supposed to go to 2015? This is a little quirk of gganimate. By default, animate() renders only 100 frames. For the transition_states() animation, by default a single frame is allocated for each state, and another frame is allocated for the transition to the next state, so 100 frames can represent only 50 years of data. The animation is cut short because we have more than 50 years of data.
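Taking that accounting at face value (one frame per state plus one per transition, which is this post's description rather than a guarantee about gganimate's internals), the frame budget can be worked out with a little arithmetic. The variable names below are illustrative.

```r
# Factor levels created earlier: one state per year from 1900 to 2019
n_states <- length(1900:2019)

# Assumption from the paragraph above: 1 state frame + 1 transition frame
frames_per_state <- 2

# Years reachable with the default of 100 frames
years_with_default <- 100 / frames_per_state

# Frames needed to reach the last state
frames_needed <- n_states * frames_per_state
```

This gives 50 years for the default and 240 frames for the full range, so any nframes value comfortably above 240 should let the animation run to the end.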

Let’s fix this by saving the animation to a variable, and then using the animate() function to increase the number of frames.

parks_anim <- 
  qmplot(long, lat, data = df,
       maptype = "toner-lite", alpha=0.5, color="red") + 
  aes(size=acres, group=year_established) +
  labs(
    title = "Nashville Parks",
    subtitle = "Year: {closest_state}",
    x= "Longitude",
    y= "Latitude",
    caption = "Area corresponds to acreage \n Data available from Nashville Open Data") +
  cowplot::theme_cowplot(font_family = "Source Sans Pro",
                         font_size = 12) +
  theme(legend.position = "none") +
  transition_states(year_established) +
  shadow_mark(color="black")

animate(
  parks_anim,
  nframes = 300 # number of frames to compute
)

My preferred method of rendering the animation is to use ffmpeg, instead of the default GIF renderer, because it creates videos (.mp4) rather than GIFs. You will need to install ffmpeg on your computer separately. Using ffmpeg also allows for finer control over the frame rate of the animation and creates smaller files. I’ll show how to use it below.

The animate() function has parameters for duration (total duration in seconds), fps (frames per second), and nframes (total number of frames). You can specify any two. For our case, we give the duration and number of frames, and gganimate figures out the proper frame rate to fit the specified number of frames into the specified number of seconds.
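The relationship between the three parameters is simple arithmetic: nframes = fps × duration. The sketch below works through the values used in this post (a 15 second duration and 768 frames); the 30 fps alternative is a made-up example, and the variable names are illustrative.

```r
# Values used in this post
duration <- 15  # seconds
nframes <- 768  # total frames

# Frame rate implied by fitting 768 frames into 15 seconds
implied_fps <- nframes / duration

# Going the other way: frames needed for a hypothetical 30 fps video
target_fps <- 30
frames_at_target <- target_fps * duration
```

The implied rate here is 51.2 frames per second. If you would rather fix a conventional frame rate, specifying fps and duration and letting nframes follow is the other natural pairing.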

We also set res=300 to increase the resolution. This has the side effect of making the font appear larger, so we decrease the font size in the call to theme_cowplot().

Be warned that this may take a bit of time to animate. Usually, I put my code in a .R file and run it from the terminal rather than in RStudio; running from the terminal seems to crash less and render more quickly. Otherwise, when I render the animations in an R Markdown file, I set cache=TRUE in the chunk options so that I don’t have to render each time I knit the document.

parks_anim <- 
  qmplot(long, lat, data = df,
       maptype = "toner-lite", alpha=0.5, color="red") + 
  aes(size=acres, group=year_established) +
  labs(
    title = "Nashville Parks",
    subtitle = "Year: {closest_state}",
    x= "Longitude",
    y= "Latitude",
    caption = "Area corresponds to acreage \n Data available from Nashville Open Data") +
  cowplot::theme_cowplot(font_family = "Source Sans Pro",
                         font_size = 10) +
  theme(legend.position = "none") +
  transition_states(year_established) +
  shadow_mark(color="black")

animate(
  parks_anim,
  duration=15, # duration of the animation in seconds
  nframes=768, # number of frames to compute
  height = 6,
  width = 6,
  units = "in",
  res = 300, # resolution of the output
  renderer = ffmpeg_renderer() # render to video with ffmpeg
  )