Author
Published

Wednesday, September 1, 2021

I’m delighted to announce the release of taylor 0.2.0. taylor provides data on Taylor Swift’s discography, including lyrics from Genius and song characteristics from Spotify.

You can install it from CRAN with:

install.packages("taylor")

This blog post will highlight the main features of the package.

library(taylor)

Data sets

The main focus of taylor is to provide data for Taylor Swift’s discography. There most data sets. The first is taylor_all_songs. This data set contains meta data about each song (e.g., album, track number, release date); audio characteristics from the Spotify API such as the key, danceability, and valence; and a list column, lyrics, that contains the lyrics for each song. See ?taylor_all_songs for a complete description of all the variables that are included.

taylor_all_songs
#> # A tibble: 213 × 29
#>    album_name  ep    album_release track_number track_name     artist  featuring
#>    <chr>       <lgl> <date>               <int> <chr>          <chr>   <chr>    
#>  1 Taylor Swi… FALSE 2006-10-24               1 Tim McGraw     Taylor… <NA>     
#>  2 Taylor Swi… FALSE 2006-10-24               2 Picture To Bu… Taylor… <NA>     
#>  3 Taylor Swi… FALSE 2006-10-24               3 Teardrops On … Taylor… <NA>     
#>  4 Taylor Swi… FALSE 2006-10-24               4 A Place In Th… Taylor… <NA>     
#>  5 Taylor Swi… FALSE 2006-10-24               5 Cold As You    Taylor… <NA>     
#>  6 Taylor Swi… FALSE 2006-10-24               6 The Outside    Taylor… <NA>     
#>  7 Taylor Swi… FALSE 2006-10-24               7 Tied Together… Taylor… <NA>     
#>  8 Taylor Swi… FALSE 2006-10-24               8 Stay Beautiful Taylor… <NA>     
#>  9 Taylor Swi… FALSE 2006-10-24               9 Should've Sai… Taylor… <NA>     
#> 10 Taylor Swi… FALSE 2006-10-24              10 Mary's Song (… Taylor… <NA>     
#> # … with 203 more rows, and 22 more variables: bonus_track <lgl>,
#> #   promotional_release <date>, single_release <date>, track_release <date>,
#> #   danceability <dbl>, energy <dbl>, key <int>, loudness <dbl>, mode <int>,
#> #   speechiness <dbl>, acousticness <dbl>, instrumentalness <dbl>,
#> #   liveness <dbl>, valence <dbl>, tempo <dbl>, time_signature <int>,
#> #   duration_ms <int>, explicit <lgl>, key_name <chr>, mode_name <chr>,
#> #   key_mode <chr>, lyrics <list>

We can see the lyrics using tidyr::unnest(). For example, if we want all of the lyrics from Lover, we can see those with

library(tidyverse)

taylor_all_songs %>%
  filter(album_name == "Lover") %>%
  select(album_name, track_name, lyrics) %>%
  unnest(lyrics)
#> # A tibble: 931 × 6
#>    album_name track_name        line lyric               element  element_artist
#>    <chr>      <chr>            <int> <chr>               <chr>    <chr>         
#>  1 Lover      I Forgot That Y…     1 How many days did … Verse 1  Taylor Swift  
#>  2 Lover      I Forgot That Y…     2 'Bout how you did … Verse 1  Taylor Swift  
#>  3 Lover      I Forgot That Y…     3 Lived in the shade… Verse 1  Taylor Swift  
#>  4 Lover      I Forgot That Y…     4 'Til all of my sun… Verse 1  Taylor Swift  
#>  5 Lover      I Forgot That Y…     5 And I couldn't get… Verse 1  Taylor Swift  
#>  6 Lover      I Forgot That Y…     6 In my feelings mor… Verse 1  Taylor Swift  
#>  7 Lover      I Forgot That Y…     7 Your name on my li… Verse 1  Taylor Swift  
#>  8 Lover      I Forgot That Y…     8 Free rent, living … Verse 1  Taylor Swift  
#>  9 Lover      I Forgot That Y…     9 But then something… Pre-Cho… Taylor Swift  
#> 10 Lover      I Forgot That Y…    10 I forgot that you … Chorus   Taylor Swift  
#> # … with 921 more rows

The lyrics is are in a nice tidy format with one row in the tibble per line in the song. The goal is to make the lyrics as ready for text analysis as possible. If you’re into that sort of thing, I highly recommended checking out Emil Hvitfeldt and Julia Silge’s Supervised Machine Learning for Text Analysis in R.

In addition to taylor_all_songs, there are two additional data sets included. The first is taylor_album_songs. This is just a filtered version of taylor_all_songs that includes only songs from Taylor’s studio albums. This means that single-only releases (e.g. Only the Young, Christmas Tree Farm) are not included, nor are songs that Taylor is only featured on (e.g., Renegade by Big Red Machine, Gasoline (Remix) by HAIM). Additionally, this data only includes versions of the songs Taylor owns where possible. For example, Fearless (Taylor’s Version) is included in taylor_album_songs, and the original Fearless. Additionally, although Red is currently included, it will be removed in favor of Red (Taylor’s Version) when that album is released in November.

taylor_all_songs
#> # A tibble: 213 × 29
#>    album_name  ep    album_release track_number track_name     artist  featuring
#>    <chr>       <lgl> <date>               <int> <chr>          <chr>   <chr>    
#>  1 Taylor Swi… FALSE 2006-10-24               1 Tim McGraw     Taylor… <NA>     
#>  2 Taylor Swi… FALSE 2006-10-24               2 Picture To Bu… Taylor… <NA>     
#>  3 Taylor Swi… FALSE 2006-10-24               3 Teardrops On … Taylor… <NA>     
#>  4 Taylor Swi… FALSE 2006-10-24               4 A Place In Th… Taylor… <NA>     
#>  5 Taylor Swi… FALSE 2006-10-24               5 Cold As You    Taylor… <NA>     
#>  6 Taylor Swi… FALSE 2006-10-24               6 The Outside    Taylor… <NA>     
#>  7 Taylor Swi… FALSE 2006-10-24               7 Tied Together… Taylor… <NA>     
#>  8 Taylor Swi… FALSE 2006-10-24               8 Stay Beautiful Taylor… <NA>     
#>  9 Taylor Swi… FALSE 2006-10-24               9 Should've Sai… Taylor… <NA>     
#> 10 Taylor Swi… FALSE 2006-10-24              10 Mary's Song (… Taylor… <NA>     
#> # … with 203 more rows, and 22 more variables: bonus_track <lgl>,
#> #   promotional_release <date>, single_release <date>, track_release <date>,
#> #   danceability <dbl>, energy <dbl>, key <int>, loudness <dbl>, mode <int>,
#> #   speechiness <dbl>, acousticness <dbl>, instrumentalness <dbl>,
#> #   liveness <dbl>, valence <dbl>, tempo <dbl>, time_signature <int>,
#> #   duration_ms <int>, explicit <lgl>, key_name <chr>, mode_name <chr>,
#> #   key_mode <chr>, lyrics <list>

Finally, there is a small data, taylor_albums, that contains meta data for Taylor’s albums, including the release date and the Metacritic score. A full description of all the variables can be seen using ?taylor_albums.

taylor_albums
#> # A tibble: 12 × 4
#>    album_name                          ep    album_release metacritic_score
#>    <chr>                               <lgl> <date>                   <int>
#>  1 Taylor Swift                        FALSE 2006-10-24                  NA
#>  2 The Taylor Swift Holiday Collection TRUE  2007-10-14                  NA
#>  3 Beautiful Eyes                      TRUE  2008-07-15                  NA
#>  4 Fearless                            FALSE 2008-11-11                  73
#>  5 Speak Now                           FALSE 2010-10-25                  77
#>  6 Red                                 FALSE 2012-10-22                  77
#>  7 1989                                FALSE 2014-10-27                  76
#>  8 reputation                          FALSE 2017-11-10                  71
#>  9 Lover                               FALSE 2019-08-23                  79
#> 10 folklore                            FALSE 2020-07-24                  88
#> 11 evermore                            FALSE 2020-12-11                  85
#> 12 Fearless (Taylor's Version)         FALSE 2021-04-09                  82

Other Features

Although the main focus of taylor is to provide access to audio characteristics and lyrics, I also built in a few additional features, mostly as an opportunity to learn some new tools! Each of these features will get detail in future posts, but I’ll provide a high level overview here.

Color Palettes

First, inspired by Josiah Parry’s work on the cpcinema package, taylor includes a special vector class that allow users to build and visualize their own color palettes. The color palettes are built using the vctrs package. In a future post, I’ll describe how the color_palette class works internally. Several palettes are built into taylor based on Taylor’s album covers. For example, here a palette based on Taylor’s debut album, Taylor Swift:

album_palettes$taylor_swift
#> <color_palette[5]>
#>     #1D4737 
#>     #1BAEC6 
#>     #523d28 
#>     #AD8562 
#>     #E7DBCC

The full printing is not rendered here, but in your console, you will see a color swatch next to each hex code showing the color. All of the album-based palettes can be see, with full rendering, can be seen on the taylor website. In addition, there is an album_compare palette which includes one color from each individual album palette.

ggplot2 Scales

The other feature is a set of ggplot2 color scales that can be used to easily apply the color palettes to plots. For an example, let’s look at the palmerpenguins data. We can use scale_color_taylor_d() to apply an album color palette to a discrete variable. As might be expected there are also scale_color_taylor_c() and scale_color_taylor_b() variants for continuous and binned scales as well.

library(palmerpenguins)

ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(aes(color = species, shape = species)) +
  # scale_color_manual(values = album_palettes$evermore[1:3])
  scale_color_taylor_d(album = "Lover")

There is also the scale_fill_albums() function, which will automatically map colors from each album palette to the appropriate album name.

taylor_albums %>%
  filter(!ep) %>%
  mutate(album_name = factor(album_name, levels = album_levels)) %>%
  ggplot(aes(x = metacritic_score, y = album_name)) +
  geom_col(aes(fill = album_name), show.legend = FALSE) +
  scale_fill_albums()

For more examples, check out the plotting article on the taylor website. In a future post I’ll describe how I built ggplot2 scales on top of the color palettes.

Conclusion

This is the initial release of taylor, so I expect that you will find bugs, or maybe even a song or two that I missed. If you do, please file an issue on the GitHub repository. I’m planning another release of taylor in November after Red (Taylor’s Version) drops, so I will aim to have any fixed in place by then. In the mean time, stay tuned for my upcoming posts on using vctrs classes and creating ggplot2 scales within taylor!

Acknowledgments

Featured photo by Getúlio Moraes on Unsplash.