Tidy Sports Analytics, Part 4: tidyverse

This is the final post in the tidy sports analytics series, in which I’ve been using play-by-play from the 2016 NFL season to demonstrate the power of the tidyverse. Previously, I’ve discussed:

  • Part 1: Data manipulation using dplyr;
  • Part 2: Data reshaping and tidying using tidyr;
  • Part 3: Data visualization using ggplot2.

This post doesn’t feature any new data analysis. Instead, I want to use this last post to talk about the tidyverse more generally and cover some of the other advantages of using these packages for data analysis.

tidyverse

Although I chose three of the main tidyverse packages to highlight in these posts, there are many more packages that fall under this umbrella. In addition to dplyr, tidyr, and ggplot2, the core tidyverse also includes readr for reading in data, purrr for functional programming, and tibble for a modern take on the data frame. There are also packages outside of the core tidyverse for importing data, wrangling data, programming, and modeling. These packages target more specific use cases than the general-purpose core packages. For example, lubridate is used for date-time variables, magrittr provides the forward pipe (%>%) along with other piping operators, and glue makes it easier to combine strings with data.
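To give a feel for two of those more specialized packages, here is a minimal sketch that uses lubridate and glue together. The kickoff time and team names are placeholders I made up for illustration; they are not taken from the play-by-play data used in this series.

```r
library(lubridate)
library(glue)

# Placeholder values, invented for illustration only
home <- "Home Team"
away <- "Away Team"

# lubridate parses a date-time string (UTC by default)
kickoff <- ymd_hm("2016-09-11 13:00")

# Pull out the calendar date, and shift the date-time by one week
game_date <- as_date(kickoff)
next_week <- kickoff + days(7)

# glue evaluates the R expressions inside the braces
# and pastes the results into the string
summary_line <- glue("{away} at {home} on {game_date}")
print(summary_line)
```

The point is less about these particular functions and more that the date handling and the string building each come from a small, focused package.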

Because these packages share a consistent API, they all work with the pipe operator, which makes an analysis more streamlined. Piped code reads as a sequence of steps, so it is easier for others to follow, which in turn facilitates code review and reproducible research.
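As a rough illustration of the readability argument, the sketch below runs the same dplyr steps twice: once as nested function calls and once with the pipe. The `plays` data frame is a toy example I invented for this comparison, not the real 2016 play-by-play data.

```r
library(dplyr)

# Toy play-by-play rows, invented for illustration
plays <- tibble(
  team  = c("DEN", "DEN", "CAR", "CAR", "CAR"),
  type  = c("pass", "run", "pass", "pass", "run"),
  yards = c(12, 3, 7, -2, 5)
)

# Nested calls: the steps have to be read from the inside out
nested <- arrange(
  summarise(
    group_by(filter(plays, type == "pass"), team),
    avg_yards = mean(yards)
  ),
  desc(avg_yards)
)

# Piped: the same steps, read from top to bottom
piped <- plays %>%
  filter(type == "pass") %>%
  group_by(team) %>%
  summarise(avg_yards = mean(yards)) %>%
  arrange(desc(avg_yards))

print(piped)
```

Both versions produce the same result. The piped version simply lists the steps in the order they happen: filter, group, summarise, arrange.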

Community support

In addition to the programming benefits of the tidyverse, there is a supportive community that contributes to the development of the tidyverse packages and environment. The “tidyverse” tags on Twitter and Stack Overflow are great places to go for help. There, you can ask questions and get feedback to help solve whatever problem you’re stuck on.

In addition, there are many developers who are creating tidyverse-adjacent packages. These packages aren’t technically part of the tidyverse, but they enhance and extend its functionality. For example, there is a large ecosystem of extensions built around ggplot2. These extensions provide additional compatible tools for things such as network graphs, joy plots, and animation. Another good example is the tidytext package, which is used for analyzing text.

However, these are not the only ways to contribute to the tidyverse. You don’t have to be a developer or even be able to answer questions on Twitter or Stack Overflow in order to contribute. You can also contribute by using the reprex package to report issues that you find, or by contributing documentation to existing packages. No matter how advanced your R skills are, there are ways for you to not only use the tidyverse, but also contribute to the community!

Conclusion

The tidyverse is a great resource for the greater R community. This group of packages provides tools for data science that have a consistent API and greatly improve the readability and reproducibility of your code. In this series of posts, I used NFL play-by-play data as a use case to show how the main components of the tidyverse work. We talked about using:

  • dplyr for data manipulation;
  • tidyr for reshaping and tidying data;
  • ggplot2 for data visualization.

With the huge amount of sports data that is now available for analysis, the tidyverse makes it possible to wrangle and manipulate that data efficiently with just a few lines of code, making it an invaluable resource for this and many other analysis projects. For more tidyverse resources, check out:
