This is the final post in the tidy sports analytics series, in which I’ve been using play-by-play data from the 2016 NFL season to demonstrate the power of the tidyverse. Previously, I’ve covered data manipulation using dplyr (Part 1), data reshaping and tidying using tidyr (Part 2), and data visualization using ggplot2 (Part 3).
This post doesn’t feature any new data analysis. Instead, I want to use this last post to talk about the tidyverse more generally and to cover some of the other advantages of using these packages for data analysis.
This is the third post in the tidy sports analytics series, in which I’ve been demonstrating how the collection of tidyverse packages can be used to explore and analyze sports data, specifically the 2016 NFL play-by-play data from Armchair Analysis. Part one of the series showed how dplyr can be used for data manipulation, and part two demonstrated reshaping and tidying data using tidyr. This post focuses on data visualization using ggplot2.
This is the second in a series of posts that demonstrate how the tidyverse can be used to easily explore and analyze NFL play-by-play data. In part one, I used the dplyr package to calculate the success rate of each NFL offense during the 2016 season. However, when we left off, I noted that we should really look at the success rates of both offenses and defenses to get a better idea of which teams were the best overall.
Welcome to the first in a series of blog posts where I’ll be using sports data to demonstrate the power of the tidyverse tools in the context of sports analytics. The tidyverse is a suite of packages developed mainly by Hadley Wickham, with contributions from over 100 other people in the R community. The goal of the tidyverse is to provide easy-to-use R packages for a data science workflow, all of which follow a consistent philosophy and API.
Models developed for the prediction of individual games, European domestic leagues, and the UEFA Champions League.