A while back, I was inspired by this Twitter exchange to create a bot that would tweet out tidyverse-related material. Armchair Business Man Suggestion: Ever thought about having a Twitter account that automatically tweets each new topic accompanied with a link to said post, à la Reddit? @mxlearn and other forums? Would be super useful for the #rstats commune. — Sudo | Science📊 (@AgentZeroNine) November 14, 2017 Last week I finally had enough time to sit down and put in some work to get this idea up and running!
This time last year, I submitted a graphic to the Educational Measurement: Issues and Practice (EM:IP) cover showcase competition. In April at the annual National Council on Measurement in Education conference, it was announced that I was one of four winners who would be featured on the cover of EM:IP this year. Earlier this week, the issue with my graphic was released! The graphic demonstrates different levels of compensation in multidimensional item response theory (MIRT) models.
This is the final post in the tidy sports analytics series, in which I’ve been using play-by-play data from the 2016 NFL season to demonstrate the power of the tidyverse. Previously, I’ve discussed: Part 1: Data manipulation using dplyr; Part 2: Data reshaping and tidying using tidyr; Part 3: Data visualization using ggplot2. This post doesn’t feature any new data analysis. Instead, I want to use this last post to talk about the tidyverse more generally and cover some of the other advantages of using these packages for data analysis.
This is the third post in the tidy sports analytics series. In this series, I’ve been demonstrating how the collection of tidyverse packages can be used to explore and analyze sports data. Specifically, I’ve been using the 2016 NFL play-by-play data from Armchair Analysis. Part one in the series showed how dplyr can be used for data manipulation, and part two demonstrated reshaping and tidying data using tidyr. This post focuses on data visualization using ggplot2.
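As a flavor of the kind of plot the ggplot2 post builds, here is a minimal sketch. The team names and rates below are made-up placeholder values, not results from the Armchair Analysis data:

```r
library(dplyr)
library(ggplot2)

# Hypothetical success rates for illustration only; the real post
# computes these from 2016 NFL play-by-play data.
rates <- tibble(
  team    = c("KC", "NE", "DAL", "CLE"),
  offense = c(0.48, 0.52, 0.50, 0.40),
  defense = c(0.45, 0.44, 0.47, 0.55)
)

# Scatter offensive vs. defensive success rate, labeling each point.
p <- ggplot(rates, aes(x = offense, y = defense, label = team)) +
  geom_point() +
  geom_text(vjust = -0.7) +
  labs(x = "Offensive success rate", y = "Defensive success rate")
p
```

Because ggplot2 builds plots in layers, swapping in the full 32-team data only changes the `rates` tibble, not the plotting code.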
This is the second in a series of posts that demonstrates how the tidyverse can be used to easily explore and analyze NFL play-by-play data. In part one, I used the dplyr package to calculate the offensive success rate of each NFL offense during the 2016 season. However, when we left off, I noted that we really should look at the success rate of both offenses and defenses in order to get a better idea of which teams were the best overall.
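The dplyr calculation from part one can be sketched roughly as follows. The column names and the toy plays are assumptions for illustration; the success definition shown (gaining 40% of yards-to-go on 1st down, 60% on 2nd, 100% on 3rd and 4th) is a common convention, not necessarily the exact one used with the Armchair Analysis data:

```r
library(dplyr)

# Tiny hypothetical play-by-play table; real columns differ.
pbp <- tibble(
  off        = c("KC", "KC", "NE", "NE", "NE"),
  down       = c(1, 2, 1, 3, 1),
  yds_to_go  = c(10, 4, 10, 2, 10),
  yds_gained = c(5, 2, 10, 1, 40)
)

# Flag each play as a success, then average by offense.
success_rate <- pbp %>%
  mutate(success = case_when(
    down == 1 ~ yds_gained >= 0.4 * yds_to_go,
    down == 2 ~ yds_gained >= 0.6 * yds_to_go,
    TRUE      ~ yds_gained >= yds_to_go
  )) %>%
  group_by(off) %>%
  summarize(success_rate = mean(success))
```

The same `group_by()` + `summarize()` pattern extends directly to the defensive side by grouping on the defense column instead.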
Welcome to the first in a series of blog posts where I’ll be using sports data to demonstrate the power of the tidyverse tools in the context of sports analytics. The tidyverse is a suite of packages developed mainly by Hadley Wickham, with contributions from over 100 other people in the R community. The goal of the tidyverse is to provide easy-to-use R packages for a data science workflow that all follow a consistent philosophy and API.
I recently converted my website from Jekyll to Hugo with blogdown. If you haven’t tried out blogdown yet, Yihui Xie just hosted a webinar that does a great job of introducing the package. This post won’t focus on how to use blogdown to create a website, but rather how to host that website on GitHub pages and use Travis-CI to automatically update the website. For this post, I’m assuming that you’re making a user or organization site.
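The Travis-CI side of that setup can be sketched with a `.travis.yml` along these lines. The branch names (`sources` for the blogdown source, `master` as the deploy target for a user site) and the `$GITHUB_TOKEN` environment variable are assumptions you would adapt to your own repository:

```yaml
language: r
cache: packages

install:
  - Rscript -e 'install.packages("blogdown")'

script:
  - Rscript -e 'blogdown::install_hugo(); blogdown::build_site()'

deploy:
  provider: pages
  skip_cleanup: true
  github_token: $GITHUB_TOKEN  # personal access token, set in Travis repo settings
  local_dir: public            # Hugo's rendered output directory
  target_branch: master        # user/organization sites serve from master
  on:
    branch: sources            # only deploy when the source branch builds
```

With this in place, pushing a new post to the source branch triggers a build, and Travis pushes the rendered `public/` directory to the branch GitHub Pages serves.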
If you haven’t seen it yet, there’s a great example of why it’s always important to visualize your data making its way around the Twitter-verse. A great demonstration of why we need to plot the data and never trust statistics tables! https://t.co/JyUb57v0or — Taha Yasseri (@TahaYasseri) May 1, 2017 Despite looking very different, all of these datasets have the same summary statistics to two decimal places. You can download the datasets, get details about the project, and read the whole paper by Justin Matejka and George Fitzmaurice here.
March Madness officially tips off tomorrow with the First Four games in Dayton before the round of 64 begins on Thursday. In this post, we’ll look at each team’s chance of advancing and winning the national title. We’ll also look at who was helped and hurt most by how the committee seeded the tournament. As always, the code and data for this post are available on my GitHub page. The Ratings The team ratings come from my sports analytics website, Hawklytics.
The Big 12/SEC challenge tips off tomorrow. This will be the fourth year of this competition, and the Big 12 has never lost. In this post, we’ll use a Monte Carlo simulation to estimate the Big 12’s chances of continuing this streak for another year. As always, the code and data for this post are available on my GitHub page. The Ratings The team ratings come from my sports analytics website, Hawklytics.
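The shape of that Monte Carlo simulation can be sketched in a few lines of R. The per-game win probabilities below are made-up placeholders, not the actual Hawklytics ratings:

```r
# Hypothetical Big 12 win probability for each of the ten challenge games.
p_big12 <- c(0.65, 0.55, 0.48, 0.72, 0.60, 0.52, 0.58, 0.45, 0.63, 0.50)

set.seed(42)
n_sims <- 10000

# Simulate each season of the challenge: one uniform draw per game,
# counting how many games the Big 12 wins.
wins <- replicate(n_sims, sum(runif(length(p_big12)) < p_big12))

# The Big 12 takes the challenge outright with 6+ wins; 5-5 is a tie.
prob <- mean(wins > 5)
prob
```

Repeating the ten-game slate thousands of times turns a set of single-game probabilities into a distribution over challenge outcomes, which is exactly what a closed-form calculation of the streak's survival odds would be awkward to produce by hand.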