Recreating the Datasaurus Dozen Using tweenr and ggplot2

If you haven’t seen it yet, a great example of why it’s always important to visualize your data is making its way around the Twitter-verse:

"A great demonstration of why we need to plot the data and never trust statistics tables!" (Taha Yasseri, @TahaYasseri, May 1, 2017)

Despite looking very different, all of these datasets have the same summary statistics to two decimal places. You can download the datasets, get details about the project, and read the whole paper by Justin Matejka and George Fitzmaurice here.

Previewing the 2017 Men's NCAA Basketball Tournament

March Madness officially tips off tomorrow with the First Four games in Dayton before the round of 64 begins on Thursday. In this post, we’ll look at each team’s chance of advancing and winning the national title. We’ll also look at who was helped and hurt most by how the committee seeded the tournament. As always, the code and data for this post are available on my GitHub page.

Predicting the Winner of the 2017 Big 12/SEC Challenge

The Big 12/SEC Challenge tips off tomorrow. This will be the fourth year of this competition, and the Big 12 has never lost. In this post, we’ll use a Monte Carlo simulation to estimate the Big 12’s chances of continuing this streak for another year. As always, the code and data for this post are available on my GitHub page.

The Ratings

The team ratings come from my sports analytics website, Hawklytics.
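To make the Monte Carlo idea concrete, here is a minimal sketch in Python (not the R code from the post). It assumes each game already has a win probability for the Big 12 team and that games are independent. The function name `simulate_challenge` and the probabilities in the usage note are hypothetical and are not taken from Hawklytics.

```python
import random


def simulate_challenge(win_probs, n_sims=10_000, seed=0):
    """Estimate series outcomes for one conference by simulation.

    win_probs: per-game win probabilities for the conference of interest
               (assumed inputs, one entry per game).
    Returns the share of simulated series that were won, tied, or lost.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    n_games = len(win_probs)
    wins = ties = 0
    for _ in range(n_sims):
        # Play every game once: a uniform draw below p counts as a win.
        games_won = sum(rng.random() < p for p in win_probs)
        if 2 * games_won > n_games:
            wins += 1
        elif 2 * games_won == n_games:
            ties += 1
    return {
        "win": wins / n_sims,
        "tie": ties / n_sims,
        "loss": (n_sims - wins - ties) / n_sims,
    }
```

Calling `simulate_challenge([0.6, 0.55, 0.7, 0.4])` with made-up probabilities returns the estimated chance of winning, tying, or losing that four-game series. Ties are tracked separately because "never lost" covers both a win and a split.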

Making Win Probability Plots with ggplot2

Last week I premiered my in-game win probabilities for KU basketball. These have been available for a while on Hawklytics, but were always made after the game rather than in real time. Now that they are going live, I thought it would be helpful to document how these are made using R and the ggplot2 package.

Calculating Win Probability

The win probabilities are based on the Elo ratings that I calculate for the team ratings on Hawklytics.
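For readers unfamiliar with Elo, the textbook logistic formula turns a rating gap into a pregame win probability. The Python sketch below shows only that standard formula. The 400-point scale is the conventional default and an assumption here, and the post's actual model (including how it updates during a game) may differ.

```python
def elo_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard Elo expected score for team A against team B.

    scale is the rating gap that corresponds to 10-to-1 odds
    (400 is the conventional value, assumed here).
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))
```

Two evenly rated teams get 0.5 each, and a team rated 100 points higher than its opponent comes out at roughly 0.64.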

Evaluating Election Forecasts

After more than a year and a half, the 2016 presidential election is finally over, with Donald Trump projected to win. This is in contrast to many of the election forecasts, which almost unanimously predicted a victory for Hillary Clinton. Now that the results are in, we can finally answer the question of who did the best (or least worst?) job of forecasting the election. There are several ways we could look at this question, but for the purpose of this analysis we’ll focus on two.