Author

Jake Thompson

Published

June 14, 2021

2019 CRMDA Summer Statistical Institute, 2021 AAI Summer Research Methods Camp

Tidy Data Science with the tidyverse and tidymodels

This week-long workshop provides an introduction to data science using R. Participants learn an entire data analysis pipeline, from importing data to creating reproducible reports. Along the way, we introduce several core tidyverse packages, which work together to make the process of data analysis seamless. These include dplyr for data transformation, tidyr for data tidying, and ggplot2 for data visualization. We then introduce core tidymodels packages for building modeling pipelines, including parsnip for creating model specifications, recipes for feature engineering, and yardstick for evaluating model performance. Finally, we discuss how to use rmarkdown to create reproducible reports.

This workshop has previously been presented at:

  • 2019 Summer Statistical Institute, hosted by the Center for Research Methods and Data Analysis at the University of Kansas
  • 2021 Summer Research Methods Camp, hosted by the Achievement and Assessment Institute at the University of Kansas

See the workshop websites for more information, including links to the GitHub repositories containing all workshop materials.