As cheating on high-stakes tests continues to threaten the validity of score interpretations, approaches for detecting it have proliferated. Most research focuses on individual scores, but recent events show that group-level cheating also occurs. The present simulation study extends the Bayesian Hierarchical Linear Model (BHLM) for detecting group-level aberrance to a hierarchical item response theory (IRT) model. Results show good parameter recovery across conditions, suggesting that the model could be successfully applied to real-world data. An evaluation of decisions reached under various plausible decision thresholds shows that decision-makers should carefully consider their assessment and population before deciding how to flag groups as possible cheaters. Higher decision thresholds yield lower false positive rates and improved precision at the expense of power; this trade-off appears worthwhile given the high stakes associated with decisions about cheating.