This blog and the Passing Project as a whole are putting sound methodologies in place to ensure the accuracy of our data. Most importantly, we test for inter-rater reliability. Inter-rater reliability is the degree to which two or more "raters" record the same data for the same games. While this is difficult to do with few data trackers, we test all games for which there are multiple trackers. The reliability of the data increases as the number of trackers increases (just one of the many reasons you should join the Passing Project! Hit us up on Twitter @cofstats / @RK_Stimp or send an email to hockeypassingstats@gmail.com.)
Inter-rater reliability testing is critical for several reasons:
- It corrects for errors. We're human. We miss things, we make typos, you name it.
- It corrects for bias. I'm a Flames fan. I think I'm unbiased because I started this blog to gain a deeper understanding of the Calgary Flames, good or bad. But at the end of the day it doesn't matter what I think.
- Inter-rater reliability testing highlights data points on which trackers disagree. Those specific plays can be reviewed to ensure trackers know how to appropriately code that type of play. Tracker disagreement can also suggest the need for improved data definitions. A sketch of this kind of check follows this list.
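To give a concrete sense of what this testing can look like, here is a minimal Python sketch that compares two trackers' codings of the same plays, computes raw agreement and Cohen's kappa (agreement corrected for chance), and flags the disagreements for review. The play codes and data below are hypothetical and not the Project's actual tracking sheet format.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if both raters coded independently at their own base rates.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical code throughout
    return (observed - expected) / (1 - expected)

# Hypothetical per-play codes from two trackers for the same game
# (e.g., "SAG1" = primary shot assist, "SAG2" = secondary, "NONE" = no pass).
tracker_1 = ["SAG1", "SAG2", "NONE", "SAG1", "SAG1", "NONE"]
tracker_2 = ["SAG1", "SAG2", "SAG1", "SAG1", "SAG1", "NONE"]

agreement = sum(a == b for a, b in zip(tracker_1, tracker_2)) / len(tracker_1)
kappa = cohens_kappa(tracker_1, tracker_2)
print(f"Raw agreement: {agreement:.0%}, Cohen's kappa: {kappa:.2f}")

# Flag the specific plays where the trackers disagree so they can be reviewed.
for i, (a, b) in enumerate(zip(tracker_1, tracker_2), start=1):
    if a != b:
        print(f"Play {i}: tracker 1 coded {a}, tracker 2 coded {b} -- review")
```

Kappa is worth reporting alongside raw agreement because two trackers who mostly code "NONE" will agree often just by chance; kappa discounts that.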
It cannot be said enough: the accuracy of the data is critical. Without it, all else fails. We're doing our due diligence here at the Passing Project.
***
If you'd like to join the Passing Project and collect data for an NHL team, you can reach out to Ryan Stimson on Twitter @RK_Stimp or by email at hockeypassingstats@gmail.com.
***
References
http://en.wikipedia.org/wiki/Inter-rater_reliability