Thursday, April 30, 2015

Meta Analytics

In this post we take a step back from analytics' measures and metrics, and focus instead on the analytics movement as a whole. From this vantage point we can better respond to the self-perceived shortcomings of the community, such as the notion that analytics aren't accepted by NHL management. More importantly, we see how we can effectively communicate and acquire knowledge as a community. This search for a deeper understanding of the game is why we're all here.


Knowledge


If knowledge is what is known, then the acquisition of knowledge is the process of converting the unknown into the known. Fifteen years ago, few if any knew that shot attempt comparisons would be more predictive of future wins than winning itself. Vic Ferrari and other bloggers discovered this unknown, and knowledge was gained.

One of the greatest fallacies is to treat the unknown as though it doesn't exist at all. Within the analytics community this has resulted in a belittling of 'intangibles' such as work ethic, leadership, and team identity. These may not be formally measured (publicly), but we cannot discount the impact they potentially have on the success of a hockey club (to do so would also discount the body of research on organizational culture and sport psychology). A myopic adherence to the known has led many to believe the game of hockey comes down to possession and luck. Such a view discounts what we haven't yet measured, and in a sense don't yet know.

The fallacy of treating the unknown as non-existent provides insight into the perceived lack of acceptance of analytics at the NHL level. In all likelihood we underestimate the degree to which NHL teams use analytics because we don't know the degree to which NHL teams use analytics. Brian Burke is still pegged as 'anti-analytics' despite his multiple assertions to the contrary. At the MIT Sloan Conference this past February, Burke stated "analytics is a tool in the toolbox, and it’s an important tool", and in a video titled Brian Burke Still Not A Big Analytics Fan For Evaluating Players, he admitted "we use [analytics] a great deal".

In this video, which predates the Summer of Analytics by nearly two years, Calgary Flames Director of Video & Statistical Analysis Chris Snow discusses the PUCKS software, used by 17 NHL teams. This excerpt in particular is noteworthy:
"Our video coach is marking all the events that the league does not. So things such as exits from your own zone, breaking out, entries, dump-ins, anything that will be important for the coaches as a teaching tool [...]. Let's say that we really value how and where we dump the puck. He could have a category that says 'dump-in' and there could be a sub-category such as a 'soft chip' to myself or a 'hard rim'. So we could very quickly drill down and look at how a particular player dumped the puck in as we attempted to get our forecheck going."

PUCKS software is more geared towards video analysis than statistical analysis, but the point is that teams care a great deal about this information and have been tracking it for a long time. It's also in a team's best interest to keep quiet about their analytical activities. As Oilers analytics consultant Tyler Dellow put it, "if you do something good now that crosses my plate I'm not telling anyone about it."

Analytics are intel: business-critical information. Considering the competitive advantage they confer, teams have incentives to mislead outsiders about their use of analytics. There are counter-incentives to this, namely that knowledge is best nurtured in a community (think open source), and indeed this post highlights how the analytics community is in itself a competitive advantage. Still, professional sports teams have the monetary incentives and the resources to use and advance analytics. Almost a year removed from the Summer of Analytics, it's likely that the community is now behind most NHL teams with respect to data-driven hockey knowledge. Assuming otherwise is foolish and serves no practical purpose other than massaging the ego. Such is the nature of the knowledge-fallacy that treats the unknown as non-existent.

Known metrics like Corsi and PDO have their merits, but a focus on what we don't yet know and a determination to make it known are what lead to breakthroughs and new discoveries. As knowledge seekers, it is our responsibility to venture into the unknown and shine a light on what we discover. The challenge is twofold: how do we discover what is unknown, and how do we make it known? To answer these questions, we turn to the Scientific Method.

Discovering the Unknown

I know that I know nothing. —Socrates

Scientific papers begin with the literature review for good reason. Discovering the unknown first requires that we know what's already known. Regularly reviewing the literature helps to refine our current knowledge base, and often uncovers new research and insights previously unknown to us.

The key when reviewing the literature is to seek out the limitations in the research. Limitations highlight the unknowns, providing a clear direction for further research. This is so important that many scientific articles end with a section entitled "Future Directions" which explicitly states what future research can be done to account for the limitations. In Stefan Wolejszo's Five Core Components of a Critical Approach to Hockey Analytics, he writes "put limits of methods front and center." As Tyler Dellow put it, “your work is better if you try and figure out where the mistakes are in it.”

Finding limitations in our own work and the work of others is challenging. One useful approach is to deconstruct the definitions and methodologies used. To use a personal example, this blog joined the Passing Project in response to the shot quality debate. In Tom Awad's paper Does Shot Quality Exist?, shot quality is defined using five factors. The critical question to ask is: do these five factors capture shot quality, or is there more to it? If there is more to shot quality, then the conclusion cannot fully speak to shot quality's existence or lack thereof, and further research is necessary.

Note that this does not invalidate Awad’s research. Not at all. In fact his research improved upon previous methodologies investigating shot quality, and his findings revealed insights into each of the five factors he investigated. Awad’s research provided the foundation for projects like the Passing Project, the Shot Quality Project, and Steve Valiquette’s work on Red and Green shots.

The literature review, in addition to expanding our own knowledge, allows readers to seek out the prerequisite knowledge they need to understand the matter at hand. This is hugely beneficial because it removes barriers to entry, allowing the community to grow at a faster rate. As Wolejszo notes in his excellent piece on confirmation bias, "[...] the success of the scientific method has stemmed from the fact that scientists are very motivated to disprove the work of others rather than a critical attitude that individual scientists bring to their own work." In other words, we need one another to hold each other accountable and to push knowledge forward. The more contributors the community has, the better it is at doing this.

Note that this post is not suggesting that every analytics article be worthy of submission to a scientific journal. Nor is it suggesting that every analytics article have a formal lit review. However, citing the work our contribution builds on—recognizing the shoulders of giants we stand on—is the critical first step to advancing knowledge.

Observation

"They asked the panel ‘what should you do to get into the business?’ and not one person said watch games. They all said work on your algorithms, or whatever else you do. This is still an eyeballs business." —Brian Burke at the Sloan Conference, Feb 2015.

Discovering the unknown requires scientific observation on two fronts: a survey of the literature, as discussed, and observation of the subject itself. According to Wolejszo, "research consistently shows that the best method of mitigating the potential for confirmation bias is having experience within the substantive domain". With respect to hockey analytics, this means that watching and playing the game are vital to producing good research. Steve Valiquette and Chris Boyle, two researchers who have challenged the notion that shot quality is relatively insignificant, are both goaltenders (Valiquette being a former NHLer). This is not a coincidence. Goalies are trained to stop shots and they know, from experience, that some shots are harder to save than others.

Zone entry research is a great example of a discovery realized because the authors were up to date on the literature and watching the games with a critical eye. The genesis of the research seems to stem from a March 2011 post in which Travis Hughes showed game footage of the Flyers turning the puck over in the neutral zone and suggested the Flyers keep it simple by getting more pucks in deep. The following month, Eric Tulsky, with the help of Geoff Detweiler, published zone entry results for their first tracked game. A revelatory finding was the metric 'shot attempts per entry', which combined their novel data with previous knowledge on Corsi (shot attempts). Tulsky and Detweiler discovered a new way to judge performance because they worked off the literature and drew inspiration from the game itself.
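Tulsky and Detweiler's actual figures aren't reproduced here, but the arithmetic behind 'shot attempts per entry' is simple enough to sketch. In the snippet below, the tracked data and the function name are invented for illustration only:

```python
# Hypothetical tracked data: each entry is (entry_type, shot_attempts_generated).
entries = [
    ('carry', 2), ('dump', 0), ('carry', 1), ('dump', 1),
    ('carry', 3), ('dump', 0), ('carry', 0), ('dump', 1),
]

def shot_attempts_per_entry(entries, entry_type):
    """Mean shot attempts generated per zone entry of a given type."""
    shots = [s for kind, s in entries if kind == entry_type]
    return sum(shots) / len(shots)

print(shot_attempts_per_entry(entries, 'carry'))  # 1.5
print(shot_attempts_per_entry(entries, 'dump'))   # 0.5
```

Splitting the rate by entry type is what made the metric so useful: it puts carry-ins and dump-ins on a common scale, so the two tactics can be compared directly.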


Making the Unknown Known


Collecting data is critical to research projects like Eric Tulsky’s and Steve Valiquette’s. Novel data records information previously unconsidered, and begins the process of making the unknown known. Data collection requires that we watch the games. In other words, we are scientifically mandated to partake in the eye-test.

Rather than doing away with the eye-test, the goal is to proceduralize the eye-test so it doesn't fall prey to biases. The collected data points must have operational definitions that limit observer subjectivity. For example, if we're collecting 'shots on goal', we must explicitly define what does and does not count as a shot on goal. Inter-rater reliability testing is also critical. This ensures that different observers collect the same data for the same set of occurrences. Ryan Stimson's Passing Project and Tulsky's zone entry research both use operational definitions and test for inter-rater reliability. The NHL does neither.
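To make inter-rater reliability concrete, one standard statistic is Cohen's kappa, which measures how often two trackers agree beyond what chance alone would produce. The sketch below uses invented entry-type labels; the projects named above follow their own protocols, and this is only a minimal illustration:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters on the same events, corrected for chance."""
    n = len(rater_a)
    # Observed agreement: share of events coded identically by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[lbl] * counts_b.get(lbl, 0) for lbl in counts_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical trackers coding the same ten zone entries.
a = ['carry', 'dump', 'carry', 'carry', 'dump',
     'carry', 'dump', 'dump', 'carry', 'carry']
b = ['carry', 'dump', 'carry', 'dump', 'dump',
     'carry', 'dump', 'dump', 'carry', 'carry']
print(cohens_kappa(a, b))  # 0.8
```

Values near 1 indicate strong agreement; values near 0 mean the raters agree no more often than chance, a sign that the operational definitions need tightening.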

The availability of the NHL’s data is a luxury in that the data collection process is taken care of, but it has also created a disconnect in the scientific process. The NHL's data collectors couldn't care less about how the data is analyzed, and those who analyze the data take the collection process for granted. The result is a disturbing amount of error in the data and a fanbase polarized between those who watch the games and those who crunch the numbers. This is a false dichotomy. The reality is that the eye-test and analytics are both integral components of the scientific process.


Conclusion


As a community, we have the potential to push the boundaries of hockey knowledge further than they’ve ever been. This requires a critical reading of the literature, with a focus not on what we already know, but a focus on what we don’t know. Watch the games. If possible, play the game. At the end of the day we’re studying the game of hockey and insights into how we can further study hockey come from the game itself. As a community we have the ability to collect large sets of novel data, which are the foundation of new discoveries and knowledge. Join a public tracking project such as Emmanuel Perry’s Blueline Events or Ryan Stimson’s Passing Project. As more professional teams invest resources into proprietary analytics, we have no choice but to use our advantage in numbers.

The other option is irrelevancy.

* * *

References

McKenzie: The real story of how Corsi got its name. Bob McKenzie. http://www.tsn.ca/mckenzie-the-real-story-of-how-corsi-got-its-name-1.100011

SSAC15: The Future of the Game. https://www.youtube.com/watch?v=XfqMn3Dg-aE

Brian Burke Still Not A Big Analytics Fan For Evaluating Players. https://www.youtube.com/watch?v=miaOJ_ln6rU

SSAC15: Changing on the Fly: The State of Advanced Analytics in the NHL. https://www.youtube.com/watch?v=cjR4lX36i0E

Introducing the Shot Quality Project. Chris Boyle. http://www.sportsnet.ca/hockey/nhl/introducing-the-shot-quality-project/

Video Breakdown: How Steve Valiquette will change how we think about goaltending. Kevin Power. http://www.blueshirtbanter.com/2015/1/6/7500845/video-breakdown-how-steve-valiquette-will-change-how-we-think-about

Revisiting Confirmation Bias. Stefan Wolejszo. http://www.storiesnumberstell.com/revisiting-confirmation-bias/

Step 1, Identify The Problem: Flyers Offensive Game Lacks Simplicity. Travis Hughes. http://www.broadstreethockey.com/2011/3/15/2050075/philadelphia-flyers-turnovers-problem

Using Zone Entry To Separate Offensive, Neutral, And Defensive Zone Performance. Eric Tulsky, Geoffrey Detweiler, Robert Spencer, Corey Sznajder. http://www.sloansportsconference.com/wp-content/uploads/2013/Using%20Zone%20Entry%20Data%20To%20Separate%20Offensive,%20Neutral,%20And%20Defensive%20Zone%20Performance.pdf

Preliminary Analysis of Error in NHL's RTSS Data. 'C' of Stats. http://cofstats.blogspot.ca/2015/02/draft-preliminary-analysis-of-error-in.html