Wednesday, February 25, 2015

NHL Analytics: The Good, The Bad, & The Future

The NHL has officially adopted "enhanced stats", the analytics formerly known as advanced stats or #fancystats. Now more than ever, we need to take a critical look at where these analytics stand, and where they are going.


SAT (formerly, Corsi)


SAT (shot attempts, or Corsi) is an indirect way of measuring a player's or team's offensive zone possession. Since possession time is not directly measured by the league, SAT% approximates it by comparing the total shot attempts a team takes with the total shot attempts that same team allows. This possession proxy is so powerful that it has proven to be more predictive of future wins than any other single metric.
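As a rough sketch of the arithmetic, SAT% can be written as a team's share of all shot attempts, for and against. The function name and the sample counts below are made up for illustration, and the sketch ignores refinements such as restricting the count to 5-on-5 play.

```python
def sat_percentage(attempts_for: int, attempts_against: int) -> float:
    """Return a team's share of all shot attempts, as a percentage."""
    total = attempts_for + attempts_against
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100.0 * attempts_for / total


# Hypothetical game: 55 shot attempts for, 45 against.
print(sat_percentage(55, 45))  # 55.0
```

Anything above 50 means the team attempted more shots than it allowed. Note that the function never looks at where or how a shot was taken.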

But SAT metrics are not without limitations. They treat every shot attempt as equal, whether that shot is a one-timer from the slot or a backhander from the point. This equal weighting is useful for approximating possession, but as a result SAT metrics provide no insight into the quality of shots teams generate. As the NHL and its teams adopt these metrics, it's more important than ever to grasp what these metrics are, what they are not, and how they can be used. Misuse now has real consequences.

A former NHL coach is on the record as saying:
"Analytics to me are no more than stats. So if you're for goals and assists and points then why wouldn't you be looking at these other stats too? I was a big fan of [our analytics guy]; I pushed hard for the hire. I like going in depth and checking everything off. You know, if you're going to stand back and say 'hey it takes 95 points to make the playoffs,' can't you say that 'hey if you get a Corsi of fifty percent, seventy percent of those teams make the playoffs,' or 'if you get a team Corsi of fifty-two and a half percent, ninety percent of those teams are going to make the playoffs.' The problem with just looking at 'hey 95 points and you're in', it's hard to go back and, you know, measure 'ok what do you need?' When you get into the team Corsi, and this is where things worked really well with [our analytics guy], is you're actually able to go in and check off every part of your system-work, compare it to other teams that are tops in certain categories, see what they're doing, see if you need to adjust it, where's your personnel at, what are the challenges. But it just adds another layer of putting together a system and a team."
At first glance, this seems like a refreshing use of advanced analytics in the NHL. But upon further analysis, serious misconceptions are evident. Let's go through this, red flag by red flag.

"Analytics to me are no more than stats. So if you're for goals and assists and points then why wouldn't you be looking at these other stats too?"
In and of itself this rhetorical question is innocuous. This note serves only to point out the critical difference between "goals, assists and points" versus "other stats". Goals (and by association, assists) are the only stats that count for wins. Team points are the only stats that count for playoffs. This isn't to say that "other stats" aren't important; indeed the ultimate goal of any organization is to find the right combination of factors - at every level - that lead to success. But an NHL organization must understand that goals and points are the only true measures of success. (These are Laws #1 and #2 of the influential and evolving 2008 piece, The Ten Laws of Hockey Analytics).

"[...] can't you say that 'hey if you get a Corsi of fifty percent, seventy percent of those teams make the playoffs,' or 'if you get a team Corsi of fifty-two and a half percent, ninety percent of those teams are going to make the playoffs.'"
The coach is venturing into dangerous territory. His if-Corsi-then-playoffs semantics imply causality. It's true that SAT% and the probability of making the playoffs are strongly related, but there is no evidence to suggest that SAT% causes teams to make the playoffs.

"When you get into the team Corsi [...] you're actually able to go in and check off every part of your system-work".
Further questioning would likely yield nuance, but in and of itself this statement is not true. There's a hell of a lot more to the game and its systems than a comparison of shot attempts. Corsi/shot attempts/SAT is an approximation for possession. Nothing more, nothing less.

Later in the interview, the coach says: "Last year we were at like 44% Corsi [...] and we were able to push that to right around 51 [...]. And that number of 51 over the long term will eventually pay some dividends."
The coach is stating that taking the majority of shots (i.e. having a Corsi greater than 50%) will cause goals and wins - in his words "pay some dividends". This is in direct violation of Hockey Analytics Law #7 which warns: "do not confuse correlation with causation". This is critical. Variables that are non-causal are not to be manipulated directly. Attempts to do so will have no effect on the desired outcome, or worse, have detrimental effects. In the case of this team, they increased their SAT% from 44 to 51 relative to their prior year, yet their record and goal differential declined.


The take-home message


SAT% is not a causal variable. It's an output, not an input. Good teams have good SAT percentages. A good SAT percentage doesn't make a team good.

The ironic reality with SAT is that it is only useful insofar as teams and players do not try to manipulate it directly. The following passage from The Ten Laws of Hockey Analytics serves as a good reminder of this fact:
We know that [SAT] tells us something. But [SAT] is still a measurement of the result and only an indirect observation of the process. It is also a reflection of hockey culture and era. As an example, consider the Traditional Russian Style of hockey. This style absolutely emphasized puck possession, but in this era and culture the puck was never to be wasted on a poor scoring chance – the name of the game was high shot quality. [SAT] would not have been a useful tool.
What contributes to puck possession? [SAT] describes it, today, to a degree. The search is ongoing for better measures of the underlying process. 


PDO (officially, SPSV%)


[Note: PDO is a better acronym than SPSV, so we're sticking with PDO. Percent Defense Offense is a good reading even though this was not the original intent.]

When the coach cited above states that his team's SAT% "will eventually pay some dividends", he's likely referring to PDO, the so-called 'puck luck' stat. PDO adds a team's save percentage to its shooting percentage, and amazingly most teams are unable to maintain a high or low PDO. Thus, the logic goes, a high PDO reflects a team's good fortune, and a low PDO its bad fortune.
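The sum itself is simple enough to sketch. The function name and the sample totals below are hypothetical, and the sketch assumes both percentages are computed from shots on goal and expressed on a 0 to 100 scale. Because every goal scored is also a goal allowed, the league-wide sum works out to 100, which is the baseline team figures are compared against.

```python
def pdo(goals_for: int, shots_for: int,
        goals_against: int, shots_against: int) -> float:
    """Shooting percentage plus save percentage, each on a 0-100 scale."""
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (shots_against - goals_against) / shots_against
    return shooting_pct + save_pct


# Hypothetical stretch: 10 goals on 100 shots for, 8 goals allowed on 100 shots against.
print(pdo(10, 100, 8, 100))  # 102.0
```

A result above 100 says only that the two percentages are running above the league baseline. It does not say whether luck, shot quality, or goaltending is responsible.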

But keep in mind what the metric actually measures: save percentage + shooting percentage. There are factors other than blind luck that can affect these statistics. A team's poor defensive play can result in a lower save percentage. A team that shoots from the perimeter will have a lower shooting percentage.

PDO is a neat metric, but it raises more questions than it answers. Is the team getting lucky? Or is the team's play actually generating higher quality scoring chances? Or both? Is the goalie not playing well? Getting unlucky? Fatigued? Or is the team allowing higher quality chances against? Are the forwards not back-checking? Are coverages in the D-zone being blown? Is the team turning the puck over too much?

This frustration has been expressed within NHL organizations. Elliotte Friedman quotes one assistant GM as saying, “The thing I hate most about [PDO] is how the (bleep) am I supposed to guess when a player’s luck is supposed to change? Do I just guess? If I trade him when he’s lucky and he continues to stay lucky, are you going to tell your fans, ‘Well, the law of averages said he wasn’t supposed to continue like this’?”

This frustration highlights the current limitations of hockey analytics. SAT% indicates possession and PDO reveals potentially unsustainable levels of shooting and save percentage, but they don't tell us why or how. Hence the search for "better measures of the underlying process."


Zone Exits, Entries, and Passing Data


Zone exits, entries, and passing data are the foundation of the next level of analytics. These datasets capture the systems within the game of hockey - the inputs, the "underlying process". Zone entry data has already shown that controlled carry-ins create more than twice as many shots and goals as dump-ins. Unlike SAT and PDO, these findings prescribe clear directives for teams and players. In this case, teams should carry the puck into the zone whenever possible.
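To give a sense of how such a comparison might be tallied from hand-tracked data, here is a minimal sketch. The log format, the function name, and every number in it are invented for illustration; they are not taken from the zone entry study.

```python
from collections import defaultdict

# Hypothetical hand-tracked log: (entry type, shot attempts generated on that entry).
entries = [
    ("carry", 1), ("carry", 0), ("carry", 2), ("carry", 1),
    ("dump", 0), ("dump", 1), ("dump", 0), ("dump", 0),
]


def shots_per_entry(log):
    """Average shot attempts generated per zone entry, grouped by entry type."""
    totals = defaultdict(lambda: [0, 0])  # entry type -> [entry count, shot count]
    for entry_type, shots in log:
        totals[entry_type][0] += 1
        totals[entry_type][1] += shots
    return {t: shots / count for t, (count, shots) in totals.items()}


print(shots_per_entry(entries))  # {'carry': 1.0, 'dump': 0.25}
```

The same grouping could be done per player instead of per entry type, which is the kind of breakdown that exposes who should be carrying the puck in.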

Beyond the direct application at the team level, this data pinpoints specific strengths and weaknesses in a player's game. The same study on zone entries uncovered that Danny Briere was better at entering the offensive zone with possession than his linemate Wayne Simmonds. Yet it was Simmonds who attempted most of the zone entries, a strategic flaw the Flyers could have corrected had they been armed with the information.

Passing data puts the shot quality debate to rest. Early studies indicated that shot location was the only meaningful factor in determining a shot's success rate, but this research didn't consider the sequence of events preceding each shot. From his own datasets, former NHL goaltender Stephen Valiquette has separated shots into "red shots" and "green shots" based on specific criteria, such as whether the shot results from a pass across the Royal Road. His preliminary findings show that 76% of all NHL goals come from green shots. Chris Boyle's Shot Quality Project reveals that a goalie's save percentage is only "0.651 on shots immediately following a pass". Among other pieces of data, Ryan Stimson's Passing Project records the zone from which passes originate, providing insights into rush shots and shots generated from the cycle.

The major hurdle facing these new measures is data acquisition. Zone exits, entries, and passing data are not collected by the NHL, so it's left to dedicated hockey researchers to watch the games and record the data. Many are waiting on Sportvision's "advanced player tracking" technology to provide this data, but questions remain. What data, if any, will be released to the public? And when?

These datasets will also have their own limitations. Assessing intent is critical with regard to passing, but to Sportvision a pass in traffic would be indistinguishable from a play in which the puck is knocked off the player's stick. The technology will undoubtedly revolutionize hockey analytics, but the fact remains that human observation will be an instrumental part of the data collection process for the foreseeable future.

Who said science was easy?


* * *


References

What Statistics Are Meaningful In A Given Season? Steve Burtch. http://www.pensionplanpuppets.com/2013/7/10/4508094/what-statistics-are-meaningful-in-a-given-season-corsi-fenwick-PDO-hits-fights-blocked-shots

The Ten Laws of Hockey Analytics. Alan Ryder(?) http://hockeyanalytics.com/2008/01/the-ten-laws-of-hockey-analytics/

30 Thoughts: Analytics say Oilers luck should turn. Elliotte Friedman. http://www.sportsnet.ca/hockey/nhl/elliotte-friedman-nhl-30-thoughts-edmonton-oilers-john-davidson-montreal-canadiens/

Using Zone Entry Data To Separate Offensive, Neutral, And Defensive Zone Performance. Eric Tulsky, Geoffrey Detweiler, Robert Spencer, Corey Sznajder. http://www.sloansportsconference.com/wp-content/uploads/2013/Using%20Zone%20Entry%20Data%20To%20Separate%20Offensive,%20Neutral,%20And%20Defensive%20Zone%20Performance.pdf

Does Shot Quality Exist? Tom Awad. http://www.hockeyprospectus.com/puck/article.php?articleid=540

Unmasked: Analytics provide new evaluation tools. Kevin Woodley. http://www.nhl.com/ice/news.htm?id=744483

