Wednesday, December 31, 2014

Methods (Proposed) - Defining and Collecting Pass and Shot Data

These methods outline a simple and comprehensive way to collect passing data for every shot attempt. In a sense, this dataset captures primary and secondary "assists" for Corsi events (and then some). Why am I doing this? Because passing is a critical part of hockey and as such requires measurement. And until SportVU comes to the NHL, the data must be collected manually. For more information on why this blog is doing what it's doing, refer to the Introduction.

Proposed Stats and Definitions


The following stats and definitions are based on my interpretation of Ryan Stimson's methods and definitions for the Passing Project. I'm also collecting additional pieces of data that have the potential for deeper insights into player and team performance.


Collected Game-State Data

This data accounts for the score, strength, and time at which each play takes place, and the team generating the play. The pieces collected are:
  1. Period - Marks the period in which the play takes place. 
  2. Score - The score of the game at the time of the play. When a goal is scored, the score is recorded as it was before the goal took place because this is the score state at the time of the play. This data allows us to account for score effects. 
  3. Strength - The first number is the number of away skaters on the ice; the second is the number of home skaters. E.g. 5v4 means the home team is shorthanded, and 5v6 means the home team has pulled its goalie. 
  4. Time - The time listed on the clock when the play takes place. This allows us to sync our data with other sets. 
  5. Team - The team making the play. 
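As a rough sketch of how one play's game-state fields could be held in code, here is a small Python record. The class name, field names, and the sample values are my own illustration (the actual data lives in a spreadsheet); the only thing taken from the definitions above is that strength is written as away skaters, then "v", then home skaters.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GameState:
    """Game-state fields for one play (names are illustrative assumptions)."""
    period: int
    away_score: int   # score before the play, as described under Score
    home_score: int
    strength: str     # "<away skaters>v<home skaters>", e.g. "5v4"
    clock: str        # time shown on the clock when the play happens
    team: str         # team making the play

    def skaters(self) -> Tuple[int, int]:
        """Return (away, home) skater counts parsed from the strength string."""
        away, home = self.strength.lower().split("v")
        return int(away), int(home)

    def is_even_strength(self) -> bool:
        """True when both teams have the same number of skaters."""
        away, home = self.skaters()
        return away == home


# Made-up sample values, for illustration only.
state = GameState(period=2, away_score=1, home_score=1,
                  strength="5v4", clock="12:34", team="CGY")
```

Here state.skaters() gives (5, 4), so the home team is shorthanded and is_even_strength() is False.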

Collected Shooter Statistics

This data is collected only for players who take a shot attempt in the offensive zone. Shots taken from outside the offensive zone are rarely dangerous (in other words, they are mostly noise) and are therefore not recorded.
  1. SOG - Shot on Goal. 
  2. MS - Missed Shot. 
  3. BS - Blocked Shot. A shot that is blocked or tipped by a defending player, or a shot blocked by the body of a shooter's teammate. 
  4. Posts - Shots that hit the post. I'm tracking these separately from MS (which is how I understand they are normally recorded) because a post is almost a goal, and so could indicate a higher quality scoring chance. 
  5. NS - Non-shot. A play in which a player has possession of the puck in the scoring chance area, has the opportunity to shoot, but for whatever reason does not get the shot off. This is a way of accounting for scoring chances that aren't recorded as shot attempts. 
  6. SCS - Scoring Chance Shot. Credited in addition to any of Shooter Stats 1 to 5 if the shot attempt is taken from the scoring chance area. 
  7. Goal - Credited to any shooter whose SOG is also a goal. 
Note: Shots that are tipped by a teammate en route to the net are generally recorded as A1 passes (see below), and the player tipping the puck gets credit for the shot (either as SOG, MS etc. depending on the result of the tip).
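The shooter definitions imply a couple of consistency rules that are easy to check in code. The sketch below is only an illustration: the function name and the short labels are mine, and the rule that each play carries exactly one of stats 1 to 5 is my assumption rather than something stated above. The other two rules (SCS is credited in addition to a result, and a Goal must also be a SOG) come straight from the list.

```python
# Shooter stats 1 to 5 from the list above (labels are illustrative).
RESULTS = {"SOG", "MS", "BS", "Post", "NS"}
EXTRAS = {"SCS", "Goal"}


def validate_shooter_stats(stats):
    """Return a list of problems found in one shooter's set of stat labels."""
    stats = set(stats)
    errors = []

    unknown = stats - RESULTS - EXTRAS
    if unknown:
        errors.append("unknown stats: " + ", ".join(sorted(unknown)))

    # Assumption: a play has exactly one of the five result stats.
    if len(stats & RESULTS) != 1:
        errors.append("expected exactly one of SOG, MS, BS, Post, NS")

    # From the Goal definition: a goal is credited on top of a SOG.
    if "Goal" in stats and "SOG" not in stats:
        errors.append("Goal requires SOG")

    return errors
```

For example, validate_shooter_stats({"SOG", "SCS", "Goal"}) returns an empty list, while {"MS", "Goal"} is flagged because a goal has to be a shot on goal.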


Collected Passer Statistics
  1. A1 (similarly, A2) - Credited to a player who passes the puck to a player who then takes a shot attempt (A2 is credited to the passing player who is one pass removed from the shooter). 
  2. A1 D/N/O (similarly A2 D/N/O) - The zone from which the A1 (or A2) passer makes the pass: Defensive zone, Neutral zone, or Offensive zone. 
  3. SCA (similarly, SCA2) - Scoring Chance Assist. Credited to a player whose pass directly leads to a shot in the scoring chance area (SCA2 is credited to the A2 player whose pass directly leads to the A1 player's pass coming from the scoring chance area, which leads to a shot attempt. See example play below.) 

Failed Passes

I'm also tracking FP - Failed Passes - and FPL - Failed Pass Location. These are passes that do not reach their intended target and result in a change of possession. The pass's starting location and intended location are recorded. Generally failed passes are the fault of the passer, but the pass receiver can instead be credited with the FP if the tracker judges that the receiver is at fault for not receiving the pass. Failed Passes are a potentially reliable way to capture player errors.
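A minimal sketch of reading a Failed Pass Location, assuming it is written as the starting zone letter and the intended zone letter joined by a hyphen (for example "D-N" for a pass from the defensive zone intended for the neutral zone). The function name and the zone dictionary are my own illustration.

```python
ZONES = {"D": "defensive", "N": "neutral", "O": "offensive"}


def parse_fpl(code):
    """Split an FPL code like "D-N" into (starting zone, intended zone)."""
    parts = code.strip().upper().split("-")
    if len(parts) != 2 or any(p not in ZONES for p in parts):
        raise ValueError("unrecognised FPL code: %r" % code)
    start, intended = parts
    return ZONES[start], ZONES[intended]
```

Rejecting anything that is not two known zone letters makes typos in the spreadsheet show up early instead of silently skewing the counts.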

Recording the Data


The data is recorded in Excel using an event-based model (i.e. each play is recorded as an event). Each row represents a play, and each column represents a stat (listed in the header row) credited to a player involved in the play. Player numbers are entered into the fields according to what each player contributes to the play. Refer to the sample image below (game-state data not shown). Included are the header row and two rows representing two separate plays.


The first row contains the data for this goal by Curtis Glencross, set up beautifully by Hudler and Monahan.

Glencross (20) is credited with:
  1. SOG because he gets a shot on goal 
  2. SCS because his shot is taken from the scoring chance area 
  3. Goal because he gets a goal 
Hudler (24) is credited with:
  1. A1 because his pass leads to the shooter's shot 
  2. O because his pass comes from the offensive zone 
  3. SCA because his pass leads directly to the shooter's shot, which takes place in the scoring chance area 
Monahan (23) is credited with:
  1. A2 because he is one pass removed from the shot (and because this play results in a goal, he's awarded the second assist). 
  2. N because he passed from the neutral zone. 
  3. SCA2 because his pass leads to Hudler skating into the scoring chance area unimpeded (Muzzin is angling Hudler away from the net, but Hudler is able to skate into the scoring chance area as a direct result of Monahan's pass). 

The second row of the Excel image represents a failed pass. A pass by Engelland from the defensive zone to the neutral zone is intercepted. "29" (Engelland's #) is recorded in the FP column and "D-N" is recorded in the FPL column.
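To make the two rows concrete without the image, here is a sketch of the same plays as Python dictionaries keyed by column name. The column names are my guess at a sensible header row based on the stat definitions above; the real spreadsheet header may differ. The player numbers and zone letters are the ones described in the example.

```python
# Assumed header row, built from the stat definitions (not the real sheet).
COLUMNS = ["SOG", "MS", "BS", "Post", "NS", "SCS", "Goal",
           "A1", "A1 D/N/O", "SCA", "A2", "A2 D/N/O", "SCA2",
           "FP", "FPL"]

# Row 1: the Glencross goal. Row 2: the failed pass by Engelland.
goal_play = {"SOG": 20, "SCS": 20, "Goal": 20,
             "A1": 24, "A1 D/N/O": "O", "SCA": 24,
             "A2": 23, "A2 D/N/O": "N", "SCA2": 23}
failed_pass = {"FP": 29, "FPL": "D-N"}


def credits_for(row, number):
    """List the columns in which a player's number appears for one play."""
    return [col for col in COLUMNS if row.get(col) == number]
```

Calling credits_for(goal_play, 24) returns ["A1", "SCA"], which matches what Hudler is credited with apart from the zone letter, since that is stored as a letter rather than a player number.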

Data Outcomes


The event-based data model allows for a large amount of raw data to be collected relatively simply, and offers many advantages:
  1. Time-stamps allow this dataset to be synced with others. 
  2. Each play is associated with the score of the game, making score effects easy to account for. 
  3. Each play is associated with the manpower situation (strength), allowing us to collect data for both even strength and man advantage play. 
  4. Because the data is raw, it supports a wide range of analyses, including the calculation of metrics such as Corsi and Stimson's SAGE. 
  5. The location data captures where on the ice the different elements of a play are taking place. 
  6. The data captures the interactions between players involved in the same plays. 
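As one example of the last point, player interactions can be pulled out of the event rows by counting how often each A1 passer sets up each shooter. This is only a sketch: the column names are the assumed ones from the stat definitions, the helper names are mine, and the sample rows are invented for illustration rather than taken from a real game.

```python
from collections import Counter

# Columns that hold the shooter's number (assumed names).
SHOT_COLUMNS = ("SOG", "MS", "BS", "Post", "NS")


def shooter_of(row):
    """Return the shooter's number for a play, or None if there is no shot."""
    for col in SHOT_COLUMNS:
        if row.get(col) is not None:
            return row[col]
    return None


def passer_shooter_pairs(rows):
    """Count (A1 passer, shooter) combinations across a list of plays."""
    pairs = Counter()
    for row in rows:
        passer, shooter = row.get("A1"), shooter_of(row)
        if passer is not None and shooter is not None:
            pairs[(passer, shooter)] += 1
    return pairs


# Invented sample plays, for illustration only.
rows = [
    {"SOG": 20, "A1": 24, "A2": 23},
    {"MS": 20, "A1": 24},
    {"BS": 23, "A1": 20},
    {"FP": 29, "FPL": "D-N"},   # failed pass: no shooter, so it is skipped
]
pairs = passer_shooter_pairs(rows)
```

With these sample rows, the pair (24, 20) is counted twice and (20, 23) once. The same loop could be extended to A2 passers or filtered by strength and score.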

***
Readers, methods can always be improved. If you have suggestions for better definitions or methods for the data collection, please let me know.

If you'd like to join the Passing Project and collect data for an NHL team, you can reach out to Ryan Stimson on Twitter @RK_Stimp or by email at hockeypassingstats@gmail.com
***

References

Stars GM Jim Nill wants to see the NHL implement SportsVU league-wide. Thomas Drance. http://www.thescore.com/nhl/news/542630

2013-2014 Devils Passing Review: A Passing Stats Primer. Ryan Stimson. http://www.inlouwetrust.com/2014/7/21/5899095/a-passing-stats-primer

Goal by Curtis Glencross. http://www.nhl.com/gamecenter/en/boxscore?id=2014020539

Monday, December 29, 2014

Introduction - The Power of Passing

Welcome to C of Stats, the inevitable result of a scientist and Flames fan who discovered advanced hockey stats. There's no better place to start this blog than with a miniature review of some of these "fancy" stats, focusing on what they are, why they're powerful, and how we can take them further. This serves as a concise introduction to the analytics community and sets the foundation for this blog's existence.

Our advanced stats saga begins with the Corsi and Fenwick stats, admittedly awkward names attributed to an NHL goalie coach who had nothing to do with the stat, and a blogger. Simply enough, Corsi and Fenwick count all the shots a team directs towards the opponent’s net: shots on goal, missed shots, and (for Corsi only) shots that are blocked. Both teams’ shot totals are then compared in what's called a shot differential. The premise is simple: the team that takes more shots is generally the team that possesses the puck more. After all, puck possession is a prerequisite to shooting, and as it turns out, possession is critical to winning hockey games.
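For readers who like to see the arithmetic, the counting described above can be sketched in a few lines of Python. The function names are mine and the shot totals are made up purely for illustration.

```python
def corsi(shots_on_goal, missed, blocked):
    """Corsi: every shot attempt, including blocked shots."""
    return shots_on_goal + missed + blocked


def fenwick(shots_on_goal, missed):
    """Fenwick: shot attempts excluding blocked shots."""
    return shots_on_goal + missed


def differential(attempts_for, attempts_against):
    """Shot differential: a team's attempts minus its opponent's."""
    return attempts_for - attempts_against


# Made-up totals for two teams in one game.
team_a = {"sog": 30, "missed": 12, "blocked": 15}
team_b = {"sog": 25, "missed": 10, "blocked": 9}

corsi_diff = differential(
    corsi(team_a["sog"], team_a["missed"], team_a["blocked"]),
    corsi(team_b["sog"], team_b["missed"], team_b["blocked"]),
)
fenwick_diff = differential(
    fenwick(team_a["sog"], team_a["missed"]),
    fenwick(team_b["sog"], team_b["missed"]),
)
```

With these numbers, team A has 57 Corsi events to team B's 44, a differential of +13, and a Fenwick differential of +7.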

Corsi and Fenwick are solid metrics. They correlate strongly with winning hockey games and more importantly, they are better predictors of future goals and wins than measures using traditional stats. Note that the most salient feature of advanced stats is their predictiveness. Predictive hockey analytics are proven to repeatedly and reliably relate to goal scoring, which means we are actually permitted, by the laws of math and statistics, to use these numbers to judge and predict performance. And Corsi and Fenwick are better at this than any other metric to date (including, perhaps most of all, the opinion of many sports broadcasters). Like anything though, Corsi and Fenwick have limitations. Because every shot directed towards the opponent's net is given equal weight, Corsi and Fenwick only let us infer possession, and offer no insight into the quality of the shots teams generate. Surely shot quality is important, right?

The answer to this question is more complex and less conclusive than you might expect. There's no question shot quality exists – studies have shown it has a measurable and statistically significant impact on the game. However, these same studies reveal that shot quality pales in comparison to Corsi and Fenwick as a predictive variable for future goals and wins. "[Shot quality] is 4 to 5 times less important than shot differential," Tom Awad concluded back in 2010. It comes as a surprise to most – and leaves many observers including myself feeling uneasy – but shot quality as measured in this research does not offer much insight into the game of hockey.

But how exactly is shot quality measured? In Awad’s analysis (and in the analyses conducted before him), five key factors are identified: the distance from which the shot is taken, whether the shot comes directly after a rebound, whether the shot comes directly after a turnover, the shot type (e.g. slapshot), and the manpower situation. These are all logical factors, but as mentioned they lack any substantial predictive power. So is that it for shot quality? Does the analytics community move on? Many have, and the research certainly seems to support this position. But could there be more to shot quality than the five factors identified? And if so, could additional factors be any more predictive? Recent (albeit preliminary) studies suggest this could very well be the case.

Since the start of the 2013-2014 season, Ryan Stimson has been tracking passes that lead to shot attempts, known as SAG (Shot Attempt Generation). His preliminary findings show that shots on goal taken as the direct result of a pass are about 50% more likely to result in goals than shots that are not the result of a pass. This makes intuitive sense because goalies and the defending players are out of position in the moments following a pass. Furthermore, Stimson discovered that the efficiency with which shooters can convert passes into shots on goal, known as SAGE, correlates with goal scoring to the tune of 88 to 99 percent. This is nearly a perfect correlation, which suggests that goals are reliably scored because of shots generated from passes. SAGE also correlates more strongly with wins than any other metric, Corsi and Fenwick included. If you’ll pardon the pun, these are game-changing findings. Preliminary no doubt, but they suggest that passing is a critical factor in shot quality.

These promising findings require further analysis, and this is precisely the inspiration for C of Stats. This blog has joined Ryan Stimson's Passing Project, and is collecting passing data for all Calgary Flames games starting with the 2014-2015 season. Regardless of Stimson's findings, as hockey enthusiasts we can all agree that passing is an essential part of the game. It needs to be tracked. This data will be incredibly revelatory and provide new insights into the Flames, their opponents, and the game of hockey as a whole. I can't wait to share the data and resulting analyses with you.


***

Whether you're new to advanced analytics, an expert statistician, or just a straight up hater, don't hesitate to contact me with questions, comments and the like. If you'd like to join the Passing Project, contact Ryan Stimson on Twitter @RK_Stimp or by email at hockeypassingstats@gmail.com

***

References

McKenzie: The real story of how Corsi got its name. Bob McKenzie. http://www.tsn.ca/mckenzie-the-real-story-of-how-corsi-got-its-name-1.100011

What Statistics Are Meaningful In A Given Season? Steve Burtch. http://www.pensionplanpuppets.com/2013/7/10/4508094/what-statistics-are-meaningful-in-a-given-season-corsi-fenwick-PDO-hits-fights-blocked-shots

Numbers On Ice. Does Shot Quality Exist? Tom Awad. http://www.hockeyprospectus.com/puck/article.php?articleid=540

How passing relates to shooting percentage and close situations. Ryan Stimson. http://www.hockeyprospectus.com/stimson-how-passing-relates-to-shooting-percentage-and-close-situations/

The pass-tracking project: Passing and goals. Ryan Stimson. http://www.hockeyprospectus.com/the-pass-tracking-project-passing-and-goals/

2013-2014 Devils Passing Review: Efficiency and Winning. Ryan Stimson. http://www.inlouwetrust.com/2014/7/28/5901789/efficiency-and-transition-offense