Wednesday, December 31, 2014

Methods (Proposed) - Defining and Collecting Pass and Shot Data

These methods outline a simple and comprehensive way to collect passing data for every shot attempt. In a sense, this dataset captures primary and secondary "assists" for Corsi events (and then some). Why am I doing this? Because passing is a critical part of hockey and as such requires measurement. Until SportVU comes to the NHL, the data must be collected manually. For more information on why this blog is doing what it's doing, refer to the Introduction.

Proposed Stats and Definitions


The following stats and definitions are based on my interpretation of Ryan Stimson's methods and definitions for the Passing Project. I'm also collecting additional pieces of data that have the potential for deeper insights into player and team performance.


Collected Game-State Data

This data accounts for the score, strength, and time at which each play takes place, and the team generating the play. The pieces collected are:
  1. Period - Marks the period in which the play takes place. 
  2. Score - The score of the game at the time of the play. When a goal is scored, the score is recorded as it was before the goal took place because this is the score state at the time of the play. This data allows us to account for score effects. 
  3. Strength - The first number is the number of away skaters on the ice; the second is the number of home skaters. E.g. 5v4 means the home team is shorthanded, and 5v6 means the home team has pulled its goalie.
  4. Time - The time listed on the clock when the play takes place. This allows us to sync our data with other sets. 
  5. Team - The team making the play. 
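As a rough illustration of how the Strength field could be read back out of the data, here is a minimal Python sketch. It assumes the value is stored as text in the "5v4" form described above (away skaters first, home skaters second); the names Strength, parse_strength and advantage are hypothetical helpers, not part of the Passing Project's tooling.

```python
from typing import NamedTuple


class Strength(NamedTuple):
    """Skater counts for one play: away first, home second."""
    away: int
    home: int


def parse_strength(code: str) -> Strength:
    """Split a code such as '5v4' into away and home skater counts."""
    away, home = code.lower().split("v")
    return Strength(int(away), int(home))


def advantage(code: str) -> str:
    """Return which side has more skaters on the ice: 'away', 'home' or 'even'."""
    strength = parse_strength(code)
    if strength.away > strength.home:
        return "away"
    if strength.home > strength.away:
        return "home"
    return "even"
```

With this reading, advantage("5v4") gives "away" (the home team is shorthanded) and advantage("5v6") gives "home" (the home team has an extra skater).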

Collected Shooter Statistics

This data is collected only for players who take a shot attempt in the offensive zone. Shots taken from outside the offensive zone are rarely dangerous (in other words, they are mostly noise) and are therefore not recorded.
  1. SOG - Shot on Goal. 
  2. MS - Missed Shot. 
  3. BS - Blocked Shot. A shot that is blocked or tipped by a defending player, or a shot blocked by the body of a shooter's teammate. 
  4. Posts - I'm tracking posts separately from MS (which is how they're normally tracked, as far as I know) because a post, being almost a goal, could indicate a higher-quality scoring chance.
  5. NS - Non-shot. A play in which a player has possession of the puck in the scoring chance area, has the opportunity to shoot, but for whatever reason does not get the shot off. This is a way of accounting for scoring chances that aren't recorded as shot attempts. 
  6. SCS - Scoring Chance Shot. Credited in addition to any of Shooter Stats 1 to 5 if the shot attempt is taken from the scoring chance area. 
  7. Goal - Credited to any shooter whose SOG is also a goal.
Note: Shots that are tipped by a teammate en route to the net are generally recorded as A1 passes (see below), and the player tipping the puck gets credit for the shot (either as SOG, MS etc. depending on the result of the tip).
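The shooter definitions above imply a few consistency rules that could be checked when entering data. The sketch below is one possible reading, not a rule set from the Passing Project: it assumes each shooter gets exactly one of SOG, MS, BS, Posts or NS per play, that SCS is optional on top of that, and that Goal only appears together with SOG. The function name check_shooter_stats is hypothetical.

```python
# One of these describes the result of the attempt (assumed: exactly one per play).
RESULT_STATS = {"SOG", "MS", "BS", "Posts", "NS"}
# These are credited in addition to a result stat.
EXTRA_STATS = {"SCS", "Goal"}


def check_shooter_stats(stats):
    """Return a list of problems found in one shooter's stat labels for a play."""
    stats = set(stats)
    problems = []

    unknown = stats - RESULT_STATS - EXTRA_STATS
    if unknown:
        problems.append("unknown stats: " + ", ".join(sorted(unknown)))

    results = stats & RESULT_STATS
    if len(results) != 1:
        problems.append(
            "expected exactly one of SOG/MS/BS/Posts/NS, got %d" % len(results)
        )

    # A goal is defined above as a SOG that goes in.
    if "Goal" in stats and "SOG" not in stats:
        problems.append("Goal requires SOG")

    return problems
```

An empty list means the combination is consistent with these assumptions; for example, a shot on goal from the scoring chance area that goes in would carry SOG, SCS and Goal and pass the check.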


Collected Passer Statistics
  1. A1 (similarly, A2) - Credited to a player who passes the puck to a player who then takes a shot attempt (A2 is credited to the passing player who is one pass removed from the shooter). 
  2. A1 D/N/O (similarly, A2 D/N/O) - The zone from which the A1 (or A2) passer makes the pass: Defensive zone, Neutral zone, or Offensive zone.
  3. SCA (similarly, SCA2) - Scoring Chance Assist. Credited to a player whose pass leads directly to a shot in the scoring chance area. SCA2 is credited to the A2 player whose pass leads directly to the A1 player's pass coming from the scoring chance area, which in turn leads to a shot attempt (see the example play below).

Failed Passes

I'm also tracking FP - Failed Passes - and FPL - Failed Pass Location. These are passes that do not reach their intended target and result in a change of possession. Both the pass's starting location and its intended location are recorded. Failed passes are generally the fault of the passer, but the FP can instead be charged to the intended receiver if the tracker judges that the receiver is at fault for not receiving the pass. Failed Passes are a potentially reliable way to capture player errors.
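For analysis, the FPL value can be split back into its two zones. This is a small sketch that assumes FPL is written as two zone letters joined by a hyphen, start zone first (the form used in the worked example later in this post, "D-N"); parse_fpl and the ZONES lookup are hypothetical names.

```python
# Zone letters as used in the D/N/O columns.
ZONES = {"D": "Defensive", "N": "Neutral", "O": "Offensive"}


def parse_fpl(code):
    """Split an FPL code such as 'D-N' into (start zone, intended zone)."""
    start, intended = code.upper().split("-")
    return ZONES[start], ZONES[intended]
```

For example, parse_fpl("D-N") returns ("Defensive", "Neutral"), i.e. a pass that started in the defensive zone and was intended for the neutral zone.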

Recording the Data


The data is recorded in Excel using an event-based model (i.e. each play is recorded as an event). Each row represents a play, and each column represents a stat (listed in the header row) credited to a player involved in the play. Player numbers are entered into the fields according to what each player contributes to the play. Refer to the sample image below (game-state data not shown). Included are the header row and two rows representing two separate plays.


The first row contains the data for this goal by Curtis Glencross, set up beautifully by Hudler and Monahan.

Glencross (20) is credited with:
  1. SOG because he gets a shot on goal 
  2. SCS because his shot is taken from the scoring chance area 
  3. Goal because he gets a goal 
Hudler (24) is credited with:
  1. A1 because his pass leads to the shooter's shot 
  2. O because his pass comes from the offensive zone 
  3. SCA because his pass leads directly to the shooter's shot, which takes place in the scoring chance area 
Monahan (23) is credited with:
  1. A2 because he is one pass removed from the shooter (and, because this play results in a goal, he is also awarded the second assist). 
  2. N because he passed from the neutral zone. 
  3. SCA2 because his pass leads to Hudler skating into the scoring chance area unimpeded (Muzzin is angling Hudler away from the net, but Hudler is able to skate into the scoring chance area as a direct result of Monahan's pass). 

The second row of the Excel image represents a failed pass. A pass by Engelland from the defensive zone to the neutral zone is intercepted. "29" (Engelland's #) is recorded in the FP column and "D-N" is recorded in the FPL column.
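For readers who would rather work in code than in Excel, the two example rows could be written out as follows. This is only a sketch: the column names come from the definitions in this post, but their order is an assumption (the actual spreadsheet layout may differ), and game-state columns are left out, as in the sample image. COLUMNS, goal_play, failed_pass and to_csv are hypothetical names.

```python
import csv
import io

# Stat columns in an assumed order; game-state columns omitted.
COLUMNS = [
    "SOG", "MS", "BS", "Posts", "NS", "SCS", "Goal",
    "A1", "A1 D/N/O", "SCA",
    "A2", "A2 D/N/O", "SCA2",
    "FP", "FPL",
]

# Row 1: the Glencross goal. Values are jersey numbers or zone letters.
goal_play = {
    "SOG": 20, "SCS": 20, "Goal": 20,          # Glencross
    "A1": 24, "A1 D/N/O": "O", "SCA": 24,      # Hudler
    "A2": 23, "A2 D/N/O": "N", "SCA2": 23,     # Monahan
}

# Row 2: Engelland's intercepted pass from the defensive to the neutral zone.
failed_pass = {"FP": 29, "FPL": "D-N"}


def to_csv(rows):
    """Write plays as CSV text, leaving unused stat columns blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sheet = to_csv([goal_play, failed_pass])
```

Each play only fills in the columns that apply to it, which mirrors how the spreadsheet rows are mostly empty apart from the players involved.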

Data Outcomes


The event-based data model allows for a large amount of raw data to be collected relatively simply, and offers many advantages:
  1. Time-stamps allow this dataset to be synced with others. 
  2. Each play is associated with the score of the game, making score effects easy to account for. 
  3. Each play is associated with the manpower situation (strength), allowing us to collect data for both even strength and man advantage play. 
  4. The raw nature of the data allows for a wide range of analyses, including calculating metrics such as Corsi and Stimson's SAGE.
  5. The location data captures where on the ice the different elements of a play are taking place. 
  6. The data captures the interactions between players involved in the same plays. 
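To give a sense of how little code is needed to get from raw events to counts, here is a minimal sketch. The sample events are made up purely for illustration (teams "A" and "B" are placeholders, not real game data), and the function names are hypothetical. Shot attempts are counted here as SOG, MS, BS and Posts; note that because only offensive-zone attempts are recorded, these totals will not match a full Corsi count.

```python
from collections import Counter

# Columns that mark a recorded shot attempt (NS is not a shot attempt).
SHOT_COLUMNS = ("SOG", "MS", "BS", "Posts")


def shot_attempts_by_team(events):
    """Count plays per team that contain a recorded shot attempt."""
    counts = Counter()
    for event in events:
        if any(event.get(col) is not None for col in SHOT_COLUMNS):
            counts[event["Team"]] += 1
    return counts


def stat_totals_by_player(events, stat):
    """Count how often each (team, player number) is credited with a stat."""
    counts = Counter()
    for event in events:
        player = event.get(stat)
        if player is not None:
            counts[(event["Team"], player)] += 1
    return counts


# Made-up sample events, one dict per play.
events = [
    {"Team": "A", "Strength": "5v5", "SOG": 20, "SCS": 20, "A1": 24, "SCA": 24, "A2": 23},
    {"Team": "A", "Strength": "5v5", "MS": 23, "A1": 24},
    {"Team": "B", "Strength": "5v5", "BS": 6},
    {"Team": "A", "Strength": "5v5", "FP": 29, "FPL": "D-N"},
]

attempts = shot_attempts_by_team(events)
primary_assists = stat_totals_by_player(events, "A1")
```

Filtering the event list first (for example, keeping only plays where Strength is "5v5") gives the same counts restricted to a particular game state.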

***
Readers, methods can always be improved. If you have suggestions for better definitions or methods for the data collection, please let me know.

If you'd like to join the Passing Project and collect data for an NHL team, you can reach out to Ryan Stimson on Twitter (@RK_Stimp) or by email at hockeypassingstats@gmail.com.
***

References

Stars GM Jim Nill wants to see the NHL implement SportsVU league-wide. Thomas Drance. http://www.thescore.com/nhl/news/542630

2013-2014 Devils Passing Review: A Passing Stats Primer. Ryan Stimson. http://www.inlouwetrust.com/2014/7/21/5899095/a-passing-stats-primer

Goal by Curtis Glencross. http://www.nhl.com/gamecenter/en/boxscore?id=2014020539
