Expected Goals Model

goals_contour.png

Expected goals (xG) are the foundation of much of hockey analytics. The idea is simple yet powerful: every unblocked shot, whether or not it ends up on goal, is assigned a value. This value is the estimated probability that the shot results in a goal, based on a number of descriptive features of the shot. For my model, I took inspiration from online resources such as MoneyPuck's xG model and Evolving-Hockey's xG model. The features included in my model, in order of importance, are:

  • defending_manpower = the number of players the defending team has on the ice

  • distance_from_net = straight line distance from the shot to the center of the goal line

  • goalie_present = whether the goalie is in the net or the net is empty

  • shot_type = snap, slap, wrist, tip, backhand, etc.

  • y_coord = the y coordinate of where the shot is taken from

  • strength_situation = how many players the shooting team has on the ice minus how many players the defending team has on the ice

  • shot_angle = the angle from the shot location to the center of the goal line

  • prev_event = what the previous play was (hit, giveaway, shot, faceoff, etc.)

  • off_wing = whether or not the shooter is on their off-wing

  • position_group = the position of the shooter, either forward or defenseman

  • angle_change_speed = the speed of the angle change since the previous event*

  • distance_change_speed = the speed over the straight-line distance from the previous event (hit, giveaway, shot, faceoff, etc.) to the shot location*

  • cross_ice_speed = the cross-ice or east-west speed since the previous event*

  • prev_y_coord = the y coordinate of the previous event (hit, giveaway, shot, faceoff, etc.)

  • shooting_manpower = the number of players the shooting team has on the ice

  • period = what period the shot is taken in

  • transition_speed = the speed of vertical distance covered since the previous event*

  • time_remaining = the amount of time remaining in the period

  • prev_x_coord = the x coordinate of the previous event (hit, giveaway, shot, faceoff, etc.) 

  • x_coord = the x coordinate of where the shot is taken from

*Speed is calculated as the distance or angle change divided by time elapsed.
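
As a rough illustration of how the geometric and speed features above could be derived from raw event data, here is a minimal sketch. It assumes coordinates in feet with the attacking net centered at (89, 0), assumes transition_speed is measured along the x axis, and returns a speed of 0 when no time has elapsed; none of these choices are stated in the feature list. The function names and dictionary keys are hypothetical.

```python
import math

# Assumption: coordinates are in feet and normalized so that every shot
# attacks the net centered at (89, 0).
NET_X, NET_Y = 89.0, 0.0


def distance_from_net(x, y):
    """Straight-line distance from a location to the center of the goal line."""
    return math.hypot(NET_X - x, NET_Y - y)


def shot_angle(x, y):
    """Angle in degrees off the center line; 0 means straight in front of the net."""
    return math.degrees(math.atan2(abs(y - NET_Y), NET_X - x))


def speed(change, seconds_elapsed):
    """Distance or angle change divided by time elapsed.

    Assumption: events at the same timestamp get a speed of 0.
    """
    if seconds_elapsed <= 0:
        return 0.0
    return change / seconds_elapsed


def movement_features(shot, prev):
    """Build the distance, angle, and speed features for one shot.

    `shot` and `prev` are dicts with hypothetical keys "x", "y", "seconds".
    """
    dt = shot["seconds"] - prev["seconds"]
    dx = shot["x"] - prev["x"]
    dy = shot["y"] - prev["y"]
    angle_change = abs(
        shot_angle(shot["x"], shot["y"]) - shot_angle(prev["x"], prev["y"])
    )
    return {
        "distance_from_net": distance_from_net(shot["x"], shot["y"]),
        "shot_angle": shot_angle(shot["x"], shot["y"]),
        "distance_change_speed": speed(math.hypot(dx, dy), dt),
        "angle_change_speed": speed(angle_change, dt),
        "cross_ice_speed": speed(abs(dy), dt),    # east-west movement
        "transition_speed": speed(abs(dx), dt),   # assumed: movement along x
    }


# A made-up shot taken four seconds after a made-up previous event.
shot = {"x": 80.0, "y": 12.0, "seconds": 604}
prev = {"x": 50.0, "y": -8.0, "seconds": 600}
features = movement_features(shot, prev)
```

For this made-up example the shot is 15 feet from the net, and the puck moved 20 feet cross-ice in four seconds, giving a cross_ice_speed of 5 feet per second.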

To create my xG model, I used XGBoost (extreme gradient boosting), an open-source machine learning library. I trained the model on 380k shots taken in the NHL over the past three years. For each shot, the model ingests all of the features above along with whether or not the shot resulted in a goal. After training, the model can be given a new shot and predicts the probability that it will result in a goal.

 

Many current NHL xG models report AUC (area under the ROC curve) as a measure of how well the model separates goals from non-goals. My model scored an AUC of 0.77, comparable to most popular xG models available online.
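
To make the AUC figure more concrete: it can be read as the probability that a randomly chosen goal received a higher xG than a randomly chosen non-goal. The sketch below computes it directly from that definition on a handful of made-up shots. It is a pairwise illustration only and would be too slow for hundreds of thousands of shots, where a library routine would normally be used.

```python
def roc_auc(labels, scores):
    """Share of (goal, non-goal) pairs where the goal has the higher score.

    Ties count as half a win. `labels` holds 1 for a goal and 0 otherwise.
    """
    goals = [s for label, s in zip(labels, scores) if label == 1]
    misses = [s for label, s in zip(labels, scores) if label == 0]
    if not goals or not misses:
        raise ValueError("need at least one goal and one non-goal")

    wins = 0.0
    for g in goals:
        for m in misses:
            if g > m:
                wins += 1.0
            elif g == m:
                wins += 0.5
    return wins / (len(goals) * len(misses))


# Made-up outcomes and predicted xG values for five shots.
labels = [1, 0, 0, 1, 0]
scores = [0.30, 0.05, 0.10, 0.08, 0.02]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so 0.77 means that in about 77% of goal/non-goal pairs the goal was given the higher xG.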

feature_importance.png