Expected Goals Model

Expected goals (xG) lay the foundation for much of hockey analytics. At their core, they are simple yet powerful. Each unblocked shot, regardless of whether it ends up on goal, is assigned a value between 0 and 1. This value represents the probability that the shot will result in a goal, and it is estimated from a set of descriptive features of the shot. For my model, I took inspiration from publicly available models such as MoneyPuck's and Evolving-Hockey's xG models. The features included in my model, in order of importance, are:

defending_manpower = the number of players on the ice for the defending team
distance_from_net = the straight-line distance from the shot location to the center of the goal line
goalie_present = whether the goalie is in the net or the net is empty
shot_type = snap, slap, wrist, tip, backhand, etc.
y_coord = the y coordinate of where the shot is taken from
strength_situation = how many players the shooting team has on the ice minus how many players the defending team has on the ice
shot_angle = the angle from the shot location to the center of the goal line
prev_event = what the previous play was (hit, giveaway, shot, faceoff, etc.)
off_wing = whether or not the shooter is on their off-wing
position_group = the position of the shooter, either forward or defenseman
angle_change_speed = how quickly the shot angle changed between the previous event and the shot*
distance_change_speed = the speed of the straight-line movement from the previous event (hit, giveaway, shot, faceoff, etc.) to the shot location*
cross_ice_speed = the cross-ice or east-west speed since the previous event*
prev_y_coord = the y coordinate of the event (hit, giveaway, shot, faceoff, etc.) prior to the shot
shooting_manpower = the number of players the shooting team has on the ice
period = what period the shot is taken in
transition_speed = the north-south speed, i.e. the distance covered along the length of the ice since the previous event*
time_remaining = the amount of time remaining in the period
prev_x_coord = the x coordinate of the previous event (hit, giveaway, shot, faceoff, etc.)
x_coord = the x coordinate of where the shot is taken from
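Several of the features above are derived from the raw shot coordinates rather than read directly from the play-by-play data. The sketch below shows one plausible way to compute distance_from_net, shot_angle, strength_situation, and off_wing. It assumes the NHL rink grid in feet with center ice at (0, 0), shots normalized so the attacking net sits at (89, 0), and +y on the shooter's left when facing that net; the constants and helper names are hypothetical and not necessarily the definitions my model uses.

```python
import math

# Assumed conventions (hypothetical): NHL rink grid in feet, attacking net
# centered on the goal line at x = 89, y = 0, and +y on the shooter's left.
GOAL_X = 89.0
GOAL_Y = 0.0


def distance_from_net(x: float, y: float) -> float:
    """Straight-line distance from the shot to the center of the goal line."""
    return math.hypot(GOAL_X - x, GOAL_Y - y)


def shot_angle(x: float, y: float) -> float:
    """Angle in degrees: 0 = straight on, 90 = on the goal line extended."""
    return math.degrees(math.atan2(abs(y - GOAL_Y), GOAL_X - x))


def strength_situation(shooting_manpower: int, defending_manpower: int) -> int:
    """Shooting team's skaters minus defending team's skaters."""
    return shooting_manpower - defending_manpower


def off_wing(shoots: str, y: float) -> bool:
    """True if a left shot is on the right side (y < 0) or vice versa."""
    if shoots == "L":
        return y < 0
    if shoots == "R":
        return y > 0
    raise ValueError(f"unknown handedness: {shoots!r}")


print(distance_from_net(59, 40), shot_angle(79, 10), off_wing("L", -10))
```

With these conventions, a wrist shot from (59, 40) is 50 feet from the center of the goal line, and a shot from (79, 10) comes in at a 45 degree angle.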

*Speed is calculated as the distance or angle change divided by time elapsed.
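As a minimal sketch of the starred features, the code below divides each distance or angle change by the time elapsed between the previous event and the shot. The Event class, the MIN_DT floor for events logged at the same second, and the use of a signed angle (so that a cross-ice pass from one side of the net to the other registers as a large angle change) are all hypothetical choices for illustration, not necessarily how my model handles them.

```python
import math
from dataclasses import dataclass

GOAL_X, GOAL_Y = 89.0, 0.0  # assumed attacking-net location (feet)
MIN_DT = 1.0  # hypothetical floor so same-second events don't divide by zero


@dataclass
class Event:
    x: float
    y: float
    t: float  # seconds elapsed in the period


def signed_angle(x: float, y: float) -> float:
    """Signed angle in degrees to the net; sign tells which side of the ice."""
    # Note: ignores wraparound for events behind the goal line.
    return math.degrees(math.atan2(y - GOAL_Y, GOAL_X - x))


def speed_features(prev: Event, shot: Event) -> dict:
    dt = max(shot.t - prev.t, MIN_DT)
    dx = shot.x - prev.x  # north-south movement
    dy = shot.y - prev.y  # east-west movement
    return {
        "distance_change_speed": math.hypot(dx, dy) / dt,
        "cross_ice_speed": abs(dy) / dt,
        "transition_speed": abs(dx) / dt,
        "angle_change_speed": abs(
            signed_angle(shot.x, shot.y) - signed_angle(prev.x, prev.y)
        ) / dt,
    }


feats = speed_features(Event(49, 30, 10.0), Event(79, -10, 12.0))
print(feats)
```

In this example the puck travels 50 feet in 2 seconds, so distance_change_speed is 25 ft/s, split into 20 ft/s of cross-ice movement and 15 ft/s of transition movement.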

To create my xG model, I used XGBoost (eXtreme Gradient Boosting), an open-source gradient boosting library. I trained the model on 380k shots taken over the past three years of NHL play. For each shot, the model ingests all of the features above along with a label indicating whether that shot resulted in a goal. Once trained, the model can take a new shot and predict the probability that it will result in a goal.

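Many public NHL xG models report AUC (the area under the ROC curve) to evaluate the model. AUC measures discrimination rather than accuracy: it is the probability that a randomly chosen goal receives a higher xG value than a randomly chosen non-goal, so 0.5 is no better than chance and 1.0 is a perfect ranking. My model scored an AUC of 0.77, comparable to most popular xG models published online.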
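To make that definition concrete, here is a small stdlib-only sketch that computes AUC with the rank-based (Mann-Whitney) formula, counting tied predictions as half a win. In practice a library function such as scikit-learn's roc_auc_score does the same job; the roc_auc helper below is a hypothetical illustration, not the code used to evaluate my model.

```python
def roc_auc(labels, scores):
    """AUC = P(random goal scores higher than random non-goal), ties = 1/2."""
    pairs = sorted(zip(scores, labels))  # sort by predicted xG
    n = len(pairs)

    # Assign 1-based ranks, giving tied scores their average rank.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one goal and one non-goal")

    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In the example, three of the four goal/non-goal pairs are ranked correctly, giving an AUC of 0.75. Sorting keeps this O(n log n), which matters with hundreds of thousands of shots.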