This is an extension to the stochastic differential games chapter of my Ph.D. dissertation, Adversarial Risk Analysis for Decision-Making in Sports, which can be found here.
The larger decision model requires multiple stages of modeling. There will also be a decision evaluation component, which will require some predictive modeling as well.
The first challenge is to determine how to measure "effort" in basketball. One possible proxy is the set of hustle statistics that the NBA collects. These are derived from tracking data, which generally isn't publicly available, so we will use the hustle statistics fetched from stats.nba.com. There are certainly more direct ways of measuring effort (perhaps using biometrics), but hustle stats are the most readily available for each game.
The next question is how to combine this collection of hustle measures into a single composite effort proxy. One approach is to perform a multi-stage regression. We first separate the offensive and defensive hustle measures and assign weights to each group in a data-driven way, using offensive and defensive ratings as the regression targets.
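Why do we use offensive and defensive ratings as the target variables here? They are defined as points scored and points allowed per 100 possessions, respectively, so they measure scoring and conceding rates on a common, pace-adjusted scale.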
Once we estimate these models, we obtain composite offensive and defensive effort measures as the estimated weighted combinations of the respective hustle measures.
Now, we want to combine these into a single composite effort variable. To do so, we run another regression, this time with net rating as the response and the two composites above as predictors, to again obtain optimal weights.
With this model estimated, we obtain the composite effort variable.
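The two-stage weighting described above can be sketched in Python as follows. This is a minimal sketch on synthetic data, not the actual estimation code: the three-column hustle matrices, the made-up "true" weights, the helper name `fit_ols`, and the choice to build each composite from the slopes only (dropping the intercept) are all assumptions made for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on X with an intercept; returns (intercept, slopes)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

rng = np.random.default_rng(0)
n_games = 500

# Synthetic stand-ins for hustle measures (hypothetical columns, not real NBA data).
off_hustle = rng.normal(size=(n_games, 3))
def_hustle = rng.normal(size=(n_games, 3))

# Synthetic ratings generated from made-up weights plus noise.
off_rating = 110 + off_hustle @ np.array([1.5, 0.8, 0.3]) + rng.normal(scale=2.0, size=n_games)
def_rating = 110 - def_hustle @ np.array([1.2, 0.9, 0.4]) + rng.normal(scale=2.0, size=n_games)
net_rating = off_rating - def_rating

# Stage 1: weight the hustle measures, using offensive and defensive ratings as targets.
_, w_off = fit_ols(off_hustle, off_rating)
_, w_def = fit_ols(def_hustle, def_rating)
off_effort = off_hustle @ w_off  # composite offensive effort (assumed: slopes only)
def_effort = def_hustle @ w_def  # composite defensive effort (assumed: slopes only)

# Stage 2: weight the two composites, using net rating as the target.
_, w_net = fit_ols(np.column_stack([off_effort, def_effort]), net_rating)
effort = w_net[0] * off_effort + w_net[1] * def_effort  # single composite effort
```

In this synthetic setup the stage-two weights should land near +1 and -1, because net rating was generated as offensive rating minus defensive rating. With real hustle data the weights would be whatever the regression estimates.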
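Again, why do we use net rating here as the target? Net rating is defined as offensive rating minus defensive rating, i.e., the point differential per 100 possessions.
We can specify it by team as well: the home team's net rating is the home team's offensive rating minus the home team's defensive rating.
We can similarly specify the away team's net rating. The net rating now represents the scoring rate differential, which is almost exactly what we will want to model.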
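As a small worked example of these definitions, the sketch below computes home and away net ratings from box-score totals. The point and possession totals are hypothetical, the helper name `rating` is made up, and treating the defensive rating as the opponent's points per 100 opponent possessions is an assumption about how possessions are counted.

```python
def rating(points, possessions):
    """Points per 100 possessions."""
    return 100.0 * points / possessions

# Hypothetical box-score totals for one game (not real data).
home_pts, away_pts = 110, 104
home_poss, away_poss = 98, 98

home_off = rating(home_pts, home_poss)  # home offensive rating
home_def = rating(away_pts, away_poss)  # home defensive rating (assumed: per opponent possession)
home_net = home_off - home_def

away_off = rating(away_pts, away_poss)  # away offensive rating
away_def = rating(home_pts, home_poss)  # away defensive rating
away_net = away_off - away_def
```

Under this assumption the away team's net rating is the negative of the home team's, which is why a single home-minus-away differential is enough to describe the game.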
The effort level strategy won't be chosen in a vacuum. It will be chosen to maximize the end-game score differential (while accounting for some risks) in the context of a continuously evolving game. We model this continuous evolution with a system of stochastic differential equations (SDEs). The outcome of the game is determined by the score differential. As a result, we'd like to know how much effort contributes to the ebbs and flows of the game, i.e., the score differential dynamics. Chances are that if a strong team is playing a weak team, the strong team is going to win without significant extra effort. However, for more evenly matched teams, effort will likely be more of a determining factor (admittedly, this statement is opinion, not data-driven evidence). So, to connect effort to the score differential, we also want to account for other factors that determine the relative strength of the two teams in a particular matchup. Consider the SDE:
In practice, we will have a couple more terms in the drift representing additional effort from the home and away teams, and the current drift terms will be constant values estimated by a third regression model that uses the output of Step 1. The estimated regression will produce coefficients for home advantage, relative team strengths using Dean Oliver's four factors, and relative team efforts.
Once estimated, each term in this model will appear in the drift term of the SDE, including the intercept, which will now represent the home advantage. There will be a handful of games at international venues which are considered neutral, but for the vast majority of games, there will be a home team. So, while it's not technically an intercept, it will effectively be one and will retain a useful interpretation.
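To illustrate how estimated drift terms would feed into the score differential dynamics, here is a minimal Euler-Maruyama sketch. It assumes the simplest possible form, dX_t = mu dt + sigma dW_t, with a constant drift mu made up of home advantage, relative strength, and relative effort, and a constant volatility sigma. The function name `simulate_score_diff`, the per-minute units, and all numerical values are hypothetical and are not the fitted model.

```python
import numpy as np

def simulate_score_diff(drift, sigma, n_paths=10_000, n_steps=480, horizon=48.0, seed=0):
    """Euler-Maruyama paths of dX_t = drift dt + sigma dW_t, starting at X_0 = 0.

    X_t is the home-minus-away score differential at game minute t.
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    noise = rng.normal(size=(n_paths, n_steps))
    increments = drift * dt + sigma * np.sqrt(dt) * noise
    paths = np.cumsum(increments, axis=1)
    # Prepend the initial condition X_0 = 0 to every path.
    return np.hstack([np.zeros((n_paths, 1)), paths])

# Hypothetical regression output, in points per minute (made-up numbers).
home_advantage = 0.05  # intercept
strength_term = 0.02   # contribution of relative team strength
effort_term = 0.01     # contribution of relative team effort
drift = home_advantage + strength_term + effort_term

paths = simulate_score_diff(drift=drift, sigma=1.5)
final = paths[:, -1]                # end-of-regulation score differential
home_win_prob = np.mean(final > 0)  # share of simulated games the home team wins
```

With constant coefficients the end-of-game differential is normally distributed with mean `drift * horizon`, so simulation is not strictly needed here. The stepping scheme becomes useful once the drift includes effort terms that change during the game.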
