The Arbitrarian: Marginal productivity of box score statistics

David Sparks is the contributing statistics writer for Hardwood Paroxysm. His Arbitrarian column runs every Thursday here at HP. For more of his work, you can read his blog. This week’s entry is indeed a true stats column, and is probably the first post on here in a while that doesn’t have the words “snake eggs” in it. David’s our classy guy. This week’s discussion is on his own work with Box Scores. Enjoy.

Thus far, you’ve gotten to read me wax philosophical and discourse on the ideas of others. Today, I’m going to enter the arena, so to speak, and present some of my own work.

Imagine for a moment that you’re interested not only in estimating player value (as in “Most Valuable Player,” not the best player, nor the most talented, nor the most clutch; value is a direct function of productivity, not ability), but in estimating it well, and doing so for essentially all of professional basketball history. Perhaps you could use (adjusted) plus/minus? Well, no. Unfortunately, the play-by-play data necessary to construct plus/minus goes back only a few seasons; no one was keeping track of Bill Russell’s on-court versus off-court team scoring totals.

It would be nice if we had a good way of measuring defense, other than just blocks and steals–maybe it would be possible to pore over video of every game ever played and count the number of “shots changed” and “ball-handlers pressured” for each player… except I’m not sure if video would be available for all 50,000+ games played. Last week, when I asked if there was still room for development in NBA analytics, the overwhelming response was “yes” and the second most overwhelming response was “Defense!” Apparently, it is well-known that box score stats fail to capture some of what makes a player a good defender. Two commonly-cited examples of good defensive players undervalued by traditional statistics are Shane Battier and Bruce Bowen; both are often assigned to guard the opponent’s best perimeter player, but judging from box score statistics alone, it might be hard to see why.

If one is interested in historical comparison, the data options are somewhat limited. Even certain box score stats, like steals, blocks, three-pointers (which are a relatively modern addition to the rulebook), and offensive/defensive rebounds have not been tracked for all of basketball history. However, I contend that for any season prior to roughly 2005-06, box score-based metrics are the best option, given that they are essentially the only option. Further, what I propose here goes a long way toward indirectly capturing some “unmeasured” defensive ability, and though it may still be systematically biased against certain lockdown-type defenders, such players are (subjectively) relatively rare.

Defining value through productivity

I will go into much more depth next week on the topic of value, but for now, I will suggest that value is a function of productivity. In “counting stat” terms, basketball productivity can be seen as the accumulation of points, rebounds, steals, personal fouls, and so on, by a player or group of players. However, each of these possible production items is worth something different: a player who contributes 5 fouls in a game is certainly affecting the final score in a different way than a player who contributes 5 points in a game, ceteris paribus. Offensive and defensive rebounds might be differentially productive, as might be missed free throws and missed field goals. It should be fairly obvious to most observers of the game that merely “adding the good and subtracting the bad” is not an appropriate way to estimate productivity (see “Efficiency”), though it may be better than focusing heavily on scoring numbers alone.

That different box score contributions have different values is widely accepted; the problem arises in identifying the appropriate set of weightings to use. Is an assist worth one-half of a point? How much more (or less) is an offensive board worth than a defensive rebound? The problem, as I’ve noted before, is that a statistic can be developed to support any conclusion you wish to find. Do you think that the Allen Iverson/Carmelo Anthony duo is the greatest of all time? Weigh scoring heavily relative to other contributions, and assign small (if any) negative values to missed shots. Think Mark Eaton and Dikembe Mutombo’s defensive prowess makes them the best ever? Well, when you consider that a blocked shot prevents two points and may also give the blocking team possession, it’s really worth three times the value of a point; it all adds up. My point is that, intentionally or not, biases may easily slip into our analysis. This is why it is important to make public any metric-determining methodology, and subject it to review and criticism.

At any rate, I plan to construct a productivity metric based on a linear-weighting system not too dissimilar from that of Berri and Hollinger, although it differs in the exact weights, and makes fewer “adjustments.” Such linear systems are often criticized, but as I have outlined above, they are one of only a few options open to those with an interest in assessing the players of the past. Further, my value metric (as opposed to my productivity metric, if you’re still with me… there is a difference) incorporates more than just the linear-weighting system, as you will see next week. The key contribution I’m making today is to put forward what I believe to be highly significant, verisimilar linear regression results that help us find “true” weightings.

A data problem

I will not bore you with the details, but this is an endeavor I have attempted many times. Regression analysis allows us, in one interpretation, to estimate the marginal value (in terms of a dependent variable) of an additional unit of an independent variable, on average. For example, a model estimating baseball production might find that for every additional home run hit by a team, their runs scored total increases by 1.44. In baseball, regressing things like singles, doubles, triples, home runs, steals, ground-into-double plays, walks, etc. on runs scored works like a charm (maybe I’ll post this analysis if it’s a very slow news day, but I imagine the baseball metricians have already covered it).

In basketball, at the season level, such is not the case. Regressing box score stats on wins doesn’t really seem to work (by which I mean coefficients which “should be” positive come out negative, for example), nor does regressing on average point differential, points scored, points against, and so on. One option is to do as Dr. Berri has done, and develop a somewhat indirect, albeit reasonably convincing, system by which to connect individual player productivity to team success. (See his 1999 paper here.) Another option is to increase the resolution, and use game-level data:

Box score contributions to team scoring margin

Using a sample of tens of thousands of modern NBA game box scores, I set up a regression using the following formula¹:

MARGIN = B1 + ISHOME*B2 + MIN*B3 + UBX*B4 + FTX*B5 + AS*B6 + OR*B7 + DR*B8 + ST*B9 + BK*B10 + UST*B11 + PF*B12 + OUBX*B13 + OFTX*B14 + OAS*B15 + OOR*B16 + ODR*B17 + OST*B18 + OBK*B19 + OUST*B20 + OPF*B21


Where:

  • MARGIN = Team total points scored less opponent total points scored
  • ISHOME = A dummy variable indicating whether or not the team of interest is playing at home
  • MIN = Duration of the game in minutes
  • UBX = Un-blocked missed field goals = team missed field goals less opponent blocks
  • FTX = Missed free throws
  • AS = Assists
  • OR = Offensive rebounds
  • DR = Defensive rebounds
  • ST = steals
  • BK = blocks
  • UST = Un-stolen turnovers = team turnovers less opponent steals
  • PF = Personal fouls
  • The “O” prefix indicates the same variable measured for the team’s opponent
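As a rough illustration of how one game might be turned into a row of regressors, here is a minimal Python sketch. The dictionary layout, the key names, and the function names are hypothetical choices made for this example, and the stat lines are invented; only the definitions of UBX and UST come from the list above.

```python
def side_stats(side, other):
    """Nine per-team regressors, in the order used by the formula above."""
    return [
        side["fgx"] - other["bk"],  # UBX: missed field goals the opponent did not block
        side["ftx"],                # FTX: missed free throws
        side["as"],                 # assists
        side["or"],                 # offensive rebounds
        side["dr"],                 # defensive rebounds
        side["st"],                 # steals
        side["bk"],                 # blocks
        side["to"] - other["st"],   # UST: turnovers not credited to an opponent steal
        side["pf"],                 # personal fouls
    ]


def design_row(team, opp, is_home, minutes):
    """Intercept, ISHOME, MIN, then team stats, then the same stats for the opponent."""
    return [1.0, float(is_home), float(minutes)] + side_stats(team, opp) + side_stats(opp, team)


# Invented box-score totals for one game, for illustration only.
team = {"fgx": 45, "ftx": 6, "as": 22, "or": 11, "dr": 30, "st": 8, "bk": 5, "to": 14, "pf": 20}
opp = {"fgx": 48, "ftx": 5, "as": 19, "or": 9, "dr": 28, "st": 7, "bk": 4, "to": 15, "pf": 22}

row = design_row(team, opp, is_home=True, minutes=240)
print(len(row))  # 21 columns, one per B1..B21
```

Stacking one such row per team-game, with the scoring margin as the response, gives the kind of table that any ordinary least-squares routine can fit.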

This regression returns the following output:


Residuals:
       Min         1Q     Median         3Q        Max
 -21.90690   -3.57014   -0.04017    3.56452   22.23100

Coefficients:
              Estimate  Std. Error   t value  Pr(>|t|)
(Intercept)  -1.124810    1.113502    -1.010    0.3124
mp            0.009060    0.004918     1.842    0.0654 .
ishome        0.070037    0.073806     0.949    0.3427
tubx         -1.059533    0.013189   -80.334    <2e-16 ***
tftx         -0.606574    0.013822   -43.886    <2e-16 ***
tas           0.346423    0.007267    47.669    <2e-16 ***
tor           1.052038    0.015221    69.117    <2e-16 ***
tdr           0.531251    0.013246    40.107    <2e-16 ***
tst           1.580819    0.012076   130.903    <2e-16 ***
tbk           0.952582    0.016660    57.177    <2e-16 ***
tust         -1.462616    0.014012  -104.381    <2e-16 ***
tpf          -0.209380    0.009467   -22.116    <2e-16 ***
oubx          1.004537    0.013118    76.578    <2e-16 ***
oftx          0.567970    0.013906    40.843    <2e-16 ***
oas          -0.352181    0.007303   -48.223    <2e-16 ***
oor          -1.007247    0.015194   -66.294    <2e-16 ***
odr          -0.491897    0.013352   -36.840    <2e-16 ***
ost          -1.625631    0.011970  -135.807    <2e-16 ***
obk          -1.009805    0.016927   -59.657    <2e-16 ***
oust          1.433541    0.013909   103.065    <2e-16 ***
opf           0.240950    0.009476    25.427    <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.292 on 24689 degrees of freedom
Multiple R-Squared: 0.8492, Adjusted R-squared: 0.849
F-statistic: 6949 on 20 and 24689 DF, p-value: < 2.2e-16

Here are the coefficients, along with standard errors, expressed in graphical form:

Note that the standard errors are all pretty small (essentially invisible), and all of the coefficients are significantly different from zero.

To arrive at the weightings I use for my linear productivity estimator, I averaged the magnitude of the Team and Opponent coefficients for each statistic, resulting in the following weights:


tubx -1.0320351
tftx -0.5872716
tas 0.3493022
tor 1.0296423
tdr 0.5115741
tst 1.6032249
tbk 0.9811934
tust -1.4480786
tpf -0.2251647
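The averaging step can be sketched in a few lines of Python. The coefficient values are copied from the regression output above; the dictionary layout and the function name are assumptions made for this example.

```python
# Team-side and opponent-side estimates, copied from the regression output.
TEAM = {"ubx": -1.059533, "ftx": -0.606574, "as": 0.346423, "or": 1.052038,
        "dr": 0.531251, "st": 1.580819, "bk": 0.952582, "ust": -1.462616,
        "pf": -0.209380}
OPP = {"ubx": 1.004537, "ftx": 0.567970, "as": -0.352181, "or": -1.007247,
       "dr": -0.491897, "st": -1.625631, "bk": -1.009805, "ust": 1.433541,
       "pf": 0.240950}


def averaged_weights(team, opp):
    """Average the magnitudes of each team/opponent pair, keeping the team-side sign."""
    weights = {}
    for name, t in team.items():
        magnitude = (abs(t) + abs(opp[name])) / 2
        weights[name] = magnitude if t >= 0 else -magnitude
    return weights


WEIGHTS = averaged_weights(TEAM, OPP)
for name, w in WEIGHTS.items():
    print(f"{name:>4} {w: .6f}")
```

Because each opponent coefficient has roughly the same size and the opposite sign as its team counterpart, the average stays close to both estimates.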

The great thing here is that (almost) all of these weights seem to make perfect theoretical/subjective sense. A missed field goal is worse than a missed free throw, since many missed free throws are the first of two attempts and cannot be rebounded, and the shooting team is often in a better position to defend the counterattack that follows a defensive rebound of a missed free throw. Offensive rebounds are worth more than defensive, because though both indicate the capturing of a possession, an offensive rebound puts teams in a better position to score than a defensive rebound, which must be moved up the court and is subject to turnover and likely a more difficult shot attempt. A steal is worth (slightly) more than the typical turnover, because a possession change resulting from a steal probably results in an easier shot attempt than a possession change coming from, for example, an inbound after a three-second violation. Personal fouls, though they sometimes result in free throws for the other team (and are thus detrimental), are also often used to prevent an easy two-point scoring opportunity or to disrupt the flow of an offense, and are often employed for strategic purposes with the intent of increasing the fouling team’s score relative to that of their opponent.

The only theoretically problematic coefficient is…

The troublesome assist

I believe the regression results. Given the apparent verisimilitude of each other coefficient, I think that these estimates are reasonably accurate reflections of reality, and that each additional assist adds only 0.349 to the final margin, on average. However (subjectivity alert!), I do not think that an assist fully captures the contribution of the player doing the assisting. Not only are many good passes made on missed field goals, but some credit might be given to players for moving the ball up the court, running the offense, etcetera, above and beyond attribution for the single penultimate act of passing to the player who scores. Thus, since without such an adjustment, point guards are almost entirely absent from the upper echelons of the productivity list, I re-estimate the assists coefficient:

To do so, I regress team and opponent assists alone on final margin. Using the resulting coefficients (1.196574 and -1.188791, respectively), I take an average as done above, to find my operating coefficient: 1.192683.

Thus far, our coefficients allow us to approximate the number of points a player helped to create for his team, the number of points a player prevented his own team from scoring, the number of points a player allowed the other team to score, and the number of points he prevented them from scoring. To this, we add the most direct contribution to winning margin: points scored. Each player is credited with “all” of his points–there is a direct, one-to-one relationship between each additional point scored and final scoring margin. Thus, I give you an elegant linear-weighted box score-based productivity metric, Model-Estimated Value:

MEV = pts – 1.032*fgx – 0.587*ftx + 1.193*as + 1.030*or + 0.512*dr + 1.603*st + 0.981*bk – 1.448*to – 0.225*pf
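For readers who want to try the formula, here is a minimal Python sketch. The weights are the rounded ones from the MEV formula above; the function name, argument names, and the sample stat line are hypothetical and belong to no particular player.

```python
def mev(pts, fgx, ftx, ast, orb, drb, stl, blk, tov, pf):
    """Model-Estimated Value for one stat line, using the rounded weights above."""
    return (pts
            - 1.032 * fgx   # missed field goals
            - 0.587 * ftx   # missed free throws
            + 1.193 * ast   # assists (re-estimated coefficient)
            + 1.030 * orb   # offensive rebounds
            + 0.512 * drb   # defensive rebounds
            + 1.603 * stl   # steals
            + 0.981 * blk   # blocks
            - 1.448 * tov   # turnovers
            - 0.225 * pf)   # personal fouls


# Invented single-game line: 20 points, 8 missed field goals, 2 missed free
# throws, 5 assists, 2 offensive and 6 defensive rebounds, 1 steal, 1 block,
# 3 turnovers, 2 fouls.
line = mev(pts=20, fgx=8, ftx=2, ast=5, orb=2, drb=6, stl=1, blk=1, tov=3, pf=2)
print(round(line, 3))       # 19.457
print(round(line - 20, 3))  # -0.543, i.e. MEV less points scored
```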

Note: In past seasons, offensive and defensive rebounds were not recorded separately. Thus, for such years, I replace the OR and DR factors with (total rebounds) * 0.669, which is the weighted average of the value of all rebounds since offensive and defensive boards have been counted as distinct. Also in years past, turnovers, blocks, and steals were not tracked. I feel that it would be inappropriate to impute estimates of such statistics for historical players, and so I am more or less content to allow no penalty for all unrecorded turnovers past, nor give credit for uncounted blocks and steals. The value metric I’ll detail next week should make this a more comfortable accommodation.
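One way to express this historical fallback in code is sketched below. The function name and the use of None to mark an untracked statistic are assumptions made for this example; the 0.669 rebound weight and the decision to give neither credit nor penalty for unrecorded stats come from the note above.

```python
def mev_historical(pts, fgx, ftx, ast, reb, pf, stl=None, blk=None, tov=None):
    """MEV for seasons that only recorded total rebounds.

    Pass None for steals, blocks, or turnovers that were not tracked; they
    then contribute nothing, as described in the note above.
    """
    total = (pts
             - 1.032 * fgx
             - 0.587 * ftx
             + 1.193 * ast
             + 0.669 * reb   # single weight for all rebounds
             - 0.225 * pf)
    if stl is not None:
        total += 1.603 * stl
    if blk is not None:
        total += 0.981 * blk
    if tov is not None:
        total -= 1.448 * tov
    return total


# Invented stat line with no steals, blocks, or turnovers recorded.
early = mev_historical(pts=20, fgx=8, ftx=2, ast=5, reb=8, pf=2)
print(round(early, 3))  # 21.437
```

Note that the same invented line looks more productive here than it would with turnovers recorded, which is the bias the note accepts.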

Pre-emptive rebuttals to likely criticism

Certainly this metric is not perfect, and there are many criticisms which could be leveled against it. Here, I will try to address some likely concerns, while avoiding straw men.

C: MEV is box score-based, and so fails to adequately capture, among other things, defense, hustle, heart, desire, clutch, etc.

R: I tried to address this to some extent in my preamble above. I would be happier if MEV did a better job of capturing all aspects of the game (especially defense, though the enhancement I detail next week, I feel, helps somewhat), but given data restrictions, I have decided that box-score statistics are an evil necessary to a universally applicable estimator.

C: A steal (rebound, three-pointer, turnover, etc.) in the last seconds of a close contest is worth much more than at another point in time, and is certainly worth more than in a blowout contest.

R: The first clause is highly debatable: a steal made in the middle of the second quarter might obviate the need for any late-game heroics, and all points scored are given equal credit in their accumulation toward the final score. The second clause is similarly misguided: any additional box score stat will contribute just as much to the final scoring margin, on average, in any game.

C: You keep saying “on average,” but there is no “average” blocked shot. Some are rebounded by the shooting team, some are swatted out-of-bounds, others prevent the game-tying shot, etc.

R: I say “on average” because that is what my methods permit me to say. Part of this is a data availability problem. Until the day we have exhaustive categorizations of every single event and its result, for all NBA games past and future, I am content to make do with the average. Further, over the course of many observations, the averages should not systematically bias the estimates in favor of, or against, any single player. Michael Jordan had many “significant” field goals, but he also had many less “significant” ones.

Incidentally, this argument is often proffered by those opposed to statistical approaches in general. It may indeed be true that some nuance is lost when dealing with recorded numerical observations of the game as compared to narrative, subjective observations. However, it is my contention that the gains in objectivity, accuracy, and consistency afforded by a statistical approach vastly outweigh the losses associated with the possibility that Big Shot Rob doesn’t get more credit for his Biggest Shots (in fact, he will get credit next week, to some extent). Further, as I have mentioned before, I do not see qualitative/quantitative approaches as a binary dichotomy.

C: MEV overweights/underweights statistic X, Y, and Z.

R: I have attempted here to be as transparent as possible in detailing exactly how I arrived at my estimates. I think there may exist some room for disagreement on some of the scalars, but I have detailed the reasons that these coefficients are theoretically satisfying, and empirically-derived. I would be willing to consider an argument with a sound theoretical basis and empirical verification (by which I mean, run your own regression), but for now, I am very comfortable with the weights as they stand.

The one exception is the value credited to an assist, which I may have under-justified. I do feel like (subjectivity alert again!) 1.193 is not an unreasonable amount of credit, falling as it does between the value of an offensive rebound, block, or missed field goal, and the value of a made two-pointer, turnover, or steal. Also, one would have to feel bad for all those point guards who spend all their time trying to pass instead of shooting, and hardly get any credit for it. Please, think of the point guards.

C: MEV should, but does not, account for pace, playing time, strength of opponent, and the quality of one’s teammates.

R: You are right that it does not, but next week I will deliver a pace-agnostic value metric. Further, I am interested in measuring productivity and value, not quality, ability, or technique (all of which are much harder to measure). Productivity per unit time will be addressed next week, but corrections for other players and teams, or positions played, have nothing to do with production. If the player scores a point, it matters not where he is, how big he is, or who else is on the court; it still adds +1 to the final margin. In the playoffs, when the stakes are high, and There Can Be Only One, a missed shot is still going to set your team back about 1.032 points. I may, at a future date, look into estimating quality or talent, but for now, I’ll leave that to my more subjective brethren.

C: Team-level MEV does not correlate well with team wins, and even if it does, that’s only because points are included.

R: Though MEV does correlate positively and significantly with team wins, this is not a relevant concern. It is directly derived from game-level team scoring margin, and teams only win games if this margin is positive. Further, next week, I will introduce a value measure which incorporates MEV and, at the team level, correlates perfectly with team wins.

The most productive

For those of you who have stayed with me, here’s the payoff. Using MEV, as derived above, we can estimate the productivity of every player who has ever played professional basketball. Here is a table of every player (each team played for) for the 2007-08 season, sorted by a commonly-seen value measure, points per game:

Now, click on the “MEV/G” tab at the bottom, to see the second sheet, which ranks each player by their MEV per game. The list changes fairly substantially. King James, who has a pretty well-rounded game, is still near the top. But Bryant and Iverson drop a spot or two, as do Wade and Anthony. Where does Kevin Martin go? Michael Redd? Richard Jefferson? Corey Maggette? Kevin Durant??? On the other side of the coin, here comes Chris Paul, Dwight Howard, Kevin Garnett and Deron Williams, to the top of the productivity rankings. Click on the third tab, “Value Added,” to see each player’s MEV less points scored, per game. This is an estimate of the non-scoring ways in which each individual helps his team and hurts the other team. Pass-first point guards, defensive-minded bangers, and well-rounded contributors rise to the top. Chuckers (see: Ben Gordon), often characterized by flashy scoring numbers, sink to the bottom. These players still contribute positively, through their ability to score, but their positive value is diminished by the number of shots they miss, turnovers they give up, and the other things they fail to do to help their team improve that final margin.

What if we expand our analysis to the careers of the NBA’s all-time greats? Below is a set of three tables, mirroring those above, except that it covers the duration of 500 of the NBA’s most productive playing careers, according to MEV.

Jordan’s and Chamberlain’s greatness is still validated by MEV; both players contributed through much more than just scoring. Other NBA legends, such as Bill Russell, Magic Johnson, and Oscar Robertson, however, are inadequately captured by their PPG numbers. Again, at the bottom of the Value Added barrel, we see some famous score-first players.

Conclusion

I hope you have found this loquacious discourse both interesting and convincing. I have attempted to develop a theoretical grounding for the appraisal of player value, and used empirical data to estimate a set of scalars with a high degree of face validity. I believe that much of the justification for the accuracy of this metric can be found in its application to actual players. Many individuals commonly known to contribute above and beyond their scoring ability are identified as such by MEV, while those whose points come at a cost are likewise singled out. It is my impression that this productivity estimator finds a happy medium, at which theory meets regression output; scorers are punished for missing, not for just shooting; and credit and blame are meted out fairly.

Please come back next week, when I will go into similarly lengthy detail about value estimates!

¹ This analysis is somewhat similar to that performed by Dan Rosenbaum in estimating statistical plus/minus. I encountered his work after estimating my own regression, and tend to prefer my variable choices and results, but in the interest of openness, I wanted to reference this prior work.
