Author Archives: Jon Nichols

Nichols and Dime: NBA Player Pair Data

Last Thursday, I published player pair data for every NCAA Division I team at my own site. This was inspired by the NBA player pair data that has been available at 82games for the last few years. As I mentioned in my last article:

82games has compiled statistics showing how teams have performed with two specific players on the floor together. These “player pairs” are a complementary data view to our 5-man unit stats that measure unit performance. By focusing on two players at a time we can better understand which guys bring out the best in each other.

Unfortunately, that data is not currently available at 82games for NBA players. If it becomes available, I’ll be happy to point you in that direction. Until then, I have calculated the player pair data for the current NBA season, and it can be viewed here:


A Look at Steals: Does Gambling Pay Off?

Creating turnovers on the defensive end certainly is a good thing.  After all, turnovers are one of the Four Factors.  Still, the ability of a defense to generate steals in particular is not always assumed to be beneficial.  Perhaps it is better to play safer, more solid D.  I’ve decided to look at the numbers and see what conclusions we can draw.

Using play-by-play data, I calculated the Steal Rate (the percentage of opponents’ possessions that ended in a steal by the team in question) for each lineup that appeared in at least 400 possessions last season.  I then compared that lineup’s Steal Rate to its Defensive Rating (points allowed per 100 possessions) and plotted the results in the chart below.  If steals are important, a higher steal rate should lead to a lower Defensive Rating, and therefore a negative slope:


So far, it appears as though steals are important.  Despite a low r-squared, the negative slope is statistically significant.  We can’t say that the number of steals entirely explains how well a defense will do (as evidenced by the low r-squared), but we can say that there is a correlation between high steal rates and low Defensive Ratings.
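The per-lineup calculation and regression described above can be sketched as follows. The lineup numbers and the simple least-squares helper are invented for illustration; they are not the actual data:

```python
# Hypothetical illustration of the method: toy lineup data, not real numbers.
def ols_slope(xs, ys):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Each lineup: (steals forced, opponent possessions, points allowed)
lineups = [(28, 400, 428), (45, 500, 510), (30, 450, 495), (52, 600, 600)]

steal_rates = [100.0 * s / poss for s, poss, _ in lineups]      # steals per 100 opp. possessions
def_ratings = [100.0 * pts / poss for _, poss, pts in lineups]  # points allowed per 100 possessions

# If steals help, higher steal rates should pair with lower Defensive Ratings,
# producing a negative slope.
slope = ols_slope(steal_rates, def_ratings)
```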

But we should pause for a second.  This graph can be very misleading.  Perhaps there are some confounding variables (hidden factors) that make the results appear to be this way when they really shouldn’t be.  In other words, maybe good defensive teams just have more athletic players in general.  This may cause them to get more steals, but it doesn’t mean steals are the reason they’re better.  If a bad team were to go for more steals, they’d still be a bad team and have a poor Defensive Rating.

However, there is another approach that we can take.  For each lineup, I’ve calculated the projected Defensive Rating based on the individual Defensive Ratings of each player in the lineup.  I then calculated the difference between the lineup’s projected Defensive Rating and actual Defensive Rating.   This difference was regressed against the lineup’s Steal Rate.

What is the point of this?  This method attempts to zoom in on just steals.  By taking a lineup’s projected Defensive Rating into account, we’re trying to adjust for other confounding variables.  This way, if there is a negative correlation between the difference and steals, it is further evidence that steals are important.   A negative slope in the graph below indicates that steals are important:


Again we see more evidence suggesting that going for steals is generally beneficial.  The r-squared is low but the results are statistically significant.
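The adjustment step can be sketched with toy numbers (every figure below is made up; the real study also used playing-time weights I don't have here):

```python
# Toy sketch of the adjustment step; every number here is invented.
def ols_slope(xs, ys):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sum((x - mx) ** 2 for x in xs)

def projected_drtg(player_drtgs):
    """Project a lineup's DRtg as the plain average of its five players' DRtgs."""
    return sum(player_drtgs) / len(player_drtgs)

# (five individual Defensive Ratings, actual lineup DRtg, lineup Steal Rate)
lineups = [
    ([105, 107, 103, 110, 108], 104.0, 9.5),
    ([102, 104, 106, 101, 105], 105.5, 6.0),
    ([108, 109, 107, 106, 110], 107.0, 8.0),
]

# Difference = actual minus projected; negative means the lineup beat expectations.
diffs = [actual - projected_drtg(ind) for ind, actual, _ in lineups]
steal_rates = [sr for _, _, sr in lineups]

# A negative slope means high-steal lineups outperform what their
# individual defensive numbers project.
slope = ols_slope(steal_rates, diffs)
```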

Of course, these graphs don’t specify what types of steals are good.  Risky attempts may very well hurt the defense.

In conclusion, based on the evidence I’ve presented today, I would suggest that lineups (and, theoretically, players) that record more steals are often better on defense.  To some, this may be obvious, but to others it may not be.  We can never know for sure how important steals really are, but the stats think they matter.

Recalculated Advanced Stats for the 2009-2010 Season

Some of you may recall an article I wrote back in November, when I re-calculated a few common advanced stats using play-by-play data.  That was for last season, and today I will provide the numbers for this season.

The benefit of using the play-by-play data is simple: instead of using estimates for different stats, we can know the real things.  For example, instead of estimating the percentage of Spurs’ rebounds that DeJuan Blair grabs, we can calculate the actual number.  The following stats will be presented: Rebound Rate, Offensive Rebound Rate, Defensive Rebound Rate, Assist Rate, Block Rate, Steal Rate, and Usage Rate.
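As a toy illustration of what "the actual number" means here (the event records and field names below are invented for the sketch, not the real play-by-play format):

```python
# Hypothetical sketch: with play-by-play rows, a player's actual Rebound Rate is
# rebounds he grabbed divided by all rebounds available while he was on the floor.
events = [
    {"rebounder": "Blair", "on_court": True},
    {"rebounder": "Duncan", "on_court": True},
    {"rebounder": "Opp", "on_court": True},
    {"rebounder": "Opp", "on_court": False},  # Blair on the bench: excluded
    {"rebounder": "Blair", "on_court": True},
]

# Only rebounds that occurred while the player was on the court count as chances.
available = [e for e in events if e["on_court"]]
grabbed = sum(1 for e in available if e["rebounder"] == "Blair")
rebound_rate = 100.0 * grabbed / len(available)
print(rebound_rate)  # 50.0 — 2 of the 4 rebounds available while on the floor
```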

The numbers are embedded in the table below.  Click the team filter at the top to sort by team.  Also, if you click the arrow in the very top left, you can download the spreadsheet as an Excel document or view it full-screen.

As a side note, for those interested in college basketball, check out exclusive NCAA plus-minus numbers at my home site. Those will be updated in the next day or two.

The Diminishing Returns of Shooting

Recently I took a look at the diminishing returns of rebounds, assists, steals, and blocks. As you may or may not have noticed, one common type of statistic was missing: shooting. Today I’m going to fill in the blanks using the same approach as last time.

If you haven’t read the previous article, the premise is simple. For each lineup in the NBA last year that appeared in at least 400 plays, I project how they will do in each stat using the sum of their individual stats. For example, to predict a lineup’s offensive rebound rate, I simply add the offensive rebound rates of each of the five players in the lineup. I then compare this projection to the actual offensive rebounding rate of the lineup. These steps are followed for each lineup and for each statistic.
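A minimal numeric sketch of the projection step, with invented rates:

```python
# Sketch of the projection: a lineup's projected offensive rebound rate is the
# plain sum of its five players' individual rates. All numbers are made up.
individual_orb_rates = [11.2, 8.5, 4.0, 2.5, 1.8]  # percent of available ORBs, per player
projected = sum(individual_orb_rates)              # 28.0

actual = 25.1                  # hypothetical rate the lineup actually posted together
shortfall = projected - actual # a positive shortfall hints at diminishing returns
```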

If there are diminishing returns (i.e. in a lineup of five good rebounders, each player ends up stealing a little bit from his teammates), the correlation between the projected rates and the actual rates will be significantly lower than one. In other words, for each percentage of rebounding rate a player has individually, he will only add a fraction of that to the lineup’s total because some of his rebounds will be taken away from teammates.

If this still isn’t clear to you, be sure to check out the old article. Once you’ve done that, this article will make more sense.

Back to shooting. I’ve decided to take a look at the diminishing returns of eight aspects of shot selection/efficiency: three-point shooting percentage, three-point attempt percentage (the percentage of a player’s total attempts that are threes), close (dunks/layups) shooting percentage, close attempt percentage, midrange shooting percentage, midrange attempt percentage, free throw shooting percentage, and free throw attempt percentage.

To project a lineup’s percentage in one of those categories, I can’t simply add up the five individual percentages. For example, a lineup of five 30% three-point shooters is not going to shoot 150% from beyond the arc. Instead, I have to calculate a weighted average for the lineup. Therefore, each player’s three-point shooting percentage is weighted by the amount of threes he took. The same approach can be taken with attempt percentages.
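The weighted average looks like this with made-up attempt counts; each player's percentage counts in proportion to how many threes he actually took:

```python
# Weighted-average sketch: you can't add percentages, so each player's 3P% is
# weighted by his three-point attempts. The numbers are invented.
players = [  # (three-point attempts, 3P%)
    (300, 0.40),
    (200, 0.35),
    (100, 0.30),
    (20, 0.25),
    (5, 0.10),
]
total_att = sum(att for att, _ in players)
lineup_pct = sum(att * pct for att, pct in players) / total_att
print(round(lineup_pct, 4))  # 0.3608 — dominated by the high-volume shooters
```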

For some statistics, such as free throw percentage, we shouldn’t expect to see any diminishing returns. After all, adding a great free throw shooter to a lineup shouldn’t make the other players in the lineup shoot worse from the foul line. However, with other stats (especially attempt percentages), diminishing returns seem more possible.

To start, let’s take a look at the diminishing returns of three-point shooting percentage:

Here we see the slope is just about 1. However, the standard error for this slope is 0.21, so the results are pretty inconclusive.

How about three-point attempt percentage?


Again the slope is just about 1. This time, though, the standard error is just 0.04. Therefore, we can say with pretty good certainty that there are no diminishing returns for three-point attempt percentage. In other words, adding a player to your lineup who likes to shoot threes is going to add a proportional amount of three-point attempts to your lineup total.

Up next we have close shooting percentage:


The slope is actually above 1 this time, although it’s less than one standard error away from 1. There definitely is no clear evidence of diminishing returns for close shooting percentage. Adding an efficient player around the basket to your lineup will probably not make your other players less efficient around the basket.

Close attempt percentage:


The standard error for this slope is just 0.05, so we may be seeing slight diminishing returns. But not much.

Midrange shooting percentage:


The standard error for this one is pretty large (0.15), but again there are no real signs of diminishing returns.

Midrange attempt percentage:


These results are pretty similar to those of close attempt percentage. The slope is less than 1 and the standard error is pretty small. Again, though, the diminishing returns effect appears to be quite small.

Free throw percentage:


As I mentioned in the beginning of the article, we shouldn’t expect to see diminishing returns on free throw percentage, and we don’t.

Free throw attempt percentage:


Just like the rest of the stats we looked at, we don’t really see a hint of diminishing returns for free throw attempt percentage.


Unlike statistics such as rebounds, assists, steals, and blocks, shooting (in all of its forms) doesn’t seem to have the problem of diminishing returns. A player’s shooting efficiency will have a proportional impact on a lineup’s shooting efficiency, and his shooting tendencies will have a proportional impact on a lineup’s shooting tendencies. There are other ways to attack this question, though, and in the future I plan on doing just that.

The Diminishing Returns of Rebounds and Other Stats


One thing many people have wondered is whether or not there are diminishing returns for rebounds. Basically, what that would mean is that not all of a player’s rebounds would otherwise have been taken by the opponent; some would have been collected by teammates. Therefore, starting five of the league leaders in rebounds would probably be overkill because eventually they’d just steal rebounds from each other. At some point, there are only so many rebounds a team can grab, and some are just bound to end up in the hands of the opponent.

This principle is very important to statisticians who wish to develop player ratings systems. These ratings often assign weights to different statistics (including offensive and defensive rebounds), so knowing that a defensive rebound collected by one player would most likely otherwise have been collected by a teammate makes that stat less “valuable” in terms of producing wins.


To test the effect of diminishing returns of rebounds, I decided to go through the play-by-play data (available at Basketball Geek) and compare each lineup’s projected rebounding rates (the sum of each player’s individual rebound rates for the season) to their actual rebounding rates (what percentage of rebounds that lineup grabbed while it was on the floor). After doing some research, I found that a very similar study had been done by Eli Witus (currently of the Houston Rockets). Before you proceed with the rest of my article, you should read his. Although my method is slightly different, he provides a great explanation of why it’s useful to do the research this way and he also lists some advantages and disadvantages of this method.

Before I show you the results, I should explain the intricacies of my research and also some of the differences between Eli’s study and mine. The individual rebound rates I used were taken from the rebound rates I calculated myself using the play-by-play data. Because both the individual rates and the lineup rates were calculated from the same data, there’s less risk of error due to silly things such as differences in calculations or incomplete data. Also, to reduce the effects of small sample sizes due to lineups that didn’t receive a lot of minutes together, Eli chose to group lineups into bins based on their projected rebound rates. He then regressed each bin’s projected rebound rate against its actual rebound rate (each bin being a collection of different lineups with similar projected rebound rates).

When I was coming up with my idea, I chose to do things a little differently, although the purpose is the same. Instead of grouping the lineups into bins, I simply selected only the lineups that met a minimum qualification for plays. Only lineups that appeared in at least 400 plays were included in my study. This left me with a sample size of 475 lineups. Like Eli, I then regressed the projected rebounding rates against the actual rebounding rates. One final difference is that his article was written in February of 2008, so I’m presuming he used data from the 2007-08 season. I’m using data from the 2008-09 season.
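The filtering and regression steps can be sketched like this; the lineup records are invented (the real study had 475 qualifying lineups):

```python
# Minimal sketch of the filtering + regression pipeline; records are made up.
def ols_slope(xs, ys):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sum((x - mx) ** 2 for x in xs)

lineups = [  # (plays together, projected DRB rate, actual DRB rate)
    (612, 74.0, 71.5),
    (455, 70.0, 70.2),
    (398, 80.0, 66.0),  # under 400 plays: dropped from the sample
    (820, 76.5, 72.4),
    (510, 72.0, 71.0),
]
qualified = [(p, a) for plays, p, a in lineups if plays >= 400]
slope = ols_slope([p for p, _ in qualified], [a for _, a in qualified])
# A slope well below 1 is the diminishing-returns signature.
```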

Offensive Rebound Rate

The graph for Offensive Rebound Rate is below:


The key to understanding this graph is looking at the slope of the line. Here, it is 0.7462 (close to the 0.77 figure Eli got). If there were no diminishing returns for offensive rebounds, the slope would be 1. This would mean that for each additional rebound a player could offer to his lineup, he would actually add one rebound to the lineup’s total. If the slope is less than one (such as in this case), it means that each additional offensive rebound by the player adds about 0.75 to the lineup’s total, because some of those rebounds would have been taken by his teammates anyway. The slope I have here is pretty high, though, indicating that the diminishing returns effect for offensive rebounds isn’t too strong.
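To make the slope's meaning concrete, here is the arithmetic using the 0.7462 figure from the chart (the player's individual rate is a made-up example):

```python
# Reading the regression slope: a player's individual ORB rate translates into
# roughly slope * rate of extra lineup ORB rate; the remainder represents
# rebounds his teammates would have grabbed anyway.
slope = 0.7462
player_orb_rate = 10.0  # hypothetical individual offensive rebound rate (%)
expected_lineup_gain = slope * player_orb_rate
print(round(expected_lineup_gain, 2))  # 7.46 points of lineup ORB rate, not 10
```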

Defensive Rebound Rate

In his study, Eli found that the diminishing returns effect was much stronger for defensive rebounds. Can I replicate his results? Below is the graph for defensive rebounds:


Eli found a slope of 0.29. Mine was close, but slightly higher at 0.3331. Regardless of the minor difference, we both can come to the same conclusion: there is a much stronger diminishing returns effect at play with defensive rebounds than there is with offensive rebounds. While each offensive rebound adds 0.75 to the lineup’s total, each defensive rebound only adds 0.33, indicating that many defensive rebounds are taken away from teammates. Of course, individual cases can vary.

These results help explain why a lot of player rating systems make defensive rebounds “worth” less than offensive rebounds. Eli has a good explanation of this at the end of his article. For example, in his PER system, John Hollinger assigns offensive rebounds a value more than double the value of defensive rebounds. This is partly due to the diminishing returns effect we found here today and originally in Eli’s work. As it turns out, my numbers indicate that offensive rebounds are in fact worth a little more than double the value of defensive boards. So hats off to Hollinger and his many contemporaries who have managed to weight rebounds appropriately.

Further Exploration

I could stop here, but I’d like to take this research a little further and see what other insights we can come up with. First, I’d like to break down the data by location (home and away).

Home Data

One thing to note is that the projected rebounding rates for the lineups are based on overall individual ratings, not just for home games. If rebounding usually favored the home teams, this would lead the projected lineup rebounding rates to usually underestimate the actual rates in this case. However, since it would presumably do this for all lineups, we can still take a look at the effect of diminishing returns.



With that being said, how does the home data compare to the overall data? For offensive rebounds, the slope is flatter, indicating a stronger effect of diminishing returns. However, for defensive rebounds, the slope is slightly higher, indicating a lesser effect. The differences are minor, though.

Away Data

We can also take a look at the away data:



As you would expect given what we now know about the home data, the effect of diminishing returns appears to be much weaker on the road for offensive rebounds. In fact, as we can see, the slope is getting close to 1. This indicates that there isn’t much in terms of diminishing returns for this type of rebound. Intuitively, this makes sense. If teams rebound the ball better at home, there are fewer offensive rebound opportunities for the visiting team. Therefore, it is more likely that an offensive rebound by a visiting player would otherwise have been grabbed by the opponent as opposed to one of his teammates, which in turn makes good offensive rebounders more valuable on the road. The same pattern doesn’t follow for defensive rebounds, though. In both cases, the difference isn’t gigantic, so we should be hesitant to draw any serious conclusions.

The one difference that is large and consistent is the difference in slopes between offensive and defensive rebounds, no matter the location. Confirming what Eli found in his original studies, this data says that the effect of diminishing returns is much stronger on defensive rebounds than it is on offensive ones. Therefore, offensive rebounding is a more “valuable” skill in terms of how you rate players, and some of the best player rating systems do take this into consideration.

Other Statistics

So far, this whole article has been about the diminishing returns of rebounds. However, we can also use the same lineup-based approach to look at other statistics. Today I’ll also explore the diminishing returns of blocks, steals, and assists. Eli already used his method to take a crack at the usage vs. efficiency debate, and I recommend you read that article for some fascinating insight.

Block Rate

Block Rate, for a lineup, is defined as the percentage of the opposing team’s shots that are blocked by one of the players in the lineup.

Blocks are an interesting statistic to examine. After all, there are only so many block opportunities around the basket and occasionally on the perimeter. When you also take into consideration the fact that teams often funnel players into the waiting arms of a dominant shot-blocker, it seems as though the diminishing return for blocks should be relatively strong. That is, if you add a shot blocker that normally blocks 4% of the opposing team’s shots to your lineup, you shouldn’t expect to block nearly that many more as a team because of diminishing returns. To see if this is true, I used the same methodology that I did for rebounding and came up with this graph:


As it turns out, the slope is at 0.6015. This puts Block Rate somewhere in the middle between Offensive Rebounds and Defensive Rebounds. A lineup full of good shot blockers will almost certainly block more shots than a weaker lineup, but the difference may not be as much as you might think due to effects of diminishing returns.

Steal Rate

Up next we have Steal Rate. For an individual, it is defined as the percentage of opponent possessions that end with the given player stealing the ball. Therefore, for a lineup, it is the percentage of opponent possessions that end with a steal by anyone from that lineup. The graph for Steal Rate is below:


Here, we see the slope is nearly 1. This indicates that there is practically no diminishing returns effect on steals. If you add a player 2% better than average in terms of steals to your average lineup, you should expect to steal the ball almost 2% more than you currently do. Another way to put it is that usually, if a given player steals the ball, it’s not likely that someone else would have stolen the ball if he failed. Of course, like with every graph so far, the R^2 is still very low. This means that we can’t really predict how many steals a lineup will get simply by adding the Steal Rates of all of its players.

Assist Rate

Finally, we have Assist Rate. For an individual, it is the percentage of his teammates’ made field goals that he assisted on. For a lineup, it means the percentage of the lineup’s made field goals that were set up by an assist. The graph is below:


Of any graph presented on this page so far, this one has by far the lowest slope. Normally this would indicate that there is a huge diminishing returns effect for assists. However, I’m not sold on this explanation just yet for various reasons, so for now I will just present the data as is.


I discussed a number of different issues today, so I think it’s good to recap what I’ve presented. First, using a method similar to the one Eli Witus used, I found that there is a large diminishing returns effect for defensive rebounds that is significantly larger than the effect for offensive rebounds. This confirms the common belief that offensive rebounds are “worth” more than defensive ones. When we split the data into home and away, it appears that individual offensive rebounding skill is particularly important on the road, indicated by a very high slope on the graph. Finally, I took a look at the diminishing returns of a few other advanced statistics and found the strongest effect on assists and a weaker but still significant effect on blocks.

If you have suggestions or comments about my work, please e-mail me at [email protected]. And again, much credit must go to Eli Witus, who originally thought of these ideas well before I did.

Nichols and Dime: Tracking Hustle Plays in the Clippers’ 98-88 Victory Over the Grizzlies

Something I’ve wanted to do for a while, and something I imagine every team does, is watch a game from start to finish and track all of the hustle plays made by both teams. In a league in which not every player is always giving 110%, sometimes a little bit of hustle and effort can go a long way.

With that in mind, the game I chose to track was Sunday afternoon’s contest between the Los Angeles Clippers and the Memphis Grizzlies. The Grizzlies have been playing well lately, including an impressive victory over Portland. Altogether, they came into the game having won five of their last seven contests. The Clippers were also relatively hot, having won three of their last four. As one would expect from a Grizzlies-Clippers game, the matchup was pretty lackluster through three quarters in terms of excitement and competitiveness. Then the fourth quarter arrived and the Clippers exploded, including a 22-0 run that won them the game. Los Angeles outscored Memphis 33-7 in the fourth quarter. When all was said and done, the game ended up being a very memorable one for the Clippers and one the Grizzlies would soon like to forget.

Hustle, of course, played a large role. Below is a link to a spreadsheet which has the results of the statistics I tracked for the game. Those statistics included loose ball attempts, charges drawn, good sprints down the court (on either offense or defense), deflections, and missed blockouts.

  • Until the Clippers caught fire in the fourth and their energy increased, this was not a hustle-filled game. Memphis did build a large lead partly because of some nice hustle by their big men on both ends of the floor. But overall, hustle was not the reason they were leading.
  • Neither team drew a charge.
  • Even when the Clippers were losing, they had active hands. They finished with 28 deflections.
  • Marc Gasol showed not only skill but also effort for most of the game. He went for loose balls and ran the floor hard in the first half.
  • Baron Davis was a menace with his on-the-ball defense. He played up close on Jamaal Tinsley and Mike Conley and had six deflections.
  • Al Thornton was the Hustle MVP. He was active on both ends and sprinted the floor all night. His effort was key to the Clippers’ victory.
  • Did the Clippers dominate the fourth because they hustled more or was it the other way around? Either way, the difference between their first three quarters and the fourth was like night and day. Seemingly every Memphis possession in the final period ended with a deflection by a Clipper, a Clipper grabbing the loose ball, and a layup on the other end.
  • Eric Gordon showed impressive speed in running the floor and led a couple of late fast breaks very well.
  • Rudy Gay finished with 10 rebounds, but he was often outworked by Thornton.

A couple of non-hustle related notes:

  • As I mentioned earlier, Gasol was brilliant at times. He showed a variety of post moves and was very efficient on offense, converting 13 of his 18 attempts.
  • Although he struggled late, Jamaal Tinsley deserves a lot of credit for the way he’s stepped in and performed for Memphis. On Sunday he ran the offense brilliantly. He also made a number of smart plays defensively and finished with three steals.
  • Zach Randolph hustled early, but he struggled immensely on offense. He missed seven of his eight attempts and wasn’t exactly a rock on defense.
  • Going against O.J. Mayo and Rudy Gay, Thornton and Gordon were the stars for the Clippers. This may not be a surprise.
  • Marcus Camby made some big defensive plays late in the game. He also finished with 14 boards, although he often is lazy in terms of getting good position and instead just relies on his length and athleticism to collect rebounds.
  • Thornton’s hustle can best be illustrated by his rebounds. Although he only finished with six, five of those were offensive.

Nichols and Dime: Does Defense Get Better With Age?

As players get older, the belief is that they learn the tricks of the trade and get better at defense. During their first few years, they’re ill-equipped and unable to have a positive impact on defense, despite their superior athleticism and energy.

Do the numbers support these beliefs? We must turn to the always-useful Basketball-Reference.com. Using its Player Season Finder, I put together a spreadsheet containing every season from every player (minimum 500 minutes played) for the past five years. Using this data, we can see how Defensive Ratings change as players get older. Defensive Rating was developed by Dean Oliver, and it estimates the number of points a player allows per 100 possessions. Obviously, a lower number is better. To read more about it, check out the Basketball-Reference glossary. Let’s take a look at the chart:


I limited the age range from 19 to 36 to avoid outliers. On the x-axis, we have the age, and on the y-axis, the average Defensive Rating for that age. The results seem to confirm the common belief. Younger players tend to post higher (worse) Defensive Ratings than older players. Real life doesn’t work perfectly, so there are some fluctuations. However, the correlation is strong, indicated by the relatively large R^2. Therefore, there does appear to be something to the notion that players get better defensively as they get older.
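The aging curve behind the chart is just a group-by-age average. A minimal sketch with invented player-seasons (not the actual Basketball-Reference pull):

```python
# Sketch of the aging curve: group player-seasons by age, then average the
# Defensive Rating within each age. Rows are invented for illustration.
from collections import defaultdict

player_seasons = [  # (age, Defensive Rating)
    (21, 109.0), (21, 111.0), (27, 106.0), (27, 104.5),
    (33, 103.0), (33, 105.0), (33, 102.5),
]

by_age = defaultdict(list)
for age, drtg in player_seasons:
    by_age[age].append(drtg)

# Lower average DRtg at higher ages would match the "defense improves with age" story.
curve = {age: sum(v) / len(v) for age, v in sorted(by_age.items())}
print(curve)
```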

We can also produce a similar graph using Defensive Win Score, a similar measure to Defensive Rating (for more information, check the glossary again). Basically, DWS is the amount of wins a player adds to his team through his defense. The chart is below:


The R^2 is slightly smaller, but the general idea is the same. Players get better defensively as they get older. Not considerably so, but statistically significantly so.

However, we must approach these results with caution. Let’s say, hypothetically, that big men generally have lower Defensive Ratings. Let’s also say, hypothetically, that big men stay in the league longer than their shorter counterparts. These two scenarios would combine to make it look like players get better defensively with age. What’s a simple way to account for complications such as this? Take a look at the data position by position.

To start, let’s look at centers:


The results appear to be clear as day here. The line is a little wavy, but centers sure seem to get better defensively as they get older. The average for 35-year-olds is over three points per 100 possessions lower than the averages for 19-, 20-, and 21-year-olds. Do power forwards react the same way to age?


Simply put, yes. These results tend to go with common logic. Many raw and young big men commit silly fouls, ignore help defense, go for the spectacular block too often, etc. However, we should not treat these results as gospel, as I will explain later.

How about small forwards?


Just like the previous two positions, it appears small forwards age well, at least on the defensive end. The magic number for this position appears to be 29. Small forwards that were at least 29 years of age during the last five seasons performed much better on the defensive end than their younger counterparts did. Let’s take a look at shooting guards:


We keep seeing the same results. No matter what position you look at, the story is the same. Players get better on defense as they get older. Finally, let’s take a look at the inevitable and see how point guards get better defensively with age:


Whoops. That trend line has an oh-so-slightly negative slope, but it’s not exactly a great fit for the data (the R^2 is practically 0). Clearly, then, point guards don’t follow the same path as other positions. Older is not better in this case. For a position that often relies so much on speed and quickness, this makes sense. However, even point guards in their prime (around the age of 27) don’t perform significantly better than the young ones.

To wrap this up, we can make the following statement based on the data: Except for point guards, players generally get better on the defensive end as they get older. However, there are a number of issues to address before we go too far and actually believe that bold statement I just made:

  • The statistic I used, Defensive Rating, is far from perfect. Defense is one of the hardest things to measure accurately with statistics, and this measure is no different. It is highly team-dependent. Good defenders on poor defensive teams will be underrated, and vice versa.
  • Although there is a correlation between age and Defensive Rating, that doesn’t mean it’s a causal relationship. It may not be that all older players are better defenders. Perhaps the only way to stay in the league if you’re getting older is to play solid defense, so the ones that don’t are selectively removed. Or maybe strong defensive teams like to acquire veterans, which boosts those players’ Defensive Ratings.
  • Finally, although it is a pretty large sample size (five years of data for 1,641 data points), the data still could be misleading. For example, if there happened to be a strong crop of old centers during the past five years, that position’s results may be inaccurate. I limited the sample to five years because I don’t like using data that is very old. The style of the NBA changes constantly, so using information from, say, 10 years ago may not be smart.

UPDATE: After doing some more research, we may have to re-think things. Thanks to suggestions by Ryan Parker and Mike G at the APBRmetrics board, I decided to plot the average change in Defensive Rating (the difference between the current year and the last) for each age. It is below:


Looking at the graph above, we notice a couple of things. First, over the last five years, players of all ages tend to get worse defensively on a year-by-year basis. Whether it’s because of improving offenses or declining defenses, scoring has increased during each of the last five years.

More importantly for this study, we see that older players are declining faster than younger players are. For example, during the last five years, a 26-year-old is likely to have a Defensive Rating 0.5 points higher than he did a year ago. On the other hand, a 35-year-old is likely to have a Defensive Rating 1.5 points higher than he did a year ago. The difference between old and young isn’t much, but we can probably say that old isn’t definitively better than young.
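The year-over-year change is computed per player, pairing each season with that player's previous season, then averaged by age. A sketch with invented season pairs:

```python
# Sketch of the year-over-year deltas; the player-season rows are invented.
from collections import defaultdict

# (player, age, Defensive Rating) for consecutive seasons
seasons = [
    ("A", 25, 105.0), ("A", 26, 105.5),
    ("B", 34, 104.0), ("B", 35, 105.5),
    ("C", 25, 108.0), ("C", 26, 108.4),
]

by_player = defaultdict(dict)
for player, age, drtg in seasons:
    by_player[player][age] = drtg

changes = defaultdict(list)  # age -> list of (this year - last year) DRtg changes
for ages in by_player.values():
    for age, drtg in ages.items():
        if age - 1 in ages:  # only count seasons with a previous season on record
            changes[age].append(drtg - ages[age - 1])

# Positive values mean the defense got worse year over year.
avg_change = {age: sum(v) / len(v) for age, v in sorted(changes.items())}
print(avg_change)
```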

Like I said in my original post, selection bias may be a problem. After all, this most recent research doesn’t dispute that, as a whole, older players tend to be better defensively than younger players. But that’s not because they got better as they got older; the year-over-year data shows the opposite. What we may be able to say now is that aging doesn’t improve your defensive abilities, and if you want to stay in this league as a veteran, you had better be good at defense, because teams will “selectively remove” you from the league if you’re not.

Check out for much more like this, where I presumably get things right the first time.

Nichols and Dime: How Accurate Are Some of the Advanced Stats That We Use?

Last week, I calculated my own version of various advanced statistics, such as Rebound Rate, Assist Rate, and Usage Rate. The difference between my versions and the ones you normally see is that mine are based on actual play-by-play data, rather than estimates. Although my method isn’t perfect (partly because the play-by-play isn’t always reliable), I figured it was more accurate to base our stats on what actually happened as opposed to estimates of what happened.

Under that assumption, the question is how accurate are the numbers we’ve grown to know and love? Although they’re not too difficult to calculate, the play-by-play figures aren’t always available, so we need to know if we can count on the data that is most common. How far off are these estimations? Are there certain types of players for which these stats are usually inaccurate?

To recap, these are the stats in question:

  • Rebound Rate
  • Offensive Rebound Rate
  • Defensive Rebound Rate
  • Assist Rate
  • Steal Rate
  • Block Rate
  • Usage Rate

Let’s start with a simple test. How well do the estimated numbers correlate with the play-by-play numbers? Below is a table that includes the R^2 and standard error of each linear regression, as well as the average difference between the two types:


Thankfully, we see that all of the estimations appear to be pretty darn accurate. The R^2’s are all extremely high, and the standard errors are low. Of the seven stats I’m examining, Steal Rate appears to be the most inaccurate. It fares the worst in each of the three table columns. Overall Rebound Rate appears to be the most accurate. From this table, we are given no reason to doubt the validity of the box score estimations.
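
A comparison like the one behind that table can be sketched as follows, with made-up numbers standing in for the real estimates and play-by-play values:

```python
import numpy as np
from scipy.stats import linregress

# Illustrative values only: box-score estimates of some rate stat vs. the
# play-by-play calculations for a handful of players.
estimate = np.array([10.2, 15.1, 8.4, 20.3, 12.0, 17.8])
actual   = np.array([10.0, 15.4, 8.1, 19.8, 12.3, 17.5])

fit = linregress(estimate, actual)
r_squared = fit.rvalue ** 2                      # agreement between the two
slope_stderr = fit.stderr                        # standard error of the slope
avg_diff = np.mean(np.abs(estimate - actual))    # average absolute gap

print(round(r_squared, 3), round(slope_stderr, 3), round(avg_diff, 3))
```

(The standard error reported here is that of the regression slope; the figure in the table may be defined slightly differently.)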

Although they may be accurate as a whole, perhaps these numbers are inaccurate just for certain players. Specifically, I was wondering if players that rate either really high or really low in a certain statistic are generally rated accurately by the box score estimation. To try to answer that question, I ran another regression. This time, the box score estimation was the independent variable, and the difference between the box score and play-by-play was the dependent variable. The results are in the table below:


There are some things to look out for. Although the adjusted R^2’s are all quite low, even negative sometimes, the slopes are all positive. This would indicate that as a given player gets better in a certain statistic, the box score data is more likely to overrate him in that category. The biggest problems occur with Assist Rate, which has a moderately sized R^2 value.
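
That second regression can be sketched like this, again with illustrative numbers rather than the real data:

```python
import numpy as np
from scipy.stats import linregress

# Illustrative check: does the estimation error grow with the estimate
# itself? A positive slope means the highest-rated players are overrated.
estimate = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
actual   = np.array([ 9.9, 19.7, 29.4, 39.0, 48.5])
error    = estimate - actual        # box score minus play-by-play

fit = linregress(estimate, error)
print(fit.slope)  # positive slope: larger estimates, larger errors
```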

If that table doesn’t seem intuitive, I’ve also decided to present the results graphically. In each chart below, the x-axis is the box score estimate’s value, and the y-axis is the difference between the estimate and the play-by-play calculation.




All three Rebound Rates look pretty accurate, although they become more unpredictable as the numbers get high, especially with respect to Defensive Rebound Rate. When the Rate is around 10, the errors are pretty closely scattered around 0. However, when you get to 17.5 or 20, the errors become larger.


As I mentioned before, Assist Rate seems to have some major issues. For low Assist Rates, the differences are pretty small. However, when you get to the top assist men, the differences can be quite large. For example, Chris Paul’s Assist Rate for last season, according to the box score data, was 54.5. However, the play-by-play data has it at 51.2. For someone like him, where the number is astronomically high no matter which method you choose, the difference might seem trivial. But it does appear that top assist men are overrated the most by Assist Rate.


There’s not much to gather from the Steal Rate chart, although it becomes clear that my play-by-play computations are generally lower than the box score estimates.


Like Rebound Rate, Block Rate becomes particularly difficult to estimate when the numbers get high. As a percentage of the Block Rate, though, the difference is actually pretty consistent.


Finally, we have Usage Rate. There aren’t any major issues except for one outlier at the bottom, which is the result of complications due to the weirdness of Luc Richard Mbah a Moute’s name (seriously).

In conclusion, my research has shown me that, despite some minor issues, the box score estimations of things such as available rebounds are actually pretty close. They aren’t always perfect, and they can be particularly unreliable when the numbers get large, but overall they do a good job. Hopefully this work will provoke discussion on how we can continue to perfect those stats.

Nichols and Dime: Recalculating Advanced Stats Using Play-by-Play Data

Recently at his web site, Basketball Geek, Ryan Parker used play-by-play data to calculate Dean Oliver’s offensive and defensive ratings. I’ve decided to use Ryan’s approach (and data!) to calculate some of the other advanced statistics out there, many of which were developed by John Hollinger.

Many of these statistics are usually calculated using estimates based on the data available in box scores. However, with the play-by-play data in hand, we can turn these estimates into actual numbers. To calculate the stats, I used the formulas available in the Basketball-Reference glossary. For today, the following numbers will be presented:

  • Rebound Rate: The percentage of available rebounds a player collected while he was in the game.
  • Offensive Rebound Rate: The percentage of available offensive rebounds a player collected while he was in the game.
  • Defensive Rebound Rate: The percentage of available defensive rebounds a player collected while he was in the game.
  • Assist Rate: There are a few ways to calculate this. I defined it as the percentage of field goals a player’s teammates made that he assisted on while he was in the game.
  • Block Percentage: The percentage of opponent field goal attempts blocked by a player while he was in the game.
  • Steal Percentage: The percentage of opponent possessions that ended with the player stealing the ball while he was in the game.
  • Usage Rate: The percentage of team plays used by a player while he was in the game.

There are a number of different ways to calculate Assist Rate. I calculated my version based on the method used by people such as Ken Pomeroy and Ed Kupfer. Ryan defines his Assist Rate as the “percentage of possessions used that were assists.” There are subtle differences, I believe.

So how do my calculations differ from the usual ones? In the following ways:

  • For rebound rates, the number of available rebounds for a player is usually estimated based on the team’s rebounding rates and the player’s minutes. With my method, the actual number of rebound opportunities is determined.
  • For assist rate, the number of field goals made by teammates when a player is on the court is normally estimated based on the player’s minutes and the team’s total field goals. With my method, the actual number of teammate field goals is determined.
  • For block percentage, the number of opposing field goal attempts when a player is on the court is estimated. I use the play-by-play data to get an actual count.
  • For steal percentage and usage rate, player and team possessions are normally estimated, but we can use the play-by-play to count the actual number of possessions.
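
The counting approach behind all of these can be sketched in a few lines. The event fields and player names below are hypothetical, since the real play-by-play format varies by source:

```python
# Sketch: walk the play-by-play and tally a player's rebounds against the
# rebounds that were actually available while he was on the floor.
def rebound_rate_from_pbp(events, player):
    grabbed = available = 0
    for ev in events:
        if ev["type"] != "rebound":
            continue
        if player in ev["on_court"]:          # only count while he played
            available += 1
            if ev["rebounder"] == player:
                grabbed += 1
    return 100.0 * grabbed / available if available else 0.0

# Tiny made-up event log for illustration
events = [
    {"type": "rebound", "rebounder": "Smith", "on_court": {"Smith", "Jones"}},
    {"type": "rebound", "rebounder": "Jones", "on_court": {"Smith", "Jones"}},
    {"type": "shot",    "rebounder": None,    "on_court": {"Smith", "Jones"}},
    {"type": "rebound", "rebounder": "Brown", "on_court": {"Brown"}},
]
print(rebound_rate_from_pbp(events, "Smith"))  # 50.0: grabbed 1 of 2 available
```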

The numbers for every player are available in the Google Docs spreadsheet below:

My next step is to calculate PER using these numbers, and I plan to get to that shortly. Much credit again must go to Ryan Parker for inspiring me to do this.

Predicting a Player’s Impact on Teammates’ Three-Point Shooting

To wrap up my series of articles on the impacts of players on their teammates’ three-point shooting, I thought I’d take a look at perhaps the most important aspect: can we use the available data to evaluate players and predict the future? Predicting the interactions of players is nearly impossible, but how close can we get to modeling certain aspects of these interactions?

I’m going to take a few different approaches today. The first approach is to see if we can predict a player’s impacts on his teammates’ three-point shooting based on other advanced statistics. For example, if we know an individual’s Player Efficiency Rating, can we estimate what kind of impact he’s having on his teammates? I ran a series of simple linear regressions between 12 different advanced stats and the impacts on three-point shooting. The results are displayed below:


The correlations can be interpreted as follows: if Player A scores one more point per 40 minutes than Player B, then Player A increases his teammates’ three-point attempt percentage (how often they shoot threes) by 0.16% more than Player B. Because increases in three-point attempts and three-point percentage generally are good things, it makes sense that most of the correlations in the table are positive. Players that perform better in these advanced statistics have a more positive impact on their teammates, with the exception of Rebounding Rate. For the statistically inclined, all of these correlations are significant at the .01 level, again with Rebounding Rate as the lone exception.

So far, so good. The numbers seem to agree with common sense: better players help their teammates more. But how much can these statistics really tell us? In other words, if we only know Chris Paul’s Assist Rate, can we predict how much influence he has on his teammates’ three-point shooting? To answer these questions, I turned to the R-squared values of each correlation. R-squared values range from 0-1, and they essentially tell us how much of an outcome (in this case, impact on three-point shooting) is explained by an independent variable (in this case, PER or Assist Rate or any of the other stats). The results are in the table below:


With the exception of Minutes Per Game (which may just be a reflection of overall ability), the R-squared values are all very low. In a hypothetical and easier world, they’d all be higher. Unfortunately this is the real world, and basketball is much too complicated for us to be able to predict complex player interactions based on a simple stat or two.

Before I go any further, let’s recap what we know so far. There is a significant correlation between most advanced statistics and interaction effects on three-point shooting, so we know that these things aren’t random. But these stats explain only a very tiny part of the story, so we know that interactions are very complex.

The next step I took was to attempt to develop a model that would predict a player’s impacts on three-point shooting using a combination of the different statistics. After playing with the numbers, I was able to achieve an R-squared value of 0.26 for Impact 3PA and 0.36 for Impact 3PCT. These numbers were boosted to 0.45 and 0.51, respectively, if we took the other impact number into account (using Impact 3PCT in the Impact 3PA regression and vice versa). But this defeats the purpose of the study, so we’ll ignore those most recent numbers.
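
A combined model like that can be sketched with an ordinary least-squares fit. Everything below is simulated for illustration, including the stat values and how strongly each one drives the outcome:

```python
import numpy as np

# Simulated sketch of the combined model: regress the impact measure on
# several advanced stats at once, then compute the model's R-squared.
rng = np.random.default_rng(0)
n = 200
per      = rng.normal(15, 4, n)   # stand-in for Player Efficiency Rating
ast_rate = rng.normal(15, 6, n)   # stand-in for Assist Rate
usage    = rng.normal(20, 4, n)   # stand-in for Usage Rate
impact = 0.05 * per + 0.06 * ast_rate + rng.normal(0, 0.5, n)  # noisy outcome

X = np.column_stack([np.ones(n), per, ast_rate, usage])  # intercept + stats
coef, *_ = np.linalg.lstsq(X, impact, rcond=None)

pred = X @ coef
ss_res = np.sum((impact - pred) ** 2)
ss_tot = np.sum((impact - impact.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 2))
```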

What does this next step tell us? Within the limits of linear regression, we can only explain about 26% or 36% of a player’s impact on his teammates’ three-point shooting using various available advanced stats. In the real world, those numbers aren’t horrific, but they’re far from being the keys we need to truly figure out the game of basketball.

Finally, let’s switch gears and examine how consistent these interaction effects are from one year to the next. After all, if they are totally random, there should be no correlation. Using the numbers from 07-08 and 08-09, I ran regressions for Impact 3PA and Impact 3PCT. Both regressions resulted in statistically significant, positive correlations. That’s good. But the R-squared values for the regressions were .05 and .1, respectively, which is not so good. In other words, knowing how a player affected his teammates’ three-point shooting last year will only tell you a little bit about how he’s going to do this year.
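
The year-to-year consistency check can be sketched like this, with illustrative impact values (one pair per player):

```python
import numpy as np
from scipy.stats import linregress

# Made-up impact numbers: each player's value in 07-08 and again in 08-09.
impact_0708 = np.array([0.5, -0.2, 1.1, 0.3, -0.6, 0.8, 0.0, -0.4])
impact_0809 = np.array([0.3,  0.1, 0.6, 0.4, -0.2, 0.2, 0.3, -0.1])

fit = linregress(impact_0708, impact_0809)
print(fit.rvalue ** 2)  # a low R-squared: last year says little about this year
```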

Another interesting thing to look at is the impacts for players that switched teams. If players have similar impacts no matter what team they’re on, we may be on to something. When we limit the sample to these players, we again get statistically significant but not particularly informative results. Both regressions are significant at the .02 level but produce R-squared values under .1.

No matter which way you slice it, you get the same idea: good players make the players around them shoot threes more often and more efficiently, but that’s all we can say for sure. We can quantify what’s already happened, but we can’t predict the future.

If you’re still reading now, you’re undoubtedly more interested in this stuff than the average fan, and you may have some suggestions for me. If so, send me an e-mail at [email protected].