Clusters of Scarcity


Photo by a100tim via Flickr

Ian Levy is the author of Hickory High, a contributor to Indy Cornrows, The Two Man Game and HoopSpeakU, and now Hardwood Paroxysm, because, you know, we felt like we didn’t have enough people writing here. He begins today with a discussion of talent clusters from the Sloan Sports Analytics Conference and the confluence of megastardom. You can follow Ian on Twitter at @HickoryHigh. He’s also a pretty smart dude. – Ed.

Revolutions, loud and subtle, all share certain characteristics. New fronts are progressively opened, expanded and solidified, and the assimilation of advanced statistical analysis into the standard NBA experience is no exception to this revolutionary repetition. One of these new fronts has been the move from evaluating individual players to evaluating combinations of players. Mountains of lineup and unit data are now publicly available, but these numbers are still mostly descriptive. They look back at the recent past and tell us what happened. When it comes to examining why something happened, or what circumstances might make it likely to happen again, statistical analysis hasn’t had much more to offer than that old standby: subjective observation.

At the MIT Sloan Sports Analytics Conference, three different analytic methods were presented, each using numeric data to answer the precise question of how best to fit players together and ensure synergistic skill sets are on the floor. Of the three, I found ‘Big 2’s and Big 3’s: Analyzing How A Team’s Best Players Complement Each Other’ by Robert Ayer the most compelling. I will admit, with just a modicum of shame, that Ayer’s paper appealed to me because the presentation explained enough of his method that I could, in my own rudimentary way, access and use the information.

The title of the paper refers to two- and three-man combinations of a team’s best players. Ayer’s method involved defining players by clusters of statistical production. This clustering looks beyond traditional positions to the types of production different players provide. For example, Pau Gasol-ish power forwards are considered differently than Ryan Anderson-ish power forwards. He then ran a multiple regression analysis to determine the extent to which certain combinations of those player clusters, among a team’s two or three best players, affected that team’s win total. The NBA’s Efficiency Rating was the metric used to identify each team’s three best players.
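Before any regression can run, each team’s best players have to be turned into combination labels. The sketch below shows one way that bookkeeping might look. It is an assumption, not Ayer’s published encoding: it takes the Big 2 to be the two highest-EFF players and the Big 3 to be the top three, and it treats each tracked combination as a 0/1 indicator column in the regression. The function names are hypothetical.

```python
def combination_keys(top_players):
    """Return (big2, big3) cluster labels for one team.

    top_players: list of (name, eff, cluster) tuples.
    Assumption: Big 2 = two highest-EFF players, Big 3 = three highest.
    Labels are sorted tuples, so (2, 5) and (5, 2) count as the same pairing.
    """
    ranked = sorted(top_players, key=lambda p: p[1], reverse=True)
    clusters = [cluster for _, _, cluster in ranked]
    big2 = tuple(sorted(clusters[:2]))
    big3 = tuple(sorted(clusters[:3]))
    return big2, big3


def indicator_row(keys, tracked):
    """One 0/1 value per tracked combination: a single row of a regression design matrix."""
    return [1 if combo in keys else 0 for combo in tracked]


# Heat example. James's 30.0 EFF comes from the text; the other EFF values are placeholders.
heat = [("Bosh", 18.0, 5), ("James", 30.0, 2), ("Wade", 22.0, 2)]
big2, big3 = combination_keys(heat)
print(big2, big3)
print(indicator_row({big2, big3}, [(2, 2), (8, 12), (2, 2, 5)]))
```

Sorting the cluster tuples keeps the labels order-independent, which matches how the combinations are listed below (Cluster 2 – Cluster 8 rather than both 2–8 and 8–2).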

The clusters Ayer identified are below, with his descriptions and examples.

Cluster 1 – Limited, role-playing centers: Erick Dampier, Tree Rollins
Cluster 2 – High scoring, dynamic guards, typically not great three-point shooters, or if they are they don’t shoot very many: Kobe Bryant, Dwyane Wade, Tracy McGrady, Adrian Dantley
Cluster 3 – Somewhat limited, role-playing backcourt players: John Paxson, J.J. Barea
Cluster 4 – Wing, three-point shooters: Dan Majerle, Shane Battier
Cluster 5 – Dynamic, well-rounded power forwards, strong rebounding, dynamic 3’s: Chris Webber, Pau Gasol, Kevin McHale
Cluster 7 – High scoring, high assists, high steals, high turnover point guards, who don’t shoot three-pointers: Kevin Johnson, Isiah Thomas
Cluster 8 – Multi-faceted, high scoring wings, with high assists for their position who are great three-point shooters: Paul Pierce, Danny Ainge
Cluster 9 – Pass first, low scoring point guards: Avery Johnson, Mark Jackson
Cluster 10 – Limited 4’s, very strong rebounders, defense oriented: Dennis Rodman, Ben Wallace, Buck Williams
Cluster 11 – Three-point shooting bigs: Rasheed Wallace, Antawn Jamison, Detlef Schrempf
Cluster 12 – High scoring post players, high rebounds, high blocks: Shaquille O’Neal, Hakeem Olajuwon, David Robinson
Cluster 13 – Well-rounded small forwards; generally don’t shoot many three-pointers: Luol Deng, James Worthy
Cluster 14 – Role-playing big men without an exceptional skill, but contribute in several categories: Udonis Haslem, Kurt Thomas

Here were the combinations he found to have an effect on team performance:

Big 2’s

  • Cluster 2 – Cluster 2: +3.97 wins
  • Cluster 10 – Cluster 12: +4.69 wins
  • Cluster 2 – Cluster 8: +4.35 wins
  • Cluster 8 – Cluster 11: +4.75 wins
  • Cluster 8 – Cluster 12: +7.59 wins
  • Cluster 8 – Cluster 8: -4.05 wins

Big 3’s

  • Cluster 2 – Cluster 2 – Cluster 5: +3.70 wins
  • Cluster 2 – Cluster 5 – Cluster 8: +3.43 wins
  • Cluster 7 – Cluster 8 – Cluster 12: +13.60 wins
  • Cluster 8 – Cluster 10 – Cluster 12: +5.43 wins
  • Cluster 5 – Cluster 5 – Cluster 9: -8.47 wins
  • Cluster 2 – Cluster 2 – Cluster 7: -4.78 wins
  • Cluster 5 – Cluster 8 – Cluster 8: -3.61 wins

It’s important to note that these relationships are ‘talent agnostic’. Having a 2-2-5 combination among your best three players has historically been worth an extra 3.7 wins. However, if your team has 20-win talent, including those three best players, that combination only bumps you up to 23 or 24 wins.
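The ‘talent agnostic’ point amounts to an additive adjustment: the combination effect shifts a team’s win total by a fixed amount, whatever the baseline. A minimal sketch, assuming simple addition of the Big 3 effects listed above; `baseline_wins` is a hypothetical input standing in for a team’s underlying talent level, which this calculation does not estimate.

```python
# Big 3 combination effects (in wins) as listed above; keys are sorted cluster tuples.
BIG3_EFFECTS = {
    (2, 2, 5): 3.70, (2, 5, 8): 3.43, (7, 8, 12): 13.60,
    (8, 10, 12): 5.43, (5, 5, 9): -8.47, (2, 2, 7): -4.78,
    (5, 8, 8): -3.61,
}


def projected_wins(baseline_wins, big3_clusters):
    """Add the Big 3 combination effect (0 if none is listed) to a talent-based baseline."""
    key = tuple(sorted(big3_clusters))
    return baseline_wins + BIG3_EFFECTS.get(key, 0.0)


print(projected_wins(20, (5, 2, 2)))  # a 20-win team with a 2-2-5 core: 20 + 3.70
```

The same +3.70 is added to a 20-win roster and a 50-win roster alike, which is why the combination alone cannot turn a lottery team into a contender.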

Intrigued by this entire project, I decided to overlay Ayer’s player clusters on this season’s data and see whether any teams appear to have one of the statistically significant combinations he identified. I used similarity scores to place the top three players on each NBA team into one of those clusters. Like Ayer, I used the NBA’s Efficiency Rating to determine the top three players on each team.
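The exact similarity score used here isn’t specified, so the following is only a sketch of the general idea under stated assumptions: each player is a vector of per-game stats, each cluster has a centroid (an average profile), and a player goes to the cluster whose centroid he most resembles. Similarity is taken as negative Euclidean distance. The stat order and every centroid value below are hypothetical placeholders, not Ayer’s cluster centers.

```python
import math

# Hypothetical stat order: points, rebounds, assists, three-pointers attempted (per game).
# Placeholder centroids for three clusters; NOT Ayer's published values.
CENTROIDS = {
    2: [24.0, 5.5, 5.0, 2.0],    # high-scoring dynamic guards
    7: [17.0, 3.5, 9.0, 0.5],    # high-assist point guards
    12: [22.0, 11.0, 2.0, 0.1],  # high-scoring post players
}


def similarity(a, b):
    """Negative Euclidean distance: higher means more alike."""
    return -math.dist(a, b)


def assign_cluster(player_stats, centroids=CENTROIDS):
    """Place a player in the cluster whose centroid is most similar to his stat line."""
    return max(centroids, key=lambda c: similarity(player_stats, centroids[c]))


print(assign_cluster([21.0, 10.5, 2.5, 0.2]))  # a post-scorer profile
```

In practice each stat would usually be standardized (converted to z-scores) first; otherwise high-volume stats such as points dominate the distance.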

The individual results can be found here. The first sheet of the spreadsheet contains the master results. There are also sheets, listed across the bottom, for each player showing how their statistics fit into each cluster.
Here are the teams that had statistically significant combinations:

Big 2’s

  • Cluster 2 – Cluster 2: +3.97 wins
    • Miami Heat: LeBron James (2) – Dwyane Wade (2)
    • Oklahoma City Thunder: Kevin Durant (2) – Russell Westbrook (2)

Big 3’s

  • Cluster 2 – Cluster 2 – Cluster 5: +3.70 wins
    • Miami Heat: LeBron James (2) – Dwyane Wade (2) – Chris Bosh (5)
  • Cluster 2 – Cluster 5 – Cluster 8: +3.43 wins
    • Sacramento Kings: DeMarcus Cousins (5) – Tyreke Evans (2) – Marcus Thornton (8)

In his presentation at Sloan, Ayer highlighted the positive value of a pair of Cluster 2 (high-scoring dynamic guards) players as a finding that bucked conventional wisdom. Looking at this season’s data, we find two perfect examples of successful combinations swimming against the current of public opinion. The Heat and the Thunder are both elite teams, but there is a steady Gregorian chant from fans and analysts alike that they succeed only because immense talent overwhelms the bad fit and duplication of their top players. The idea persists that their ceiling has somehow been lowered by the particular arrangement of their talent. Maybe it’s time to acknowledge that duplication of skills and good player fit are not mutually exclusive.

No team this season has hit the ultimate jackpot with the 7-8-12 combination that Ayer found to be worth +13.60 wins. But the Clippers aren’t far off. They have their Cluster 7 player in Chris Paul and their Cluster 12 player in Blake Griffin. The gaping hole the Clippers have been sporting all season at shooting guard would be a perfect place to plug in that Cluster 8 player, the ‘Multi-faceted, high-scoring, high-assist, 3PT shooting wing’. The Timberwolves are in the same boat, with a Cluster 12 in Kevin Love and a Cluster 7 in Ricky Rubio. Finding a talented Cluster 8 could be the difference in both teams’ long-term success.
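The Clippers and Timberwolves reasoning is a lookup: given the two clusters a team already has, which third cluster completes the most valuable Big 3 from Ayer’s list? A small sketch, assuming only the Big 3 effects listed above; `best_third` is a hypothetical helper name.

```python
# Big 3 combination effects (in wins) as listed above; keys are sorted cluster tuples.
BIG3_EFFECTS = {
    (2, 2, 5): 3.70, (2, 5, 8): 3.43, (7, 8, 12): 13.60,
    (8, 10, 12): 5.43, (5, 5, 9): -8.47, (2, 2, 7): -4.78,
    (5, 8, 8): -3.61,
}


def best_third(pair):
    """Return (third_cluster, effect) for the highest-valued listed Big 3 containing the pair.

    Returns None if no listed combination contains both clusters.
    """
    best = None
    for trio, effect in BIG3_EFFECTS.items():
        remaining = list(trio)
        for c in pair:
            if c not in remaining:
                break
            remaining.remove(c)  # remove one copy so a duplicated cluster is matched correctly
        else:
            # Both clusters in the pair were found; exactly one cluster is left over.
            if best is None or effect > best[1]:
                best = (remaining[0], effect)
    return best


print(best_third((7, 12)))  # Chris Paul (7) + Blake Griffin (12)
```

The `for`/`else` runs the `else` branch only when neither cluster in the pair was missing, so only trios that actually contain the pair are considered.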

Golden State is also an interesting situation. Up until the trade deadline they had a 5-8-8 combo, with David Lee, Monta Ellis and Stephen Curry, worth -3.61 wins. Simply removing Ellis may provide some measure of relief through addition by subtraction.

As interesting as Ayer’s work is for evaluating current and potential player combinations, I was just as interested in what his clustering reveals about the makeup of the NBA and the scarcity of talent at certain positions. Our data set here is somewhat skewed because we’re looking at the top three players on each team, as opposed to the 90 best players in the NBA. Still, there are some striking holes.

The chart below is an analysis of the prevalence and relative talent of each player cluster. The blue bars represent the number of players from our group of 90 that fell into each cluster. The black lines stretch from the Efficiency Rating of the least productive player in that cluster to that of the most productive player. The red diamonds mark the average Efficiency Rating for all the players in that cluster. If you’re unfamiliar with Efficiency Rating, it’s a per-game average calculated with this formula: ((Points + Rebounds + Assists + Steals + Blocks) – ((Field Goals Att. – Field Goals Made) + (Free Throws Att. – Free Throws Made) + Turnovers)). To put the numbers into context, LeBron James leads the league with an EFF of 30.0. An average NBA player comes in around 10.0.
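The Efficiency Rating formula above translates directly into code. This sketch follows that formula term by term: positive contributions minus missed field goals, missed free throws and turnovers, divided by games played when season totals are supplied. The stat line in the example is a made-up illustration, not any real player’s numbers.

```python
def efficiency(pts, reb, ast, stl, blk, fga, fgm, fta, ftm, tov, games=1):
    """NBA Efficiency Rating (EFF) per game.

    Pass per-game averages with games=1, or season totals along with games played.
    """
    positives = pts + reb + ast + stl + blk
    negatives = (fga - fgm) + (fta - ftm) + tov  # missed FGs + missed FTs + turnovers
    return (positives - negatives) / games


# Hypothetical per-game line: 27 pts, 8 reb, 6 ast, 2 stl, 1 blk,
# 10-of-19 shooting, 5-of-7 free throws, 3 turnovers.
print(efficiency(27, 8, 6, 2, 1, 19, 10, 7, 5, 3))
```

Here 44 positive contributions minus 14 negatives (9 missed shots, 2 missed free throws, 3 turnovers) gives 30.0, the same EFF that leads the league this season.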

There is a lot going on in this graph, but look closely and you’ll see a fairly clear representation of how scarcity affects team building. The scarcity on this graph comes in two varieties. In our set of 90 players, not a single one fell into Cluster 1 or 3. This is not because those player types are few and far between, but because we were looking only at the three most productive players on each team. Cluster 12s, on the other hand, are rare in our sample because they are rare everywhere. Just three appeared in our sample, and it would be difficult to spot another three anywhere else in the league. That group also had the highest average Efficiency Rating of any player cluster.
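The chart boils down to three numbers per cluster: how many players fell into it, the range of their EFF, and the mean EFF. A sketch of that summary, assuming the players are stored as (name, cluster, EFF) tuples. The sample data is a toy placeholder, not the actual 90-player results. Listing every cluster ID explicitly makes empty clusters, like 1 and 3 here, show up with a count of zero instead of silently disappearing.

```python
from statistics import mean


def cluster_summary(players, cluster_ids):
    """Count, EFF range and mean EFF per cluster.

    players: list of (name, cluster, eff) tuples.
    Clusters with no players are reported with count 0.
    """
    summary = {}
    for c in cluster_ids:
        effs = [eff for _, cluster, eff in players if cluster == c]
        if effs:
            summary[c] = {"count": len(effs), "min": min(effs),
                          "max": max(effs), "mean": mean(effs)}
        else:
            summary[c] = {"count": 0}
    return summary


# Toy placeholder data, not the actual results.
sample = [("A", 2, 20.0), ("B", 2, 18.0), ("C", 12, 25.0)]
for cid, stats in cluster_summary(sample, [1, 2, 3, 12]).items():
    print(cid, stats)
```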

Productive players in Clusters 8, 10 and 11 are fairly common, but few provide an elite level of production. Relying on these types of players to drive a team may lead an organization to set up camp on the 40-win plateau, trying to milk a few extra wins out of each season with luck and a strategically designed supporting cast.

Clusters 8 and 2 appear in all but one of the positive combinations Ayer identified, and both clusters were well populated in our sample. However, while it’s difficult to find elite production in Cluster 8, the average Cluster 2 player in our group had an EFF of 19.9, roughly twice the league average. While teams have shifted focus toward elite point guards or the handful of dominant big men, here is evidence that the Jordan/Iverson model of building around an elite wing scorer still has merit.
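The claim about Clusters 2 and 8 can be checked mechanically against Ayer’s positive combinations, copied from the Big 2 and Big 3 lists above.

```python
# Positive combinations from Ayer's Big 2 and Big 3 lists above.
POSITIVE_COMBOS = [
    (2, 2), (10, 12), (2, 8), (8, 11), (8, 12),
    (2, 2, 5), (2, 5, 8), (7, 8, 12), (8, 10, 12),
]

with_2_or_8 = [c for c in POSITIVE_COMBOS if 2 in c or 8 in c]
without = [c for c in POSITIVE_COMBOS if c not in with_2_or_8]
print(len(with_2_or_8), "of", len(POSITIVE_COMBOS))  # 8 of 9
print(without)  # [(10, 12)]
```

The lone exception is the Cluster 10 – Cluster 12 pairing, a rebounding and defensive big next to a dominant post scorer.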

While I really enjoyed Ayer’s work and think there is a lot to chew on here, I would be remiss if I wrapped up without mentioning a few concerns I have about the clustering technique. Ayer used per-game averages to define his clusters. In a large study, over many seasons, that was probably plenty accurate. In looking at just a single season, as I did, it almost certainly skewed the clustering for the handful of players who come off the bench and play significantly fewer minutes than some of their counterparts.
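One common way to remove the minutes bias is to cluster on per-minute rates rather than per-game averages, for example scaling every counting stat to 36 minutes. This is a swapped-in alternative, not something Ayer’s paper did. The sketch multiplies before dividing so that whole-number inputs stay exact.

```python
def per_36(stat_total, minutes_total):
    """Scale a counting stat to a per-36-minute rate."""
    if minutes_total <= 0:
        raise ValueError("minutes_total must be positive")
    return stat_total * 36 / minutes_total


# Two hypothetical players who both average 12 points per game:
print(per_36(12, 36))  # a starter playing 36 minutes a night
print(per_36(12, 24))  # a reserve playing 24 minutes a night scores at a higher rate
```

On a per-game basis the two look identical; per 36 minutes the reserve scores 18 to the starter’s 12, which could move him into a different cluster.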

As I’m sure many of you did, I also found the way he described each cluster confusing. The moniker Cluster 1 or Cluster 3 means nothing to any of us, so Ayer added some descriptive details about each. I appreciate the rationale for this, but on some level it’s self-defeating.

The clusters capture more statistical complexity than the common terms used to describe them. Cluster 11 is described as ‘Three-point shooting bigs’. But there is more to the cluster than that, otherwise the whole clustering exercise wouldn’t be necessary. Kemba Walker fell into this cluster. He is clearly not a three-point shooting big, but he is most certainly a Cluster 11. The generic terms used to describe SOME characteristics of MOST of the players in a cluster omit some of the nuance of the way players are clustered.

While this ultimately has no bearing on the results or conclusions, I’m sure some readers have had a knee-jerk reaction that made it tougher to buy into this admittedly lengthy post. One of the challenges of a project as ambitious as Ayer’s is that there is no language to act as a bridge between the new and old ways of describing players. The solution is to use the new language consistently and comprehensively, but in the meantime it must be acknowledged that this will leave some basketball fans on the outside.

Ian Levy

Ian Levy (@HickoryHigh) writes about basketball from the wilds of Southern Vermont. In addition to his work for Hardwood Paroxysm, he is the man behind the curtain at Hickory-High and a contributor to Indy Cornrows, The Two Man Game and HoopChalk.