In Defense of the Most Maligned Presentation at Sloan, “Automatically Recognizing On-Ball Screens”

“What was the point of that?”

That question, or some similar version, resonated through the halls of the 2014 MIT Sloan Sports Analytics Conference after one particular presentation — “Automatically Recognizing On-Ball Screens,” a dense tome of numbers and self-referential computer jargon. Essentially, the paper seemed to prove that on-ball screens do in fact exist. WOOOHOOO! Our eyes aren’t lying to us. Hail and rejoice!

Yet the importance of this paper is reflected in one of its most vocal critics. Stan Van Gundy touched on it when he ranted about the accuracy of play categorizations in Synergy Sports:

“I read some of the stuff that people write on ESPN.com and stats on pick-and-roll defense and stuff that came off Synergy.com or somewhere else. I don’t know who is recording that information.”


via Fifty NBA notes, quotes and anecdotes from the 2014 Sloan conference | The Point Forward – SI.com.

It’s a problem encountered by anyone who’s used the scouting tool before: while the labeling is acceptably accurate in the aggregate, the accuracy of any one play can vary greatly. People are behind this. A dedicated team must pore over game film and dissect it into its component parts for easy digestion by the end user. They work diligently and do their very best, but they’re human. Sometimes, they get it wrong.

And even when they get it right, the process is incredibly labor-intensive. That’s an issue, as resources are far from limitless for this kind of endeavor. Analytics is a growing field, but there’s only so much money and time to go around. The SportVU tracking cameras were supposed to remedy this, but they’re not some sort of magical black box. All of the data captured by the cameras is there for manipulation and analysis, but it comes in a rather raw form — X, Y, Z coordinates with event tags that correspond to the play-by-play data, along with some standard tabulations and formatting provided by STATS. From there, it’s up to the teams to figure out how to turn raw information into actionable intelligence. When teams get the best and brightest data scientists, programmers, and graphical user interface designers, the numbers become art in motion.

That video, of course, comes from Zach Lowe’s thorough investigation of the Toronto Raptors’ use of SportVU technology. The video is elegant and simple; it conveys an infinitely complicated process in a way so basic as to be comprehensible to the average basketball viewer. Such simplicity is misleading, however. To generate that kind of visualization from a jumble of coordinates is really, really hard. The computers don’t innately know how to do it; like some sort of sci-fi dystopian nightmare, they have to be trained to do our bidding before they can replace us.

This is the maddening genius behind the “On-Ball Screens” presentation. It’s an infuriatingly simple concept that, when presented in the native language of programming and data, invites questions about the very validity of the study. Why are we here, the audience asks, when we’ve all known what an on-ball screen is since we were children?

In the back of the room sits the vanguard of basketball analytics, a machine born into naivety and ignorance that’s never been taught that nuance. If given the chance, the computer could identify every on-ball screen that ever happened in a thousandth of the time it would take a team of people watching video. Indeed, for a first iteration, this particular study did rather well. It had an inclusive accuracy of 98% (recall, in machine-learning terms): of all plays that actually included an on-ball screen according to standard visual categorization, the set of plays generated by this specific model included 98% of them. Only 1 in 50 on-ball screens slipped through the cracks. The exclusive accuracy (precision), the percentage of all plays in the computer-generated set that truly included an on-ball screen, was 80%.
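Those two figures map onto standard classification metrics, and the mapping is easy to sketch. A minimal illustration, with made-up play IDs sized only so the toy ratios match the paper’s reported 98% and 80%:

```python
def recall_and_precision(flagged, actual):
    """Inclusive accuracy (recall): share of true screens the model flagged.
    Exclusive accuracy (precision): share of flagged plays that were true screens."""
    hits = flagged & actual  # plays that were flagged AND truly contain a screen
    return len(hits) / len(actual), len(hits) / len(flagged)

# Toy play IDs, chosen so the ratios mirror the study's reported numbers:
actual = set(range(200))                             # 200 true on-ball screens
flagged = set(range(4, 200)) | set(range(200, 249))  # misses 4, adds 49 false positives

recall, precision = recall_and_precision(flagged, actual)
print(recall, precision)  # -> 0.98 0.8
```

The play IDs are pure invention; only the two ratios correspond to anything in the study.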

…jargon, right? So here’s the story of what this research paper actually did. The authors got a hold of that aforementioned SportVU data in its base XYZ coordinate form. Using the interaction between ball and players, they designed a system of algorithms and parameters to recognize plays that might include an on-ball screen. The computer flagged all such plays, which gave them a set of plays from which to work. Of that giant pool of individual plays, 98% of all possible on-ball screens were included. And 80% of all of the plays in that giant pool did in fact include an on-ball screen.
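The paper’s actual algorithms and parameters aren’t reproduced here, but the flavor of such a detector can be sketched with a deliberately naive heuristic over hypothetical (x, y) tracking frames: flag a possession whenever a second offensive player holds nearly still within a few feet of the on-ball defender. Every field name and threshold below (`proximity_ft`, `still_speed`) is invented for illustration:

```python
import math

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def might_contain_screen(frames, proximity_ft=4.0, still_speed=1.0):
    """Flag a possession if, on any frame, a second offensive player holds
    nearly still within a few feet of the on-ball defender.
    Thresholds and field names are invented for illustration only."""
    for prev, cur in zip(frames, frames[1:]):
        handler_id = cur["ball_handler_id"]
        defender_pos = cur["on_ball_defender_pos"]
        for pid, pos in cur["offense"].items():
            if pid == handler_id:
                continue  # the ball-handler can't screen for himself
            speed = distance(pos, prev["offense"][pid])  # feet moved since last frame
            if distance(pos, defender_pos) <= proximity_ft and speed <= still_speed:
                return True  # candidate screen: flag this play for review
    return False
```

A real system would then hand the flagged possessions to humans for verification, which is exactly where the 80% figure above comes in.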

Again, though, so what? This might make it easier on the Synergy video logger or the entry-level analytics employee for an NBA organization, but what does that mean for the general fan? Consider the case where the first of those two numbers is 100%, as this initial iteration nearly achieved. In that scenario, a data team has already narrowed its focus from looking at every single play a team runs to just this set of plays that we know includes an on-ball screen. Furthermore, knowing that 80% of all the plays they’re about to look at will include an on-ball screen shifts the nature of the task from identification to verification. A job that previously required 10 hours of brute force is done in 2. With resource scarcity a legitimate concern, that’s 8 hours freed for analysis. Where time previously was allocated to the back-end, it can be shifted to the front-end, which in turn makes for a better product.
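The back-of-the-envelope version of that saving looks something like the following, where the batch size and the 10% base rate of screens are invented for illustration and only the 80% precision comes from the study:

```python
total_plays = 1000   # plays in a batch of game film (illustrative)
screen_rate = 0.10   # assumed share of plays containing an on-ball screen (made up)
precision = 0.80     # from the study: 80% of flagged plays are real screens

true_screens = round(total_plays * screen_rate)  # 100 screens to find
flagged = round(true_screens / precision)        # ~125 plays the model flags for review

# Instead of watching all 1,000 plays to *identify* screens, a human
# merely *verifies* the flagged set.
print(flagged, flagged / total_plays)  # -> 125 0.125
```

Under these made-up numbers the verification set is an eighth of the original footage, which is the same order of saving as the 10-hours-to-2 figure above.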

So when we clamor for new and more advanced “advanced stats,” what we’re really looking for is this kind of development. It’s incremental improvement that brings mind-shattering revelations into the public eye. Bit by bit, we measure and survey our way across unknown terrain, hemmed in on all sides by EPV and xPPS and ESQ and RAPM and a dozen other analytic acronyms that are only as stable as the underlying assumptions and data upon which they rest. The more we let technology map the basic topography, the firmer our conclusions become. To marvel at the above Raptors video is, consciously or not, to appreciate the coding and math intrinsic to the art.

It’s not that “Automatically Recognizing On-Ball Screens” advances our knowledge of the game itself; it’s that by automatically recognizing on-ball screens, we let the machines do the grunt work. And that’s key, because for all of the growth in neural networks and artificial intelligence, computers can’t dream the way humans do. Inspiration flows through flesh and bone, not silicon and circuitry. When Stan Van Gundy says that not all screens are created equal, he’s right, and for now such differentiation is the domain of the thinking mind. Yet we still dedicate too much brainpower to menial tasks. It’s time to let the computers assume that burden, even if it means teaching them in baby steps.

There was a time when you didn’t automatically recognize an on-ball screen, but you grew out of that. Give the machines time, and they’ll learn, too.


Andrew Lynch

When God Shammgod created the basketball universe, Andrew Lynch was there. His belief in the superiority of advanced statistics and the eventual triumph of expected value-based analytics stems from the fact that he’s roughly as old as the concept of counting. With that said, he still loves the beauty of basketball played at the highest level — it reminds him of the splendor of the first Olympics — and the stories that spring forth from the games, since he once beat Homer in a game of rock-paper-scissors over a cup of hemlock. Dude’s old.