I’d say I kinda love basketball. I spend far too much time prowling reddit.com/r/nba and other basketball forums gathering the latest buzz from around the league. Since I work in the data analytics space, I’m always on the lookout for statistical insights that Kerr or Lue might have missed.
I recently came across Big Data Ball, an NBA stats distributor. They offered a dataset called: “NBA Play-By-Play Stats – 2004 to 2017”. It includes all events that occur in a game including: active lineups, shot distances, shot locations in X, Y coordinates, assists, time remaining, and tons of other interesting data points. Game on!
The first step was to get all that data into the MapD Core database. After a little bit of data munging to convert the “Time Remaining”, “Time Elapsed”, and “Shot Clock” columns to small integers and join a game results data set found on Basketball Reference, I was quick to import it through the MapD Immerse “table importer”.
Importing The NBA Data
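For anyone curious what that munging step might look like, here is a minimal sketch. It assumes the raw time columns arrive as clock strings such as "0:11:34" or "11:34"; the actual format in the Big Data Ball files may differ, and `clock_to_seconds` is a hypothetical helper name, not something from the dataset or MapD.

```python
def clock_to_seconds(clock: str) -> int:
    """Convert a clock string like '0:11:34' or '11:34' to whole seconds.

    Assumption: fields are colon-separated, most significant unit first.
    """
    seconds = 0
    for part in clock.strip().split(":"):
        # Shift the running total up one unit (x60) and add the next field.
        seconds = seconds * 60 + int(part)
    return seconds


print(clock_to_seconds("0:11:34"))  # 694
print(clock_to_seconds("00:24"))    # 24
```

Storing the result as a small integer keeps the column compact and makes it easy to bucket plays by minute or by 20-second interval later on.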
Initial Exploration: Thousands of Hoop Hours at My Fingertips
At first I was rather overwhelmed by the amount of data I had at my disposal -- almost a decade and a half of play-by-play records from every NBA game. I didn’t have any specific questions to ask the data initially, so I just started exploring.
I created a scatterplot of the x and y positions of all shots over the 13-year period.
Creating a Scatter Plot of All Shots from 2004 - 2017
This filtered the number of records down from 7,588,492 to 2,744,973 rows, which means that 36% of all logged records are shots. It also shows that the majority of this data covers other events: ref whistles, turnovers, jump-balls, steals, substitutions, rebounds, etc.
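The share of shots is easy to double-check from the two row counts above. This is just the arithmetic, not a query against the database:

```python
total_records = 7_588_492  # all play-by-play rows
shot_records = 2_744_973   # rows left after keeping only shots

share = shot_records / total_records
print(f"{share:.1%}")  # 36.2%
```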
I then proceeded to create a bar chart with the players as the dimension and the number of records as the measure.
Creating a Number Of Shots By Player Bar Chart
I now had a cross-filterable environment where I could interactively filter on a particular name and see the shot chart for that player.
Filtering on Various Players
I found it particularly interesting to see the shot chart for a 3-point monster like Stephen Curry vs. Dwight Howard, a player who lives in the paint.
Shot Chart of Steph Curry Vs. Dwight Howard
From there, the rabbit hole got deeper and deeper. Before I knew what had happened, I found myself completely immersed in the world of NBA Statistics.
This was me: NBA-Stats John Nash
After a couple of hours of poking around, here’s what I found.
The Rise of the 3-point Shot & Death of the Mid-range Game
Over the past 6 seasons, the evolution of the Splash Brothers has solidified the 3-point shot as a notable weapon in the league. But just how important is it? I applied a quick filter to the Shot Distance Histogram, and found out.
Adding a 3 Point Filter to the Shot Distance Histogram
In March 2005, the total number of 3-point attempts was 5,918.
In March 2017, the total number of 3-point attempts had almost doubled to 11,280.
That’s a 90% increase over the past 12 years!
Number Of 3 Point Attempts Per Month (March 2005 Vs. March 2017)
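As a quick sanity check on that growth figure, here is the percent-change calculation using the two March totals quoted above. `percent_change` is just an illustrative helper name.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


change = percent_change(5_918, 11_280)
print(f"{change:.1f}%")  # 90.6%
```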
If you look at regular season 3-point attempts year over year, you can see the attempts increase dramatically starting in the 2012-2013 season. They grew at a rate of 15 - 20% per year for the next 5 years. It looks to be a common trend.
The 3 Pointer On The Rise
So what’s causing this increase of 3-point shots? At first I guessed that players had become better 3-point shooters over the last couple of years. Not true. 3-point percentages (3P%) have stayed consistent month-to-month for the last 13 seasons--hovering between 0.34 and 0.37, with a few playoff outliers in June.
The (mostly) Consistent 3-point % From 2004 - 2017
What actually correlates with the increase is that more players are taking 3-point shots:
More Players in the League are Taking 3 Point Shots
With more big men coming into the league who are comfortable taking the 3, such as Porzingis, Horford, and Cousins, we should see this trend continue.
Combined 3-Point Attempts by Porzingis, Horford, and Cousins
It’s even more interesting if you start to filter on long 3-point shots. More players are getting comfortable pulling up from 29 feet, and draining them at a decent percentage.
Crossfiltering Foot By Foot to see More Players taking shots from 28+ Feet Out
But what about the mid-range? What is happening to the bread and butter of Michael Jordan and Kobe Bryant?
Michael Jordan Is Crying Somewhere
As you can see, players seem to be taking far fewer mid-range shots overall. But why is that?
Average Points Outcome Per Shot by Distance
The chart above proves that it makes strategic sense for players to shoot more 3-point shots.
A team will average 1.13 points per shot from 24 feet.
A team will only average 0.8 points per shot from 10 - 20 feet.
That’s roughly 40% more points per shot!... which kinda makes sense; this chart suggests that the two most efficient places to score are either between 0 - 3 feet or between 24 - 27 feet.
What’s even more nuts is this: there’s a better ROI taking a shot from 28 feet (0.9 pts) than taking one from 14 (0.79 pts).
Average Points Shooting From 13-14 Ft. Vs 28-29 Ft.
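To see what those averages imply, here is a rough sketch that treats points per shot as make rate times shot value. It assumes a shot from 24 feet is worth 3 points and a 10 - 20 foot shot is worth 2, and it ignores free throws from shooting fouls, so take it as back-of-the-envelope math rather than the way the chart was built.

```python
def implied_make_rate(points_per_shot: float, shot_value: int) -> float:
    """Make rate that would produce the given average points per shot."""
    return points_per_shot / shot_value


three_rate = implied_make_rate(1.13, 3)  # 24-foot shots
mid_rate = implied_make_rate(0.80, 2)    # 10 - 20 foot shots
print(f"{three_rate:.3f} {mid_rate:.3f}")  # 0.377 0.400

# Relative edge of the 24-footer over the mid-range shot
edge = 1.13 / 0.80 - 1
print(f"{edge:.0%}")  # 41%
```

In other words, under these assumptions the mid-range shot goes in slightly more often, but the extra point more than makes up for it.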
Based on these numbers, we should see 3-point attempts continue to increase by anywhere between 10 - 20% per year, and overall game scores continue to reach new highs.
3-Point Shooting Fouls Increased by More than 50% from Last Year
In regular season play of 2014, the number of 3-point fouls called was 600.
In 2015: 695.
Last season: 1,044.
Bar Chart of 3-point fouls called
If we look at the data as a line-chart, the trend is apparent.
Line Chart of 3-point Fouls Called Over The Last 13 Seasons
Looking at a list of leading instigators, one can see that James "I somehow always get the whistle" Harden is on top with 241 fouls drawn. Over his career, he has drawn 40% more 3-point whistles than any other player.
Table Chart of all fouls
Filtering just the 2016-2017 regular season, Harden was able to draw 122 fouls.
Filtering 2016-2017 Regular Season to Show Number of 3-Point Fouls Drawn By Player
In the last year alone, Harden more than doubled the number of fouls he has drawn. In fact, he received 248% more 3-point foul calls than any other player!
But I was curious why refs are calling 3-point fouls more often. I started looking at other data points. After grouping the number of fouls called by quarter, I was very surprised with the results.
3-Point Fouls Called By Quarter
The refs called 26% more 3-point fouls in the 4th quarter than any other period!
Just to compare, I wanted to look at all whistles that resulted in 2 foul shots -- so I changed the global filter from 3 free throws to 2.
Changing The Global Filter
As you can see, the number of fouls called is much more evenly distributed throughout the years -- but it still seems like referees tend to call more fouls that result in 2 free throws in the 4th quarter, with calls being made 24.7% more often than in any other period.
I wanted to get a more granular sense of when the fouls were actually called. So I grouped the number of 2-point fouls by minute of play:
Number of 2-Point Fouls Called Per Minute (Broken down by Quarters)
This makes sense because teams get into a bonus situation further along in a period, so it will equate to more free throws. There's also a large spike in the last minute of the 4th quarter. This is likely because of all those boring intentional fouls that draw out the game.
I then added back 3-point fouls and was really shocked:
Number of 3-Point Fouls Per Minute (Broken down by Quarters)
Why does it look like in the last minute of play for every quarter, the number of 3-point fouls called doubles or even triples in some quarters?
Here's what it looks like when we combine all quarters in one chart:
Number of 3-Point Fouls Per Minute (Combined Quarters)
Wait, how could this be possible!? There were 470 3-point fouls called between the 10th and 11th minute, but that more than doubles to 1,032 3-point fouls called between the 11th and 12th minute. I immediately thought: "There's no way refs are 119% more likely to call a 3-point foul in the last minute of play." This is really weird, because there is no bonus when it comes to 3-point fouls. These are all shooting fouls!
The correlation must be that players simply take more 3-point shots in the last minute of play, thus more fouls. So I looked:
Number of 3-Point Shots Per Minute (Combined Quarters)
Hmmm, very odd. Although players tend to take more 3-point shots in the last minute of play, there is still a massive gap in that last minute of play:
3-Point Shots Per Minute vs. 3-Point Fouls Per Minute (Combined Quarters)
I decided to break up the time even further by looking at number of 3-point fouls called in 20-second intervals and stumbled upon something more fascinating:
3-Point Shots Per 20-seconds vs. 3-Point Fouls Per 20-Seconds (Combined Quarters)
What the...?!? It looks like players are about 350% more likely to get a 3-point shooting foul call in the last 20 seconds of a quarter.
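The idea behind that comparison is to divide fouls by attempts in each time bucket, so that a bucket with more shots doesn't automatically look worse. Here is a minimal sketch of that normalization. The counts below are made up purely for illustration and are NOT taken from the dataset; the bucket labels and the function name are hypothetical too.

```python
def fouls_per_100_attempts(fouls: dict, attempts: dict) -> dict:
    """Foul rate per 100 shot attempts for each time bucket."""
    rates = {}
    for bucket, n_attempts in attempts.items():
        if n_attempts == 0:
            continue  # skip empty buckets to avoid dividing by zero
        rates[bucket] = 100 * fouls.get(bucket, 0) / n_attempts
    return rates


# Illustrative counts only, not real NBA numbers.
attempts = {"40-20s left": 1_000, "last 20s": 2_000}
fouls = {"40-20s left": 10, "last 20s": 90}

rates = fouls_per_100_attempts(fouls, attempts)
ratio = rates["last 20s"] / rates["40-20s left"]
print(rates)                 # {'40-20s left': 1.0, 'last 20s': 4.5}
print(f"{ratio - 1:.0%} more likely")  # 350% more likely
```

With numbers like these, shots double in the last bucket but fouls go up ninefold, which is the kind of gap that survives the normalization.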
I immediately jumped to the conclusion that there was a flaw in the data; I needed to drill down on individual records to see if this was actually true, so I did:
Well, hot damn! It looks to be true!! In case you missed what happened, here's the breakdown:
- Filtered on the last 20 seconds of play for combined quarters.
- Filtered on "James Harden" getting fouled for 3-Point Attempts.
- Filtered the "Boston vs. Houston on Jan 25, 2017" game.
It revealed two 3-point fouls that occurred in that game. I then went to basketball-reference.com to see the play-by-play log to check if those fouls happened within the last 20 seconds of play. And they did!
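The drill-down steps above boil down to stacking a few filters on the play-by-play rows. Here is a rough sketch of the same idea outside of Immerse. The `Event` fields are hypothetical names, not the dataset's real column names, and the rows are placeholders rather than actual play-by-play data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    game: str          # hypothetical game label
    player: str        # player who drew the foul
    free_throws: int   # free throws awarded on the play
    seconds_left: int  # seconds remaining in the quarter


def late_three_point_fouls(
    events: List[Event], player: str, game: str, window: int = 20
) -> List[Event]:
    """Keep 3-shot fouls drawn by `player` in `game` with at most `window` seconds left."""
    return [
        e for e in events
        if e.game == game
        and e.player == player
        and e.free_throws == 3
        and e.seconds_left <= window
    ]


# Placeholder rows for illustration only.
events = [
    Event("Game 1", "Player A", 3, 12),
    Event("Game 1", "Player A", 3, 5),
    Event("Game 1", "Player A", 2, 300),
    Event("Game 1", "Player B", 3, 8),
    Event("Game 2", "Player A", 3, 15),
]

matches = late_three_point_fouls(events, "Player A", "Game 1")
print(len(matches))  # 2
```

Whatever rows come out of a filter like this can then be checked by hand against an independent play-by-play log, which is exactly the kind of spot check described above.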
It correlates something like this:
Someone get Tim Donaghy on the phone and find out why this is happening! These refs need to do some explaining!
So there you have it: referees are much more likely to call 3-point fouls in the 4th quarter, and significantly more likely to blow the whistle within the last 20 seconds of a quarter.
I’m not a data analyst, just a hoops fan. But I found it incredibly quick and easy to start finding interesting NBA tidbits. What was most fascinating about the experience is that I didn’t really have too many questions going into the data exploration. I simply started poking around until charts started revealing information to me. Sometimes we really don’t know what data we have until we can see it in a user-friendly manner. That’s what we’re creating with MapD Immerse.
Explore the data yourself and share your insights. We’d love to see what you find: