Similarity Measure

Eyal Shafran
Apr 1, 2016

A similarity measure, which quantifies how alike two data sets are, is a standard tool in data mining. In this post I will show how it can be used to measure how similar the shot selection of two players is. The shot similarity can then be used for various things. For example:

Find the most similar and dissimilar players.
Cluster players into groups based on their shot selection and then check for correlation with their playing position.
Check a player's shot selection similarity over multiple seasons to see if their game evolved.
Are teams more likely to succeed if their players have similar shot selection, or if each player has their own favored spots on the court?

Basic idea

The x and y location of every shot taken in the NBA since the 2003-04 season is available through the NBA API. For a given season, I compared the shot selection of every pair of NBA players with at least 500 shots that season. This yields a similarity score between 0 and 1 for each pair. A similarity of 1 means the two players take the same proportion of their shots from exactly the same locations on the court; a similarity of 0 means there is no overlap at all between where they shoot from. The total number of shots a player takes does not affect the score, only the relative share of shots from each spot. For example, if two players take all of their shots from under the basket, but player A took 1000 shots and player B took 500, their similarity score is still 1, since both take 100% of their shots from underneath the basket.
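The post does not say which formula produces the 0-1 score, so the sketch below is only one possible reading of the idea. It assumes each shot has already been assigned to a court zone, turns each player's counts into fractions that sum to 1, and uses histogram intersection (the sum of the smaller fraction in each zone) as the similarity. The zone labels and the names `shot_distribution`, `similarity` and `pairwise_similarity` are hypothetical; only the 500-shot cutoff comes from the text.

```python
from collections import Counter
from itertools import combinations

MIN_SHOTS = 500  # minimum shots in a season for a player to be compared (from the post)


def shot_distribution(zones):
    """Turn a list of zone labels (one per shot) into fractions that sum to 1."""
    counts = Counter(zones)
    total = sum(counts.values())
    return {zone: n / total for zone, n in counts.items()}


def similarity(dist_a, dist_b):
    """Histogram intersection: 1 for identical distributions, 0 for no overlap.

    This choice of formula is an assumption for illustration, not the author's method.
    """
    zones = set(dist_a) | set(dist_b)
    return sum(min(dist_a.get(z, 0.0), dist_b.get(z, 0.0)) for z in zones)


def pairwise_similarity(shots_by_player, min_shots=MIN_SHOTS):
    """Score every pair of players who took at least `min_shots` shots."""
    dists = {
        player: shot_distribution(zones)
        for player, zones in shots_by_player.items()
        if len(zones) >= min_shots
    }
    return {
        (a, b): similarity(dists[a], dists[b])
        for a, b in combinations(sorted(dists), 2)
    }


# The example from the text: only proportions matter, not shot totals.
scores = pairwise_similarity({"A": ["under basket"] * 1000, "B": ["under basket"] * 500})
print(scores)  # {('A', 'B'): 1.0}
```

Because each player's counts are divided by that player's total before comparing, a player with twice as many shots from the same spots gets the same distribution, which is exactly the scale invariance described above.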

Most similar and dissimilar players

This can be used to find the most dissimilar players in the NBA based on their shot selection. For the 2015-16 season, Kyle Korver and Dwight Howard have a similarity score of 0.13, the lowest of any pair of players compared:

It is not surprising that the lowest similarity is between a player who takes most of his shots from under the basket and a 3-point specialist.
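Once every pair of players has a score, finding the most dissimilar and most similar pairs is a single ranking step. A minimal sketch, assuming the scores are stored in a dictionary keyed by pairs of player names (a hypothetical layout); the three players and their scores in the usage lines are made up for illustration and are not results from the analysis.

```python
import heapq


def most_dissimilar(scores, k=5):
    """The k pairs with the lowest similarity, lowest first."""
    return heapq.nsmallest(k, scores.items(), key=lambda item: item[1])


def most_similar(scores, k=5):
    """The k pairs with the highest similarity, highest first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical scores for three made-up players, for illustration only.
toy_scores = {("P1", "P2"): 0.40, ("P1", "P3"): 0.90, ("P2", "P3"): 0.15}
print(most_dissimilar(toy_scores, k=1))
print(most_similar(toy_scores, k=1))
```

`heapq.nsmallest` and `heapq.nlargest` avoid sorting the full list, which matters little here but keeps the intent (top k only) explicit.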

Here are the most similar players:

We can also do the same for other seasons. For example, the 2003-04 season:

The lowest similarity is again between a 3-point specialist and a center, although this time the similarity is a little higher (S = 0.21).

While Shaquille O'Neal and Ben Wallace had almost identical shot selection that season, Shaq made 58.4% of his shots while Wallace made 42.1%. I guess this is why Shaq was considered such an amazing offensive force.

Sometimes the most similar players are not centers:

Time evolution

This method can also be used to compare the same player over multiple seasons. This is a quick way to see change over time without needing to explore each season individually. One interesting example is Chris Bosh:

To compare the 2015-16 season with the 2003-04 season, we look at the top-right pixel (or the bottom-left one). The similarity between those two seasons is only S = 0.65, an indication that Bosh's game evolved over time. Changes between consecutive seasons are fairly modest, with the biggest change between the 2009-10 and 2010-11 seasons (S = 0.82). What happened during that time? Bosh left the Toronto Raptors and took on a different role with the Miami Heat.
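Reading the season-by-season matrix this way can be written out in a few lines. A minimal sketch, assuming `S` is a symmetric list of lists in which `S[i][j]` is the similarity between season i and season j, with seasons in chronological order (a hypothetical layout): the first-versus-last comparison is the corner entry, and consecutive-season comparisons sit just above the diagonal. The function names and the toy matrix in the usage lines are hypothetical.

```python
def first_vs_last(S):
    """Similarity between the first and last season: the top-right entry."""
    return S[0][-1]


def biggest_consecutive_change(S, seasons):
    """Consecutive seasons with the lowest similarity (largest change), and that value."""
    i = min(range(len(seasons) - 1), key=lambda k: S[k][k + 1])
    return (seasons[i], seasons[i + 1]), S[i][i + 1]


# Toy 3-season matrix, made up for illustration.
S = [
    [1.0, 0.9, 0.7],
    [0.9, 1.0, 0.8],
    [0.7, 0.8, 1.0],
]
print(first_vs_last(S))
print(biggest_consecutive_change(S, ["s1", "s2", "s3"]))
```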

We can make heatmaps of Bosh's shot selection for the four seasons mentioned above (2003-04, 2009-10, 2010-11 and 2015-16) to visualize the change:

The change between 2003-04 and 2015-16 (S = 0.65) is very noticeable: later in his career Bosh shot many more 3s and fewer shots from under the basket. Even the smaller change between his last year with the Raptors and his first year with the Heat (S = 0.82) is apparent: with the Heat, Bosh started shooting midrange jumpers much more frequently.

What else can we do with this?

In the next post I will show how this method can be used to cluster players based on their shot selection. These clusters can then be compared with the players' positions to see whether there is a strong correlation between a player's position and their shot selection.

This similarity method can be applied to any data that contains spatial coordinates. For example:

Similarity between players' positions on the court (with and without the ball).
Shot quality map similarity.
Assist-shot chart similarity.
Rebound location similarity.

Technical details

To find the similarity, the shot coordinates of each player are first converted to a heatmap using a Kernel Density Estimator (KDE) with a Gaussian kernel. The heatmaps of each pair of players are then compared.
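The KDE step can be sketched with NumPy by placing a Gaussian bump on each shot, summing the bumps over a grid of court locations, and normalising the result to sum to 1. Several details are assumptions: coordinates in feet with the basket at (0, 0), a half-court grid with 1 ft spacing, and the comparison step. The post does not say how the heatmaps are compared, so histogram intersection (the overlap of the two normalised heatmaps) is used here purely as an example. The names `shot_heatmap` and `heatmap_similarity` are hypothetical.

```python
import numpy as np

# Assumed half-court grid in feet, basket at (0, 0), 1 ft spacing.
XS = np.arange(-25.0, 25.5, 1.0)   # sideline to sideline
YS = np.arange(-5.0, 42.5, 1.0)    # baseline to half court
GRID_X, GRID_Y = np.meshgrid(XS, YS)


def shot_heatmap(x, y, sigma=3.0):
    """Gaussian KDE of shot locations on the court grid, normalised to sum to 1."""
    x = np.asarray(x, dtype=float)[:, None, None]
    y = np.asarray(y, dtype=float)[:, None, None]
    # Squared distance from every shot to every grid point: shape (n_shots, ny, nx).
    d2 = (GRID_X - x) ** 2 + (GRID_Y - y) ** 2
    # Sum one isotropic Gaussian per shot; constant factors cancel when normalising.
    density = np.exp(-d2 / (2.0 * sigma ** 2)).sum(axis=0)
    return density / density.sum()


def heatmap_similarity(h1, h2):
    """Histogram intersection of two normalised heatmaps (assumed comparison), 0 to 1."""
    return float(np.minimum(h1, h2).sum())


# Hypothetical players: one shoots only at the rim, the other only from a corner.
rim = shot_heatmap([0.0] * 50, [0.0] * 50)
corner = shot_heatmap([22.0] * 50, [0.0] * 50)
print(heatmap_similarity(rim, corner))  # close to 0
print(heatmap_similarity(rim, shot_heatmap([0.0] * 10, [0.0] * 10)))  # 1 up to rounding
```

Normalising each heatmap is what makes the score depend only on where a player shoots from, not on how many shots they took.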

The sigma (standard deviation) of the kernel controls how close two shots must be to count as similar: a larger sigma means that shots further apart still contribute some similarity, while a smaller sigma means only shots very close together do. I found that a Gaussian kernel with a sigma of 3 feet gives good results.
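To see how sigma sets what counts as "close", consider two hypothetical players who each take every shot from a single spot, d feet apart. Assuming, as an illustration only, that heatmaps are compared with histogram intersection, the overlap of two isotropic 2-D Gaussians with the same sigma has a closed form: the densities cross halfway between the centres, giving erfc(d / (2√2 σ)). The function name is hypothetical and this is not the author's calculation.

```python
import math


def single_spot_similarity(d, sigma):
    """Overlap of two isotropic 2-D Gaussians with equal sigma, centres d apart.

    Equals the histogram intersection of the two densities:
    2 * Phi(-d / (2 * sigma)) = erfc(d / (2 * sqrt(2) * sigma)).
    """
    return math.erfc(d / (2.0 * math.sqrt(2.0) * sigma))


# Two spots 6 ft apart look more alike as sigma grows.
for sigma in (1.0, 3.0, 6.0):
    print(sigma, round(single_spot_similarity(6.0, sigma), 3))
```

With sigma = 3 ft, two spots 6 ft apart (d = 2σ) overlap by 2Φ(-1), roughly 0.32, while with sigma = 1 ft they barely overlap at all.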
