Predicting NCAA Tournament over- and under-performers
Looking at recent tournaments and month-by-month performance as a guide
Every year I come across more and more interesting analysis trying to predict how teams will play in the NCAA Tournament, but I haven’t been too preoccupied with doing a ton of it myself. This season I got inspired by Will Warren’s terrific work at Stats By Will and the Field of 68, where he looked at how teams were rated month-by-month using Bart Torvik’s wonderful website (which allows you to filter ratings for specific date ranges).
Will found some really interesting stuff when looking at how teams over- or underperformed their expected wins, depending on whether they were rated as a top-10 team in November, December, January or February. I won’t spoil his findings here (subscribe to his stuff), but it got me thinking. Rather than looking at teams’ wins vs what’s expected for their seed, however, I compared each team’s monthly ranking (among the 68 tourney teams) to their ranking based solely on NCAA Tournament games. A 9 seed can play amazingly well but come up just a bit short because they played against a 1 seed that also plays really well. The 9 seed would only have 1 win, but they might rate as one of the best performing teams in the tournament. For example, Memphis graded out as the 4th best performing team last season after they smoked Boise State and narrowly lost to Gonzaga. It’s useful to identify what profile those teams had, since teams with a similar profile in this year’s tourney might play well and possibly get more wins.
Here’s the methodology I used:
Each of the 68 tourney teams was ranked 1-68 on their performance in each of 4 time periods: November & December, January, February & March (conf tourney), and NCAA Tournament. For example, 2022 UNC was 38th in Nov/Dec, 21st in Jan, 26th in Feb/Mar, and 1st in the NCAA Tournament.
I then compared each team’s ranking in each time period to the average that their seed would imply. For UNC (an 8 seed in 2022), they should average out to a ranking of 30.5. They were a little worse in Nov/Dec, better in Jan and Feb/Mar, and WAY better in the NCAA Tourney.
The range between a team’s best and worst ranking across the 3 pre-tourney periods had a standard deviation of 10 places, so I allowed each team to be within 17 places (above or below) of their seed’s midpoint in any time period; any variance of more than 17 places was flagged as that team “overperforming” or “underperforming” their seed. I used 17 places because +/- 1.7 standard deviations should capture roughly 90% of observations in a normal distribution. The plus/minus ranges aren’t quite normal because a high seeded team can’t perform much better than their seed, but I dealt with those a bit differently (as you’ll read about later).
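To make the arithmetic concrete, here’s a minimal Python sketch of the seed midpoint and the 17-place band. It is not the exact code behind these numbers: the function names are hypothetical, and it assumes 4 teams per seed line, which ignores the extra First Four teams.

```python
import math

TEAMS_PER_SEED = 4   # assumption: ignores the extra First Four teams
RANK_SD = 10         # standard deviation of ranking range, from the article
Z = 1.7              # multiplier from the article


def seed_midpoint(seed: int) -> float:
    """Average overall rank (1 = best) of the teams on one seed line."""
    first = TEAMS_PER_SEED * (seed - 1) + 1   # best rank on this seed line
    last = TEAMS_PER_SEED * seed              # worst rank on this seed line
    return (first + last) / 2


def normal_coverage(z: float) -> float:
    """Share of a normal distribution within +/- z standard deviations."""
    return math.erf(z / math.sqrt(2))


band = Z * RANK_SD   # allowed distance from the seed midpoint, in places

print(seed_midpoint(8))               # 30.5, matching the UNC example
print(band)
print(round(normal_coverage(Z), 3))
```

Under these assumptions an 8 seed lands on 30.5, and +/- 1.7 standard deviations covers a little over 90% of a normal distribution.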
I then identified teams that over- or underperformed in the NCAA Tournament by my required range, and compared their track record during the 3 in-season time periods. At a high level, I found a pretty clear trend: teams that underperformed their seed during at least 1 regular season period tended to do so during the NCAA Tournament as well, and vice versa. Teams who were within the 17-place range during the season had a wide variety of outcomes in the Tournament, but were equally likely to under- or overperform…and curiously enough, the average variance from their seed midpoint was exactly zero.
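The flagging and comparison steps could look something like the sketch below. Again, this is an illustration rather than the actual workflow: the function names, the dictionary layout, and the "mixed" label (for a team flagged in both directions during the season) are all assumptions. The only data in it is the 2022 UNC example from above, and it doesn’t include the separate handling of high seeds.

```python
from collections import Counter

BAND = 17  # places allowed on either side of the seed midpoint


def flag(rank: float, midpoint: float, band: float = BAND) -> str:
    """Label one ranking against the seed midpoint (lower rank = better)."""
    gap = midpoint - rank   # positive: played better than the seed implies
    if gap > band:
        return "over"
    if gap < -band:
        return "under"
    return "within"


def season_label(ranks, midpoint: float) -> str:
    """Summarize the in-season periods: flagged if any one period is flagged."""
    flags = {flag(r, midpoint) for r in ranks} - {"within"}
    if not flags:
        return "within"
    if len(flags) == 2:
        return "mixed"      # assumption: over in one period, under in another
    return flags.pop()


def cross_tab(teams) -> Counter:
    """Count (in-season label, tournament label) pairs across teams."""
    return Counter(
        (season_label(t["season_ranks"], t["midpoint"]),
         flag(t["tourney_rank"], t["midpoint"]))
        for t in teams
    )


# 2022 UNC, using the rankings quoted earlier in the article
teams = [
    {"name": "UNC 2022", "midpoint": 30.5,
     "season_ranks": [38, 21, 26], "tourney_rank": 1},
]
print(cross_tab(teams))
```

With the full 68-team field loaded in, the counter would show how often each in-season label lines up with each tournament label.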
The part of this article where I identify 2023 NCAA Tournament teams who might over or underperform will be for paid subscribers only, but here’s a quick list of the 2022 teams used as comps for under or overperformers:
This is where looking at pure wins vs rating tells two different stories in some cases. Arkansas made the Elite Eight, but beat two double-digit seeds by single digits in the first 2 rounds, racked up an impressive win over Gonzaga, and then lost to Duke by 9. They’re rated as the 20th best team when an average 4 seed should be about the 14th best.
As you’ll see above, this method is designed to highlight lower seeds who could win some games or at a minimum play very well in a tough first round loss (like Vermont vs Arkansas), or identify higher seeds who may be vulnerable to earlier losses or will let worse teams hang around.
For paid subscribers, I’ll use this to identify 2023 teams that meet these patterns!
Possible overperformers
This year the tournament has several lower seeded teams who look dangerous, which is in line with analysis I’ve read that the gap between the upper and lower halves of the tourney is smaller than it usually is. There are several popular upset picks above, such as Utah State, VCU, Drake, and Kent State. Meanwhile, a team like Iona could outperform a typical 13 seed but still lose to a very good Connecticut team. Remember, this isn’t taking matchups into account; it’s just trying to identify teams who might outplay their seed during their games. They’ll have a better chance to rack up extra wins, but that’s not a sure thing by any means.
Two really intriguing things stood out to me. First, Creighton had an exceptionally strong January and met 6 seed expectations otherwise; this is hard to do, especially given some injuries they had. Texas and Houston met these criteria last season, and Houston made an Elite Eight run, while Texas lost in a 6-3 matchup in the 2nd round. Creighton certainly isn’t likely to be upset, and could make a nice run with some breaks.
Second, three 8/9 seeds show up here, which could be an indicator that some 1 seeds could have tough 2nd-round games. Memphis met these criteria last season and nearly upset Gonzaga in the 2nd round; I think there’s a good chance one of these 8/9 seeds pulls off an upset and makes the Sweet 16.
Penny Hardaway must really know something about coaching in February and March, as his Tigers are 10th among tourney teams in that time period this year and were 6th last season:
There wasn’t any clear pattern last season as to when teams needed to be playing best to overperform, but a double digit seed who played like a top-15 team in any month of the season is worth noticing.
Possible underperformers
Again, this list is trying to identify vulnerable teams who are higher seeded. No #1 seeds show up here, as none of them ranked worse than 18th in any segment (Kansas in January). In fact, Houston and Alabama were top-10 teams in every segment:
Kansas does look like a weak #1 seed, as they didn’t rate higher than 10th in any of the time periods. In fact, UCLA ranked higher than Kansas in every single segment. UCLA does have injuries, and it would have been shocking to see UCLA as a 1 seed ahead of Kansas, but this points to value with the Bruins. Kansas may also be a somewhat risky pick to go very far: every top-2 seed last season rated better than 10th in at least one time period.
Arizona and Texas show up as possible underperformers, although I think they deserve a bit of a pass given the fact that Kansas had a similar profile last season and won the title. Villanova was in a similar position last season, too. Kansas and Villanova actually had the two lowest ranked months of any 1 or 2 seed last season, but made the Final Four:
Historically, top-2 seeds who had a single bad month tend to perform very well, assuming they were elite in other periods. There may be some benefit to working through a rough patch, or the fact that they showed enough strength in other periods to offset one bad month may actually be a positive sign.
A few of the possible 2023 underperformers are in first round matchups with possible overperformers, making these especially intriguing games:
Miami FL vs Drake
Kentucky vs Providence
Missouri vs Utah State
Maryland vs West Virginia
Meanwhile, there could be chaos afoot in the East Region, as 6-seed Kentucky and 3-seed Kansas State are flagged as possible underperformers, with 11-seed Providence a possible overperformer. I’d expect some close games in that pod, and possible upsets.
It’s been a few years since I really tried any predictive work around the NCAA Tournament, but it’s pretty interesting to see some patterns emerging from recent tournaments. If anyone wins their bracket pool or cashes any parlay tickets, my standard fee is 10%.