What Math Can Teach Us about March Madness and First-Round Upsets




Every year around St. Patrick’s Day, NCAA Division I men’s basketball holds a single-elimination tournament to select a national champion. It’s more commonly known as the “Final Four” or “March Madness.” For math aficionados, it provides a wealth of data to analyze.


Since 1985, the tournament has started with 64 teams entering the First Round. (Since 2001, there have been Opening Round games prior to the First Round, but we’re going to focus on the field of 64.) The 64 teams are divided into four regions (usually East, West, Midwest, and South). In each region of 16 teams, the teams are seeded, or ranked, from 1 to 16, with #1 considered the top seed.

There are 32 games in the First Round. In each of the four regions, the games are:

#1 seed plays #16 seed

#8 plays #9

#5 plays #12

#4 plays #13

#6 plays #11

#3 plays #14

#7 plays #10

#2 plays #15
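The pairings above follow a simple rule: the two seeds in every game add up to 17. As a minimal sketch (the function name is my own, and the games come out in seed order rather than in the bracket order listed above), the matchups for one region could be generated like this:

```python
# First Round pairings within one 16-team region.
# In every game the two seeds add up to 17, so seed s faces seed 17 - s.

def first_round_pairings(region_size=16):
    """Return (favorite, underdog) seed pairs for one region."""
    half = region_size // 2
    return [(seed, region_size + 1 - seed) for seed in range(1, half + 1)]

pairings = first_round_pairings()
for favorite, underdog in pairings:
    print(f"#{favorite} plays #{underdog}")
```

Eight games per region across four regions gives the 32 First Round games.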

With such a rich set of data, I decided to analyze First Round upsets, meaning games in which the lower-ranked seed beats the higher-ranked one. Because the tournament is single elimination, an early upset can have devastating consequences for predictions in later rounds. I looked back at the First Round games from 1986 through 2017: 32 years of 32 games, for a total of 1,024 data points.

My hunch, based only on anecdote, has always been that the most likely upsets are #9 beating #8 and #12 beating #5. Let me check my assumptions using PTC Mathcad.

Upsets by Year

First, I created a spreadsheet where I marked each First Round upset with a U:

 Spreadsheet showing historic March Madness first round upsets

Then I read the information into PTC Mathcad and performed a basic analysis of the upsets:

 

Historically, upsets have accounted for 25.2% of First Round games, or about 8 of the 32 games every year. That’s actually higher than I expected.
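For readers without Mathcad, the same basic analysis can be sketched in Python. This is only an illustration under assumptions of mine: the spreadsheet is taken to be a grid with one row per year and one cell per game, and the toy values below are invented, not real tournament results.

```python
# Toy stand-in for the spreadsheet: one row per year, one cell per game.
# "U" marks an upset; "" marks a game won by the better seed.
# These values are invented for illustration, not real tournament results.
toy_grid = [
    ["U", "", "", "U", "", "", "", ""],
    ["", "", "U", "", "", "", "", ""],
    ["U", "U", "", "", "", "U", "", ""],
]

def upset_summary(grid):
    """Return (total_upsets, total_games, upset_fraction, upsets_per_year)."""
    total_games = sum(len(row) for row in grid)
    total_upsets = sum(cell == "U" for row in grid for cell in row)
    return (total_upsets, total_games,
            total_upsets / total_games, total_upsets / len(grid))

upsets, games, fraction, per_year = upset_summary(toy_grid)
print(f"{upsets} upsets in {games} games: {fraction:.1%}, {per_year:.2f} per year")
```

With the real figures, the two numbers are linked by the same arithmetic: 25.2% of 32 games per year is 0.252 × 32 ≈ 8.06 upsets per year.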

Next, let’s write a program to count the upsets for each year:

 Program in PTC Mathcad to evaluate upsets by year
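The Mathcad program itself is only shown as an image, so here is a rough Python equivalent rather than a transcription of it. It assumes the same hypothetical grid layout (one row per year, "U" for an upset), and the data is made up.

```python
# Invented toy data: one row per year, "U" marks an upset.
toy_grid = [
    ["U", "", "", "U", "", "", "", ""],
    ["", "", "U", "", "", "", "", ""],
    ["U", "U", "", "", "", "U", "", ""],
]

def upsets_by_year(grid, first_year):
    """Map each year to the number of cells marked "U" in its row."""
    counts = {}
    for offset, row in enumerate(grid):
        count = 0
        for cell in row:
            if cell == "U":
                count += 1
        counts[first_year + offset] = count
    return counts

counts = upsets_by_year(toy_grid, first_year=1986)
for year, count in counts.items():
    print(year, count)
```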

Some built-in functions allow us to determine that the maximum number of upsets was 13 in 2016, and the minimum was 3 in 2000:

 Minimum and maximum first round upsets by year
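In Python, the built-in `max` and `min` functions play the same role. The sketch below runs on invented per-year counts (not the real ones) and also shows what happens on a tie.

```python
# Invented per-year upset counts, for illustration only.
toy_counts = {1986: 2, 1987: 1, 1988: 3, 1989: 3}

# Passing key=toy_counts.get makes max/min compare the counts
# while returning the matching year.
most_year = max(toy_counts, key=toy_counts.get)
fewest_year = min(toy_counts, key=toy_counts.get)

print(f"Most upsets: {toy_counts[most_year]} in {most_year}")
print(f"Fewest upsets: {toy_counts[fewest_year]} in {fewest_year}")
```

When two years tie, as 1988 and 1989 do here, `max` returns the first one it encounters.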

We’ll plot the data to see any fluctuations:

 Upset data graphed in PTC Mathcad, showing fluctuations by year.

Oddly, the number of upsets generally appears to ping-pong, swinging between higher and lower totals from one year to the next. It’ll be interesting to see whether 2018 brings an unusually high number of upsets.
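One way to put a number on that ping-pong impression, which is not part of the original Mathcad worksheet, is to count how often the year-to-year change reverses direction. The counts below are invented for illustration.

```python
def direction_changes(counts):
    """Return (reversals, opportunities) for a sequence of yearly counts.

    A reversal is a rise followed by a fall, or a fall followed by a rise.
    """
    diffs = [b - a for a, b in zip(counts, counts[1:])]
    reversals = 0
    for prev, curr in zip(diffs, diffs[1:]):
        if prev * curr < 0:  # opposite signs: the trend flipped
            reversals += 1
    return reversals, max(len(diffs) - 1, 0)

# Invented yearly upset counts, not real data.
toy_series = [5, 9, 6, 10, 11, 8]
reversals, opportunities = direction_changes(toy_series)
print(f"{reversals} reversals out of {opportunities} opportunities")
```

A series that reverses at nearly every opportunity alternates far more than one that trends steadily up or down.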

Upsets by Seed

I’m itching to test my hypothesis regarding which seeds are most likely to experience upsets. Again, I’ll write a program, but this one will need a nested for-loop, since there are four games each year per seed:

 Program in PTC Mathcad to predict which seeds most likely to experience upset

The program generates a matrix listing the seed, the total number of upsets, the fraction of upsets for that seed (upsets divided by 128 games), and upsets per year (upsets divided by 32 years):

 Program results showing the likelihood of an upset by seed.
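A Python sketch of the same idea follows. The data layout is my assumption, not the author’s: each region is an 8-character string whose positions follow the matchup order listed earlier (#1, #8, #5, #4, #6, #3, #7, #2), with "U" marking an upset. The toy results are invented.

```python
# Favorite seed for each of the eight games, in the article's matchup order.
FAVORITE_SEEDS = [1, 8, 5, 4, 6, 3, 7, 2]

# Invented toy data: two years, four regions each, "U" marks an upset.
toy_results = [
    [".U......", "..U.....", ".U..U...", "........"],
    [".U......", "......U.", "........", "..U....U"],
]

def upsets_by_seed(results):
    """Return rows of (seed, upsets, fraction_of_games, upsets_per_year)."""
    totals = {seed: 0 for seed in FAVORITE_SEEDS}
    games_per_seed = 0
    for year in results:          # outer loop: tournaments
        for region in year:       # inner loop: regions within a tournament
            games_per_seed += 1   # each region has one game per favorite seed
            for seed, cell in zip(FAVORITE_SEEDS, region):
                if cell == "U":
                    totals[seed] += 1
    years = len(results)
    return [(seed, totals[seed], totals[seed] / games_per_seed, totals[seed] / years)
            for seed in sorted(totals)]

table = upsets_by_seed(toy_results)
for seed, count, fraction, per_year in table:
    print(f"#{seed}: {count} upsets, {fraction:.1%} of games, {per_year:.2f} per year")
```

With the full dataset, the two divisors become 128 games and 32 years. For example, 62 upsets would give 62 / 128 ≈ 48.4% of games and 62 / 32 ≈ 1.94 per year.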

Conclusions about seed upsets:

  • Generally, the closer the seeds are to one another, the greater the likelihood of an upset, as we would expect.
  • A #16 team has never beaten a #1 seed. People act like #15 beating a #2 is a rare occurrence, but it happens about once every 4 years.
  • The #12 team does beat the #5 team a disproportionate number of times. The #12 upsets the #5 about as often as the #11 upsets the #6.
  • #9 upsets #8 in 48.4% of their games, or almost twice every tournament.

Well, whaddaya know, my anecdotal hunches have a ring of truth. 

Upsets by Region

We would expect the four regions to balance out in terms of upsets. I’ll write a program similar to the previous one, with nested for-loops to count the upsets per region:

 Program in PTC Mathcad for finding upsets by region

(A note about regions: The Southeast region was replaced by the South region in 1998, but made a reappearance in 2011. In this analysis, South and Southeast were collected together. Also in 2011, there was no Midwest region, but a Southwest region. The 2011 Southwest results are included in the Midwest results.)
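The region bookkeeping described in that note can be sketched in Python as an alias table applied inside the nested loops. As before, the layout (one dictionary per year, mapping region name to an 8-character string of game results) is a hypothetical one, and the data is invented.

```python
# Fold renamed regions into the four labels used in the analysis.
REGION_ALIASES = {"Southeast": "South", "Southwest": "Midwest"}

# Invented toy data: region name -> game results, "U" marks an upset.
toy_results = [
    {"East": ".U......", "West": "........",
     "Midwest": ".U..U...", "Southeast": "..U....."},
    {"East": "........", "West": ".U.....U",
     "Southwest": ".U......", "South": "......U."},
]

def upsets_by_region(results):
    """Total the upsets per region after merging renamed regions."""
    totals = {"East": 0, "West": 0, "Midwest": 0, "South": 0}
    for year in results:                          # outer loop: tournaments
        for region_name, games in year.items():   # inner loop: regions
            canonical = REGION_ALIASES.get(region_name, region_name)
            totals[canonical] += games.count("U")
    return totals

totals = upsets_by_region(toy_results)
for region_name, count in totals.items():
    print(region_name, count)
```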

Once again, we’ll generate a matrix of results:

 Results by region.

Wow! The East, West, and South / Southeast are about the same, but the Midwest has a disproportionately high number of upsets.

What We Learned

I’ve been following the NCAA tournament for years, but this is the first time I’ve taken advantage of the information that’s always been available to me. Analyzing the rich dataset lets me follow the games with more insight and better-informed expectations for the outcomes. What assumptions can you confirm or dispel by analyzing your data with PTC Mathcad?

Want to check some championship hypotheses for yourself?  Download PTC Mathcad Express for free. 
