People who love college basketball might need to reconsider before relying on artificial intelligence for a flawless March Madness bracket.
Even though artificial intelligence has become very popular in daily life recently, its use in bracketology circles is not particularly new. Nonetheless, the yearly bracket competitions still offer many unexpected outcomes for computer science enthusiasts who have spent years refining their models using past NCAA Tournament results.
They have discovered that machine learning alone cannot fully overcome the limited data and unpredictable human elements of 'The Big Dance.'
“All these things are a combination of art and science. And they involve understanding people just as much as they involve analyzing statistics,” said Chris Ford, a data analyst living in Germany. “You really need to comprehend human behavior, and that's what makes it so challenging.”
Casual fans may spend some time this week deciding whether to ride the team with the best 'mojo,' like Sister Jean’s 2018 Loyola-Chicago team that reached the Final Four, or to go with the hottest shooter, like Steph Curry, whose breakout 2008 performance took Davidson to the Elite Eight.
Technology enthusiasts are pursuing goals even more complex than predicting the winners of all 67 matchups in both the men’s and women’s NCAA tournaments. They are refining mathematical functions in the quest for the most unbiased model for predicting success in the upset-heavy tournament. Some are using AI to improve their code or to determine which aspects of team resumes they should prioritize.
The likelihood of creating a perfect bracket is very low for any competitor, no matter how advanced their tools may be. An 'informed fan' making certain assumptions based on past results — such as a 1-seed defeating a 16-seed — has a 1 in 2 billion chance at perfection, according to Ezra Miller, a mathematics and statistical science professor at Duke.
“Roughly speaking, it would be like choosing a random person in the Western Hemisphere,” he said.
Artificial intelligence is likely very effective at calculating the probability of a team winning, Miller said. However, even with the models, he added that the 'random choice of who’s going to win a game that’s evenly matched' is still a random choice.
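As a rough illustration of those odds, the sketch below treats every game as an independent pick with the same chance of being right. That uniform-accuracy assumption is a simplification and is not attributed to Miller, and the function names are hypothetical. It counts the 63 games of the main bracket and leaves out the play-in round.

```python
import math

GAMES = 63  # main-bracket games, excluding the play-in round


def perfect_bracket_odds(per_game_accuracy: float, games: int = GAMES) -> float:
    """Return N such that the chance of a perfect bracket is 1 in N."""
    return 1.0 / (per_game_accuracy ** games)


def implied_accuracy(one_in: float, games: int = GAMES) -> float:
    """Per-game accuracy that would produce a 1-in-`one_in` chance of perfection."""
    return math.exp(-math.log(one_in) / games)


if __name__ == "__main__":
    print(f"Coin flips: 1 in {perfect_bracket_odds(0.5):.3g}")
    print(f"Accuracy implied by 1 in 2 billion: {implied_accuracy(2e9):.3f}")
```

Under these assumptions, pure coin flipping works out to about 1 in 9.2 quintillion, and a 1 in 2 billion shot corresponds to getting roughly 71% of individual games right.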
For the 10th consecutive year, the data science community Kaggle is hosting “Machine Learning Madness.” Traditional bracket competitions are all-or-nothing; participants write one team’s name into each open slot. But “Machine Learning Madness” requires users to submit a percentage representing their confidence in a team's advancement.
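One common way to grade confidence-based picks is the Brier score, the average squared gap between the predicted probability and what actually happened. Whether Kaggle uses this exact metric is not stated here, so treat the choice as an assumption. The game results and picks below are made up for illustration.

```python
def brier_score(predictions, outcomes):
    """Mean squared gap between predicted win probability and result (1 = win, 0 = loss)."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)


# Hypothetical results for four games: 1 means the picked team won.
outcomes = [1, 1, 0, 1]

all_or_nothing = [1.0, 1.0, 1.0, 1.0]  # traditional bracket: full confidence in every pick
hedged = [0.9, 0.7, 0.4, 0.6]          # probabilistic entry that admits uncertainty

if __name__ == "__main__":
    print("all-or-nothing:", brier_score(all_or_nothing, outcomes))
    print("hedged:", brier_score(hedged, outcomes))
```

Lower is better. In this toy example the hedged entry scores better than the all-or-nothing one because it is punished less for the single upset.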
Kaggle offers a large data set of past results for people to develop their algorithms. This includes box scores with details on a team’s free-throw percentage, turnovers, and assists. Users can then utilize an algorithm to determine which statistics are most predictive of tournament success.
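A minimal sketch of that last step might rank each statistic by how strongly it tracks with winning. Plain correlation is used here as a simple stand-in for whatever algorithm a competitor would actually choose, and the six rows are invented numbers, not Kaggle data or its file layout.

```python
import math

# Made-up box-score rows for illustration only.
ROWS = [
    {"ft_pct": 0.78, "turnovers": 9,  "assists": 18, "won": 1},
    {"ft_pct": 0.65, "turnovers": 15, "assists": 12, "won": 0},
    {"ft_pct": 0.80, "turnovers": 10, "assists": 14, "won": 1},
    {"ft_pct": 0.70, "turnovers": 14, "assists": 15, "won": 0},
    {"ft_pct": 0.74, "turnovers": 16, "assists": 17, "won": 1},
    {"ft_pct": 0.68, "turnovers": 11, "assists": 13, "won": 0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, target="won"):
    """Sort statistics by the absolute strength of their correlation with the target."""
    features = [key for key in rows[0] if key != target]
    ys = [row[target] for row in rows]
    scores = {f: pearson([row[f] for row in rows], ys) for f in features}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    for name, score in rank_features(ROWS):
        print(f"{name}: {score:+.2f}")
```

The ordering it prints says nothing about real basketball, since the inputs are invented. The point is only the shape of the workflow: feed in past box scores, then let the numbers say which columns deserve attention.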
“It’s a fair fight. There are people who have extensive knowledge about basketball and can apply that knowledge,” said Jeff Sonas, a statistical chess analyst who helped create the competition. “It is also feasible for someone who has limited basketball knowledge but is good at learning how to use data to make predictions.”
Ford, a fan of Purdue who witnessed the shortest Division I men’s team shock his Boilermakers in the first round last year, approaches it differently. Since 2020, Ford has attempted to forecast which schools will be part of the 68-team field.
In 2021, his most successful year, Ford said the model correctly identified 66 of the teams in the men’s bracket. He uses a “fake committee” of eight different machine learning models that weigh slightly different factors based on the same inputs: a team's strength of schedule and the number of quality wins against tougher opponents, among others.
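The committee idea can be sketched as a simple vote. This is not Ford's code: the team resumes are invented, three hand-weighted scoring rules stand in for his eight trained models, and every name below is hypothetical.

```python
from collections import Counter

# Made-up resumes: (strength of schedule, quality wins). Not real data.
TEAMS = {
    "Team A": (0.62, 7),
    "Team B": (0.58, 4),
    "Team C": (0.55, 6),
    "Team D": (0.60, 2),
    "Team E": (0.49, 5),
}

# Each committee "member" weighs the same two inputs differently.
MEMBERS = [
    (10.0, 0.5),  # leans on strength of schedule
    (5.0, 1.0),   # balanced
    (1.0, 2.0),   # leans on quality wins
]


def member_picks(weights, teams, field_size):
    """One member's top `field_size` teams under its own weighting."""
    sos_weight, wins_weight = weights
    scores = {
        name: sos_weight * sos + wins_weight * wins
        for name, (sos, wins) in teams.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:field_size]


def committee_field(members, teams, field_size):
    """Teams that collect the most votes across all members."""
    votes = Counter()
    for weights in members:
        votes.update(member_picks(weights, teams, field_size))
    # Rank by vote count; ties fall back to alphabetical order for determinism.
    ranked = sorted(teams, key=lambda name: (-votes[name], name))
    return ranked[:field_size]


if __name__ == "__main__":
    print(committee_field(MEMBERS, TEAMS, field_size=3))
```

Pooling several imperfect opinions tends to smooth out the quirks of any single one, which is presumably the appeal of a committee over a lone model.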
Eugene Tulyagijja, a sports analytics major at Syracuse University, shared that he dedicated a year’s worth of spare time to creating his own model. He utilized a deep neural network to identify patterns of success based on statistics such as a team’s 3-point efficiency.
Although his model inaccurately predicted that the 2023 men’s Final Four would include Arizona, Duke, and Texas, it correctly included UConn. As he adjusts the model with another year’s worth of information, he acknowledged certain human elements that no computer could ever take into account.
“Did the players get enough sleep last night? Will that impact the player’s performance?” he said. “Personal matters — we can never factor those in using data alone.”
No approach will encompass every relevant factor in play on the court. The necessary balance between modeling and intuition is what Tim Chartier, a Davidson bracketology expert, refers to as “the art of sports analytics.”
Chartier has been studying brackets since 2009, creating a method that heavily relies on home/away records, performance in the second half of the season, and the strength of schedule. However, he highlighted that the NCAA Tournament’s historical results present an unpredictable and small sample size — a challenge for machine learning models, which depend on large sample sizes.
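One way to picture that kind of weighting is a win percentage in which some games count more than others. The sketch below is an illustration only, not Chartier's actual method: the bonus values, the 0-to-1 opponent scale, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Game:
    won: bool
    away: bool
    second_half: bool          # played in the second half of the season
    opponent_strength: float   # hypothetical scale: 0.0 (weakest) to 1.0 (strongest)


def game_weight(game, away_bonus=1.5, late_bonus=2.0):
    """Weight a game more if it was on the road, late in the season, or against a strong opponent."""
    weight = 1.0
    if game.away:
        weight *= away_bonus
    if game.second_half:
        weight *= late_bonus
    return weight * (0.5 + game.opponent_strength)


def weighted_win_pct(games):
    """Share of total game weight that came from wins."""
    total = sum(game_weight(g) for g in games)
    earned = sum(game_weight(g) for g in games if g.won)
    return earned / total if total else 0.0


# Two hypothetical teams with identical 1-1 records.
late_road_winner = [Game(True, True, True, 0.5), Game(False, False, False, 0.5)]
early_home_winner = [Game(False, True, True, 0.5), Game(True, False, False, 0.5)]

if __name__ == "__main__":
    print(weighted_win_pct(late_road_winner))
    print(weighted_win_pct(early_home_winner))
```

Both teams split their two games, yet the one that won on the road late in the season rates far higher, which is the sort of distinction a plain win-loss record hides.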
Chartier's objective is not for his students to achieve perfection in their brackets; his own model still cannot account for Davidson’s 2008 Cinderella story.
In that enigma, Chartier finds a valuable reminder from March Madness: “The beauty of sports, and the beauty of life itself, is the randomness that we can’t predict.”
“We can’t even predict 63 games of a basketball tournament where we had 5,000 games that led up to it,” he tells his classes. “So be forgiving to yourself when you don’t make correct predictions on stages of life that are much more complicated than a 40-minute basketball game.”