Things are a bit different this fall at Northwestern University, and it feels great! The Wildcats are undefeated and ranked ahead of Michigan, whom they will meet on Saturday in one of the biggest games for the Wildcats in recent memory. Yet Michigan is favored in the game. Go figure. College football offers us a great opportunity to look at the Big Data of fandom, the business of college football, and even the business of universities.
Building on my recent post that looked at NFL fan allegiance by location, I thought looking at some data visualization of college football allegiance would tell us a bit about who roots for which team and a bit more about how Big Data, Data Science, and data visualization are helping us understand complex problems in life and business.
The above graphic comes from the New York Times, using Facebook data to determine the dominant college football team by location. Lighter shades of a color indicate a weaker fan following than darker shades of the same color (and thus the presence of some fan mixing). There are many interesting revelations. Fan allegiance closely follows state boundaries. Schools like Auburn, Texas A&M, and even the University of Virginia have very small dominant footprints, in spite of their fiercely loyal fans and storied histories. The Ivy League and lower-tier football teams do not even register. Perhaps academic allegiance is different from and unrelated to football allegiance, but maybe not entirely. More generally, each state is dominated by one team, except for California, Florida, and perhaps Texas (three of the most populous states, by the way). Although this data looks only at football fan allegiance, it says something about the recognition of universities by location and even their prowess in athlete and student recruiting. It also says something (in part) about who might want to attend a university. That is big business, too!
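For readers curious how a map like this might be assembled, here is a minimal sketch in Python. It is not the New York Times' method, which is not described here. It assumes a hypothetical table of "like" counts by location and team (the locations, counts, and the function name dominant_team are all invented for illustration). For each location it picks the most-liked team and reports that team's share of likes, the kind of number that could drive the shade of the color.

```python
from collections import defaultdict

# Hypothetical (location, team, like_count) records; every number is invented.
LIKES = [
    ("Cook, IL", "Notre Dame", 300),
    ("Cook, IL", "Michigan", 250),
    ("Cook, IL", "Illinois", 250),
    ("Cook, IL", "Northwestern", 200),
    ("Tuscaloosa, AL", "Alabama", 900),
    ("Tuscaloosa, AL", "Auburn", 100),
]


def dominant_team(records):
    """Return {location: (team, share)} for the most-liked team per location."""
    totals = defaultdict(lambda: defaultdict(int))
    for location, team, count in records:
        totals[location][team] += count

    result = {}
    for location, by_team in totals.items():
        team = max(by_team, key=by_team.get)              # most-liked team
        share = by_team[team] / sum(by_team.values())     # its share of likes
        result[location] = (team, share)
    return result


if __name__ == "__main__":
    for location, (team, share) in dominant_team(LIKES).items():
        # A low share would map to a lighter shade (more fan mixing).
        print(f"{location}: {team} ({share:.0%} of likes)")
```

In this made-up data the first location is led by a team with well under half of the likes, which is the "lighter shade" situation described above, while the second is dominated by a single team.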
Another attempt to measure fan interest was made with a survey a few years ago. The following graphic was produced from such a survey via commoncensus.org.
Although some similarities are found in the two graphics, the sampling by commoncensus clearly included some really passionate fans of Michigan State and even Tulane fans living in (or were they just visiting?) upstate NY. And nobody seemed to wake the ‘Bama fans from their National Championship hangover of a few years ago to take the survey, which shows the state of Alabama as shared among other schools. That is surely a huge error.
Samples pose problems, often very big ones, in Data Science. Increasing sample size is not always the answer, nor does it necessarily reduce the problems with the data collected, because a larger sample gathered in the same flawed way carries the same bias. We must look at how data is gathered and what our sample really measures. There are many Big Data lessons in this.
First, active sampling is inherently flawed in very dangerous ways. By active sampling or measurement, I mean the act of asking a person to respond. It is hard to get a broad and fair representation of the population this way. Many people do not participate due to the effort involved; reaching people can be hard; participants may not trust the process (and thus change their answers in various ways); and answers are often skewed by a few enthusiastic or motivated participants. As a result, the data is not representative of the overall population. The sample reflects those who heard of the survey and cared to participate.
The Facebook data did not involve asking people to state their favorite team. Instead, it simply looked at “likes” and at what millions of Facebook users were doing in the normal course of life and business. Nobody even knew they were in a sample. I call such measurement passive data capture. Facebook passively measures college football fan allegiance, and the result is not only a bigger sample but also more confidence in the measures and a better understanding of the phenomenon of fan allegiance, because fan allegiance is not sought but found. Although it is also a sample, passively captured data is not contaminated by the act of asking and by human reactions to being asked. Systems that passively measure our behavior are now commonplace in our lives. Smart phones measure our location without us barking it out. Smart phones also measure, to within a few minutes, the time we wake up (as many people check their smart phones within minutes of waking). Nest thermostats passively measure our presence in our homes, and the list is growing with wearables, cameras, and sensors in more of the things we operate and use. This is creating not just bigger data but more powerful data, for the same reasons we saw in the above graphics on football allegiance.
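The point about motivated participants can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of either graphic: the population size, the true fan share of 30%, and the response rates (fans answer a survey invitation 50% of the time, everyone else 10%) are all invented. Passive capture is idealized here as simply observing a random set of people with no response step; real passive data has its own coverage limits, as noted above.

```python
import random


def make_population(size=200_000, true_share=0.30, seed=1):
    """Build a population where True marks a fan of a given team (assumed share)."""
    rng = random.Random(seed)
    return [rng.random() < true_share for _ in range(size)]


def active_survey(population, n_invites, rng, p_fan=0.50, p_other=0.10):
    """Invite n people; fans choose to respond more often (assumed rates)."""
    invited = rng.sample(population, n_invites)
    responses = [
        is_fan for is_fan in invited
        if rng.random() < (p_fan if is_fan else p_other)
    ]
    return sum(responses) / len(responses)


def passive_capture(population, n_observed, rng):
    """Observe n people directly; nobody decides whether to respond."""
    observed = rng.sample(population, n_observed)
    return sum(observed) / len(observed)


def run(seed=7):
    """Return {sample_size: (active_estimate, passive_estimate)}."""
    population = make_population()
    rng = random.Random(seed)
    return {
        n: (active_survey(population, n, rng), passive_capture(population, n, rng))
        for n in (1_000, 10_000, 100_000)
    }


if __name__ == "__main__":
    for n, (active, passive) in run().items():
        print(f"n={n:>7}: active={active:.3f}  passive={passive:.3f}")
```

Under these assumed rates, the active estimate settles near 0.15 / 0.22, or about 0.68, rather than the true 0.30, and it stays there as the number of invitations grows. A bigger sample only makes the wrong answer more precise, while the idealized passive estimate stays close to the true share.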
So, What is Up with Chicago?
College football allegiance in the Chicago area is complicated. Check out this graphic.
Even dear Northwestern University struggles to achieve fan dominance west of US 41, or maybe that is Sheridan Road (not good news!). Michigan fans are even on the doorstep of Ryan Field! It does not help that Northwestern has a large graduate student body, bringing fans with undergraduate fan baggage. Notice how Michigan, Notre Dame, and even Wisconsin have encroached on not just Northwestern but also Illinois. It is a tough place to grow a fan allegiance, especially if you have had years of mediocre seasons and nearby teams have had success. The hodgepodge of allegiance on the South Side of Chicago is even more perplexing. Maybe the fans in Hyde Park are still missing the University of Chicago football program.
In my Analytical Consulting Lab class at the Kellogg School of Management, my Kellogg MBA students are undertaking projects to help Northwestern Athletics understand approaches to better engage with our fans using analytics and data. The goal is to increase the following of our university sports teams. Analytics help us frame the problem and identify winning strategies. Looking at college football fan allegiance shows there is more to the problem than just beating Michigan and winning over fans with on-the-field performance. The problem is more complicated and related to overall college football interest in Chicago.
The above graphic shows us where college football is popular overall. The deepest colors correspond to about 30-35% of the Facebook users in that area saying they like college football (of any school). This data is built on passively captured data, too. Although it is not perfect (and no sample ever is), the limitations and manipulation issues inherent in active data capture are significantly reduced. Notice that college football fever has a hold over Sweet Alabama and the fever has apparently spread across the Red River to Oklahoma and north to the Cornhuskers of Nebraska, too. The Oregon Duck fans are rabid with it, showing the biggest interest in college football in all of the West. The hotspots jump out and explain where college football is most popular. Amazing. Notice the big void around Chicago. Amazing, too!
College football is just not as popular in the Chicago area as it is in the South. Is it cultural? Might the highly international and cosmopolitan cities of San Francisco, Chicago, New York, and even Miami have a lower interest in college football in common? Probably. The people in these cities are from all over the world, and college football is new to many of them. This is an opportunity as much as a perceived weakness. So, this last graph puts things into perspective. Northwestern might find changing the tune in the Chicago area a bit easier, as the task is not to change the allegiance of a Michigan fan, but rather to make a non-fan, maybe an international student, love the Wildcats of Northwestern. Beating Michigan involves more than winning football games. It also involves an off-the-field competition for fan allegiance.
The importance of passive data capture over active data capture and the use of Data Science to create value from Big Data are among the ideas that I explore in great detail in my recent book, From Big Data to Big Profits: Success with Data and Analytics (Oxford, 2015).
About Russell Walker, Ph.D.
Professor Russell Walker helps companies develop strategies to manage risk and harness value through analytics and Big Data. He is Clinical Associate Professor of Managerial Economics and Decision Sciences at the Kellogg School of Management of Northwestern University.
His most recent book, From Big Data to Big Profits: Success with Data and Analytics (Oxford University Press, 2015), explores how firms can best monetize Big Data. He is also the author of the text Winning with Risk Management (World Scientific Publishing, 2013), which examines the principles and practice of risk management through business case studies.
He greatly enjoys the college football season and roots for the Florida teams (Florida being his home state), Northwestern, Cornell, and UVA. He looks forward to Northwestern beating Michigan this weekend.