The Importance of Data Visualization and Automation in the Realm of Big Data
The renowned economist and Nobel Laureate Herbert Simon is known for saying, “A wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.” Indeed, our capacity to process information must have a physical limit, and we may be approaching it.
The way humans process data is, of course, very different from the way an algorithm or computer processes it. The human mind handles graphics and graphical information well, which makes graphics the ideal vehicle for conveying data. Graphics are powerful in the communication of data because we remember images well but fail to remember numbers well. When asked to remember seven numbers in order, more than half of test takers fail. George Miller described this limit on short-term memory in 1956, and it has come to be known as Miller’s Law. Graphics can be remembered, and recreated from memory, more easily than numbers. Graphics also communicate the scale of data and allow comparisons within it, which is most helpful when humans examine data to make a decision. The most valuable graphics are those that communicate many dimensions and relationships in the data. The statistician John Tukey famously said, “There is no data that can be displayed in a pie chart, that cannot be displayed BETTER in some other type of chart.” Graphics compress data and present it in a visual form that humans can easily recall. For businesses and people to process more data, data compression and data visualization are needed. The software package Tableau is a great example of a tool that presents complex relationships and multidimensional data in an easily understood way.
This figure (developed by the Northwestern MSiA team of Adrian Montero, Sari Nahmad, and Kathryn Wolf) shows how a data visualization tool such as Tableau can communicate multiple dimensions of data in one view. Presenting many dimensions of the data, along with their scale, conveys what matters in the data and leads to its correct interpretation. The visualization captures the behaviors and trends of Divvy, a popular bicycle-sharing program in Chicago, in just one graphic! The data is compressed, allowing our senses to take in as much of it as possible. Popular routes, stations, and seasonality are presented simultaneously, along with the geographic locations of the bicycle stations. The histogram at the upper right, made with Tableau, compares the behaviors of customers and subscribers. The map provides a geographical perspective on where the stations are and how they are positioned relative to one another. The most important features of this complex bicycle-sharing program can be communicated quickly and efficiently. For example, customers, who buy one-time rides, use the bicycles in Chicago’s tourist zones, whereas subscribers, who have purchased a bicycle-sharing plan and are expected to ride regularly, commute between major transportation and work centers. Of special pride to Chicagoans is the significant number of riders who brave the winter months, as the seasonality plot shows. Decisions about the management, use, and even expansion of the program can be informed by such a compression of multiple dimensions into one graphic. Compressing data into a valuable visual view is also necessary because human data consumption trails the rate of data creation. The Chicago bicycle-sharing program is also a great example of Big Data entering a new realm of our lives. The bicycles are checked out from and returned to stations that record the time of each checkout and return.
Because each bicycle is uniquely numbered, routes and customer patterns can also be measured. All of this usage data can be used to make better decisions about where to place bicycles in Chicago and how many to make available.
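The compression described above can be illustrated with a small sketch. It assumes trip records with hypothetical fields (bike ID, rider type, start station, end station, start time), not Divvy’s actual data schema, and the station names and trips are made-up illustrative data. The code collapses individual rides into counts by rider type and month, the kind of summary behind a seasonality plot, and finds the most frequent routes for each rider type.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Trip:
    # Hypothetical trip-record fields; not the actual Divvy data schema.
    bike_id: int
    user_type: str          # "Customer" (one-time ride) or "Subscriber"
    start_station: str
    end_station: str
    start_time: datetime


def rides_by_type_and_month(trips):
    """Compress individual trips into ride counts keyed by (user_type, month)."""
    return Counter((t.user_type, t.start_time.month) for t in trips)


def top_routes(trips, user_type, n=3):
    """Return the n most frequent (start, end) station pairs for one rider type."""
    routes = Counter(
        (t.start_station, t.end_station) for t in trips if t.user_type == user_type
    )
    return routes.most_common(n)


# Made-up example trips; station names are illustrative placeholders.
trips = [
    Trip(101, "Subscriber", "Transit Hub C", "Office District D", datetime(2015, 1, 12, 8, 5)),
    Trip(102, "Subscriber", "Transit Hub C", "Office District D", datetime(2015, 7, 3, 8, 10)),
    Trip(103, "Customer", "Tourist Zone A", "Tourist Zone B", datetime(2015, 7, 4, 14, 30)),
    Trip(101, "Customer", "Tourist Zone A", "Tourist Zone B", datetime(2015, 7, 5, 15, 0)),
]

if __name__ == "__main__":
    print(rides_by_type_and_month(trips))
    print(top_routes(trips, "Customer"))
    print(top_routes(trips, "Subscriber"))
```

However many trips go in, the output stays small: at most two rider types times twelve months of counts, plus a handful of routes per rider type. That small size is what makes such a summary suitable for a single chart.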
The above graphic relates the approximate growth rates of information creation, information storage, information processing, and human information consumption. Note that we consume far less information than we can create or process. This excess of data creation over consumption and processing suggests that some insights are inevitably lost. It will not be humanly possible to process and consume all the data generated; instead, algorithms and artificial intelligence approaches will be needed to mine Big Data for value.
Data visualization and data compression alone will not allow us to process all of the data that is available, or even all of the data worth processing. Because information creation and information processing are growing more rapidly than human consumption of information, we appear to have encountered a limit on human information processing. Much more of the processing in the future will therefore occur without direct human intervention or consumption; it will be done through automation by machines and electronic devices.
The implications of Big Data and the role of automation in its creation and processing are developed in greater detail in my recent book, From Big Data to Big Profits: Success with Data and Analytics. The book examines the evolving nature of Big Data and how businesses can leverage it to create new monetization opportunities. Using case studies on Apple, Netflix, Google, LinkedIn, Zillow, Amazon, and other leading-edge users of Big Data, the book also explores how digital platforms, including mobile apps and social networks, are changing customer interactions and expectations, as well as the way companies create and manage Big Data. Companies looking to develop a Big Data strategy will find great value in the SIGMA framework, which assesses companies for Big Data readiness and provides direction on the steps necessary to get the most from Big Data.
 Herbert A. Simon, “Designing Organizations for an Information-Rich World,” in Martin Greenberger, ed., Computers, Communication, and the Public Interest (Baltimore, MD: Johns Hopkins Press, 1971).
 George A. Miller, “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information,” Psychological Review 63 (1956): 81-97.
 J. J. O’Connor and E. F. Robertson, “John Wilder Tukey,” June 2004, http://www-history.mcs.st-andrews.ac.uk/Biographies/Tukey.html.