Data mining: the theory of everything?
Chris Anderson, editor-in-chief of Wired, is generating quite a bit of buzz with his article, "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete." The controversial article suggests that in the "Age of the Petabyte," scientific theory is becoming outdated. He cites Google's search engine as the example par excellence in support of his claim that "Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all." Anderson suggests that mining gargantuan amounts of data will produce as many new discoveries and insights as years of scientific research.
There is no denying the importance of new data technology, but Anderson fails to recognize that data mining cannot replace science. Data collection is only one step of the scientific method: observation. He points to the work of biologist J. Craig Venter, who uses supercomputers to sequence entire ecosystems. In doing so, Venter has discovered thousands of previously unknown species of bacteria and other life-forms. But what is the point of knowing these species are out there (we're already aware of that) if we don't know anything about them? As one blog post points out, with Venter's data we can only make a few guesses about the properties of the organisms based on who their relatives are -- an activity that requires a little scientific theory called evolution.
Data mining can really only point us toward new discoveries by showing us relationships between data points; it can't generate those discoveries alone. Anderson quickly throws out every theory of human behavior: "Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves." Call me old-fashioned (Anderson probably would), but to me, the "what" doesn't really matter without the "why." Stripped of its context, a number is just a factoid, a small puzzle piece without the larger picture.
Without science, data is no better than babel. While data can lead to new levels of understanding, Anderson's argument misses the point of science: to intelligently understand the natural world. Anderson may be too busy praising the Google gods to take note of the possibility of the Semantic Web, where the goal is more than crunching data -- it's understanding information. Although data mining may change the rules of the science game, it's definitely not the end of theory.
As we transition to the Age of the Petabyte, I don't see new technology leaving scientific theory in the dust. Rather, theory will be alive and kicking, as technology and science continue to evolve side by side. And with the unwavering certainty of the clock as it ticks and tocks on, new rules will become old ones, and one day even data mining will be replaced by a shiny new method of generating information.