Coal to Diamonds

Johannes Lenhard
Abstract In this commentary on Napoletani et al. (Foundations of Science 16:1–20, 2011), we first put agnostic science into the wider historical context of the philosophy of mathematics. Second, we discuss the parallel to Tukey's "exploratory data analysis". Third, we argue that what is new is the mutually interdependent dynamics of data (on which Napoletani et al. focus) and of computational modeling—which brings science closer to engineering and vice versa.

Keywords Computational modeling · Exploratory data analysis · Mathematization
In their paper on "agnostic science" (Napoletani et al. 2011), Napoletani, Panza, and Struppa (NPS) make the case for the "microarray paradigm", a new methodological paradigm of data analysis. A microarray, also known as a DNA array, is a chip carrying thousands of manufactured short strands of DNA. When a sample is tested (washed over the chip), constituents of the sample bind to the strands, depending both on the composition of the sample and on the spatial distribution of the strands in the array. The resulting pattern—made visible by artificially coloring the sample—thus mirrors, in some way, what the sample consists of. However, the exact shape of the patterns depends on complicated conditions that are hard to control, like the exact location and constitution of the single DNA strands. In sum, the resulting patterns contain a wealth of information, but at the same time they are noisy to a high degree. Because of the sheer bulk of data, even a high level of noise leaves intact the chance to detect the signal, i.e. to extract relevant information about the sample. Efron (2005), for example, sees the 21st century as marked by data giganticism and takes microarrays as paradigmatic. In this regard, DNA arrays are a typical example of cases in which new high-throughput devices deliver great amounts of data. NPS write: "We argue that the modus operandi of data analysis is implicitly based on the belief that if we have collected enough and sufficiently diverse data, we will be able to answer any relevant question concerning the phenomenon itself."
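The statistical point, that a sheer bulk of data can compensate for a high level of noise, can be made concrete with a small simulation. The following Python sketch is purely illustrative (all numbers are invented; no real microarray pipeline is implied): most probes carry no signal, a few carry a signal weaker than the per-measurement noise, and a plain z-score ranking, which knows nothing about why a probe responds, recovers most of them.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

n_probes = 10_000   # spots on the chip
n_signal = 100      # probes that truly respond to the sample
n_repeats = 64      # repeated measurements per probe
noise_sd = 2.0      # per-measurement noise, twice the signal strength

# True effect: most probes carry no signal, a few carry a weak one.
effect = np.zeros(n_probes)
effect[:n_signal] = 1.0

# Every single measurement is dominated by noise ...
data = effect[:, None] + rng.normal(0.0, noise_sd, size=(n_probes, n_repeats))

# ... yet an "agnostic" z-score ranking, using no model of why a probe
# responds, recovers most of the signal, simply because there is so
# much data per probe.
z = data.mean(axis=1) * np.sqrt(n_repeats) / data.std(axis=1, ddof=1)
top = np.argsort(z)[-n_signal:]

hits = np.sum(top < n_signal)
print(f"recovered {hits} of {n_signal} true signal probes")
```

Typically the large majority of the hundred true signal probes end up in the top ranks, although any individual measurement is far too noisy to be informative on its own.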
This viewpoint obviously entails a kind of data-driven optimism. NPS combine it with what they call "agnosticism". Their agnosticism refers to a characteristic of the mathematical methods: they work on the resemblance of patterns but do not involve theoretical hypotheses or structural assumptions about how the patterns emerge. In this respect, the mathematical techniques of data analysis are (indeed, I think) "agnostic". On a very general level, NPS's data-driven optimism reminds one of Bacon's modest empirical program. At the same time—and this is what makes their viewpoint interesting—they talk about mathematization, a process often connected to the groundbreaking work of Galileo and epitomized by Newton. There, mathematization serves to hypostasize theoretical structures of phenomena, structures that can be treated formally to make predictions, but that also reveal the deeper structure of the phenomena themselves. Mathematized science in this sense is, if a pun is allowed, a gnostic science. This tension is encapsulated in the very term "agnostic science", a concoction of Greek and Latin that says something like "non-knowing knowledge".

Staying for one more moment on this general level, I would like to remark that Kant's epistemology is about how objective knowledge is possible without any isomorphic relation to the world ("things in themselves"), resting instead on human constructive activity. Mathematics plays a paradigmatic role for Kant (who was deeply impressed by Newton's success), and a variety of philosophers and scientists, from Cassirer to Poincaré or Hilbert, have argued for Kantian viewpoints in this sense. I do not want to zoom into these epistemological issues, but rather to indicate the noble heritage of the seemingly paradoxical notion of agnostic science. In my view, this makes the paper excellently suited to "Foundations of Science". Furthermore, I agree with all my heart that issues of mathematization reach into the core of science. While close acquaintance with the mathematical content indeed matters, the philosophical topic of mathematization is not an esoteric affair of interest only to hyper-specialized philosophers of mathematics. The interdisciplinary team of authors delivers a nice example of a philosophically significant, mathematically well-informed, and readable paper.

In the following, I concentrate on two points: first, statistics and the older (mid-20th-century) program of data analysis; second, the role of modeling, especially computational modeling, in the ongoing story of agnostic science.

In statistics, NPS's claims are strikingly parallel to Tukey's famous program of "exploratory data analysis" (EDA), which they do not touch upon. Tukey elaborated this program in explicit opposition to the post-World War II mainstream of confirmatory statistics, as he called the approach of Neyman–Pearson and, in part, Fisher (cf. Tukey 1962, 1977). The data should speak for themselves, not be viewed through the glasses of theoretical hypotheses that they confirm or disconfirm. Tukey can thus be seen as a strong proponent of agnostic science in the sense of NPS: mathematization acts in the service of deciphering information, largely independently of theoretical assumptions about the data and the phenomena under investigation.
The conception of statistics elaborated by Fisher, by contrast, proposed a goal-directed data reduction to highlight the relevant information—which stands in obvious tension to the data-driven optimism, as NPS describe it, of obtaining this information precisely by not reducing the data. So which of these conflicting philosophies of statistics has got it right? I think an appropriate reply should maintain the complementary nature of the problem. Historically, Tukey did not take an extreme position, but rather mediated his stance with some Fisherian insights (cf. Lenhard 2006), thus aiming for reduction without distortion. For Tukey's EDA as for NPS's agnostic data analysis, mathematics serves as a kind of instrumental answer to the complexity and massiveness on the data production side. Tukey was a prolific inventor of mathematical techniques in the service of EDA (like stem-and-leaf displays and many more) that address human experience and exploratory activity rather than the theoretical structure of the phenomena themselves; the sketch below recalls what such a tool looks like.
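A Tukey-style stem-and-leaf display can be sketched in a few lines of Python; the data here are hypothetical and serve only as illustration:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Print a Tukey-style stem-and-leaf display for small integers."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)   # tens digit = stem, ones digit = leaf
    for stem in sorted(stems):
        leaves = "".join(str(leaf) for leaf in stems[stem])
        print(f"{stem:3d} | {leaves}")

# Hypothetical scores, just for illustration
stem_and_leaf([23, 29, 31, 34, 34, 38, 41, 42, 47, 47, 47, 53, 58, 62])
```

The display keeps every data point visible while exposing the shape of the distribution at a glance: a tool aimed at exploratory activity, not at theoretical structure.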
What, then, makes the microarray paradigm a new paradigm for mathematical statistics? I see a twofold dynamics at work: a boost on the data side is matched by a boost on the computational side. In my view, new data-processing and computational technologies together revitalize an old promise: sufficiently rich data plus computational capabilities will yield any relevant information one seeks. Or, to state NPS's thesis in more metaphorical words: coal can be converted into diamonds. An intelligent comment about the difference between computational modeling and traditional mathematics has it that working computationally "is like digging for coal instead of diamonds". Obviously, coal serves other purposes than diamonds, so this is a different kind of endeavor. However, artificial diamonds made from carbon can serve certain technical needs!

This brings me to my second point, computational modeling. The dynamics discussed here depends not only on the data side but also on the side of computational modeling; the two are mutually interdependent. True, the data side is important, but this does not diminish the importance of the modeling side. High-throughput devices plus computational models together create the new twist. Statistical techniques for measuring the patterns produced by microarrays essentially rely on the capabilities of computers. There are many examples of this sort, like Markov chain Monte Carlo methods or the bootstrap; the latter illustrates the point particularly well, as the sketch below indicates. In short, via computational modeling, mathematization conquers new landscapes.
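The bootstrap makes the reliance on raw computing power tangible. The following Python sketch is again merely illustrative (the data are simulated, and the quantity of interest, the median, is chosen arbitrarily): instead of assuming a distributional model of the phenomenon, one resamples the data themselves, with replacement, many thousands of times.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# A hypothetical, skewed sample; the quantity of interest is its median.
sample = rng.lognormal(mean=0.0, sigma=1.0, size=50)

# The bootstrap: resample the data themselves, with replacement.
# No theoretical model of the phenomenon enters; the computer simply
# re-runs the estimation thousands of times.
n_boot = 10_000
medians = np.array([
    np.median(rng.choice(sample, size=sample.size, replace=True))
    for _ in range(n_boot)
])

lo, hi = np.percentile(medians, [2.5, 97.5])
print(f"median = {np.median(sample):.3f}, "
      f"95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")
```

Before the computer, such a procedure would have been unthinkable as a routine tool; with it, the method is agnostic in precisely NPS's sense.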
In the literature, there exist superior examples of how to investigate probability and statistics in a philosophically and historically revealing way (cf. the works of Hacking, Krüger, Daston, and others; see for instance Krüger et al. 1987, 1989). In my view, computational modeling and the developments since the early 1990s present an intriguing case that calls for a treatment of similar sophistication. I see the NPS paper as taking part in this endeavor.

What do computational methods achieve? They seem to foster an instrumental viewpoint: "This methodology used in data analysis suggests the possibility of forecasting and analyzing without a structured and general understanding." NPS take a careful stance in later passages of their text. They suggest that after predictive success has been achieved in an "agnostic" manner, later steps, especially the incorporation into the body of scientific knowledge, may require a more structure-based understanding. Still, NPS seem to carry with them a kind of bad conscience: according to them, structural understanding remains the proper aim of science, even if some mathematical techniques deliver a surrogate. Somehow it does not feel like the real thing. I take this observation as a sign of the tension between computational modeling and more traditional mathematical modeling: maybe a deeply enculturated desire for diamonds instead of coal. I see some indications that a change of aesthetic judgement is on its way, one that will lead (under the pressure of applications) to a higher appreciation of artificial diamonds.

Galileo was a pivotal figure for mathematization in science. His verdict that the book of nature is written in mathematical language is widely known. However, Galileo was also an engineer, and his two proposed "new sciences" were about ballistics and materials, i.e. two fields in which (even today) mathematization enables effective prediction and intervention rather than structural understanding. Both fields, by the way, have been deeply influenced by computational models. Forecasting without structured understanding has a counterpart on the side of models, namely their great plasticity. NPS's paper contains an example: boosting and iteration in artificial neural networks. Such models operate on patterns while using a generic architecture. It is the high plasticity of the model that calls for exploration and iteration to adapt the dynamics to the desired performance. The wealth of available data and the plasticity of models, i.e. the degree to which the model performance can be adapted through use of the data, mirror each other. A toy example below illustrates this plasticity.
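To see what such plasticity means in practice, consider the following Python sketch (it is not the example NPS discuss, and it involves no boosting; a minimal hand-rolled network has to stand in for the real thing). One and the same generic architecture, trained by plain iterative gradient descent, adapts to whichever pattern the data present:

```python
import numpy as np

def train_generic_net(X, y, hidden=8, steps=5000, lr=1.0, seed=0):
    """Fit a tiny one-hidden-layer network by iterative gradient descent.
    The architecture is generic; the data decide what the net ends up doing."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])        # bias via constant input
    W1 = rng.normal(0.0, 1.0, (Xb.shape[1], hidden))
    W2 = rng.normal(0.0, 1.0, (hidden + 1, 1))
    for _ in range(steps):
        h = np.hstack([np.tanh(Xb @ W1), np.ones((len(X), 1))])
        p = 1.0 / (1.0 + np.exp(-(h @ W2)))          # sigmoid output
        d_out = p - y                                # cross-entropy gradient
        grad_W2 = h.T @ d_out / len(X)
        grad_W1 = Xb.T @ ((d_out @ W2[:-1].T) * (1.0 - h[:, :-1] ** 2)) / len(X)
        W2 -= lr * grad_W2
        W1 -= lr * grad_W1

    def predict(X_new):
        Xn = np.hstack([X_new, np.ones((len(X_new), 1))])
        hn = np.hstack([np.tanh(Xn @ W1), np.ones((len(X_new), 1))])
        return (1.0 / (1.0 + np.exp(-(hn @ W2))) > 0.5).astype(int)

    return predict

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
xor = np.array([[0.], [1.], [1.], [0.]])
nand = np.array([[1.], [1.], [1.], [0.]])

# One and the same generic architecture, molded by two different data sets:
print(train_generic_net(X, xor)(X).ravel())    # typically: [0 1 1 0]
print(train_generic_net(X, nand)(X).ravel())   # typically: [1 1 1 0]
```

Nothing in the architecture knows about XOR or NAND; the performance is adapted entirely through iteration on the data, which is the plasticity at issue.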
When Napoletani et al. state that "mathematics provides powerful techniques and general ideas which generate new computational tools", I would like to agree, but also to add that the converse holds as well. This mutual relationship is what I aim to bring out in my own work; see, e.g., Johnson and Lenhard (2011) on the predictive mode of modeling. Computational technologies and modeling strategies bring mathematization to new fields, especially complex phenomena, but they also limit the prospects of traditional ideals of science, like "general understanding". This puts science closer to engineering—and engineering closer to science.
References

Efron, B. (2005). Bayesians, frequentists, and scientists (presidential address). Journal of the American Statistical Association, 100(469), 1–5.

Johnson, A., & Lenhard, J. (2011). Toward a new culture of prediction: Computational modeling in the era of desktop computing. In A. Nordmann, H. Radder, & G. Schiemann (Eds.), Science transformed? Debating claims of an epochal break (pp. 189–199). Pittsburgh, PA: University of Pittsburgh Press.

Krüger, L., Daston, L., & Heidelberger, M. (Eds.) (1987). The probabilistic revolution, Vol. 1: Ideas in history. Cambridge, MA: MIT Press.

Krüger, L., Gigerenzer, G., & Morgan, M. (Eds.) (1989). The probabilistic revolution, Vol. 2: Ideas in the sciences. Cambridge, MA: MIT Press.

Lenhard, J. (2006). Models and statistical inference: The controversy between Fisher and Neyman–Pearson. British Journal for the Philosophy of Science, 57, 69–91.

Napoletani, D., Panza, M., & Struppa, D. C. (2011). Agnostic science: Towards a philosophy of data analysis. Foundations of Science, 16, 1–20.

Tukey, J. W. (1962). The future of data analysis. Annals of Mathematical Statistics, 33, 1–67.

Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Author Biography

Johannes Lenhard received his PhD in mathematics from the University of Frankfurt and now teaches philosophy at Bielefeld University. His research centers on the philosophy of science and of mathematics. His recent work focuses on simulation and modeling, and he is part of a new project on "Mathematics as a Tool" that started in the fall of 2012 at Bielefeld's Center for Interdisciplinary Research.