I had an exam in the Philosophy of Science this week, so I’m still somewhat in a philosophical mood. Science has of course interested me for a long time, but I had never really taken a good look at its foundations before. That should be obligatory for anyone who wants to be a scientist: if you expect to look at the world with critical eyes, one of the most obvious things to be critical about is the methodology, or framework, you are working within.
Anyway, that was not what I was going to write about. One of the main subjects of the exam was scientific knowledge: how it is accumulated and how it is linked, building up our interwoven web of knowledge. Some theories hold that all science consists of facts building on other facts, and so on, until we reach an axiom: something taken to be so self-evident that it needs no further justification.
My question here is: if this is the case, shouldn’t we be able to computerize our scientific knowledge? And in any case, are we doing enough to make sure that the web of scientific knowledge is as tightly interwoven as it could and should be?
The school of thought that has gone furthest in claiming that all scientific knowledge consists merely of branches of a single tree of knowledge is so-called logical positivism. One of its tenets is that all science ultimately rests on logical statements, i.e. any theory can in principle be expressed in the language of logic or mathematics.
The logical positivists dreamt of a unified science in which all of science would have a single, well-defined foundation. For example, a theory in chemistry can be reduced to physics, which can in turn be reduced to mathematics. Similarly, theories in biology or medicine could be reduced to chemistry, and hence to mathematics. Even the social sciences could be fitted into this picture: psychology is the biology and chemistry of the brain; sociology is a wider application of psychology; and the same could even be said of economics. Anything that does not fit into this scientific tree is not science.
There is something very intriguing about this idea. It has, however, proven hard to fit all of science into it. Even though chemistry and physics can for the most part be said to have firm roots in mathematics, taking the step from biology down to chemistry is still a long way off, let alone from psychology. Nevertheless, nobody in their right mind would write biology or psychology off as non-sciences. I think many scientists, at least within the natural sciences, would agree that these links exist even though they have not been thoroughly established.
If this is the case, we can certainly claim that scientific knowledge can be computerized. If every theory can be reduced to mathematics, computers surely have the home-court advantage. Such computerization could have dramatic consequences. A computer working with the body of our entire scientific knowledge, even in a limited field, should be able to calculate its way to theories that human scientists have not yet found. The mere fact that the computer would be working with all this material at once means that it could link together and find correlations between facts in otherwise unrelated studies. Human experts can obviously only read a tiny fraction of the information available in their field, but a computer should be able to work with ALL the information at once, making new connections on the fly. Nor would it be confined to a specific field; it could seek references across different branches of science, a practice that is all too uncommon in today’s specialized science but has throughout history proven to be the source of some of our biggest discoveries.
But back to reality. As said before, only a small fraction of science yet rests on a mathematical foundation, and to my knowledge nobody has really made an effort to express even those parts in a computable format. Let’s also keep in mind that nobody has yet proven that all of science really is linked in the aforementioned way.
Today’s science relies largely on the publication of scientific articles in journals and on citations from one article to another. This is an enormous amount of information. PubMed, an online database developed by the National Center for Biotechnology Information that covers only publications from medicine and, in part, biology journals, contains over 14 million article abstracts, adding on average more than 7,000 new abstracts per week!
It goes without saying that this is many orders of magnitude more material than anyone can keep up with. The New York Times has a very interesting piece [requires free registration] this week on how some scientists use text-mining tools to help them make sense of this information overflow, and even to help them make new connections by searching for patterns in unrelated articles. In an example dating back to 1988, a primitive search of this sort linked migraine with magnesium deficiency, a connection that would later be backed up by experiments.
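To make the idea concrete, here is a minimal sketch in Python of the kind of pattern search described above: two topics that never appear together in the literature can still be bridged by terms that co-occur with each of them separately. The five “abstracts” and the term list are invented for illustration; a real system would of course work over millions of PubMed abstracts rather than a handful of strings.

```python
# Toy sketch of bridging-term discovery in a literature database.
# The mini-corpus below is entirely made up for illustration.

abstracts = [
    "migraine patients show spreading cortical depression",
    "spreading cortical depression is inhibited by magnesium",
    "migraine attacks involve platelet aggregation",
    "magnesium levels modulate platelet aggregation",
    "serotonin regulates vascular tone in headache studies",
]

terms = ["migraine", "magnesium", "spreading cortical depression",
         "platelet aggregation", "serotonin"]

# Map each term to the set of abstracts that mention it.
occurs = {t: {i for i, a in enumerate(abstracts) if t in a} for t in terms}

def hidden_links(a_term, c_term):
    """Bridging terms that co-occur with both a_term and c_term,
    while a_term and c_term never co-occur directly."""
    if occurs[a_term] & occurs[c_term]:
        return []  # already directly linked in the literature
    return [b for b in terms
            if b not in (a_term, c_term)
            and occurs[b] & occurs[a_term]
            and occurs[b] & occurs[c_term]]

print(hidden_links("migraine", "magnesium"))
# → ['spreading cortical depression', 'platelet aggregation']
```

In this toy corpus, “migraine” and “magnesium” never appear in the same abstract, yet the search surfaces the two intermediate topics that connect them, which is exactly the shape of the 1988 migraine–magnesium finding.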
Text-mining tools like these don’t make any discoveries on their own, but they can be extremely helpful to human scientists, pointing out possibilities that would otherwise go unnoticed. These tools face big challenges, though. Scientific articles are written in different languages, and within the same language the vocabulary can vary between fields and even from one scientist to the next. Worst of all, access to articles, especially in digital format, is often restricted, as journals require subscriptions or other compensation for access to their material. Rightfully so, but it is frustrating to know that this is without doubt standing in the way of many groundbreaking scientific discoveries.
One could envision a vast database of all scientific articles ever published. The articles would be indexed in countless ways and could be rated based on things like the journal that published them, the reviewers, the authors, citations, and so on. Mining such a database would not only suggest many new possible correlations but also reveal unnoticed discrepancies in the already published material.
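As a toy illustration of rating articles by citations, here is a sketch of a simple iterative score in which a citation from a highly rated article counts for more than one from an obscure article, in the spirit of PageRank. The four article IDs and their citation links are entirely hypothetical.

```python
# Toy sketch of citation-based article rating (PageRank-style).
# The citation graph below is made up for illustration.

citations = {            # article -> list of articles it cites
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C", "B"],
}

articles = list(citations)
score = {a: 1.0 for a in articles}   # start with equal ratings
damping = 0.85

for _ in range(50):                  # iterate until the scores settle
    new = {}
    for a in articles:
        # Each citing article passes on a share of its own rating.
        incoming = sum(score[src] / len(citations[src])
                       for src in articles if a in citations[src])
        new[a] = (1 - damping) + damping * incoming
    score = new

ranked = sorted(articles, key=score.get, reverse=True)
print(ranked)  # "C" is cited most, and by well-rated articles, so it ranks first
```

The same idea scales to a real citation database, and the per-article weight could just as well fold in the journal, the reviewers, or the authors mentioned above.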
Taking this even further, it could be valuable to include rejected articles, and even non-scientific articles from magazines and newspapers, in such a database. With careful handling, these too no doubt hold important clues. As for the rejected articles, science often seems to forget that value lies not only in what is proven to be the case, but also in theories that turn out not to hold. Such a result may mean that a path need not be tried again (or at least not in the same manner), and even though a theory did not prove right, its experiments can still hold very valuable information.
I have long held the view that with the advent of useful tools for exchanging our ideas (read: language), humans in fact stopped evolving through the trial-and-error mechanisms of biological evolution and started evolving through our accumulated knowledge and ideas. As ideas can be spread and developed extremely rapidly, evolutionary steps that previously took several generations could, with the help of language, be taken in only a few moments of discussion or reasoning. This is in no way my idea. Richard Dawkins is probably its best-known advocate, but perhaps nobody has put it as elegantly as Karl Popper, who said that this ability “permits our hypotheses to die in our stead.”
If this is the case, shouldn’t we take better care of our accumulated knowledge, the very core of this rapid evolution? And then use our best tools to try to make the most of it?
Even if reality proves a little less philosophical, it is obvious that the wealth of information and knowledge is far too vast for scientists to work through without help from computers. We should therefore try to make as much of it as possible available to such tools, and find ways to help them “make sense of it”, so that they can help us advance science even further.