From Ancient Philosophy to Drug Safety at GSK

By Gary H. Merrill, Semantic Technologies Group, GSK

I would like to take this opportunity to describe the type of work that the Semantic Technologies Group in Statistical and Quantitative Sciences has been involved in over the past five or so years. Since most SQS and DDS members are unfamiliar with this area, this letter will be a bit longer than is customary. We start with a brief history lesson.
 

In the sixth century BC, Thales (THAY-leez) of Miletus hypothesized that everything was fundamentally composed of water. It can be argued that Thales thereby created the discipline of ontology, a sub-area of metaphysics in philosophy that deals with what exists and how whatever exists is related to other things that exist. This, of course, requires a lot of deep thought. Thales is also reported to have brought geometry (in both its theoretical and practical aspects) to ancient Greece, and he was something of a scientist in addition to being a deep-thinking philosopher. In his quest to determine the ultimate substance from which all others were composed, he also studied olive oil. (Exactly why he didn't decide that everything was made of olive oil, I don't know.) In the process he learned much about how olives grow, and he began to successfully predict weather cycles and their effects on olive crops. One year, after several years of drought, he predicted a year of abundant rain, bought options on all the olive presses in the area, and made a killing when his prediction came true. Thus Thales also seems to have been an innovator in predictive modeling. Of course, he was quite wrong about the whole water thing; but that's the nature of empirical science. It was a good try. And we can see that deep thought about things is often useful in science.

Fast-forward to the early 1700s and the work of Gottfried Wilhelm Leibniz, a professional diplomat who was also a full-time philosopher and mathematician (he invented differential and integral calculus independently of Newton).  In talking about his own work in ontology, Leibniz remarks that
 
"To these two kinds of arrangement [synthetic and analytic] we must add a third. It is classification by terms, and really all it produces is a kind of Inventory. The latter could be systematic, with the terms being ordered according to certain categories shared by all peoples, or it could have an alphabetical order within the accepted language of the learned world. ... And there is even more reason why these inventories should be more useful in the other sciences, where the art of reasoning has less power, and they are utterly necessary in medicine above all.  [Emphasis added.]"
 
So Leibniz viewed ontology as the creation of a system or systems of categories to be used as the basis of scientific reasoning, and he observed (over 300 years ago!) how critical such systems are in the domain of medicine.
 
Fast-forward again to the late 20th century when – in the age of high-speed digital computers and sophisticated methods in statistics, computer science, artificial intelligence, and machine learning – scientists begin to see the critical importance of creating formal ontologies in science that can be used to make sense of the mountains of data that confront them, and that can be employed to enhance data analysis, knowledge discovery, information retrieval, and inferencing. They begin to create such ontologies and to employ them in their work; and so we have the Gene Ontology, the Disease Ontology, the Open Biomedical Ontologies consortium, the National Center for Ontological Research (NCOR), and the set of over 100 ontologies, thesauri, or coding schemes incorporated into the Unified Medical Language System (UMLS), which was created and is maintained by the National Library of Medicine, part of the U.S. National Institutes of Health.
 
Over the centuries, the field of ontology has moved, at least in part, away from esoteric philosophy and in a more formal and scientific direction; it is now largely regarded as an interdisciplinary domain involving philosophy, formal logic and semantics, computer science, and linguistics. It is, in my experience and opinion, the very basis for what is referred to as "information science" or "informatics", and there are now well over one hundred departments in universities in the U.S. alone that offer degrees in informatics. As an example, the School of Informatics at Indiana University was founded by two philosophers, and significant schools of informatics are maintained by such well-known institutions as the Mayo Clinic, Columbia University, and Stanford University. Closer to home, near RTP, Duke University has recently created the Duke Ontology Group within its biostatistics department.
 
So how has work in the area of ontology had an impact at GSK and on GSK's products? And how is work in ontology (and its use in biomedicine) being pursued at GSK?
 
A partial answer to these questions can be found by looking at some of the papers and software that have been made available on our Biometrics.com site. Some of these pertain to foundational work in creating or analyzing formal biomedical ontologies (such as understanding the complexities of the UMLS, or translating one formal ontology to another for use in drug safety analysis). Some pertain to our CodeSlinger software application, which has recently been made publicly available for use by academic, industry, and government researchers. Other papers and presentations have come out of our collaboration with the Logic and Cognitive Science Initiative (LACSI) at N.C. State University. And still others document work that was done as part of the SafetyWorks pharmacovigilance project here at GSK. Some work that does not explicitly appear among these papers pertains to contributions made to the use of the DEX (Dictionary EXchange) capabilities as part of the data stewardship initiative within DDS.
 
Most of this work has come out of our satellite site on the Centennial Campus of N.C. State, and from collaboration with other DDS organizations such as Epidemiology and Decision and Quantitative Sciences. At least one Ph.D. dissertation by a GSK-supported student (see "Domain Enhanced Analysis of Microarray Data Using GO Annotations") has made use of ontological techniques investigated and developed at that site. Finally, a significant contribution of GSK to the Observational Medical Outcomes Partnership involves the creation and analysis of "mappings" from one ontology to another, and their use in pharmacovigilance analysis.
 
It is difficult to see precisely what the future holds for all of this. But it is clear – particularly in the context of anticipated advances in the use of electronic health records – that there is still much work to do in extending the original ideas of Thales into 21st-century biomedical science.