Making Sense of Unstructured Data in Medicine Using Ontologies




Changes in biomedical science, public policy, information technology, and electronic health record (EHR) adoption have recently converged to enable a transformation in the delivery, efficiency, and effectiveness of health care. While analyzing structured electronic records has proven useful in many contexts, the true richness and complexity of health records lies within the clinical notes, the free-text reports that doctors and nurses write in their daily practice, which make up roughly 80 percent of the record's content. We have developed a scalable annotation and analysis workflow that uses public biomedical ontologies and is based on the term recognition tools developed by the National Center for Biomedical Ontology (NCBO). This talk will discuss applying this workflow to 9.5 million clinical documents, drawn from the electronic health records of approximately one million adult patients in the STRIDE Clinical Data Warehouse, to identify statistically significant patterns of drug use and to conduct drug safety surveillance. For drug use, we validate the usage patterns learned from the data against FDA-approved indications as well as external sources of known off-label use, such as Medi-Span. For drug safety surveillance, we show that drug–disease co-occurrences and the temporal ordering of drug and disease mentions in clinical notes can be examined for statistical enrichment and used to detect potential adverse events.
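The abstract does not say which statistics the workflow uses, so the following is only a minimal sketch of one common way to test a drug and an event for enrichment. It assumes each patient's notes have already been reduced to dated concept mentions by a term recognizer (the hypothetical `Mentions` structure below). A patient counts as "exposed with event" only when the event is mentioned after the first drug mention, which reflects the temporal-ordering idea. The resulting 2x2 table is scored with an odds ratio and a one-sided Fisher exact test. The functions `contingency`, `odds_ratio`, and `fisher_greater` and the toy data are hypothetical illustrations, not the authors' implementation or results.

```python
from math import comb
from typing import Dict, List, Tuple

# Hypothetical input: patient id -> list of (day, concept) mentions that a
# term recognizer extracted from that patient's clinical notes.
Mentions = Dict[str, List[Tuple[int, str]]]


def contingency(patients: Mentions, drug: str, event: str) -> Tuple[int, int, int, int]:
    """Build a 2x2 table of drug exposure vs. a *subsequent* event mention.

    a: drug mentioned, event mentioned after the first drug mention
    b: drug mentioned, no event mention after it
    c: no drug mention, event mentioned
    d: neither mentioned
    """
    a = b = c = d = 0
    for mentions in patients.values():
        drug_days = [day for day, concept in mentions if concept == drug]
        event_days = [day for day, concept in mentions if concept == event]
        if drug_days:
            first_drug = min(drug_days)
            # Temporal ordering: only events that follow the drug count.
            if any(day > first_drug for day in event_days):
                a += 1
            else:
                b += 1
        elif event_days:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio, adding 0.5 to every cell if any cell is zero."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value, P(X >= a), testing for enrichment."""
    n = a + b + c + d
    row1, col1 = a + b, a + c  # exposed patients, patients with the event
    denom = comb(n, row1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    return sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    ) / denom


if __name__ == "__main__":
    # Toy, made-up data purely to exercise the functions.
    toy: Mentions = {
        "p1": [(1, "drugX"), (5, "eventY")],
        "p2": [(2, "drugX"), (9, "eventY")],
        "p3": [(3, "drugX")],
        "p4": [(4, "eventY")],
        "p5": [],
        "p6": [],
        "p7": [],
    }
    table = contingency(toy, "drugX", "eventY")
    print(table)                           # (2, 1, 1, 3)
    print(odds_ratio(*table))              # 6.0
    print(f"{fisher_greater(*table):.3f}")  # 0.371
```

At the scale described in the abstract, a real analysis would also need to correct for testing many drug-event pairs at once and adjust for confounders. This sketch does neither.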


Dr. Nigam H. Shah is an Assistant Professor of Medicine (Biomedical Informatics) at the Stanford School of Medicine. Dr. Shah's research focuses on developing applications of bio-ontologies, specifically novel approaches to annotate, index, integrate, and analyze the diverse types of information available in biomedicine. Dr. Shah holds an MBBS from Baroda Medical College, India, and a PhD from Penn State University, USA, and completed postdoctoral training at Stanford Medical School.