Ontology-Based Annotation of Biomedical Time Series Data

Raimond Winslow

Institute for Computational Medicine, The Johns Hopkins University

Almost one million Americans die of cardiovascular (CV) disease each year. More than 70 million Americans live with some form of heart disease. Understanding the cause and treatment of CV disease will require a truly integrative approach, spanning the molecular to the systems level. Clinical studies collecting multi-scale data (e.g., gene sequence, mRNA expression, protein expression, multi-modal imaging, and clinical data) from subjects in large cohorts are already underway.

Each of these studies faces a common challenge: how to integrate and explore these data to identify the phenotypes of specific CV diseases and to discover features that predict disease risk, response to treatment, and outcome. The CardioVascular Research Grid (CVRG) project was recently established to develop and deploy resources for representing, federating, sharing, and analyzing multi-scale CV data. This project uses emerging standards for describing diverse types of biomedical data. Remarkably, however, there is currently no comprehensive ontology or data model for describing the single most commonly collected biomedical time-series data type in modern health care, the electrocardiogram (ECG).

In this Driving Biological Project with the National Center for Biomedical Ontology (NCBO), we will use NCBO tools for creating and managing biomedical ontologies to develop an ontology that describes ECG data collection protocols, features of time-evolving ECG waveforms, ECG analysis algorithms, and data derived from analysis of the ECG. NCBO tools will also be used to integrate this ontology into an ECG data management and analysis portal being developed as part of the CVRG project. This work is important because every clinical study of CV disease collects ECG data in conjunction with one or more other data types. The ability to annotate and share ECG data will make it possible to assess data quality, reproduce study results, and integrate data across multiple studies. The ability to unambiguously label and describe variables derived from the ECG will make it possible to apply machine learning algorithms to discover features, in these and other data, that support diagnosis of heart disease, prediction of risk for sudden cardiac death, and assessment of patients' suitability for implantable cardioverter defibrillator placement.
