Semantic cross-domain integration: The intersection of research, public, and clinical data; creating applicable knowledge for decision support in patient-centric healthcare



While cross-domain data integration is acknowledged to play a key role in translational research, it has remained challenging in conventional settings because of the time and cost of initial integration and the lack of flexibility and extensibility as resources and needs evolve. This is where semantic methods excel, not only because they support dynamic needs, but also because a network-based data model is better suited to handling the complexity of biological systems.

Biomarker qualification and validation with functional insights on a systems scale is demanding. Unifying public resources with internal datasets and weighting markers properly are non-trivial tasks. As a result, the use of biomarkers from multiple modalities (-OMICs, imaging, clinical endpoints), while increasing in the scientific community, has for the most part lagged behind its promise for patient screening and decision support. This is partly because of the difficulty of meaningful semantic integration of heterogeneous experimental and public data, and partly because of the complexity of understanding the biological functions involved. Both of these challenges need to be addressed to make a resulting knowledgebase applicable for decision support in clinics.

Building on Resource Description Framework (RDF) standards, this presentation shows how readily semantic integration methods overcome these challenges. It will demonstrate the integration of heterogeneous sources, taxonomies, ontologies and non-standardized vocabularies, as well as the meaningful integration of multiple -OMICs data sets with clinical observations. In this case, the Sentient™ Suite will be applied to capture semantic patterns and create predictive network models, using virtually any combination of internal experimental data and/or external published information. These patterns rely on SPARQL query technology, which allows complex searches across multiple information sets to be built visually.

Based on three recent customer examples, Sentient has applied semantic standards to provide the framework for:

•    Assessment of treatment effectiveness for combinatorial prostate cancer therapies;
•    Pre-symptomatic detection, scoring and stratification of transplant patients at risk of heart or kidney failure; and
•    Impact of inflammatory responses in high-risk plaque rupture.

Semantic linking of experimental correlation networks with curated public domain knowledge networks (via direct queries or from SPARQL endpoints, such as LODD) helps researchers gain a better understanding of the mechanistic aspects of biomarkers at a functional level. In this demonstration, application ontologies derived from experimental data and analytical results will be merged with formal public ontologies (such as those from NCBO and OBO). Resulting hypotheses can be captured in arrays of rich SPARQL queries representing biological signatures. This session will conclude with a discussion of how these profiles can be stored in an Applied Semantic Knowledgebase (ASK) for further validation or for application to a specific research focus, and how this knowledge is applied to highly sensitive, specific and scored patient screening, providing decision support for life sciences and personalized patient-centric healthcare.
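As a rough illustration of the idea of merging an experimental network with a public knowledge network and expressing a signature as a graph pattern, the following minimal Python sketch may help. It does not use the Sentient Suite, RDF tooling, or a real SPARQL engine; the triples, the `ex:` identifiers, and the `match` and `query` helpers are all hypothetical and invented for this example. It only mimics how a SPARQL basic graph pattern binds variables across a merged graph.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Binding = Dict[str, str]        # variable name -> bound value

# Hypothetical experimental correlation network (made-up identifiers).
experimental: Set[Triple] = {
    ("ex:geneA", "ex:correlatesWith", "ex:rejection"),
    ("ex:geneB", "ex:correlatesWith", "ex:rejection"),
}

# Hypothetical curated public knowledge network (made-up identifiers).
public: Set[Triple] = {
    ("ex:geneA", "ex:participatesIn", "ex:inflammatoryResponse"),
    ("ex:geneC", "ex:participatesIn", "ex:inflammatoryResponse"),
}


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one pattern with one triple; terms starting with '?' are variables."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or reject if it is already bound differently.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(graph: Set[Triple], patterns: List[Triple]) -> List[Binding]:
    """Return every binding that satisfies all patterns at once."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


# Merging the two networks is a set union of their triples.
merged = experimental | public

# A "signature": markers that correlate with rejection AND have a known process.
signature = [
    ("?marker", "ex:correlatesWith", "ex:rejection"),
    ("?marker", "ex:participatesIn", "?process"),
]

results = query(merged, signature)
print(results)
```

In this toy data only `ex:geneA` satisfies both patterns, and the signature returns nothing when run against the experimental network alone, which is the point of linking the two sources before querying.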


1) R. Stanley, B. McManus, R. Ng, E. Gombocz, J. Eshleman, C. Rockey: Case Study: Applied Semantic Knowledgebase for Detection of Patients at Risk of Organ Failure through Immune Rejection, World Wide Web Consortium (W3C) Semantic Web Use Cases and Case Studies (2011).
2) E. Gombocz, R. Stanley, J. Eshleman: Computational R&D in Action: Integrating Correlation and Knowledge Networks for Treatment Response Modeling and Decision Support, Advanced Strategies for Computational Drug R&D (2010).


Dr. Erich Gombocz has more than 30 years of experience in life science research, laboratory automation and data management in scientific and distributed systems environments. He also has more than 30 years of programming experience in instrumentation control, user interfaces, database design, scientific analysis and on-line laboratory automation, and is a developer of innovative software algorithms and architectures.


Focusing on semantic data integration and knowledge management in life sciences, he founded IO Informatics in 2003 together with Bob Stanley to apply systems biology approaches to challenges in the area of pharmaceutical and clinical decision-making.


Dr. Gombocz has authored over 60 scientific publications and currently holds more than 40 biotechnology- and software-related US and international patents. He is an international expert in separation science and bioinformatics, a member of several professional organizations, and serves on the editorial boards of a number of scientific journals.