Semantics and Services Enabled Problem Solving Environment for T. cruzi

Amit Sheth

Kno.e.sis Center, Wright State University

University of Georgia (CTEGD, LSDIS)

T. cruzi is a protozoan parasite and a relative of other human pathogens that cause African sleeping sickness and leishmaniasis. Approximately 18 million people in Latin America are infected with this parasite, and as many as 40% of them are predicted to eventually suffer from Chagas disease, which is the leading cause of heart disease and sudden death in middle-aged adults in the region.

The use of industrial-scale experimental methods has led to an enormous increase in the size of available datasets in the life sciences. For example, annotated genomes and comparative analyses of two T. cruzi-related kinetoplastids, Trypanosoma brucei and Leishmania major, and a whole-organism (all life-cycle stages) proteome analysis of T. cruzi have been published. These dynamic datasets, which address different but logically related aspects, are distributed over multiple databases that undergo frequent additions and curation. In the course of their research, biologists often need to query multiple data sources simultaneously; these sources pertain to the same domain (e.g., T. cruzi) but hold distinct types of data (e.g., genomic, proteomic, immunologic). For example, the "diagnostic technique for identification of best antigens in T. cruzi" requires the use of microarray transcripts with associated provenance metadata, information from the biomedical literature, and invocation of services to query remote databases.

In the Semantics and Services Enabled Problem Solving Environment for T. cruzi project, we are creating a comprehensive infrastructure for the management, querying, analysis, and visualization of scientific data using the following approaches:

  1. Semantic provenance-enabled cyberinfrastructure for T. cruzi experimental data - Semantic provenance describes the lineage or history of data, modeled in a domain-specific provenance ontology. Provenance information is critical metadata that enables the verification of scientific results and the validation of experimental processes.
  2. Semantic text analysis approaches for extraction of knowledge from biomedical literature - Biomedical literature, for example PubMed, represents a vast and valuable resource for life sciences research. Extracting relevant knowledge from biomedical text and representing it in Semantic Web standard formats such as RDF is an important research issue that is being addressed in this project.
  3. A semantic services-based smart mashups environment for T. cruzi - The Web 2.0 initiative has enabled users to "mash" data from multiple sources into useful applications. These mashups are becoming increasingly popular with life science researchers, but important research issues in this field have yet to be addressed, including the use of semantic annotations with Web APIs.
  4. A query interface for complex query formulation and execution, underpinned by ontologies and services.
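
As a rough illustration of the first approach, the sketch below shows how data lineage could be recorded as subject-predicate-object statements and traced back to its sources. It is only a minimal sketch and not the project's implementation: the entity names, the predicates (derivedFrom, generatedBy, hasLifeCycleStage), and the example.org namespace are all hypothetical, and a real system would draw its terms from a domain-specific provenance ontology.

```python
# Hypothetical provenance statements as (subject, predicate, object) triples.
# Every identifier here is illustrative and not taken from the project.
TRIPLES = [
    ("antigen_ranking_1", "derivedFrom", "microarray_run_7"),
    ("microarray_run_7", "derivedFrom", "tcruzi_sample_3"),
    ("microarray_run_7", "generatedBy", "hybridization_protocol_A"),
    ("tcruzi_sample_3", "hasLifeCycleStage", "trypomastigote"),
]


def lineage(entity, triples):
    """Return all upstream entities reachable via 'derivedFrom', nearest first."""
    ancestors = []
    frontier = [entity]
    while frontier:
        current = frontier.pop(0)  # breadth-first walk up the derivation chain
        for subject, predicate, obj in triples:
            if subject == current and predicate == "derivedFrom" and obj not in ancestors:
                ancestors.append(obj)
                frontier.append(obj)
    return ancestors


def to_ntriples(triples, base="http://example.org/tcruzi/"):
    """Serialize the triples as N-Triples lines under an assumed base namespace."""
    return "\n".join(
        f"<{base}{s}> <{base}{p}> <{base}{o}> ." for s, p, o in triples
    )


if __name__ == "__main__":
    print(lineage("antigen_ranking_1", TRIPLES))
    print(to_ntriples(TRIPLES))
```

Tracing the lineage of a result in this way is what allows a reader to check which experiment run and which sample a reported finding ultimately depends on.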

In this talk, Dr. Amit Sheth (PI) will discuss the details of each of the four approaches, the preliminary work, and the objectives for the first year of this project. You can find more information on the project web page or on NCBO's Kno.e.sis project page.