SO:Composite Terms

From NCBO Wiki
SO contains cross-product definitions (aka genus-differentia
definitions, aka intersection definitions) for many composite
terms. This document describes the methodology. Some familiarity with
the obo file format is assumed.
This document is aimed primarily at ontology editors and at the
technical/software/database people who consume the ontologies. It
isn't intended for the end users of ontologies; much of this will be
invisible to them.
Here is an example of a term defined using the pre-cross-product approach:
  id: SO:0000283
  name: engineered_foreign_transposable_element_gene
  is_a: SO:0000111 ! transposable_element_gene
  is_a: SO:0000281 ! engineered_foreign_gene
  is_a: SO:0000805 ! engineered_foreign_region
This is problematic. We have multiple is_a parents because there is no
consistent axis of classification. This leads to tangled DAGs and to
problems with ontology maintenance, visualisation and reasoning.
Note that the editor has to check manually for other possible is_a parents,
such as engineered_transposable_element_gene (ETEG). Furthermore,
if ETEG is added, the is_a parentage of engineered_foreign_transposable_element_gene
(EFTEG) must be changed. This is tedious, time-consuming and error-prone.
The problems continue further up the DAG:
  id: SO:0000281
  name: engineered_foreign_gene
  is_a: SO:0000280 ! engineered_gene
  is_a: SO:0000285 ! foreign_gene
  is_a: SO:0000804 ! engineered_region
If we were to examine the whole DAG, we would see a lot of redundancy
and no modularisation.
Here is an example (showing ''is_a'' links only): [figure not available]
=The cross-products solution=
The first aspect of the solution is '''modularity'''. We separate the
core feature types (such as gene and region) from the qualities
(properties, attributes) of those features. Examples of feature
qualities are "being engineered" and "being foreign". These live in a
separate part of the ontology, and trace their is_a parentage solely to
"feature_attribute", not to the core feature types.
We also introduce a new relation "has_quality", which obtains between
some kind of quality-bearing entity (such as a gene) and a quality.
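As a minimal sketch of this modular split, the Python below uses a hypothetical toy is_a graph (the term names follow the text, but the graph is illustrative, not SO itself). The helper names `roots` and `valid_has_quality` are made up for this example: the check accepts a has_quality link only when its target's is_a ancestry ends solely at feature_attribute and its bearer lies outside the attribute module.

```python
# Hypothetical toy is_a graph holding both modules (not the real SO).
IS_A = {
    "region": set(),
    "gene": {"region"},
    "feature_attribute": set(),
    "engineered": {"feature_attribute"},
    "foreign": {"feature_attribute"},
}


def roots(term: str) -> set:
    """Return the root terms reachable from `term` by following is_a links upwards."""
    parents = IS_A[term]
    if not parents:
        return {term}
    return set().union(*(roots(p) for p in parents))


def valid_has_quality(bearer: str, quality: str) -> bool:
    """has_quality should link a feature type to a term in the attribute module."""
    return roots(quality) == {"feature_attribute"} and "feature_attribute" not in roots(bearer)


print(valid_has_quality("gene", "engineered"))  # True
print(valid_has_quality("gene", "region"))      # False: region is not a quality
```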
Using these ingredients, we can provide genus-differentia definitions
of terms in a form that is computationally visible. In a definition of
this form, a term is defined using a broader category (the genus) and
a collection of characteristics that distinguish its instances from the
other instances of that category (the differentia).
Genus-differentia definitions are one of the core best practices in
the OBO Foundry (http://www.obofoundry.org). These definitions can be
written as "A <G> ''which'' <D>". For example, we can define an
engineered foreign transposable element gene as "A transposable
element gene ''which'' is engineered and is foreign". The genus is
"transposable element gene" and the differentia are "is engineered" and
"is foreign".
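The "A <G> which <D>" pattern can be sketched as a small data structure. In the Python below, the class `GenusDifferentia` and its `as_text` method are hypothetical names chosen for illustration. The sketch assumes that has_quality X reads as "is X" and uses term names rather than SO IDs, for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenusDifferentia:
    """A genus plus (relation, filler) differentia pairs, as in the obo intersection_of form."""
    genus: str
    differentia: tuple  # tuple of (relation, filler) pairs

    def as_text(self) -> str:
        """Render the definition in the "A <G> which <D>" pattern."""
        def readable(name: str) -> str:
            return name.replace("_", " ")

        clauses = []
        for relation, filler in self.differentia:
            if relation == "has_quality":
                clauses.append(f"is {readable(filler)}")  # has_quality X reads as "is X"
            else:
                clauses.append(f"{readable(relation)} {readable(filler)}")
        return f"A {readable(self.genus)} which " + " and ".join(clauses)


EFTEG = GenusDifferentia(
    "transposable_element_gene",
    (("has_quality", "engineered"), ("has_quality", "foreign")),
)
print(EFTEG.as_text())
# A transposable element gene which is engineered and is foreign
```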
We can also expose these definitions in a way that is computationally
visible. [add picture of editing in oboedit here].
==obo file representation==
The underlying representation in oboedit is as follows:
  id: SO:0000283
  name: engineered_foreign_transposable_element_gene
  intersection_of: SO:0000111 ! transposable_element_gene
  intersection_of: has_quality SO:0000783 ! engineered
  intersection_of: has_quality SO:0000784 ! foreign
The "intersection_of" lines list the necessary and sufficient
conditions for inclusion in a class (term). For this to be a G-D
definition, there should be exactly one intersection_of line without a
relation (the genus) and at least one line with a relation (the differentia).
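The rule above can be checked mechanically. The Python below is a minimal sketch: the function names are hypothetical, and it assumes simple stanzas where each intersection_of line holds either a bare ID (the genus) or a relation plus an ID (a differentium), optionally followed by a "! name" comment. Trailing {...} qualifiers are not handled.

```python
def split_intersections(stanza: str):
    """Separate intersection_of lines into genus ids and (relation, filler) differentia."""
    genera, differentia = [], []
    for line in stanza.splitlines():
        body = line.split("!", 1)[0].strip()  # drop the trailing "! name" comment
        if not body.startswith("intersection_of:"):
            continue
        fields = body[len("intersection_of:"):].split()
        if len(fields) == 1:
            genera.append(fields[0])                    # no relation: the genus
        elif len(fields) == 2:
            differentia.append((fields[0], fields[1]))  # relation + filler: a differentium
        else:
            raise ValueError(f"unexpected intersection_of line: {line!r}")
    return genera, differentia


def is_genus_differentia(stanza: str) -> bool:
    """True if there is exactly one genus line and at least one differentia line."""
    genera, differentia = split_intersections(stanza)
    return len(genera) == 1 and len(differentia) >= 1


STANZA = """\
id: SO:0000283
name: engineered_foreign_transposable_element_gene
intersection_of: SO:0000111 ! transposable_element_gene
intersection_of: has_quality SO:0000783 ! engineered
intersection_of: has_quality SO:0000784 ! foreign
"""
print(split_intersections(STANZA))
# (['SO:0000111'], [('has_quality', 'SO:0000783'), ('has_quality', 'SO:0000784')])
print(is_genus_differentia(STANZA))  # True
```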
Of course, most people will not be looking at obo files. Oboedit provides a plugin for editing these genus-differentia definitions.
Using these definitions, a computer can calculate where EFTEG should
be placed in a DAG (as long as similar definitions exist for the
other terms). The computer can also calculate that EFTEGs should be
returned in queries for ETEGs or EFRs ('''engineered_foreign_region'''s).
These calculations are typically done with a 'reasoner'; oboedit has a reasoner built in.
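To illustrate the kind of calculation involved, here is a deliberately simplified structural-subsumption sketch in Python. It is not the oboedit reasoner or Pellet. It assumes a hypothetical toy core hierarchy (transposable_element_gene is_a gene is_a region) and treats the qualities as unrelated to each other. A definition subsumes another if its genus is the same as, or an ancestor of, the other's genus, and its qualities are a subset of the other's. The most specific subsumers become the inferred is_a parents.

```python
from dataclasses import dataclass

# Hypothetical toy fragment: asserted is_a links between core feature types only.
CORE_IS_A = {
    "region": set(),
    "gene": {"region"},
    "transposable_element_gene": {"gene"},
}


@dataclass(frozen=True)
class Defn:
    genus: str
    qualities: frozenset  # has_quality fillers; assumed to be unrelated to each other


# Genus-differentia definitions for the composite terms discussed above.
DEFS = {
    "engineered_region": Defn("region", frozenset({"engineered"})),
    "engineered_foreign_region": Defn("region", frozenset({"engineered", "foreign"})),
    "engineered_gene": Defn("gene", frozenset({"engineered"})),
    "engineered_foreign_gene": Defn("gene", frozenset({"engineered", "foreign"})),
    "engineered_foreign_transposable_element_gene":
        Defn("transposable_element_gene", frozenset({"engineered", "foreign"})),
}


def core_ancestors(term: str) -> set:
    """Reflexive-transitive closure of the asserted core is_a links."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(CORE_IS_A.get(t, ()))
    return seen


def as_defn(term: str, defs: dict) -> Defn:
    # A core type behaves like a definition with a genus and no differentia.
    return defs.get(term, Defn(term, frozenset()))


def subsumes(general: Defn, specific: Defn) -> bool:
    """True if every instance of `specific` must also be an instance of `general`."""
    return (general.genus in core_ancestors(specific.genus)
            and general.qualities <= specific.qualities)


def inferred_parents(name: str, defs: dict) -> set:
    """Most specific named superclasses of `name` (structural subsumption only)."""
    me = defs[name]
    supers = {
        c for c in set(CORE_IS_A) | set(defs)
        if c != name and subsumes(as_defn(c, defs), me)
    }
    # Drop any superclass that is more general than another superclass.
    return {
        s for s in supers
        if not any(o != s and subsumes(as_defn(s, defs), as_defn(o, defs)) for o in supers)
    }


print(sorted(inferred_parents("engineered_foreign_transposable_element_gene", DEFS)))
# ['engineered_foreign_gene', 'transposable_element_gene']

# Adding engineered_transposable_element_gene (ETEG) changes EFTEG's parentage automatically.
with_eteg = {**DEFS, "engineered_transposable_element_gene":
             Defn("transposable_element_gene", frozenset({"engineered"}))}
print(sorted(inferred_parents("engineered_foreign_transposable_element_gene", with_eteg)))
# ['engineered_foreign_gene', 'engineered_transposable_element_gene']
```

The first result matches the two is_a parents shown in the saved stanza later on this page. The second shows how the editor no longer has to rework EFTEG by hand when ETEG is introduced.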
The blue squiggly lines are is_a links that have been inferred by oboedit using the genus-differentia definitions. They have ''not'' been asserted by the person editing the ontology.
This is all well and good for oboedit users, but not everyone uses this tool. Although many other reasoners are available, we should still provide the DAG fully classified, so that consumers of the ontology need no additional dependencies.
We can configure oboedit to save all inferred 'is_a' links (see issues, below). The saved file will have entries like this:
  id: SO:0000283
  name: engineered_foreign_transposable_element_gene
  intersection_of: SO:0000111 ! transposable_element_gene
  intersection_of: has_quality SO:0000783 ! engineered
  intersection_of: has_quality SO:0000784 ! foreign
  is_a: SO:0000111 ! transposable_element_gene
  is_a: SO:0000281 ! engineered_foreign_gene
We call the is_a links above 'asserted', because they are explicitly stated in the file, rather than implicitly inferred by the oboedit reasoner.
This means that software can safely ignore the intersection_of lines;
the old tangled DAG can still be displayed as normal.
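A minimal Python sketch of both sides of this arrangement follows. It is not oboedit's save routine, and the function names are hypothetical. One function writes inferred parents into a stanza as plain is_a lines. The other mimics a first-generation tool that reads only is_a lines and never looks at intersection_of. The parent IDs passed in are the ones shown in the saved stanza above.

```python
def materialize_is_a(stanza: str, parents: dict) -> str:
    """Append an is_a line for each inferred parent (id -> name) not already asserted."""
    lines = stanza.rstrip("\n").splitlines()
    existing = {line.split()[1] for line in lines if line.startswith("is_a:")}
    for pid in sorted(parents):
        if pid not in existing:
            lines.append(f"is_a: {pid} ! {parents[pid]}")
    return "\n".join(lines) + "\n"


def first_generation_parents(stanza: str) -> list:
    """What a tool without a reasoner sees: only is_a lines; intersection_of is ignored."""
    return [line.split()[1] for line in stanza.splitlines() if line.startswith("is_a:")]


STANZA = """\
id: SO:0000283
name: engineered_foreign_transposable_element_gene
intersection_of: SO:0000111 ! transposable_element_gene
intersection_of: has_quality SO:0000783 ! engineered
intersection_of: has_quality SO:0000784 ! foreign
"""
saved = materialize_is_a(STANZA, {
    "SO:0000111": "transposable_element_gene",
    "SO:0000281": "engineered_foreign_gene",
})
print(saved)
print(first_generation_parents(saved))  # ['SO:0000111', 'SO:0000281']
```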
When the ontology with asserted 'is_a' links is viewed in oboedit, it will look like this: [screenshot not available]
The red arrows indicate asserted 'is_a' links that could have been inferred had they not been there.
The public version of the ontology contains the logical definitions.
The genus-differentia matrix can be manipulated as an Excel file:
[[Media:so-xp.xls]] (generated 2006/08/25).
The management of the tangled is_a DAG is
handled automatically by software, so the ontology editor does not need
to worry about it. Downstream tools should not be affected.
However, second-generation tools can choose to use the intersection_of
lines; they can be used to present the ontology DAG to the user in a
more tractable, modular fashion. The genus in the definition can be
used as the "core" is_a parent. The differentia could be presented in
a separate display.
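A second-generation display of this kind can be sketched in a few lines of Python. The function name `modular_view` and the text layout it prints are hypothetical. The sketch shows the genus as the single "core" parent and lists the differentia as separate facets, and it deliberately skips the materialised is_a lines.

```python
def modular_view(stanza: str) -> str:
    """Show the genus as the single core parent and the differentia as separate facets."""
    name, genus, facets = None, None, []
    for raw in stanza.splitlines():
        body, _, label = raw.partition("!")
        fields, label = body.split(), label.strip()
        if not fields:
            continue
        if fields[0] == "name:":
            name = fields[1]
        elif fields[0] == "intersection_of:":
            if len(fields) == 2:
                genus = label or fields[1]          # genus line: no relation
            else:
                facets.append(label or fields[-1])  # differentia line: relation + filler
        # is_a lines are deliberately ignored: the genus replaces the tangled parentage
    return f"{genus}\n  +-- {name}  [{', '.join(facets)}]"


STANZA = """\
id: SO:0000283
name: engineered_foreign_transposable_element_gene
intersection_of: SO:0000111 ! transposable_element_gene
intersection_of: has_quality SO:0000783 ! engineered
intersection_of: has_quality SO:0000784 ! foreign
is_a: SO:0000111 ! transposable_element_gene
is_a: SO:0000281 ! engineered_foreign_gene
"""
print(modular_view(STANZA))
# transposable_element_gene
#   +-- engineered_foreign_transposable_element_gene  [engineered, foreign]
```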
=open issues=
==saving inferences==
oboedit does not allow you to save all inferred 'is_a's. Currently,
so-xp is saved without the inferred is_a parents, which limits its
applicability to first-generation obo tools (i.e. those without reasoning capabilities).
Until oboedit can do this, it may be necessary to add the is_a links
semi-manually (oboedit shows you these visually, but it doesn't provide a
way to materialize them in the resulting saved obo file).
Another option is to convert to OWL and use a third-party open-source
reasoner such as Pellet to do the classification, then convert back to
obo. This could all be automated in a script. The curator version
(so-xp.obo) would not have the is_a links, but the so.obo file for
public consumption and use by first-generation tools would have the
is_a links materialised.
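One step of such a script can be sketched in Python: rendering a genus-differentia definition as an OWL equivalent-class axiom in Manchester syntax. Each intersection_of line with a relation becomes an existential restriction ("relation some filler"), and these restrictions are conjoined with the genus. The function name is hypothetical, and the sketch does not invoke Pellet. It also assumes that Prefix declarations for SO: and a declaration of has_quality as an object property are supplied elsewhere; a real converter would typically emit full IRIs.

```python
def to_manchester(term_id: str, genus: str, differentia: list) -> str:
    """Render a genus-differentia definition as an OWL equivalent-class axiom.

    Each intersection_of line with a relation becomes an existential
    restriction ("relation some filler"); the genus is conjoined with them.
    """
    parts = [genus] + [f"({relation} some {filler})" for relation, filler in differentia]
    return f"Class: {term_id}\n    EquivalentTo: " + " and ".join(parts)


print(to_manchester(
    "SO:0000283",
    "SO:0000111",
    [("has_quality", "SO:0000783"), ("has_quality", "SO:0000784")],
))
# Class: SO:0000283
#     EquivalentTo: SO:0000111 and (has_quality some SO:0000783) and (has_quality some SO:0000784)
```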
UPDATE: we used Pellet to do the initial classification. The results are still being checked.
Once John is back, we can discuss ways of making it easier to save the oboedit classification results, or of using obo2obo to fill these in, but Pellet seemed to work as a one-off.
===what happens on changes?===
One advantage of never asserting the inferable 'is_a' links is that we never have to worry about recreating 'is_a' links when the core parts of the ontology change.
For example, if we were to create an intermediate type between "gene" and "region" (for example, "functional region") and also wanted to create terms like "engineered functional region", we would simply go ahead and do that, provide genus-differentia definitions, and let the reasoner compute the is_a DAG on the fly.
However, as stated earlier, we want to save the obo file with the DAG fully classified, since most tools that consume the obo file will not be reasoner-aware. We can still use oboedit to create the is_a links automatically, and configure it so that these are saved. The problem here is that a change in one part of the ontology can percolate to large sections of the DAG: how do we know which links to replace and which to preserve?
One way is to keep track of which links were asserted directly by a curator ('''not''' as a result of reasoning) and which were originally added by the reasoner. For example, we could use trailing qualifiers:
  id: SO:0000283
  name: engineered_foreign_transposable_element_gene
  intersection_of: SO:0000111 ! transposable_element_gene
  intersection_of: has_quality SO:0000783 ! engineered
  intersection_of: has_quality SO:0000784 ! foreign
  is_a: SO:0000111 ! transposable_element_gene          {inferred=true}
  is_a: SO:0000281 ! engineered_foreign_gene            {inferred=true}
The reasoner would know that these could be discarded if they can no longer be inferred.
This is still under discussion. For now, these links may have to be removed manually, which is no worse than the pre-reasoner situation, when everything was done manually.
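The proposed bookkeeping with trailing qualifiers can be sketched in Python. The function name `refresh_inferred` is hypothetical, and the scheme is only the one proposed above, not an existing oboedit feature. Lines tagged {inferred=true} are kept only while the reasoner can still infer them, and newly inferable parents are added with the tag. Untagged is_a lines count as curator-asserted and are never touched. The ID "TMP:0000001" for ETEG is a hypothetical placeholder.

```python
def refresh_inferred(stanza: str, inferable: dict) -> str:
    """Re-sync reasoner-added is_a lines; curator-asserted is_a lines are never touched.

    `inferable` maps parent id -> name for the links the reasoner can currently infer.
    """
    kept, present = [], set()
    for line in stanza.rstrip("\n").splitlines():
        if line.startswith("is_a:"):
            parent = line.split()[1]
            if "{inferred=true}" in line and parent not in inferable:
                continue  # reasoner-added link that can no longer be inferred: discard
            present.add(parent)  # still valid, or curator-asserted: keep as-is
        kept.append(line)
    for pid in sorted(inferable):
        if pid not in present:
            kept.append(f"is_a: {pid} ! {inferable[pid]} {{inferred=true}}")
    return "\n".join(kept) + "\n"


STANZA = """\
id: SO:0000283
name: engineered_foreign_transposable_element_gene
intersection_of: SO:0000111 ! transposable_element_gene
intersection_of: has_quality SO:0000783 ! engineered
intersection_of: has_quality SO:0000784 ! foreign
is_a: SO:0000111 ! transposable_element_gene {inferred=true}
is_a: SO:0000281 ! engineered_foreign_gene {inferred=true}
"""
# After ETEG is added (hypothetical id TMP:0000001), the reasoner infers these direct parents:
print(refresh_inferred(STANZA, {
    "TMP:0000001": "engineered_transposable_element_gene",
    "SO:0000281": "engineered_foreign_gene",
}))
```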
Currently SO has its own ontology of feature attributes; eventually we
may want to merge this with PATO [[PATO:Main_Page]].
SO also uses its own has_quality relation. Eventually it should use
the version that will be in RO [[RO:Main_Page]].
=applicability of methodology to other ontologies=
This work was carried out as part of a larger project within the Gene Ontology and the [http://www.obofoundry.org OBO Foundry] to create logical and computable genus-differentia definitions for terms, linking across ontologies where appropriate. See [[XP:Main_Page]].
We are applying the same methodology to GO, although the cross-product
definitions (xps) are not yet part of the public release. We are currently
focusing on xps for GO terms that refer to CL (Cell Ontology) terms.
=other resources=
==mail lists==
==oboedit guide==
Link to appropriate section of oboedit guide here...
==background reading==
===definitions in the OBO Foundry===
Forthcoming paper
Obol paper; see link on:
===Modularity in ontologies===
These tutorials are very OWL- and Protege-centric, but much of their content also applies to obo 1.2 and oboedit:

Latest revision as of 18:17, 13 December 2008

Redirected to: [http://wiki.geneontology.org/index.php/SO:Composite_Terms Composite_Terms] on GO wiki