The term gene was the hardest term in the ontology to define. We were not looking for the standard "unit of inheritance" definition. We needed something more concrete: a definition that pins the term down to a location on sequence and lets us reason over the parts of gene annotations. To define gene in SO, we asked several questions. What is the extent of a gene on biological sequence? What are its parts? Can those parts be disjoint on the sequence? What is the relationship between a gene and regulatory regions such as enhancers and promoters? What is the product of a gene? Do we need to revise our understanding of the relationships we are using? Is the part_of relationship sufficient to describe the parts of a gene? What do we mean by part_of anyway?
We agreed that genes are associated with regulatory regions and transcripts. In the canonical gene annotation [song.sourceforge.net/gff3.shtml], a transcript sequence lies within the gene sequence, so a gene is treated as composed of parts: transcripts, exons, introns, promoters, etc. This solution breaks down, however, when the regulatory regions are dispersed, even onto different chromosomes, when the gene is trans-spliced, or when its transcript is polycistronic. The problem was that we were making topological assumptions about the part_of relation.
This led us to reconsider the part_of relationship we had been using to naively describe the structure of a gene. The implicit assumption was that if one region of sequence is part_of another, the part's coordinates lie within the whole's coordinates. This does not hold for every part of a gene. Do we mean different things when we call something a part? Much has been written about parts and mereology, most of it beyond the scope of this discussion of genes, but the part_of relationship can be divided into subtypes based on three criteria: substance, configuration and invariance (Winston et al.).
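To make the topological assumption concrete, here is a minimal Python sketch of part_of read as coordinate containment. The `Region` class, the `naive_part_of` function and all coordinates are hypothetical illustrations, not part of SO or any library; coordinates are assumed to be 1-based and inclusive, as in GFF3. Under this reading an exon is part of its gene, but an enhancer on another chromosome can never be.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical located region: 1-based, inclusive coordinates on one sequence."""
    seqid: str
    start: int
    end: int


def naive_part_of(part: Region, whole: Region) -> bool:
    """Topological reading of part_of: the part must lie inside the whole."""
    return (
        part.seqid == whole.seqid
        and whole.start <= part.start
        and part.end <= whole.end
    )


# Hypothetical coordinates, for illustration only.
gene = Region("chr1", 1000, 5000)
exon = Region("chr1", 1200, 1500)
distant_enhancer = Region("chr7", 20000, 20500)  # regulatory region on another chromosome

print(naive_part_of(exon, gene))              # True
print(naive_part_of(distant_enhancer, gene))  # False
```

The second call is the failure described above: a containment-based part_of cannot relate a gene to a regulatory region located on a different chromosome.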
Relating this back to SO, we found that the ontology used two different kinds of part_of: composite part_of object (exon composite_part_of transcript) and member part_of collection (regulatory_region member_part_of gene).
A gene is therefore treated as a collection of transcripts and regulatory_regions, whereas a transcript is a composite object made of exons and introns. It makes no sense for an intron to lie outside the boundary of its transcript. The type of gene, protein_coding or non_protein_coding, is captured by the kind of transcript it produces, such as mRNA or tRNA.
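The distinction between the two part_of subtypes can be sketched in Python as a data model in which only the composite relation enforces containment. The classes `Region`, `Transcript` and `Gene`, their methods and the coordinates are all hypothetical; this is not how SO or any annotation tool is implemented. The rule in `gene_type` (any mRNA makes a gene protein_coding) is a simplifying assumption used only to show the type being derived from the transcripts.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """Hypothetical located region: 1-based, inclusive coordinates on one sequence."""
    seqid: str
    start: int
    end: int

    def contains(self, other: "Region") -> bool:
        return (self.seqid == other.seqid
                and self.start <= other.start
                and other.end <= self.end)


@dataclass
class Transcript:
    """Composite object: exons and introns are composite_part_of the transcript."""
    location: Region
    kind: str  # e.g. "mRNA", "tRNA"
    parts: list = field(default_factory=list)

    def add_part(self, part: Region) -> None:
        # composite_part_of carries a locational constraint: parts lie inside the whole.
        if not self.location.contains(part):
            raise ValueError(f"{part} lies outside transcript {self.location}")
        self.parts.append(part)


@dataclass
class Gene:
    """Collection: transcripts and regulatory regions are member_part_of the gene."""
    location: Region
    transcripts: list = field(default_factory=list)
    regulatory_regions: list = field(default_factory=list)

    def add_regulatory_region(self, region: Region) -> None:
        # member_part_of imposes no locational constraint: no containment check here.
        self.regulatory_regions.append(region)

    def gene_type(self) -> str:
        # Simplifying assumption: producing any mRNA makes the gene protein_coding.
        if any(t.kind == "mRNA" for t in self.transcripts):
            return "protein_coding"
        return "non_protein_coding"


# Hypothetical coordinates, for illustration only.
tx = Transcript(Region("chr1", 1000, 5000), kind="mRNA")
tx.add_part(Region("chr1", 1000, 1400))  # exon
tx.add_part(Region("chr1", 1401, 2999))  # intron

gene = Gene(Region("chr1", 1000, 5000), transcripts=[tx])
gene.add_regulatory_region(Region("chr7", 20000, 20500))  # accepted: enhancer on another chromosome

try:
    tx.add_part(Region("chr1", 6000, 6100))  # an intron outside the transcript
    intron_rejected = False
except ValueError:
    intron_rejected = True

print(gene.gene_type())  # protein_coding
print(intron_rejected)   # True
```

Keeping the containment check on `Transcript.add_part` but not on `Gene.add_regulatory_region` mirrors the split between composite_part_of and member_part_of: the gene still has a location, but its members are not required to fall inside it.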
The SO textual definition of gene: A locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions.
SO meetings 2004 http://song.sourceforge.net/SO_meeting.shtml
Winston M, Chaffin R, Herrmann D: A taxonomy of part-whole relations. Cog Sci 1987, 11:417-444.