We created a set of mappings by applying simple lexical matching to preferred names and synonyms across all 4,021,662 concepts in 140 BioPortal ontologies and 67 vocabularies in the Unified Medical Language System (UMLS). This process resulted in 4,001,775 mappings.
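As a rough illustration of what simple lexical matching over preferred names and synonyms can look like, here is a minimal Python sketch. It is not the algorithm used to produce the mappings: the normalization (lowercasing and collapsing whitespace), the function names `normalize` and `lexical_mappings`, and the input layout are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def normalize(label):
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(label.lower().split())


def lexical_mappings(concepts):
    """Return pairs of concepts from different ontologies that share a label.

    `concepts` is an iterable of (ontology_id, concept_id, labels), where
    `labels` holds the preferred name and any synonyms (hypothetical layout).
    """
    # Index every normalized label to the concepts that carry it.
    index = defaultdict(set)
    for ontology_id, concept_id, labels in concepts:
        for label in labels:
            index[normalize(label)].add((ontology_id, concept_id))

    # Any two concepts under the same label form a candidate mapping,
    # unless both come from the same ontology.
    mappings = set()
    for hits in index.values():
        for a, b in combinations(sorted(hits), 2):
            if a[0] != b[0]:
                mappings.add((a, b))
    return mappings


# Fictional example data, for illustration only.
example_concepts = [
    ("A", "a1", ["Heart", "Cardiac organ"]),
    ("B", "b7", ["heart"]),
    ("B", "b8", ["Lung"]),
]
example_mappings = lexical_mappings(example_concepts)
```

With the fictional input above, the only pair sharing a normalized label is concept `a1` in ontology `A` and concept `b7` in ontology `B`.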
By analyzing these mappings, we produced data on the connectivity of ontologies.
The data gathered over the set of mappings can be found here. It is provided as an .xls spreadsheet with several worksheets. Please consult the worksheet labeled "Introduction" for explanations of the data contained in the spreadsheet.
Additionally, we created several graphs of the ontologies for different thresholds of percent-normalized links. These graphs can be found here. Each graph image has the filename ontoX.tiff, where X is the similarity threshold for links included in the graph. For example, every edge in the graph onto70.tiff connects a source ontology to a target ontology such that at least 70% of the concepts in the source ontology are mapped to concepts in the target ontology.
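The thresholding described above can be sketched in Python as follows. This is an illustrative reading of "percent-normalized links", not the code used to draw the graphs: it assumes the percentage is the number of distinct source concepts with at least one mapping into the target ontology, divided by the total number of concepts in the source ontology. The function names and the input layout are hypothetical.

```python
from collections import defaultdict


def percent_normalized_links(mappings, ontology_sizes):
    """Compute, per (source, target) ontology pair, the percentage of
    source concepts that are mapped to some concept in the target.

    `mappings` holds ((source_ont, source_id), (target_ont, target_id))
    tuples; `ontology_sizes` maps an ontology id to its concept count.
    """
    mapped_sources = defaultdict(set)
    for (source_ont, source_id), (target_ont, _target_id) in mappings:
        mapped_sources[(source_ont, target_ont)].add(source_id)
    return {
        pair: 100.0 * len(ids) / ontology_sizes[pair[0]]
        for pair, ids in mapped_sources.items()
    }


def edges_at_threshold(links, threshold):
    # Keep only the ontology pairs whose percentage reaches the threshold.
    return sorted(pair for pair, pct in links.items() if pct >= threshold)


# Fictional example data, for illustration only.
example_mappings = [
    (("A", "a1"), ("B", "b1")),
    (("A", "a2"), ("B", "b2")),
    (("A", "a3"), ("B", "b2")),
    (("B", "b1"), ("A", "a1")),
]
example_sizes = {"A": 4, "B": 10}
links = percent_normalized_links(example_mappings, example_sizes)
edges_70 = edges_at_threshold(links, 70)
```

In this fictional example, three of the four concepts in ontology `A` map into `B`, so the edge from `A` to `B` would appear in a graph with threshold 70, while the edge from `B` to `A` (one of ten concepts) would not.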
Finally, we make available the mappings we plan to upload into BioPortal. They can be found here. The mappings file is a SQL dump of a table containing all mappings between BioPortal ontologies generated by our algorithm. It can be used to generate a table of mappings with the following columns:
- source_ont: the id of the source ontology
- source_version_id: the version id for the source ontology
- source_ont_name: the name of the source ontology
- source_id: the identifier for the concept from the source ontology being mapped
- source_name: the preferred name of the concept from the source ontology
- target_ont: the id of the target ontology
- target_version_id: the version id for the target ontology
- target_ont_name: the name of the target ontology
- target_id: the identifier for the concept from the target ontology being mapped
- target_name: the preferred name of the concept from the target ontology
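To show how a table with these columns might be queried once it is loaded, here is a small Python sketch using the standard `sqlite3` module. Several things are assumptions made for illustration: the table name `mappings`, the use of `TEXT` for every column, and the sample rows, which are fictional. The actual dump may target a different database system and use different types.

```python
import sqlite3

# Column names as listed above; the TEXT type for each is an assumption.
COLUMNS = [
    "source_ont", "source_version_id", "source_ont_name",
    "source_id", "source_name",
    "target_ont", "target_version_id", "target_ont_name",
    "target_id", "target_name",
]


def create_mappings_table(conn):
    # "mappings" is a hypothetical table name chosen for this example.
    column_defs = ", ".join(f"{name} TEXT" for name in COLUMNS)
    conn.execute(f"CREATE TABLE mappings ({column_defs})")


def count_by_ontology_pair(conn):
    # Number of mappings for each (source ontology, target ontology) pair.
    return conn.execute(
        "SELECT source_ont, target_ont, COUNT(*) FROM mappings "
        "GROUP BY source_ont, target_ont "
        "ORDER BY source_ont, target_ont"
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_mappings_table(conn)

# Fictional rows, for illustration only.
sample_rows = [
    ("1001", "v1", "Ontology A", "A:0001", "heart",
     "1002", "v3", "Ontology B", "B:0420", "Heart"),
    ("1001", "v1", "Ontology A", "A:0002", "lung",
     "1002", "v3", "Ontology B", "B:0421", "Lung"),
    ("1002", "v3", "Ontology B", "B:0420", "Heart",
     "1003", "v2", "Ontology C", "C:0007", "heart"),
]
placeholders = ", ".join("?" * len(COLUMNS))
conn.executemany(f"INSERT INTO mappings VALUES ({placeholders})", sample_rows)

pair_counts = count_by_ontology_pair(conn)
```

Grouping by `source_ont` and `target_ont` gives the raw mapping counts per ontology pair, which is the kind of summary needed before normalizing by ontology size.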