New cloud-based OMIM text mining service for enhanced genotype-phenotype analysis


Advanced NLP text analytics over the OMIM gene-disease database brings powerful search benefits for target identification, NGS annotation, clinical genetics and diagnostics.


(Cambridge UK and Boston USA) ... Linguamatics announces that it is making the Online Mendelian Inheritance in Man® (OMIM) data available with its market-leading text analytics platform, I2E. The new service will be offered on the cloud via the Linguamatics I2E OnDemand platform and is part of Linguamatics’ ongoing strategy to expand the range of off-the-shelf content accessible through its text mining and knowledge discovery solutions.

I2E OnDemand provides access to a wide variety of data such as MEDLINE, FDA Drug Labels, Patents, PubMed Central (open access subset) and NIH grants. The addition of OMIM allows users to identify and extract information accurately, whether to reveal genetic associations for unusual clinical case presentations or phenotypes, or to search for potential targets in a particular therapeutic area during initial target selection.

OMIM is a comprehensive catalogue of all known human diseases with a genetic component. The database includes documented associations to the relevant genes in the human genome, and related information including gene and disease descriptions, clinical synopsis, animal models, inheritance, mapping, history, and more.

Dr David Milward, Chief Technology Officer at Linguamatics, explains: “Combining genotype-phenotype relationships extracted from OMIM with relationships from other sources, such as MEDLINE, gives I2E users an excellent resource for NGS annotation, target discovery, and clinical genomics, in order to better target the molecular basis of disease.”

The key benefits of accessing OMIM in I2E are:

  • Use of domain-specific ontologies (e.g. for diseases, genes, mutations and other gene variants) to enable higher recall than searching via the OMIM interface
  • Powerful querying, either out-of-the-box or custom, to enable deeper access to the valuable scientific detail within each OMIM record, such as extraction of gene-gene interactions, gene/protein mutations, mouse models and clinical details
  • Ability to pull out relationships (e.g. between genes and phenotypes) from both the structured data within the OMIM record and the unstructured text fields
  • Structured output created from both the structured and unstructured text, so that results can be visualized for rapid decision support


About OMIM

Online Mendelian Inheritance in Man® (OMIM) is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM focuses on the relationship between phenotype and genotype, and its entries contain copious links to other genetics resources.

About Linguamatics

Linguamatics is the world leader in deploying innovative natural language processing (NLP)-based text mining for high-value knowledge discovery and decision support. Linguamatics I2E is used by top commercial, academic and government organizations, including 17 of the top 20 global pharmaceutical companies, the US Food and Drug Administration (FDA) and leading US hospitals. I2E can be used to mine a wide variety of text resources, such as scientific literature, patents, Electronic Health Records (EHRs), clinical trials data, news feeds, social media and proprietary content. I2E can be deployed as an in-house enterprise system, or as Software-as-a-Service (SaaS) on the cloud.

Linguamatics is a winner of the Queen’s Award for Enterprise 2014 for International Trade.

For further information, visit or



