Unlocking Diagnosis With Deep Phenotyping: From Rare Diseases to Chronic Conditions

By Calum Yacoubian On Jul 6, 2022

Within precision medicine, and specifically rare diseases, clinicians and researchers rely on genetic and diagnostic testing to help drive accurate diagnosis and treatment. However, genomic data alone are often insufficient to unlock the life-changing diagnoses of rare diseases. Well-curated and accurate phenotype data, which may include quantified observable traits such as short stature, low set ears, and blood biochemistry, along with genetic and diagnostic test results, are vital for shortening the diagnostic journey of these patients and identifying the most effective treatments available.

The need for accurate patient phenotyping is not a new concept.

In fact, over 20 years ago, Isaac Kohane, Chair of the Department of Biomedical Informatics and the Marion V. Nelson Professor of Biomedical Informatics at Harvard Medical School, predicted that the accurate practice of patient phenotyping would become essential as the volume of genomic information continued to surge. And outside the context of genomic medicine, the art and practice of phenotyping is as old as medicine itself.

The phenotype is simply the clinical and physical manifestation of genes and disease, and it has therefore been the basis for treatment decisions since the dawn of medicine. Within today’s genomic context, deep phenotyping refers to the practice of accurately capturing clinical information from patients – from signs and symptoms to laboratory results – in order to identify and understand an individual’s underlying genetic makeup.

As Professor Kohane predicted, there has been a flood of genomic information. Rather than rely on manual, time-consuming chart review to help deep phenotyping efforts “keep up,” researchers and clinicians are employing artificial-intelligence-based technologies such as natural language processing (NLP) for deep computational phenotyping.

NLP enables computers to “read” complex medical documentation and transform unstructured, text-based data into structured information that is more suitable for analysis.

Rapid diagnosis is critical

Nearly 1 in 10 Americans, or roughly 25 to 30 million people, have one of approximately 7,000 identified rare diseases. A disease is considered rare if it affects fewer than 200,000 people in the United States. All pediatric cancers are considered rare.

In treating rare diseases, time is of the essence, which means that rapid diagnosis is critical.

NLP-based deep phenotyping can help clinicians accelerate diagnoses by pulling out key features to create well-structured data for research and analysis. NLP is particularly valuable for mining, capturing and normalizing messy, free-text data that are often found in the “notes” sections of electronic health records (EHRs).

In cases of rare diseases, NLP can transform free text to Human Phenotype Ontology (HPO) terms to capture phenotypic data that are created when a patient undergoes testing. Clinicians can then use this information in downstream tools to better understand the results of genetic tests based on any phenotype presentation.
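To make the free-text-to-HPO step concrete, here is a minimal sketch in Python. It is not the method used by any organization described in this article: it assumes a tiny hand-written lexicon (`HPO_LEXICON`), a hypothetical helper named `extract_hpo_terms`, and a very crude negation check. Production NLP systems use far richer vocabularies and context handling, and the HPO identifiers shown should be verified against the current HPO release.

```python
import re

# Tiny illustrative lexicon (an assumption for this sketch):
# surface phrase -> (HPO ID, preferred label).
HPO_LEXICON = {
    "short stature": ("HP:0004322", "Short stature"),
    "low set ears": ("HP:0000369", "Low-set ears"),
    "low-set ears": ("HP:0000369", "Low-set ears"),
    "seizures": ("HP:0001250", "Seizure"),
    "seizure": ("HP:0001250", "Seizure"),
}

# Crude negation cues; real clinical NLP handles negation far more carefully.
NEGATION_CUES = ("no ", "denies ", "without ")


def extract_hpo_terms(note: str) -> list[tuple[str, str]]:
    """Return unique (HPO ID, label) pairs for non-negated phrase matches."""
    found: dict[str, str] = {}
    text = note.lower()
    for phrase, (hpo_id, label) in HPO_LEXICON.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text):
            # Look at the 20 characters before the match for a negation cue.
            window = text[max(0, match.start() - 20):match.start()]
            if any(cue in window for cue in NEGATION_CUES):
                continue
            found[hpo_id] = label
    return sorted(found.items())


if __name__ == "__main__":
    note = "Patient presents with short stature and low-set ears. No seizures reported."
    for hpo_id, label in extract_hpo_terms(note):
        print(hpo_id, label)
```

The output is a normalized list of HPO terms that downstream tools can consume, while the negated mention of seizures is left out.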

NLP Improves Diagnosis and Identification: Two Examples of Deep Phenotyping Applications

Following are two examples of how healthcare organizations are leveraging NLP to improve diagnosis and identification:

Turning the Unknown to Known

Chromosomal microarray is a diagnostic test routinely used in clinical practice. Testing will often identify a known variant or causal mutation, though sometimes testing finds a Variant of Unknown Significance (VUS).

At one pediatric center where this test is routinely done, a VUS is reported in 40% of the tested patients. In an effort to better understand these variants – and determine if they are indeed causal – researchers historically spent many hours reviewing clinical documentation and EHR data to pull out phenotype information. Because the process was so laborious, the organization looked for alternative ways to conduct this deep phenotyping, and ultimately landed upon NLP.

With NLP, the pediatric center was able to identify a significantly deeper phenotype – on average 71 terms per patient versus 29 terms from manual curation. The NLP process was also significantly faster, allowing researchers to phenotype 10 patients in just 10 minutes, compared to 34 hours for 10 patients with manual review. As a result of this deeper phenotyping, some variants were reclassified as significant, which, once published, improves the likelihood that a child – and children in the future who have the same mutation – will receive treatment that is better tailored to their condition.
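The figures reported above imply a large difference in both depth and throughput. The short Python calculation below works this out using only the numbers quoted in this article; the variable names are illustrative.

```python
# Figures quoted in the article for the pediatric center.
nlp_terms_per_patient = 71
manual_terms_per_patient = 29
patients = 10
nlp_minutes = 10            # 10 patients in 10 minutes
manual_minutes = 34 * 60    # 10 patients in 34 hours

# How much deeper the NLP-derived phenotype is, on average.
depth_ratio = nlp_terms_per_patient / manual_terms_per_patient

# How much faster NLP is for the same batch of patients.
speedup = manual_minutes / nlp_minutes

# Time spent per patient under each approach, in minutes.
nlp_minutes_per_patient = nlp_minutes / patients
manual_minutes_per_patient = manual_minutes / patients

if __name__ == "__main__":
    print(f"Depth ratio: {depth_ratio:.2f}x")
    print(f"Speedup: {speedup:.0f}x")
    print(f"Per patient: {nlp_minutes_per_patient:.0f} min (NLP) "
          f"vs {manual_minutes_per_patient:.0f} min (manual)")
```

In other words, the reported numbers correspond to roughly 2.4 times as many phenotype terms per patient, produced about 200 times faster.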

Pinpointing Heart-Disease Patients From a Large Population Using NLP and Deep Phenotyping

The same technologies that enable us to unlock the diagnostic phenotypes in rare disease can also be turned to different conditions and populations.

A large California health provider wanted to understand the prevalence of aortic stenosis (a narrowing of the aortic valve opening) in its patient population. However, the organization struggled to identify these patients because structured data (such as diagnostic codes) underrepresent the true affected population, and other data that contain diagnostic information – such as echocardiogram reports – are completely unstructured.

To surmount these barriers, clinicians used NLP to analyze over 1 million patient records and echocardiogram reports to extract diagnostic criteria such as ejection fraction, physician-documented severity of disease, and end diastolic volume. With this application of NLP, the health system was able to identify 54,000 patients with the condition, 35% of whom did not have a corresponding administrative code for the disease. This was a significant improvement over manual search methods, demonstrating the value of the unstructured medical record as a treasure trove of deep phenotype data that can be used for individual patient management, as well as to increase population-level understanding of disease prevalence and burden.
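As a rough illustration of what extracting diagnostic criteria from an unstructured echocardiogram report can look like, the Python sketch below pulls an ejection fraction and a documented severity out of free text with regular expressions. This is an assumption-laden toy, not the health system's actual pipeline: the patterns, the `EchoFindings` class, and the `parse_echo_report` function are hypothetical, and real reports contain ranges, abbreviations, and negations that simple patterns like these would miss. The last lines work out how many patients the reported 35% represents.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for this sketch; real reports vary widely in phrasing.
EF_PATTERN = re.compile(
    r"(?:ejection fraction|\bLVEF\b|\bEF\b)\D{0,20}?(\d{1,2}(?:\.\d+)?)\s*%",
    re.IGNORECASE,
)
SEVERITY_PATTERN = re.compile(
    r"\b(mild|moderate|severe)\s+aortic(?:\s+valve)?\s+stenosis\b",
    re.IGNORECASE,
)


@dataclass
class EchoFindings:
    ejection_fraction_pct: Optional[float]  # None if no value was found
    as_severity: Optional[str]              # "mild", "moderate", "severe", or None


def parse_echo_report(text: str) -> EchoFindings:
    """Extract a single ejection fraction value and aortic stenosis severity."""
    ef = EF_PATTERN.search(text)
    severity = SEVERITY_PATTERN.search(text)
    return EchoFindings(
        ejection_fraction_pct=float(ef.group(1)) if ef else None,
        as_severity=severity.group(1).lower() if severity else None,
    )


# Figures quoted in the article: 54,000 patients identified, 35% without a code.
identified_patients = 54_000
uncoded_percent = 35
uncoded_patients = identified_patients * uncoded_percent // 100

if __name__ == "__main__":
    report = ("Left ventricular ejection fraction is estimated at 55%. "
              "Severe aortic stenosis is present.")
    print(parse_echo_report(report))
    print(f"Patients without an administrative code: about {uncoded_patients:,}")
```

Taking the quoted 35% at face value, about 18,900 of the 54,000 identified patients would have been invisible to a search that relied on administrative codes alone.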