John Snow Labs De-Identification Solution Enables Data Monetization and Healthcare Innovation

By AIT News Desk On Oct 17, 2022

Healthcare Organizations Can Now Leverage Larger and More Diverse Datasets to Improve Operations and Care

John Snow Labs, the Healthcare AI and NLP company and developer of the Spark NLP library, announced improvements to its automatic de-identification solution. The company recently established a new state-of-the-art record on the n2b2 standard de-identification benchmark, achieving an F1 score of 96.1%, and decreasing its error rate by 33%. By enabling organizations to automatically de-identify large datasets, John Snow Labs empowers product innovation and cost savings for healthcare organizations worldwide.

Providing custom de-identification required for the monetization of data, John Snow Labs’ automatic de-identification solution is already proving valuable for users. The service is based on the company’s Spark NLP for Healthcare library, built on top of the Spark big data framework, enabling the processing of millions of records on large Spark or Databricks clusters. The de-identification solution can be delivered as an end-to-end system or a software library with optional professional services.

“We are using John Snow Labs to de-identify patient notes on a massive scale and the results from the out-of-the-box de-identification models have been remarkable,” said Nadaa Taiyab, Senior Data Scientist, Providence Health Intelligence. “It has been simple to fine-tune models with our own annotated data and improve pipeline results by adding regular expressions and text matching where needed. Overall, the code is very modular and easy to use, making the challenges and complexities of such a large-scale project much easier to navigate.”
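As a rough illustration of the rule-based step mentioned in the quote, the sketch below masks a few PHI-like strings with regular expressions. It uses only Python's standard `re` module and is not the Spark NLP for Healthcare API; the pattern set, the labels, and the `mask_phi` helper are hypothetical, and real PHI coverage would need far more than this.

```python
import re

# Hypothetical patterns for illustration only; not an exhaustive PHI list.
PHI_PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
}


def mask_phi(text: str) -> str:
    """Replace every match of each pattern with a <LABEL> placeholder."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


note = "Seen on 03/14/2021, MRN: 00123456. Call 555-123-4567 to follow up."
print(mask_phi(note))
```

In a setup like the one described, rules of this kind would presumably supplement the trained models by catching site-specific identifiers that the models miss.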

Healthcare providers possess vast amounts of unstructured patient-level data. This data has tremendous value but often remains untapped because of legal and regulatory requirements. Removing protected health information (PHI) makes the data usable, with the potential to create new revenue streams and spark healthcare innovation. Doing so can be challenging, though: stricter de-identification rules lower the risk of re-identification but also decrease the usability of the data.

While manual removal of PHI is possible, it is often rife with human error and requires multiple reviews. Additionally, the larger the dataset, the more labor- and cost-intensive the project. Academic literature shows that a team with an average cost of $83 per hour in total compensation, processing 135 notes per hour at an average length of 130 words, incurs a cost of $0.61 per note. For large datasets consisting of millions of records, this is simply not feasible.
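The per-note figure follows directly from the two rates quoted above. The short sketch below reproduces it and scales it to a dataset size; the one-million-note volume is an assumed example, not a figure from the article, and the function names are illustrative.

```python
def cost_per_note(hourly_cost_usd: float, notes_per_hour: float) -> float:
    """Manual review cost for one note, in US dollars."""
    return hourly_cost_usd / notes_per_hour


def total_cost(n_notes: int, hourly_cost_usd: float = 83.0,
               notes_per_hour: float = 135.0) -> float:
    """Manual review cost for n_notes notes at the quoted rates."""
    return n_notes * cost_per_note(hourly_cost_usd, notes_per_hour)


per_note = cost_per_note(83.0, 135.0)
print(f"Cost per note: ${per_note:.2f}")  # Cost per note: $0.61

# Assumed example volume: one million notes.
million = total_cost(1_000_000)
print(f"Cost for 1,000,000 notes: ${million:,.2f}")
print(f"Reviewer hours needed: {1_000_000 / 135.0:,.0f}")
```

At these rates, a million notes would cost roughly $615,000 and take about 7,400 hours of manual review.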

“Natural language processing has made it possible to automatically de-identify valuable, but otherwise unusable, unstructured patient-level data, like clinical notes, images, and scanned documents,” said David Talby, CTO, John Snow Labs. “Once de-identified, the datasets can be shared more safely and easily with researchers and builders, ushering in a new generation of accurate and innovative healthcare solutions. Without large-scale automatic data de-identification, this would not be possible at scale.”


To repurpose or use any of the content or material on this and our sister sites, explicit written permission needs to be sought.

Copyright © 2026 AiThority. All Rights Reserved. Privacy Policy
