IBM AI Provides Ultra-Modern Captioning for News Broadcasts

By Viraj T On May 15, 2019

IBM researchers have devised a software architecture that achieves best-in-class results for captioning news broadcasts. About two years ago, the company reached a similar milestone in transcription, a task that is harder than it sounds. The machine learning-driven initiative had to overcome many obstacles before reaching its goal. Now, researchers at the Armonk, New York-based software giant have achieved a breakthrough in captioning capabilities. They have detailed their findings in a paper and will present it later at a conference in Brighton.

IBM states that the technology was hard to develop because of background noise and news anchors speaking about a wide range of topics. Broadcasts also mix a large volume of disparate material, such as onsite interviews, multimedia, and TV show clips.

As IBM researcher Samuel Thomas explains in a blog post, the AI leverages a combination of long short-term memory (LSTM), a type of algorithm capable of learning long-term dependencies, and acoustic neural network models, along with complementary language models. The acoustic models contained up to 25 layers of nodes (mathematical functions mimicking biological neurons) trained on speech spectrograms, or visual representations of signal spectra, while the six-layer LSTM networks learned a “rich” set of acoustic features to enhance language modeling.
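To make the idea of a spectrogram concrete, here is a minimal sketch in Python. It is not IBM's pipeline: the function name `spectrogram`, the frame length, the hop size, and the synthetic test tone are all assumptions chosen for illustration. It slices a signal into short overlapping frames and measures how much energy each frame carries at each frequency, which is the time-by-frequency grid that acoustic models of this kind take as input.

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of magnitudes per frame (rows = time, columns = frequency)."""
    # A Hann window tapers the edges of each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        # Naive discrete Fourier transform over the non-negative frequency bins.
        row = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ]
        rows.append(row)
    return rows

# Synthetic stand-in for speech: a 1 kHz tone sampled at 8 kHz for a quarter second.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate)
        for n in range(sample_rate // 4)]

spec = spectrogram(tone)
# The strongest frequency bin of the first frame should sit at the tone's pitch.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(spec), "frames,", len(spec[0]), "frequency bins, peak near", peak_hz, "Hz")
```

A production system would use a fast Fourier transform and real audio rather than this slow, hand-rolled transform, but the output has the same shape: one row per slice of time, one column per frequency band.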

The IBM researchers proceeded as follows:

The system was trained on 1,300 hours of data from the Linguistic Data Consortium
The researchers ran the AI on a first test set of two hours of data from six shows, with 100 overlapping speakers
A second test set held four hours of data from 12 shows, with 230 overlapping speakers
To measure results, IBM worked with speech and search technology firm Appen
The system posted word error rates of 6.5% and 5.9% on the first and second tests, respectively
That still trailed human performance (3.6% and 2.8% on the first and second tests, respectively)
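
The percentages above are error rates. Assuming they are word error rates (WER), the usual metric in speech recognition, the sketch below shows how such a figure is computed: count the fewest word substitutions, deletions, and insertions needed to turn the machine's transcript into the reference transcript, then divide by the number of reference words. The function name and the sample sentences are hypothetical and do not come from IBM's test sets.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)

# Hypothetical example: the caption drops "the" and splits "tonight" in two.
reference = "the mayor will speak at the city hall tonight"
hypothesis = "the mayor will speak at city hall to night"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

Here three edits against nine reference words give a WER of about 33%. By the same measure, a 6.5% score means roughly one word in fifteen was wrong.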


To repurpose or use any of the content or material on this and our sister sites, explicit written permission needs to be sought.

Copyright © 2026 AiThority. All Rights Reserved. Privacy Policy
