Multimodal Model Chameleon by Meta
What Is the News About?
Competition in generative AI is shifting toward multimodal models, and Meta has previewed a possible response to the models released by frontier labs. Its new model family, Chameleon, is designed to be natively multimodal rather than assembled from components trained for different modalities.
Meta has not released the models yet, but its reported experiments indicate that Chameleon achieves state-of-the-art performance in tasks such as image captioning and visual question answering (VQA) while remaining competitive in text-only tasks.
Chameleon uses an “early-fusion token-based mixed-modal” architecture, meaning it was built from the ground up to learn from an interleaved mix of images, text, code, and other modalities. Just as language models break text into word tokens, Chameleon converts images into discrete tokens. Text, code, and image tokens share a single unified vocabulary, which makes it possible to apply the same transformer architecture to sequences that contain both text and image tokens.
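To make the idea of a unified vocabulary concrete, here is a minimal sketch of how text token IDs and discrete image codes could share one ID space. Everything in it is an assumption made for illustration: the vocabulary sizes, the begin-image and end-image marker tokens, and the function names are hypothetical and are not taken from Meta's implementation.

```python
# Toy illustration of a shared token space for text and images.
# All sizes, IDs, and names below are hypothetical, not Chameleon's real values.

TEXT_VOCAB_SIZE = 1000       # assumed number of text/code tokens
IMAGE_CODEBOOK_SIZE = 512    # assumed number of discrete image codes

# Assumed marker tokens placed after the text and image ranges.
BEGIN_IMAGE = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE
END_IMAGE = BEGIN_IMAGE + 1
UNIFIED_VOCAB_SIZE = END_IMAGE + 1


def image_code_to_token(code: int) -> int:
    """Map an image codebook index into the shared ID space by offsetting it."""
    if not 0 <= code < IMAGE_CODEBOOK_SIZE:
        raise ValueError(f"image code out of range: {code}")
    return TEXT_VOCAB_SIZE + code


def build_sequence(segments):
    """Flatten interleaved ("text", ids) and ("image", codes) segments
    into a single list of token IDs that one transformer could consume."""
    sequence = []
    for kind, values in segments:
        if kind == "text":
            sequence.extend(values)
        elif kind == "image":
            sequence.append(BEGIN_IMAGE)
            sequence.extend(image_code_to_token(c) for c in values)
            sequence.append(END_IMAGE)
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return sequence


def modality_of(token: int) -> str:
    """Report which range of the shared vocabulary a token ID falls in."""
    if 0 <= token < TEXT_VOCAB_SIZE:
        return "text"
    if token < BEGIN_IMAGE:
        return "image"
    if token in (BEGIN_IMAGE, END_IMAGE):
        return "marker"
    raise ValueError(f"token outside unified vocabulary: {token}")


if __name__ == "__main__":
    mixed = build_sequence([("text", [5, 17]), ("image", [0, 511]), ("text", [42])])
    print(mixed)
    print([modality_of(t) for t in mixed])
```

Because every element of the resulting list is just an integer from one vocabulary, a single sequence model can in principle read and predict text and image tokens in any order, which is the property the early-fusion description relies on.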
According to the researchers, the model most similar to Chameleon is Google Gemini, which also uses an early-fusion token-based approach. However, Gemini uses separate image decoders in the generation phase, whereas Chameleon is an end-to-end model that both processes and generates tokens.
Why Is It Important?
[Image source: the company’s website]
The common way to build multimodal foundation models is to patch together models trained for different modalities. This approach is known as “late fusion”: the AI system receives inputs in different modalities, encodes each with a separate model, and then fuses the encodings for inference. Late fusion works well, but it limits the models’ ability to integrate information across modalities and to generate sequences of interleaved images and text.
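For contrast, the late-fusion pattern can be sketched in a few lines. This is a toy illustration under stated assumptions, not any production system: the two "encoders" are hypothetical hand-written feature functions standing in for separately trained models, and the fusion step is a plain concatenation.

```python
# Toy late-fusion sketch. The encoders are hypothetical stand-ins for
# separately trained text and image models; real encoders are neural networks.

def encode_text(words):
    """Stand-in text encoder: [word count, mean word length]."""
    if not words:
        return [0.0, 0.0]
    return [float(len(words)), sum(len(w) for w in words) / len(words)]


def encode_image(pixels):
    """Stand-in image encoder: [mean brightness, max brightness]."""
    if not pixels:
        return [0.0, 0.0]
    return [sum(pixels) / len(pixels), float(max(pixels))]


def late_fusion(words, pixels):
    """Encode each modality separately, then fuse by concatenation."""
    return encode_text(words) + encode_image(pixels)


def score(fused, weights):
    """Stand-in inference head: a weighted sum over the fused features."""
    return sum(f * w for f, w in zip(fused, weights))


if __name__ == "__main__":
    fused = late_fusion(["a", "cat"], [0, 10])
    print(fused)
    print(score(fused, [1.0, 0.0, 0.0, 0.5]))
```

In this sketch the two modalities only meet at the concatenated vector, after each has been encoded in isolation. That separation is one way to picture why late fusion makes it harder to mix information across modalities or to produce interleaved image and text output.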