
Llemma Outperforms Google’s Minerva Model

An Open Mathematical Language Model

The team at EleutherAI has released Llemma, an open mathematical language model, together with the Proof-Pile-2 dataset. Llemma was built by continuing the pretraining of Code Llama, and the release has drawn keen interest from the academic and scientific community.

Llemma is similar in purpose to Minerva, a closed model that Google Research developed specifically for mathematics, but EleutherAI's model outperforms Minerva when the two are compared at equal parameter counts. Llemma can also handle a wider variety of tasks than other mathematical language models, including tool use and formal mathematics.

According to the paper's first author, Zhangir Azerbayev, work on Llemma began with compiling a massive dataset of mathematical tokens. This dataset combines the arXiv subset of RedPajama, the recently released OpenWebMath dataset, and AlgebraicStack, a new code dataset designed specifically for mathematics. Together, these sources total 55 billion tokens.


Llemma Outperforms Minerva


Llemma stands out because it outperforms all other open base models on mathematical benchmarks at both the 7-billion and 34-billion parameter scales. The feat is all the more impressive given that the 34-billion-parameter Llemma, with roughly half as many parameters, approaches the performance of Google's 62-billion-parameter Minerva.

The Llemma models were initialized with Code Llama weights and then trained on 256 A100 GPUs in Stability AI's Ezra cluster. The 7-billion-parameter model was trained for 200 billion tokens over 23,000 A100-hours, while the 34-billion-parameter model was trained for 50 billion tokens over 47,000 A100-hours.
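To put the training figures above in perspective, the short sketch below derives average throughput per A100-hour and an idealized wall-clock duration from the reported token counts and GPU-hours. It assumes, for illustration only, that each run used the full 256-GPU allocation for its entire duration; the helper names are hypothetical and the results are back-of-the-envelope estimates, not figures reported by the Llemma team.

```python
# Token counts and A100-hours as reported in the article.
RUNS = {
    "Llemma 7B": {"tokens": 200e9, "a100_hours": 23_000},
    "Llemma 34B": {"tokens": 50e9, "a100_hours": 47_000},
}
NUM_GPUS = 256  # Assumption: each run used the whole 256-GPU allocation.


def tokens_per_gpu_hour(tokens: float, gpu_hours: float) -> float:
    """Average number of training tokens processed per A100-hour."""
    return tokens / gpu_hours


def wall_clock_hours(gpu_hours: float, num_gpus: int) -> float:
    """Idealized wall-clock time if all GPUs run for the whole job."""
    return gpu_hours / num_gpus


if __name__ == "__main__":
    for name, run in RUNS.items():
        tput = tokens_per_gpu_hour(run["tokens"], run["a100_hours"])
        hours = wall_clock_hours(run["a100_hours"], NUM_GPUS)
        print(
            f"{name}: {tput / 1e6:.2f}M tokens per A100-hour, "
            f"about {hours:.0f} h ({hours / 24:.1f} days) on {NUM_GPUS} GPUs"
        )
```

Under these assumptions, the 34-billion-parameter run processes roughly eight times fewer tokens per A100-hour than the 7-billion-parameter run, which is why it covered a quarter of the tokens while using about twice the GPU time.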

When the two systems are compared at equal parameter counts, Llemma outperforms Minerva on chain-of-thought mathematical reasoning, and its advantage holds when both models use majority voting, a strategy that samples several solutions to the same problem and keeps the most common final answer.
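Majority voting (often written maj@k) is a general evaluation technique, not something specific to Llemma. The minimal sketch below assumes the final answers have already been extracted as strings from k sampled chain-of-thought solutions; the function name `majority_vote` is hypothetical, and this is not EleutherAI's evaluation code. Real evaluations usually normalize answers more carefully (for example, treating equivalent LaTeX expressions as equal), whereas this sketch only strips whitespace.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[Optional[str]]) -> Optional[str]:
    """Return the most frequent final answer among k sampled solutions.

    Samples whose final answer could not be extracted (None) are skipped.
    Ties go to the answer seen first, because Counter.most_common lists
    elements with equal counts in the order they were first encountered.
    """
    # Minimal normalization: ignore missing answers and surrounding spaces.
    valid = [a.strip() for a in answers if a is not None]
    if not valid:
        return None
    answer, _count = Counter(valid).most_common(1)[0]
    return answer


# Five hypothetical samples for one problem; one failed to produce an answer.
samples = ["12", "15", "12", None, "12 "]
print(majority_vote(samples))  # → 12
```

Voting over many samples helps because individual reasoning chains can go wrong in different ways, while correct chains tend to converge on the same final answer.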


Llemma was developed jointly by researchers from several universities and institutes, including Princeton University, EleutherAI, the University of Toronto, the Vector Institute, the University of Cambridge, Carnegie Mellon University, and the University of Washington.

