FriendliAI Announces New System to Hike Up the Serving Efficiency of Large-scale AI Models at OSDI 2022

The New System ‘Orca’ reduces the costs of using large-scale generative models to make it affordable for a wide range of users

At OSDI 2022, FriendliAI presented Orca, a serving system that dramatically improves the serving efficiency of large-scale generative models. FriendliAI is a startup that provides PeriFlow, a platform for developing large-scale AI models.

Orca is a serving system for running large-scale AI models efficiently. It removes the unnecessary delays found in existing serving systems by using two core techniques: ‘iteration-level scheduling’ and ‘selective batching.’

To understand the problem, imagine a group of friends who want to ride a four-seat tandem bicycle along the Hudson River. Some want to ride for only 10 minutes, while others want a full hour. Under existing serving systems, once a ride has begun, every rider must keep pedaling until the rider with the longest target time is satisfied, and no other friends can join until the whole group has finished and returned to the starting point.

Orca solves this problem with a ‘shuttle run’ system: the group returns to the starting point every 10 minutes, so riders can hop off as soon as they reach their goals and late-comers can join without a long wait. This shuttle run corresponds to iteration-level scheduling. Orca adds a second technique, selective batching, which lets riders who otherwise could not be grouped together share the bike before the ride starts.
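To make the bicycle analogy concrete, here is a toy Python simulation contrasting the two scheduling styles. It is a sketch under simplifying assumptions, not FriendliAI's implementation: time is counted in abstract iterations, each iteration generates one token for every request in the batch, a batch size of 4 stands in for the tandem's four seats, and the names `Request`, `request_level` and `iteration_level` are hypothetical. Selective batching is not modeled here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: str      # hypothetical request id
    arrival: int  # iteration at which the request arrives
    tokens: int   # output tokens (iterations) the request needs, >= 1


def request_level(requests, max_batch):
    """Existing-style scheduling: a batch runs until its longest request is done."""
    pending = sorted(requests, key=lambda r: r.arrival)
    t, done = 0, {}
    while pending:
        t = max(t, pending[0].arrival)  # idle until someone arrives
        batch = [r for r in pending if r.arrival <= t][:max_batch]
        for r in batch:
            pending.remove(r)
        # Nobody leaves early: every member finishes with the longest one.
        t += max(r.tokens for r in batch)
        for r in batch:
            done[r.rid] = t
    return done


def iteration_level(requests, max_batch):
    """Iteration-level idea (simplified): reschedule after every iteration."""
    pending = deque(sorted(requests, key=lambda r: r.arrival))
    active, t, done = {}, 0, {}
    while pending or active:
        # Fill free slots with requests that have already arrived.
        while pending and pending[0].arrival <= t and len(active) < max_batch:
            r = pending.popleft()
            active[r.rid] = r.tokens
        if not active:
            t = pending[0].arrival  # idle until the next arrival
            continue
        t += 1  # one iteration: every active request emits one token
        for rid in list(active):
            active[rid] -= 1
            if active[rid] <= 0:  # finished: leave the batch immediately
                del active[rid]
                done[rid] = t
    return done


if __name__ == "__main__":
    riders = [Request("A", 0, 10), Request("B", 0, 60), Request("C", 0, 10),
              Request("D", 0, 10), Request("E", 5, 10)]
    print(request_level(riders, max_batch=4))
    print(iteration_level(riders, max_batch=4))
```

In this example, three riders want 10 units, one wants 60, and a late-comer (E) arrives at time 5. With request-level scheduling, A, C and D are held until time 60 and E cannot start until the bike returns, finishing at 70. With iteration-level scheduling, A, C and D finish at 10, E takes a freed seat and finishes at 20, and B still finishes at 60.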

With Orca, large-scale models such as GPT-3 175B can perform generative tasks tens of times faster than on existing serving systems, and the cost of using models like GPT-3 drops to roughly one hundredth. By removing many of the challenges of serving large-scale models, Orca makes them accessible to a much wider range of users.

The research paper, “Orca: A Distributed Serving System for Transformer-Based Generative Models,” was presented on July 12 at OSDI 2022 (the 16th USENIX Symposium on Operating Systems Design and Implementation), a top conference in the field of computer systems. Orca is already being used in production.

“Not only is it important to acquire data to improve model learning, but maximizing the efficiency of the serving system itself allows users to make use of large-scale generative models like OpenAI’s GPT-3,” said Byung-Gon Chun, the CEO of FriendliAI. “I expect this research will increase opportunities to use large-scale models on a variety of products.”
