
Databricks Unified Analytics Platform Simplifies Distributed Deep Learning

Unified Analytics Leader Supports New Apache Spark 2.4; Introduces New Feature to Simplify Distributed Deep Learning

Databricks, the leader in unified analytics founded by the original creators of Apache Spark™, announced support for the newly released Apache Spark 2.4.0 within its Unified Analytics Platform. Databricks is the first unified analytics vendor to support Apache Spark 2.4, which ships as part of Databricks Runtime 5.0, now generally available. Databricks also introduced HorovodRunner, a key feature in Runtime 5.0 that further simplifies distributed deep learning.


The Apache Spark community made multiple valuable contributions to the Spark 2.4 release, which was introduced on November 8, 2018. In this release, Project Hydrogen substantially improves the performance and fault recovery of distributed deep learning and machine learning frameworks on Spark. It addresses a challenge data teams face: big data jobs and deep learning jobs are executed very differently. Spark excels at data processing at massive scale by splitting work into independent tasks, whereas deep learning training assumes complete coordination and dependency among tasks and is optimized for constant communication rather than for scalability and fault tolerance.


“Innovation continues to thrive within the Apache Spark community. Project Hydrogen is the most recent major initiative with an aim to provide first-class support for popular distributed machine learning frameworks on Apache Spark,” said Reynold Xin, co-founder at Databricks, Apache Spark PMC member and the top contributor to the project.


Within Apache Spark 2.4, Project Hydrogen introduces Barrier Execution, a new scheduling mode that allows practitioners to properly embed distributed deep learning training as an Apache Spark workload. Added Xin, “This is the largest change to Spark’s scheduler since the inception of the project. At Databricks, we also found additional opportunities to simplify the complexity of machine learning workloads. Within Databricks’ Unified Analytics Platform, which is powered by Spark 2.4, we created further optimizations to simplify distributed deep learning.”
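To illustrate the idea behind barrier-style execution, the following is a minimal sketch using only the Python standard library. It is an analogy, not Spark code and not Databricks' implementation: the function `run_barrier_stage` and its parameters are hypothetical names chosen for this example. In PySpark 2.4 the corresponding entry points are `RDD.barrier()` and `BarrierTaskContext`. The sketch shows a group of tasks that all start together, wait for one another at a synchronization point, and only then combine their values, as an all-reduce step in distributed training would.

```python
import threading


def run_barrier_stage(num_tasks, local_values):
    """Simulate one barrier stage (hypothetical helper, not a Spark API).

    Every task publishes a local value, waits until all peers have done
    the same, and then computes the average over all published values.
    """
    barrier = threading.Barrier(num_tasks)
    shared = [None] * num_tasks   # values published by each task
    results = [None] * num_tasks  # average as seen by each task

    def task(rank):
        # Phase 1: publish this task's local value (for example, a gradient).
        shared[rank] = local_values[rank]
        # Block until every task in the stage has reached this point.
        barrier.wait()
        # Phase 2: all peers have published, so reading `shared` is safe.
        results[rank] = sum(shared) / num_tasks

    threads = [threading.Thread(target=task, args=(r,)) for r in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_barrier_stage(4, [1.0, 2.0, 3.0, 4.0]))
```

Without the `barrier.wait()` call, a fast task could read `shared` before its peers had written to it. That dependency between tasks is what distinguishes this workload from ordinary independent Spark tasks, and it is why such tasks have to be scheduled together rather than one at a time.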
