StreamSets Expands Databricks Partnership Extending Ingestion Capabilities for Delta Lake
Databricks Data Ingestion Network Now Features StreamSets Integration to Enable Users to Easily Ingest, Integrate and Monitor Data into the Lakehouse
StreamSets, provider of the industry’s first DataOps platform, announced an expansion of its partnership with Databricks by participating in Databricks’ newly launched Data Ingestion Network. As part of the expanded partnership, StreamSets is offering a new connector for Delta Lake, an open source project that provides reliable data lakes at scale. With the connector, users can configure their pipelines to write data from any source, in batch or streaming mode, directly into Delta Lake. Data teams can now deliver all of their data in a shorter time frame to drive BI, analytics and ML.
Today, companies require systems that support diverse data applications such as real-time monitoring, machine learning and data science, and that can process unstructured data such as text, images, video and audio. A decade ago, data lakes replaced data warehouses as the preferred repositories for this raw data; however, they neither support transactions nor enforce data quality. They also lack consistency, making it almost impossible to mix batch and streaming jobs, or appends and reads.
Lakehouses combine the best of data warehouses and data lakes to remedy these limitations, but friction in ingesting fresh data remains. With this partnership, Databricks users will be able to capitalize on the new lakehouse paradigm without that friction. They can connect to StreamSets Cloud and use out-of-the-box connectors to load batch, change data capture (CDC) or streaming data from any source (such as cloud applications, relational data, on-premises data lakes and warehouses) into Delta Lake. With StreamSets, data engineers can build and operate data pipelines for modern and legacy data sources to migrate to a lakehouse and continuously refresh it with relevant data.
Specifically, the new StreamSets connector for Delta Lake offers several key benefits that give greater operational control over the full life cycle of data:
- Faster migration to the cloud with fewer data engineering resources
- Drag-and-drop interface to simplify data movement from multiple disparate sources
- Improved management of operations and performance for lakehouses
- Change-data-capture capability from several data sources into Delta Lake
- Built-in Kubernetes containerization and native cloud scaling
Combined with Delta Lake, which provides ACID transactions, the connector also makes it possible to unify batch and streaming data to support the timeliness of transactional operations.
“Databricks Ingest brings an opportunity for organizations to build a central lakehouse without worrying about repetitive data movement,” said Michael Hoff, senior vice president of Business Development and Partners at Databricks. “With StreamSets’ expanded support for Delta Lake, small and midsize companies now have an easy way to ingest data from their cloud-based service into Delta Lake so they can maximize their analytics efforts with fresh data in their lakehouse.”
“This connector is another step forward in our alliance with Databricks to deliver more data, faster, to drive traditional BI and machine learning initiatives — which is critical to the survival and success of today’s organizations,” said Jobi George, general manager of Cloud Business at StreamSets. “We’re excited to continue our work with Databricks to drive innovation in the industry.”
The connector is currently available for Databricks customers.