Monte Carlo Announces Delta Lake, Unity Catalog Integrations to Bring End-to-End Data Observability to Databricks
New metadata-focused integrations enable data teams to detect, resolve, and prevent data quality issues across data lake and data lakehouse environments.
Monte Carlo, the data reliability company, announced integrations with Delta Lake and Databricks’ Unity Catalog, becoming the first provider of end-to-end data observability across these data lake and lakehouse environments, down to the BI layer.
Traditionally, data lakes held raw data in its native format and were known for their flexibility, speed, and open source ecosystem. By design, data was less structured with limited metadata and no atomicity, consistency, isolation, and durability (ACID) properties.
As a result, data quality has been particularly challenging in data lake environments: they often hold large amounts of unstructured data, making data issues difficult to detect, resolve, and prevent.
Delta Lake and Unity Catalog enable Databricks users to add more structure and metadata to their data lake and lakehouse deployments. The Monte Carlo data observability platform can now leverage that metadata to automatically detect data freshness, volume, and schema anomalies across structured and unstructured data in their environment via machine learning.
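To make the metadata-driven approach concrete, here is a minimal, illustrative sketch of how a freshness anomaly might be flagged from table metadata alone. This is not Monte Carlo's actual algorithm; the z-score baseline and the sample update gaps are assumptions for the example.

```python
from statistics import mean, stdev

def is_anomalous(history: list[float], current: float, threshold: float = 3.0) -> bool:
    """Flag `current` if it deviates more than `threshold` standard
    deviations from the historical baseline (a simple z-score check)."""
    if len(history) < 2:
        return False  # not enough history to model a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold

# Hypothetical metadata: past gaps (hours) between updates to a table
# that is normally refreshed hourly.
update_gaps = [1.0, 1.1, 0.9, 1.2, 1.0, 0.95]

print(is_anomalous(update_gaps, 8.0))   # table gone stale -> True
print(is_anomalous(update_gaps, 1.05))  # within normal range -> False
```

The same pattern extends to volume (row counts per partition) and schema signals, which is why richer metadata from Delta Lake and Unity Catalog makes this kind of monitoring possible without scanning the underlying data.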
Additional opt-in monitors provide more granular, customized coverage for key assets and critical tables, such as monitoring data distributions and statistics.
“With Monte Carlo, my team is better positioned to understand the impact of a detected data issue and decide on the next steps like stakeholder communication and resource prioritization. Monte Carlo’s end-to-end lineage helps the team draw these connections between critical data tables and the Looker reports, dashboards, and KPIs the company relies on to make business decisions,” said Satish Rane, head of data engineering, ThredUp. “I’m excited to leverage Monte Carlo’s data observability for our Databricks environment.”
With these integrations, Databricks customers can now:
- Achieve end-to-end data observability across the lake or lakehouse. Get end-to-end data observability for Databricks data pipelines with a quick, no-code implementation process. Access out-of-the-box visibility into data freshness, volume, distribution, schema, and lineage just by plugging Monte Carlo into Databricks metastores, Unity Catalog, or Delta Lake.
- Know when data breaks, as soon as it happens. Monte Carlo continuously monitors your Databricks assets and proactively alerts stakeholders to data issues. Monte Carlo’s machine learning-first approach gives data teams broad coverage for common data issues with minimal configuration, and business-context-specific checks layered on top ensure coverage at each stage of the data pipeline.
- Find the root cause of data quality issues, fast. Monte Carlo gives teams a single pane of glass to investigate data issues, drastically reducing time to resolution. By bringing all information and context for pipelines into one place, teams spend less time firefighting data issues and more time improving the business.
“Metadata is a data lake’s secret weapon, and Monte Carlo is thrilled to be partnering with Databricks to help our mutual customers take advantage of it and bring their data reliability to the next level,” said Lior Gavish, co-founder and CTO, Monte Carlo. “When you combine the performance and flexibility of the data lake with high levels of data trust, it becomes a powerful foundation from which data teams can launch incredible projects and data products.”
Later this year, Monte Carlo plans to introduce support for end-to-end field-level Spark lineage, which maps how data assets are connected within the Databricks environment so teams can gain full visibility into their pipelines for root cause analysis and understand how issues impact downstream reports and dashboards.
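The downstream-impact question that lineage answers can be sketched as a simple graph traversal. The table and dashboard names below are hypothetical, and this is an illustration of the general technique rather than Monte Carlo's implementation.

```python
from collections import deque

# Hypothetical lineage edges: upstream asset -> directly downstream assets.
lineage = {
    "raw.events": ["silver.sessions"],
    "silver.sessions": ["gold.daily_kpis"],
    "gold.daily_kpis": ["looker.revenue_dashboard"],
}

def downstream_assets(start: str) -> set[str]:
    """Breadth-first traversal collecting every asset reachable from `start`,
    i.e., everything a data issue in `start` could affect."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# An incident in the raw landing table reaches every layer, including BI.
print(downstream_assets("raw.events"))
```

With field-level lineage, the nodes become individual columns rather than whole tables, so the blast radius of an incident can be scoped even more precisely.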