Observe and Monitor Data at the Edge

Lumadata is designed to monitor data at the end of the data pipeline... right where data is consumed.

Data Observability

Our data observability strategy aligns data engineering teams with real-world business use cases.

Overview

Data quality tools get it wrong. They focus on testing data as it moves through the data pipeline, and they require large data QA teams to build complex test sets. The tests are "in the weeds" of the data and not focused on business outcomes, resulting in a thicket of tests that test everything but don't catch even basic faults in data. This strategy has missed the mark for decades. Lumadata takes a completely different approach.

Lumadata focuses on observing data at the end of the data pipeline. Lumadata continuously monitors data in the data warehouse after it has been delivered to ensure it still matches data that was previously validated. Lumadata compares incremental data loads to source systems to ensure recently added data matches the source. Lumadata's catalog is built from dashboards in your BI tools, extracts that serve data scientists, queries that update spreadsheets, and feeds that deliver data to third parties. In other words, Lumadata is tightly aligned to actual business use cases. Because the observation strategies we use match the queries used by data consumers, Lumadata's approach gives you confidence that the data served to the business is accurate.

Snapshot Strategies

The vast majority of the data in your data warehouse should not change. Snapshot observations are created from aggregate queries that represent a wide cut across static datasets. The snapshot query is usually limited by a filter that "cuts off" the dataset at the point at which business rules dictate that data shouldn't change. Consider, for example, sales data in your data warehouse. The sales table may receive new data daily, but the historic data in the table (probably more than 90% of the table's volume) does not, and should not, change.

Snapshot observations are built on queries from business use cases: dashboards, reports, predictive models, and so on. The queries are filtered to limit the result set to the static component of the data. Lumadata takes a snapshot of the static data returned from these aggregate queries. Business teams validate and approve the snapshot in Lumadata, giving their official sign-off that the data is correct. The approval is documented so you always know which snapshot was approved. Then, on a schedule or on demand, Lumadata reruns the snapshot queries and verifies that the results match the data that was previously captured and approved.
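The snapshot workflow described above can be sketched in a few lines of Python. This is an illustration only, not Lumadata's implementation: the `sales` table, the query, the cutoff date, and the helper names `take_snapshot` and `matches_approved` are all assumptions, and an in-memory SQLite database stands in for the data warehouse.

```python
import hashlib
import json
import sqlite3

# Hypothetical snapshot query: monthly sales totals, cut off at the date
# before which business rules say the data should no longer change.
SNAPSHOT_QUERY = """
    SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS total
    FROM sales
    WHERE sale_date < ?
    GROUP BY month
    ORDER BY month
"""

def take_snapshot(conn, cutoff):
    """Run the aggregate query and return its rows plus a fingerprint."""
    rows = conn.execute(SNAPSHOT_QUERY, (cutoff,)).fetchall()
    digest = hashlib.sha256(json.dumps(rows).encode()).hexdigest()
    return {"cutoff": cutoff, "rows": rows, "digest": digest}

def matches_approved(conn, approved):
    """Rerun the query and compare it to the approved snapshot."""
    current = take_snapshot(conn, approved["cutoff"])
    return current["digest"] == approved["digest"]

# Stand-in warehouse with a small, made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2023-01-15", 100), ("2023-01-20", 250),
     ("2023-02-03", 75), ("2023-03-01", 40)],
)

# The business team reviews and approves this snapshot.
approved = take_snapshot(conn, "2023-03-01")

# A new daily load lands after the cutoff: the snapshot still matches.
conn.execute("INSERT INTO sales VALUES ('2023-03-02', 500)")
ok_after_new_load = matches_approved(conn, approved)

# A historic row is silently altered: the snapshot no longer matches.
conn.execute("UPDATE sales SET amount = 999 WHERE sale_date = '2023-01-15'")
ok_after_drift = matches_approved(conn, approved)
```

In this sketch the cutoff filter is what lets new data arrive without disturbing the approved result, while any change to historic rows alters the fingerprint and fails the comparison.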

Live Observation Strategies

A portion of the data in data warehouses changes rapidly and is delivered automatically to business teams for use. This is usually the most timely data and the most opportune for business teams to act on, but it is also the highest risk because no one in the business or in engineering validates it. It is often delivered on the assumption that no errors were reported in the data pipeline, so everything must be fine... right? Right? Live data observations focus on validating this quickly changing, rapidly delivered data. Live observations query rapidly changing data from the data warehouse and compare it to live data from source systems. Each one pairs a data warehouse query, limited in scope to only the portion of data that is changing quickly, with an equivalent query against the source system where the data originates. Both queries are run and their results are compared. Live data observations verify that data delivered to the data warehouse is accurate by going back to the source.
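As a rough sketch of the paired-query comparison, the Python below runs the same aggregate query against a source system and a warehouse and reports the days on which they disagree. Everything here is assumed for illustration: the `orders` table, the query, and the `compare_live` helper are hypothetical, and two in-memory SQLite databases stand in for the real systems (in practice the two queries would likely differ in dialect and schema).

```python
import sqlite3

# Hypothetical query scoped to the recent, fast-changing slice of data.
RECENT_QUERY = """
    SELECT order_date, COUNT(*) AS orders, SUM(amount) AS total
    FROM orders
    WHERE order_date >= ?
    GROUP BY order_date
    ORDER BY order_date
"""

def make_db(rows):
    """Create an in-memory stand-in for a source system or warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn

def compare_live(source, warehouse, since):
    """Run the query on both sides and return the days that differ."""
    src = {r[0]: r[1:] for r in source.execute(RECENT_QUERY, (since,))}
    wh = {r[0]: r[1:] for r in warehouse.execute(RECENT_QUERY, (since,))}
    return {
        day: {"source": src.get(day), "warehouse": wh.get(day)}
        for day in sorted(src.keys() | wh.keys())
        if src.get(day) != wh.get(day)
    }

source_rows = [
    (1, "2023-06-01", 100),
    (2, "2023-06-02", 50),
    (3, "2023-06-02", 70),
]
source = make_db(source_rows)
warehouse = make_db(source_rows[:2])  # order 3 never reached the warehouse

mismatches = compare_live(source, warehouse, "2023-06-01")
```

Here the pipeline reported no error, yet the comparison flags 2023-06-02 because the warehouse holds one order where the source holds two.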

Profile Observation Strategies

Other types of issues crop up in data delivery that require a different observation strategy. Profile observations are built on demographics, or profiles, of data in your data warehouse. Lumadata uses statistical methods to build profiles and then, based on the type of profile and the unique qualities of each datum, generates a customizable set of checks to ensure characteristics of the data do not change unexpectedly. Profile observations include strategies to ensure recency, for example. If Lumadata determines that a set of data in your data warehouse is updated daily, it will create a check to ensure that new data appears daily. Lumadata will also check numeric values for outliers, check for missing data values, and check for suspicious changes in data density. Profiles supplement the matching-style strategies above, which form the core of Lumadata's platform.
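The kinds of checks named above (recency, missing values, outliers) might look something like the following Python sketch. The text does not say which statistical methods Lumadata uses, so the choices here are assumptions: the outlier check uses a modified z-score based on the median absolute deviation, and the function names, thresholds, and sample data are all hypothetical.

```python
from datetime import date
from statistics import median

def check_recency(load_dates, today, max_gap_days=1):
    """True if the newest load is no older than the expected cadence."""
    return (today - max(load_dates)).days <= max_gap_days

def missing_rate(values):
    """Fraction of values that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def find_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold.

    Uses the median absolute deviation (MAD), which is less distorted
    by extreme values than the mean and standard deviation.
    """
    present = [v for v in values if v is not None]
    med = median(present)
    mad = median(abs(v - med) for v in present)
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in present if 0.6745 * abs(v - med) / mad > threshold]

# Made-up profile inputs for a table that is expected to load daily.
load_dates = [date(2023, 6, 1), date(2023, 6, 2), date(2023, 6, 3)]
amounts = [100, 102, 98, 101, None, 99, 5000]

fresh = check_recency(load_dates, today=date(2023, 6, 4))
stale = check_recency(load_dates, today=date(2023, 6, 6))
outliers = find_outliers(amounts)
null_share = missing_rate(amounts)
```

With this sample data, the load is considered fresh on 2023-06-04 but stale by 2023-06-06, one of seven values is missing, and 5000 is the only value flagged as an outlier.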

Learn More

Contact Us
  • Email: info@lumadata.io
  • Phone: (+1) 844-999-LUMA (5862)
  • Mail

    Sail for the Sun LLC DBA Lumadata
    Jacksonville, FL 32043
    United States of America

© 2023 Lumadata. All rights reserved.