RESTful API
Integrate Lumadata observability directly into your data pipeline.
Use our API to run data observations after your data pipeline jobs. Run observations synchronously or asynchronously and drive your pipeline based on outcomes.
Overview
Lumadata provides a RESTful API for triggering data observation strategies on demand from within your data pipeline flows. This lets you integrate Lumadata into your data load process and stop processing when an observation finds incorrect data. Each triggered run can either create alerts in Lumadata or run silently, without generating alerts.
Set up your API keys
To connect to the REST API, you will need to enable API access within your account. In the Lumadata UI, go to Settings -> API and webhook and click Manage API Keys. Here you can enable the API and generate your access and secret keys. The secret key is shown only ONCE and cannot be recovered afterwards, so store it in a secrets vault and inject it into your code at build or run time.
Connecting to the API
The REST API uses JSON Web Tokens (JWT) to verify the authenticity of requests received by the Lumadata platform. To connect, you will need a library that can build properly signed JWTs. For more information about JWT, see http://jwt.io.
Lumadata's API is available at https://api.lumadata.io.
To create an access token, build the JSON payload below and encode (sign) it with your API secret key.
{
"api_key": "YOUR ACCESS KEY GOES HERE"
}
The encode step produces a properly formatted JWT. Send it as a bearer token in the Authorization header of your API requests:
Authorization: Bearer eyJhbGciOiJIUzI1N………
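The token steps above can be sketched in Python using only the standard library. This is a minimal sketch, not an official Lumadata client. It assumes HS256 signing (the sample token's header prefix, eyJhbGciOiJIUzI1N, decodes to {"alg":"HS25..., which is consistent with HS256) and a payload containing only api_key. The environment variable names LUMADATA_ACCESS_KEY and LUMADATA_SECRET_KEY are hypothetical; substitute however your vault injects secrets.

```python
import base64
import hashlib
import hmac
import json
import os


def _b64url(data: bytes) -> str:
    # JWT segments use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_lumadata_token(access_key: str, secret_key: str) -> str:
    """Build an HS256-signed JWT whose payload is {"api_key": access_key}."""
    header = {"alg": "HS256", "typ": "JWT"}  # assumption: HS256 signing
    payload = {"api_key": access_key}
    # header.payload, each compact-JSON-encoded and base64url-encoded
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    # HMAC-SHA256 over the signing input, keyed with the secret key
    signature = hmac.new(
        secret_key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def auth_headers() -> dict:
    # Hypothetical variable names: the keys are injected at build or run time.
    token = build_lumadata_token(
        os.environ["LUMADATA_ACCESS_KEY"], os.environ["LUMADATA_SECRET_KEY"]
    )
    return {"Authorization": f"Bearer {token}"}
```

If you already use a JWT library such as PyJWT, jwt.encode({"api_key": access_key}, secret_key, algorithm="HS256") should produce an equivalent token.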
Available REST API routes
Running data observations after a DAG or other data load process completes is a common use case for Lumadata. To support this, Lumadata exposes a route for invoking data observations, with several options that control how each observation is executed.
Invoke a data observation
To invoke a data observation, send a POST request to the /test/run-now route. A sample payload is below. Not all fields are required; each field is explained after the sample.
{
"data_test_key": "6e2ed507-e369-4d4c-a4ad-52325ed9fc91",
"data_test_name": "Snap Test",
"data_test_run_log_auto_generate_alert_flag": true,
"data_test_run_synchronously": true
}
- data_test_key OR data_test_name
- Use one of these fields or the other, not both; the sample payload above includes both only to show the field names. In most cases you'll use data_test_name, which must exactly match the name of your data observation strategy in Lumadata.
- data_test_run_log_auto_generate_alert_flag
- When integrating Lumadata into your daily data load jobs, you should set this flag to true. This will ensure that if an issue is detected, an alert is created in the Lumadata platform.
- data_test_run_synchronously
- When set to true, the data observations API waits for the observation to complete before returning an HTTP response. The response includes the result of the observation as a run log entry, in the same format as the run log shown under "Retrieving execution status and run results". The raw_data array in the output field is limited to 10 rows regardless of how many were recorded, to prevent excessively large HTTP responses. The observation run is recorded in Lumadata and can be viewed in the Lumadata UI. When set to false, the API responds without waiting for the observation to finish, returning the keys and fetch URL needed to check on the run, as in the example below.
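A request to the run-now route could be assembled and sent as follows. This is a hedged sketch: the field names come from the list above, but the helper names, the default of asynchronous execution, and the Content-Type: application/json header are assumptions, as is calling the route without a company_uuid query parameter (the sample responses show one in requestUrl, but this page does not say whether it is required). The headers argument is expected to carry the Authorization: Bearer token.

```python
import json
import urllib.request
from typing import Optional

# Base URL and route as given on this page.
RUN_NOW_URL = "https://api.lumadata.io/test/run-now"


def build_run_now_payload(
    data_test_name: Optional[str] = None,
    data_test_key: Optional[str] = None,
    generate_alert: bool = True,
    run_synchronously: bool = False,
) -> dict:
    """Build a run-now request body from the documented fields."""
    # Send data_test_name OR data_test_key: never both, never neither.
    if (data_test_name is None) == (data_test_key is None):
        raise ValueError("provide exactly one of data_test_name or data_test_key")
    payload = {
        "data_test_run_log_auto_generate_alert_flag": generate_alert,
        "data_test_run_synchronously": run_synchronously,
    }
    if data_test_name is not None:
        payload["data_test_name"] = data_test_name
    else:
        payload["data_test_key"] = data_test_key
    return payload


def post_run_now(payload: dict, headers: dict, timeout: float = 300.0) -> dict:
    """POST the payload and return the decoded JSON envelope."""
    request = urllib.request.Request(
        RUN_NOW_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={**headers, "Content-Type": "application/json"},  # assumed
        method="POST",
    )
    # A generous timeout leaves room for synchronous runs, which block until done.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, post_run_now(build_run_now_payload(data_test_name="Snap Test"), headers) would trigger the strategy named Snap Test asynchronously with alerting enabled.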
Example JSON response when invoking an observation asynchronously (data_test_run_synchronously set to false):
{
"control": {
"requestUrl": "/test/ad8642cf-ac30-40f1-87ac-b2a60d806a79/run-now?company_uuid=ccb66502-700b-425a-a287-ea9148451892",
"requestTimestamp": 1683861421411,
"requestMethod": "POST",
"requestData": {},
"requestContext": {},
"responseTimestamp": 1683861423261,
"responseStatus": 200,
"responseMessages": []
},
"data": {
"successful": true,
"response": {
"data_test_key": "ad8642cf-ac30-40f1-87ac-b2a60d806a79",
"data_test_run_log_key": "6c034650-47a5-4571-a85d-6a186d6a8292",
"invocation": "465fc27d-b465-4584-9463-3a9c8522e4f4",
"fetch_run_status": "/test/ad8642cf-ac30-40f1-87ac-b2a60d806a79/run-log/6c034650-47a5-4571-a85d-6a186d6a8292?company_uuid=ccb66502-700b-425a-a287-ea9148451892"
}
}
}
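The asynchronous response can be turned into an absolute fetch URL as sketched below. This is a minimal sketch that assumes fetch_run_status is always a path relative to https://api.lumadata.io (as in the sample) and treats a non-200 control.responseStatus or a false data.successful as an error; the function name parse_async_response is hypothetical.

```python
import json
from urllib.parse import urljoin

BASE_URL = "https://api.lumadata.io"


def parse_async_response(body: str) -> dict:
    """Extract the run identifiers and an absolute fetch URL from a run-now response."""
    envelope = json.loads(body)
    control = envelope.get("control", {})
    data = envelope.get("data", {})
    # Assumption: responseStatus and successful together signal success.
    if control.get("responseStatus") != 200 or not data.get("successful"):
        raise RuntimeError(f"run-now failed: {control.get('responseMessages')}")
    response = data["response"]
    return {
        "data_test_key": response["data_test_key"],
        "data_test_run_log_key": response["data_test_run_log_key"],
        # fetch_run_status is a host-relative path; urljoin keeps its query string.
        "fetch_url": urljoin(BASE_URL, response["fetch_run_status"]),
    }
```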
Retrieving execution status and run results
When invoking a data observation asynchronously, the response from the Lumadata API includes a fetch URL (fetch_run_status) along with the keys that identify the run. Send a GET request to the fetch URL to retrieve the run status and output of the data observation execution. The route to fetch a data observation run log requires the data observation key (data_test_key), the run log key (data_test_run_log_key), and your company key, all of which are already embedded in the fetch URL: /test/{data_test_key}/run-log/{data_test_run_log_key}?company_uuid={company_key}
When invoking a data observation synchronously, the response is the run log entry itself. Both the synchronous response and the fetched run log are structured as in the JSON below.
{
"control": {
"requestUrl": "/test/ad8642cf-ac30-40f1-87ac-b2a60d806a79/run-log/71a0ee80-d2f1-448d-9d38-83115ae2aff3?company_uuid=ccb66502-700b-425a-a287-ea9148451892",
"requestTimestamp": 1683861208042,
"requestMethod": "GET",
"requestData": {},
"requestContext": {},
"responseTimestamp": 1683861208172,
"responseStatus": 200,
"responseMessages": [],
"responsePage": 0,
"responsePageSize": 1,
"responseRowCount": 10
},
"data": {
"data_test_run_log_key": "71a0ee80-d2f1-448d-9d38-83115ae2aff3",
"company_key": "ccb66502-700b-425a-a287-ea9148451892",
"data_test_definition_key": "cfd61cd4-cea2-473d-ab48-c887d343b340",
"data_test_run_log_status": "Failed",
"data_test_run_log_output": {
"impact_summary": {
"issue_impact": 153.6,
"issue_count": 43,
"issue_column_count": 4,
"issue_list": [
{
"column_name": "orders",
"column_total": null,
"issue_count": 4,
"variance_percentage": null,
"variance_total": null
},
{
"column_name": "units_sold",
"column_total": null,
"issue_count": 4,
"variance_percentage": null,
"variance_total": null
},
{
"column_name": "unit_cost",
"column_total": null,
"issue_count": 4,
"variance_percentage": null,
"variance_total": null
},
{
"column_name": "total_revenue",
"column_total": null,
"issue_count": 4,
"variance_percentage": null,
"variance_total": null
}
]
},
"raw_data": [
{
"source_data": {
"order_date": "2017-07-28",
"orders": "30",
"units_sold": 163088,
"unit_price": 8030,
"unit_cost": 5650,
"total_revenue": 44179947,
"total_profit": 13635269
},
"target_data": {},
"message": "Data was found in the Data Warehouse dataset that is not in the ERP dataset."
},
{
"source_data": {
"order_date": "2017-07-27",
"orders": "42",
"units_sold": 240765,
"unit_price": 11005,
"unit_cost": 7652,
"total_revenue": 60150662,
"total_profit": 19553451
},
"target_data": {},
"message": "Data was found in the Data Warehouse dataset that is not in the ERP dataset."
},
{
"source_data": {
"order_date": "2017-07-26",
"orders": "42",
"units_sold": 222418,
"unit_price": 10947,
"unit_cost": 7657,
"total_revenue": 58720601,
"total_profit": 18202881
},
"target_data": {},
"message": "Data was found in the Data Warehouse dataset that is not in the ERP dataset."
},
............
{
"source_data": {
"order_date": "2017-07-20",
"orders": "27",
"units_sold": 140511,
"unit_price": 7561,
"unit_cost": 5545,
"total_revenue": 33733654,
"total_profit": 9059805
},
"target_data": {},
"message": "Data was found in the Data Warehouse dataset that is not in the ERP dataset."
},
{
"source_data": {
"order_date": "2017-07-19",
"orders": "42",
"units_sold": 205311,
"unit_price": 13107,
"unit_cost": 9578,
"total_revenue": 63412804,
"total_profit": 16931176
},
"target_data": {},
"message": "Data was found in the Data Warehouse dataset that is not in the ERP dataset."
}
]
},
"data_test_run_log_run_start": "2023-05-12T01:56:52.505Z",
"data_test_run_log_run_finish": "2023-05-12T01:56:53.646Z",
"data_test_run_log_run_by": null,
"data_test_run_log_system_initiated": false,
"data_test_run_log_last_run_flag": false,
"data_test_run_log_count_of_issues": 43
}
}
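To drive a pipeline from the outcome, a job can poll the fetch URL and then decide whether to continue. This sketch makes two loud assumptions: that data_test_run_log_run_finish stays null until the run completes (this page only shows a finished run), and that a run with status Failed or a nonzero data_test_run_log_count_of_issues should stop the pipeline. The function names are hypothetical, and headers is expected to contain the Authorization: Bearer token.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_run_log(fetch_url: str, headers: dict) -> dict:
    """GET the run log at the fetch URL and return its data object."""
    request = urllib.request.Request(fetch_url, headers=headers, method="GET")
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))["data"]


def wait_for_run_log(
    fetch: Callable[[], dict], poll_seconds: float = 10.0, max_polls: int = 60
) -> dict:
    """Poll until the run log reports a finish time (assumed null until done)."""
    for _ in range(max_polls):
        run_log = fetch()
        if run_log.get("data_test_run_log_run_finish"):
            return run_log
        time.sleep(poll_seconds)
    raise TimeoutError("data observation did not finish in time")


def should_continue_pipeline(run_log: dict) -> bool:
    """Gate downstream steps: stop on a Failed status or any recorded issues."""
    if run_log.get("data_test_run_log_status") == "Failed":
        return False
    # Use the reported issue count, not len(raw_data), which is capped at 10 rows.
    return (run_log.get("data_test_run_log_count_of_issues") or 0) == 0
```

The gate reads data_test_run_log_count_of_issues rather than counting raw_data entries because raw_data is capped at 10 rows (the sample above reports 43 issues). Passing the fetch as a callable, e.g. wait_for_run_log(lambda: fetch_run_log(fetch_url, headers)), keeps the polling loop testable without network access.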