Define and enforce data validity rules like NOT NULL, UNIQUE, and custom patterns
For LLM agents: documentation index at /llms.txt, full text at /llms-full.txt. Append .md to any page URL for plain markdown.
The Validity API enables programmatic management of data validity rules. Use it to enforce data quality constraints, detect invalid records, and integrate validation into your pipelines.
Use NOT_NULL for required columns, UNIQUE for primary-key-like invariants, REGEX for string formats (emails, IDs), RANGE for numeric bounds, ENUM for finite allowed sets, DATE_FORMAT for date strings, and CUSTOM_SQL when no built-in type fits. Each type’s required rule_config is listed in the Rule Types table at the top of this page.
How do I inspect rows that failed a validity rule?
Every check response includes invalid_samples.samples, up to sample_limit rows (default 10, configurable per check). Bump sample_limit up to 100 when debugging a broken ingest. The response also returns invalid_count and invalid_percent so you can report a failure rate even when individual samples aren’t needed.
What does the alert_threshold_percent field control?
alert_threshold_percent is the invalid-row percentage that flips a check result from pass to fail in manual mode. Set it to 0 if any single invalid row should page you. Use higher values (e.g. 1.0) on rules where a small amount of invalid data is tolerated and you only want to catch systemic regressions. It is ignored in auto mode.
When should I use detection_mode auto instead of a fixed threshold?
Use auto when you don’t know the right threshold up front, or when a column always carries some baseline level of invalid values that you don’t want to alert on. In auto mode the rule learns the column’s historical invalid_percent distribution (with weekly seasonality once enough history accrues) and alerts only when the current check is an anomalously high deviation from that baseline. It needs at least 7 prior check results before it starts evaluating; until then it records results without alerting (a cold-start period). Lower sensitivity values (toward 1) flag smaller deviations and alert more often; higher values (toward 4) tolerate wider swings. A normal, in-baseline invalid rate stays pass even when invalid_count is greater than zero. Each auto-mode result also returns the learned band as expected_value, expected_range_low, and expected_range_high.
operating_period_mode makes the auto baseline business-hours aware so closed-period quiet does not contaminate the open-period band. With off (the default) the rule pools all prior checks into one baseline. With schedule it segments the baseline using a linked operating schedule (declared days, hours, and timezone) referenced by operating_schedule_id. With auto it learns an active/dormant calendar from the rule’s own history. An active-period check is compared only against prior active checks; a dormant-period check never alerts on a low invalid rate and alerts only on unexpected activity above a near-zero floor. It composes with detection_mode="auto" and, like the baseline, takes effect once enough history accrues. Get the schedule UUID from the operating-schedules UI or the /operating-schedules API.
How is treat_null_as_valid different from using NOT_NULL?
treat_null_as_valid governs how a non-NULL-type rule (REGEX, RANGE, ENUM, etc.) handles NULLs. When true, NULL rows are skipped; when false, NULLs count as invalid. Combine with a separate NOT_NULL rule when you need to enforce both non-null and format at once - they surface as two distinct checks you can alert on independently.