# Data Quality Metrics

> Monitor null percentages, distinct counts, and other column-level statistics to detect data quality issues

Data quality metrics let you track statistical properties of your columns over time. AnomalyArmor captures metric values on a schedule, builds historical baselines, and automatically detects when values fall outside expected ranges.

<Note>
  **Looking for row count monitoring?** Use [Row Count Monitoring](/data-quality/row-count-monitoring) for tracking row counts with ML-based anomaly detection or explicit thresholds.
</Note>

<Note>
  **Prerequisites**: Before creating metrics, you need:

  * A [connected data source](/data-sources/overview) with discovery completed
  * At least one asset (table/view) to monitor
</Note>

**Example scenario:** The `customer_email` column normally has \~3% null values. On Jan 30, the null percentage jumps to 12.3%, well outside the expected band. AnomalyArmor flags this as an anomaly, which points to a potential data quality issue in the source system.

## Why Use Metrics

Freshness tells you *when* data was updated. Completeness tells you *how much* arrived. Metrics tell you *what changed* at the column level:

| Issue                      | Freshness     | Completeness   | Metrics        |
| -------------------------- | ------------- | -------------- | -------------- |
| ETL job failed completely  | Detects it    | Detects it     | Detects it     |
| ETL ran but loaded 0 rows  | Might miss it | **Catches it** | N/A            |
| Data loaded but 50% nulls  | Misses it     | Misses it      | **Catches it** |
| Unexpected duplicates      | Misses it     | Misses it      | **Catches it** |
| Values outside valid range | Misses it     | Misses it      | **Catches it** |

* **Use freshness for "did data arrive on time?"**
* **Use row count monitoring for "did the right amount of data arrive?"**
* **Use metrics for "is the column-level data quality correct?"**

## Metric Types

All metrics require a specific column to monitor:

| Type              | Description               | Best For               |
| ----------------- | ------------------------- | ---------------------- |
| `null_percent`    | Percentage of null values | Detecting missing data |
| `distinct_count`  | Count of unique values    | Cardinality monitoring |
| `duplicate_count` | Count of repeated values  | Data quality checks    |
| `min_value`       | Minimum numeric value     | Range validation       |
| `max_value`       | Maximum numeric value     | Outlier detection      |
| `mean`            | Average numeric value     | Central tendency       |
| `percentile`      | Nth percentile value      | Distribution analysis  |
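
As a rough illustration of what a few of these metric types measure, the sketch below computes them with plain SQL aggregates against an in-memory SQLite table. The table, the sample rows, and the exact definition of `duplicate_count` (non-null rows minus distinct values) are assumptions made for the example, not AnomalyArmor's actual queries.

```python
import sqlite3

# Hypothetical sample table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",), (None,)],
)

# One aggregate query per capture: only numeric results come back,
# never individual row values.
null_percent, distinct_count, duplicate_count = conn.execute(
    """
    SELECT
        100.0 * SUM(CASE WHEN customer_email IS NULL THEN 1 ELSE 0 END) / COUNT(*),
        COUNT(DISTINCT customer_email),
        -- Assumed definition: non-null rows minus distinct values.
        COUNT(customer_email) - COUNT(DISTINCT customer_email)
    FROM customers
    """
).fetchone()

print(null_percent, distinct_count, duplicate_count)  # 25.0 2 1
```

One of four rows is null (25%), two distinct non-null emails exist, and one row repeats an earlier value.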

## Creating a Metric

<Steps>
  <Step title="Navigate to the Asset">
    Go to **Assets** and select the table you want to monitor.
  </Step>

  <Step title="Open Metrics Tab">
    Click the **Metrics** tab on the asset detail page.
  </Step>

  <Step title="Create New Metric">
    Click **Create Metric** to open the metric configuration form.
  </Step>

  <Step title="Select Metric Type">
    Choose the type of metric you want to track:

    * **null\_percent**: Percentage of null values in a column
    * **distinct\_count**: Number of unique values
    * **duplicate\_count**: Number of duplicate values
    * **min\_value / max\_value / mean**: Numeric range and central tendency
    * **percentile**: Distribution analysis

    <Tip>
      Need to monitor row counts? Use [Row Count Monitoring](/data-quality/row-count-monitoring) instead.
    </Tip>
  </Step>

  <Step title="Configure Capture Interval">
    Choose how often to capture the metric:

    | Interval | Best For                              |
    | -------- | ------------------------------------- |
    | Hourly   | High-frequency data, real-time tables |
    | Daily    | Most batch ETL pipelines              |
    | Weekly   | Slowly changing data                  |
  </Step>

  <Step title="Enable Anomaly Detection">
    Toggle **Anomaly Detection** on and set sensitivity:

    | Sensitivity | Meaning                        | Use When               |
    | ----------- | ------------------------------ | ---------------------- |
    | 1.0         | Alert at 1 standard deviation  | Very sensitive         |
    | 2.0         | Alert at 2 standard deviations | Balanced (recommended) |
    | 3.0         | Alert at 3 standard deviations | Less sensitive         |

    <Tip>
      Start with sensitivity 2.0. Adjust based on false positive rate.
    </Tip>
  </Step>

  <Step title="Save Metric">
    Click **Create** to save the metric. The first capture will run immediately.
  </Step>
</Steps>

## Viewing Metric History

Each metric tracks historical values and displays them as a trend chart:

* **Value line**: Actual metric values over time
* **Anomaly band**: Expected range (mean +/- sensitivity \* stddev)
* **Anomaly points**: Values outside the band are flagged
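
The band formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming a sample standard deviation and a simple list of past captures. The function names and the history values are hypothetical, and the product's actual baseline logic may differ.

```python
from statistics import mean, stdev


def anomaly_band(history, sensitivity=2.0):
    """Return (lower, upper) as mean +/- sensitivity * stddev of the history."""
    center = mean(history)
    spread = stdev(history)  # sample standard deviation (an assumption)
    return center - sensitivity * spread, center + sensitivity * spread


def is_anomaly(value, history, sensitivity=2.0):
    """True when the value falls outside the band built from the history."""
    lower, upper = anomaly_band(history, sensitivity)
    return value < lower or value > upper


# Illustrative null_percent captures hovering around 3%.
history = [2.8, 3.1, 3.0, 2.9, 3.2, 3.0, 3.1]

print(is_anomaly(12.3, history))  # True: far above the band
print(is_anomaly(3.05, history))  # False: inside the band
```

A larger `sensitivity` widens the band, so fewer values are flagged.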

### Reading the Chart

| Indicator              | Meaning            |
| ---------------------- | ------------------ |
| Green line within band | Normal values      |
| Red dot outside band   | Anomaly detected   |
| Gray dashed lines      | Upper/lower bounds |

## Which Metric Type Should I Use?

<AccordionGroup>
  <Accordion title="Is my table growing or shrinking unexpectedly?">
    Use [Row Count Monitoring](/data-quality/row-count-monitoring). It provides ML-based pattern learning, time-windowed counting, and explicit threshold support for row count monitoring.
  </Accordion>

  <Accordion title="Are there unexpected null values?">
    Use **null\_percent** on the column that shouldn't have nulls.

    Example: Monitor `customer_email` for null percentage. Alert if the null percentage exceeds its historical baseline (for example, a jump from 2% to 15%).
  </Accordion>

  <Accordion title="Are values within expected range?">
    Use **min\_value** and **max\_value** on numeric columns.

    Example: Monitor `price` column. Alert if minimum drops below 0 (invalid) or maximum exceeds historical norms.
  </Accordion>

  <Accordion title="Is data being duplicated?">
    Use **duplicate\_count** on columns that should be unique.

    Example: Monitor `order_id` for duplicates. Any duplicates indicate a data quality issue.
  </Accordion>

  <Accordion title="How many unique values exist?">
    Use **distinct\_count** on categorical columns.

    Example: Monitor `country_code` distinct count. A sudden increase might indicate invalid data.
  </Accordion>
</AccordionGroup>

## Best Practices

### Start with High-Impact Metrics

Focus on metrics that catch real problems:

**Critical table (orders):**

* **Completeness**: Catch data loss or duplication (see [Row Count Monitoring](/data-quality/row-count-monitoring))
* **null\_percent** on `order_id`: Should never be null
* **null\_percent** on `customer_id`: Should never be null
* **min\_value** on `total_amount`: Should never be negative

### Match Capture Interval to Data Freshness

| Data Update Pattern | Recommended Interval |
| ------------------- | -------------------- |
| Real-time streaming | Hourly               |
| Hourly batch jobs   | Hourly               |
| Daily batch jobs    | Daily                |
| Weekly aggregates   | Weekly               |

### Use Meaningful Sensitivity Values

| Scenario                           | Sensitivity | Rationale                   |
| ---------------------------------- | ----------- | --------------------------- |
| New table, learning patterns       | 3.0         | Reduce noise while learning |
| Established table, stable patterns | 2.0         | Balanced detection          |
| Critical data, low tolerance       | 1.5         | More sensitive alerting     |
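
To get a feel for the trade-off, the sketch below estimates how often a perfectly healthy metric would still be flagged at each sensitivity. It assumes captured values are normally distributed, which real metrics only approximate, so treat the numbers as a rough guide rather than a guarantee. The function name is hypothetical.

```python
import math


def fraction_outside(sensitivity):
    """Share of normally distributed values beyond +/- sensitivity stddevs."""
    return math.erfc(sensitivity / math.sqrt(2))


for s in (1.5, 2.0, 3.0):
    print(f"sensitivity {s}: ~{fraction_outside(s):.2%} of normal captures flagged")
```

Under that assumption, roughly 13% of normal captures fall outside the band at 1.5, about 4.6% at 2.0, and about 0.3% at 3.0. With hourly capture, even a small percentage can add up to several alerts a week.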

## Operating-Period Awareness

Many tables are only active during business hours, so nights and weekends are structurally quiet. Pooling those near-zero periods into one baseline causes one of two problems: the expected band widens until real weekday regressions slip through, or every weekend is flagged as anomalously low. Operating-period awareness fixes this by comparing each value only against history from the same kind of period.

Each metric has an **operating period mode**:

| Mode            | Behavior                                                                                                                          |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `off` (default) | Pools all history into one baseline. This is the standard behavior.                                                               |
| `schedule`      | Segments the baseline using a linked [operating schedule](/alerts/operating-schedules) (declared days, hours, and timezone).      |
| `auto`          | Learns an active/dormant calendar from the metric's own history (day-of-week, plus hour-of-week for sub-daily capture intervals). |

When a value falls in an **active** period, it is baselined only against prior active values, so the band stays tight. When a value falls in a **dormant** (closed) period, a low or zero value is expected and never alerts; only unexpected activity above a near-zero floor alerts (for example, writes at 3am to a table that should be idle overnight).

Both `schedule` and `auto` need enough history before they take effect. Until then, and whenever mode is `off`, the metric uses the standard pooled baseline.

<Note>
  `auto` learns from observed volume, so it adapts when real activity differs from nominal hours. Use `schedule` when you want to declare exact hours, or to override what `auto` would learn.
</Note>
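
The difference between a pooled and a segmented baseline can be sketched as follows. This is a simplified illustration, not the product's implementation: it segments by weekday only (no hours or timezone), and the schedule, the dormant floor, the minimum-history threshold, and all function names are assumptions.

```python
from statistics import mean, stdev

ACTIVE_WEEKDAYS = {0, 1, 2, 3, 4}  # hypothetical schedule: Monday-Friday (0 = Monday)
DORMANT_FLOOR = 5.0                # assumed near-zero floor for closed periods
MIN_ACTIVE_HISTORY = 5             # assumed minimum before segmenting takes effect


def pooled_alert(value, history, sensitivity=2.0):
    """Standard baseline: compare against all history, active or not."""
    values = [v for _, v in history]
    return abs(value - mean(values)) > sensitivity * stdev(values)


def segmented_alert(value, weekday, history, sensitivity=2.0):
    """Compare a value only against history from the same kind of period."""
    active = [v for d, v in history if d in ACTIVE_WEEKDAYS]
    if len(active) < MIN_ACTIVE_HISTORY:
        # Not enough history yet: fall back to the pooled baseline.
        return pooled_alert(value, history, sensitivity)
    if weekday not in ACTIVE_WEEKDAYS:
        # Dormant period: low or zero is expected, only activity alerts.
        return value > DORMANT_FLOOR
    return abs(value - mean(active)) > sensitivity * stdev(active)


# Two illustrative weeks of (weekday, value) captures with quiet weekends.
history = [
    (0, 1000), (1, 1040), (2, 980), (3, 1010), (4, 1020), (5, 0), (6, 0),
    (0, 990), (1, 1030), (2, 1005), (3, 995), (4, 1015), (5, 0), (6, 0),
]

print(segmented_alert(0, 5, history))    # False: a quiet Saturday is expected
print(segmented_alert(400, 6, history))  # True: unexpected Sunday activity
print(segmented_alert(600, 2, history))  # True: a Wednesday drop stands out
print(pooled_alert(600, history))        # False: the weekend zeros widen the band
```

The last two lines show the motivation: the same Wednesday drop to 600 is caught by the segmented baseline but hidden by the pooled one.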

## Troubleshooting

<AccordionGroup>
  <Accordion title="Metric shows 'No data'">
    **Causes:**

    * Metric was just created and hasn't captured yet
    * Capture job failed
    * Table is empty

    **Solutions:**

    1. Wait for the next scheduled capture (check interval)
    2. Trigger a manual capture: **Actions > Capture Now**
    3. Check the table has data
  </Accordion>

  <Accordion title="Too many false positive anomalies">
    **Causes:**

    * Sensitivity is too low (too sensitive)
    * Normal data patterns are highly variable
    * Seasonality not accounted for

    **Solutions:**

    1. Increase sensitivity (e.g., 2.0 to 3.0)
    2. Allow more baseline data to accumulate (30+ days)
    3. Consider if the variation is actually expected
  </Accordion>

  <Accordion title="Missing real anomalies">
    **Causes:**

    * Sensitivity is too high (not sensitive enough)
    * Baseline includes anomalous data
    * Capture interval too infrequent

    **Solutions:**

    1. Decrease sensitivity (e.g., 3.0 to 2.0)
    2. Reset baseline after fixing data issues
    3. Increase capture frequency
  </Accordion>

  <Accordion title="Metric capture failing">
    **Causes:**

    * Database connection issues
    * Column was renamed or removed
    * Permission changes

    **Solutions:**

    1. Check data source connection status
    2. Verify column still exists
    3. Check database user permissions
  </Accordion>
</AccordionGroup>
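
To see why a baseline that includes anomalous data can hide later anomalies, the sketch below compares the band computed from clean history with the band computed after one past incident is left in. It is illustrative only: the values and the `band` helper are hypothetical, and it assumes the simple `mean +/- sensitivity * stddev` band described earlier.

```python
from statistics import mean, stdev


def band(history, sensitivity=2.0):
    """Return (lower, upper) as mean +/- sensitivity * stddev."""
    center, spread = mean(history), stdev(history)
    return center - sensitivity * spread, center + sensitivity * spread


clean = [3.0, 2.9, 3.1, 3.0, 3.2, 2.8, 3.0]
polluted = clean + [12.3]  # one past incident left in the baseline

clean_lower, clean_upper = band(clean)
polluted_lower, polluted_upper = band(polluted)

new_value = 9.0
print(clean_lower <= new_value <= clean_upper)        # False: flagged
print(polluted_lower <= new_value <= polluted_upper)  # True: slips through
```

A single outlier inflates the standard deviation enough that a later value of 9.0 falls inside the band, which is why resetting the baseline after fixing a data issue helps.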

## Common Questions

### When should I use metrics versus row count monitoring?

Use **metrics** for column-level checks like null rates, distinct counts, and numeric ranges. Use [Row Count Monitoring](/data-quality/row-count-monitoring) for table-level volume tracking. It has ML-based pattern learning and time-windowed counting that metrics don't.

### What sensitivity should I start with for anomaly detection?

Start at **2.0** (balanced, alerts at 2 standard deviations). Drop to 1.5 for critical data where you want tight detection, or raise to 3.0 if you're seeing too many false positives from noisy patterns.

### How long before anomaly detection is useful?

Anomaly detection needs a baseline. Expect rougher results for the first week or two while history accumulates. For stable patterns, 30+ days of baseline data gives the tightest, most trustworthy bands.

### Does AnomalyArmor read my column values?

It runs aggregate queries (like `COUNT`, `MIN`, `MAX`, `AVG`) against your database to compute the metric. Only the numeric result is stored. Individual row values aren't transmitted or retained.

### Can I monitor a metric on a custom SQL expression?

The built-in metric types run against a specific column. For arbitrary SQL, use [Custom SQL Monitoring](/data-quality/custom-sql-monitoring) instead, which lets you write any `SELECT` that returns a numeric value.

## What's Next

<CardGroup cols={2}>
  <Card title="Set Up Metric Alerts" icon="bell" href="/alerts/alert-rules">
    Get notified when metrics detect anomalies
  </Card>

  <Card title="Metrics API" icon="code" href="/api/metrics">
    Automate metric management with the API
  </Card>

  <Card title="Report Badges" icon="shield-check" href="/data-quality/report-badges">
    Embed metric status in dashboards
  </Card>

  <Card title="Alert Rules" icon="bell" href="/alerts/alert-rules">
    Configure where alerts are sent
  </Card>
</CardGroup>
