# Row Count Monitoring

> Monitor row counts with ML-based anomaly detection or explicit thresholds

Row Count Monitoring tracks row counts in your tables over time. It detects when data volumes drop unexpectedly (data loss) or spike unusually (duplicate loads), helping you catch ETL issues before they impact downstream consumers.

<Note>
  **Why Row Count?** Row count monitoring used to be part of Data Quality Metrics. We moved it to its own feature with enhanced capabilities: ML-based pattern learning, time-windowed counting, and explicit threshold support.
</Note>

**Example scenario:** The orders table typically receives 45,000-55,000 rows daily. On Jan 30, only 15,234 rows were loaded. That roughly 70% drop was flagged as an anomaly, indicating a potential ETL failure.

## Configuration Reference

<h3 id="monitoring-mode">
  Monitoring Mode
</h3>

Row Count Monitoring offers two approaches to fit different needs:

#### Auto-Learn Mode (Recommended)

Let AnomalyArmor learn your table's normal row count patterns:

| Aspect                | How It Works                                               |
| --------------------- | ---------------------------------------------------------- |
| **Learning period**   | Collects at least 7 data points (7+ days for daily checks) to establish a baseline |
| **Pattern detection** | Identifies daily, weekly, and seasonal trends              |
| **Anomaly detection** | Uses statistical analysis (mean +/- stddev \* sensitivity) |
| **Best for**          | Tables with consistent, predictable patterns               |

```
Auto-learn example (orders table):

Day 1-7:    Learning... collecting baseline data
Day 8+:     Baseline established (avg: 48,000, stddev: 3,200)
            Alerts if row count deviates significantly
```
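
The learning gate and baseline can be sketched in Python. This is a minimal illustration, not AnomalyArmor's implementation: it assumes a plain mean and sample standard deviation over past counts, uses the 7-data-point minimum described on this page, and ignores the daily, weekly, and seasonal pattern detection the product performs. `learn_baseline` and `MIN_BASELINE_POINTS` are hypothetical names.

```python
from statistics import mean, stdev

# Hypothetical constant mirroring the "at least 7 data points" guidance.
MIN_BASELINE_POINTS = 7


def learn_baseline(history: list[int]) -> dict:
    """Return a Learning status, or a mean/stddev baseline from past row counts.

    Assumption: a simple sample standard deviation with no seasonality handling.
    """
    if len(history) < MIN_BASELINE_POINTS:
        # Not enough observations yet: no alerts fire while learning.
        return {"status": "Learning", "points": len(history)}
    return {"status": "Ready", "mean": mean(history), "stddev": stdev(history)}


daily_counts = [44_000, 46_000, 48_000, 50_000, 52_000, 48_000, 48_000]
print(learn_baseline(daily_counts[:3]))  # still learning after 3 checks
print(learn_baseline(daily_counts))      # baseline available after 7 checks
```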

#### Explicit Mode

Set specific row count thresholds when you know exactly what to expect:

| Setting      | Description                               |
| ------------ | ----------------------------------------- |
| **Min rows** | Alert if row count falls below this value |
| **Max rows** | Alert if row count exceeds this value     |
| **Best for** | Tables with known, fixed expectations     |

```
Explicit example (daily_summary table):

Expected: Exactly 1 row per day
Min: 1, Max: 1
Alert if row count != 1
```
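
Explicit mode reduces to a bounds check. The sketch below assumes inclusive bounds, so a count equal to `min_rows` or `max_rows` is healthy; `check_explicit` is a hypothetical helper, not part of any AnomalyArmor API.

```python
def check_explicit(row_count: int, min_rows: int, max_rows: int) -> str:
    """Explicit mode: alert when the count falls outside [min_rows, max_rows]."""
    if row_count < min_rows:
        return "Anomaly"  # below the configured minimum
    if row_count > max_rows:
        return "Anomaly"  # above the configured maximum
    return "Healthy"


# daily_summary: exactly one row per day (Min: 1, Max: 1)
for count in (0, 1, 2):
    print(count, check_explicit(count, min_rows=1, max_rows=1))
```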

<h3 id="sensitivity">
  Sensitivity
</h3>

For auto-learn mode, sensitivity controls how strict the anomaly detection is. It's the multiplier applied to the standard deviation when calculating expected ranges.

| Sensitivity      | Behavior                                       | Use When                                   |
| ---------------- | ---------------------------------------------- | ------------------------------------------ |
| **1 (Tight)**    | Very strict, catches small deviations          | Critical data, low tolerance for anomalies |
| **2 (Balanced)** | Default, catches moderate deviations           | Most tables, standard monitoring           |
| **3 (Relaxed)**  | Less strict, allows more variation             | High natural variability, noisy data       |
| **4 (Loose)**    | Very permissive, only catches large deviations | Highly variable patterns, initial setup    |

**Default**: 2 (balanced detection)

**Formula**: Expected range = mean ± (stddev × sensitivity)

```
Example with sensitivity = 2:
Mean: 48,000 rows
StdDev: 3,000 rows
Expected range: 42,000 - 54,000 rows
(48k - 6k to 48k + 6k)

If actual count = 35,000 → ANOMALY (outside range)
If actual count = 51,000 → HEALTHY (within range)
```
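
The formula and the worked example above translate directly into code. This sketch assumes the range boundaries are inclusive; the helper names are hypothetical.

```python
def expected_range(baseline_mean: float, baseline_stddev: float,
                   sensitivity: float = 2) -> tuple[float, float]:
    """Expected range = mean ± (stddev × sensitivity)."""
    margin = baseline_stddev * sensitivity
    return baseline_mean - margin, baseline_mean + margin


def evaluate(row_count: float, baseline_mean: float, baseline_stddev: float,
             sensitivity: float = 2) -> str:
    """Healthy inside the expected range (inclusive), Anomaly outside it."""
    low, high = expected_range(baseline_mean, baseline_stddev, sensitivity)
    return "Healthy" if low <= row_count <= high else "Anomaly"


low, high = expected_range(48_000, 3_000, sensitivity=2)
print(low, high)                        # 42000 54000
print(evaluate(35_000, 48_000, 3_000))  # Anomaly
print(evaluate(51_000, 48_000, 3_000))  # Healthy
```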

<Tip>
  Start with sensitivity 2-3 for new monitors, then tighten to 1-2 once patterns are stable.
</Tip>

<h3 id="timestamp-column">
  Timestamp Column
</h3>

Optional column used to filter rows within the time window. Without a timestamp column, all rows in the table are counted regardless of when they were created.

**When to use:**

* Append-only tables that grow over time
* Event streams or log tables
* Tables where you care about recent data arrival

**When to skip:**

* Tables that are fully replaced on each load
* Snapshot tables with fixed row counts
* Dimension tables with slow-changing data

```
With timestamp column (orders.created_at):
  Query: SELECT COUNT(*) FROM orders
         WHERE created_at >= NOW() - INTERVAL '24 hours'
  Result: 48,000 (recent rows only)

Without timestamp column:
  Query: SELECT COUNT(*) FROM orders
  Result: 5,000,000 (all rows ever)
```

**Column requirements:**

* Must be a timestamp or datetime type
* Should represent when the row was created/ingested
* Should be indexed for performance

<h3 id="time-window">
  Time Window
</h3>

How far back to count rows when a timestamp column is specified. Choose based on your data load frequency:

| Window        | Duration        | Best For                                   |
| ------------- | --------------- | ------------------------------------------ |
| **1 hour**    | Last 60 minutes | Real-time streaming, high-frequency events |
| **6 hours**   | Last 6 hours    | Hourly batch jobs, frequent updates        |
| **12 hours**  | Last 12 hours   | Twice-daily pipelines                      |
| **24 hours**  | Last day        | Daily batch ETL (most common)              |
| **168 hours** | Last 7 days     | Weekly aggregates, slow-changing data      |

```
Example: Daily batch job loads orders every night at 2 AM

Time window: 24 hours
Check interval: 6 hours (runs at 2 AM, 8 AM, 2 PM, 8 PM)

Check at 8 AM:
  Counts rows WHERE created_at >= 8 AM yesterday
  Includes last night's batch + today's streaming data
```
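
The window arithmetic for the 8 AM check can be verified with `datetime`. The dates below are hypothetical, and the sketch uses naive datetimes; a real check would also need to account for the timezone of the timestamp column.

```python
from datetime import datetime, timedelta


def window_start(check_time: datetime, window_hours: int) -> datetime:
    """Earliest created_at value counted by a check that runs at check_time."""
    return check_time - timedelta(hours=window_hours)


check_time = datetime(2024, 1, 31, 8, 0)  # the 8 AM check (hypothetical date)
batch_time = datetime(2024, 1, 31, 2, 0)  # last night's 2 AM load

start = window_start(check_time, 24)
print(start)                              # 2024-01-30 08:00:00
print(start <= batch_time <= check_time)  # True: the batch falls inside the window
```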

<Warning>
  Without a timestamp column, the time window setting is ignored and all rows are counted.
</Warning>
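
The warning above can be made concrete with a query builder that drops the window clause when no timestamp column is configured. This is a hypothetical sketch that mirrors the PostgreSQL-style queries shown earlier on this page; it is not the query AnomalyArmor actually issues. It interpolates table and column names directly, which is only acceptable for trusted identifiers. Production code should quote identifiers properly.

```python
from typing import Optional


def build_count_query(table: str, timestamp_column: Optional[str] = None,
                      window_hours: int = 24) -> str:
    """Build a row count query; window_hours is ignored without a timestamp column."""
    query = f"SELECT COUNT(*) FROM {table}"
    if timestamp_column is None:
        # No timestamp column: every row in the table is counted.
        return query
    return f"{query} WHERE {timestamp_column} >= NOW() - INTERVAL '{window_hours} hours'"


print(build_count_query("orders", "created_at"))
print(build_count_query("orders", window_hours=1))  # window silently ignored
```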

<h3 id="check-interval">
  Check Interval
</h3>

How often to run the row count check and evaluate for anomalies:

| Interval     | Frequency    | Best For                                  |
| ------------ | ------------ | ----------------------------------------- |
| **1 hour**   | Every hour   | Real-time monitoring, critical tables     |
| **6 hours**  | 4x per day   | Standard monitoring, daily tables         |
| **12 hours** | 2x per day   | Less critical tables, longer time windows |
| **24 hours** | Once per day | Weekly tables, slow-changing data         |

**Cost considerations:** More frequent checks run more count queries against your database and consume more compute. Choose the interval that matches your SLA requirements.

```
Example check intervals for different scenarios:

Scenario: Real-time event stream
  Time window: 1 hour
  Check interval: 1 hour
  Result: Hourly checks on last hour of data

Scenario: Daily batch job
  Time window: 24 hours
  Check interval: 6 hours
  Result: 4 checks per day on last 24h of data

Scenario: Weekly report table
  Time window: 168 hours
  Check interval: 24 hours
  Result: Daily checks on last week of data
```

<Tip>
  Check interval should be ≤ time window for meaningful monitoring. A 24-hour check interval with a 1-hour time window would miss most anomalies.
</Tip>
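
To see why the check interval should not exceed the time window, count how many hours fall between consecutive windows. In this sketch, each check covers `time_window_hours` of data, so any excess of interval over window is a stretch of time that no check ever counts. The helper names are hypothetical.

```python
def checks_per_day(check_interval_hours: int) -> float:
    """Number of checks that run in a 24-hour day."""
    return 24 / check_interval_hours


def coverage_gap_hours(time_window_hours: int, check_interval_hours: int) -> int:
    """Hours between consecutive checks that no window covers (0 if interval <= window)."""
    return max(0, check_interval_hours - time_window_hours)


scenarios = {
    "Real-time event stream": (1, 1),
    "Daily batch job": (24, 6),
    "Weekly report table": (168, 24),
    "Misconfigured": (1, 24),  # 24h interval with a 1h window
}
for name, (window, interval) in scenarios.items():
    print(f"{name}: {checks_per_day(interval):g} checks/day, "
          f"{coverage_gap_hours(window, interval)}h never counted per interval")
```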

<h3 id="operating-period">
  Operating Period
</h3>

Tables that are only active during business hours have structurally quiet nights and weekends. Auto-learn mode normally pools every observation into one average, so those quiet periods either widen the expected range until real weekday drops slip through, or cause every weekend to be flagged as anomalously low. Operating-period awareness compares each measurement only against history from the same kind of period.

| Mode            | Behavior                                                                                                                          |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `off` (default) | Pools all history (standard auto-learn behavior).                                                                                |
| `schedule`      | Segments by a linked [operating schedule](/alerts/operating-schedules) (declared days, hours, timezone).                          |
| `auto`          | Learns an active/dormant calendar from the schedule's own history (day-of-week, plus hour-of-week for sub-daily check intervals). |

An **active**-period measurement is baselined only against prior active measurements, keeping the expected range tight. A **dormant**-period measurement never alerts on a low or zero count (expected when closed); it alerts only on unexpected activity above a near-zero floor, catching rows written when the table should be idle.

Operating-period awareness only applies in auto-learn mode and needs enough history before it takes effect. Explicit mode and `off` keep their existing behavior.
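
The active/dormant evaluation rules can be sketched as follows. This is an illustration under stated assumptions, not the product's algorithm. `DORMANT_FLOOR` stands in for the undocumented near-zero floor, the 7-point minimum for the undocumented history requirement, and the active baseline reuses the `mean ± (stddev × sensitivity)` formula with a sample standard deviation. All names are hypothetical.

```python
from statistics import mean, stdev

DORMANT_FLOOR = 5         # hypothetical "near-zero" floor for dormant periods
MIN_ACTIVE_HISTORY = 7    # hypothetical minimum history before segmenting


def evaluate_with_period(row_count: int, is_active: bool,
                         history: list[tuple[bool, int]],
                         sensitivity: float = 2) -> str:
    """history holds (is_active, row_count) pairs from prior checks."""
    if not is_active:
        # Dormant: low or zero counts are expected; only unexpected activity alerts.
        return "Anomaly" if row_count > DORMANT_FLOOR else "Healthy"
    # Active: baseline only against prior active measurements.
    active_counts = [count for active, count in history if active]
    if len(active_counts) < MIN_ACTIVE_HISTORY:
        return "Learning"
    m, s = mean(active_counts), stdev(active_counts)
    low, high = m - s * sensitivity, m + s * sensitivity
    return "Healthy" if low <= row_count <= high else "Anomaly"


weekdays = [46_000, 48_000, 50_000, 47_000, 49_000, 48_000, 48_000]
history = [(True, c) for c in weekdays] + [(False, 0), (False, 0)]  # quiet weekend
print(evaluate_with_period(30_000, True, history))  # Anomaly: weekday drop caught
print(evaluate_with_period(0, False, history))      # Healthy: closed on weekends
```

Because the weekend zeros never enter the active baseline, the expected range stays narrow (about 45,400 to 50,600 here), so the 30,000-row weekday drop is caught instead of being absorbed by a pooled average.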

## Time-Windowed Counting

For tables that accumulate data over time, use a timestamp column to count rows within a specific window:

| Window        | Use Case                      |
| ------------- | ----------------------------- |
| **1 hour**    | Real-time event streams       |
| **6 hours**   | Frequent batch loads          |
| **12 hours**  | Twice-daily pipelines         |
| **24 hours**  | Daily batch ETL (most common) |
| **168 hours** | Weekly aggregates             |

```
Time-windowed counting (orders table with created_at):

Without time window:  COUNT(*) = 5,000,000 (all time)
With 24h window:      COUNT(*) WHERE created_at >= now() - 24h = 48,000
```
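
You can reproduce the all-time vs. windowed difference locally with Python's built-in `sqlite3` module. SQLite stands in for a warehouse here, and timestamps are stored as ISO-8601 text, which compares correctly as strings when every value uses the same format. The row counts are made up for illustration.

```python
import sqlite3
from datetime import datetime, timedelta

now = datetime(2024, 1, 31, 8, 0)  # fixed "current time" so the example is reproducible

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")

# 100 older rows from five days ago plus 30 rows loaded six hours ago.
rows = [((now - timedelta(days=5)).isoformat(sep=" "),) for _ in range(100)]
rows += [((now - timedelta(hours=6)).isoformat(sep=" "),) for _ in range(30)]
conn.executemany("INSERT INTO orders (created_at) VALUES (?)", rows)

total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
cutoff = (now - timedelta(hours=24)).isoformat(sep=" ")
windowed = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE created_at >= ?", (cutoff,)
).fetchone()[0]
print(total, windowed)  # 130 30
```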

<Tip>
  Use time-windowed counting for append-only tables. Without it, row counts only grow, making anomaly detection less useful.
</Tip>

## Setting Up Row Count Monitoring

<Steps>
  <Step title="Navigate to the Asset">
    Go to **Assets** and select the table you want to monitor.
  </Step>

  <Step title="Open Data Quality Tab">
    Click the **Data Quality** tab on the asset detail page, then scroll to the **Row Count Monitoring** section.
  </Step>

  <Step title="Create Schedule">
    Click **Create Schedule** and configure:

    * **Table**: Select the table to monitor
    * **Timestamp column**: (Optional) For time-windowed counting
    * **Time window**: How far back to count rows
    * **Check interval**: How often to check (1h, 6h, 12h, 24h)
  </Step>

  <Step title="Choose Monitoring Mode">
    Select your monitoring approach:

    **Auto-Learn Mode:**

    * Toggle **Auto-learn** on
    * Set **Sensitivity** (1-4, lower = more sensitive)
    * Wait for learning period to complete

    **Explicit Mode:**

    * Toggle **Auto-learn** off
    * Set **Expected min rows**
    * Set **Expected max rows**
  </Step>

  <Step title="Save and Monitor">
    Click **Create**. The first check runs immediately, then continues on your configured interval.
  </Step>
</Steps>
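
The settings from these steps can be modeled as a small config object with the constraints this page describes. This is a hypothetical sketch for reasoning about valid combinations, not an AnomalyArmor SDK or API. It assumes sensitivity is a whole number from 1 to 4 and that explicit mode requires both bounds.

```python
from dataclasses import dataclass
from typing import Optional

VALID_CHECK_INTERVALS = (1, 6, 12, 24)    # hours, from the Check Interval table
VALID_TIME_WINDOWS = (1, 6, 12, 24, 168)  # hours, from the Time Window table


@dataclass
class RowCountSchedule:
    """Hypothetical model of the Create Schedule settings."""
    table: str
    check_interval_hours: int
    timestamp_column: Optional[str] = None
    time_window_hours: int = 24
    auto_learn: bool = True
    sensitivity: int = 2
    min_rows: Optional[int] = None
    max_rows: Optional[int] = None

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the config looks valid."""
        problems = []
        if self.check_interval_hours not in VALID_CHECK_INTERVALS:
            problems.append("check interval must be 1, 6, 12, or 24 hours")
        if self.timestamp_column and self.time_window_hours not in VALID_TIME_WINDOWS:
            problems.append("unsupported time window")
        if self.auto_learn:
            if not 1 <= self.sensitivity <= 4:
                problems.append("sensitivity must be between 1 and 4")
        elif self.min_rows is None or self.max_rows is None:
            problems.append("explicit mode needs min and max rows")
        elif self.min_rows > self.max_rows:
            problems.append("min rows cannot exceed max rows")
        return problems


orders = RowCountSchedule("orders", 6, timestamp_column="created_at")
summary = RowCountSchedule("daily_summary", 24, auto_learn=False, min_rows=1, max_rows=1)
print(orders.validate(), summary.validate())  # [] []
```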

## Understanding Results

### Status Indicators

| Status       | Meaning                          | Action                          |
| ------------ | -------------------------------- | ------------------------------- |
| **Healthy**  | Row count within expected range  | None needed                     |
| **Anomaly**  | Row count outside expected range | Investigate the cause           |
| **Learning** | Collecting baseline data         | Wait for learning to complete   |
| **No Data**  | No checks have run yet           | Check will run on next interval |

### Anomaly Types

| Anomaly                | Possible Causes                                  |
| ---------------------- | ------------------------------------------------ |
| **Row count too low**  | ETL failure, data loss, filter bug, source issue |
| **Row count too high** | Duplicate load, removed filter, upstream spike   |
| **Row count zero**     | Complete ETL failure, wrong table, permissions   |
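
Given an expected range (learned or explicit), a count maps onto these anomaly types as sketched below. This is an assumed mapping for illustration: a zero count is reported as its own type only when it also falls below the expected range, so a monitor whose minimum is 0 does not alert on an empty result. `classify` is a hypothetical helper.

```python
def classify(row_count: int, low: float, high: float) -> str:
    """Map a row count and its expected range to the anomaly types above."""
    if row_count < low:
        # Zero gets its own label because it usually means a complete failure.
        return "Row count zero" if row_count == 0 else "Row count too low"
    if row_count > high:
        return "Row count too high"
    return "Healthy"


for count in (0, 15_234, 48_000, 96_000):
    print(count, classify(count, low=42_000, high=54_000))
```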

## Best Practices

### Choose the Right Mode

| Scenario                     | Recommended Mode                 |
| ---------------------------- | -------------------------------- |
| Data patterns vary naturally | Auto-learn with sensitivity 2-3  |
| Exact expectations known     | Explicit with min/max thresholds |
| New table, unknown patterns  | Auto-learn with sensitivity 3-4  |
| Critical data, low tolerance | Auto-learn with sensitivity 1-2  |

### Set Appropriate Windows

| Data Pattern        | Recommended Window |
| ------------------- | ------------------ |
| Real-time streaming | 1 hour             |
| Hourly batch jobs   | 6 hours            |
| Daily batch jobs    | 24 hours           |
| Weekly aggregates   | 168 hours          |

### Start Conservative, Then Tighten

1. **Week 1**: Use auto-learn with sensitivity 3 (less sensitive)
2. **Week 2-4**: Review any anomalies, adjust if too noisy
3. **Month 2+**: Tighten to sensitivity 2 once patterns are stable

## Row Count vs. Metrics

| Feature                 | Row Count          | Data Quality Metrics      |
| ----------------------- | ------------------ | ------------------------- |
| **Purpose**             | Monitor row counts | Monitor column statistics |
| **Scope**               | Table-level        | Column-level              |
| **ML-based**            | Yes (auto-learn)   | Yes (anomaly detection)   |
| **Time windows**        | Yes                | No                        |
| **Explicit thresholds** | Yes                | Via checks                |

**Use Row Count Monitoring for**: "Did the right amount of data arrive?"

**Use Metrics for**: "Is the data quality correct?" (nulls, duplicates, ranges)

## Troubleshooting

<AccordionGroup>
  <Accordion title="Status shows 'Learning' for too long">
    **Causes:**

    * Not enough data points collected yet
    * Check interval is long (for example, 24 hours, which adds only one data point per day)

    **Solutions:**

    1. Wait for at least 7 data points (7 days for daily checks)
    2. Consider switching to explicit mode if you know expected values
  </Accordion>

  <Accordion title="Too many false positive anomalies">
    **Causes:**

    * Sensitivity value is too low (a smaller multiplier makes detection stricter)
    * Natural data variation is high
    * Seasonality not yet learned

    **Solutions:**

    1. Raise the sensitivity value (for example, from 2 to 3)
    2. Allow more baseline data (30+ days)
    3. Switch to explicit mode with wider thresholds
  </Accordion>

  <Accordion title="Missing real anomalies">
    **Causes:**

    * Sensitivity value is too high (a larger multiplier makes detection more permissive)
    * Baseline includes anomalous data

    **Solutions:**

    1. Lower the sensitivity value (for example, from 3 to 2)
    2. Switch to explicit mode with tighter thresholds
  </Accordion>

  <Accordion title="Row count always zero with time window">
    **Causes:**

    * Timestamp column has no recent data
    * Wrong timestamp column selected
    * Time window too narrow

    **Solutions:**

    1. Verify timestamp column has data in the window
    2. Check column data type (should be timestamp/datetime)
    3. Widen the time window
  </Accordion>
</AccordionGroup>

## Common Questions

### Auto-learn or explicit mode, which should I pick?

Use **auto-learn** when row counts fluctuate naturally: AnomalyArmor builds a statistical baseline and flags deviations. Use **explicit** when you know the exact min and max (for example, a daily summary that should always have exactly one row).

### Why configure a timestamp column?

Without one, `COUNT(*)` returns all rows ever, so counts only grow and anomaly detection loses meaning. A timestamp column lets AnomalyArmor count only rows inside a time window like the last 24 hours, which is what you actually want to monitor for append-only tables.

### How long does auto-learn mode need before it starts alerting?

At least 7 data points. For a daily check interval that's 7 days; for hourly it's 7 hours. Until enough baseline accumulates, the status shows **Learning** and no anomalies fire.

### What does a sensitivity of 2 actually mean?

It's the multiplier on the standard deviation used to define the expected range. Expected range = `mean ± (stddev × sensitivity)`. Lower values (1) are stricter and catch smaller drifts; higher values (3-4) tolerate more variation.

### What's the difference between row count monitoring and data quality metrics?

Row count is table-level ("did the right volume arrive?") with ML pattern learning and time windows. [Metrics](/data-quality/metrics) are column-level ("are the values correct?"), tracking things like null percentages, distinct counts, and numeric ranges.

### What typically causes a row count spike?

The most common cause is a duplicate load, where the same batch ran twice. A filter removed upstream can also inflate counts. Sudden drops are usually ETL failures, a wrong source, or a filter change that excluded valid data. The anomaly page shows the timing so you can correlate it with deploys.

## What's Next

<CardGroup cols={2}>
  <Card title="Set Up Alerts" icon="bell" href="/alerts/overview">
    Get notified when row count anomalies are detected
  </Card>

  <Card title="Data Quality Metrics" icon="chart-line" href="/data-quality/metrics">
    Monitor column-level statistics like null percentages
  </Card>

  <Card title="Freshness Monitoring" icon="clock" href="/data-quality/freshness-monitoring">
    Track when data was last updated
  </Card>

  <Card title="Report Badges" icon="shield-check" href="/data-quality/report-badges">
    Embed row count status in dashboards
  </Card>
</CardGroup>
