# Prevent Pipeline Failures

> Catch schema changes before they break your dbt models and ETL jobs

**Audience**: Data Engineers, Analytics Engineers

Schema changes are one of the most common causes of pipeline failures. A dropped column upstream can cascade into failed dbt runs, broken dashboards, and late-night debugging sessions.

This guide shows you how to use AnomalyArmor to catch schema changes before they impact your pipelines.

## The Problem

<img src="https://mintcdn.com/anomalyarmor/-pFpKEip0ftEEXe9/images/diagrams/without-with-monitoring-light.svg?fit=max&auto=format&n=-pFpKEip0ftEEXe9&q=85&s=39217edf473d7ae8da2060d58b9698d9" alt="Schema change cascade showing pipeline failure" className="block dark:hidden" width="800" height="380" data-path="images/diagrams/without-with-monitoring-light.svg" />

<img src="https://mintcdn.com/anomalyarmor/OSEzjlRMQ1RGruVN/images/diagrams/without-with-monitoring-dark.svg?fit=max&auto=format&n=OSEzjlRMQ1RGruVN&q=85&s=cb94f236b794ef39a09be74a57a5b1dd" alt="Schema change cascade showing pipeline failure" className="hidden dark:block" width="800" height="380" data-path="images/diagrams/without-with-monitoring-dark.svg" />

## The Solution

With AnomalyArmor, you'll know about schema changes before your pipelines run:

<img src="https://mintcdn.com/anomalyarmor/-pFpKEip0ftEEXe9/images/diagrams/pipeline-event-timeline-light.svg?fit=max&auto=format&n=-pFpKEip0ftEEXe9&q=85&s=4524cd76d909df10cde1b55b6bedbde2" alt="Pipeline event detection timeline" className="block dark:hidden" width="700" height="180" data-path="images/diagrams/pipeline-event-timeline-light.svg" />

<img src="https://mintcdn.com/anomalyarmor/CZXBGa_D1aE9spAI/images/diagrams/pipeline-event-timeline-dark.svg?fit=max&auto=format&n=CZXBGa_D1aE9spAI&q=85&s=b1be53df19dc5aa10319486ae2a843f4" alt="Pipeline event detection timeline" className="hidden dark:block" width="700" height="180" data-path="images/diagrams/pipeline-event-timeline-dark.svg" />

## Setup Guide

### Step 1: Connect Your Source Databases

Connect the databases that your pipelines read from, not just your warehouse.

**Common sources to monitor:**

* Production application databases (the ones your dbt project reads from)
* Third-party data sources
* Shared data lakes

For each source, follow the [connection guide](/data-sources/overview).

### Step 2: Schedule Frequent Discovery

For pipeline-critical databases, run discovery frequently:

| Database Type         | Recommended Schedule | Why                        |
| --------------------- | -------------------- | -------------------------- |
| Application databases | Hourly               | Changes can happen anytime |
| Shared warehouses     | Every 6 hours        | Less frequent changes      |
| Third-party sources   | Daily                | Usually stable             |

Configure in: **Data Sources → \[Your Connection] → Settings → Discovery Schedule**

### Step 3: Create Breaking Change Alerts

Set up alerts specifically for changes that break pipelines:

**Rule: Breaking Schema Changes (Production)**

| Field            | Value                                                    |
| ---------------- | -------------------------------------------------------- |
| **Event**        | Schema Change Detected                                   |
| **Data Source**  | `production-app-db`                                      |
| **Schema**       | `public`                                                 |
| **Assets**       | All (or list specific tables)                            |
| **Change Type**  | Column Removed, Table Removed, Type Changed              |
| **Destinations** | Slack `#data-engineering`, Email `data-team@company.com` |
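
The rule's filtering logic can be sketched in a few lines of Python. This is only an illustration of how the scope and change-type fields combine: the `SchemaChange` class and its field names are hypothetical and do not represent AnomalyArmor's actual event payload, and `Column Added` is an assumed label for an additive change.

```python
from dataclasses import dataclass

# Change types taken from the rule table above.
BREAKING_CHANGE_TYPES = {"Column Removed", "Table Removed", "Type Changed"}


@dataclass(frozen=True)
class SchemaChange:
    """Hypothetical event shape, used for illustration only."""
    data_source: str
    schema: str
    table: str
    change_type: str


def matches_breaking_rule(change: SchemaChange) -> bool:
    """Return True when the change is in the rule's scope and is a breaking type."""
    return (
        change.data_source == "production-app-db"
        and change.schema == "public"
        and change.change_type in BREAKING_CHANGE_TYPES
    )


dropped = SchemaChange("production-app-db", "public", "orders", "Column Removed")
added = SchemaChange("production-app-db", "public", "orders", "Column Added")
print(matches_breaking_rule(dropped))  # True
print(matches_breaking_rule(added))    # False
```

Additive changes fall through the filter, which is what keeps this rule quiet until something pipeline-breaking happens.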

### Step 4: Time Alerts Before Pipeline Runs

If your dbt job runs at 3 AM, schedule discovery for 2 AM so alerts arrive before the run starts:

<img src="https://mintcdn.com/anomalyarmor/CZXBGa_D1aE9spAI/images/diagrams/discovery-schedule-timeline-light.svg?fit=max&auto=format&n=CZXBGa_D1aE9spAI&q=85&s=0e7dc0f075de80078c75b702ab1481b1" alt="Discovery schedule timeline strategy" className="block dark:hidden" width="700" height="180" data-path="images/diagrams/discovery-schedule-timeline-light.svg" />

<img src="https://mintcdn.com/anomalyarmor/pPIiSU0b3Ixsp9az/images/diagrams/discovery-schedule-timeline-dark.svg?fit=max&auto=format&n=pPIiSU0b3Ixsp9az&q=85&s=ef09ab7e97e035afff735b73054217c1" alt="Discovery schedule timeline strategy" className="hidden dark:block" width="700" height="180" data-path="images/diagrams/discovery-schedule-timeline-dark.svg" />
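
If you manage several pipelines, the offset is easy to compute. The helper below is a minimal sketch: `discovery_cron` is a hypothetical name, and expressing the result as a cron string is an assumption made for readability, not a statement about what the Discovery Schedule setting accepts.

```python
from datetime import datetime, timedelta


def discovery_cron(pipeline_hour: int, pipeline_minute: int = 0, lead_minutes: int = 60) -> str:
    """Return a daily cron expression that fires lead_minutes before the pipeline."""
    # The date is arbitrary; only the time of day matters for a daily schedule.
    pipeline_start = datetime(2000, 1, 1, pipeline_hour, pipeline_minute)
    discovery_start = pipeline_start - timedelta(minutes=lead_minutes)
    return f"{discovery_start.minute} {discovery_start.hour} * * *"


print(discovery_cron(3))  # 0 2 * * *
```

Using `datetime` arithmetic rather than subtracting hours by hand means a pipeline that starts shortly after midnight still gets a valid discovery time on the previous evening.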

## Advanced: Pre-dbt Validation

### Option 1: Webhook Integration

Use webhooks to fail your pipeline early if breaking changes are detected:

1. Set up a webhook destination in AnomalyArmor
2. Point it at a validation endpoint in your orchestrator
3. If the webhook fires, block the dbt run

The resulting flow:

1. **AnomalyArmor alert** fires on a schema change
2. **Webhook** is sent to Airflow/Dagster
3. **Flag is set**: `schema_changes_detected = true`
4. **dbt task** checks the flag before running
5. **If the flag is `true`**: fail fast with a meaningful error
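
The flag pattern above can be sketched with two small functions: one that your webhook endpoint calls when a payload arrives, and one that the dbt task calls before it runs. This is a sketch under stated assumptions: the payload keys `change_type` and `asset`, the function names, and the use of a file as the flag store are all hypothetical. In practice the flag might live in an Airflow Variable or a database row instead.

```python
import json
import tempfile
from pathlib import Path

BREAKING = {"Column Removed", "Table Removed", "Type Changed"}


def record_webhook(payload: dict, flag_file: Path) -> None:
    """Persist a flag when the (assumed) payload reports a breaking change."""
    if payload.get("change_type") in BREAKING:
        flag_file.write_text(json.dumps(payload))


def assert_no_schema_changes(flag_file: Path) -> None:
    """Call at the start of the dbt task; raises so the task fails fast."""
    if flag_file.exists():
        details = json.loads(flag_file.read_text())
        raise RuntimeError(
            f"Blocking dbt run: {details.get('change_type')} "
            f"on {details.get('asset', 'unknown asset')}"
        )


flag = Path(tempfile.mkdtemp()) / "schema_changes_detected.json"
assert_no_schema_changes(flag)  # no flag yet, so this passes silently
record_webhook({"change_type": "Column Removed", "asset": "public.orders"}, flag)
try:
    assert_no_schema_changes(flag)
except RuntimeError as err:
    print(err)  # Blocking dbt run: Column Removed on public.orders
```

Remember to clear the flag once the change has been handled, otherwise every later run will be blocked as well.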

### Option 2: Discovery Schedule Alignment

Align discovery with your orchestration schedule:

```python theme={null}
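# In your Airflow DAG file (define these tasks inside your DAG)
from airflow.operators.bash import BashOperator
from airflow.providers.http.operators.http import SimpleHttpOperator

# Fetch the latest discovery result; the task fails on an HTTP error response.
# Newer HTTP provider releases name this operator HttpOperator.
discovery_check = SimpleHttpOperator(
    task_id='check_for_schema_changes',
    http_conn_id='anomalyarmor',  # Airflow connection holding the API host and credentials
    endpoint='/api/v1/discoveries/latest',
    method='GET',
)

# Run dbt only after the discovery check succeeds.
run_dbt = BashOperator(
    task_id='run_dbt',
    bash_command='dbt run',
)

discovery_check >> run_dbt
```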
```
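
A successful HTTP call alone does not tell you whether breaking changes were found. Airflow's HTTP operator accepts a `response_check` callable that returns `True` to pass the task and `False` to fail it, so one option is to inspect the response body there. The sketch below assumes a hypothetical response shape (a `changes` list whose items carry a `change_type`); check the actual API response before relying on it. `FakeResponse` exists only to make the example runnable without a network call.

```python
from typing import Any

BREAKING = {"Column Removed", "Table Removed", "Type Changed"}


def no_breaking_changes(response: Any) -> bool:
    """Return True (task passes) when the latest discovery has no breaking changes.

    Assumes a hypothetical body such as:
    {"changes": [{"change_type": "Column Removed", "asset": "public.orders"}]}
    """
    changes = response.json().get("changes", [])
    return not any(change.get("change_type") in BREAKING for change in changes)


class FakeResponse:
    """Stand-in for a requests.Response, used only for this demonstration."""

    def __init__(self, body: dict) -> None:
        self._body = body

    def json(self) -> dict:
        return self._body


print(no_breaking_changes(FakeResponse({"changes": []})))  # True
print(no_breaking_changes(
    FakeResponse({"changes": [{"change_type": "Table Removed"}]})
))  # False
```

To use it, pass `response_check=no_breaking_changes` to the operator that calls the discovery endpoint. When the callable returns `False`, the check task fails and `run_dbt` does not start.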

## What to Do When Alerts Fire

### Immediate Actions

1. **Acknowledge the alert**: Let your team know you're investigating
2. **Check the change details**: In AnomalyArmor, review what changed, when, and on which asset
3. **Assess impact**: Which models/dashboards use this table?

### If the Change is Breaking

1. **Pause affected pipelines** (if possible before they run)
2. **Update your dbt models** to handle the change
3. **Test locally** with the new schema
4. **Deploy the fix** before the next scheduled run

### If the Change is Expected

1. **Document it**: Note in AnomalyArmor or your team wiki
2. **Update downstream**: Ensure all dependents are updated
3. **Consider communication**: Should you announce to stakeholders?

## Model Dependency Mapping

Know which models depend on which tables:

**Source Table: `production.orders`**

* `stg_orders` (staging model)
  * `int_orders_enriched` (intermediate)
    * `fct_orders` (fact table)
      * `monthly_revenue` (dashboard)
      * `customer_lifetime_value` (analytics)
  * `rpt_daily_orders` (report)
* `dim_order_status` (dimension)

When `production.orders` changes, all of these are potentially impacted.

<Tip>
  Run `dbt ls --select stg_orders+` to list a model and everything downstream of it. The trailing `+` selects descendants; a leading `+` selects upstream parents instead.
</Tip>
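
If you keep the dependency map as data, finding everything affected by a change is a graph walk. The sketch below hard-codes the example tree from this section in a dictionary named `DEPENDENTS` (a hypothetical name). In a real project you would derive these edges from your dbt manifest rather than maintain them by hand.

```python
from collections import deque

# Edges copied from the example tree above: parent -> direct dependents.
DEPENDENTS = {
    "production.orders": ["stg_orders", "dim_order_status"],
    "stg_orders": ["int_orders_enriched", "rpt_daily_orders"],
    "int_orders_enriched": ["fct_orders"],
    "fct_orders": ["monthly_revenue", "customer_lifetime_value"],
}


def downstream(node: str) -> list[str]:
    """Breadth-first walk returning every node reachable from `node`."""
    seen: list[str] = []
    queue = deque(DEPENDENTS.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(DEPENDENTS.get(current, []))
    return seen


print(downstream("fct_orders"))
# ['monthly_revenue', 'customer_lifetime_value']
print(len(downstream("production.orders")))  # 7
```

A change to `production.orders` reaches all seven downstream nodes, while a change to `fct_orders` reaches only the two dashboards that read from it.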

## Alert Configuration Examples

| Priority   | Rule Name               | Event         | Scope                          | Conditions                     | Destinations                    |
| ---------- | ----------------------- | ------------- | ------------------------------ | ------------------------------ | ------------------------------- |
| **High**   | Revenue Table Changes   | Schema Change | `orders`, `payments`, `transactions` | Any change                     | Slack #data-critical, PagerDuty |
| **Medium** | Dimension Table Changes | Schema Change | `dim_*`, `*_lookup`            | Column removed or type changed | Slack #data-engineering         |
| **Low**    | External Source Changes | Schema Change | `external.*`, `partner_*`      | Any change                     | Email (daily digest)            |
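
To sanity-check which rule a given table would fall under, you can model the Scope column with shell-style wildcards. This sketch assumes the patterns behave like globs, which may not match how AnomalyArmor evaluates scope, and it ignores the Conditions column. `RULES` and `priority_for` are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import Optional

# (priority, scope patterns) in the order of the table above; first match wins.
RULES = [
    ("High", ["orders", "payments", "transactions"]),
    ("Medium", ["dim_*", "*_lookup"]),
    ("Low", ["external.*", "partner_*"]),
]


def priority_for(asset: str) -> Optional[str]:
    """Return the priority of the first rule whose scope matches the asset name."""
    for priority, patterns in RULES:
        if any(fnmatchcase(asset, pattern) for pattern in patterns):
            return priority
    return None  # no rule covers this asset


print(priority_for("payments"))          # High
print(priority_for("country_lookup"))    # Medium
print(priority_for("external.vendors"))  # Low
print(priority_for("users"))             # None
```

An asset that returns `None` is not covered by any rule, which is worth checking for tables that feed pipelines.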

## Troubleshooting

<AccordionGroup>
  <Accordion title="Pipeline failed but I didn't get an alert">
    1. **Check discovery timing**: Did discovery run before the pipeline?
    2. **Check scope**: Is the table included in the alert rule?
    3. **Check conditions**: Does the change type match your conditions?
    4. **Verify destination**: Is the destination configured correctly?
  </Accordion>

  <Accordion title="Too many alerts for non-breaking changes">
    1. **Filter change types**: Alert only on `Column Removed`, `Table Removed`, `Type Changed`
    2. **Exclude test schemas**: Filter out `test_*`, `dev_*`
    3. **Separate environments**: Different rules for prod vs. staging
  </Accordion>

  <Accordion title="Can't connect to production database">
    1. **Use a read replica**: Monitor the replica instead of primary
    2. **Create a dedicated user**: With read-only permissions
    3. **Check network access**: Firewall rules, security groups
  </Accordion>
</AccordionGroup>

## Checklist

Before going live:

* [ ] Connected all source databases that feed pipelines
* [ ] Discovery scheduled to run before pipeline runs
* [ ] Alert rules for breaking changes (column removed, table removed, type changed)
* [ ] Alerts routed to the right channel (data engineering team)
* [ ] Team knows what to do when alerts fire
* [ ] Documented critical table dependencies

## Common Questions

### How often should I run schema discovery for pipeline-critical tables?

Run discovery **hourly** for production application databases that feed dbt or ETL, every 6 hours for shared warehouses, and daily for stable third-party sources. The goal is to detect a change before the next pipeline run, so align the discovery schedule with your orchestrator. See [Schedule Frequent Discovery](#step-2-schedule-frequent-discovery).

### Which schema changes actually break dbt pipelines?

The high-severity ones are **column removed**, **table removed**, and **column type changed**. Additive changes (new columns, new tables) rarely break existing models. Scope your breaking-change alert rule to those three change types to cut alert noise without missing pipeline-breakers.

### Can AnomalyArmor block my dbt run if a breaking change is detected?

Yes, via a webhook destination. Point the webhook at an Airflow/Dagster endpoint that sets a flag, then make your dbt task check that the flag is clear before it runs. See [Option 1: Webhook Integration](#option-1-webhook-integration) for the pattern. Failing fast with a meaningful error beats debugging a failed dbt run at 3 AM.

### Should I monitor my source database or my warehouse?

Monitor both, but source databases are where most breaking changes originate: upstream teams drop columns without telling you. Connect the production application databases your dbt project reads from, not just the warehouse you write into. See [Step 1](#step-1-connect-your-source-databases).

### Why am I getting too many alerts for non-breaking changes?

Tighten the change-type filter on the rule to **Column Removed**, **Table Removed**, and **Type Changed** only. Exclude `test_*` and `dev_*` schemas, and split prod and staging into separate rules with different destinations. See the [Troubleshooting section](#troubleshooting).

### How do I know which dbt models a source table feeds?

Upload your dbt manifest via the [lineage upload](/guides/lineage-upload) flow and AnomalyArmor's asset page will show downstream dependencies. You can also run `dbt ls --select stg_orders+` locally to list a model and everything downstream of it (the trailing `+` selects descendants).

## Related Resources

<CardGroup cols={2}>
  <Card title="Schema Monitoring" icon="table" href="/schema-monitoring/overview">
    Deep dive into schema change detection
  </Card>

  <Card title="Alert Rules" icon="bell" href="/alerts/alert-rules">
    Configure alert conditions
  </Card>
</CardGroup>
