Prevent Pipeline Failures

Audience: Data Engineers, Analytics Engineers

Schema changes are one of the most common causes of pipeline failures. A dropped column upstream can cascade into failed dbt runs, broken dashboards, and late-night debugging sessions. This guide shows you how to use AnomalyArmor to catch schema changes before they impact your pipelines.

The Problem

Figure: Schema change cascade showing pipeline failure

The Solution

With AnomalyArmor, you’ll know about schema changes before your pipelines run.

Setup Guide

Step 1: Connect Your Source Databases

Connect the databases that your pipelines read, not just your warehouse. Common sources to monitor:

Production application databases (the ones your dbt reads from)
Third-party data sources
Shared data lakes

For each source, follow the connection guide.

Step 2: Schedule Frequent Discovery

For pipeline-critical databases, run discovery frequently:

Database Type	Recommended Schedule	Why
Application databases	Hourly	Changes can happen anytime
Shared warehouses	Every 6 hours	Less frequent changes
Third-party sources	Daily	Usually stable

Configure in: Data Sources → [Your Connection] → Settings → Discovery Schedule
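As a rough illustration of the schedule table above, the sketch below encodes the recommended intervals and checks whether a source is due for another discovery run. The `RECOMMENDED_INTERVAL` mapping and the `is_discovery_due` helper are hypothetical names used only for this example; they are not part of AnomalyArmor, where the schedule itself is set in the UI.

```python
from datetime import datetime, timedelta

# Recommended intervals from the table above.
# The dictionary keys are illustrative labels, not AnomalyArmor identifiers.
RECOMMENDED_INTERVAL = {
    "application": timedelta(hours=1),
    "shared_warehouse": timedelta(hours=6),
    "third_party": timedelta(days=1),
}


def is_discovery_due(db_type: str, last_run: datetime, now: datetime) -> bool:
    """Return True when the recommended interval has elapsed since last_run."""
    return now - last_run >= RECOMMENDED_INTERVAL[db_type]
```

A helper like this can be handy in a monitoring script that flags sources whose last discovery is older than the interval you intended.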

Step 3: Create Breaking Change Alerts

Set up alerts specifically for changes that break pipelines:

Rule: Breaking Schema Changes (Production)

Field	Value
Event	Schema Change Detected
Data Source	`production-app-db`
Schema	`public`
Assets	All (or list specific tables)
Change Type	Column Removed, Table Removed, Type Changed
Destinations	Slack `#data-engineering`, Email `data-team@company.com`
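To make the rule's scope concrete, here is a minimal sketch of how such a rule could be evaluated against a schema-change event. The `AlertRule` class, the `rule_matches` function, and the event dictionary keys are all assumptions made for illustration; AnomalyArmor evaluates rules internally and its event format is not shown in this guide.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# The three change types the rule above alerts on.
BREAKING_CHANGE_TYPES = {"Column Removed", "Table Removed", "Type Changed"}


@dataclass
class AlertRule:
    """Hypothetical in-memory model of the rule table above."""
    data_source: str
    schema: str
    change_types: Set[str] = field(default_factory=lambda: set(BREAKING_CHANGE_TYPES))
    assets: Optional[Set[str]] = None  # None means "All"


def rule_matches(rule: AlertRule, event: dict) -> bool:
    """Return True when a schema-change event falls inside the rule's scope."""
    if event["data_source"] != rule.data_source:
        return False
    if event["schema"] != rule.schema:
        return False
    if rule.assets is not None and event["asset"] not in rule.assets:
        return False
    return event["change_type"] in rule.change_types
```

With `AlertRule("production-app-db", "public")`, a `Column Removed` event on any table in `public` matches, while an added column does not.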

Step 4: Time Alerts Before Pipeline Runs

If your dbt job runs at 3 AM, schedule discovery at 2 AM so that alerts arrive before the run starts.
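The offset arithmetic can be sketched as a small helper that derives a daily cron expression from the pipeline start time. The `discovery_cron` function is hypothetical, and whether your discovery schedule accepts a cron expression is an assumption; the point is only the lead-time calculation, including wrap-around past midnight.

```python
def discovery_cron(pipeline_hour: int, pipeline_minute: int = 0,
                   lead_minutes: int = 60) -> str:
    """Return a daily cron expression that fires lead_minutes before the pipeline."""
    minutes_per_day = 24 * 60
    # Subtract the lead time and wrap around midnight if needed.
    total = (pipeline_hour * 60 + pipeline_minute - lead_minutes) % minutes_per_day
    hour, minute = divmod(total, 60)
    return f"{minute} {hour} * * *"
```

For the example above, `discovery_cron(3)` yields `0 2 * * *`. Pick a lead time comfortably longer than a discovery run usually takes, so the results are in before dbt starts.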

Advanced: Pre-dbt Validation

Option 1: Webhook Integration

Use webhooks to fail your pipeline early if breaking changes are detected:

Set up a webhook destination in AnomalyArmor
Point it at a validation endpoint in your orchestrator
If the webhook fires, block the dbt run

The resulting flow:

AnomalyArmor alert fires on schema change
Webhook sent to Airflow/Dagster
Set flag: schema_changes_detected = true
dbt task checks flag before running
If flag = true: fail fast with a meaningful error
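The flag pattern in this flow could look roughly like the sketch below. It uses a JSON file as the flag store purely to stay self-contained; in practice you might use an Airflow Variable or a database row instead. The file path, the function names, and the webhook payload shape are all assumptions, not AnomalyArmor's documented format.

```python
import json
from pathlib import Path

# Hypothetical location shared by the webhook receiver and the dbt task.
FLAG_PATH = Path("schema_changes_flag.json")


class BreakingSchemaChange(RuntimeError):
    """Raised to stop the dbt task before it runs."""


def handle_webhook(payload: dict, flag_path: Path = FLAG_PATH) -> None:
    """Record the alert so a later task can see it (payload shape is assumed)."""
    flag = {"schema_changes_detected": True, "details": payload}
    flag_path.write_text(json.dumps(flag))


def assert_no_schema_changes(flag_path: Path = FLAG_PATH) -> None:
    """Call at the start of the dbt task; raise if the flag is set."""
    if not flag_path.exists():
        return
    flag = json.loads(flag_path.read_text())
    if flag.get("schema_changes_detected"):
        raise BreakingSchemaChange(
            f"Schema change detected upstream, dbt run blocked: {flag.get('details')}"
        )


def clear_flag(flag_path: Path = FLAG_PATH) -> None:
    """Reset the flag once the change has been reviewed."""
    flag_path.unlink(missing_ok=True)
```

Raising a dedicated exception keeps the failure message specific, so whoever is on call sees the schema change rather than a downstream compilation error.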

Option 2: Discovery Schedule Alignment

Align discovery with your orchestration schedule:

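# In your Airflow DAG
from airflow.operators.bash import BashOperator
from airflow.providers.http.operators.http import SimpleHttpOperator

# Query AnomalyArmor for the latest discovery before dbt starts.
# As written, this task fails only on an HTTP error response; pass a
# response_check callable to also fail on reported breaking changes.
discovery_check = SimpleHttpOperator(
    task_id='check_for_schema_changes',
    http_conn_id='anomalyarmor',
    endpoint='/api/v1/discoveries/latest',
    method='GET',
)

# Run dbt only after the discovery check succeeds.
run_dbt = BashOperator(
    task_id='run_dbt',
    bash_command='dbt run',
)

discovery_check >> run_dbt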
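For the check task to actually block dbt, it needs to inspect the response body. One way, sketched below, is a callable passed to the operator's `response_check` parameter. The response shape used here (`{"changes": [{"change_type": ...}]}`) is a guess for illustration only; adjust it to whatever the endpoint really returns.

```python
# Change types treated as breaking, matching the alert rule in Step 3.
BREAKING_CHANGE_TYPES = {"Column Removed", "Table Removed", "Type Changed"}


def no_breaking_changes(response) -> bool:
    """Return False when the latest discovery lists a breaking change.

    Assumes the body looks like {"changes": [{"change_type": "..."}]};
    this shape is hypothetical, not a documented AnomalyArmor response.
    """
    changes = response.json().get("changes", [])
    return not any(c.get("change_type") in BREAKING_CHANGE_TYPES for c in changes)
```

Passing `response_check=no_breaking_changes` to `SimpleHttpOperator` makes Airflow fail the check task when the callable returns `False`, so the downstream `run_dbt` task does not start.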

What to Do When Alerts Fire

Immediate Actions

Acknowledge the alert: Let your team know you’re investigating
Check the change details: View in AnomalyArmor: what changed, when, and on which asset
Assess impact: Which models/dashboards use this table?

If the Change is Breaking

Pause affected pipelines (if possible before they run)
Update your dbt models to handle the change
Test locally with the new schema
Deploy the fix before the next scheduled run

If the Change is Expected

Document it: Note in AnomalyArmor or your team wiki
Update downstream: Ensure all dependents are updated
Consider communication: Should you announce to stakeholders?

Model Dependency Mapping

Know which models depend on which tables:

Source Table: production.orders

stg_orders (staging model)
- int_orders_enriched (intermediate)
  - fct_orders (fact table)
    - monthly_revenue (dashboard)
    - customer_lifetime_value (analytics)
- rpt_daily_orders (report)
dim_order_status (dimension)

When production.orders changes, all of these are potentially impacted.

Use dbt ls --select stg_orders+ to list a model and everything downstream of it. The trailing + selects descendants; a leading +, as in +stg_orders, selects upstream ancestors instead.
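The same impact question can be answered in code with a breadth-first walk over a parent-to-children map. The `CHILDREN` mapping below is one reading of the example tree above (the exact nesting is an assumption), and `downstream` is a hypothetical helper name.

```python
from collections import deque

# One interpretation of the example dependency tree; adjust to your project.
CHILDREN = {
    "production.orders": ["stg_orders", "dim_order_status"],
    "stg_orders": ["int_orders_enriched", "rpt_daily_orders"],
    "int_orders_enriched": ["fct_orders"],
    "fct_orders": ["monthly_revenue", "customer_lifetime_value"],
}


def downstream(node: str, children: dict = CHILDREN) -> list:
    """Return every node reachable from `node`, in breadth-first order."""
    seen = []
    queue = deque(children.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # skip nodes reached through more than one path
        seen.append(current)
        queue.extend(children.get(current, []))
    return seen
```

Calling `downstream("production.orders")` returns all seven dependents in the example, which is the list of things to check when that table changes.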

Alert Configuration Examples

Priority	Rule Name	Event	Scope	Conditions	Destinations
High	Revenue Table Changes	Schema Change	orders, payments, transactions	Any change	Slack #data-critical, PagerDuty
Medium	Dimension Table Changes	Schema Change	dim_*, *_lookup	Column removed or type changed	Slack #data-engineering
Low	External Source Changes	Schema Change	external.*, partner_*	Any change	Email (daily digest)
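To see how scopes like these sort tables into priorities, the sketch below matches a table name against glob-style patterns and returns the first rule that applies. It assumes the scopes are wildcard patterns (`dim_*`, `*_lookup`, `external.*`, `partner_*`) and covers only the Scope column, not the Conditions column; `RULES` and `match_rule` are illustrative names.

```python
from fnmatch import fnmatchcase

# (priority, rule name, scope patterns), ordered from highest priority down.
# Wildcard syntax is an assumption about how scopes are expressed.
RULES = [
    ("High", "Revenue Table Changes", ["orders", "payments", "transactions"]),
    ("Medium", "Dimension Table Changes", ["dim_*", "*_lookup"]),
    ("Low", "External Source Changes", ["external.*", "partner_*"]),
]


def match_rule(table_name: str):
    """Return (priority, rule name) for the first matching rule, else None."""
    for priority, name, patterns in RULES:
        if any(fnmatchcase(table_name, pattern) for pattern in patterns):
            return priority, name
    return None
```

Ordering the rules from high to low priority means a table that fits several scopes is routed to the most urgent destination.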

Troubleshooting

Pipeline failed but I didn't get an alert

Check discovery timing: Did discovery run before the pipeline?
Check scope: Is the table included in the alert rule?
Check conditions: Does the change type match your conditions?
Verify destination: Is the destination configured correctly?

Too many alerts for non-breaking changes

Filter change types: Alert only on Column Removed, Table Removed, Type Changed
Exclude test schemas: Filter out test_*, dev_*
Separate environments: Different rules for prod vs. staging

Can't connect to production database

Use a read replica: Monitor the replica instead of primary
Create a dedicated user: With read-only permissions
Check network access: Firewall rules, security groups

Checklist

Before going live:

Connected all source databases that feed pipelines
Discovery scheduled to run before pipeline runs
Alert rules for breaking changes (column removed, table removed, type changed)
Alerts routed to the right channel (data engineering team)
Team knows what to do when alerts fire
Documented critical table dependencies

Common Questions

How often should I run schema discovery for pipeline-critical tables?

Run discovery hourly for production application databases that feed dbt or ETL, every 6 hours for shared warehouses, and daily for stable third-party sources. The goal is to detect a change before the next pipeline run, so align the discovery schedule with your orchestrator. See Schedule Frequent Discovery.

Which schema changes actually break dbt pipelines?

The high-severity ones are column removed, table removed, and column type changed. Additive changes (new columns, new tables) rarely break existing models. Scope your breaking-change alert rule to those three change types to cut alert noise without missing pipeline-breakers.

Can AnomalyArmor block my dbt run if a breaking change is detected?

Yes, via a webhook destination. Point the webhook at an Airflow/Dagster sensor that sets a flag, then make your dbt task depend on the flag being clear. See Option 1: Webhook Integration for the pattern. This is the “fail fast with a meaningful error” flow that beats a 3 AM dbt compilation error.

Should I monitor my source database or my warehouse?

Monitor both, but source databases are where most breaking changes originate: upstream teams drop columns without telling you. Connect the production application DBs your dbt project reads from, not just the warehouse you write into. See Step 1.

Why am I getting too many alerts for non-breaking changes?

Tighten the change-type filter on the rule to Column Removed, Table Removed, and Type Changed only. Exclude test_* and dev_* schemas, and split prod and staging into separate rules with different destinations. See the Troubleshooting section.

How do I know which dbt models a source table feeds?

Upload your dbt manifest via the lineage upload flow and AnomalyArmor’s asset page will show downstream dependencies. You can also run dbt ls --select stg_orders+ locally to list the staging model and everything downstream of it, through to the dashboards.

Related Resources

Schema Monitoring

Deep dive into schema change detection

Alert Rules

Configure alert conditions
