Audience: Data Engineers, Analytics Engineers

Schema changes are one of the most common causes of pipeline failures. A dropped column upstream can cascade into failed dbt runs, broken dashboards, and late-night debugging sessions. This guide shows you how to use AnomalyArmor to catch schema changes before they impact your pipelines.
The Problem
The Solution
With AnomalyArmor, you’ll know about schema changes before your pipelines run.

Setup Guide
Step 1: Connect Your Source Databases
Connect the databases that your pipelines read from, not just your warehouse. Common sources to monitor:

- Production application databases (the ones your dbt project reads from)
- Third-party data sources
- Shared data lakes
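If you use dbt, one way to build the list of databases to connect is to read the sources declared in the manifest.json that dbt writes to its target/ directory. The sketch below assumes each entry under the manifest's "sources" key carries database, schema, name, and identifier fields, as recent dbt versions produce; the function name is hypothetical and nothing here calls AnomalyArmor.

```python
from collections import defaultdict


def source_locations(manifest: dict) -> dict:
    """Group the sources declared in a dbt manifest by (database, schema).

    Returns {(database, schema): [table, ...]} with tables sorted by name.
    """
    locations = defaultdict(list)
    for source in manifest.get("sources", {}).values():
        # Some adapters leave database empty; normalize to "" so keys stay sortable.
        database = source.get("database") or ""
        # "identifier" is the physical table name; fall back to the logical name.
        table = source.get("identifier") or source["name"]
        locations[(database, source["schema"])].append(table)
    return {key: sorted(tables) for key, tables in sorted(locations.items())}
```

For example, calling source_locations(json.load(open("target/manifest.json"))) gives a mapping from (database, schema) to table names; each distinct database in it is a candidate data source to connect.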
Step 2: Schedule Frequent Discovery
For pipeline-critical databases, run discovery frequently:

| Database Type | Recommended Schedule | Why |
|---|---|---|
| Application databases | Hourly | Changes can happen anytime |
| Shared warehouses | Every 6 hours | Less frequent changes |
| Third-party sources | Daily | Usually stable |
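A frequent schedule only helps if the most recent discovery run happened shortly before the pipeline reads the data. The sketch below encodes the table's intervals and checks whether a given discovery run is recent enough to protect a pipeline run. The dictionary keys and function name are hypothetical, and you would supply last_discovery from wherever you track discovery runs.

```python
from datetime import datetime, timedelta

# Intervals from the table above; the keys are illustrative labels, not AnomalyArmor settings.
DISCOVERY_INTERVALS = {
    "application": timedelta(hours=1),
    "shared_warehouse": timedelta(hours=6),
    "third_party": timedelta(days=1),
}


def discovery_is_fresh(db_type: str, last_discovery: datetime, pipeline_start: datetime) -> bool:
    """True if discovery ran within one interval before the pipeline starts.

    A discovery run that happens after pipeline_start cannot warn that run,
    so it does not count as fresh.
    """
    interval = DISCOVERY_INTERVALS[db_type]
    return pipeline_start - interval <= last_discovery <= pipeline_start
```

With hourly discovery and a 3 AM pipeline, a 2:30 AM discovery run counts as fresh, while a 1:30 AM run does not.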
Step 3: Create Breaking Change Alerts
Set up alerts specifically for changes that break pipelines:

Rule: Breaking Schema Changes (Production)

| Field | Value |
|---|---|
| Event | Schema Change Detected |
| Data Source | production-app-db |
| Schema | public |
| Assets | All (or list specific tables) |
| Change Type | Column Removed, Table Removed, Type Changed |
| Destinations | Slack #data-engineering, Email data-team@company.com |
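To make the rule's filtering logic concrete, here is a sketch of how a change event could be tested against it. This is not AnomalyArmor's implementation or API: the SchemaChange and AlertRule types are hypothetical, and the change-type strings simply mirror the labels in the table.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Change types treated as breaking; the labels mirror the rule's "Change Type" row.
BREAKING_CHANGE_TYPES = frozenset({"Column Removed", "Table Removed", "Type Changed"})


@dataclass(frozen=True)
class SchemaChange:
    """Hypothetical shape of one detected schema change event."""
    data_source: str
    schema: str
    table: str
    change_type: str


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical in-code mirror of the rule table above."""
    data_source: str
    schema: str
    change_types: FrozenSet[str] = BREAKING_CHANGE_TYPES
    tables: Optional[FrozenSet[str]] = None  # None means "All" assets

    def matches(self, change: SchemaChange) -> bool:
        """True if the change is in this rule's scope and has a listed change type."""
        return (
            change.data_source == self.data_source
            and change.schema == self.schema
            and change.change_type in self.change_types
            and (self.tables is None or change.table in self.tables)
        )
```

Keeping the breaking types in one constant makes it easy to reuse the same filter for prod and staging rules that differ only in data source and destinations.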
Step 4: Time Alerts Before Pipeline Runs
If your dbt run starts at 3 AM, schedule discovery at 2 AM so that any breaking change is detected, and alerted on, before the run begins.

Advanced: Pre-dbt Validation
Option 1: Webhook Integration
Use webhooks to fail your pipeline early if breaking changes are detected:

- Set up a webhook destination in AnomalyArmor
- Point it at a validation endpoint in your orchestrator
- If the webhook fires, block the dbt run

The flow looks like this:

- AnomalyArmor alert fires on schema change
- Webhook sent to Airflow/Dagster
- Set flag: schema_changes_detected = true
- dbt task checks flag before running
- If flag = true: Fail fast with meaningful error
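The webhook flow can be sketched in plain Python. It assumes a webhook receiver in your orchestrator (an HTTP endpoint, not shown) calls record_alert with the JSON body AnomalyArmor sends. The payload contents, the file-based flag, and all function names are illustrative assumptions, not part of AnomalyArmor, Airflow, or Dagster.

```python
import json
from pathlib import Path

# Hypothetical flag location. In Airflow or Dagster you might keep this in a
# Variable, a run tag, or a database row instead of a local file.
FLAG_PATH = Path("schema_changes_detected.json")


def record_alert(payload: dict, flag_path: Path = FLAG_PATH) -> None:
    """Webhook handler body: set the flag and keep the payload for the error message."""
    flag_path.write_text(json.dumps({"schema_changes_detected": True, "alert": payload}))


def clear_flag(flag_path: Path = FLAG_PATH) -> None:
    """Call once the breaking change has been handled."""
    flag_path.unlink(missing_ok=True)


def check_before_dbt(flag_path: Path = FLAG_PATH) -> None:
    """Run as the task before dbt; raise to fail fast with a meaningful error."""
    if not flag_path.exists():
        return
    state = json.loads(flag_path.read_text())
    if state.get("schema_changes_detected"):
        raise RuntimeError(
            "Blocking dbt run: AnomalyArmor reported a breaking schema change: "
            + json.dumps(state.get("alert", {}))
        )
```

A local file only works when the receiver and the dbt task run on the same machine; with distributed workers, store the flag somewhere every worker can read. Clearing the flag is deliberately a separate step, so runs stay blocked until someone has handled the change.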
Option 2: Discovery Schedule Alignment
Align discovery with your orchestration schedule so that each discovery run completes shortly before the pipeline run that depends on it.

What to Do When Alerts Fire
Immediate Actions
- Acknowledge the alert: Let your team know you’re investigating
- Check the change details: View in AnomalyArmor: what changed, when, and on which asset
- Assess impact: Which models/dashboards use this table?
If the Change is Breaking
- Pause affected pipelines (if possible before they run)
- Update your dbt models to handle the change
- Test locally with the new schema
- Deploy the fix before the next scheduled run
If the Change is Expected
- Document it: Note in AnomalyArmor or your team wiki
- Update downstream: Ensure all dependents are updated
- Consider communication: Should you announce to stakeholders?
Model Dependency Mapping
Know which models depend on which tables:

Source table: production.orders

- stg_orders (staging model)
  - int_orders_enriched (intermediate)
    - fct_orders (fact table)
      - monthly_revenue (dashboard)
      - customer_lifetime_value (analytics)
  - rpt_daily_orders (report)
  - dim_order_status (dimension)

If production.orders changes, all of these are potentially impacted.
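Finding "all of these" is a graph traversal: start at the changed table and follow dependency edges to every descendant. The sketch below runs a breadth-first search over a parent-to-children mapping. dbt's manifest.json includes a child_map with this shape, keyed by dbt unique IDs, so you could pass that in instead of the hand-written example; the edges in CHILD_MAP are illustrative.

```python
from collections import deque


def downstream(child_map: dict[str, list[str]], start: str) -> set[str]:
    """Return every node reachable from start by following parent -> child edges."""
    seen: set[str] = set()
    queue = deque(child_map.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:  # guards against revisiting shared children (and cycles)
            continue
        seen.add(node)
        queue.extend(child_map.get(node, []))
    return seen


# Illustrative edges for the example above.
CHILD_MAP = {
    "production.orders": ["stg_orders"],
    "stg_orders": ["int_orders_enriched", "rpt_daily_orders", "dim_order_status"],
    "int_orders_enriched": ["fct_orders"],
    "fct_orders": ["monthly_revenue", "customer_lifetime_value"],
}
```

Calling downstream(CHILD_MAP, "production.orders") returns all seven dependents, while downstream(CHILD_MAP, "fct_orders") returns only the two assets built on the fact table.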
Alert Configuration Examples
| Priority | Rule Name | Event | Scope | Conditions | Destinations |
|---|---|---|---|---|---|
| High | Revenue Table Changes | Schema Change | orders, payments, transactions | Any change | Slack #data-critical, PagerDuty |
| Medium | Dimension Table Changes | Schema Change | dim_*, *_lookup | Column removed or type changed | Slack #data-engineering |
| Low | External Source Changes | Schema Change | external.*, partner_* | Any change | Email (daily digest) |
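The Scope column mixes exact table names with wildcard patterns. The sketch below shows one way to evaluate such a table with Python's fnmatch module, treating * as a wildcard. It is not how AnomalyArmor matches scopes, and checking both the bare and the schema-qualified table name is an assumption made so that external.* can match a schema while dim_* matches a table name.

```python
from fnmatch import fnmatchcase

# (priority, rule name, scope patterns, change types; None means "Any change")
RULES = [
    ("High", "Revenue Table Changes", ["orders", "payments", "transactions"], None),
    ("Medium", "Dimension Table Changes", ["dim_*", "*_lookup"], {"Column Removed", "Type Changed"}),
    ("Low", "External Source Changes", ["external.*", "partner_*"], None),
]


def matching_rules(schema: str, table: str, change_type: str) -> list[str]:
    """Return the names of every rule whose scope and conditions match a change."""
    # Try each pattern against both the bare table name and schema.table.
    names = (table, f"{schema}.{table}")
    matched = []
    for _priority, rule_name, patterns, change_types in RULES:
        in_scope = any(fnmatchcase(name, pattern) for name in names for pattern in patterns)
        if in_scope and (change_types is None or change_type in change_types):
            matched.append(rule_name)
    return matched
```

A change can fall under more than one rule, so the function returns every match rather than stopping at the first.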
Troubleshooting
Pipeline failed but I didn't get an alert
- Check discovery timing: Did discovery run before the pipeline?
- Check scope: Is the table included in the alert rule?
- Check conditions: Does the change type match your conditions?
- Verify destination: Is the destination configured correctly?
Too many alerts for non-breaking changes
- Filter change types: Alert only on Column Removed, Table Removed, and Type Changed
- Exclude test schemas: Filter out test_* and dev_*
- Separate environments: Different rules for prod vs. staging
Can't connect to production database
- Use a read replica: Monitor the replica instead of primary
- Create a dedicated user: With read-only permissions
- Check network access: Firewall rules, security groups
Checklist
Before going live:

- Connected all source databases that feed pipelines
- Discovery scheduled to run before pipeline runs
- Alert rules for breaking changes (column/table removed)
- Alerts routed to the right channel (data engineering team)
- Team knows what to do when alerts fire
- Documented critical table dependencies
Common Questions
How often should I run schema discovery for pipeline-critical tables?
For production application databases that feed dbt or ETL, run discovery hourly, shared warehouses every 6 hours, and stable third-party sources daily. The goal is to detect a change before the next pipeline run, so align the discovery schedule with your orchestrator. See Schedule Frequent Discovery.

Which schema changes actually break dbt pipelines?
The high-severity ones are column removed, table removed, and column type changed. Additive changes (new columns, new tables) rarely break existing models. Scope your breaking-change alert rule to those three change types to cut alert noise without missing pipeline-breakers.

Can AnomalyArmor block my dbt run if a breaking change is detected?
Yes, via a webhook destination. Point the webhook at an Airflow/Dagster sensor that sets a flag, then make your dbt task depend on the flag being clear. See Option 1: Webhook Integration for the pattern. This is the “fail fast with a meaningful error” flow that beats a 3 AM dbt compilation error.

Should I monitor my source database or my warehouse?
Monitor both, but source databases are where most breaking changes originate: upstream teams drop columns without telling you. Connect the production application DBs your dbt project reads from, not just the warehouse you write into. See Step 1.

Why am I getting too many alerts for non-breaking changes?
Tighten the change-type filter on the rule to Column Removed, Table Removed, and Type Changed only. Exclude test_* and dev_* schemas, and split prod and staging into separate rules with different destinations. See the Troubleshooting section.
How do I know which dbt models a source table feeds?
Upload your dbt manifest via the lineage upload flow, and AnomalyArmor’s asset page will show downstream dependencies. You can also run dbt ls --select stg_orders+ locally to list every node downstream of stg_orders (a trailing + selects descendants; a leading + selects ancestors).
Related Resources
Schema Monitoring
Deep dive into schema change detection
Alert Rules
Configure alert conditions
