
Audience: Data Engineers, Analytics Engineers

Schema changes are one of the most common causes of pipeline failures. A dropped column upstream can cascade into failed dbt runs, broken dashboards, and late-night debugging sessions. This guide shows you how to use AnomalyArmor to catch schema changes before they impact your pipelines.

The Problem

Figure: Schema change cascade showing pipeline failure

The Solution

With AnomalyArmor, you’ll know about schema changes before your pipelines run.

Figure: Pipeline event detection timeline

Setup Guide

Step 1: Connect Your Source Databases

Connect the databases that your pipelines read from, not just your warehouse. Common sources to monitor:
  • Production application databases (the ones your dbt project reads from)
  • Third-party data sources
  • Shared data lakes
For each source, follow the connection guide.

Step 2: Schedule Frequent Discovery

For pipeline-critical databases, run discovery frequently:
| Database Type | Recommended Schedule | Why |
|---|---|---|
| Application databases | Hourly | Changes can happen anytime |
| Shared warehouses | Every 6 hours | Less frequent changes |
| Third-party sources | Daily | Usually stable |
Configure in: Data Sources → [Your Connection] → Settings → Discovery Schedule
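If you track discovery settings alongside your orchestration code, the recommended schedules above map naturally onto cron expressions. This is a minimal sketch: the dictionary keys, the `discovery_cron` helper, and the choice of midnight for "Daily" are hypothetical, for illustration only. The schedule itself is still set in the UI path above.

```python
# Hypothetical mapping of the recommended discovery schedules to cron
# expressions (minute, hour, day-of-month, month, day-of-week).
RECOMMENDED_DISCOVERY_CRON = {
    "application": "0 * * * *",         # hourly: changes can happen anytime
    "shared_warehouse": "0 */6 * * *",  # every 6 hours: less frequent changes
    "third_party": "0 0 * * *",         # daily (midnight is an assumption): usually stable
}


def discovery_cron(database_type: str) -> str:
    """Return the recommended cron expression for a database type."""
    try:
        return RECOMMENDED_DISCOVERY_CRON[database_type]
    except KeyError:
        raise ValueError(f"unknown database type: {database_type!r}") from None


print(discovery_cron("shared_warehouse"))  # prints 0 */6 * * *
```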

Step 3: Create Breaking Change Alerts

Set up alerts specifically for changes that break pipelines.

Rule: Breaking Schema Changes (Production)

| Field | Value |
|---|---|
| Event | Schema Change Detected |
| Data Source | production-app-db |
| Schema | public |
| Assets | All (or list specific tables) |
| Change Type | Column Removed, Table Removed, Type Changed |
| Destinations | Slack #data-engineering, Email data-team@company.com |
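To make the rule's filters concrete, here is a minimal sketch of how the fields combine. An event alerts only if it comes from the rule's data source and schema, touches an in-scope asset, and has one of the listed change types. The `SchemaChange` and `AlertRule` classes, their field names, and the "Column Added" change type are hypothetical, for illustration only; AnomalyArmor evaluates rules internally.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SchemaChange:
    """Hypothetical representation of a detected schema change."""
    data_source: str
    schema: str
    table: str
    change_type: str


@dataclass
class AlertRule:
    """Hypothetical representation of the rule configured in Step 3."""
    data_source: str
    schema: str
    change_types: frozenset
    assets: frozenset = field(default_factory=frozenset)  # empty means all assets

    def matches(self, change: SchemaChange) -> bool:
        # Every condition must hold for the rule to fire.
        return (
            change.data_source == self.data_source
            and change.schema == self.schema
            and (not self.assets or change.table in self.assets)
            and change.change_type in self.change_types
        )


breaking_rule = AlertRule(
    data_source="production-app-db",
    schema="public",
    change_types=frozenset({"Column Removed", "Table Removed", "Type Changed"}),
)

dropped = SchemaChange("production-app-db", "public", "orders", "Column Removed")
added = SchemaChange("production-app-db", "public", "orders", "Column Added")
print(breaking_rule.matches(dropped))  # True
print(breaking_rule.matches(added))    # False
```

Leaving `assets` empty mirrors the "All" setting; listing specific tables narrows the rule without changing the change-type filter.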

Step 4: Time Alerts Before Pipeline Runs

If your dbt run starts at 3 AM, schedule discovery for 2 AM so that changes are detected and alerted on before the run begins.

Figure: Discovery schedule timeline strategy
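The timing rule is simple arithmetic: start discovery a fixed lead time before the pipeline. Here is a minimal sketch, assuming a fixed daily pipeline start and a one-hour default lead. The `discovery_time` helper is hypothetical. Anchoring to a date lets the subtraction wrap correctly past midnight.

```python
from datetime import date, datetime, time, timedelta


def discovery_time(pipeline_start: time, lead: timedelta = timedelta(hours=1)) -> time:
    """Return the time of day to start discovery, `lead` before the pipeline run."""
    # Anchor to an arbitrary date so subtracting the lead can roll back past midnight.
    anchor = datetime.combine(date(2000, 1, 2), pipeline_start)
    return (anchor - lead).time()


print(discovery_time(time(3, 0)))   # 02:00:00
print(discovery_time(time(0, 30)))  # 23:30:00 (the previous evening)
```

Pick a lead that exceeds how long discovery typically takes on that database, so the alert arrives before the pipeline starts.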

Advanced: Pre-dbt Validation

Option 1: Webhook Integration

Use webhooks to fail your pipeline early if breaking changes are detected:
  1. Set up a webhook destination in AnomalyArmor
  2. Point it at a validation endpoint in your orchestrator
  3. If the webhook fires, block the dbt run
The end-to-end flow:
  1. An AnomalyArmor alert fires on a schema change
  2. The webhook is sent to Airflow/Dagster
  3. The orchestrator sets a flag: schema_changes_detected = true
  4. The dbt task checks the flag before running
  5. If the flag is true, the task fails fast with a meaningful error
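The flag pattern can be sketched without any orchestrator. The webhook handler records breaking changes, and a pre-dbt check fails fast if any are pending. The payload fields (`change_type`, `asset`) and all function and class names are assumptions for illustration, not AnomalyArmor's documented webhook schema. In Airflow or Dagster the flag would live in durable storage (for example an Airflow Variable or a Dagster sensor's state) rather than an in-memory object.

```python
from dataclasses import dataclass, field

BREAKING_CHANGE_TYPES = {"Column Removed", "Table Removed", "Type Changed"}


class BreakingSchemaChangeError(RuntimeError):
    """Raised to stop the dbt run with a meaningful message."""


@dataclass
class SchemaChangeFlag:
    """In-memory stand-in for schema_changes_detected (use durable storage in practice)."""
    pending: list = field(default_factory=list)

    @property
    def schema_changes_detected(self) -> bool:
        return bool(self.pending)


def handle_webhook(payload: dict, flag: SchemaChangeFlag) -> None:
    """Record the change if the (hypothetical) payload describes a breaking change."""
    if payload.get("change_type") in BREAKING_CHANGE_TYPES:
        flag.pending.append(payload)


def check_before_dbt(flag: SchemaChangeFlag) -> None:
    """Fail fast before dbt runs if any breaking change is pending."""
    if flag.schema_changes_detected:
        details = ", ".join(
            f"{c.get('asset', '?')}: {c['change_type']}" for c in flag.pending
        )
        raise BreakingSchemaChangeError(f"Breaking schema changes detected: {details}")


flag = SchemaChangeFlag()
handle_webhook({"asset": "public.orders", "change_type": "Column Removed"}, flag)
try:
    check_before_dbt(flag)
except BreakingSchemaChangeError as exc:
    print(exc)  # Breaking schema changes detected: public.orders: Column Removed
```

Clearing `pending` after the fix is deployed is what lets the next dbt run proceed.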

Option 2: Discovery Schedule Alignment

Align discovery with your orchestration schedule:
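# In your Airflow DAG (Airflow 2.x import paths shown)
from airflow.operators.bash import BashOperator
# Newer apache-airflow-providers-http releases rename this operator HttpOperator.
from airflow.providers.http.operators.http import SimpleHttpOperator

# Query AnomalyArmor for the most recent discovery result.
# By default the task fails only on a non-2xx HTTP response; pass a
# response_check callable to fail on the contents of the response.
discovery_check = SimpleHttpOperator(
    task_id='check_for_schema_changes',
    http_conn_id='anomalyarmor',
    endpoint='/api/v1/discoveries/latest',
    method='GET',
)

# Run dbt only after the discovery check succeeds.
run_dbt = BashOperator(
    task_id='run_dbt',
    bash_command='dbt run',
)

discovery_check >> run_dbt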
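As written, `check_for_schema_changes` only confirms that the endpoint responded. To gate `run_dbt` on the result, the HTTP operator accepts a `response_check` callable that receives the response and returns a boolean; returning `False` fails the task. The sketch below assumes a hypothetical JSON body with a `changes` list whose items carry a `change_type` field. Check the AnomalyArmor API reference for the real response shape. `FakeResponse` exists only so the check can be tried outside Airflow.

```python
import json

BREAKING_CHANGE_TYPES = {"Column Removed", "Table Removed", "Type Changed"}


def no_breaking_changes(response) -> bool:
    """response_check for the HTTP operator: returning False fails the task.

    Assumes a hypothetical body like {"changes": [{"change_type": "..."}]}.
    """
    body = response.json()
    return not any(
        change.get("change_type") in BREAKING_CHANGE_TYPES
        for change in body.get("changes", [])
    )


# Usage inside the DAG:
#   discovery_check = SimpleHttpOperator(..., response_check=no_breaking_changes)


class FakeResponse:
    """Minimal stand-in for requests.Response, for trying the check locally."""

    def __init__(self, text: str) -> None:
        self.text = text

    def json(self):
        return json.loads(self.text)


print(no_breaking_changes(FakeResponse('{"changes": []}')))  # True
print(no_breaking_changes(FakeResponse('{"changes": [{"change_type": "Type Changed"}]}')))  # False
```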

What to Do When Alerts Fire

Immediate Actions

  1. Acknowledge the alert: Let your team know you’re investigating
  2. Check the change details: View in AnomalyArmor: what changed, when, and on which asset
  3. Assess impact: Which models/dashboards use this table?

If the Change is Breaking

  1. Pause affected pipelines (if possible before they run)
  2. Update your dbt models to handle the change
  3. Test locally with the new schema
  4. Deploy the fix before the next scheduled run

If the Change is Expected

  1. Document it: Note in AnomalyArmor or your team wiki
  2. Update downstream: Ensure all dependents are updated
  3. Consider communication: Should you announce to stakeholders?

Model Dependency Mapping

Know which models depend on which tables. For example:

Source table: production.orders
  • stg_orders (staging model)
    • int_orders_enriched (intermediate)
      • fct_orders (fact table)
        • monthly_revenue (dashboard)
        • customer_lifetime_value (analytics)
    • rpt_daily_orders (report)
  • dim_order_status (dimension)
When production.orders changes, all of these are potentially impacted.
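Use dbt ls --select stg_orders+ to list every node downstream of stg_orders (a trailing + selects descendants; a leading + selects ancestors). If the table is declared as a dbt source, dbt ls --select source:<source_name>.<table_name>+ starts the chain from the source itself.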
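To compute the dependency chain in a script (for example, to attach it to an alert), you can use dbt's `target/manifest.json` artifact. It includes a `child_map` that maps each node's unique ID to the unique IDs of its direct children. Here is a minimal sketch that walks it breadth-first. The project name `my_project` and the node IDs in the usage example are placeholders that mirror the example tree, with the dashboard modeled as a dbt exposure.

```python
import json
from collections import deque
from pathlib import Path


def downstream_nodes(child_map: dict, start: str) -> list:
    """Return every node reachable from `start` via child_map, in BFS order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in child_map.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def downstream_from_manifest(manifest_path: str, start: str) -> list:
    """Load dbt's manifest.json and list all descendants of `start`."""
    manifest = json.loads(Path(manifest_path).read_text())
    return downstream_nodes(manifest["child_map"], start)


# Placeholder child_map mirroring the production.orders example tree.
child_map = {
    "source.my_project.production.orders": [
        "model.my_project.stg_orders",
        "model.my_project.dim_order_status",
    ],
    "model.my_project.stg_orders": [
        "model.my_project.int_orders_enriched",
        "model.my_project.rpt_daily_orders",
    ],
    "model.my_project.int_orders_enriched": ["model.my_project.fct_orders"],
    "model.my_project.fct_orders": [
        "exposure.my_project.monthly_revenue",
        "model.my_project.customer_lifetime_value",
    ],
}
print(len(downstream_nodes(child_map, "source.my_project.production.orders")))  # 7
```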

Alert Configuration Examples

| Priority | Rule Name | Event | Scope | Conditions | Destinations |
|---|---|---|---|---|---|
| High | Revenue Table Changes | Schema Change | orders, payments, transactions | Any change | Slack #data-critical, PagerDuty |
| Medium | Dimension Table Changes | Schema Change | dim_*, *_lookup | Column removed or type changed | Slack #data-engineering |
| Low | External Source Changes | Schema Change | external.*, partner_* | Any change | Email (daily digest) |
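The scopes in this table use shell-style wildcards. Assume AnomalyArmor's patterns behave like standard glob matching (an assumption; check the Alert Rules reference) and read the Low row's scope as `external.*` and `partner_*`. Under those assumptions, Python's `fnmatch` shows which example rules a given table falls under. The `RULES` structure is hypothetical, and the Conditions column is not modeled here.

```python
from fnmatch import fnmatchcase

# (priority, rule name, scope patterns) from the example rules.
RULES = [
    ("High", "Revenue Table Changes", ["orders", "payments", "transactions"]),
    ("Medium", "Dimension Table Changes", ["dim_*", "*_lookup"]),
    ("Low", "External Source Changes", ["external.*", "partner_*"]),
]


def matching_rules(asset: str) -> list:
    """Return the names of rules whose scope patterns match `asset` (case-sensitive)."""
    return [
        name
        for _priority, name, patterns in RULES
        if any(fnmatchcase(asset, pattern) for pattern in patterns)
    ]


print(matching_rules("dim_customer"))        # ['Dimension Table Changes']
print(matching_rules("external.shipments"))  # ['External Source Changes']
print(matching_rules("users"))               # []
```

A table that matches no rule, like `users` here, produces no alert at all. That is worth checking when a change goes unreported.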

Troubleshooting

An alert didn't fire before the pipeline broke

  1. Check discovery timing: Did discovery run before the pipeline?
  2. Check scope: Is the table included in the alert rule?
  3. Check conditions: Does the change type match your conditions?
  4. Verify destination: Is the destination configured correctly?

Too many alerts for non-breaking changes

  1. Filter change types: Alert only on Column Removed, Table Removed, Type Changed
  2. Exclude test schemas: Filter out test_*, dev_*
  3. Separate environments: Different rules for prod vs. staging

Connection or load concerns with production databases

  1. Use a read replica: Monitor the replica instead of the primary
  2. Create a dedicated user: With read-only permissions
  3. Check network access: Firewall rules, security groups
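The first three checks for a missed alert can be scripted as a quick diagnostic. Here is a minimal sketch. The function and argument names are hypothetical; you would fill in the values from the discovery history and rule settings in AnomalyArmor. It assumes glob-style scope patterns, with an empty scope meaning all assets.

```python
from datetime import datetime
from fnmatch import fnmatchcase


def why_no_alert(
    discovery_finished: datetime,
    pipeline_started: datetime,
    table: str,
    rule_scope: list,
    change_type: str,
    rule_change_types: set,
) -> list:
    """Return reasons a rule would not have alerted before the pipeline ran."""
    reasons = []
    # 1. Discovery timing: the change must be detected before the pipeline starts.
    if discovery_finished >= pipeline_started:
        reasons.append("discovery finished after the pipeline started")
    # 2. Scope: an empty scope is treated as "all assets".
    if rule_scope and not any(fnmatchcase(table, p) for p in rule_scope):
        reasons.append(f"{table} is outside the rule's scope")
    # 3. Conditions: the change type must be one the rule listens for.
    if change_type not in rule_change_types:
        reasons.append(f"{change_type!r} is not one of the rule's change types")
    return reasons


print(why_no_alert(
    discovery_finished=datetime(2024, 5, 1, 3, 15),
    pipeline_started=datetime(2024, 5, 1, 3, 0),
    table="orders",
    rule_scope=["orders", "payments"],
    change_type="Column Removed",
    rule_change_types={"Column Removed", "Table Removed", "Type Changed"},
))
# ['discovery finished after the pipeline started']
```

The fourth check (destination configuration) can't be inferred from these values and still needs a look at the rule's destinations.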

Checklist

Before going live:
  • Connected all source databases that feed pipelines
  • Discovery scheduled to run before pipeline runs
  • Alert rules for breaking changes (column removed, table removed, type changed)
  • Alerts routed to the right channel (data engineering team)
  • Team knows what to do when alerts fire
  • Documented critical table dependencies

Common Questions

How often should I run schema discovery for pipeline-critical tables?

For production application databases that feed dbt or ETL, run discovery hourly; for shared warehouses, every 6 hours; for stable third-party sources, daily. The goal is to detect a change before the next pipeline run, so align the discovery schedule with your orchestrator. See Schedule Frequent Discovery.

Which schema changes actually break dbt pipelines?

The high-severity ones are column removed, table removed, and column type changed. Additive changes (new columns, new tables) rarely break existing models. Scope your breaking-change alert rule to those three change types to cut alert noise without missing pipeline-breakers.

Can AnomalyArmor block my dbt run if a breaking change is detected?

Yes, via a webhook destination. Point the webhook at an Airflow/Dagster sensor that sets a flag, then make your dbt task depend on the flag being clear. See Option 1: Webhook Integration for the pattern. This is the “fail fast with a meaningful error” flow that beats a 3 AM dbt compilation error.

Should I monitor my source database or my warehouse?

Monitor both, but source databases are where most breaking changes originate: upstream teams drop columns without telling you. Connect the production application databases your dbt project reads from, not just the warehouse you write into. See Step 1.

Why am I getting too many alerts for non-breaking changes?

Tighten the change-type filter on the rule to Column Removed, Table Removed, and Type Changed only. Exclude test_* and dev_* schemas, and split prod and staging into separate rules with different destinations. See the Troubleshooting section.

How do I know which dbt models a source table feeds?

Upload your dbt manifest via the lineage upload flow and AnomalyArmor’s asset page will show downstream dependencies. You can also run dbt ls --select stg_orders+ locally; the trailing + lists every downstream model and exposure (such as a dashboard) defined in your project.

Schema Monitoring

Deep dive into schema change detection

Alert Rules

Configure alert conditions