> ## Documentation Index
> Fetch the complete documentation index at: https://docs.anomalyarmor.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Databricks

> Connect AnomalyArmor to Databricks Unity Catalog

Connect AnomalyArmor to your Databricks workspace to monitor Unity Catalog assets. We support Delta tables and views, along with schema-level metadata. Volumes and functions are not currently monitored.

## Requirements

Before connecting, ensure you have:

* **Databricks workspace** with Unity Catalog enabled
* **SQL Warehouse** (serverless or classic)
* **Personal Access Token** or Service Principal credentials
* **Catalog access** for the catalogs you want to monitor

## Connection Settings

| Field               | Description               | Example                            |
| ------------------- | ------------------------- | ---------------------------------- |
| **Connection Name** | Friendly identifier       | `Databricks Production`            |
| **Workspace URL**   | Your Databricks workspace | `https://xxx.cloud.databricks.com` |
| **HTTP Path**       | SQL warehouse path        | `/sql/1.0/warehouses/abc123`       |
| **Catalog**         | Unity Catalog to monitor  | `main`                             |
| **Access Token**    | Authentication token      | `dapi...`                          |

## Finding Your Connection Details

### Workspace URL

Your workspace URL appears in your browser's address bar when you are logged in to Databricks:

| Cloud     | URL Format                                      |
| --------- | ----------------------------------------------- |
| **Azure** | `https://adb-1234567890.12.azuredatabricks.net` |
| **AWS**   | `https://dbc-abc123.cloud.databricks.com`       |
| **GCP**   | `https://xxx.gcp.databricks.com`                |
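
A quick way to catch a mistyped workspace URL before pasting it into the connection form is to check it against the host suffixes in the table above. This is a minimal sketch: the function name `detect_cloud` is illustrative, and the suffix list covers only the three formats shown here, so other deployments may need additional entries.

```python
from urllib.parse import urlparse

# Host suffixes taken from the URL format table above.
# Assumption: these three cover your deployment; extend the map if not.
KNOWN_SUFFIXES = {
    ".azuredatabricks.net": "Azure",
    ".cloud.databricks.com": "AWS",
    ".gcp.databricks.com": "GCP",
}


def detect_cloud(workspace_url: str) -> str:
    """Return the cloud implied by a workspace URL, or raise ValueError."""
    parsed = urlparse(workspace_url.strip())
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError("workspace URL must start with https://")
    for suffix, cloud in KNOWN_SUFFIXES.items():
        if parsed.hostname.endswith(suffix):
            return cloud
    raise ValueError(f"unrecognized Databricks host: {parsed.hostname}")
```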

### SQL Warehouse HTTP Path

1. Go to **SQL Warehouses** in Databricks
2. Click on your warehouse
3. Open the **Connection Details** tab
4. Copy the **HTTP Path**

```
HTTP Path format:
/sql/1.0/warehouses/abc123def456
                    ↑ Your warehouse ID
```
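
If you script your setup, the HTTP path can be checked and the warehouse ID extracted with a small helper. This is a sketch under assumptions: the function name `warehouse_id` is illustrative, and the pattern accepts only letters and digits in the ID, which matches the example above but may be stricter than what Databricks actually issues.

```python
import re

# Shape taken from the example above: /sql/1.0/warehouses/<warehouse-id>.
# Assumption: the warehouse ID consists of letters and digits only.
HTTP_PATH_RE = re.compile(r"^/sql/1\.0/warehouses/([A-Za-z0-9]+)$")


def warehouse_id(http_path: str) -> str:
    """Extract the warehouse ID from an HTTP path, or raise ValueError."""
    match = HTTP_PATH_RE.match(http_path.strip())
    if match is None:
        raise ValueError(f"unexpected HTTP path: {http_path!r}")
    return match.group(1)
```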

<Tip>
  Use a **Serverless SQL Warehouse** for best compatibility. Classic warehouses work too but may have startup delays.
</Tip>

### Creating an Access Token

<Tabs>
  <Tab title="Personal Access Token">
    Best for quick setup and testing:

    1. Click your username in Databricks → **User Settings**
    2. Go to the **Access Tokens** tab
    3. Click **Generate New Token**
    4. Set a description: `AnomalyArmor`
    5. Set lifetime (or leave blank for no expiry)
    6. Click **Generate**
    7. **Copy the token immediately** (you won't see it again)

    ```
    Token format: dapi1234567890abcdef1234567890abcdef
    ```

    <Warning>
      Personal access tokens are tied to your user account. If you leave the organization, the token stops working. Consider using a service principal for production.
    </Warning>
  </Tab>

  <Tab title="Service Principal (Recommended)">
    Best for production use:

    **Step 1: Create Service Principal**

    1. Go to **Admin Console → Service Principals**
    2. Click **Add Service Principal**
    3. Name it: `anomalyarmor-monitoring`
    4. Note the **Application ID**

    **Step 2: Generate OAuth Credentials**

    1. Select the service principal
    2. Go to the **Secrets** tab
    3. Click **Generate Secret**
    4. Copy the **Client ID** and **Client Secret**

    **Step 3: Grant Permissions**

    The service principal needs:

    * `USE CATALOG` on target catalogs
    * `USE SCHEMA` on target schemas
    * `SELECT` on tables (read-only access is sufficient; `ALL PRIVILEGES` is not required and grants more than read access)

    ```sql theme={null}
    -- Grant catalog access
    -- (if the display name is not accepted, use the Application ID from Step 1)
    GRANT USE CATALOG ON CATALOG main TO `anomalyarmor-monitoring`;

    -- Grant access to all schemas in the catalog
    GRANT USE SCHEMA ON CATALOG main TO `anomalyarmor-monitoring`;

    -- Grant read access to all tables in the catalog
    GRANT SELECT ON CATALOG main TO `anomalyarmor-monitoring`;
    ```

    **Step 4: Use in AnomalyArmor**

    Enter the OAuth access token obtained with the Client ID and Client Secret in the **Access Token** field.
  </Tab>
</Tabs>
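
For the service principal route, the Client ID and Client Secret are exchanged for an OAuth access token before anything goes into the Access Token field. The sketch below builds that request without sending it. It assumes the workspace-level token endpoint `/oidc/v1/token` and the `all-apis` scope that Databricks documents for machine-to-machine OAuth; verify both against your workspace. The function name `build_token_request` is illustrative.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(
    workspace_url: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("utf-8")
    return urllib.request.Request(
        # Assumption: workspace-level token endpoint; confirm for your cloud.
        url=workspace_url.rstrip("/") + "/oidc/v1/token",
        data=body,
        method="POST",
        headers={
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Sending the request with `urllib.request.urlopen` should return a JSON body with an `access_token` field. Check the accompanying `expires_in` value, since tokens issued this way are typically short-lived.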

## Granting Catalog Permissions

The user or service principal needs read access to Unity Catalog.

<Tip>
  **Quick Setup**: [Download the Databricks permissions script](/downloads/databricks-permissions) for a ready-to-use SQL template with Unity Catalog grants.
</Tip>

```sql theme={null}
-- Minimal permissions for AnomalyArmor

-- Access to catalog
GRANT USE CATALOG ON CATALOG production TO `anomalyarmor`;

-- Access to all schemas in catalog
GRANT USE SCHEMA ON CATALOG production TO `anomalyarmor`;

-- Read access to tables
GRANT SELECT ON CATALOG production TO `anomalyarmor`;
```

### Per-Schema Permissions

For more granular control:

```sql theme={null}
-- Access specific schemas only
GRANT USE SCHEMA ON SCHEMA production.raw TO `anomalyarmor`;
GRANT USE SCHEMA ON SCHEMA production.staging TO `anomalyarmor`;
GRANT USE SCHEMA ON SCHEMA production.marts TO `anomalyarmor`;

-- Read access per schema
GRANT SELECT ON SCHEMA production.raw TO `anomalyarmor`;
GRANT SELECT ON SCHEMA production.staging TO `anomalyarmor`;
GRANT SELECT ON SCHEMA production.marts TO `anomalyarmor`;
```
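
When the schema list is long, the per-schema grants can be generated instead of written by hand. This is a minimal sketch: `grant_statements` is a hypothetical helper, not part of AnomalyArmor or Databricks. It also emits the `USE CATALOG` grant that the schema-level grants depend on.

```python
def grant_statements(catalog: str, schemas: list[str], principal: str) -> list[str]:
    """Render read-only grants for the given schemas of one catalog."""

    def quote(identifier: str) -> str:
        # Backtick-quote an identifier, doubling any embedded backticks.
        return "`" + identifier.replace("`", "``") + "`"

    who = quote(principal)
    statements = [f"GRANT USE CATALOG ON CATALOG {quote(catalog)} TO {who};"]
    for schema in schemas:
        target = f"{quote(catalog)}.{quote(schema)}"
        statements.append(f"GRANT USE SCHEMA ON SCHEMA {target} TO {who};")
        statements.append(f"GRANT SELECT ON SCHEMA {target} TO {who};")
    return statements


for line in grant_statements("production", ["raw", "staging", "marts"], "anomalyarmor"):
    print(line)
```

Review the printed statements before running them in a SQL editor; the helper only formats text and does not execute anything.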

## What We Monitor

AnomalyArmor discovers and monitors these Unity Catalog objects:

| Object Type      | Monitored | Notes                          |
| ---------------- | --------- | ------------------------------ |
| **Delta Tables** | Yes       | Including managed and external |
| **Views**        | Yes       | Standard and materialized      |
| **Schemas**      | Yes       | Schema-level metadata          |
| **Volumes**      | No        | Coming soon                    |
| **Functions**    | No        | Not supported                  |

### Metadata Captured

For each table and view:

* Table name and schema
* Column names and data types
* Table properties
* Last modified timestamp (for freshness)
* Partitioning information

## Multiple Catalogs

### 3-Level Namespace Support

Databricks Unity Catalog uses a 3-level namespace: `catalog.schema.table`. AnomalyArmor fully supports this structure, enabling you to:

* **Track tables across catalogs**: Distinguish between `prod.analytics.users` and `dev.analytics.users`
* **Filter by catalog**: View only tables from specific catalogs in the UI
* **Catalog-aware alerting**: Get notified of changes in production catalogs only
* **Lineage across catalogs**: Track data flow between development, staging, and production
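
To make the namespace concrete, here is a small sketch that splits a fully qualified name into its three parts. The `TableRef` type and `parse_table_ref` function are illustrative, and the parser assumes plain identifiers without dots or backtick quoting.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    catalog: str
    schema: str
    table: str


def parse_table_ref(full_name: str) -> TableRef:
    """Split 'catalog.schema.table' into its three components."""
    parts = full_name.split(".")
    # Assumption: identifiers contain no dots and are not backtick-quoted.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return TableRef(*parts)


prod = parse_table_ref("prod.analytics.users")
dev = parse_table_ref("dev.analytics.users")
# Same schema and table name, but different catalogs.
print(prod.catalog, dev.catalog)
```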

### Connecting Multiple Catalogs

To monitor multiple catalogs, create a separate data source for each:

**Data Sources:**

* Databricks Production (catalog: production)
* Databricks Staging (catalog: staging)
* Databricks Development (catalog: development)

<Note>
  Each data source needs access to its respective catalog. Use the same token if it has permissions across catalogs.
</Note>

### Catalog-Aware Features

| Feature                   | Catalog Support                                    |
| ------------------------- | -------------------------------------------------- |
| **Asset Discovery**       | Tables shown with full `catalog.schema.table` path |
| **Schema Drift Alerts**   | Filter alerts by catalog                           |
| **Tag Inheritance**       | Tags propagate within catalog boundaries           |
| **Table Filtering**       | API supports `catalog_name` filter parameter       |
| **Lineage Visualization** | Shows cross-catalog data dependencies              |

## SQL Warehouse Considerations

### Warehouse State

AnomalyArmor queries run on your SQL warehouse. Consider:

| Warehouse Type          | Behavior                          |
| ----------------------- | --------------------------------- |
| **Serverless**          | Auto-starts, minimal delay        |
| **Classic (Auto-stop)** | May have startup delay (30s-2min) |
| **Classic (Always-on)** | Immediate, but costs more         |

### Warehouse Sizing

Discovery queries are lightweight. A **Small** or **X-Small** warehouse is sufficient:

* **Recommended**: Serverless SQL Warehouse
* **Alternative**: X-Small Classic Warehouse with auto-stop

### Scheduling Discovery

If using a classic warehouse with auto-stop:

1. Schedule discovery during business hours
2. Or extend auto-stop timeout to cover discovery windows
3. Or use serverless (recommended)

## Connection Architecture

<img src="https://mintcdn.com/anomalyarmor/mPQTTzz5PYy4fThA/images/diagrams/databricks-connection-light.svg?fit=max&auto=format&n=mPQTTzz5PYy4fThA&q=85&s=ec6ab689a5460de06d8a78c44224c587" alt="Databricks connection architecture" className="block dark:hidden" width="800" height="280" data-path="images/diagrams/databricks-connection-light.svg" />

<img src="https://mintcdn.com/anomalyarmor/mPQTTzz5PYy4fThA/images/diagrams/databricks-connection-dark.svg?fit=max&auto=format&n=mPQTTzz5PYy4fThA&q=85&s=4d3cac5397f48219aa3ea03752da8ce9" alt="Databricks connection architecture" className="hidden dark:block" width="800" height="280" data-path="images/diagrams/databricks-connection-dark.svg" />

## What We Query

AnomalyArmor runs these types of queries:

```sql theme={null}
-- List schemas
SHOW SCHEMAS IN production;

-- List tables
SHOW TABLES IN production.raw;

-- Get table details
DESCRIBE TABLE EXTENDED production.raw.events;

-- Check freshness (for tables with timestamp columns)
SELECT MAX(event_timestamp) FROM production.raw.events;
```

**Impact**: Minimal. The `SHOW` and `DESCRIBE` statements are metadata queries that don't scan table data; the freshness check is a single bounded aggregate per table.
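
The freshness query returns the newest timestamp in a table. Turning that value into a stale-or-fresh verdict can be sketched as below. The function name `is_stale` and the threshold are hypothetical and do not describe AnomalyArmor's internal logic.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_event: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True when the newest row is older than the allowed age."""
    return now - last_event > max_age


# Example: the newest row is one hour old.
now = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
last_event = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
print(is_stale(last_event, now, max_age=timedelta(hours=2)))
```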

## Troubleshooting

<AccordionGroup>
  <Accordion title="Connection test fails">
    **Common causes**:

    1. Invalid or expired access token
    2. Wrong workspace URL
    3. Incorrect HTTP path

    **Solutions**:

    1. Generate a new access token
    2. Verify workspace URL matches your browser
    3. Copy HTTP path directly from SQL Warehouse settings
  </Accordion>

  <Accordion title="Permission denied errors">
    **Causes**:

    * Token lacks catalog/schema permissions
    * Service principal not granted access

    **Solutions**:

    ```sql theme={null}
    -- Check current permissions
    SHOW GRANTS ON CATALOG production;

    -- Grant necessary permissions
    GRANT USE CATALOG ON CATALOG production TO `your-user`;
    GRANT USE SCHEMA ON CATALOG production TO `your-user`;
    GRANT SELECT ON CATALOG production TO `your-user`;
    ```
  </Accordion>

  <Accordion title="Warehouse not found">
    **Causes**:

    * Wrong HTTP path
    * Warehouse deleted or renamed

    **Solutions**:

    1. Go to SQL Warehouses in Databricks
    2. Copy the HTTP path from Connection Details
    3. Ensure the warehouse exists and is accessible
  </Accordion>

  <Accordion title="Discovery times out">
    **Causes**:

    * Warehouse is starting up
    * Large number of tables

    **Solutions**:

    1. Use a serverless warehouse (faster startup)
    2. Extend warehouse auto-stop timeout
    3. Filter to specific schemas if catalog is very large
  </Accordion>

  <Accordion title="Token expired">
    **Causes**:

    * Personal access token has expiry date

    **Solutions**:

    1. Generate a new token with longer expiry
    2. Use a service principal with OAuth (no expiry)
    3. Update the token in AnomalyArmor Data Sources settings
  </Accordion>
</AccordionGroup>

## Best Practices

### Use Service Principals for Production

Personal access tokens are tied to individual users. If that user leaves:

* Token stops working
* Monitoring breaks

Service principals are organization-owned and persist regardless of user changes.

### Monitor Production Catalog

Start with your production catalog where schema changes have the most impact.

### Schedule Discovery After ETL

If you have predictable ETL schedules, run discovery after ETL completes to catch changes immediately:

```
ETL Schedule:       2:00 AM daily
Discovery Schedule: 3:00 AM daily (1 hour after ETL)
```
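
If the ETL start time changes often, the discovery time can be derived from it rather than maintained separately. This is a small sketch: `discovery_time` is a hypothetical helper, the one-hour buffer is an assumption you should size to your ETL run time, and parsing `AM`/`PM` depends on an English-style locale.

```python
from datetime import datetime, timedelta


def discovery_time(etl_start: str, offset: timedelta = timedelta(hours=1)) -> str:
    """Return the discovery start time for an ETL start such as '2:00 AM'."""
    start = datetime.strptime(etl_start, "%I:%M %p")
    # strftime zero-pads the hour ("03:00 AM"); strip the pad to match the input style.
    return (start + offset).strftime("%I:%M %p").lstrip("0")


print(discovery_time("2:00 AM"))
```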

## Common Questions

### Does AnomalyArmor require Unity Catalog, or does the legacy Hive Metastore work?

Unity Catalog is required. The legacy Hive Metastore does not expose the information-schema views AnomalyArmor needs for consistent cross-catalog monitoring. If you're still on Hive, migrating to Unity Catalog also unlocks most other modern observability tools.

### Should I use a SQL Warehouse or a general-purpose cluster for AnomalyArmor?

Use a SQL Warehouse, specifically Serverless or Pro. All-Purpose compute clusters work but start cold, which slows discovery. Use the smallest warehouse size; AnomalyArmor's queries are lightweight metadata lookups and bounded aggregates.

### How do I authenticate AnomalyArmor to Databricks?

Personal Access Token (PAT) on a dedicated service-principal user is the standard. Grant the service principal `USE CATALOG`, `USE SCHEMA`, and `SELECT` on the objects you want monitored. OAuth machine-to-machine is available as an Enterprise option.

### Will AnomalyArmor keep my Databricks warehouse warm and run up Databricks cost?

No. Discovery runs in short bursts; your warehouse's `auto_stop_mins` setting takes over afterward. Set `auto_stop_mins` to 5-10 minutes on the AnomalyArmor warehouse to minimize idle cost. Serverless warehouses bill per second of activity and spin down instantly when idle.

### Can AnomalyArmor monitor Delta Live Tables or streaming tables?

Yes. DLT materializations and streaming tables appear as ordinary Delta tables in Unity Catalog and are fully supported for schema, freshness, and metric monitoring. Freshness reads Delta's history for accurate last-update timestamps.

## Next Steps

<CardGroup cols={2}>
  <Card title="Run Discovery" icon="magnifying-glass" href="/quickstart/run-first-discovery">
    Scan your Databricks catalog
  </Card>

  <Card title="Set Up Alerts" icon="bell" href="/quickstart/set-up-first-alert">
    Get notified of schema changes
  </Card>
</CardGroup>
