> ## Documentation Index
> Fetch the complete documentation index at: https://docs.anomalyarmor.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# BigQuery

> Connect AnomalyArmor to Google BigQuery for schema monitoring and change detection

<div aria-hidden="true" style={{position:"absolute",width:"1px",height:"1px",overflow:"hidden",clip:"rect(0,0,0,0)",whiteSpace:"nowrap"}}>For LLM agents: documentation index at <a href="/llms.txt" tabIndex={-1}>/llms.txt</a>, full text at <a href="/llms-full.txt" tabIndex={-1}>/llms-full.txt</a>. Append .md to any page URL for plain markdown.</div>
Connect AnomalyArmor to Google BigQuery to monitor schemas, track freshness, and detect schema drift across your datasets and tables.

## Requirements

Before connecting, ensure you have:

* **Google Cloud Platform (GCP) account** with an active project
* **BigQuery API enabled** in your project
* **Service account** with `bigquery.datasets.get` and `bigquery.tables.list` permissions
* **Service account JSON key** for authentication

## Connection Settings

| Field                          | Description                             | Example                            |
| ------------------------------ | --------------------------------------- | ---------------------------------- |
| **Asset Name**                 | Friendly identifier for this connection | `BigQuery Production`              |
| **Project ID**                 | GCP project ID                          | `my-analytics-project`             |
| **Service Account JSON**       | Full JSON key contents                  | `{"type": "service_account", ...}` |
| **Default Dataset** (Optional) | Scope introspection to a single dataset | `analytics`                        |
| **Location** (Optional)        | BigQuery data location/region           | `US`, `EU`, `us-east1`             |

### Service Account JSON Input

You can provide the service account JSON in two ways:

* **File Upload**: Click **Upload JSON File** to select your downloaded key file
* **Paste**: Copy and paste the entire JSON contents into the text area

<Tip>
  File upload is recommended to avoid copy-paste errors with the \~2KB JSON key.
</Tip>

### Default Dataset

If your project has many datasets, use the **Default Dataset** field to scope introspection to a single dataset. This is useful for:

* Large projects with 100+ datasets
* Focusing on production data only
* Reducing discovery time

Leave empty to discover all datasets in the project.

### Location

The **Location** field specifies where your BigQuery data resides. Common options:

| Location          | Description                   |
| ----------------- | ----------------------------- |
| `US`              | United States (multi-region)  |
| `EU`              | European Union (multi-region) |
| `us-east1`        | South Carolina                |
| `us-west1`        | Oregon                        |
| `europe-west1`    | Belgium                       |
| `europe-west2`    | London                        |
| `asia-northeast1` | Tokyo                         |

Leave as "Auto" to let BigQuery determine the location automatically.

<Note>
  BigQuery uses project-level connections. Without a default dataset, all datasets within the project will be discovered automatically.
</Note>

## Creating a Service Account

### Step 1: Create Service Account

1. Go to **GCP Console > IAM & Admin > Service Accounts**
2. Click **Create Service Account**
3. Enter details:
   * Name: `anomalyarmor`
   * Description: `AnomalyArmor schema monitoring`
4. Click **Create and Continue**

### Step 2: Assign Roles

Grant the minimum required roles:

| Role                     | Purpose                           |
| ------------------------ | --------------------------------- |
| **BigQuery Data Viewer** | Read table/view metadata and data |
| **BigQuery Job User**    | Execute queries for introspection |

| Role                     | ID                          | Purpose                          |
| ------------------------ | --------------------------- | -------------------------------- |
| **BigQuery Data Viewer** | `roles/bigquery.dataViewer` | Read metadata and table contents |
| **BigQuery Job User**    | `roles/bigquery.jobUser`    | Run queries in the project       |

<Tip>
  For production, consider creating a custom role with only the specific permissions needed (see Minimal Permissions section below).
</Tip>

### Step 3: Create JSON Key

1. Click on the created service account
2. Go to **Keys** tab
3. Click **Add Key > Create new key**
4. Select **JSON** format
5. Download and securely store the key file

<Warning>
  Keep your service account key secure. Never commit it to version control or share it publicly. You can rotate keys in GCP Console if compromised.
</Warning>

## Minimal Permissions

For maximum security, create a custom role with only these permissions:

```yaml theme={null}
# Custom role for AnomalyArmor
title: "AnomalyArmor Schema Monitor"
description: "Minimal permissions for schema monitoring"
includedPermissions:
  - bigquery.datasets.get
  - bigquery.tables.get
  - bigquery.tables.list
  - bigquery.jobs.create
  - bigquery.jobs.get
```

To create via gcloud CLI:

```bash theme={null}
# Create custom role
gcloud iam roles create anomalyarmor_monitor \
  --project=your-project-id \
  --title="AnomalyArmor Schema Monitor" \
  --description="Minimal permissions for schema monitoring" \
  --permissions=bigquery.datasets.get,bigquery.tables.get,bigquery.tables.list,bigquery.jobs.create,bigquery.jobs.get

# Assign to service account
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:anomalyarmor@your-project-id.iam.gserviceaccount.com" \
  --role="projects/your-project-id/roles/anomalyarmor_monitor"
```

## Finding Your Project ID

Your project ID is shown in multiple places in GCP Console:

* **Dashboard**: Top-left dropdown shows project name and ID
* **URL**: `console.cloud.google.com/bigquery?project=YOUR-PROJECT-ID`
* **Project Settings**: IAM & Admin > Settings

<Warning>
  Use the **Project ID** (e.g., `my-analytics-123`), not the Project Name (e.g., "My Analytics"). Project IDs are lowercase with hyphens.
</Warning>

## Enabling BigQuery API

If the BigQuery API is not enabled:

1. Go to **GCP Console > APIs & Services > Library**
2. Search for "BigQuery API"
3. Click **Enable**

Or via gcloud CLI:

```bash theme={null}
gcloud services enable bigquery.googleapis.com --project=your-project-id
```

## What We Monitor

AnomalyArmor discovers and monitors these BigQuery objects:

| Object Type            | Monitored | Notes                              |
| ---------------------- | --------- | ---------------------------------- |
| **Tables**             | Yes       | Standard and partitioned           |
| **Views**              | Yes       | Standard views                     |
| **Materialized Views** | Yes       | Treated as views                   |
| **External Tables**    | Yes       | Schema only, not source            |
| **Datasets**           | Yes       | As schemas/namespaces              |
| **Routines**           | No        | Functions/procedures not monitored |
| **Models**             | No        | ML models not monitored            |

### Metadata Captured

For each table and view:

* Table name and dataset
* Column names and data types
* Nullability
* Last modified timestamp (for freshness)
* Partitioning information

## Connection Architecture

<img src="https://mintcdn.com/anomalyarmor/mPQTTzz5PYy4fThA/images/diagrams/bigquery-connection-light.svg?fit=max&auto=format&n=mPQTTzz5PYy4fThA&q=85&s=b88b392eff261655e35c2f0c7ecac0b8" alt="BigQuery Connection Architecture" className="block dark:hidden" width="800" height="280" data-path="images/diagrams/bigquery-connection-light.svg" />

<img src="https://mintcdn.com/anomalyarmor/mPQTTzz5PYy4fThA/images/diagrams/bigquery-connection-dark.svg?fit=max&auto=format&n=mPQTTzz5PYy4fThA&q=85&s=437078ce8327f1a1b2323a7b24029ae2" alt="BigQuery Connection Architecture" className="hidden dark:block" width="800" height="280" data-path="images/diagrams/bigquery-connection-dark.svg" />

## What We Query

AnomalyArmor runs these types of queries:

```sql theme={null}
-- List tables in each dataset
SELECT table_catalog, table_schema, table_name, table_type
FROM `project_id.dataset_id.INFORMATION_SCHEMA.TABLES`;

-- Get column details
SELECT column_name, data_type, is_nullable, column_default
FROM `project_id.dataset_id.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'your_table';

-- Check freshness (for tables with timestamp columns)
SELECT MAX(your_timestamp_column) FROM `project_id.dataset_id.your_table`;
```

**Impact**: Minimal. These are lightweight metadata queries that don't scan table data.

### Cost Estimation

| Metric             | Value                               |
| ------------------ | ----------------------------------- |
| **Query type**     | Metadata queries (minimal cost)     |
| **Data scanned**   | \~1KB per INFORMATION\_SCHEMA query |
| **Estimated cost** | \< \$0.01 per discovery             |

<Note>
  INFORMATION\_SCHEMA queries in BigQuery are free. The only cost is for freshness checks that query actual tables, which scan minimal data.
</Note>

## Multiple Projects

To monitor multiple GCP projects, create separate data sources for each:

| Data Source          | Project                 |
| -------------------- | ----------------------- |
| BigQuery Production  | `prod-analytics-123`    |
| BigQuery Staging     | `staging-analytics-456` |
| BigQuery Development | `dev-analytics-789`     |

Each project needs its own service account or a service account with cross-project permissions.

### Cross-Project Access

To use one service account for multiple projects:

```bash theme={null}
# Grant access in each project
gcloud projects add-iam-policy-binding other-project-id \
  --member="serviceAccount:anomalyarmor@main-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"
```

## VPC Service Controls

If your BigQuery is protected by VPC Service Controls:

1. Add AnomalyArmor's IP addresses to the access level
2. Or configure an access policy that allows the service account

<Note>
  Contact AnomalyArmor support for Enterprise VPC Service Controls integration options.
</Note>

## Troubleshooting

<AccordionGroup>
  <Accordion title="Permission denied / 403 error">
    **Causes**:

    * Service account lacks required roles
    * API not enabled
    * VPC Service Controls blocking access

    **Solutions**:

    1. Verify roles: **BigQuery Data Viewer** and **BigQuery Job User**
    2. Enable BigQuery API in GCP Console
    3. Check VPC Service Controls perimeter settings
    4. Test service account access via gcloud CLI
  </Accordion>

  <Accordion title="Project not found">
    **Causes**:

    * Wrong project ID
    * Project ID vs Project Name confusion
    * Project deleted or suspended

    **Solutions**:

    1. Use Project ID (lowercase with hyphens), not Project Name
    2. Verify project exists in GCP Console
    3. Check project hasn't been deleted or suspended
  </Accordion>

  <Accordion title="Invalid service account JSON">
    **Causes**:

    * Malformed JSON
    * Missing required fields
    * Copied text instead of file contents

    **Solutions**:

    1. Download fresh key from GCP Console
    2. Paste entire file contents (including braces)
    3. Verify JSON is valid (no trailing commas, proper quotes)
  </Accordion>

  <Accordion title="BigQuery API not enabled">
    **Causes**:

    * API disabled in project
    * New project without API enabled

    **Solutions**:

    1. Go to GCP Console > APIs & Services > Library
    2. Search for "BigQuery API"
    3. Click Enable
    4. Wait a few minutes for propagation
  </Accordion>

  <Accordion title="Dataset not found">
    **Causes**:

    * Dataset was deleted
    * Dataset in different project
    * Permission to list datasets missing

    **Solutions**:

    1. Verify dataset exists in BigQuery Console
    2. Check service account can list datasets
    3. Grant `bigquery.datasets.get` permission
  </Accordion>

  <Accordion title="No tables found in discovery">
    **Causes**:

    * Empty datasets
    * Service account can't read INFORMATION\_SCHEMA
    * Dataset-level permissions missing

    **Solutions**:

    1. Verify tables exist in BigQuery Console
    2. Test query as service account
    3. Grant dataset-level access if using fine-grained permissions
  </Accordion>

  <Accordion title="Quota exceeded / Rate limit">
    **Causes**:

    * Too many concurrent queries
    * Project quota limits reached

    **Solutions**:

    1. Wait and retry
    2. Check quota usage in GCP Console
    3. Request quota increase if needed
    4. Schedule discovery during off-peak hours
  </Accordion>
</AccordionGroup>

## Best Practices

### Use Dedicated Service Account

Create a dedicated service account for AnomalyArmor:

* Easier to audit and manage permissions
* Can be rotated independently
* Clear separation of concerns

### Start with Production Data

Monitor your production project first where schema changes have the most impact:

```
Priority:
1. Production project (critical)
2. Staging project (important)
3. Development projects (optional)
```

### Rotate Keys Periodically

Rotate service account keys on a regular schedule:

1. Create new key in GCP Console
2. Update credentials in AnomalyArmor
3. Delete old key after verification

### Monitor Specific Datasets

If you have many datasets, consider creating dataset-level permissions to monitor only critical ones:

```sql theme={null}
-- Grant access to specific datasets only
GRANT `roles/bigquery.dataViewer`
ON SCHEMA `project.critical_dataset`
TO "serviceAccount:anomalyarmor@project.iam.gserviceaccount.com";
```

## Common Questions

### How does AnomalyArmor authenticate to BigQuery?

Via a Google Cloud service account JSON key. Create a service account with `BigQuery Metadata Viewer` and `BigQuery Data Viewer` on the project or specific datasets, generate a key, and paste the JSON into the connection form. Workload Identity federation is available on Enterprise plans for customers who want to avoid long-lived keys.

### Does AnomalyArmor monitor across multiple BigQuery projects?

One connection monitors one project. Add a separate data source per project. Assets from all connected projects appear unified in the asset catalog; plan limits count monitored tables, not projects.

### What BigQuery costs does AnomalyArmor incur?

Minimal. Metadata reads against `INFORMATION_SCHEMA` are free. Bounded aggregate queries (`COUNT(*)`, `MIN`, `MAX`) hit the usual slot pricing but scan very little data per table since BigQuery's column-store skips unused columns. Use a dedicated reservation or project if you want clean cost attribution.

### Can AnomalyArmor monitor partitioned tables and clustered tables in BigQuery?

Yes. Schema drift, freshness, and metrics all work on partitioned and clustered tables. For very large partitioned tables, configure freshness to query a specific partition (e.g., `WHERE _PARTITIONTIME >= CURRENT_TIMESTAMP() - INTERVAL 1 DAY`) to minimize bytes scanned.

## Next Steps

<CardGroup cols={2}>
  <Card title="Run Discovery" icon="magnifying-glass" href="/quickstart/run-first-discovery">
    Scan your BigQuery project
  </Card>

  <Card title="Set Up Alerts" icon="bell" href="/quickstart/set-up-first-alert">
    Get notified of schema changes
  </Card>
</CardGroup>
