Labs4Change

Automate Looker Content Cleanup with the SDK and BigQuery

How to build an automated Looker content cleanup pipeline using the Looker SDK — identify stale dashboards, unused Looks, and orphaned content every 90 days with results stored in BigQuery.

Every Looker instance we audit has the same problem: hundreds of dashboards and Looks that nobody uses. They were created for a one-off analysis six months ago, or by someone who left the company, or as a draft that was never finished. They clutter the content browser, confuse new users, and make governance impossible.

The fix isn't a one-time cleanup. It's an automated pipeline that runs every 90 days, identifies stale content, archives or soft-deletes it, and logs everything to BigQuery for audit purposes.

Here's how to build it with the Looker SDK.


What "Stale Content" Means

Before you delete anything, define what stale means for your organization. Our default criteria:

  • Not viewed in 90 days — the content exists but nobody has opened it
  • Not scheduled — it's not being sent to anyone on a regular basis
  • Created by a deactivated user — the creator no longer has access
  • In a personal space of a deactivated user — orphaned content nobody can manage

These rules are configurable. Some organizations use 60 days, some use 180. The point is to codify the definition so it's not a judgment call every time.
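Codifying the rules can be as simple as a small config object checked into the repo alongside the pipeline. A sketch (the class and field names here are illustrative, not part of the Looker SDK):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass(frozen=True)
class StaleRules:
    """Staleness criteria, codified so every run applies the same policy."""
    stale_days: int = 90                 # no views inside this window
    skip_scheduled: bool = True          # leave actively scheduled content alone
    flag_disabled_creators: bool = True  # mark orphaned content for review

    def cutoff(self, now: Optional[datetime] = None) -> datetime:
        """UTC timestamp; anything last viewed before it counts as stale."""
        now = now or datetime.now(timezone.utc)
        return now - timedelta(days=self.stale_days)

rules = StaleRules(stale_days=60)  # e.g. a stricter 60-day policy
```

Changing the policy is then a one-line, code-reviewed diff instead of a per-run judgment call.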

The Architecture

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Looker SDK  │ ──► │  Python      │ ──► │  BigQuery    │
│  (API calls) │     │  Pipeline    │     │  (Audit      │
│              │     │  (identify,  │     │   logs)      │
│              │     │   archive,   │     │              │
│              │     │   report)    │     │              │
└──────────────┘     └──────────────┘     └──────────────┘
        │                                        │
        ▼                                        ▼
   Soft-delete                            Historical audit
   stale content                          trail in BQ

The pipeline runs on a cron schedule (Cloud Scheduler + Cloud Run, or a simple GitHub Action). Each run:

  1. Pulls the content inventory and last-access metadata via the API
  2. Identifies stale content based on your rules
  3. Soft-deletes or moves stale content to a "Trash" folder
  4. Logs the full inventory, including the action taken, to BigQuery
  5. Sends a summary report to Slack or email

Step 1: Connect to the Looker SDK

import looker_sdk
from google.cloud import bigquery
from datetime import datetime, timedelta, timezone

# Initialize the Looker SDK
sdk = looker_sdk.init40("looker.ini")

# Or use environment variables
# sdk = looker_sdk.init40()

STALE_DAYS = 90
# Use a timezone-aware cutoff: the SDK returns timezone-aware datetimes,
# and comparing those against a naive datetime raises a TypeError
CUTOFF_DATE = datetime.now(timezone.utc) - timedelta(days=STALE_DAYS)

The looker.ini file contains your API credentials:

[Looker]
base_url=https://your-instance.cloud.looker.com
client_id=YOUR_CLIENT_ID
client_secret=YOUR_CLIENT_SECRET
verify_ssl=true

Step 2: Find Stale Dashboards

def get_stale_dashboards(sdk, cutoff_date):
    """Find dashboards not viewed since cutoff_date."""
    # content_metadata_id must be requested here, or the lookup below gets None
    all_dashboards = sdk.all_dashboards(
        fields="id,title,folder,user_id,created_at,view_count,content_metadata_id"
    )

    stale = []
    for dash in all_dashboards:
        # Skip LookML dashboards (ids look like "model::dashboard_name");
        # they're managed in code, not in the content browser
        if dash.id and "::" in dash.id:
            continue

        # Check last access via content_metadata_access
        try:
            metadata = sdk.content_metadata(
                content_metadata_id=dash.content_metadata_id,
                fields="last_accessed_at"
            )
            last_accessed = metadata.last_accessed_at
        except Exception:
            last_accessed = None

        if last_accessed is None or last_accessed < cutoff_date:
            stale.append({
                "content_type": "dashboard",
                "content_id": dash.id,
                "title": dash.title,
                "folder_id": dash.folder.id if dash.folder else None,
                "creator_id": dash.user_id,
                "created_at": str(dash.created_at),
                "last_accessed_at": str(last_accessed) if last_accessed else "never",
                "view_count": dash.view_count or 0,
                "run_date": datetime.now().isoformat(),
            })

    return stale

Step 3: Find Stale Looks

def get_stale_looks(sdk, cutoff_date):
    """Find Looks not viewed since cutoff_date."""
    # As above, request content_metadata_id so the last-access lookup works
    all_looks = sdk.all_looks(
        fields="id,title,folder,user_id,created_at,view_count,content_metadata_id"
    )

    stale = []
    for look in all_looks:
        try:
            metadata = sdk.content_metadata(
                content_metadata_id=look.content_metadata_id,
                fields="last_accessed_at"
            )
            last_accessed = metadata.last_accessed_at
        except Exception:
            last_accessed = None

        if last_accessed is None or last_accessed < cutoff_date:
            stale.append({
                "content_type": "look",
                "content_id": look.id,
                "title": look.title,
                "folder_id": look.folder.id if look.folder else None,
                "creator_id": look.user_id,
                "created_at": str(look.created_at),
                "last_accessed_at": str(last_accessed) if last_accessed else "never",
                "view_count": look.view_count or 0,
                "run_date": datetime.now().isoformat(),
            })

    return stale

Step 4: Find Orphaned Content

def get_orphaned_content(sdk, stale_items):
    """Flag content created by deactivated users."""
    # Get all users and identify deactivated ones
    all_users = sdk.all_users(fields="id,is_disabled")
    disabled_user_ids = {u.id for u in all_users if u.is_disabled}

    for item in stale_items:
        item["creator_disabled"] = item["creator_id"] in disabled_user_ids

    return stale_items

Step 5: Log to BigQuery

def log_to_bigquery(items, project_id, dataset, table):
    """Write audit log to BigQuery."""
    client = bigquery.Client(project=project_id)
    table_ref = f"{project_id}.{dataset}.{table}"

    # Load-job schema; CREATE_IF_NEEDED (the default) creates the table on first run
    schema = [
        bigquery.SchemaField("content_type", "STRING"),
        bigquery.SchemaField("content_id", "STRING"),
        bigquery.SchemaField("title", "STRING"),
        bigquery.SchemaField("folder_id", "STRING"),
        bigquery.SchemaField("creator_id", "STRING"),
        bigquery.SchemaField("created_at", "STRING"),
        bigquery.SchemaField("last_accessed_at", "STRING"),
        bigquery.SchemaField("view_count", "INTEGER"),
        bigquery.SchemaField("creator_disabled", "BOOLEAN"),
        bigquery.SchemaField("action_taken", "STRING"),
        bigquery.SchemaField("run_date", "TIMESTAMP"),
    ]

    job_config = bigquery.LoadJobConfig(
        schema=schema,
        write_disposition="WRITE_APPEND",
    )

    job = client.load_table_from_json(items, table_ref, job_config=job_config)
    job.result()

    print(f"Logged {len(items)} items to {table_ref}")

Step 6: Soft-Delete Stale Content

def soft_delete_content(sdk, items, trash_folder_id):
    """Move stale content to a Trash folder instead of hard-deleting."""
    for item in items:
        try:
            if item["content_type"] == "dashboard":
                sdk.move_dashboard(item["content_id"], trash_folder_id)
            elif item["content_type"] == "look":
                sdk.move_look(item["content_id"], trash_folder_id)

            item["action_taken"] = "moved_to_trash"
        except Exception as e:
            item["action_taken"] = f"error: {str(e)}"

    return items

Important: Always soft-delete (move to a Trash folder) instead of hard-deleting. Create a "Trash — Auto-Archived" folder that admins can review. If someone complains about a missing dashboard, you can restore it. After 90 days in Trash with no complaints, hard-delete.
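The follow-up purge can be its own small job run at the start of each cycle: anything still sitting in Trash has then survived a full 90 days unclaimed. A sketch (folder_dashboards, folder_looks, delete_dashboard, and delete_look are API 4.0 methods; the dry_run flag is a safety convention added here, not an SDK feature):

```python
def purge_trash(sdk, trash_folder_id, dry_run=True):
    """Hard-delete everything in the Trash folder.

    Run at the START of each 90-day cycle: content moved there by the
    previous run has had a full cycle for someone to ask for it back.
    """
    deleted = []
    for dash in sdk.folder_dashboards(folder_id=trash_folder_id, fields="id,title"):
        if not dry_run:
            sdk.delete_dashboard(dash.id)
        deleted.append(("dashboard", dash.id, dash.title))
    for look in sdk.folder_looks(folder_id=trash_folder_id, fields="id,title"):
        if not dry_run:
            sdk.delete_look(look.id)
        deleted.append(("look", look.id, look.title))
    return deleted
```

Defaulting dry_run to True means an accidental invocation only lists what would be deleted; flipping it to False is an explicit decision.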

Step 7: Put It All Together

def run_cleanup(sdk, project_id="your-project", dataset="looker_audit",
                table="content_cleanup", trash_folder_id="your-trash-folder-id"):
    """Main cleanup pipeline."""
    print(f"Running content cleanup — stale threshold: {STALE_DAYS} days")

    # Identify stale content
    stale_dashboards = get_stale_dashboards(sdk, CUTOFF_DATE)
    stale_looks = get_stale_looks(sdk, CUTOFF_DATE)
    all_stale = stale_dashboards + stale_looks

    # Flag orphaned content
    all_stale = get_orphaned_content(sdk, all_stale)

    print(f"Found {len(stale_dashboards)} stale dashboards, {len(stale_looks)} stale Looks")

    if not all_stale:
        print("No stale content found. Done.")
        return

    # Move to trash
    all_stale = soft_delete_content(sdk, all_stale, trash_folder_id)

    # Log to BigQuery
    log_to_bigquery(all_stale, project_id, dataset, table)

    print(f"Cleanup complete. {len(all_stale)} items archived and logged.")

if __name__ == "__main__":
    sdk = looker_sdk.init40("looker.ini")
    run_cleanup(sdk)
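Step 5 of the pipeline, the summary report, never appears in the code above. A minimal Slack version using only the standard library and an incoming webhook (SLACK_WEBHOOK_URL is an assumed environment variable for your webhook, not something the Looker SDK provides):

```python
import json
import os
import urllib.request

def summarize(items):
    """Build a one-line summary from the audit-log dicts."""
    dashboards = sum(1 for i in items if i["content_type"] == "dashboard")
    looks = sum(1 for i in items if i["content_type"] == "look")
    orphaned = sum(1 for i in items if i.get("creator_disabled"))
    return (f"Looker cleanup: archived {dashboards} dashboards and "
            f"{looks} Looks ({orphaned} orphaned by deactivated users).")

def send_slack_summary(items, webhook_url=None):
    """POST the summary to a Slack incoming webhook, if one is configured."""
    webhook_url = webhook_url or os.environ.get("SLACK_WEBHOOK_URL")
    if not webhook_url:
        return  # reporting is optional; skip quietly when unconfigured
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": summarize(items)}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Call send_slack_summary(all_stale) at the end of run_cleanup; the same summarize output works as an email body if Slack isn't your channel.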

Scheduling the Pipeline

Run this on a 90-day cadence using any scheduler:

Google Cloud (Cloud Scheduler + Cloud Run):

gcloud scheduler jobs create http looker-content-cleanup \
  --schedule="0 6 1 */3 *" \
  --uri="https://your-cloud-run-service/cleanup" \
  --http-method=POST

GitHub Actions:

on:
  schedule:
    - cron: '0 6 1 */3 *'  # First day of every 3rd month at 6am

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install looker-sdk google-cloud-bigquery
      - run: python cleanup.py
        env:
          LOOKERSDK_BASE_URL: ${{ secrets.LOOKER_URL }}
          LOOKERSDK_CLIENT_ID: ${{ secrets.LOOKER_CLIENT_ID }}
          LOOKERSDK_CLIENT_SECRET: ${{ secrets.LOOKER_CLIENT_SECRET }}

Querying the Audit Trail

Once you have a few runs logged in BigQuery, you can answer governance questions:

-- How much stale content has been cleaned up over time?
SELECT
  TIMESTAMP_TRUNC(run_date, QUARTER) AS quarter,
  content_type,
  COUNT(*) AS items_archived,
  COUNTIF(creator_disabled) AS orphaned_items
FROM `project.looker_audit.content_cleanup`
GROUP BY 1, 2
ORDER BY 1 DESC;

-- Which folders accumulate the most stale content?
SELECT
  folder_id,
  COUNT(*) AS stale_items,
  AVG(view_count) AS avg_views
FROM `project.looker_audit.content_cleanup`
WHERE run_date >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 YEAR)
GROUP BY 1
ORDER BY 2 DESC
LIMIT 20;

These queries help you identify systemic governance issues — teams that create dashboards and abandon them, folders that need cleanup policies, and patterns that suggest your content creation process needs guardrails.

Get the LookML Best Practices Guide + AI Skill

Good governance starts with clean models. Get our guide covering 6 LookML patterns that make Looker projects maintainable.


Labs4Change builds Looker governance pipelines and automation for enterprise teams. Book a free strategy call if your Looker instance needs a content governance overhaul.