Automate Looker Content Cleanup with the SDK and BigQuery
How to build an automated Looker content cleanup pipeline using the Looker SDK — identify stale dashboards, unused Looks, and orphaned content every 90 days with results stored in BigQuery.
Every Looker instance we audit has the same problem: hundreds of dashboards and Looks that nobody uses. They were created for a one-off analysis six months ago, or by someone who left the company, or as a draft that was never finished. They clutter the content browser, confuse new users, and make governance impossible.
The fix isn't a one-time cleanup. It's an automated pipeline that runs every 90 days, identifies stale content, archives or soft-deletes it, and logs everything to BigQuery for audit purposes.
Here's how to build it with the Looker SDK.
What "Stale Content" Means
Before you delete anything, define what stale means for your organization. Our default criteria:
- Not viewed in 90 days — the content exists but nobody has opened it
- Not scheduled — it's not being sent to anyone on a regular basis
- Created by a deactivated user — the creator no longer has access
- In a personal space of a deactivated user — orphaned content nobody can manage
These rules are configurable. Some organizations use 60 days, some use 180. The point is to codify the definition so it's not a judgment call every time.
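One way to codify the rules is a single predicate that every piece of content passes through. This is a minimal sketch: the threshold, the OR-combination of the criteria, and the "a schedule keeps content alive" override are assumptions you should adapt to your own policy.

```python
from datetime import datetime, timedelta

STALE_DAYS = 90  # tunable: some orgs use 60, some 180


def is_stale(last_accessed_at, is_scheduled, creator_disabled,
             now=None, stale_days=STALE_DAYS):
    """Return True if content meets any staleness criterion.

    Criteria are OR'd here: never viewed, not viewed within the window,
    or created by a deactivated user. Scheduled content is exempt.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=stale_days)
    never_viewed = last_accessed_at is None
    not_viewed_recently = never_viewed or last_accessed_at < cutoff
    # Content still being delivered on a schedule is never stale.
    if is_scheduled:
        return False
    return not_viewed_recently or creator_disabled
```

Note the OR semantics: a dashboard viewed yesterday but created by a deactivated user is still flagged. If that's too aggressive for your organization, combine the criteria with AND instead.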
The Architecture
┌─────────────┐ ┌──────────────┐ ┌───────────┐
│ Looker SDK │ ──► │ Python │ ──► │ BigQuery │
│ (API calls) │ │ Pipeline │ │ (Audit │
│ │ │ (identify, │ │ logs) │
│ │ │ archive, │ │ │
│ │ │ report) │ │ │
└─────────────┘ └──────────────┘ └───────────┘
│ │
▼ ▼
Soft-delete Historical audit
stale content trail in BQ
The pipeline runs on a cron schedule (Cloud Scheduler + Cloud Run, or a simple GitHub Action). Each run:
- Queries Looker's System Activity to find content usage
- Identifies stale content based on your rules
- Logs the full inventory to BigQuery (before any changes)
- Soft-deletes or moves content to a "Trash" folder
- Sends a summary report to Slack or email
Step 1: Connect to the Looker SDK
import looker_sdk
from google.cloud import bigquery
from datetime import datetime, timedelta, timezone

# Initialize the Looker SDK
sdk = looker_sdk.init40("looker.ini")
# Or use environment variables
# sdk = looker_sdk.init40()

STALE_DAYS = 90
# SDK timestamps are timezone-aware, so use an aware cutoff to avoid
# naive/aware datetime comparison errors
CUTOFF_DATE = datetime.now(timezone.utc) - timedelta(days=STALE_DAYS)
The looker.ini file contains your API credentials:
[Looker]
base_url=https://your-instance.cloud.looker.com
client_id=YOUR_CLIENT_ID
client_secret=YOUR_CLIENT_SECRET
verify_ssl=true
Step 2: Find Stale Dashboards
def get_stale_dashboards(sdk, cutoff_date):
"""Find dashboards not viewed since cutoff_date."""
    all_dashboards = sdk.all_dashboards(
        fields="id,title,folder,user_id,created_at,view_count,content_metadata_id"
    )
stale = []
for dash in all_dashboards:
        # Skip LookML dashboards (managed in code); their ids contain "::"
        # (e.g. "model_name::dashboard_name"), unlike numeric user-defined ids
        if dash.id and "::" in dash.id:
continue
# Check last access via content_metadata_access
try:
metadata = sdk.content_metadata(
content_metadata_id=dash.content_metadata_id,
fields="last_accessed_at"
)
last_accessed = metadata.last_accessed_at
except Exception:
last_accessed = None
if last_accessed is None or last_accessed < cutoff_date:
stale.append({
"content_type": "dashboard",
"content_id": dash.id,
"title": dash.title,
"folder_id": dash.folder.id if dash.folder else None,
"creator_id": dash.user_id,
"created_at": str(dash.created_at),
"last_accessed_at": str(last_accessed) if last_accessed else "never",
"view_count": dash.view_count or 0,
"run_date": datetime.now().isoformat(),
})
return stale
Step 3: Find Stale Looks
def get_stale_looks(sdk, cutoff_date):
"""Find Looks not viewed since cutoff_date."""
    all_looks = sdk.all_looks(
        fields="id,title,folder,user_id,created_at,view_count,content_metadata_id"
    )
stale = []
for look in all_looks:
try:
metadata = sdk.content_metadata(
content_metadata_id=look.content_metadata_id,
fields="last_accessed_at"
)
last_accessed = metadata.last_accessed_at
except Exception:
last_accessed = None
if last_accessed is None or last_accessed < cutoff_date:
stale.append({
"content_type": "look",
"content_id": look.id,
"title": look.title,
"folder_id": look.folder.id if look.folder else None,
"creator_id": look.user_id,
"created_at": str(look.created_at),
"last_accessed_at": str(last_accessed) if last_accessed else "never",
"view_count": look.view_count or 0,
"run_date": datetime.now().isoformat(),
})
return stale
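The two finder functions differ only in the SDK call and the `content_type` label; the record-building in each can be factored into one helper. A sketch (the field names mirror the dict shape used above):

```python
from datetime import datetime


def build_stale_record(content_type, content_id, title, folder_id,
                       creator_id, created_at, last_accessed, view_count):
    """Normalize one piece of content into the audit-log row shape."""
    return {
        "content_type": content_type,
        "content_id": content_id,
        "title": title,
        "folder_id": folder_id,
        "creator_id": creator_id,
        "created_at": str(created_at),
        "last_accessed_at": str(last_accessed) if last_accessed else "never",
        "view_count": view_count or 0,
        "run_date": datetime.now().isoformat(),
    }
```

Both loops can then append `build_stale_record("dashboard", dash.id, ...)` or `build_stale_record("look", look.id, ...)`, which keeps the BigQuery schema and the Python dicts in sync in one place.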
Step 4: Find Orphaned Content
def get_orphaned_content(sdk, stale_items):
"""Flag content created by deactivated users."""
# Get all users and identify deactivated ones
all_users = sdk.all_users(fields="id,is_disabled")
disabled_user_ids = {u.id for u in all_users if u.is_disabled}
for item in stale_items:
item["creator_disabled"] = item["creator_id"] in disabled_user_ids
return stale_items
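The criteria list also flags content sitting in a deactivated user's personal space, which the function above doesn't check (it only looks at the creator). A sketch of that extra check; it assumes the `User.personal_folder_id` field the 4.0 API exposes, so verify against your SDK version:

```python
def flag_personal_space_orphans(stale_items, personal_folder_ids):
    """Mark items that live in a deactivated user's personal folder."""
    for item in stale_items:
        item["in_orphaned_personal_space"] = (
            item["folder_id"] in personal_folder_ids
        )
    return stale_items


# Building the folder-id set from the SDK (hypothetical usage):
# users = sdk.all_users(fields="id,is_disabled,personal_folder_id")
# personal_folder_ids = {u.personal_folder_id for u in users
#                        if u.is_disabled and u.personal_folder_id}
```

If you adopt this, add a matching `in_orphaned_personal_space` BOOLEAN column to the BigQuery schema in Step 5.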
Step 5: Log to BigQuery
def log_to_bigquery(items, project_id, dataset, table):
"""Write audit log to BigQuery."""
client = bigquery.Client(project=project_id)
table_ref = f"{project_id}.{dataset}.{table}"
# Create table if it doesn't exist
schema = [
bigquery.SchemaField("content_type", "STRING"),
bigquery.SchemaField("content_id", "STRING"),
bigquery.SchemaField("title", "STRING"),
bigquery.SchemaField("folder_id", "STRING"),
bigquery.SchemaField("creator_id", "STRING"),
bigquery.SchemaField("created_at", "STRING"),
bigquery.SchemaField("last_accessed_at", "STRING"),
bigquery.SchemaField("view_count", "INTEGER"),
bigquery.SchemaField("creator_disabled", "BOOLEAN"),
bigquery.SchemaField("action_taken", "STRING"),
bigquery.SchemaField("run_date", "TIMESTAMP"),
]
job_config = bigquery.LoadJobConfig(
schema=schema,
write_disposition="WRITE_APPEND",
)
job = client.load_table_from_json(items, table_ref, job_config=job_config)
job.result()
print(f"Logged {len(items)} items to {table_ref}")
Step 6: Soft-Delete Stale Content
def soft_delete_content(sdk, items, trash_folder_id):
"""Move stale content to a Trash folder instead of hard-deleting."""
for item in items:
try:
if item["content_type"] == "dashboard":
sdk.move_dashboard(item["content_id"], trash_folder_id)
elif item["content_type"] == "look":
sdk.move_look(item["content_id"], trash_folder_id)
item["action_taken"] = "moved_to_trash"
except Exception as e:
item["action_taken"] = f"error: {str(e)}"
return items
Important: Always soft-delete (move to a Trash folder) instead of hard-deleting. Create a "Trash — Auto-Archived" folder that admins can review. If someone complains about a missing dashboard, you can restore it. After 90 days in Trash with no complaints, hard-delete.
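That follow-up hard-delete pass can be sketched as below. The `trash_log` shape is an assumption; in practice you would reconstruct it from the `run_date` column in BigQuery, and `delete_dashboard`/`delete_look` are the real 4.0 endpoints, but review the candidate list manually before running them.

```python
from datetime import datetime, timedelta

TRASH_RETENTION_DAYS = 90


def ready_for_hard_delete(trash_log, now=None):
    """From [(content_id, moved_at_datetime), ...], return the ids whose
    time in the Trash folder exceeds the retention window."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=TRASH_RETENTION_DAYS)
    return [cid for cid, moved_at in trash_log if moved_at < cutoff]


# Applying the result (hypothetical usage -- irreversible, review first):
# for cid in ready_for_hard_delete(trash_log):
#     sdk.delete_dashboard(cid)   # or sdk.delete_look(cid)
```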
Step 7: Put It All Together
def run_cleanup(sdk, project_id="your-project", dataset="looker_audit",
table="content_cleanup", trash_folder_id="your-trash-folder-id"):
"""Main cleanup pipeline."""
print(f"Running content cleanup — stale threshold: {STALE_DAYS} days")
# Identify stale content
stale_dashboards = get_stale_dashboards(sdk, CUTOFF_DATE)
stale_looks = get_stale_looks(sdk, CUTOFF_DATE)
all_stale = stale_dashboards + stale_looks
# Flag orphaned content
all_stale = get_orphaned_content(sdk, all_stale)
print(f"Found {len(stale_dashboards)} stale dashboards, {len(stale_looks)} stale Looks")
if not all_stale:
print("No stale content found. Done.")
return
# Move to trash
all_stale = soft_delete_content(sdk, all_stale, trash_folder_id)
# Log to BigQuery
log_to_bigquery(all_stale, project_id, dataset, table)
print(f"Cleanup complete. {len(all_stale)} items archived and logged.")
if __name__ == "__main__":
sdk = looker_sdk.init40("looker.ini")
run_cleanup(sdk)
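The run list at the top also promises a summary report to Slack, which the pipeline above doesn't yet implement. A minimal sketch using a Slack incoming webhook; the webhook URL and the message wording are assumptions:

```python
import json
from urllib import request


def build_summary(items):
    """Compose a Slack message payload from the archived items."""
    dashboards = sum(1 for i in items if i["content_type"] == "dashboard")
    looks = sum(1 for i in items if i["content_type"] == "look")
    errors = sum(1 for i in items
                 if str(i.get("action_taken", "")).startswith("error"))
    return {
        "text": (f"Looker cleanup: archived {dashboards} dashboards and "
                 f"{looks} Looks ({errors} errors). Details in BigQuery.")
    }


def post_to_slack(webhook_url, payload):
    """POST the summary to a Slack incoming webhook."""
    req = request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    request.urlopen(req)
```

Call `post_to_slack(WEBHOOK_URL, build_summary(all_stale))` as the last line of `run_cleanup`, after the BigQuery load succeeds, so a failed load never reports a clean run.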
Scheduling the Pipeline
Run this on a 90-day cadence using any scheduler:
Google Cloud (Cloud Scheduler + Cloud Run):
gcloud scheduler jobs create http looker-content-cleanup \
--schedule="0 6 1 */3 *" \
--uri="https://your-cloud-run-service/cleanup" \
--http-method=POST
GitHub Actions:
on:
  schedule:
    - cron: '0 6 1 */3 *'  # First day of every 3rd month at 6am
jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install looker-sdk google-cloud-bigquery
      - run: python cleanup.py
        env:
          LOOKERSDK_BASE_URL: ${{ secrets.LOOKER_URL }}
          LOOKERSDK_CLIENT_ID: ${{ secrets.LOOKER_CLIENT_ID }}
          LOOKERSDK_CLIENT_SECRET: ${{ secrets.LOOKER_CLIENT_SECRET }}
Querying the Audit Trail
Once you have a few runs logged in BigQuery, you can answer governance questions:
-- How much stale content has been cleaned up over time?
SELECT
  TIMESTAMP_TRUNC(run_date, QUARTER) AS quarter,
content_type,
COUNT(*) AS items_archived,
COUNTIF(creator_disabled) AS orphaned_items
FROM `project.looker_audit.content_cleanup`
GROUP BY 1, 2
ORDER BY 1 DESC;
-- Which folders accumulate the most stale content?
SELECT
folder_id,
COUNT(*) AS stale_items,
AVG(view_count) AS avg_views
FROM `project.looker_audit.content_cleanup`
WHERE run_date >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 YEAR)
GROUP BY 1
ORDER BY 2 DESC
LIMIT 20;
These queries help you identify systemic governance issues — teams that create dashboards and abandon them, folders that need cleanup policies, and patterns that suggest your content creation process needs guardrails.
Get the LookML Best Practices Guide + AI Skill
Good governance starts with clean models. Get our guide covering 6 LookML patterns that make Looker projects maintainable.
Labs4Change builds Looker governance pipelines and automation for enterprise teams. Book a free strategy call if your Looker instance needs a content governance overhaul.