Looker Caching and Datagroups: Stop Hitting Your Database on Every Query
How Looker caching works, how to configure datagroups for optimal performance, and the caching patterns that align refresh with your ETL schedule.
Every time a user clicks "Run" in Looker, a SQL query goes to your database. If 50 users run the same dashboard every morning, that's 50 identical queries — unless you've set up caching properly.
Most Looker instances we audit either have no caching strategy (every query hits the database) or overly aggressive caching (users see stale data and don't trust the numbers). Here's how to get it right.
How Looker Caching Works
When a query runs, Looker stores the results in an encrypted cache file. On the next identical query — same fields, same filters, same parameters, same row limit — Looker serves the cached results instead of querying the database.
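What counts as "identical" can be made concrete with a sketch. This is illustrative Python, not Looker's implementation: hash a normalized view of the query, and any change to fields, filters, or the row limit produces a different key, and therefore a cache miss.

```python
import hashlib
import json

_cache = {}

def cache_key(fields, filters, row_limit):
    # Hash a normalized form of the query so logically identical
    # queries (regardless of field or filter ordering) share a key.
    payload = json.dumps(
        {"fields": sorted(fields), "filters": sorted(filters.items()), "limit": row_limit},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def run_query(fields, filters, row_limit, execute):
    """Return (result, from_cache). `execute` stands in for the database round trip."""
    key = cache_key(fields, filters, row_limit)
    if key in _cache:
        return _cache[key], True
    result = execute()
    _cache[key] = result
    return result, False
```

Note that changing only the row limit is enough to miss the cache, which surprises many users who "just bumped the limit" and triggered a fresh database query.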
You can see this in the Explore interface:
- "Results from 2:30 PM" = cached results
- "Results from Now" = fresh query
By default, Looker invalidates the cache after 1 hour. For most production workloads, this default is wrong in both directions: too short for historical dashboards (the cache expires and re-queries data that never changed, adding database load) and too long for operational dashboards (users see hour-old data).
Datagroups: The Right Way to Control Caching
Datagroups are Looker's mechanism for defining cache policies. They answer two questions:
- How long should cached results stay valid?
- What condition should trigger a cache refresh?
Basic Time-Based Datagroup
```lookml
datagroup: daily_refresh {
  max_cache_age: "24 hours"
  sql_trigger: SELECT CURRENT_DATE() ;;
}
```
The max_cache_age is a safety net — cache expires after 24 hours regardless. The sql_trigger is the primary mechanism — Looker runs this query periodically, and when the result changes (a new date), all caches tied to this datagroup are invalidated.
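Conceptually, the trigger check behaves like this toy sketch (the class and method names are mine, not Looker internals): rerun the trigger query on a schedule, compare the result with the last stored value, and invalidate the datagroup's caches when it changes.

```python
class Datagroup:
    """Toy model of a sql_trigger check, not Looker's implementation."""

    def __init__(self, run_trigger_sql):
        self.run_trigger_sql = run_trigger_sql  # callable returning one value
        self.last_value = None
        self.invalidations = 0

    def check(self):
        # Looker runs this on a schedule (every 5 minutes by default).
        value = self.run_trigger_sql()
        if value != self.last_value:
            self.last_value = value
            self.invalidations += 1  # here Looker would drop the associated caches
            return True
        return False
```

The key property: as long as the trigger value is stable (same date, same MAX timestamp), every check is a no-op and cached results keep serving.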
ETL-Aligned Datagroup
```lookml
datagroup: etl_refresh {
  max_cache_age: "24 hours"
  sql_trigger: SELECT MAX(last_updated) FROM etl_log ;;
}
```
This is the gold standard. Your ETL pipeline writes a timestamp when it finishes. Looker checks that timestamp. When it changes, caches refresh. Your users always see the latest data, and queries never hit the database unnecessarily between ETL runs.
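On the pipeline side, the contract is a single write at the end of the job. A minimal sketch, using SQLite as a stand-in for the warehouse and reusing the etl_log table and last_updated column from the trigger above:

```python
import sqlite3
from datetime import datetime, timezone

# SQLite stands in for your warehouse; table/column names match the trigger.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE etl_log (job TEXT, last_updated TEXT)")

def mark_etl_complete(conn, job):
    # Final step of the ETL job: record a completion timestamp.
    # Looker's sql_trigger (SELECT MAX(last_updated) FROM etl_log)
    # sees the new value on its next check and invalidates the caches.
    conn.execute(
        "INSERT INTO etl_log (job, last_updated) VALUES (?, ?)",
        (job, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

mark_etl_complete(conn, "nightly_orders")
trigger_value = conn.execute("SELECT MAX(last_updated) FROM etl_log").fetchone()[0]
```

Whether you append a row per run (as here) or update a single row per job is up to you; MAX() works either way.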
Data-Based Datagroup
```lookml
datagroup: orders_refresh {
  max_cache_age: "4 hours"
  sql_trigger: SELECT MAX(modified_at) FROM orders ;;
}
```
For tables that update throughout the day (streaming data, real-time order feeds), trigger cache refresh when new data arrives.
Applying Datagroups
At the Model Level
Applies to all Explores in the model:
```lookml
model: ecommerce {
  persist_with: daily_refresh
}
```
At the Explore Level
Overrides the model default for specific Explores:
```lookml
explore: orders {
  persist_with: orders_refresh  # Refreshes more frequently than the model default
}

explore: historical_analysis {
  persist_with: daily_refresh   # Historical data doesn't change intra-day
}
```
Multiple Datagroups for Different Cadences
Most production instances need 2-3 datagroups:
```lookml
# For slowly-changing data (products, customers, reference tables)
datagroup: daily_refresh {
  max_cache_age: "24 hours"
  sql_trigger: SELECT CURRENT_DATE() ;;
}

# For transactional data updated by ETL
datagroup: etl_refresh {
  max_cache_age: "12 hours"
  sql_trigger: SELECT MAX(completed_at) FROM etl_runs ;;
}

# For near-real-time data
datagroup: frequent_refresh {
  max_cache_age: "1 hour"
  sql_trigger: SELECT MAX(created_at) FROM events ;;
}
```
SQL Trigger Patterns
Time-Based Triggers
```lookml
# Refresh every hour
sql_trigger: SELECT DATE_TRUNC('hour', CURRENT_TIMESTAMP) ;;

# Refresh daily at midnight
sql_trigger: SELECT CURRENT_DATE ;;

# Refresh weekly
sql_trigger: SELECT DATE_TRUNC('week', CURRENT_DATE) ;;
```
Data-Based Triggers
```lookml
# When new records arrive
sql_trigger: SELECT MAX(updated_at) FROM orders ;;

# When the row count changes (useful for append-only tables)
sql_trigger: SELECT COUNT(*) FROM daily_summary ;;

# Checksum approach (detects any change)
sql_trigger: SELECT MD5(CAST(COUNT(*) AS STRING) || CAST(MAX(updated_at) AS STRING)) FROM orders ;;
```
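The checksum pattern is easy to sanity-check outside the warehouse. SQLite has no MD5() function, so this sketch computes the same COUNT-plus-MAX fingerprint in Python; any insert or update changes the fingerprint, which is exactly what the trigger relies on:

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-02")],
)

def trigger_checksum(conn):
    # Same fingerprint as the SQL version: MD5(COUNT(*) || MAX(updated_at)).
    count, max_ts = conn.execute(
        "SELECT COUNT(*), MAX(updated_at) FROM orders"
    ).fetchone()
    return hashlib.md5(f"{count}{max_ts}".encode()).hexdigest()

before = trigger_checksum(conn)
conn.execute("INSERT INTO orders VALUES (3, '2024-01-03')")
after = trigger_checksum(conn)  # differs, so the caches would be invalidated
```

Combining COUNT with MAX(updated_at) catches both appends and in-place updates; COUNT alone would miss an UPDATE that leaves the row count unchanged.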
BigQuery-Specific Optimization
On BigQuery, sql_trigger queries count toward your query costs. Use lightweight queries:
```lookml
# Good — reads only one column
sql_trigger: SELECT MAX(updated_at) FROM orders ;;

# Better — reads table metadata instead of table data (no bytes scanned)
sql_trigger: SELECT last_modified_time
  FROM `project.dataset.__TABLES__`
  WHERE table_id = 'orders' ;;
```
Caching Strategy by Dashboard Type
| Dashboard Type | Recommended Cache | Why |
|---|---|---|
| Executive summary | 24 hours / ETL-aligned | Data doesn't need to be real-time; consistency matters more |
| Operational monitoring | 15 min – 1 hour | Users need near-current data for decisions |
| Historical analysis | 24-48 hours | Data never changes; no reason to re-query |
| Ad-hoc exploration | 1 hour (default) | Users expect fresh data when actively exploring |
| Embedded analytics | ETL-aligned | Customer-facing dashboards must show latest data post-refresh |
Common Caching Mistakes
1. No Datagroups At All
The default 1-hour cache means every dashboard refreshes hourly regardless of whether the underlying data changed. For a BigQuery-backed instance with 50 dashboards, this can cost thousands per month in unnecessary query costs.
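That figure follows from simple arithmetic. The numbers below are illustrative assumptions, not measurements: 50 dashboards of 10 tiles, refreshed hourly over a 12-hour business day, each tile scanning about 5 GB, priced near BigQuery's on-demand rate of roughly $6.25 per TiB at the time of writing.

```python
# All inputs are hypothetical; plug in your own dashboard counts and scan sizes.
dashboards = 50
tiles_per_dashboard = 10
refreshes_per_day = 12        # hourly over a 12-hour business day
gb_scanned_per_tile = 5       # assumed average bytes billed per tile query
price_per_tib_usd = 6.25      # approximate BigQuery on-demand rate

queries_per_month = dashboards * tiles_per_dashboard * refreshes_per_day * 30
tib_scanned = queries_per_month * gb_scanned_per_tile / 1024
monthly_cost = tib_scanned * price_per_tib_usd  # roughly $5,500/month here
```

If the underlying tables only change once a day, an ETL-aligned datagroup collapses those 12 daily refreshes per tile down to one.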
2. Cache Too Short
Setting max_cache_age: "5 minutes" on a dashboard that shows monthly revenue means the same expensive query runs 12 times per hour per user. The monthly revenue number won't change — you're just burning money.
3. Cache Too Long
Setting max_cache_age: "48 hours" on an operational dashboard means the morning shift sees yesterday's data. They lose trust in the numbers and start running their own queries in BigQuery directly — defeating the purpose of Looker.
4. Not Aligning With ETL
If your ETL finishes at 6 AM and your cache refreshes at 7 AM, there's a one-hour window where users see stale data. Use an ETL-aligned datagroup to eliminate this gap.
5. Expensive sql_trigger Queries
The trigger query runs every 5 minutes by default (the check interval is configurable in the connection's admin settings). A SELECT COUNT(*) FROM billion_row_table as your trigger will cost more than the cached queries it's meant to save.
Monitoring Cache Effectiveness
In Looker's Admin panel:
- System Activity > Query Performance — check cache hit rates
- System Activity > Data Connections — monitor database query volume
- The "i" (info) icon on dashboard tiles — shows whether results came from the cache and when they were last run
If your cache hit rate is below 50%, your datagroup strategy needs work. Well-configured instances typically see 70-90% cache hit rates.
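Once you export query history from System Activity, the hit rate itself is a one-liner. A sketch, with an illustrative from_cache flag standing in for whatever field your export actually uses:

```python
def cache_hit_rate(query_history):
    # query_history: one record per query run; "from_cache" is an
    # illustrative field name, not a guaranteed System Activity column.
    hits = sum(1 for q in query_history if q["from_cache"])
    return hits / len(query_history)

history = [{"from_cache": True}] * 8 + [{"from_cache": False}] * 2
rate = cache_hit_rate(history)  # 0.8: healthy; below 0.5 needs work
```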
Key Takeaways
- Always configure datagroups. The default 1-hour cache is rarely right for production workloads.
- Align with your ETL. The best trigger is SELECT MAX(completed_at) FROM etl_runs.
- Use 2-3 datagroups with different cadences for different data freshness requirements.
- Keep trigger queries cheap — especially on BigQuery where they count toward costs.
- Monitor cache hit rates and adjust based on actual usage patterns.
Get the LookML Best Practices Guide + AI Skill
Our guide covers caching, datagroups, project structure, permissions, and more — the patterns we've refined across 200+ Looker implementations.
Labs4Change has trained 16,000+ analytics engineers on Looker. Book a free performance review — we'll check your caching strategy, query performance, and datagroup configuration.