Labs4Change

dbt + BigQuery + Looker: Building Your Modern BI Stack in 2026

How to architect a modern BI stack with dbt, BigQuery, and Looker — layer separation, transformation best practices, LookML patterns, and the role of AI in the modern analytics pipeline.

The dbt + BigQuery + Looker stack is the most battle-tested modern BI architecture in production today. We've implemented it at dozens of organizations and trained thousands of engineers on each component. Here's how to architect it correctly — including the decisions that are easy to get wrong.


The Architecture

Data Sources         Ingestion        Warehouse       Transformation      BI Layer
┌───────────┐       ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│ Salesforce│──┐    │ Fivetran │     │          │     │          │     │          │
│ Stripe    │──┼───►│ Stitch   │────►│ BigQuery │────►│   dbt    │────►│  Looker  │
│ Postgres  │──┤    │ Custom   │     │          │     │          │     │          │
│ APIs      │──┘    └──────────┘     └──────────┘     └──────────┘     └──────────┘
└───────────┘

Each tool has a clear, non-overlapping responsibility:

  • BigQuery stores the data. It's your warehouse — column-oriented, serverless, scales to petabytes.
  • dbt transforms the data. All business logic — joins, calculations, deduplication, type casting — lives here.
  • Looker presents the data. The semantic layer, dashboards, and user-facing exploration.

The principle: push logic upstream. Complex transforms belong in dbt, not in Looker's LookML. Looker handles the last mile — labels, descriptions, simple aggregations, and user experience.

Layer 1: BigQuery as the Warehouse

Why BigQuery

  • Serverless — no cluster sizing, no maintenance windows, no knobs to turn
  • Separation of storage and compute — pay for what you query, not for idle capacity
  • Native integration with Google Cloud services (Cloud Functions, Pub/Sub, Vertex AI)
  • Looker optimization — Looker treated BigQuery as a first-class dialect long before Google acquired Looker, and the integration remains the most mature of any warehouse

Key Configuration

Partitioning — partition large tables by date to reduce query costs and improve performance:

CREATE TABLE `project.dataset.orders`
PARTITION BY DATE(created_at)
CLUSTER BY customer_id, status
AS SELECT * FROM staging.raw_orders;

Clustering — cluster on high-cardinality filter columns. If users frequently filter by customer_id and status, cluster on those columns.
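To see the payoff, filter on the partition column in your queries. A sketch against the `orders` table defined above — BigQuery prunes any partition outside the date range, and the "bytes processed" estimate in the console shows the savings before you run the query:

```sql
-- Scans only the last 7 days of partitions, not the full table.
SELECT status, COUNT(*) AS order_count
FROM `project.dataset.orders`
WHERE DATE(created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY status;
```

Queries that omit the partition filter silently fall back to a full scan; consider setting `require_partition_filter` on large tables to make that a hard error.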

Materialization strategy — this is where most teams make mistakes:

Table Size       Update Frequency     dbt Materialization
< 1M rows        Any                  view (no storage cost)
1M–100M rows     Daily                table (full refresh)
100M+ rows       Daily                incremental (append/merge)
Any              Real-time needed     view with BigQuery BI Engine

Don't default to table materialization for everything. Views are free and fast enough for most dimensions. Save incremental for genuinely large fact tables.
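An incremental model ties the materialization and partitioning decisions together. A sketch (assuming an upstream `int_orders_enriched` model and an `order_id` key — names are illustrative):

```sql
-- models/marts/fct_orders.sql
{{
  config(
    materialized='incremental',
    unique_key='order_id',
    partition_by={'field': 'created_at', 'data_type': 'timestamp', 'granularity': 'day'},
    cluster_by=['customer_id', 'status']
  )
}}

select
    order_id,
    customer_id,
    status,
    total_amount,
    created_at
from {{ ref('int_orders_enriched') }}

{% if is_incremental() %}
  -- On incremental runs, only process rows newer than what's already loaded
  where created_at > (select max(created_at) from {{ this }})
{% endif %}
```

With `unique_key` set, dbt's BigQuery adapter merges late-arriving updates instead of blindly appending.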

Layer 2: dbt as the Transformation Layer

Project Structure

models/
├── staging/           # 1:1 with source tables
│   ├── stg_orders.sql
│   ├── stg_customers.sql
│   └── stg_payments.sql
├── intermediate/      # Business logic joins
│   ├── int_orders_enriched.sql
│   └── int_customer_metrics.sql
└── marts/             # Looker-ready outputs
    ├── fct_orders.sql
    ├── dim_customers.sql
    └── dim_products.sql

Staging — one model per source table. Rename columns, cast types, deduplicate. No joins, no business logic.
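A minimal staging model following those rules — rename, cast, deduplicate, nothing else. The source and column names are illustrative:

```sql
-- models/staging/stg_orders.sql
with source as (
    select * from {{ source('shop', 'raw_orders') }}
),

renamed as (
    select
        id as order_id,
        customer as customer_id,
        cast(amount as numeric) as total_amount,
        lower(status) as status,
        cast(created as timestamp) as created_at,
        -- Deduplicate on the primary key, keeping the most recently loaded row
        row_number() over (
            partition by id order by _loaded_at desc
        ) as row_num
    from source
)

select * except (row_num)
from renamed
where row_num = 1
```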

Intermediate — join staging models, apply business logic, calculate derived metrics. These are the models where your CASE statements, window functions, and complex calculations live.
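For example, the window-function logic that computes per-customer metrics belongs here, not in LookML. A sketch building on the staging model above (column names are assumptions):

```sql
-- models/intermediate/int_customer_metrics.sql
select
    customer_id,
    order_id,
    total_amount,
    created_at,
    -- Days since this customer's previous order
    date_diff(
        date(created_at),
        date(lag(created_at) over (
            partition by customer_id order by created_at
        )),
        day
    ) as days_since_prev_order,
    -- Running lifetime revenue per customer
    sum(total_amount) over (
        partition by customer_id
        order by created_at
    ) as lifetime_revenue
from {{ ref('stg_orders') }}
```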

Marts — the final outputs that Looker consumes. Fact tables (events, transactions) and dimension tables (entities, attributes). These should be clean, well-documented, and stable enough that downstream consumers can depend on them.

What Belongs in dbt (Not Looker)

This is the most common source of tech debt in this stack. Teams put logic in Looker because it's "faster" and then can't maintain it.

Always dbt:

  • Table joins and relationships
  • Data cleaning and deduplication
  • Type casting and standardization
  • Complex CASE statements
  • Window functions (LAG, LEAD, ROW_NUMBER)
  • Historical data transformations
  • Business rule calculations
  • Data quality tests

Always Looker:

  • Field labels and descriptions
  • Simple aggregations (SUM, COUNT, AVG)
  • Drill fields and links
  • Dashboard layouts and visualizations
  • User-facing formatting
  • Row-level security

The gray zone:

  • Derived tables — use dbt models instead of Looker PDTs whenever possible. dbt models are version-controlled, tested, documented, and reusable across BI tools. Looker PDTs are opaque, hard to test, and locked to Looker.
  • Calculated dimensions — if a calculation is simple (e.g., CONCAT(first_name, ' ', last_name)), it can live in Looker. If it involves CASE logic with multiple conditions, put it in dbt.

Testing and Documentation

dbt tests are the quality gate between your data and your users:

models:
  - name: fct_orders
    description: "Order-level fact table with payment and customer context"
    columns:
      - name: order_id
        description: "Unique order identifier"
        tests:
          - unique
          - not_null
      - name: customer_id
        description: "FK to dim_customers"
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: total_amount
        description: "Order total in USD after discounts"
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 1000000

Every mart model should have:

  • unique and not_null tests on the primary key
  • relationships tests on foreign keys
  • accepted_range tests on financial/metric fields
  • A description on every column (these flow through to BigQuery and can be imported into Looker)
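To make those descriptions flow through to BigQuery's table and column metadata, enable persist_docs in dbt_project.yml. A fragment (the project name `my_project` is a placeholder):

```yaml
# dbt_project.yml
models:
  my_project:
    marts:
      +persist_docs:
        relation: true   # write model descriptions to BigQuery table metadata
        columns: true    # write column descriptions to BigQuery column metadata
```

Once persisted, the descriptions are visible in the BigQuery console and available to tools that read the information schema.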

Layer 3: Looker as the Semantic Layer

LookML Model Architecture

Your LookML should be a thin presentation layer on top of dbt marts:

# model file
connection: "bigquery_production"
include: "/views/**/*.view.lkml"
include: "/explores/**/*.explore.lkml"

# views/_autogen/fct_orders.view.lkml
view: fct_orders {
  sql_table_name: `project.analytics.fct_orders` ;;

  dimension: order_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.order_id ;;
  }

  dimension: total_amount {
    type: number
    sql: ${TABLE}.total_amount ;;
  }

  dimension_group: created {
    type: time
    timeframes: [raw, date, week, month, quarter, year]
    sql: ${TABLE}.created_at ;;
  }
}

# views/_refined/fct_orders_r.view.lkml
view: +fct_orders {
  dimension: order_id {
    label: "Order ID"
    description: "Unique order identifier"
  }

  measure: total_revenue {
    type: sum
    sql: ${total_amount} ;;
    value_format_name: usd
    label: "Total Revenue"
    description: "Sum of order amounts in USD"
  }

  measure: order_count {
    type: count
    label: "Order Count"
    drill_fields: [order_id, created_date, total_amount]
  }
}

Key pattern: auto-generated views contain only sql_table_name and basic dimension definitions (matching the database schema). Refined views add labels, descriptions, measures, and formatting. This separation means you can regenerate auto-generated views without losing business logic.

Explore Design

Explores should be defined in their own files, not inside views:

# explores/order_explores.explore.lkml
explore: orders {
  label: "Customer Orders"
  description: "Analyze order patterns, revenue, and customer behavior. Key filters: order date, status, region."

  from: fct_orders

  join: dim_customers {
    type: left_outer
    relationship: many_to_one
    sql_on: ${fct_orders.customer_id} = ${dim_customers.customer_id} ;;
  }

  join: dim_products {
    type: left_outer
    relationship: many_to_one
    sql_on: ${fct_orders.product_id} = ${dim_products.product_id} ;;
  }
}

The AI Layer (2026 Addition)

This stack has a new participant in 2026: AI.

AI in dbt

  • Code generation — AI assistants (Claude Code, Cursor, Copilot) generate SQL models, tests, and documentation from natural language descriptions
  • Data quality monitoring — AI-powered anomaly detection on dbt test results
  • Documentation — AI generates column descriptions, model descriptions, and data dictionaries from SQL and existing documentation

AI in BigQuery

  • BigQuery ML — Train ML models directly in SQL, deploy them as functions
  • Vertex AI integration — Call foundation models from SQL for classification, extraction, and summarization
  • Natural language queries — Gemini in BigQuery (formerly Duet AI) translates questions to SQL
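A BigQuery ML sketch to make the first bullet concrete — training and scoring a churn classifier entirely in SQL. The model name, table, and feature columns are illustrative:

```sql
-- Train a logistic regression on customer features (illustrative schema)
CREATE OR REPLACE MODEL `project.dataset.churn_model`
OPTIONS (
  model_type = 'logistic_reg',
  input_label_cols = ['churned']
) AS
SELECT
  days_since_last_order,
  lifetime_revenue,
  order_count,
  churned
FROM `project.analytics.dim_customers`;

-- Score customers with the trained model
SELECT customer_id, predicted_churned
FROM ML.PREDICT(
  MODEL `project.dataset.churn_model`,
  (SELECT * FROM `project.analytics.dim_customers`)
);
```

Because the model lives in the warehouse, its predictions can feed a dbt mart and surface in Looker like any other column.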

AI in Looker

  • Gemini in Looker — Natural language queries through the semantic layer
  • AI-assisted modeling — Suggestions for LookML improvements
  • The semantic layer advantage — AI queries go through governed definitions, not raw SQL. This means answers are consistent with your business metrics.

The dbt + BigQuery + Looker stack is uniquely positioned for AI because the semantic layer (Looker/LookML) provides the governance that prevents AI from hallucinating metrics. When a user asks "what's our revenue this quarter?", the AI queries through the total_revenue measure definition — not by guessing which column to sum.

When to Consider Omni Instead

We'd be remiss not to mention: for new implementations, Omni Analytics is worth evaluating as an alternative to Looker in this stack. It uses YAML instead of LookML, has native dbt integration (no manual sync), and was built by former Looker leadership who understood what needed to improve.

The dbt + BigQuery part stays the same. Only the BI layer changes. And if you're already on Looker, we have migration tooling that makes the switch practical.

Get the LookML Best Practices Guide + AI Skill

The complete guide to LookML patterns that scale — auto-generated views, refinements, derived tables, and more. Plus an AI agent skill for your coding assistant.


Labs4Change has trained 16,000+ analytics engineers on this stack and implements it for enterprise teams. Book a free strategy call to discuss your data architecture.