Skip to content

Dimensional Hierarchies #1562

@shangyian

Description

@shangyian

Add support for dimensional hierarchies to enable drill-down/roll-up operations and better aggregate reuse. This feature would just be a thin metadata layer on top of DJ's dimension nodes, but would formally encode whether a set of dimension nodes have a hierarchical relation.

Problem

Currently, DJ users work with flat dimensions and must manually specify aggregation levels in their queries. Some implicit hierarchies can be built with dimension links, but there is no formal way to express that country > state > city represents a hierarchy. This means that DJ can't leverage hierarchical relationships to optimize drill-down or roll-up operations (for example, we can't easily reuse a pre-aggregated materialization with daily data for monthly or quarterly rollups).

Solution

Add a hierarchy metadata layer, where hierarchy definitions are a new entity type that references existing dimension nodes and formalizes their hierarchical structure.

Database Schema

CREATE TABLE hierarchies (
    id BIGINT PRIMARY KEY,
    name VARCHAR UNIQUE NOT NULL,
    description TEXT
    -- No dimension reference needed - levels handle it
);

CREATE TABLE hierarchy_levels (
    id BIGINT PRIMARY KEY,
    hierarchy_id BIGINT REFERENCES hierarchies(id),
    name VARCHAR NOT NULL,
    dimension_node_id BIGINT REFERENCES nodes(id), -- Points to existing dimension node
    level_order INTEGER NOT NULL,
    grain_columns JSONB, -- For single-dimension hierarchies
    UNIQUE(hierarchy_id, name)
);

Note that grain_columns is optional, and in many cases we can leverage the existing infra around dimension links, which already define the join relationships between any two hierarchical levels. All we need to do in these cases is store the order of the levels.

Example

A geography hierarchy can be created from either a single dimension node or multiple dimension nodes, depending on how it was modeled originally.

Multi-Dimension Pattern:

hierarchy_type: multi_dimension
levels:
  - name: country
    dimension_node: default.country_dim
    # grain_columns: null (uses primary key from dimension node)
  - name: state  
    dimension_node: default.state_dim
    # grain_columns: null (uses primary key from dimension node)
  - name: city
    dimension_node: default.city_dim  
    # grain_columns: null (uses primary key from dimension node)

Single-Dimension Pattern:

hierarchy_type: single_dimension
levels:
  - name: country
    dimension_node: default.location_dim    # Same dimension node for all levels
    grain_columns: [country]                # Just country column
  - name: state
    dimension_node: default.location_dim    # Same dimension node  
    grain_columns: [country, state]         # Country + state columns
  - name: city  
    dimension_node: default.location_dim    # Same dimension node
    grain_columns: [country, state, city]   # All three columns

API Endpoints

# Hierarchy management
POST   /hierarchies/                    # Create hierarchy
GET    /hierarchies/                    # List hierarchies  
GET    /hierarchies/{name}              # Get hierarchy details
PUT    /hierarchies/{name}              # Update hierarchy
DELETE /hierarchies/{name}              # Delete hierarchy
GET    /hierarchies/{name}/levels       # Get hierarchy levels

Checklist

Core Setup

  • Database schema for hierarchies + levels
  • Basic CRUD API endpoints, which includes hierarchy validation logic for joins
  • UI configuration support
  • YAML configuration support
  • Hierarchy metadata is exposed via GraphQL

Query Generation Optimization

  • Hierarchy-aware query building (e.g., automatic rollup from finer to coarser levels)
  • Multi-hierarchy materialization strategies

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions