Part 8: Databricks Pipeline & Dashboard

Published: January 2, 2026 at 06:00 AM EST
1 min read
Source: Dev.to

Pipeline creation

A Databricks Workflow is created with one task for each stage covered in this blog series. The entire pipeline is orchestrated to stream and process data incrementally:

  • Bronze ingestion
  • ZIP dimension build
  • Silver enrichment
  • Gold aggregation (both tables)

Task dependencies enforce the execution order automatically, and you can schedule the pipeline with a simple cron expression, as sketched below.
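
Here is a minimal sketch of building the same workflow programmatically with the Databricks Python SDK. The job name, task keys, notebook paths, and cron expression are assumptions for illustration; the same job can be assembled in the Workflows UI.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

def nb(path):
    # Each task runs one notebook from this series; add existing_cluster_id
    # or a job cluster per task if your workspace does not use serverless jobs.
    return jobs.NotebookTask(notebook_path=path)

w.jobs.create(
    name="nyc-taxi-pipeline",
    tasks=[
        jobs.Task(task_key="bronze_ingestion",
                  notebook_task=nb("/Pipelines/01_bronze_ingestion")),
        jobs.Task(task_key="zip_dimension",
                  notebook_task=nb("/Pipelines/02_zip_dimension"),
                  depends_on=[jobs.TaskDependency(task_key="bronze_ingestion")]),
        jobs.Task(task_key="silver_enrichment",
                  notebook_task=nb("/Pipelines/03_silver_enrichment"),
                  depends_on=[jobs.TaskDependency(task_key="zip_dimension")]),
        jobs.Task(task_key="gold_aggregation",
                  notebook_task=nb("/Pipelines/04_gold_aggregation"),
                  depends_on=[jobs.TaskDependency(task_key="silver_enrichment")]),
    ],
    # Quartz cron: run at the top of every hour
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 * * * ?",
        timezone_id="America/New_York",
    ),
)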

Pipeline diagram

[Image: Workflow overview]

Dashboard creation

Queries on the Gold tables feed data to Databricks dashboards. In the Databricks workspace, create a dashboard and add custom queries to visualize business insights.

[Image: Dashboard UI]

Example query

To get the peak hours, add the following query as a SQL data source and create a tile in the dashboard:

-- Total trips per hour of day; the busiest hours sort to the top
SELECT
  trip_hour,
  SUM(total_trips) AS trips
FROM nyc_taxi.gold.taxi_trip_metrics
GROUP BY trip_hour
ORDER BY trips DESC;

The result appears as:

[Image: Peak hours result]

You can keep adding tiles to enrich your dashboard. Dashboards update automatically when:

  • New files arrive
  • Jobs rerun
  • Late data is processed (within the watermark window)

To simulate new data arrival, add extra data to the DBFS input file source.

[Image: Simulating new data]

You can experiment with tpep_pickup_datetime values to see the watermark handle late data in action, as in the sketch below.
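
Here is a minimal sketch of simulating the arrival of new and late data, assuming the stream reads from a DBFS directory such as dbfs:/FileStore/nyc_taxi/input (the path and the exact column set are assumptions).

from datetime import datetime, timedelta

input_path = "dbfs:/FileStore/nyc_taxi/input"
now = datetime.now()

rows = [
    # On-time trip: falls inside the watermark and is aggregated normally
    (now, now + timedelta(minutes=15), 1, 4.2),
    # Late trip: pickup hours in the past, so the watermark decides its fate
    (now - timedelta(hours=6), now - timedelta(hours=5), 2, 9.8),
]
cols = ["tpep_pickup_datetime", "tpep_dropoff_datetime",
        "passenger_count", "trip_distance"]

# Appending a new file triggers the streaming job on its next run,
# and the dashboard refreshes once the Gold tables are updated.
spark.createDataFrame(rows, cols).write.mode("append").parquet(input_path)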

Reprocessing strategy

If you need to reset the state and reprocess everything (see the sketch after this list):

  1. Drop tables or schema.
  2. Delete checkpoints.
  3. Rerun the workflow.
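
A minimal sketch of that reset follows, assuming the nyc_taxi catalog with bronze/silver/gold schemas implied by the Gold query above, and an assumed checkpoint root of dbfs:/FileStore/nyc_taxi/checkpoints.

# 1. Drop the schemas (or individual tables) so the pipeline rebuilds them
for schema in ["bronze", "silver", "gold"]:
    spark.sql(f"DROP SCHEMA IF EXISTS nyc_taxi.{schema} CASCADE")

# 2. Delete the streaming checkpoints so all state starts from scratch
dbutils.fs.rm("dbfs:/FileStore/nyc_taxi/checkpoints", True)

# 3. Rerun the workflow from the Jobs UI, or programmatically:
# from databricks.sdk import WorkspaceClient
# WorkspaceClient().jobs.run_now(job_id=<your_job_id>)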

The source code is available in the GitHub repository for reference.
