About · 25-second overview
Video transcript
- At 0:00: Every morning, grocery stores face the same question: what do we reorder, what do we discount, what do we promote?
- At 0:04: Stockouts lose sales. Overstock turns into waste. Both quietly eat margin.
- At 0:09: Grocery Data Intelligence ingests raw point-of-sale and inventory data into a Bronze → Silver → Gold lakehouse on Databricks Free Edition.
- At 0:14: A Gold risk view scores every product in every store and ranks the actions that matter today.
- At 0:19: Store managers open one dashboard, see the day's plan, and act in minutes — not hours.
- At 0:23: Smarter inventory. Less waste. Happier shelves.
Audio companion — full walkthrough
For when you'd rather listen than read — what this app does, how it's built, and how a store uses it every morning.
0:00/6:59
voice enhanced by ai
A grocery data product
built for DAIS 2026
Grocery Data Intelligence is a Databricks Free Edition project, submitted to the DAIS 2026 Community Virtual Contest. It turns raw store data into a daily plan — what to reorder, what to discount, what to promote — for every store, every morning.
DAIS 2026 Community Virtual Contest entryBuilt by Brahmareddy · Databricks Community Esteemed Contributor·k.b.r••••12@gmail.com
The story — why this exists
Most grocery analytics tools stop at the dashboard. The store manager still has to translate charts into a morning routine: what to pull, what to mark down, what to push. That gap is where shrink and lost sales actually happen.
Grocery Data Intelligence closes that gap. It models the same lakehouse data every chain already has — POS, inventory, expiry, sales velocity — through a clean Bronze → Silver → Gold pipeline, and surfaces the Gold layer as a daily inventory story and a per-product action list.
It was built as a single-builder experiment on Databricks Free Edition for the DAIS 2026 Community Virtual Contest — to prove that an entire decision-grade data product can ship on the Lakehouse alone, without a separate analytics stack.
Medallion architecture on Databricks Free Edition
Every screen in this app is powered by a named Delta object in a single schema: workspace.grocery_data_intelligence.*. The data flows Bronze → Silver → Gold; the app only ever reads Gold.
┌──────────────────────────────────────────────────────────┐
│ BRONZE — raw, append-only │
│ bronze_sales_raw 5,150 rows +150/day │
└──────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────┐
│ SILVER — cleaned + validated masters │
│ silver_stores 10 silver_products 15 │
│ silver_sales 5,150 silver_inventory 150 │
└──────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────┐
│ GOLD — business metrics the app reads │
│ gold_inventory_risk_daily 150 rows / day │
│ + 7 business-logic views (KPI, risk, story, actions) │
└──────────────────────────────────────────────────────────┘gold_inventory_risk_daily
Per-store, per-product daily risk + recommended action (150 rows/day)
vw_dashboard_kpi_summary
Chain-wide KPI tiles on the Dashboard
vw_top_stockout_risk_products
Lowest days of supply, ranked
vw_top_waste_risk_products
Closest to expiry, ranked
vw_store_risk_summary
Per-store attention score
vw_category_risk_summary
Stockout vs waste by category
vw_today_inventory_story
Daily narrative for the Story page
gold_product_action_recommendations
Per-product recommended action (drives the Actions page)
Delta Lake gold views
Business logic lives in SQL, not in the app.
SQL Statement API
Server functions call Databricks directly — no extra warehouse.
SWR + cache-warm
Stale-while-revalidate cache keeps pages sub-second.
Risk model + recommended actions
Every recommendation in the app comes from these deterministic rules in the Gold view — auditable, explainable, and identical across the Dashboard, Actions, and Story pages.
Stockout risk
- High · < 2 days of stock remaining
- Medium · 2 – 5 days remaining
- Low · > 5 days remaining
Waste risk (perishables)
- High · ≤ 1 day until expiry
- Medium · 2 – 3 days until expiry
- Low · > 3 days until expiry
- N/A · non-perishable items
Recommended actions
Reorder Now — stockout imminent
Discount or Donate — expiry within 1 day
Promote Soon — expiry within 2 – 3 days
Monitor Closely — medium stockout risk
No Action Needed — optimal inventory
The daily 6 AM Workflow
One scheduled Databricks Workflow refreshes the entire pipeline every morning before stores open.
Schedule
Daily at 06:00 America/Chicago
cron: 0 0 6 * * ? *
Compute
Serverless Starter Warehouse · PRO 2X-Small · Photon enabled · 10-min auto-stop
Execution flow
Bronze append (+150 sales) → Silver clean & validate → Gold risk + actions → Pipeline summary
Data growth — 90-day projection
The dataset is live and growing. Each morning the Workflow appends another day of sales and recomputes risk for every store-product pair.
Day Bronze Silver Sales Gold Risk ───── ────── ──────────── ────────── Day 1 5,150 5,150 150 Day 30 9,650 9,650 4,500 Day 60 14,150 14,150 9,000 Day 90 18,650 18,650 13,500
Build this yourself on Databricks Free Edition
If you are a data engineer exploring the Lakehouse, you can stand up the same project in an afternoon. Everything below runs on Databricks Free Edition — no extra services required.
- 1Sign up for Databricks Free Edition and start a Serverless SQL Warehouse.
- 2Create a Unity Catalog schema:
CREATE SCHEMA workspace.grocery_data_intelligence. - 3Land raw POS into a Bronze Delta table (
bronze_sales_raw) — append-only, one row per transaction. - 4Clean into Silver:
silver_stores,silver_products,silver_sales,silver_inventorywith ETL timestamps. - 5Compute the Gold risk + recommended action in one SQL statement (sample below). Wrap derived metrics in views so the app stays thin.
- 6Schedule a daily Workflow at 06:00, and point any front-end at the result via the Databricks SQL Statement API.
-- Gold: daily inventory risk + recommended action
CREATE OR REPLACE TABLE workspace.grocery_data_intelligence.gold_inventory_risk_daily AS
SELECT
current_date() AS report_date,
s.store_id, s.store_name, s.city, s.state,
p.product_id, p.product_name, p.category,
i.current_stock,
COALESCE(d.avg_daily_demand, 0) AS avg_daily_demand,
i.days_until_expiry,
ROUND(i.current_stock / NULLIF(d.avg_daily_demand, 0), 2) AS days_of_stock_remaining,
CASE
WHEN i.current_stock / NULLIF(d.avg_daily_demand, 0) < 2 THEN 'High'
WHEN i.current_stock / NULLIF(d.avg_daily_demand, 0) <= 5 THEN 'Medium'
ELSE 'Low'
END AS stockout_risk_level,
CASE
WHEN i.days_until_expiry IS NULL THEN 'N/A'
WHEN i.days_until_expiry <= 1 THEN 'High'
WHEN i.days_until_expiry <= 3 THEN 'Medium'
ELSE 'Low'
END AS waste_risk_level,
CASE
WHEN i.current_stock / NULLIF(d.avg_daily_demand, 0) < 2 THEN 'Reorder Now'
WHEN i.days_until_expiry <= 1 THEN 'Discount or Donate'
WHEN i.days_until_expiry <= 3 THEN 'Promote Soon'
WHEN i.current_stock / NULLIF(d.avg_daily_demand, 0) <= 5 THEN 'Monitor Closely'
ELSE 'No Action Needed'
END AS recommended_action,
current_timestamp() AS gold_created_at
FROM workspace.grocery_data_intelligence.silver_inventory i
JOIN workspace.grocery_data_intelligence.silver_stores s USING (store_id)
JOIN workspace.grocery_data_intelligence.silver_products p USING (product_id)
LEFT JOIN (
SELECT store_id, product_id, AVG(quantity_sold) AS avg_daily_demand
FROM workspace.grocery_data_intelligence.silver_sales
WHERE sale_date >= current_date() - INTERVAL 14 DAYS
GROUP BY store_id, product_id
) d USING (store_id, product_id);Everything you see in this app comes from those six steps — no extra services, no second warehouse, no separate BI stack.
Project journey
Bronze — raw POS
Landed 5,150 raw sales transactions (Jan 24 – May 17, 2026) into bronze_sales_raw on Delta. Grows by +150 rows every day.
Silver — cleaned masters
Cleaned and validated into silver_stores (10), silver_products (15), silver_sales (5,150), and silver_inventory (150) with ETL timestamps.
Gold — risk + actions
Built gold_inventory_risk_daily plus seven business-logic views — KPI summary, stockout risk, waste risk, store/category risk, product actions, and a daily inventory story.
Intent-mapped Ask Data
Natural-language questions are classified into a small set of safe business intents that resolve to Gold-view queries — no raw SQL ever crosses the boundary.
Caching & performance
Server-side stale-while-revalidate cache with in-flight request dedup, plus a token-gated cache-warm endpoint for sub-second page loads.
DAIS 2026 entry
Recorded a short demo, wrote a Community Articles post, and submitted the form between May 15–31, 2026.
See today's inventory story
Open the dashboard, or read the Methodology page to see every Gold view, field, and decision rule behind this submission.