Metric Decomposition Engine

An automated system that investigates why a metric changed — breaking down the movement across dimensions, ranking root causes by impact, and generating plain-English incident reports.

This solves the most common ad-hoc request product analysts get: "DAU dropped 12% on Monday — why?"

Instead of spending hours slicing data manually, this engine runs the investigation automatically and outputs a stakeholder-ready report in seconds.

Live Demo

→ Open Metric Change Investigator

No setup needed. Load sample GitHub Archive data and run a decomposition in 30 seconds.

What It Does

Takes any event-based metric (e.g., PushEvents, PullRequestEvents) across two time periods
Decomposes the change across multiple dimensions (actor, repository, organization)
Ranks each segment's contribution to the total change
Generates a plain-English incident report with executive summary, dimensional breakdowns, root cause hypothesis, and recommended next steps
Provides an interactive Streamlit dashboard for running decompositions on demand

Example Output

METRIC CHANGE INCIDENT REPORT

EXECUTIVE SUMMARY
Count of PushEvent decreased by 4.2% (382,104 → 366,058, change of -16,046) between 2024-01-15 and 2024-01-22.

BREAKDOWN BY: repo_name
The largest contributor was 'example/repo' with a decrease of 1,203 events (7.5% of total change).

ROOT CAUSE HYPOTHESIS
The primary driver of this metric change appears to be 'example/repo' (dimension: repo_name), accounting for 7.5% of the total change.

(Numbers above are illustrative. Actual output depends on the data loaded.)
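
The executive summary sentence follows directly from the two period totals. The sketch below reproduces it with plain string formatting. The helper name `executive_summary` and its parameters are hypothetical, and the project itself lists Jinja2 for templating, so treat this as an illustration of the arithmetic rather than the actual report code.

```python
def executive_summary(metric, baseline, comparison, start, end):
    """Build a one-sentence summary from two period totals (hypothetical helper)."""
    change = comparison - baseline
    # A percentage change is undefined when the baseline is zero.
    pct = abs(change) / baseline * 100 if baseline else float("nan")
    direction = "increased" if change >= 0 else "decreased"
    return (
        f"Count of {metric} {direction} by {pct:.1f}% "
        f"({baseline:,} → {comparison:,}, change of {change:+,}) "
        f"between {start} and {end}."
    )


summary = executive_summary("PushEvent", 382_104, 366_058, "2024-01-15", "2024-01-22")
print(summary)
```

With the illustrative totals above, this prints the same sentence shown under EXECUTIVE SUMMARY.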

Tech Stack

Python — core decomposition logic, data pipeline, report generation
PostgreSQL — event storage and dimensional queries
pandas — data manipulation and aggregation
SQLAlchemy — database connectivity
Streamlit — interactive dashboard UI
Jinja2 — report templating

Project Structure

metric-decomposition-engine/
├── data/
│   └── raw/                  # GitHub Archive .json.gz files (gitignored)
├── notebooks/                # Exploratory analysis
├── src/
│   ├── load_gharchive.py     # Data pipeline: GH Archive → PostgreSQL
│   ├── decompose.py          # Core decomposition engine
│   ├── report_generator.py   # Plain-English report builder
│   └── app.py                # Streamlit dashboard
├── reports/                  # Generated incident reports
├── tests/
├── .env                      # Database config (gitignored)
├── .gitignore
├── requirements.txt
└── README.md

Setup

Prerequisites

Python 3.9+
PostgreSQL
Git

Installation

Clone the repository:

   git clone https://github.com/SomeshZanwar/metric-decomposition-engine.git
   cd metric-decomposition-engine

Create and activate a virtual environment:

   python -m venv venv
   venv\Scripts\activate        # Windows
   source venv/bin/activate     # Mac/Linux

Install dependencies:

   pip install -r requirements.txt

Create a PostgreSQL database:

   CREATE DATABASE metric_decomposition;

Create a .env file in the project root:

   DB_HOST=localhost
   DB_PORT=5432
   DB_NAME=metric_decomposition
   DB_USER=postgres
   DB_PASSWORD=your_password

Download GitHub Archive data:

   cd data/raw
   curl -L -O https://data.gharchive.org/2024-01-15-0.json.gz
   curl -L -O https://data.gharchive.org/2024-01-22-0.json.gz
   cd ../..
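
Each downloaded file is gzip-compressed, with one JSON event per line. To inspect a file before loading it, a sketch like the following can help. The function name `iter_events` and the output keys other than `repo_name` are assumptions made for illustration, not necessarily what `load_gharchive.py` uses.

```python
import gzip
import json


def iter_events(path):
    """Yield one flat dict per event from a GH Archive .json.gz file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            event = json.loads(line)
            yield {
                "event_type": event.get("type"),
                "actor": (event.get("actor") or {}).get("login"),
                "repo_name": (event.get("repo") or {}).get("name"),
                # "org" is absent for events on repositories without an organization.
                "org": (event.get("org") or {}).get("login"),
                "created_at": event.get("created_at"),
            }
```

For example, `next(iter_events("data/raw/2024-01-15-0.json.gz"))` returns the first event in the file as a flat dict.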

Load data into PostgreSQL:

   python src/load_gharchive.py
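
The loader needs a connection string built from the .env values. How `load_gharchive.py` does this is not shown here. One possible approach, as a sketch: parse the KEY=VALUE lines and assemble a PostgreSQL URL of the form SQLAlchemy accepts. The helper names `parse_env` and `database_url` are hypothetical.

```python
from urllib.parse import quote_plus


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def database_url(env):
    """Build a postgresql:// URL from the DB_* settings."""
    # Escape characters such as @ or / that would otherwise break the URL.
    password = quote_plus(env["DB_PASSWORD"])
    return (
        f"postgresql://{env['DB_USER']}:{password}"
        f"@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
    )


example_env = parse_env(
    "DB_HOST=localhost\nDB_PORT=5432\nDB_NAME=metric_decomposition\n"
    "DB_USER=postgres\nDB_PASSWORD=p@ss/word\n"
)
url = database_url(example_env)
```

The resulting string can be passed to `sqlalchemy.create_engine`. The password is percent-encoded so that special characters do not get mistaken for URL delimiters.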

Usage

Run decomposition from command line:

   python src/decompose.py

Generate an incident report:

   python src/report_generator.py

Launch the interactive dashboard:

   cd src
   streamlit run app.py

How the Decomposition Works

The engine uses a dimensional drilldown approach:

Baseline vs Comparison: Counts the target metric for each time period
Dimensional Split: Groups the metric by each dimension (actor, repo, org)
Segment Change: Calculates the absolute change per segment between periods
Contribution Ranking: Expresses each segment's change as a percentage of the total change
Root Cause Hypothesis: Identifies the dimension and segment with the highest absolute contribution

This mirrors how internal analytics tools at companies like Meta, Uber, and Airbnb investigate metric movements — except packaged as a reusable, open-source system.
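
The five steps above can be sketched in a few lines. This version uses plain Python counters instead of pandas and SQL so that it runs without dependencies. The function name `decompose` and the result keys are assumptions for illustration, not the actual interface of `decompose.py`.

```python
from collections import Counter


def decompose(baseline_rows, comparison_rows, dimensions):
    """Rank segments by their contribution to the total change in event count.

    Each row is a dict describing one event; `dimensions` lists the keys to split by.
    """
    total_change = len(comparison_rows) - len(baseline_rows)
    results = []
    for dim in dimensions:
        before = Counter(row.get(dim) for row in baseline_rows)
        after = Counter(row.get(dim) for row in comparison_rows)
        # Union of keys, so segments present in only one period still count.
        for segment in before.keys() | after.keys():
            change = after[segment] - before[segment]
            share = change / total_change * 100 if total_change else 0.0
            results.append({
                "dimension": dim,
                "segment": segment,
                "baseline": before[segment],
                "comparison": after[segment],
                "change": change,
                "pct_of_total_change": share,
            })
    # Largest absolute movement first; the top row is the root cause hypothesis.
    results.sort(key=lambda r: (-abs(r["change"]), r["dimension"], str(r["segment"])))
    return results


baseline = [{"repo_name": "example/a"}] * 4 + [{"repo_name": "example/b"}] * 2
comparison = (
    [{"repo_name": "example/a"}]
    + [{"repo_name": "example/b"}] * 2
    + [{"repo_name": "example/c"}]
)
ranked = decompose(baseline, comparison, ["repo_name"])
top = ranked[0]
```

In this toy input the total falls from 6 to 4 events. `example/a` loses 3 events while `example/c` gains 1, so the top segment accounts for 150% of the total change and the new segment for -50%. Shares within one dimension always sum to 100%, but an individual share can exceed 100% or be negative when segments move in opposite directions.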

Dataset

Uses GitHub Archive — a public dataset that records every event on public GitHub repositories (pushes, pull requests, issues, forks, stars) with hourly granularity since 2011.

This dataset was chosen over synthetic data because:

Real-world scale and messiness
Known events (GitHub outages, feature launches) serve as natural test cases for the decomposition
Multiple natural dimensions for drilldown (user, repo, org, event type)
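
Because the archive is published as one file per hour, covering a full day means fetching 24 files rather than the single hour-0 file used in the setup steps. A small sketch for generating the URLs follows; the helper name `hourly_urls` is hypothetical, and the URL pattern is the one used in the download commands earlier in this README.

```python
from datetime import date

BASE_URL = "https://data.gharchive.org"


def hourly_urls(day, hours=range(24)):
    """Return the archive URL for each requested hour of `day`."""
    # The hour in the file name is not zero-padded (…-0.json.gz, …-23.json.gz).
    return [f"{BASE_URL}/{day.isoformat()}-{hour}.json.gz" for hour in hours]


urls = hourly_urls(date(2024, 1, 15))
```

Comparing full days, or the same hours on both dates, avoids mistaking normal intraday variation for a metric change.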

What I Learned

Metric decomposition is conceptually simple but operationally tricky — edge cases like segments that appear in only one period, or dimensions with high cardinality, required careful handling
The report generator taught me that the hardest part of analytics is not the calculation — it is translating the result into something a non-technical stakeholder can act on
Building the Streamlit layer showed how much difference interactivity makes when the same analysis exists as a script vs. a tool someone can actually use
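
The two edge cases named in the first point can be made concrete. The sketch below shows one possible way to handle them, not necessarily the approach taken in this project: treat a segment missing from one period as a count of zero, and fold the long tail of a high-cardinality dimension into a single bucket. The name `bucket_by_change` and the "(other)" label are assumptions.

```python
from collections import Counter


def bucket_by_change(before, after, top_n=10, other_label="(other)"):
    """Keep the top_n segments by absolute change and merge the rest into one bucket."""
    # Counter returns 0 for missing keys, which covers segments seen in one period only.
    changes = {seg: after[seg] - before[seg] for seg in before.keys() | after.keys()}
    ranked = sorted(changes, key=lambda seg: (-abs(changes[seg]), str(seg)))
    result = {seg: changes[seg] for seg in ranked[:top_n]}
    tail = ranked[top_n:]
    if tail:
        result[other_label] = sum(changes[seg] for seg in tail)
    return result


before = Counter({"a": 4, "b": 2, "d": 1})
after = Counter({"a": 1, "b": 2, "c": 1, "d": 2})
bucketed = bucket_by_change(before, after, top_n=2)
```

Bucketing by change rather than by raw volume keeps small but fast-moving segments visible, and the bucketed changes still add up to the total change.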

Future Improvements

Add time-series decomposition (not just two-point comparison)
Support for custom SQL-defined metrics beyond event counts
Statistical significance testing for segment-level changes
Automated anomaly detection to trigger decompositions without manual date selection
OpenAI API integration for more nuanced natural-language summaries
