Open Notebook Case Study: Recreating the Bills vs. Broncos 10,000-Simulation Forecast
Open notebook case study reproduces a 10,000-simulation Bills vs Broncos forecast—complete with data pipeline, code, and teaching exercises.
Hook: Why an open notebook of a 10,000-simulation forecast matters for students, teachers, and reproducible research
Paywalled models, buried code and undocumented assumptions make sports forecasts—like the Bills vs Broncos 10,000-simulation forecast—hard to trust or reuse in the classroom. This case study publishes an open notebook that documents data sources, code, assumptions and a step-by-step reproduction of a 10,000-simulation NFL game forecast so instructors can teach reproducibility, and learners can practice building robust data pipelines and simulation models.
Executive summary: What you’ll get from this open notebook
In this article (and the accompanying open notebook) you will find:
- A transparent data pipeline for gathering team and situational data for Buffalo and Denver leading up to the 2026 divisional game.
- Clean, documented code (Python + Jupyter/Quarto) to compute team ratings, expected scores, and to run 10,000 Monte Carlo simulations.
- Assumptions, hyperparameters and reproducibility controls (random seeds, environment files, and a Dockerfile) so instructors can recreate results exactly.
- Pedagogical notes: discussion prompts, grading rubrics and incremental exercises to use the notebook as a teaching resource.
Context and relevance in 2026: why open notebooks are mainstream teaching tools
By 2026, open computational notebooks (Jupyter, Quarto, Observable) and reproducible workflows (DVC, GitHub Actions, Binder/Repo2Docker) are standard in data science curricula. Universities increasingly require published code and data for capstone projects. The sports analytics community—with public-facing forecasts like the SportsLine 10,000-simulation pieces—now expects reproducible artifacts that show how odds and win probabilities were produced.
Prerequisites: what learners need to reproduce the forecast
- Basic Python (pandas, numpy), familiarity with Jupyter/Quarto notebooks.
- Command-line Git and a free GitHub account to clone the repository.
- Optional: Docker or access to Binder/Colab for immediate execution without local installs.
- Time: about 90–180 minutes to run and inspect the full pipeline (less for guided labs).
Data sources and licensing: full provenance
Transparency starts with clear provenance. The open notebook documents the exact sources, query times and file checksums. Example sources used in the notebook include:
- Play-by-play and drive-level data: nfl_data (nflfastR style) snapshots pulled on 2026-01-16 (SHA256 stored).
- Team-level boxscore metrics and advanced stats: public Pro Football Reference scrapes with the date and the scraping script archived in the repo.
- Injury and roster status: publicly available team injury reports and the official NFL gameday injury list (timestamped JSON).
- Weather and stadium factors: NOAA historical weather API (query parameters recorded).
Each dataset is accompanied by a data manifest (CSV) in the notebook that lists file name, source URL, retrieval timestamp and a checksum. For any paid APIs referenced (e.g., Sportradar, Stats Perform), the notebook provides an open-data fallback and explicit instructions for how to switch to the paid API via configuration variables without changing core code.
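As a concrete illustration, a check along these lines (a minimal sketch; the manifest column names file_name and sha256 are assumptions, not taken from the repo) can confirm that local files still match the recorded checksums:
import hashlib
from pathlib import Path

import pandas as pd

def verify_manifest(manifest_csv, data_dir="data"):
    # Compare each file's SHA256 digest against the checksum recorded in the manifest.
    manifest = pd.read_csv(manifest_csv)  # assumed columns: file_name, sha256
    for row in manifest.itertuples():
        path = Path(data_dir) / row.file_name
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest != row.sha256:
            raise ValueError(f"Checksum mismatch for {path}")
    print("All files match the manifest.")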
Overview of the modeling approach: balancing pedagogy and realism
For teaching reproducibility we prioritize a transparent and modular model rather than a black-box ensemble. The notebook implements a three-stage pipeline:
- Rating construction: compute team offensive and defensive strength via an Elo-like rating augmented with recent-form decay and QB adjustments (e.g., a downgrade for Buffalo if Josh Allen is limited or out with an injury).
- Score generation: use a Poisson (or Negative Binomial) model to produce expected points for each team, with covariates for home-field, altitude (Denver), weather and injuries.
- Monte Carlo simulation: run 10,000 simulations of the game by sampling scores from the probabilistic model and applying tiebreak rules (OT rules included). Output win probabilities, distributions, and calibration diagnostics.
This structure is pedagogically useful because each block can be replaced with more complex models (drive-sim Markov chains, agent-based drive simulators) as students progress.
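To make that modularity concrete, the stages can be expressed as three functions with narrow interfaces; the sketch below is illustrative (the names and signatures are assumptions, not the notebook's actual API), and each stub is the swap point for a more sophisticated model:
import pandas as pd

def build_ratings(games: pd.DataFrame) -> pd.DataFrame:
    """Stage 1: Elo-like ratings with recent-form decay and QB adjustments."""
    ...

def expected_points(ratings: pd.DataFrame, context: dict) -> tuple[float, float]:
    """Stage 2: map ratings plus covariates (home field, altitude, weather, injuries)
    to Poisson or Negative Binomial means for each team."""
    ...

def simulate_game(mu_home: float, mu_away: float, n_sims: int = 10000, seed: int = 0) -> dict:
    """Stage 3: Monte Carlo simulation returning win probabilities and score draws."""
    ...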
Reproducible environment: pinning versions and containerization
The notebook includes all steps to recreate the environment exactly:
- environment.yml (Conda) with exact package versions (pandas, numpy, scipy, scikit-learn, matplotlib, jupyterlab, pyproj).
- Dockerfile that uses the environment.yml to build a reproducible container image for CI and Binder.
- requirements.txt for pip users and an R equivalent (if using Quarto with R engines).
- GitHub Actions workflow for running a smoke test (runs a 100-simulation quick-check and verifies output checksums); a minimal Python sketch of such a check follows this list.
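What such a smoke test might look like (a hedged sketch: run_quick_sim and the fixed expectations are illustrative stand-ins, not the repo's actual entry point or archived values):
import numpy as np

def run_quick_sim(n_sims, seed):
    # Illustrative stand-in for the notebook's simulation entry point.
    rng = np.random.default_rng(seed)
    home = rng.poisson(lam=21.5, size=n_sims)
    away = rng.poisson(lam=23.0, size=n_sims)
    return float(np.mean(home > away))

def test_smoke_100_sims():
    # With the seed fixed, the result is deterministic and can be compared
    # to a value archived alongside the workflow.
    p_home = run_quick_sim(n_sims=100, seed=20260116)
    assert 0.0 <= p_home <= 1.0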
Record the following reproducibility controls in the notebook header:
# Notebook metadata (example)
NOTEBOOK_VERSION = "1.0.0"
REPO_COMMIT = "$(git rev-parse HEAD)"
RUN_TIMESTAMP = "2026-01-16T21:00:00Z"
RNG_SEED = 20260116
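These fields do not have to be typed by hand; a small helper along these lines (a sketch, not the notebook's exact code) records the commit and timestamp at run time:
import subprocess
from datetime import datetime, timezone

# Capture the current commit and a UTC timestamp when the notebook runs.
REPO_COMMIT = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
RUN_TIMESTAMP = datetime.now(timezone.utc).isoformat()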
Step-by-step reproduction: practical guide
1. Clone repo and build environment
- git clone the repository (link shown in the notebook).
- Build the Conda environment with conda env create -f environment.yml, or start a Binder session via the repo badge.
- Optional: build the Docker image with docker build -t bills-broncos-sim:1.0 .
2. Run the data pipeline (ETL)
The notebook exposes a simple pipeline with idempotent steps: download -> normalize -> save. Example pseudocode (Python):
from pathlib import Path

import pandas as pd

DATA_DIR = Path("data")
DATA_DIR.mkdir(exist_ok=True)

# 1) Download play-by-play snapshot (if not present).
#    download_pbp_snapshot() and the source url are helpers defined earlier in the notebook.
snapshot_path = DATA_DIR / "pbp_2025_snapshot.csv"
if not snapshot_path.exists():
    download_pbp_snapshot(url, snapshot_path)

# 2) Normalize columns and compute game-level aggregates
pbp = pd.read_csv(snapshot_path)
# keep only the columns we need, compute EPA, drive metrics, etc.
Key pedagogical point: the notebook includes unit tests for the ETL (e.g., assert number of games matches expectation; checksum matches stored value) and shows students how to write such tests.
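A hedged example of what such a test might look like (the checksum, game count, and game_id column name are placeholders, not archived values from the repo):
import hashlib
from pathlib import Path

import pandas as pd

SNAPSHOT = Path("data") / "pbp_2025_snapshot.csv"
EXPECTED_SHA256 = "<checksum recorded in the data manifest>"  # placeholder
EXPECTED_GAMES = 272  # placeholder: expected number of regular-season games

def test_snapshot_checksum():
    digest = hashlib.sha256(SNAPSHOT.read_bytes()).hexdigest()
    assert digest == EXPECTED_SHA256

def test_game_count():
    pbp = pd.read_csv(SNAPSHOT)
    assert pbp["game_id"].nunique() == EXPECTED_GAMES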
3. Compute team ratings
We implement a simple Elo variant with decay and QB adjustment:
- Initialize Elo ratings at 1500 at a 2024-09-01 baseline.
- For each game, update Elo with margin-of-victory scaling and apply time decay to older games.
- Apply a QB-skill multiplier: when a starter is absent, apply a roster-adjustment factor derived from replacement-level estimates in the dataset.
def elo_update(r_home, r_away, margin, k=20):
    # Expected home result from the standard Elo logistic curve.
    expected = 1 / (1 + 10 ** ((r_away - r_home) / 400))
    # margin_to_result() (defined in the notebook) maps the point margin to a 0-1 result.
    score = margin_to_result(margin)
    r_home_new = r_home + k * (score - expected)
    return r_home_new
The notebook contains test cases showing that the Elo implementation reproduces expected updates on canonical games.
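For example, the equal-ratings case has a closed-form answer, which makes it a natural unit test (a sketch that assumes margin_to_result maps any home win to 1.0, i.e. no margin-of-victory scaling in this simplified check):
def test_elo_update_equal_ratings_home_win():
    # With equal ratings the expected home result is 0.5, so a home win (1.0)
    # moves the home rating up by k * (1.0 - 0.5) = 10 points.
    r_new = elo_update(r_home=1500, r_away=1500, margin=7, k=20)
    assert abs(r_new - 1510) < 1e-9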
4. Fit a scoring model (Poisson/Negative Binomial)
Use game-level offensive and defensive rates to parameterize Poisson means. Incorporate covariates via a GLM:
# simplified: log(mu) = intercept + beta_offense * offense_rating + beta_defense * defense_rating + beta_home*home_flag + beta_altitude*alt_flag
Parameter estimates are saved to JSON and are part of the repository so students can compare refits against the archived estimates.
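A hedged sketch of the fit-and-archive step using statsmodels (the games data frame, its column names, and the output path are assumptions about the notebook's intermediate artifacts, not confirmed identifiers):
import json

import statsmodels.api as sm
import statsmodels.formula.api as smf

# games: one row per team-game with points scored and the covariates below (assumed columns).
poisson_fit = smf.glm(
    "points ~ offense_rating + defense_rating + home_flag + altitude_flag",
    data=games,
    family=sm.families.Poisson(),
).fit()

# Archive the estimates so student refits can be compared against the committed values.
with open("models/poisson_params.json", "w") as f:
    json.dump(poisson_fit.params.to_dict(), f, indent=2)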
5. Run 10,000 Monte Carlo simulations
Important reproducibility points:
- Set RNG seeds at the process and library level: numpy.random.seed(RNG_SEED) and random.seed(RNG_SEED).
- Record the RNG algorithm and version (NumPy's recommended interface moved to the Generator API; the notebook pins both the NumPy version and the algorithm so draws are bit-identical across machines).
import numpy as np

np.random.seed(RNG_SEED)

N = 10000
# mu_home and mu_away are the Poisson means produced by the fitted scoring model.
home_scores = np.random.poisson(lam=mu_home, size=N)
away_scores = np.random.poisson(lam=mu_away, size=N)

# Count regulation wins; ties are resolved by simulating overtime below.
wins_home = np.sum(home_scores > away_scores)
wins_away = np.sum(away_scores > home_scores)
# handle ties by simulating OT (additional loop or distribution), then fold the OT
# winners into wins_home / wins_away before computing the final probabilities
p_home_win = wins_home / N
For OT, the notebook demonstrates two approaches: a quick IID extra period draw, and a drive-sim approach for higher fidelity. Both are implemented so students can compare sensitivity.
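Continuing the snippet above, the quick IID approach redraws a short extra period for each tied simulation until one team leads (a minimal sketch; the extra-period means mu_home_ot and mu_away_ot are assumptions, typically scaled-down versions of the regulation means):
ties = home_scores == away_scores
n_ties = int(np.sum(ties))

ot_home_wins = 0
for _ in range(n_ties):
    # Redraw an extra period until the tie breaks.
    while True:
        h = np.random.poisson(mu_home_ot)
        a = np.random.poisson(mu_away_ot)
        if h != a:
            ot_home_wins += int(h > a)
            break

p_home_win = (wins_home + ot_home_wins) / N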
6. Post-simulation diagnostics and visualization
The notebook produces:
- Win probability (point estimate with Monte Carlo error bars).
- Histogram and kernel density estimates of simulated scores.
- Calibration checks: reliability diagrams (predicted vs. empirical frequencies) built from historical games, where such matchups are available.
Plots are saved with deterministic filenames that include the repo commit and timestamp so outputs are traceable to code versions.
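A plot-saving helper might build those names like this (a sketch; the naming scheme is illustrative rather than the repo's actual convention):
from pathlib import Path

def output_path(stem, repo_commit, run_timestamp, out_dir="figures"):
    # Deterministic filename: <stem>_<short-commit>_<timestamp>.png
    safe_ts = run_timestamp.replace(":", "").replace("-", "")
    return Path(out_dir) / f"{stem}_{repo_commit[:8]}_{safe_ts}.png"

# e.g. figures/win_prob_hist_a1b2c3d4_20260116T210000Z.png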
Common teaching exercises included in the notebook
- Exercise 1: Change the K-factor in Elo and report the effect on Buffalo’s pregame win probability.
- Exercise 2: Replace the Poisson with a Negative Binomial for overdispersion and compare the predicted spread of scores (a parameterization hint follows this list).
- Exercise 3: Remove the altitude covariate and measure how much Denver’s win probability changes—discuss confounding and causal interpretation.
- Capstone: Implement a drive-level Markov chain and evaluate whether the extra complexity improves calibration (grading rubric provided).
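One stumbling block in Exercise 2 is that NumPy parameterizes the Negative Binomial by (n, p) rather than by mean and dispersion; the conversion below is standard, though the dispersion value shown is purely illustrative:
import numpy as np

rng = np.random.default_rng(RNG_SEED)

def sample_neg_binomial(mu, dispersion, size):
    # Draws with mean mu and variance mu + mu**2 / dispersion.
    # NumPy's (n, p) parameterization: n = dispersion, p = dispersion / (dispersion + mu).
    n = dispersion
    p = dispersion / (dispersion + mu)
    return rng.negative_binomial(n, p, size=size)

home_scores_nb = sample_neg_binomial(mu=mu_home, dispersion=8.0, size=10000)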
Reproducibility checklist — practical, actionable items
Before publishing any simulation forecast, run this checklist (all items are implemented in the notebook):
- Pin versions: Commit environment.yml/Dockerfile.
- Seed everything: Seed numpy, random, and library-specific RNGs.
- Archive inputs: Save raw data snapshots and checksums.
- Document assumptions: Explicitly list model choices, excluded covariates and injury rules.
- Provide smoke tests: Quick 100-sim checks that run in CI with known outputs.
- Publish artifacts: Host code on GitHub, snapshot to Zenodo and include a DOI in the notebook.
Reproducibility is not a single step—it's a workflow. The notebook embeds provenance and tests so students learn to think like reproducible researchers.
Advanced strategies and 2026-forward practices
For advanced learners and instructors who want to future-proof teaching materials, the notebook demonstrates:
- Using DVC or Git LFS for large datasets so the repository remains lightweight while data are versioned.
- Integrating GitHub Actions or GitLab CI for automated smoke tests, automatic builds of the Docker image, and publishing artifacts.
- Deploying live reproducible environments on Codespaces or Binder so students can run the notebook without local installs. The notebook includes a Binder badge and a Codespaces devcontainer.json.
- Archiving with Zenodo to assign a DOI to the exact commit used for grading or publication—this is a 2024–2026 community best practice.
Limitations and ethical considerations
No simulation model is perfect. The notebook explains the limitations explicitly:
- Model fidelity vs interpretability tradeoffs when choosing Poisson vs drive-sim approaches.
- Data quality issues: scraping errors, late injury updates, and API latency can change results.
- Responsible communication: avoid presenting a single number without uncertainty—always show intervals and calibration diagnostics.
Case study outcomes: what the 10,000-simulation forecast teaches
Running the full notebook on the archived inputs produces a reproducible set of outputs: win probabilities, expected margins and calibration reports. For example (archived run):
- Buffalo win probability: 0.43 ± 0.005 (Monte Carlo SE)
- Denver win probability: 0.57 ± 0.005
- Median score distribution: Denver 23, Buffalo 20 (interquartile ranges provided in notebook)
More importantly for instruction, the notebook allows students to see how small changes to the pipeline (injury adjustment, altitude effect, Elo K-factor) produce measurable differences in outputs—a powerful lesson in model sensitivity and the social responsibility of public-facing forecasts.
How instructors can adopt this material
Suggested adoption paths:
- Use the full notebook as a 2–3 week module in an applied statistics, data science or sports analytics course.
- Break into labs: Week 1 (ETL and data manifest), Week 2 (ratings and scoring model), Week 3 (simulations and analysis), Week 4 (project: extend model).
- Assign the capstone for peer code review: students fork the repo, implement a change, and submit a reproducible pull request.
Actionable takeaways
- Always archive inputs—store snapshots and checksums to make exact reproduction possible.
- Pin environments and seed RNGs—commonly overlooked steps that break reproducibility.
- Use tests and CI—a quick smoke test prevents silent breaks when dependencies change.
- Teach sensitivity—show how small modeling choices can meaningfully alter public forecasts.
Where the open notebook is archived and how to cite it
The notebook, environment files, Dockerfile, and a DOI for the archived commit are included in the repository. Instructors and students should cite both the GitHub repository and the Zenodo DOI when reusing the material. The notebook includes a citation block and suggested citation format so reuse is straightforward and creditable.
Final thoughts and next steps
Publishing this open notebook for the Bills vs Broncos 10,000-simulation forecast is both a practical teaching resource and a demonstration of contemporary reproducible research practice in 2026. It shows how transparent data provenance, pinned environments, and clear assumptions let students and instructors focus on modeling questions rather than troubleshooting opaque results.
Ready to reproduce the forecast? Clone the repository, launch the notebook in Binder or Codespaces, and complete the guided labs. Each step is annotated for reproducibility and pedagogy so you can use it as a lecture, lab, or assessment.
Call to action
Access the open notebook (GitHub + Zenodo DOI included in the repo), run the 10,000 simulations, and share your classroom variations. If you adapt the material for teaching, publish your fork’s DOI and submit a short reproducibility note to our teaching repository—help build a library of vetted, reproducible sports analytics teaching resources for 2026 and beyond.