Collecting Data in Gaming: The Hunt for Riftbound’s Second Expansion
How a volunteer community, telemetry, and research-grade data management practices converged during the Riftbound expansion campaign — and what researchers and designers can learn about reproducibility, consent, and community engagement.
Introduction: Why this case matters to researchers and designers
Riftbound’s expansion as a living lab
The announcement of Riftbound’s second expansion transformed a fandom into an active data-generating ecosystem. Players began logging builds, sharing playtests, and coalescing around balance hypotheses. That grassroots data collection mirrored many features of formal research projects: protocols, version control, and iterative analysis. In this guide we use Riftbound as a concrete case study to examine how gaming communities collect, manage, and reuse data, and how those practices align with established research data management (RDM) and reproducibility principles.
Why gaming data is high-value and high-risk
Data produced by games includes telemetry, chat logs, forum polls, and user-submitted playtests. This combination is valuable for designers, academics studying user experience, and community managers aiming to improve engagement. At the same time, it brings privacy, security, and trust issues that researchers routinely face. Lessons from digital security reporting, such as the analysis of the WhisperPair vulnerability, are directly relevant when community datasets contain personal identifiers or sensitive behavioral traces.
Who should read this guide
This deep dive is for game designers, community managers, academic researchers, and advanced players who run playtests. If you want reproducible results, ethical consent workflows, or a practical step-by-step to run a community-driven data collection campaign, the next sections are a practical blueprint grounded in real-world examples.
Section 1 — Motivations: Why communities collect data
Balancing and meta discovery
Players collect data to discover overperforming builds, nerf/buff candidates, and emergent strategies. In Riftbound’s expansion campaign, competitive clans ran systematic queues to estimate win rates and power curves, echoing formal experiments with control arms and repeated measures. These grassroots analytics often use spreadsheets, shared trackers, and communal dashboards to aggregate tens of thousands of game outcomes.
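To make that aggregation concrete, here is a minimal sketch, assuming Python and illustrative numbers, of the kind of win-rate estimate a shared tracker might compute. A Wilson score interval is one reasonable choice because it stays sensible at the small sample sizes a single clan's queue produces; the function and values below are hypothetical, not the community's actual tooling.

```python
import math

def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for an observed win rate.

    Unlike the naive plus/minus interval, it behaves sensibly for the
    small samples typical of a single clan's playtest queue.
    """
    if games == 0:
        return (0.0, 1.0)
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2))
    return (center - margin, center + margin)

# e.g., a build that won 61 of 100 logged matches (hypothetical counts)
low, high = wilson_interval(61, 100)
print(f"win rate {61 / 100:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```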
Content creation and visibility
Streams and videos build narratives around balance shifts. Community members use collected statistics to craft persuasive content — think patch breakdowns or hero tier lists — which in turn drive engagement. Streaming and review dynamics influence which data projects gain traction, a phenomenon reminiscent of how live reviews shape audience attention in other domains.
Research curiosity and modding
Some players approach game data as an experimental platform. Researchers and modders test hypotheses about reward schedules, user retention, and emergent cooperative systems. Community-powered experimentation is fertile ground for insights that inform design, and it often leverages practices from other creative domains, such as cross-disciplinary lessons on crafting impactful experiences from the art world.
Section 2 — Anatomy of Riftbound’s data collection campaign
Kickoff and community organization
Shortly after the leak of expansion patch notes, Riftbound community leads posted organized playtest schedules in Discord and Reddit. They set explicit playtest windows, variables to change (e.g., item X’s cooldown), and data templates for logging match outcomes. That first step — agreeing on a protocol — is the single most important activity for ensuring comparable, interpretable results.
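As a concrete illustration, the agreed logging template might look like the following minimal Python sketch. The field names, file name, and example values are assumptions for illustration, not Riftbound's actual schema.

```python
from dataclasses import dataclass, asdict
import csv
import datetime
import os

# Hypothetical template mirroring the fields agreed before testing began.
@dataclass
class MatchRecord:
    match_id: str
    playtest_window: str       # e.g. "window-1-evening"
    variable_under_test: str   # e.g. "item_x_cooldown=8s"
    player_class: str
    outcome: str               # "win" | "loss" | "draw"
    logged_at: str             # ISO-8601 UTC timestamp

def append_record(path: str, record: MatchRecord) -> None:
    """Append one match to a shared CSV tracker, writing a header
    the first time the file is created."""
    row = asdict(record)
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if new_file:
            writer.writeheader()
        writer.writerow(row)

append_record("playtest_log.csv", MatchRecord(
    match_id="m-0001",
    playtest_window="window-1-evening",
    variable_under_test="item_x_cooldown=8s",
    player_class="warden",
    outcome="win",
    logged_at=datetime.datetime.now(datetime.timezone.utc).isoformat(),
))
```

Agreeing on a fixed record shape up front is what lets hundreds of volunteers contribute rows that remain comparable weeks later.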
Channels and artifacts
Data artifacts included annotated spreadsheets, telemetry extracts from client-side logs, VOD timestamps for verification, and user surveys about subjective experience. Forums aggregated qualitative reports while automated scripts parsed log files to extract structured events. Community channels also exhibited familiar platform dynamics: conversations about feature fatigue and update churn mirrored the ways social platforms navigate feature overload.
Roles and governance
Volunteers took on roles such as data steward, scraper, and statistician. They enforced a modest governance model: a transparent changelog, version-coded datasets, and a central repository. This mirrors academic projects where data curators and PIs set policies for dataset access and reuse.
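One lightweight way to implement version-coded datasets is a checksum manifest written alongside each release. The sketch below is an assumption about how such a manifest could be built, not a description of the community's actual repository; the directory layout and file names are hypothetical.

```python
import datetime
import hashlib
import json
import pathlib

def make_manifest(dataset_dir: str, version: str) -> dict:
    """Build a version-coded manifest so a later analyst can verify
    they hold exactly the files the changelog describes."""
    entries = {}
    for path in sorted(pathlib.Path(dataset_dir).glob("*.csv")):
        entries[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return {
        "version": version,
        "created": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "files": entries,
    }

# Assumes a playtests/ directory of CSV extracts exists.
manifest = make_manifest("playtests", "v2.3.0")
pathlib.Path("playtests/MANIFEST.json").write_text(json.dumps(manifest, indent=2))
```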
Section 3 — Methods: How the community gathered data
Automated telemetry and client logging
Community contributors wrote parsers for Riftbound’s local logs and aggregated anonymized telemetry. These parsers converted raw events into schemas suitable for analysis (e.g., match_id, player_class, timestamp, outcome). Converting to a stable schema is crucial for reproducibility; without it, later analysts cannot compare datasets reliably.
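A minimal parser sketch makes the idea concrete. Riftbound's real log syntax is not public, so the line format, regex, and file names below are assumptions; the point is the conversion from free-form log lines into the stable schema (match_id, player_class, timestamp, outcome).

```python
import csv
import re

# Hypothetical line format, assumed for illustration:
#   [2024-06-01T19:04:12Z] MATCH_END match=m-0042 class=warden result=win
LINE_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]\s+MATCH_END\s+"
    r"match=(?P<match_id>\S+)\s+class=(?P<player_class>\S+)\s+result=(?P<outcome>\S+)"
)

def parse_log(log_path: str, out_path: str) -> int:
    """Convert raw client-log lines into the community's stable schema,
    returning the number of matches extracted."""
    fields = ["match_id", "player_class", "timestamp", "outcome"]
    count = 0
    with open(log_path) as src, open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        for line in src:
            m = LINE_RE.search(line)
            if m:
                writer.writerow({k: m.group(k) for k in fields})
                count += 1
    return count
```

Pinning the schema in one shared script means later contributors can re-run the conversion on archived logs and obtain directly comparable outputs.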
Structured playtests and A/B-style matches
Players ran structured matches that varied one factor at a time — a practical analog of A/B testing. They recorded player composition, target metrics, and environmental settings. Systematic replication of these small, single-variable experiments is what turns scattered match results into effect estimates the community can act on.
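For analysis, a two-proportion z-test is one simple way to judge whether an observed win-rate gap between two match configurations is larger than chance would explain. The sketch below, with hypothetical counts, assumes Python; communities often reach for heavier statistical tools, but the logic is the same.

```python
import math

def two_proportion_z(wins_a: int, n_a: int, wins_b: int, n_b: int) -> float:
    """Z statistic for the difference in win rate between arm A
    (e.g., item X at its old cooldown) and arm B (the changed value)."""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# e.g., 540/1000 wins on the old cooldown vs. 480/1000 on the new one
z = two_proportion_z(540, 1000, 480, 1000)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests significance at the 5% level
```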