Our approach is shaped by researchers asking the hardest questions about how humans actually think—not just what they choose. Georgia Tech's "Towards Experience-Centered AI: A Framework for Integrating Lived Experience in Design and Development" argues that training AI only on abstracted behavior isn't enough. AI systems need lived experience: first-person, embodied decision-making that shows how people navigate uncertainty, weigh tradeoffs, and change their minds. LifeTrade is building exactly that kind of data. We're collecting longitudinal reasoning traces from real people in real contexts—work directly informed by the LEAF framework, Stanford's research on process supervision, and conversations with leading alignment researchers across Berkeley, University of Toronto, and Oxford who see the same gap we do. If you're working on reward modeling, process-based evaluation, value alignment, or temporal consistency in AI systems, we'd like to explore research partnerships. Pilot data access, co-authored papers, custom cohorts—whatever makes sense for your work.
The LEAF framework, developed by Sanjana Gautam, Mohit Chandra, Ankolika De, Tatiana Chakravorti, Girik Malik, and Munmun De Choudhury in 2025, shows that AI systems fall short when trained only on abstracted behavior rather than lived experience. [Link to paper → ]

Lived experience means first-person, embodied, contextual decision-making: how people actually think and feel when making real choices under real constraints. LifeTrade captures this through structured longitudinal interviews that follow the same people over weeks, documenting four temporal layers in every decision trace (sketched in the example after the list).

The Four Temporal Layers:
1. The context: real-world context with actual stakes—the job offer she's weighing, the family obligation he's navigating, the health choice they're making. Not synthetic prompts or hypothetical scenarios.
2. The reasoning: explicit "because" logic—what tradeoffs she weighed, what factors mattered most, how she reasoned through the choice. Not just which option she picked.
3. The counterfactual: "What would make you decide differently?"—systematic elicitation of preference boundaries and causal structure. This is extraordinarily rare in preference datasets.
4. The outcome: what actually happened after the decision, and how the person reflects on their choice with hindsight. This reveals whether their reasoning was sound, what they learned, and how it shaped future decisions.
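To make the four layers concrete, here is a minimal sketch of what a single decision-trace record could look like. The field names and types are illustrative assumptions, not LifeTrade's actual schema.

```python
# Illustrative sketch of a four-layer decision trace record.
# Field names and types are assumptions, not LifeTrade's released schema.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DecisionTrace:
    participant_id: str
    session_date: str                 # ISO date of the interview session
    context: str                      # layer 1: the real-world decision and its stakes
    reasoning_steps: List[str]        # layer 2: explicit "because" logic, in order
    tradeoffs: List[str]              # layer 2: what was given up to get what
    flip_conditions: List[str]        # layer 3: "what would make you decide differently?"
    outcome: Optional[str] = None     # layer 4: what actually happened (captured in a later session)
    hindsight_reflection: Optional[str] = None  # layer 4: how the participant judges the choice now
    followups: List[str] = field(default_factory=list)  # ids of later traces from the same participant
```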
Every session captures intermediate reasoning steps—what people were weighing, what made them realize something, what stopped them. This extends OpenAI's work on process supervision (rewarding reasoning steps, not just final answers) from mathematical reasoning to human social and emotional judgment. Researchers can train process reward models (PRMs) that distinguish between "correct for good reasons" versus "correct by accident" versus "incorrect but reasonable given priors"—a capability outcome-only supervision cannot provide.
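As one illustration of how step-level annotations could feed a process reward model, the sketch below pairs each intermediate reasoning step with a scalar reward target. The label vocabulary and scoring values are assumptions chosen for clarity, not a prescribed training recipe.

```python
# Minimal sketch: turning annotated reasoning steps into process-reward training pairs.
# The label names and their numeric targets are illustrative assumptions.
from typing import List, Tuple

STEP_LABELS = {"sound": 1.0, "reasonable_given_priors": 0.5, "flawed": 0.0}

def prm_training_pairs(steps: List[str], labels: List[str]) -> List[Tuple[str, float]]:
    """Pair each intermediate reasoning step with a reward target, so a process
    reward model learns to score the step, not just the final answer."""
    return [(step, STEP_LABELS[label]) for step, label in zip(steps, labels)]

# A decision that turned out well but contained one flawed step ("correct by
# accident") still yields a low reward target for that step.
pairs = prm_training_pairs(
    steps=["I compared salary against commute time.",
           "I assumed the new team would never reorganize.",
           "I checked whether I could reverse the decision within a year."],
    labels=["sound", "flawed", "sound"],
)
```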
The same participants answer structurally similar questions across multiple sessions, days or weeks apart, revealing how beliefs, preferences, and strategies shift in response to life events, new information, or reflection. Current RLHF assumes static human preferences; LifeTrade data shows preferences are dynamic and context-dependent, enabling research on preference drift, belief updating, consistency versus adaptability tradeoffs, and the long-term consequences of decisions.
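One simple way to quantify preference drift from this kind of data is sketched below: compare how a participant ranks the same decision factors in two sessions and count how many pairwise orderings flipped. The factor names and the metric are illustrative assumptions.

```python
# Sketch: quantifying preference drift for one participant across two sessions.
# Assumes each session elicits a ranking of the same named factors; the metric
# (fraction of flipped pairwise orderings) is an illustrative choice.
from itertools import combinations
from typing import List

def pairwise_drift(ranking_a: List[str], ranking_b: List[str]) -> float:
    """Fraction of factor pairs whose relative order flipped between sessions
    (0 = identical ordering, 1 = fully reversed)."""
    pos_a = {f: i for i, f in enumerate(ranking_a)}
    pos_b = {f: i for i, f in enumerate(ranking_b)}
    pairs = list(combinations(ranking_a, 2))
    flips = sum((pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y]) for x, y in pairs)
    return flips / len(pairs)

# Week 1 vs. week 4: "stability" overtook "salary" after a health scare.
print(pairwise_drift(["salary", "stability", "commute"],
                     ["stability", "salary", "commute"]))  # ~0.33
```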
Explicit flip-condition probes ("When would you do the opposite?") systematically elicit the boundary conditions of every preference, creating a natural dataset of human counterfactual reasoning. This reveals which factors are causally necessary versus merely correlated with choices, enables preference generalization to novel situations, supports robustness testing, and constrains the space of utility functions consistent with observed behavior.
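The sketch below shows one way a flip condition could be turned into a contrastive pair for preference learning: the original context where one option was chosen, and the minimally edited context where the participant says the preference would reverse. The record fields and pair format are assumptions for illustration.

```python
# Sketch: turning an elicited flip condition into a contrastive pair for
# preference learning. Field names and the output format are illustrative.
from typing import Dict, List, Tuple

def contrastive_pairs(trace: Dict) -> List[Tuple[str, str]]:
    """For each flip condition, emit (chosen_scenario, counterfactual_scenario)."""
    base = trace["context"]
    return [
        (f"{base} -> chose {trace['chosen_option']}",
         f"{base}, except {flip} -> would choose {trace['rejected_option']}")
        for flip in trace["flip_conditions"]
    ]

pairs = contrastive_pairs({
    "context": "Weighing a higher-paying job with a longer commute",
    "chosen_option": "stay in current role",
    "rejected_option": "take the offer",
    "flip_conditions": ["the offer allowed two remote days per week"],
})
```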
Decisions are about actual life events—career changes, family obligations, health choices, financial tradeoffs—not responses to internet-style prompts. The data comes from the true human preference distribution, not "things people ask chatbots." Every trace is anchored in physical and social reality with actual consequences. Labs can use this data to measure how much synthetic RLHF data deviates from real human reasoning, potentially correcting for the "synthetic data wall" problem.
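As a rough illustration of that measurement, the sketch below compares the embedding centroids of a real-trace corpus and a synthetic corpus. The `embed` function is a placeholder for any sentence-embedding model, and centroid cosine distance is a deliberately simple stand-in for a proper divergence measure.

```python
# Sketch: one way a lab might quantify how far synthetic RLHF prompts sit from
# real decision traces. `embed` is a placeholder for any sentence-embedding model;
# centroid cosine distance is an intentionally simple illustrative metric.
import numpy as np
from typing import Callable, List

def distribution_gap(real: List[str], synthetic: List[str],
                     embed: Callable[[List[str]], np.ndarray]) -> float:
    """Cosine distance between the centroids of the two embedded corpora
    (0 = indistinguishable means; larger values indicate a bigger gap)."""
    r = embed(real).mean(axis=0)
    s = embed(synthetic).mean(axis=0)
    return 1.0 - float(np.dot(r, s) / (np.linalg.norm(r) * np.linalg.norm(s)))
```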
Because tradeoffs are explicit ("I gave up X to get Y"), every trace reveals the participant's values hierarchy in that context—autonomy versus security, family versus career, peace versus justice. Researchers can trace exactly which human values are being learned by reward models, study how values vary across people and contexts, and enable principled value-learning rather than implicit value-absorption. This makes it possible to audit not just what a model does, but why—and whose values it's actually encoding.
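One minimal way to aggregate those explicit tradeoffs into a per-participant values ordering is sketched below. The value labels and the win-count method are illustrative assumptions, not a claim about how value audits of reward models must be done.

```python
# Sketch: aggregating explicit "gave up X to get Y" statements into a rough
# per-participant values ordering. Labels and the win-count method are illustrative.
from collections import Counter
from typing import List, Tuple

def values_hierarchy(tradeoffs: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Each tradeoff is (value_gained, value_sacrificed); rank values by net wins."""
    score = Counter()
    for gained, sacrificed in tradeoffs:
        score[gained] += 1
        score[sacrificed] -= 1
    return score.most_common()

print(values_hierarchy([
    ("family", "career"),      # turned down a promotion requiring relocation
    ("security", "autonomy"),  # stayed in a stable job instead of freelancing
    ("family", "security"),    # took unpaid leave for a parent's care
]))
```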
Does a model give coherent advice to the same person at different points in time, accounting for what happened in between? Current models have no mechanism for this. LifeTrade's longitudinal data tracks how the same people reason through structurally similar decisions weeks apart, revealing how context, new information, and life events shift their judgments. This can become the first public benchmark for temporal consistency scoring—a crucial capability as AI systems become more deeply embedded in people's lives.
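A temporal consistency benchmark built on this data might look like the loop sketched below: ask a model for advice at two points in time, with the second prompt including what happened in between, and score whether the advice coheres. `ask_model` and `judge_coherence` are placeholders for an LLM call and a rubric-based scorer; the case format is an assumption, not a released LifeTrade artifact.

```python
# Sketch of a temporal-consistency evaluation loop. `ask_model` and
# `judge_coherence` are placeholders; the benchmark format is assumed.
from typing import Callable, Dict, List

def temporal_consistency_score(cases: List[Dict],
                               ask_model: Callable[[str], str],
                               judge_coherence: Callable[[str, str, str], float]) -> float:
    """Average coherence of advice given to the same person at two sessions,
    where the second prompt includes the intervening life events."""
    scores = []
    for case in cases:
        advice_t1 = ask_model(case["session_1_prompt"])
        advice_t2 = ask_model(case["session_2_prompt"])  # includes intervening events
        scores.append(judge_coherence(advice_t1, advice_t2, case["intervening_events"]))
    return sum(scores) / len(scores)
```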
Sessions explicitly probe for times strategies failed ("When have you ignored these signals? What happened?"), creating a systematic collection of human reasoning failures with post-hoc analysis. This provides negative examples for training, reveals robustness challenges, supports interpretability research, and helps align models to avoid failure modes humans themselves fall into. Understanding how and why human reasoning breaks down is as valuable as understanding when it succeeds.
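The sketch below shows one way those failure probes could be harvested as negative training examples, mirroring the (reasoning step, reward target) format used for the process reward model sketch above. Field names are illustrative assumptions.

```python
# Sketch: harvesting labeled negative examples from traces where a strategy failed.
# Field names are assumptions; output mirrors the (step, reward target) format above.
from typing import Dict, List, Tuple

def negative_examples(trace: Dict) -> List[Tuple[str, float]]:
    """Steps the participant later identified as mistakes become zero-reward
    targets for training and robustness analysis."""
    return [
        (step, 0.0)
        for step in trace["reasoning_steps"]
        if step in trace["steps_flagged_in_hindsight"]
    ]
```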
Structured capture of how people weight others' opinions, process advice, and navigate disagreement reveals epistemic social graphs and conflict resolution strategies. This informs multi-agent alignment (how humans aggregate conflicting preferences), advice-taking protocols (when to defer versus trust intuition), and disagreement management (preserving relationships while disagreeing). As AI systems mediate human coordination, understanding these dynamics becomes essential.
Systematic capture of how people notice internal emotional and physiological signals, what those signals mean to them, and how they use them to make decisions. As language models become interfaces for high-stakes decisions, understanding how emotional context changes preference weighting becomes essential. LifeTrade data enables research on interoceptive decision-making, emotional regulation strategies, and empathetic response quality—showing not just what people choose, but what they're feeling when they choose it.
Our approach is directly informed by researchers who see alignment as the central challenge of our time, and we're always looking to collaborate with members of the AI research and safety community. If you're working on reward modeling, process supervision, value alignment, preference learning, or temporal consistency in AI systems, we'd love to talk. Whether you're interested in pilot data access, co-authoring papers, or designing custom research cohorts, we're open to collaboration. Please get in touch: we're happy to share datasets and explore what research partnerships could look like.
Get in Touch