Reverse Engineering Research Processes
Learning to Design Human-Centered AI Systems by Deconstructing Published Research
Time to Complete: 30 minutes
The PDF 5-Minute Warm-Up Activity can be downloaded above.
Who This Is For:
This lesson is designed for two overlapping groups who share a common frustration: the gap between what ‘human-centered AI’ promises and what actually gets built. The first group comprises graduate and upper-division undergraduate students in Human-Computer Interaction, Science & Technology Studies, AI Ethics or Computer Science -- particularly those in research methods, sociotechnical systems or responsible AI courses who need to move from reading papers to critically interrogating them. The second group comprises practitioners in the field: UX researchers who are asked to evaluate AI-powered products they had no hand in designing; AI product designers working inside technology companies, civic tech organizations, or public-sector agencies who need a rigorous lens for assessing whether a tool genuinely serves the people it claims to; and responsible AI team members who must audit third-party systems without a shared vocabulary for doing so. What unites both groups is a specific problem -- they encounter published research or deployed systems that assert human-centeredness, yet they lack the analytical tools to trace whether that intention survived the design process, whose needs were actually prioritized, and where the methodology quietly foreclosed alternatives. If you have ever read a paper claiming to empower users and wondered which users, or been handed an ‘auditing tool’ and asked to evaluate it without a framework for doing so, this lesson is built for you.
Goal: You will develop critical research literacy by reverse engineering how WeAudit -- a platform for user-engaged AI auditing -- was designed, gaining hands-on experience with identifying research gaps, translating user needs into design goals and evaluating sociotechnical systems.
Real-World Applications:
The reverse engineering framework practiced here maps directly onto a recurring challenge inside responsible AI teams and product organizations: evaluating whether an AI auditing or accountability tool is genuinely fit for the communities it claims to serve -- before deploying it at scale. Consider a civic tech team at a city government that has been offered a commercial AI audit platform to assess bias in its automated benefits-screening algorithm. The vendor's documentation describes a ‘human-centered’ workflow built on user feedback. Using the same analytical moves practiced in this lesson -- tracing design goals back to formative studies, identifying who was and was not included in evaluation, and surfacing the power asymmetries baked into the feedback pipeline -- the team can determine whether the tool will actually surface harms experienced by low-income residents or whether it will systematically translate their observations into formats legible only to engineers. The practitioner need is real and urgent: responsible AI teams at companies including major platform providers regularly commission or evaluate audit tooling, and the field currently lacks a shared, teachable framework for doing that evaluation rigorously. This lesson provides exactly that framework, grounding it in a concrete, peer-reviewed case study so that the analytical habits transfer immediately to the next tool, the next paper, and the next procurement decision.
The Problem and Its Relevance
The proliferation of AI auditing tools has created a peculiar irony: while we obsess over detecting bias in algorithms, we rarely examine whether our methods for detecting bias are themselves well-designed or meaningfully engage the people most affected by algorithmic harm. When researchers build systems to help users audit AI, they are making implicit assumptions about what users need, how they work and what counts as actionable feedback -- assumptions that shape whose voices get heard and whose harms get addressed. This creates a methodological blind spot: we lack systematic frameworks for understanding how research teams move from identifying problems to implementing solutions, making it difficult to learn from successful designs or avoid repeating unsuccessful ones. The WeAudit project reveals something unexpected about AI accountability work: the hardest challenge is not building tools that find bias, but rather creating workflows that transform diverse user observations into insights that AI practitioners can actually use. When end users from marginalized communities spend hours testing AI systems and reporting harms, only for their feedback to be ignored or misinterpreted by development teams, the problem is not just technical -- it reflects power asymmetries in who gets to define what counts as a legitimate concern worth addressing.
Why Does This Matter?
Understanding research design processes in AI auditing matters because:
(i) Research decisions encode values: When researchers choose to scaffold user audits with worked examples versus letting users explore freely, they are making value judgments about autonomy, efficiency and whose knowledge matters.
(ii) Formative studies shape everything downstream: The 11 end users and 7 practitioners interviewed in the formative study directly influenced what features got built, whose needs got prioritized and what counted as successful auditing.
(iii) Design goals reveal assumptions: The six design goals (DG1-DG6) that emerged from formative work represent hypotheses about what users struggle with and what would help -- hypotheses that could be wrong or incomplete.
(iv) Evaluation strategies have blind spots: Testing WeAudit with university students before broader deployment makes practical sense but limits what we can learn about how diverse communities would actually use the tool.
(v) Trade-offs are unavoidable: Providing scaffolding helps users get started but risks constraining their creativity; social augmentation encourages exploration but might create echo chambers.
(vi) Practitioner needs differ from user needs: What helps someone detect bias is not the same as what helps someone fix bias, creating tensions in tool design that cannot be resolved through better interfaces alone.
(vii) Workflow choices have consequences: Organizing auditing into ‘Investigate’ and ‘Deliberate’ loops structures how people think about their work, potentially excluding other valid approaches to finding algorithmic harm.
Three Critical Questions to Ask Yourself
Do I understand how the researchers moved from identifying problems in their formative study to proposing specific design goals and features?
Can I articulate what trade-offs the researchers made and what alternative choices might have led to different outcomes?
Am I able to identify methodological limitations in the research design and propose how future work could address them?
Roadmap
Read the WeAudit paper carefully, paying attention to how the research unfolds from formative studies through system design to evaluation. Focus on understanding the logic connecting each stage rather than memorizing details.
Working in groups, your task is to:
(i) Map the research workflow by creating a visual diagram that shows:
What research questions the formative study aimed to answer
How findings from end users and practitioners were synthesized into design goals
Which design goals informed which specific WeAudit features
How evaluation methods were chosen to assess the system
Tip: Use arrows to show causality and highlight where researchers made interpretive leaps or design choices that were not inevitable. One way to encode this mapping before drawing it is sketched below.
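If it helps to organize the pieces before drawing, the sketch below encodes the mapping as a small directed graph in Python. Every node label and edge in it is a placeholder assumption for illustration, not the paper's actual mapping -- substitute the findings, design goals, features and evaluation methods you identify in your own reading.

```python
# A minimal sketch (placeholder content) of the research workflow as a directed graph.
# Every node name and edge below is an illustrative assumption -- replace them with the
# findings, design goals (DG1-DG6), features and evaluation methods from the paper.

workflow = {
    # formative-study findings -> design goals they appear to motivate
    "Finding: users struggle to know where to start auditing": ["DG1"],
    "Finding: practitioners want structured, actionable reports": ["DG5", "DG6"],
    # design goals -> WeAudit features they plausibly informed
    "DG1": ["Worked examples"],
    "DG5": ["Report portal"],
    "DG6": ["Verification process"],
    # features -> evaluation methods that assessed them
    "Worked examples": ["Three-week user study"],
    "Report portal": ["Practitioner interviews"],
}

def print_edges(graph: dict[str, list[str]]) -> None:
    """Print each link as 'source -> target' so interpretive leaps are easy to spot."""
    for source, targets in graph.items():
        for target in targets:
            print(f"{source} -> {target}")

if __name__ == "__main__":
    print_edges(workflow)
```

Listing the edges as plain text first makes the interpretive leaps visible: any arrow you cannot back with a specific quote or observation from the paper marks a design choice that was not inevitable.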
(ii) Identify and justify one critical design decision the researchers made. Select a specific choice -- like using pairwise comparison, providing worked examples or structuring reports with specific questions -- and explain:
What problem this design decision aimed to solve
What alternative approaches existed
What evidence from the formative study supported this choice
What potential drawbacks or limitations this choice introduced
(iii) Analyze methodological trade-offs by examining three tensions in the research:
Why did the researchers conduct formative studies with general users but evaluate with university students? What is gained and lost?
How does providing scaffolding balance helping users versus constraining their creativity?
What does interviewing practitioners after the user study reveal or conceal compared to involving them throughout?
For each tension, explain both perspectives and propose one concrete way future research could address the limitation.
(iv) Reverse engineer one design goal by working backward from a WeAudit feature (a simple template for recording your trace appears after this list):
Select any system feature (comparison interface, worked examples, report portal, discussion forum, verification process)
Identify which design goal(s) it addresses
Find specific quotes or observations from the formative study that likely motivated this design
Propose an alternative way to address the same design goal and explain what different outcomes might result
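One lightweight way to keep this trace organized is a simple record, as in the hypothetical Python template below. The field names and example values are assumptions for illustration only; fill them in from the feature you select and the formative-study evidence you find.

```python
# A hypothetical template for recording one feature-to-goal trace.
# All example values are placeholders -- replace them with evidence from the paper.

from dataclasses import dataclass

@dataclass
class DesignGoalTrace:
    feature: str                   # the WeAudit feature you selected
    design_goals: list[str]        # which design goal(s) it appears to address
    formative_evidence: list[str]  # quotes or observations that likely motivated it
    alternative_design: str        # another way the same goal could be addressed
    expected_difference: str       # how outcomes might differ under the alternative

example = DesignGoalTrace(
    feature="Worked examples",                                  # placeholder choice
    design_goals=["DG1"],                                       # assumed mapping; verify it
    formative_evidence=["(quote from the formative study)"],
    alternative_design="Open-ended exploration with no examples provided",
    expected_difference="Possibly more varied audits, but a slower start for novices",
)

print(example)
```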
(v) Evaluate the evaluation strategy by examining how success was measured:
What types of data were collected during the three-week user study?
How did researchers assess whether WeAudit achieved its goals?
What important questions remain unanswered by this evaluation approach?
Design one additional study that would address a gap you identified
Tip: Consider what stakeholders were included or excluded in the evaluation and how that shapes what counts as evidence of success.
(vi) Propose one meaningful extension by identifying something the research did not address:
A different user population that might have different needs
An alternative auditing context beyond text-to-image models
A potential harm or limitation the researchers did not fully explore
A design principle that could improve future user-engaged auditing systems
Support your proposal with specific reasoning about why this extension matters and what it would reveal.
Individual Reflection
By replying to your group’s post, share what you have learned (or questioned) from this reverse engineering exercise. You may include:
How examining research workflows changed your understanding of what design decisions involve and what they exclude
Whether you now see published research as more tentative or more rigorous after understanding the choices behind the findings
What this experience revealed about the relationship between who gets studied and what gets designed
How you might apply this analytical approach to evaluate other sociotechnical systems or research papers
Whether understanding the gap between what researchers intended and what they could test changes how you interpret their claims
The Bottom Line
Reverse engineering research succeeds when you can articulate not just what researchers did but why they made specific choices and what those choices foreclosed. Every research design embodies trade-offs between competing values: efficiency versus thoroughness, generalizability versus depth, practical constraints versus theoretical ideals. The WeAudit project demonstrates how formative studies with stakeholders, systematic synthesis of design goals and iterative evaluation can produce actionable knowledge -- yet it also reveals that even thoughtfully designed research has inherent limitations shaped by who participates, what gets measured and when evaluation occurs. Here lies the deeper challenge that the WeAudit researchers acknowledge but cannot fully resolve: building better tools for AI auditing matters far less than addressing the power asymmetries that determine whether user feedback gets ignored or acted upon. When you can identify research gaps, trace how problems become design goals, evaluate methodological choices critically and recognize what remains unaddressed, you have developed the research literacy needed to both learn from published work and contribute to advancing it. This understanding serves you whether you are conducting your own studies, implementing others’ findings, or simply being a critical consumer of research in a world where claims about ‘human-centered AI’ often obscure whose humans get centered and whose concerns get marginalized. The question is not whether WeAudit solves AI auditing -- it is what examining its design process teaches us about the choices we make when building systems that claim to empower users while operating within institutional structures that limit genuine empowerment.
#ReverseEngineeringResearch #UserEngagedAuditing #DesignGoalSynthesis #FormativeStudyImpact #StakeholderPowerDynamics
{"@context":"https://schema.org","@type":"LearningResource","name":"Reverse Engineering Research Processes","description":"A 30-minute structured activity for learning human-centered AI system design by deconstructing the WeAudit paper — a published, peer-reviewed platform for user-engaged AI auditing.","timeRequired":"PT30M","educationalLevel":"Graduate; Upper-Division Undergraduate","inLanguage":"en","teaches":["reverse engineering research design","formative study analysis","design goal synthesis","sociotechnical systems thinking","AI audit workflow construction","bias detection methodology","stakeholder power mapping","HCI research literacy","identifying methodological trade-offs","evaluating algorithmic accountability tools","translating user needs into design requirements","scaffolding versus open-ended task design","audit tool critique","practitioner-user feedback gap analysis","participatory AI evaluation"],"keywords":["AI auditing","WeAudit","human-centered AI","HCI","responsible AI","sociotechnical systems","research methods","user-engaged auditing","AI bias detection","design goal synthesis","formative study","practitioner-user gap","AI product design","algorithmic accountability","civic tech","UX research methodology","AI tool evaluation","power asymmetries in AI","audit framework design","AI ethics course material"],"audience":{"@type":"Audience","audienceType":"Graduate and upper-division undergraduate students in HCI, STS, AI Ethics, or Computer Science; UX researchers; AI product designers; responsible AI practitioners; civic tech teams"},"isPartOf":{"@type":"Course","name":"Research Methods in Human-Computer Interaction / Responsible AI"},"about":{"@type":"Thing","name":"WeAudit","sameAs":"https://dl.acm.org/doi/abs/10.1145/3757702"},"lastReviewed":"2026-03-18","version":"1.0 — Initial release; benchmarked against WeAudit (CHI 2024); audience scope and keyword taxonomy pending practitioner review cycle","mainEntityOfPage":{"@type":"WebPage","name":"Reverse Engineering Research Processes — Lesson Page"}}