The Assessment Isn't Broken. The Assumption Behind It Is.
What happens to learning when AI can finish the assignment?
Duration: 30 minutes
5-minute PDF Warmup Activity available for download
Who This Is For: This lesson is designed for university faculty, instructional designers, academic integrity officers, department chairs and curriculum developers who are wrestling with a practical and urgent question: how do you assess whether a student has actually learned something when a generative AI tool can produce a plausible answer in seconds? It is also directly relevant to ed-tech professionals building assessment platforms, HR and learning and development practitioners in corporate training environments, and graduate students in education or pedagogy programs. If you have ever graded a submission that felt too polished to be authentic, struggled to write a take-home exam that AI cannot complete, or tried to explain your institution's AI policy to a skeptical colleague, this lesson was written for you. The challenge you are facing is not a failure of vigilance. It is a structural problem that requires new thinking about what assessment is actually designed to do.
Real-World Applications
Universities in South Africa, Australia and the United Kingdom are already piloting oral examination formats called viva voce sessions in response to the research summarized in this lesson. Rather than relying solely on written submissions, these institutions are requiring students to defend their work in real time, answer follow-up questions and demonstrate the reasoning behind their responses. This approach directly targets what the research identifies as AI's core weakness: its inability to engage in the kind of higher-order, situated thinking that Bloom's taxonomy classifies under evaluate and create. For corporate training designers and HR professionals, the same logic applies to competency assessments, certification programs and performance appraisals. Any evaluation that can be completed by submitting a document is now vulnerable to the same integrity challenges that universities are facing. The research-backed strategies in this lesson apply across all of these contexts.
The Problem and Its Relevance
Generative AI does not cheat. Students who use it to complete assignments they were supposed to complete themselves do. But that distinction, while legally important, misses the deeper problem: a student who submits AI-generated work may receive a passing grade while acquiring almost none of the knowledge or skill the assessment was designed to measure. The degree credential they earn becomes a record of what an AI tool can do, not what the graduate can do. Research reviewed in this lesson found that students who depend heavily on AI for academic tasks sometimes struggle to construct sentences independently, suggesting that the tool is not supplementing learning but replacing it.
The fairness dimension compounds this problem in ways that institutions have not yet fully addressed. AI-assisted assessments create an uneven playing field by benefiting students with better access to high-quality AI tools while disadvantaging those from underprivileged or rural backgrounds. An institution that grades all students on the same rubric while some have access to premium AI writing assistants and others do not is not administering a neutral test. It is administering a test of resource access. Closing that gap requires more than an AI policy. It requires a fundamental rethinking of what is being evaluated and why.
What Assessment Measures in an AI-Infused Environment
Assessment in higher education has historically served three connected purposes: measuring what a student has learned, tracking progress over time and maintaining standards that have meaning outside the classroom. A 2026 systematic review published in AI and Ethics examined 56 empirical studies on this topic and found that traditional assessment formats, including multiple-choice exams, standardized essays and take-home assignments, are increasingly inadequate for achieving any of these three purposes in AI-infused environments. Understanding why requires a clear grasp of how assessment works and where AI intervenes.
Assessment is defined in the research as a process by which information is obtained relative to a known objective or goal. It exists within a broader cycle where measurement provides the data, assessment interprets it and evaluation uses the resulting insights to improve teaching and learning. When AI generates the data, that cycle breaks down. The measurement captures AI output. The assessment interprets AI output. And the evaluation draws conclusions about student learning that the data does not actually support.
The research draws a further distinction between formative assessments, which help students prepare for final evaluations, and summative assessments, which are used for grading. It also distinguishes between process-based assessments, which evaluate the steps a student takes, and product-based assessments, which evaluate only the final output. AI tools are most effective at producing finished products. They are far less capable of replicating a documented learning process, a live oral defense or a real-world problem-solving task that requires the student to make decisions under uncertainty and explain their reasoning in the moment. This is not a minor technical limitation. It is the structural vulnerability that assessment redesign must target.
Bloom's taxonomy provides the clearest diagnostic tool for understanding where the vulnerability lies. The research found that generative AI tools such as ChatGPT perform well at the taxonomy's lower levels, specifically remember and understand. They perform poorly at the upper levels, particularly create, which requires lifelong learning skills including critical thinking, creativity, communication and collaboration. Assessments designed at or near the create level are substantially harder for AI to complete and substantially better at measuring genuine learning. For any student with access to a generative AI tool, doing the work behind assessments designed at the remember and understand levels is now effectively optional.
The Bottom Line
Institutions that respond to AI-enabled academic dishonesty primarily through detection and punishment are solving the wrong problem. Detection tools such as Turnitin and GPTZero are improving, but the research is clear that they still produce false positives and false negatives at rates that make them unreliable as the primary enforcement mechanism. A student who uses AI in ways the detection tool misses still submits work the institution cannot verify. A student falsely flagged as having used AI faces a serious accusation based on an error. Neither outcome serves the goals of education, and neither is resolved by better detection alone. The research advocates for a shift from punitive models toward educative models that reduce the likelihood of misconduct by giving students the skills and knowledge to engage with AI responsibly.
The more durable response requires accepting something most institutions have not yet said out loud: if your assessment can be completed by AI, it is no longer measuring what you think it is measuring. Oral exams, multi-stage assignments with documented drafts, reflective journals, real-world problem-solving projects and peer evaluations are not workarounds or compromises. They are, according to the research, more valid assessments of learning than many of the written formats they would replace. Redesigning assessment around what AI cannot do is not a retreat from rigor. It is a return to the original purpose of assessment, which was never to measure compliance with a deadline. It was to measure whether learning happened.