Assessment validity
Assessment validity refers to the degree to which a test accurately measures what it is intended to measure. The availability of generative AI tools has raised concerns about the validity of certain assessment types, as AI can generate outputs that meet learning outcomes. Consequently, it is essential to specify the permissible extent of AI use for all assessments and potentially redesign assessments to ensure they accurately measure student learning.
While no assessment type can be deemed entirely immune to AI misuse, we have categorised various assessment types as having low, moderate, or high validity in the context of an education landscape influenced by generative AI. These categorisations are based on a review of current literature on how generative AI affects assessment validity (refer to the bibliography below for the articles consulted).
Please refer to the tables below to determine the validity category of your assessment types. If your course assessments fall into the low or moderate validity categories, we recommend consulting our linked resources for further guidance.
Low validity assessments
Assessments with low validity are highly vulnerable to AI-generated outputs, making it difficult to ensure they measure what they are intended to measure. These assessments often involve tasks that AI can easily perform, such as generating text or solving standard problems, thus compromising the integrity of the evaluation.
Moderate validity assessments
Assessments with moderate validity are somewhat susceptible to AI influence but still retain a reasonable degree of accuracy in measuring student learning. These assessments may include tasks where AI can assist but not fully complete the work, requiring students to demonstrate understanding and application of concepts.
Assessment type | Information |
---|---|
Analysing, data interpretation and findings | AI tools can analyse and categorise information but lack depth in comparative analysis and integration with literature. They often fail to link ideas cohesively or provide thorough, multi-faceted analysis for higher-level tasks. |
Essays and reports | AI-generated essays are comparable to human essays and reports in terms of structure and academic tone. However, AI outputs often lack depth, originality, and critical analysis, reducing validity for higher-level tasks. |
Image-based questions | AI tools have limited ability to interpret or generate responses for image-based assessments, but this area is improving with technology. |
Laboratory work and reports | AI can support structuring reports, but it cannot perform or replicate physical lab work and lacks the ability to interpret real data from lab experiments, preserving the need for human analysis and interpretation. |
Numerical problem-solving | AI can perform well on basic numerical calculations, especially when enhanced with plugins like Wolfram, but struggles with complex, multi-step problems requiring specific reasoning or diagrams. |
Project-based written assessments | AI can generate structured responses, but project-based assessments require deep contextual understanding and original contributions, limiting AI’s role. |
Reflective writing, personal essays and portfolios | AI struggles to replicate personal experiences or metacognitive reflections, and it cannot effectively compile or explain the context and personal insights required for portfolios, making them less prone to automation. However, a passable attempt is possible with the right input or training, especially if the student puts in enough effort to build on the generated response. |
High validity assessments
Assessments with high validity are those that remain largely unaffected by generative AI tools. These assessments accurately measure student learning outcomes through methods that require critical thinking, problem-solving, and original thought, which AI cannot easily replicate.
Assessment type | Information |
---|---|
Close reading and critical analysis of literature | AI struggles with deep textual analysis, failing to offer nuanced interpretations or to incorporate cultural context and secondary criticism, which maintains the validity of these assessments. |
Complex visual artefacts | AI finds it difficult to generate unique non-text-based content like diagrams, mind maps, or long-form videos. |
Context-specific tasks | AI struggles with tasks requiring personal experience, real-world scenarios, or detailed contextual analysis, preserving validity in these assessments. |
Group work | AI cannot effectively engage in group collaboration or contribute unique insights within a team, maintaining assessment validity for group work. |
In-class exams or handwritten assignments | AI cannot assist with real-time tasks such as in-class handwritten work or timed quizzes, maintaining their integrity. |
Nested or staged assessments | Breaking larger tasks into smaller, staged assessments with feedback maintains validity, as AI cannot easily engage in iterative learning processes. |
Oral presentations, debates and interviews | AI tools can assist in scriptwriting, but students must present the work themselves, making the assessment format resistant to AI-generated content. Interview-based assessments enhance security. |
Peer review | Peer reviews require critical evaluation skills, which AI struggles with. They promote higher-order thinking, making AI-generated content less useful for peer review exercises. |
Process-oriented assessments | Shifting focus from final products to the learning process (e.g., process notebooks, reflection) reduces AI misuse and offers better insights into student thinking. |
Situational judgment scenarios | AI struggles with critical evaluation, particularly when assessments require judgment based on theoretical frameworks or contextualised knowledge. |
Viva voce exams and real-time Q&A | AI cannot participate in real-time verbal exchanges, keeping these assessments highly valid and secure. |
AI and assessment validity bibliography
- Liu, H.-Y. (2024). Using sensitive data to debias AI systems: Article 10(5) of the EU AI Act. Journal of Law, Technology and Policy. https://doi.org/10.1080/17579961.2024.2392932
- Lye, C. Y., & Lim, L. (2024). Generative Artificial Intelligence in Tertiary Education: Assessment Redesign Principles and Considerations. Education Sciences, 14(6), 569. https://doi.org/10.3390/educsci14060569
- Mulder, R., Baik, C., & Ryan, T. (2024). Rethinking assessment in response to AI. Melbourne Centre for the Study of Higher Education. Retrieved from https://melbourne-cshe.unimelb.edu.au/__data/assets/pdf_file/0004/4712062/Assessment-Guide_Web_Final.pdf
- Nguyen Thanh, B., Vo, D. T. H., Nguyen Nhat, M., Pham, T. T. T., Trung, H. T., & Ha Xuan, S. (2023). Race with the machines: Assessing the capability of generative AI in solving authentic assessments. Australasian Journal of Educational Technology, 39(5). https://doi.org/10.14742/ajet.8902
- Nikolic, S., Daniel, S., Haque, R., Belkina, M., Hassan, G. M., Grundy, S., Lyden, S., Neal, P., & Sandison, C. (2023). ChatGPT versus engineering education assessment: A multidisciplinary and multi-institutional benchmarking and analysis of this generative artificial intelligence tool to investigate assessment integrity. European Journal of Engineering Education, 48(4), 559-614. https://doi.org/10.1080/03043797.2023.2213169
- Nikolic, S., Sandison, C., Haque, R., Daniel, S., Grundy, S., Belkina, M., Lyden, S., Hassan, G. M., & Neal, P. (2024). ChatGPT, Copilot, Gemini, SciSpace and Wolfram versus higher education assessments: An updated multi-institutional study of the academic integrity impacts of Generative Artificial Intelligence (GenAI) on assessment, teaching and learning in engineering. Australasian Journal of Engineering Education. https://doi.org/10.1080/22054952.2024.2372154
- Raftery, D. (2023). Will ChatGPT pass the online quizzes? Adapting an assessment strategy in the age of generative AI. Irish Journal of Technology Enhanced Learning, 7(1). https://doi.org/10.22554/ijtel.v7i1.114
- Revell, T., Yeadon, W., Cahilly-Bretzin, G., Clarke, I., Manning, G., Jones, J., Mulley, C., Pascual, R. J., Bradley, N., Thomas, D., & Leneghan, F. (2024). ChatGPT versus human essayists: An exploration of the impact of artificial intelligence for authorship and academic integrity in the humanities. International Journal for Educational Integrity, 20, Article 18. https://doi.org/10.1007/s40979-024-00161-8
- Wang, J. T. H. (2023). Is the laboratory report dead? AI and ChatGPT. Microbiology Australia, 44(3), 144-148. https://doi.org/10.1071/MA23042
AI video tutorials
These video tutorials, featuring UNSW AI practitioners, explore the use of AI in the classroom, in assessment design, and in implementing accessible teaching.
Analysing AI outputs critically
Nexus Fellow and Senior Lecturer Andrew Dymock explains how generative AI can be implemented systematically, using the RECESS model, to promote learning with students.
Assessment design
The 6 permissible categories of AI use in assessment
Professor Alex Steel, Director of AI Strategy in Education, explains the 6 categories of permissible AI use within assessments at UNSW.
Maintaining assessment validity: A step-by-step approach
Nexus Fellow and Lecturer Dr Dhanushi Abeygunawardena, from the Faculty of Science, explains an approach for identifying any assessment adjustments required because of AI capabilities.
Testing assessment vulnerability
A team from Engineering researches the validity of assessments by ethically 'AI hacking' them: generating AI submissions and submitting them alongside student submissions for blind marking.
The value of programmatic assessment amid AI disruption
Nexus Fellow and Associate Professor Priya Khanna Pathak explains the need to rethink how we assess student capabilities and competencies because of the AI disruption.
Using AI for inclusive and engaging assessment methods
Lucy Jellema (Educational Developer, Equity) describes how teachers can use AI for designing assessments that foster inclusion of a diverse student cohort while also challenging students with different learning styles and teaching real-world skills.
Assessment rubrics
Using AI for Rubric Writing
Helena Pacitti, Nexus Fellow and Lecturer, introduces the benefits and limitations of using generative AI to write assessment rubrics.
Part 1: Utilising Generative AI to Design Assessment Rubrics
Part 1 focuses on using generative AI to structure the layout of a rubric and to review and refine rubric criteria and weights.
Part 2: Utilising Generative AI to Design Assessment Rubrics
Part 2 focuses on reviewing and adjusting performance descriptors to ensure appropriate use of terminology, sufficient detail for students to demonstrate learning, and reconciliation of any conflicting descriptions.
AI and Universal Design for Learning
Supporting Universal Design for Learning using AI tools
Universal Design for Learning (UDL) expert Prof. Terry Cumming, Professor of Special Education in the School of Education and UNSW Scientia Education Fellow, discusses the role of UDL in creating inclusive educational environments, as well as how AI can support UDL implementation.
Creativity for Accessibility: Using AI tools to implement UDL
Prof. Terry Cumming discusses how getting creative with AI tools can make teaching and assessment more accessible from both the teacher and student perspective.
Creating Lightbulb Moments: Implement Universal Design for Learning with AI
Lucy Jellema (Educational Developer, Equity) explores how teachers can use AI to present course materials in accessible formats, help students see how course content relates to their own lives, and brainstorm activities that can engage classes of all sizes.
Enhancing learning for neurodiverse learners with AI
Prof. Terry Cumming discusses how AI technology can enhance learning for neurodiverse learners.
Supporting inclusive assessment design with AI
Lucy Jellema (Educational Developer, Equity) explores innovative methods for leveraging AI in creating flexible and inclusive assessment rubrics. She discusses how you can enhance thoughtful assessment design by using AI to consider assessment from the students' perspective.