Solving AI challenges in teaching

Assessment validity

Assessment validity refers to the degree to which an assessment accurately measures what it is intended to measure. The availability of generative AI tools has raised concerns about the validity of certain assessment types, because AI can generate outputs that meet learning outcomes. Consequently, it is essential to specify the permissible extent of AI use for every assessment and, where necessary, to redesign assessments so that they accurately measure student learning.

While no assessment type can be deemed entirely immune to AI misuse, we have categorised various assessment types as having low, moderate, or high validity in the context of an education landscape influenced by generative AI. These categorisations are based on a review of current literature on how generative AI affects assessment validity (refer to the bibliography below for the articles consulted).

Please refer to the tables below to determine the validity category of your assessment types. If your course assessments fall into the low or moderate validity categories, we recommend consulting our linked resources for further guidance.

Low validity assessments

Assessments with low validity are highly vulnerable to AI-generated outputs, making it difficult to ensure they measure what they are intended to measure. These assessments often involve tasks that AI can easily perform, such as generating text or solving standard problems, which compromises the integrity of the evaluation.

Moderate validity assessments

Assessments with moderate validity are somewhat susceptible to AI influence but still retain a reasonable degree of accuracy in measuring student learning. These assessments may include tasks where AI can assist but not fully complete the work, requiring students to demonstrate understanding and application of concepts.

Assessment types with moderate AI validity

  • Analysing, data interpretation and findings: AI tools can analyse and categorise information but lack depth in comparative analysis and integration with the literature. They often fail to link ideas cohesively or to provide thorough, multi-faceted analysis for higher-level tasks.
  • Essays and reports: AI-generated essays and reports are comparable to human work in structure and academic tone. However, AI outputs often lack depth, originality and critical analysis, reducing validity for higher-level tasks.
  • Image-based questions: AI tools have limited ability to interpret or generate responses for image-based assessments, although this capability is improving as the technology develops.
  • Laboratory work and reports: AI can help structure reports, but it cannot perform or replicate physical lab work and cannot interpret real data from lab experiments, preserving the need for human analysis and interpretation.
  • Numerical problem-solving: AI performs well on basic numerical calculations, especially when enhanced with plugins such as Wolfram, but struggles with complex, multi-step problems requiring specific reasoning or diagrams.
  • Project-based written assessments: AI can generate structured responses, but project-based assessments require deep contextual understanding and original contributions, limiting AI’s role.
  • Reflective writing, personal essays and portfolios: AI struggles to replicate personal experiences or metacognitive reflections, and cannot effectively compile or explain the context and personal insights required for portfolios, making them less prone to automation. However, a passable attempt is possible with the right prompting, especially if the student puts in enough effort to build on the generated response.

High validity assessments

Assessments with high validity are those that remain largely unaffected by generative AI tools. These assessments accurately measure student learning outcomes through methods that require critical thinking, problem-solving, and original thought, which AI cannot easily replicate.

Assessment types with high AI validity

  • Close reading and critical analysis of literature: AI struggles with deep textual analysis, failing to offer nuanced interpretations or to incorporate cultural context and secondary criticism, which maintains the validity of these assessments.
  • Complex visual artefacts: AI finds it difficult to generate unique non-text-based content such as diagrams, mind maps or long-form videos.
  • Context-specific tasks: AI struggles with tasks requiring personal experience, real-world scenarios or detailed contextual analysis, preserving validity in these assessments.
  • Group work: AI cannot effectively engage in group collaboration or contribute unique insights within a team, maintaining assessment validity for group work.
  • In-class exams or handwritten assignments: AI cannot assist with real-time tasks such as in-class handwritten work or timed quizzes, maintaining their integrity.
  • Nested or staged assessments: Breaking larger tasks into smaller, staged assessments with feedback maintains validity, as AI cannot easily engage in iterative learning processes.
  • Oral presentations, debates and interviews: AI tools can assist with scriptwriting, but students must present the work themselves, making this format resistant to AI-generated content. Interview-based assessments add further security.
  • Peer review: Peer review requires critical evaluation skills, which AI struggles with, and promotes higher-order thinking, making AI-generated content less useful for peer review exercises.
  • Process-oriented assessments: Shifting the focus from final products to the learning process (e.g. process notebooks, reflection) reduces AI misuse and offers better insight into student thinking.
  • Situational judgment scenarios: AI struggles with critical evaluation, particularly when assessments require judgment based on theoretical frameworks or contextualised knowledge.
  • Viva voce exams and real-time Q&A: AI cannot participate in real-time verbal exchanges, keeping these assessments highly valid and secure.

AI and assessment validity bibliography

  • Liu, H.-Y. (2024). Using sensitive data to debias AI systems: Article 10(5) of the EU AI Act. Journal of Law, Technology and Policy. https://doi.org/10.1080/17579961.2024.2392932
  • Lye, C. Y., & Lim, L. (2024). Generative Artificial Intelligence in Tertiary Education: Assessment Redesign Principles and Considerations. Education Sciences, 14(6), 569. https://doi.org/10.3390/educsci14060569
  • Mulder, R., Baik, C., & Ryan, T. (2024). Rethinking assessment in response to AI. Melbourne Centre for the Study of Higher Education. Retrieved from https://melbourne-cshe.unimelb.edu.au/__data/assets/pdf_file/0004/4712062/Assessment-Guide_Web_Final.pdf
  • Nguyen Thanh, B., Vo, D. T. H., Nguyen Nhat, M., Pham, T. T. T., Trung, H. T., & Ha Xuan, S. (2023). Race with the machines: Assessing the capability of generative AI in solving authentic assessments. Australasian Journal of Educational Technology, 39(5). https://doi.org/10.14742/ajet.8902
  • Nikolic, S., Daniel, S., Haque, R., Belkina, M., Hassan, G. M., Grundy, S., Lyden, S., Neal, P., & Sandison, C. (2023). ChatGPT versus engineering education assessment: A multidisciplinary and multi-institutional benchmarking and analysis of this generative artificial intelligence tool to investigate assessment integrity. European Journal of Engineering Education, 48(4), 559-614. https://doi.org/10.1080/03043797.2023.2213169
  • Nikolic, S., Sandison, C., Haque, R., Daniel, S., Grundy, S., Belkina, M., Lyden, S., Hassan, G. M., & Neal, P. (2024). ChatGPT, Copilot, Gemini, SciSpace and Wolfram versus higher education assessments: An updated multi-institutional study of the academic integrity impacts of Generative Artificial Intelligence (GenAI) on assessment, teaching and learning in engineering. Australasian Journal of Engineering Education. https://doi.org/10.1080/22054952.2024.2372154
  • Raftery, D. (2023). Will ChatGPT pass the online quizzes? Adapting an assessment strategy in the age of generative AI. Irish Journal of Technology Enhanced Learning, 7(1). https://doi.org/10.22554/ijtel.v7i1.114
  • Revell, T., Yeadon, W., Cahilly-Bretzin, G., Clarke, I., Manning, G., Jones, J., Mulley, C., Pascual, R. J., Bradley, N., Thomas, D., & Leneghan, F. (2024). ChatGPT versus human essayists: An exploration of the impact of artificial intelligence for authorship and academic integrity in the humanities. International Journal for Educational Integrity, 20, Article 18. https://doi.org/10.1007/s40979-024-00161-8
  • Wang, J. T. H. (2023). Is the laboratory report dead? AI and ChatGPT. Microbiology Australia, 44(3), 144-148. https://doi.org/10.1071/MA23042


AI video tutorials

These video tutorials, featuring UNSW AI practitioners, explore the use of AI in the classroom, in assessment design and in implementing accessible teaching.

Analysing AI outputs critically

Nexus Fellow and Senior Lecturer Andrew Dymock explains how generative AI can be implemented systematically using the RECESS model to promote student learning.

Assessment design

The 6 permissible categories of AI use in assessment

Professor Alex Steel, Director of AI Strategy in Education, explains the 6 categories of permissible AI use within assessments at UNSW.

Maintaining assessment validity: A step-by-step approach

Nexus Fellow and Lecturer Dr Dhanushi Abeygunawardena from the Faculty of Science explains a step-by-step approach for identifying the assessment adjustments required in light of AI capabilities.

Testing assessment vulnerability

A team from Engineering tests the validity of assessments by ethically "hacking" them: generating AI submissions and entering them alongside student submissions for blind marking.

The value of programmatic assessment amid AI disruption

Nexus Fellow and Associate Professor Priya Khanna Pathak explains the need to rethink how we assess student capabilities and competencies in light of the disruption caused by AI.

Using AI for inclusive and engaging assessment methods

Lucy Jellema (Educational Developer, Equity) describes how teachers can use AI for designing assessments that foster inclusion of a diverse student cohort while also challenging students with different learning styles and teaching real-world skills.

Assessment rubrics

Using AI for Rubric Writing

Helena Pacitti, Nexus Fellow and Lecturer, introduces the benefits and limitations of using generative AI to write assessment rubrics.

Part 1: Utilising Generative AI to Design Assessment Rubrics

Part 1 focuses on using generative AI to structure the layout of a rubric and to review and refine rubric criteria and weightings.

Part 2: Utilising Generative AI to Design Assessment Rubrics

Part 2 focuses on reviewing and adjusting performance descriptors: ensuring appropriate use of terminology, providing enough detail for students to demonstrate their learning, and reconciling any conflicting descriptions.

AI and Universal Design for Learning

Supporting Universal Design for Learning using AI tools

Universal Design for Learning (UDL) expert Prof. Terry Cumming, Professor of Special Education in the School of Education and UNSW Scientia Education Fellow, discusses the role of UDL in creating inclusive educational environments, as well as how AI can support UDL implementation.

Creativity for Accessibility: Using AI tools to implement UDL

Prof. Terry Cumming discusses how getting creative with AI tools can make teaching and assessment more accessible from both the teacher and student perspectives.

Creating Lightbulb Moments: Implement Universal Design for Learning with AI

Lucy Jellema (Educational Developer, Equity) explores how teachers can use AI to present course materials in accessible formats, help students see how course content relates to their own lives, and brainstorm activities that can engage classes of all sizes.

Enhancing learning for neurodiverse learners with AI

Prof. Terry Cumming discusses how AI technology can enhance learning for neurodiverse learners.

Supporting inclusive assessment design with AI

Lucy Jellema (Educational Developer, Equity) explores innovative methods for leveraging AI in creating flexible and inclusive assessment rubrics. She discusses how you can enhance thoughtful assessment design by using AI to consider assessment from the students' perspective.

Related pages

AI in Teaching and Learning at UNSW

  • UNSW's AI in Teaching Guidelines and Framework
  • Helping you guide your students
  • Examples of AI in learning and teaching
  • Guidance on AI in assessment
  • Access to AI tools
  • AI Upskilling and events
