
Evaluating AI tools for teaching and learning
I am in the process of evaluating a suite of AI tools from Contact North, Ontario, for students, instructors and administrators.
In an earlier post I set out criteria for evaluating AI tools for students. This post, though, focuses on AI for teachers/instructors, and in particular on creating multiple choice questions. I have therefore slightly amended the criteria to take account of the change of target group and topic.
If you are interested, other evaluation frameworks for the use of AI for teaching and learning include:
- Lauren M. Anstey & Gavan P.L. Watson (2018) Rubric for eLearning Tool Evaluation London ON: Centre for Teaching and Learning, Western University (thanks to D’Arcy Norman for the recommendation)
- AI for Education.org QA of AI: Quality Assurance Framework (thanks to Michael Trucano for the recommendation).
My first evaluation was of Contact North’s tool for students, AI Tutor Pro.
What tools is Contact North offering for teachers/instructors?
AI Teaching Assistant Pro is in fact a suite of six separate tools, offering:
- multiple choice questions
- essay questions + scoring rubric
- syllabus and teaching notes
- a slide builder
- learning shorts (short videos)
- a Faculty Assistant for AI Tutor Pro
I will describe each of these tools in more detail when I come to evaluate them. Each deserves a separate evaluation, so there will be a blog post on each. This one focuses on generating multiple choice questions.
Multiple choice questions
You are asked:
- to choose a topic OR to upload a document on which the test will be based
- if you choose a topic, to state the level (e.g. high school or undergraduate)
- to say how many questions you want in the quiz
- to specify the number of options to choose from in each answer
- to choose a language.
What I did
I tried both choosing a topic and uploading a document.
The topic I chose was ‘how to design algorithms for large language models at a professional level’.
I asked for three questions, each with five options. The questions the tool provided would require knowledge of the names of methods used in modelling large language models. However, they would not help in designing a specific algorithm.
I also uploaded, in Word, Section 12.4 on ‘Open pedagogy’ from my open textbook, ‘Teaching in a Digital Age’, and asked for three questions, each with five answers. This resulted in three questions that required ‘factual’ answers (Which of these did x suggest?, etc.) but did not require any evaluation of the ideas or concepts in the section.
Bias warning. Lastly, but most importantly, before I start the evaluation, you need to know that I am not a great fan of multiple choice questions outside of mathematics and certain areas of science where students have to make calculations or solve problems. It takes considerable inventiveness to design multiple choice questions that test anything beyond memorisation and basic understanding.
1. Target group (Scale: 0-5)
Is it clear who should make use of these tools and for what purpose?
The tool requires the teacher or instructor to specify the topic and level, but otherwise it is clearly suitable for anyone who wants to use multiple choice questions. Teachers or instructors will need to define the topic, or choose the document to upload, at a level appropriate for their students.
I give this a score of 4 out of 5 on this criterion.
2. Ease of use (Scale: 0-10)
- Is it easy to find/log in? Yes, just click and enter your topic or upload an existing document (this may mean copying and pasting material into a Word or pdf file).
- Is it easy to set questions? You need to define the topic or choose appropriate material to enable the tool to ask questions and give answers. This may take a few attempts to get right.
- Does it provide the necessary information quickly? Extremely fast, but it is important to evaluate both the questions it asks and the choice of answers it gives.
- Is it easy to make use of the questions and answers it provides? You will need to copy and paste the response from the tool into a format that students can use, but it saves the time of generating questions and answers.
Because of the need to (a) state the topic carefully, (b) check the questions and answers, and (c) transcribe the responses into a format students can use, I give this a score of 7 out of 10.
3. Accuracy/comprehensiveness of questions and answers (Scale: 0-10)
- How accurate is the information provided? For the topic I provided, the questions and answers were relevant, and I did not find any incorrect answers.
- Is the information correct within context? I found this more of a problem. The questions and answers were entirely factual, and without context they did not really test knowledge at any depth or sophistication.
- Does it provide a range of possible answers where this is appropriate? Yes, there is no limit on the number of answer options.
- Does it provide relevant follow-up questions or activities? No.
I am giving this 6 out of 10, but this is influenced by my questioning the general usefulness of multiple choice questions. It would be useful if students who gave incorrect answers were referred to the relevant section of the uploaded document or a relevant source from the Internet so they could understand why their answer was wrong.
4. Likely learning outcomes (Scale: 0-10)
- provides accurate/essential assessment on the topic/question (0-3 points): I am giving this 2 out of 3. The questions chosen are at a low level in hierarchies of learning, but will accurately test memorisation.
- helps with testing key concepts or principles within the study area/topic (0-3 points): I am giving this 1 out of 3. Multiple choice tests need to be extremely well designed to test key concepts or principles at more than a surface level, and this tool does not do this.
- enables/supports critical thinking about the topic, with good feedback (max 5 points) or without (max 3 points): I have to give this a zero. The questions do not require critical thinking to answer and there is no feedback other than correct or incorrect.
- motivates the learner to continue learning about the topic (1 point): some learners may like this simple form of testing, so 1 point.
Total score: 4 out of 10
5. Transparency (Scale: 0-5)
Where do the questions and answers come from? Who says? Will it provide references, facts or sources to justify the questions and answers it provides? What confidence can I have in the information provided?
Again, I have to give this tool a zero on transparency. It failed to reference the document I provided to justify the answers, and gave no references for the questions and answers on the topic I provided. Students doing these assessments will be entirely dependent on the instructor for feedback, and if instructors do not themselves know the answers, they have no way of finding out so they can help their learners.
6. Ethics and privacy (Scale: 0-10)
This depends on how the instructor chooses to use such a tool. If it is used as a form of personal feedback for students to assess what more they need to know, it could be a valid and useful tool. If the instructor asks students to use the tool, there is no risk to their privacy, as it does not collect personal information.
However, I would very much question the appropriateness of this tool for any formal assessment. I therefore give this a score of 7 out of 10 on this criterion.
7. Overall satisfaction (Scale: 0-10)
If you think multiple choice questions are useful, you might give this a score of around 7 out of 10. It might be helpful in generating questions for you to think about but you would be unwise to use them without careful consideration and possible amendment. If, like me, you are wary of this form of assessment, you might give this tool a 3 out of 10. I can see some circumstances where this might be useful, particularly for students to check their understanding and what they still need to cover, but I would not want to issue grades or qualifications based on such testing.
Overall evaluation
I give this a total score of 31 out of 60 (4 + 7 + 6 + 4 + 0 + 7, plus my own 3 for overall satisfaction) – barely over 50%. If you like multiple choice questions as a teaching technique, you would probably score it much higher.
If you are an instructor I do suggest you try the tool. It may well suit your method of teaching and assessment. You might want to recommend that your students use it to get feedback on their progress, in which case you should give them some guidance on how to phrase the topic request, or upload a suitable document on which the test can be based.
Update
In my previous post I noted the difficulty of recording and saving the AI responses. I have now found out how to do that. When you click on the ‘copy’ icon on the AI-generated response, it goes onto your clipboard. You can then open a new Word or pdf document and paste what you have copied into it. Clumsy but still valuable.
Over to you
Once you’ve tried the multiple choice testing tool, please comment on it using the box at the end of this blog post or send me an email at tony.bates@ubc.ca.
Up next
I will be evaluating the essay questions and scoring rubric within the next week.