First in a series of briefings from the expert AI and education group, with the latest information and guidance in this fast-moving area. Published February 2023.
In November 2022 an artificial intelligence (AI) tool, ChatGPT, was released which caused widespread interest and concern across the education sector because of its ability to create plausible answers to assignments, construct essays, and write computer code, all in seconds. Since its launch, many institutions have reported that students have attempted to pass off AI-generated work as their own, and some are banning it. It is likely that some UCL students will use this and similar tools, and it is essential that all teaching staff are alert to this, and aware of what actions to take.
UCL has considerable internal expertise, and has convened a group of colleagues who are shaping our response to this challenge and ensuring we are well-placed to take advantage of the many opportunities posed by this fast-evolving technology. This Briefing has been compiled by this group and outlines some immediate advice and actions for teaching staff, together with an explanation of the technologies, their strengths and limitations.
In each briefing we will report on the work of the core expert group and include progress reports on a series of workstreams. We will also invite colleagues to join a workstream if they have a particular interest in its work.
We can’t cover everything in this first briefing, so we thought it would be helpful to offer some practical tips based on the best available guidance to date. Note that discussion of these AI tools is evolving fast, so we will update you further in each briefing.
Contact your Faculty Learning Technology Lead or your Arena Faculty contact if you would like further advice.
Only got two minutes? Jump straight to key actions for teaching staff.
UCL’s position
Whilst AI technologies such as ChatGPT are disruptive, especially for assessment, a blanket prohibition on their use or engaging in an arms race to try to outwit or detect them is not a productive strategy. Their use is already widespread in some workplaces, and students and staff will need to be supported to use them ethically and transparently.
Our current advice to teaching staff is to be clear with students what you regard as a permissible use of AI in your particular assignment, and how they should acknowledge that use.
Background
In December 2022, we sent out an initial statement on the use of AI tools in education and assessment [UCL login required]. This was in response to growing interest in one highly publicised tool: ChatGPT. Since then, we have established an expert group of UCL colleagues, co-chaired by Prof Kathleen Armour (Vice Provost Education and Student Experience) and Prof Steve Hailes (Head of Computer Science) to steer us through some of the complexities of this issue.
The expert group met in January. AI technology — as exemplified by ChatGPT — is evolving fast, so we thought it would be helpful to distribute regular briefing papers for staff who teach and assess students. The education sector worldwide is grappling with the best ways to mitigate its challenges and take advantage of its opportunities so in addition to drawing on our UCL expertise, we are also keeping abreast of (and contributing to) wider sector developments.
The expert group is:
- Developing guidance for students, to be released in the next couple of weeks.
- Working on ethical and regulatory responses.
- Developing resources, running workshops and planning support for staff who would like help with designing teaching and assessment activities in the light of AI.
- Looking to the future and seeking to lead the sector in its response to these exciting technologies.
Sector response
Predictably, the development of these tools has led to speculation about ‘the death of the essay’, and concerns about the integrity of open book exams and take-home papers. Some have even claimed these tools present an existential challenge to HE as we know it. As has been widely noted, plagiarism detection tools are lagging behind, making detection less reliable than in the past. Yet, while these tools should not cause us to overreact, there is a need to interrogate our teaching, assessment and feedback practices in light of these developments. Staff will need generic information and also subject-specific guidance; indeed, some subject associations are now considering both the challenges and the future opportunities.
What are we talking about here?
- Large language models and ChatGPT
Following the recent publicity around ChatGPT and similar tools, technical developments in AI have started to spill into the world of education. ChatGPT is an example of an emerging technology called a ‘large language model’, and it is useful to understand a little more about these models, as this sheds light on their current limitations.
To be specific, ChatGPT is a user-friendly, chat-bot style Web tool based on GPT-3, a proprietary large language model built by the firm OpenAI. GPT stands for ‘generative pre-trained transformer’, referring to the fact that it generates output (generative); that it is trained before use and does not dynamically learn, although its deployment may collect data for future rounds of training (pre-trained); and to the type of machine learning model used (transformer), an architecture that has seen considerable success in natural language processing.
ChatGPT predicts the next token (a word or word fragment) in a sequence. Give it some text, and it will give you some more. GPT-3 has been trained on text from the Web, from books and from Wikipedia, at an estimated cost of $4.6m (and 500 metric tonnes of CO2) to train its 175 billion parameters, in addition to the large sums spent on a small army of outsourced contractors who check prompts and outputs to steer the model away from reproducing illegal or harmful content.
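To make ‘next-token prediction’ concrete, the toy Python sketch below builds the simplest possible language model: a bigram model that counts which word follows which in a tiny training text, then samples continuations from those counts. It is purely illustrative and is not how GPT-3 works internally (GPT-3 uses subword tokens, 175 billion learned parameters and transformer attention rather than raw counts), but the underlying task, predicting a plausible continuation, is the same.

```python
import random
from collections import defaultdict

# Toy illustration of next-token prediction: count which word follows
# which in a small training text, then sample continuations from those
# counts. (Illustrative only; real large language models learn these
# statistics with billions of parameters, not raw counts.)
training_text = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
)

# Count how often each word follows each other word.
counts = defaultdict(lambda: defaultdict(int))
tokens = training_text.split()
for current_word, next_word in zip(tokens, tokens[1:]):
    counts[current_word][next_word] += 1

def generate(prompt_word: str, length: int = 8) -> str:
    """Generate a continuation by repeatedly sampling a likely next word."""
    output = [prompt_word]
    for _ in range(length):
        followers = counts.get(output[-1])
        if not followers:
            break
        words, weights = zip(*followers.items())
        output.append(random.choices(words, weights=weights)[0])
    return " ".join(output)

print(generate("the"))  # e.g. "the cat sat on the rug . the dog"
```

Note how the model produces plausible-looking text purely from statistical patterns in what it has seen, with no understanding of cats, dogs or mats; this is, in miniature, the source of the limitations described below.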
We can expect language models from OpenAI and from other firms and research groups to be rapidly integrated into many other tools and technologies. OpenAI has been using GPT-3 as a basis for other systems, including the image-generation (‘AI art’) system DALL·E 2, the code-generation technology behind GitHub Copilot, and OpenAI’s synthetic content detection tool, ‘AI Text Classifier’. The company also offers GPT-3 as a paid service that other companies can use to build their own software, which has led to writing assistants (e.g. copy.ai) and other tools. OpenAI has agreed a multi-billion dollar partnership with Microsoft, who suggest they will make OpenAI’s large language models available in MS Word. Many other firms and research groups have similar technologies to GPT-3 (e.g. Google’s LaMDA/Sparrow/GLaM, BigScience’s BLOOM). These are currently unreleased or less publicised, but heavy competition means that tomorrow, the conversation could move quickly from ChatGPT to another tool entirely.
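To illustrate what ‘offering GPT-3 as a paid service’ looks like in practice, here is a minimal sketch of a completion request using OpenAI’s openai Python package as it stood in early 2023. The model name, prompt and parameter values are illustrative choices rather than recommendations, a personal API key is required, and the interface may well change as the technology evolves.

```python
# A minimal sketch of a GPT-3 completion request via OpenAI's paid API,
# using the `openai` Python package as it stood in early 2023.
# The model name, prompt and parameters below are illustrative only.
import openai

openai.api_key = "sk-..."  # secret key from your OpenAI account dashboard

response = openai.Completion.create(
    model="text-davinci-003",  # a GPT-3-family completion model
    prompt="Summarise the key limitations of large language models:",
    max_tokens=150,            # cap the length of the generated reply
    temperature=0.7,           # higher values produce more varied output
)

print(response["choices"][0]["text"])
```

Writing assistants and similar products are, at heart, carefully designed prompts and interfaces wrapped around calls like this one.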
- Functionality and limitations
Large language models attempt to capture and regurgitate the general patterns in text without reproducing the exact text they were trained upon. This means their results are plausible, but lack understanding. Large language models frequently generate text that copies the form of what they have been trained on, so their output may be grammatically or syntactically correct, but as they do not capture meaning or ‘understand’, they are often confidently wrong. Despite appearances, these models are not (yet) searching the Internet in real time, but are drawing on statistical patterns captured at the moment of training: they will not know today’s weather, or the current Prime Minister. As language models reproduce patterns in text, they also have a tendency to reproduce unwanted biases, toxic speech, or specific worldviews.
This does not mean the technology is useless. Commentators have noted that this ‘fake intelligence’ can be practically useful in many domains. It can highlight grammatical errors, expand or summarise prose, or even help with ideation and writer’s block. Yet navigating the difference between utility and meaning can be hazardous. Bender et al. note that ‘the human tendency to attribute meaning to text, in combination with large [language models’] ability to learn patterns of forms that humans associate with various biases and other harmful attitudes, leads to risks of real-world harm’.
With the release of ChatGPT, this type of technology reached a turning point: we will never again live in a world without a chatbot at least this convincing — and it already has enough functionality to be useful. GPT-3 is the third version of GPT, released in 2020 with 175 billion parameters; GPT-2 came out in 2019 with 1.5 billion parameters. The difference between the two is the difference between something usable and something largely of academic interest. We already know that ChatGPT hit 1 million users within a week, and the rapid integration of language models into other tools may see the number of users interacting with them in some form soar. If those users add value to the process of evaluating the quality of the outputs, then we could see those outputs improve significantly over time.
ChatGPT rarely fails to produce text that looks like it could have been generated by a human – it is plausible in that sense. However, because GPT-3 does not really have a deep understanding of the entities about which it is generating text, the output may be inaccurate or meaningless in human terms, or may reflect the biases inherent in the subject material on which it was trained, a common problem with this type of AI. Human input helps correct some of the more egregious of these failings, but the GPT-3 heritage of ChatGPT can still become obvious if it is pushed.
The final limitation of ChatGPT again derives from GPT-3, specifically the P (pre-trained) part of it. In short, GPT-3 was trained at a fixed point in time. Because it is only periodically re-trained (GPT-4 is rumoured to be on the way), the model is invariably out of date to a greater or lesser extent. It has no mechanism to search the web for current information, for example. Since GPT-3.5 was trained on data extending only to 2021, it does not embody any information from after that point in time.
We do not yet know where GPT-4 or GPT-5 or … will put us, nor how others will seek to add value to GPT-x as ChatGPT has added value. If history is anything to go by, advances are likely to be inexorable and the technology will soon be embedded in everyday life, most particularly working life.
Trying out ChatGPT
Perhaps the first thing to try is…asking ChatGPT! Students certainly are.
Assessment expert Prof Mike Sharples (Open University) overheard students talking about ChatGPT, so he decided to ask ChatGPT itself to come up with guidelines for students:
Note these are not UCL's guidelines, but rather an example of ChatGPT's output.
- AI-generated guidelines on best practices [generated by ChatGPT, 28 January]
- Understand the limitations of the AI system: Generative AI systems are language models, not knowledge bases. They may not produce accurate or reliable information and should not be used as the sole source for research or assignments.
- Use the AI as a supplement: Use the AI system to supplement, not replace, your own critical thinking and research. Verify the information generated by the AI system with other sources before using it in your work.
- Cite your sources: If you use information generated by the AI system in your work, be sure to cite it as a source. Plagiarism is a serious issue and academic integrity is of the utmost importance.
- Be aware of ethical concerns: Be aware of ethical concerns surrounding the use of AI systems and strive to use them responsibly and in an ethical manner.
- Communicate with your instructors: If you are unsure about how to use the AI system or have concerns, talk to your instructors. They can provide guidance and advice on how to use the AI system effectively.
- Continuously evaluate and improve: Continuously evaluate the effectiveness of using AI systems in your education and adapt accordingly. Provide feedback to improve the AI system for future use.
- Embrace the opportunity to learn: Use AI systems as an opportunity to learn more about technology, AI and its impact on society and the future.
This is not a bad start. In particular, it reminds us that students are using ChatGPT, so finding ways to ensure they are using it transparently and ethically is the task ahead of us.
Practical steps you can take now
Be clear with your students about permissible use in your context
A legitimate use of AI tools might be to help students summarise and analyse complex materials, extract key findings, or break writer’s block, all in preparation for a piece of assessed work. One member of the UCL group has created this ‘sand pit’ with some great examples of the tools’ strengths and weaknesses. Whilst we might feel that this is the kind of activity that students should be doing unaided, the reality is that these tools can save a great deal of time and effort, and so are extremely attractive assistive technologies for students and educators alike.
As a teacher, be really clear with students about how far, if at all, they may use AI tools, and that they must clearly acknowledge that use.
If you suspect that a student has used AI unfairly in an assessment
When submitting work in AssessmentUCL or Moodle, students accept the following statement:
By submitting this assessment, I confirm that all the work is my own unless collaboration has been specifically authorised. I understand that any form of Academic Misconduct is strictly prohibited, including the use of essay mills, homework help sites, plagiarism, collusion, falsification, impersonation or any other action which might give me an unfair advantage.
If you suspect that a student is trying to pass off AI-generated output as their own work then the regulations in the Academic Manual (9.2.1 g, h and m) apply.
Upcoming exams and assessments 2022/23
Short answer questions
If you are concerned that students might use this technology to create responses to short-answer questions, consider the following:
- Try previous exam questions similar to those you are setting in ChatGPT, to understand how responses vary. In general, staff should ensure their current questions remain confidential: it is best not to paste them into online services, particularly as these services typically claim rights to use and publish the information contributed to them.
- Ask short-answer questions to draw on an example from a recent paper, or perhaps the specifics of a lab experiment or data set encountered on the course. This means some tweaks to the questions, but not a radical assessment overhaul.
- Consider how much an essay or (more likely) short-answer question is recall/fact-based. What ‘higher order’ adjustment (e.g. including data synthesis or analysis) could you make?
- Consider how you can add a twist or subtle nuance to a question that students must identify and engage with to gain higher marks. Such nuances are currently more difficult for an AI system to pick up on, as the common signal of the topic of the question can outweigh the subtlety.
Students’ writing skills
If you are concerned that students might use this technology to mask weaknesses in written communication, then engage with them to educate them about what is and is not acceptable:
- When sharing a brief or talking through a written assessment, be completely open about our awareness of the rapid and significant advances in, and availability of, text (and image) generation tools (think of Grammarly, for example).
- Discuss with students their perceptions: ask them to compare and contrast with, e.g., calculators, spell checkers, translators, even Google search.
- Identify where, at programme/departmental level, your students are being asked to discuss the extent to which using a tool like ChatGPT could be considered dishonest. Where is the line? Is generating a framework cheating? An entire essay? What do they think the quality will be like?
- Show them a generated response, point out the blandness of the style, and note that such responses frequently contain factual errors that are a real give-away!
- Point out that detection tools already exist, that many more sophisticated ones are in development and that, predictably, a web-based sub-culture of ways to ‘fool’ the detection systems is also growing, along with debates about the wisdom of letting AI tools determine human authorship! However anyone engages with these tools, our counsel would be to be wary and sceptical about claims and potentials.
- Mention that contract-cheating companies/essay mills are almost certainly employing these tools themselves, so employing such techniques/shortcuts is even less wise these days and may even be easier to detect through detection tools.
To make it more difficult for students to use AI tools unfairly:
- Consider ways you could ask for a response other than prose or short text; e.g. label a diagram, create a flow chart
- Use image-based prompts; e.g. ask for a commentary, explanation, speculation, or description with implications.
- Build in a requirement that responses draw on a specific text, lecture, lab session, experiment, field trip or other source specific to the course
- Ask for responses, arguments, challenges, concerns or issues relating to content from slides or a multimedia source
- Ask for rationalised opinions
- Require specific bibliographic references — large language models are currently unable to accurately cite work, and often create fictionalised, plausible references, although this may change.
Sign up for this Arena workshop on designing assessments with academic integrity in mind - further dates to follow.
Longer term: revisions for 2023/24
- It is easy to assume that a return to many more in-person, hand-written and invigilated exams may provide a solution, but it is worth remembering that this type of assessment, where it is the dominant or principal form, may be inequitable, inauthentic and/or unscalable and unsustainable. In-person exams should be considered as just one potential response to the challenges tools like ChatGPT are provoking.
- Consider the range of assessments for your module or programme, and discuss in teams whether the current design really measures what you want it to measure, whether there is unnecessary load, and how else you might ascertain understanding/ development/ achievement.
- Consider how ‘knowledge-based’ your assessments are. Where you expect a relatively simple fact recall, what is the justification for this? And rather than being a stand-alone question, could the knowledge recall be embedded into an assessment that also tests students' abilities to compare, use and analyse this knowledge?
- We strongly recommend reviewing the wording of module assessment descriptions and, where possible, building in ‘ambiguity of format’: for example, consider using ‘submission’ or ‘artefact’ rather than ‘written essay’.
- Consider variety in assessment format: interactive short oral assessments can be scalable and actually more efficient for both student and assessor. The equivalent of a substantial piece of writing might be conducted in 20 minutes (including producing feedback), for example. Think portfolios of short pieces.
Assessment decisions and planning
For 2023/24, it is unlikely we will be able to increase the number of seats available at the ExCeL. We have made progress in diversifying our assessment in recent years and, rather than reverting to traditional exams, we would like to encourage colleagues to build on this and shift to alternatives to examinations. This might involve creative, reflective, collaborative, data-based, media-rich and generally more authentic, real-world challenges. Colleagues in Arena and Digital Education are keen to work with teaching teams on assessment design.
Some types of assessment are more susceptible to the use of these tools, and colleagues may want to reconsider their assessment designs both in the immediate term (for coursework and examinations this session) and in their plans for next session. Note that for the current 2022/23 assessment period it is not possible to change from coursework to exam, or vice versa.
Key dates
Guidance and suggestions on some approaches are outlined below – and there is a short window of time to make changes:
Please note the upcoming deadlines: assessment planning closes on 28 February and exam questions are due by 10 March.
- Exam papers
The Central Assessment Team deadline for exam papers is Friday 10 March and departments will have earlier deadlines in place. This means that you have some limited time available to amend your exam questions. Please review your questions and consider whether students might be able to use AI tools to gain an unfair advantage. Guidance below.
- Coursework for this session
If you have not yet published coursework assignments for current modules, consider whether students might be able to use AI tools to gain unfair advantage, and consider how you might modify the assignment to improve its integrity. Guidance below.
- Assessment planning for 2023/24 - Curriculum Data Task
Departments are currently working on the CDM task confirming assessment patterns for 2023/24, which is due for completion on Tuesday 28 February. It is possible that assessments will require re-thinking in the light of rapid developments in AI; there will be a further opportunity in July to tweak this. We are also looking at redacting the granular assessment information that is publicly displayed in the module catalogue, for example as Coursework/Exam.