On the application of Large Language Models for language teaching and assessment technology

by Andrew Caines et al.

The recent release of very large language models such as PaLM and GPT-4 has made an unprecedented impact in the popular media and public consciousness, giving rise to a mixture of excitement and fear as to their capabilities and potential uses, and shining a light on natural language processing research which had not previously received so much attention. The developments offer great promise for education technology, and in this paper we look specifically at the potential for incorporating large language models in AI-driven language teaching and assessment systems. We consider several research areas and also discuss the risks and ethical considerations surrounding generative AI in education technology for language learners. Overall we find that larger language models offer improvements over previous models in text generation, opening up routes toward content generation which had not previously been plausible. For text generation they must be prompted carefully and their outputs may need to be reshaped before they are ready for use. For automated grading and grammatical error correction, tasks whose progress is checked on well-known benchmarks, early investigations indicate that large language models on their own do not improve on state-of-the-art results according to standard evaluation metrics. For grading it appears that linguistic features established in the literature should still be used for best performance, and for error correction it may be that the models can offer alternative feedback styles which are not measured sensitively with existing methods. In all cases, there is work to be done to experiment with the inclusion of large language models in education technology for language learners, in order to properly understand and report on their capacities and limitations, and to ensure that foreseeable risks such as misinformation and harmful bias are mitigated.


