ChatGPT is amazing. Seriously. Go try it: chat.openai.com/chat. So what is it? It is an artificial intelligence language model that has been trained on vast amounts of data, turning that data into an internal representation of the structure of language and a knowledge base it can draw on to answer questions. From this, it can hold human-like conversations through a text interface. But that doesn’t do it justice. It feels like a revolution has happened: ChatGPT surpasses the abilities of previous generations of language AIs to the point where it represents a leap forward in natural interaction with computers (compare it with pretty much any chatbot that answers your questions on a website). It seems to understand not just precise commands, but vaguer requests and queries, and it has a good idea of what you mean when you ask it to discuss or change specific parts of its previous responses. It can produce convincing stories and essays on a huge variety of topics. It can write poems, CVs and cover letters, and tactful emails, as well as produce imagined conversations. With proper prompting, it can even help generate a fictitious language.
It has one more trick up its sleeve: it can generate functional computer code in a variety of languages from simple text descriptions of the problem. For example, if you prompt it with “Can you write a python program that prints the numbers one to ten?”, it will produce working code (side-stepping pitfalls such as getting the start and end numbers right in range), and it can modify that code if you then ask it to avoid a loop and use numpy instead.
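The exact code varies from session to session, but a minimal sketch of the kind of thing it returns for this prompt (written here by me as an illustration, not copied from ChatGPT) looks something like this:

```python
# Loop version: note that range(1, 11) is needed to include 10 -
# exactly the kind of off-by-one pitfall this prompt could trip over.
for i in range(1, 11):
    print(i)

# Loop-free variant using numpy, of the kind ChatGPT produces when asked.
import numpy as np

print(np.arange(1, 11))  # again, the end point is exclusive
```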
But this really just scratches the surface of its coding abilities: it can produce Python astrophoto processing code (including debugging an error message), Python file download code, and an RStats shiny app.
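To give a flavour of what these look like, the file-download code ChatGPT produces is typically a short requests-based script along the following lines (the URL and filename here are placeholders of my own, not the ones from the original conversation):

```python
import requests

# Placeholder URL and output filename, for illustration only.
url = "https://example.com/data/observations.nc"
output_path = "observations.nc"

# Stream the download so large files do not need to fit in memory.
response = requests.get(url, stream=True)
response.raise_for_status()

with open(output_path, "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)

print(f"Downloaded {url} to {output_path}")
```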
All of this has implications for academia in general, particularly for the teaching and assessment of students. Its ability to generate short essays on demand on a variety of topics could clearly be used to answer assignment questions. As the answer is not directly copied from one source, it will not be flagged as plagiarism by tools such as Turnitin. Its ability to generate short code snippets from simple prompts could be used on coding assignments. If used blindly by a student, both of these would detrimentally shortcut the student’s learning process. However, it also has the potential to be used as a useful tool in the writing and coding processes. Let’s dive in and see how ChatGPT can be used and misused in academia.
ChatGPT as a scientific writing assistant
To get a feel for ChatGPT’s ability to write short answers to questions related to atmospheric science, let’s ask it a question on a topic close to my own interests – mesoscale convective systems:
ChatGPT does a decent job of writing a suitable first paragraph for an introduction to MCSs. You could take issue with the “either linear or circular in shape” phrase, as they come in all shapes and sizes and this wording implies one or the other. Also, “short-lived”, followed by “a couple of days”, does not really make sense.
Let’s probe its knowledge of MCSs by asking what it can tell us about the stratiform region:

I am not sure where it got the idea of “low-topped” clouds from – this is outright wrong. The repetition of “convective” is not ideal, as it adds no extra information. However, in broad strokes, this is a reasonable description of the stratiform region of MCSs. Finally, here is a condensed version of both responses together, which could reasonably serve as the introduction to a student report on MCSs (after being carefully checked for correctness).

There are no citations – this is a limitation of ChatGPT. A similar language model, Galactica, has been developed to address this and to have a better grasp of scientific material, but it is currently offline. Furthermore, ChatGPT has no knowledge of the underlying physics, beyond the fact that the words it uses are statistically likely to describe an MCS. Therefore, its output cannot be trusted or relied upon to be correct. However, it can produce flowing prose, and could be used to generate an initial draft on a topic.
Following this idea, another way that ChatGPT can be used is to feed it text and ask it to modify or transform it in some way. When I write paper drafts, I normally start by writing a LaTeX bullet-point paper, with the main points set out as ordered bullet points. Could I use ChatGPT to turn this into sensible prose?
Here, it does a great job. I can be pretty sure of the scientific accuracy (at least, any mistakes will be mine!). It correctly keeps the LaTeX syntax where appropriate, and turns the bullet points into fluent prose.
ChatGPT as a coding assistant
One other capability of ChatGPT is its ability to write computer code. Given only a rough description of the kind of code the user wants, ChatGPT will write code to perform specific tasks. For example, I can ask it to perform some basic analysis of meteorological data:
It gets a lot right here: reading the correct data, performing the unit conversion, and labelling the clouds. But there is one subtle bug – if you run this code it will not produce labelled clouds (setting the threshold should be done using precipitation.where(precipitation > threshold, 0)). This illustrates its abilities as well as its shortcomings – it will confidently produce subtly incorrect code. When it works, it is magical. But when it doesn’t, debugging could take far longer than writing the code yourself.
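To make the fix concrete, here is a minimal sketch of the thresholding and cloud-labelling steps, assuming the precipitation field is an xarray DataArray and that scipy’s ndimage is used for the labelling (the data and variable names are illustrative, not taken from ChatGPT’s code):

```python
import numpy as np
import xarray as xr
from scipy import ndimage

# Illustrative precipitation field (mm/hr) on a small lat/lon grid.
precipitation = xr.DataArray(
    np.random.gamma(0.5, 2.0, size=(50, 50)),
    dims=("lat", "lon"),
)
threshold = 1.0  # mm/hr

# The corrected thresholding step: keep values above the threshold
# and set everything else to zero.
masked = precipitation.where(precipitation > threshold, 0)

# Label contiguous regions of above-threshold precipitation as "clouds".
labels, num_clouds = ndimage.label(masked.values > 0)
print(f"Found {num_clouds} clouds")
```

The crucial line is the where call quoted above; everything else here is scaffolding of my own to make the snippet self-contained.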
The final task I tried was seeing whether ChatGPT could manage a programming assignment from an “Introduction to Python” course that I demonstrated on. I used the instructions directly from the course handbook; the only editing was that I stripped out any questions to do with interpretation of the results:

Here, ChatGPT’s performance was almost perfect. This was not an assessed assignment, but ChatGPT would have received close to full marks if it were. It is a simple, well-defined task, but it demonstrates that students may be able to use ChatGPT to complete assignments. There is always the chance that the code it produces will contain bugs, as above, but when it works it is very impressive.
Conclusions
ChatGPT already shows promise at performing mundane tasks and generating useful drafts of text and code. However, its output cannot yet be trusted, and must be checked carefully for errors by someone who understands the material. As such, students who use it to generate text or code are likely to deceive themselves that what they have is suitable, only for it to fail the test when read by an examiner or run through a compiler. For examiners, there may well be tell-tale signs that text or code has been produced by ChatGPT. In its base incarnation, it produces text that seems (to me) slightly generic and may contain give-away factual errors. When producing code, it may well produce (incredibly clean and well-commented!) code that contains structures or uses libraries that have not been specifically taught in the course. Neither of these is definitive proof that ChatGPT has been used. Even if ChatGPT has been used, it may not be a problem: provided its output has been carefully checked, it is a tool that can write fluent English, and it might be useful to students writing in a second language, for example.
Here, I’ve only scratched the surface of ChatGPT’s capabilities and shortcomings. It has an extraordinary grasp of language, but it does not fully understand the meaning behind its words or code, far less the physics of the processes that form MCSs. This can lead it to confidently assert the wrong thing. It also has a poor understanding of numbers, presumably built up through statistical inference from its training data, and it will fail at standard logical problems. It can, however, perform remarkable transformations of its inputs, and generate new lists and starting points for further refinement. It can answer simple questions, and some seemingly complex ones, but can its answers be trusted? For that to be the case, it seems to me that it will need to be coupled to underlying artificial intelligence models of logic, physics, arithmetic, common sense, critical thinking, and more. It is clear to me that ChatGPT and other language models are the start of something incredible, and that they will be used for both good and bad purposes. I am excited, and nervous, to see how they will develop in the coming months and years.