It may soon become common to encounter a tweet, essay or news article and wonder whether it was written by AI software. There could be questions about the authorship of a piece of writing, as in academic settings, or about the veracity of its content, as in a news article.
There could also be questions about authenticity: if a misleading idea suddenly appears in posts across the Internet, is it spreading organically, or have the posts been generated by AI to create the appearance of real traction?
Tools to identify whether a piece of text was written by AI have emerged in recent months, including one by OpenAI, the company behind ChatGPT. OpenAI's tool uses an AI model trained to spot differences between generated and human-written text. When OpenAI tested the tool, it correctly identified AI text in only about half of the writing samples.
Identifying generated text, experts say, is becoming increasingly difficult as software like ChatGPT continues to advance and turns out text that is more convincingly human. OpenAI is now experimenting with a technology that would insert special words into the text ChatGPT generates, making it easier to detect. The technique is known as watermarking.
The watermarking method that OpenAI is exploring is similar to one described in a recent paper by researchers at the University of Maryland, US, said Jan Leike, the head of alignment at OpenAI. If someone tried to remove a watermark by editing the text, they would not know which words to change. And even if they managed to change some of the special words, they would most likely reduce the percentage of special words in the text by only a couple of points.
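A minimal sketch of how such a detector might work, assuming a scheme along the lines of the Maryland paper in which a keyed hash designates a "special" subset of the vocabulary: the detector counts what fraction of a text's words fall in that subset. Ordinary human writing should land near the baseline fraction, while watermarked output, which was steered toward special words, scores noticeably higher. The key name and threshold here are illustrative, not OpenAI's actual implementation.

```python
import hashlib

def is_special(word, key="demo-key", fraction=0.5):
    # A word counts as "special" if a keyed hash of it falls below the
    # threshold; with the key, membership is deterministic and checkable.
    digest = hashlib.sha256((key + word).encode()).digest()
    return digest[0] / 255 < fraction

def watermark_score(text, key="demo-key"):
    # Fraction of the text's words that belong to the special list.
    # Human text should hover near the baseline fraction (~0.5 here);
    # a score well above that suggests a watermark.
    words = text.lower().split()
    return sum(is_special(w, key) for w in words) / len(words)
```

Because membership is computed from a hash rather than a stored list, even a short fragment can be scored, which is consistent with Goldstein's point that a watermark can be detected in a tweet-length text.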
Tom Goldstein of the University of Maryland, US, and co-author of the watermarking paper, said a watermark could be detected even from “a very short text fragment”, such as a tweet. By contrast, OpenAI’s detection tool requires at least 1,000 characters.
Like all approaches to detection, however, watermarking is not perfect, Goldstein said. OpenAI’s current detection tool is trained to identify text generated by 34 different language models, while a watermark detector could only identify text that was produced by a model or chatbot that uses the same list of special words as the detector itself.
That means that unless companies in the AI field agree on a standard watermark implementation, the method could lead to a future where questionable text must be checked against several different watermark detection tools.
“The idea that there’s going to be a magic tool, created either by the vendor or by a third party, that’s going to take away doubt — I don’t think we’re going to have the luxury of living in that world,” said David Cox, a director of the MIT-IBM Watson AI Lab.
Some experts believe that OpenAI and other companies building chatbots should come up with solutions for detection before they release AI products, rather than after. OpenAI launched ChatGPT at the end of November, for example, but did not release its detection tool until about two months later.
By that time, educators and researchers had already been calling for tools to help them identify generated text. Many signed up to use GPTZero, built by a Princeton University student and released in January. “We’ve heard from an overwhelming number of teachers,” said Edward Tian, the student who built GPTZero.
When AI software like ChatGPT writes, it considers many options for each word, taking into account the response it has written so far and the question being asked.
It assigns a score to each option, which quantifies how likely that word is to come next, based on the vast amount of human-written text it has analysed. ChatGPT, which is built on what is known as a large language model, then chooses a word with a high score.
The model’s output is often so sophisticated that it can seem as if the chatbot understands what it is saying — but it does not. Every choice it makes is determined by complex mathematics and huge amounts of data, so much so that it often produces text that is both coherent and accurate. But when ChatGPT says something untrue, it inherently does not realise it.
NYTNS