We’ve been looking at different ways we can improve the site for readers. One suggestion has been to use ChatGPT, or similar, to write some articles. If we can save time writing, then we can cover more topics. In theory, it might be possible. Scientific papers are highly stylised forms of writing, and news items can be formulaic. If ChatGPT could convert from one format to another, it could produce readable blog posts at scale. Reality is more complicated.
We have worked out a workflow that we think improves speed without compromising accuracy. However, if you were to use an AI text-checker like GPTZero, you might not see that. Below is a description of how we’re using AI on the site, and why it might not be traceable.
Of all the problems with AI writing, the first you’re likely to come across is that GPT-3, the basis for most of these tools, has a limited context window: it can read and analyse an abstract but will struggle with a full academic paper. I could ask an AI to read the abstract and produce a blog post from that – but the abstract is not the paper. Things get missed. However, asking an AI to produce a story from an abstract can reframe what the authors think is important into a more readable format, and this can quickly suggest if there’s something striking about the paper. This is how we’re using AI: it writes the first draft of a story.
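As a rough illustration of why the context window matters, here is a back-of-the-envelope sketch in Python. The ~4,000-token limit and the 0.75 words-per-token ratio are approximations I’m assuming for GPT-3-era models, not exact figures, and the word counts are typical guesses rather than measurements:

```python
# Why GPT-3 can read an abstract but not a full paper: a rough estimate.
# All figures below are assumptions for illustration, not exact model specs.

CONTEXT_LIMIT_TOKENS = 4000  # approximate limit for GPT-3-era models


def estimate_tokens(word_count: int) -> int:
    """Estimate token count from a word count (roughly 0.75 words per token)."""
    return round(word_count / 0.75)


abstract_words = 250      # a typical abstract
full_paper_words = 8000   # a typical research paper

print(estimate_tokens(abstract_words))    # comfortably under the limit
print(estimate_tokens(full_paper_words))  # far over the limit
```

By this estimate, an abstract uses a few hundred tokens while a full paper needs more than double the window, which is why the workflow starts from the abstract alone.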
There are several steps between this first draft and the final post, though, because AI, at the moment, has other problems.
Accuracy is an issue. I saw a video of a tech-bro claiming you can use ChatGPT for fact-checking. As his example, he asked when Napoleon invaded Egypt and got an answer he thought sounded plausible. He should have tested it by asking about a ‘fact’ he knew was wrong. The best description I’ve read of ChatGPT is that it’s ‘mansplaining-as-a-service’. Everything that GPT creates needs to be checked line by line.
A common complaint is that GPT is wordy. I’m wordy. That’s why I like the Hemingway editor. Hemingway helps trim out excess words. GPT takes wordiness to another level. In its own words:
As an AI language model, Chat-GPT has been trained on a vast amount of text data, and it has learned to generate responses that are often verbose and wordy. While this can sometimes be useful for conveying detailed information, it can also lead to excessively long and convoluted sentences that may be difficult for humans to follow or comprehend. As such, it is important for Chat-GPT to be programmed to balance clarity and concision in its responses, so that it can communicate effectively with users.
For this reason, I edit each sentence one at a time, which helps with marking what needs fact-checking. After doing this, the post is usually down to 200 to 250 words. Next, I start to increase the length by expanding on the story’s focus. This might mean talking more about the problem tackled or the novel method used in the study. At this stage, I’m also looking for a quote from the paper where the authors highlight what they think is important.
More chunks of the AI’s writing can disappear when I move sections of the story around and rewrite. I do this because what GPT thinks is interesting in the paper might differ from what I believe is the most exciting thing. While GPT might be able to write an article for you, it cannot understand an article for you. It connects words into sentences quickly, but it’s less good at connecting paragraphs into an article.
Another limitation is that, at the moment, GPT is not very good at drawing connections between papers. If five papers on the origins of angiosperms came out in a month, this method would treat each paper in isolation, when one of the most interesting features would be how much the papers support or contradict each other. Where I can, I’ll draw connections to other papers.
Once there’s a fact-checked text, it gets loaded into Grammarly, an AI-powered grammar and spell checker. I’ll accept about half of Grammarly’s suggestions, leaving it to complain about the quote from the paper. I’ll then look for an illustration or two.
If an author has published a press release, there’s probably a good photo to use. If they haven’t, there might be a helpful image in the paper. Sometimes there might not be a usable image. In this case, I’ll look for a photo from Wikipedia or Canva; if that fails, I look for a photo to illustrate a concept in the paper. If that fails, I’ll use Stable Diffusion, an AI art generator, to create artwork because all the posts need at least one image.
Once the text and art are finished and ready to publish, the final step is to produce the audio file. The earlier files were created with Google’s WaveNet; now we use Lovo.ai. We use AI audio because recording a human voice takes a lot of time and editing, and that time can be spent writing the next blog post instead.
From 2023, if a post has an AI seed, it will be credited to Fi Gennu when it’s about a paper from an Annals of Botany Company journal, or to Dale Maylea when it’s from another journal. Tagging the posts this way helps keep track of how much is written by different methods.
The result is that a blog post might not appear to have any AI writing in it, but there will likely be some AI involvement somewhere. The goal is to free up time, so the hours saved can be spent highlighting more research or writing other kinds of articles that readers might find helpful. If you’re interested in the time saved, we estimate this cuts writing time for a typical blog post from 4–6 hours down to 2–4 hours. There is no magic generate-an-accurate-article button yet.