On November 30, 2022, content creation as we know it fundamentally changed with the arrival of ChatGPT. Content marketers, PR pros and students everywhere rejoiced at the prospect of simplifying the complex, critical, yet often elusive skill of researching a topic and writing about it.
While not the first piece of technology to use the generative pre-trained transformer (GPT) model, ChatGPT and its 175 billion parameters were the most advanced natural language processing (NLP) tool the market had ever seen. Fast forward five months, and new tools that leverage the GPT model to create text, images, music, art and more spring up almost daily.
It’s a scary time to be a content creator as flashy new robots seemingly threaten human jobs, but it’s no secret that AI writing tools are inherently flawed. The internet is filled with examples of AI writers going rogue on a routine task and generating output that is covertly—and in some cases, overtly—racist, sexist, ableist and otherwise inappropriate.
It’s easy to recognize and reject obviously toxic content. But it’s much more complicated when those editorial transgressions are more subtle.
The risks of plagiarism and spurious sourcing are amplified when human authors and editors forsake their training and blindly trust the tool’s output. Regardless of how smart AI writing tools seem, they are incapable of original, insightful thought. They regurgitate and reorder information they find or are fed, making them particularly vulnerable to committing these two content-related cardinal sins.
Plagiarism
Plagiarism is probably the most cited fault when AI evangelists talk about where the technology falls short. ChatGPT and other AI writing tools don’t necessarily cut and paste large swaths of content from third-party sources and pass them off as original.
Their brand of plagiarism is stealthier.
They scrape information they encounter, reorder it, reword it (depending on the tool) and stitch it together as original thought.
Frankenstein-ing content in this way is a lesser-known variety of plagiarism but is plagiarism, nonetheless. First, it’s dishonest and can damage your brand reputation. Second, it contradicts Google’s quality content guidelines and can lead to search engine penalties and poor rankings.
Avoiding plagiarism with AI writers
Several AI writing tools on the market claim to have integrated plagiarism checkers in their algorithms, but it’s a lofty goal, and success has been inconsistent. The tools can spot text matching. That’s not the issue. It’s nuanced elements like themes, writing styles and paraphrasing that create a tougher task.
We recommend a two-step process using a third-party plagiarism checker. Whatever output you receive from your AI writer (or your employee, colleague, student, or whomever), make running it through a second tool a standard part of your content review. It may catch problems the first tool missed and give you more confidence before publishing.
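To make that second pass concrete, here is a minimal Python sketch of a naive overlap check. It is illustrative only and rests on stated assumptions: the file names and the 10% threshold are hypothetical placeholders, and it catches only verbatim word 5-grams shared with source texts you already have, so it will miss the paraphrased, Frankenstein-ed plagiarism described above. That gap is exactly why the dedicated checkers below still matter.

```python
# Naive overlap check: flags word 5-grams an AI draft shares with known
# source texts. Illustrative sketch only; commercial checkers also catch
# paraphrase and stylistic borrowing, which this cannot.
import re

def ngrams(text, n=5):
    """Lowercase the text, extract word tokens, return its word n-grams."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(draft, source, n=5):
    """Fraction of the draft's n-grams that also appear in the source."""
    draft_grams = ngrams(draft, n)
    if not draft_grams:
        return 0.0
    return len(draft_grams & ngrams(source, n)) / len(draft_grams)

# Hypothetical usage: compare the AI draft against suspected sources.
draft = open("ai_draft.txt").read()                 # your AI writer's output
for name in ["vendor_blog.txt", "gov_report.txt"]:  # placeholder sources
    if overlap_score(draft, open(name).read()) > 0.10:  # tune the threshold
        print(f"Review needed: high 5-gram overlap with {name}")
```

Five-word n-grams are a common heuristic: long enough that stock phrases rarely trigger false positives, short enough to catch lifted sentences.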
Here are some of the top options available:
- Grammarly: We love this tool and have found its tone checker, thoroughness and seamless integration with web browsers and the Microsoft suite second to none. There’s a reason Grammarly is the market leader in this space. It offers a free version, but the Business level is affordable and far more robust.
- Turnitin: A stalwart of plagiarism checkers and a favorite among teachers and professors, Turnitin has been upholding academic integrity for more than 20 years. Two areas where this platform shines are detecting text originality and concept similarity.
- Quetext: An up-and-comer in the plagiarism detection world, Quetext relies on proprietary DeepSearch technology that uses contextual analysis to determine the statistical likelihood of word and phrase usage. It also identifies “fuzzy” matches in which several words might have been changed to disguise plagiarism.
- GPTZero: There is a ton of buzz around this tool right now. Built in 2022 by a Princeton computer science student, the app purports to detect any text written by GPT-model bots. It relies on content “perplexity” and “burstiness” as its primary detection parameters.
The app makes two assumptions about the content it scans. First, perplexity: if the model is perplexed by the text, the writing is more complex, more nuanced and more likely to have been written by a human. Because AI writers are trained on existing content, text the model finds familiar is more likely to have been AI-generated. The second assumption, burstiness, compares sentence variation. Humans tend to mix long, complex sentences with short, “bursty” ones, whereas AI-written sentences tend to be more uniform.
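Since GPTZero’s actual metrics are proprietary, here is a rough Python sketch of the burstiness intuition only, using invented sample strings. It measures the spread of sentence lengths; real perplexity scoring would additionally require a language model, which is omitted here.

```python
# Rough proxy for "burstiness": the spread of sentence lengths.
# GPTZero's real metrics are proprietary; this only illustrates the idea
# that human prose mixes short and long sentences more than AI prose.
import re
from statistics import stdev

def burstiness(text):
    """Standard deviation of sentence lengths, measured in words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return stdev(lengths) if len(lengths) > 1 else 0.0

human_like = ("It failed. Then, after three tries and one long night of "
              "debugging, the fix finally held. We shipped.")
uniform = "The tool is useful. The tool is fast. The tool is simple."
print(burstiness(human_like))  # wide spread of sentence lengths (~6.9)
print(burstiness(uniform))     # near-uniform sentences score 0.0
```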
The fact is, plagiarism-detecting tools are imperfect. Plagiarized copy may slip through the cracks if the author is persistent enough. But those seeking to use AI writers in nefarious ways should beware: the industry is on to them. ChatGPT developer OpenAI has acknowledged the plagiarism problem and is developing a way to “watermark” GPT-generated copy with a secret signal identifying its origin.
Fact-checking & sourcing
Media companies from Forbes and Bloomberg to The Washington Post and The Associated Press have experimented with using AI writing tools to plug gaps in their news coverage. Traditionally, the “stories” the machines wrote were financial earnings reports, election updates, local events and sports recaps.
Put another way, these behemoths of global media weren’t entrusting AI writers with breaking news or investigative reporting. But they did publish pieces created or conceived largely by robots.
The word “largely” is doing important work here: each media outlet uses humans to validate and legitimize the AI’s work, an extra step that has proven crucial.
CNET and Men’s Journal recently faced public backlash and issued corrections after their AI writing tools produced material that proved inaccurate and plagiarized, which the outlets published without proper fact-checking. Tech giant CNET used an AI writer to publish financial advice, a tricky topic in its own right. Perhaps more alarming, human doctors found “serious errors” in Men’s Journal’s article on low testosterone in men. Let’s all agree that robots shouldn’t be giving unchecked financial and health advice.
Fact-checking your AI writer’s output
Fact-checking is all about sources. It’s about using accurate information to support the claims made in your blog post, article, white paper, homepage, etc. Simply put, fact-checking is the editorial equivalent of showing your work, and it’s where AI writing tools often falter.
As regurgitators of publicly available internet content, AI writers are inherently limited to a relatively small slice of information. Without the ability to verify third-party information, compare data sources or apply reason to the veracity of source material, AI writing can be rife with erroneous facts, figures, quotes, data, etc. Erroneous information is one thing, but when AI bots publish wholly fabricated accounts of a law professor sexually harassing students or deepfake pictures of police misconduct while arresting a former president, we enter a far more chilling reality.
Suppose you believe that AI writing is here to stay and the bulk of future internet content will be conceived or created by generative AI. In that case, you must concede that this generation of AI writers is essentially training the next one. This new reality only underscores the need for thorough fact-checking from the start.
MIT’s Knight Science Journalism Program offers a free and comprehensive set of modules called “Fact-Checking 101” that delve deeply into the who, what, when and how of all things fact-checking. However, most organizations outside of big media don’t have the time or budget to achieve and sustain KSJ’s level of thoroughness.
If that sounds like you, you’re in luck.
We leaned on KSJ, Contently and our own experience to build an 8-step guide for fact-checking when time and/or budget is limited.
- Define the sources your organization deems trustworthy, reliable and reputable for the industry you’re writing about (a simple automated check for this step appears after this list).
- If you don’t have solid, reputable sources to support big claims, don’t make them.
- Rely less on human sources and more on print/digital ones.
- For example: Government reports, academic papers, official websites, court documents, etc., have built-in authority and can be cited with confidence.
- Avoid citing secondary sources.
- For example: Don’t cite a news story that cites a government study. Find the original study and cite it.
- Confirm common errors like titles, degrees, dates and the spelling of names, places, etc.
- Differentiate the writing process from the fact-checking one.
- For example: Take a break between writing and fact-checking. It can be long or short, but try to clear your headspace. It may also help to change the font, switch rooms or print out the piece before you start fact-checking.
- Read like an antagonist.
- For example: Ask yourself how your biggest critic, competitor or corporate lawyer would perceive your writing. Are there leaps in logic? Are there potentially problematic claims or phrases? Have you used shoddy sources? If the answer to any of these questions is yes, identify new sources.
- Lean on fact-checking sites.
- For example: Factcheck.org, PolitiFact, Snopes and Fact Checker are well-respected sites built on debunking fake news and stopping the spread of misinformation. They all lean a bit political, so they may not be useful for all subject matter.
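As promised under step 1, here is a hedged Python sketch of how a trusted-source list can double as an automated pre-check: it extracts the URLs cited in a draft and flags any whose domain is not on your approved list. The domains shown are examples only, and the check says nothing about accuracy or whether a source is primary; it simply catches off-list citations early so human attention can go to the subtler steps above.

```python
# Hedged sketch: treat the step-1 trusted-source list as a domain
# allowlist and flag cited URLs that fall outside it. The domains below
# are examples only; build the list for your own industry.
import re
from urllib.parse import urlparse

APPROVED_DOMAINS = {"census.gov", "nature.com", "sec.gov"}  # hypothetical

def unapproved_sources(draft):
    """Return each cited URL whose host is not on the approved list."""
    flagged = []
    for url in re.findall(r"https?://\S+", draft):
        url = url.rstrip(".,;)")            # drop trailing punctuation
        host = urlparse(url).netloc.lower()
        # Accept subdomains: "www.census.gov" passes for "census.gov".
        if not any(host == d or host.endswith("." + d) for d in APPROVED_DOMAINS):
            flagged.append(url)
    return flagged

draft = ("Growth slowed (https://www.census.gov/data) "
         "according to https://example-blog.net/post.")
print(unapproved_sources(draft))  # ['https://example-blog.net/post']
```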
A final word
The robots won. We lost. Game over for human writers everywhere.
We jest. It is actually hard to imagine a world where generative AI creates all content, everywhere, without human oversight. Our colleague’s recent blog post, 4 hot takes on generative AI tools, perfectly sums up the state of AI writers:
“Generative AI tools can help you come up with more ideas faster, and then refine those ideas to conform to content best practices faster yet,” the post states. “But it still takes a trained eye to cull through AI outputs to determine what works, what needs a little tweaking, and what’s just blatantly wrong.”
AI writing tools can do many things humans cannot, to be sure. But they lack the critical thinking needed to avoid plagiarism and fact-check accurately, and they should not be trusted to do either. Until someone builds a sentient machine, writers, content managers and marketing leaders can rest easy, even if it means tweaking our content creation processes to include AI output.