A reporting-assistant test found ChatGPT weak on sourcing but useful for code

This March 2024 newsroom experiment remains one of the clearest concrete cautionary records about AI in reporting work.

Summary

Jon Keegan tested ChatGPT as a reporting assistant on an East Palestine derailment scenario and found that the tool sounded confident while failing on sourcing discipline, coordinates, and evidentiary traceability. The useful nuance is that the failure was not universal: Keegan also found the model more promising as a coding helper than as a research aid for reporting.

Why It Matters

This story has lasting, direct relevance for journalists because it does not stop at abstract warnings.

It shows what goes wrong when a reporting workflow depends on a model that cannot produce reliable receipts for where it got its information.
It captures the hidden labor cost of forcing a chatbot into a rigorous reporting workflow through repeated prompts and corrections.
It distinguishes between bad and better use cases, warning against source-heavy reporting assistance while preserving bounded uses such as code generation.
It documents how one newsroom translated the test into policy by banning AI-created stories or artwork and requiring disclosure, checking, and tool review.

What the Source Says

Keegan says he spent substantial time trying to use ChatGPT as an assistant for a hypothetical East Palestine reporting workflow and that the process "didn't go so well." He describes the model giving poorly sourced information, imprecise locations, and answers based on vague "general knowledge." He says the sessions were labor-intensive because he had to keep figuring out where the system got its information and redirect it with precise instructions. At the same time, he says the model was more useful when asked to generate simple Python code. The article ends by describing The Markup's updated AI policy: no publishing of AI-created stories or artwork, mandatory labeling or disclosure, rigorous checking, and case-by-case review of security, privacy, and ethics.

Attribution

Sources