Summary

This September 28, 2024 paper argues that trustworthy AI in journalism depends less on model hype than on data quality. It frames AI-driven journalism as a set of systems for gathering, verifying, producing, and distributing news, then argues that these systems inherit errors, bias, and opacity when the underlying data pipeline is weak.

Why It Matters

This record is technically useful for journalism because it gives newsrooms a stronger operational question than "Which model should we use?", namely whether the data behind a system can be trusted.

  • if a newsroom is building classifiers, recommenders, verification tools, or automated newsgathering systems, the paper says the data collection and preprocessing stages deserve first-order scrutiny
  • it connects journalistic ethics to dataset design by centering accuracy, fairness, and transparency
  • it is useful for teams moving beyond casual chatbot use into custom machine-learning workflows where labeling, validation, and source quality materially shape output
  • it offers a cleaner explanation for why newsroom AI systems can fail even when the model itself looks impressive

What the Source Says

The paper says AI-driven journalism spans gathering, verifying, producing, and distributing news information. It argues that scholars have focused heavily on embedding journalistic values into models while paying less attention to data quality, even though accuracy and efficiency in any machine-learning task depend on high-quality data. The authors propose an Accuracy-Fairness-Transparency framework that can be used to assess existing datasets or to guide data collection, cleaning, augmentation, and labeling during newsroom machine-learning development. The paper also explicitly warns that incomplete or biased datasets can produce discriminatory or inaccurate results and undermine trust.
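To make the idea of assessing an existing dataset along these three dimensions more concrete, here is a minimal Python sketch. It is not the authors' method: the field names (label_verified, region, source, collected_on), the function audit_dataset, and the three proxy measures are all assumptions chosen for illustration, and a real newsroom audit would need far richer checks.

```python
from collections import Counter

# Hypothetical provenance fields; these names do not come from the paper.
PROVENANCE_FIELDS = ("source", "collected_on")


def audit_dataset(records, group_field="region"):
    """Return rough accuracy, fairness, and transparency indicators."""
    if not records:
        raise ValueError("records must not be empty")
    total = len(records)

    # Accuracy proxy: share of records whose label a human has verified.
    verified = sum(1 for r in records if r.get("label_verified") is True)

    # Fairness proxy: how evenly groups are represented (smallest / largest).
    groups = Counter(r.get(group_field, "unknown") for r in records)
    balance = min(groups.values()) / max(groups.values())

    # Transparency proxy: share of records with every provenance field filled.
    documented = sum(
        1 for r in records if all(r.get(f) for f in PROVENANCE_FIELDS)
    )

    return {
        "verified_share": verified / total,
        "group_balance": balance,
        "documented_share": documented / total,
        "group_counts": dict(groups),
    }


# Illustrative records only; not real data.
sample = [
    {"label_verified": True, "region": "north", "source": "wire", "collected_on": "2024-05-01"},
    {"label_verified": True, "region": "north", "source": "wire", "collected_on": "2024-05-02"},
    {"label_verified": False, "region": "south", "source": "", "collected_on": "2024-05-03"},
    {"label_verified": True, "region": "north", "source": "local", "collected_on": None},
]
report = audit_dataset(sample)
print(report)
```

In this sketch a low group_balance or documented_share would flag a dataset for review before any model is trained on it. The thresholds a team applies to these numbers are a policy decision the sketch deliberately leaves open.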