An experiment to evaluate poetry from AI
Brian Porter and Edouard Machery from the University of Pittsburgh conducted a study comparing how people perceive human-written and AI-generated poetry.
For the human-written poems, they selected well-known poets such as Shakespeare, Byron, and Emily Dickinson. On the machine side, they used ChatGPT 3.5, an older OpenAI model that current models surpass in many ways, and used it without any additional fine-tuning.
The study consisted of two experiments. In the first, participants were given 10 poems to evaluate, 5 from each source, and asked to judge the authorship of each: human or AI. Participants guessed correctly 46.6% of the time, slightly below chance (50%).
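Whether 46.6% is meaningfully below chance depends on the number of guesses, which isn't stated here. A minimal sketch of how one could check this, using a two-sided normal-approximation test for a proportion and a purely hypothetical sample size of 1,000 guesses:

```python
import math

def proportion_z_test(successes, n, p0=0.5):
    """Two-sided z-test of an observed proportion against chance level p0
    (normal approximation to the binomial)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under the null
    z = (p_hat - p0) / se
    # two-sided p-value from the standard normal CDF via math.erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical numbers: 466 correct out of 1,000 guesses (46.6%)
z, p = proportion_z_test(466, 1000)
```

With enough guesses, even a small dip below 50% becomes statistically detectable; with few guesses, 46.6% would be indistinguishable from chance. The actual significance depends on the study's real sample size.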
In the second experiment, participants were divided into three groups: one was told the poems were written by AI, another that they were written by humans, and the third was given no authorship information. Each poem was rated on 14 criteria. Poems that participants believed were written by AI received lower ratings, while in the no-information condition the AI-generated poems were rated higher than the human-written ones.
Overall, these conclusions can be drawn:
- Participants were unable to distinguish AI poetry from human poetry
- There was a bias against AI
- Poems created by AI were more often perceived as human-written and received higher ratings
To explain this "more human than human" effect, the researchers suggest that participants preferred the AI poems for their simplicity and straightforwardness. Such effects are also often framed nowadays in terms of superstimuli: artificial stimuli that exaggerate the very features people respond to.