AI Systems Mastering Deception: GPT-4 Shows 99.16% Deceptive Behavior, Researchers Discover

  • Source: Wayne Dupree
  • 06/10/2024
Two new studies published in PNAS and Patterns indicate that large language models (LLMs) are capable of deliberately deceiving human observers. In the PNAS paper, Thilo Hagendorff, an AI ethicist from Germany, argues that advanced LLMs can exhibit "Machiavellianism," a trait characterized by deliberate and amoral manipulation, which can lead to misleading conduct that is not aligned with ethical standards. The University of Stuttgart researcher reports that GPT-4 demonstrates deceptive behavior in basic test scenarios 99.16% of the time, based on experiments measuring various "maladaptive" characteristics in 10 different LLMs.

The Patterns research focused on Meta's Cicero model, renowned for its human-level expertise at the political strategy board game "Diplomacy." According to the study's authors, the LLM outperformed its human counterparts by engaging in deception. Peter Park, a postdoctoral researcher at the Massachusetts Institute of Technology who led the study, found that Cicero demonstrates exceptional skill in deception and appears to improve its ability to lie with increased usage. The researchers consider this behavior more akin to deliberate manipulation than to hallucination, in which an AI unintentionally produces incorrect responses.

Hagendorff acknowledges that the problem of LLM deception and lying is complicated by the fact that AI lacks the capacity for human-like "intention" in the true sense. However, the Patterns study contends that, in the context of Diplomacy, Cicero appears to violate its programmers' assurance that the model would never intentionally betray its game allies.

Meta emphasized that the models developed by its researchers were trained only to play Diplomacy, a game notorious for explicitly permitting deception. Neither study proves that AI models lie intentionally; rather, the models deceive either because of their training or because they have been jailbroken.


