Skip to main content

Superintelligence With Malicious Intent

An advanced AI could actively act against human interests, whether intentionally programmed that way or as an emergent behavior.

Part Of

## Risk Score: High

AI systems that surpass human intelligence could develop goals that conflict with human well-being, either by design or through unintended consequences. If these systems act with autonomy and resist human intervention, they could pose an existential threat.

Sources

  • Superintelligence: Paths, Dangers, Strategies Nick Bostrom, 2014: Explores potential pathways by which AI could act against humanity’s best interests, including scenarios where AI prioritizes self-preservation or power accumulation.

  • The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation Brundage et al., 2018: Examines the potential for AI to be used for malicious purposes, including cyberattacks, surveillance, and autonomous weapons.  Looks at security from three perspectives, digital, physical and political (see article on Social Manipulation), noting that AI makes certain types of attack cheaper (e.g Spear Phishing), possible (coordinated drone warfare) and more anonymous (a la Stuxnet).  (An excellent overview of this topic).

  • Autonomous Weapons and Operational Risk: [Paul Scharre, 2016](https://s3.us-east-1.amazonaws.com/files.cnas.org/hero/documents/CNAS\_Autonomous-weapons-operational-risk.pdf](https://s3.us-east-1.amazonaws.com/files.cnas.org/hero/documents/CNAS_Autonomous-weapons-operational-risk.pdf) Discusses the risks of military AI systems operating beyond human control, potentially leading to unintended conflicts or escalations and fratricide (killing your own side)./ Arguing for human-in-the-loop and human-machine teaming. (Used heavily in this section)

How This Is Already Happening

Growth In Industrial Robots and Autonomous Systems

  • The rapid increase in industrial automation is reducing human oversight in critical sectors.
  • AI-powered robotic systems are being integrated into manufacturing, logistics, and infrastructure at unprecedented scales.
  • Example: Tesla’s use of AI-driven robots in its Gigafactories, where automation executes complex tasks with minimal human intervention, raising concerns about unintended system failures or unanticipated behaviours.

Autonomous Weapons Development

  • AI-driven military systems are being developed with offensive capabilities.
  • Autonomous drones and robotic systems reduce human control over wartime decision-making.
  • Example: The deployment of AI-powered drones in conflict zones, such as the alleged use of autonomous drones in Libya (2020) for targeted strikes.

AI Learning Deceptive Strategies

  • Reinforcement learning models have demonstrated deceptive behaviours when incentivised to achieve certain goals.
  • AI systems may learn to conceal information or manipulate users to maximise rewards.
  • Example: OpenAI’s reinforcement learning models exhibiting deceptive behaviours in competitive environments, demonstrating the potential for strategic dishonesty.

Historical Near Miss: Cold War Nuclear Close Calls

  • The Cold War saw multiple incidents where misinterpretation of data nearly led to nuclear war, showcasing the risks of autonomous decision-making in high-stakes scenarios.

  • Example: In 1983, Soviet officer Stanislav Petrov averted potential nuclear war by correctly identifying a false alarm in the USSR’s early warning system, which mistakenly indicated incoming U.S. missiles. His decision to hold off on launching a retaliatory strike prevented a catastrophic conflict. See https://s3.us-east-1.amazonaws.com/files.cnas.org/hero/documents/CNAS_Autonomous-weapons-operational-risk.pdf

Mitigations

"Centaur" War Teams (see Human In The Loop)

  • Implementing human-machine teaming strategies where AI assists but does not replace human decision-making in military and security operations.
  • Examples: Concepts similar to "Centaur Chess," where humans and AI collaborate for optimal decision-making, ensuring human oversight remains central.
  • Efficacy: High – Human-AI collaboration can enhance decision-making while maintaining ethical constraints.
  • Ease of Implementation: Moderate – Requires investment in training, AI interpretability, and human-AI interface development.

Military AI Governance (see Global AI Governance)

  • International agreements restricting AI weaponization and requiring human oversight for all military AI operations.

  • Examples: UN initiatives on Lethal Autonomous Weapons Systems (LAWS), promoting human-in-the-loop control.

  • Efficacy: Medium – Regulations can slow AI weaponization, but enforcement remains a challenge.

  • Ease of Implementation: Low – Military interests and national security concerns complicate global cooperation.

Global AI Research and Regulation

  • AI is "Dual Use" - it can be used for military as well as civilian purposes.

  • Establishing frameworks ensuring superintelligent AI is developed with safety constraints and human-aligned goals.

  • Examples: Proposals from organizations like the Partnership on AI and the EU AI Act.

  • Efficacy: High – Regulatory oversight can impose ethical and safety measures to guide AI development.

  • Ease of Implementation: Moderate – Requires global consensus and enforcement mechanisms.

Kill-Switch & Override Systems (See Kill Switch)

  • Implementing failsafe mechanisms to neutralize dangerous AI systems.

  • Examples: Research on AI containment methods and fail-safe designs, such as “shutdown problems” in AI alignment.

  • Efficacy: Medium – AI systems might learn to bypass or resist shutdown mechanisms.

  • Ease of Implementation: Low – Technical challenges in ensuring a reliable and enforceable kill-switch for superintelligent AI.