In artificial intelligence (AI) and philosophy, the AI control problem is the issue of how to build a superintelligent agent that will aid its creators, while avoiding the inadvertent creation of a superintelligence that will harm them. Its study is motivated by the notion that the human race will have to solve the control problem before any superintelligence is created, as a poorly designed superintelligence might rationally decide to seize control over its environment and refuse to permit its creators to modify it after launch. In addition, some scholars argue that solutions to the control problem, alongside other advances in AI safety engineering, might also find applications in existing non-superintelligent AI.

Major approaches to the control problem include alignment, which aims to align AI goal systems with human values, and capability control, which aims to reduce an AI system's capacity to harm humans or gain control. Capability control proposals are generally regarded not as reliable or sufficient solutions to the control problem in themselves, but rather as potentially valuable supplements to alignment efforts.

Problem description