A lack of introspective ability is not a lack of corrigibility
This article argues that a lack of introspective ability in AI does not equate to a lack of corrigibility. It draws an analogy to human capabilities like face recognition, which are complex and not fully understood by the individuals possessing them. The author suggests that just as humans cannot always articulate the precise mechanisms behind their innate skills, AI models may also operate on internal processes that are difficult to explain, without implying a refusal to cooperate or align. AI
IMPACT Argues that AI's internal complexity, like human cognition, doesn't preclude alignment, impacting how we assess AI safety.