The Case for Psychometric Artificial General Intelligence

A critical review of AGI benchmarks and tests, proposing psychometric approaches for measuring general intelligence in AI systems.

1. Table of Contents

2. Introduction

The paper "The Case for Psychometric Artificial General Intelligence" by Mark McPherson (Bournemouth University, 2020) critically reviews existing benchmarks and tests for measuring Artificial General Intelligence (AGI). The author argues that current AI systems, despite achieving superhuman performance in narrow domains like Go, StarCraft, and medical diagnosis, lack the adaptability and generalization capabilities of human intelligence. The core thesis is that psychometric approaches, particularly the Abstraction and Reasoning Corpus (ARC) proposed by Chollet, offer the most promising path for detecting and measuring AGI.

3. Core Insight: The Psychometric Paradigm Shift

The fundamental insight of this paper is that measuring AGI requires a paradigm shift from task-specific benchmarks to psychometric frameworks that assess general cognitive abilities. The author argues that traditional AI benchmarks (e.g., game-playing, image classification) are insufficient because they measure narrow, domain-specific performance rather than general intelligence. The psychometric approach, inspired by human intelligence testing, focuses on measuring the ability to solve novel problems across diverse domains without task-specific training.

4. Logical Flow: From Narrow AI to General Intelligence

The paper follows a clear logical progression:

  1. Problem Identification: Current AI systems are narrow and brittle, failing when environments deviate slightly from training conditions.
  2. Definition of AGI: General intelligence is defined as the ability to perform tasks across numerous domains, including those unknown at creation time.
  3. Review of Existing Tests: The author evaluates six proposed tests by Mikhaylovskiy (Explanation, Problem-Setting, Refutation, New Phenomenon Prediction, Business Creation, Theory Creation) and Chollet's ARC benchmark.
  4. Critical Evaluation: Each test is assessed against criteria including generality, objectivity, scalability, and resistance to gaming.
  5. Recommendation: Psychometric approaches, particularly ARC, are identified as the most promising direction.

5. Strengths & Flaws: Critical Evaluation of AGI Tests

5.1 Strengths of Psychometric Approaches

5.2 Flaws and Limitations

6. Actionable Insights: Future Directions

Based on the analysis, the paper suggests several actionable directions:

7. Technical Details and Mathematical Formulation

The psychometric approach to AGI measurement can be formalized using Item Response Theory (IRT). Let $\theta$ represent the latent general intelligence of an agent. The probability of correctly solving task $i$ with difficulty $b_i$ and discrimination $a_i$ is given by the two-parameter logistic (2PL) model, where $X_i = 1$ denotes a correct response:

$$P(X_i = 1 | \theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}$$
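As a minimal sketch of how the logistic model above could be used in practice, the following Python snippet evaluates the response probability and estimates $\theta$ from a pattern of solved and failed tasks by a coarse grid search over the log-likelihood. The item parameters, the function names, and the grid bounds are illustrative assumptions and do not come from the paper.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the logistic model:
    1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of binary responses (1 = solved, 0 = failed)."""
    ll = 0.0
    for (a, b), x in zip(items, responses):
        p = p_correct(theta, a, b)
        ll += math.log(p) if x else math.log(1.0 - p)
    return ll

def estimate_theta(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of theta on [lo, hi].
    A coarse illustration, not a production estimator."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))

# Hypothetical item bank: (discrimination a_i, difficulty b_i).
items = [(1.0, -1.0), (1.5, 0.0), (2.0, 1.0)]

# An agent that solves the two easier tasks but fails the hardest one.
theta_hat = estimate_theta(items, [1, 1, 0])
print(f"estimated theta: {theta_hat:.2f}")
```

Note that a response pattern with every task solved (or every task failed) has no finite maximum-likelihood estimate; the grid search then simply returns the boundary of the search interval.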

For the ARC benchmark, each task consists of a few input-output grid pairs whose cells hold integer color codes. The agent must infer the underlying transformation $f: \mathbb{Z}^{m \times n} \rightarrow \mathbb{Z}^{p \times q}$ from these examples and apply it to a new input; an answer counts as correct only if the predicted grid matches the expected grid exactly. In this formalization, the performance metric is accuracy on held-out tasks, weighted by task difficulty.
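The difficulty-weighted accuracy mentioned above could be computed along the following lines. This is a sketch under stated assumptions: the weights are hypothetical difficulty scores chosen for illustration, and the helper names `exact_match` and `weighted_accuracy` are not taken from the paper or from any ARC tooling.

```python
Grid = list[list[int]]  # a grid of integer color codes

def exact_match(predicted: Grid, expected: Grid) -> bool:
    """A task counts as solved only if every cell matches."""
    return predicted == expected

def weighted_accuracy(results: list[tuple[bool, float]]) -> float:
    """results holds (solved, weight) per held-out task.
    Returns the weight of solved tasks divided by the total weight."""
    total = sum(weight for _, weight in results)
    if total == 0:
        return 0.0
    return sum(weight for solved, weight in results if solved) / total

# Three hypothetical held-out tasks with difficulty weights 1, 2 and 3.
results = [
    (exact_match([[1, 0]], [[1, 0]]), 1.0),        # solved
    (exact_match([[0, 2]], [[0, 2]]), 2.0),        # solved
    (exact_match([[3, 3]], [[3, 0]]), 3.0),        # one wrong cell: failed
]
print(weighted_accuracy(results))  # 0.5
```

With all weights equal, this reduces to the plain fraction of tasks solved.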

8. Experimental Results and Benchmark Analysis

The paper does not present original experiments but reviews existing results. Key findings from the literature include:

Figure 1: A hypothetical bar chart comparing human vs. AI performance on ARC tasks across difficulty levels (easy, medium, hard). Humans consistently outperform AI, with the gap widening on harder tasks.

9. Analytical Framework: Case Study of ARC

To illustrate the psychometric approach, consider an ARC task where the input is a 3x3 grid with colored cells, and the output is a 3x3 grid with a different pattern. The agent must infer the rule (e.g., "rotate the pattern 90 degrees clockwise") from two examples and apply it to a third input.

Example Task:

This task requires the agent to recognize the transformation rule (a flip along the anti-diagonal, the line running from the top-right cell to the bottom-left cell) and apply it to a new pattern. The psychometric value lies in the fact that the rule is abstract and not tied to any specific domain.
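One way to picture the inference step is as a search over candidate rules, keeping only those that reproduce every training example. The sketch below assumes square grids and a tiny hand-picked set of candidate transformations; the grids and the candidate set are hypothetical stand-ins, not the task from the paper or a general ARC solver.

```python
Grid = list[list[int]]

def rotate_cw(grid: Grid) -> Grid:
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def flip_anti_diagonal(grid: Grid) -> Grid:
    """Reflect across the anti-diagonal (top-right to bottom-left).
    Assumes a square grid."""
    n = len(grid)
    return [[grid[n - 1 - j][n - 1 - i] for j in range(n)] for i in range(n)]

def transpose(grid: Grid) -> Grid:
    """Reflect across the main diagonal."""
    return [list(row) for row in zip(*grid)]

def flip_horizontal(grid: Grid) -> Grid:
    """Mirror each row left to right."""
    return [row[::-1] for row in grid]

# Hypothetical candidate rule set; a real solver would need a far richer one.
CANDIDATES = {
    "rotate_cw": rotate_cw,
    "flip_anti_diagonal": flip_anti_diagonal,
    "transpose": transpose,
    "flip_horizontal": flip_horizontal,
}

def consistent_rules(train_pairs: list[tuple[Grid, Grid]]) -> list[str]:
    """Names of all candidate rules that reproduce every training pair."""
    return [name for name, rule in CANDIDATES.items()
            if all(rule(x) == y for x, y in train_pairs)]

# Two illustrative training pairs, followed by a new input.
train_pairs = [
    ([[1, 0, 0], [0, 0, 0], [0, 0, 0]], [[0, 0, 0], [0, 0, 0], [0, 0, 1]]),
    ([[0, 2, 0], [0, 0, 0], [0, 0, 0]], [[0, 0, 0], [0, 0, 2], [0, 0, 0]]),
]
test_input = [[3, 0, 0], [3, 0, 0], [0, 0, 0]]

rules = consistent_rules(train_pairs)
# Only commit to a prediction when exactly one rule survives.
prediction = CANDIDATES[rules[0]](test_input) if len(rules) == 1 else None
print(rules, prediction)
```

A 90-degree clockwise rotation and an anti-diagonal flip are different transformations, so the two training pairs are enough to separate them here. With fewer or less informative examples, several rules may remain consistent and the sketch returns no prediction.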

10. Future Applications and Outlook

The psychometric approach to AGI has several promising applications:

Future directions include integrating psychometric benchmarks with reinforcement learning environments, developing dynamic tests that adapt to the agent's ability level, and creating multimodal benchmarks that assess reasoning across sensory modalities.
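A dynamic test that adapts to the agent's ability level could borrow from computerized adaptive testing, where the next item is the one carrying the most Fisher information at the current ability estimate. The sketch below assumes the logistic IRT model from the technical section, for which the item information is $a_i^2 P_i (1 - P_i)$. The item bank and function names are hypothetical, and the paper does not prescribe this selection rule.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """Logistic IRT probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a logistic item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta: float, items, administered: set):
    """Index of the unused item that is most informative at theta,
    or None when the bank is exhausted."""
    remaining = [i for i in range(len(items)) if i not in administered]
    if not remaining:
        return None
    return max(remaining, key=lambda i: item_information(theta, *items[i]))

# Hypothetical item bank: (discrimination a_i, difficulty b_i).
bank = [(1.0, -2.0), (1.0, 0.0), (1.0, 2.0)]

first = next_item(0.0, bank, set())      # item whose difficulty is nearest 0
second = next_item(1.9, bank, {first})   # after the estimate moves upward
print(first, second)
```

With equal discriminations, the rule picks the item whose difficulty is closest to the current estimate, so the test tracks the agent instead of spending items that are far too easy or far too hard.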

11. Original Analysis and Commentary

The paper makes a compelling case for psychometric approaches to AGI, but several critical points deserve scrutiny.

First, the reliance on human-like intelligence as the gold standard is philosophically questionable. As argued by Bostrom (2014) in "Superintelligence," AGI may exhibit forms of intelligence that are qualitatively different from human cognition, making anthropocentric benchmarks potentially misleading.

Second, the ARC benchmark, while elegant, may be too narrow. As noted by Lake et al. (2017) in "Building Machines That Learn and Think Like People," human intelligence involves not just abstract reasoning but also intuitive physics, social cognition, and language understanding. A truly general intelligence benchmark should encompass these dimensions.

Third, the paper overlooks the potential of adversarial testing. Work on adversarial examples shows that inputs constructed to fool a model can reveal fundamental weaknesses in AI systems that standard benchmarks miss; Goodfellow et al. (2014) introduced a related adversarial setup in the original GAN paper, in which a generator is trained against a discriminator. Incorporating adversarial elements into psychometric tests could provide a more robust assessment of generalization.

Finally, the paper's focus on measurement rather than architecture is a strength, but it risks ignoring the question of how to build AGI. As Yudkowsky (2008) argues, the alignment problem requires understanding the internal mechanisms of AI systems, not just their external behavior.

Despite these limitations, the paper provides a valuable framework for thinking about AGI evaluation and rightly emphasizes the need for rigorous, psychometrically valid benchmarks.

12. References

  1. McCarthy, J., et al. (1955). A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence.
  2. Silver, D., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489.
  3. Vinyals, O., et al. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782), 350-354.
  4. Krizhevsky, A., et al. (2012). ImageNet classification with deep convolutional neural networks. NeurIPS.
  5. Vaswani, A., et al. (2017). Attention is all you need. NeurIPS.
  6. Esteva, A., et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639), 115-118.
  7. Marcus, G. (2018). Deep learning: A critical appraisal. arXiv:1801.00631.
  8. Searle, J. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3), 417-424.
  9. Thomson, W. (1889). Popular Lectures and Addresses.
  10. Adams, S., et al. (2012). Mapping the landscape of human-level artificial general intelligence. AI Magazine, 33(1), 25-42.
  11. Goertzel, B. (2014). Artificial general intelligence: Concept, state of the art, and future prospects. Journal of Artificial General Intelligence, 5(1), 1-48.
  12. Bringsjord, S., & Schimanski, B. (2003). What is artificial intelligence? Psychometric AI as an answer. IJCAI.
  13. Mikhaylovskiy, N. (2020). Six tests for artificial general intelligence. arXiv:2005.05718.
  14. Chollet, F. (2019). On the measure of intelligence. arXiv:1911.01547.
  15. Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
  16. Lake, B. M., et al. (2017). Building machines that learn and think like people. Behavioral and Brain Sciences, 40, e253.
  17. Goodfellow, I., et al. (2014). Generative adversarial nets. NeurIPS.
  18. Yudkowsky, E. (2008). Artificial intelligence as a positive and negative factor in global risk. In Global Catastrophic Risks, Oxford University Press.