AI with Aish

The AGI Staircase

Aishwarya Srinivasan
Sep 10, 2025

The discourse around Artificial General Intelligence (AGI) often feels speculative, mired in philosophical debates about sentience and consciousness. For the applied AI engineer, a more pragmatic, architectural framework is needed to understand the current state of the art and the roadmap ahead. AGI is not a monolithic endpoint to be reached with a single breakthrough; it is an incremental staircase of increasing cognitive complexity, autonomy, and system integration. Each step presents a new class of engineering challenges, from model-level fine-tuning to the architectural design of distributed, stateful systems.

This taxonomy, reportedly from OpenAI's internal discussions, breaks down the path to AGI into five distinct levels from a systems engineering perspective, offering a lens through which we can contextualize today's models and the monumental tasks that lie ahead.

Level 1: Foundational Models and Conversational Systems

This is the foundational stage, defined by a system's ability to engage in natural, human-like conversations. While the base models themselves, like GPT-4o, Claude 3, and Gemini 1.5 Pro, are Level 1 components, the conversational products you use daily are sophisticated, integrated systems.

These products are built on a complex backend pipeline that elevates the base model's capabilities. LangChain and LlamaIndex are popular frameworks used to architect these systems, incorporating key components such as:

  • Prompt Engineering: Guiding the base model's behavior with a hidden system prompt.

  • Memory and Context Management: Handling the conversation history to create the illusion of long-term memory.

  • Tool Use and Function Calling: This is the critical feature that turns a Level 1 language model into a component of a more powerful system. For example, when you ask ChatGPT to browse the web, the system uses a web browser tool. A request for a Python script triggers a call to its integrated code interpreter. Similarly, asking for an image calls an external model like DALL-E 3.
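The tool-use pattern above can be sketched in a few lines. This is an illustrative stand-in, not the actual ChatGPT backend: the `TOOLS` registry, `dispatch` function, and tool names are hypothetical. Real function-calling APIs (e.g., OpenAI's) pass JSON schemas to the model and parse a structured tool-call response; this sketch shows only the dispatch step that runs once the model has chosen a tool.

```python
import json

# Hypothetical tool registry: maps tool names to plain Python callables.
# In a production system, "run_python" would route to a sandboxed
# interpreter and "browse" to a real web-fetching tool.
TOOLS = {
    "browse": lambda url: f"<contents of {url}>",
    "run_python": lambda code: str(eval(code)),  # stand-in for a sandboxed interpreter
}

def dispatch(tool_call_json: str) -> str:
    """Parse a model-emitted tool call and execute the matching tool."""
    call = json.loads(tool_call_json)
    tool = TOOLS[call["name"]]
    return tool(call["argument"])

# Example: the model emits a tool call instead of answering directly.
result = dispatch('{"name": "run_python", "argument": "2 + 2"}')
```

The key design point is that the language model never executes anything itself; it only emits a structured request, and the surrounding system decides whether and how to run it.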


Level 2: Reasoners

Stepping beyond simple conversation, Level 2 models are defined by their ability to perform complex, expert-level problem-solving. This is where the engineering task shifts to enabling multi-step reasoning and logical inference. These systems often use advanced prompting techniques like Chain-of-Thought (CoT) or Tree-of-Thought (ToT) to break down a problem into a series of logical steps, allowing them to tackle complex tasks.
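In its simplest zero-shot form, Chain-of-Thought prompting is just a prompt template that cues the model to reason step by step before answering. A minimal sketch, assuming a hypothetical `cot_prompt` helper (the helper name and wording are illustrative, not from any specific library):

```python
def cot_prompt(question: str) -> str:
    """Wrap a question in a zero-shot Chain-of-Thought template.

    The trailing cue "Let's think step by step." is the classic
    zero-shot CoT trigger; few-shot CoT would instead prepend worked
    examples with explicit reasoning traces.
    """
    return f"Q: {question}\nA: Let's think step by step."

prompt = cot_prompt("A train travels 60 km in 45 minutes. What is its speed in km/h?")
```

Tree-of-Thought generalizes this by sampling several candidate reasoning branches at each step and searching over them, rather than committing to a single linear chain.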

Real-world examples of Level 2 capabilities:

  • AlphaCode: A system developed by DeepMind that can solve competitive programming problems at a level comparable to an average human competitor.

  • AlphaDev: Another DeepMind project that discovered a more efficient sorting algorithm for the C++ standard library, outperforming human-tuned algorithms.

  • Advanced Data Analysis (ADA): The former "Code Interpreter" feature in ChatGPT, which allows the model to write and execute Python code in a sandboxed environment to perform data analysis and solve mathematical problems.


Level 3: Agents

At Level 3, the AI system gains true autonomy and can act on its own behalf to achieve a high-level goal. The core engineering challenge is to build a stateful, persistent AI that can manage its own workflow and adapt to changing circumstances. This involves a feedback loop where the agent plans, acts, observes the results, and refines its plan.
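The plan-act-observe-refine loop described above can be sketched as follows. This is a toy skeleton of the pattern popularized by Auto-GPT-style agents, not any framework's actual API: `run_agent` and its `plan`/`act` callbacks are hypothetical, and a real agent would call an LLM inside both.

```python
def run_agent(goal, plan, act, max_steps=10):
    """Minimal autonomous-agent loop: plan a step, act, observe, repeat.

    plan(state) -> next action, or None when the goal is judged complete.
    act(action) -> observation string recorded back into the state.
    max_steps bounds the loop so a confused planner cannot run forever.
    """
    state = {"goal": goal, "history": []}
    for _ in range(max_steps):
        step = plan(state)          # plan: decide the next action from current state
        if step is None:            # planner signals the goal is met
            break
        observation = act(step)     # act: execute the action (e.g., call a tool)
        state["history"].append((step, observation))  # observe: feed result back
    return state

# Toy usage: a "planner" that counts to 3, one step at a time.
state = run_agent(
    goal="count to 3",
    plan=lambda s: len(s["history"]) + 1 if len(s["history"]) < 3 else None,
    act=lambda n: f"said {n}",
)
```

The statefulness lives entirely in `state["history"]`: each iteration's plan is conditioned on everything observed so far, which is what lets the agent adapt mid-task rather than executing a fixed script.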

Real-world examples of agent frameworks and projects:

  • Auto-GPT and BabyAGI: Early, proof-of-concept open-source projects that demonstrated the core loop of an autonomous agent: a language model generates tasks, a planning module orders them, and an execution engine carries them out.

  • Microsoft AutoGen: A more robust framework that allows developers to build systems with multiple agents that can converse with each other to solve a problem.

  • LangChain and LlamaIndex: While also used for Level 1 systems, their core agentic capabilities (e.g., using a ReAct agent or conversational agent template) are designed to build Level 3 systems.
