AI with Aish

Context Engineering for LLM Apps: Prompts, System Messages, Tools, and Memory

Aishwarya Srinivasan
Jan 26, 2026

When ChatGPT launched in late 2022, a new discipline emerged: prompt engineering. The ability to craft the perfect input to coax the right output from an AI became one of tech’s most sought-after skills, with some roles commanding salaries over $300,000.

The techniques were genuinely useful. Zero-shot prompting gave direct instructions. Few-shot prompting provided examples for the model to learn from. Chain-of-thought prompting—asking the model to “think step by step”—boosted arithmetic accuracy from 18% to over 78% in research studies.

Prompt engineering felt accessible and almost magical. You could “program in prose,” crafting clever one-liners that made AI do impressive things. Communities formed around sharing “secret prompts” like recipes.

But as developers moved from demos to production, cracks appeared. Single prompts couldn’t handle real-world complexity. A customer service bot needs customer history, inventory data, policies, and conversation context—no single prompt can contain all that. Developers discovered something counterintuitive: the information you provide often matters more than how you phrase your request. A mediocre prompt with the right context outperformed a brilliant prompt without it.

The Term Is Coined

On June 19, 2025, Shopify CEO Tobi Lütke tweeted what many developers had been experiencing: that “context engineering” describes the core skill better than “prompt engineering.”

Six days later, Andrej Karpathy—former OpenAI researcher and Tesla AI Director—amplified the message, endorsing the term.

The term spread rapidly because it accurately described what experienced developers had already been doing.

So what is Context Engineering?

Context engineering is designing, managing, and optimizing all information that influences an AI model’s behavior—not just the prompt, but everything the model sees before generating a response.

[Figure: overlapping aspects of context engineering. Source: https://www.promptingguide.ai/guides/context-engineering-guide]

Karpathy offers a useful mental model: think of an LLM like a CPU, and its context window as RAM. Your job is like an operating system—loading just the right code and data into working memory for each task.

This context comes from multiple sources: system messages defining persona and constraints, user queries, retrieved documents (RAG), tool outputs from APIs, conversation history, and persistent memory across sessions. Context engineering orchestrates all these pieces into what the model actually sees.
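To make the orchestration concrete, here is a minimal sketch of runtime context assembly. The `retriever` and `memory` objects are hypothetical placeholders (any RAG retriever and key-value memory store would do), not a specific library's API:

```python
def build_context(system_msg, user_query, retriever, history, memory):
    """Assemble everything the model sees into one message list."""
    docs = retriever(user_query)              # RAG: fetch relevant documents
    recalled = memory.get(user_query, [])     # persistent-memory lookup

    messages = [{"role": "system", "content": system_msg}]
    if docs:
        messages.append({"role": "system",
                         "content": "Relevant documents:\n" + "\n".join(docs)})
    if recalled:
        messages.append({"role": "system",
                         "content": "Known facts:\n" + "\n".join(recalled)})
    messages.extend(history)                  # short-term conversation context
    messages.append({"role": "user", "content": user_query})
    return messages
```

The point is structural: the user's query is only the last line of what the model actually sees.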

If you want to dive deeper into context engineering and get hands-on experience, check out our recent workshop!

Prompt engineering is a subset of context engineering. The shift in terminology reflects a broader shift in how we think about building AI applications.

Let’s look at an example

Imagine building an AI scheduling assistant.

Prompt engineering approach:

You are a helpful scheduling assistant. Help the user schedule a meeting.

Context engineering approach:

System: You are a scheduling assistant. Be concise and professional.
Calendar: [User's calendar showing Tuesday is fully booked]
Contacts: [Jim is a key partner, prefers morning meetings]
Email History: [Previous emails with Jim show informal tone]
Tools: [send_calendar_invite, send_email, check_availability]
User: "Can you schedule something with Jim this week?"

The second approach gives the model everything needed to generate: “Hey Jim! Tomorrow’s packed on my end. Thursday AM free if that works? Sent an invite, lmk if it works.”

The magic isn’t in smarter models or clever prompts—it’s in providing the right context.
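In code, the context-engineering approach above amounts to dynamic prompt construction. This sketch hard-codes the example's hypothetical calendar, contact, and tool values; in a real system each block would be fetched at request time:

```python
def build_scheduling_context(user_query):
    """Assemble the scheduling assistant's full context for one request."""
    system = "You are a scheduling assistant. Be concise and professional."
    context = "\n".join([
        "Calendar: Tuesday is fully booked",
        "Contacts: Jim is a key partner, prefers morning meetings",
        "Email history: previous emails with Jim show informal tone",
        "Tools: send_calendar_invite, send_email, check_availability",
    ])
    return [
        {"role": "system", "content": f"{system}\n\n{context}"},
        {"role": "user", "content": user_query},
    ]
```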

Core Components

  • System Prompts establish persistent persona, constraints, and rules. They encode stable behavior while user prompts provide task-specific details.

  • Dynamic Prompt Construction assembles context at runtime using templates that pull from multiple sources based on each specific request.

  • Retrieval-Augmented Generation (RAG) was among the first context engineering techniques—fetching relevant documents and injecting them into prompts. Effective RAG requires smart chunking, relevance ranking, and token budgeting.

  • Tool Integration extends LLMs beyond their training data. Function calling lets models request external actions (checking weather, querying databases), with results injected back into context. Research shows tool-integrated reasoning solves 60%+ more complex tasks.

  • Memory Systems mirror human cognition: short-term memory for conversation context, long-term memory persisting across sessions via vector databases, and episodic memory storing summaries of past interactions. Well-designed memory improves multi-turn accuracy by 30-50%.

  • Context Window Management balances completeness against constraints. Modern models offer 128K to 1M+ tokens, but longer contexts mean slower inference and higher costs. Effective engineering involves summarization, selective retrieval, and strategic information placement.
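Context window management can be sketched with a simple trimming policy: always keep the system message, then keep the most recent turns that fit the budget. The `len(text) // 4` token estimate is a rough chars-per-token heuristic used here only for illustration; real systems use a model-specific tokenizer:

```python
def fit_to_budget(messages, max_tokens):
    """Keep the system message plus the newest turns that fit the budget."""
    def est_tokens(msg):
        return len(msg["content"]) // 4 + 4   # +4 for role/formatting overhead

    system, rest = messages[0], messages[1:]
    budget = max_tokens - est_tokens(system)

    kept = []
    for msg in reversed(rest):                # walk newest-first
        cost = est_tokens(msg)
        if cost > budget:
            break                             # oldest turns get dropped
        kept.append(msg)
        budget -= cost
    return [system] + list(reversed(kept))    # restore chronological order
```

Production systems usually summarize the dropped turns instead of discarding them outright, but the budgeting logic is the same.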

So why does this even matter?

The push toward context engineering accelerated with AI agents—systems autonomously using tools to accomplish complex tasks.

As Philipp Schmid observed: “Most agent failures are not model failures anymore, they are context failures.”

When agents go off track, it’s rarely because the model lacks capability. It’s because relevant information wasn’t included, tools weren’t properly defined, or context was cluttered with noise. Building reliable agents requires systematic thinking about what information they need at each step.

Integrating Tools

Agent frameworks like LangGraph orchestrate this loop: the model requests a tool, the framework executes it, and the result is injected into the next prompt. The main failure mode is hallucinated calls—nonexistent tools or malformed arguments—which you mitigate with strict validation before execution. The payoff is real: tool-integrated reasoning solves 60%+ more complex tasks.
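Here is a minimal sketch of that validation step. The tool registry and JSON call format are assumptions for illustration, not any particular framework's API:

```python
import json

# Hypothetical tool registry: name -> callable
TOOLS = {
    "check_weather": lambda city: f"Sunny in {city}",
}

def run_tool_call(raw_call):
    """Validate a model-proposed tool call before executing it."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError:
        return "error: tool call is not valid JSON"

    name = call.get("name")
    if name not in TOOLS:                     # hallucinated tool name
        return f"error: unknown tool {name!r}"
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return "error: arguments must be an object"
    try:
        result = TOOLS[name](**args)          # execute with validated args
    except TypeError:                         # hallucinated parameter names
        return f"error: bad arguments for {name!r}"
    return result                             # injected back into context
```

Note that validation errors are returned as strings rather than raised: feeding the error message back into context lets the model correct its own call on the next turn.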

Here are some best practices for builders

Understanding context engineering conceptually is one thing—applying it effectively is another. These practices have emerged from teams building production LLM applications, learning what separates demos that impress from systems that actually work at scale.
