A practical guide to engineering software with AI agents — from mental models to production.
Written for developers who want to move beyond "vibe coding" and build with discipline.
- What Is AI Agentic Software Engineering?
- The Claude Code Mental Model
- The Context Layer — How Claude Knows What to Do
- The 5-Level Settings Hierarchy
- The 5-Phase Agentic Workflow
- Good Practices
- Common Mistakes to Avoid
- Glossary
Traditional AI-assisted coding means asking an AI to write a function and copying the result. You are the driver — the AI is autocomplete.
Agentic software engineering is different. Here, the AI is an agent — it can read your files, write code, run tests, execute terminal commands, call APIs, and manage git — all autonomously, in sequence, guided by goals you set.
You move from writing code to directing outcomes.
The shift looks like this:
| Traditional AI Coding | Agentic Engineering |
|---|---|
| "Write me a function that does X" | "Implement the reservation feature per requirements.md" |
| You copy-paste the result | The agent writes, runs, and tests the code |
| AI has no memory of your project | AI reads your project context every session |
| You manage the steps | You manage the goals |
| One prompt, one response | Multi-step autonomous execution |
The discipline comes from giving the agent the right context, the right constraints, and the right workflow — so it acts like a skilled junior developer who knows your project deeply, rather than a stranger who just joined today.
Claude Code is a CLI tool that runs in your terminal. It is not a chatbot — it is an agent that acts on your codebase.
┌─────────────────────────────────────────────┐
│ YOU │
│ Natural language goals and instructions │
└───────────────────┬─────────────────────────┘
│
┌──────────▼──────────┐
│ CONTEXT LAYER │
│ CLAUDE.md · Skills │
│ Settings · Hooks │
└──────────┬──────────┘
│
┌───────────────────▼─────────────────────────┐
│ CLAUDE CODE (Agent) │
│ Reads files · Writes code · Runs commands │
│ Calls APIs · Manages git · Runs tests │
└───────────────────┬─────────────────────────┘
│
┌───────────────────▼─────────────────────────┐
│ YOUR CODEBASE + TERMINAL │
│ Real files on disk. Real git commits. │
│ Real test runs. Nothing is simulated. │
└─────────────────────────────────────────────┘
The key insight: The context layer sits between you and the agent. It is what separates disciplined agentic engineering from random prompting. A well-configured context layer means Claude Code arrives at every session already knowing who you are, what you're building, how your code is structured, and what conventions to follow.
The context layer is made of four components that Claude Code reads before it does anything.
CLAUDE.md is a Markdown file at the root of your project. Claude Code reads it on every session start. Think of it as the briefing document you'd give a new developer joining your team.
What to put in it:
# Project: RestaurantBot
## What this is
A Spring Boot chatbot that answers menu, hours, location, catering,
and reservation questions for restaurant customers via a web widget.
## Tech stack
- Java 21, Spring Boot 3.x, Spring AI
- PostgreSQL (Flyway migrations)
- Maven
- JUnit 5 + Mockito for testing
- Anthropic Claude API via Spring AI
## Coding conventions
- Constructor injection only — never @Autowired on fields
- All API responses use ResponseEntity<ApiResponse<T>>
- Service layer never returns null — use Optional<T>
- Every public method on a service must have a test
## Module structure
- api/ REST controllers
- service/ Business logic
- domain/ Entities and value objects
- repository/ Data access
- config/ Spring config classes
- chatbot/ AI agent and MCP integration
## What Claude should never do
- Never modify migration files once committed
- Never skip tests — every feature needs at minimum one happy path test
- Never hardcode secrets; read them from env vars (referenced in application.properties as ${VAR_NAME})
Good practices for CLAUDE.md:
- Keep it under 300 lines — long files dilute focus
- Write it as if briefing a developer, not configuring a tool
- Update it when your architecture changes
- Commit it to git — it is part of your project
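The example CLAUDE.md lists several coding conventions. Here is a minimal plain-Java sketch of a service that follows two of them: constructor injection, and Optional<T> instead of null. MenuItem, MenuRepository and MenuService are hypothetical names. The Spring annotations are left out so the block runs on its own.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical domain type used only for this sketch.
record MenuItem(String name, int priceCents) {}

// Hypothetical data-access interface; in the project it would live in repository/.
interface MenuRepository {
    List<MenuItem> findAll();
}

// Follows the CLAUDE.md conventions: the dependency arrives through the constructor
// into a final field (no field injection), and lookups return Optional, never null.
class MenuService {
    private final MenuRepository menuRepository;

    MenuService(MenuRepository menuRepository) {
        this.menuRepository = menuRepository;
    }

    // Optional.empty() signals "not found" instead of returning null.
    Optional<MenuItem> findByName(String name) {
        return menuRepository.findAll().stream()
                .filter(item -> item.name().equalsIgnoreCase(name))
                .findFirst();
    }
}
```

In Spring, a class with a single constructor is injected through that constructor without any @Autowired annotation, so this shape carries over unchanged.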
Skills are Markdown files stored in .claude/skills/. Each skill is a set of instructions Claude Code follows when you invoke it with a /command.
They turn repeatable workflows into one-word commands.
Example skill: /spec
File: .claude/skills/spec/SKILL.md
You are generating a requirements document for a new feature.
Ask the user:
1. What does this feature do? (one sentence)
2. Who triggers it? (user action, scheduled job, API call)
3. What are the edge cases?
4. What does success look like?
Then produce a file called requirements/FEATURE-NAME.md with:
- Summary
- Acceptance criteria (as testable statements)
- Out of scope
- Open questions
Common skills to build:
| Skill | Invoked with | What it does |
|---|---|---|
| Spec | /spec | Generates requirements.md from a conversation |
| Plan | /plan | Breaks requirements into an ordered task list |
| Build | /build | Implements one task with TDD |
| Review | /review | Reviews a diff against requirements |
| ADR | /adr | Creates an Architecture Decision Record |
Hooks are scripts that run automatically before or after Claude Code actions. They enforce rules without you having to say them every time.
Pre-tool hook (runs before Claude touches files):
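#!/usr/bin/env bash
# .claude/hooks/pre-write.sh
# Block writes to migration files.
# Claude Code passes the tool call as JSON on stdin. Exit code 2 blocks the call
# and shows the stderr message to Claude; exit 1 would only warn and let it through.
input=$(cat)
if echo "$input" | grep -q "db/migration"; then
  echo "ERROR: Do not modify committed migration files." >&2
  exit 2
fi
Post-tool hook (runs after every Bash command; the script only reacts to test runs):
#!/usr/bin/env bash
# .claude/hooks/post-bash.sh
# Remind Claude to check test coverage if tests were just run.
# After a tool has run, stderr plus exit code 2 is how the message reaches Claude.
input=$(cat)
if echo "$input" | grep -q "mvn test"; then
  echo "Reminder: check coverage report at target/site/jacoco/index.html" >&2
  exit 2
fi
Hooks are the difference between "I told Claude not to do that" and "Claude physically cannot do that."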
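Claude Code runs any command as a hook, so the same migration guard could also be written in Java. It would be launched with java PreWriteHook.java, using single-file source launch (Java 11 and later). Like the shell version, the sketch below assumes two things: the tool call arrives as JSON on stdin, and exit code 2 blocks it. Two further simplifications are assumptions, not a full JSON parser: the file_path field name, and the regex used to extract it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java equivalent of pre-write.sh, run as: java PreWriteHook.java
class PreWriteHook {
    // Naive extraction of "file_path": "..." (assumed field name; ignores escaped quotes).
    private static final Pattern FILE_PATH =
            Pattern.compile("\"file_path\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the exit code: 2 blocks the tool call, 0 lets it proceed.
    static int decide(String toolCallJson) {
        Matcher m = FILE_PATH.matcher(toolCallJson);
        if (m.find() && m.group(1).contains("db/migration/")) {
            System.err.println("ERROR: Do not modify committed migration files.");
            return 2;
        }
        return 0;
    }

    public static void main(String[] args) throws IOException {
        String input = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        System.exit(decide(input));
    }
}
```

Neither version takes effect until it is registered as a PreToolUse hook command in .claude/settings.json.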
Claude Code loads settings from five levels on every startup. Higher levels override lower ones. All five load and merge together.
| Priority | Level | File or source | Scope |
|---|---|---|---|
| 5 (highest) | Managed | Enterprise / OS-level policy | Cannot be overridden |
| 4 | CLI args | --model --effort --permission-mode | Per session |
| 3 | Local | .claude/settings.local.json | Per project, gitignored |
| 2 | Project | .claude/settings.json | Per project, committed |
| 1 (lowest) | User | ~/.claude/settings.json | Global defaults |
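"Higher levels override lower ones" can be pictured as applying each level on top of the one below it. The Java sketch below does exactly that for flat key/value settings. It is a conceptual model, not Claude Code's implementation, and the real merge rules for nested or list-valued keys may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of the settings hierarchy for flat keys only.
// Not Claude Code's actual merge logic; nested keys may be merged differently.
class SettingsMerge {
    // Levels are passed lowest priority first: user, project, local, CLI, managed.
    static Map<String, Object> merge(List<Map<String, Object>> levelsLowToHigh) {
        Map<String, Object> effective = new LinkedHashMap<>();
        for (Map<String, Object> level : levelsLowToHigh) {
            effective.putAll(level); // a higher level overwrites any key it also sets
        }
        return effective;
    }
}
```

With a --model flag on the command line, the CLI level replaces the user default for that session. Keys that no higher level sets, such as theme, fall through unchanged.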
~/.claude/settings.json — your personal defaults across every project on your machine. Set once, rarely touched.
{
"model": "claude-sonnet-4-5",
"theme": "dark"
}
.claude/settings.json — committed to git. The whole team shares this.
{
  "permissions": {
    "allow": ["Read", "Write", "Bash", "Grep"]
  },
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres"],
      "env": { "DATABASE_URL": "${DB_URL}" }
    }
  }
}
.claude/settings.local.json — per-project, gitignored. Your personal secrets and local config live here.
{
"env": {
"DB_URL": "postgres://localhost:5432/restaurantbot_dev",
"ANTHROPIC_API_KEY": "sk-ant-..."
}
}
Critical rule: Secrets go in settings.local.json (gitignored). Conventions go in settings.json (committed). Never mix them.
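The same split applies inside the application, because the example CLAUDE.md forbids hardcoded secrets. Below is a minimal sketch that assumes the app reads ANTHROPIC_API_KEY from its environment. It fails fast with a clear message when the key is missing. ApiKeyProvider is a hypothetical name.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: the key comes from the environment, never from source code.
class ApiKeyProvider {
    // Takes the environment as a Map so tests can pass a fake one.
    static String requireApiKey(Map<String, String> env) {
        return Optional.ofNullable(env.get("ANTHROPIC_API_KEY"))
                .filter(key -> !key.isBlank())
                .orElseThrow(() -> new IllegalStateException("ANTHROPIC_API_KEY is not set"));
    }

    // Production entry point reads the real process environment.
    static String requireApiKey() {
        return requireApiKey(System.getenv());
    }
}
```

Passing the environment in as a Map keeps the check testable without touching real environment variables.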
Flags passed when starting a session. Override everything below for that session only.
# Use Opus for a complex architecture session
claude --model claude-opus-4-5 --effort high
# Run in plan mode (read-only: Claude can analyse but not edit) for a review session
claude --permission-mode plan
Set by your organisation or system admin. Cannot be overridden by anything. Enforces compliance rules, locked-down tool access, approved API endpoints. Not relevant for solo projects, but important to know about in team environments.
This workflow is the core discipline of agentic software engineering. It prevents the "vibe coding" failure mode where you describe something vaguely, the agent generates something, you are not sure if it is right, and you go in circles.
Each phase has a clear input, a clear output, and a clear human checkpoint.
Input: Project idea
│
▼
┌─────────────────────┐
│ 1. Requirements │ ← You describe. Claude clarifies. Output: requirements.md
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ 2. Architecture │ ← You discuss trade-offs. Output: architecture.md, ADRs
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ 3. Task Generation │ ← Claude reads docs. Output: ordered task list
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ 4. TDD Implement. │ ← Red → Green → Refactor. Per task.
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ 5. Review & Merge │ ← You review. Claude explains. You approve.
└─────────────────────┘
│
▼
Output: Merged, tested, documented feature
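One way to picture the human checkpoints is as a state machine that refuses to move forward without sign-off. The enum and class below are a hypothetical model of the diagram, not part of Claude Code. The output strings simply echo the folders this guide uses for each phase.

```java
// Hypothetical model of the five phases; each names its output artifact.
enum Phase {
    REQUIREMENTS("requirements/*.md"),
    ARCHITECTURE("docs/architecture/*.md"),
    TASKS("tasks/*.md"),
    IMPLEMENTATION("src/ (tests first)"),
    REVIEW("merged pull request");

    final String output;

    Phase(String output) { this.output = output; }
}

class AgenticWorkflow {
    private Phase current = Phase.REQUIREMENTS;

    Phase current() { return current; }

    // Moves to the next phase only after a human approves the current output.
    void advance(boolean humanApproved) {
        if (!humanApproved) {
            throw new IllegalStateException("Checkpoint not approved: " + current.output);
        }
        if (current == Phase.REVIEW) {
            throw new IllegalStateException("Workflow already complete");
        }
        current = Phase.values()[current.ordinal() + 1];
    }
}
```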
Goal: Understand what to build before touching any code.
What happens:
- You describe the feature in plain language
- Claude Code asks clarifying questions — edge cases, actors, constraints, failure modes
- Together you produce a requirements document
- You review and sign off — no code is written until this is approved
Prompt to start:
I want to build the reservation question-answering feature for my restaurant chatbot.
Help me write the requirements. Ask me what you need to know.
Output files:
requirements/
FEAT-001-reservation-questions.md
├── Summary
├── Acceptance criteria (testable statements)
├── Out of scope
└── Open questions
Example acceptance criteria (good):
✓ When a customer asks "can I book a table for Saturday?", the bot responds
with available time slots for that date.
✓ When Saturday is fully booked, the bot suggests the next 3 available dates.
✓ When the question is ambiguous (no date given), the bot asks for clarification
before checking availability.
✓ Response time is under 2 seconds for 95% of queries.
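Good acceptance criteria translate directly into code and tests. For example, the second criterion above could be backed by a helper like the one below. DateSuggester, the isFullyBooked predicate and the maxDaysAhead search limit are all hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper for "suggest the next 3 available dates".
class DateSuggester {
    // Returns up to `count` dates after `requested` that are not fully booked.
    // maxDaysAhead bounds the search so a fully booked calendar cannot loop forever.
    static List<LocalDate> nextAvailableDates(LocalDate requested, Predicate<LocalDate> isFullyBooked,
                                              int count, int maxDaysAhead) {
        List<LocalDate> result = new ArrayList<>();
        for (int day = 1; day <= maxDaysAhead && result.size() < count; day++) {
            LocalDate candidate = requested.plusDays(day);
            if (!isFullyBooked.test(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```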
Why this phase exists: Without written requirements, the agent cannot know if what it built is correct. Without acceptance criteria, you cannot know if the test is meaningful.
Goal: Decide how components connect before writing any code.
What happens:
- You describe your tech stack and constraints
- Claude Code proposes a component breakdown and data flow
- You discuss trade-offs — sync vs async, SQL vs cache, REST vs WebSocket
- Decisions are recorded as Architecture Decision Records (ADRs)
Prompt to start:
I have Spring Boot 3 + PostgreSQL + Anthropic Claude API.
Help me design the architecture for the reservation chatbot feature.
Should reservation logic live in the chatbot service or a separate service?
Output files:
docs/architecture/
architecture.md ← Component diagram and data flow
ADR-001-chatbot-design.md
ADR-002-reservation-storage.md
ADR format:
# ADR-001: Chatbot runs as part of the main Spring Boot service
## Status: Accepted
## Context
The chatbot needs access to restaurant data (menu, hours, reservations).
Option A: chatbot as a separate microservice calling our REST API.
Option B: chatbot logic lives inside the main Spring Boot service.
## Decision
Option B — chatbot as an internal service within the monolith.
## Reasons
- Avoids network latency for data lookups
- Simpler deployment for a single-developer project
- Can extract to microservice later if load requires it
## Consequences
- All chatbot logic must go through the service layer, not directly to the DB
- If the chatbot service has a bug, it affects the whole application
Why this phase exists: Architectural decisions made in code are expensive to undo. Architectural decisions made in Markdown are cheap to change.
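The first consequence of ADR-001 shows up directly in constructor signatures. In this hypothetical sketch, the chatbot handler depends only on a ReservationService interface, so it has no path to a repository or the database. Slots are plain strings to keep the example short.

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical service-layer interface the chatbot is allowed to use.
interface ReservationService {
    List<String> availableSlots(LocalDate date, int partySize);
}

// The handler's only dependency is the service; no repository is injected.
class ChatbotReservationHandler {
    private final ReservationService reservationService;

    ChatbotReservationHandler(ReservationService reservationService) {
        this.reservationService = reservationService;
    }

    String answer(LocalDate date, int partySize) {
        List<String> slots = reservationService.availableSlots(date, partySize);
        return slots.isEmpty()
                ? "Sorry, we are fully booked on " + date + "."
                : "Available times on " + date + ": " + String.join(", ", slots);
    }
}
```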
Goal: Break the architecture into atomic, independently implementable tasks.
What happens:
- Claude Code reads your requirements.md and architecture.md
- It generates an ordered task list with dependencies
- Each task has: what to build, which file to create/modify, acceptance criteria, estimated complexity
- You review and reorder before any implementation begins
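An "ordered task list with dependencies" is a topological ordering: every task appears after the tasks it depends on. The sketch below uses Kahn's algorithm to produce such an order and to reject cyclic dependencies. It illustrates the ordering rule only; it is not how Claude Code generates tasks. It also assumes every task appears as a key in the input map.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Kahn's topological sort over task -> list of prerequisites.
class TaskOrdering {
    static List<String> order(Map<String, List<String>> dependsOn) {
        Map<String, Integer> unmet = new LinkedHashMap<>();       // prerequisites left per task
        Map<String, List<String>> dependents = new HashMap<>();   // prerequisite -> tasks waiting on it
        for (var entry : dependsOn.entrySet()) {
            unmet.put(entry.getKey(), entry.getValue().size());
            for (String dep : entry.getValue()) {
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        unmet.forEach((task, count) -> { if (count == 0) ready.add(task); });
        List<String> ordered = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.poll();
            ordered.add(task);
            for (String next : dependents.getOrDefault(task, List.of())) {
                if (unmet.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (ordered.size() != unmet.size()) {
            throw new IllegalStateException("Cyclic or missing task dependencies");
        }
        return ordered;
    }
}
```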
Prompt to start:
Read requirements/FEAT-001-reservation-questions.md and docs/architecture/architecture.md.
Generate an ordered task list for implementing this feature. Each task should be
implementable in under 2 hours and have clear acceptance criteria.
Output files:
tasks/
FEAT-001-tasks.md
TASK-001-reservation-repository.md
TASK-002-reservation-service.md
TASK-003-chatbot-reservation-handler.md
TASK-004-api-endpoint.md
TASK-005-integration-test.md
Example task format:
# TASK-002: ReservationService — check availability
## Depends on: TASK-001
## What to build
A Spring service method that takes a LocalDate and party size,
and returns a list of available time slots.
## File to create
src/main/java/com/restaurantbot/service/ReservationService.java
## Acceptance criteria
- Returns List<TimeSlot> (never null — return empty list if none available)
- Throws InvalidPartySize if party > restaurant capacity
- If date is in the past, throws PastDateException
## Test file
src/test/java/com/restaurantbot/service/ReservationServiceTest.java
Why this phase exists: Without tasks, the agent drifts. It builds what seems logical, not what was designed. Tasks are the contract between your architecture and your implementation.
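The acceptance criteria in a task file map almost one-to-one onto guard clauses. Below is a hypothetical sketch of TASK-002's two error cases, using the exception names from the task file (InvalidPartySize, PastDateException). Two details are assumptions: the restaurantCapacity field, and the injected Clock that lets tests fix today's date.

```java
import java.time.Clock;
import java.time.LocalDate;

// Exception names follow the TASK-002 file; the classes themselves are hypothetical.
class InvalidPartySize extends RuntimeException {
    InvalidPartySize(String message) { super(message); }
}

class PastDateException extends RuntimeException {
    PastDateException(String message) { super(message); }
}

// Guard clauses encoding the task's acceptance criteria before any slot lookup.
class ReservationRules {
    private final int restaurantCapacity;
    private final Clock clock; // injected so tests can control "today"

    ReservationRules(int restaurantCapacity, Clock clock) {
        this.restaurantCapacity = restaurantCapacity;
        this.clock = clock;
    }

    void validate(LocalDate date, int partySize) {
        if (partySize > restaurantCapacity) {
            throw new InvalidPartySize("Party of " + partySize + " exceeds capacity " + restaurantCapacity);
        }
        if (date.isBefore(LocalDate.now(clock))) {
            throw new PastDateException(date + " is in the past");
        }
    }
}
```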
Goal: Build each task correctly, proven by tests.
Test-Driven Development (TDD) is the core discipline of this phase. The cycle is always:
RED GREEN REFACTOR
──── ───── ────────
Write a → Write the → Clean up
failing minimum the code
test code to without
pass it breaking
the test
What happens (per task):
- Claude Code reads the task file
- Writes the failing test first — the test must fail before any implementation
- Writes the minimum code to make the test pass
- Refactors for clarity, naming, and structure
- Moves to the next task
Prompt to start a task:
Implement TASK-002. Start with the failing test. Do not write any production code
until the test exists and fails. Follow TDD strictly.
Example TDD cycle for ReservationService:
// Step 1 — RED: Write the failing test
@Test
void shouldReturnEmptyListWhenDateIsFullyBooked() {
// Given
LocalDate saturday = LocalDate.of(2025, 6, 14);
when(reservationRepository.countByDate(saturday)).thenReturn(50); // fully booked
// When
List<TimeSlot> slots = reservationService.getAvailableSlots(saturday, 2);
// Then
assertThat(slots).isEmpty();
}
// ← This test fails because ReservationService doesn't exist yet. Good.
// Step 2 — GREEN: Write minimum code to pass
public List<TimeSlot> getAvailableSlots(LocalDate date, int partySize) {
int booked = reservationRepository.countByDate(date);
if (booked >= MAX_CAPACITY) return List.of();
return reservationRepository.findAvailableSlots(date, partySize);
}
// ← Test passes now.
// Step 3 — REFACTOR: Clean up without breaking tests
public List<TimeSlot> getAvailableSlots(LocalDate date, int partySize) {
    if (isFullyBooked(date)) return List.of();
    return reservationRepository.findAvailableSlots(date, partySize);
}
private boolean isFullyBooked(LocalDate date) {
    return reservationRepository.countByDate(date) >= MAX_CAPACITY;
}
// ← Same test still passes. Code is now readable.
// Party-size validation is new behaviour, not a refactor, so it starts with its own failing test.
Why TDD and not "write tests after":
- Tests written after implementation test what the code does, not what it should do
- A test written first is a specification — it captures intent before implementation bias sets in
- Red phase forces you to think about the API before the internals
- The red step rules out a false green: a test that passes before the code exists is caught immediately
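The RED/GREEN/REFACTOR snippets above depend on Spring and Mockito. To see the final, refactored version run end to end, here is a self-contained sketch with a hand-written fake in place of the Mockito stub. Three pieces are assumptions: TimeSlot holding a start time, MAX_CAPACITY = 50 (matching the stubbed "fully booked" count), and the FakeReservationRepository class.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.util.List;

// Hypothetical stand-ins for the guide's types.
record TimeSlot(LocalTime start) {}

interface ReservationRepository {
    int countByDate(LocalDate date);
    List<TimeSlot> findAvailableSlots(LocalDate date, int partySize);
}

// The refactored service from Step 3, with an assumed capacity of 50.
class ReservationService {
    static final int MAX_CAPACITY = 50;
    private final ReservationRepository reservationRepository;

    ReservationService(ReservationRepository reservationRepository) {
        this.reservationRepository = reservationRepository;
    }

    List<TimeSlot> getAvailableSlots(LocalDate date, int partySize) {
        if (isFullyBooked(date)) return List.of();
        return reservationRepository.findAvailableSlots(date, partySize);
    }

    private boolean isFullyBooked(LocalDate date) {
        return reservationRepository.countByDate(date) >= MAX_CAPACITY;
    }
}

// Hand-written fake replacing the Mockito stub: same booking count for every date.
class FakeReservationRepository implements ReservationRepository {
    private final int bookedCount;
    private final List<TimeSlot> freeSlots;

    FakeReservationRepository(int bookedCount, List<TimeSlot> freeSlots) {
        this.bookedCount = bookedCount;
        this.freeSlots = freeSlots;
    }

    public int countByDate(LocalDate date) { return bookedCount; }

    public List<TimeSlot> findAvailableSlots(LocalDate date, int partySize) { return freeSlots; }
}
```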
Goal: Human-in-the-loop verification before any code enters the main branch.
The agent is not the final authority. You are.
What happens:
- Claude Code opens a diff or PR summary — you read every line
- You ask "does this match the requirements?" for each change
- You ask Claude to explain anything you do not understand
- You either approve and merge, or ask for specific changes
- The task is marked done only after merge
What to check in review:
□ Does the implementation match the acceptance criteria in the task file?
□ Is the test actually testing the right thing (not just making coverage go up)?
□ Are there edge cases the test doesn't cover?
□ Is the naming clear — would a stranger understand this in 6 months?
□ Is there any hardcoded value that should be a constant or config?
□ Does the PR description explain why, not just what?
Good review prompts:
Explain why you used Optional<List<Slot>> here instead of throwing an exception.
What happens if the database is down when this method is called?
Is there a case where this test could pass even if the feature is broken?
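The last prompt is worth taking literally. In this hypothetical sketch, a deliberately broken implementation passes a weak, coverage-style check but fails a check that states the acceptance criterion.

```java
import java.util.List;

// Hypothetical illustration of a test that passes even though the feature is broken.
class ReviewExample {
    // Deliberately broken: ignores the booking count and always returns every slot.
    static List<String> brokenAvailableSlots(int bookedCount, int maxCapacity) {
        return List.of("18:00", "20:30");
    }

    // Weak: only proves the method returned something, so coverage rises but behaviour is unchecked.
    static boolean weakCheck(List<String> slots) {
        return slots != null;
    }

    // Strong: states the criterion that a fully booked day has no slots.
    static boolean strongCheck(List<String> slots) {
        return slots.isEmpty();
    }
}
```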
Why this phase exists: The review phase is where you learn. Reading code that was generated to your requirements, asking why decisions were made, and approving or rejecting changes builds the mental model you need to eventually write it yourself.
An agent with no context is like a developer who joined your team today with no onboarding. They will make reasonable-sounding decisions that are wrong for your project. CLAUDE.md is the onboarding doc.
A task that says "implement the chatbot" is not a task — it is a wish. A task that says "implement ReservationService.getAvailableSlots() that returns empty list when fully booked" is implementable, testable, and reviewable. If a task takes more than 2 hours, split it.
The most expensive mistake in software is building the wrong thing correctly. Requirements Engineering is what prevents this. Even 30 minutes of clarifying questions saves hours of rework.
Your project's shared conventions belong in git so every session — and every teammate — gets the same context. Your personal secrets and local overrides belong in .local.json which is gitignored.
Claude Code's context window fills up. Long sessions accumulate noise from earlier steps. Use /clear to start fresh for each new task, with CLAUDE.md providing the persistent context. Use /compact when you want to summarise and compress rather than fully reset.
When Claude Code produces something wrong, the first instinct is to ask it to regenerate. The better move is to ask it to explain, then ask for a specific targeted fix. Regeneration throws away everything that was right. Targeted fixes preserve it.
# Bad
"Build the chatbot feature"
# Good
"Implement TASK-003: ChatbotReservationHandler. Read the task file at
tasks/TASK-003-chatbot-reservation-handler.md. Start with the failing test."
The phases are ordered for a reason. Jumping straight to implementation without requirements means you are building a guess. Jumping straight to code without tasks means the agent will make architectural decisions that belong in Phase 2.
// ❌ Never do this in .claude/settings.json (committed to git)
{
"env": { "ANTHROPIC_API_KEY": "sk-ant-real-key-here" }
}
// ✅ Do this in .claude/settings.local.json (gitignored)
{
"env": { "ANTHROPIC_API_KEY": "sk-ant-real-key-here" }
}
The Review phase is not optional. It is where you learn, and it is where bugs get caught. An agent that merges its own work is an agent without a safety net.
A test written after the code was working is not a test — it is a documentation exercise. It will confirm what the code does, not what it should do. The failing test in the TDD red phase is the specification.
| Term | Definition |
|---|---|
| Agent | An AI system that can take actions autonomously — reading files, writing code, running commands — rather than just generating text responses |
| Claude Code | Anthropic's CLI-based coding agent. Runs in your terminal and acts on your codebase |
| CLAUDE.md | A Markdown file read by Claude Code at every session start. Acts as permanent project memory |
| Skill | A Markdown file in .claude/skills/ that gives Claude Code on-demand instructions when invoked with /command |
| Hook | A script that runs automatically before or after Claude Code actions. Enforces guardrails without manual reminders |
| MCP | Model Context Protocol. A standard for connecting AI agents to external tools — databases, APIs, file systems |
| ADR | Architecture Decision Record. A short document capturing a significant architectural decision, its context, and its trade-offs |
| TDD | Test-Driven Development. Writing the failing test before the implementation code. Red → Green → Refactor |
| Context window | The amount of text Claude Code can hold in working memory per session. Use /clear and /compact to manage it |
| Settings hierarchy | The 5-level system by which Claude Code merges configuration from user, project, local, CLI, and managed sources |
| Agentic workflow | A structured, phase-driven approach to working with AI agents: Requirements → Architecture → Tasks → TDD → Review |
This guide was written while building RestaurantBot
Tech stack: Java 21 · Spring Boot 3 · Spring AI · PostgreSQL · Anthropic Claude API · Docker · AWS ECS
Repository structure:
restaurantbot/
├── CLAUDE.md ← Agent briefing document
├── .claude/
│ ├── settings.json ← Team shared settings (committed)
│ ├── settings.local.json ← Personal secrets (gitignored)
│ ├── skills/
│ │ ├── spec/SKILL.md ← /spec command
│ │ ├── plan/SKILL.md ← /plan command
│ │ ├── build/SKILL.md ← /build command
│ │ └── review/SKILL.md ← /review command
│ └── hooks/
│ ├── pre-write.sh ← Block writes to migration files
│ └── post-bash.sh ← Coverage reminders
├── requirements/ ← Phase 1 outputs
├── docs/architecture/ ← Phase 2 outputs
├── tasks/ ← Phase 3 outputs
└── src/ ← Phase 4 outputs
This guide is a living document. It evolves as the project grows.
Last updated: April 2026