AI Agentic Software Engineering

A practical guide to engineering software with AI agents — from mental models to production.
Written for developers who want to move beyond "vibe coding" and build with discipline.

What Is AI Agentic Software Engineering?

Traditional AI-assisted coding means asking an AI to write a function and copying the result. You are the driver — the AI is autocomplete.

Agentic software engineering is different. Here, the AI is an agent — it can read your files, write code, run tests, execute terminal commands, call APIs, and manage git — all autonomously, in sequence, guided by goals you set.

You move from writing code to directing outcomes.

The shift looks like this:

Traditional AI Coding	Agentic Engineering
"Write me a function that does X"	"Implement the reservation feature per requirements.md"
You copy-paste the result	The agent writes, runs, and tests the code
AI has no memory of your project	AI reads your project context every session
You manage the steps	You manage the goals
One prompt, one response	Multi-step autonomous execution

The discipline comes from giving the agent the right context, the right constraints, and the right workflow — so it acts like a skilled junior developer who knows your project deeply, rather than a stranger who just joined today.

The Claude Code Mental Model

Claude Code is a CLI tool that runs in your terminal. It is not a chatbot — it is an agent that acts on your codebase.

┌─────────────────────────────────────────────┐
│              YOU                            │
│   Natural language goals and instructions   │
└───────────────────┬─────────────────────────┘
                    │
         ┌──────────▼──────────┐
         │    CONTEXT LAYER    │
         │  CLAUDE.md · Skills │
         │  Settings · Hooks   │
         └──────────┬──────────┘
                    │
┌───────────────────▼─────────────────────────┐
│            CLAUDE CODE (Agent)              │
│  Reads files · Writes code · Runs commands  │
│  Calls APIs · Manages git · Runs tests      │
└───────────────────┬─────────────────────────┘
                    │
┌───────────────────▼─────────────────────────┐
│         YOUR CODEBASE + TERMINAL            │
│  Real files on disk. Real git commits.      │
│  Real test runs. Nothing is simulated.      │
└─────────────────────────────────────────────┘

The key insight: The context layer sits between you and the agent. It is what separates disciplined agentic engineering from random prompting. A well-configured context layer means Claude Code arrives at every session already knowing who you are, what you're building, how your code is structured, and what conventions to follow.

The Context Layer

The context layer is made of four components. CLAUDE.md and settings are loaded when a session starts, skills are pulled in when they are needed, and hooks run around the agent's tool calls.

CLAUDE.md — Permanent Project Memory

CLAUDE.md is a Markdown file at the root of your project. Claude Code reads it on every session start. Think of it as the briefing document you'd give a new developer joining your team.

What to put in it:

# Project: RestaurantBot

## What this is
A Spring Boot chatbot that answers menu, hours, location, catering,
and reservation questions for restaurant customers via a web widget.

## Tech stack
- Java 21, Spring Boot 3.x, Spring AI
- PostgreSQL (Flyway migrations)
- Maven
- JUnit 5 + Mockito for testing
- Anthropic Claude API via Spring AI

## Coding conventions
- Constructor injection only — never @Autowired on fields
- All API responses use ResponseEntity<ApiResponse<T>>
- Service layer never returns null — use Optional<T>
- Every public method on a service must have a test

## Module structure
- api/          REST controllers
- service/      Business logic
- domain/       Entities and value objects
- repository/   Data access
- config/       Spring config classes
- chatbot/      AI agent and MCP integration

## What Claude should never do
- Never modify migration files once committed
- Never skip tests — every feature needs at minimum one happy path test
- Never hardcode secrets — use application.properties or env vars
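Conventions like these are easier for the agent to follow when the codebase already contains one small class that obeys them. Below is a minimal plain-Java sketch. It assumes hypothetical MenuItem, MenuRepository, MenuService and MenuController types and an assumed shape for ApiResponse<T>. The Spring annotations and the ResponseEntity wrapper appear only as comments, so the sketch compiles without Spring.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Optional;

// Hypothetical domain type, used only to illustrate the conventions.
record MenuItem(String name, BigDecimal price) {}

// Assumed shape of ApiResponse<T>; a real controller would return it inside
// ResponseEntity<ApiResponse<T>>, omitted here so the sketch needs no Spring.
record ApiResponse<T>(boolean success, T data, String error) {
    static <T> ApiResponse<T> ok(T data) { return new ApiResponse<>(true, data, null); }
    static <T> ApiResponse<T> fail(String error) { return new ApiResponse<>(false, null, error); }
}

// Data-access boundary (a Spring Data repository in the real project).
interface MenuRepository {
    List<MenuItem> findAll();
}

// Would carry @Service in Spring.
class MenuService {
    private final MenuRepository menuRepository;

    // Constructor injection only: the dependency is final and explicit.
    MenuService(MenuRepository menuRepository) {
        this.menuRepository = menuRepository;
    }

    // Never returns null: "not found" is an empty Optional.
    Optional<MenuItem> findByName(String name) {
        return menuRepository.findAll().stream()
                .filter(item -> item.name().equalsIgnoreCase(name))
                .findFirst();
    }
}

// Would carry @RestController in Spring and return ResponseEntity<ApiResponse<MenuItem>>.
class MenuController {
    private final MenuService menuService;

    MenuController(MenuService menuService) {
        this.menuService = menuService;
    }

    ApiResponse<MenuItem> getItem(String name) {
        return menuService.findByName(name)
                .map(ApiResponse::ok)
                .orElseGet(() -> ApiResponse.fail("No menu item named " + name));
    }
}
```

Returning Optional from the service pushes the "not found" decision up to the controller. That is the one place where it becomes an API-level error response.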

Good practices for CLAUDE.md:

Keep it under 300 lines — long files dilute focus
Write it as if briefing a developer, not configuring a tool
Update it when your architecture changes
Commit it to git — it is part of your project

Skills — On-Demand Instructions

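Skills live in .claude/skills/, one folder per skill. Each folder contains a SKILL.md file with the instructions Claude Code follows when you invoke it with a /command. Claude can also load a skill on its own when a request matches the skill's description.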

They turn repeatable workflows into one-word commands.

Example skill: /spec

File: .claude/skills/spec/SKILL.md

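---
name: spec
description: Generates a requirements document for a new feature by interviewing the user. Use before any implementation work starts.
---

You are generating a requirements document for a new feature.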

Ask the user:
1. What does this feature do? (one sentence)
2. Who triggers it? (user action, scheduled job, API call)
3. What are the edge cases?
4. What does success look like?

Then produce a file called requirements/FEATURE-NAME.md with:
- Summary
- Acceptance criteria (as testable statements)
- Out of scope
- Open questions

Common skills to build:

Skill	Invoked with	What it does
Spec	`/spec`	Generates requirements.md from a conversation
Plan	`/plan`	Breaks requirements into ordered task list
Build	`/build`	Implements one task with TDD
Review	`/review`	Reviews a diff against requirements
ADR	`/adr`	Creates an Architecture Decision Record

Hooks — Automated Guardrails

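Hooks are shell commands that Claude Code runs automatically before or after its tool calls. You register them in settings.json under the "hooks" key, for example as a PreToolUse entry with the matcher "Write|Edit" that points at the script. Each hook receives the tool call as JSON on stdin, so it can enforce rules without you having to repeat them every time.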

Pre-tool hook (runs before Claude touches files):

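# .claude/hooks/pre-write.sh
# Block writes to migration files. Claude Code passes the tool call to the hook
# as JSON on stdin. Exit code 2 blocks the call and shows stderr to Claude
# (any other non-zero code is a non-blocking error and the write goes ahead).
file_path=$(jq -r '.tool_input.file_path // empty')
if echo "$file_path" | grep -q "db/migration"; then
  echo "ERROR: Do not modify committed migration files." >&2
  exit 2
fi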

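Post-tool hook (runs after every Bash command, but only reacts to test runs):

# .claude/hooks/post-bash.sh
# Remind Claude to check test coverage if tests were just run.
# For a PostToolUse hook, exit code 2 shows stderr to Claude (the command has already run).
cmd=$(jq -r '.tool_input.command // empty')
if echo "$cmd" | grep -q "mvn test"; then
  echo "Reminder: check coverage report at target/site/jacoco/index.html" >&2
  exit 2
fi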
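If your team would rather keep hooks in Java than in shell, the pre-write check can be a single-file program. You run it with `java .claude/hooks/PreWriteHook.java` (Java 11+ source-launcher mode) and register it as the PreToolUse command for the "Write|Edit" matcher. This is a sketch built on a few assumptions:
- The Write and Edit tool inputs carry a `file_path` field.
- `PreWriteHook` is a hypothetical name.
- A regex stands in for a JSON library only to stay dependency-free.

Unlike a grep over the whole payload, this version checks only the target path. A Java file whose content merely mentions "db/migration" is therefore not blocked.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java port of pre-write.sh, registered as the PreToolUse command:
//   java .claude/hooks/PreWriteHook.java
// Claude Code writes the pending tool call to stdin as JSON.
class PreWriteHook {
    // Naive extraction of tool_input.file_path; a production hook should use a
    // JSON parser (this regex stops at the first quote and ignores escapes).
    private static final Pattern FILE_PATH =
            Pattern.compile("\"file_path\"\\s*:\\s*\"([^\"]*)\"");

    static Optional<String> filePath(String hookInput) {
        Matcher matcher = FILE_PATH.matcher(hookInput);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    // Block only when the file being written is a migration, not when some
    // other file's content happens to mention the folder.
    static boolean shouldBlock(String hookInput) {
        return filePath(hookInput).map(path -> path.contains("db/migration")).orElse(false);
    }

    public static void main(String[] args) throws IOException {
        String input = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        if (shouldBlock(input)) {
            // Exit code 2 blocks the tool call and shows stderr to Claude.
            System.err.println("ERROR: Do not modify committed migration files.");
            System.exit(2);
        }
        // Returning normally exits with code 0, which lets the write proceed.
    }
}
```

The trade-off against the shell version is JVM start-up time on every Write or Edit call.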

Hooks are the difference between "I told Claude not to do that" and "Claude physically cannot do that."

The 5-Level Settings Hierarchy

Claude Code loads settings from five levels on every startup. Higher levels override lower ones. All five load and merge together.

Priority  Level         File                              Scope
────────  ────────────  ────────────────────────────────  ───────────────────────
  5 ▲     Managed       (enterprise / OS-level)           Cannot be overridden
  4 │     CLI args      --model --effort --permission-mode Per session
  3 │     Local         .claude/settings.local.json       Per project, gitignored
  2 │     Project       .claude/settings.json             Per project, committed
  1 ▼     User          ~/.claude/settings.json           Global defaults
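As a mental model only, here is a simplified sketch of "higher levels override lower ones" for single-value keys such as model. Layers are applied in priority order, and a later layer replaces an earlier value for the same key, while keys it does not mention survive. This is not Claude Code's actual loader. In particular, it does not model how list-valued settings such as permission rules are combined, and `SettingsLayers` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of layered settings for scalar keys only.
final class SettingsLayers {
    private SettingsLayers() {}

    // Layers ordered from lowest priority (User) to highest (Managed).
    static Map<String, String> merge(List<Map<String, String>> lowToHigh) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map<String, String> layer : lowToHigh) {
            effective.putAll(layer); // a higher layer replaces any value set below it
        }
        return effective;
    }
}
```

With the User file from Level 1 and `--model claude-opus-4-5` on the command line, the session runs Opus, while the dark theme from the User level still applies.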

Level 1 — User (global defaults)

~/.claude/settings.json — your personal defaults across every project on your machine. Set once, rarely touched.

{
  "model": "claude-sonnet-4-5",
  "theme": "dark"
}

Level 2 — Project (team shared)

.claude/settings.json — committed to git. The whole team shares this.

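{
  "permissions": {
    "allow": ["Read", "Write", "Bash", "Grep"]
  }
}

MCP servers the whole team uses are declared separately in .mcp.json at the project root, which is also committed. Claude Code expands ${VAR} references in that file from the environment, so the connection string itself stays out of git:

{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "${DB_URL}"]
    }
  }
}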

Level 3 — Local (personal overrides)

.claude/settings.local.json — per-project, gitignored. Your personal secrets and local config live here.

{
  "env": {
    "DB_URL": "postgres://localhost:5432/restaurantbot_dev",
    "ANTHROPIC_API_KEY": "sk-ant-..."
  }
}

Critical rule: Secrets go in .local.json (gitignored). Conventions go in settings.json (committed). Never mix them.

Level 4 — CLI arguments (per-session flags)

Flags passed when starting a session. Override everything below for that session only.

# Use Opus for a complex architecture session
claude --model claude-opus-4-5 --effort high

# Run in plan mode (read-only analysis, no edits) for a review session
claude --permission-mode plan

Level 5 — Managed (enterprise / OS-level)

Set by your organisation or system administrator. Cannot be overridden by anything. Used to enforce compliance rules, locked-down tool access and approved API endpoints. Not relevant for solo projects, but important to know about in team environments.

The 5-Phase Agentic Workflow

This workflow is the core discipline of agentic software engineering. It prevents the "vibe coding" failure mode where you describe something vaguely, the agent generates something, you are not sure if it is right, and you go in circles.

Each phase has a clear input, a clear output, and a clear human checkpoint.

  Input: Project idea
     │
     ▼
┌─────────────────────┐
│  1. Requirements    │ ← You describe. Claude clarifies. Output: requirements.md
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  2. Architecture    │ ← You discuss trade-offs. Output: architecture.md, ADRs
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  3. Task Generation │ ← Claude reads docs. Output: ordered task list
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  4. TDD Implement.  │ ← Red → Green → Refactor. Per task.
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  5. Review & Merge  │ ← You review. Claude explains. You approve.
└─────────────────────┘
     │
     ▼
  Output: Merged, tested, documented feature

Phase 1 — Requirement Engineering

Goal: Understand what to build before touching any code.

What happens:

You describe the feature in plain language
Claude Code asks clarifying questions — edge cases, actors, constraints, failure modes
Together you produce a requirements document
You review and sign off — no code is written until this is approved

Prompt to start:

I want to build the reservation question-answering feature for my restaurant chatbot.
Help me write the requirements. Ask me what you need to know.

Output files:

requirements/
  FEAT-001-reservation-questions.md
  ├── Summary
  ├── Acceptance criteria (testable statements)
  ├── Out of scope
  └── Open questions

Example acceptance criteria (good):

✓ When a customer asks "can I book a table for Saturday?", the bot responds
  with available time slots for that date.
✓ When Saturday is fully booked, the bot suggests the next 3 available dates.
✓ When the question is ambiguous (no date given), the bot asks for clarification
  before checking availability.
✓ Response time is under 2 seconds for 95% of queries.
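Because each criterion is a testable statement, it translates almost mechanically into code and a test. Below is a minimal sketch of the second criterion. It assumes a hypothetical `nextAvailableDates` helper that receives the fully-booked check as a predicate; in the real service that check would go through the repository. The 60-day scan limit is also an assumption, added so that a calendar that is booked solid cannot loop forever.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper for the criterion
// "When Saturday is fully booked, the bot suggests the next 3 available dates."
final class AlternativeDates {
    private AlternativeDates() {}

    // Assumed upper bound on how far ahead to look.
    static final int MAX_DAYS_TO_SCAN = 60;

    static List<LocalDate> nextAvailableDates(LocalDate requested, int count,
                                              Predicate<LocalDate> isFullyBooked) {
        List<LocalDate> result = new ArrayList<>();
        LocalDate candidate = requested.plusDays(1); // start the day after the requested date
        for (int i = 0; i < MAX_DAYS_TO_SCAN && result.size() < count; i++) {
            if (!isFullyBooked.test(candidate)) {
                result.add(candidate);
            }
            candidate = candidate.plusDays(1);
        }
        return List.copyOf(result); // never null; may hold fewer than count dates
    }
}
```

If Saturday 14 June 2025 and the Sunday after it are both full, asking for 3 alternatives returns Monday 16 to Wednesday 18 June.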

Why this phase exists: Without written requirements, the agent cannot know if what it built is correct. Without acceptance criteria, you cannot know if the test is meaningful.

Phase 2 — System Architecture

Goal: Decide how components connect before writing any code.

What happens:

You describe your tech stack and constraints
Claude Code proposes a component breakdown and data flow
You discuss trade-offs — sync vs async, SQL vs cache, REST vs WebSocket
Decisions are recorded as Architecture Decision Records (ADRs)

Prompt to start:

I have Spring Boot 3 + PostgreSQL + Anthropic Claude API.
Help me design the architecture for the reservation chatbot feature.
Should reservation logic live in the chatbot service or a separate service?

Output files:

docs/architecture/
  architecture.md          ← Component diagram and data flow
  ADR-001-chatbot-design.md
  ADR-002-reservation-storage.md

ADR format:

# ADR-001: Chatbot runs as part of the main Spring Boot service

## Status: Accepted

## Context
The chatbot needs access to restaurant data (menu, hours, reservations).
Option A: chatbot as a separate microservice calling our REST API.
Option B: chatbot logic lives inside the main Spring Boot service.

## Decision
Option B — chatbot as an internal service within the monolith.

## Reasons
- Avoids network latency for data lookups
- Simpler deployment for a single-developer project
- Can extract to microservice later if load requires it

## Consequences
- All chatbot logic must go through the service layer, not directly to the DB
- If the chatbot service has a bug, it affects the whole application
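The first consequence of ADR-001 can be made concrete in code: the chatbot handler depends only on a service abstraction, never on a repository. Below is a plain-Java sketch with hypothetical names (`AvailabilityService`, `ChatbotReservationHandler`, `TimeSlot`). In the real project these would be wired as Spring beans.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical value object for a bookable slot.
record TimeSlot(LocalTime start) {}

// The chatbot's only route to reservation data (ADR-001 consequence).
interface AvailabilityService {
    List<TimeSlot> getAvailableSlots(LocalDate date, int partySize);
}

// Chatbot-side handler: turns service results into a customer-facing reply.
// There is deliberately no repository or JDBC dependency here.
class ChatbotReservationHandler {
    private final AvailabilityService availabilityService;

    ChatbotReservationHandler(AvailabilityService availabilityService) {
        this.availabilityService = availabilityService;
    }

    String answer(LocalDate date, int partySize) {
        List<TimeSlot> slots = availabilityService.getAvailableSlots(date, partySize);
        if (slots.isEmpty()) {
            return "Sorry, we are fully booked on " + date + ".";
        }
        String times = slots.stream()
                .map(slot -> slot.start().toString())
                .collect(Collectors.joining(", "));
        return "Available times on " + date + ": " + times;
    }
}
```

Because the handler only sees the interface, its tests can use a lambda as a fake service, and a database outage surfaces through the service layer rather than inside chatbot code.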

Why this phase exists: Architectural decisions made in code are expensive to undo. Architectural decisions made in Markdown are cheap to change.

Phase 3 — Task Generation

Goal: Break the architecture into atomic, independently implementable tasks.

What happens:

Claude Code reads your requirements.md and architecture.md
It generates an ordered task list with dependencies
Each task has: what to build, which file to create/modify, acceptance criteria, estimated complexity
You review and reorder before any implementation begins

Prompt to start:

Read requirements/FEAT-001-reservation-questions.md and docs/architecture/architecture.md.
Generate an ordered task list for implementing this feature. Each task should be
implementable in under 2 hours and have clear acceptance criteria.

Output files:

tasks/
  FEAT-001-tasks.md
  TASK-001-reservation-repository.md
  TASK-002-reservation-service.md
  TASK-003-chatbot-reservation-handler.md
  TASK-004-api-endpoint.md
  TASK-005-integration-test.md

Example task format:

# TASK-002: ReservationService — check availability

## Depends on: TASK-001

## What to build
A Spring service method that takes a LocalDate and party size,
and returns a list of available time slots.

## File to create
src/main/java/com/restaurantbot/service/ReservationService.java

## Acceptance criteria
- Returns List<TimeSlot> (never null — return empty list if none available)
- Throws InvalidPartySize if party > restaurant capacity
- If date is in the past, throws PastDateException

## Test file
src/test/java/com/restaurantbot/service/ReservationServiceTest.java
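Because each task declares its dependencies, you can check the agent's proposed order mechanically before Phase 4 begins. Below is a minimal sketch. It assumes the "Depends on" lines have already been parsed into a map; `TaskOrderCheck` is a hypothetical helper, not part of any tool.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TaskOrderCheck {
    private TaskOrderCheck() {}

    // Returns one message per problem; an empty list means the order is valid.
    static List<String> problems(List<String> orderedTasks, Map<String, List<String>> dependsOn) {
        Set<String> all = new HashSet<>(orderedTasks);
        Set<String> done = new HashSet<>();
        List<String> problems = new ArrayList<>();
        for (String task : orderedTasks) {
            for (String dep : dependsOn.getOrDefault(task, List.of())) {
                if (!all.contains(dep)) {
                    problems.add(task + " depends on unknown task " + dep);
                } else if (!done.contains(dep)) {
                    problems.add(task + " is scheduled before its dependency " + dep);
                }
            }
            done.add(task); // later tasks may now depend on this one
        }
        return problems;
    }
}
```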

Why this phase exists: Without tasks, the agent drifts. It builds what seems logical, not what was designed. Tasks are the contract between your architecture and your implementation.

Phase 4 — TDD Implementation

Goal: Build each task correctly, proven by tests.

Test-Driven Development (TDD) is the core discipline of this phase. The cycle is always:

  RED          GREEN        REFACTOR
  ────         ─────        ────────
  Write a  →  Write the  →  Clean up
  failing     minimum        the code
  test        code to        without
              pass it        breaking
                             the test

What happens (per task):

Claude Code reads the task file
Writes the failing test first — the test must fail before any implementation
Writes the minimum code to make the test pass
Refactors for clarity, naming, and structure
Moves to the next task

Prompt to start a task:

Implement TASK-002. Start with the failing test. Do not write any production code
until the test exists and fails. Follow TDD strictly.

Example TDD cycle for ReservationService:

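// Step 1 — RED: Write the failing test
// Assumes a Mockito @Mock ReservationRepository, an @InjectMocks ReservationService,
// and MAX_CAPACITY = 50 bookings per day.
import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.anyInt;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

@Test
void shouldReturnEmptyListWhenDateIsFullyBooked() {
    // Given
    LocalDate saturday = LocalDate.of(2025, 6, 14);
    when(reservationRepository.countByDate(saturday)).thenReturn(50); // fully booked

    // When
    List<TimeSlot> slots = reservationService.getAvailableSlots(saturday, 2);

    // Then
    assertThat(slots).isEmpty();
    // An unstubbed Mockito mock already returns an empty list, so also check that
    // the slot lookup was skipped; otherwise the test passes without the booking check.
    verify(reservationRepository, never()).findAvailableSlots(any(), anyInt());
}
// ← This test fails (it does not even compile) because getAvailableSlots doesn't exist yet. Good.

// Step 2 — GREEN: Write minimum code to pass
public List<TimeSlot> getAvailableSlots(LocalDate date, int partySize) {
    int booked = reservationRepository.countByDate(date);
    if (booked >= MAX_CAPACITY) return List.of();
    return reservationRepository.findAvailableSlots(date, partySize);
}
// ← Test passes now.

// Step 3 — REFACTOR: Clean up without breaking tests
public List<TimeSlot> getAvailableSlots(LocalDate date, int partySize) {
    if (isFullyBooked(date)) return List.of();
    return reservationRepository.findAvailableSlots(date, partySize);
}

private boolean isFullyBooked(LocalDate date) {
    return reservationRepository.countByDate(date) >= MAX_CAPACITY;
}
// ← Same test still passes. Code is now readable.
// Party-size validation is new behaviour, so it gets its own failing test first
// rather than being slipped in during refactoring.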
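Once the remaining TASK-002 criteria (party size, past dates) have each gone through their own red-green cycle, the service might look like the self-contained sketch below. Everything here is assumed for illustration:
- `ReservationRepository` is reduced to the two queries the example calls.
- `MAX_CAPACITY = 50` matches the test above.
- Treating a party size below 1 as invalid is an added assumption.
- A `Clock` is injected so the past-date rule can be tested deterministically.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.time.LocalTime;
import java.util.List;

record TimeSlot(LocalTime start) {}

// Exception names follow the TASK-002 acceptance criteria.
class InvalidPartySize extends RuntimeException {
    InvalidPartySize(String message) { super(message); }
}

class PastDateException extends RuntimeException {
    PastDateException(String message) { super(message); }
}

// Reduced to the two queries used below (hypothetical signatures).
interface ReservationRepository {
    int countByDate(LocalDate date);
    List<TimeSlot> findAvailableSlots(LocalDate date, int partySize);
}

class ReservationService {
    static final int MAX_CAPACITY = 50; // assumed restaurant capacity

    private final ReservationRepository reservationRepository;
    private final Clock clock;

    ReservationService(ReservationRepository reservationRepository, Clock clock) {
        this.reservationRepository = reservationRepository;
        this.clock = clock;
    }

    List<TimeSlot> getAvailableSlots(LocalDate date, int partySize) {
        validatePartySize(partySize);
        validateNotInPast(date);
        if (isFullyBooked(date)) return List.of();
        // List.copyOf throws if the repository returns null, so callers never see null.
        return List.copyOf(reservationRepository.findAvailableSlots(date, partySize));
    }

    private void validatePartySize(int partySize) {
        if (partySize < 1 || partySize > MAX_CAPACITY) {
            throw new InvalidPartySize("Party size must be 1.." + MAX_CAPACITY + ", got " + partySize);
        }
    }

    private void validateNotInPast(LocalDate date) {
        if (date.isBefore(LocalDate.now(clock))) {
            throw new PastDateException("Date " + date + " is in the past");
        }
    }

    private boolean isFullyBooked(LocalDate date) {
        return reservationRepository.countByDate(date) >= MAX_CAPACITY;
    }
}
```

Injecting `Clock` instead of calling `LocalDate.now()` directly is what keeps the past-date test stable over time. In Spring, a `Clock` bean such as `Clock.systemDefaultZone()` would supply it.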

Why TDD and not "write tests after":

Tests written after implementation test what the code does, not what it should do
A test written first is a specification — it captures intent before implementation bias sets in
Red phase forces you to think about the API before the internals
Watching the test fail first guards against a false green: a test that already passes before the code exists is not testing the feature

Phase 5 — Review & Merge

Goal: Human-in-the-loop verification before any code enters the main branch.

The agent is not the final authority. You are.

What happens:

Claude Code opens a diff or PR summary — you read every line
You ask "does this match the requirements?" for each change
You ask Claude to explain anything you do not understand
You either approve and merge, or ask for specific changes
The task is marked done only after merge

What to check in review:

□ Does the implementation match the acceptance criteria in the task file?
□ Is the test actually testing the right thing (not just making coverage go up)?
□ Are there edge cases the test doesn't cover?
□ Is the naming clear — would a stranger understand this in 6 months?
□ Is there any hardcoded value that should be a constant or config?
□ Does the PR description explain why, not just what?

Good review prompts:

Explain why you return an empty List<TimeSlot> here instead of throwing an exception.

What happens if the database is down when this method is called?

Is there a case where this test could pass even if the feature is broken?

Why this phase exists: The review phase is where you learn. Reading code that was generated to your requirements, asking why decisions were made, and approving or rejecting changes builds the mental model you need to eventually write it yourself.

Good Practices

Always write CLAUDE.md before starting a session

An agent with no context is like a developer who joined your team today with no onboarding. They will make reasonable-sounding decisions that are wrong for your project. CLAUDE.md is the onboarding doc.

Keep tasks atomic

A task that says "implement the chatbot" is not a task — it is a wish. A task that says "implement ReservationService.getAvailableSlots() that returns empty list when fully booked" is implementable, testable, and reviewable. If a task takes more than 2 hours, split it.

Never skip the requirements phase

The most expensive mistake in software is building the wrong thing correctly. Requirements Engineering is what prevents this. Even 30 minutes of clarifying questions saves hours of rework.

Commit CLAUDE.md and settings.json, not settings.local.json

Your project's shared conventions belong in git so every session — and every teammate — gets the same context. Your personal secrets and local overrides belong in .local.json which is gitignored.

Use `/clear` between major tasks

Claude Code's context window fills up. Long sessions accumulate noise from earlier steps. Use /clear to start fresh for each new task, with CLAUDE.md providing the persistent context. Use /compact when you want to summarise and compress rather than fully reset.

Review beats regenerate

When Claude Code produces something wrong, the first instinct is to ask it to regenerate. The better move is to ask it to explain, then ask for a specific targeted fix. Regeneration throws away everything that was right. Targeted fixes preserve it.

Common Mistakes to Avoid

Vague prompts

# Bad
"Build the chatbot feature"

# Good
"Implement TASK-003: ChatbotReservationHandler. Read the task file at
tasks/TASK-003-chatbot-reservation-handler.md. Start with the failing test."

Skipping phases

The phases are ordered for a reason. Jumping straight to implementation without requirements means you are building a guess. Jumping straight to code without tasks means the agent will make architectural decisions that belong in Phase 2.

Secrets in committed files

// ❌ Never do this in .claude/settings.json (committed to git)
{
  "env": { "ANTHROPIC_API_KEY": "sk-ant-real-key-here" }
}

// ✅ Do this in .claude/settings.local.json (gitignored)
{
  "env": { "ANTHROPIC_API_KEY": "sk-ant-real-key-here" }
}

Letting the agent merge without review

The Review phase is not optional. It is where you learn, and it is where bugs get caught. An agent that merges its own work is an agent without a safety net.

Writing tests after implementation

A test written after the code was working is not a test — it is a documentation exercise. It will confirm what the code does, not what it should do. The failing test in the TDD red phase is the specification.

Glossary

Term	Definition
Agent	An AI system that can take actions autonomously — reading files, writing code, running commands — rather than just generating text responses
Claude Code	Anthropic's CLI-based coding agent. Runs in your terminal and acts on your codebase
CLAUDE.md	A Markdown file read by Claude Code at every session start. Acts as permanent project memory
Skill	A Markdown file in `.claude/skills/` that gives Claude Code on-demand instructions when invoked with `/command`
Hook	A script that runs automatically before or after Claude Code actions. Enforces guardrails without manual reminders
MCP	Model Context Protocol. A standard for connecting AI agents to external tools — databases, APIs, file systems
ADR	Architecture Decision Record. A short document capturing a significant architectural decision, its context, and its trade-offs
TDD	Test-Driven Development. Writing the failing test before the implementation code. Red → Green → Refactor
Context window	The amount of text Claude Code can hold in working memory per session. Use `/clear` and `/compact` to manage it
Settings hierarchy	The 5-level system by which Claude Code merges configuration from user, project, local, CLI, and managed sources
Agentic workflow	A structured, phase-driven approach to working with AI agents: Requirements → Architecture → Tasks → TDD → Review

Project This Guide Was Built For

This guide was written while building RestaurantBot.

Tech stack: Java 21 · Spring Boot 3 · Spring AI · PostgreSQL · Anthropic Claude API · Docker · AWS ECS

Repository structure:

restaurantbot/
├── CLAUDE.md                    ← Agent briefing document
├── .claude/
│   ├── settings.json            ← Team shared settings (committed)
│   ├── settings.local.json      ← Personal secrets (gitignored)
│   ├── skills/
│   │   ├── spec/SKILL.md        ← /spec command
│   │   ├── plan/SKILL.md        ← /plan command
│   │   ├── build/SKILL.md       ← /build command
│   │   └── review/SKILL.md      ← /review command
│   └── hooks/
│       ├── pre-write.sh         ← Block writes to migration files
│       └── post-bash.sh         ← Coverage reminders
├── requirements/                ← Phase 1 outputs
├── docs/architecture/           ← Phase 2 outputs
├── tasks/                       ← Phase 3 outputs
└── src/                         ← Phase 4 outputs

This guide is a living document. It evolves as the project grows.
Last updated: April 2026
