A terminal AI chat app powered by Docker Model Runner.
No API keys. No cloud. No cost. Your data never leaves your machine.
Built by @thissidemayur · Portfolio · Blog Series
## Demo

```bash
# Quick one-shot question
$ llm "what is a goroutine?"

AI ────────────────────────────────────────
A goroutine is a lightweight thread managed
by the Go runtime...

# Interactive chat session
$ llm --chat

# Code mode — deterministic, precise
$ llm --mode code "write a binary search in TypeScript"

# List your local models
$ llm --models
```

## Features

- 🚀 Streaming responses — words appear live as the model generates
- 🧠 Conversation memory — AI remembers everything in your session
- 🎛️ Preset modes — chat, code, creative (different temperature per mode)
- 📦 Auto model detection — pulls model automatically if missing
- 💾 History saved to disk — every session stored at `~/.llm/history/`
- ❌ Plain English errors — no stack traces, ever
- 📦 Single binary — Linux, macOS, Windows
## Prerequisites

- Docker installed and running
- Docker Model Runner plugin installed

```bash
# Ubuntu / Debian
sudo apt-get install docker-model-plugin

# Fedora / RHEL
sudo dnf install docker-model-plugin

# macOS
# Enable in Docker Desktop → Settings → AI tab
```

## Installation

```bash
curl -fsSL https://raw.githubusercontent.com/thissidemayur/llm-cli/main/install.sh | bash
```

The script automatically:

- Detects your OS and architecture
- Downloads the correct binary
- Installs to `/usr/local/bin`
- Pulls the default AI model
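For reference, the OS/architecture detection in installers of this kind usually boils down to a `uname` switch. This is an illustrative sketch mapping platforms to the release binary names listed below, not the actual contents of `install.sh`:

```shell
#!/usr/bin/env sh
# Illustrative OS/arch detection — NOT the project's actual install.sh.
OS="$(uname -s | tr '[:upper:]' '[:lower:]')"   # linux, darwin
ARCH="$(uname -m)"                              # x86_64, arm64, aarch64

case "$OS-$ARCH" in
  linux-x86_64)              BINARY="llm-linux-amd64" ;;
  linux-arm64|linux-aarch64) BINARY="llm-linux-arm64" ;;
  darwin-x86_64)             BINARY="llm-mac-intel" ;;
  darwin-arm64)              BINARY="llm-mac-apple-silicon" ;;
  *) echo "Unsupported platform: $OS-$ARCH" >&2; exit 1 ;;
esac

echo "Would download: $BINARY"
```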
Or download a binary manually:

```bash
# Linux (x86_64)
curl -L https://github.com/thissidemayur/llm-cli/releases/latest/download/llm-linux-amd64 -o llm
chmod +x llm
sudo mv llm /usr/local/bin/llm
```

```bash
# macOS (Apple Silicon)
curl -L https://github.com/thissidemayur/llm-cli/releases/latest/download/llm-mac-apple-silicon -o llm
chmod +x llm
sudo mv llm /usr/local/bin/llm
```

All available binaries are on the releases page:
| Binary | Platform |
|---|---|
| `llm-linux-amd64` | Linux x86_64 |
| `llm-linux-arm64` | Linux ARM64 |
| `llm-mac-intel` | macOS Intel |
| `llm-mac-apple-silicon` | macOS M1/M2/M3 |
| `llm-windows.exe` | Windows x64 |
## Run via Docker

```bash
# One-shot question
docker run --network host thissidemayur/llm-cli "what is docker?"

# Interactive chat
docker run -it --network host thissidemayur/llm-cli --chat
```

`--network host` is required so the container can reach DMR at `localhost:12434`.
## Build from source

```bash
# Requires Bun
git clone https://github.com/thissidemayur/llm-cli.git
cd llm-cli
bun install
bun build src/index.ts --compile --outfile bin/llm
sudo cp bin/llm /usr/local/bin/llm
```

## Usage

```bash
# One-shot questions
llm "explain async await in JavaScript"
llm --mode code "write a debounce function in TypeScript"
llm --mode creative "write a story about a Docker whale"
llm --model ai/smollm2 "hello"

# Interactive chat
llm --chat
llm --chat --mode code
llm --chat --model ai/smollm2

# Utilities
llm --models    # list all local models
llm --history   # view past conversations
llm --help      # show all options
llm --version   # show version
```

In chat mode, these slash commands are available:

| Command | Description |
|---|---|
| `/exit` or `/quit` | Quit and auto-save session |
| `/clear` | Start a fresh conversation |
| `/mode code` | Switch to code mode |
| `/mode chat` | Switch to chat mode |
| `/mode creative` | Switch to creative mode |
| `/models` | List local models |
| `/save` | Save conversation to file |
| `/history` | View past sessions |
| `/help` | Show all commands |
| Mode | Temperature | Best for |
|---|---|---|
| `chat` | 0.7 | General conversation |
| `code` | 0.1 | Code generation, deterministic |
| `creative` | 1.2 | Writing, brainstorming |
## How it works

```text
Your terminal
    ↓
llm binary
    ↓
http://localhost:12434/engines/v1  (Docker Model Runner)
    ↓
Local AI model (llama3.2, smollm2, etc.)
```

DMR exposes an OpenAI-compatible REST API at `localhost:12434`. LLM CLI points the OpenAI SDK at this address instead of `api.openai.com`. No internet is required after the initial model download.
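Concretely, a one-shot question becomes a single OpenAI-format `chat/completions` request against the local endpoint. A hedged sketch of building such a request — function and variable names are illustrative, not taken from the repo's `client.ts`:

```typescript
// Illustrative request builder for DMR's OpenAI-compatible API.
const DMR_BASE = "http://localhost:12434/engines/v1";

interface ChatMessage { role: "system" | "user" | "assistant"; content: string }

function buildChatRequest(model: string, messages: ChatMessage[]) {
  return {
    url: `${DMR_BASE}/chat/completions`,
    body: {
      model,        // e.g. "ai/llama3.2"
      messages,     // the full conversation history, resent every turn
      stream: true, // tokens arrive as SSE chunks, like the OpenAI API
    },
  };
}

// Sending it is a plain POST; a local endpoint needs no real API key:
// await fetch(req.url, {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(req.body),
// });
```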
```text
llm-cli/
├── src/
│   ├── index.ts            ← entry point
│   ├── cli/
│   │   └── args.ts         ← CLI argument parsing
│   ├── ui/
│   │   ├── tui.ts          ← all visual output
│   │   ├── spinner.ts      ← thinking animation
│   │   └── colors.ts       ← color constants
│   ├── core/
│   │   ├── chat.ts         ← main chat loop
│   │   ├── memory.ts       ← conversation history in RAM
│   │   ├── presets.ts      ← mode configurations
│   │   └── history.ts      ← save/load to disk
│   ├── dmr/
│   │   ├── client.ts       ← OpenAI SDK → localhost
│   │   ├── models.ts       ← list, check, pull models
│   │   └── stream.ts       ← streaming response handler
│   └── utils/
│       ├── config.ts       ← all defaults in one place
│       └── errors.ts       ← plain English error messages
```
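The conversation-memory piece of a design like this is usually just the running message list, resent in full with each request. A minimal sketch of the idea — not the repo's actual `memory.ts`:

```typescript
// Illustrative in-RAM conversation memory; not llm-cli's actual implementation.
interface Message { role: "system" | "user" | "assistant"; content: string }

class ConversationMemory {
  private messages: Message[] = [];

  add(role: Message["role"], content: string): void {
    this.messages.push({ role, content });
  }

  // The whole history goes out with every request; that resend is what
  // makes the model "remember" earlier turns in the session.
  all(): Message[] {
    return [...this.messages];
  }

  // Backs a command like /clear: start a fresh conversation.
  clear(): void {
    this.messages = [];
  }
}
```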
This project was built and documented as a 3-part series:
| Part | Title |
|---|---|
| Part 1 | Run AI Locally for Free — Setup & Core Concepts |
| Part 2 | Talking to Your Local AI Through Code — REST API + TypeScript |
| Part 3 | I Built a Terminal AI Chat App Using Docker |
- TypeScript CLI with streaming
- Conversation memory
- Preset modes (chat, code, creative)
- History saved to disk
- Single binary — Linux, macOS, Windows
- Docker image on Docker Hub
- One-line installer script
- Go migration — smaller binary, zero runtime
- Config file support (`~/.llm/config.json`)
- Multiple model support in one session
Issues and pull requests are welcome.
If you find a bug or want a feature → open an issue.
MIT © Mayur