diff --git a/0_app/0_root/index.md b/0_app/0_root/index.mdx
similarity index 67%
rename from 0_app/0_root/index.md
rename to 0_app/0_root/index.mdx
index 8162d21..d408a96 100644
--- a/0_app/0_root/index.md
+++ b/0_app/0_root/index.mdx
@@ -5,6 +5,50 @@ description: Learn how to run Llama, DeepSeek, Qwen, Phi, and other LLMs locally
index: 1
---
+import { Card, Cards } from "fumadocs-ui/components/card";
+import { getDocsSectionIcon } from "@/lib/docsSectionIcon";
+
+## Explore the docs
+
+<Cards>
+</Cards>
+
To get LM Studio, head over to the [Downloads page](/download) and download an installer for your operating system.
LM Studio is available for macOS, Windows, and Linux.
diff --git a/0_app/0_root/meta.json b/0_app/0_root/meta.json
new file mode 100644
index 0000000..bc9c64f
--- /dev/null
+++ b/0_app/0_root/meta.json
@@ -0,0 +1,7 @@
+{
+ "title": "Introduction",
+ "pages": [
+ "offline",
+ "system-requirements"
+ ]
+}
diff --git a/0_app/0_root/offline.md b/0_app/0_root/offline.mdx
similarity index 98%
rename from 0_app/0_root/offline.md
rename to 0_app/0_root/offline.mdx
index aebcd97..646de64 100644
--- a/0_app/0_root/offline.md
+++ b/0_app/0_root/offline.mdx
@@ -4,9 +4,9 @@ description: LM Studio can operate entirely offline, just make sure to get some
index: 4
---
-```lms_notice
+<Callout>
In general, LM Studio does not require the internet in order to work. This includes core functions like chatting with models, chatting with documents, or running a local server, none of which require the internet.
-```
+</Callout>
### Operations that do NOT require connectivity
diff --git a/0_app/1_basics/_connect-apps.md b/0_app/1_basics/_connect-apps.mdx
similarity index 81%
rename from 0_app/1_basics/_connect-apps.md
rename to 0_app/1_basics/_connect-apps.mdx
index 5f1cc09..dbd56ea 100644
--- a/0_app/1_basics/_connect-apps.md
+++ b/0_app/1_basics/_connect-apps.mdx
@@ -5,7 +5,7 @@ description: Getting started with connecting applications to LM Studio
LM Studio comes with a few built-in themes for app-wide color palettes.
-
+
### Selecting a Theme
@@ -13,13 +13,13 @@ You can choose a theme in the Settings tab.
Choosing the "Auto" option will automatically switch between Light and Dark themes based on your system settings.
-```lms_protip
+<Callout>
You can jump to Settings from anywhere in the app by pressing `cmd` + `,` on macOS or `ctrl` + `,` on Windows/Linux.
-```
+</Callout>
-###### To get to the Settings page, you need to be on [Power User mode](/docs/modes) or higher.
+To get to the Settings page, you need to be on Power User mode or higher.
-
+
### Community
diff --git a/0_app/1_basics/index.md b/0_app/1_basics/index.md
deleted file mode 100644
index 51c7513..0000000
--- a/0_app/1_basics/index.md
+++ /dev/null
@@ -1,52 +0,0 @@
----
-title: Get started with LM Studio
-sidebar_title: Overview
-description: Download and run Large Language Models like Qwen, Mistral, Gemma, or gpt-oss in LM Studio.
-index: 1
----
-
-Double check computer meets the minimum [system requirements](/docs/system-requirements).
-
-```lms_info
-You might sometimes see terms such as `open-source models` or `open-weights models`. Different models might be released under different licenses and varying degrees of 'openness'. In order to run a model locally, you need to be able to get access to its "weights", often distributed as one or more files that end with `.gguf`, `.safetensors` etc.
-```
-
-
-
-## Getting up and running
-
-First, **install the latest version of LM Studio**. You can get it from [here](/download).
-
-Once you're all set up, you need to **download your first LLM**.
-
-### 1. Download an LLM to your computer
-
-Head over to the Discover tab to download models. Pick one of the curated options or search for models by search query (e.g. `"Llama"`). See more in-depth information about downloading models [here](/docs/basics/download-models).
-
-
-
-### 2. Load a model to memory
-
-Head over to the **Chat** tab, and
-
-1. Open the model loader
-2. Select one of the models you downloaded (or [sideloaded](/docs/advanced/sideload)).
-3. Optionally, choose load configuration parameters.
-
-
-
-##### What does loading a model mean?
-
-Loading a model typically means allocating memory to be able to accommodate the model's weights and other parameters in your computer's RAM.
-
-### 3. Chat!
-
-Once the model is loaded, you can start a back-and-forth conversation with the model in the Chat tab.
-
-
-
-
-
-### Community
-
-Chat with other LM Studio users, discuss LLMs, hardware, and more on the [LM Studio Discord server](https://discord.gg/aPQfnNkxGC).
diff --git a/0_app/1_basics/index.mdx b/0_app/1_basics/index.mdx
new file mode 100644
index 0000000..6b950b1
--- /dev/null
+++ b/0_app/1_basics/index.mdx
@@ -0,0 +1,62 @@
+---
+title: Get started with LM Studio
+sidebar_title: Overview
+description: Download and run Large Language Models like Qwen, Mistral, Gemma, or gpt-oss in LM Studio.
+index: 1
+---
+
+import { Step, Steps } from "fumadocs-ui/components/steps";
+
+Double-check that your computer meets the minimum [system requirements](/docs/system-requirements).
+
+<Callout>
+You might sometimes see terms such as `open-source models` or `open-weights models`. Different models might be released under different licenses and varying degrees of 'openness'. In order to run a model locally, you need to be able to get access to its "weights", often distributed as one or more files that end with `.gguf`, `.safetensors` etc.
+</Callout>
+
+
+
+## Getting up and running
+
+First, **install the latest version of LM Studio**. You can get it from [here](/download).
+
+Once you're all set up, you need to **download your first LLM**.
+
+<Steps>
+  <Step>
+    ### Download an LLM to your computer
+
+    Head over to the Discover tab to download models. Pick one of the curated options or search for models by search query (e.g. `"Llama"`). See more in-depth information about downloading models [here](/docs/basics/download-models).
+  </Step>
+
+  <Step>
+    ### Load a model to memory
+
+    Head over to the **Chat** tab, and
+
+    1. Open the model loader
+    2. Select one of the models you downloaded (or [sideloaded](/docs/advanced/sideload)).
+    3. Optionally, choose load configuration parameters.
+
+    ##### What does loading a model mean?
+
+    Loading a model typically means allocating memory to be able to accommodate the model's weights and other parameters in your computer's RAM.
+  </Step>
+
+  <Step>
+    ### Chat!
+
+    Once the model is loaded, you can start a back-and-forth conversation with the model in the Chat tab.
+  </Step>
+</Steps>
+
+### Community
+
+Chat with other LM Studio users, discuss LLMs, hardware, and more on the [LM Studio Discord server](https://discord.gg/aPQfnNkxGC).
diff --git a/0_app/1_basics/meta.json b/0_app/1_basics/meta.json
new file mode 100644
index 0000000..a793b2b
--- /dev/null
+++ b/0_app/1_basics/meta.json
@@ -0,0 +1,12 @@
+{
+ "title": "Getting Started",
+ "pages": [
+ "chat",
+ "_connect-apps",
+ "download-model",
+ "_keychords",
+ "lmstudio-vs-llmster-vs-lms",
+ "rag",
+ "_troubleshooting"
+ ]
+}
diff --git a/0_app/2_mcp/deeplink.md b/0_app/2_mcp/deeplink.mdx
similarity index 94%
rename from 0_app/2_mcp/deeplink.md
rename to 0_app/2_mcp/deeplink.mdx
index b4efd64..45e4d01 100644
--- a/0_app/2_mcp/deeplink.md
+++ b/0_app/2_mcp/deeplink.mdx
@@ -1,5 +1,5 @@
---
-title: "`Add to LM Studio` Button"
+title: "Add to LM Studio Button"
description: Add MCP servers to LM Studio using a deeplink
index: 2
---
@@ -14,9 +14,7 @@ Starting with version 0.3.17 (10), LM Studio can act as an MCP host. Learn more
Enter your MCP JSON entry to generate a deeplink for the `Add to LM Studio` button.
-```lms_mcp_deep_link_generator
-
-```
+
## Try an example
diff --git a/0_app/2_mcp/index.md b/0_app/2_mcp/index.mdx
similarity index 65%
rename from 0_app/2_mcp/index.md
rename to 0_app/2_mcp/index.mdx
index f769557..c754862 100644
--- a/0_app/2_mcp/index.md
+++ b/0_app/2_mcp/index.mdx
@@ -10,9 +10,9 @@ Starting LM Studio 0.3.17, LM Studio acts as an **Model Context Protocol (MCP) H
Never install MCPs from untrusted sources.
-```lms_warning
+<Callout type="warn">
Some MCP servers can run arbitrary code, access your local files, and use your network connection. Always be cautious when installing and using MCP servers. If you don't trust the source, don't install it.
-```
+</Callout>
# Use MCP servers in LM Studio
@@ -22,24 +22,28 @@ Starting 0.3.17 (b10), LM Studio supports both local and remote MCP servers. You
Switch to the "Program" tab in the right-hand sidebar. Click `Install > Edit mcp.json`.
-
+
This will open the `mcp.json` file in the in-app editor. You can add MCP servers by editing this file.
-
+
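+
+LM Studio follows the common `mcpServers` notation for this file. An empty config has roughly this shape (illustrative sketch):
+
+```json
+{
+  "mcpServers": {}
+}
+```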
### Example MCP to try: Hugging Face MCP Server
This MCP server provides access to functions like model and dataset search.
@@ -56,7 +60,7 @@ This MCP server provides access to functions like model and dataset search.
}
```
-###### You will need to replace `` with your actual Hugging Face token. Learn more [here](https://huggingface.co/docs/hub/en/security-tokens).
+You will need to replace `<YOUR_HF_TOKEN>` with your actual Hugging Face token. Learn more [here](https://huggingface.co/docs/hub/en/security-tokens).
Use the [deeplink button](mcp/deeplink), or copy the JSON snippet above and paste it into your `mcp.json` file.
diff --git a/0_app/2_mcp/meta.json b/0_app/2_mcp/meta.json
new file mode 100644
index 0000000..a9726e1
--- /dev/null
+++ b/0_app/2_mcp/meta.json
@@ -0,0 +1,6 @@
+{
+ "title": "MCP",
+ "pages": [
+ "deeplink"
+ ]
+}
diff --git a/0_app/3_modelyaml/index.md b/0_app/3_modelyaml/index.md
index b66c23c..3b5b069 100644
--- a/0_app/3_modelyaml/index.md
+++ b/0_app/3_modelyaml/index.md
@@ -1,6 +1,6 @@
---
-title: "Introduction to `model.yaml`"
-description: Describe models with the cross-platform `model.yaml` specification.
+title: "Introduction to model.yaml"
+description: Describe models with the cross-platform model.yaml specification.
index: 5
socialCard:
url: https://files.lmstudio.ai/modelyaml-card.jpg
diff --git a/0_app/3_modelyaml/meta.json b/0_app/3_modelyaml/meta.json
new file mode 100644
index 0000000..37a1889
--- /dev/null
+++ b/0_app/3_modelyaml/meta.json
@@ -0,0 +1,6 @@
+{
+ "title": "model.yaml",
+ "pages": [
+ "publish"
+ ]
+}
diff --git a/0_app/3_modelyaml/publish.md b/0_app/3_modelyaml/publish.md
index eaea1ba..3d7f8e0 100644
--- a/0_app/3_modelyaml/publish.md
+++ b/0_app/3_modelyaml/publish.md
@@ -1,5 +1,5 @@
---
-title: Publish a `model.yaml`
+title: Publish a model.yaml
description: Upload your model definition to the LM Studio Hub.
index: 7
---
@@ -22,7 +22,7 @@ lms clone qwen/qwen3-8b
This will result in a local copy of `model.yaml`, `README`, and other metadata files. Importantly, this does NOT download the model weights.
-```lms_terminal
+```bash title="Terminal"
$ ls
README.md manifest.json model.yaml thumbnail.png
```
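+
+For orientation, a cloned `model.yaml` has roughly this shape (an illustrative sketch; see the model.yaml spec for the exact fields):
+
+```yaml
+model: qwen/qwen3-8b
+base:
+  - key: lmstudio-community/Qwen3-8B-GGUF
+    sources:
+      - type: huggingface
+        user: lmstudio-community
+        repo: Qwen3-8B-GGUF
+```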
diff --git a/0_app/3_presets/meta.json b/0_app/3_presets/meta.json
new file mode 100644
index 0000000..6c054da
--- /dev/null
+++ b/0_app/3_presets/meta.json
@@ -0,0 +1,9 @@
+{
+ "title": "Presets",
+ "pages": [
+ "import",
+ "publish",
+ "pull",
+ "push"
+ ]
+}
diff --git a/0_app/3_presets/push.md b/0_app/3_presets/push.mdx
similarity index 50%
rename from 0_app/3_presets/push.md
rename to 0_app/3_presets/push.mdx
index 3cf073b..85513ef 100644
--- a/0_app/3_presets/push.md
+++ b/0_app/3_presets/push.mdx
@@ -4,6 +4,8 @@ description: Publish new revisions of your Presets to the LM Studio Hub.
index: 5
---
+import { Step, Steps } from "fumadocs-ui/components/steps";
+
`Feature In Preview`
Starting LM Studio 0.3.15, you can publish your Presets to the LM Studio community. This allows you to share your Presets with others and import Presets from other users.
@@ -18,14 +20,20 @@ Presets you share on the LM Studio Hub can be updated.
-## Step 1: Make Changes and Commit
+<Steps>
+  <Step>
+    ### Make Changes and Commit

-Make any changes to your Preset, both in parameters that are already included in the Preset, or by adding new parameters.
+    Make any changes to your Preset, either to parameters already included in the Preset or by adding new parameters.
+  </Step>

-## Step 2: Click the Push Button
+  <Step>
+    ### Click the Push Button

-Once changes are committed, you will see a `Push` button. Click it to push your changes to the Hub.
+    Once changes are committed, you will see a `Push` button. Click it to push your changes to the Hub.

-Pushing changes will result in a new revision of your Preset on the Hub.
+    Pushing changes will result in a new revision of your Preset on the Hub.
+  </Step>
-
+</Steps>
+
diff --git a/0_app/5_advanced/meta.json b/0_app/5_advanced/meta.json
new file mode 100644
index 0000000..7c2b36c
--- /dev/null
+++ b/0_app/5_advanced/meta.json
@@ -0,0 +1,14 @@
+{
+ "title": "Advanced",
+ "pages": [
+ "_branching",
+ "_context",
+ "_errors",
+ "import-model",
+ "parallel-requests",
+ "per-model",
+ "prompt-template",
+ "speculative-decoding",
+ "_vision"
+ ]
+}
diff --git a/0_app/5_advanced/per-model.md b/0_app/5_advanced/per-model.mdx
similarity index 68%
rename from 0_app/5_advanced/per-model.md
rename to 0_app/5_advanced/per-model.mdx
index 91ba99f..e542b6c 100644
--- a/0_app/5_advanced/per-model.md
+++ b/0_app/5_advanced/per-model.mdx
@@ -9,31 +9,30 @@ You can set default load settings for each model in LM Studio.
When the model is loaded anywhere in the app (including through [`lms load`](/docs/cli#load-a-model-with-options)), these settings will be used.
-
+
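+
+For example, once defaults are saved, a plain CLI load picks them up without extra flags (model key illustrative):
+
+```bash
+lms load qwen/qwen3-8b
+```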
### Setting default parameters for a model
Head to the My Models tab and click on the gear ⚙️ icon to edit the model's default parameters.
-
+
This will open a dialog where you can set the default parameters for the model.
-
-
+
+
Your browser does not support the video tag.
Next time you load the model, these settings will be used.
-```lms_protip
+<Callout>
#### Reasons to set default load parameters (not required, totally optional)

- Set particular GPU offload settings for a given model
- Set a particular context size for a given model
- Choose whether or not to utilize Flash Attention for a given model
-
-```
+</Callout>
## Advanced Topics
@@ -41,15 +40,15 @@ Next time you load the model, these settings will be used.
When you load a model, you can optionally change the default load settings.
-
+
### Saving your changes as the default settings for a model
If you make changes to load settings when you load a model, you can save them as the default settings for that model.
-
+
-
+
### Community
diff --git a/0_app/5_advanced/prompt-template.md b/0_app/5_advanced/prompt-template.md
index 583a228..a876c1b 100644
--- a/0_app/5_advanced/prompt-template.md
+++ b/0_app/5_advanced/prompt-template.md
@@ -33,7 +33,7 @@ You can make this config box always show up by right clicking the sidebar and se
You can express the prompt template in Jinja.
-###### 💡 [Jinja]() is a templating engine used to encode the prompt template in several popular LLM model file formats.
+###### Jinja is a templating engine used to encode the prompt template in several popular LLM model file formats.
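+
+For instance, a ChatML-style template has roughly this shape (a simplified, illustrative sketch):
+
+```jinja
+{% for message in messages %}<|im_start|>{{ message.role }}
+{{ message.content }}<|im_end|>
+{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
+{% endif %}
+```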
#### Manual
diff --git a/0_app/6_user-interface/languages.md b/0_app/6_user-interface/languages.mdx
similarity index 96%
rename from 0_app/6_user-interface/languages.md
rename to 0_app/6_user-interface/languages.mdx
index 91ab487..2ca12bf 100644
--- a/0_app/6_user-interface/languages.md
+++ b/0_app/6_user-interface/languages.mdx
@@ -6,7 +6,7 @@ description: LM Studio is available in English, Chinese, Spanish, French, German
LM Studio is available in `English`, `Spanish`, `Japanese`, `Chinese`, `German`, `Norwegian`, `Turkish`, `Russian`, `Korean`, `Polish`, `Vietnamese`, `Czech`, `Ukrainian`, `Portuguese (BR,PT)` and many more languages thanks to incredible community localizers.
-
+
### Selecting a Language
@@ -14,13 +14,13 @@ You can choose a language in the Settings tab.
Use the dropdown menu under Preferences > Language.
-```lms_protip
+<Callout>
You can jump to Settings from anywhere in the app by pressing `cmd` + `,` on macOS or `ctrl` + `,` on Windows/Linux.
-```
+</Callout>
-###### To get to the Settings page, you need to be on [Power User mode](/docs/modes) or higher.
+To get to the Settings page, you need to be on Power User mode or higher.
-
+
#### Big thank you to community localizers 🙏
diff --git a/0_app/6_user-interface/meta.json b/0_app/6_user-interface/meta.json
new file mode 100644
index 0000000..9a4cb95
--- /dev/null
+++ b/0_app/6_user-interface/meta.json
@@ -0,0 +1,8 @@
+{
+ "title": "User Interface",
+ "pages": [
+ "languages",
+ "modes",
+ "themes"
+ ]
+}
diff --git a/0_app/meta.json b/0_app/meta.json
new file mode 100644
index 0000000..086fdde
--- /dev/null
+++ b/0_app/meta.json
@@ -0,0 +1,19 @@
+{
+ "title": "App",
+ "pages": [
+ "---Introduction---",
+ "...0_root",
+ "---Getting Started---",
+ "...1_basics",
+ "---MCP---",
+ "...2_mcp",
+ "---model.yaml---",
+ "...3_modelyaml",
+ "---Presets---",
+ "...3_presets",
+ "---Advanced---",
+ "...5_advanced",
+ "---User Interface---",
+ "...6_user-interface"
+ ]
+}
diff --git a/1_developer/0_core/0_server/index.md b/1_developer/0_core/0_server/index.md
index 64ac097..8a1542a 100644
--- a/1_developer/0_core/0_server/index.md
+++ b/1_developer/0_core/0_server/index.md
@@ -1,8 +1,8 @@
---
title: LM Studio as a Local LLM API Server
sidebar_title: Running the Server
-description: Run an LLM API server on `localhost` with LM Studio
-fullPage: false
+description: Run an LLM API server on localhost with LM Studio
+full: false
index: 1
---
diff --git a/1_developer/0_core/0_server/meta.json b/1_developer/0_core/0_server/meta.json
new file mode 100644
index 0000000..06caa9f
--- /dev/null
+++ b/1_developer/0_core/0_server/meta.json
@@ -0,0 +1,7 @@
+{
+ "title": "Server",
+ "pages": [
+ "serve-on-network",
+ "settings"
+ ]
+}
diff --git a/1_developer/0_core/0_server/serve-on-network.md b/1_developer/0_core/0_server/serve-on-network.md
index 577ccd7..903ec55 100644
--- a/1_developer/0_core/0_server/serve-on-network.md
+++ b/1_developer/0_core/0_server/serve-on-network.md
@@ -2,7 +2,7 @@
title: Serve on Local Network
sidebar_title: Serve on Local Network
description: Allow other devices on your network to use this LM Studio API server
-fullPage: false
+full: false
index: 3
---
diff --git a/1_developer/0_core/0_server/settings.md b/1_developer/0_core/0_server/settings.md
index 568e2f7..ca42094 100644
--- a/1_developer/0_core/0_server/settings.md
+++ b/1_developer/0_core/0_server/settings.md
@@ -2,7 +2,7 @@
title: Server Settings
sidebar_title: Server Settings
description: Configure server settings for LM Studio API Server
-fullPage: false
+full: false
index: 2
---
diff --git a/1_developer/0_core/authentication.md b/1_developer/0_core/authentication.mdx
similarity index 71%
rename from 1_developer/0_core/authentication.md
rename to 1_developer/0_core/authentication.mdx
index 0883597..b0bbeb0 100644
--- a/1_developer/0_core/authentication.md
+++ b/1_developer/0_core/authentication.mdx
@@ -5,7 +5,7 @@ description: Using API Tokens in LM Studio
index: 1
---
-##### Requires [LM Studio 0.4.0](/download) or newer.
+##### Requires LM Studio 0.4.0 or newer.
LM Studio supports API Tokens for authentication, providing a secure and convenient way to access the LM Studio API.
@@ -13,41 +13,41 @@ LM Studio supports API Tokens for authentication, providing a secure and conveni
By default, LM Studio does not require authentication for API requests. To enable authentication so that only requests with a valid API Token are accepted, toggle the switch in the Developers Page > Server Settings.
-```lms_info
+<Callout>
Once enabled, all requests made through the REST API, Python SDK, or TypeScript SDK will need to include a valid API Token. See usage [below](#api-token-usage).
-```
+</Callout>
-
+
-
+
### Creating API Tokens
To create API Tokens, click on Manage Tokens in the Server Settings. It will open the API Tokens modal where you can create, view, and delete API Tokens.
-
+
Create a token by clicking on the Create Token button. Provide a name for the token and select the desired permissions.
-
+
Once created, make sure to copy the token as it will not be shown again.
-
+
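+
+The API examples in these docs read the token from the `LM_API_TOKEN` environment variable. One way to set it for a bash/zsh session:
+
+```bash
+export LM_API_TOKEN="<paste your API token here>"
+```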
### Configuring API Token Permissions
To edit the permissions of an existing API Token, click on the Edit button next to the token in the API Tokens modal. You can modify the name and permissions of the token.
-
+
## API Token Usage
### Using API Tokens with REST API:
-```lms_noticechill
+<Callout>
The example below requires the [allow calling servers from mcp.json](/docs/developer/core/server/settings) setting to be enabled and the [Playwright MCP](https://github.com/microsoft/playwright-mcp) configured in mcp.json.
-```
+</Callout>
```bash
curl -X POST \
diff --git a/1_developer/0_core/headless.md b/1_developer/0_core/headless.md
index 8463b09..67e4c97 100644
--- a/1_developer/0_core/headless.md
+++ b/1_developer/0_core/headless.md
@@ -1,6 +1,6 @@
---
title: "Run LM Studio as a service (headless)"
-sidebar_title: "`llmster` - Headless Mode"
+sidebar_title: "llmster - Headless Mode"
description: "GUI-less operation of LM Studio: run in the background, start on machine login, and load models on demand"
index: 2
---
diff --git a/1_developer/0_core/headless_llmster.md b/1_developer/0_core/headless_llmster.mdx
similarity index 98%
rename from 1_developer/0_core/headless_llmster.md
rename to 1_developer/0_core/headless_llmster.mdx
index b2e7cd0..510fcce 100644
--- a/1_developer/0_core/headless_llmster.md
+++ b/1_developer/0_core/headless_llmster.mdx
@@ -7,9 +7,9 @@ index: 3
`llmster`, LM Studio's headless daemon, can be configured to run on startup. This guide covers setting up `llmster` to launch, load a model, and start an HTTP server automatically using `systemctl` on Linux.
-```lms_info
+<Callout>
This guide is for Linux systems without a graphical interface. For machines with a GUI, you can configure LM Studio to [run as a service on login](/docs/developer/core/headless) instead.
-```
+</Callout>
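+
+By the end of this guide, `llmster` runs as a `systemd` user service that you manage with standard `systemctl` commands, along these lines (unit name illustrative):
+
+```bash
+# reload unit files, then start llmster now and on every login
+systemctl --user daemon-reload
+systemctl --user enable --now llmster.service
+```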
## Install the Daemon
diff --git a/1_developer/0_core/mcp.md b/1_developer/0_core/mcp.md
deleted file mode 100644
index 3e98d60..0000000
--- a/1_developer/0_core/mcp.md
+++ /dev/null
@@ -1,475 +0,0 @@
----
-title: Using MCP via API
-sidebar_title: Using MCP via API
-description: Learn how to use Model Context Protocol (MCP) servers with LM Studio API.
-index: 3
----
-
-##### Requires [LM Studio 0.4.0](/download) or newer.
-
-LM Studio supports Model Context Protocol (MCP) usage via API. MCP allows models to interact with external tools and services through standardized servers.
-
-## How it works
-
-MCP servers provide tools that models can call during chat requests. You can enable MCP servers in two ways: as ephemeral servers defined per-request, or as pre-configured servers in your `mcp.json` file.
-
-## Ephemeral vs mcp.json servers
-
-| Feature                   | Ephemeral                                    | mcp.json                                                   |
-| ------------------------- | -------------------------------------------- | ---------------------------------------------------------- |
-| How to specify in request | `integrations` -> `"type": "ephemeral_mcp"`  | `integrations` -> `"type": "plugin"`                       |
-| Configuration             | Only defined per-request                     | Pre-configured in mcp.json                                 |
-| Use case                  | One-off requests, remote MCP tool execution  | MCP servers that require command, frequently used servers  |
-| Server ID                 | Specified via `server_label` in integration  | Specified via `id` (e.g., `mcp/playwright`) in integration |
-| Custom headers            | Supported via `headers` field                | Configured in mcp.json                                     |
-
-## Ephemeral MCP servers
-
-Ephemeral MCP servers are defined on-the-fly in each request. This is useful for testing or when you don't want to pre-configure servers.
-
-```lms_info
-Ephemeral MCP servers require the "Allow per-request MCPs" setting to be enabled in [Server Settings](/docs/developer/core/server/settings).
-```
-
-```lms_code_snippet
-variants:
- curl:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/chat \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "ibm/granite-4-micro",
- "input": "What is the top trending model on hugging face?",
- "integrations": [
- {
- "type": "ephemeral_mcp",
- "server_label": "huggingface",
- "server_url": "https://huggingface.co/mcp",
- "allowed_tools": ["model_search"]
- }
- ],
- "context_length": 8000
- }'
- Python:
- language: python
- code: |
- import os
- import requests
- import json
-
- response = requests.post(
- "http://localhost:1234/api/v1/chat",
- headers={
- "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
- "Content-Type": "application/json"
- },
- json={
- "model": "ibm/granite-4-micro",
- "input": "What is the top trending model on hugging face?",
- "integrations": [
- {
- "type": "ephemeral_mcp",
- "server_label": "huggingface",
- "server_url": "https://huggingface.co/mcp",
- "allowed_tools": ["model_search"]
- }
- ],
- "context_length": 8000
- }
- )
- print(json.dumps(response.json(), indent=2))
- TypeScript:
- language: typescript
- code: |
- const response = await fetch("http://localhost:1234/api/v1/chat", {
- method: "POST",
- headers: {
- "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
- "Content-Type": "application/json"
- },
- body: JSON.stringify({
- "model": "ibm/granite-4-micro",
- "input": "What is the top trending model on hugging face?",
- "integrations": [
- {
- "type": "ephemeral_mcp",
- "server_label": "huggingface",
- "server_url": "https://huggingface.co/mcp",
- "allowed_tools": ["model_search"]
- }
- ],
- "context_length": 8000
- });
- const data = await response.json();
- console.log(data);
-```
-
-The model can now call tools from the specified MCP server:
-
-```lms_code_snippet
-variants:
- response:
- language: json
- code: |
- {
- "model_instance_id": "ibm/granite-4-micro",
- "output": [
- {
- "type": "reasoning",
- "content": "..."
- },
- {
- "type": "message",
- "content": "..."
- },
- {
- "type": "tool_call",
- "tool": "model_search",
- "arguments": {
- "sort": "trendingScore",
- "limit": 1
- },
- "output": "...",
- "provider_info": {
- "server_label": "huggingface",
- "type": "ephemeral_mcp"
- }
- },
- {
- "type": "reasoning",
- "content": "\n"
- },
- {
- "type": "message",
- "content": "The top trending model is ..."
- }
- ],
- "stats": {
- "input_tokens": 419,
- "total_output_tokens": 362,
- "reasoning_output_tokens": 195,
- "tokens_per_second": 27.620159487314744,
- "time_to_first_token_seconds": 1.437
- },
- "response_id": "resp_7c1a08e3d6e279efcfecb02df9de7cbd316e93422d0bb5cb"
- }
-```
-
-## MCP servers from mcp.json
-
-MCP servers can be pre-configured in your `mcp.json` file. This is the recommended approach for using MCP servers that take actions on your computer (like [microsoft/playwright-mcp](https://github.com/microsoft/playwright-mcp)) and servers that you use frequently.
-
-```lms_info
-MCP servers from mcp.json require the "Allow calling servers from mcp.json" setting to be enabled in [Server Settings](/docs/developer/core/server/settings).
-```
-
-
-
-```lms_code_snippet
-variants:
- curl:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/chat \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "ibm/granite-4-micro",
- "input": "Open lmstudio.ai",
- "integrations": ["mcp/playwright"],
- "context_length": 8000,
- "temperature": 0
- }'
- Python:
- language: python
- code: |
- import os
- import requests
- import json
-
- response = requests.post(
- "http://localhost:1234/api/v1/chat",
- headers={
- "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
- "Content-Type": "application/json"
- },
- json={
- "model": "ibm/granite-4-micro",
- "input": "Open lmstudio.ai",
- "integrations": ["mcp/playwright"],
- "context_length": 8000,
- "temperature": 0
- }
- )
- print(json.dumps(response.json(), indent=2))
- TypeScript:
- language: typescript
- code: |
- const response = await fetch("http://localhost:1234/api/v1/chat", {
- method: "POST",
- headers: {
- "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
- "Content-Type": "application/json"
- },
- body: JSON.stringify({
- model: "ibm/granite-4-micro",
- input: "Open lmstudio.ai",
- integrations: ["mcp/playwright"],
- context_length: 8000,
- temperature: 0
- })
- });
- const data = await response.json();
- console.log(data);
-```
-
-The response includes tool calls from the configured MCP server:
-
-```lms_code_snippet
-variants:
- response:
- language: json
- code: |
- {
- "model_instance_id": "ibm/granite-4-micro",
- "output": [
- {
- "type": "reasoning",
- "content": "..."
- },
- {
- "type": "message",
- "content": "..."
- },
- {
- "type": "tool_call",
- "tool": "browser_navigate",
- "arguments": {
- "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
- },
- "output": "...",
- "provider_info": {
- "plugin_id": "mcp/playwright",
- "type": "plugin"
- }
- },
- {
- "type": "reasoning",
- "content": "..."
- },
- {
- "type": "message",
- "content": "The YouTube video page for ..."
- }
- ],
- "stats": {
- "input_tokens": 2614,
- "total_output_tokens": 594,
- "reasoning_output_tokens": 389,
- "tokens_per_second": 26.293245822877495,
- "time_to_first_token_seconds": 0.154
- },
- "response_id": "resp_cdac6a9b5e2a40027112e441ce6189db18c9040f96736407"
- }
-```
-
-## Restricting tool access
-
-For both ephemeral and mcp.json servers, you can limit which tools the model can call using the `allowed_tools` field. This is useful if you do not want certain tools from an MCP server to be used, and can speed up prompt processing due to the model receiving fewer tool definitions.
-
-```lms_code_snippet
-variants:
- curl:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/chat \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "ibm/granite-4-micro",
- "input": "What is the top trending model on hugging face?",
- "integrations": [
- {
- "type": "ephemeral_mcp",
- "server_label": "huggingface",
- "server_url": "https://huggingface.co/mcp",
- "allowed_tools": ["model_search"]
- }
- ],
- "context_length": 8000
- }'
- Python:
- language: python
- code: |
- import os
- import requests
- import json
-
- response = requests.post(
- "http://localhost:1234/api/v1/chat",
- headers={
- "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
- "Content-Type": "application/json"
- },
- json={
- "model": "ibm/granite-4-micro",
- "input": "What is the top trending model on hugging face?",
- "integrations": [
- {
- "type": "ephemeral_mcp",
- "server_label": "huggingface",
- "server_url": "https://huggingface.co/mcp",
- "allowed_tools": ["model_search"]
- }
- ],
- "context_length": 8000
- }
- )
- print(json.dumps(response.json(), indent=2))
- TypeScript:
- language: typescript
- code: |
- const response = await fetch("http://localhost:1234/api/v1/chat", {
- method: "POST",
- headers: {
- "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
- "Content-Type": "application/json"
- },
- body: JSON.stringify({
- model: "ibm/granite-4-micro",
- input: "What is the top trending model on hugging face?",
- integrations: [
- {
- type: "ephemeral_mcp",
- server_label: "huggingface",
- server_url: "https://huggingface.co/mcp",
- allowed_tools: ["model_search"]
- }
- ],
- context_length: 8000
- })
- });
- const data = await response.json();
- console.log(data);
-```
-
-If `allowed_tools` is not provided, all tools from the server are available to the model.
-
-## Custom headers for ephemeral servers
-
-When using ephemeral MCP servers that require authentication, you can pass custom headers:
-
-```lms_code_snippet
-variants:
- curl:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/chat \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "ibm/granite-4-micro",
- "input": "Give me details about my SUPER-SECRET-PRIVATE Hugging face model",
- "integrations": [
- {
- "type": "ephemeral_mcp",
- "server_label": "huggingface",
- "server_url": "https://huggingface.co/mcp",
- "allowed_tools": ["model_search"],
- "headers": {
- "Authorization": "Bearer "
- }
- }
- ],
- "context_length": 8000
- }'
- Python:
- language: python
- code: |
- import os
- import requests
- import json
-
- response = requests.post(
- "http://localhost:1234/api/v1/chat",
- headers={
- "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
- "Content-Type": "application/json"
- },
- json={
- "model": "ibm/granite-4-micro",
- "input": "Give me details about my SUPER-SECRET-PRIVATE Hugging face model",
- "integrations": [
- {
- "type": "ephemeral_mcp",
- "server_label": "huggingface",
- "server_url": "https://huggingface.co/mcp",
- "allowed_tools": ["model_search"],
- "headers": {
- "Authorization": "Bearer "
- }
- }
- ],
- "context_length": 8000
- }
- )
- print(json.dumps(response.json(), indent=2))
- TypeScript:
- language: typescript
- code: |
- const response = await fetch("http://localhost:1234/api/v1/chat", {
- method: "POST",
- headers: {
- "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
- "Content-Type": "application/json"
- },
- body: JSON.stringify({
- model: "ibm/granite-4-micro",
- input: "Give me details about my SUPER-SECRET-PRIVATE Hugging face model",
- integrations: [
- {
- type: "ephemeral_mcp",
- server_label: "huggingface",
- server_url: "https://huggingface.co/mcp",
- allowed_tools: ["model_search"],
- headers: {
- Authorization: "Bearer "
- }
- }
- ],
- context_length: 8000
- })
- const data = await response.json();
- console.log(data);
-```
diff --git a/1_developer/0_core/mcp.mdx b/1_developer/0_core/mcp.mdx
new file mode 100644
index 0000000..b2f1df1
--- /dev/null
+++ b/1_developer/0_core/mcp.mdx
@@ -0,0 +1,451 @@
+---
+title: Using MCP via API
+sidebar_title: Using MCP via API
+description: Learn how to use Model Context Protocol (MCP) servers with LM Studio API.
+index: 3
+---
+
+##### Requires LM Studio 0.4.0 or newer.
+
+LM Studio supports Model Context Protocol (MCP) usage via API. MCP allows models to interact with external tools and services through standardized servers.
+
+## How it works
+
+MCP servers provide tools that models can call during chat requests. You can enable MCP servers in two ways: as ephemeral servers defined per-request, or as pre-configured servers in your `mcp.json` file.
+
+## Ephemeral vs mcp.json servers
+
+| Feature                   | Ephemeral                                    | mcp.json                                                    |
+| ------------------------- | -------------------------------------------- | ----------------------------------------------------------- |
+| How to specify in request | `integrations` -> `"type": "ephemeral_mcp"`  | `integrations` -> `"type": "plugin"`                        |
+| Configuration             | Only defined per-request                     | Pre-configured in mcp.json                                  |
+| Use case                  | One-off requests, remote MCP tool execution  | MCP servers that require a command, frequently used servers |
+| Server ID                 | Specified via `server_label` in integration  | Specified via `id` (e.g., `mcp/playwright`) in integration  |
+| Custom headers            | Supported via `headers` field                | Configured in mcp.json                                      |
+
+## Ephemeral MCP servers
+
+Ephemeral MCP servers are defined on-the-fly in each request. This is useful for testing or when you don't want to pre-configure servers.
+
+<Callout>
+Ephemeral MCP servers require the "Allow per-request MCPs" setting to be enabled in [Server Settings](/docs/developer/core/server/settings).
+</Callout>
+
+```bash tab="curl"
+curl http://localhost:1234/api/v1/chat \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "ibm/granite-4-micro",
+ "input": "What is the top trending model on hugging face?",
+ "integrations": [
+ {
+ "type": "ephemeral_mcp",
+ "server_label": "huggingface",
+ "server_url": "https://huggingface.co/mcp",
+ "allowed_tools": ["model_search"]
+ }
+ ],
+ "context_length": 8000
+ }'
+```
+
+```python tab="Python"
+import os
+import requests
+import json
+
+response = requests.post(
+ "http://localhost:1234/api/v1/chat",
+ headers={
+ "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
+ "Content-Type": "application/json"
+ },
+ json={
+ "model": "ibm/granite-4-micro",
+ "input": "What is the top trending model on hugging face?",
+ "integrations": [
+ {
+ "type": "ephemeral_mcp",
+ "server_label": "huggingface",
+ "server_url": "https://huggingface.co/mcp",
+ "allowed_tools": ["model_search"]
+ }
+ ],
+ "context_length": 8000
+ }
+)
+print(json.dumps(response.json(), indent=2))
+```
+
+```typescript tab="TypeScript"
+const response = await fetch("http://localhost:1234/api/v1/chat", {
+ method: "POST",
+ headers: {
+ "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
+ "Content-Type": "application/json"
+ },
+ body: JSON.stringify({
+ "model": "ibm/granite-4-micro",
+ "input": "What is the top trending model on hugging face?",
+ "integrations": [
+ {
+ "type": "ephemeral_mcp",
+ "server_label": "huggingface",
+ "server_url": "https://huggingface.co/mcp",
+ "allowed_tools": ["model_search"]
+ }
+ ],
+ "context_length": 8000
+  })
+});
+const data = await response.json();
+console.log(data);
+```
+
+The model can now call tools from the specified MCP server:
+
+```json
+{
+ "model_instance_id": "ibm/granite-4-micro",
+ "output": [
+ {
+ "type": "reasoning",
+ "content": "..."
+ },
+ {
+ "type": "message",
+ "content": "..."
+ },
+ {
+ "type": "tool_call",
+ "tool": "model_search",
+ "arguments": {
+ "sort": "trendingScore",
+ "limit": 1
+ },
+ "output": "...",
+ "provider_info": {
+ "server_label": "huggingface",
+ "type": "ephemeral_mcp"
+ }
+ },
+ {
+ "type": "reasoning",
+ "content": "\n"
+ },
+ {
+ "type": "message",
+ "content": "The top trending model is ..."
+ }
+ ],
+ "stats": {
+ "input_tokens": 419,
+ "total_output_tokens": 362,
+ "reasoning_output_tokens": 195,
+ "tokens_per_second": 27.620159487314744,
+ "time_to_first_token_seconds": 1.437
+ },
+ "response_id": "resp_7c1a08e3d6e279efcfecb02df9de7cbd316e93422d0bb5cb"
+}
+```
+
+## MCP servers from mcp.json
+
+MCP servers can be pre-configured in your `mcp.json` file. This is the recommended approach for using MCP servers that take actions on your computer (like [microsoft/playwright-mcp](https://github.com/microsoft/playwright-mcp)) and servers that you use frequently.
+
+<Callout>
+MCP servers from mcp.json require the "Allow calling servers from mcp.json" setting to be enabled in [Server Settings](/docs/developer/core/server/settings).
+</Callout>
+
+
+
+```bash tab="curl"
+curl http://localhost:1234/api/v1/chat \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "ibm/granite-4-micro",
+ "input": "Open lmstudio.ai",
+ "integrations": ["mcp/playwright"],
+ "context_length": 8000,
+ "temperature": 0
+ }'
+```
+
+```python tab="Python"
+import os
+import requests
+import json
+
+response = requests.post(
+ "http://localhost:1234/api/v1/chat",
+ headers={
+ "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
+ "Content-Type": "application/json"
+ },
+ json={
+ "model": "ibm/granite-4-micro",
+ "input": "Open lmstudio.ai",
+ "integrations": ["mcp/playwright"],
+ "context_length": 8000,
+ "temperature": 0
+ }
+)
+print(json.dumps(response.json(), indent=2))
+```
+
+```typescript tab="TypeScript"
+const response = await fetch("http://localhost:1234/api/v1/chat", {
+ method: "POST",
+ headers: {
+ "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
+ "Content-Type": "application/json"
+ },
+ body: JSON.stringify({
+ model: "ibm/granite-4-micro",
+ input: "Open lmstudio.ai",
+ integrations: ["mcp/playwright"],
+ context_length: 8000,
+ temperature: 0
+ })
+});
+const data = await response.json();
+console.log(data);
+```
+
+The response includes tool calls from the configured MCP server:
+
+```json
+{
+ "model_instance_id": "ibm/granite-4-micro",
+ "output": [
+ {
+ "type": "reasoning",
+ "content": "..."
+ },
+ {
+ "type": "message",
+ "content": "..."
+ },
+ {
+ "type": "tool_call",
+ "tool": "browser_navigate",
+ "arguments": {
+ "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
+ },
+ "output": "...",
+ "provider_info": {
+ "plugin_id": "mcp/playwright",
+ "type": "plugin"
+ }
+ },
+ {
+ "type": "reasoning",
+ "content": "..."
+ },
+ {
+ "type": "message",
+ "content": "The YouTube video page for ..."
+ }
+ ],
+ "stats": {
+ "input_tokens": 2614,
+ "total_output_tokens": 594,
+ "reasoning_output_tokens": 389,
+ "tokens_per_second": 26.293245822877495,
+ "time_to_first_token_seconds": 0.154
+ },
+ "response_id": "resp_cdac6a9b5e2a40027112e441ce6189db18c9040f96736407"
+}
+```
+
+## Restricting tool access
+
+For both ephemeral and mcp.json servers, you can limit which tools the model can call using the `allowed_tools` field. This is useful if you do not want certain tools from an MCP server to be used, and can speed up prompt processing due to the model receiving fewer tool definitions.
+
+```bash tab="curl"
+curl http://localhost:1234/api/v1/chat \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "ibm/granite-4-micro",
+ "input": "What is the top trending model on hugging face?",
+ "integrations": [
+ {
+ "type": "ephemeral_mcp",
+ "server_label": "huggingface",
+ "server_url": "https://huggingface.co/mcp",
+ "allowed_tools": ["model_search"]
+ }
+ ],
+ "context_length": 8000
+ }'
+```
+
+```python tab="Python"
+import os
+import requests
+import json
+
+response = requests.post(
+ "http://localhost:1234/api/v1/chat",
+ headers={
+ "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
+ "Content-Type": "application/json"
+ },
+ json={
+ "model": "ibm/granite-4-micro",
+ "input": "What is the top trending model on hugging face?",
+ "integrations": [
+ {
+ "type": "ephemeral_mcp",
+ "server_label": "huggingface",
+ "server_url": "https://huggingface.co/mcp",
+ "allowed_tools": ["model_search"]
+ }
+ ],
+ "context_length": 8000
+ }
+)
+print(json.dumps(response.json(), indent=2))
+```
+
+```typescript tab="TypeScript"
+const response = await fetch("http://localhost:1234/api/v1/chat", {
+ method: "POST",
+ headers: {
+ "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
+ "Content-Type": "application/json"
+ },
+ body: JSON.stringify({
+ model: "ibm/granite-4-micro",
+ input: "What is the top trending model on hugging face?",
+ integrations: [
+ {
+ type: "ephemeral_mcp",
+ server_label: "huggingface",
+ server_url: "https://huggingface.co/mcp",
+ allowed_tools: ["model_search"]
+ }
+ ],
+ context_length: 8000
+ })
+});
+const data = await response.json();
+console.log(data);
+```
+
+If `allowed_tools` is not provided, all tools from the server are available to the model.
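+
+The same restriction works for mcp.json servers via the object form of the integration, for example with the Playwright server used above:
+
+```json
+{
+  "integrations": [
+    {
+      "type": "plugin",
+      "id": "mcp/playwright",
+      "allowed_tools": ["browser_navigate"]
+    }
+  ]
+}
+```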
+
+## Custom headers for ephemeral servers
+
+When using ephemeral MCP servers that require authentication, you can pass custom headers:
+
+```bash tab="curl"
+curl http://localhost:1234/api/v1/chat \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "ibm/granite-4-micro",
+ "input": "Give me details about my SUPER-SECRET-PRIVATE Hugging face model",
+ "integrations": [
+ {
+ "type": "ephemeral_mcp",
+ "server_label": "huggingface",
+ "server_url": "https://huggingface.co/mcp",
+ "allowed_tools": ["model_search"],
+ "headers": {
+          "Authorization": "Bearer <YOUR_HF_TOKEN>"
+ }
+ }
+ ],
+ "context_length": 8000
+ }'
+```
+
+```python tab="Python"
+import os
+import requests
+import json
+
+response = requests.post(
+ "http://localhost:1234/api/v1/chat",
+ headers={
+ "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
+ "Content-Type": "application/json"
+ },
+ json={
+ "model": "ibm/granite-4-micro",
+ "input": "Give me details about my SUPER-SECRET-PRIVATE Hugging face model",
+ "integrations": [
+ {
+ "type": "ephemeral_mcp",
+ "server_label": "huggingface",
+ "server_url": "https://huggingface.co/mcp",
+ "allowed_tools": ["model_search"],
+ "headers": {
+              "Authorization": "Bearer <YOUR_HF_TOKEN>"
+ }
+ }
+ ],
+ "context_length": 8000
+ }
+)
+print(json.dumps(response.json(), indent=2))
+```
+
+```typescript tab="TypeScript"
+const response = await fetch("http://localhost:1234/api/v1/chat", {
+ method: "POST",
+ headers: {
+ "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
+ "Content-Type": "application/json"
+ },
+ body: JSON.stringify({
+ model: "ibm/granite-4-micro",
+ input: "Give me details about my SUPER-SECRET-PRIVATE Hugging face model",
+ integrations: [
+ {
+ type: "ephemeral_mcp",
+ server_label: "huggingface",
+ server_url: "https://huggingface.co/mcp",
+ allowed_tools: ["model_search"],
+ headers: {
+          Authorization: "Bearer <YOUR_HF_TOKEN>"
+ }
+ }
+ ],
+ context_length: 8000
+  })
+});
+const data = await response.json();
+console.log(data);
+```
diff --git a/1_developer/0_core/meta.json b/1_developer/0_core/meta.json
new file mode 100644
index 0000000..eda74f9
--- /dev/null
+++ b/1_developer/0_core/meta.json
@@ -0,0 +1,12 @@
+{
+ "title": "Core",
+ "pages": [
+ "0_server",
+ "authentication",
+ "headless_llmster",
+ "headless",
+ "lmlink",
+ "mcp",
+ "ttl-and-auto-evict"
+ ]
+}
diff --git a/1_developer/2_rest/chat.md b/1_developer/2_rest/chat.md
index 356bebe..7b6f497 100644
--- a/1_developer/2_rest/chat.md
+++ b/1_developer/2_rest/chat.md
@@ -1,7 +1,7 @@
---
title: "Chat with a model"
description: "Send a message to a model and receive a response. Supports MCP integration."
-fullPage: true
+full: true
index: 5
api_info:
method: POST
@@ -157,59 +157,55 @@ api_info:
description: Identifier of existing response to append to. Must start with `"resp_"`.
```
:::split:::
-```lms_code_snippet
-variants:
- Request with MCP:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/chat \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "ibm/granite-4-micro",
- "input": "Tell me the top trending model on hugging face and navigate to https://lmstudio.ai",
- "integrations": [
- {
- "type": "ephemeral_mcp",
- "server_label": "huggingface",
- "server_url": "https://huggingface.co/mcp",
- "allowed_tools": [
- "model_search"
- ]
- },
- {
- "type": "plugin",
- "id": "mcp/playwright",
- "allowed_tools": [
- "browser_navigate"
- ]
- }
- ],
- "context_length": 8000,
- "temperature": 0
- }'
- Request with Images:
- language: bash
- code: |
- # Image is a small red square encoded as a base64 data URL
- curl http://localhost:1234/api/v1/chat \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "qwen/qwen3-vl-4b",
- "input": [
- {
- "type": "text",
- "content": "Describe this image in two sentences"
- },
- {
- "type": "image",
- "data_url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8z8BQz0AEYBxVSF+FABJADveWkH6oAAAAAElFTkSuQmCC"
- }
- ],
- "context_length": 2048,
- "temperature": 0
- }'
+```bash tab="Request with MCP"
+curl http://localhost:1234/api/v1/chat \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "ibm/granite-4-micro",
+ "input": "Tell me the top trending model on hugging face and navigate to https://lmstudio.ai",
+ "integrations": [
+ {
+ "type": "ephemeral_mcp",
+ "server_label": "huggingface",
+ "server_url": "https://huggingface.co/mcp",
+ "allowed_tools": [
+ "model_search"
+ ]
+ },
+ {
+ "type": "plugin",
+ "id": "mcp/playwright",
+ "allowed_tools": [
+ "browser_navigate"
+ ]
+ }
+ ],
+ "context_length": 8000,
+ "temperature": 0
+ }'
+```
+
+```bash tab="Request with Images"
+# Image is a small red square encoded as a base64 data URL
+curl http://localhost:1234/api/v1/chat \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "qwen/qwen3-vl-4b",
+ "input": [
+ {
+ "type": "text",
+ "content": "Describe this image in two sentences"
+ },
+ {
+ "type": "image",
+ "data_url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8z8BQz0AEYBxVSF+FABJADveWkH6oAAAAAElFTkSuQmCC"
+ }
+ ],
+ "context_length": 2048,
+ "temperature": 0
+ }'
```
````
@@ -349,78 +345,74 @@ variants:
description: Identifier of the response for subsequent requests. Starts with `"resp_"`. Present when `store` is `true`.
```
:::split:::
-```lms_code_snippet
-variants:
- Request with MCP:
- language: json
- code: |
- {
- "model_instance_id": "ibm/granite-4-micro",
- "output": [
- {
- "type": "tool_call",
- "tool": "model_search",
- "arguments": {
- "sort": "trendingScore",
- "query": "",
- "limit": 1
- },
- "output": "...",
- "provider_info": {
- "server_label": "huggingface",
- "type": "ephemeral_mcp"
- }
- },
- {
- "type": "message",
- "content": "..."
- },
- {
- "type": "tool_call",
- "tool": "browser_navigate",
- "arguments": {
- "url": "https://lmstudio.ai"
- },
- "output": "...",
- "provider_info": {
- "plugin_id": "mcp/playwright",
- "type": "plugin"
- }
- },
- {
- "type": "message",
- "content": "**Top Trending Model on Hugging Face** ... Below is a quick snapshot of what’s on the landing page ... more details on the model or LM Studio itself!"
- }
- ],
- "stats": {
- "input_tokens": 646,
- "total_output_tokens": 586,
- "reasoning_output_tokens": 0,
- "tokens_per_second": 29.753900615398926,
- "time_to_first_token_seconds": 1.088,
- "model_load_time_seconds": 2.656
- },
- "response_id": "resp_4ef013eba0def1ed23f19dde72b67974c579113f544086de"
+```json tab="Request with MCP"
+{
+ "model_instance_id": "ibm/granite-4-micro",
+ "output": [
+ {
+ "type": "tool_call",
+ "tool": "model_search",
+ "arguments": {
+ "sort": "trendingScore",
+ "query": "",
+ "limit": 1
+ },
+ "output": "...",
+ "provider_info": {
+ "server_label": "huggingface",
+ "type": "ephemeral_mcp"
}
- Request with Images:
- language: json
- code: |
- {
- "model_instance_id": "qwen/qwen3-vl-4b",
- "output": [
- {
- "type": "message",
- "content": "This image is a solid, vibrant red square that fills the entire frame, with no discernible texture, pattern, or other elements. It presents a minimalist, uniform visual field of pure red, evoking a sense of boldness or urgency."
- }
- ],
- "stats": {
- "input_tokens": 17,
- "total_output_tokens": 50,
- "reasoning_output_tokens": 0,
- "tokens_per_second": 51.03762685242662,
- "time_to_first_token_seconds": 0.814
- },
- "response_id": "resp_0182bd7c479d7451f9a35471f9c26b34de87a7255856b9a4"
+ },
+ {
+ "type": "message",
+ "content": "..."
+ },
+ {
+ "type": "tool_call",
+ "tool": "browser_navigate",
+ "arguments": {
+ "url": "https://lmstudio.ai"
+ },
+ "output": "...",
+ "provider_info": {
+ "plugin_id": "mcp/playwright",
+ "type": "plugin"
}
+ },
+ {
+ "type": "message",
+ "content": "**Top Trending Model on Hugging Face** ... Below is a quick snapshot of what’s on the landing page ... more details on the model or LM Studio itself!"
+ }
+ ],
+ "stats": {
+ "input_tokens": 646,
+ "total_output_tokens": 586,
+ "reasoning_output_tokens": 0,
+ "tokens_per_second": 29.753900615398926,
+ "time_to_first_token_seconds": 1.088,
+ "model_load_time_seconds": 2.656
+ },
+ "response_id": "resp_4ef013eba0def1ed23f19dde72b67974c579113f544086de"
+}
+```
+
+```json tab="Request with Images"
+{
+ "model_instance_id": "qwen/qwen3-vl-4b",
+ "output": [
+ {
+ "type": "message",
+ "content": "This image is a solid, vibrant red square that fills the entire frame, with no discernible texture, pattern, or other elements. It presents a minimalist, uniform visual field of pure red, evoking a sense of boldness or urgency."
+ }
+ ],
+ "stats": {
+ "input_tokens": 17,
+ "total_output_tokens": 50,
+ "reasoning_output_tokens": 0,
+ "tokens_per_second": 51.03762685242662,
+ "time_to_first_token_seconds": 0.814
+ },
+ "response_id": "resp_0182bd7c479d7451f9a35471f9c26b34de87a7255856b9a4"
+}
```
````
diff --git a/1_developer/2_rest/download-status.md b/1_developer/2_rest/download-status.md
index e67ff48..88a12ae 100644
--- a/1_developer/2_rest/download-status.md
+++ b/1_developer/2_rest/download-status.md
@@ -1,7 +1,7 @@
---
title: "Get download status"
description: "Get the status of model downloads"
-fullPage: true
+full: true
index: 9
api_info:
method: GET
@@ -18,14 +18,9 @@ api_info:
description: The unique identifier of the download job. `job_id` is returned by the [download](/docs/developer/rest/download) endpoint when a download is initiated.
```
:::split:::
-```lms_code_snippet
-title: Example Request
-variants:
- curl:
- language: bash
- code: |
- curl -H "Authorization: Bearer $LM_API_TOKEN" \
- http://localhost:1234/api/v1/models/download/status/job_493c7c9ded
+```bash title="Example Request"
+curl -H "Authorization: Bearer $LM_API_TOKEN" \
+ http://localhost:1234/api/v1/models/download/status/job_493c7c9ded
```
````
@@ -67,19 +62,14 @@ Returns a single download job status object. The response varies based on the do
description: Download start time in ISO 8601 format.
```
:::split:::
-```lms_code_snippet
-title: Response
-variants:
- json:
- language: json
- code: |
- {
- "job_id": "job_493c7c9ded",
- "status": "completed",
- "total_size_bytes": 2279145003,
- "downloaded_bytes": 2279145003,
- "started_at": "2025-10-03T15:33:23.496Z",
- "completed_at": "2025-10-03T15:43:12.102Z"
- }
+```json title="Response"
+{
+ "job_id": "job_493c7c9ded",
+ "status": "completed",
+ "total_size_bytes": 2279145003,
+ "downloaded_bytes": 2279145003,
+ "started_at": "2025-10-03T15:33:23.496Z",
+ "completed_at": "2025-10-03T15:43:12.102Z"
+}
```
````
diff --git a/1_developer/2_rest/download.md b/1_developer/2_rest/download.md
index aa61212..0031e4f 100644
--- a/1_developer/2_rest/download.md
+++ b/1_developer/2_rest/download.md
@@ -1,7 +1,7 @@
---
title: "Download a model"
description: "Download LLMs and embedding models"
-fullPage: true
+full: true
index: 8
api_info:
method: POST
@@ -22,18 +22,13 @@ api_info:
description: Quantization level of the model to download (e.g., `Q4_K_M`). Only supported for Hugging Face links.
```
:::split:::
-```lms_code_snippet
-title: Example Request
-variants:
- curl:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/models/download \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "ibm/granite-4-micro"
- }'
+```bash title="Example Request"
+curl http://localhost:1234/api/v1/models/download \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "ibm/granite-4-micro"
+ }'
```
````
@@ -64,17 +59,12 @@ Returns a download job status object. The response varies based on the download
description: Download start time in ISO 8601 format. Absent when `status` is `already_downloaded`.
```
:::split:::
-```lms_code_snippet
-title: Response
-variants:
- json:
- language: json
- code: |
- {
- "job_id": "job_493c7c9ded",
- "status": "downloading",
- "total_size_bytes": 2279145003,
- "started_at": "2025-10-03T15:33:23.496Z"
- }
+```json title="Response"
+{
+ "job_id": "job_493c7c9ded",
+ "status": "downloading",
+ "total_size_bytes": 2279145003,
+ "started_at": "2025-10-03T15:33:23.496Z"
+}
```
````
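+
+You can then poll the [download status](/docs/developer/rest/download-status) endpoint with the returned `job_id`, for example:
+
+```bash
+curl -H "Authorization: Bearer $LM_API_TOKEN" \
+  http://localhost:1234/api/v1/models/download/status/job_493c7c9ded
+```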
diff --git a/1_developer/2_rest/endpoints.md b/1_developer/2_rest/endpoints.mdx
similarity index 97%
rename from 1_developer/2_rest/endpoints.md
rename to 1_developer/2_rest/endpoints.mdx
index ef18ba3..99485eb 100644
--- a/1_developer/2_rest/endpoints.md
+++ b/1_developer/2_rest/endpoints.mdx
@@ -3,11 +3,11 @@ title: REST API v0
description: "The REST API includes enhanced stats such as Token / Second and Time To First Token (TTFT), as well as rich information about models such as loaded vs unloaded, max context, quantization, and more."
---
-```lms_warning
+<Callout type="warn">
LM Studio now has a [v1 REST API](/docs/developer/rest)! We recommend using the v1 API for new projects!
-```
+</Callout>
-##### Requires [LM Studio 0.3.6](/download) or newer.
+##### Requires LM Studio 0.3.6 or newer.
LM Studio now has its own REST API, in addition to OpenAI-compatible endpoints ([learn more](/docs/developer/openai-compat)) and Anthropic-compatible endpoints ([learn more](/docs/developer/anthropic-compat)).
@@ -31,9 +31,9 @@ To start the server, run the following command:
lms server start
```
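+
+Once the server is running, you can verify it is reachable (assuming the default port `1234`):
+
+```bash
+curl http://localhost:1234/api/v0/models
+```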
-```lms_protip
+<Callout>
You can run LM Studio as a service and get the server to auto-start on boot without launching the GUI. [Learn about Headless Mode](/docs/developer/core/headless).
-```
+</Callout>
## Endpoints
diff --git a/1_developer/2_rest/index.md b/1_developer/2_rest/index.md
index 6d3bc5c..5c5035f 100644
--- a/1_developer/2_rest/index.md
+++ b/1_developer/2_rest/index.md
@@ -2,7 +2,7 @@
title: LM Studio API
sidebar_title: Overview
description: LM Studio's REST API for local inference and model management
-fullPage: false
+full: false
index: 1
---
diff --git a/1_developer/2_rest/list.md b/1_developer/2_rest/list.md
index dacfa80..856396e 100644
--- a/1_developer/2_rest/list.md
+++ b/1_developer/2_rest/list.md
@@ -1,7 +1,7 @@
---
title: "List your models"
description: "Get a list of available models on your system, including both LLMs and embedding models."
-fullPage: true
+full: true
index: 6
api_info:
method: GET
@@ -12,14 +12,9 @@ api_info:
This endpoint has no request parameters.
:::split:::
-```lms_code_snippet
-title: Example Request
-variants:
- curl:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/models \
- -H "Authorization: Bearer $LM_API_TOKEN"
+```bash title="Example Request"
+curl http://localhost:1234/api/v1/models \
+ -H "Authorization: Bearer $LM_API_TOKEN"
```
````
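+
+To pull just the model keys out of this response, you can pipe it through `jq` (assuming `jq` is installed):
+
+```bash
+curl -s http://localhost:1234/api/v1/models \
+  -H "Authorization: Bearer $LM_API_TOKEN" | jq -r '.models[].key'
+```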
@@ -140,99 +135,94 @@ variants:
description: The currently selected variant name. Present when `variants` is present.
```
:::split:::
-```lms_code_snippet
-title: Response
-variants:
- json:
- language: json
- code: |
+```json title="Response"
+{
+ "models": [
+ {
+ "type": "llm",
+ "publisher": "google",
+ "key": "google/gemma-4-26b-a4b",
+ "display_name": "Gemma 4 26B A4B",
+ "architecture": "gemma4",
+ "quantization": {
+ "name": "Q4_K_M",
+ "bits_per_weight": 4
+ },
+ "size_bytes": 17990911801,
+ "params_string": "26B-A4B",
+ "loaded_instances": [
+ {
+ "id": "google/gemma-4-26b-a4b",
+ "config": {
+ "context_length": 4096,
+ "eval_batch_size": 512,
+ "parallel": 4,
+ "flash_attention": true,
+ "num_experts": 8,
+ "offload_kv_cache_to_gpu": true
+ }
+ }
+ ],
+ "max_context_length": 262144,
+ "format": "gguf",
+ "capabilities": {
+ "vision": true,
+ "trained_for_tool_use": true,
+ "reasoning": {
+ "allowed_options": [
+ "off",
+ "on"
+ ],
+ "default": "on"
+ }
+ },
+ "description": null,
+ "variants": [
+ "google/gemma-4-26b-a4b@q4_k_m"
+ ],
+ "selected_variant": "google/gemma-4-26b-a4b@q4_k_m"
+ },
{
- "models": [
- {
- "type": "llm",
- "publisher": "google",
- "key": "google/gemma-4-26b-a4b",
- "display_name": "Gemma 4 26B A4B",
- "architecture": "gemma4",
- "quantization": {
- "name": "Q4_K_M",
- "bits_per_weight": 4
- },
- "size_bytes": 17990911801,
- "params_string": "26B-A4B",
- "loaded_instances": [
- {
- "id": "google/gemma-4-26b-a4b",
- "config": {
- "context_length": 4096,
- "eval_batch_size": 512,
- "parallel": 4,
- "flash_attention": true,
- "num_experts": 8,
- "offload_kv_cache_to_gpu": true
- }
- }
- ],
- "max_context_length": 262144,
- "format": "gguf",
- "capabilities": {
- "vision": true,
- "trained_for_tool_use": true,
- "reasoning": {
- "allowed_options": [
- "off",
- "on"
- ],
- "default": "on"
- }
- },
- "description": null,
- "variants": [
- "google/gemma-4-26b-a4b@q4_k_m"
- ],
- "selected_variant": "google/gemma-4-26b-a4b@q4_k_m"
- },
- {
- "type": "llm",
- "publisher": "deepseek",
- "key": "deepseek-r1",
- "display_name": "DeepSeek R1",
- "architecture": "deepseek",
- "quantization": {
- "name": "Q4_K_M",
- "bits_per_weight": 4
- },
- "size_bytes": 40492610355,
- "params_string": "671B",
- "loaded_instances": [],
- "max_context_length": 131072,
- "format": "gguf",
- "capabilities": {
- "vision": false,
- "trained_for_tool_use": true,
- "reasoning": {
- "allowed_options": ["on"],
- "default": "on"
- }
- },
- "description": null
- },
- {
- "type": "embedding",
- "publisher": "gaianet",
- "key": "text-embedding-nomic-embed-text-v1.5-embedding",
- "display_name": "Nomic Embed Text v1.5",
- "quantization": {
- "name": "F16",
- "bits_per_weight": 16
- },
- "size_bytes": 274290560,
- "params_string": null,
- "loaded_instances": [],
- "max_context_length": 2048,
- "format": "gguf"
- }
- ]
+ "type": "llm",
+ "publisher": "deepseek",
+ "key": "deepseek-r1",
+ "display_name": "DeepSeek R1",
+ "architecture": "deepseek",
+ "quantization": {
+ "name": "Q4_K_M",
+ "bits_per_weight": 4
+ },
+ "size_bytes": 40492610355,
+ "params_string": "671B",
+ "loaded_instances": [],
+ "max_context_length": 131072,
+ "format": "gguf",
+ "capabilities": {
+ "vision": false,
+ "trained_for_tool_use": true,
+ "reasoning": {
+ "allowed_options": ["on"],
+ "default": "on"
+ }
+ },
+ "description": null
+ },
+ {
+ "type": "embedding",
+ "publisher": "gaianet",
+ "key": "text-embedding-nomic-embed-text-v1.5-embedding",
+ "display_name": "Nomic Embed Text v1.5",
+ "quantization": {
+ "name": "F16",
+ "bits_per_weight": 16
+ },
+ "size_bytes": 274290560,
+ "params_string": null,
+ "loaded_instances": [],
+ "max_context_length": 2048,
+ "format": "gguf"
}
+ ]
+}
```
````
diff --git a/1_developer/2_rest/load.md b/1_developer/2_rest/load.md
index cdb4a11..ea58e61 100644
--- a/1_developer/2_rest/load.md
+++ b/1_developer/2_rest/load.md
@@ -1,7 +1,7 @@
---
title: "Load a model"
description: "Load an LLM or Embedding model into memory with custom configuration for inference"
-fullPage: true
+full: true
index: 7
api_info:
method: POST
@@ -42,21 +42,16 @@ api_info:
description: If true, echoes the final load configuration in the response under `"load_config"`. Default `false`.
```
:::split:::
-```lms_code_snippet
-title: Example Request
-variants:
- curl:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/models/load \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "openai/gpt-oss-20b",
- "context_length": 16384,
- "flash_attention": true,
- "echo_load_config": true
- }'
+```bash title="Example Request"
+curl http://localhost:1234/api/v1/models/load \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "openai/gpt-oss-20b",
+ "context_length": 16384,
+ "flash_attention": true,
+ "echo_load_config": true
+ }'
```
````
@@ -118,24 +113,19 @@ variants:
description: Maximum number of tokens that the model will consider.
```
:::split:::
-```lms_code_snippet
-title: Response
-variants:
- json:
- language: json
- code: |
- {
- "type": "llm",
- "instance_id": "openai/gpt-oss-20b",
- "load_time_seconds": 9.099,
- "status": "loaded",
- "load_config": {
- "context_length": 16384,
- "eval_batch_size": 512,
- "flash_attention": true,
- "offload_kv_cache_to_gpu": true,
- "num_experts": 4
- }
- }
+```json title="Response"
+{
+ "type": "llm",
+ "instance_id": "openai/gpt-oss-20b",
+ "load_time_seconds": 9.099,
+ "status": "loaded",
+ "load_config": {
+ "context_length": 16384,
+ "eval_batch_size": 512,
+ "flash_attention": true,
+ "offload_kv_cache_to_gpu": true,
+ "num_experts": 4
+ }
+}
```
````
diff --git a/1_developer/2_rest/meta.json b/1_developer/2_rest/meta.json
new file mode 100644
index 0000000..97f3ec8
--- /dev/null
+++ b/1_developer/2_rest/meta.json
@@ -0,0 +1,15 @@
+{
+ "title": "REST API",
+ "pages": [
+ "chat",
+ "download-status",
+ "download",
+ "endpoints",
+ "list",
+ "load",
+ "quickstart",
+ "stateful-chats",
+ "streaming-events",
+ "unload"
+ ]
+}
diff --git a/1_developer/2_rest/quickstart.md b/1_developer/2_rest/quickstart.md
index e2c12f6..f0bd98e 100644
--- a/1_developer/2_rest/quickstart.md
+++ b/1_developer/2_rest/quickstart.md
@@ -2,7 +2,7 @@
title: Get up and running with the LM Studio API
sidebar_title: Quickstart
description: Download a model and start a simple Chat session using the REST API
-fullPage: false
+full: false
index: 2
---
@@ -36,53 +36,49 @@ Use the chat endpoint to send a message to a model. By default, the model will b
The `/api/v1/chat` endpoint is stateful, which means you do not need to pass the full history in every request. Read more about it [here](/docs/developer/rest/stateful-chats).
-```lms_code_snippet
-variants:
- curl:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/chat \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "ibm/granite-4-micro",
- "input": "Write a short haiku about sunrise."
- }'
- Python:
- language: python
- code: |
- import os
- import requests
- import json
-
- response = requests.post(
- "http://localhost:1234/api/v1/chat",
- headers={
- "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
- "Content-Type": "application/json"
- },
- json={
- "model": "ibm/granite-4-micro",
- "input": "Write a short haiku about sunrise."
- }
- )
- print(json.dumps(response.json(), indent=2))
- TypeScript:
- language: typescript
- code: |
- const response = await fetch("http://localhost:1234/api/v1/chat", {
- method: "POST",
- headers: {
- "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
- "Content-Type": "application/json"
- },
- body: JSON.stringify({
- model: "ibm/granite-4-micro",
- input: "Write a short haiku about sunrise."
- })
- });
- const data = await response.json();
- console.log(data);
+```bash tab="curl"
+curl http://localhost:1234/api/v1/chat \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "ibm/granite-4-micro",
+ "input": "Write a short haiku about sunrise."
+ }'
+```
+
+```python tab="Python"
+import os
+import requests
+import json
+
+response = requests.post(
+ "http://localhost:1234/api/v1/chat",
+ headers={
+ "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
+ "Content-Type": "application/json"
+ },
+ json={
+ "model": "ibm/granite-4-micro",
+ "input": "Write a short haiku about sunrise."
+ }
+)
+print(json.dumps(response.json(), indent=2))
+```
+
+```typescript tab="TypeScript"
+const response = await fetch("http://localhost:1234/api/v1/chat", {
+ method: "POST",
+ headers: {
+ "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
+ "Content-Type": "application/json"
+ },
+ body: JSON.stringify({
+ model: "ibm/granite-4-micro",
+ input: "Write a short haiku about sunrise."
+ })
+});
+const data = await response.json();
+console.log(data);
```
See the full [chat](/docs/developer/rest/chat) docs for more details.
@@ -91,154 +87,146 @@ See the full [chat](/docs/developer/rest/chat) docs for more details.
Enable the model to interact with ephemeral Model Context Protocol (MCP) servers in `/api/v1/chat` by specifying servers in the `integrations` field.
-```lms_code_snippet
-variants:
- curl:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/chat \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "ibm/granite-4-micro",
- "input": "What is the top trending model on hugging face?",
- "integrations": [
- {
- "type": "ephemeral_mcp",
- "server_label": "huggingface",
- "server_url": "https://huggingface.co/mcp",
- "allowed_tools": ["model_search"]
- }
- ],
- "context_length": 8000
- }'
- Python:
- language: python
- code: |
- import os
- import requests
- import json
-
- response = requests.post(
- "http://localhost:1234/api/v1/chat",
- headers={
- "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
- "Content-Type": "application/json"
- },
- json={
- "model": "ibm/granite-4-micro",
- "input": "What is the top trending model on hugging face?",
- "integrations": [
- {
- "type": "ephemeral_mcp",
- "server_label": "huggingface",
- "server_url": "https://huggingface.co/mcp",
- "allowed_tools": ["model_search"]
- }
- ],
- "context_length": 8000
- }
- )
- print(json.dumps(response.json(), indent=2))
- TypeScript:
- language: typescript
- code: |
- const response = await fetch("http://localhost:1234/api/v1/chat", {
- method: "POST",
- headers: {
- "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
- "Content-Type": "application/json"
- },
- body: JSON.stringify({
- model: "ibm/granite-4-micro",
- input: "What is the top trending model on hugging face?",
- integrations: [
- {
- type: "ephemeral_mcp",
- server_label: "huggingface",
- server_url: "https://huggingface.co/mcp",
- allowed_tools: ["model_search"]
- }
- ],
- context_length: 8000
- })
- const data = await response.json();
- console.log(data);
+```bash tab="curl"
+curl http://localhost:1234/api/v1/chat \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "ibm/granite-4-micro",
+ "input": "What is the top trending model on hugging face?",
+ "integrations": [
+ {
+ "type": "ephemeral_mcp",
+ "server_label": "huggingface",
+ "server_url": "https://huggingface.co/mcp",
+ "allowed_tools": ["model_search"]
+ }
+ ],
+ "context_length": 8000
+ }'
+```
+
+```python tab="Python"
+import os
+import requests
+import json
+
+response = requests.post(
+ "http://localhost:1234/api/v1/chat",
+ headers={
+ "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
+ "Content-Type": "application/json"
+ },
+ json={
+ "model": "ibm/granite-4-micro",
+ "input": "What is the top trending model on hugging face?",
+ "integrations": [
+ {
+ "type": "ephemeral_mcp",
+ "server_label": "huggingface",
+ "server_url": "https://huggingface.co/mcp",
+ "allowed_tools": ["model_search"]
+ }
+ ],
+ "context_length": 8000
+ }
+)
+print(json.dumps(response.json(), indent=2))
+```
+
+```typescript tab="TypeScript"
+const response = await fetch("http://localhost:1234/api/v1/chat", {
+ method: "POST",
+ headers: {
+ "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
+ "Content-Type": "application/json"
+ },
+ body: JSON.stringify({
+ model: "ibm/granite-4-micro",
+ input: "What is the top trending model on hugging face?",
+ integrations: [
+ {
+ type: "ephemeral_mcp",
+ server_label: "huggingface",
+ server_url: "https://huggingface.co/mcp",
+ allowed_tools: ["model_search"]
+ }
+ ],
+ context_length: 8000
+  })
+});
+const data = await response.json();
+console.log(data);
```
You can also use locally configured MCP plugins (from your `mcp.json`) via the `integrations` field. Using locally run MCP plugins requires authentication via an API token passed through the `Authorization` header. Read more about authentication [here](/docs/developer/core/authentication).
-```lms_code_snippet
-variants:
- curl:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/chat \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "ibm/granite-4-micro",
- "input": "Open lmstudio.ai",
- "integrations": [
- {
- "type": "plugin",
- "id": "mcp/playwright",
- "allowed_tools": ["browser_navigate"]
- }
- ],
- "context_length": 8000
- }'
- Python:
- language: python
- code: |
- import os
- import requests
- import json
-
- response = requests.post(
- "http://localhost:1234/api/v1/chat",
- headers={
- "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
- "Content-Type": "application/json"
- },
- json={
- "model": "ibm/granite-4-micro",
- "input": "Open lmstudio.ai",
- "integrations": [
- {
- "type": "plugin",
- "id": "mcp/playwright",
- "allowed_tools": ["browser_navigate"]
- }
- ],
- "context_length": 8000
- }
- )
- print(json.dumps(response.json(), indent=2))
- TypeScript:
- language: typescript
- code: |
- const response = await fetch("http://localhost:1234/api/v1/chat", {
- method: "POST",
- headers: {
- "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
- "Content-Type": "application/json"
- },
- body: JSON.stringify({
- model: "ibm/granite-4-micro",
- input: "Open lmstudio.ai",
- integrations: [
- {
- type: "plugin",
- id: "mcp/playwright",
- allowed_tools: ["browser_navigate"]
- }
- ],
- context_length: 8000
- })
- });
- const data = await response.json();
- console.log(data);
+```bash tab="curl"
+curl http://localhost:1234/api/v1/chat \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "ibm/granite-4-micro",
+ "input": "Open lmstudio.ai",
+ "integrations": [
+ {
+ "type": "plugin",
+ "id": "mcp/playwright",
+ "allowed_tools": ["browser_navigate"]
+ }
+ ],
+ "context_length": 8000
+ }'
+```
+
+```python tab="Python"
+import os
+import requests
+import json
+
+response = requests.post(
+ "http://localhost:1234/api/v1/chat",
+ headers={
+ "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
+ "Content-Type": "application/json"
+ },
+ json={
+ "model": "ibm/granite-4-micro",
+ "input": "Open lmstudio.ai",
+ "integrations": [
+ {
+ "type": "plugin",
+ "id": "mcp/playwright",
+ "allowed_tools": ["browser_navigate"]
+ }
+ ],
+ "context_length": 8000
+ }
+)
+print(json.dumps(response.json(), indent=2))
+```
+
+```typescript tab="TypeScript"
+const response = await fetch("http://localhost:1234/api/v1/chat", {
+ method: "POST",
+ headers: {
+ "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
+ "Content-Type": "application/json"
+ },
+ body: JSON.stringify({
+ model: "ibm/granite-4-micro",
+ input: "Open lmstudio.ai",
+ integrations: [
+ {
+ type: "plugin",
+ id: "mcp/playwright",
+ allowed_tools: ["browser_navigate"]
+ }
+ ],
+ context_length: 8000
+ })
+});
+const data = await response.json();
+console.log(data);
```
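+
+For reference, a locally configured plugin such as `mcp/playwright` above is defined in your `mcp.json`. A minimal sketch of what such an entry might look like (the exact schema is an assumption; check the MCP section of the docs for the authoritative format):
+
+```json title="mcp.json (sketch)"
+{
+  "mcpServers": {
+    "playwright": {
+      "command": "npx",
+      "args": ["@playwright/mcp@latest"]
+    }
+  }
+}
+```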
See the full [chat](/docs/developer/rest/chat) docs for more details.
@@ -247,86 +235,78 @@ See the full [chat](/docs/developer/rest/chat) docs for more details.
Use the download endpoint to download models by identifier from the [LM Studio model catalog](https://lmstudio.ai/models), or by Hugging Face model URL.
-```lms_code_snippet
-variants:
- curl:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/models/download \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "ibm/granite-4-micro"
- }'
- Python:
- language: python
- code: |
- import os
- import requests
- import json
-
- response = requests.post(
- "http://localhost:1234/api/v1/models/download",
- headers={
- "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
- "Content-Type": "application/json"
- },
- json={"model": "ibm/granite-4-micro"}
- )
- print(json.dumps(response.json(), indent=2))
- TypeScript:
- language: typescript
- code: |
- const response = await fetch("http://localhost:1234/api/v1/models/download", {
- method: "POST",
- headers: {
- "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
- "Content-Type": "application/json"
- },
- body: JSON.stringify({
- model: "ibm/granite-4-micro"
- })
- });
- const data = await response.json();
- console.log(data);
+```bash tab="curl"
+curl http://localhost:1234/api/v1/models/download \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "ibm/granite-4-micro"
+ }'
+```
+
+```python tab="Python"
+import os
+import requests
+import json
+
+response = requests.post(
+ "http://localhost:1234/api/v1/models/download",
+ headers={
+ "Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
+ "Content-Type": "application/json"
+ },
+ json={"model": "ibm/granite-4-micro"}
+)
+print(json.dumps(response.json(), indent=2))
+```
+
+```typescript tab="TypeScript"
+const response = await fetch("http://localhost:1234/api/v1/models/download", {
+ method: "POST",
+ headers: {
+ "Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
+ "Content-Type": "application/json"
+ },
+ body: JSON.stringify({
+ model: "ibm/granite-4-micro"
+ })
+});
+const data = await response.json();
+console.log(data);
```
The response includes a `job_id` that you can use to track download progress.
-```lms_code_snippet
-variants:
- curl:
- language: bash
- code: |
- curl -H "Authorization: Bearer $LM_API_TOKEN" \
- http://localhost:1234/api/v1/models/download/status/{job_id}
- Python:
- language: python
- code: |
- import os
- import requests
- import json
-
- job_id = "your-job-id"
- response = requests.get(
- f"http://localhost:1234/api/v1/models/download/status/{job_id}",
- headers={"Authorization": f"Bearer {os.environ['LM_API_TOKEN']}"}
- )
- print(json.dumps(response.json(), indent=2))
- TypeScript:
- language: typescript
- code: |
- const jobId = "your-job-id";
- const response = await fetch(
- `http://localhost:1234/api/v1/models/download/status/${jobId}`,
- {
- headers: {
- "Authorization": `Bearer ${process.env.LM_API_TOKEN}`
- }
- }
- );
- const data = await response.json();
- console.log(data);
+```bash tab="curl"
+curl -H "Authorization: Bearer $LM_API_TOKEN" \
+ http://localhost:1234/api/v1/models/download/status/{job_id}
+```
+
+```python tab="Python"
+import os
+import requests
+import json
+
+job_id = "your-job-id"
+response = requests.get(
+ f"http://localhost:1234/api/v1/models/download/status/{job_id}",
+ headers={"Authorization": f"Bearer {os.environ['LM_API_TOKEN']}"}
+)
+print(json.dumps(response.json(), indent=2))
+```
+
+```typescript tab="TypeScript"
+const jobId = "your-job-id";
+const response = await fetch(
+ `http://localhost:1234/api/v1/models/download/status/${jobId}`,
+ {
+ headers: {
+ "Authorization": `Bearer ${process.env.LM_API_TOKEN}`
+ }
+ }
+);
+const data = await response.json();
+console.log(data);
```
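+
+For scripted workflows you can poll the status endpoint until the job settles. A minimal Python sketch, assuming `status` remains `"downloading"` while the job is in progress (as in the examples above):
+
+```python title="Polling the download status"
+import os
+import time
+import requests
+
+job_id = "your-job-id"
+headers = {"Authorization": f"Bearer {os.environ['LM_API_TOKEN']}"}
+
+while True:
+    status = requests.get(
+        f"http://localhost:1234/api/v1/models/download/status/{job_id}",
+        headers=headers,
+    ).json()
+    if status.get("status") != "downloading":
+        break
+    time.sleep(2)  # poll every couple of seconds to avoid busy-waiting
+
+print(status)
+```
+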
See the [download](/docs/developer/rest/download) and [download status](/docs/developer/rest/download-status) docs for more details.
diff --git a/1_developer/2_rest/stateful-chats.md b/1_developer/2_rest/stateful-chats.md
deleted file mode 100644
index 5280eee..0000000
--- a/1_developer/2_rest/stateful-chats.md
+++ /dev/null
@@ -1,95 +0,0 @@
----
-title: Stateful Chats
-sidebar_title: Stateful Chats
-description: Learn how to maintain conversation context across multiple requests
-index: 3
----
-
-The `/api/v1/chat` endpoint is stateful by default. This means you don't need to pass the full conversation history in every request — LM Studio automatically stores and manages the context for you.
-
-## How it works
-
-When you send a chat request, LM Studio stores the conversation in a chat thread and returns a `response_id` in the response. Use this `response_id` in subsequent requests to continue the conversation.
-
-```lms_code_snippet
-title: Start a new conversation
-variants:
- curl:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/chat \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "ibm/granite-4-micro",
- "input": "My favorite color is blue."
- }'
-```
-
-The response includes a `response_id`:
-
-```lms_info
-Every response includes an unique `response_id` that you can use to reference that specific point in the conversation for future requests. This allows you to branch conversations.
-```
-
-```lms_code_snippet
-title: Response
-variants:
- response:
- language: json
- code: |
- {
- "model_instance_id": "ibm/granite-4-micro",
- "output": [
- {
- "type": "message",
- "content": "That's great! Blue is a beautiful color..."
- }
- ],
- "response_id": "resp_abc123xyz..."
- }
-```
-
-## Continue a conversation
-
-Pass the `previous_response_id` in your next request to continue the conversation. The model will remember the previous context.
-
-```lms_code_snippet
-title: Continue the conversation
-variants:
- curl:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/chat \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "ibm/granite-4-micro",
- "input": "What color did I just mention?",
- "previous_response_id": "resp_abc123xyz..."
- }'
-```
-
-The model can reference the previous message without you needing to resend it and will return a new `response_id` for further continuation.
-
-## Disable stateful storage
-
-If you don't want to store the conversation, set `store` to `false`. The response will not include a `response_id`.
-
-```lms_code_snippet
-title: Stateless chat
-variants:
- curl:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/chat \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "ibm/granite-4-micro",
- "input": "Tell me a joke.",
- "store": false
- }'
-```
-
-This is useful for one-off requests where you don't need to maintain context.
diff --git a/1_developer/2_rest/stateful-chats.mdx b/1_developer/2_rest/stateful-chats.mdx
new file mode 100644
index 0000000..4cec683
--- /dev/null
+++ b/1_developer/2_rest/stateful-chats.mdx
@@ -0,0 +1,75 @@
+---
+title: Stateful Chats
+sidebar_title: Stateful Chats
+description: Learn how to maintain conversation context across multiple requests
+index: 3
+---
+
+The `/api/v1/chat` endpoint is stateful by default. This means you don't need to pass the full conversation history in every request — LM Studio automatically stores and manages the context for you.
+
+## How it works
+
+When you send a chat request, LM Studio stores the conversation in a chat thread and returns a `response_id` in the response. Use this `response_id` in subsequent requests to continue the conversation.
+
+```bash title="Start a new conversation"
+curl http://localhost:1234/api/v1/chat \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "ibm/granite-4-micro",
+ "input": "My favorite color is blue."
+ }'
+```
+
+The response includes a `response_id`:
+
+
+Every response includes a unique `response_id` that you can use to reference that specific point in the conversation in future requests. This allows you to branch conversations.
+
+
+```json title="Response"
+{
+ "model_instance_id": "ibm/granite-4-micro",
+ "output": [
+ {
+ "type": "message",
+ "content": "That's great! Blue is a beautiful color..."
+ }
+ ],
+ "response_id": "resp_abc123xyz..."
+}
+```
+
+## Continue a conversation
+
+Pass the `previous_response_id` in your next request to continue the conversation. The model will remember the previous context.
+
+```bash title="Continue the conversation"
+curl http://localhost:1234/api/v1/chat \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "ibm/granite-4-micro",
+ "input": "What color did I just mention?",
+ "previous_response_id": "resp_abc123xyz..."
+ }'
+```
+
+The model can reference the previous message without you needing to resend it, and the response will include a new `response_id` you can use to continue the conversation further.
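+
+The shape mirrors the first response. A sketch of what comes back (contents will vary):
+
+```json title="Response"
+{
+  "model_instance_id": "ibm/granite-4-micro",
+  "output": [
+    {
+      "type": "message",
+      "content": "You mentioned that your favorite color is blue."
+    }
+  ],
+  "response_id": "resp_def456uvw..."
+}
+```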
+
+## Disable stateful storage
+
+If you don't want to store the conversation, set `store` to `false`. The response will not include a `response_id`.
+
+```bash title="Stateless chat"
+curl http://localhost:1234/api/v1/chat \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "ibm/granite-4-micro",
+ "input": "Tell me a joke.",
+ "store": false
+ }'
+```
+
+This is useful for one-off requests where you don't need to maintain context.
diff --git a/1_developer/2_rest/streaming-events.md b/1_developer/2_rest/streaming-events.md
index b783b38..e3727fe 100644
--- a/1_developer/2_rest/streaming-events.md
+++ b/1_developer/2_rest/streaming-events.md
@@ -1,6 +1,6 @@
---
title: "Streaming events"
-description: "When you chat with a model with `stream` set to `true`, the response is sent as a stream of events using Server-Sent Events (SSE)."
+description: "When you chat with a model with stream set to true, the response is sent as a stream of events using Server-Sent Events (SSE)."
index: 4
---
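+
+For example, a minimal request that opts into streaming might look like this (`-N` disables curl's buffering so events print as they arrive):
+
+```bash title="Streaming request"
+curl -N http://localhost:1234/api/v1/chat \
+  -H "Authorization: Bearer $LM_API_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "ibm/granite-4-micro",
+    "input": "Write a short haiku about sunrise.",
+    "stream": true
+  }'
+```
+
+Each event arrives as an SSE `data:` line whose JSON payload matches one of the event shapes documented below.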
@@ -48,16 +48,11 @@ An event that is emitted at the start of a chat response stream.
description: The type of the event. Always `chat.start`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "chat.start",
- "model_instance_id": "openai/gpt-oss-20b"
- }
+```json title="Example Event Data"
+{
+ "type": "chat.start",
+ "model_instance_id": "openai/gpt-oss-20b"
+}
```
````
@@ -74,16 +69,11 @@ Signals the start of a model being loaded to fulfill the chat request. Will not
description: The type of the event. Always `model_load.start`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "model_load.start",
- "model_instance_id": "openai/gpt-oss-20b"
- }
+```json title="Example Event Data"
+{
+ "type": "model_load.start",
+ "model_instance_id": "openai/gpt-oss-20b"
+}
```
````
@@ -103,17 +93,12 @@ Progress of the model load.
description: The type of the event. Always `model_load.progress`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "model_load.progress",
- "model_instance_id": "openai/gpt-oss-20b",
- "progress": 0.65
- }
+```json title="Example Event Data"
+{
+ "type": "model_load.progress",
+ "model_instance_id": "openai/gpt-oss-20b",
+ "progress": 0.65
+}
```
````
@@ -133,17 +118,12 @@ Signals a successfully completed model load.
description: The type of the event. Always `model_load.end`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "model_load.end",
- "model_instance_id": "openai/gpt-oss-20b",
- "load_time_seconds": 12.34
- }
+```json title="Example Event Data"
+{
+ "type": "model_load.end",
+ "model_instance_id": "openai/gpt-oss-20b",
+ "load_time_seconds": 12.34
+}
```
````
@@ -157,15 +137,10 @@ Signals the start of the model processing a prompt.
description: The type of the event. Always `prompt_processing.start`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "prompt_processing.start"
- }
+```json title="Example Event Data"
+{
+ "type": "prompt_processing.start"
+}
```
````
@@ -182,16 +157,11 @@ Progress of the model processing a prompt.
description: The type of the event. Always `prompt_processing.progress`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "prompt_processing.progress",
- "progress": 0.5
- }
+```json title="Example Event Data"
+{
+ "type": "prompt_processing.progress",
+ "progress": 0.5
+}
```
````
@@ -205,15 +175,10 @@ Signals the end of the model processing a prompt.
description: The type of the event. Always `prompt_processing.end`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "prompt_processing.end"
- }
+```json title="Example Event Data"
+{
+ "type": "prompt_processing.end"
+}
```
````
@@ -227,15 +192,10 @@ Signals the model is starting to stream reasoning content.
description: The type of the event. Always `reasoning.start`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "reasoning.start"
- }
+```json title="Example Event Data"
+{
+ "type": "reasoning.start"
+}
```
````
@@ -252,16 +212,11 @@ A chunk of reasoning content. Multiple deltas may arrive.
description: The type of the event. Always `reasoning.delta`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "reasoning.delta",
- "content": "Need to"
- }
+```json title="Example Event Data"
+{
+ "type": "reasoning.delta",
+ "content": "Need to"
+}
```
````
@@ -275,15 +230,10 @@ Signals the end of the reasoning stream.
description: The type of the event. Always `reasoning.end`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "reasoning.end"
- }
+```json title="Example Event Data"
+{
+ "type": "reasoning.end"
+}
```
````
@@ -324,20 +274,15 @@ Emitted when the model starts a tool call.
description: The type of the event. Always `tool_call.start`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "tool_call.start",
- "tool": "model_search",
- "provider_info": {
- "type": "ephemeral_mcp",
- "server_label": "huggingface"
- }
- }
+```json title="Example Event Data"
+{
+ "type": "tool_call.start",
+ "tool": "model_search",
+ "provider_info": {
+ "type": "ephemeral_mcp",
+ "server_label": "huggingface"
+ }
+}
```
````
@@ -381,24 +326,19 @@ Arguments streamed for the current tool call.
description: The type of the event. Always `tool_call.arguments`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "tool_call.arguments",
- "tool": "model_search",
- "arguments": {
- "sort": "trendingScore",
- "limit": 1
- },
- "provider_info": {
- "type": "ephemeral_mcp",
- "server_label": "huggingface"
- }
- }
+```json title="Example Event Data"
+{
+ "type": "tool_call.arguments",
+ "tool": "model_search",
+ "arguments": {
+ "sort": "trendingScore",
+ "limit": 1
+ },
+ "provider_info": {
+ "type": "ephemeral_mcp",
+ "server_label": "huggingface"
+ }
+}
```
````
@@ -445,25 +385,20 @@ Result of the tool call, along with the arguments used.
description: The type of the event. Always `tool_call.success`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "tool_call.success",
- "tool": "model_search",
- "arguments": {
- "sort": "trendingScore",
- "limit": 1
- },
- "output": "[{\"type\":\"text\",\"text\":\"Showing first 1 models...\"}]",
- "provider_info": {
- "type": "ephemeral_mcp",
- "server_label": "huggingface"
- }
- }
+```json title="Example Event Data"
+{
+ "type": "tool_call.success",
+ "tool": "model_search",
+ "arguments": {
+ "sort": "trendingScore",
+ "limit": 1
+ },
+ "output": "[{\"type\":\"text\",\"text\":\"Showing first 1 models...\"}]",
+ "provider_info": {
+ "type": "ephemeral_mcp",
+ "server_label": "huggingface"
+ }
+}
```
````
@@ -510,20 +445,15 @@ Indicates that the tool call failed.
description: The type of the event. Always `tool_call.failure`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "tool_call.failure",
- "reason": "Cannot find tool with name open_browser.",
- "metadata": {
- "type": "invalid_name",
- "tool_name": "open_browser"
- }
- }
+```json title="Example Event Data"
+{
+ "type": "tool_call.failure",
+ "reason": "Cannot find tool with name open_browser.",
+ "metadata": {
+ "type": "invalid_name",
+ "tool_name": "open_browser"
+ }
+}
```
````
@@ -537,15 +467,10 @@ Signals the model is about to stream a message.
description: The type of the event. Always `message.start`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "message.start"
- }
+```json title="Example Event Data"
+{
+ "type": "message.start"
+}
```
````
@@ -562,16 +487,11 @@ A chunk of message content. Multiple deltas may arrive.
description: The type of the event. Always `message.delta`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "message.delta",
- "content": "The current"
- }
+```json title="Example Event Data"
+{
+ "type": "message.delta",
+ "content": "The current"
+}
```
````
@@ -585,15 +505,10 @@ Signals the end of the message stream.
description: The type of the event. Always `message.end`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "message.end"
- }
+```json title="Example Event Data"
+{
+ "type": "message.end"
+}
```
````
@@ -625,21 +540,16 @@ An error occurred during streaming. The final payload will still be sent in `cha
description: The type of the event. Always `error`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
- {
- "type": "error",
- "error": {
- "type": "invalid_request",
- "message": "\"model\" is required",
- "code": "missing_required_parameter",
- "param": "model"
- }
- }
+```json title="Example Event Data"
+{
+ "type": "error",
+ "error": {
+ "type": "invalid_request",
+ "message": "\"model\" is required",
+ "code": "missing_required_parameter",
+ "param": "model"
+ }
+}
```
````
@@ -656,36 +566,31 @@ Final event containing the full aggregated response, equivalent to the non-strea
description: The type of the event. Always `chat.end`.
```
:::split:::
-```lms_code_snippet
-title: Example Event Data
-variants:
- json:
- language: json
- code: |
+```json title="Example Event Data"
+{
+ "type": "chat.end",
+ "result": {
+ "model_instance_id": "openai/gpt-oss-20b",
+ "output": [
+ { "type": "reasoning", "content": "Need to call function." },
{
- "type": "chat.end",
- "result": {
- "model_instance_id": "openai/gpt-oss-20b",
- "output": [
- { "type": "reasoning", "content": "Need to call function." },
- {
- "type": "tool_call",
- "tool": "model_search",
- "arguments": { "sort": "trendingScore", "limit": 1 },
- "output": "[{\"type\":\"text\",\"text\":\"Showing first 1 models...\"}]",
- "provider_info": { "type": "ephemeral_mcp", "server_label": "huggingface" }
- },
- { "type": "message", "content": "The current top‑trending model is..." }
- ],
- "stats": {
- "input_tokens": 329,
- "total_output_tokens": 268,
- "reasoning_output_tokens": 5,
- "tokens_per_second": 43.73,
- "time_to_first_token_seconds": 0.781
- },
- "response_id": "resp_02b2017dbc06c12bfc353a2ed6c2b802f8cc682884bb5716"
- }
- }
+ "type": "tool_call",
+ "tool": "model_search",
+ "arguments": { "sort": "trendingScore", "limit": 1 },
+ "output": "[{\"type\":\"text\",\"text\":\"Showing first 1 models...\"}]",
+ "provider_info": { "type": "ephemeral_mcp", "server_label": "huggingface" }
+ },
+ { "type": "message", "content": "The current top‑trending model is..." }
+ ],
+ "stats": {
+ "input_tokens": 329,
+ "total_output_tokens": 268,
+ "reasoning_output_tokens": 5,
+ "tokens_per_second": 43.73,
+ "time_to_first_token_seconds": 0.781
+ },
+ "response_id": "resp_02b2017dbc06c12bfc353a2ed6c2b802f8cc682884bb5716"
+ }
+}
```
````
diff --git a/1_developer/2_rest/unload.md b/1_developer/2_rest/unload.md
index b021494..4f185c5 100644
--- a/1_developer/2_rest/unload.md
+++ b/1_developer/2_rest/unload.md
@@ -1,7 +1,7 @@
---
title: "Unload a model"
description: "Unload a loaded model from memory"
-fullPage: true
+full: true
index: 8
api_info:
method: POST
@@ -18,18 +18,13 @@ api_info:
description: Unique identifier of the model instance to unload.
```
:::split:::
-```lms_code_snippet
-title: Example Request
-variants:
- curl:
- language: bash
- code: |
- curl http://localhost:1234/api/v1/models/unload \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -H "Content-Type: application/json" \
- -d '{
- "instance_id": "openai/gpt-oss-20b"
- }'
+```bash title="Example Request"
+curl http://localhost:1234/api/v1/models/unload \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "instance_id": "openai/gpt-oss-20b"
+ }'
```
````
@@ -43,14 +38,9 @@ variants:
description: Unique identifier for the unloaded model instance.
```
:::split:::
-```lms_code_snippet
-title: Response
-variants:
- json:
- language: json
- code: |
- {
- "instance_id": "openai/gpt-oss-20b"
- }
+```json title="Response"
+{
+ "instance_id": "openai/gpt-oss-20b"
+}
```
````
diff --git a/1_developer/3_openai-compat/completions.md b/1_developer/3_openai-compat/completions.mdx
similarity index 89%
rename from 1_developer/3_openai-compat/completions.md
rename to 1_developer/3_openai-compat/completions.mdx
index e37063b..ff87818 100644
--- a/1_developer/3_openai-compat/completions.md
+++ b/1_developer/3_openai-compat/completions.mdx
@@ -6,11 +6,11 @@ api_info:
method: POST
---
-```lms_warning
+
This endpoint is no longer supported by OpenAI. LM Studio continues to support it.
Using this endpoint with chat‑tuned models may produce unexpected tokens. Prefer base models.
-```
+
- Method: `POST`
- Prompt template is not applied
diff --git a/1_developer/3_openai-compat/meta.json b/1_developer/3_openai-compat/meta.json
new file mode 100644
index 0000000..3357bbc
--- /dev/null
+++ b/1_developer/3_openai-compat/meta.json
@@ -0,0 +1,12 @@
+{
+ "title": "OpenAI Compatibility",
+ "pages": [
+ "chat-completions",
+ "completions",
+ "embeddings",
+ "models",
+ "responses",
+ "structured-output",
+ "tools"
+ ]
+}
diff --git a/1_developer/3_openai-compat/tools.md b/1_developer/3_openai-compat/tools.mdx
similarity index 94%
rename from 1_developer/3_openai-compat/tools.md
rename to 1_developer/3_openai-compat/tools.mdx
index 526e73a..0babcac 100644
--- a/1_developer/3_openai-compat/tools.md
+++ b/1_developer/3_openai-compat/tools.mdx
@@ -5,42 +5,52 @@ description: Enable LLMs to interact with external functions and APIs.
index: 2
---
+import { Step, Steps } from "fumadocs-ui/components/steps";
+
Tool use enables LLMs to request calls to external functions and APIs through the `/v1/chat/completions` and `/v1/responses` endpoints ([Learn more](/docs/developer/openai-compat)), via LM Studio's REST API (or via any OpenAI client). This expands their functionality far beyond text output.
-
+
## Quick Start
-### 1. Start LM Studio as a server
+<Steps>
+<Step>
+ Start LM Studio as a server
-To use LM Studio programmatically from your own code, run LM Studio as a local server.
+ To use LM Studio programmatically from your own code, run LM Studio as a local server.
-You can turn on the server from the "Developer" tab in LM Studio, or via the `lms` CLI:
+ You can turn on the server from the "Developer" tab in LM Studio, or via the `lms` CLI:
-```bash
-lms server start
-```
+ ```bash
+ lms server start
+ ```
-###### Install `lms` by running `npx lmstudio install-cli`
+ **Install `lms` by running `npx lmstudio install-cli`**
-This will allow you to interact with LM Studio via the REST API. For an intro to LM Studio's REST API, see [REST API Overview](/docs/developer/rest).
+ This will allow you to interact with LM Studio via the REST API. For an intro to LM Studio's REST API, see [REST API Overview](/docs/developer/rest).
+</Step>
-### 2. Load a Model
+<Step>
+ Load a Model
-You can load a model from the "Chat" or "Developer" tabs in LM Studio, or via the `lms` CLI:
+ You can load a model from the "Chat" or "Developer" tabs in LM Studio, or via the `lms` CLI:
-```bash
-lms load
-```
+ ```bash
+ lms load
+ ```
+</Step>
-### 3. Copy, Paste, and Run an Example!
+<Step>
+ Copy, Paste, and Run an Example!
-- `Curl`
- - [Single Turn Tool Call Request](#example-using-curl)
-- `Python`
- - [Single Turn Tool Call + Tool Use](#single-turn-example)
- - [Multi-Turn Example](#multi-turn-example)
- - [Advanced Agent Example](#advanced-agent-example)
+ - `Curl`
+ - [Single Turn Tool Call Request](#example-using-curl)
+ - `Python`
+ - [Single Turn Tool Call + Tool Use](#single-turn-example)
+ - [Multi-Turn Example](#multi-turn-example)
+ - [Advanced Agent Example](#advanced-agent-example)
+</Step>
+</Steps>
## Tool Use
@@ -287,26 +297,26 @@ modelIdentifier: gemma-2-2b-it
modelPath: lmstudio-community/gemma-2-2b-it-GGUF/gemma-2-2b-it-Q4_K_M.gguf
input: "system
You are a tool-calling AI. You can request calls to available tools with this EXACT format:
-[TOOL_REQUEST]{"name": "tool_name", "arguments": {"param1": "value1"}}[END_TOOL_REQUEST]
+[TOOL_REQUEST]{\"name\": \"tool_name\", \"arguments\": {\"param1\": \"value1\"}}[END_TOOL_REQUEST]
AVAILABLE TOOLS:
{
- "type": "toolArray",
- "tools": [
+ \"type\": \"toolArray\",
+ \"tools\": [
{
- "type": "function",
- "function": {
- "name": "get_delivery_date",
- "description": "Get the delivery date for a customer's order",
- "parameters": {
- "type": "object",
- "properties": {
- "order_id": {
- "type": "string"
+ \"type\": \"function\",
+ \"function\": {
+ \"name\": \"get_delivery_date\",
+ \"description\": \"Get the delivery date for a customer's order\",
+ \"parameters\": {
+ \"type\": \"object\",
+ \"properties\": {
+ \"order_id\": {
+ \"type\": \"string\"
}
},
- "required": [
- "order_id"
+ \"required\": [
+ \"order_id\"
]
}
}
@@ -322,12 +332,12 @@ RULES:
- If you decide to call one or more tools, there should be no other text in your message
Examples:
-"Check Paris weather"
-[TOOL_REQUEST]{"name": "get_weather", "arguments": {"location": "Paris"}}[END_TOOL_REQUEST]
+\"Check Paris weather\"
+[TOOL_REQUEST]{\"name\": \"get_weather\", \"arguments\": {\"location\": \"Paris\"}}[END_TOOL_REQUEST]
-"Send email to John about meeting and open browser"
-[TOOL_REQUEST]{"name": "send_email", "arguments": {"to": "John", "subject": "meeting"}}[END_TOOL_REQUEST]
-[TOOL_REQUEST]{"name": "open_browser", "arguments": {}}[END_TOOL_REQUEST]
+\"Send email to John about meeting and open browser\"
+[TOOL_REQUEST]{\"name\": \"send_email\", \"arguments\": {\"to\": \"John\", \"subject\": \"meeting\"}}[END_TOOL_REQUEST]
+[TOOL_REQUEST]{\"name\": \"open_browser\", \"arguments\": {}}[END_TOOL_REQUEST]
Respond conversationally if no matching tools exist.
user
@@ -339,7 +349,7 @@ Get me delivery date for order 123
If the model follows this format exactly to call tools, i.e.:
```
[TOOL_REQUEST]{"name": "get_delivery_date", "arguments": {"order_id": "123"}}[END_TOOL_REQUEST]
```
Then LM Studio will be able to parse those tool calls into the `chat.completions` object, just like for natively supported models.
@@ -742,6 +752,7 @@ def open_safe_url(url: str) -> dict:
"status": "error",
"message": f"Domain {domain} not in allowed list",
}
except Exception as e:
return {"status": "error", "message": str(e)}
diff --git a/1_developer/4_anthropic-compat/meta.json b/1_developer/4_anthropic-compat/meta.json
new file mode 100644
index 0000000..2ac0ac3
--- /dev/null
+++ b/1_developer/4_anthropic-compat/meta.json
@@ -0,0 +1,6 @@
+{
+ "title": "Anthropic Compatibility",
+ "pages": [
+ "messages"
+ ]
+}
diff --git a/1_developer/api-changelog.md b/1_developer/api-changelog.md
index 6b5b0a7..a7b42d3 100644
--- a/1_developer/api-changelog.md
+++ b/1_developer/api-changelog.md
@@ -136,7 +136,9 @@ index: 2
---
-###### [👾 LM Studio 0.3.15](/blog/lmstudio-v0.3.15) • 2025-04-24
+###### 👾 LM Studio 0.3.15 • 2025-04-24
+
+Release post: [LM Studio 0.3.15](/blog/lmstudio-v0.3.15)
### Improved Tool Use API Support
@@ -156,7 +158,9 @@ Chunked responses now set `"finish_reason": "tool_calls"` when appropriate.
---
-###### [👾 LM Studio 0.3.14](/blog/lmstudio-v0.3.14) • 2025-03-27
+###### 👾 LM Studio 0.3.14 • 2025-03-27
+
+Release post: [LM Studio 0.3.14](/blog/lmstudio-v0.3.14)
### [API/SDK] Preset Support
@@ -164,7 +168,9 @@ RESTful API and SDKs support specifying presets in requests.
_(example needed)_
-###### [👾 LM Studio 0.3.10](/blog/lmstudio-v0.3.10) • 2025-02-18
+###### 👾 LM Studio 0.3.10 • 2025-02-18
+
+Release post: [LM Studio 0.3.10](/blog/lmstudio-v0.3.10)
### Speculative Decoding API
@@ -193,7 +199,9 @@ Responses now include a `stats` object for speculative decoding:
---
-###### [👾 LM Studio 0.3.9](blog/lmstudio-v0.3.9) • 2025-01-30
+###### 👾 LM Studio 0.3.9 • 2025-01-30
+
+Release post: [LM Studio 0.3.9](/blog/lmstudio-v0.3.9)
### Idle TTL and Auto Evict
@@ -223,7 +231,9 @@ Turn this on in App Settings > Developer.
---
-###### [👾 LM Studio 0.3.6](blog/lmstudio-v0.3.6) • 2025-01-06
+###### 👾 LM Studio 0.3.6 • 2025-01-06
+
+Release post: [LM Studio 0.3.6](/blog/lmstudio-v0.3.6)
### Tool and Function Calling API
@@ -233,7 +243,9 @@ Docs: [Tool Use and Function Calling](/docs/developer/core/tools).
---
-###### [👾 LM Studio 0.3.5](blog/lmstudio-v0.3.5) • 2024-10-22
+###### 👾 LM Studio 0.3.5 • 2024-10-22
+
+Release post: [LM Studio 0.3.5](/blog/lmstudio-v0.3.5)
### Introducing `lms get`: download models from the terminal
diff --git a/1_developer/index.md b/1_developer/index.md
deleted file mode 100644
index febc5cc..0000000
--- a/1_developer/index.md
+++ /dev/null
@@ -1,116 +0,0 @@
----
-title: LM Studio Developer Docs
-sidebar_title: Introduction
-description: Build with LM Studio's local APIs and SDKs — TypeScript, Python, REST, and OpenAI and Anthropic-compatible endpoints.
-index: 1
----
-
-```lms_hstack
-## Get to know the stack
-
-- TypeScript SDK: [lmstudio-js](/docs/typescript)
-- Python SDK: [lmstudio-python](/docs/python)
-- LM Studio REST API: [Stateful Chats, MCPs via API](/docs/developer/rest)
-- OpenAI‑compatible: [Chat, Responses, Embeddings](/docs/developer/openai-compat)
-- Anthropic-compatible: [Messages](/docs/developer/anthropic-compat)
-- LM Studio CLI: [`lms`](/docs/cli)
-
-:::split:::
-
-## What you can build
-
-- Chat and text generation with streaming
-- Tool calling and local agents with MCP
-- Structured output (JSON schema)
-- Embeddings and tokenization
-- Model management (load, download, list)
-```
-
-## Install `llmster` for headless deployments
-
-`llmster` is LM Studio's core, packaged as a daemon for headless deployment on servers, cloud instances, or CI. The daemon runs standalone, and it is not dependent on the LM Studio GUI.
-
-**Mac / Linux**
-
-```bash
-curl -fsSL https://lmstudio.ai/install.sh | bash
-```
-
-**Windows**
-
-```powershell
-irm https://lmstudio.ai/install.ps1 | iex
-```
-
-**Basic usage**
-
-```bash
-lms daemon up # Start the daemon
-lms get # Download a model
-lms server start # Start the local server
-lms chat # Open an interactive session
-```
-
-Learn more: [Headless deployments](/blog/0.4.0#deploy-on-servers-deploy-in-ci-deploy-anywhere)
-
-## Super quick start
-
-### TypeScript (`lmstudio-js`)
-
-```bash
-npm install @lmstudio/sdk
-```
-
-```ts
-import { LMStudioClient } from "@lmstudio/sdk";
-
-const client = new LMStudioClient();
-const model = await client.llm.model("openai/gpt-oss-20b");
-const result = await model.respond("Who are you, and what can you do?");
-
-console.info(result.content);
-```
-
-Full docs: [lmstudio-js](/docs/typescript), Source: [GitHub](https://github.com/lmstudio-ai/lmstudio-js)
-
-### Python (`lmstudio-python`)
-
-```bash
-pip install lmstudio
-```
-
-```python
-import lmstudio as lms
-
-with lms.Client() as client:
- model = client.llm.model("openai/gpt-oss-20b")
- result = model.respond("Who are you, and what can you do?")
- print(result)
-```
-
-Full docs: [lmstudio-python](/docs/python), Source: [GitHub](https://github.com/lmstudio-ai/lmstudio-python)
-
-### HTTP (LM Studio REST API)
-
-```bash
-lms server start --port 1234
-```
-
-```bash
-curl http://localhost:1234/api/v1/chat \
- -H "Content-Type: application/json" \
- -H "Authorization: Bearer $LM_API_TOKEN" \
- -d '{
- "model": "openai/gpt-oss-20b",
- "input": "Who are you, and what can you do?"
- }'
-```
-
-Full docs: [LM Studio REST API](/docs/developer/rest)
-
-## Helpful links
-
-- [API Changelog](/docs/developer/api-changelog)
-- [Local server basics](/docs/developer/core/server)
-- [CLI reference](/docs/cli)
-- [Discord Community](https://discord.gg/lmstudio)
diff --git a/1_developer/index.mdx b/1_developer/index.mdx
new file mode 100644
index 0000000..ee3022d
--- /dev/null
+++ b/1_developer/index.mdx
@@ -0,0 +1,186 @@
+---
+title: LM Studio Developer Docs
+sidebar_title: Introduction
+description: Build with LM Studio's local APIs and SDKs — TypeScript, Python, REST, and OpenAI and Anthropic-compatible endpoints.
+index: 1
+---
+
+import { Card, Cards } from "fumadocs-ui/components/card";
+import {
+ Blocks,
+ Bot,
+ Braces,
+ Cable,
+ Code2,
+ Command,
+ Database,
+ MessageSquareText,
+ PackageSearch,
+ Sparkles,
+ Waves,
+} from "lucide-react";
+
+## Get to know the stack
+
+<Cards>
+  <Card
+    title="TypeScript SDK"
+    description="lmstudio-js"
+    href="/docs/typescript"
+    icon={<Code2 />}
+  />
+  <Card
+    title="Python SDK"
+    description="lmstudio-python"
+    href="/docs/python"
+    icon={<Blocks />}
+  />
+  <Card
+    title="LM Studio REST API"
+    description="Stateful Chats, MCPs via API"
+    href="/docs/developer/rest"
+    icon={<Cable />}
+  />
+  <Card
+    title="OpenAI-compatible API"
+    description="Chat, Responses, Embeddings"
+    href="/docs/developer/openai-compat"
+    icon={<Sparkles />}
+  />
+  <Card
+    title="Anthropic-compatible API"
+    description="Messages"
+    href="/docs/developer/anthropic-compat"
+    icon={<Waves />}
+  />
+  <Card
+    title="LM Studio CLI"
+    description="lms"
+    href="/docs/cli"
+    icon={<Command />}
+  />
+</Cards>
+
+## What you can build
+
+<Cards>
+  <Card
+    title="Chat and text generation"
+    description="Streaming responses"
+    icon={<MessageSquareText />}
+  />
+  <Card
+    title="Tool calling and local agents"
+    description="Agents with MCP"
+    icon={<Bot />}
+  />
+  <Card
+    title="Structured output"
+    description="JSON schema"
+    icon={<Braces />}
+  />
+  <Card
+    title="Embeddings and tokenization"
+    icon={<Database />}
+  />
+  <Card
+    title="Model management"
+    description="Load, download, list"
+    icon={<PackageSearch />}
+  />
+</Cards>
+
+## Install `llmster` for headless deployments
+
+`llmster` is LM Studio's core, packaged as a daemon for headless deployment on servers, cloud instances, or CI. The daemon runs standalone and does not depend on the LM Studio GUI.
+
+**Mac / Linux**
+
+```bash
+curl -fsSL https://lmstudio.ai/install.sh | bash
+```
+
+**Windows**
+
+```powershell
+irm https://lmstudio.ai/install.ps1 | iex
+```
+
+**Basic usage**
+
+```bash
+lms daemon up # Start the daemon
+lms get # Download a model
+lms server start # Start the local server
+lms chat # Open an interactive session
+```
+
+Learn more: [Headless deployments](/blog/0.4.0#deploy-on-servers-deploy-in-ci-deploy-anywhere)
+
+## Super quick start
+
+### TypeScript (`lmstudio-js`)
+
+```bash
+npm install @lmstudio/sdk
+```
+
+```ts
+import { LMStudioClient } from "@lmstudio/sdk";
+
+const client = new LMStudioClient();
+const model = await client.llm.model("openai/gpt-oss-20b");
+const result = await model.respond("Who are you, and what can you do?");
+
+console.info(result.content);
+```
+
+Full docs: [lmstudio-js](/docs/typescript), Source: [GitHub](https://github.com/lmstudio-ai/lmstudio-js)
+
+### Python (`lmstudio-python`)
+
+```bash
+pip install lmstudio
+```
+
+```python
+import lmstudio as lms
+
+with lms.Client() as client:
+ model = client.llm.model("openai/gpt-oss-20b")
+ result = model.respond("Who are you, and what can you do?")
+ print(result)
+```
+
+Full docs: [lmstudio-python](/docs/python), Source: [GitHub](https://github.com/lmstudio-ai/lmstudio-python)
+
+### HTTP (LM Studio REST API)
+
+```bash
+lms server start --port 1234
+```
+
+```bash
+curl http://localhost:1234/api/v1/chat \
+ -H "Content-Type: application/json" \
+ -H "Authorization: Bearer $LM_API_TOKEN" \
+ -d '{
+ "model": "openai/gpt-oss-20b",
+ "input": "Who are you, and what can you do?"
+ }'
+```
+
+Full docs: [LM Studio REST API](/docs/developer/rest)
+
+## Helpful links
+
+- [API Changelog](/docs/developer/api-changelog)
+- [Local server basics](/docs/developer/core/server)
+- [CLI reference](/docs/cli)
+- [Discord Community](https://discord.gg/lmstudio)
diff --git a/1_developer/meta.json b/1_developer/meta.json
new file mode 100644
index 0000000..f81a22e
--- /dev/null
+++ b/1_developer/meta.json
@@ -0,0 +1,17 @@
+{
+ "title": "Developer",
+ "pages": [
+ "---Introduction---",
+ "index",
+ "api-changelog",
+ "_embeddings",
+ "---Core---",
+ "...0_core",
+ "---REST API---",
+ "...2_rest",
+ "---OpenAI Compatibility---",
+ "...3_openai-compat",
+ "---Anthropic Compatibility---",
+ "...4_anthropic-compat"
+ ]
+}
diff --git a/1_python/1_getting-started/authentication.md b/1_python/1_getting-started/authentication.md
deleted file mode 100644
index 71858e9..0000000
--- a/1_python/1_getting-started/authentication.md
+++ /dev/null
@@ -1,62 +0,0 @@
----
-title: Authentication
-sidebar_title: Authentication
-description: Using API Tokens in LM Studio
-index: 3
----
-
-##### Requires [LM Studio 0.4.0](/download) or newer.
-
-LM Studio supports API Tokens for authentication, providing a secure and convenient way to access the LM Studio API.
-
-By default, the LM Studio API runs **without enforcing authentication**. For production or shared environments, enable API Token authentication for secure access.
-
-```lms_info
-To enable API Token authentication, create tokens and control granular permissions, check [this guide](/docs/developer/core/authentication) for more details.
-```
-
-## Providing the API Token
-
-The API Token can be provided in two ways:
-
-1. **Environment Variable (Recommended)**: Set the `LM_API_TOKEN` environment variable, and the SDK will automatically read it.
-2. **Function Argument**: Pass the token directly as the `api_token` parameter.
-
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- # Configure the default client with an API token
- lms.configure_default_client(api_token="your-token-here")
-
- model = lms.llm()
- result = model.respond("What is the meaning of life?")
- print(result)
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- # Pass api_token to the Client constructor
- with lms.Client(api_token="your-token-here") as client:
- model = client.llm.model()
- result = model.respond("What is the meaning of life?")
- print(result)
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- # Pass api_token to the AsyncClient constructor
- async with lms.AsyncClient(api_token="your-token-here") as client:
- model = await client.llm.model()
- result = await model.respond("What is the meaning of life?")
- print(result)
-```
diff --git a/1_python/1_getting-started/authentication.mdx b/1_python/1_getting-started/authentication.mdx
new file mode 100644
index 0000000..3487e99
--- /dev/null
+++ b/1_python/1_getting-started/authentication.mdx
@@ -0,0 +1,56 @@
+---
+title: Authentication
+sidebar_title: Authentication
+description: Using API Tokens in LM Studio
+index: 3
+---
+
+##### Requires LM Studio 0.4.0 or newer.
+
+LM Studio supports API Tokens for authentication, providing a secure and convenient way to access the LM Studio API.
+
+By default, the LM Studio API runs **without enforcing authentication**. For production or shared environments, enable API Token authentication for secure access.
+
+
+To learn how to enable API Token authentication, create tokens, and control granular permissions, see [this guide](/docs/developer/core/authentication).
+
+
+## Providing the API Token
+
+The API Token can be provided in two ways:
+
+1. **Environment Variable (Recommended)**: Set the `LM_API_TOKEN` environment variable, and the SDK will automatically read it.
+2. **Function Argument**: Pass the token directly as the `api_token` parameter.
+
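+For the environment variable approach, export the token in your shell before launching your script (the value shown is a placeholder):
+
+```bash title="Setting the token"
+export LM_API_TOKEN="your-token-here"
+python your_script.py
+```
+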
+```python tab="Python (convenience API)"
+import lmstudio as lms
+
+# Configure the default client with an API token
+lms.configure_default_client(api_token="your-token-here")
+
+model = lms.llm()
+result = model.respond("What is the meaning of life?")
+print(result)
+```
+
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
+
+# Pass api_token to the Client constructor
+with lms.Client(api_token="your-token-here") as client:
+ model = client.llm.model()
+ result = model.respond("What is the meaning of life?")
+ print(result)
+```
+
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+
+# Pass api_token to the AsyncClient constructor
+async with lms.AsyncClient(api_token="your-token-here") as client:
+ model = await client.llm.model()
+ result = await model.respond("What is the meaning of life?")
+ print(result)
+```
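+
+When the environment variable is set, no token needs to be passed in code. A minimal sketch (the token value and script name are placeholders):
+
+```bash tab="Shell"
+export LM_API_TOKEN="your-token-here"
+python your_script.py
+```
+
+```python tab="Python"
+import lmstudio as lms
+
+# No api_token argument needed: the SDK reads LM_API_TOKEN automatically
+model = lms.llm()
+print(model.respond("What is the meaning of life?"))
+```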
diff --git a/1_python/1_getting-started/meta.json b/1_python/1_getting-started/meta.json
new file mode 100644
index 0000000..0696e54
--- /dev/null
+++ b/1_python/1_getting-started/meta.json
@@ -0,0 +1,8 @@
+{
+ "title": "Getting Started",
+ "pages": [
+ "authentication",
+ "project-setup",
+ "repl"
+ ]
+}
diff --git a/1_python/1_getting-started/project-setup.md b/1_python/1_getting-started/project-setup.md
index f620af1..1baa661 100644
--- a/1_python/1_getting-started/project-setup.md
+++ b/1_python/1_getting-started/project-setup.md
@@ -1,7 +1,7 @@
---
title: "Project Setup"
sidebar_title: "Project Setup"
-description: "Set up your `lmstudio-python` app or script."
+description: "Set up your lmstudio-python app or script."
index: 2
---
@@ -15,20 +15,16 @@ As it is published to PyPI, `lmstudio-python` may be installed using `pip`
or your preferred project dependency manager (`pdm` and `uv` are shown, but other
Python project management tools offer similar dependency addition commands).
-```lms_code_snippet
- variants:
- pip:
- language: bash
- code: |
- pip install lmstudio
- pdm:
- language: bash
- code: |
- pdm add lmstudio
- uv:
- language: bash
- code: |
- uv add lmstudio
+```bash tab="pip"
+pip install lmstudio
+```
+
+```bash tab="pdm"
+pdm add lmstudio
+```
+
+```bash tab="uv"
+uv add lmstudio
```
## Customizing the server API host and TCP port
@@ -40,53 +36,47 @@ SDK also required that the optional HTTP REST server be enabled).
The network location of the server API can be overridden by
passing a `"host:port"` string when creating the client instance.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
- SERVER_API_HOST = "localhost:1234"
-
- # This must be the *first* convenience API interaction (otherwise the SDK
- # implicitly creates a client that accesses the default server API host)
- lms.configure_default_client(SERVER_API_HOST)
-
- # Note: the dedicated configuration API was added in lmstudio-python 1.3.0
- # For compatibility with earlier SDK versions, it is still possible to use
- # lms.get_default_client(SERVER_API_HOST) to configure the default client
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
- SERVER_API_HOST = "localhost:1234"
-
- # When using the scoped resource API, each client instance
- # can be configured to use a specific server API host
- with lms.Client(SERVER_API_HOST) as client:
- model = client.llm.model()
-
- for fragment in model.respond_stream("What is the meaning of life?"):
- print(fragment.content, end="", flush=True)
- print() # Advance to a new line at the end of the response
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
- SERVER_API_HOST = "localhost:1234"
-
- # When using the asynchronous API, each client instance
- # can be configured to use a specific server API host
- async with lms.AsyncClient(SERVER_API_HOST) as client:
- model = await client.llm.model()
-
- for fragment in await model.respond_stream("What is the meaning of life?"):
- print(fragment.content, end="", flush=True)
- print() # Advance to a new line at the end of the response
+```python tab="Python (convenience API)"
+import lmstudio as lms
+SERVER_API_HOST = "localhost:1234"
+
+# This must be the *first* convenience API interaction (otherwise the SDK
+# implicitly creates a client that accesses the default server API host)
+lms.configure_default_client(SERVER_API_HOST)
+
+# Note: the dedicated configuration API was added in lmstudio-python 1.3.0
+# For compatibility with earlier SDK versions, it is still possible to use
+# lms.get_default_client(SERVER_API_HOST) to configure the default client
+```
+
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
+SERVER_API_HOST = "localhost:1234"
+
+# When using the scoped resource API, each client instance
+# can be configured to use a specific server API host
+with lms.Client(SERVER_API_HOST) as client:
+ model = client.llm.model()
+
+ for fragment in model.respond_stream("What is the meaning of life?"):
+ print(fragment.content, end="", flush=True)
+ print() # Advance to a new line at the end of the response
+```
+
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+SERVER_API_HOST = "localhost:1234"
+
+# When using the asynchronous API, each client instance
+# can be configured to use a specific server API host
+async with lms.AsyncClient(SERVER_API_HOST) as client:
+ model = await client.llm.model()
+
+ for fragment in await model.respond_stream("What is the meaning of life?"):
+ print(fragment.content, end="", flush=True)
+ print() # Advance to a new line at the end of the response
```
### Checking a specified API server host is running
@@ -97,31 +87,26 @@ While the most common connection pattern is to let the SDK raise an exception if
connect to the specified API server host, the SDK also supports running the API check directly
without creating an SDK client instance first:
-```lms_code_snippet
- variants:
- "Python (synchronous API)":
- language: python
- code: |
- import lmstudio as lms
- SERVER_API_HOST = "localhost:1234"
-
- if lms.Client.is_valid_api_host(SERVER_API_HOST):
- print(f"An LM Studio API server instance is available at {SERVER_API_HOST}")
- else:
- print("No LM Studio API server instance found at {SERVER_API_HOST}")
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
- SERVER_API_HOST = "localhost:1234"
-
- if await lms.AsyncClient.is_valid_api_host(SERVER_API_HOST):
- print(f"An LM Studio API server instance is available at {SERVER_API_HOST}")
- else:
- print("No LM Studio API server instance found at {SERVER_API_HOST}")
+```python tab="Python (synchronous API)"
+import lmstudio as lms
+SERVER_API_HOST = "localhost:1234"
+
+if lms.Client.is_valid_api_host(SERVER_API_HOST):
+ print(f"An LM Studio API server instance is available at {SERVER_API_HOST}")
+else:
+ print("No LM Studio API server instance found at {SERVER_API_HOST}")
+```
+
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+SERVER_API_HOST = "localhost:1234"
+
+if await lms.AsyncClient.is_valid_api_host(SERVER_API_HOST):
+ print(f"An LM Studio API server instance is available at {SERVER_API_HOST}")
+else:
+ print("No LM Studio API server instance found at {SERVER_API_HOST}")
```
### Determining the default local API server port
@@ -133,29 +118,24 @@ interface for a running API server instance. This scan is repeated for each new
created. Rather than letting the SDK perform this scan implicitly, the SDK also supports running
the scan explicitly, and passing in the reported API server details when creating clients:
-```lms_code_snippet
- variants:
- "Python (synchronous API)":
- language: python
- code: |
- import lmstudio as lms
-
- api_host = lms.Client.find_default_local_api_host()
- if api_host is not None:
- print(f"An LM Studio API server instance is available at {api_host}")
- else:
- print("No LM Studio API server instance found on any of the default local ports")
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- api_host = await lms.AsyncClient.find_default_local_api_host()
- if api_host is not None:
- print(f"An LM Studio API server instance is available at {api_host}")
- else:
- print("No LM Studio API server instance found on any of the default local ports")
+```python tab="Python (synchronous API)"
+import lmstudio as lms
+
+api_host = lms.Client.find_default_local_api_host()
+if api_host is not None:
+ print(f"An LM Studio API server instance is available at {api_host}")
+else:
+    print("No LM Studio API server instance found on any of the default local ports")
+```
+
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+
+api_host = await lms.AsyncClient.find_default_local_api_host()
+if api_host is not None:
+ print(f"An LM Studio API server instance is available at {api_host}")
+else:
+    print("No LM Studio API server instance found on any of the default local ports")
```
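+
+The reported host can then be passed in when creating a client. A minimal sketch, reusing the synchronous scan above:
+
+```python
+import lmstudio as lms
+
+api_host = lms.Client.find_default_local_api_host()
+if api_host is None:
+    raise RuntimeError("No LM Studio API server instance found on any of the default local ports")
+
+# Pass the discovered host explicitly to skip the implicit scan
+with lms.Client(api_host) as client:
+    model = client.llm.model()
+    print(model.respond("What is the meaning of life?"))
+```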
diff --git a/1_python/1_getting-started/repl.md b/1_python/1_getting-started/repl.md
index df545a0..f4c0fdc 100644
--- a/1_python/1_getting-started/repl.md
+++ b/1_python/1_getting-started/repl.md
@@ -1,7 +1,7 @@
---
-title: "Using `lmstudio-python` in REPL"
+title: "Using lmstudio-python in REPL"
sidebar_title: "REPL Usage"
-description: "You can use `lmstudio-python` in REPL (Read-Eval-Print Loop) to interact with LLMs, manage models, and more."
+description: "You can use lmstudio-python in REPL (Read-Eval-Print Loop) to interact with LLMs, manage models, and more."
index: 2
---
@@ -18,70 +18,58 @@ The convenience API allows the standard Python REPL, or more flexible alternativ
Jupyter Notebooks, to be used to interact with AI models loaded into LM Studio. For
example:
-```lms_code_snippet
- title: "Python REPL"
- variants:
- "Interactive chat session":
- language: python
- code: |
- >>> import lmstudio as lms
- >>> loaded_models = lms.list_loaded_models()
- >>> for idx, model in enumerate(loaded_models):
- ... print(f"{idx:>3} {model}")
- ...
- 0 LLM(identifier='qwen2.5-7b-instruct')
- >>> model = loaded_models[0]
- >>> chat = lms.Chat("You answer questions concisely")
- >>> chat = lms.Chat("You answer questions concisely")
- >>> chat.add_user_message("Tell me three fruits")
- UserMessage(content=[TextData(text='Tell me three fruits')])
- >>> print(model.respond(chat, on_message=chat.append))
- Banana, apple, orange.
- >>> chat.add_user_message("Tell me three more fruits")
- UserMessage(content=[TextData(text='Tell me three more fruits')])
- >>> print(model.respond(chat, on_message=chat.append))
- Mango, strawberry, avocado.
- >>> chat.add_user_message("How many fruits have you told me?")
- UserMessage(content=[TextData(text='How many fruits have you told me?')])
- >>> print(model.respond(chat, on_message=chat.append))
- You asked for three initial fruits and three more, so I've listed a total of six fruits.
-
+```python title="Python REPL"
+>>> import lmstudio as lms
+>>> loaded_models = lms.list_loaded_models()
+>>> for idx, model in enumerate(loaded_models):
+... print(f"{idx:>3} {model}")
+...
+ 0 LLM(identifier='qwen2.5-7b-instruct')
+>>> model = loaded_models[0]
+>>> chat = lms.Chat("You answer questions concisely")
+>>> chat.add_user_message("Tell me three fruits")
+UserMessage(content=[TextData(text='Tell me three fruits')])
+>>> print(model.respond(chat, on_message=chat.append))
+Banana, apple, orange.
+>>> chat.add_user_message("Tell me three more fruits")
+UserMessage(content=[TextData(text='Tell me three more fruits')])
+>>> print(model.respond(chat, on_message=chat.append))
+Mango, strawberry, avocado.
+>>> chat.add_user_message("How many fruits have you told me?")
+UserMessage(content=[TextData(text='How many fruits have you told me?')])
+>>> print(model.respond(chat, on_message=chat.append))
+You asked for three initial fruits and three more, so I've listed a total of six fruits.
```
While not primarily intended for use this way, the SDK's asynchronous structured concurrency API
is compatible with the asynchronous Python REPL that is launched by `python -m asyncio`.
For example:
-```lms_code_snippet
- title: "Python REPL"
- variants:
- "Asynchronous chat session":
- language: python
- code: |
- # Note: assumes use of the "python -m asyncio" asynchronous REPL (or equivalent)
- # Requires Python SDK version 1.5.0 or later
- >>> from contextlib import AsyncExitStack
- >>> import lmstudio as lms
- >>> resources = AsyncExitStack()
- >>> client = await resources.enter_async_context(lms.AsyncClient())
- >>> loaded_models = await client.llm.list_loaded()
- >>> for idx, model in enumerate(loaded_models):
- ... print(f"{idx:>3} {model}")
- ...
- 0 AsyncLLM(identifier='qwen2.5-7b-instruct-1m')
- >>> model = loaded_models[0]
- >>> chat = lms.Chat("You answer questions concisely")
- >>> chat.add_user_message("Tell me three fruits")
- UserMessage(content=[TextData(text='Tell me three fruits')])
- >>> print(await model.respond(chat, on_message=chat.append))
- Apple, banana, and orange.
- >>> chat.add_user_message("Tell me three more fruits")
- UserMessage(content=[TextData(text='Tell me three more fruits')])
- >>> print(await model.respond(chat, on_message=chat.append))
- Mango, strawberry, and pineapple.
- >>> chat.add_user_message("How many fruits have you told me?")
- UserMessage(content=[TextData(text='How many fruits have you told me?')])
- >>> print(await model.respond(chat, on_message=chat.append))
- You asked for three fruits initially, then three more, so I've listed six fruits in total.
-
+```python title="Python REPL"
+# Note: assumes use of the "python -m asyncio" asynchronous REPL (or equivalent)
+# Requires Python SDK version 1.5.0 or later
+>>> from contextlib import AsyncExitStack
+>>> import lmstudio as lms
+>>> resources = AsyncExitStack()
+>>> client = await resources.enter_async_context(lms.AsyncClient())
+>>> loaded_models = await client.llm.list_loaded()
+>>> for idx, model in enumerate(loaded_models):
+... print(f"{idx:>3} {model}")
+...
+ 0 AsyncLLM(identifier='qwen2.5-7b-instruct-1m')
+>>> model = loaded_models[0]
+>>> chat = lms.Chat("You answer questions concisely")
+>>> chat.add_user_message("Tell me three fruits")
+UserMessage(content=[TextData(text='Tell me three fruits')])
+>>> print(await model.respond(chat, on_message=chat.append))
+Apple, banana, and orange.
+>>> chat.add_user_message("Tell me three more fruits")
+UserMessage(content=[TextData(text='Tell me three more fruits')])
+>>> print(await model.respond(chat, on_message=chat.append))
+Mango, strawberry, and pineapple.
+>>> chat.add_user_message("How many fruits have you told me?")
+UserMessage(content=[TextData(text='How many fruits have you told me?')])
+>>> print(await model.respond(chat, on_message=chat.append))
+You asked for three fruits initially, then three more, so I've listed six fruits in total.
```
diff --git a/1_python/1_llm-prediction/cancelling-predictions.md b/1_python/1_llm-prediction/cancelling-predictions.md
index 5a1ba0e..bbe38d5 100644
--- a/1_python/1_llm-prediction/cancelling-predictions.md
+++ b/1_python/1_llm-prediction/cancelling-predictions.md
@@ -1,6 +1,6 @@
---
title: Cancelling Predictions
-description: Stop an ongoing prediction in `lmstudio-python`
+description: Stop an ongoing prediction in lmstudio-python
index: 4
---
@@ -12,72 +12,65 @@ The following snippet illustrates cancelling the request in response
to an application-specific cancellation condition (such as polling
an event set by another thread).
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
- model = lms.llm()
+```python tab="Python (convenience API)"
+import lmstudio as lms
+model = lms.llm()
- prediction_stream = model.respond_stream("What is the meaning of life?")
- cancelled = False
- for fragment in prediction_stream:
- if ...: # Cancellation condition will be app specific
- cancelled = True
- prediction_stream.cancel()
- # Note: it is recommended to let the iteration complete,
- # as doing so allows the partial result to be recorded.
- # Breaking the loop *is* permitted, but means the partial result
- # and final prediction stats won't be available to the client
- # The stream allows the prediction result to be retrieved after iteration
- if not cancelled:
- print(prediction_stream.result())
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
+prediction_stream = model.respond_stream("What is the meaning of life?")
+cancelled = False
+for fragment in prediction_stream:
+ if ...: # Cancellation condition will be app specific
+ cancelled = True
+ prediction_stream.cancel()
+ # Note: it is recommended to let the iteration complete,
+ # as doing so allows the partial result to be recorded.
+ # Breaking the loop *is* permitted, but means the partial result
+ # and final prediction stats won't be available to the client
+# The stream allows the prediction result to be retrieved after iteration
+if not cancelled:
+ print(prediction_stream.result())
+```
- with lms.Client() as client:
- model = client.llm.model()
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
- prediction_stream = model.respond_stream("What is the meaning of life?")
- cancelled = False
- for fragment in prediction_stream:
- if ...: # Cancellation condition will be app specific
- cancelled = True
- prediction_stream.cancel()
- # Note: it is recommended to let the iteration complete,
- # as doing so allows the partial result to be recorded.
- # Breaking the loop *is* permitted, but means the partial result
- # and final prediction stats won't be available to the client
- # The stream allows the prediction result to be retrieved after iteration
- if not cancelled:
- print(prediction_stream.result())
+with lms.Client() as client:
+ model = client.llm.model()
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
+ prediction_stream = model.respond_stream("What is the meaning of life?")
+ cancelled = False
+ for fragment in prediction_stream:
+ if ...: # Cancellation condition will be app specific
+ cancelled = True
+ prediction_stream.cancel()
+ # Note: it is recommended to let the iteration complete,
+ # as doing so allows the partial result to be recorded.
+ # Breaking the loop *is* permitted, but means the partial result
+ # and final prediction stats won't be available to the client
+ # The stream allows the prediction result to be retrieved after iteration
+ if not cancelled:
+ print(prediction_stream.result())
+```
- async with lms.AsyncClient() as client:
- model = await client.llm.model()
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
- prediction_stream = await model.respond_stream("What is the meaning of life?")
- cancelled = False
- async for fragment in prediction_stream:
- if ...: # Cancellation condition will be app specific
- cancelled = True
- await prediction_stream.cancel()
- # Note: it is recommended to let the iteration complete,
- # as doing so allows the partial result to be recorded.
- # Breaking the loop *is* permitted, but means the partial result
- # and final prediction stats won't be available to the client
- # The stream allows the prediction result to be retrieved after iteration
- if not cancelled:
- print(prediction_stream.result())
+async with lms.AsyncClient() as client:
+ model = await client.llm.model()
+ prediction_stream = await model.respond_stream("What is the meaning of life?")
+ cancelled = False
+ async for fragment in prediction_stream:
+ if ...: # Cancellation condition will be app specific
+ cancelled = True
+ await prediction_stream.cancel()
+ # Note: it is recommended to let the iteration complete,
+ # as doing so allows the partial result to be recorded.
+ # Breaking the loop *is* permitted, but means the partial result
+ # and final prediction stats won't be available to the client
+ # The stream allows the prediction result to be retrieved after iteration
+ if not cancelled:
+ print(prediction_stream.result())
```
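+
+As a concrete illustration of the condition placeholder above, the app-specific check might poll a `threading.Event` set elsewhere. A minimal sketch (the event and its setter are assumptions, not part of the SDK):
+
+```python
+import threading
+import lmstudio as lms
+
+stop_requested = threading.Event()  # set by another thread to request cancellation
+
+model = lms.llm()
+prediction_stream = model.respond_stream("What is the meaning of life?")
+cancelled = False
+for fragment in prediction_stream:
+    if stop_requested.is_set():  # the app-specific cancellation condition
+        cancelled = True
+        prediction_stream.cancel()
+        # Let the iteration complete so the partial result is recorded
+if not cancelled:
+    print(prediction_stream.result())
+```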
diff --git a/1_python/1_llm-prediction/chat-completion.md b/1_python/1_llm-prediction/chat-completion.md
index 35e1a42..8b426c4 100644
--- a/1_python/1_llm-prediction/chat-completion.md
+++ b/1_python/1_llm-prediction/chat-completion.md
@@ -11,36 +11,29 @@ Use `llm.respond(...)` to generate completions for a chat conversation.
The following snippet shows how to obtain the AI's response to a quick chat prompt.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- model = lms.llm()
- print(model.respond("What is the meaning of life?"))
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- with lms.Client() as client:
- model = client.llm.model()
- print(model.respond("What is the meaning of life?"))
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- async with lms.AsyncClient() as client:
- model = await client.llm.model()
- print(await model.respond("What is the meaning of life?"))
+```python tab="Python (convenience API)"
+import lmstudio as lms
+model = lms.llm()
+print(model.respond("What is the meaning of life?"))
+```
+
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
+
+with lms.Client() as client:
+ model = client.llm.model()
+ print(model.respond("What is the meaning of life?"))
+```
+
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+
+async with lms.AsyncClient() as client:
+ model = await client.llm.model()
+ print(await model.respond("What is the meaning of life?"))
```
## Streaming a Chat Response
@@ -49,44 +42,37 @@ The following snippet shows how to stream the AI's response to a chat prompt,
displaying text fragments as they are received (rather than waiting for the
entire response to be generated before displaying anything).
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
- model = lms.llm()
-
- for fragment in model.respond_stream("What is the meaning of life?"):
- print(fragment.content, end="", flush=True)
- print() # Advance to a new line at the end of the response
+```python tab="Python (convenience API)"
+import lmstudio as lms
+model = lms.llm()
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
+for fragment in model.respond_stream("What is the meaning of life?"):
+ print(fragment.content, end="", flush=True)
+print() # Advance to a new line at the end of the response
+```
- with lms.Client() as client:
- model = client.llm.model()
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
- for fragment in model.respond_stream("What is the meaning of life?"):
- print(fragment.content, end="", flush=True)
- print() # Advance to a new line at the end of the response
+with lms.Client() as client:
+ model = client.llm.model()
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
+ for fragment in model.respond_stream("What is the meaning of life?"):
+ print(fragment.content, end="", flush=True)
+ print() # Advance to a new line at the end of the response
+```
- async with lms.AsyncClient() as client:
- model = await client.llm.model()
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
- async for fragment in model.respond_stream("What is the meaning of life?"):
- print(fragment.content, end="", flush=True)
- print() # Advance to a new line at the end of the response
+async with lms.AsyncClient() as client:
+ model = await client.llm.model()
+ async for fragment in model.respond_stream("What is the meaning of life?"):
+ print(fragment.content, end="", flush=True)
+ print() # Advance to a new line at the end of the response
```
## Cancelling a Chat Response
@@ -100,33 +86,26 @@ This can be done using the top-level `llm` convenience API,
or the `model` method in the `llm` namespace when using the scoped resource API.
For example, here is how to use Qwen2.5 7B Instruct.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
+```python tab="Python (convenience API)"
+import lmstudio as lms
- model = lms.llm("qwen2.5-7b-instruct")
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
+model = lms.llm("qwen2.5-7b-instruct")
+```
- with lms.Client() as client:
- model = client.llm.model("qwen2.5-7b-instruct")
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
+with lms.Client() as client:
+ model = client.llm.model("qwen2.5-7b-instruct")
+```
- async with lms.AsyncClient() as client:
- model = await client.llm.model("qwen2.5-7b-instruct")
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+async with lms.AsyncClient() as client:
+ model = await client.llm.model("qwen2.5-7b-instruct")
```
There are other ways to get a model handle. See [Managing Models in Memory](./../manage-models/loading) for more info.
@@ -137,34 +116,15 @@ The input to the model is referred to as the "context".
Conceptually, the model receives a multi-turn conversation as input,
and it is asked to predict the assistant's response in that conversation.
-```lms_code_snippet
- variants:
- "Constructing a Chat object":
- language: python
- code: |
- import lmstudio as lms
-
- # Create a chat with an initial system prompt.
- chat = lms.Chat("You are a resident AI philosopher.")
-
- # Build the chat context by adding messages of relevant types.
- chat.add_user_message("What is the meaning of life?")
- # ... continued in next example
-
- "From chat history data":
- language: python
- code: |
- import lmstudio as lms
-
- # Create a chat object from a chat history dict
- chat = lms.Chat.from_history({
- "messages": [
- { "role": "system", "content": "You are a resident AI philosopher." },
- { "role": "user", "content": "What is the meaning of life?" },
- ]
- })
- # ... continued in next example
+```python
+import lmstudio as lms
+
+# Create a chat with an initial system prompt.
+chat = lms.Chat("You are a resident AI philosopher.")
+# Build the chat context by adding messages of relevant types.
+chat.add_user_message("What is the meaning of life?")
+# ... continued in next example
```
See [Working with Chats](./working-with-chats) for more information on managing chat context.
@@ -175,92 +135,76 @@ See [Working with Chats](./working-with-chats) for more information on managing
You can ask the LLM to predict the next response in the chat context using the `respond()` method.
-```lms_code_snippet
- variants:
- "Non-streaming (synchronous API)":
- language: python
- code: |
- # The `chat` object is created in the previous step.
- result = model.respond(chat)
-
- print(result)
-
- "Streaming (synchronous API)":
- language: python
- code: |
- # The `chat` object is created in the previous step.
- prediction_stream = model.respond_stream(chat)
-
- for fragment in prediction_stream:
- print(fragment.content, end="", flush=True)
- print() # Advance to a new line at the end of the response
-
- "Non-streaming (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- # The `chat` object is created in the previous step.
- result = await model.respond(chat)
-
- print(result)
-
- "Streaming (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- # The `chat` object is created in the previous step.
- prediction_stream = await model.respond_stream(chat)
-
- async for fragment in prediction_stream:
- print(fragment.content, end="", flush=True)
- print() # Advance to a new line at the end of the response
+```python tab="Non-streaming (synchronous API)"
+# The `chat` object is created in the previous step.
+result = model.respond(chat)
+
+print(result)
+```
+
+```python tab="Streaming (synchronous API)"
+# The `chat` object is created in the previous step.
+prediction_stream = model.respond_stream(chat)
+
+for fragment in prediction_stream:
+ print(fragment.content, end="", flush=True)
+print() # Advance to a new line at the end of the response
+```
+
+```python tab="Non-streaming (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+# The `chat` object is created in the previous step.
+result = await model.respond(chat)
+
+print(result)
+```
+
+```python tab="Streaming (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+# The `chat` object is created in the previous step.
+prediction_stream = await model.respond_stream(chat)
+async for fragment in prediction_stream:
+ print(fragment.content, end="", flush=True)
+print() # Advance to a new line at the end of the response
```
## Customize Inferencing Parameters
You can pass in inferencing parameters via the `config` keyword parameter on `.respond()`.
-```lms_code_snippet
- variants:
- "Non-streaming (synchronous API)":
- language: python
- code: |
- result = model.respond(chat, config={
- "temperature": 0.6,
- "maxTokens": 50,
- })
-
- "Streaming (synchronous API)":
- language: python
- code: |
- prediction_stream = model.respond_stream(chat, config={
- "temperature": 0.6,
- "maxTokens": 50,
- })
-
- "Non-streaming (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- result = await model.respond(chat, config={
- "temperature": 0.6,
- "maxTokens": 50,
- })
-
- "Streaming (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- prediction_stream = await model.respond_stream(chat, config={
- "temperature": 0.6,
- "maxTokens": 50,
- })
+```python tab="Non-streaming (synchronous API)"
+result = model.respond(chat, config={
+ "temperature": 0.6,
+ "maxTokens": 50,
+})
+```
+
+```python tab="Streaming (synchronous API)"
+prediction_stream = model.respond_stream(chat, config={
+ "temperature": 0.6,
+ "maxTokens": 50,
+})
+```
+```python tab="Non-streaming (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+result = await model.respond(chat, config={
+ "temperature": 0.6,
+ "maxTokens": 50,
+})
+```
+
+```python tab="Streaming (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+prediction_stream = await model.respond_stream(chat, config={
+ "temperature": 0.6,
+ "maxTokens": 50,
+})
```
See [Configuring the Model](./parameters) for more information on what can be configured.
@@ -270,29 +214,23 @@ See [Configuring the Model](./parameters) for more information on what can be co
You can also print prediction metadata, such as the model used for generation, number of generated
tokens, time to first token, and stop reason.
-```lms_code_snippet
- variants:
- "Non-streaming":
- language: python
- code: |
- # `result` is the response from the model.
- print("Model used:", result.model_info.display_name)
- print("Predicted tokens:", result.stats.predicted_tokens_count)
- print("Time to first token (seconds):", result.stats.time_to_first_token_sec)
- print("Stop reason:", result.stats.stop_reason)
-
- "Streaming":
- language: python
- code: |
- # After iterating through the prediction fragments,
- # the overall prediction result may be obtained from the stream
- result = prediction_stream.result()
-
- print("Model used:", result.model_info.display_name)
- print("Predicted tokens:", result.stats.predicted_tokens_count)
- print("Time to first token (seconds):", result.stats.time_to_first_token_sec)
- print("Stop reason:", result.stats.stop_reason)
+```python tab="Non-streaming"
+# `result` is the response from the model.
+print("Model used:", result.model_info.display_name)
+print("Predicted tokens:", result.stats.predicted_tokens_count)
+print("Time to first token (seconds):", result.stats.time_to_first_token_sec)
+print("Stop reason:", result.stats.stop_reason)
+```
+
+```python tab="Streaming"
+# After iterating through the prediction fragments,
+# the overall prediction result may be obtained from the stream
+result = prediction_stream.result()
+print("Model used:", result.model_info.display_name)
+print("Predicted tokens:", result.stats.predicted_tokens_count)
+print("Time to first token (seconds):", result.stats.time_to_first_token_sec)
+print("Stop reason:", result.stats.stop_reason)
```
Result access is consistent for both non-streaming and streaming predictions across the synchronous and
@@ -304,35 +242,29 @@ iterating the stream to completion before returning the result.
## Example: Multi-turn Chat
-```lms_code_snippet
- title: "chatbot.py"
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- model = lms.llm()
- chat = lms.Chat("You are a task focused AI assistant")
-
- while True:
- try:
- user_input = input("You (leave blank to exit): ")
- except EOFError:
- print()
- break
- if not user_input:
- break
- chat.add_user_message(user_input)
- prediction_stream = model.respond_stream(
- chat,
- on_message=chat.append,
- )
- print("Bot: ", end="", flush=True)
- for fragment in prediction_stream:
- print(fragment.content, end="", flush=True)
- print()
-
+```python title="chatbot.py"
+import lmstudio as lms
+
+model = lms.llm()
+chat = lms.Chat("You are a task focused AI assistant")
+
+while True:
+ try:
+ user_input = input("You (leave blank to exit): ")
+ except EOFError:
+ print()
+ break
+ if not user_input:
+ break
+ chat.add_user_message(user_input)
+ prediction_stream = model.respond_stream(
+ chat,
+ on_message=chat.append,
+ )
+ print("Bot: ", end="", flush=True)
+ for fragment in prediction_stream:
+ print(fragment.content, end="", flush=True)
+ print()
```
### Progress Callbacks
@@ -341,49 +273,41 @@ Long prompts will often take a long time to first token, i.e. it takes the model
If you want updates on the progress of this process, you can provide a progress callback to `respond`
that receives a float from 0.0 to 1.0 representing prompt processing progress.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
+```python tab="Python (convenience API)"
+import lmstudio as lms
- llm = lms.llm()
+llm = lms.llm()
- response = llm.respond(
- "What is LM Studio?",
- on_prompt_processing_progress = (lambda progress: print(f"{progress*100}% complete")),
- )
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- with lms.Client() as client:
- llm = client.llm.model()
+response = llm.respond(
+ "What is LM Studio?",
+ on_prompt_processing_progress = (lambda progress: print(f"{progress*100}% complete")),
+)
+```
- response = llm.respond(
- "What is LM Studio?",
- on_prompt_processing_progress = (lambda progress: print(f"{progress*100}% complete")),
- )
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
+with lms.Client() as client:
+ llm = client.llm.model()
- async with lms.AsyncClient() as client:
- llm = await client.llm.model()
+ response = llm.respond(
+ "What is LM Studio?",
+ on_prompt_processing_progress = (lambda progress: print(f"{progress*100}% complete")),
+ )
+```
- response = await llm.respond(
- "What is LM Studio?",
- on_prompt_processing_progress = (lambda progress: print(f"{progress*100}% complete")),
- )
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+async with lms.AsyncClient() as client:
+ llm = await client.llm.model()
+ response = await llm.respond(
+ "What is LM Studio?",
+ on_prompt_processing_progress = (lambda progress: print(f"{progress*100}% complete")),
+ )
```
In addition to `on_prompt_processing_progress`, the other available progress callbacks are:
diff --git a/1_python/1_llm-prediction/completion.md b/1_python/1_llm-prediction/completion.md
deleted file mode 100644
index 8f38617..0000000
--- a/1_python/1_llm-prediction/completion.md
+++ /dev/null
@@ -1,271 +0,0 @@
----
-title: Text Completions
-description: "Provide a string input for the model to complete"
----
-
-Use `llm.complete(...)` to generate text completions from a loaded language model.
-Text completions mean sending a non-formatted string to the model with the expectation that the model will complete the text.
-
-This is different from multi-turn chat conversations. For more information on chat completions, see [Chat Completions](./chat-completion).
-
-## 1. Instantiate a Model
-
-First, you need to load a model to generate completions from.
-This can be done using the top-level `llm` convenience API,
-or the `model` method in the `llm` namespace when using the scoped resource API.
-For example, here is how to use Qwen2.5 7B Instruct.
-
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- model = lms.llm("qwen2.5-7b-instruct")
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- with lms.Client() as client:
- model = client.llm.model("qwen2.5-7b-instruct")
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- async with lms.AsyncClient() as client:
- model = await client.llm.model("qwen2.5-7b-instruct")
-
-```
-
-## 2. Generate a Completion
-
-Once you have a loaded model, you can generate completions by passing a string to the `complete` method on the `llm` handle.
-
-```lms_code_snippet
- variants:
- "Non-streaming (synchronous API)":
- language: python
- code: |
- # The `chat` object is created in the previous step.
- result = model.complete("My name is", config={"maxTokens": 100})
-
- print(result)
-
- "Streaming (synchronous API)":
- language: python
- code: |
- # The `chat` object is created in the previous step.
- prediction_stream = model.complete_stream("My name is", config={"maxTokens": 100})
-
- for fragment in prediction_stream:
- print(fragment.content, end="", flush=True)
- print() # Advance to a new line at the end of the response
-
- "Non-streaming (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- # The `chat` object is created in the previous step.
- result = await model.complete("My name is", config={"maxTokens": 100})
-
- print(result)
-
- "Streaming (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- # The `chat` object is created in the previous step.
- prediction_stream = await model.complete_stream("My name is", config={"maxTokens": 100})
-
- async for fragment in prediction_stream:
- print(fragment.content, end="", flush=True)
- print() # Advance to a new line at the end of the response
-
-```
-
-## 3. Print Prediction Stats
-
-You can also print prediction metadata, such as the model used for generation, number of generated tokens, time to first token, and stop reason.
-
-```lms_code_snippet
- variants:
- "Non-streaming":
- language: python
- code: |
- # `result` is the response from the model.
- print("Model used:", result.model_info.display_name)
- print("Predicted tokens:", result.stats.predicted_tokens_count)
- print("Time to first token (seconds):", result.stats.time_to_first_token_sec)
- print("Stop reason:", result.stats.stop_reason)
-
- "Streaming":
- language: python
- code: |
- # After iterating through the prediction fragments,
- # the overall prediction result may be obtained from the stream
- result = prediction_stream.result()
-
- print("Model used:", result.model_info.display_name)
- print("Predicted tokens:", result.stats.predicted_tokens_count)
- print("Time to first token (seconds):", result.stats.time_to_first_token_sec)
- print("Stop reason:", result.stats.stop_reason)
-
-```
-
-Both the non-streaming and streaming result access is consistent across the synchronous and
-asynchronous APIs, as `prediction_stream.result()` is a non-blocking API that raises an exception
-if no result is available (either because the prediction is still running, or because the
-prediction request failed). Prediction streams also offer a blocking (synchronous API) or
-awaitable (asynchronous API) `prediction_stream.wait_for_result()` method that internally handles
-iterating the stream to completion before returning the result.
-
-## Example: Get an LLM to Simulate a Terminal
-
-Here's an example of how you might use the `complete` method to simulate a terminal.
-
-```lms_code_snippet
- title: "terminal-sim.py"
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- model = lms.llm()
- console_history = []
-
- while True:
- try:
- user_command = input("$ ")
- except EOFError:
- print()
- break
- if user_command.strip() == "exit":
- break
- console_history.append(f"$ {user_command}")
- history_prompt = "\n".join(console_history)
- prediction_stream = model.complete_stream(
- history_prompt,
- config={ "stopStrings": ["$"] },
- )
- for fragment in prediction_stream:
- print(fragment.content, end="", flush=True)
- print()
- console_history.append(prediction_stream.result().content)
-
-```
-
-## Customize Inferencing Parameters
-
-You can pass in inferencing parameters via the `config` keyword parameter on `.complete()`.
-
-```lms_code_snippet
- variants:
- "Non-streaming (synchronous API)":
- language: python
- code: |
- result = model.complete(initial_text, config={
- "temperature": 0.6,
- "maxTokens": 50,
- })
-
- "Streaming (synchronous API)":
- language: python
- code: |
- prediction_stream = model.complete_stream(initial_text, config={
- "temperature": 0.6,
- "maxTokens": 50,
- })
-
- "Non-streaming (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- result = await model.complete(initial_text, config={
- "temperature": 0.6,
- "maxTokens": 50,
- })
-
- "Streaming (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- prediction_stream = await model.complete_stream(initial_text, config={
- "temperature": 0.6,
- "maxTokens": 50,
- })
-
-```
-
-See [Configuring the Model](./parameters) for more information on what can be configured.
-
-### Progress Callbacks
-
-Long prompts will often take a long time to first token, i.e. it takes the model a long time to process your prompt.
-If you want to get updates on the progress of this process, you can provide a float callback to `complete`
-that receives a float from 0.0-1.0 representing prompt processing progress.
-
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- llm = lms.llm()
-
- completion = llm.complete(
- "My name is",
- on_prompt_processing_progress = (lambda progress: print(f"{progress*100}% complete")),
- )
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- with lms.Client() as client:
- llm = client.llm.model()
-
- completion = llm.complete(
- "My name is",
- on_prompt_processing_progress = (lambda progress: print(f"{progress*100}% processed")),
- )
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- async with lms.AsyncClient() as client:
- llm = await client.llm.model()
-
- completion = await llm.complete(
- "My name is",
- on_prompt_processing_progress = (lambda progress: print(f"{progress*100}% processed")),
- )
-
-```
-
-In addition to `on_prompt_processing_progress`, the other available progress callbacks are:
-
-- `on_first_token`: called after prompt processing is complete and the first token is being emitted.
- Does not receive any arguments (use the streaming iteration API or `on_prediction_fragment`
- to process tokens as they are emitted).
-- `on_prediction_fragment`: called for each prediction fragment received by the client.
- Receives the same prediction fragments as iterating over the stream iteration API.
-- `on_message`: called with an assistant response message when the prediction is complete.
- Intended for appending received messages to a chat history instance.
diff --git a/1_python/1_llm-prediction/completion.mdx b/1_python/1_llm-prediction/completion.mdx
new file mode 100644
index 0000000..ac58d19
--- /dev/null
+++ b/1_python/1_llm-prediction/completion.mdx
@@ -0,0 +1,241 @@
+---
+title: Text Completions
+description: "Provide a string input for the model to complete"
+---
+
+import { Step, Steps } from "fumadocs-ui/components/steps";
+
+Use `llm.complete(...)` to generate text completions from a loaded language model.
+Text completions mean sending a non-formatted string to the model with the expectation that the model will complete the text.
+
+This is different from multi-turn chat conversations. For more information on chat completions, see [Chat Completions](./chat-completion).
+
+## Quickstart
+
+
+
+ Instantiate a Model
+
+ First, you need to load a model to generate completions from.
+ This can be done using the top-level `llm` convenience API,
+ or the `model` method in the `llm` namespace when using the scoped resource API.
+ For example, here is how to use Qwen2.5 7B Instruct.
+
+ ```python tab="Python (convenience API)"
+ import lmstudio as lms
+
+ model = lms.llm("qwen2.5-7b-instruct")
+ ```
+
+ ```python tab="Python (scoped resource API)"
+ import lmstudio as lms
+
+ with lms.Client() as client:
+ model = client.llm.model("qwen2.5-7b-instruct")
+ ```
+
+ ```python tab="Python (asynchronous API)"
+ # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+ # Requires Python SDK version 1.5.0 or later
+ import lmstudio as lms
+
+ async with lms.AsyncClient() as client:
+ model = await client.llm.model("qwen2.5-7b-instruct")
+ ```
+
+
+
+ Generate a Completion
+
+ Once you have a loaded model, you can generate completions by passing a string to the `complete` method on the `llm` handle.
+
+ ```python tab="Non-streaming (synchronous API)"
+ # The `model` object is created in the previous step.
+ result = model.complete("My name is", config={"maxTokens": 100})
+
+ print(result)
+ ```
+
+ ```python tab="Streaming (synchronous API)"
+ # The `model` object is created in the previous step.
+ prediction_stream = model.complete_stream("My name is", config={"maxTokens": 100})
+
+ for fragment in prediction_stream:
+ print(fragment.content, end="", flush=True)
+ print() # Advance to a new line at the end of the response
+ ```
+
+ ```python tab="Non-streaming (asynchronous API)"
+ # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+ # Requires Python SDK version 1.5.0 or later
+ # The `model` object is created in the previous step.
+ result = await model.complete("My name is", config={"maxTokens": 100})
+
+ print(result)
+ ```
+
+ ```python tab="Streaming (asynchronous API)"
+ # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+ # Requires Python SDK version 1.5.0 or later
+ # The `model` object is created in the previous step.
+ prediction_stream = await model.complete_stream("My name is", config={"maxTokens": 100})
+
+ async for fragment in prediction_stream:
+ print(fragment.content, end="", flush=True)
+ print() # Advance to a new line at the end of the response
+ ```
+
+
+
+ Print Prediction Stats
+
+ You can also print prediction metadata, such as the model used for generation, number of generated tokens, time to first token, and stop reason.
+
+ ```python tab="Non-streaming"
+ # `result` is the response from the model.
+ print("Model used:", result.model_info.display_name)
+ print("Predicted tokens:", result.stats.predicted_tokens_count)
+ print("Time to first token (seconds):", result.stats.time_to_first_token_sec)
+ print("Stop reason:", result.stats.stop_reason)
+ ```
+
+ ```python tab="Streaming"
+ # After iterating through the prediction fragments,
+ # the overall prediction result may be obtained from the stream
+ result = prediction_stream.result()
+
+ print("Model used:", result.model_info.display_name)
+ print("Predicted tokens:", result.stats.predicted_tokens_count)
+ print("Time to first token (seconds):", result.stats.time_to_first_token_sec)
+ print("Stop reason:", result.stats.stop_reason)
+ ```
+
+  Result access is consistent for both non-streaming and streaming predictions across the synchronous and
+ asynchronous APIs, as `prediction_stream.result()` is a non-blocking API that raises an exception
+ if no result is available (either because the prediction is still running, or because the
+ prediction request failed). Prediction streams also offer a blocking (synchronous API) or
+ awaitable (asynchronous API) `prediction_stream.wait_for_result()` method that internally handles
+ iterating the stream to completion before returning the result.
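+
+  For example, the streaming snippets above could block for the final result instead of iterating manually. A minimal sketch using the synchronous API:
+
+  ```python
+  prediction_stream = model.complete_stream("My name is", config={"maxTokens": 100})
+
+  # Consumes the stream internally, then returns the prediction result
+  result = prediction_stream.wait_for_result()
+  print(result)
+  ```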
+
+
+
+## Example: Get an LLM to Simulate a Terminal
+
+Here's an example of how you might use the `complete` method to simulate a terminal.
+
+```python title="terminal-sim.py"
+import lmstudio as lms
+
+model = lms.llm()
+console_history = []
+
+while True:
+ try:
+ user_command = input("$ ")
+ except EOFError:
+ print()
+ break
+ if user_command.strip() == "exit":
+ break
+ console_history.append(f"$ {user_command}")
+ history_prompt = "\n".join(console_history)
+ prediction_stream = model.complete_stream(
+ history_prompt,
+ config={ "stopStrings": ["$"] },
+ )
+ for fragment in prediction_stream:
+ print(fragment.content, end="", flush=True)
+ print()
+ console_history.append(prediction_stream.result().content)
+```
+
+## Customize Inferencing Parameters
+
+You can pass in inferencing parameters via the `config` keyword parameter on `.complete()`.
+
+```python tab="Non-streaming (synchronous API)"
+result = model.complete(initial_text, config={
+ "temperature": 0.6,
+ "maxTokens": 50,
+})
+```
+
+```python tab="Streaming (synchronous API)"
+prediction_stream = model.complete_stream(initial_text, config={
+ "temperature": 0.6,
+ "maxTokens": 50,
+})
+```
+
+```python tab="Non-streaming (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+result = await model.complete(initial_text, config={
+ "temperature": 0.6,
+ "maxTokens": 50,
+})
+```
+
+```python tab="Streaming (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+prediction_stream = await model.complete_stream(initial_text, config={
+ "temperature": 0.6,
+ "maxTokens": 50,
+})
+```
+
+See [Configuring the Model](./parameters) for more information on what can be configured.
+
+### Progress Callbacks
+
+Long prompts will often take a long time to first token, i.e. it takes the model a long time to process your prompt.
+If you want updates on the progress of this process, you can provide a progress callback to `complete`
+that receives a float from 0.0 to 1.0 representing prompt processing progress.
+
+```python tab="Python (convenience API)"
+import lmstudio as lms
+
+llm = lms.llm()
+
+completion = llm.complete(
+ "My name is",
+ on_prompt_processing_progress = (lambda progress: print(f"{progress*100}% complete")),
+)
+```
+
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
+
+with lms.Client() as client:
+ llm = client.llm.model()
+
+ completion = llm.complete(
+ "My name is",
+ on_prompt_processing_progress = (lambda progress: print(f"{progress*100}% processed")),
+ )
+```
+
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+
+async with lms.AsyncClient() as client:
+ llm = await client.llm.model()
+
+ completion = await llm.complete(
+ "My name is",
+ on_prompt_processing_progress = (lambda progress: print(f"{progress*100}% processed")),
+ )
+```
+
+In addition to `on_prompt_processing_progress`, the other available progress callbacks are:
+
+- `on_first_token`: called after prompt processing is complete and the first token is being emitted.
+ Does not receive any arguments (use the streaming iteration API or `on_prediction_fragment`
+ to process tokens as they are emitted).
+- `on_prediction_fragment`: called for each prediction fragment received by the client.
+ Receives the same prediction fragments as iterating over the stream iteration API.
+- `on_message`: called with an assistant response message when the prediction is complete.
+ Intended for appending received messages to a chat history instance.
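+
+For example, several callbacks may be combined in a single `complete` call. A minimal sketch (the prompt text is a placeholder):
+
+```python
+import lmstudio as lms
+
+model = lms.llm()
+
+result = model.complete(
+    "My name is",
+    config={"maxTokens": 50},
+    on_prompt_processing_progress=(lambda progress: print(f"{progress*100}% processed")),
+    # Called once, with no arguments, when the first token is emitted
+    on_first_token=(lambda: print("(first token)")),
+    # Called for each fragment, with the same payloads as the streaming iteration API
+    on_prediction_fragment=(lambda fragment: print(fragment.content, end="", flush=True)),
+)
+print()
+```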
diff --git a/1_python/1_llm-prediction/image-input.md b/1_python/1_llm-prediction/image-input.md
deleted file mode 100644
index a3ff321..0000000
--- a/1_python/1_llm-prediction/image-input.md
+++ /dev/null
@@ -1,144 +0,0 @@
----
-title: Image Input
-description: API for passing images as input to the model
-index: 2
----
-
-_Required Python SDK version_: **1.1.0**
-
-Some models, known as VLMs (Vision-Language Models), can accept images as input. You can pass images to the model using the `.respond()` method.
-
-### Prerequisite: Get a VLM (Vision-Language Model)
-
-If you don't yet have a VLM, you can download a model like `qwen2-vl-2b-instruct` using the following command:
-
-```bash
-lms get qwen2-vl-2b-instruct
-```
-
-## 1. Instantiate the Model
-
-Connect to LM Studio and obtain a handle to the VLM (Vision-Language Model) you want to use.
-
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- model = lms.llm("qwen2-vl-2b-instruct")
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- with lms.Client() as client:
- model = client.llm.model("qwen2-vl-2b-instruct")
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- async with lms.AsyncClient() as client:
- model = await client.llm.model("qwen2-vl-2b-instruct")
-
-```
-
-## 2. Prepare the Image
-
-Use the `prepare_image()` function or `files` namespace method to
-get a handle to the image that can subsequently be passed to the model.
-
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- image_path = "/path/to/image.jpg" # Replace with the path to your image
- image_handle = lms.prepare_image(image_path)
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- with lms.Client() as client:
- image_path = "/path/to/image.jpg" # Replace with the path to your image
- image_handle = client.files.prepare_image(image_path)
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- async with lms.AsyncClient() as client:
- image_path = "/path/to/image.jpg" # Replace with the path to your image
- image_handle = await client.files.prepare_image(image_path)
-
-```
-
-If you only have the raw data of the image, you can supply the raw data directly as a bytes
-object without having to write it to disk first. Due to this feature, binary filesystem
-paths are _not_ supported (as they will be handled as malformed image data rather than as
-filesystem paths).
-
-Binary IO objects are also accepted as local file inputs.
-
-The LM Studio server supports JPEG, PNG, and WebP image formats.
-
-## 3. Pass the Image to the Model in `.respond()`
-
-Generate a prediction by passing the image to the model in the `.respond()` method.
-
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- image_path = "/path/to/image.jpg" # Replace with the path to your image
- image_handle = lms.prepare_image(image_path)
- model = lms.llm("qwen2-vl-2b-instruct")
- chat = lms.Chat()
- chat.add_user_message("Describe this image please", images=[image_handle])
- prediction = model.respond(chat)
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- with lms.Client() as client:
- image_path = "/path/to/image.jpg" # Replace with the path to your image
- image_handle = client.files.prepare_image(image_path)
- model = client.llm.model("qwen2-vl-2b-instruct")
- chat = lms.Chat()
- chat.add_user_message("Describe this image please", images=[image_handle])
- prediction = model.respond(chat)
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- async with lms.AsyncClient() as client:
- image_path = "/path/to/image.jpg" # Replace with the path to your image
- image_handle = client.files.prepare_image(image_path)
- model = await client.llm.model("qwen2-vl-2b-instruct")
- chat = lms.Chat()
- chat.add_user_message("Describe this image please", images=[image_handle])
- prediction = await model.respond(chat)
-
-```
diff --git a/1_python/1_llm-prediction/image-input.mdx b/1_python/1_llm-prediction/image-input.mdx
new file mode 100644
index 0000000..ba2c10c
--- /dev/null
+++ b/1_python/1_llm-prediction/image-input.mdx
@@ -0,0 +1,133 @@
+---
+title: Image Input
+description: API for passing images as input to the model
+index: 2
+---
+
+import { Step, Steps } from "fumadocs-ui/components/steps";
+
+_Required Python SDK version_: **1.1.0**
+
+Some models, known as VLMs (Vision-Language Models), can accept images as input. You can pass images to the model using the `.respond()` method.
+
+### Prerequisite: Get a VLM (Vision-Language Model)
+
+If you don't yet have a VLM, you can download a model like `qwen2-vl-2b-instruct` using the following command:
+
+```bash
+lms get qwen2-vl-2b-instruct
+```
+
+
+
+ Instantiate the Model
+
+ Connect to LM Studio and obtain a handle to the VLM (Vision-Language Model) you want to use.
+
+ ```python tab="Python (convenience API)"
+ import lmstudio as lms
+
+ model = lms.llm("qwen2-vl-2b-instruct")
+ ```
+
+ ```python tab="Python (scoped resource API)"
+ import lmstudio as lms
+
+ with lms.Client() as client:
+ model = client.llm.model("qwen2-vl-2b-instruct")
+ ```
+
+ ```python tab="Python (asynchronous API)"
+ # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+ # Requires Python SDK version 1.5.0 or later
+ import lmstudio as lms
+
+ async with lms.AsyncClient() as client:
+ model = await client.llm.model("qwen2-vl-2b-instruct")
+ ```
+
+
+
+ Prepare the Image
+
+ Use the `prepare_image()` function or `files` namespace method to
+ get a handle to the image that can subsequently be passed to the model.
+
+ ```python tab="Python (convenience API)"
+ import lmstudio as lms
+
+ image_path = "/path/to/image.jpg" # Replace with the path to your image
+ image_handle = lms.prepare_image(image_path)
+ ```
+
+ ```python tab="Python (scoped resource API)"
+ import lmstudio as lms
+
+ with lms.Client() as client:
+ image_path = "/path/to/image.jpg" # Replace with the path to your image
+ image_handle = client.files.prepare_image(image_path)
+ ```
+
+ ```python tab="Python (asynchronous API)"
+ # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+ # Requires Python SDK version 1.5.0 or later
+ import lmstudio as lms
+
+ async with lms.AsyncClient() as client:
+ image_path = "/path/to/image.jpg" # Replace with the path to your image
+ image_handle = await client.files.prepare_image(image_path)
+ ```
+
+ If you only have the raw data of the image, you can supply the raw data directly as a bytes
+ object without having to write it to disk first. Due to this feature, binary filesystem
+ paths are _not_ supported (as they will be handled as malformed image data rather than as
+ filesystem paths).
+
+ Binary IO objects are also accepted as local file inputs.
+
+ The LM Studio server supports JPEG, PNG, and WebP image formats.
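+
+  For instance, a minimal sketch of supplying raw bytes instead of a path (assuming `image.jpg` contains a valid JPEG payload):
+
+  ```python
+  import lmstudio as lms
+
+  with open("/path/to/image.jpg", "rb") as f:
+      raw_data = f.read()  # bytes object holding the image data
+
+  image_handle = lms.prepare_image(raw_data)
+  ```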
+
+
+
+ Pass the Image to the Model in `.respond()`
+
+ Generate a prediction by passing the image to the model in the `.respond()` method.
+
+ ```python tab="Python (convenience API)"
+ import lmstudio as lms
+
+ image_path = "/path/to/image.jpg" # Replace with the path to your image
+ image_handle = lms.prepare_image(image_path)
+ model = lms.llm("qwen2-vl-2b-instruct")
+ chat = lms.Chat()
+ chat.add_user_message("Describe this image please", images=[image_handle])
+ prediction = model.respond(chat)
+ ```
+
+ ```python tab="Python (scoped resource API)"
+ import lmstudio as lms
+
+ with lms.Client() as client:
+ image_path = "/path/to/image.jpg" # Replace with the path to your image
+ image_handle = client.files.prepare_image(image_path)
+ model = client.llm.model("qwen2-vl-2b-instruct")
+ chat = lms.Chat()
+ chat.add_user_message("Describe this image please", images=[image_handle])
+ prediction = model.respond(chat)
+ ```
+
+ ```python tab="Python (asynchronous API)"
+ # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+ # Requires Python SDK version 1.5.0 or later
+ import lmstudio as lms
+
+ async with lms.AsyncClient() as client:
+ image_path = "/path/to/image.jpg" # Replace with the path to your image
+ image_handle = client.files.prepare_image(image_path)
+ model = await client.llm.model("qwen2-vl-2b-instruct")
+ chat = lms.Chat()
+ chat.add_user_message("Describe this image please", images=[image_handle])
+ prediction = await model.respond(chat)
+ ```
+
+
diff --git a/1_python/1_llm-prediction/meta.json b/1_python/1_llm-prediction/meta.json
new file mode 100644
index 0000000..d56764e
--- /dev/null
+++ b/1_python/1_llm-prediction/meta.json
@@ -0,0 +1,14 @@
+{
+ "title": "Basics",
+ "pages": [
+ "cancelling-predictions",
+ "chat-completion",
+ "completion",
+ "image-input",
+ "_index",
+ "parameters",
+ "speculative-decoding",
+ "structured-response",
+ "working-with-chats"
+ ]
+}
diff --git a/1_python/1_llm-prediction/parameters.md b/1_python/1_llm-prediction/parameters.md
index 4bc9e62..a17cc97 100644
--- a/1_python/1_llm-prediction/parameters.md
+++ b/1_python/1_llm-prediction/parameters.md
@@ -10,25 +10,19 @@ You can customize both inference-time and load-time parameters for your model. I
Set inference-time parameters such as `temperature`, `maxTokens`, `topP` and more.
-```lms_code_snippet
- variants:
- ".respond()":
- language: python
- code: |
- result = model.respond(chat, config={
- "temperature": 0.6,
- "maxTokens": 50,
- })
-
- ".complete()":
- language: python
- code: |
- result = model.complete(chat, config={
- "temperature": 0.6,
- "maxTokens": 50,
- "stopStrings": ["\n\n"],
- })
+```python tab=".respond()"
+result = model.respond(chat, config={
+ "temperature": 0.6,
+ "maxTokens": 50,
+})
+```
+```python tab=".complete()"
+result = model.complete(chat, config={
+ "temperature": 0.6,
+ "maxTokens": 50,
+ "stopStrings": ["\n\n"],
+})
```
See [`LLMPredictionConfigInput`](./../../typescript/api-reference/llm-prediction-config-input) in the
@@ -49,54 +43,47 @@ The `.model()` retrieves a handle to a model that has already been loaded, or lo
**Note**: if the model is already loaded, the given configuration will be **ignored**.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
+```python tab="Python (convenience API)"
+import lmstudio as lms
+
+model = lms.llm("qwen2.5-7b-instruct", config={
+ "contextLength": 8192,
+ "gpu": {
+ "ratio": 0.5,
+ }
+})
+```
- model = lms.llm("qwen2.5-7b-instruct", config={
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
+
+with lms.Client() as client:
+ model = client.llm.model(
+ "qwen2.5-7b-instruct",
+ config={
"contextLength": 8192,
"gpu": {
"ratio": 0.5,
}
- })
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- with lms.Client() as client:
- model = client.llm.model(
- "qwen2.5-7b-instruct",
- config={
- "contextLength": 8192,
- "gpu": {
- "ratio": 0.5,
- }
- }
- )
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- async with lms.AsyncClient() as client:
- model = await client.llm.model(
- "qwen2.5-7b-instruct",
- config={
- "contextLength": 8192,
- "gpu": {
- "ratio": 0.5,
- }
- }
- )
+ }
+ )
+```
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+
+async with lms.AsyncClient() as client:
+ model = await client.llm.model(
+ "qwen2.5-7b-instruct",
+ config={
+ "contextLength": 8192,
+ "gpu": {
+ "ratio": 0.5,
+ }
+ }
+ )
```
See [`LLMLoadModelConfig`](./../../typescript/api-reference/llm-load-model-config) in the
@@ -106,55 +93,48 @@ Typescript SDK documentation for all configurable fields.
The `.load_new_instance()` method creates a new model instance and loads it with the specified configuration.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
+```python tab="Python (convenience API)"
+import lmstudio as lms
- client = lms.get_default_client()
- model = client.llm.load_new_instance("qwen2.5-7b-instruct", config={
+client = lms.get_default_client()
+model = client.llm.load_new_instance("qwen2.5-7b-instruct", config={
+ "contextLength": 8192,
+ "gpu": {
+ "ratio": 0.5,
+ }
+})
+```
+
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
+
+with lms.Client() as client:
+ model = client.llm.load_new_instance(
+ "qwen2.5-7b-instruct",
+ config={
"contextLength": 8192,
"gpu": {
"ratio": 0.5,
}
- })
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- with lms.Client() as client:
- model = client.llm.load_new_instance(
- "qwen2.5-7b-instruct",
- config={
- "contextLength": 8192,
- "gpu": {
- "ratio": 0.5,
- }
- }
- )
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- async with lms.AsyncClient() as client:
- model = await client.llm.load_new_instance(
- "qwen2.5-7b-instruct",
- config={
- "contextLength": 8192,
- "gpu": {
- "ratio": 0.5,
- }
- }
- )
+ }
+ )
+```
+
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+
+async with lms.AsyncClient() as client:
+ model = await client.llm.load_new_instance(
+ "qwen2.5-7b-instruct",
+ config={
+ "contextLength": 8192,
+ "gpu": {
+ "ratio": 0.5,
+ }
+ }
+ )
```
See [`LLMLoadModelConfig`](./../../typescript/api-reference/llm-load-model-config) in the
diff --git a/1_python/1_llm-prediction/speculative-decoding.md b/1_python/1_llm-prediction/speculative-decoding.md
index 6f5c305..ef5f821 100644
--- a/1_python/1_llm-prediction/speculative-decoding.md
+++ b/1_python/1_llm-prediction/speculative-decoding.md
@@ -1,6 +1,6 @@
---
title: Speculative Decoding
-description: API to use a draft model in speculative decoding in `lmstudio-python`
+description: API to use a draft model in speculative decoding in lmstudio-python
index: 5
---
@@ -10,48 +10,42 @@ Speculative decoding is a technique that can substantially increase the generati
To use speculative decoding in `lmstudio-python`, simply provide a `draftModel` parameter when performing the prediction. You do not need to load the draft model separately.
-```lms_code_snippet
- variants:
- "Non-streaming":
- language: python
- code: |
- import lmstudio as lms
-
- main_model_key = "qwen2.5-7b-instruct"
- draft_model_key = "qwen2.5-0.5b-instruct"
-
- model = lms.llm(main_model_key)
- result = model.respond(
- "What are the prime numbers between 0 and 100?",
- config={
- "draftModel": draft_model_key,
- }
- )
-
- print(result)
- stats = result.stats
- print(f"Accepted {stats.accepted_draft_tokens_count}/{stats.predicted_tokens_count} tokens")
-
-
- Streaming:
- language: python
- code: |
- import lmstudio as lms
-
- main_model_key = "qwen2.5-7b-instruct"
- draft_model_key = "qwen2.5-0.5b-instruct"
-
- model = lms.llm(main_model_key)
- prediction_stream = model.respond_stream(
- "What are the prime numbers between 0 and 100?",
- config={
- "draftModel": draft_model_key,
- }
- )
- for fragment in prediction_stream:
- print(fragment.content, end="", flush=True)
- print() # Advance to a new line at the end of the response
-
- stats = prediction_stream.result().stats
- print(f"Accepted {stats.accepted_draft_tokens_count}/{stats.predicted_tokens_count} tokens")
+```python tab="Non-streaming"
+import lmstudio as lms
+
+main_model_key = "qwen2.5-7b-instruct"
+draft_model_key = "qwen2.5-0.5b-instruct"
+
+model = lms.llm(main_model_key)
+result = model.respond(
+ "What are the prime numbers between 0 and 100?",
+ config={
+ "draftModel": draft_model_key,
+ }
+)
+
+print(result)
+stats = result.stats
+print(f"Accepted {stats.accepted_draft_tokens_count}/{stats.predicted_tokens_count} tokens")
+```
+
+```python tab="Streaming"
+import lmstudio as lms
+
+main_model_key = "qwen2.5-7b-instruct"
+draft_model_key = "qwen2.5-0.5b-instruct"
+
+model = lms.llm(main_model_key)
+prediction_stream = model.respond_stream(
+ "What are the prime numbers between 0 and 100?",
+ config={
+ "draftModel": draft_model_key,
+ }
+)
+for fragment in prediction_stream:
+ print(fragment.content, end="", flush=True)
+print() # Advance to a new line at the end of the response
+
+stats = prediction_stream.result().stats
+print(f"Accepted {stats.accepted_draft_tokens_count}/{stats.predicted_tokens_count} tokens")
```
diff --git a/1_python/1_llm-prediction/structured-response.md b/1_python/1_llm-prediction/structured-response.md
index 3c6f910..2270f56 100644
--- a/1_python/1_llm-prediction/structured-response.md
+++ b/1_python/1_llm-prediction/structured-response.md
@@ -39,64 +39,53 @@ while `lmstudio.BaseModel` is a `msgspec.Struct` subclass that implements `.mode
#### Define a Class Based Schema
-```lms_code_snippet
- variants:
- "pydantic.BaseModel":
- language: python
- code: |
- from pydantic import BaseModel
-
- # A class based schema for a book
- class BookSchema(BaseModel):
- title: str
- author: str
- year: int
-
- "lmstudio.BaseModel":
- language: python
- code: |
- from lmstudio import BaseModel
-
- # A class based schema for a book
- class BookSchema(BaseModel):
- title: str
- author: str
- year: int
+```python tab="pydantic.BaseModel"
+from pydantic import BaseModel
+
+# A class based schema for a book
+class BookSchema(BaseModel):
+ title: str
+ author: str
+ year: int
+```
+
+```python tab="lmstudio.BaseModel"
+from lmstudio import BaseModel
+
+# A class based schema for a book
+class BookSchema(BaseModel):
+ title: str
+ author: str
+ year: int
```
#### Generate a Structured Response
-```lms_code_snippet
- variants:
- "Non-streaming":
- language: python
- code: |
- result = model.respond("Tell me about The Hobbit", response_format=BookSchema)
- book = result.parsed
-
- print(book)
- # ^
- # Note that `book` is correctly typed as { title: string, author: string, year: number }
-
- Streaming:
- language: python
- code: |
- prediction_stream = model.respond_stream("Tell me about The Hobbit", response_format=BookSchema)
-
- # Optionally stream the response
- # for fragment in prediction:
- # print(fragment.content, end="", flush=True)
- # print()
- # Note that even for structured responses, the *fragment* contents are still only text
-
- # Get the final structured result
- result = prediction_stream.result()
- book = result.parsed
-
- print(book)
- # ^
- # Note that `book` is correctly typed as { title: string, author: string, year: number }
+```python tab="Non-streaming"
+result = model.respond("Tell me about The Hobbit", response_format=BookSchema)
+book = result.parsed
+
+print(book)
+# ^
+# Note that `book` is correctly typed as { title: string, author: string, year: number }
+```
+
+```python tab="Streaming"
+prediction_stream = model.respond_stream("Tell me about The Hobbit", response_format=BookSchema)
+
+# Optionally stream the response
+# for fragment in prediction_stream:
+# print(fragment.content, end="", flush=True)
+# print()
+# Note that even for structured responses, the *fragment* contents are still only text
+
+# Get the final structured result
+result = prediction_stream.result()
+book = result.parsed
+
+print(book)
+# ^
+# Note that `book` is correctly typed as { title: string, author: string, year: number }
```
## Enforce Using a JSON Schema
@@ -120,36 +109,31 @@ schema = {
#### Generate a Structured Response
-```lms_code_snippet
- variants:
- "Non-streaming":
- language: python
- code: |
- result = model.respond("Tell me about The Hobbit", response_format=schema)
- book = result.parsed
-
- print(book)
- # ^
- # Note that `book` is correctly typed as { title: string, author: string, year: number }
-
- Streaming:
- language: python
- code: |
- prediction_stream = model.respond_stream("Tell me about The Hobbit", response_format=schema)
-
- # Stream the response
- for fragment in prediction:
- print(fragment.content, end="", flush=True)
- print()
- # Note that even for structured responses, the *fragment* contents are still only text
-
- # Get the final structured result
- result = prediction_stream.result()
- book = result.parsed
-
- print(book)
- # ^
- # Note that `book` is correctly typed as { title: string, author: string, year: number }
+```python tab="Non-streaming"
+result = model.respond("Tell me about The Hobbit", response_format=schema)
+book = result.parsed
+
+print(book)
+# ^
+# Note that `book` is correctly typed as { title: string, author: string, year: number }
+```
+
+```python tab="Streaming"
+prediction_stream = model.respond_stream("Tell me about The Hobbit", response_format=schema)
+
+# Stream the response
+for fragment in prediction_stream:
+ print(fragment.content, end="", flush=True)
+print()
+# Note that even for structured responses, the *fragment* contents are still only text
+
+# Get the final structured result
+result = prediction_stream.result()
+book = result.parsed
+
+print(book)
+# ^
+# Note that `book` is correctly typed as { title: string, author: string, year: number }
```
diff --git a/1_python/1_llm-prediction/working-with-chats.md b/1_python/1_llm-prediction/working-with-chats.md
index 3be7998..d88a710 100644
--- a/1_python/1_llm-prediction/working-with-chats.md
+++ b/1_python/1_llm-prediction/working-with-chats.md
@@ -12,12 +12,8 @@ There are a few ways to represent a chat when using the SDK.
If your chat only has one single user message, you can use a single string to represent the chat.
Here is an example with the `.respond` method.
-```lms_code_snippet
-variants:
- "Single string":
- language: python
- code: |
- prediction = llm.respond("What is the meaning of life?")
+```python
+prediction = llm.respond("What is the meaning of life?")
```
## Option 2: Using the `Chat` Helper Class
@@ -28,35 +24,25 @@ Here is an example with the `Chat` class, where the initial system prompt
is supplied when initializing the chat instance, and then the initial user
message is added via the corresponding method call.
-```lms_code_snippet
-variants:
- "Simple chat":
- language: python
- code: |
- chat = Chat("You are a resident AI philosopher.")
- chat.add_user_message("What is the meaning of life?")
+```python
+chat = Chat("You are a resident AI philosopher.")
+chat.add_user_message("What is the meaning of life?")
- prediction = llm.respond(chat)
+prediction = llm.respond(chat)
```
You can also quickly construct a `Chat` object using the `Chat.from_history` method.
-```lms_code_snippet
-variants:
- "Chat history data":
- language: python
- code: |
- chat = Chat.from_history({"messages": [
- { "role": "system", "content": "You are a resident AI philosopher." },
- { "role": "user", "content": "What is the meaning of life?" },
- ]})
-
- "Single string":
- language: python
- code: |
- # This constructs a chat with a single user message
- chat = Chat.from_history("What is the meaning of life?")
+```python tab="Chat history data"
+chat = Chat.from_history({"messages": [
+ { "role": "system", "content": "You are a resident AI philosopher." },
+ { "role": "user", "content": "What is the meaning of life?" },
+]})
+```
+```python tab="Single string"
+# This constructs a chat with a single user message
+chat = Chat.from_history("What is the meaning of life?")
```
## Option 3: Providing Chat History Data Directly
@@ -64,13 +50,9 @@ variants:
As the APIs that accept chat histories use `Chat.from_history` internally,
they also accept the chat history data format as a regular dictionary:
-```lms_code_snippet
-variants:
- "Chat history data":
- language: python
- code: |
- prediction = llm.respond({"messages": [
- { "role": "system", "content": "You are a resident AI philosopher." },
- { "role": "user", "content": "What is the meaning of life?" },
- ]})
+```python
+prediction = llm.respond({"messages": [
+ { "role": "system", "content": "You are a resident AI philosopher." },
+ { "role": "user", "content": "What is the meaning of life?" },
+]})
```
diff --git a/1_python/2_agent/act.md b/1_python/2_agent/act.md
index 3e90948..24878b9 100644
--- a/1_python/2_agent/act.md
+++ b/1_python/2_agent/act.md
@@ -1,6 +1,6 @@
---
-title: The `.act()` call
-description: How to use the `.act()` call to turn LLMs into autonomous agents that can perform tasks on your local machine.
+title: The .act() call
+description: How to use the .act() call to turn LLMs into autonomous agents that can perform tasks on your local machine.
index: 1
---
@@ -24,23 +24,19 @@ With this in mind, we say that the `.act()` API is an automatic "multi-round" to
### Quick Example
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- def multiply(a: float, b: float) -> float:
- """Given two numbers a and b. Returns the product of them."""
- return a * b
-
- model = lms.llm("qwen2.5-7b-instruct")
- model.act(
- "What is the result of 12345 multiplied by 54321?",
- [multiply],
- on_message=print,
- )
+```python
+import lmstudio as lms
+
+def multiply(a: float, b: float) -> float:
+    """Given two numbers a and b, returns their product."""
+ return a * b
+
+model = lms.llm("qwen2.5-7b-instruct")
+model.act(
+ "What is the result of 12345 multiplied by 54321?",
+ [multiply],
+ on_message=print,
+)
```
### What does it mean for an LLM to "use a tool"?
@@ -76,88 +72,79 @@ Some general guidance when selecting a model:
The following code demonstrates how to provide multiple tools in a single `.act()` call.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import math
- import lmstudio as lms
-
- def add(a: int, b: int) -> int:
- """Given two numbers a and b, returns the sum of them."""
- return a + b
-
- def is_prime(n: int) -> bool:
- """Given a number n, returns True if n is a prime number."""
- if n < 2:
- return False
- sqrt = int(math.sqrt(n))
- for i in range(2, sqrt):
- if n % i == 0:
- return False
- return True
-
- model = lms.llm("qwen2.5-7b-instruct")
- model.act(
- "Is the result of 12345 + 45668 a prime? Think step by step.",
- [add, is_prime],
- on_message=print,
- )
+```python
+import math
+import lmstudio as lms
+
+def add(a: int, b: int) -> int:
+ """Given two numbers a and b, returns the sum of them."""
+ return a + b
+
+def is_prime(n: int) -> bool:
+ """Given a number n, returns True if n is a prime number."""
+ if n < 2:
+ return False
+ sqrt = int(math.sqrt(n))
+ for i in range(2, sqrt):
+ if n % i == 0:
+ return False
+ return True
+
+model = lms.llm("qwen2.5-7b-instruct")
+model.act(
+ "Is the result of 12345 + 45668 a prime? Think step by step.",
+ [add, is_prime],
+ on_message=print,
+)
```
### Example: Chat Loop with Create File Tool
The following code creates a conversation loop with an LLM agent that can create files.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import readline # Enables input line editing
- from pathlib import Path
-
- import lmstudio as lms
-
- def create_file(name: str, content: str):
- """Create a file with the given name and content."""
- dest_path = Path(name)
- if dest_path.exists():
- return "Error: File already exists."
- try:
- dest_path.write_text(content, encoding="utf-8")
- except Exception as exc:
- return "Error: {exc!r}"
- return "File created."
-
- def print_fragment(fragment, round_index=0):
- # .act() supplies the round index as the second parameter
- # Setting a default value means the callback is also
- # compatible with .complete() and .respond().
- print(fragment.content, end="", flush=True)
-
- model = lms.llm()
- chat = lms.Chat("You are a task focused AI assistant")
-
- while True:
- try:
- user_input = input("You (leave blank to exit): ")
- except EOFError:
- print()
- break
- if not user_input:
- break
- chat.add_user_message(user_input)
- print("Bot: ", end="", flush=True)
- model.act(
- chat,
- [create_file],
- on_message=chat.append,
- on_prediction_fragment=print_fragment,
- )
- print()
-
+```python
+import readline # Enables input line editing
+from pathlib import Path
+
+import lmstudio as lms
+
+def create_file(name: str, content: str):
+ """Create a file with the given name and content."""
+ dest_path = Path(name)
+ if dest_path.exists():
+ return "Error: File already exists."
+ try:
+ dest_path.write_text(content, encoding="utf-8")
+ except Exception as exc:
+        return f"Error: {exc!r}"
+ return "File created."
+
+def print_fragment(fragment, round_index=0):
+ # .act() supplies the round index as the second parameter
+ # Setting a default value means the callback is also
+ # compatible with .complete() and .respond().
+ print(fragment.content, end="", flush=True)
+
+model = lms.llm()
+chat = lms.Chat("You are a task focused AI assistant")
+
+while True:
+ try:
+ user_input = input("You (leave blank to exit): ")
+ except EOFError:
+ print()
+ break
+ if not user_input:
+ break
+ chat.add_user_message(user_input)
+ print("Bot: ", end="", flush=True)
+ model.act(
+ chat,
+ [create_file],
+ on_message=chat.append,
+ on_prediction_fragment=print_fragment,
+ )
+ print()
```
### Progress Callbacks
diff --git a/1_python/2_agent/meta.json b/1_python/2_agent/meta.json
new file mode 100644
index 0000000..64a4fe7
--- /dev/null
+++ b/1_python/2_agent/meta.json
@@ -0,0 +1,8 @@
+{
+ "title": "Agentic Flows",
+ "pages": [
+ "act",
+ "_index",
+ "tools"
+ ]
+}
diff --git a/1_python/2_agent/tools.md b/1_python/2_agent/tools.md
index fa44073..70fa818 100644
--- a/1_python/2_agent/tools.md
+++ b/1_python/2_agent/tools.md
@@ -1,6 +1,6 @@
---
title: Tool Definition
-description: Define tools to be called by the LLM, and pass them to the model in the `act()` call.
+description: Define tools to be called by the LLM, and pass them to the model in the act() call.
index: 2
---
@@ -13,58 +13,51 @@ name and description passed to the language model.
Follow one of the following examples to define functions as tools (the first approach
is typically going to be the most convenient):
-```lms_code_snippet
- variants:
- "Python function":
- language: python
- code: |
- # Type hinted functions with clear names and docstrings
- # may be used directly as tool definitions
- def add(a: int, b: int) -> int:
- """Given two numbers a and b, returns the sum of them."""
- # The SDK ensures arguments are coerced to their specified types
- return a + b
-
- # Pass `add` directly to `act()` as a tool definition
-
- "ToolFunctionDef.from_callable":
- language: python
- code: |
- from lmstudio import ToolFunctionDef
-
- def cryptic_name(a: int, b: int) -> int:
- return a + b
-
- # Type hinted functions with cryptic names and missing or poor docstrings
- # can be turned into clear tool definitions with `from_callable`
- tool_def = ToolFunctionDef.from_callable(
- cryptic_name,
- name="add",
- description="Given two numbers a and b, returns the sum of them."
- )
- # Pass `tool_def` to `act()` as a tool definition
-
- "ToolFunctionDef":
- language: python
- code: |
- from lmstudio import ToolFunctionDef
-
- def cryptic_name(a, b):
- return a + b
-
- # Functions without type hints can be used without wrapping them
- # at runtime by defining a tool function directly.
- tool_def = ToolFunctionDef(
- name="add",
- description="Given two numbers a and b, returns the sum of them.",
- parameters={
- "a": int,
- "b": int,
- },
- implementation=cryptic_name,
- )
- # Pass `tool_def` to `act()` as a tool definition
+```python tab="Python function"
+# Type hinted functions with clear names and docstrings
+# may be used directly as tool definitions
+def add(a: int, b: int) -> int:
+ """Given two numbers a and b, returns the sum of them."""
+ # The SDK ensures arguments are coerced to their specified types
+ return a + b
+
+# Pass `add` directly to `act()` as a tool definition
+```
+
+```python tab="ToolFunctionDef.from_callable"
+from lmstudio import ToolFunctionDef
+
+def cryptic_name(a: int, b: int) -> int:
+ return a + b
+
+# Type hinted functions with cryptic names and missing or poor docstrings
+# can be turned into clear tool definitions with `from_callable`
+tool_def = ToolFunctionDef.from_callable(
+ cryptic_name,
+ name="add",
+ description="Given two numbers a and b, returns the sum of them."
+)
+# Pass `tool_def` to `act()` as a tool definition
+```
+
+```python tab="ToolFunctionDef"
+from lmstudio import ToolFunctionDef
+
+def cryptic_name(a, b):
+ return a + b
+
+# Functions without type hints can be used without wrapping them
+# at runtime by defining a tool function directly.
+tool_def = ToolFunctionDef(
+ name="add",
+ description="Given two numbers a and b, returns the sum of them.",
+ parameters={
+ "a": int,
+ "b": int,
+ },
+ implementation=cryptic_name,
+)
+# Pass `tool_def` to `act()` as a tool definition
```
**Important**: The tool name, description, and the parameter definitions are all passed to the model!
@@ -80,43 +73,32 @@ can essentially turn your LLMs into autonomous agents that can perform tasks on
### Tool Definition
-```lms_code_snippet
- title: "create_file_tool.py"
- variants:
- Python:
- language: python
- code: |
- from pathlib import Path
-
- def create_file(name: str, content: str):
- """Create a file with the given name and content."""
- dest_path = Path(name)
- if dest_path.exists():
- return "Error: File already exists."
- try:
- dest_path.write_text(content, encoding="utf-8")
- except Exception as exc:
- return "Error: {exc!r}"
- return "File created."
-
+```python title="create_file_tool.py"
+from pathlib import Path
+
+def create_file(name: str, content: str):
+ """Create a file with the given name and content."""
+ dest_path = Path(name)
+ if dest_path.exists():
+ return "Error: File already exists."
+ try:
+ dest_path.write_text(content, encoding="utf-8")
+ except Exception as exc:
+        return f"Error: {exc!r}"
+ return "File created."
```
### Example code using the `create_file` tool:
-```lms_code_snippet
- title: "example.py"
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
- from create_file_tool import create_file
-
- model = lms.llm("qwen2.5-7b-instruct")
- model.act(
- "Please create a file named output.txt with your understanding of the meaning of life.",
- [create_file],
- )
+```python title="example.py"
+import lmstudio as lms
+from create_file_tool import create_file
+
+model = lms.llm("qwen2.5-7b-instruct")
+model.act(
+ "Please create a file named output.txt with your understanding of the meaning of life.",
+ [create_file],
+)
```
## Handling tool calling errors
@@ -134,34 +116,29 @@ This error handling behaviour can be overridden using the `handle_invalid_tool_r
callback. For example, the following code reverts the error handling back to raising
exceptions locally in the client:
-```lms_code_snippet
- title: "example.py"
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- def divide(numerator: float, denominator: float) -> float:
- """Divide the given numerator by the given denominator. Return the result."""
- return numerator / denominator
-
- model = lms.llm("qwen2.5-7b-instruct")
- chat = Chat()
- chat.add_user_message(
- "Attempt to divide 1 by 0 using the tool. Explain the result."
- )
-
- def _raise_exc_in_client(
- exc: LMStudioPredictionError, request: ToolCallRequest | None
- ) -> None:
- raise exc
-
- act_result = llm.act(
- chat,
- [divide],
- handle_invalid_tool_request=_raise_exc_in_client,
- )
+```python title="example.py"
+import lmstudio as lms
+from lmstudio import LMStudioPredictionError, ToolCallRequest
+
+def divide(numerator: float, denominator: float) -> float:
+ """Divide the given numerator by the given denominator. Return the result."""
+ return numerator / denominator
+
+model = lms.llm("qwen2.5-7b-instruct")
+chat = lms.Chat()
+chat.add_user_message(
+ "Attempt to divide 1 by 0 using the tool. Explain the result."
+)
+
+def _raise_exc_in_client(
+ exc: LMStudioPredictionError, request: ToolCallRequest | None
+) -> None:
+ raise exc
+
+act_result = model.act(
+ chat,
+ [divide],
+ handle_invalid_tool_request=_raise_exc_in_client,
+)
```
When a tool request is passed in, the callback results are processed as follows:
diff --git a/1_python/3_embedding/index.md b/1_python/3_embedding/index.md
index 7a7a022..3e80f1a 100644
--- a/1_python/3_embedding/index.md
+++ b/1_python/3_embedding/index.md
@@ -18,16 +18,10 @@ lms get nomic-ai/nomic-embed-text-v1.5
To convert a string to a vector representation, pass it to the `embed` method on the corresponding embedding model handle.
-```lms_code_snippet
- title: "example.py"
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
+```python title="example.py"
+import lmstudio as lms
- model = lms.embedding_model("nomic-embed-text-v1.5")
-
- embedding = model.embed("Hello, world!")
+model = lms.embedding_model("nomic-embed-text-v1.5")
+embedding = model.embed("Hello, world!")
```
diff --git a/1_python/4_tokenization/index.md b/1_python/4_tokenization/index.md
index ec71f6f..7121855 100644
--- a/1_python/4_tokenization/index.md
+++ b/1_python/4_tokenization/index.md
@@ -12,31 +12,23 @@ You can tokenize a string with a loaded LLM or embedding model using the SDK.
In the below examples, the LLM reference can be replaced with an
embedding model reference without requiring any other changes.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
+```python
+import lmstudio as lms
- model = lms.llm()
+model = lms.llm()
- tokens = model.tokenize("Hello, world!")
+tokens = model.tokenize("Hello, world!")
- print(tokens) # Array of token IDs.
+print(tokens) # Array of token IDs.
```
## Count tokens
If you only care about the number of tokens, simply check the length of the resulting array.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- token_count = len(model.tokenize("Hello, world!"))
- print("Token count:", token_count)
+```python
+token_count = len(model.tokenize("Hello, world!"))
+print("Token count:", token_count)
```
### Example: count context
@@ -47,32 +39,27 @@ You can determine if a given conversation fits into a model's context by doing t
2. Count the number of tokens in the string.
3. Compare the token count to the model's context length.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- def does_chat_fit_in_context(model: lms.LLM, chat: lms.Chat) -> bool:
- # Convert the conversation to a string using the prompt template.
- formatted = model.apply_prompt_template(chat)
- # Count the number of tokens in the string.
- token_count = len(model.tokenize(formatted))
- # Get the current loaded context length of the model
- context_length = model.get_context_length()
- return token_count < context_length
-
- model = lms.llm()
-
- chat = lms.Chat.from_history({
- "messages": [
- { "role": "user", "content": "What is the meaning of life." },
- { "role": "assistant", "content": "The meaning of life is..." },
- # ... More messages
- ]
- })
-
- print("Fits in context:", does_chat_fit_in_context(model, chat))
-
+```python
+import lmstudio as lms
+
+def does_chat_fit_in_context(model: lms.LLM, chat: lms.Chat) -> bool:
+ # Convert the conversation to a string using the prompt template.
+ formatted = model.apply_prompt_template(chat)
+ # Count the number of tokens in the string.
+ token_count = len(model.tokenize(formatted))
+ # Get the current loaded context length of the model
+ context_length = model.get_context_length()
+ return token_count < context_length
+
+model = lms.llm()
+
+chat = lms.Chat.from_history({
+ "messages": [
+        { "role": "user", "content": "What is the meaning of life?" },
+ { "role": "assistant", "content": "The meaning of life is..." },
+ # ... More messages
+ ]
+})
+
+print("Fits in context:", does_chat_fit_in_context(model, chat))
```
diff --git a/1_python/5_manage-models/_download-models.md b/1_python/5_manage-models/_download-models.md
index 43ef689..71234de 100644
--- a/1_python/5_manage-models/_download-models.md
+++ b/1_python/5_manage-models/_download-models.md
@@ -22,35 +22,31 @@ Downloading models consists of three steps:
TODO: Verify the Python translation below against the Python SDK (the repository search API names are assumed from the TypeScript SDK)
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
-
- const client = new LMStudioClient()
-
- # 1. Search for the model you want
- # Specify any/all of searchTerm, limit, compatibilityTypes
- const searchResults = client.repository.searchModels({
- searchTerm: "llama 3.2 1b", # Search for Llama 3.2 1B
- limit: 5, # Get top 5 results
- compatibilityTypes: ["gguf"], # Only download GGUFs
- })
-
- # 2. Find download options
- const bestResult = searchResults[0];
- const downloadOptions = bestResult.getDownloadOptions()
-
- # Let's download Q4_K_M, a good middle ground quantization
- const desiredModel = downloadOptions.find(option => option.quantization === 'Q4_K_M')
-
- # 3. Download it!
- const modelKey = desiredModel.download()
-
- # This returns a path you can use to load the model
- const loadedModel = client.llm.model(modelKey)
+```python
+# NOTE: sketched translation of the original TypeScript example; the
+# repository search method and option attribute names are assumed
+# (snake_case) and should be verified against the Python SDK.
+import lmstudio as lms
+
+client = lms.get_default_client()
+
+# 1. Search for the model you want
+# Specify any/all of search_term, limit, compatibility_types
+search_results = client.repository.search_models(
+    search_term="llama 3.2 1b",    # Search for Llama 3.2 1B
+    limit=5,                       # Get top 5 results
+    compatibility_types=["gguf"],  # Only download GGUFs
+)
+
+# 2. Find download options
+best_result = search_results[0]
+download_options = best_result.get_download_options()
+
+# Let's download Q4_K_M, a good middle ground quantization
+desired_model = next(
+    option for option in download_options if option.quantization == "Q4_K_M"
+)
+
+# 3. Download it!
+model_key = desired_model.download()
+
+# The returned model key can be used to load the model
+loaded_model = client.llm.model(model_key)
```
## Advanced Usage
@@ -64,51 +60,44 @@ If you want to get updates on the progress of this process, you can provide call
one for progress updates and/or one when the download is being finalized
(validating checksums, etc.)
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import { LMStudioClient, type DownloadProgressUpdate } from "@lmstudio/sdk";
-
- function printProgressUpdate(update: DownloadProgressUpdate) {
- process.stdout.write(`Downloaded ${update.downloadedBytes} bytes of ${update.totalBytes} total \
- at ${update.speed_bytes_per_second} bytes/sec`)
- }
+```python tab="Python (convenience API)"
+import lmstudio as lms
- const client = new LMStudioClient()
+
+# Assumed Python counterpart of the original TypeScript callback,
+# using the documented lms.DownloadProgressUpdate fields
+def print_progress_update(update: lms.DownloadProgressUpdate) -> None:
+    print(f"Downloaded {update.downloaded_bytes} bytes of {update.total_bytes} total \
+at {update.speed_bytes_per_second} bytes/sec")
- # ... Same code as before ...
+
+client = lms.get_default_client()
- modelKey = desiredModel.download({
- onProgress: printProgressUpdate,
- onStartFinalizing: () => console.log("Finalizing..."),
- })
+
+# ... Same code as before ...
- const loadedModel = client.llm.model(modelKey)
+
+model_key = desired_model.download(
+    on_progress=print_progress_update,
+    on_finalize=lambda: print("Finalizing..."),
+)
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
+
+loaded_model = client.llm.model(model_key)
+```
- def print_progress_update(update: lmstudio.DownloadProgressUpdate) -> None:
- print(f"Downloaded {update.downloaded_bytes} bytes of {update.total_bytes} total \
- at {update.speed_bytes_per_second} bytes/sec")
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
- with lms.Client() as client:
- # ... Same code as before ...
+def print_progress_update(update: lms.DownloadProgressUpdate) -> None:
+ print(f"Downloaded {update.downloaded_bytes} bytes of {update.total_bytes} total \
+ at {update.speed_bytes_per_second} bytes/sec")
- model_key = desired_model.download(
- on_progress=print_progress_update,
- on_finalize: lambda: print("Finalizing download...")
- )
+with lms.Client() as client:
+ # ... Same code as before ...
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
+
+    model_key = desired_model.download(
+        on_progress=print_progress_update,
+        on_finalize=lambda: print("Finalizing download..."),
+    )
+```
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
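+
+# A sketch, assuming the async API mirrors the scoped example above
+# with awaitable download calls (unverified against the SDK):
+async with lms.AsyncClient() as client:
+    # ... Same code as before ...
+
+    model_key = await desired_model.download(
+        on_progress=print_progress_update,
+        on_finalize=lambda: print("Finalizing download..."),
+    )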
```
diff --git a/1_python/5_manage-models/list-downloaded.md b/1_python/5_manage-models/list-downloaded.md
index 6f09c1f..9035458 100644
--- a/1_python/5_manage-models/list-downloaded.md
+++ b/1_python/5_manage-models/list-downloaded.md
@@ -10,48 +10,41 @@ downloaded model reference to be converted in the full SDK handle for a loaded m
## Available Models on the LM Studio Server
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
+```python tab="Python (convenience API)"
+import lmstudio as lms
- downloaded = lms.list_downloaded_models()
- llm_only = lms.list_downloaded_models("llm")
- embedding_only = lms.list_downloaded_models("embedding")
+downloaded = lms.list_downloaded_models()
+llm_only = lms.list_downloaded_models("llm")
+embedding_only = lms.list_downloaded_models("embedding")
- for model in downloaded:
- print(model)
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
+
+for model in downloaded:
+ print(model)
+```
- with lms.Client() as client:
- downloaded = client.list_downloaded_models()
- llm_only = client.llm.list_downloaded()
- embedding_only = client.embedding.list_downloaded()
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
- for model in downloaded:
- print(model)
+with lms.Client() as client:
+ downloaded = client.list_downloaded_models()
+ llm_only = client.llm.list_downloaded()
+ embedding_only = client.embedding.list_downloaded()
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
+
+for model in downloaded:
+ print(model)
+```
- async with lms.AsyncClient() as client:
- downloaded = await client.list_downloaded_models()
- llm_only = await client.llm.list_downloaded()
- embedding_only = await client.embedding.list_downloaded()
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
- for model in downloaded:
- print(model)
+async with lms.AsyncClient() as client:
+ downloaded = await client.list_downloaded_models()
+ llm_only = await client.llm.list_downloaded()
+ embedding_only = await client.embedding.list_downloaded()
+
+for model in downloaded:
+ print(model)
```
This will give you results equivalent to using [`lms ls`](../../cli/ls) in the CLI.
diff --git a/1_python/5_manage-models/list-loaded.md b/1_python/5_manage-models/list-loaded.md
index e41ebd2..bb32254 100644
--- a/1_python/5_manage-models/list-loaded.md
+++ b/1_python/5_manage-models/list-loaded.md
@@ -11,43 +11,36 @@ The results are full SDK model handles, allowing access to all model functionali
This will give you results equivalent to using [`lms ps`](../../cli/ps) in the CLI.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
+```python tab="Python (convenience API)"
+import lmstudio as lms
- all_loaded_models = lms.list_loaded_models()
- llm_only = lms.list_loaded_models("llm")
- embedding_only = lms.list_loaded_models("embedding")
+all_loaded_models = lms.list_loaded_models()
+llm_only = lms.list_loaded_models("llm")
+embedding_only = lms.list_loaded_models("embedding")
- print(all_loaded_models)
-
- Python (scoped resource API):
- language: python
- code: |
- import lms
+print(all_loaded_models)
+```
- with lms.Client() as client:
- all_loaded_models = client.list_loaded_models()
- llm_only = client.llm.list_loaded()
- embedding_only = client.embedding.list_loaded()
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
- print(all_loaded_models)
+with lms.Client() as client:
+ all_loaded_models = client.list_loaded_models()
+ llm_only = client.llm.list_loaded()
+ embedding_only = client.embedding.list_loaded()
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
+ print(all_loaded_models)
+```
- async with lms.AsyncClient() as client:
- all_loaded_models = await client.list_loaded_models()
- llm_only = await client.llm.list_loaded()
- embedding_only = await client.embedding.list_loaded()
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
- print(all_loaded_models)
+async with lms.AsyncClient() as client:
+ all_loaded_models = await client.list_loaded_models()
+ llm_only = await client.llm.list_loaded()
+ embedding_only = await client.embedding.list_loaded()
+ print(all_loaded_models)
```
diff --git a/1_python/5_manage-models/loading.md b/1_python/5_manage-models/loading.md
deleted file mode 100644
index 6a43ddf..0000000
--- a/1_python/5_manage-models/loading.md
+++ /dev/null
@@ -1,232 +0,0 @@
----
-title: "Manage Models in Memory"
-sidebar_title: Load and Access Models
-description: APIs to load, access, and unload models from memory
----
-
-AI models are huge. It can take a while to load them into memory. LM Studio's SDK allows you to precisely control this process.
-
-**Model namespaces:**
-
-- LLMs are accessed through the `client.llm` namespace
-- Embedding models are accessed through the `client.embedding` namespace
-- `lmstudio.llm` is equivalent to `client.llm.model` on the default client
-- `lmstudio.embedding_model` is equivalent to `client.embedding.model` on the default client
-
-**Most commonly:**
-
-- Use `.model()` to get any currently loaded model
-- Use `.model("model-key")` to use a specific model
-
-**Advanced (manual model management):**
-
-- Use `.load_new_instance("model-key")` to load a new instance of a model
-- Use `.unload("model-key")` or `model_handle.unload()` to unload a model from memory
-
-## Get the Current Model with `.model()`
-
-If you already have a model loaded in LM Studio (either via the GUI or `lms load`),
-you can use it by calling `.model()` without any arguments.
-
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- model = lms.llm()
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- with lms.Client() as client:
- model = client.llm.model()
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- async with lms.AsyncClient() as client:
- model = await client.llm.model()
-
-```
-
-## Get a Specific Model with `.model("model-key")`
-
-If you want to use a specific model, you can provide the model key as an argument to `.model()`.
-
-#### Get if Loaded, or Load if not
-
-Calling `.model("model-key")` will load the model if it's not already loaded, or return the existing instance if it is.
-
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- model = lms.llm("qwen/qwen3-4b-2507")
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- with lms.Client() as client:
- model = client.llm.model("qwen/qwen3-4b-2507")
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- async with lms.AsyncClient() as client:
- model = await client.llm.model("qwen/qwen3-4b-2507")
-
-```
-
-
-
-## Load a New Instance of a Model with `.load_new_instance()`
-
-Use `load_new_instance()` to load a new instance of a model, even if one already exists.
-This allows you to have multiple instances of the same or different models loaded at the same time.
-
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- client = lms.get_default_client()
- model = client.llm.load_new_instance("qwen/qwen3-4b-2507")
- another_model = client.llm.load_new_instance("qwen/qwen3-4b-2507", "my-second-model")
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- with lms.Client() as client:
- model = client.llm.load_new_instance("qwen/qwen3-4b-2507")
- another_model = client.llm.load_new_instance("qwen/qwen3-4b-2507", "my-second-model")
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- async with lms.AsyncClient() as client:
- model = await client.llm.load_new_instance("qwen/qwen3-4b-2507")
- another_model = await client.llm.load_new_instance("qwen/qwen3-4b-2507", "my-second-model")
-
-```
-
-
-
-### Note about Instance Identifiers
-
-If you provide an instance identifier that already exists, the server will throw an error.
-So if you don't really care, it's safer to not provide an identifier, in which case
-the server will generate one for you. You can always check in the server tab in LM Studio, too!
-
-## Unload a Model from Memory with `.unload()`
-
-Once you no longer need a model, you can unload it by simply calling `unload()` on its handle.
-
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- model = lms.llm()
- model.unload()
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- with lms.Client() as client:
- model = client.llm.model()
- model.unload()
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- async with lms.AsyncClient() as client:
- model = await client.llm.model()
- await model.unload()
-
-```
-
-## Set Custom Load Config Parameters
-
-You can also specify the same load-time configuration options when loading a model, such as Context Length and GPU offload.
-
-See [load-time configuration](../llm-prediction/parameters) for more.
-
-## Set an Auto Unload Timer (TTL)
-
-You can specify a _time to live_ for a model you load, which is the idle time (in seconds)
-after the last request until the model unloads. See [Idle TTL](/docs/app/api/ttl-and-auto-evict) for more on this.
-
-```lms_protip
-If you specify a TTL to `model()`, it will only apply if `model()` loads
-a new instance, and will _not_ retroactively change the TTL of an existing instance.
-```
-
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- model = lms.llm("qwen/qwen3-4b-2507", ttl=3600)
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- with lms.Client() as client:
- model = client.llm.model("qwen/qwen3-4b-2507", ttl=3600)
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- async with lms.AsyncClient() as client:
- model = await client.llm.model("qwen/qwen3-4b-2507", ttl=3600)
-
-```
-
-
diff --git a/1_python/5_manage-models/loading.mdx b/1_python/5_manage-models/loading.mdx
new file mode 100644
index 0000000..9fdf941
--- /dev/null
+++ b/1_python/5_manage-models/loading.mdx
@@ -0,0 +1,185 @@
+---
+title: "Manage Models in Memory"
+sidebar_title: Load and Access Models
+description: APIs to load, access, and unload models from memory
+---
+
+AI models are huge. It can take a while to load them into memory. LM Studio's SDK allows you to precisely control this process.
+
+**Model namespaces:**
+
+- LLMs are accessed through the `client.llm` namespace
+- Embedding models are accessed through the `client.embedding` namespace
+- `lmstudio.llm` is equivalent to `client.llm.model` on the default client
+- `lmstudio.embedding_model` is equivalent to `client.embedding.model` on the default client
+
+**Most commonly:**
+
+- Use `.model()` to get any currently loaded model
+- Use `.model("model-key")` to use a specific model
+
+**Advanced (manual model management):**
+
+- Use `.load_new_instance("model-key")` to load a new instance of a model
+- Use `.unload("model-key")` or `model_handle.unload()` to unload a model from memory
+
+## Get the Current Model with `.model()`
+
+If you already have a model loaded in LM Studio (either via the GUI or `lms load`),
+you can use it by calling `.model()` without any arguments.
+
+```python tab="Python (convenience API)"
+import lmstudio as lms
+
+model = lms.llm()
+```
+
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
+
+with lms.Client() as client:
+ model = client.llm.model()
+```
+
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+
+async with lms.AsyncClient() as client:
+ model = await client.llm.model()
+```
+
+## Get a Specific Model with `.model("model-key")`
+
+If you want to use a specific model, you can provide the model key as an argument to `.model()`.
+
+#### Get if Loaded, or Load if Not
+
+Calling `.model("model-key")` will load the model if it's not already loaded, or return the existing instance if it is.
+
+```python tab="Python (convenience API)"
+import lmstudio as lms
+
+model = lms.llm("qwen/qwen3-4b-2507")
+```
+
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
+
+with lms.Client() as client:
+ model = client.llm.model("qwen/qwen3-4b-2507")
+```
+
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+
+async with lms.AsyncClient() as client:
+ model = await client.llm.model("qwen/qwen3-4b-2507")
+```
+
+## Load a New Instance of a Model with `.load_new_instance()`
+
+Use `load_new_instance()` to load a new instance of a model, even if one already exists.
+This allows you to have multiple instances of the same or different models loaded at the same time.
+
+```python tab="Python (convenience API)"
+import lmstudio as lms
+
+client = lms.get_default_client()
+model = client.llm.load_new_instance("qwen/qwen3-4b-2507")
+another_model = client.llm.load_new_instance("qwen/qwen3-4b-2507", "my-second-model")
+```
+
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
+
+with lms.Client() as client:
+ model = client.llm.load_new_instance("qwen/qwen3-4b-2507")
+ another_model = client.llm.load_new_instance("qwen/qwen3-4b-2507", "my-second-model")
+```
+
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+
+async with lms.AsyncClient() as client:
+ model = await client.llm.load_new_instance("qwen/qwen3-4b-2507")
+ another_model = await client.llm.load_new_instance("qwen/qwen3-4b-2507", "my-second-model")
+```
+
+### Note about Instance Identifiers
+
+If you provide an instance identifier that already exists, the server will throw an error.
+If you don't need a specific identifier, it's safer not to provide one; in that case,
+the server will generate one for you. You can always check in the server tab in LM Studio, too!
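+
+For example (a minimal sketch, reusing the client pattern from above):
+
+```python
+import lmstudio as lms
+
+client = lms.get_default_client()
+model = client.llm.load_new_instance("qwen/qwen3-4b-2507", "my-model")
+
+# Reusing the same identifier would raise an error:
+# client.llm.load_new_instance("qwen/qwen3-4b-2507", "my-model")
+
+# Omitting the identifier lets the server generate a unique one:
+another_model = client.llm.load_new_instance("qwen/qwen3-4b-2507")
+```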
+
+## Unload a Model from Memory with `.unload()`
+
+Once you no longer need a model, you can unload it by simply calling `unload()` on its handle.
+
+```python tab="Python (convenience API)"
+import lmstudio as lms
+
+model = lms.llm()
+model.unload()
+```
+
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
+
+with lms.Client() as client:
+ model = client.llm.model()
+ model.unload()
+```
+
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+
+async with lms.AsyncClient() as client:
+ model = await client.llm.model()
+ await model.unload()
+```
+
+## Set Custom Load Config Parameters
+
+You can also specify the same load-time configuration options when loading a model, such as Context Length and GPU offload.
+
+See [load-time configuration](../llm-prediction/parameters) for more.
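+
+For example, a minimal sketch, assuming `lms.llm()` accepts a `config` mapping with the camelCase fields documented on the load-time configuration page:
+
+```python
+import lmstudio as lms
+
+# Illustrative values; any load-time configuration field can go in `config`
+model = lms.llm("qwen/qwen3-4b-2507", config={
+    "contextLength": 8192,
+    "gpu": {"ratio": 0.5},
+})
+```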
+
+## Set an Auto Unload Timer (TTL)
+
+You can specify a _time to live_ for a model you load, which is the idle time (in seconds)
+after the last request until the model unloads. See [Idle TTL](/docs/app/api/ttl-and-auto-evict) for more on this.
+
+
+If you specify a TTL to `model()`, it will only apply if `model()` loads
+a new instance, and will _not_ retroactively change the TTL of an existing instance.
+
+
+```python tab="Python (convenience API)"
+import lmstudio as lms
+
+model = lms.llm("qwen/qwen3-4b-2507", ttl=3600)
+```
+
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
+
+with lms.Client() as client:
+ model = client.llm.model("qwen/qwen3-4b-2507", ttl=3600)
+```
+
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+
+async with lms.AsyncClient() as client:
+ model = await client.llm.model("qwen/qwen3-4b-2507", ttl=3600)
+```
diff --git a/1_python/5_manage-models/meta.json b/1_python/5_manage-models/meta.json
new file mode 100644
index 0000000..9f1b188
--- /dev/null
+++ b/1_python/5_manage-models/meta.json
@@ -0,0 +1,9 @@
+{
+ "title": "Manage Models",
+ "pages": [
+ "_download-models",
+ "list-downloaded",
+ "list-loaded",
+ "loading"
+ ]
+}
diff --git a/1_python/6_model-info/get-context-length.md b/1_python/6_model-info/get-context-length.md
index 4590a76..3515023 100644
--- a/1_python/6_model-info/get-context-length.md
+++ b/1_python/6_model-info/get-context-length.md
@@ -9,13 +9,8 @@ LLMs and embedding models, due to their fundamental architecture, have a propert
It's useful to be able to check the context length of a model, especially as an extra check before providing potentially long input to the model.
-```lms_code_snippet
- title: "example.py"
- variants:
- "Python (convenience API)":
- language: python
- code: |
- context_length = model.get_context_length()
+```python title="example.py"
+context_length = model.get_context_length()
```
The `model` in the above code snippet is an instance of a loaded model you get from the `llm.model` method. See [Manage Models in Memory](../manage-models/loading) for more information.
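+
+For instance, a minimal sketch using the convenience API to obtain such a handle:
+
+```python
+import lmstudio as lms
+
+model = lms.llm()  # any currently loaded LLM
+print(model.get_context_length())
+```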
@@ -28,32 +23,27 @@ You can determine if a given conversation fits into a model's context by doing t
2. Count the number of tokens in the string.
3. Compare the token count to the model's context length.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- def does_chat_fit_in_context(model: lms.LLM, chat: lms.Chat) -> bool:
- # Convert the conversation to a string using the prompt template.
- formatted = model.apply_prompt_template(chat)
- # Count the number of tokens in the string.
- token_count = len(model.tokenize(formatted))
- # Get the current loaded context length of the model
- context_length = model.get_context_length()
- return token_count < context_length
-
- model = lms.llm()
-
- chat = lms.Chat.from_history({
- "messages": [
- { "role": "user", "content": "What is the meaning of life." },
- { "role": "assistant", "content": "The meaning of life is..." },
- # ... More messages
- ]
- })
-
- print("Fits in context:", does_chat_fit_in_context(model, chat))
-
+```python
+import lmstudio as lms
+
+def does_chat_fit_in_context(model: lms.LLM, chat: lms.Chat) -> bool:
+ # Convert the conversation to a string using the prompt template.
+ formatted = model.apply_prompt_template(chat)
+ # Count the number of tokens in the string.
+ token_count = len(model.tokenize(formatted))
+ # Get the current loaded context length of the model
+ context_length = model.get_context_length()
+ return token_count < context_length
+
+model = lms.llm()
+
+chat = lms.Chat.from_history({
+ "messages": [
+ { "role": "user", "content": "What is the meaning of life." },
+ { "role": "assistant", "content": "The meaning of life is..." },
+ # ... More messages
+ ]
+})
+
+print("Fits in context:", does_chat_fit_in_context(model, chat))
```
diff --git a/1_python/6_model-info/get-load-config.md b/1_python/6_model-info/get-load-config.md
deleted file mode 100644
index a2ba677..0000000
--- a/1_python/6_model-info/get-load-config.md
+++ /dev/null
@@ -1,53 +0,0 @@
----
-title: Get Load Config
-description: Get the load configuration of the model
----
-
-_Required Python SDK version_: **1.2.0**
-
-LM Studio allows you to configure certain parameters when loading a model
-[through the server UI](/docs/advanced/per-model) or [through the API](/docs/api/sdk/load-model).
-
-You can retrieve the config with which a given model was loaded using the SDK.
-
-In the below examples, the LLM reference can be replaced with an
-embedding model reference without requiring any other changes.
-
-```lms_protip
-Context length is a special case that [has its own method](/docs/api/sdk/get-context-length).
-```
-
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
-
- model = lms.llm()
-
- print(model.get_load_config())
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
-
- with lms.Client() as client:
- model = client.llm.model()
-
- print(model.get_load_config())
-
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
-
- async with lms.Client() as client:
- model = await client.llm.model()
-
- print(await model.get_load_config())
-
-```
diff --git a/1_python/6_model-info/get-load-config.mdx b/1_python/6_model-info/get-load-config.mdx
new file mode 100644
index 0000000..f3e0563
--- /dev/null
+++ b/1_python/6_model-info/get-load-config.mdx
@@ -0,0 +1,46 @@
+---
+title: Get Load Config
+description: Get the load configuration of the model
+---
+
+_Required Python SDK version_: **1.2.0**
+
+LM Studio allows you to configure certain parameters when loading a model
+[through the server UI](/docs/advanced/per-model) or [through the API](/docs/api/sdk/load-model).
+
+You can retrieve the config with which a given model was loaded using the SDK.
+
+In the below examples, the LLM reference can be replaced with an
+embedding model reference without requiring any other changes.
+
+
+Context length is a special case that [has its own method](/docs/api/sdk/get-context-length).
+
+
+```python tab="Python (convenience API)"
+import lmstudio as lms
+
+model = lms.llm()
+
+print(model.get_load_config())
+```
+
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
+
+with lms.Client() as client:
+ model = client.llm.model()
+
+ print(model.get_load_config())
+```
+
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
+
+async with lms.AsyncClient() as client:
+ model = await client.llm.model()
+
+ print(await model.get_load_config())
+```
diff --git a/1_python/6_model-info/get-model-info.md b/1_python/6_model-info/get-model-info.md
index 200063f..1c1dd3e 100644
--- a/1_python/6_model-info/get-model-info.md
+++ b/1_python/6_model-info/get-model-info.md
@@ -9,39 +9,32 @@ instance of that model.
In the below examples, the LLM reference can be replaced with an
embedding model reference without requiring any other changes.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
+```python tab="Python (convenience API)"
+import lmstudio as lms
- model = lms.llm()
+model = lms.llm()
- print(model.get_info())
-
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
+print(model.get_info())
+```
- with lms.Client() as client:
- model = client.llm.model()
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
- print(model.get_info())
+with lms.Client() as client:
+ model = client.llm.model()
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
+ print(model.get_info())
+```
- async with lms.AsyncClient() as client:
- model = await client.llm.model()
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
- print(await model.get_info())
+async with lms.AsyncClient() as client:
+ model = await client.llm.model()
+ print(await model.get_info())
```
## Example output
diff --git a/1_python/6_model-info/meta.json b/1_python/6_model-info/meta.json
new file mode 100644
index 0000000..be931d1
--- /dev/null
+++ b/1_python/6_model-info/meta.json
@@ -0,0 +1,8 @@
+{
+ "title": "Model Info",
+ "pages": [
+ "get-context-length",
+ "get-load-config",
+ "get-model-info"
+ ]
+}
diff --git a/1_python/_7_api-reference/act.md b/1_python/_7_api-reference/act.md
index 74e4476..1063c93 100644
--- a/1_python/_7_api-reference/act.md
+++ b/1_python/_7_api-reference/act.md
@@ -1,6 +1,6 @@
---
-title: "`.act()`"
-sidebar_title: "`.act()`"
+title: ".act()"
+sidebar_title: ".act()"
description: ".act() - API reference for automatic tool use in a multi-turn chat conversation"
index: 3
---
diff --git a/1_python/_7_api-reference/chat.md b/1_python/_7_api-reference/chat.md
index 33b5f8b..70eea8b 100644
--- a/1_python/_7_api-reference/chat.md
+++ b/1_python/_7_api-reference/chat.md
@@ -1,7 +1,7 @@
---
-title: "`Chat`"
-sidebar_title: "`Chat`"
-description: "`Chat` - API reference for representing a chat conversation with an LLM"
+title: "Chat"
+sidebar_title: "Chat"
+description: "Chat - API reference for representing a chat conversation with an LLM"
index: 5
---
diff --git a/1_python/_7_api-reference/complete.md b/1_python/_7_api-reference/complete.md
index 43d79d1..8c496b2 100644
--- a/1_python/_7_api-reference/complete.md
+++ b/1_python/_7_api-reference/complete.md
@@ -1,6 +1,6 @@
---
-title: "`.complete()`"
-sidebar_title: "`.complete()`"
+title: ".complete()"
+sidebar_title: ".complete()"
description: ".complete() - API reference for generating text completions from a loaded language model"
index: 4
---
diff --git a/1_python/_7_api-reference/count-tokens.md b/1_python/_7_api-reference/count-tokens.md
index 1198d6d..719e7ee 100644
--- a/1_python/_7_api-reference/count-tokens.md
+++ b/1_python/_7_api-reference/count-tokens.md
@@ -1,6 +1,6 @@
---
-title: "`.countTokens()`"
-sidebar_title: "`.countTokens()`"
+title: ".countTokens()"
+sidebar_title: ".countTokens()"
description: ".countTokens() - API reference for counting tokens in a string using a model's tokenizer"
---
diff --git a/1_python/_7_api-reference/embed.md b/1_python/_7_api-reference/embed.md
index 9864258..3f6c93c 100644
--- a/1_python/_7_api-reference/embed.md
+++ b/1_python/_7_api-reference/embed.md
@@ -1,6 +1,6 @@
---
-title: "`.embed()`"
-sidebar_title: "`.embed()`"
+title: ".embed()"
+sidebar_title: ".embed()"
description: ".embed() - API reference for generating embeddings from a loaded embedding model"
---
diff --git a/1_python/_7_api-reference/llm-load-model-config.md b/1_python/_7_api-reference/llm-load-model-config.md
index b82097d..4fe05ac 100644
--- a/1_python/_7_api-reference/llm-load-model-config.md
+++ b/1_python/_7_api-reference/llm-load-model-config.md
@@ -1,5 +1,5 @@
---
-title: "`LLMLoadModelConfig`"
+title: "LLMLoadModelConfig"
---
### Parameters
diff --git a/1_python/_7_api-reference/llm-namespace.md b/1_python/_7_api-reference/llm-namespace.md
index 033677d..65bd34d 100644
--- a/1_python/_7_api-reference/llm-namespace.md
+++ b/1_python/_7_api-reference/llm-namespace.md
@@ -1,7 +1,7 @@
---
-title: "`client.llm`"
-sidebar_title: "`client.llm` namespace"
-description: "`client.llm` - API reference for the llm namespace in an `LMStudioClient` instance"
+title: "client.llm"
+sidebar_title: "client.llm namespace"
+description: "client.llm - API reference for the llm namespace in an LMStudioClient instance"
index: 6
---
diff --git a/1_python/_7_api-reference/llm-prediction-config-input.md b/1_python/_7_api-reference/llm-prediction-config-input.md
index 798f00c..4dff4b6 100644
--- a/1_python/_7_api-reference/llm-prediction-config-input.md
+++ b/1_python/_7_api-reference/llm-prediction-config-input.md
@@ -1,5 +1,5 @@
---
-title: "`LLMPredictionConfigInput`"
+title: "LLMPredictionConfigInput"
---
### Fields
diff --git a/1_python/_7_api-reference/lmstudioclient.md b/1_python/_7_api-reference/lmstudioclient.md
index 4fd0e95..589bd5b 100644
--- a/1_python/_7_api-reference/lmstudioclient.md
+++ b/1_python/_7_api-reference/lmstudioclient.md
@@ -1,7 +1,7 @@
---
-title: "`LMStudioClient`"
-sidebar_title: "`LMStudioClient`"
-description: "LMStudioClient - API reference for the `LMStudioClient` class"
+title: "LMStudioClient"
+sidebar_title: "LMStudioClient"
+description: "LMStudioClient - API reference for the LMStudioClient class"
index: 1
---
diff --git a/1_python/_7_api-reference/meta.json b/1_python/_7_api-reference/meta.json
new file mode 100644
index 0000000..fb651ae
--- /dev/null
+++ b/1_python/_7_api-reference/meta.json
@@ -0,0 +1,18 @@
+{
+ "title": "API Reference",
+ "pages": [
+ "act",
+ "chat",
+ "complete",
+ "count-tokens",
+ "embed",
+ "llm-load-model-config",
+ "llm-namespace",
+ "llm-prediction-config-input",
+ "lmstudioclient",
+ "model",
+ "respond",
+ "system-namespace",
+ "tokenize"
+ ]
+}
diff --git a/1_python/_7_api-reference/model.md b/1_python/_7_api-reference/model.md
index 515ab0c..0d5c59d 100644
--- a/1_python/_7_api-reference/model.md
+++ b/1_python/_7_api-reference/model.md
@@ -1,7 +1,7 @@
---
-title: "`.model()`"
-sidebar_title: "`.model()`"
-description: ".model() - API reference for obtaining a model handle from an `LMStudioClient` instance"
+title: ".model()"
+sidebar_title: ".model()"
+description: ".model() - API reference for obtaining a model handle from an LMStudioClient instance"
index: 2
---
diff --git a/1_python/_7_api-reference/respond.md b/1_python/_7_api-reference/respond.md
index 89876f9..ff73ea8 100644
--- a/1_python/_7_api-reference/respond.md
+++ b/1_python/_7_api-reference/respond.md
@@ -1,6 +1,6 @@
---
-title: "`.respond()`"
-sidebar_title: "`.respond()`"
+title: ".respond()"
+sidebar_title: ".respond()"
description: ".respond() - API reference for generating chat responses from a loaded language model"
index: 2
---
diff --git a/1_python/_7_api-reference/system-namespace.md b/1_python/_7_api-reference/system-namespace.md
index 2b4f4f8..49c1c91 100644
--- a/1_python/_7_api-reference/system-namespace.md
+++ b/1_python/_7_api-reference/system-namespace.md
@@ -1,7 +1,7 @@
---
-title: "`client.system`"
-sidebar_title: "`client.system` namespace"
-description: "`client.system` - API reference for the system namespace in an `LMStudioClient` instance"
+title: "client.system"
+sidebar_title: "client.system namespace"
+description: "client.system - API reference for the system namespace in an LMStudioClient instance"
index: 6
---
diff --git a/1_python/_7_api-reference/tokenize.md b/1_python/_7_api-reference/tokenize.md
index bb31b61..1181a05 100644
--- a/1_python/_7_api-reference/tokenize.md
+++ b/1_python/_7_api-reference/tokenize.md
@@ -1,6 +1,6 @@
---
-title: "`.tokenize()`"
-sidebar_title: "`.tokenize()`"
+title: ".tokenize()"
+sidebar_title: ".tokenize()"
description: ".tokenize() - API reference for converting text input into tokens using a model's tokenizer"
---
diff --git a/1_python/_more/_apply-prompt-template.md b/1_python/_more/_apply-prompt-template.mdx
similarity index 50%
rename from 1_python/_more/_apply-prompt-template.md
rename to 1_python/_more/_apply-prompt-template.mdx
index bf346b6..7e627b2 100644
--- a/1_python/_more/_apply-prompt-template.md
+++ b/1_python/_more/_apply-prompt-template.mdx
@@ -7,49 +7,41 @@ description: Apply a model's prompt template to a conversation
LLMs (Large Language Models) operate on a text-in, text-out basis. Before processing conversations through these models, the input must be converted into a properly formatted string using a prompt template. If you need to inspect or work with this formatted string directly, the LM Studio SDK provides a streamlined way to apply a model's prompt template to your conversations.
-```lms_info
+
You do not need to use this method when using `.respond`. It will automatically apply the prompt template for you.
-```
+
## Usage with a Chat
You can apply a prompt template to a `Chat` by using the `apply_prompt_template` method. This method takes a `Chat` object as input and returns a formatted string.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import { Chat, LMStudioClient } from "@lmstudio/sdk";
+```python
+import lmstudio as lms
- client = new LMStudioClient()
- model = client.llm.model() # Use any loaded LLM
+model = lms.llm()  # Use any loaded LLM
- chat = Chat.createEmpty()
- chat.append("system", "You are a helpful assistant.")
- chat.append("user", "What is LM Studio?")
+chat = lms.Chat("You are a helpful assistant.")
+chat.add_user_message("What is LM Studio?")
- formatted = model.applyPromptTemplate(chat)
- print(formatted)
+formatted = model.apply_prompt_template(chat)
+print(formatted)
```
## Usage with an Array of Messages
The same method can also be used with any object that can be converted to a `Chat`, for example, an array of messages.
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
-
- client = new LMStudioClient()
- model = client.llm.model() # Use any loaded LLM
-
- formatted = model.applyPromptTemplate([
- { role: "system", content: "You are a helpful assistant." },
- { role: "user", content: "What is LM Studio?" },
- ])
- print(formatted)
+```python
+import lmstudio as lms
+
+model = lms.llm()  # Use any loaded LLM
+
+formatted = model.apply_prompt_template([
+    { "role": "system", "content": "You are a helpful assistant." },
+    { "role": "user", "content": "What is LM Studio?" },
+])
+print(formatted)
```
diff --git a/1_python/_more/meta.json b/1_python/_more/meta.json
new file mode 100644
index 0000000..48efd12
--- /dev/null
+++ b/1_python/_more/meta.json
@@ -0,0 +1,6 @@
+{
+ "title": "More",
+ "pages": [
+ "_apply-prompt-template"
+ ]
+}
diff --git a/1_python/index.md b/1_python/index.md
index 67d695d..3390340 100644
--- a/1_python/index.md
+++ b/1_python/index.md
@@ -1,5 +1,5 @@
---
-title: "`lmstudio-python` (Python SDK)"
+title: "lmstudio-python (Python SDK)"
sidebar_title: "Introduction"
description: "Getting started with LM Studio's Python SDK"
---
@@ -10,12 +10,8 @@ description: "Getting started with LM Studio's Python SDK"
`lmstudio-python` is available as a PyPI package. You can install it using pip.
-```lms_code_snippet
- variants:
- pip:
- language: bash
- code: |
- pip install lmstudio
+```bash
+pip install lmstudio
```
For the source code and open source contribution, visit [lmstudio-python](https://github.com/lmstudio-ai/lmstudio-python) on GitHub.
@@ -29,41 +25,35 @@ For the source code and open source contribution, visit [lmstudio-python](https:
## Quick Example: Chat with a Llama Model
-```lms_code_snippet
- variants:
- "Python (convenience API)":
- language: python
- code: |
- import lmstudio as lms
+```python tab="Python (convenience API)"
+import lmstudio as lms
- model = lms.llm("qwen/qwen3-4b-2507")
- result = model.respond("What is the meaning of life?")
+model = lms.llm("qwen/qwen3-4b-2507")
+result = model.respond("What is the meaning of life?")
- print(result)
+print(result)
+```
- "Python (scoped resource API)":
- language: python
- code: |
- import lmstudio as lms
+```python tab="Python (scoped resource API)"
+import lmstudio as lms
- with lms.Client() as client:
- model = client.llm.model("qwen/qwen3-4b-2507")
- result = model.respond("What is the meaning of life?")
+with lms.Client() as client:
+ model = client.llm.model("qwen/qwen3-4b-2507")
+ result = model.respond("What is the meaning of life?")
- print(result)
+ print(result)
+```
- "Python (asynchronous API)":
- language: python
- code: |
- # Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
- # Requires Python SDK version 1.5.0 or later
- import lmstudio as lms
+```python tab="Python (asynchronous API)"
+# Note: assumes use of an async function or the "python -m asyncio" asynchronous REPL
+# Requires Python SDK version 1.5.0 or later
+import lmstudio as lms
- async with lms.AsyncClient() as client:
- model = await client.llm.model("qwen/qwen3-4b-2507")
- result = await model.respond("What is the meaning of life?")
+async with lms.AsyncClient() as client:
+ model = await client.llm.model("qwen/qwen3-4b-2507")
+ result = await model.respond("What is the meaning of life?")
- print(result)
+ print(result)
```
### Getting Local Models
diff --git a/1_python/meta.json b/1_python/meta.json
new file mode 100644
index 0000000..9e939d0
--- /dev/null
+++ b/1_python/meta.json
@@ -0,0 +1,25 @@
+{
+ "title": "Python SDK",
+ "pages": [
+ "---Introduction---",
+ "index",
+ "---Getting Started---",
+ "...1_getting-started",
+ "---Basics---",
+ "...1_llm-prediction",
+ "---Agentic Flows---",
+ "...2_agent",
+ "---Text Embedding---",
+ "...3_embedding",
+ "---Tokenization---",
+ "...4_tokenization",
+ "---Manage Models---",
+ "...5_manage-models",
+ "---Model Info---",
+ "...6_model-info",
+ "---API Reference---",
+ "..._7_api-reference",
+ "---More---",
+ "..._more"
+ ]
+}
diff --git a/2_typescript/2_llm-prediction/cancelling-predictions.md b/2_typescript/2_llm-prediction/cancelling-predictions.md
deleted file mode 100644
index 6eed470..0000000
--- a/2_typescript/2_llm-prediction/cancelling-predictions.md
+++ /dev/null
@@ -1,57 +0,0 @@
----
-title: Cancelling Predictions
-description: Stop an ongoing prediction in `lmstudio-js`
-index: 4
----
-
-Sometimes you may want to halt a prediction before it finishes. For example, the user might change their mind or your UI may navigate away. `lmstudio-js` provides two simple ways to cancel a running prediction.
-
-## 1. Call `.cancel()` on the prediction
-
-Every prediction method returns an `OngoingPrediction` instance. Calling `.cancel()` stops generation and causes the final `stopReason` to be `"userStopped"`. In the example below we schedule the cancel call on a timer:
-
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
-
- const client = new LMStudioClient();
- const model = await client.llm.model("qwen2.5-7b-instruct");
-
- const prediction = model.respond("What is the meaning of life?", {
- maxTokens: 50,
- });
- setTimeout(() => prediction.cancel(), 1000); // cancel after 1 second
-
- const result = await prediction.result();
- console.info(result.stats.stopReason); // "userStopped"
-```
-
-## 2. Use an `AbortController`
-
-If your application already uses an `AbortController` to propagate cancellation, you can pass its `signal` to the prediction method. Aborting the controller stops the prediction with the same `stopReason`:
-
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
-
- const client = new LMStudioClient();
- const model = await client.llm.model("qwen2.5-7b-instruct");
-
- const controller = new AbortController();
- const prediction = model.respond("What is the meaning of life?", {
- maxTokens: 50,
- signal: controller.signal,
- });
- setTimeout(() => controller.abort(), 1000); // cancel after 1 second
-
- const result = await prediction.result();
- console.info(result.stats.stopReason); // "userStopped"
-```
-
-Both approaches halt generation immediately, and the returned stats indicate that the prediction ended because you stopped it.
diff --git a/2_typescript/2_llm-prediction/cancelling-predictions.mdx b/2_typescript/2_llm-prediction/cancelling-predictions.mdx
new file mode 100644
index 0000000..0fb1b8b
--- /dev/null
+++ b/2_typescript/2_llm-prediction/cancelling-predictions.mdx
@@ -0,0 +1,57 @@
+---
+title: Cancelling Predictions
+description: Stop an ongoing prediction in lmstudio-js
+index: 4
+---
+
+import { Step, Steps } from "fumadocs-ui/components/steps";
+
+Sometimes you may want to halt a prediction before it finishes. For example, the user might change their mind or your UI may navigate away. `lmstudio-js` provides two simple ways to cancel a running prediction.
+
+
+
+ Call `.cancel()` on the prediction
+
+ Every prediction method returns an `OngoingPrediction` instance. Calling `.cancel()` stops generation and causes the final `stopReason` to be `"userStopped"`. In the example below we schedule the cancel call on a timer:
+
+ ```typescript
+ import { LMStudioClient } from "@lmstudio/sdk";
+
+ const client = new LMStudioClient();
+ const model = await client.llm.model("qwen2.5-7b-instruct");
+
+ const prediction = model.respond("What is the meaning of life?", {
+ maxTokens: 50,
+ });
+ setTimeout(() => prediction.cancel(), 1000); // cancel after 1 second
+
+ const result = await prediction.result();
+ console.info(result.stats.stopReason); // "userStopped"
+ ```
+
+
+
+ Use an `AbortController`
+
+ If your application already uses an `AbortController` to propagate cancellation, you can pass its `signal` to the prediction method. Aborting the controller stops the prediction with the same `stopReason`:
+
+ ```typescript
+ import { LMStudioClient } from "@lmstudio/sdk";
+
+ const client = new LMStudioClient();
+ const model = await client.llm.model("qwen2.5-7b-instruct");
+
+ const controller = new AbortController();
+ const prediction = model.respond("What is the meaning of life?", {
+ maxTokens: 50,
+ signal: controller.signal,
+ });
+ setTimeout(() => controller.abort(), 1000); // cancel after 1 second
+
+ const result = await prediction.result();
+ console.info(result.stats.stopReason); // "userStopped"
+ ```
+
+
+
+Both approaches halt generation immediately, and the returned stats indicate that the prediction ended because you stopped it.
diff --git a/2_typescript/2_llm-prediction/chat-completion.md b/2_typescript/2_llm-prediction/chat-completion.md
index 3e9f039..5fd27a9 100644
--- a/2_typescript/2_llm-prediction/chat-completion.md
+++ b/2_typescript/2_llm-prediction/chat-completion.md
@@ -11,36 +11,26 @@ Use `llm.respond(...)` to generate completions for a chat conversation.
The following snippet shows how to stream the AI's response to a quick chat prompt.
-```lms_code_snippet
- title: "index.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient();
-
- const model = await client.llm.model();
-
- for await (const fragment of model.respond("What is the meaning of life?")) {
- process.stdout.write(fragment.content);
- }
+```typescript title="index.ts"
+import { LMStudioClient } from "@lmstudio/sdk";
+const client = new LMStudioClient();
+
+const model = await client.llm.model();
+
+for await (const fragment of model.respond("What is the meaning of life?")) {
+ process.stdout.write(fragment.content);
+}
```
## Obtain a Model
First, you need to get a model handle. This can be done using the `model` method in the `llm` namespace. For example, here is how to use Qwen2.5 7B Instruct.
-```lms_code_snippet
- title: "index.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient();
+```typescript title="index.ts"
+import { LMStudioClient } from "@lmstudio/sdk";
+const client = new LMStudioClient();
- const model = await client.llm.model("qwen2.5-7b-instruct");
+const model = await client.llm.model("qwen2.5-7b-instruct");
```
There are other ways to get a model handle. See [Managing Models in Memory](./../manage-models/loading) for more info.
@@ -49,29 +39,25 @@ There are other ways to get a model handle. See [Managing Models in Memory](./..
The input to the model is referred to as the "context". Conceptually, the model receives a multi-turn conversation as input, and it is asked to predict the assistant's response in that conversation.
-```lms_code_snippet
- variants:
- "Using an array of messages":
- language: typescript
- code: |
- import { Chat } from "@lmstudio/sdk";
-
- // Create a chat object from an array of messages.
- const chat = Chat.from([
- { role: "system", content: "You are a resident AI philosopher." },
- { role: "user", content: "What is the meaning of life?" },
- ]);
- "Constructing a Chat object":
- language: typescript
- code: |
- import { Chat } from "@lmstudio/sdk";
-
- // Create an empty chat object.
- const chat = Chat.empty();
-
- // Build the chat context by appending messages.
- chat.append("system", "You are a resident AI philosopher.");
- chat.append("user", "What is the meaning of life?");
+```typescript tab="Using an array of messages"
+import { Chat } from "@lmstudio/sdk";
+
+// Create a chat object from an array of messages.
+const chat = Chat.from([
+ { role: "system", content: "You are a resident AI philosopher." },
+ { role: "user", content: "What is the meaning of life?" },
+]);
+```
+
+```typescript tab="Constructing a Chat object"
+import { Chat } from "@lmstudio/sdk";
+
+// Create an empty chat object.
+const chat = Chat.empty();
+
+// Build the chat context by appending messages.
+chat.append("system", "You are a resident AI philosopher.");
+chat.append("user", "What is the meaning of life?");
```
See [Working with Chats](./working-with-chats) for more information on managing chat context.
@@ -82,50 +68,40 @@ See [Working with Chats](./working-with-chats) for more information on managing
You can ask the LLM to predict the next response in the chat context using the `respond()` method.
-```lms_code_snippet
- variants:
- Streaming:
- language: typescript
- code: |
- // The `chat` object is created in the previous step.
- const prediction = model.respond(chat);
+```typescript tab="Streaming"
+// The `chat` object is created in the previous step.
+const prediction = model.respond(chat);
- for await (const { content } of prediction) {
- process.stdout.write(content);
- }
+for await (const { content } of prediction) {
+ process.stdout.write(content);
+}
- console.info(); // Write a new line to prevent text from being overwritten by your shell.
+console.info(); // Write a new line to prevent text from being overwritten by your shell.
+```
- "Non-streaming":
- language: typescript
- code: |
- // The `chat` object is created in the previous step.
- const result = await model.respond(chat);
+```typescript tab="Non-streaming"
+// The `chat` object is created in the previous step.
+const result = await model.respond(chat);
- console.info(result.content);
+console.info(result.content);
```
## Customize Inferencing Parameters
You can pass in inferencing parameters as the second parameter to `.respond()`.
-```lms_code_snippet
- variants:
- Streaming:
- language: typescript
- code: |
- const prediction = model.respond(chat, {
- temperature: 0.6,
- maxTokens: 50,
- });
-
- "Non-streaming":
- language: typescript
- code: |
- const result = await model.respond(chat, {
- temperature: 0.6,
- maxTokens: 50,
- });
+```typescript tab="Streaming"
+const prediction = model.respond(chat, {
+ temperature: 0.6,
+ maxTokens: 50,
+});
+```
+
+```typescript tab="Non-streaming"
+const result = await model.respond(chat, {
+ temperature: 0.6,
+ maxTokens: 50,
+});
```
See [Configuring the Model](./parameters) for more information on what can be configured.
@@ -135,61 +111,53 @@ See [Configuring the Model](./parameters) for more information on what can be co
You can also print prediction metadata, such as the model used for generation, number of generated
tokens, time to first token, and stop reason.
-```lms_code_snippet
- variants:
- Streaming:
- language: typescript
- code: |
- // If you have already iterated through the prediction fragments,
- // doing this will not result in extra waiting.
- const result = await prediction.result();
-
- console.info("Model used:", result.modelInfo.displayName);
- console.info("Predicted tokens:", result.stats.predictedTokensCount);
- console.info("Time to first token (seconds):", result.stats.timeToFirstTokenSec);
- console.info("Stop reason:", result.stats.stopReason);
- "Non-streaming":
- language: typescript
- code: |
- // `result` is the response from the model.
- console.info("Model used:", result.modelInfo.displayName);
- console.info("Predicted tokens:", result.stats.predictedTokensCount);
- console.info("Time to first token (seconds):", result.stats.timeToFirstTokenSec);
- console.info("Stop reason:", result.stats.stopReason);
+```typescript tab="Streaming"
+// If you have already iterated through the prediction fragments,
+// doing this will not result in extra waiting.
+const result = await prediction.result();
+
+console.info("Model used:", result.modelInfo.displayName);
+console.info("Predicted tokens:", result.stats.predictedTokensCount);
+console.info("Time to first token (seconds):", result.stats.timeToFirstTokenSec);
+console.info("Stop reason:", result.stats.stopReason);
+```
+
+```typescript tab="Non-streaming"
+// `result` is the response from the model.
+console.info("Model used:", result.modelInfo.displayName);
+console.info("Predicted tokens:", result.stats.predictedTokensCount);
+console.info("Time to first token (seconds):", result.stats.timeToFirstTokenSec);
+console.info("Stop reason:", result.stats.stopReason);
```
## Example: Multi-turn Chat
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { Chat, LMStudioClient } from "@lmstudio/sdk";
- import { createInterface } from "readline/promises";
-
- const rl = createInterface({ input: process.stdin, output: process.stdout });
- const client = new LMStudioClient();
- const model = await client.llm.model();
- const chat = Chat.empty();
-
- while (true) {
- const input = await rl.question("You: ");
- // Append the user input to the chat
- chat.append("user", input);
-
- const prediction = model.respond(chat, {
- // When the model finish the entire message, push it to the chat
- onMessage: (message) => chat.append(message),
- });
- process.stdout.write("Bot: ");
- for await (const { content } of prediction) {
- process.stdout.write(content);
- }
- process.stdout.write("\n");
- }
+```typescript
+import { Chat, LMStudioClient } from "@lmstudio/sdk";
+import { createInterface } from "readline/promises";
+
+const rl = createInterface({ input: process.stdin, output: process.stdout });
+const client = new LMStudioClient();
+const model = await client.llm.model();
+const chat = Chat.empty();
+
+while (true) {
+ const input = await rl.question("You: ");
+ // Append the user input to the chat
+ chat.append("user", input);
+
+ const prediction = model.respond(chat, {
+ // When the model finishes the entire message, push it to the chat
+ onMessage: (message) => chat.append(message),
+ });
+ process.stdout.write("Bot: ");
+ for await (const { content } of prediction) {
+ process.stdout.write(content);
+ }
+ process.stdout.write("\n");
+}
```
diff --git a/2_typescript/2_llm-prediction/completion.mdx b/2_typescript/2_llm-prediction/completion.mdx
new file mode 100644
index 0000000..ddbc834
--- /dev/null
+++ b/2_typescript/2_llm-prediction/completion.mdx
@@ -0,0 +1,152 @@
+---
+title: Generate Completions
+description: "Provide a string input for the model to complete"
+index: 6
+---
+
+import { Step, Steps } from "fumadocs-ui/components/steps";
+
+Use `llm.complete(...)` to generate text completions from a loaded language model. Text completion means sending a non-formatted string to the model with the expectation that the model will complete the text.
+
+This is different from multi-turn chat conversations. For more information on chat completions, see [Chat Completions](./chat-completion).
+
+## Quickstart
+
+
+
+ Instantiate a Model
+
+ First, you need to load a model to generate completions from. This can be done using the `model` method on the `llm` handle.
+
+ ```typescript title="index.ts"
+ import { LMStudioClient } from "@lmstudio/sdk";
+
+ const client = new LMStudioClient();
+ const model = await client.llm.model("qwen2.5-7b-instruct");
+ ```
+
+
+
+ Generate a Completion
+
+ Once you have a loaded model, you can generate completions by passing a string to the `complete` method on the `llm` handle.
+
+ ```typescript tab="Streaming"
+ const completion = model.complete("My name is", {
+ maxTokens: 100,
+ });
+
+ for await (const { content } of completion) {
+ process.stdout.write(content);
+ }
+
+ console.info(); // Write a new line for cosmetic purposes
+ ```
+
+ ```typescript tab="Non-streaming"
+ const completion = await model.complete("My name is", {
+ maxTokens: 100,
+ });
+
+ console.info(completion.content);
+ ```
+
+
+
+ Print Prediction Stats
+
+ You can also print prediction metadata, such as the model used for generation, number of generated tokens, time to first token, and stop reason.
+
+ ```typescript title="index.ts"
+ console.info("Model used:", completion.modelInfo.displayName);
+ console.info("Predicted tokens:", completion.stats.predictedTokensCount);
+ console.info("Time to first token (seconds):", completion.stats.timeToFirstTokenSec);
+ console.info("Stop reason:", completion.stats.stopReason);
+ ```
+
+
+
+## Example: Get an LLM to Simulate a Terminal
+
+Here's an example of how you might use the `complete` method to simulate a terminal.
+
+```typescript title="terminal-sim.ts"
+import { LMStudioClient } from "@lmstudio/sdk";
+import { createInterface } from "node:readline/promises";
+
+const rl = createInterface({ input: process.stdin, output: process.stdout });
+const client = new LMStudioClient();
+const model = await client.llm.model();
+let history = "";
+
+while (true) {
+ const command = await rl.question("$ ");
+ history += "$ " + command + "\n";
+
+ const prediction = model.complete(history, { stopStrings: ["$"] });
+ for await (const { content } of prediction) {
+ process.stdout.write(content);
+ }
+ process.stdout.write("\n");
+
+ const { content } = await prediction.result();
+ history += content;
+}
+```
+
+{/* ## Advanced Usage
+
+### Prediction metadata
+
+Prediction responses are returned as `PredictionResult` objects that contain additional dot-accessible metadata about the inference request.
+This entails info about the model used, the configuration with which it was loaded, and the configuration for this particular prediction. It also provides
+inference statistics like stop reason, time to first token, tokens per second, and number of generated tokens.
+
+Please consult your specific SDK to see exact syntax.
+
+### Progress callbacks
+
+TODO: TS has onFirstToken callback which Python does not
+
+Long prompts will often take a long time to first token, i.e. it takes the model a long time to process your prompt.
+If you want updates on the progress of this process, you can provide a progress callback to `complete`
+that receives a float from 0.0 to 1.0 representing prompt processing progress.
+
+```python tab="Python"
+import lmstudio as lm
+
+llm = lm.llm()
+
+completion = llm.complete(
+ "My name is",
+ on_progress=lambda progress: print(f"{progress*100}% complete")
+)
+```
+
+```python tab="Python (with scoped resources)"
+import lmstudio
+
+with lmstudio.Client() as client:
+ llm = client.llm.model()
+
+ completion = llm.complete(
+ "My name is",
+ on_progress=lambda progress: print(f"{progress*100}% processed")
+ )
+```
+
+```typescript tab="TypeScript"
+import { LMStudioClient } from "@lmstudio/sdk";
+
+const client = new LMStudioClient();
+const llm = await client.llm.model();
+
+const prediction = llm.complete(
+ "My name is",
+ {onPromptProcessingProgress: (progress) => process.stdout.write(`${progress*100}% processed`)});
+```
+
+### Prediction configuration
+
+You can also specify the same prediction configuration options as you could in the
+in-app chat window sidebar. Please consult your specific SDK to see exact syntax. */}
diff --git a/2_typescript/2_llm-prediction/image-input.md b/2_typescript/2_llm-prediction/image-input.md
deleted file mode 100644
index c536c18..0000000
--- a/2_typescript/2_llm-prediction/image-input.md
+++ /dev/null
@@ -1,71 +0,0 @@
----
-title: Image Input
-description: API for passing images as input to the model
-index: 4
----
-
-Some models, known as VLMs (Vision-Language Models), can accept images as input. You can pass images to the model using the `.respond()` method.
-
-### Prerequisite: Get a VLM (Vision-Language Model)
-
-If you don't yet have a VLM, you can download a model like `qwen2-vl-2b-instruct` using the following command:
-
-```bash
-lms get qwen2-vl-2b-instruct
-```
-
-## 1. Instantiate the Model
-
-Connect to LM Studio and obtain a handle to the VLM (Vision-Language Model) you want to use.
-
-```lms_code_snippet
- variants:
- Example:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient();
-
- const model = await client.llm.model("qwen2-vl-2b-instruct");
-```
-
-## 2. Prepare the Image
-
-Use the `client.files.prepareImage()` method to get a handle to the image that can be subsequently passed to the model.
-
-```lms_code_snippet
- variants:
- Example:
- language: typescript
- code: |
- const imagePath = "/path/to/image.jpg"; // Replace with the path to your image
- const image = await client.files.prepareImage(imagePath);
-
-```
-
-If you only have the image in the form of a base64 string, you can use the `client.files.prepareImageBase64()` method instead.
-
-```lms_code_snippet
- variants:
- Example:
- language: typescript
- code: |
- const imageBase64 = "Your base64 string here";
- const image = await client.files.prepareImageBase64(imageBase64);
-```
-
-The LM Studio server supports JPEG, PNG, and WebP image formats.
-
-## 3. Pass the Image to the Model in `.respond()`
-
-Generate a prediction by passing the image to the model in the `.respond()` method.
-
-```lms_code_snippet
- variants:
- Example:
- language: typescript
- code: |
- const prediction = model.respond([
- { role: "user", content: "Describe this image please", images: [image] },
- ]);
-```
diff --git a/2_typescript/2_llm-prediction/image-input.mdx b/2_typescript/2_llm-prediction/image-input.mdx
new file mode 100644
index 0000000..7658685
--- /dev/null
+++ b/2_typescript/2_llm-prediction/image-input.mdx
@@ -0,0 +1,64 @@
+---
+title: Image Input
+description: API for passing images as input to the model
+index: 4
+---
+
+import { Step, Steps } from "fumadocs-ui/components/steps";
+
+Some models, known as VLMs (Vision-Language Models), can accept images as input. You can pass images to the model using the `.respond()` method.
+
+### Prerequisite: Get a VLM (Vision-Language Model)
+
+If you don't yet have a VLM, you can download a model like `qwen2-vl-2b-instruct` using the following command:
+
+```bash
+lms get qwen2-vl-2b-instruct
+```
+
+
+
+ Instantiate the Model
+
+ Connect to LM Studio and obtain a handle to the VLM (Vision-Language Model) you want to use.
+
+ ```typescript
+ import { LMStudioClient } from "@lmstudio/sdk";
+ const client = new LMStudioClient();
+
+ const model = await client.llm.model("qwen2-vl-2b-instruct");
+ ```
+
+
+
+ Prepare the Image
+
+ Use the `client.files.prepareImage()` method to get a handle to the image that can be subsequently passed to the model.
+
+ ```typescript
+ const imagePath = "/path/to/image.jpg"; // Replace with the path to your image
+ const image = await client.files.prepareImage(imagePath);
+ ```
+
+ If you only have the image in the form of a base64 string, you can use the `client.files.prepareImageBase64()` method instead.
+
+ ```typescript
+ const imageBase64 = "Your base64 string here";
+ const image = await client.files.prepareImageBase64(imageBase64);
+ ```
+
+ The LM Studio server supports JPEG, PNG, and WebP image formats.
+
+
+
+ Pass the Image to the Model in `.respond()`
+
+ Generate a prediction by passing the image to the model in the `.respond()` method.
+
+ ```typescript
+ const prediction = model.respond([
+ { role: "user", content: "Describe this image please", images: [image] },
+ ]);
+ ```
+
+
diff --git a/2_typescript/2_llm-prediction/meta.json b/2_typescript/2_llm-prediction/meta.json
new file mode 100644
index 0000000..d56764e
--- /dev/null
+++ b/2_typescript/2_llm-prediction/meta.json
@@ -0,0 +1,14 @@
+{
+ "title": "Basics",
+ "pages": [
+ "cancelling-predictions",
+ "chat-completion",
+ "completion",
+ "image-input",
+ "_index",
+ "parameters",
+ "speculative-decoding",
+ "structured-response",
+ "working-with-chats"
+ ]
+}
diff --git a/2_typescript/2_llm-prediction/parameters.md b/2_typescript/2_llm-prediction/parameters.md
index 6961d43..f1e1b24 100644
--- a/2_typescript/2_llm-prediction/parameters.md
+++ b/2_typescript/2_llm-prediction/parameters.md
@@ -10,23 +10,19 @@ You can customize both inference-time and load-time parameters for your model. I
Set inference-time parameters such as `temperature`, `maxTokens`, `topP` and more.
-```lms_code_snippet
- variants:
- ".respond()":
- language: typescript
- code: |
- const prediction = model.respond(chat, {
- temperature: 0.6,
- maxTokens: 50,
- });
- ".complete()":
- language: typescript
- code: |
- const prediction = model.complete(prompt, {
- temperature: 0.6,
- maxTokens: 50,
- stop: ["\n\n"],
- });
+```typescript tab=".respond()"
+const prediction = model.respond(chat, {
+ temperature: 0.6,
+ maxTokens: 50,
+});
+```
+
+```typescript tab=".complete()"
+const prediction = model.complete(prompt, {
+ temperature: 0.6,
+ maxTokens: 50,
+ stop: ["\n\n"],
+});
```
See [`LLMPredictionConfigInput`](./../api-reference/llm-prediction-config-input) for all configurable fields.
@@ -43,19 +39,15 @@ The `.model()` retrieves a handle to a model that has already been loaded, or lo
**Note**: if the model is already loaded, the configuration will be **ignored**.
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- const model = await client.llm.model("qwen2.5-7b-instruct", {
- config: {
- contextLength: 8192,
- gpu: {
- ratio: 0.5,
- },
- },
- });
+```typescript
+const model = await client.llm.model("qwen2.5-7b-instruct", {
+ config: {
+ contextLength: 8192,
+ gpu: {
+ ratio: 0.5,
+ },
+ },
+});
```
See [`LLMLoadModelConfig`](./../api-reference/llm-load-model-config) for all configurable fields.
@@ -64,19 +56,15 @@ See [`LLMLoadModelConfig`](./../api-reference/llm-load-model-config) for all con
The `.load()` method creates a new model instance and loads it with the specified configuration.
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- const model = await client.llm.load("qwen2.5-7b-instruct", {
- config: {
- contextLength: 8192,
- gpu: {
- ratio: 0.5,
- },
- },
- });
+```typescript
+const model = await client.llm.load("qwen2.5-7b-instruct", {
+ config: {
+ contextLength: 8192,
+ gpu: {
+ ratio: 0.5,
+ },
+ },
+});
```
See [`LLMLoadModelConfig`](./../api-reference/llm-load-model-config) for all configurable fields.
diff --git a/2_typescript/2_llm-prediction/speculative-decoding.md b/2_typescript/2_llm-prediction/speculative-decoding.md
index b015c26..34c40be 100644
--- a/2_typescript/2_llm-prediction/speculative-decoding.md
+++ b/2_typescript/2_llm-prediction/speculative-decoding.md
@@ -1,6 +1,6 @@
---
title: Speculative Decoding
-description: API to use a draft model in speculative decoding in `lmstudio-js`
+description: API to use a draft model in speculative decoding in lmstudio-js
index: 5
---
@@ -8,48 +8,42 @@ Speculative decoding is a technique that can substantially increase the generati
To use speculative decoding in `lmstudio-js`, simply provide a `draftModel` parameter when performing the prediction. You do not need to load the draft model separately.
-```lms_code_snippet
- variants:
- "Non-streaming":
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
+```typescript tab="Non-streaming"
+import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient();
+const client = new LMStudioClient();
- const mainModelKey = "qwen2.5-7b-instruct";
- const draftModelKey = "qwen2.5-0.5b-instruct";
+const mainModelKey = "qwen2.5-7b-instruct";
+const draftModelKey = "qwen2.5-0.5b-instruct";
- const model = await client.llm.model(mainModelKey);
- const result = await model.respond("What are the prime numbers between 0 and 100?", {
- draftModel: draftModelKey,
- });
-
- const { content, stats } = result;
- console.info(content);
- console.info(`Accepted ${stats.acceptedDraftTokensCount}/${stats.predictedTokensCount} tokens`);
+const model = await client.llm.model(mainModelKey);
+const result = await model.respond("What are the prime numbers between 0 and 100?", {
+ draftModel: draftModelKey,
+});
+const { content, stats } = result;
+console.info(content);
+console.info(`Accepted ${stats.acceptedDraftTokensCount}/${stats.predictedTokensCount} tokens`);
+```
- Streaming:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
+```typescript tab="Streaming"
+import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient();
+const client = new LMStudioClient();
- const mainModelKey = "qwen2.5-7b-instruct";
- const draftModelKey = "qwen2.5-0.5b-instruct";
+const mainModelKey = "qwen2.5-7b-instruct";
+const draftModelKey = "qwen2.5-0.5b-instruct";
- const model = await client.llm.model(mainModelKey);
- const prediction = model.respond("What are the prime numbers between 0 and 100?", {
- draftModel: draftModelKey,
- });
+const model = await client.llm.model(mainModelKey);
+const prediction = model.respond("What are the prime numbers between 0 and 100?", {
+ draftModel: draftModelKey,
+});
- for await (const { content } of prediction) {
- process.stdout.write(content);
- }
- process.stdout.write("\n");
+for await (const { content } of prediction) {
+ process.stdout.write(content);
+}
+process.stdout.write("\n");
- const { stats } = await prediction.result();
- console.info(`Accepted ${stats.acceptedDraftTokensCount}/${stats.predictedTokensCount} tokens`);
+const { stats } = await prediction.result();
+console.info(`Accepted ${stats.acceptedDraftTokensCount}/${stats.predictedTokensCount} tokens`);
```
diff --git a/2_typescript/2_llm-prediction/structured-response.md b/2_typescript/2_llm-prediction/structured-response.md
deleted file mode 100644
index 6bce990..0000000
--- a/2_typescript/2_llm-prediction/structured-response.md
+++ /dev/null
@@ -1,171 +0,0 @@
----
-title: Structured Response
-description: Enforce a structured response from the model using Pydantic (Python), Zod (TypeScript), or JSON Schema
-index: 4
----
-
-You can enforce a particular response format from an LLM by providing a schema (JSON or `zod`) to the `.respond()` method. This guarantees that the model's output conforms to the schema you provide.
-
-## Enforce Using a `zod` Schema
-
-If you wish the model to generate JSON that satisfies a given schema, it is recommended to provide
-the schema using [`zod`](https://zod.dev/). When a `zod` schema is provided, the prediction result will contain an extra field `parsed`, which contains parsed, validated, and typed result.
-
-#### Define a `zod` Schema
-
-```ts
-import { z } from "zod";
-
-// A zod schema for a book
-const bookSchema = z.object({
- title: z.string(),
- author: z.string(),
- year: z.number().int(),
-});
-```
-
-#### Generate a Structured Response
-
-```lms_code_snippet
- variants:
- "Non-streaming":
- language: typescript
- code: |
- const result = await model.respond("Tell me about The Hobbit.",
- { structured: bookSchema },
- maxTokens: 100, // Recommended to avoid getting stuck
- );
-
- const book = result.parsed;
- console.info(book);
- // ^
- // Note that `book` is now correctly typed as { title: string, author: string, year: number }
-
- Streaming:
- language: typescript
- code: |
- const prediction = model.respond("Tell me about The Hobbit.",
- { structured: bookSchema },
- maxTokens: 100, // Recommended to avoid getting stuck
- );
-
- for await (const { content } of prediction) {
- process.stdout.write(content);
- }
- process.stdout.write("\n");
-
- // Get the final structured result
- const result = await prediction.result();
- const book = result.parsed;
-
- console.info(book);
- // ^
- // Note that `book` is now correctly typed as { title: string, author: string, year: number }
-```
-
-## Enforce Using a JSON Schema
-
-You can also enforce a structured response using a JSON schema.
-
-#### Define a JSON Schema
-
-```ts
-// A JSON schema for a book
-const schema = {
- type: "object",
- properties: {
- title: { type: "string" },
- author: { type: "string" },
- year: { type: "integer" },
- },
- required: ["title", "author", "year"],
-};
-```
-
-#### Generate a Structured Response
-
-```lms_code_snippet
- variants:
- "Non-streaming":
- language: typescript
- code: |
- const result = await model.respond("Tell me about The Hobbit.", {
- structured: {
- type: "json",
- jsonSchema: schema,
- },
- maxTokens: 100, // Recommended to avoid getting stuck
- });
-
- const book = JSON.parse(result.content);
- console.info(book);
- Streaming:
- language: typescript
- code: |
- const prediction = model.respond("Tell me about The Hobbit.", {
- structured: {
- type: "json",
- jsonSchema: schema,
- },
- maxTokens: 100, // Recommended to avoid getting stuck
- });
-
- for await (const { content } of prediction) {
- process.stdout.write(content);
- }
- process.stdout.write("\n");
-
- const result = await prediction.result();
- const book = JSON.parse(result.content);
-
- console.info("Parsed", book);
-```
-
-```lms_warning
-Structured generation works by constraining the model to only generate tokens that conform to the provided schema. This ensures valid output in normal cases, but comes with two important limitations:
-
-1. Models (especially smaller ones) may occasionally get stuck in an unclosed structure (like an open bracket), when they "forget" they are in such structure and cannot stop due to schema requirements. Thus, it is recommended to always include a `maxTokens` parameter to prevent infinite generation.
-
-2. Schema compliance is only guaranteed for complete, successful generations. If generation is interrupted (by cancellation, reaching the `maxTokens` limit, or other reasons), the output will likely violate the schema. With `zod` schema input, this will raise an error; with JSON schema, you'll receive an invalid string that doesn't satisfy schema.
-```
-
-
diff --git a/2_typescript/2_llm-prediction/structured-response.mdx b/2_typescript/2_llm-prediction/structured-response.mdx
new file mode 100644
index 0000000..fe28a6a
--- /dev/null
+++ b/2_typescript/2_llm-prediction/structured-response.mdx
@@ -0,0 +1,121 @@
+---
+title: Structured Response
+description: Enforce a structured response from the model using Pydantic (Python), Zod (TypeScript), or JSON Schema
+index: 4
+---
+
+You can enforce a particular response format from an LLM by providing a schema (JSON or `zod`) to the `.respond()` method. This guarantees that the model's output conforms to the schema you provide.
+
+## Enforce Using a `zod` Schema
+
+If you wish the model to generate JSON that satisfies a given schema, it is recommended to provide
+the schema using [`zod`](https://zod.dev/). When a `zod` schema is provided, the prediction result will contain an extra field `parsed`, which contains the parsed, validated, and typed result.
+
+#### Define a `zod` Schema
+
+```ts
+import { z } from "zod";
+
+// A zod schema for a book
+const bookSchema = z.object({
+ title: z.string(),
+ author: z.string(),
+ year: z.number().int(),
+});
+```
+
+#### Generate a Structured Response
+
+```typescript tab="Non-streaming"
+const result = await model.respond("Tell me about The Hobbit.", {
+  structured: bookSchema,
+  maxTokens: 100, // Recommended to avoid getting stuck
+});
+
+const book = result.parsed;
+console.info(book);
+// ^
+// Note that `book` is now correctly typed as { title: string, author: string, year: number }
+```
+
+```typescript tab="Streaming"
+const prediction = model.respond("Tell me about The Hobbit.", {
+  structured: bookSchema,
+  maxTokens: 100, // Recommended to avoid getting stuck
+});
+
+for await (const { content } of prediction) {
+ process.stdout.write(content);
+}
+process.stdout.write("\n");
+
+// Get the final structured result
+const result = await prediction.result();
+const book = result.parsed;
+
+console.info(book);
+// ^
+// Note that `book` is now correctly typed as { title: string, author: string, year: number }
+```
+
+## Enforce Using a JSON Schema
+
+You can also enforce a structured response using a JSON schema.
+
+#### Define a JSON Schema
+
+```ts
+// A JSON schema for a book
+const schema = {
+ type: "object",
+ properties: {
+ title: { type: "string" },
+ author: { type: "string" },
+ year: { type: "integer" },
+ },
+ required: ["title", "author", "year"],
+};
+```
+
+#### Generate a Structured Response
+
+```typescript tab="Non-streaming"
+const result = await model.respond("Tell me about The Hobbit.", {
+ structured: {
+ type: "json",
+ jsonSchema: schema,
+ },
+ maxTokens: 100, // Recommended to avoid getting stuck
+});
+
+const book = JSON.parse(result.content);
+console.info(book);
+```
+
+```typescript tab="Streaming"
+const prediction = model.respond("Tell me about The Hobbit.", {
+ structured: {
+ type: "json",
+ jsonSchema: schema,
+ },
+ maxTokens: 100, // Recommended to avoid getting stuck
+});
+
+for await (const { content } of prediction) {
+ process.stdout.write(content);
+}
+process.stdout.write("\n");
+
+const result = await prediction.result();
+const book = JSON.parse(result.content);
+
+console.info("Parsed", book);
+```
+
+
+Structured generation works by constraining the model to only generate tokens that conform to the provided schema. This ensures valid output in normal cases, but comes with two important limitations:
+
+1. Models (especially smaller ones) may occasionally get stuck in an unclosed structure (such as an open bracket) when they "forget" they are inside one and cannot stop because of the schema requirements. It is therefore recommended to always include a `maxTokens` parameter to prevent unbounded generation.
+
+2. Schema compliance is only guaranteed for complete, successful generations. If generation is interrupted (by cancellation, reaching the `maxTokens` limit, or other reasons), the output will likely violate the schema. With a `zod` schema, this raises an error; with a JSON schema, you'll receive an invalid string that doesn't satisfy the schema.
+
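+Because an interrupted generation on the `zod` path surfaces as an error, it is good practice to cap `maxTokens` and guard the call. A minimal sketch (reusing `model` and `bookSchema` from above; the error handling shown is illustrative, not a prescribed SDK pattern):
+
+```typescript
+try {
+  const result = await model.respond("Tell me about The Hobbit.", {
+    structured: bookSchema,
+    maxTokens: 100, // Cap generation so it cannot run forever
+  });
+  console.info(result.parsed); // Parsed, validated, and typed
+} catch (error) {
+  // If generation was cut off (e.g. by maxTokens), schema validation can fail here
+  console.error("Structured generation failed:", error);
+}
+```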
diff --git a/2_typescript/2_llm-prediction/working-with-chats.md b/2_typescript/2_llm-prediction/working-with-chats.md
index 028884c..57c7c21 100644
--- a/2_typescript/2_llm-prediction/working-with-chats.md
+++ b/2_typescript/2_llm-prediction/working-with-chats.md
@@ -11,78 +11,62 @@ takes in a chat parameter as an input. There are a few ways to represent a chat
You can use an array of messages to represent a chat. Here is an example with the `.respond()` method.
-```lms_code_snippet
-variants:
- "Text-only":
- language: typescript
- code: |
- const prediction = model.respond([
- { role: "system", content: "You are a resident AI philosopher." },
- { role: "user", content: "What is the meaning of life?" },
- ]);
- With Images:
- language: typescript
- code: |
- const image = await client.files.prepareImage("/path/to/image.jpg");
-
- const prediction = model.respond([
- { role: "system", content: "You are a state-of-art object recognition system." },
- { role: "user", content: "What is this object?", images: [image] },
- ]);
+```typescript tab="Text-only"
+const prediction = model.respond([
+ { role: "system", content: "You are a resident AI philosopher." },
+ { role: "user", content: "What is the meaning of life?" },
+]);
+```
+
+```typescript tab="With Images"
+const image = await client.files.prepareImage("/path/to/image.jpg");
+
+const prediction = model.respond([
+ { role: "system", content: "You are a state-of-art object recognition system." },
+ { role: "user", content: "What is this object?", images: [image] },
+]);
```
## Option 2: Input a Single String
If your chat has only a single user message, you can use a single string to represent the chat. Here is an example with the `.respond` method.
-```lms_code_snippet
-variants:
- TypeScript:
- language: typescript
- code: |
- const prediction = model.respond("What is the meaning of life?");
+```typescript
+const prediction = model.respond("What is the meaning of life?");
```
## Option 3: Using the `Chat` Helper Class
For more complex tasks, it is recommended to use the `Chat` helper class. It provides various commonly used methods to manage the chat. Here is an example with the `Chat` class.
-```lms_code_snippet
-variants:
- "Text-only":
- language: typescript
- code: |
- const chat = Chat.empty();
- chat.append("system", "You are a resident AI philosopher.");
- chat.append("user", "What is the meaning of life?");
-
- const prediction = model.respond(chat);
- With Images:
- language: typescript
- code: |
- const image = await client.files.prepareImage("/path/to/image.jpg");
-
- const chat = Chat.empty();
- chat.append("system", "You are a state-of-art object recognition system.");
- chat.append("user", "What is this object?", { images: [image] });
-
- const prediction = model.respond(chat);
+```typescript tab="Text-only"
+const chat = Chat.empty();
+chat.append("system", "You are a resident AI philosopher.");
+chat.append("user", "What is the meaning of life?");
+
+const prediction = model.respond(chat);
+```
+
+```typescript tab="With Images"
+const image = await client.files.prepareImage("/path/to/image.jpg");
+
+const chat = Chat.empty();
+chat.append("system", "You are a state-of-art object recognition system.");
+chat.append("user", "What is this object?", { images: [image] });
+
+const prediction = model.respond(chat);
```
You can also quickly construct a `Chat` object using the `Chat.from` method.
-```lms_code_snippet
-variants:
- "Array of messages":
- language: typescript
- code: |
- const chat = Chat.from([
- { role: "system", content: "You are a resident AI philosopher." },
- { role: "user", content: "What is the meaning of life?" },
- ]);
- "Single string":
- language: typescript
- code: |
- // This constructs a chat with a single user message
- const chat = Chat.from("What is the meaning of life?");
+```typescript tab="Array of messages"
+const chat = Chat.from([
+ { role: "system", content: "You are a resident AI philosopher." },
+ { role: "user", content: "What is the meaning of life?" },
+]);
+```
+
+```typescript tab="Single string"
+// This constructs a chat with a single user message
+const chat = Chat.from("What is the meaning of life?");
```
diff --git a/2_typescript/3_agent/act.md b/2_typescript/3_agent/act.md
index f49a9c0..2fa09c9 100644
--- a/2_typescript/3_agent/act.md
+++ b/2_typescript/3_agent/act.md
@@ -1,6 +1,6 @@
---
-title: The `.act()` call
-description: How to use the `.act()` call to turn LLMs into autonomous agents that can perform tasks on your local machine.
+title: The .act() call
+description: How to use the .act() call to turn LLMs into autonomous agents that can perform tasks on your local machine.
index: 1
---
@@ -24,27 +24,23 @@ With this in mind, we say that the `.act()` API is an automatic "multi-round" to
### Quick Example
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient, tool } from "@lmstudio/sdk";
- import { z } from "zod";
-
- const client = new LMStudioClient();
-
- const multiplyTool = tool({
- name: "multiply",
- description: "Given two numbers a and b. Returns the product of them.",
- parameters: { a: z.number(), b: z.number() },
- implementation: ({ a, b }) => a * b,
- });
-
- const model = await client.llm.model("qwen2.5-7b-instruct");
- await model.act("What is the result of 12345 multiplied by 54321?", [multiplyTool], {
- onMessage: (message) => console.info(message.toString()),
- });
+```typescript
+import { LMStudioClient, tool } from "@lmstudio/sdk";
+import { z } from "zod";
+
+const client = new LMStudioClient();
+
+const multiplyTool = tool({
+ name: "multiply",
+ description: "Given two numbers a and b. Returns the product of them.",
+ parameters: { a: z.number(), b: z.number() },
+ implementation: ({ a, b }) => a * b,
+});
+
+const model = await client.llm.model("qwen2.5-7b-instruct");
+await model.act("What is the result of 12345 multiplied by 54321?", [multiplyTool], {
+ onMessage: (message) => console.info(message.toString()),
+});
```
> **_NOTE:_** At this time, this code expects zod v3.
@@ -70,91 +66,83 @@ Some general guidance when selecting a model:
The following code demonstrates how to provide multiple tools in a single `.act()` call.
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient, tool } from "@lmstudio/sdk";
- import { z } from "zod";
-
- const client = new LMStudioClient();
-
- const additionTool = tool({
- name: "add",
- description: "Given two numbers a and b. Returns the sum of them.",
- parameters: { a: z.number(), b: z.number() },
- implementation: ({ a, b }) => a + b,
- });
-
- const isPrimeTool = tool({
- name: "isPrime",
- description: "Given a number n. Returns true if n is a prime number.",
- parameters: { n: z.number() },
- implementation: ({ n }) => {
- if (n < 2) return false;
- const sqrt = Math.sqrt(n);
- for (let i = 2; i <= sqrt; i++) {
- if (n % i === 0) return false;
- }
- return true;
- },
- });
-
- const model = await client.llm.model("qwen2.5-7b-instruct");
- await model.act(
- "Is the result of 12345 + 45668 a prime? Think step by step.",
- [additionTool, isPrimeTool],
- { onMessage: (message) => console.info(message.toString()) },
- );
+```typescript
+import { LMStudioClient, tool } from "@lmstudio/sdk";
+import { z } from "zod";
+
+const client = new LMStudioClient();
+
+const additionTool = tool({
+ name: "add",
+ description: "Given two numbers a and b. Returns the sum of them.",
+ parameters: { a: z.number(), b: z.number() },
+ implementation: ({ a, b }) => a + b,
+});
+
+const isPrimeTool = tool({
+ name: "isPrime",
+ description: "Given a number n. Returns true if n is a prime number.",
+ parameters: { n: z.number() },
+ implementation: ({ n }) => {
+ if (n < 2) return false;
+ const sqrt = Math.sqrt(n);
+ for (let i = 2; i <= sqrt; i++) {
+ if (n % i === 0) return false;
+ }
+ return true;
+ },
+});
+
+const model = await client.llm.model("qwen2.5-7b-instruct");
+await model.act(
+ "Is the result of 12345 + 45668 a prime? Think step by step.",
+ [additionTool, isPrimeTool],
+ { onMessage: (message) => console.info(message.toString()) },
+);
```
### Example: Chat Loop with Create File Tool
The following code creates a conversation loop with an LLM agent that can create files.
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { Chat, LMStudioClient, tool } from "@lmstudio/sdk";
- import { existsSync } from "fs";
- import { writeFile } from "fs/promises";
- import { createInterface } from "readline/promises";
- import { z } from "zod";
-
- const rl = createInterface({ input: process.stdin, output: process.stdout });
- const client = new LMStudioClient();
- const model = await client.llm.model();
- const chat = Chat.empty();
-
- const createFileTool = tool({
- name: "createFile",
- description: "Create a file with the given name and content.",
- parameters: { name: z.string(), content: z.string() },
- implementation: async ({ name, content }) => {
- if (existsSync(name)) {
- return "Error: File already exists.";
- }
- await writeFile(name, content, "utf-8");
- return "File created.";
- },
- });
-
- while (true) {
- const input = await rl.question("You: ");
- // Append the user input to the chat
- chat.append("user", input);
-
- process.stdout.write("Bot: ");
- await model.act(chat, [createFileTool], {
- // When the model finish the entire message, push it to the chat
- onMessage: (message) => chat.append(message),
- onPredictionFragment: ({ content }) => {
- process.stdout.write(content);
- },
- });
- process.stdout.write("\n");
- }
+```typescript
+import { Chat, LMStudioClient, tool } from "@lmstudio/sdk";
+import { existsSync } from "fs";
+import { writeFile } from "fs/promises";
+import { createInterface } from "readline/promises";
+import { z } from "zod";
+
+const rl = createInterface({ input: process.stdin, output: process.stdout });
+const client = new LMStudioClient();
+const model = await client.llm.model();
+const chat = Chat.empty();
+
+const createFileTool = tool({
+ name: "createFile",
+ description: "Create a file with the given name and content.",
+ parameters: { name: z.string(), content: z.string() },
+ implementation: async ({ name, content }) => {
+ if (existsSync(name)) {
+ return "Error: File already exists.";
+ }
+ await writeFile(name, content, "utf-8");
+ return "File created.";
+ },
+});
+
+while (true) {
+ const input = await rl.question("You: ");
+ // Append the user input to the chat
+ chat.append("user", input);
+
+ process.stdout.write("Bot: ");
+ await model.act(chat, [createFileTool], {
+ // When the model finish the entire message, push it to the chat
+ onMessage: (message) => chat.append(message),
+ onPredictionFragment: ({ content }) => {
+ process.stdout.write(content);
+ },
+ });
+ process.stdout.write("\n");
+}
```
diff --git a/2_typescript/3_agent/meta.json b/2_typescript/3_agent/meta.json
new file mode 100644
index 0000000..64a4fe7
--- /dev/null
+++ b/2_typescript/3_agent/meta.json
@@ -0,0 +1,8 @@
+{
+ "title": "Agentic Flows",
+ "pages": [
+ "act",
+ "_index",
+ "tools"
+ ]
+}
diff --git a/2_typescript/3_agent/tools.md b/2_typescript/3_agent/tools.md
index 560bbcf..f334cf8 100644
--- a/2_typescript/3_agent/tools.md
+++ b/2_typescript/3_agent/tools.md
@@ -1,6 +1,6 @@
---
title: Tool Definition
-description: Define tools with the `tool()` function and pass them to the model in the `act()` call.
+description: Define tools with the tool() function and pass them to the model in the act() call.
index: 2
---
@@ -10,28 +10,23 @@ You can define tools with the `tool()` function and pass them to the model in th
Follow this standard format to define functions as tools:
-```lms_code_snippet
- title: "index.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { tool } from "@lmstudio/sdk";
- import { z } from "zod";
+```typescript title="index.ts"
+import { tool } from "@lmstudio/sdk";
+import { z } from "zod";
- const exampleTool = tool({
- // The name of the tool
- name: "add",
+const exampleTool = tool({
+ // The name of the tool
+ name: "add",
- // A description of the tool
- description: "Given two numbers a and b. Returns the sum of them.",
+ // A description of the tool
+ description: "Given two numbers a and b. Returns the sum of them.",
- // zod schema of the parameters
- parameters: { a: z.number(), b: z.number() },
+ // zod schema of the parameters
+ parameters: { a: z.number(), b: z.number() },
- // The implementation of the tool. Just a regular function.
- implementation: ({ a, b }) => a + b,
- });
+ // The implementation of the tool. Just a regular function.
+ implementation: ({ a, b }) => a + b,
+});
```
**Important**: The tool name, description, and the parameter definitions are all passed to the model!
@@ -47,47 +42,37 @@ can essentially turn your LLMs into autonomous agents that can perform tasks on
### Tool Definition
-```lms_code_snippet
- title: "createFileTool.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { tool } from "@lmstudio/sdk";
- import { existsSync } from "fs";
- import { writeFile } from "fs/promises";
- import { z } from "zod";
-
- const createFileTool = tool({
- name: "createFile",
- description: "Create a file with the given name and content.",
- parameters: { name: z.string(), content: z.string() },
- implementation: async ({ name, content }) => {
- if (existsSync(name)) {
- return "Error: File already exists.";
- }
- await writeFile(name, content, "utf-8");
- return "File created.";
- },
- });
+```typescript title="createFileTool.ts"
+import { tool } from "@lmstudio/sdk";
+import { existsSync } from "fs";
+import { writeFile } from "fs/promises";
+import { z } from "zod";
+
+const createFileTool = tool({
+ name: "createFile",
+ description: "Create a file with the given name and content.",
+ parameters: { name: z.string(), content: z.string() },
+ implementation: async ({ name, content }) => {
+ if (existsSync(name)) {
+ return "Error: File already exists.";
+ }
+ await writeFile(name, content, "utf-8");
+ return "File created.";
+ },
+});
```
### Example code using the `createFile` tool:
-```lms_code_snippet
- title: "index.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
- import { createFileTool } from "./createFileTool";
-
- const client = new LMStudioClient();
-
- const model = await client.llm.model("qwen2.5-7b-instruct");
- await model.act(
- "Please create a file named output.txt with your understanding of the meaning of life.",
- [createFileTool],
- );
+```typescript title="index.ts"
+import { LMStudioClient } from "@lmstudio/sdk";
+import { createFileTool } from "./createFileTool";
+
+const client = new LMStudioClient();
+
+const model = await client.llm.model("qwen2.5-7b-instruct");
+await model.act(
+ "Please create a file named output.txt with your understanding of the meaning of life.",
+ [createFileTool],
+);
```
diff --git a/2_typescript/3_plugins/1_tools-provider/custom-configuration.md b/2_typescript/3_plugins/1_tools-provider/custom-configuration.md
deleted file mode 100644
index fd6f804..0000000
--- a/2_typescript/3_plugins/1_tools-provider/custom-configuration.md
+++ /dev/null
@@ -1,81 +0,0 @@
----
-title: "Custom Configuration"
-description: "Add custom configuration options to your tools provider"
-index: 5
----
-
-You can add custom configuration options to your tools provider, so the user of plugin can customize the behavior without modifying the code.
-
-In the example below, we will ask the user to specify a folder name, and we will create files inside that folder within the working directory.
-
-First, add the config field to `config.ts`:
-
-```lms_code_snippet
- title: "src/config.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- export const configSchematics = createConfigSchematics()
- .field(
- "folderName", // Key of the configuration field
- "string", // Type of the configuration field
- {
- displayName: "Folder Name",
- subtitle: "The name of the folder where files will be created.",
- },
- "default_folder", // Default value
- )
- .build();
-```
-
-```lms_info
-In this example, we added the field to `configSchematics`, which is the "per-chat" configuration. If you want to add a global configuration field that is shared across different chats, you should add it under the section `globalConfigSchematics` in the same file.
-
-Learn more about configurations in [Custom Configurations](../plugins/configurations).
-```
-
-Then, modify the tools provider to use the configuration value:
-
-```lms_code_snippet
- title: "src/toolsProvider.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { tool, Tool, ToolsProviderController } from "@lmstudio/sdk";
- import { existsSync } from "fs";
- import { mkdir, writeFile } from "fs/promises";
- import { join } from "path";
- import { z } from "zod";
- import { configSchematics } from "./config";
-
- export async function toolsProvider(ctl: ToolsProviderController) {
- const tools: Tool[] = [];
-
- const createFileTool = tool({
- name: `create_file`,
- description: "Create a file with the given name and content.",
- parameters: { file_name: z.string(), content: z.string() },
- implementation: async ({ file_name, content }) => {
- // Read the config field
- const folderName = ctl.getPluginConfig(configSchematics).get("folderName");
- const folderPath = join(ctl.getWorkingDirectory(), folderName);
-
- // Ensure the folder exists
- await mkdir(folderPath, { recursive: true });
-
- // Create the file
- const filePath = join(folderPath, file_name);
- if (existsSync(filePath)) {
- return "Error: File already exists.";
- }
- await writeFile(filePath, content, "utf-8");
- return "File created.";
- },
- });
- tools.push(createFileTool); // First tool
-
- return tools; // Return the tools array
- }
-```
diff --git a/2_typescript/3_plugins/1_tools-provider/custom-configuration.mdx b/2_typescript/3_plugins/1_tools-provider/custom-configuration.mdx
new file mode 100644
index 0000000..f1a76e7
--- /dev/null
+++ b/2_typescript/3_plugins/1_tools-provider/custom-configuration.mdx
@@ -0,0 +1,71 @@
+---
+title: "Custom Configuration"
+description: "Add custom configuration options to your tools provider"
+index: 5
+---
+
+You can add custom configuration options to your tools provider, so the user of the plugin can customize its behavior without modifying the code.
+
+In the example below, we will ask the user to specify a folder name, and we will create files inside that folder within the working directory.
+
+First, add the config field to `config.ts`:
+
+```typescript title="src/config.ts"
+export const configSchematics = createConfigSchematics()
+ .field(
+ "folderName", // Key of the configuration field
+ "string", // Type of the configuration field
+ {
+ displayName: "Folder Name",
+ subtitle: "The name of the folder where files will be created.",
+ },
+ "default_folder", // Default value
+ )
+ .build();
+```
+
+
+In this example, we added the field to `configSchematics`, which is the "per-chat" configuration. If you want to add a global configuration field that is shared across different chats, you should add it under the section `globalConfigSchematics` in the same file.
+
+Learn more about configurations in [Custom Configurations](../plugins/configurations).
+
+
+Then, modify the tools provider to use the configuration value:
+
+```typescript title="src/toolsProvider.ts"
+import { tool, Tool, ToolsProviderController } from "@lmstudio/sdk";
+import { existsSync } from "fs";
+import { mkdir, writeFile } from "fs/promises";
+import { join } from "path";
+import { z } from "zod";
+import { configSchematics } from "./config";
+
+export async function toolsProvider(ctl: ToolsProviderController) {
+ const tools: Tool[] = [];
+
+ const createFileTool = tool({
+ name: `create_file`,
+ description: "Create a file with the given name and content.",
+ parameters: { file_name: z.string(), content: z.string() },
+ implementation: async ({ file_name, content }) => {
+ // Read the config field
+ const folderName = ctl.getPluginConfig(configSchematics).get("folderName");
+ const folderPath = join(ctl.getWorkingDirectory(), folderName);
+
+ // Ensure the folder exists
+ await mkdir(folderPath, { recursive: true });
+
+ // Create the file
+ const filePath = join(folderPath, file_name);
+ if (existsSync(filePath)) {
+ return "Error: File already exists.";
+ }
+ await writeFile(filePath, content, "utf-8");
+ return "File created.";
+ },
+ });
+ tools.push(createFileTool); // First tool
+
+ return tools; // Return the tools array
+}
+```
diff --git a/2_typescript/3_plugins/1_tools-provider/handling-aborts.md b/2_typescript/3_plugins/1_tools-provider/handling-aborts.md
index b54f0d5..be00a51 100644
--- a/2_typescript/3_plugins/1_tools-provider/handling-aborts.md
+++ b/2_typescript/3_plugins/1_tools-provider/handling-aborts.md
@@ -6,42 +6,37 @@ index: 7
A prediction may be aborted by the user while your tool is still running. In such cases, you should handle the abort gracefully by observing the `AbortSignal` object passed as the second parameter to the tool's implementation function.
-```lms_code_snippet
- title: "src/toolsProvider.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { tool, Tool, ToolsProviderController } from "@lmstudio/sdk";
- import { z } from "zod";
+```typescript title="src/toolsProvider.ts"
+import { tool, Tool, ToolsProviderController } from "@lmstudio/sdk";
+import { z } from "zod";
- export async function toolsProvider(ctl: ToolsProviderController) {
- const tools: Tool[] = [];
+export async function toolsProvider(ctl: ToolsProviderController) {
+ const tools: Tool[] = [];
- const fetchTool = tool({
- name: `fetch`,
- description: "Fetch a URL using GET method.",
- parameters: { url: z.string() },
- implementation: async ({ url }, { signal }) => {
- const response = await fetch(url, {
- method: "GET",
- signal, // <-- Here, we pass the signal to fetch to allow cancellation
- });
- if (!response.ok) {
- return `Error: Failed to fetch ${url}: ${response.statusText}`;
- }
- const data = await response.text();
- return {
- status: response.status,
- headers: Object.fromEntries(response.headers.entries()),
- data: data.substring(0, 1000), // Limit to 1000 characters
- };
- },
- });
- tools.push(fetchTool);
+ const fetchTool = tool({
+ name: `fetch`,
+ description: "Fetch a URL using GET method.",
+ parameters: { url: z.string() },
+ implementation: async ({ url }, { signal }) => {
+ const response = await fetch(url, {
+ method: "GET",
+ signal, // <-- Here, we pass the signal to fetch to allow cancellation
+ });
+ if (!response.ok) {
+ return `Error: Failed to fetch ${url}: ${response.statusText}`;
+ }
+ const data = await response.text();
+ return {
+ status: response.status,
+ headers: Object.fromEntries(response.headers.entries()),
+ data: data.substring(0, 1000), // Limit to 1000 characters
+ };
+ },
+ });
+ tools.push(fetchTool);
- return tools;
- }
+ return tools;
+}
```
You can learn more about `AbortSignal` in the [MDN documentation](https://developer.mozilla.org/en-US/docs/Web/API/AbortSignal).
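+The `fetch` example above delegates cancellation to an abort-aware API. For tools that do their own work in a loop, a minimal sketch (with a hypothetical `count` tool) is to check the signal yourself via `signal.throwIfAborted()`:
+
+```typescript title="src/toolsProvider.ts"
+import { tool } from "@lmstudio/sdk";
+import { z } from "zod";
+
+const countTool = tool({
+  name: `count`,
+  description: "Count to n, one number per 100 ms.",
+  parameters: { n: z.number().min(1) },
+  implementation: async ({ n }, { signal }) => {
+    for (let i = 1; i <= n; i++) {
+      signal.throwIfAborted(); // Stop promptly if the prediction is aborted
+      await new Promise((resolve) => setTimeout(resolve, 100));
+    }
+    return `Counted to ${n}.`;
+  },
+});
+```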
diff --git a/2_typescript/3_plugins/1_tools-provider/meta.json b/2_typescript/3_plugins/1_tools-provider/meta.json
new file mode 100644
index 0000000..6eb3fbf
--- /dev/null
+++ b/2_typescript/3_plugins/1_tools-provider/meta.json
@@ -0,0 +1,10 @@
+{
+ "title": "Tools Provider",
+ "pages": [
+ "custom-configuration",
+ "handling-aborts",
+ "multiple-tools",
+ "single-tool",
+ "status-reports-and-warnings"
+ ]
+}
diff --git a/2_typescript/3_plugins/1_tools-provider/multiple-tools.md b/2_typescript/3_plugins/1_tools-provider/multiple-tools.md
index 7b53404..0e9c8b2 100644
--- a/2_typescript/3_plugins/1_tools-provider/multiple-tools.md
+++ b/2_typescript/3_plugins/1_tools-provider/multiple-tools.md
@@ -8,51 +8,46 @@ A tools provider can define multiple tools for the model to use. Simply create a
In the example below, we add a second tool to read the content of a file:
-```lms_code_snippet
- title: "src/toolsProvider.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { tool, Tool, ToolsProviderController } from "@lmstudio/sdk";
- import { z } from "zod";
- import { existsSync } from "fs";
- import { readFile, writeFile } from "fs/promises";
- import { join } from "path";
+```typescript title="src/toolsProvider.ts"
+import { tool, Tool, ToolsProviderController } from "@lmstudio/sdk";
+import { z } from "zod";
+import { existsSync } from "fs";
+import { readFile, writeFile } from "fs/promises";
+import { join } from "path";
- export async function toolsProvider(ctl: ToolsProviderController) {
- const tools: Tool[] = [];
+export async function toolsProvider(ctl: ToolsProviderController) {
+ const tools: Tool[] = [];
- const createFileTool = tool({
- name: `create_file`,
- description: "Create a file with the given name and content.",
- parameters: { file_name: z.string(), content: z.string() },
- implementation: async ({ file_name, content }) => {
- const filePath = join(ctl.getWorkingDirectory(), file_name);
- if (existsSync(filePath)) {
- return "Error: File already exists.";
- }
- await writeFile(filePath, content, "utf-8");
- return "File created.";
- },
- });
- tools.push(createFileTool); // First tool
+ const createFileTool = tool({
+ name: `create_file`,
+ description: "Create a file with the given name and content.",
+ parameters: { file_name: z.string(), content: z.string() },
+ implementation: async ({ file_name, content }) => {
+ const filePath = join(ctl.getWorkingDirectory(), file_name);
+ if (existsSync(filePath)) {
+ return "Error: File already exists.";
+ }
+ await writeFile(filePath, content, "utf-8");
+ return "File created.";
+ },
+ });
+ tools.push(createFileTool); // First tool
- const readFileTool = tool({
- name: `read_file`,
- description: "Read the content of a file with the given name.",
- parameters: { file_name: z.string() },
- implementation: async ({ file_name }) => {
- const filePath = join(ctl.getWorkingDirectory(), file_name);
- if (!existsSync(filePath)) {
- return "Error: File does not exist.";
- }
- const content = await readFile(filePath, "utf-8");
- return content;
- },
- });
- tools.push(readFileTool); // Second tool
+ const readFileTool = tool({
+ name: `read_file`,
+ description: "Read the content of a file with the given name.",
+ parameters: { file_name: z.string() },
+ implementation: async ({ file_name }) => {
+ const filePath = join(ctl.getWorkingDirectory(), file_name);
+ if (!existsSync(filePath)) {
+ return "Error: File does not exist.";
+ }
+ const content = await readFile(filePath, "utf-8");
+ return content;
+ },
+ });
+ tools.push(readFileTool); // Second tool
- return tools; // Return the tools array
- }
+ return tools; // Return the tools array
+}
```
diff --git a/2_typescript/3_plugins/1_tools-provider/single-tool.md b/2_typescript/3_plugins/1_tools-provider/single-tool.md
index 736b8e4..a6c5ba3 100644
--- a/2_typescript/3_plugins/1_tools-provider/single-tool.md
+++ b/2_typescript/3_plugins/1_tools-provider/single-tool.md
@@ -6,63 +6,53 @@ index: 3
To set up a tools provider, first create a file `toolsProvider.ts` in your plugin's `src` directory:
-```lms_code_snippet
- title: "src/toolsProvider.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { tool, Tool, ToolsProviderController } from "@lmstudio/sdk";
- import { z } from "zod";
- import { existsSync } from "fs";
- import { writeFile } from "fs/promises";
- import { join } from "path";
-
- export async function toolsProvider(ctl: ToolsProviderController) {
- const tools: Tool[] = [];
-
- const createFileTool = tool({
- // Name of the tool, this will be passed to the model. Aim for concise, descriptive names
- name: `create_file`,
- // Your description here, more details will help the model to understand when to use the tool
- description: "Create a file with the given name and content.",
- parameters: { file_name: z.string(), content: z.string() },
- implementation: async ({ file_name, content }) => {
- const filePath = join(ctl.getWorkingDirectory(), file_name);
- if (existsSync(filePath)) {
- return "Error: File already exists.";
- }
- await writeFile(filePath, content, "utf-8");
- return "File created.";
- },
- });
- tools.push(createFileTool);
-
- return tools;
- }
+```typescript title="src/toolsProvider.ts"
+import { tool, Tool, ToolsProviderController } from "@lmstudio/sdk";
+import { z } from "zod";
+import { existsSync } from "fs";
+import { writeFile } from "fs/promises";
+import { join } from "path";
+
+export async function toolsProvider(ctl: ToolsProviderController) {
+ const tools: Tool[] = [];
+
+ const createFileTool = tool({
+    // Name of the tool; this will be passed to the model. Aim for concise, descriptive names
+    name: `create_file`,
+    // Your description here; more detail helps the model understand when to use the tool
+ description: "Create a file with the given name and content.",
+ parameters: { file_name: z.string(), content: z.string() },
+ implementation: async ({ file_name, content }) => {
+ const filePath = join(ctl.getWorkingDirectory(), file_name);
+ if (existsSync(filePath)) {
+ return "Error: File already exists.";
+ }
+ await writeFile(filePath, content, "utf-8");
+ return "File created.";
+ },
+ });
+ tools.push(createFileTool);
+
+ return tools;
+}
```
The above tools provider defines a single tool called `create_file` that allows the model to create a file with a specified name and content inside the working directory. You can learn more about defining tools in [Tool Definition](../agent/tools).
Then register the tools provider in your plugin's `index.ts`:
-```lms_code_snippet
- title: "src/index.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- // ... other imports ...
- import { toolsProvider } from "./toolsProvider";
+```typescript title="src/index.ts"
+// ... other imports ...
+import { toolsProvider } from "./toolsProvider";
- export async function main(context: PluginContext) {
- // ... other plugin setup code ...
+export async function main(context: PluginContext) {
+ // ... other plugin setup code ...
- // Register the tools provider.
- context.withToolsProvider(toolsProvider); // <-- Register the tools provider
+ // Register the tools provider.
+ context.withToolsProvider(toolsProvider); // <-- Register the tools provider
- // ... other plugin setup code ...
- }
+ // ... other plugin setup code ...
+}
```
Now you can ask the LLM to create a file, and it should be able to do so using the tool you just created.
diff --git a/2_typescript/3_plugins/1_tools-provider/status-reports-and-warnings.md b/2_typescript/3_plugins/1_tools-provider/status-reports-and-warnings.md
index 333676b..75270fb 100644
--- a/2_typescript/3_plugins/1_tools-provider/status-reports-and-warnings.md
+++ b/2_typescript/3_plugins/1_tools-provider/status-reports-and-warnings.md
@@ -10,36 +10,31 @@ You can use `status` and `warn` methods on the second parameter of the tool's im
The following example shows how to implement a tool that waits for a specified number of seconds, providing status updates along the way and a warning if the wait time exceeds 10 seconds:
-```lms_code_snippet
- title: "src/toolsProvider.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { tool, Tool, ToolsProviderController } from "@lmstudio/sdk";
- import { z } from "zod";
+```typescript title="src/toolsProvider.ts"
+import { tool, Tool, ToolsProviderController } from "@lmstudio/sdk";
+import { z } from "zod";
- export async function toolsProvider(ctl: ToolsProviderController) {
- const tools: Tool[] = [];
+export async function toolsProvider(ctl: ToolsProviderController) {
+ const tools: Tool[] = [];
- const waitTool = tool({
- name: `wait`,
- description: "Wait for a specified number of seconds.",
- parameters: { seconds: z.number().min(1) },
- implementation: async ({ seconds }, { status, warn }) => {
- if (seconds > 10) {
- warn("The model asks to wait for more than 10 seconds.");
- }
- for (let i = 0; i < seconds; i++) {
- status(`Waiting... ${i + 1}/${seconds} seconds`);
- await new Promise((resolve) => setTimeout(resolve, 1000));
- }
- },
- });
- tools.push(waitTool);
+ const waitTool = tool({
+ name: `wait`,
+ description: "Wait for a specified number of seconds.",
+ parameters: { seconds: z.number().min(1) },
+ implementation: async ({ seconds }, { status, warn }) => {
+ if (seconds > 10) {
+ warn("The model asks to wait for more than 10 seconds.");
+ }
+ for (let i = 0; i < seconds; i++) {
+ status(`Waiting... ${i + 1}/${seconds} seconds`);
+ await new Promise((resolve) => setTimeout(resolve, 1000));
+ }
+ },
+ });
+ tools.push(waitTool);
- return tools; // Return the tools array
- }
+ return tools; // Return the tools array
+}
```
Note that status updates and warnings are only visible to the user. If you want the model to also see those messages, you should return them as part of the tool's return value.
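+For example, a minimal sketch of the `wait` tool above that also echoes the warning back to the model (the exact wording is illustrative):
+
+```typescript title="src/toolsProvider.ts"
+import { tool } from "@lmstudio/sdk";
+import { z } from "zod";
+
+const waitTool = tool({
+  name: `wait`,
+  description: "Wait for a specified number of seconds.",
+  parameters: { seconds: z.number().min(1) },
+  implementation: async ({ seconds }, { status, warn }) => {
+    const notes: string[] = [];
+    if (seconds > 10) {
+      warn("The model asks to wait for more than 10 seconds."); // Visible to the user only
+      notes.push("Note: the requested wait exceeded 10 seconds."); // Included in the return value below
+    }
+    for (let i = 0; i < seconds; i++) {
+      status(`Waiting... ${i + 1}/${seconds} seconds`);
+      await new Promise((resolve) => setTimeout(resolve, 1000));
+    }
+    return ["Done waiting.", ...notes].join(" "); // The return value is what the model sees
+  },
+});
+```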
@@ -48,42 +43,37 @@ Note status updates and warnings are only visible to the user. If you want the m
A prediction may be aborted by the user while your tool is still running. In such cases, you should handle the abort gracefully by observing the `AbortSignal` object passed as the second parameter to the tool's implementation function.
-```lms_code_snippet
- title: "src/toolsProvider.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { tool, Tool, ToolsProviderController } from "@lmstudio/sdk";
- import { z } from "zod";
+```typescript title="src/toolsProvider.ts"
+import { tool, Tool, ToolsProviderController } from "@lmstudio/sdk";
+import { z } from "zod";
- export async function toolsProvider(ctl: ToolsProviderController) {
- const tools: Tool[] = [];
+export async function toolsProvider(ctl: ToolsProviderController) {
+ const tools: Tool[] = [];
- const fetchTool = tool({
- name: `fetch`,
- description: "Fetch a URL using GET method.",
- parameters: { url: z.string() },
- implementation: async ({ url }, { signal }) => {
- const response = await fetch(url, {
- method: "GET",
- signal, // <-- Here, we pass the signal to fetch to allow cancellation
- });
- if (!response.ok) {
- return `Error: Failed to fetch ${url}: ${response.statusText}`;
- }
- const data = await response.text();
- return {
- status: response.status,
- headers: Object.fromEntries(response.headers.entries()),
- data: data.substring(0, 1000), // Limit to 1000 characters
- };
- },
- });
- tools.push(fetchTool);
+ const fetchTool = tool({
+ name: `fetch`,
+ description: "Fetch a URL using GET method.",
+ parameters: { url: z.string() },
+ implementation: async ({ url }, { signal }) => {
+ const response = await fetch(url, {
+ method: "GET",
+ signal, // <-- Here, we pass the signal to fetch to allow cancellation
+ });
+ if (!response.ok) {
+ return `Error: Failed to fetch ${url}: ${response.statusText}`;
+ }
+ const data = await response.text();
+ return {
+ status: response.status,
+ headers: Object.fromEntries(response.headers.entries()),
+ data: data.substring(0, 1000), // Limit to 1000 characters
+ };
+ },
+ });
+ tools.push(fetchTool);
- return tools;
- }
+ return tools;
+}
```
You can learn more about `AbortSignal` in the [MDN documentation](https://developer.mozilla.org/en-US/docs/Web/API/AbortSignal).
diff --git a/2_typescript/3_plugins/2_prompt-preprocessor/custom-configuration.md b/2_typescript/3_plugins/2_prompt-preprocessor/custom-configuration.md
deleted file mode 100644
index ef63d04..0000000
--- a/2_typescript/3_plugins/2_prompt-preprocessor/custom-configuration.md
+++ /dev/null
@@ -1,69 +0,0 @@
----
-title: "Custom Configuration"
-+description: "Access custom configuration options in your prompt preprocessor"
-index: 3
----
-
-You can access custom configurations via `ctl.getPluginConfig` and `ctl.getGlobalPluginConfig`. See [Custom Configurations](./configurations) for more details.
-
-The following is an example of how you can make the `specialInstructions` and `triggerWord` configurable:
-
-First, add the config field to `config.ts`:
-
-```lms_code_snippet
- title: "src/config.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { createConfigSchematics } from "@lmstudio/sdk";
- export const configSchematics = createConfigSchematics()
- .field(
- "specialInstructions",
- "string",
- {
- displayName: "Special Instructions",
- subtitle: "Special instructions to be injected when the trigger word is found.",
- },
- "Here is some default special instructions.",
- )
- .field(
- "triggerWord",
- "string",
- {
- displayName: "Trigger Word",
- subtitle: "The word that will trigger the special instructions.",
- },
- "@init",
- )
- .build();
-```
-
-```lms_info
-In this example, we added the field to `configSchematics`, which is the "per-chat" configuration. If you want to add a global configuration field that is shared across different chats, you should add it under the section `globalConfigSchematics` in the same file.
-
-Learn more about configurations in [Custom Configurations](../plugins/configurations).
-```
-
-Then, modify the prompt preprocessor to use the configuration:
-
-```lms_code_snippet
- title: "src/promptPreprocessor.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { type PromptPreprocessorController, type ChatMessage } from "@lmstudio/sdk";
- import { configSchematics } from "./config";
-
- export async function preprocess(ctl: PromptPreprocessorController, userMessage: ChatMessage) {
- const textContent = userMessage.getText();
- const pluginConfig = ctl.getPluginConfig(configSchematics);
-
- const triggerWord = pluginConfig.get("triggerWord");
- const specialInstructions = pluginConfig.get("specialInstructions");
-
- const transformed = textContent.replaceAll(triggerWord, specialInstructions);
- return transformed;
- }
-```
diff --git a/2_typescript/3_plugins/2_prompt-preprocessor/custom-configuration.mdx b/2_typescript/3_plugins/2_prompt-preprocessor/custom-configuration.mdx
new file mode 100644
index 0000000..212a2da
--- /dev/null
+++ b/2_typescript/3_plugins/2_prompt-preprocessor/custom-configuration.mdx
@@ -0,0 +1,59 @@
+---
+title: "Custom Configuration"
+description: "Access custom configuration options in your prompt preprocessor"
+index: 3
+---
+
+You can access custom configurations via `ctl.getPluginConfig` and `ctl.getGlobalPluginConfig`. See [Custom Configurations](./configurations) for more details.
+
+The following is an example of how you can make the `specialInstructions` and `triggerWord` configurable:
+
+First, add the config field to `config.ts`:
+
+```typescript title="src/config.ts"
+import { createConfigSchematics } from "@lmstudio/sdk";
+export const configSchematics = createConfigSchematics()
+ .field(
+ "specialInstructions",
+ "string",
+ {
+ displayName: "Special Instructions",
+ subtitle: "Special instructions to be injected when the trigger word is found.",
+ },
+ "Here is some default special instructions.",
+ )
+ .field(
+ "triggerWord",
+ "string",
+ {
+ displayName: "Trigger Word",
+ subtitle: "The word that will trigger the special instructions.",
+ },
+ "@init",
+ )
+ .build();
+```
+
+
+In this example, we added the field to `configSchematics`, which is the "per-chat" configuration. If you want to add a global configuration field that is shared across different chats, you should add it under the section `globalConfigSchematics` in the same file.
+
+Learn more about configurations in [Custom Configurations](../plugins/configurations).
+
+
+Then, modify the prompt preprocessor to use the configuration:
+
+```typescript title="src/promptPreprocessor.ts"
+import { type PromptPreprocessorController, type ChatMessage } from "@lmstudio/sdk";
+import { configSchematics } from "./config";
+
+export async function preprocess(ctl: PromptPreprocessorController, userMessage: ChatMessage) {
+ const textContent = userMessage.getText();
+ const pluginConfig = ctl.getPluginConfig(configSchematics);
+
+ const triggerWord = pluginConfig.get("triggerWord");
+ const specialInstructions = pluginConfig.get("specialInstructions");
+
+ const transformed = textContent.replaceAll(triggerWord, specialInstructions);
+ return transformed;
+}
+```
diff --git a/2_typescript/3_plugins/2_prompt-preprocessor/custom-status-report.md b/2_typescript/3_plugins/2_prompt-preprocessor/custom-status-report.md
index 022bdd9..b6e9ca0 100644
--- a/2_typescript/3_plugins/2_prompt-preprocessor/custom-status-report.md
+++ b/2_typescript/3_plugins/2_prompt-preprocessor/custom-status-report.md
@@ -6,42 +6,27 @@ index: 4
Depending on the task, the prompt preprocessor may take some time to complete; for example, it may need to fetch data from the internet or perform some heavy computation. In such cases, you can report the status of the preprocessing using `ctl.createStatus`.
-```lms_code_snippet
- title: "src/promptPreprocessor.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- const status = ctl.createStatus({
- status: "loading",
- text: "Preprocessing.",
- });
+```typescript title="src/promptPreprocessor.ts"
+const status = ctl.createStatus({
+ status: "loading",
+ text: "Preprocessing.",
+});
```
You can update the status at any time by calling `status.setState`.
-```lms_code_snippet
- title: "src/promptPreprocessor.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- status.setState({
- status: "done",
- text: "Preprocessing done.",
- })
+```typescript title="src/promptPreprocessor.ts"
+status.setState({
+ status: "done",
+ text: "Preprocessing done.",
+});
```
You can even add a sub-status to the status:
-```lms_code_snippet
- title: "src/promptPreprocessor.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- const subStatus = status.addSubStatus({
- status: "loading",
- text: "I am a sub status."
- });
+```typescript title="src/promptPreprocessor.ts"
+const subStatus = status.addSubStatus({
+ status: "loading",
+ text: "I am a sub status."
+});
```
diff --git a/2_typescript/3_plugins/2_prompt-preprocessor/examples.md b/2_typescript/3_plugins/2_prompt-preprocessor/examples.md
index af5c304..e8162fa 100644
--- a/2_typescript/3_plugins/2_prompt-preprocessor/examples.md
+++ b/2_typescript/3_plugins/2_prompt-preprocessor/examples.md
@@ -8,39 +8,29 @@ index: 2
The following is an example preprocessor that injects the current time before each user message.
-```lms_code_snippet
- title: "src/promptPreprocessor.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { type PromptPreprocessorController, type ChatMessage } from "@lmstudio/sdk";
- export async function preprocess(ctl: PromptPreprocessorController, userMessage: ChatMessage) {
- const textContent = userMessage.getText();
- const transformed = `Current time: ${new Date().toString()}\n\n${textContent}`;
- return transformed;
- }
+```typescript title="src/promptPreprocessor.ts"
+import { type PromptPreprocessorController, type ChatMessage } from "@lmstudio/sdk";
+export async function preprocess(ctl: PromptPreprocessorController, userMessage: ChatMessage) {
+ const textContent = userMessage.getText();
+ const transformed = `Current time: ${new Date().toString()}\n\n${textContent}`;
+ return transformed;
+}
```
### Example: Replace Trigger Words
Another thing you can do with simple text-only processing is replace certain trigger words. For example, you can replace an `@init` trigger with a special initialization message.
-```lms_code_snippet
- title: "src/promptPreprocessor.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { type PromptPreprocessorController, type ChatMessage, text } from "@lmstudio/sdk";
-
- const mySpecialInstructions = text`
- Here are some special instructions...
- `;
-
- export async function preprocess(ctl: PromptPreprocessorController, userMessage: ChatMessage) {
- const textContent = userMessage.getText();
- const transformed = textContent.replaceAll("@init", mySpecialInstructions);
- return transformed;
- }
+```typescript title="src/promptPreprocessor.ts"
+import { type PromptPreprocessorController, type ChatMessage, text } from "@lmstudio/sdk";
+
+const mySpecialInstructions = text`
+ Here are some special instructions...
+`;
+
+export async function preprocess(ctl: PromptPreprocessorController, userMessage: ChatMessage) {
+ const textContent = userMessage.getText();
+ const transformed = textContent.replaceAll("@init", mySpecialInstructions);
+ return transformed;
+}
```
diff --git a/2_typescript/3_plugins/2_prompt-preprocessor/meta.json b/2_typescript/3_plugins/2_prompt-preprocessor/meta.json
new file mode 100644
index 0000000..4ec5301
--- /dev/null
+++ b/2_typescript/3_plugins/2_prompt-preprocessor/meta.json
@@ -0,0 +1,9 @@
+{
+ "title": "Prompt Preprocessor",
+ "pages": [
+ "custom-configuration",
+ "custom-status-report",
+ "examples",
+ "handling-aborts"
+ ]
+}
diff --git a/2_typescript/3_plugins/3_generator/meta.json b/2_typescript/3_plugins/3_generator/meta.json
new file mode 100644
index 0000000..904b360
--- /dev/null
+++ b/2_typescript/3_plugins/3_generator/meta.json
@@ -0,0 +1,7 @@
+{
+ "title": "Generators",
+ "pages": [
+ "text-only-generators",
+ "tool-calling-generators"
+ ]
+}
diff --git a/2_typescript/3_plugins/3_generator/text-only-generators.md b/2_typescript/3_plugins/3_generator/text-only-generators.md
index 1450b25..5927598 100644
--- a/2_typescript/3_plugins/3_generator/text-only-generators.md
+++ b/2_typescript/3_plugins/3_generator/text-only-generators.md
@@ -8,25 +8,20 @@ Generators take in the the generator controller and the current conversation sta
The following is an example of a simple generator that echoes back the last user message with a 200 ms delay between each word:
-```lms_code_snippet
- title: "src/toolsProvider.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { Chat, GeneratorController } from "@lmstudio/sdk";
-
- export async function generate(ctl: GeneratorController, chat: Chat) {
- // Just echo back the last message
- const lastMessage = chat.at(-1).getText();
- // Split the last message into words
- const words = lastMessage.split(/(?= )/);
- for (const word of words) {
- ctl.fragmentGenerated(word); // Send each word as a fragment
- ctl.abortSignal.throwIfAborted(); // Allow for cancellation
- await new Promise((resolve) => setTimeout(resolve, 200)); // Simulate some processing time
- }
- }
+```typescript title="src/toolsProvider.ts"
+import { Chat, GeneratorController } from "@lmstudio/sdk";
+
+export async function generate(ctl: GeneratorController, chat: Chat) {
+ // Just echo back the last message
+ const lastMessage = chat.at(-1).getText();
+ // Split the last message into words
+ const words = lastMessage.split(/(?= )/);
+ for (const word of words) {
+ ctl.fragmentGenerated(word); // Send each word as a fragment
+ ctl.abortSignal.throwIfAborted(); // Allow for cancellation
+ await new Promise((resolve) => setTimeout(resolve, 200)); // Simulate some processing time
+ }
+}
```
## Custom Configurations
diff --git a/2_typescript/3_plugins/4_custom-configuration/accessing-config.md b/2_typescript/3_plugins/4_custom-configuration/accessing-config.md
index 382560b..cde4736 100644
--- a/2_typescript/3_plugins/4_custom-configuration/accessing-config.md
+++ b/2_typescript/3_plugins/4_custom-configuration/accessing-config.md
@@ -8,26 +8,21 @@ You can access the configuration using the method `ctl.getPluginConfig(configSch
For example, here is how to access the config within the promptPreprocessor:
-```lms_code_snippet
- title: "src/promptPreprocessor.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { type PreprocessorController, type ChatMessage } from "@lmstudio/sdk";
- import { configSchematics } from "./config";
+```typescript title="src/promptPreprocessor.ts"
+import { type PreprocessorController, type ChatMessage } from "@lmstudio/sdk";
+import { configSchematics } from "./config";
- export async function preprocess(ctl: PreprocessorController, userMessage: ChatMessage) {
- const pluginConfig = ctl.getPluginConfig(configSchematics);
- const myCustomField = pluginConfig.get("myCustomField");
+export async function preprocess(ctl: PreprocessorController, userMessage: ChatMessage) {
+ const pluginConfig = ctl.getPluginConfig(configSchematics);
+ const myCustomField = pluginConfig.get("myCustomField");
- const globalPluginConfig = ctl.getGlobalPluginConfig(configSchematics);
- const globalMyCustomField = globalPluginConfig.get("myCustomField");
+ const globalPluginConfig = ctl.getGlobalPluginConfig(configSchematics);
+ const globalMyCustomField = globalPluginConfig.get("myCustomField");
- return (
- `${userMessage.getText()},` +
- `myCustomField: ${myCustomField}, ` +
- `globalMyCustomField: ${globalMyCustomField}`
- );
- }
+ return (
+ `${userMessage.getText()},` +
+ `myCustomField: ${myCustomField}, ` +
+ `globalMyCustomField: ${globalMyCustomField}`
+ );
+}
```
diff --git a/2_typescript/3_plugins/4_custom-configuration/config-ts.md b/2_typescript/3_plugins/4_custom-configuration/config-ts.md
index e1d2551..a0cf762 100644
--- a/2_typescript/3_plugins/4_custom-configuration/config-ts.md
+++ b/2_typescript/3_plugins/4_custom-configuration/config-ts.md
@@ -1,73 +1,63 @@
---
-title: "`config.ts` File"
+title: "config.ts File"
+description: "Define custom configuration options for your LM Studio plugin in config.ts"
index: 2
---
By default, the plugin scaffold will create a `config.ts` file in the `src/` directory, which will contain the schematics of the configurations. If the file does not exist, you can create it manually:
-```lms_code_snippet
- title: "src/toolsProvider.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { createConfigSchematics } from "@lmstudio/sdk";
-
- export const configSchematics = createConfigSchematics()
- .field(
- "myCustomField", // The key of the field.
- "numeric", // Type of the field.
- // Options for the field. Different field types will have different options.
- {
- displayName: "My Custom Field",
- hint: "This is my custom field. Doesn't do anything special.",
- slider: { min: 0, max: 100, step: 1 }, // Add a slider to the field.
- },
- 80, // Default Value
- )
- // You can add more fields by chaining the field method.
- // For example:
- // .field("anotherField", ...)
- .build();
-
- export const globalConfigSchematics = createConfigSchematics()
- .field(
- "myGlobalCustomField", // The key of the field.
- "string",
- {
- displayName: "My Global Custom Field",
- hint: "This is my global custom field. Doesn't do anything special.",
- },
- "default value", // Default Value
- )
- // You can add more fields by chaining the field method.
- // For example:
- // .field("anotherGlobalField", ...)
- .build();
+```typescript title="src/config.ts"
+import { createConfigSchematics } from "@lmstudio/sdk";
+
+export const configSchematics = createConfigSchematics()
+ .field(
+ "myCustomField", // The key of the field.
+ "numeric", // Type of the field.
+ // Options for the field. Different field types will have different options.
+ {
+ displayName: "My Custom Field",
+ hint: "This is my custom field. Doesn't do anything special.",
+ slider: { min: 0, max: 100, step: 1 }, // Add a slider to the field.
+ },
+ 80, // Default Value
+ )
+ // You can add more fields by chaining the field method.
+ // For example:
+ // .field("anotherField", ...)
+ .build();
+
+export const globalConfigSchematics = createConfigSchematics()
+ .field(
+ "myGlobalCustomField", // The key of the field.
+ "string",
+ {
+ displayName: "My Global Custom Field",
+ hint: "This is my global custom field. Doesn't do anything special.",
+ },
+ "default value", // Default Value
+ )
+ // You can add more fields by chaining the field method.
+ // For example:
+ // .field("anotherGlobalField", ...)
+ .build();
```
If you've added your config schematics manually, you will also need to register the configurations in your plugin's `index.ts` file.
This is done by calling `context.withConfigSchematics(configSchematics)` and `context.withGlobalConfigSchematics(globalConfigSchematics)` in the `main` function of your plugin.
-```lms_code_snippet
- title: "src/index.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- // ... other imports ...
- import { toolsProvider } from "./toolsProvider";
+```typescript title="src/index.ts"
+// ... other imports ...
+import { configSchematics, globalConfigSchematics } from "./config";
+import { toolsProvider } from "./toolsProvider";
- export async function main(context: PluginContext) {
- // ... other plugin setup code ...
+export async function main(context: PluginContext) {
+ // ... other plugin setup code ...
- // Register the configuration schematics.
- context.withConfigSchematics(configSchematics);
- // Register the global configuration schematics.
- context.withGlobalConfigSchematics(globalConfigSchematics);
+ // Register the configuration schematics.
+ context.withConfigSchematics(configSchematics);
+ // Register the global configuration schematics.
+ context.withGlobalConfigSchematics(globalConfigSchematics);
- // ... other plugin setup code ...
- }
+ // ... other plugin setup code ...
+}
```
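+
+Once registered, the fields appear in the plugin's settings UI (rendered from options such as `displayName` and `hint`), and your code reads the values back at runtime as shown in Accessing Config.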
diff --git a/2_typescript/3_plugins/4_custom-configuration/defining-new-fields.md b/2_typescript/3_plugins/4_custom-configuration/defining-new-fields.md
index 4dc0d2d..db65722 100644
--- a/2_typescript/3_plugins/4_custom-configuration/defining-new-fields.md
+++ b/2_typescript/3_plugins/4_custom-configuration/defining-new-fields.md
@@ -8,124 +8,104 @@ We support the following field types:
- `string`: A text input field.
- ```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- // ... other fields ...
- .field(
- "stringField", // The key of the field.
- "string", // Type of the field.
- {
- displayName: "A string field",
- subtitle: "Subtitle", // Optional subtitle for the field. (Show below the field)
- hint: "Hint", // Optional hint for the field. (Show on hover)
- isParagraph: false, // Whether to show a large text input area for this field.
- isProtected: false, // Whether the value should be obscured in the UI (e.g., for passwords).
- placeholder: "Placeholder text", // Optional placeholder text for the field.
- },
- "default value", // Default Value
- )
- // ... other fields ...
+ ```typescript
+ // ... other fields ...
+ .field(
+ "stringField", // The key of the field.
+ "string", // Type of the field.
+ {
+ displayName: "A string field",
+      subtitle: "Subtitle", // Optional subtitle for the field. (Shown below the field)
+      hint: "Hint", // Optional hint for the field. (Shown on hover)
+ isParagraph: false, // Whether to show a large text input area for this field.
+ isProtected: false, // Whether the value should be obscured in the UI (e.g., for passwords).
+ placeholder: "Placeholder text", // Optional placeholder text for the field.
+ },
+ "default value", // Default Value
+ )
+ // ... other fields ...
```
- `numeric`: A number input field with optional validation and slider UI.
- ```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- // ... other fields ...
- .field(
- "numberField", // The key of the field.
- "numeric", // Type of the field.
- {
- displayName: "A number field",
- subtitle: "Subtitle for", // Optional subtitle for the field. (Show below the field)
- hint: "Hint for number field", // Optional hint for the field. (Show on hover)
- int: false, // Whether the field should accept only integer values.
- min: 0, // Minimum value for the field.
- max: 100, // Maximum value for the field.
- slider: {
- // If present, configurations for the slider UI
- min: 0, // Minimum value for the slider.
- max: 100, // Maximum value for the slider.
- step: 1, // Step value for the slider.
- },
- },
- 42, // Default Value
- )
- // ... other fields ...
+ ```typescript
+ // ... other fields ...
+ .field(
+ "numberField", // The key of the field.
+ "numeric", // Type of the field.
+ {
+ displayName: "A number field",
+      subtitle: "Subtitle for the number field", // Optional subtitle for the field. (Shown below the field)
+      hint: "Hint for number field", // Optional hint for the field. (Shown on hover)
+ int: false, // Whether the field should accept only integer values.
+ min: 0, // Minimum value for the field.
+ max: 100, // Maximum value for the field.
+ slider: {
+ // If present, configurations for the slider UI
+ min: 0, // Minimum value for the slider.
+ max: 100, // Maximum value for the slider.
+ step: 1, // Step value for the slider.
+ },
+ },
+ 42, // Default Value
+ )
+ // ... other fields ...
```
- `boolean`: A checkbox or toggle input field.
- ```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- // ... other fields ...
- .field(
- "booleanField", // The key of the field.
- "boolean", // Type of the field.
- {
- displayName: "A boolean field",
- subtitle: "Subtitle", // Optional subtitle for the field. (Show below the field)
- hint: "Hint", // Optional hint for the field. (Show on hover)
- },
- true, // Default Value
- )
- // ... other fields ...
+ ```typescript
+ // ... other fields ...
+ .field(
+ "booleanField", // The key of the field.
+ "boolean", // Type of the field.
+ {
+ displayName: "A boolean field",
+      subtitle: "Subtitle", // Optional subtitle for the field. (Shown below the field)
+      hint: "Hint", // Optional hint for the field. (Shown on hover)
+ },
+ true, // Default Value
+ )
+ // ... other fields ...
```
- `stringArray`: An array of string values with configurable constraints.
- ```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- // ... other fields ...
- .field(
- "stringArrayField",
- "stringArray",
- {
- displayName: "A string array field",
- subtitle: "Subtitle", // Optional subtitle for the field. (Show below the field)
- hint: "Hint", // Optional hint for the field. (Show on hover)
- allowEmptyStrings: true, // Whether to allow empty strings in the array.
- maxNumItems: 5, // Maximum number of items in the array.
- },
- ["default", "values"], // Default Value
- )
- // ... other fields ...
+ ```typescript
+ // ... other fields ...
+ .field(
+ "stringArrayField",
+ "stringArray",
+ {
+ displayName: "A string array field",
+      subtitle: "Subtitle", // Optional subtitle for the field. (Shown below the field)
+      hint: "Hint", // Optional hint for the field. (Shown on hover)
+ allowEmptyStrings: true, // Whether to allow empty strings in the array.
+ maxNumItems: 5, // Maximum number of items in the array.
+ },
+ ["default", "values"], // Default Value
+ )
+ // ... other fields ...
```
- `select`: A dropdown selection field with predefined options.
- ```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- // ... other fields ...
- .field(
- "selectField",
- "select",
- {
- displayName: "A select field",
- options: [
- { value: "option1", displayName: "Option 1" },
- { value: "option2", displayName: "Option 2" },
- { value: "option3", displayName: "Option 3" },
- ],
- subtitle: "Subtitle", // Optional subtitle for the field. (Show below the field)
- hint: "Hint", // Optional hint for the field. (Show on hover)
- },
- "option1", // Default Value
- )
- // ... other fields ...
+ ```typescript
+ // ... other fields ...
+ .field(
+ "selectField",
+ "select",
+ {
+ displayName: "A select field",
+ options: [
+ { value: "option1", displayName: "Option 1" },
+ { value: "option2", displayName: "Option 2" },
+ { value: "option3", displayName: "Option 3" },
+ ],
+      subtitle: "Subtitle", // Optional subtitle for the field. (Shown below the field)
+      hint: "Hint", // Optional hint for the field. (Shown on hover)
+ },
+ "option1", // Default Value
+ )
+ // ... other fields ...
```
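+
+Whichever field types you define, the values are read back at runtime with `pluginConfig.get(...)`, as covered in Accessing Config. A minimal sketch, assuming the example fields above were added to `configSchematics` and registered:
+
+```typescript
+// Illustrative read-back; the keys match the example fields above.
+const pluginConfig = ctl.getPluginConfig(configSchematics);
+const count = pluginConfig.get("numberField"); // numeric field -> number
+const choice = pluginConfig.get("selectField"); // select field -> one of the defined option values
+```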
diff --git a/2_typescript/3_plugins/4_custom-configuration/meta.json b/2_typescript/3_plugins/4_custom-configuration/meta.json
new file mode 100644
index 0000000..0970a6d
--- /dev/null
+++ b/2_typescript/3_plugins/4_custom-configuration/meta.json
@@ -0,0 +1,8 @@
+{
+ "title": "Custom Configuration",
+ "pages": [
+ "accessing-config",
+ "config-ts",
+ "defining-new-fields"
+ ]
+}
diff --git a/2_typescript/3_plugins/dependencies.md b/2_typescript/3_plugins/dependencies.md
index 3247009..507e08a 100644
--- a/2_typescript/3_plugins/dependencies.md
+++ b/2_typescript/3_plugins/dependencies.md
@@ -1,5 +1,5 @@
---
-title: "Using `npm` Dependencies"
+title: "Using npm Dependencies"
description: "How to use npm packages in LM Studio plugins"
index: 6
---
diff --git a/2_typescript/3_plugins/meta.json b/2_typescript/3_plugins/meta.json
new file mode 100644
index 0000000..d09cde0
--- /dev/null
+++ b/2_typescript/3_plugins/meta.json
@@ -0,0 +1,11 @@
+{
+ "title": "Plugins",
+ "pages": [
+ "1_tools-provider",
+ "2_prompt-preprocessor",
+ "3_generator",
+ "4_custom-configuration",
+ "5_publish-plugins",
+ "dependencies"
+ ]
+}
diff --git a/2_typescript/4_embedding/index.md b/2_typescript/4_embedding/index.md
index d292c8d..b72e84f 100644
--- a/2_typescript/4_embedding/index.md
+++ b/2_typescript/4_embedding/index.md
@@ -18,16 +18,11 @@ lms get nomic-ai/nomic-embed-text-v1.5
To convert a string to a vector representation, pass it to the `embed` method on the corresponding embedding model handle.
-```lms_code_snippet
- title: "index.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient();
-
- const model = await client.embedding.model("nomic-embed-text-v1.5");
-
- const { embedding } = await model.embed("Hello, world!");
+```typescript title="index.ts"
+import { LMStudioClient } from "@lmstudio/sdk";
+const client = new LMStudioClient();
+
+const model = await client.embedding.model("nomic-embed-text-v1.5");
+
+const { embedding } = await model.embed("Hello, world!");
```
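+
+As a follow-up, here is a minimal sketch of comparing two embeddings with cosine similarity. It assumes `embed` resolves to an object whose `embedding` field is a plain array of numbers; `cosineSimilarity` is an illustrative helper, not part of the SDK.
+
+```typescript
+// Cosine similarity between two embedding vectors (illustrative helper).
+function cosineSimilarity(a: number[], b: number[]): number {
+  let dot = 0;
+  let normA = 0;
+  let normB = 0;
+  for (let i = 0; i < a.length; i++) {
+    dot += a[i] * b[i];
+    normA += a[i] * a[i];
+    normB += b[i] * b[i];
+  }
+  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
+}
+
+const { embedding: first } = await model.embed("Hello, world!");
+const { embedding: second } = await model.embed("Greetings, planet!");
+console.info("Similarity:", cosineSimilarity(first, second));
+```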
diff --git a/2_typescript/5_tokenization/index.md b/2_typescript/5_tokenization/index.md
index b03aba2..5dfcabb 100644
--- a/2_typescript/5_tokenization/index.md
+++ b/2_typescript/5_tokenization/index.md
@@ -10,32 +10,24 @@ Models use a tokenizer to internally convert text into "tokens" they can deal wi
You can tokenize a string with a loaded LLM or embedding model using the SDK. In the below examples, `llm` can be replaced with an embedding model `emb`.
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
+```typescript
+import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient();
- const model = await client.llm.model();
+const client = new LMStudioClient();
+const model = await client.llm.model();
- const tokens = await model.tokenize("Hello, world!");
+const tokens = await model.tokenize("Hello, world!");
- console.info(tokens); // Array of token IDs.
+console.info(tokens); // Array of token IDs.
```
## Count tokens
If you only care about the number of tokens, you can use the `.countTokens` method instead.
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- const tokenCount = await model.countTokens("Hello, world!");
- console.info("Token count:", tokenCount);
+```typescript
+const tokenCount = await model.countTokens("Hello, world!");
+console.info("Token count:", tokenCount);
```
### Example: Count Context
@@ -46,33 +38,29 @@ You can determine if a given conversation fits into a model's context by doing t
2. Count the number of tokens in the string.
3. Compare the token count to the model's context length.
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { Chat, type LLM, LMStudioClient } from "@lmstudio/sdk";
-
- async function doesChatFitInContext(model: LLM, chat: Chat) {
- // Convert the conversation to a string using the prompt template.
- const formatted = await model.applyPromptTemplate(chat);
- // Count the number of tokens in the string.
- const tokenCount = await model.countTokens(formatted);
- // Get the current loaded context length of the model
- const contextLength = await model.getContextLength();
- return tokenCount < contextLength;
- }
-
- const client = new LMStudioClient();
- const model = await client.llm.model();
-
- const chat = Chat.from([
- { role: "user", content: "What is the meaning of life." },
- { role: "assistant", content: "The meaning of life is..." },
- // ... More messages
- ]);
-
- console.info("Fits in context:", await doesChatFitInContext(model, chat));
+```typescript
+import { Chat, type LLM, LMStudioClient } from "@lmstudio/sdk";
+
+async function doesChatFitInContext(model: LLM, chat: Chat) {
+ // Convert the conversation to a string using the prompt template.
+ const formatted = await model.applyPromptTemplate(chat);
+ // Count the number of tokens in the string.
+ const tokenCount = await model.countTokens(formatted);
+ // Get the current loaded context length of the model
+ const contextLength = await model.getContextLength();
+ return tokenCount < contextLength;
+}
+
+const client = new LMStudioClient();
+const model = await client.llm.model();
+
+const chat = Chat.from([
+  { role: "user", content: "What is the meaning of life?" },
+ { role: "assistant", content: "The meaning of life is..." },
+ // ... More messages
+]);
+
+console.info("Fits in context:", await doesChatFitInContext(model, chat));
```
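+
+Note that `tokenCount < contextLength` leaves no headroom for the model's reply; in practice you may want to compare against the context length minus whatever token budget you reserve for generation.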
diff --git a/2_typescript/6_manage-models/_download-models.md b/2_typescript/6_manage-models/_download-models.md
index 07e6ebe..1c1f494 100644
--- a/2_typescript/6_manage-models/_download-models.md
+++ b/2_typescript/6_manage-models/_download-models.md
@@ -17,35 +17,31 @@ Downloading models consists of three steps:
2. Find the download option you want (e.g. quantization); and
3. Download the model!
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
-
- const client = new LMStudioClient();
-
- // 1. Search for the model you want
- // Specify any/all of searchTerm, limit, compatibilityTypes
- const searchResults = await client.repository.searchModels({
- searchTerm: "llama 3.2 1b", // Search for Llama 3.2 1B
- limit: 5, // Get top 5 results
- compatibilityTypes: ["gguf"], // Only download GGUFs
- });
-
- // 2. Find download options
- const bestResult = searchResults[0];
- const downloadOptions = await bestResult.getDownloadOptions();
-
- // Let's download Q4_K_M, a good middle ground quantization
- const desiredModel = downloadOptions.find(option => option.quantization === 'Q4_K_M');
-
- // 3. Download it!
- const modelKey = await desiredModel.download();
-
- // This returns a path you can use to load the model
- const loadedModel = await client.llm.model(modelKey);
+```typescript
+import { LMStudioClient } from "@lmstudio/sdk";
+
+const client = new LMStudioClient();
+
+// 1. Search for the model you want
+// Specify any/all of searchTerm, limit, compatibilityTypes
+const searchResults = await client.repository.searchModels({
+ searchTerm: "llama 3.2 1b", // Search for Llama 3.2 1B
+ limit: 5, // Get top 5 results
+ compatibilityTypes: ["gguf"], // Only download GGUFs
+});
+
+// 2. Find download options
+const bestResult = searchResults[0];
+const downloadOptions = await bestResult.getDownloadOptions();
+
+// Let's download Q4_K_M, a good middle-ground quantization
+const desiredModel = downloadOptions.find((option) => option.quantization === "Q4_K_M");
+if (desiredModel === undefined) {
+  throw new Error("No Q4_K_M download option found.");
+}
+
+// 3. Download it!
+const modelKey = await desiredModel.download();
+
+// This returns a path you can use to load the model
+const loadedModel = await client.llm.model(modelKey);
```
## Advanced Usage
@@ -59,43 +55,38 @@ If you want to get updates on the progress of this process, you can provide call
one for progress updates and/or one when the download is being finalized
(validating checksums, etc.)
-```lms_code_snippet
- variants:
- Python (with scoped resources):
- language: python
- code: |
- import lmstudio
+```python tab="Python (with scoped resources)"
+import lmstudio
- def print_progress_update(update: lmstudio.DownloadProgressUpdate) -> None:
- print(f"Downloaded {update.downloaded_bytes} bytes of {update.total_bytes} total \
- at {update.speed_bytes_per_second} bytes/sec")
+def print_progress_update(update: lmstudio.DownloadProgressUpdate) -> None:
+ print(f"Downloaded {update.downloaded_bytes} bytes of {update.total_bytes} total \
+ at {update.speed_bytes_per_second} bytes/sec")
- with lmstudio.Client() as client:
- # ... Same code as before ...
+with lmstudio.Client() as client:
+ # ... Same code as before ...
- model_key = desired_model.download(
- on_progress=print_progress_update,
- on_finalize: lambda: print("Finalizing download...")
- )
+    model_key = desired_model.download(
+        on_progress=print_progress_update,
+        on_finalize=lambda: print("Finalizing download..."),
+    )
+```
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient, type DownloadProgressUpdate } from "@lmstudio/sdk";
+```typescript tab="TypeScript"
+import { LMStudioClient, type DownloadProgressUpdate } from "@lmstudio/sdk";
- function printProgressUpdate(update: DownloadProgressUpdate) {
- process.stdout.write(`Downloaded ${update.downloadedBytes} bytes of ${update.totalBytes} total \
- at ${update.speed_bytes_per_second} bytes/sec`);
- }
+function printProgressUpdate(update: DownloadProgressUpdate) {
+  process.stdout.write(`Downloaded ${update.downloadedBytes} bytes of ${update.totalBytes} total \
+at ${update.speedBytesPerSecond} bytes/sec`);
+}
- const client = new LMStudioClient();
+const client = new LMStudioClient();
- // ... Same code as before ...
+// ... Same code as before ...
- modelKey = await desiredModel.download({
- onProgress: printProgressUpdate,
- onStartFinalizing: () => console.log("Finalizing..."),
- });
+const modelKey = await desiredModel.download({
+ onProgress: printProgressUpdate,
+ onStartFinalizing: () => console.log("Finalizing..."),
+});
- const loadedModel = await client.llm.model(modelKey);
+const loadedModel = await client.llm.model(modelKey);
```
diff --git a/2_typescript/6_manage-models/list-downloaded.md b/2_typescript/6_manage-models/list-downloaded.md
index 07a88c7..48c789b 100644
--- a/2_typescript/6_manage-models/list-downloaded.md
+++ b/2_typescript/6_manage-models/list-downloaded.md
@@ -9,15 +9,11 @@ You can iterate through locally available models using the `listLocalModels` met
`listDownloadedModels` lives under the `system` namespace of the `LMStudioClient` object.
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient();
-
- console.info(await client.system.listDownloadedModels());
+```typescript
+import { LMStudioClient } from "@lmstudio/sdk";
+const client = new LMStudioClient();
+
+console.info(await client.system.listDownloadedModels());
```
This will give you results equivalent to using [`lms ls`](../../cli/ls) in the CLI.
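+
+For example, reusing the `client` from the snippet above, you can iterate over the results. A minimal sketch, assuming each returned entry exposes a `modelKey` field (the same field shown on the Get Model Info page):
+
+```typescript
+const downloaded = await client.system.listDownloadedModels();
+for (const entry of downloaded) {
+  console.info(entry.modelKey);
+}
+```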
diff --git a/2_typescript/6_manage-models/list-loaded.md b/2_typescript/6_manage-models/list-loaded.md
index a971436..d6df419 100644
--- a/2_typescript/6_manage-models/list-loaded.md
+++ b/2_typescript/6_manage-models/list-loaded.md
@@ -9,17 +9,13 @@ You can iterate through models loaded into memory using the `listLoaded` method.
This will give you results equivalent to using [`lms ps`](../../cli/ps) in the CLI.
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
+```typescript
+import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient();
+const client = new LMStudioClient();
- const llmOnly = await client.llm.listLoaded();
- const embeddingOnly = await client.embedding.listLoaded();
+const llmOnly = await client.llm.listLoaded();
+const embeddingOnly = await client.embedding.listLoaded();
```
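+
+As a follow-up, each returned handle can be queried for details. A minimal sketch reusing `llmOnly` from above and `getInfo()` (covered in the Model Info section):
+
+```typescript
+for (const model of llmOnly) {
+  const info = await model.getInfo();
+  console.info(info.modelKey);
+}
+```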
diff --git a/2_typescript/6_manage-models/loading.md b/2_typescript/6_manage-models/loading.md
index 0ea3aa8..80cb392 100644
--- a/2_typescript/6_manage-models/loading.md
+++ b/2_typescript/6_manage-models/loading.md
@@ -20,15 +20,11 @@ AI models are huge. It can take a while to load them into memory. LM Studio's SD
If you already have a model loaded in LM Studio (either via the GUI or `lms load`), you can use it by calling `.model()` without any arguments.
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient();
-
- const model = await client.llm.model();
+```typescript
+import { LMStudioClient } from "@lmstudio/sdk";
+const client = new LMStudioClient();
+
+const model = await client.llm.model();
```
## Get a Specific Model with `.model("model-key")`
@@ -39,15 +35,11 @@ If you want to use a specific model, you can provide the model key as an argumen
Calling `.model("model-key")` will load the model if it's not already loaded, or return the existing instance if it is.
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient();
+```typescript
+import { LMStudioClient } from "@lmstudio/sdk";
+const client = new LMStudioClient();
- const model = await client.llm.model("qwen/qwen3-4b-2507");
+const model = await client.llm.model("qwen/qwen3-4b-2507");
```
@@ -56,18 +48,14 @@ Calling `.model("model-key")` will load the model if it's not already loaded, or
Use `load()` to load a new instance of a model, even if one already exists. This allows you to have multiple instances of the same or different models loaded at the same time.
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient();
-
- const llama = await client.llm.load("qwen/qwen3-4b-2507");
- const another_llama = await client.llm.load("qwen/qwen3-4b-2507", {
- identifier: "second-llama"
- });
+```typescript
+import { LMStudioClient } from "@lmstudio/sdk";
+const client = new LMStudioClient();
+
+const model = await client.llm.load("qwen/qwen3-4b-2507");
+const anotherModel = await client.llm.load("qwen/qwen3-4b-2507", {
+  identifier: "second-llama"
+});
```
@@ -82,17 +70,13 @@ the server will generate one for you. You can always check in the server tab in
Once you no longer need a model, you can unload it by simply calling `unload()` on its handle.
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
+```typescript
+import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient();
+const client = new LMStudioClient();
- const model = await client.llm.model();
- await model.unload();
+const model = await client.llm.model();
+await model.unload();
```
## Set Custom Load Config Parameters
@@ -106,28 +90,24 @@ See [load-time configuration](../llm-prediction/parameters) for more.
You can specify a _time to live_ for a model you load, which is the idle time (in seconds)
after the last request until the model unloads. See [Idle TTL](/docs/api/ttl-and-auto-evict) for more on this.
-```lms_code_snippet
- variants:
- "Using .load":
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
-
- const client = new LMStudioClient();
-
- const model = await client.llm.load("qwen/qwen3-4b-2507", {
- ttl: 300, // 300 seconds
- });
- "Using .model":
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
-
- const client = new LMStudioClient();
-
- const model = await client.llm.model("qwen/qwen3-4b-2507", {
- // Note: specifying ttl in `.model` will only set the TTL for the model if the model is
- // loaded from this call. If the model was already loaded, the TTL will not be updated.
- ttl: 300, // 300 seconds
- });
+```typescript tab="Using .load"
+import { LMStudioClient } from "@lmstudio/sdk";
+
+const client = new LMStudioClient();
+
+const model = await client.llm.load("qwen/qwen3-4b-2507", {
+ ttl: 300, // 300 seconds
+});
+```
+
+```typescript tab="Using .model"
+import { LMStudioClient } from "@lmstudio/sdk";
+
+const client = new LMStudioClient();
+
+const model = await client.llm.model("qwen/qwen3-4b-2507", {
+ // Note: specifying ttl in `.model` will only set the TTL for the model if the model is
+ // loaded from this call. If the model was already loaded, the TTL will not be updated.
+ ttl: 300, // 300 seconds
+});
```
diff --git a/2_typescript/6_manage-models/meta.json b/2_typescript/6_manage-models/meta.json
new file mode 100644
index 0000000..9f1b188
--- /dev/null
+++ b/2_typescript/6_manage-models/meta.json
@@ -0,0 +1,9 @@
+{
+ "title": "Manage Models",
+ "pages": [
+ "_download-models",
+ "list-downloaded",
+ "list-loaded",
+ "loading"
+ ]
+}
diff --git a/2_typescript/7_api-reference/_act.md b/2_typescript/7_api-reference/_act.md
index 74e4476..1063c93 100644
--- a/2_typescript/7_api-reference/_act.md
+++ b/2_typescript/7_api-reference/_act.md
@@ -1,6 +1,6 @@
---
-title: "`.act()`"
-sidebar_title: "`.act()`"
+title: ".act()"
+sidebar_title: ".act()"
description: ".act() - API reference for automatic tool use in a multi-turn chat conversation"
index: 3
---
diff --git a/2_typescript/7_api-reference/_chat.md b/2_typescript/7_api-reference/_chat.md
index 33b5f8b..70eea8b 100644
--- a/2_typescript/7_api-reference/_chat.md
+++ b/2_typescript/7_api-reference/_chat.md
@@ -1,7 +1,7 @@
---
-title: "`Chat`"
-sidebar_title: "`Chat`"
-description: "`Chat` - API reference for representing a chat conversation with an LLM"
+title: "Chat"
+sidebar_title: "Chat"
+description: "Chat - API reference for representing a chat conversation with an LLM"
index: 5
---
diff --git a/2_typescript/7_api-reference/_complete.md b/2_typescript/7_api-reference/_complete.md
index 43d79d1..8c496b2 100644
--- a/2_typescript/7_api-reference/_complete.md
+++ b/2_typescript/7_api-reference/_complete.md
@@ -1,6 +1,6 @@
---
-title: "`.complete()`"
-sidebar_title: "`.complete()`"
+title: ".complete()"
+sidebar_title: ".complete()"
description: ".complete() - API reference for generating text completions from a loaded language model"
index: 4
---
diff --git a/2_typescript/7_api-reference/_count-tokens.md b/2_typescript/7_api-reference/_count-tokens.md
index 1198d6d..719e7ee 100644
--- a/2_typescript/7_api-reference/_count-tokens.md
+++ b/2_typescript/7_api-reference/_count-tokens.md
@@ -1,6 +1,6 @@
---
-title: "`.countTokens()`"
-sidebar_title: "`.countTokens()`"
+title: ".countTokens()"
+sidebar_title: ".countTokens()"
description: ".countTokens() - API reference for counting tokens in a string using a model's tokenizer"
---
diff --git a/2_typescript/7_api-reference/_embed.md b/2_typescript/7_api-reference/_embed.md
index 9864258..3f6c93c 100644
--- a/2_typescript/7_api-reference/_embed.md
+++ b/2_typescript/7_api-reference/_embed.md
@@ -1,6 +1,6 @@
---
-title: "`.embed()`"
-sidebar_title: "`.embed()`"
+title: ".embed()"
+sidebar_title: ".embed()"
description: ".embed() - API reference for generating embeddings from a loaded embedding model"
---
diff --git a/2_typescript/7_api-reference/_llm-namespace.md b/2_typescript/7_api-reference/_llm-namespace.md
index 033677d..65bd34d 100644
--- a/2_typescript/7_api-reference/_llm-namespace.md
+++ b/2_typescript/7_api-reference/_llm-namespace.md
@@ -1,7 +1,7 @@
---
-title: "`client.llm`"
-sidebar_title: "`client.llm` namespace"
-description: "`client.llm` - API reference for the llm namespace in an `LMStudioClient` instance"
+title: "client.llm"
+sidebar_title: "client.llm namespace"
+description: "client.llm - API reference for the llm namespace in an LMStudioClient instance"
index: 6
---
diff --git a/2_typescript/7_api-reference/_lmstudioclient.md b/2_typescript/7_api-reference/_lmstudioclient.md
index 4fd0e95..589bd5b 100644
--- a/2_typescript/7_api-reference/_lmstudioclient.md
+++ b/2_typescript/7_api-reference/_lmstudioclient.md
@@ -1,7 +1,7 @@
---
-title: "`LMStudioClient`"
-sidebar_title: "`LMStudioClient`"
-description: "LMStudioClient - API reference for the `LMStudioClient` class"
+title: "LMStudioClient"
+sidebar_title: "LMStudioClient"
+description: "LMStudioClient - API reference for the LMStudioClient class"
index: 1
---
diff --git a/2_typescript/7_api-reference/_model.md b/2_typescript/7_api-reference/_model.md
index 515ab0c..0d5c59d 100644
--- a/2_typescript/7_api-reference/_model.md
+++ b/2_typescript/7_api-reference/_model.md
@@ -1,7 +1,7 @@
---
-title: "`.model()`"
-sidebar_title: "`.model()`"
-description: ".model() - API reference for obtaining a model handle from an `LMStudioClient` instance"
+title: ".model()"
+sidebar_title: ".model()"
+description: ".model() - API reference for obtaining a model handle from an LMStudioClient instance"
index: 2
---
diff --git a/2_typescript/7_api-reference/_respond.md b/2_typescript/7_api-reference/_respond.md
index 89876f9..ff73ea8 100644
--- a/2_typescript/7_api-reference/_respond.md
+++ b/2_typescript/7_api-reference/_respond.md
@@ -1,6 +1,6 @@
---
-title: "`.respond()`"
-sidebar_title: "`.respond()`"
+title: ".respond()"
+sidebar_title: ".respond()"
description: ".respond() - API reference for generating chat responses from a loaded language model"
index: 2
---
diff --git a/2_typescript/7_api-reference/_system-namespace.md b/2_typescript/7_api-reference/_system-namespace.md
index 2b4f4f8..49c1c91 100644
--- a/2_typescript/7_api-reference/_system-namespace.md
+++ b/2_typescript/7_api-reference/_system-namespace.md
@@ -1,7 +1,7 @@
---
-title: "`client.system`"
-sidebar_title: "`client.system` namespace"
-description: "`client.system` - API reference for the system namespace in an `LMStudioClient` instance"
+title: "client.system"
+sidebar_title: "client.system namespace"
+description: "client.system - API reference for the system namespace in an LMStudioClient instance"
index: 6
---
diff --git a/2_typescript/7_api-reference/_tokenize.md b/2_typescript/7_api-reference/_tokenize.md
index bb31b61..1181a05 100644
--- a/2_typescript/7_api-reference/_tokenize.md
+++ b/2_typescript/7_api-reference/_tokenize.md
@@ -1,6 +1,6 @@
---
-title: "`.tokenize()`"
-sidebar_title: "`.tokenize()`"
+title: ".tokenize()"
+sidebar_title: ".tokenize()"
description: ".tokenize() - API reference for converting text input into tokens using a model's tokenizer"
---
diff --git a/2_typescript/7_api-reference/llm-load-model-config.md b/2_typescript/7_api-reference/llm-load-model-config.md
index a3b10c4..f18460e 100644
--- a/2_typescript/7_api-reference/llm-load-model-config.md
+++ b/2_typescript/7_api-reference/llm-load-model-config.md
@@ -1,6 +1,6 @@
---
-title: "`LLMLoadModelConfig`"
-description: "API Reference for `LLMLoadModelConfig`"
+title: "LLMLoadModelConfig"
+description: "API Reference for LLMLoadModelConfig"
---
### Parameters
diff --git a/2_typescript/7_api-reference/llm-prediction-config-input.md b/2_typescript/7_api-reference/llm-prediction-config-input.md
index 798f00c..4dff4b6 100644
--- a/2_typescript/7_api-reference/llm-prediction-config-input.md
+++ b/2_typescript/7_api-reference/llm-prediction-config-input.md
@@ -1,5 +1,5 @@
---
-title: "`LLMPredictionConfigInput`"
+title: "LLMPredictionConfigInput"
---
### Fields
diff --git a/2_typescript/7_api-reference/meta.json b/2_typescript/7_api-reference/meta.json
new file mode 100644
index 0000000..361a193
--- /dev/null
+++ b/2_typescript/7_api-reference/meta.json
@@ -0,0 +1,18 @@
+{
+ "title": "API Reference",
+ "pages": [
+ "_act",
+ "_chat",
+ "_complete",
+ "_count-tokens",
+ "_embed",
+ "llm-load-model-config",
+ "_llm-namespace",
+ "llm-prediction-config-input",
+ "_lmstudioclient",
+ "_model",
+ "_respond",
+ "_system-namespace",
+ "_tokenize"
+ ]
+}
diff --git a/2_typescript/8_model-info/_get-load-config.md b/2_typescript/8_model-info/_get-load-config.mdx
similarity index 62%
rename from 2_typescript/8_model-info/_get-load-config.md
rename to 2_typescript/8_model-info/_get-load-config.mdx
index c7b423d..ff981fb 100644
--- a/2_typescript/8_model-info/_get-load-config.md
+++ b/2_typescript/8_model-info/_get-load-config.mdx
@@ -8,19 +8,15 @@ LM Studio allows you to configure certain parameters when loading a model
You can retrieve the config with which a given model was loaded using the SDK. In the below examples, `llm` can be replaced with an embedding model `emb`.
-```lms_protip
+
Context length is a special case that [has its own method](/docs/api/sdk/get-context-length).
-```
+
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
+```typescript
+import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient();
- const model = await client.llm.model();
+const client = new LMStudioClient();
+const model = await client.llm.model();
- loadConfig = await model.getLoadConfig()
+const loadConfig = await model.getLoadConfig();
```
diff --git a/2_typescript/8_model-info/get-context-length.md b/2_typescript/8_model-info/get-context-length.md
index 1a57013..00bbf04 100644
--- a/2_typescript/8_model-info/get-context-length.md
+++ b/2_typescript/8_model-info/get-context-length.md
@@ -9,13 +9,8 @@ LLMs and embedding models, due to their fundamental architecture, have a propert
It's useful to be able to check the context length of a model, especially as an extra check before providing potentially long input to the model.
-```lms_code_snippet
- title: "index.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- const contextLength = await model.getContextLength();
+```typescript title="index.ts"
+const contextLength = await model.getContextLength();
```
The `model` in the above code snippet is an instance of a loaded model you get from the `llm.model` method. See [Manage Models in Memory](../manage-models/loading) for more information.
@@ -28,31 +23,27 @@ You can determine if a given conversation fits into a model's context by doing t
2. Count the number of tokens in the string.
3. Compare the token count to the model's context length.
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { Chat, type LLM, LMStudioClient } from "@lmstudio/sdk";
-
- async function doesChatFitInContext(model: LLM, chat: Chat) {
- // Convert the conversation to a string using the prompt template.
- const formatted = await model.applyPromptTemplate(chat);
- // Count the number of tokens in the string.
- const tokenCount = await model.countTokens(formatted);
- // Get the current loaded context length of the model
- const contextLength = await model.getContextLength();
- return tokenCount < contextLength;
- }
-
- const client = new LMStudioClient();
- const model = await client.llm.model();
-
- const chat = Chat.from([
- { role: "user", content: "What is the meaning of life." },
- { role: "assistant", content: "The meaning of life is..." },
- // ... More messages
- ]);
-
- console.info("Fits in context:", await doesChatFitInContext(model, chat));
+```typescript
+import { Chat, type LLM, LMStudioClient } from "@lmstudio/sdk";
+
+async function doesChatFitInContext(model: LLM, chat: Chat) {
+ // Convert the conversation to a string using the prompt template.
+ const formatted = await model.applyPromptTemplate(chat);
+ // Count the number of tokens in the string.
+ const tokenCount = await model.countTokens(formatted);
+ // Get the current loaded context length of the model
+ const contextLength = await model.getContextLength();
+ return tokenCount < contextLength;
+}
+
+const client = new LMStudioClient();
+const model = await client.llm.model();
+
+const chat = Chat.from([
+  { role: "user", content: "What is the meaning of life?" },
+ { role: "assistant", content: "The meaning of life is..." },
+ // ... More messages
+]);
+
+console.info("Fits in context:", await doesChatFitInContext(model, chat));
```
diff --git a/2_typescript/8_model-info/get-model-info.md b/2_typescript/8_model-info/get-model-info.md
index 9e9545a..eda0945 100644
--- a/2_typescript/8_model-info/get-model-info.md
+++ b/2_typescript/8_model-info/get-model-info.md
@@ -5,33 +5,29 @@ description: Get information about the model
You can access information about a loaded model using the `getInfo` method.
-```lms_code_snippet
- variants:
- LLM:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
-
- const client = new LMStudioClient();
- const model = await client.llm.model();
-
- const modelInfo = await model.getInfo();
-
- console.info("Model Key", modelInfo.modelKey);
- console.info("Current Context Length", model.contextLength);
- console.info("Model Trained for Tool Use", modelInfo.trainedForToolUse);
- // etc.
- Embedding Model:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
-
- const client = new LMStudioClient();
- const model = await client.embedding.model();
-
- const modelInfo = await model.getInfo();
-
- console.info("Model Key", modelInfo.modelKey);
- console.info("Current Context Length", modelInfo.contextLength);
- // etc.
+```typescript tab="LLM"
+import { LMStudioClient } from "@lmstudio/sdk";
+
+const client = new LMStudioClient();
+const model = await client.llm.model();
+
+const modelInfo = await model.getInfo();
+
+console.info("Model Key", modelInfo.modelKey);
+console.info("Current Context Length", modelInfo.contextLength);
+console.info("Model Trained for Tool Use", modelInfo.trainedForToolUse);
+// etc.
+```
+
+```typescript tab="Embedding Model"
+import { LMStudioClient } from "@lmstudio/sdk";
+
+const client = new LMStudioClient();
+const model = await client.embedding.model();
+
+const modelInfo = await model.getInfo();
+
+console.info("Model Key", modelInfo.modelKey);
+console.info("Current Context Length", modelInfo.contextLength);
+// etc.
```
diff --git a/2_typescript/8_model-info/meta.json b/2_typescript/8_model-info/meta.json
new file mode 100644
index 0000000..b5ea714
--- /dev/null
+++ b/2_typescript/8_model-info/meta.json
@@ -0,0 +1,8 @@
+{
+ "title": "Model Info",
+ "pages": [
+ "get-context-length",
+ "_get-load-config",
+ "get-model-info"
+ ]
+}
diff --git a/2_typescript/_more/_apply-prompt-template.md b/2_typescript/_more/_apply-prompt-template.md
deleted file mode 100644
index 798eed6..0000000
--- a/2_typescript/_more/_apply-prompt-template.md
+++ /dev/null
@@ -1,54 +0,0 @@
----
-title: Apply Prompt Template
-description: Apply a model's prompt template to a conversation
----
-
-## Overview
-
-LLMs (Large Language Models) operate on a text-in, text-out basis. Before processing conversations through these models, the input must be converted into a properly formatted string using a prompt template. If you need to inspect or work with this formatted string directly, the LM Studio SDK provides a streamlined way to apply a model's prompt template to your conversations.
-
-```lms_info
-You do not need to use this method when using `.respond`. It will automatically apply the prompt template for you.
-```
-
-## Usage with a Chat
-
-You can apply a prompt template to a `Chat` by using the `applyPromptTemplate` method. This method takes a `Chat` object as input and returns a formatted string.
-
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { Chat, LMStudioClient } from "@lmstudio/sdk";
-
- const client = new LMStudioClient();
- const llm = await client.llm.model(); // Use any loaded LLM
-
- const chat = Chat.createEmpty();
- chat.append("system", "You are a helpful assistant.");
- chat.append("user", "What is LM Studio?");
- const formatted = await llm.applyPromptTemplate(chat);
- console.info(formatted);
-```
-
-## Usage with an Array of Messages
-
-The same method can also be used with any object that can be converted to a `Chat`, for example, an array of messages.
-
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
-
- const client = new LMStudioClient();
- const llm = await client.llm.model(); // Use any loaded LLM
-
- const formatted = await llm.applyPromptTemplate([
- { role: "system", content: "You are a helpful assistant." },
- { role: "user", content: "What is LM Studio?" },
- ]);
- console.info(formatted);
-```
diff --git a/2_typescript/_more/_apply-prompt-template.mdx b/2_typescript/_more/_apply-prompt-template.mdx
new file mode 100644
index 0000000..88254b0
--- /dev/null
+++ b/2_typescript/_more/_apply-prompt-template.mdx
@@ -0,0 +1,46 @@
+---
+title: Apply Prompt Template
+description: Apply a model's prompt template to a conversation
+---
+
+## Overview
+
+LLMs (Large Language Models) operate on a text-in, text-out basis. Before processing conversations through these models, the input must be converted into a properly formatted string using a prompt template. If you need to inspect or work with this formatted string directly, the LM Studio SDK provides a streamlined way to apply a model's prompt template to your conversations.
+
+
+You do not need to use this method when using `.respond`. It will automatically apply the prompt template for you.
+
+
+## Usage with a Chat
+
+You can apply a prompt template to a `Chat` by using the `applyPromptTemplate` method. This method takes a `Chat` object as input and returns a formatted string.
+
+```typescript
+import { Chat, LMStudioClient } from "@lmstudio/sdk";
+
+const client = new LMStudioClient();
+const llm = await client.llm.model(); // Use any loaded LLM
+
+const chat = Chat.createEmpty();
+chat.append("system", "You are a helpful assistant.");
+chat.append("user", "What is LM Studio?");
+const formatted = await llm.applyPromptTemplate(chat);
+console.info(formatted);
+```
+
+## Usage with an Array of Messages
+
+The same method can also be used with any object that can be converted to a `Chat`, for example, an array of messages.
+
+```typescript
+import { LMStudioClient } from "@lmstudio/sdk";
+
+const client = new LMStudioClient();
+const llm = await client.llm.model(); // Use any loaded LLM
+
+const formatted = await llm.applyPromptTemplate([
+ { role: "system", content: "You are a helpful assistant." },
+ { role: "user", content: "What is LM Studio?" },
+]);
+console.info(formatted);
+```
diff --git a/2_typescript/_more/meta.json b/2_typescript/_more/meta.json
new file mode 100644
index 0000000..48efd12
--- /dev/null
+++ b/2_typescript/_more/meta.json
@@ -0,0 +1,6 @@
+{
+ "title": "More",
+ "pages": [
+ "_apply-prompt-template"
+ ]
+}
diff --git a/2_typescript/authentication.md b/2_typescript/authentication.md
deleted file mode 100644
index 56894d5..0000000
--- a/2_typescript/authentication.md
+++ /dev/null
@@ -1,53 +0,0 @@
----
-title: Authentication
-sidebar_title: Authentication
-description: Using API Tokens in LM Studio
-index: 2
----
-
-##### Requires [LM Studio 0.4.0](/download) or newer.
-
-LM Studio supports API Tokens for authentication, providing a secure and convenient way to access the LM Studio API.
-
-By default, the LM Studio API runs **without enforcing authentication**. For production or shared environments, enable API Token authentication for secure access.
-
-```lms_info
-To enable API Token authentication, create tokens and control granular permissions, check [this guide](/docs/developer/core/authentication) for more details.
-```
-
-## Providing the API Token
-
-There are two ways to provide the API Token when creating an instance of `LMStudioClient`:
-
-1. **Environment Variable (Recommended)**: Set the `LM_API_TOKEN` environment variable, and the SDK will automatically read it.
-2. **Function Argument**: Pass the token directly as the `apiToken` parameter in the constructor.
-
-```lms_code_snippet
- variants:
- Environment Variable:
- language: typescript
- code: |
- // Set environment variables in your terminal before running the code:
- // export LM_API_TOKEN="your-token-here"
-
- import { LMStudioClient } from "@lmstudio/sdk";
- // The SDK automatically reads from LM_API_TOKEN environment variable
- const client = new LMStudioClient();
-
- const model = await client.llm.model("qwen/qwen3-4b-2507");
- const result = await model.respond("What is the meaning of life?");
-
- console.info(result.content);
- Function Argument:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient({
- apiToken: "your-token-here",
- });
-
- const model = await client.llm.model("qwen/qwen3-4b-2507");
- const result = await model.respond("What is the meaning of life?");
-
- console.info(result.content);
-```
diff --git a/2_typescript/authentication.mdx b/2_typescript/authentication.mdx
new file mode 100644
index 0000000..52db967
--- /dev/null
+++ b/2_typescript/authentication.mdx
@@ -0,0 +1,49 @@
+---
+title: Authentication
+sidebar_title: Authentication
+description: Using API Tokens in LM Studio
+index: 2
+---
+
+##### Requires LM Studio 0.4.0 or newer.
+
+LM Studio supports API Tokens for authentication, providing a secure and convenient way to access the LM Studio API.
+
+By default, the LM Studio API runs **without enforcing authentication**. For production or shared environments, enable API Token authentication for secure access.
+
+
+To learn how to enable API Token authentication, create tokens, and control granular permissions, see [this guide](/docs/developer/core/authentication).
+
+
+## Providing the API Token
+
+There are two ways to provide the API Token when creating an instance of `LMStudioClient`:
+
+1. **Environment Variable (Recommended)**: Set the `LM_API_TOKEN` environment variable, and the SDK will automatically read it.
+2. **Function Argument**: Pass the token directly as the `apiToken` parameter in the constructor.
+
+```typescript tab="Environment Variable"
+// Set environment variables in your terminal before running the code:
+// export LM_API_TOKEN="your-token-here"
+
+import { LMStudioClient } from "@lmstudio/sdk";
+// The SDK automatically reads from LM_API_TOKEN environment variable
+const client = new LMStudioClient();
+
+const model = await client.llm.model("qwen/qwen3-4b-2507");
+const result = await model.respond("What is the meaning of life?");
+
+console.info(result.content);
+```
+
+```typescript tab="Function Argument"
+import { LMStudioClient } from "@lmstudio/sdk";
+const client = new LMStudioClient({
+ apiToken: "your-token-here",
+});
+
+const model = await client.llm.model("qwen/qwen3-4b-2507");
+const result = await model.respond("What is the meaning of life?");
+
+console.info(result.content);
+```
diff --git a/2_typescript/index.md b/2_typescript/index.md
index 2208c8e..8de9bd0 100644
--- a/2_typescript/index.md
+++ b/2_typescript/index.md
@@ -1,5 +1,5 @@
---
-title: "`lmstudio-js` (TypeScript SDK)"
+title: "lmstudio-js (TypeScript SDK)"
sidebar_title: "Introduction"
description: "Getting started with LM Studio's Typescript / JavaScript SDK"
---
@@ -10,20 +10,16 @@ The SDK provides you a set of programmatic tools to interact with LLMs, embeddin
`lmstudio-js` is available as an npm package. You can install it using npm, yarn, or pnpm.
-```lms_code_snippet
- variants:
- npm:
- language: bash
- code: |
- npm install @lmstudio/sdk --save
- yarn:
- language: bash
- code: |
- yarn add @lmstudio/sdk
- pnpm:
- language: bash
- code: |
- pnpm add @lmstudio/sdk
+```bash tab="npm"
+npm install @lmstudio/sdk --save
+```
+
+```bash tab="yarn"
+yarn add @lmstudio/sdk
+```
+
+```bash tab="pnpm"
+pnpm add @lmstudio/sdk
```
For the source code and open source contribution, visit [lmstudio-js](https://github.com/lmstudio-ai/lmstudio-js) on GitHub.
@@ -38,19 +34,14 @@ For the source code and open source contribution, visit [lmstudio-js](https://gi
## Quick Example: Chat with a Llama Model
-```lms_code_snippet
- title: "index.ts"
- variants:
- TypeScript:
- language: typescript
- code: |
- import { LMStudioClient } from "@lmstudio/sdk";
- const client = new LMStudioClient();
+```typescript title="index.ts"
+import { LMStudioClient } from "@lmstudio/sdk";
+const client = new LMStudioClient();
- const model = await client.llm.model("qwen/qwen3-4b-2507");
- const result = await model.respond("What is the meaning of life?");
+const model = await client.llm.model("qwen/qwen3-4b-2507");
+const result = await model.respond("What is the meaning of life?");
- console.info(result.content);
+console.info(result.content);
```
### Getting Local Models
diff --git a/2_typescript/meta.json b/2_typescript/meta.json
new file mode 100644
index 0000000..356449f
--- /dev/null
+++ b/2_typescript/meta.json
@@ -0,0 +1,27 @@
+{
+ "title": "TypeScript SDK",
+ "pages": [
+ "---Introduction---",
+ "index",
+ "authentication",
+ "project-setup",
+ "---Basics---",
+ "...2_llm-prediction",
+ "---Agentic Flows---",
+ "...3_agent",
+ "---Plugins---",
+ "...3_plugins",
+ "---Text Embedding---",
+ "...4_embedding",
+ "---Tokenization---",
+ "...5_tokenization",
+ "---Manage Models---",
+ "...6_manage-models",
+ "---API Reference---",
+ "...7_api-reference",
+ "---Model Info---",
+ "...8_model-info",
+ "---More---",
+ "..._more"
+ ]
+}
diff --git a/2_typescript/project-setup.md b/2_typescript/project-setup.md
index 35a4a0e..46858e1 100644
--- a/2_typescript/project-setup.md
+++ b/2_typescript/project-setup.md
@@ -1,7 +1,7 @@
---
title: "Project Setup"
sidebar_title: "Project Setup"
-description: "Set up your `lmstudio-js` app or script."
+description: "Set up your lmstudio-js app or script."
index: 2
---
@@ -11,34 +11,26 @@ index: 2
Use the following command to start an interactive project setup:
-```lms_code_snippet
- variants:
- TypeScript (Recommended):
- language: bash
- code: |
- lms create node-typescript
- Javascript:
- language: bash
- code: |
- lms create node-javascript
+```bash tab="TypeScript (Recommended)"
+lms create node-typescript
+```
+
+```bash tab="JavaScript"
+lms create node-javascript
```
## Add `lmstudio-js` to an Existing Project
If you have already created a project and would like to use `lmstudio-js` in it, you can install it using npm, yarn, or pnpm.
-```lms_code_snippet
- variants:
- npm:
- language: bash
- code: |
- npm install @lmstudio/sdk --save
- yarn:
- language: bash
- code: |
- yarn add @lmstudio/sdk
- pnpm:
- language: bash
- code: |
- pnpm add @lmstudio/sdk
+```bash tab="npm"
+npm install @lmstudio/sdk --save
+```
+
+```bash tab="yarn"
+yarn add @lmstudio/sdk
+```
+
+```bash tab="pnpm"
+pnpm add @lmstudio/sdk
```
diff --git a/3_cli/0_local-models/chat.md b/3_cli/0_local-models/chat.md
index 68402db..bb71d53 100644
--- a/3_cli/0_local-models/chat.md
+++ b/3_cli/0_local-models/chat.md
@@ -1,6 +1,6 @@
---
-title: "`lms chat`"
-sidebar_title: "`lms chat`"
+title: "lms chat"
+sidebar_title: "lms chat"
description: Start a chat session with a local model from the command line.
index: 1
---
diff --git a/3_cli/0_local-models/get.md b/3_cli/0_local-models/get.md
index d6e7164..f303168 100644
--- a/3_cli/0_local-models/get.md
+++ b/3_cli/0_local-models/get.md
@@ -1,6 +1,6 @@
---
-title: "`lms get`"
-sidebar_title: "`lms get`"
+title: "lms get"
+sidebar_title: "lms get"
description: Search and download models from the command line.
index: 2
---
diff --git a/3_cli/0_local-models/import.md b/3_cli/0_local-models/import.md
index 4d2d84d..ef43f95 100644
--- a/3_cli/0_local-models/import.md
+++ b/3_cli/0_local-models/import.md
@@ -1,6 +1,6 @@
---
-title: "`lms import`"
-sidebar_title: "`lms import`"
+title: "lms import"
+sidebar_title: "lms import"
description: Import a local model file into your LM Studio models directory.
index: 6
---
diff --git a/3_cli/0_local-models/load.md b/3_cli/0_local-models/load.md
index 83eaf92..1d4a6c0 100644
--- a/3_cli/0_local-models/load.md
+++ b/3_cli/0_local-models/load.md
@@ -1,6 +1,6 @@
---
-title: "`lms load`"
-sidebar_title: "`lms load`"
+title: "lms load"
+sidebar_title: "lms load"
description: Load or unload models, set context length, GPU offload, TTL, or estimate memory usage without loading.
index: 3
---
diff --git a/3_cli/0_local-models/ls.md b/3_cli/0_local-models/ls.md
index 6c7998e..0b4d396 100644
--- a/3_cli/0_local-models/ls.md
+++ b/3_cli/0_local-models/ls.md
@@ -1,6 +1,6 @@
---
-title: "`lms ls`"
-sidebar_title: "`lms ls`"
+title: "lms ls"
+sidebar_title: "lms ls"
description: List all downloaded models in your LM Studio installation.
index: 4
---
diff --git a/3_cli/0_local-models/meta.json b/3_cli/0_local-models/meta.json
new file mode 100644
index 0000000..1a9abfa
--- /dev/null
+++ b/3_cli/0_local-models/meta.json
@@ -0,0 +1,11 @@
+{
+ "title": "Local Models",
+ "pages": [
+ "chat",
+ "get",
+ "import",
+ "load",
+ "ls",
+ "ps"
+ ]
+}
diff --git a/3_cli/0_local-models/ps.md b/3_cli/0_local-models/ps.md
index 3498e2f..d789dcd 100644
--- a/3_cli/0_local-models/ps.md
+++ b/3_cli/0_local-models/ps.md
@@ -1,6 +1,6 @@
---
-title: "`lms ps`"
-sidebar_title: "`lms ps`"
+title: "lms ps"
+sidebar_title: "lms ps"
description: Show information about currently loaded models from the command line.
index: 5
---
diff --git a/3_cli/1_serve/log-stream.md b/3_cli/1_serve/log-stream.md
index eb125fa..7233e75 100644
--- a/3_cli/1_serve/log-stream.md
+++ b/3_cli/1_serve/log-stream.md
@@ -1,6 +1,6 @@
---
-title: "`lms log stream`"
-sidebar_title: "`lms log stream`"
+title: "lms log stream"
+sidebar_title: "lms log stream"
description: Stream logs from LM Studio. Useful for debugging prompts sent to the model.
index: 4
---
diff --git a/3_cli/1_serve/meta.json b/3_cli/1_serve/meta.json
new file mode 100644
index 0000000..b48f9f9
--- /dev/null
+++ b/3_cli/1_serve/meta.json
@@ -0,0 +1,9 @@
+{
+ "title": "Serve",
+ "pages": [
+ "log-stream",
+ "server-start",
+ "server-status",
+ "server-stop"
+ ]
+}
diff --git a/3_cli/1_serve/server-start.md b/3_cli/1_serve/server-start.md
index 2242f89..19250ea 100644
--- a/3_cli/1_serve/server-start.md
+++ b/3_cli/1_serve/server-start.md
@@ -1,6 +1,6 @@
---
-title: "`lms server start`"
-sidebar_title: "`lms server start`"
+title: "lms server start"
+sidebar_title: "lms server start"
description: Start the LM Studio local server with customizable port and logging options.
index: 1
---
diff --git a/3_cli/1_serve/server-status.md b/3_cli/1_serve/server-status.md
index bb194ed..416ff4f 100644
--- a/3_cli/1_serve/server-status.md
+++ b/3_cli/1_serve/server-status.md
@@ -1,6 +1,6 @@
---
-title: "`lms server status`"
-sidebar_title: "`lms server status`"
+title: "lms server status"
+sidebar_title: "lms server status"
description: Check the status of your running LM Studio server instance.
index: 2
---
diff --git a/3_cli/1_serve/server-stop.md b/3_cli/1_serve/server-stop.md
index 0eac1c0..ec15b74 100644
--- a/3_cli/1_serve/server-stop.md
+++ b/3_cli/1_serve/server-stop.md
@@ -1,6 +1,6 @@
---
-title: "`lms server stop`"
-sidebar_title: "`lms server stop`"
+title: "lms server stop"
+sidebar_title: "lms server stop"
description: Stop the running LM Studio server instance.
index: 3
---
diff --git a/3_cli/2_daemon/daemon-down.md b/3_cli/2_daemon/daemon-down.mdx
similarity index 77%
rename from 3_cli/2_daemon/daemon-down.md
rename to 3_cli/2_daemon/daemon-down.mdx
index 6725958..c606193 100644
--- a/3_cli/2_daemon/daemon-down.md
+++ b/3_cli/2_daemon/daemon-down.mdx
@@ -1,6 +1,6 @@
---
-title: "`lms daemon down`"
-sidebar_title: "`lms daemon down`"
+title: "lms daemon down"
+sidebar_title: "lms daemon down"
description: Stop llmster from the CLI.
index: 2
---
@@ -11,9 +11,9 @@ The `lms daemon down` command stops the running llmster.
lms daemon down
```
-```lms_info
+
`lms daemon down` only works if llmster is running. It will not stop LM Studio if it is running as a GUI app.
-```
+</Callout>
### Learn more
diff --git a/3_cli/2_daemon/daemon-status.md b/3_cli/2_daemon/daemon-status.md
index d725b6f..bb10108 100644
--- a/3_cli/2_daemon/daemon-status.md
+++ b/3_cli/2_daemon/daemon-status.md
@@ -1,6 +1,6 @@
---
-title: "`lms daemon status`"
-sidebar_title: "`lms daemon status`"
+title: "lms daemon status"
+sidebar_title: "lms daemon status"
description: Check whether llmster is running.
index: 3
---
diff --git a/3_cli/2_daemon/daemon-up.md b/3_cli/2_daemon/daemon-up.md
index db37b03..7d34e5e 100644
--- a/3_cli/2_daemon/daemon-up.md
+++ b/3_cli/2_daemon/daemon-up.md
@@ -1,6 +1,6 @@
---
-title: "`lms daemon up`"
-sidebar_title: "`lms daemon up`"
+title: "lms daemon up"
+sidebar_title: "lms daemon up"
description: Start llmster from the CLI.
index: 1
---
diff --git a/3_cli/2_daemon/daemon-update.md b/3_cli/2_daemon/daemon-update.md
index 5fbc3ef..355d9ef 100644
--- a/3_cli/2_daemon/daemon-update.md
+++ b/3_cli/2_daemon/daemon-update.md
@@ -1,6 +1,6 @@
---
-title: "`lms daemon update`"
-sidebar_title: "`lms daemon update`"
+title: "lms daemon update"
+sidebar_title: "lms daemon update"
description: Update llmster to the latest version.
index: 4
---
diff --git a/3_cli/2_daemon/meta.json b/3_cli/2_daemon/meta.json
new file mode 100644
index 0000000..c079bed
--- /dev/null
+++ b/3_cli/2_daemon/meta.json
@@ -0,0 +1,9 @@
+{
+ "title": "Daemon",
+ "pages": [
+ "daemon-down",
+ "daemon-status",
+ "daemon-up",
+ "daemon-update"
+ ]
+}
diff --git a/3_cli/3_link/link-disable.md b/3_cli/3_link/link-disable.md
index 8fbce66..ea10a0c 100644
--- a/3_cli/3_link/link-disable.md
+++ b/3_cli/3_link/link-disable.md
@@ -1,6 +1,6 @@
---
-title: "`lms link disable`"
-sidebar_title: "`lms link disable`"
+title: "lms link disable"
+sidebar_title: "lms link disable"
description: Disable LM Link on this device from the CLI.
index: 2
---
diff --git a/3_cli/3_link/link-enable.md b/3_cli/3_link/link-enable.mdx
similarity index 88%
rename from 3_cli/3_link/link-enable.md
rename to 3_cli/3_link/link-enable.mdx
index 413bcae..fbfccee 100644
--- a/3_cli/3_link/link-enable.md
+++ b/3_cli/3_link/link-enable.mdx
@@ -1,15 +1,15 @@
---
-title: "`lms link enable`"
-sidebar_title: "`lms link enable`"
+title: "lms link enable"
+sidebar_title: "lms link enable"
description: Enable LM Link on this device from the CLI.
index: 1
---
The `lms link enable` command enables LM Link on this device, allowing it to connect with other devices on the same link.
-```lms_info
+<Callout>
LM Link requires an LM Studio account. Run `lms login` first if you haven't already.
-```
+</Callout>
## Enable LM Link
diff --git a/3_cli/3_link/link-set-device-name.md b/3_cli/3_link/link-set-device-name.md
index b5ec3cf..a6c80e8 100644
--- a/3_cli/3_link/link-set-device-name.md
+++ b/3_cli/3_link/link-set-device-name.md
@@ -1,6 +1,6 @@
---
-title: "`lms link set-device-name`"
-sidebar_title: "`lms link set-device-name`"
+title: "lms link set-device-name"
+sidebar_title: "lms link set-device-name"
description: Rename this device on LM Link from the CLI.
index: 4
---
diff --git a/3_cli/3_link/link-set-preferred-device.md b/3_cli/3_link/link-set-preferred-device.md
index c494a80..593cbfe 100644
--- a/3_cli/3_link/link-set-preferred-device.md
+++ b/3_cli/3_link/link-set-preferred-device.md
@@ -1,6 +1,6 @@
---
-title: "`lms link set-preferred-device`"
-sidebar_title: "`lms link set-preferred-device`"
+title: "lms link set-preferred-device"
+sidebar_title: "lms link set-preferred-device"
description: Set the preferred device for model resolution on LM Link.
index: 5
---
diff --git a/3_cli/3_link/link-status.md b/3_cli/3_link/link-status.md
index e570e12..d4f5c1b 100644
--- a/3_cli/3_link/link-status.md
+++ b/3_cli/3_link/link-status.md
@@ -1,6 +1,6 @@
---
-title: "`lms link status`"
-sidebar_title: "`lms link status`"
+title: "lms link status"
+sidebar_title: "lms link status"
description: Check LM Link connection status and see connected peers.
index: 3
---
diff --git a/3_cli/3_link/meta.json b/3_cli/3_link/meta.json
new file mode 100644
index 0000000..78b28a3
--- /dev/null
+++ b/3_cli/3_link/meta.json
@@ -0,0 +1,10 @@
+{
+ "title": "Link",
+ "pages": [
+ "link-disable",
+ "link-enable",
+ "link-set-device-name",
+ "link-set-preferred-device",
+ "link-status"
+ ]
+}
diff --git a/3_cli/4_runtime/meta.json b/3_cli/4_runtime/meta.json
new file mode 100644
index 0000000..df9dcfe
--- /dev/null
+++ b/3_cli/4_runtime/meta.json
@@ -0,0 +1,6 @@
+{
+ "title": "Runtime",
+ "pages": [
+ "runtime"
+ ]
+}
diff --git a/3_cli/4_runtime/runtime.md b/3_cli/4_runtime/runtime.md
index 32eba83..e35baed 100644
--- a/3_cli/4_runtime/runtime.md
+++ b/3_cli/4_runtime/runtime.md
@@ -1,6 +1,6 @@
---
-title: "`lms runtime`"
-sidebar_title: "`lms runtime`"
+title: "lms runtime"
+sidebar_title: "lms runtime"
description: Manage LM Studio inference runtimes from the CLI.
index: 1
---
diff --git a/3_cli/5_develop-and-publish/clone.md b/3_cli/5_develop-and-publish/clone.md
index 5493bdd..7444307 100644
--- a/3_cli/5_develop-and-publish/clone.md
+++ b/3_cli/5_develop-and-publish/clone.md
@@ -1,6 +1,6 @@
---
-title: "`lms clone`"
-sidebar_title: "`lms clone`"
+title: "lms clone"
+sidebar_title: "lms clone"
description: Clone an artifact from LM Studio Hub to a local folder (beta).
index: 1
---
diff --git a/3_cli/5_develop-and-publish/dev.md b/3_cli/5_develop-and-publish/dev.md
index 62a6650..b08e6d5 100644
--- a/3_cli/5_develop-and-publish/dev.md
+++ b/3_cli/5_develop-and-publish/dev.md
@@ -1,6 +1,6 @@
---
-title: "`lms dev` (Beta)"
-sidebar_title: "`lms dev`"
+title: "lms dev (Beta)"
+sidebar_title: "lms dev"
description: Start a plugin dev server or install a local plugin (beta).
index: 3
---
diff --git a/3_cli/5_develop-and-publish/login.md b/3_cli/5_develop-and-publish/login.md
index 3dbb3e4..b649770 100644
--- a/3_cli/5_develop-and-publish/login.md
+++ b/3_cli/5_develop-and-publish/login.md
@@ -1,6 +1,6 @@
---
-title: "`lms login`"
-sidebar_title: "`lms login`"
+title: "lms login"
+sidebar_title: "lms login"
description: Authenticate with LM Studio Hub (beta).
index: 4
---
diff --git a/3_cli/5_develop-and-publish/meta.json b/3_cli/5_develop-and-publish/meta.json
new file mode 100644
index 0000000..fdada2a
--- /dev/null
+++ b/3_cli/5_develop-and-publish/meta.json
@@ -0,0 +1,9 @@
+{
+ "title": "Develop & Publish",
+ "pages": [
+ "clone",
+ "dev",
+ "login",
+ "push"
+ ]
+}
diff --git a/3_cli/5_develop-and-publish/push.md b/3_cli/5_develop-and-publish/push.md
index bf5e5f2..77ab14b 100644
--- a/3_cli/5_develop-and-publish/push.md
+++ b/3_cli/5_develop-and-publish/push.md
@@ -1,6 +1,6 @@
---
-title: "`lms push` (Beta)"
-sidebar_title: "`lms push`"
+title: "lms push (Beta)"
+sidebar_title: "lms push"
description: Upload the current folder's artifact to LM Studio Hub (beta).
index: 2
---
diff --git a/3_cli/_lms-load.md b/3_cli/_lms-load.md
index cc2bb2e..f2fa392 100644
--- a/3_cli/_lms-load.md
+++ b/3_cli/_lms-load.md
@@ -1,5 +1,5 @@
---
-title: "`lms load`"
+title: "lms load"
description: Use the lms CLI to load or unload models
---
diff --git a/3_cli/contributing.md b/3_cli/contributing.md
index 60c19a9..209f793 100644
--- a/3_cli/contributing.md
+++ b/3_cli/contributing.md
@@ -1,7 +1,7 @@
---
title: "Contributing"
sidebar_title: "Contributing"
-description: "Learn where to file issues and how to contribute to the `lms` CLI."
+description: "Learn where to file issues and how to contribute to the lms CLI."
index: 2
---
diff --git a/3_cli/index.md b/3_cli/index.mdx
similarity index 95%
rename from 3_cli/index.md
rename to 3_cli/index.mdx
index 48e4efa..a6b69ee 100644
--- a/3_cli/index.md
+++ b/3_cli/index.mdx
@@ -1,7 +1,7 @@
---
-title: "`lms` — LM Studio's CLI"
+title: "lms — LM Studio's CLI"
sidebar_title: "Introduction"
-description: Get starting with the `lms` command line utility.
+description: Get started with the lms command line utility.
index: 1
---
@@ -34,13 +34,13 @@ lms --help
### Verify the installation
-```lms_info
+<Callout>
👉 You need to run LM Studio _at least once_ before you can use `lms`.
-```
+</Callout>
Open a terminal window and run `lms`.
-```lms_terminal
+```bash title="Terminal"
$ lms
lms is LM Studio's CLI utility for your models, server, and inference runtime. (v0.0.47)
diff --git a/3_cli/meta.json b/3_cli/meta.json
new file mode 100644
index 0000000..631b405
--- /dev/null
+++ b/3_cli/meta.json
@@ -0,0 +1,21 @@
+{
+ "title": "CLI",
+ "pages": [
+ "---Introduction---",
+ "index",
+ "contributing",
+ "_lms-load",
+ "---Local Models---",
+ "...0_local-models",
+ "---Serve---",
+ "...1_serve",
+ "---Daemon---",
+ "...2_daemon",
+ "---Link---",
+ "...3_link",
+ "---Runtime---",
+ "...4_runtime",
+ "---Develop & Publish---",
+ "...5_develop-and-publish"
+ ]
+}
diff --git a/4_integrations/1_mcp-remote/meta.json b/4_integrations/1_mcp-remote/meta.json
new file mode 100644
index 0000000..8df5ad6
--- /dev/null
+++ b/4_integrations/1_mcp-remote/meta.json
@@ -0,0 +1,6 @@
+{
+ "title": "MCP Integrations",
+ "pages": [
+ "popular"
+ ]
+}
diff --git a/4_integrations/1_mcp-remote/popular.md b/4_integrations/1_mcp-remote/popular.md
index b59d26c..aeea752 100644
--- a/4_integrations/1_mcp-remote/popular.md
+++ b/4_integrations/1_mcp-remote/popular.md
@@ -11,12 +11,16 @@ Create issues, search projects, update statuses, and more in Linear, directly fr
@@ -40,12 +44,16 @@ Search pages, create documents, and read from your Notion workspace.
@@ -69,12 +77,16 @@ Work with Jira issues and Confluence pages from within LM Studio.
@@ -98,12 +110,16 @@ Query issues, inspect stack traces, and analyze errors from your Sentry projects
diff --git a/4_integrations/claude-code.md b/4_integrations/claude-code.md
deleted file mode 100644
index 31dd739..0000000
--- a/4_integrations/claude-code.md
+++ /dev/null
@@ -1,60 +0,0 @@
----
-title: Claude Code
-description: Use Claude Code with LM Studio
-index: 2
----
-
-Claude Code can talk to LM Studio via the Anthropic-compatible `POST /v1/messages` endpoint.
-See: [Anthropic-compatible Messages endpoint](/docs/developer/anthropic-compat/messages).
-
-
-
-```lms_protip
-Have a powerful LLM rig? Use [LM Link](/docs/integrations/lmlink) to run Claude Code from your laptop while the model runs on your rig.
-```
-
-### 1) Start LM Studio's local server
-
-Make sure LM Studio is running as a server (default port `1234`).
-
-You can start it from the app, or from the terminal with `lms`:
-
-```bash
-lms server start --port 1234
-```
-
-### 2) Configure Claude Code
-
-Set these environment variables so the `claude` CLI points to your local LM Studio:
-
-```bash
-export ANTHROPIC_BASE_URL=http://localhost:1234
-export ANTHROPIC_AUTH_TOKEN=lmstudio
-```
-
-Notes:
-
-- If Require Authentication is enabled, set `ANTHROPIC_AUTH_TOKEN` to your LM Studio API token. To learn more, see: [Authentication](/docs/developer/core/authentication).
-
-### 3) Run Claude Code against a local model
-
-```bash
-claude --model openai/gpt-oss-20b
-```
-
-```lms_protip
-Use a model (and server/model settings) with more than ~25k context length. Tools like Claude Code can consume a lot of context.
-```
-
-### 4) If Require Authentication is enabled, use your LM Studio API token
-
-If you turned on "Require Authentication" in LM Studio, create an API token and set:
-
-```bash
-export LM_API_TOKEN=
-export ANTHROPIC_AUTH_TOKEN=$LM_API_TOKEN
-```
-
-When Require Authentication is enabled, LM Studio accepts both `x-api-key` and `Authorization: Bearer `.
-
-If you're running into trouble, hop onto our [Discord](https://discord.gg/lmstudio)
diff --git a/4_integrations/claude-code.mdx b/4_integrations/claude-code.mdx
new file mode 100644
index 0000000..88d9c9c
--- /dev/null
+++ b/4_integrations/claude-code.mdx
@@ -0,0 +1,102 @@
+---
+title: Claude Code
+description: Use Claude Code with LM Studio
+index: 2
+---
+
+import { Step, Steps } from "fumadocs-ui/components/steps";
+
+Claude Code can talk to LM Studio via the Anthropic-compatible `POST /v1/messages` endpoint.
+See: [Anthropic-compatible Messages endpoint](/docs/developer/anthropic-compat/messages).
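+
+For illustration, a minimal request to this endpoint might look like the following sketch (default port assumed; the model name is only an example):
+
+```bash
+# Hypothetical example request; substitute any model you have loaded
+curl http://localhost:1234/v1/messages \
+  -H "content-type: application/json" \
+  -d '{
+    "model": "openai/gpt-oss-20b",
+    "max_tokens": 256,
+    "messages": [{ "role": "user", "content": "Hello!" }]
+  }'
+```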
+
+
+
+<Callout>
+Have a powerful LLM rig? Use [LM Link](/docs/integrations/lmlink) to run Claude Code from your laptop while the model runs on your rig.
+</Callout>
+
+## Setup
+
+<Steps>
+<Step>
+ Start LM Studio's local server
+
+ Make sure LM Studio is running as a server (default port `1234`).
+
+ You can start it from the app, or from the terminal with `lms`:
+
+ ```bash
+ lms server start --port 1234
+ ```
+
+</Step>
+<Step>
+ Configure Claude Code
+
+ Set these environment variables so the `claude` CLI points to your local LM Studio:
+
+ ```bash
+ export ANTHROPIC_BASE_URL=http://localhost:1234
+ export ANTHROPIC_AUTH_TOKEN=lmstudio
+ ```
+
+ Notes:
+
+ - If Require Authentication is enabled, set `ANTHROPIC_AUTH_TOKEN` to your LM Studio API token. To learn more, see: [Authentication](/docs/developer/core/authentication).
+
+</Step>
+<Step>
+ Run Claude Code against a local model
+
+ ```bash
+ claude --model openai/gpt-oss-20b
+ ```
+
+ <Callout>
+ Use a model (and server/model settings) with more than ~25k context length. Tools like Claude Code can consume a lot of context.
+ </Callout>
+
+</Step>
+<Step>
+ If Require Authentication is enabled, use your LM Studio API token
+
+ If you turned on "Require Authentication" in LM Studio, create an API token and set:
+
+ ```bash
+ export LM_API_TOKEN=
+ export ANTHROPIC_AUTH_TOKEN=$LM_API_TOKEN
+ ```
+
+ When Require Authentication is enabled, LM Studio accepts both the `x-api-key` and `Authorization: Bearer <token>` headers.
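+
+ For example, assuming `LM_API_TOKEN` holds your token, the following two requests are equivalent:
+
+ ```bash
+ # Illustrative only: either auth header should be accepted
+ curl http://localhost:1234/v1/messages \
+   -H "x-api-key: $LM_API_TOKEN" \
+   -H "content-type: application/json" \
+   -d '{ "model": "openai/gpt-oss-20b", "max_tokens": 64, "messages": [{ "role": "user", "content": "ping" }] }'
+
+ curl http://localhost:1234/v1/messages \
+   -H "Authorization: Bearer $LM_API_TOKEN" \
+   -H "content-type: application/json" \
+   -d '{ "model": "openai/gpt-oss-20b", "max_tokens": 64, "messages": [{ "role": "user", "content": "ping" }] }'
+ ```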
+
+</Step>
+</Steps>
+If you're running into trouble, hop onto our [Discord](https://discord.gg/lmstudio).
diff --git a/4_integrations/codex.md b/4_integrations/codex.md
deleted file mode 100644
index 93bb295..0000000
--- a/4_integrations/codex.md
+++ /dev/null
@@ -1,48 +0,0 @@
----
-title: Codex
-description: Use Codex with LM Studio
-index: 3
----
-
-Codex can talk to LM Studio via the OpenAI-compatible `POST /v1/responses` endpoint.
-See: [OpenAI-compatible Responses endpoint](/docs/developer/openai-compat/responses).
-
-
-
-```lms_protip
-Have a powerful LLM rig? Use [LM Link](/docs/integrations/lmlink) to run Codex from your laptop while the model runs on your rig.
-```
-
-### 1) Start LM Studio's local server
-
-Make sure LM Studio is running as a server (default port `1234`).
-
-You can start it from the app, or from the terminal with `lms`:
-
-```bash
-lms server start --port 1234
-```
-
-### 2) Run Codex against a local model
-
-Run Codex as you normally would, but with the `--oss` flag to point it to LM Studio.
-
-Example:
-
-```bash
-codex --oss
-```
-
-By default, Codex will download and use [openai/gpt-oss-20b](https://lmstudio.ai/models/openai/gpt-oss-20b).
-
-```lms_protip
-Use a model (and server/model settings) with more than ~25k context length. Tools like Codex can consume a lot of context.
-```
-
-You can also use any other model you have available in LM Studio. For example:
-
-```bash
-codex --oss -m ibm/granite-4-micro
-```
-
-If you're running into trouble, hop onto our [Discord](https://discord.gg/lmstudio)
diff --git a/4_integrations/codex.mdx b/4_integrations/codex.mdx
new file mode 100644
index 0000000..deda7d0
--- /dev/null
+++ b/4_integrations/codex.mdx
@@ -0,0 +1,70 @@
+---
+title: Codex
+description: Use Codex with LM Studio
+index: 3
+---
+
+import { Step, Steps } from "fumadocs-ui/components/steps";
+
+Codex can talk to LM Studio via the OpenAI-compatible `POST /v1/responses` endpoint.
+See: [OpenAI-compatible Responses endpoint](/docs/developer/openai-compat/responses).
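+
+For illustration, a minimal request to this endpoint might look like the following sketch (default port assumed; the model name is only an example):
+
+```bash
+# Hypothetical example request; substitute any model you have loaded
+curl http://localhost:1234/v1/responses \
+  -H "content-type: application/json" \
+  -d '{
+    "model": "openai/gpt-oss-20b",
+    "input": "Hello!"
+  }'
+```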
+
+
+
+<Callout>
+Have a powerful LLM rig? Use [LM Link](/docs/integrations/lmlink) to run Codex from your laptop while the model runs on your rig.
+</Callout>
+
+## Setup
+
+<Steps>
+<Step>
+ Start LM Studio's local server
+
+ Make sure LM Studio is running as a server (default port `1234`).
+
+ You can start it from the app, or from the terminal with `lms`:
+
+ ```bash
+ lms server start --port 1234
+ ```
+
+</Step>
+<Step>
+ Run Codex against a local model
+
+ Run Codex as you normally would, but with the `--oss` flag to point it to LM Studio.
+
+ Example:
+
+ ```bash
+ codex --oss
+ ```
+
+ By default, Codex will download and use [openai/gpt-oss-20b](https://lmstudio.ai/models/openai/gpt-oss-20b).
+
+ <Callout>
+ Use a model (and server/model settings) with more than ~25k context length. Tools like Codex can consume a lot of context.
+ </Callout>
+
+ You can also use any other model you have available in LM Studio. For example:
+
+ ```bash
+ codex --oss -m ibm/granite-4-micro
+ ```
+
+</Step>
+</Steps>
+If you're running into trouble, hop onto our [Discord](https://discord.gg/lmstudio).
diff --git a/4_integrations/meta.json b/4_integrations/meta.json
new file mode 100644
index 0000000..6af7f07
--- /dev/null
+++ b/4_integrations/meta.json
@@ -0,0 +1,13 @@
+{
+ "title": "Integrations",
+ "pages": [
+ "---Introduction---",
+ "index",
+ "lmlink",
+ "claude-code",
+ "codex",
+ "openclaw",
+ "---MCP Integrations---",
+ "...1_mcp-remote"
+ ]
+}
diff --git a/4_integrations/openclaw.md b/4_integrations/openclaw.md
deleted file mode 100644
index 0e86183..0000000
--- a/4_integrations/openclaw.md
+++ /dev/null
@@ -1,61 +0,0 @@
----
-title: OpenClaw
-description: Use OpenClaw with LM Studio
-index: 3
----
-
-OpenClaw now supports LM Studio as a native model provider.
-See: [OpenClaw Docs](https://docs.openclaw.ai/providers/lmstudio).
-
-
-
-```lms_protip
-Have a powerful LLM rig? Use [LM Link](/docs/integrations/lmlink) to run OpenClaw from your laptop while the model runs on your rig.
-```
-
-### 1) Start LM Studio's local server
-
-Make sure LM Studio is running as a server (default port `1234`).
-
-You can start it from the app, or from the terminal with `lms`:
-
-```bash
-lms server start --port 1234
-```
-
-### 2) Run Openclaw with LM Studio as model provider
-
-Install OpenClaw as normal or run the OpenClaw onboard command as follows *(recommended)*
-
-```bash
-openclaw onboard
-```
-
-and complete the interactive setup with LM Studio as your model provider
-
-You can do the onboarding in non-interactive way by using the following command:
-
-```bash
-openclaw onboard \
- --non-interactive \
- --accept-risk \
- --auth-choice lmstudio \
- --custom-base-url http://localhost:1234/v1 \
- --lmstudio-api-key "$LM_API_TOKEN" \
- --custom-model-id qwen/qwen3.5-9b
-```
-
-```lms_protip
-Use a model (and server/model settings) with more than ~50k context length. Tools like OpenClaw can consume a lot of context.
-```
-
-### 3) Set up LM Studio as default memory search provider
-
-To use LM Studio as the embedding model provider for memory search, run the following command and restart openclaw gateway
-
-```bash
-openclaw config set agents.defaults.memorySearch.provider lmstudio
-openclaw gateway restart
-```
-
-If you're running into trouble, hop onto our [Discord](https://discord.gg/lmstudio)
diff --git a/4_integrations/openclaw.mdx b/4_integrations/openclaw.mdx
new file mode 100644
index 0000000..ec0b5de
--- /dev/null
+++ b/4_integrations/openclaw.mdx
@@ -0,0 +1,73 @@
+---
+title: OpenClaw
+description: Use OpenClaw with LM Studio
+index: 3
+---
+
+import { Step, Steps } from "fumadocs-ui/components/steps";
+
+OpenClaw now supports LM Studio as a native model provider.
+See: [OpenClaw Docs](https://docs.openclaw.ai/providers/lmstudio).
+
+
+
+<Callout>
+Have a powerful LLM rig? Use [LM Link](/docs/integrations/lmlink) to run OpenClaw from your laptop while the model runs on your rig.
+</Callout>
+
+## Setup
+
+<Steps>
+<Step>
+ Start LM Studio's local server
+
+ Make sure LM Studio is running as a server (default port `1234`).
+
+ You can start it from the app, or from the terminal with `lms`:
+
+ ```bash
+ lms server start --port 1234
+ ```
+
+</Step>
+<Step>
+ Run OpenClaw with LM Studio as model provider
+
+ Install OpenClaw as normal, or run the OpenClaw onboard command *(recommended)*:
+
+ ```bash
+ openclaw onboard
+ ```
+
+ Then complete the interactive setup with LM Studio as your model provider.
+
+ You can also run the onboarding non-interactively with the following command:
+
+ ```bash
+ openclaw onboard \
+ --non-interactive \
+ --accept-risk \
+ --auth-choice lmstudio \
+ --custom-base-url http://localhost:1234/v1 \
+ --lmstudio-api-key "$LM_API_TOKEN" \
+ --custom-model-id qwen/qwen3.5-9b
+ ```
+
+ <Callout>
+ Use a model (and server/model settings) with more than ~50k context length. Tools like OpenClaw can consume a lot of context.
+ </Callout>
+
+</Step>
+<Step>
+ Set up LM Studio as default memory search provider
+
+ To use LM Studio as the embedding model provider for memory search, run the following commands to set the provider and restart the OpenClaw gateway:
+
+ ```bash
+ openclaw config set agents.defaults.memorySearch.provider lmstudio
+ openclaw gateway restart
+ ```
+
+</Step>
+</Steps>
+If you're running into trouble, hop onto our [Discord](https://discord.gg/lmstudio).
diff --git a/5_lmlink/1_basics/meta.json b/5_lmlink/1_basics/meta.json
new file mode 100644
index 0000000..c5c7dd6
--- /dev/null
+++ b/5_lmlink/1_basics/meta.json
@@ -0,0 +1,8 @@
+{
+ "title": "Getting Started",
+ "pages": [
+ "add-device",
+ "faq",
+ "preferred-device"
+ ]
+}
diff --git a/5_lmlink/meta.json b/5_lmlink/meta.json
new file mode 100644
index 0000000..fb44d63
--- /dev/null
+++ b/5_lmlink/meta.json
@@ -0,0 +1,9 @@
+{
+ "title": "LM Link",
+ "pages": [
+ "---Introduction---",
+ "index",
+ "---Getting Started---",
+ "...1_basics"
+ ]
+}
diff --git a/README.md b/README.md
index 044c253..0b1977d 100644
--- a/README.md
+++ b/README.md
@@ -46,38 +46,28 @@ Configurations that look good:
2. no title + 2+ variants
````
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- // Multi-line TypeScript code
- function hello() {
- console.log("hey")
- return "world"
- }
-
- Python:
- language: python
- code: |
- # Multi-line Python code
- def hello():
- print("hey")
- return "world"
+```typescript tab="TypeScript"
+// Multi-line TypeScript code
+function hello() {
+ console.log("hey")
+ return "world"
+}
+```
+
+```python tab="Python"
+# Multi-line Python code
+def hello():
+ print("hey")
+ return "world"
```
````
````
-```lms_code_snippet
- title: "generator.py"
- variants:
- Python:
- language: python
- code: |
- # Multi-line Python code
- def hello():
- print("hey")
- return "world"
+```python title="generator.py"
+# Multi-line Python code
+def hello():
+ print("hey")
+ return "world"
```
````
diff --git a/_template_dont_edit.md b/_template_dont_edit.mdx
similarity index 84%
rename from _template_dont_edit.md
rename to _template_dont_edit.mdx
index acdb566..98b8427 100644
--- a/_template_dont_edit.md
+++ b/_template_dont_edit.mdx
@@ -12,36 +12,26 @@ Configurations that look good:
1. title + 1 variant
2. no title + 2+ variants
-```lms_code_snippet
- variants:
- TypeScript:
- language: typescript
- code: |
- // Multi-line TypeScript code
- function hello() {
- console.log("hey")
- return "world"
- }
-
- Python:
- language: python
- code: |
- # Multi-line Python code
- def hello():
- print("hey")
- return "world"
+```typescript tab="TypeScript"
+// Multi-line TypeScript code
+function hello() {
+ console.log("hey")
+ return "world"
+}
+```
+
+```python tab="Python"
+# Multi-line Python code
+def hello():
+ print("hey")
+ return "world"
```
-```lms_code_snippet
- title: "generator.py"
- variants:
- Python:
- language: python
- code: |
- # Multi-line Python code
- def hello():
- print("hey")
- return "world"
+```python title="generator.py"
+# Multi-line Python code
+def hello():
+ print("hey")
+ return "world"
```
@@ -82,17 +72,17 @@ async function main() {
main();
```
-```lms_notice
+<Callout>
You can jump to Settings from anywhere in the app by pressing `cmd` + `,` on macOS or `ctrl` + `,` on Windows/Linux.
-```
+</Callout>
-```lms_protip
+<Callout type="info">
You can jump to Settings from anywhere in the app by pressing `cmd` + `,` on macOS or `ctrl` + `,` on Windows/Linux.
-```
+</Callout>
-```lms_warning
+<Callout type="warn">
You can jump to Settings from anywhere in the app by pressing `cmd` + `,` on macOS or `ctrl` + `,` on Windows/Linux.
-```
+</Callout>
### Params
diff --git a/meta.json b/meta.json
new file mode 100644
index 0000000..c23b45b
--- /dev/null
+++ b/meta.json
@@ -0,0 +1,12 @@
+{
+ "title": "Docs",
+ "pages": [
+ "0_app",
+ "1_developer",
+ "1_python",
+ "2_typescript",
+ "3_cli",
+ "4_integrations",
+ "5_lmlink"
+ ]
+}