Vertically integrated toolkit
Currently, apx depends on several external technologies for routing and the HTTP server.
Specifically, those are:
- uvicorn as an ASGI-compatible web server
- FastAPI as a web framework
- sqlmodel as the database model system
- TanStack Router for frontend-side routing
The combination of these toolkits provides a solid base for building modern applications in a combined Python + TS stack.
apx puts significant effort into integrating these systems, providing an end-to-end developer experience locally and packaging everything when serving a project on Databricks Apps.
However, there are multiple quirks related to these technologies.
In this text I try to identify the most critical pieces of the stack and think through potential gap-closing solutions from a long-term perspective.
Wider context
At some point around 2024, coding agents advanced to a level where business users became capable of almost one-shotting a complex web application. Well-developed stacks provide the capability not just to emulate frontend behaviour, but also to combine backend and frontend logic together.
However, the generated code comes with new risks and challenges, specifically:
- Deeply hidden issues - despite the advancements in coding agents and extensive testing via the UI, there are still a lot of "unknown unknowns" when working with LLM-generated code. Specifically, a coding agent can deliver "what it was asked to do", but at the cost of poor quality, heavy duplication, lower extensibility and poor flow control. This creates an illusion of a "well-written" app; under the hood it might contain poor abstractions, hardcoded logic and problematic behaviour patterns.
- Fragile under heavy load - generated applications are usually tested and verified by a single user directly in the UI. This leads both to a lack of load testing and to single choke points inside the application. Examples include shared dictionaries or in-memory cache objects that are not designed to scale horizontally. Another example is an application running in a multi-worker setup (e.g. uvicorn), where each worker runs in a separate process. In such a scenario a plain Queue or other standard Python primitives simply don't work across workers, yet the agent may not know such details purely due to missing context. Since Python does not enforce such constraints and has no validation mechanism for the "context" of variables, these issues go uncaught.
- Opportunistic usage of technologies and libraries - even with actively provided context, coding agents (having been trained on OSS code) naturally tend to hand over complex pieces of work to 3rd-party libraries, sometimes without checking or verifying their capabilities.
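The multi-worker pitfall from the second bullet can be made concrete. The sketch below (a minimal illustration, not apx code) shows why a module-level dict that looks like shared state silently diverges once the server forks workers:

```python
from multiprocessing import get_context

# Module-level "cache" - looks like shared state, but under a multi-worker
# server (e.g. `uvicorn --workers 4`) every worker process gets its own copy.
visit_counts: dict[str, int] = {}

def handle_request(user: str, results) -> None:
    # Typical generated-handler pattern: mutate in-process state.
    visit_counts[user] = visit_counts.get(user, 0) + 1
    results.put(visit_counts[user])

# Two "workers" each serve one request for the same user.
ctx = get_context("fork")  # fork, as uvicorn does on Unix
results = ctx.Queue()
workers = [ctx.Process(target=handle_request, args=("alice", results))
           for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()

counts = sorted(results.get() for _ in range(2))
# Each worker mutated its own private dict: both report 1, never 2.
print(counts)  # [1, 1]
```

Nothing crashes and nothing warns; the aggregate count is simply wrong, which is exactly the kind of defect a single-user UI test will not surface.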
Despite these problems, agentic code generation - with the proper combination of context, rules, tools and careful review - can and should be used to generate apps and agents.
Another important piece to keep in mind is that applications are increasingly becoming advanced interfaces which connect complex entities with agents. Standard CRUD forms (the root of most applications) are indeed here to stay for a long time (potentially decades), yet more and more modern applications consist of nothing but a chat interface, combined with some advanced structure on the screen that the user can edit, preferably shared in real time.
The ecosystem
As of 2026, Pythonic development of web-driven applications has split into two separate directions.
Pure Pythonic tools
The most popular libraries are Dash, Streamlit, Reflex, NiceGUI.
All these tools follow one unified conceptual approach: serve a small client which then opens a websocket connection and loads the rest of the content from the server.
The client propagates clicks directly to the server, and code in the Python runtime handles them. The server-driven nature of events is a real benefit, but scalability is a huge problem: any application built this way with 500+ users needs to keep at least one websocket connection per browser tab.
Such applications certainly provide a level of convenience to Pythonic developers, but they can be expected to have issues with scalability, connection drops and frontend-heavy logic (e.g. 3D maps, advanced visualizations).
Backend-for-frontend pattern
A more common pattern appeared recently with the development of server-side frameworks in the JS runtime, such as NextJS / SolidJS / React Router. Recent advancements in React made it possible to pre-render certain pieces of the frontend on the server and then stream them (renderToStream-style APIs) directly into the browser via SSR capabilities.
In such a scenario there are usually 3 entities:
<Frontend code in React> <---> <Backend code in NextJS / express> <---> <Pythonic service, e.g. FastAPI>
This approach provides separation between frontend and backend logic, yet has subtle problems, namely:
- It becomes unclear where certain business logic should live. For example, in a standard CRUD-like system - should requests go directly from the JS backend to the database? How does the JS service notify the Python service?
- Auth becomes a second problem, since the BFF instance now needs to "pass" the authenticated user from the frontend through to the Python backend.
- The codebase becomes bigger due to the necessity of supporting a REST-like (or GraphQL-like) contract between JS and Python.
- Pythonic capabilities for certain tasks (e.g. agentic systems) are much more robust than the JS ones.
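The contract-duplication cost from the third bullet can be illustrated with a small sketch. Every payload shape must be maintained twice, once in Python and once in TypeScript (the types below are illustrative, not from any real codebase; a plain dataclass stands in for a Pydantic model to keep the example dependency-free):

```python
import json
from dataclasses import asdict, dataclass

# Python side of the contract (would be a Pydantic model in a real
# FastAPI service).
@dataclass
class UserOut:
    id: int
    email: str
    display_name: str

# The same shape must be mirrored by hand on the JS/TS side, e.g.:
#   interface UserOut { id: number; email: string; displayName: string; }
# Note the drift already: snake_case vs camelCase. Every rename now has
# to be applied in two codebases, or the REST contract silently breaks.

payload = json.dumps(asdict(UserOut(id=1, email="a@b.co", display_name="Alice")))
print(payload)
```

Codegen (e.g. from an OpenAPI schema) mitigates this, but then the generator itself becomes another moving part of the stack.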
The state
The final piece of the application (be it an agentic application or a standard user-oriented one) is the state. The state itself is not a problem, because modern OLTP systems support efficient mechanisms to store any kind of data (blobs and structured data alike). However, managing the state from the codebase becomes a problem of its own, especially including things like:
- ORMs (Python has sqlmodel + sqlalchemy and several other tools)
- Subscriptions / notifications (electricSQL is a good example)
- Versioning and migrations (alembic etc.)
The problem is that each Pythonic mechanism for state control misses basic features - migrations, a lean driver ecosystem, async operations - so you have to assemble them from a set of tools: sqlmodel can do ORM but has no versioning and migration pattern, async is only possible via asyncpg, notifications are not natively supported by sqlmodel, etc.
Vertically integrated system
Looking at these problems together, it becomes clearly visible that the ecosystem is scattered and doesn't provide a vertically integrated capability to build applications from one end to the other.
In a perfect world, users would be looking for the following:
- An ASGI server with workers based on the tokio runtime
- A web framework which works in close accord with the ASGI server without requiring custom integrations
- Built-in OTEL telemetry and tracing
- Built-in integration between the Python world and TS files / pages
- An efficient ORM with versioning capabilities
- An ORM extension for streaming cases, e.g. notifications / the electricSQL pattern
- Simple, concise, type-safe routing driven by server-side code, with a clean handover into the TypeScript world when it comes to frontend code
- A predefined set of frontend components that is extensible, easy to use and easy to develop
Implementation detail - single-server process
For historical reasons (no async operations in Python until 2015, the GIL), Python ended up with an unusual web server distribution model where the actual web server runs several forked subprocesses (workers).
In comparison:
- NodeJS (Next etc) - Single process, single event loop.
- Go - single process, many goroutines
- Rust (Axum / Actix) - single process, parallel workers with shared state via tokio
The only other language that resembles this approach is Ruby (RoR + unicorn / puma), and again primarily because of its global lock (the GVL).
The server
The server is the beating heart of the vertically integrated system. Its main responsibilities are:
- route handling
- http request / response processing
- websocket handling
- SSE handling
From the implementation perspective, given that apx uses Rust at its core, there is no real reason not to continue using axum, which provides all the necessary capabilities.
The runtime
The multi-worker approach has demonstrated a high level of efficiency and control over server behaviour. It works well with the Python GIL pattern and increases overall server stability. However, such a runtime needs additional primitives for efficient cross-worker caching and queueing.
The primitives
As mentioned above, the primitives for such a server would be required to provide efficient and secure cross-process management. apx plans to expose the following primitives:
- Cache (in-memory, SQLite-based, persists Python objects as bytes)
- Queue (in-memory, ? based, persists objects as bytes)
In-memory design should be primary, but might be extended later on.
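A minimal sketch of what the Cache primitive could look like, assuming the SQLite-backed, objects-as-bytes design above (the class name and API are hypothetical, not the actual apx interface):

```python
import pickle
import sqlite3
from typing import Any

class Cache:
    """Hypothetical sketch of an apx cross-worker cache: SQLite-backed,
    storing pickled Python objects as bytes. For true cross-process
    sharing the workers would point at the same on-disk database file;
    ':memory:' below keeps the demo self-contained."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB)"
        )

    def set(self, key: str, value: Any) -> None:
        # Persist the Python object as bytes, as the design suggests.
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
            (key, pickle.dumps(value)),
        )
        self.conn.commit()

    def get(self, key: str, default: Any = None) -> Any:
        row = self.conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return pickle.loads(row[0]) if row else default

cache = Cache()
cache.set("user:1", {"name": "alice", "visits": 3})
print(cache.get("user:1"))  # {'name': 'alice', 'visits': 3}
```

SQLite is a reasonable substrate here because it already handles cross-process locking; the Rust supervisor could own the database handle and expose it to workers over a thin protocol.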
Native OpenTelemetry design
OpenTelemetry defines a multi-signal model (traces, metrics, logs) and emphasises correlation via shared resource context and identifiers.
OTLP defines how telemetry is encoded and transported, and is stable for traces/metrics/logs.
A runtime-first approach means:
- The Rust server always creates a trace/span for each request/session
- Propagators extract/inject context across inbound HTTP headers and outbound calls (later)
- Logs and metrics are emitted with consistent attributes (route template, status code, worker id, protocol)
- Python user code can attach spans/events, but the base telemetry exists even if user code does nothing
This is a key differentiator versus “library-only” observability, where the app must opt in everywhere.
The Pythonic interface should provide access to these telemetry capabilities, e.g.:
from apx.telemetry import logs
from apx.telemetry import trace
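A hypothetical sketch of what such an interface might feel like in user code. The apx.telemetry module does not exist yet; the stand-ins below only illustrate the intended shape, with stdlib logging in place of the real exporter:

```python
import contextlib
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("apx.telemetry")  # stand-in for apx.telemetry.logs

@contextlib.contextmanager
def span(name: str, **attributes):
    """Stand-in for a hypothetical apx.telemetry.trace.span: records
    duration and attributes the way an OTEL span would, while the
    runtime-owned parent span would already exist for the request."""
    start = time.perf_counter()
    try:
        yield
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        logger.info("span=%s attrs=%s duration_ms=%.2f",
                    name, attributes, duration_ms)

# User code attaches a child span; base request telemetry exists regardless.
with span("load_profile", route="/users/{id}", worker_id=0):
    logger.info("inside user handler")
```

The key property from the bullets above is preserved: user code may add spans and events, but the request-level trace is created by the Rust server whether or not this code runs.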
Proposed architecture model and request lifecycle
The core structure is a Rust supervisor + Rust workers + Python user module model, with compilation artifacts produced at build time.
Build-time compilation stage
At apx build time, metadata is compiled (not bytecode):
- Import user modules and locate the apx “app” object (FastAPI-inspired decorators, apx-specific primitives).
Extract:
- Route table (method, path template, handler id)
- Request body model for each operation (Pydantic class)
- Response contract for each operation (Pydantic class or apx standard response type)
- Dependency graph (a DAG with typed inputs/outputs and scopes)
- Protocol endpoints: WebSocket routes, SSE endpoints, and their message contracts
Emit:
- A manifest consumed by Rust at runtime (e.g., JSON or a binary format)
- Generated OpenAPI 3.1 for HTTP operations (plus extensions for WS/SSE typing)
- Generated “registry stubs” to make runtime calls fast and stable
Why this is not critical despite being on the hot path: Pydantic’s core validation logic is implemented in Rust (pydantic-core), so the Python-layer type validation already leverages Rust performance characteristics; shifting graph construction and schema generation to build time reduces runtime reflection costs.
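The import-and-extract step above can be sketched in miniature: decorators register routes at import time, and the "compiler" walks the registry to emit a manifest that the Rust supervisor could consume. The decorator name and manifest shape are assumptions for illustration, not the actual apx format:

```python
import inspect
import json

# Registry populated as user modules are imported during `apx build`.
ROUTES = []

def get(path: str):
    """Hypothetical FastAPI-inspired route decorator."""
    def register(handler):
        ROUTES.append({
            "method": "GET",
            "path": path,
            "handler_id": f"{handler.__module__}.{handler.__qualname__}",
            "is_async": inspect.iscoroutinefunction(handler),
        })
        return handler
    return register

@get("/users/{user_id}")
async def read_user(user_id: int):
    return {"user_id": user_id}

def build_manifest() -> str:
    # Handlers must be `async def` - enforced here at build time,
    # as the lifecycle section below requires.
    for route in ROUTES:
        assert route["is_async"], f"{route['handler_id']} must be async"
    return json.dumps({"routes": ROUTES}, indent=2)

print(build_manifest())
```

The real extractor would additionally capture body/response models and the dependency DAG, but the mechanism is the same: plain import-time reflection, paid once at build time instead of on every request.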
Runtime stage: Rust supervisor and worker processes
Supervisor
- Loads the compiled manifest
- Manages worker processes (start, restart on failure, health checks)
- Owns cross-process primitives (shared cache/queue), depending on design
Workers
- Run the Rust HTTP/WebSocket/SSE server (axum)
- Load the user Python module once per worker
- Execute handler + DI for matched routes
Rust frameworks like axum are explicitly designed to leverage the tower ecosystem for middleware such as timeouts and tracing; tower provides primitives for timeouts and backpressure (e.g., concurrency limits, buffering), which is directly relevant to making a multi-process, high-throughput runtime predictable.
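The admission-control idea (concurrency limits plus timeouts) can be expressed in Python terms as a sketch; in apx this layer would be tower middleware in Rust, so the code below is only an analogy:

```python
import asyncio

async def with_admission_control(handler, slots: asyncio.Semaphore,
                                 timeout_s: float = 1.0):
    # Backpressure: wait for a free concurrency slot before doing work.
    async with slots:
        # Timeout: abort handlers exceeding the configured duration,
        # like tower's timeout middleware.
        return await asyncio.wait_for(handler(), timeout=timeout_s)

async def slow_handler() -> str:
    await asyncio.sleep(0.05)
    return "ok"

async def main() -> list[str]:
    slots = asyncio.Semaphore(2)  # at most 2 requests in flight
    return await asyncio.gather(
        *[with_admission_control(slow_handler, slots) for _ in range(5)]
    )

print(asyncio.run(main()))  # ['ok', 'ok', 'ok', 'ok', 'ok']
```

With the limit at 2, the 5 requests are served in waves rather than all at once, which is exactly the predictability property the paragraph above is after.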
Request-to-response lifecycle for HTTP
A standards-aligned lifecycle looks like this:
- Parse request according to HTTP semantics and the selected transport version (HTTP/1.1 initially; future HTTP/2/3 optional).
- Apply Rust-level admission control and backpressure:
- global concurrency limit
- per-route concurrency limit
- request timeouts (tower timeout middleware aborts requests that exceed the configured duration)
- Route match:
- path template parsing, extraction of path params
- DI resolution (static DAG):
- Resolve Rust-native dependencies (headers, cookies, query params, remote addr, correlation ids)
- Resolve Python dependencies (async-only, executed in a controlled way)
- Support “yield-style” teardown for FastAPI-like semantics (teardown ordering is a major correctness concern; FastAPI supports this pattern explicitly).
- Request body validation:
- BaseModel.model_validate_json() for JSON body models (fast path; defined in Pydantic docs)
- Handler execution:
- Must be async def (enforced at build time)
- Response enforcement:
- If handler returns a Pydantic model (or compatible object), validate/serialise via Pydantic (Pydantic supports model_dump_json etc.)
- If handler returns an apx standard response, bypass Pydantic but still enforce headers/status/body constraints
- Error handling:
- All runtime errors and validation errors are transformed into RFC 9457 problem details responses (application/problem+json) with consistent fields (type, title, optional status, detail, instance, plus extensions).
This pipeline mirrors the developer convenience of FastAPI’s “dependency injection runs before your endpoint and injects values”, while being owned by apx as an executable plan rather than runtime reflection.
Why multi-process?
- independent GILs per worker (true parallelism for Python bytecode across CPU cores, avoiding the single-GIL bottleneck)
- crash containment: a worker can die without killing the entire service; supervisor restarts it
- memory isolation: memory leaks in one worker don’t poison the whole service; recycling is feasible
The bottom line
If apx ever goes in the direction of a vertical system, it becomes strongly differentiated: it won't be a framework anymore - instead it becomes a runtime + compiler (better said, builder) toolchain that happens to author user logic in Python.
The main idea is to take a pragmatic view of Python’s runtime realities (GIL, extension safety) and, instead of patching them, adopt Rust-controlled multi-process workers for isolation and parallelism as a systematic approach to increasing stability.