v0.5.0
New features
- Processing time is affected by server load
- Change TTFT parameter to be based on number of request tokens
- KV cache affects prefill time
- Support failure injection
- Implement kv-cache usage and waiting loras Prometheus metrics
- Randomize response length when max-tokens is defined in the request
- Support DP (data parallel)
- Support /tokenize endpoint
What's Changed
- Fix server interrupt by @npolshakova in #161
- Show final config in simulator default logger at Info level by @pancak3 in #154
- Cast bounds type in tests to func def: latency, interToken, and timeToFirst (to int) by @pancak3 in #163
- Remove unnecessary deferral of server close by @pancak3 in #162
- Fix: Rand generator is not set in a test suite, which results in a nil pointer access at runtime when only that test suite is run by @pancak3 in #166
- Use channels for metrics updates, added metrics tests by @irar2 in #171
- Remove rerun on comment action by @irar2 in #174
- Add failure injection mode to simulator by @smarunich in #131
- Add waiting loras list to loraInfo metrics by @mayabar in #175
- feat: generate response length based on a histogram when max_tokens is defined in the request by @mayabar in #169
- extend response length bucket calculation to allow buckets that are not necessarily equally sized by @mayabar in #176
- Use dynamic ports in zmq tests by @pancak3 in #170
- Change time-to-first-token parameter to be based on number of request tokens #137 by @pancak3 in #165
- Bugfix: was accessing number of tokens from nil var; getting it from req instead by @pancak3 in #177
- feat: add helm charts for Kubernetes deployment by @Blackoutta in #182
- chore: Make the image smaller by @shmuelk in #183
- Take cached prompt tokens into account in prefill time calculation by @irar2 in #184
- Add ignore eos in request by @pancak3 in #187
- Support DP by @irar2 in #188
- Change RandomNorm from float types to int by @pancak3 in #190
- KV cache usage metric by @irar2 in #192
- Adjust request "processing time" to current load by @pancak3 in #189
- Updates for the new release of kv-cache-manager by @irar2 in #194
- DP bug fix: wait after starting rank 0 sim by @irar2 in #193
- Support /tokenize endpoint by @irar2 in #198
- add Service to expose vLLM deployment and update doc by @googs1025 in #201
- Split simulator.go into several files by @irar2 in #199
New Contributors
- @smarunich made their first contribution in #131
- @Blackoutta made their first contribution in #182
- @googs1025 made their first contribution in #201
Full Changelog: v0.4.0...v0.5.0