
Commit 7345fe5

simplify what's new and add publication_list (#1070)
Parent: e18f9b6

3 files changed: +39 −24 lines


README.md

Lines changed: 16 additions & 23 deletions
@@ -30,30 +30,26 @@ See our [paper](https://arxiv.org/pdf/2309.05516) for more details. For usage in
 
 
 ## 🆕 What's New
-[2025/11] AutoRound has now landed in **LLM-Compressor**! You can apply AutoRound algorithm using `AutoRoundModifier`. Check out the [example](https://github.com/vllm-project/llm-compressor/tree/main/examples/autoround/README.md) to get started!
 
-[2025/11] AutoRound now offers preliminary support for an enhanced GGUF quantization algorithm via `--enable_alg_ext`. For detailed accuracy benchmarks, please refer to the [documentation](./docs/gguf_alg_ext_acc.md).
+* [2025/11] AutoRound has landed in **LLM-Compressor**: [*Usage*](https://github.com/vllm-project/llm-compressor/tree/main/examples/autoround/README.md).
 
-[2025/10] AutoRound has been integrated into **SGLang**. You can now run models in the AutoRound format directly using the SGLang versions newer than v0.5.4.
+* [2025/11] An **enhanced GGUF** quantization algorithm is available via `--enable_alg_ext`: [*Accuracy*](./docs/gguf_alg_ext_acc.md).
 
-[2025/10] We enhanced the RTN mode (--iters 0) to significantly reduce quantization cost compared to the default tuning mode. Check out [this doc](./docs/opt_rtn.md) for some accuracy results. If you don’t have sufficient resources, you can use this mode for 4-bit quantization.
+* [2025/10] AutoRound has been integrated into **SGLang**: [*Usage*](), [*LMSYS Blog*](https://lmsys.org/blog/2025-11-13-AutoRound/), [*X post*](https://x.com/lmsysorg/status/1991977019220148650?s=20), [*LinkedIn*](https://www.linkedin.com/feed/update/urn:li:activity:7397742859354857472).
 
-[2025/10] We proposed a fast algorithm to generate **mixed bits/datatypes** schemes in minutes. Please
-refer to the documentation for accuracy [results](./docs/auto_scheme_acc.md) and [this guide](https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme) for usage instructions.
+* [2025/10] A **mixed-precision** algorithm is available to generate quantization schemes in minutes: [*Usage*](https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme), [*Accuracy*](./docs/auto_scheme_acc.md).
 
-[2025/09] AutoRound now includes experimental support for the **mxfp4 and nvfp4 dtypes**. For accuracy results, see the [documentation](./docs/mxnv_acc.md)
-. We currently recommend exporting to the LLM-Compressor format.
+* [2025/09] The **MXFP4** and **NVFP4** dtypes are available: [*Accuracy*](./docs/mxnv_acc.md).
 
-[2025/08] AutoRound now provides experimental support for **an improved INT2 algorithm** via `--enable_alg_ext`. See this [documentation](./docs/alg_202508.md)
-for some accuracy results.
+* [2025/08] An **improved INT2** algorithm is available via `--enable_alg_ext`: [*Accuracy*](./docs/alg_202508.md).
 
-[2025/07] AutoRound now offers experimental support for **GGUF** format, and recommends using optimized RTN mode (--iters 0) for
-all bits other than 3 bits.
+* [2025/07] The **GGUF** format is supported: [*Usage*](./docs/step_by_step.md#gguf-format).
 
-[2025/05] AutoRound has been integrated into **Transformers** and **vLLM**.
+* [2025/05] AutoRound has been integrated into **vLLM**: [*Usage*](https://docs.vllm.ai/en/latest/features/quantization/auto_round/), [*Blog*](https://medium.com/@NeuralCompressor/accelerating-vllm-and-sglang-deployment-using-autoround-45fdc0b2683e).
 
-[2025/03] The INT2-mixed **DeepSeek-R1** model (~200GB) retains 97.9% accuracy. Check
-out [OPEA/DeepSeek-R1-int2-mixed-sym-inc](https://huggingface.co/OPEA/DeepSeek-R1-int2-mixed-sym-inc).
+* [2025/05] AutoRound has been integrated into **Transformers**: [*Blog*](https://huggingface.co/blog/autoround).
+
+* [2025/03] The INT2-mixed **DeepSeek-R1** model (~200GB) retains 97.9% accuracy: [*Model*](https://huggingface.co/OPEA/DeepSeek-R1-int2-mixed-sym-inc).
 
 
 ## ✨ Key Features
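Several of the rewritten entries above hinge on two knobs, `--enable_alg_ext` and `--iters 0`. As a rough illustration, here is a minimal sketch of driving the first one from the Python API; the `AutoRound(...)` keyword names (`scheme`, `enable_alg_ext`) and the example model id are assumptions, while `quantize_and_save` is taken verbatim from the step_by_step.md hunk later in this commit.

```python
# Hedged sketch, not the project's documented recipe: `scheme` and
# `enable_alg_ext` are assumed Python counterparts of the CLI flags
# named in the What's New entries above.
from auto_round import AutoRound

ar = AutoRound(
    "Qwen/Qwen2.5-7B-Instruct",  # hypothetical example model
    scheme="W2A16",              # INT2, where --enable_alg_ext is said to help
    enable_alg_ext=True,         # assumed equivalent of --enable_alg_ext
)
ar.quantize_and_save("./qmodel", format="auto_round")  # call shown in this commit's diff
```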
@@ -319,14 +315,14 @@ for prompt, output in zip(prompts, outputs):
 
 ### Transformers (CPU/Intel GPU/Gaudi/CUDA)
 
-
 AutoRound supports 10+ backends and automatically selects the best available backend based on the installed libraries and prompts the user to
 install additional libraries when a better backend is found.
 
 **Please avoid manually moving the quantized model to a different device** (e.g., model.to('cpu')) during inference, as
 this may cause unexpected exceptions.
 
 The support for Gaudi device is limited.
+
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
@@ -337,15 +333,12 @@ text = "There is a girl who likes adventure,"
 inputs = tokenizer(text, return_tensors="pt").to(model.device)
 print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
 ```
+
 ## Acknowledgement
 Special thanks to open-source low precision libraries such as AutoGPTQ, AutoAWQ, GPTQModel, Triton, Marlin, and ExLLaMAV2 for providing low-precision CUDA kernels, which are leveraged in AutoRound.
 
+> **Note**:
+> For all publications/events, please refer to the [Publication List](./docs/publication_list.md).
+
 ## 🌟 Support Us
 If you find AutoRound helpful, please ⭐ star the repo and share it with your community!
-
-
-
-
-
-
-
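Since the README warns against calling `model.to(...)` on a quantized model, a safer pattern is to pick the device at load time. A minimal sketch, assuming a quantized AutoRound checkpoint on the Hub; the model id below is simply the DeepSeek-R1 example from the What's New list, not a recommendation:

```python
# Choose the device via device_map at load time instead of moving
# the already-quantized model afterwards with model.to(...).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OPEA/DeepSeek-R1-int2-mixed-sym-inc"  # example id from the What's New list
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```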

docs/publication_list.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+Full Publications/Events
+==========
+
+## 2025 (3)
+
+* Blog in LMSYS: [AutoRound Meets SGLang: Enabling Quantized Model Inference with AutoRound](https://lmsys.org/blog/2025-11-13-AutoRound/) (Nov 2025)
+
+* Blog in Medium: [Accelerating vLLM and SGLang Deployment using AutoRound](https://medium.com/@NeuralCompressor/accelerating-vllm-and-sglang-deployment-using-autoround-45fdc0b2683e) (Oct 2025)
+
+* Blog in HuggingFace: [What is AutoRound?](https://huggingface.co/blog/autoround) (Apr 2025)
+
+## 2024 (1)
+
+* EMNLP: [Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs](https://aclanthology.org/2024.findings-emnlp.662/) (Oct 2024)
+
+## 2023 (2)
+
+* arXiv: [TEQ: Trainable Equivalent Transformation for Quantization of LLMs](https://arxiv.org/abs/2310.10944) (Oct 2023)
+
+* Blog in Medium: [Effective Post-Training Quantization for Large Language Models](https://medium.com/intel-analytics-software/effective-post-training-quantization-for-large-language-models-with-enhanced-smoothquant-approach-93e9d104fb98) (Apr 2023)

docs/step_by_step.md

Lines changed: 3 additions & 1 deletion
@@ -408,7 +408,9 @@ ar.quantize_and_save(output_dir, format="auto_round")
 
 ### GGUF format
 Experimental feature. This format is well-suited for CPU devices and is widely adopted by the community.
-This format is well-suited for CPU devices and is widely adopted by the community.
+
+The optimized RTN mode (--iters 0) is suggested for all bit widths other than 3 bits.
+
 ```python
 from auto_round import AutoRound
 
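The added line recommends the optimized RTN mode for most GGUF bit widths. A minimal sketch of what that might look like in Python, assuming `iters=0` is the API-side equivalent of the `--iters 0` flag and that `gguf:q4_k_m` is an accepted format string (both are assumptions here):

```python
# Hedged sketch: RTN-style GGUF quantization with tuning disabled.
from auto_round import AutoRound

ar = AutoRound("Qwen/Qwen2.5-1.5B-Instruct", iters=0)  # hypothetical model id; iters=0 assumed
ar.quantize_and_save("./qmodel-gguf", format="gguf:q4_k_m")  # format string is an assumption
```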
