Linguistic Reasoning

A framework for evaluating language models on different tasks.

Requirements

Python 3.x
API keys for the model providers you intend to use

Installation

Clone the repository:

git clone https://github.com/Sara-Rajaee/lingo_reason
cd your-repo

Install dependencies:
```
pip install -r requirements.txt
```

Set your API keys as environment variables:

export TOGETHER_API_KEY=your_key_here
export GEMINI_API_KEY=your_key_here
export OPENAI_API_KEY=your_key_here

You only need to set the keys for the model providers you plan to use.

For GPT-OSS models via a local vLLM proxy:

export GPT_OSS_API_BASE=http://localhost:19743/v1/   # proxy URL (default)
export GPT_OSS_API_KEY=no-key                        # optional, proxy usually has no auth

(optional) register polymath repo as a module (needed for polymath evaluation)

git submodule add https://github.com/QwenLM/PolyMath.git third_party/PolyMath
git submodule update --init --recursive

Serving GPT-OSS Models

GPT-OSS models are served locally via vLLM. A launch script is provided at scripts/serve.sh.

Prerequisites: Install vLLM (pip install vllm) and have GPU(s) available.

# Serve gpt-oss-20b (auto-detects available GPUs for tensor parallelism)
bash scripts/serve.sh openai/gpt-oss-20b

# Serve gpt-oss-120b on 4 GPUs, custom port
TP=4 PORT=8000 bash scripts/serve.sh openai/gpt-oss-120b

# Serve any HuggingFace model
bash scripts/serve.sh meta-llama/Llama-3.3-70B-Instruct

The server exposes an OpenAI-compatible API at http://localhost:19743/v1/ (by default). Once the server is running, use run.py in a separate terminal to evaluate against it.

Environment variables for scripts/serve.sh:

Variable	Default	Description
`PORT`	`19743`	API server port
`TP`	auto-detect	Tensor-parallel GPU count
`DTYPE`	`auto`	Model dtype (`auto`, `float16`, `bfloat16`)
`MAX_MODEL_LEN`	vLLM default	Max sequence length
`GPU_UTIL`	`0.9`	GPU memory utilization (0.0–1.0)
`EXTRA_ARGS`		Additional `vllm serve` arguments

Usage

Run an evaluation by specifying a model and a task:

python run.py --model gemini-2.5-flash --task polymath

# GPT-OSS models (requires a running vLLM proxy)
python run.py --model gpt-oss-20b --task mmmlu
python run.py --model gpt-oss-120b --task polymath

To see all available models and tasks:

python run.py --list

Configuration

Task configurations (e.g. languages, number of evaluation samples) can be modified in the config files located in the config/ directory.

Name		Name	Last commit message	Last commit date
Latest commit History 72 Commits
config		config
data		data
scripts		scripts
src		src
third_party		third_party
.gitignore		.gitignore
.gitmodules		.gitmodules
.python-version		.python-version
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
run.py		run.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Linguistic Reasoning

Requirements

Installation

Serving GPT-OSS Models

Usage

Configuration

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Linguistic Reasoning

Requirements

Installation

Serving GPT-OSS Models

Usage

Configuration

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages