Add new load-test command #281
Conversation
This new command launches queries in parallel against the specified endpoint and shows statistics about the response times, success rates, and so on. There are options `--request-rate` (how many queries per second to send), `--duration` (how long to run the test), and `--log-frequency` (how often to print the statistics). Queries can be read from a TSV or YAML file (same format as for the `benchmark-queries` command), or from a directory that contains one JSON file per query (the format used for our wikidata-query-log dataset). The option `--query-selection` determines how queries are selected (`random`, which is the default, or `cyclic`).
Pull request overview
Adds a new load-test CLI command to generate concurrent SPARQL traffic against a configured endpoint and periodically report latency/error statistics. This extends the existing qlever command suite with a purpose-built load generator that can reuse existing query-file formats.
Changes:
- Introduces `LoadTestCommand` with options for request rate, duration, log frequency, endpoint selection, and query selection strategy.
- Adds a unified `QuerySource` abstraction to load queries from TSV, YAML, or a directory of JSON query logs.
- Implements periodic table-style status logging plus a final HTTP status code histogram.
```python
subparser.add_argument(
    "--query-selection",
    type=str,
    choices=["random", "cycle"],
    default="random",
    help="How to pick the next query (default: random)",
)
```
`--query-selection` is documented in the PR description as supporting `cyclic`, but the CLI currently exposes `cycle` (and the `QuerySource.get_query` docstring also says "cycle"). To avoid user confusion and broken scripts, align the option name with the PR description, e.g., accept `cyclic` as an alias, or rename the choice and update the help text and docstrings accordingly.
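A minimal sketch of the alias approach, assuming standard `argparse` wiring (the surrounding parser setup here is illustrative, not the PR's actual code):

```python
import argparse

parser = argparse.ArgumentParser()
# Accept both spellings so the flag matches the PR description while
# staying backwards compatible with scripts that already use "cycle".
parser.add_argument(
    "--query-selection",
    type=str,
    choices=["random", "cycle", "cyclic"],
    default="random",
    help="How to pick the next query (default: random)",
)
args = parser.parse_args(["--query-selection", "cyclic"])
# Normalize the alias once, right after parsing, so the rest of the
# code only ever sees the canonical value.
if args.query_selection == "cyclic":
    args.query_selection = "cycle"
```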
```python
try:
    start_time = time.time()
    next_log_time = start_time + log_frequency_secs
    interval = 1.0 / args.request_rate
    next_launch_time = start_time + interval
```
`interval = 1.0 / args.request_rate` will raise `ZeroDivisionError` for `--request-rate 0` and behaves oddly for negative values. Also, `--log-frequency 0s` will make `next_log_time` never advance, causing a tight loop that logs continuously. Consider validating `request_rate > 0` and `log_frequency_secs > 0` (and possibly `duration_secs > 0` if you want to disallow no-op runs) and returning a clear error message when invalid.
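A sketch of the suggested up-front validation, reusing the names from the snippet above (how the error is surfaced to the user is up to the command):

```python
def validate_load_test_args(request_rate: float, duration_secs: float,
                            log_frequency_secs: float) -> None:
    # Reject values that would divide by zero, run for no time at all,
    # or spin in a tight logging loop.
    if request_rate <= 0:
        raise ValueError(f"--request-rate must be positive, got {request_rate}")
    if duration_secs <= 0:
        raise ValueError(f"--duration must be positive, got {duration_secs}")
    if log_frequency_secs <= 0:
        raise ValueError(
            f"--log-frequency must be positive, got {log_frequency_secs}")
```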
```python
try:
    result = subprocess.run(
        curl_cmd, shell=True, capture_output=True,
        text=True, timeout=max(duration_secs * 2, 300),
    )
    elapsed = time.time() - start
    http_code = result.stdout.strip()
    with lock:
        num_done += 1
        completed_times.append(elapsed)
        status_codes[http_code] = \
            status_codes.get(http_code, 0) + 1
        if http_code != "200":
            num_errors += 1
except Exception:
    elapsed = time.time() - start
    with lock:
        num_done += 1
        num_errors += 1
        completed_times.append(elapsed)
        status_codes["timeout"] = \
            status_codes.get("timeout", 0) + 1
```
`send_query` catches all exceptions and records them as `timeout`, which will misclassify failures like `curl` not being installed, DNS errors, or other subprocess issues. Catch `subprocess.TimeoutExpired` explicitly for timeouts, and handle other exceptions separately (e.g., record them in an `exception`/`failed_to_run` bucket and log at debug level).
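A sketch of the suggested classification, with the per-request bookkeeping factored out (the function name and return shape are illustrative, not the PR's code):

```python
import logging
import subprocess
import time

def run_one_query(curl_cmd: str, timeout_secs: float) -> tuple[str, float]:
    """Run one curl command; return (bucket, elapsed), where bucket is
    the HTTP status code, "timeout", or "failed_to_run"."""
    start = time.time()
    try:
        result = subprocess.run(
            curl_cmd, shell=True, capture_output=True,
            text=True, timeout=timeout_secs,
        )
        bucket = result.stdout.strip()
    except subprocess.TimeoutExpired:
        bucket = "timeout"
    except Exception as e:
        # curl missing, fork failure, etc. -- not a timeout.
        logging.debug("Query failed to run: %s", e)
        bucket = "failed_to_run"
    return bucket, time.time() - start
```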
```python
thread = threading.Thread(
    target=send_query, args=(query,), daemon=True,
)
thread.start()
with lock:
    num_launched += 1
next_launch_time += interval
```
Threads are started with `daemon=True` and the code never waits for in-flight requests to finish. This makes the final Done/Running counts and timing stats potentially incorrect, and the process may exit while `curl` subprocesses are still running. Track created threads (or use a bounded executor) and join them (optionally with a grace period) before printing the final status/histogram.
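A minimal sketch of tracking and joining workers before the final report (the 30-second grace period is an illustrative default, not a value from the PR):

```python
import threading
import time

threads: list[threading.Thread] = []

def launch_worker(send_query, query) -> None:
    # Same daemon threads as before, but remember them for shutdown.
    thread = threading.Thread(target=send_query, args=(query,), daemon=True)
    threads.append(thread)
    thread.start()

def join_inflight(grace_period_secs: float = 30.0) -> None:
    # Share one overall deadline across all remaining workers, so the
    # final histogram prints at most grace_period_secs after the end.
    deadline = time.time() + grace_period_secs
    for thread in threads:
        thread.join(timeout=max(0.0, deadline - time.time()))
```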
```python
thread.start()
with lock:
    num_launched += 1
```
There is a race where `num_done` can be incremented by the worker thread before `num_launched` is incremented (because `thread.start()` happens before `num_launched += 1`). That can temporarily make `Running = Launched - Done` negative and distort intermediate stats. Increment `num_launched` under the lock before starting the thread (or otherwise ensure the launch is recorded before the worker can finish).
Suggested change:

```diff
-    thread.start()
-    with lock:
-        num_launched += 1
+    with lock:
+        num_launched += 1
+    thread.start()
```
```python
# Launch queries that are due.
while next_launch_time <= now:
    query = source.get_query(args.query_selection)
    thread = threading.Thread(
        target=send_query, args=(query,), daemon=True,
    )
    thread.start()
    with lock:
        num_launched += 1
    next_launch_time += interval
```
Launching one new `threading.Thread` per request with no upper bound can exhaust OS resources at higher `--request-rate` or when the endpoint slows down (threads accumulate). A more robust approach is a bounded worker pool (e.g., `ThreadPoolExecutor` with a maximum size) plus a queue, or an async HTTP client, applying backpressure when the system is saturated.
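A sketch of the bounded-pool variant; `MAX_WORKERS = 64` is an illustrative cap, not a value from the PR. The semaphore blocks the launcher while all workers are busy, which is exactly the backpressure described above:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 64
executor = ThreadPoolExecutor(max_workers=MAX_WORKERS)
slots = threading.Semaphore(MAX_WORKERS)

def launch_bounded(send_query, query) -> None:
    slots.acquire()  # Blocks while MAX_WORKERS requests are in flight.
    future = executor.submit(send_query, query)
    # Free the slot as soon as the worker finishes, success or not.
    future.add_done_callback(lambda _: slots.release())
```

Note that blocking in `slots.acquire()` makes the launcher fall behind the requested rate when the endpoint is saturated; counting or dropping skipped launches instead is an alternative if holding a constant rate matters more.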
```python
def parse_duration(s: str) -> float:
    """
    Parse a duration string like "10s", "34min", "2h", "1.5h", "90" (seconds
    by default) and return the duration in seconds.
    """
    s = s.strip()
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(s|sec|min|m|h|hr)?", s)
    if not match:
        raise ValueError(f"Cannot parse duration: \"{s}\" (examples: "
                         f"\"10s\", \"34min\", \"2h\", \"90\")")
    value = float(match.group(1))
    unit = match.group(2) or "s"
    if unit in ("s", "sec"):
        return value
    elif unit in ("min", "m"):
        return value * 60
    elif unit in ("h", "hr"):
        return value * 3600
    else:
        raise ValueError(f"Unknown duration unit: \"{unit}\"")
```
This file introduces non-trivial parsing/selection logic (`parse_duration`, `format_duration`, and `QuerySource` parsing/selection) plus a new command entrypoint. The repository has command-level pytest coverage (e.g., `test/qlever/commands/*`), but there are currently no tests for this new load-test command and its helpers. Please add unit tests for duration parsing/formatting edge cases and for `QuerySource` behavior across TSV/YAML/dir modes (including `random` vs `cycle`).
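A sketch of possible `pytest` cases for the duration helper (the import path is a guess and would need to match the new module's actual location):

```python
import pytest
from qlever.commands.load_test import parse_duration

@pytest.mark.parametrize("text, expected", [
    ("10s", 10.0),
    ("34min", 34 * 60.0),
    ("2h", 2 * 3600.0),
    ("1.5h", 1.5 * 3600.0),
    ("90", 90.0),  # Bare numbers default to seconds.
])
def test_parse_duration_valid(text, expected):
    assert parse_duration(text) == expected

@pytest.mark.parametrize("text", ["", "abc", "10 days", "-5s"])
def test_parse_duration_invalid(text):
    with pytest.raises(ValueError):
        parse_duration(text)
```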
- `--queries-dir` replaced by `--queries-jsonl` -> reads a single JSONL file instead of a directory of JSON files; all three formats (TSV, YAML, JSONL) now fully read queries into memory upfront
- `--duration` replaced by `--num-queries` (default 1000) -> specify how many queries to send, not how long to run
- `--query-selection` added -> `random` (default) or `cycle`
- `--show-queries` added -> print full queries in purple and exit
- `--show-error-messages` added -> show query and error message for failed queries
- Refactored code and various minor improvements