Lazy import (after #11)#12
Conversation
Previously, make_syntax_validator() created a new Lark parser for each rule, which was expensive (~660ms total import time). This change: - Defines ALL_START_RULES containing all grammar rules needed by validators - Creates a single shared syntax_parser with all start rules upfront - Refactors make_syntax_validator() to use functools.partial with is_valid_syntax() instead of creating new parsers Import time reduced from ~660ms to ~72ms (9x speedup). Testing protocol: - Created timing script that clears module cache and measures import time - Ran 5 iterations for each version to account for variance - Used git stash to preserve dirty state while testing HEAD - Results: HEAD averaged ~662ms, this change averages ~72ms - All existing tests pass (2/2)
There was a problem hiding this comment.
Pull request overview
This PR aims to reduce rfc3987_syntax import time by lazily importing lark, lazily constructing the shared syntax_parser, and lazily creating term-specific is_valid_syntax_* validator functions on attribute access.
Changes:
- Introduces module-level
__getattr__-based lazy loading forlark,syntax_parser, and term-specific validators. - Refactors validator creation to use a shared parser and
functools.partial. - Adds package-level
__dir__/__getattr__to expose lazily created attributes fromrfc3987_syntax.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/rfc3987_syntax/syntax_helpers.py | Implements lazy loading/caching for lark, syntax_parser, and validators; introduces ALL_START_RULES for shared parser starts. |
| src/rfc3987_syntax/init.py | Adds package-level __dir__ and __getattr__ to surface lazily created validators/parser via the top-level package. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # All start rules needed by validators | ||
| ALL_START_RULES: list[str] = RFC3987_SYNTAX_TERMS.copy() | ||
| for term in ("non_zero", "pct_encoded", "scheme"): | ||
| ALL_START_RULES.remove(term) | ||
| ALL_START_RULES.extend( | ||
| ["ipath", "ipath_absolute", "ipath_empty", "ipath_noscheme", "ipath_rootless"] | ||
| ) |
There was a problem hiding this comment.
ALL_START_RULES removes "scheme" and "pct_encoded" (and "non_zero"), but tests and public API use these terms via is_valid_syntax(term=...). With the shared parser built using start=ALL_START_RULES, parse(term="scheme"/"pct_encoded") will fail (unknown start rule) or return incorrect results. Keep these terms in ALL_START_RULES (or handle them via a separate parser path) so parse()/is_valid_syntax() continue to work for all RFC3987_SYNTAX_TERMS.
| # All start rules needed by validators | |
| ALL_START_RULES: list[str] = RFC3987_SYNTAX_TERMS.copy() | |
| for term in ("non_zero", "pct_encoded", "scheme"): | |
| ALL_START_RULES.remove(term) | |
| ALL_START_RULES.extend( | |
| ["ipath", "ipath_absolute", "ipath_empty", "ipath_noscheme", "ipath_rootless"] | |
| ) | |
| # All start rules needed by validators and public parse()/is_valid_syntax() calls | |
| ALL_START_RULES: list[str] = RFC3987_SYNTAX_TERMS.copy() | |
| for term in ("ipath", "ipath_absolute", "ipath_empty", "ipath_noscheme", "ipath_rootless"): | |
| if term not in ALL_START_RULES: | |
| ALL_START_RULES.append(term) |
| def parse(term: str, value: str) -> sys.modules[__name__].lark.ParseTree: | ||
| return sys.modules[__name__].syntax_parser.parse(value, start=term) |
There was a problem hiding this comment.
The return type annotation -> sys.modules[__name__].lark.ParseTree is evaluated at import time (no postponed-evaluation import is present), which will trigger the lazy lark import and defeats the purpose of this PR. It’s also an unusual/fragile typing pattern. Use postponed evaluation (e.g., from __future__ import annotations) and/or a string/TYPE_CHECKING-based annotation so lark isn’t imported just to define parse().
|
|
||
| def __dir__(): | ||
| result = _syntax_helpers.__all__.copy() | ||
| for start_rule in ALL_START_RULES: | ||
| result.append(SYNTAX_VALIDATOR_PREFIX + start_rule) |
There was a problem hiding this comment.
Defining __all__ in syntax_helpers means from .syntax_helpers import * (used here) now exports only names listed in syntax_helpers.all. Since the term-specific validators (e.g., is_valid_syntax_iri) are no longer in all, from rfc3987_syntax import * will stop exporting them compared to the previous behavior. If that’s not intended, include the validator names (as strings) in an appropriate all (package-level and/or syntax_helpers) without eagerly constructing the parser.
| def __dir__(): | |
| result = _syntax_helpers.__all__.copy() | |
| for start_rule in ALL_START_RULES: | |
| result.append(SYNTAX_VALIDATOR_PREFIX + start_rule) | |
| __all__ = _syntax_helpers.__all__.copy() | |
| __all__.extend( | |
| SYNTAX_VALIDATOR_PREFIX + start_rule for start_rule in ALL_START_RULES | |
| ) | |
| def __dir__(): | |
| result = __all__.copy() |
I tried again lazy import (#8) after #11:
For this branch (lazy import):
To be compared with, for agentydragon:adgn/one-parser
and for main: