
gh-138912: Add fast path for match class patterns without sub-patterns#144820

Open
cdce8p wants to merge 1 commit into python:main from cdce8p:match-class-opcode-isinstance

Conversation

@cdce8p (Contributor) commented Feb 14, 2026:

Extracted from #139080. This builds on the work merged with 75d4839.

Add a fast path for match class patterns without any sub-patterns. This avoids creating a new (empty) tuple, pushing it onto the stack, and then unpacking the (empty) tuple result of the MATCH_CLASS opcode again.

Micro benchmarks with all specializations enabled.

Micro benchmark (bare class pattern)
from timeit import timeit
setup = """
class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y
a = A(1, 2)

def f1(a):
    if isinstance(a, A):
        return True
    return False

def f2(a):
    match a:
        case A():
            return True
    return False
"""
print(f"If\t{timeit('f1(a)', setup, number=10**7)}")
print(f"Match\t{timeit('f2(a)', setup, number=10**7)}")
# Before
If      0.32616550009697676
Match   0.33599245804362

# After
If      0.3079932499676943
Match   0.2534141249489039 -24.58%

With specialization disabled, the performance improvement is even greater: -40.5%.


📚 Documentation preview 📚: https://cpython-previews--144820.org.readthedocs.build/

@cdce8p force-pushed the match-class-opcode-isinstance branch from a515a54 to 7f8b4b2 on February 22, 2026.
@markshannon (Member) commented:

Opcodes are a very limited resource, so adding new ones needs careful justification.

Is pattern matching widely used enough yet to justify this? I don't know.
Is the performance increase enough to justify adding a new instruction?


An approach to optimizing pattern matching that I would be more enthusiastic about is optimizing the decision tree.
This will still need new instructions, but they would be smaller and more easily optimized.

For example, case A(a, b) performs both an instance check and an unpack into a and b, so

match m:
    case A(a, b):
        X
    case A():
        Y

can be transformed into a decision tree like this:

if m isa A:
    if tmp := unpack_attributes(m, 2):
        a, b = tmp
        X
    else:
        Y

which can be converted into code that has simpler operations and is more amenable to further specialization/jitting.
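A runnable Python-level sketch of that desugaring (the unpack_attributes helper is hypothetical, introduced here only to mirror the pseudocode; the real MATCH_CLASS opcode also handles keyword attributes and more edge cases):

```python
# Hypothetical Python-level model of the decision tree above. The helper
# unpack_attributes is illustrative (not a real CPython API); it mimics how
# MATCH_CLASS captures positional sub-patterns via __match_args__.
class A:
    __match_args__ = ("a", "b")
    def __init__(self, a, b):
        self.a = a
        self.b = b

class PartA(A):
    # fewer match args than the two positional sub-patterns -> unpack fails
    __match_args__ = ("a",)

def unpack_attributes(obj, n):
    """Return a tuple of the first n __match_args__ attributes, or None."""
    names = getattr(type(obj), "__match_args__", ())
    if len(names) < n:
        return None
    try:
        return tuple(getattr(obj, name) for name in names[:n])
    except AttributeError:
        return None

def handle(m):
    if isinstance(m, A):                      # "m isa A"
        if (tmp := unpack_attributes(m, 2)) is not None:
            a, b = tmp
            return ("X", a, b)                # case A(a, b)
        return ("Y",)                         # case A()
    return None                               # no case matched

print(handle(A(1, 2)), handle(PartA(1, 2)), handle(object()))
```

Each step here (one isinstance check, one attribute unpack) is a small, simple operation, which is what makes this shape attractive for specialization.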

@eendebakpt (Contributor) commented:

Inspired by Mark's comments, I looked at the generated bytecode:

def g(m):
    match m:
        case A():
            return 100
    return None
import dis
print(dis.dis(g))

Output:

  2           RESUME                   0

  3           LOAD_FAST                0 (m)

  4           LOAD_GLOBAL              0 (A)
              LOAD_CONST               1 (())
              MATCH_CLASS              0
              COPY                     1
              POP_JUMP_IF_NONE         3 (to L1)
              UNPACK_SEQUENCE          0

  5           RETURN_CONST             2 (100)

  4   L1:     POP_TOP

  6           RETURN_CONST             0 (None)
None

This can be simplified to (diff main...eendebakpt:pattern_match_class_v0)

  8           RESUME                   0

  9           LOAD_FAST_BORROW         0 (m)

 10           LOAD_GLOBAL              0 (A)
              LOAD_CONST               0 (())
              MATCH_CLASS              0
              POP_JUMP_IF_FALSE        3 (to L1)
              NOT_TAKEN

 11           LOAD_SMALL_INT         100
              RETURN_VALUE

 12   L1:     LOAD_CONST               1 (None)
              RETURN_VALUE
None

A quick benchmark shows this is a bit faster.

I tried changing the bytecode generation to use an isinstance check instead of MATCH_CLASS, but that turns out to be slower (even when adding isinstance to LOAD_COMMON_CONSTANT). Maybe this can be made faster (also for the multi-argument case), but that would require some more work.

@cdce8p (Contributor, Author) commented Feb 27, 2026:

Is pattern matching widely used enough yet to justify this? I don't know. Is the performance increase enough to justify adding a new instruction?

Is it widely used? Probably not. However, I'd guess that's mostly because support for Python 3.9 was only dropped at the end of last year, so most projects didn't have the option to start adopting it before then. Furthermore, it's unlikely that developers go through existing code bases just to look for good opportunities to replace existing if statements with match. I did it for pylint, just because I was curious, e.g. pylint-dev/pylint#10529. Sure, this refactor is mostly unnecessary, but while doing it I noticed that the class pattern in particular can be quite useful. For example, I replaced this:

# old
if isinstance(var, A):
    ...
elif isinstance(var, B):
    ...
elif isinstance(var, C):
    ...

# new
match var:
    case A():
        ...
    case B():
        ...
    case C():
        ...

We ended up keeping it, and I've seen a similar PR proposed for pytest. What I did realize, though, was that I had inadvertently introduced a noticeable performance regression on Python 3.13, roughly 7-10% for the whole program. That's what led me down the path of exploring how it could be sped up, first with #138915 and now here.

Approach / Implementation

Since class patterns are so useful, I'd like to make them as fast as possible, ideally so that there is no real difference between equivalent if and match statements. To achieve that, I'd like to eliminate the intermediary tuple completely, at least when only keyword patterns are used. This changes the evaluation order from BFS to DFS, but I think that is an acceptable tradeoff. To do it, however, I'll need two new opcodes: first MATCH_CLASS_ISINSTANCE, added here, and then another one to get the optional attributes, MATCH_CLASS_GET_OPT_ATTR. You can view the full change in #139080.

An approach to optimizing pattern matching that I would be more enthusiastic about is optimizing the decision tree. This will still need new instructions, but they would be smaller and more easily optimized.

I briefly considered something like that while checking whether there are ways to cache the result of the isinstance check in class patterns. I didn't investigate it further, though.

--
@eendebakpt Thanks for the suggestion! If the decision is against adding new opcodes, this would be a fallback option. Do note, however, that an empty tuple is still created (via the singleton lookup) and placed onto the stack, popped by MATCH_CLASS, then created and pushed onto the stack again, only to be popped in codegen_pattern_class before returning the result.


Labels

awaiting review, performance (Performance or resource usage)

4 participants