-
We (or I) need to do something like this for the expression-elements-properties branch.
-
https://github.com/Mathics3/mathics-core/blob/benchmarking/bench-results.rst has recent results comparing V 4.0.0 performance on the combinatorica 0.9 test versus the current master. I will be drilling down here to see what is up in more detail. I will also do some comparisons for #83 (comment). If you want to verify, the two branches to use are https://github.com/Mathics3/mathics-core/tree/benchmarking for current and https://github.com/Mathics3/mathics-core/tree/4.0.0-benchmark for 4.0.0. Run the benchmark in each branch.
-
For vs 4.0.0-benchmarking:

While for 4.1.0:

4.0.0:
-
I don't have time right now to investigate this further, but I do have a plan for approaching things that I think is neat. First of all, we could use a smaller run. (Elsewhere I have mentioned that the "1000" could be turned into "50" or even "10"; things would be just as apparent, but runs would take less time and be easier to debug.) But here's something else that I may work on because it is neat. What one can do is remove any data where the number of calls didn't change between, say, 10 and 50, or where the number of calls doesn't change between version 4.0.0 and 4.1.0. And clearly remove any line that has a 0 in it. With this, I think we would have a very precise indication of what's up.
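A rough sketch of that filtering idea, assuming the profiles are cProfile dumps saved to files (the file names and the use of pstats' internal stats dictionary are my own choices here, not code from any branch):

```python
import pstats

def changed_entries(path_a: str, path_b: str):
    """Yield functions whose call counts differ between two cProfile dumps."""
    # pstats.Stats exposes a .stats dict keyed by (file, line, function name);
    # each value starts with (primitive call count, total call count, ...).
    stats_a = pstats.Stats(path_a).stats
    stats_b = pstats.Stats(path_b).stats
    for key in sorted(stats_a.keys() & stats_b.keys()):
        ncalls_a = stats_a[key][1]
        ncalls_b = stats_b[key][1]
        # Keep only rows whose call count changed, dropping zero-call rows.
        if ncalls_a != ncalls_b and ncalls_a != 0:
            yield key, ncalls_a, ncalls_b

if __name__ == "__main__":
    # Hypothetical dump names for the 10- and 50-iteration runs.
    for (filename, line, func), a, b in changed_entries("run-10.prof", "run-50.prof"):
        print(f"{func} ({filename}:{line}): {a} -> {b} calls")
```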
-
Here is a proposal for benchmarking code, to compare the performance of evaluating several (simple) expressions against different versions.

Results:

machine: x86_64
system: Linux 5.13.0-41-generic

test1 << Do[{a,a,a,a,a,a,a,a,a,a,a,a,a,a,a},{10}] >>
    2.2.0: 0.581977 ms +/- 6.1%
test2 << Do[Table[a, {15}],{10}] >>
    2.2.0: 2.380450 ms +/- 4.0%
test3 << Do[Table[i, {i, 15}],{10}] >>
    2.2.0: 37.125100 ms +/- 0.8%
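For reference, a minimal harness in this spirit could look like the sketch below. This is not the proposed benchmarking code; it assumes mathics-core's MathicsSession helper and simply reports a best per-call time plus the spread across repeats:

```python
import timeit

from mathics.session import MathicsSession

session = MathicsSession()

TESTS = [
    "Do[{a,a,a,a,a,a,a,a,a,a,a,a,a,a,a},{10}]",
    "Do[Table[a, {15}],{10}]",
    "Do[Table[i, {i, 15}],{10}]",
]

for expr in TESTS:
    # Each repeat evaluates the expression 10 times; report the per-call time in ms.
    runs = timeit.repeat(lambda: session.evaluate(expr), repeat=5, number=10)
    per_call_ms = [t / 10 * 1000 for t in runs]
    best, worst = min(per_call_ms), max(per_call_ms)
    print(f"{expr}: {best:.6f} ms +/- {(worst - best) / best * 100:.1f}%")
```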
-
@rocky, here I added it.

machine: x86_64
system: Linux 5.13.0-41-generic

test1 << Do[{a,a,a,a,a,a,a,a,a,a,a,a,a,a,a},{10}] >>
    2.2.0: 0.588356 ms +/- 5.8%
test2 << Do[Table[a, {15}],{10}] >>
    2.2.0: 2.362820 ms +/- 4.6%
test3 << Do[Table[i, {i, 15}],{10}] >>
    2.2.0: 37.315500 ms +/- 0.8%
-
Thanks for benchmarking. I have a number of thoughts and comments on this, but they will have to wait for when I have time. How is 4.10.dev0 (which I assume is 4.1.0.dev0) different from origin/master?
-
I have been looking at the code and following traces of the execution. This is an area where we are slow because of the calls we make along the way.

So here I am thinking of adding a function to handle this.

There are times when we create an Expression but never evaluate it. So computing properties of elements when creating an Expression slows things down when an evaluation will never occur. Lastly, there is a pattern that comes up for which I am thinking about adding a method.

In sum:
Now a little bit about the parameters for the functions.
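Going back to the point about element properties being computed for Expressions that never get evaluated, here is a minimal sketch of deferring that work; the class layout and the property name are illustrative only, not the expression-elements-properties branch code:

```python
from functools import cached_property

class Expression:
    def __init__(self, head, *elements):
        self.head = head
        self.elements = elements

    @cached_property
    def elements_properties(self):
        # Computed at most once, on first access (e.g. at the start of an
        # evaluation); Expressions that are built but never evaluated never
        # pay for this scan over their elements.
        return {
            "no_subexpressions": not any(isinstance(e, Expression) for e in self.elements),
            "elements_sorted": all(
                str(self.elements[i]) <= str(self.elements[i + 1])
                for i in range(len(self.elements) - 1)
            ),
        }
```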
-
With changes along the lines above, here is what I am seeing. I've been a bit sloppy, or pessimistic, in what I've done so far; it was hard enough to get this far. I do think the code is a little bit clearer and cleaner. Also interesting would be an extended sequence like the combinatorica runs, but that too will have to wait for some other time.
-
In working on this, there are a couple of conclusions or observations.

-
Continuing with the problem of performance in Mathics, I want to discuss here one of the critical factors that determine it: the evaluation routine. To decouple it from other aspects, like pattern matching, rule application, or parameter replacement, let's review how long it takes to evaluate a trivial expression, i.e., an expression that does not involve previously defined symbols. According to MathicsBenchmark, evaluating something like F[x], when F and x are undefined symbols, takes around 60 us. We can also check it inside the interpreter using the function Timing: each call takes, in Mathics, around 60 +/- 10 us, while in WMA it takes 0.20 +/- 0.02 us.
In Mathics, since the time does not seem to depend on the number of atomic parameters, we can expect that most of the time is consumed in the mathics.core.expression.evaluate method. Inside this method, most of the work is done by the mathics.core.expression.evaluate_next method, which is applied iteratively until a fixed point is reached.
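The fixed-point loop is essentially the pattern below; this is a deliberately stripped-down illustration, since the real evaluate/evaluate_next carry evaluation state, recursion limits, and hooks:

```python
def fixed_point(step, value):
    """Apply `step` until it reports that nothing changed."""
    while True:
        new_value, changed = step(value)  # `step` stands in for evaluate_next
        if not changed:
            return new_value  # fixed point reached: nothing was rewritten
        value = new_value

# Toy usage: keep halving an integer until it stops changing.
print(fixed_point(lambda n: (n // 2, n // 2 != n), 40))  # prints 0
```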
Looking inside mathics.core.expression.evaluate_next by splitting it into different small chunks and timing those chunks, I found the following (processing Global`F[Global`x]; the modified code is below).
The first two lines provide a baseline to discount the time required by the timer itself, which after the first call amounts to just two function calls (~120 ns x 2). Discarding those lines and sorting by the time consumed, we then obtain:
Processing Global`F[Global`x]

So, the most costly step is checking attributes. In our case, the attributes are empty, so this is a kind of lower bound. We can probably improve this time by using a bitmask instead of a list of strings to store attributes.
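To make the bitmask idea concrete, here is a sketch along those lines (the flag names and values are illustrative, not actual mathics-core constants):

```python
# Each attribute becomes one bit; a symbol's attributes become a single int.
A_HOLD_ALL  = 0b0001
A_LISTABLE  = 0b0010
A_FLAT      = 0b0100
A_ORDERLESS = 0b1000

def has_attribute(attributes: int, flag: int) -> bool:
    # One AND instead of scanning a list of attribute-name strings.
    return bool(attributes & flag)

# Example: Plus carries Flat, Orderless, and Listable.
plus_attributes = A_FLAT | A_ORDERLESS | A_LISTABLE
assert has_attribute(plus_attributes, A_ORDERLESS)
assert not has_attribute(plus_attributes, A_HOLD_ALL)
```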
Building the new expression can also be improved by optimizing from_python (as in PR #51) or by avoiding the call altogether (PR #49).

Following the list, we have the step of processing rules. In this case there are no rules, so all the time is devoted to finding a definition for the head and the leaves and, again, checking attributes. In the following section I am going to discuss some ideas about how to speed up access to the Definition object.
Very close, in the next place, we find the "checking range" chunk. Again, part of this subroutine needs to check attributes, but another routine used many times throughout the code also shows up here: Expression.has_form. has_form involves a) a function call, and b) string comparisons instead of symbol (is) comparisons. Some ideas to speed up these calls are implemented in PR #65. Also, this chunk involves calls to evaluate over the leaves, which in this case we can expect to take half of the time (judging by the time required to evaluate the head of the expression, position 6 in the ranking).

The remaining items in the list each take less than 3 us, and involve a) private function definitions, b) local module loading, and c) copying leaves. Although they seem less critical, we should notice that each of them takes 10 times longer than the whole evaluation in WMA.
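The "symbol (is) comparison" point can be illustrated with interned Symbol instances, roughly the direction PR #65 takes (this is my own sketch, not that PR's code):

```python
class Symbol:
    # One shared instance per name, so identity comparison (`is`) is enough.
    _pool: dict = {}

    def __new__(cls, name: str):
        obj = cls._pool.get(name)
        if obj is None:
            obj = super().__new__(cls)
            obj.name = name
            cls._pool[name] = obj
        return obj

SymbolList = Symbol("System`List")

def is_list_expression(expression) -> bool:
    # One pointer comparison replaces a string comparison on every has_form-style check.
    return expression.head is SymbolList
```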
Finally, notice that by using Cython, times are roughly cut in half, except in the case of loading attributes, where the time actually increases. In any case, what we need here is to reduce all these times by a factor of 10 at the very least.
Regarding the access to definitions
As I mentioned before, on each step of the evaluation process, the method Expression.evaluate has to access the Definition object associated with a certain lookup_name in order to recover the different sets of rules and attributes that define the way in which the rules are applied. Any improvement in the algorithm that fetches the Definition of a given Symbol should have a measurable positive effect on the evaluation time.

In the current implementation, the Definition objects are stored in the collection class Definitions. This class stores all the definitions in the session. This allows, for example, several independent Mathics sessions within the same Python session. I think this is used in Mathics-Django to serve different instances [Check]. In other implementations (like WMA or Jupyter's wolfram-kernel) this behaviour is implemented using different "kernels" that run as different processes. With the current implementation, a Symbol can have many Definitions, depending on the session, and then, during the evaluation, we need to "look up" the corresponding definitions.

Definitions.get_definition returns the definition of a given Symbol, in terms of its (not necessarily fully qualified) name. Definitions stores up to 4 instances of Definition objects for each symbol in different dictionaries: Definitions.definitions_cache, Definitions.builtin, Definitions.user, and Definitions.pymathics. Definitions.get_definition first tries to find the (not necessarily fully qualified) name in the Definitions.definitions_cache dictionary. Most of the time, the Definition is in the definitions_cache and is just returned. If the definition is not in the definitions_cache, then get_definition looks into the other three dictionaries by its fully qualified name (fqn). If a definition appears in just one of the three dictionaries, then that definition is stored in the cache and afterwards returned.

If there is no definition, a new empty definition is created, stored as a user definition, and placed in the cache. On the other hand, if there is more than one definition, a merge of the available definitions is built and stored in the cache. builtin definitions are built and stored when the Definitions object is created, while pymathics definitions are built and stored when a pymathics module is loaded. This makes it possible to unload modules without erasing the vanilla definition provided by mathics-core, and ensures that Clear[S] restores the vanilla definition provided by builtin and the pymathics definitions already loaded.
Definitions.get_definitionhelps to determine if certain name exists in the current context, or in the context path, to produce expressions made ofSymbols, that always have fqn's.Afterward, during the evaluation, we can assume that all the calls to
get_definitionare made with fqn's. In this process,Expression.evaluateandSymbol.evaluateask for the lookup name of the symbols in the expression callingget_lookup_name(). ForSymbolsget_lookup_name()returns the (fqn) of theSymbol. ForExpressions,get_lookup_name()(in master) returnsself._head.get_lookup_name(), in a recursive way. In the branch faster_get_lookup_name (PR #16) an iterative algorithm is proposed, that seems to reduce the time in around a 20% (132ns vs 167ns in my laptop).Another idea to improve the access time would be to avoid function calls to the
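The two variants, side by side and heavily simplified (stand-in Symbol and Expression classes, not the literal code from master or the branch):

```python
class Symbol:
    def __init__(self, name):
        self.name = name

class Expression:
    def __init__(self, head, *elements):
        self.head = head
        self.elements = elements

def get_lookup_name_recursive(expr):
    # master: recurse through heads until a Symbol is reached.
    if isinstance(expr, Symbol):
        return expr.name
    return get_lookup_name_recursive(expr.head)

def get_lookup_name_iterative(expr):
    # faster_get_lookup_name: same result, but a plain loop with no per-level call.
    while not isinstance(expr, Symbol):
        expr = expr.head
    return expr.name

# Example: Global`F[Global`x][Global`y] is looked up under Global`F either way.
nested = Expression(Expression(Symbol("Global`F"), Symbol("Global`x")), Symbol("Global`y"))
assert get_lookup_name_recursive(nested) == get_lookup_name_iterative(nested) == "Global`F"
```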
Another idea to improve the access time would be to avoid the function call to the get_definition method by first checking whether there is already a definition in the definitions_cache. PR #59 explores this direction.
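In code, that fast path amounts to something like this at the call site (my sketch of the direction, assuming a Definitions-like object as in the sketch above):

```python
def lookup(definitions, lookup_name: str):
    # Hit the cache dictionary directly; only pay for the method call on a miss.
    definition = definitions.definitions_cache.get(lookup_name)
    if definition is None:
        definition = definitions.get_definition(lookup_name)
    return definition
```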
Still faster would be to avoid dealing with the Definitions collection altogether and store definitions directly inside Symbol objects. These are directions that we can explore in the future.
Modified code for timing
evaluate_nextTrivial expression evaluations from jupyter (using timeit)
1: 126 ns ± 3.75 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Global`a: 329 ns ± 33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Global`F[]: 2 µs ± 130 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Global`F[Global`a]: 2.1 µs ± 9.26 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Global`F[System`Pi]: 2.2 µs ± 159 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Global`F[Global`a, Global`b]: 2.29 µs ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Global`F[Global`a, Global`b, Global`c, Global`d, Global`e, Global`f, Global`g, Global`h, Global`i, Global`j, Global`k]: 4.09 µs ± 168 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)