-
Notifications
You must be signed in to change notification settings - Fork 9
PoC: Generics and better type inference #58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Sorry for the late reply. I think the motivation for generic is clear enough, e.g. being able to go from It would also be worthwhile to think how this would work with the envisioned Some short comments, haven't yet looked at the code:
The error messages is something to look at, but there wasn't much work on them anyway, so that could be also done in some later PR. |
|
Type aliases definitely need more thought, the code in this PR isn't prepared for them. I was thinking about having a cache of types (so that we don't create multiple Regarding the comments:
|
|
Cache of types sounds reasonable. Agreed also on the spaces.
I'm worried about the cognitive complexity of having all I still struggle to find a practical difference between For |
| use std::cmp::Ordering; | ||
| use std::collections::HashSet; | ||
| use std::collections::{HashMap, HashSet}; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add a reference to the paper in the top-level comment so that it shows in the (future) documentation what the type checking is based on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the whole text where you explained how it works in the PR description under "How type checking/inference work" should be stored somewhere. Either as a top-level module comment here, or have a Markdown doc somewhere in the repo and refer to it. I'm just thinking from the perspective of a new contributor opening this file and trying to figure out how it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. A Markdown doc would probably be easier to read.
|
I also just noticed the "allof" (intersection) type, I quite can't wrap my head around it. Can you give an example how it differs from "oneof" (union) type, or what it is used for in the type checker? Is |
Yeah, that's a fair point. We could get away with only having
Consider a custom command that takes any input, converts it to a string, and truncates it: Except the implementation here is incorrect. I'm assuming
If you're saying we should get rid of
I like Tangent about naming: TypeScript also has an In Nushell, if heavy breaking changes are on the table, I'd love to name the top type |
An value of type Some examples with simple types. It's useless here, because it simplifies to something the user could've written themselves:
Some examples with records. Here, it's essentially a deep merge operator:
This record type merging can be useful if you're working with type parameters. You can define a signature like this (warning: stupidly contrived example incoming): Within the typechecker, type parameters are what justify the existence of the In addition to references to type parameters,
Does that help? I really hope I didn't make you read that wall of text for nothing. I'll come up with an actual piece of Nushell pseudocode that demonstrates that allof-constraint thing I mentioned (later, when I'm not functioning on 4 hours of sleep). |
|
Thanks! Very helpful. I'll need to re-read the Just to clarify the typechecker would infer the type of the So either we keep I think But yeah, I think most of this can be left for further discussion. I'd like to check the code a bit just to make sure I understand is, but otherwise the PR looks good. One more thing: Could you run some benchmarks before/after this PR? |
Yup, it should be fairly easy to hide from the user in error messages.
No, sorry, I don't think I explained correctly. This StackOverflow question talking about TypeScript's When you declare a parameter or variable as I don't think we need to bother implementing syntax like C++'s
Yeah, the name
Good idea, I'll figure them out sometime this week. Will definitely be slower :( |
|
I think renaming |
|
Just to clarify, right now, if you leave the type annotation off a variable declaration, its type is inferred rather than defaulting to For matching on x and then coercing it to a lower type, here's what I can think of (not mutually exclusive options):
def foo(x: Any) =
x match {
case i: Int => i + 2
case s: String => s + "bar"
case List(a: Int, b: String) => println("this is a list containing an int and then a string)
} |
|
Unfortunately, I can't compare the benchmarks with the I did run them separately and put the results in this gist. |
|
benchmarks are looking good, that's a lot of power in the type system for not much performance penalty |
|
i also played around with the benchmarks a bit to make them a bit easier to interpret at a glance, this is the sum of the min/avg/max for each test category: before:
after:
code$results
| lines
| parse "{name} {_} [ {min} ... {avg} ... {max} ]{_}"
| str replace -a " " "" min avg max
| into duration min avg max
| where name !~ nu_old
| group-by { get name | split row '_' | get 0 } --to-table
| insert min {|e| $e.items.min | math sum }
| insert avg {|e| $e.items.avg | math sum }
| insert max {|e| $e.items.max | math sum }
| rename name
| reject items |
OK, perfect. I don't think we should be inferring parameter types because it would be hard to track and debug for the user. It would be impossible to tell just from the command signature what types it expects. This would also prevent (imagined possible) features like linking from working.
We have
Seems good enough to me. We have Much of the discussion was about the top/bottom parts being exposed to the user. I think we can just leave that for a future PR, but to summarize:
|
|
Benchmarks: There is clearly a performance hit, but doesn't seem too dramatic (eg. about 10% on the combined1000 benchmark). I think it's to be expected as we add new features, but it's good to check to prevent accidental excessive regressions. Yeah, I've noticed the tango panic. We might want to swap to something else, maybe criterion, or even default |
|
Looked at the code, it seems good. The |
Easier for case-sensitive searching.
@kubouch we brought up this idea with the team at the last meeting and it seems like people were generally leaning away from it. I think the general consensus was that it would be okay to make a top type usable for power users; folks at the meeting are satisfied with how I do like the idea still but I understand the concerns. If you think we should go this route then I think it would mean a lot to the team coming from you, but maybe we should reconsider if you're not set on it. If we are still interested in possibly going this route I think it might also be good to get a solid understanding of what this would look like in practice, like under what circumstances you would actually need to specify Stefan also pointed out that we would want to use the old parser tests for the new parser and this kind of change would make it a lot more difficult, but you did mention that there are already breaking changes in the new parser so I suppose that's an issue regardless |
|
Forgot about this PR. I threw in a document |
|
I'll merge this since it seems like documentation was the last outstanding issue and Kubouch was in favor of merging this:
|
This is a proof-of-concept implementing generics and slightly better type inference. It's based on the paper Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism, although I had to make some modifications because their type system is pretty different from ours. I'm not sure yet if the new type inference algorithm will never infer the wrong type, and there's missing pieces here and there. But we should be able to make this general approach to type checking/inference work.
This is what it looks like:
I didn't do subtype/supertype bounds on type parameters, but it shouldn't be terribly hard to implement with the current design.
Other things included in this PR:
anyandoneofThings worsened by this PR:
++requires both operands to be lists, to match how Nushell now workslet x: list<list<int>> = [["foo"]], you'll get 3 error messages instead of just one (one saying"foo"should be an int, one for["foo"], and one for[["foo"]]).Things left to do:
Apologies for the big PR. I should probably have done some of these things before implementing generics. That said, it's just a proof-of-concept, so it doesn't need to be reviewed thoroughly.
How type checking/inference work
The
typecheck_exprmethod takes the expected type of the expression it is currently processing and provides an inferred type for every expression it visits. This way, we both check against the expected type and infer a type for the expression at the same time.For generics to work, the algorithm requires creating and solving type variables. These type variables have a lower bound and an upper bound. As we move through the program, these bounds are tightened further. At the end of the program, the lower bound of each type variable is chosen as its value.
Every time we come across a call to a custom command with type parameters, we instantiate new type variables corresponding to those type parameters. For example, for the expression
f 123above, we instantiate a new type variable'0with lower boundbottomand upper boundtop(essentially unbounded). Because of the signature off, we know that123: '0andf 123: list<'0>.So we first call
typecheck_exprto check/infer the type of123, providing'0as its expected type. Since it's just an integer literal, we infer the type to beint. Then, to ensure that this type matches the expected type ('0), we call theconstrain_subtypemethod to ensure thatint <: '0. The existing lower bound for'0wasbottom, so we set the new lower bound tooneof<bottom, int> = int.Then we set the type of
f 123tolist<'0>. After all expressions have been processed, we replace all occurrences of'0withint. So this becomeslist<int>.Recursive bounds are not allowed. The bounds for a type variable (say,
'5) can only refer to type variables created before it (here,'0through'4).'5to'6, then we instead set the upper bound of'5to whatever the lower bound of'6is at that point. This behavior can be improved upon somewhat in the future (in this particular example, we could instead update'6's lower bound to include'5), but I think it's good enough for now.