Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 36 additions & 0 deletions contributing/typechecking.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
# Typechecking

The current typechecker is based on the paper [Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism](https://arxiv.org/abs/1306.6032), although we've had to make some modifications because their type system is pretty different from ours. Mainly, we've added gradual typing and subtyping while ignoring the higher-rank polymorphism stuff.

The following are types used by the typechecker:

- The top type: the supertype of all types (similar to Java's `Object`, which is the supertype of all reference types)
- The bottom type: the subtype of all types (same as TypeScript's `never`). You can never construct an instance of the bottom type.
- This is useful for indicating that a command will never return (due to throwing an error or looping infinitely).
- The `oneof` type: for creating union types. `oneof<int, string>` can be used as the type for a value that's either an integer or a string.
- The `allof` type: for creating intersection types. `allof<A, B, C, ...>` represents a type that is simultaneously the subtype of `A`, `B`, and `C`.
- This isn't particularly useful in Nushell, so users won't be able to construct `allof` types themselves.
- However, the typechecker does use this internally.

## How type checking/inference work

The `typecheck_expr` method takes the expected type of the expression it is currently processing and provides an inferred type for every expression it visits. This way, we both check against the expected type and infer a type for the expression at the same time. Below is an example program using generics.

```nu
def f<T> [ x: T ] : nothing -> list<T> {
let z = $x
[$z]
}
f 123 # Inferred to be of type list<int>
```

For generics to work, the algorithm requires creating and solving type variables. These type variables have a lower bound and an upper bound. As we move through the program, these bounds are tightened further. At the end of the program, the lower bound of each type variable is chosen as its value.

Every time we come across a call to a custom command with type parameters, we instantiate new type variables corresponding to those type parameters. For example, for the expression `f 123` above, we instantiate a new type variable `'0` with lower bound `bottom` and upper bound `top` (essentially unbounded). Because of the signature of `f`, we know that `123: '0` and `f 123: list<'0>`.

So we first call `typecheck_expr` to check/infer the type of `123`, providing `'0` as its expected type. Since it's just an integer literal, we infer the type to be `int`. Then, to ensure that this type matches the expected type (`'0`), we call the `constrain_subtype` method to ensure that `int <: '0`. The existing lower bound for `'0` was `bottom`, so we set the new lower bound to `oneof<bottom, int> = int`.

Then we set the type of `f 123` to `list<'0>`. After all expressions have been processed, we replace all occurrences of `'0` with `int`. So this becomes `list<int>`.

Recursive bounds are not allowed. The bounds for a type variable (say, `'5`) can only refer to type variables created before it (here, `'0` through `'4`).
- If, during typechecking, we find ourselves setting the upper bound of `'5` to `'6`, then we instead set the upper bound of `'5` to whatever the lower bound of `'6` is at that point. This behavior can be improved upon somewhat in the future (in this particular example, we could instead update `'6`'s lower bound to include `'5`), but I think it's good enough for now.
21 changes: 16 additions & 5 deletions src/compiler.rs
Original file line number Diff line number Diff line change
@@ -1,7 +1,9 @@
use crate::errors::SourceError;
use crate::parser::{AstNode, Block, NodeId, Pipeline};
use crate::protocol::Command;
use crate::resolver::{DeclId, Frame, NameBindings, ScopeId, VarId, Variable};
use crate::resolver::{
DeclId, Frame, NameBindings, ScopeId, TypeDecl, TypeDeclId, VarId, Variable,
};
use crate::typechecker::{TypeId, Types};
use std::collections::HashMap;

Expand Down Expand Up @@ -58,8 +60,14 @@ pub struct Compiler {
pub variables: Vec<Variable>,
/// Mapping of variable's name node -> Variable
pub var_resolution: HashMap<NodeId, VarId>,
/// Declarations (commands, aliases, externs), indexed by VarId
/// Type declarations, indexed by TypeDeclId
pub type_decls: Vec<TypeDecl>,
/// Mapping of type decl's name node -> TypeDecl
pub type_resolution: HashMap<NodeId, TypeDeclId>,
/// Declarations (commands, aliases, externs), indexed by DeclId
pub decls: Vec<Box<dyn Command>>,
/// Declaration NodeIds, indexed by DeclId
pub decl_nodes: Vec<NodeId>,
/// Mapping of decl's name node -> Command
pub decl_resolution: HashMap<NodeId, DeclId>,

Expand All @@ -71,7 +79,6 @@ pub struct Compiler {

// Use/def
// pub call_resolution: HashMap<NodeId, CallTarget>,
// pub type_resolution: HashMap<NodeId, TypeId>,
pub errors: Vec<SourceError>,
}

Expand All @@ -96,16 +103,17 @@ impl Compiler {
scope_stack: vec![],
variables: vec![],
var_resolution: HashMap::new(),
type_decls: vec![],
type_resolution: HashMap::new(),
decls: vec![],
decl_nodes: vec![],
decl_resolution: HashMap::new(),

// variables: vec![],
// functions: vec![],
// types: vec![],

// call_resolution: HashMap::new(),
// var_resolution: HashMap::new(),
// type_resolution: HashMap::new(),
errors: vec![],
}
}
Expand Down Expand Up @@ -157,7 +165,10 @@ impl Compiler {
self.scope_stack.extend(name_bindings.scope_stack);
self.variables.extend(name_bindings.variables);
self.var_resolution.extend(name_bindings.var_resolution);
self.type_decls.extend(name_bindings.type_decls);
self.type_resolution.extend(name_bindings.type_resolution);
self.decls.extend(name_bindings.decls);
self.decl_nodes.extend(name_bindings.decl_nodes);
self.decl_resolution.extend(name_bindings.decl_resolution);
self.errors.extend(name_bindings.errors);
}
Expand Down
34 changes: 34 additions & 0 deletions src/parser.rs
Original file line number Diff line number Diff line change
Expand Up @@ -179,6 +179,7 @@ pub enum AstNode {
// Definitions
Def {
name: NodeId,
type_params: Option<NodeId>,
params: NodeId,
in_out_types: Option<NodeId>,
block: NodeId,
Expand Down Expand Up @@ -1018,6 +1019,32 @@ impl Parser {
self.create_node(AstNode::Params(param_list), span_start, span_end)
}

pub fn type_params(&mut self) -> NodeId {
let _span = span!();
let span_start = self.position();
self.less_than();

let mut param_list = vec![];

while self.has_tokens() {
if self.is_greater_than() {
break;
}

if self.is_comma() {
self.tokens.advance();
continue;
}

param_list.push(self.name());
}

let span_end = self.position() + 1;
self.greater_than();

self.create_node(AstNode::Params(param_list), span_start, span_end)
}

pub fn type_args(&mut self) -> NodeId {
let _span = span!();
let span_start = self.position();
Expand Down Expand Up @@ -1159,6 +1186,12 @@ impl Parser {
_ => return self.error("expected def name"),
};

let type_params = if self.is_less_than() {
Some(self.type_params())
} else {
None
};

let params = self.signature_params(ParamsContext::Squares);
let in_out_types = if self.is_colon() {
Some(self.in_out_types())
Expand All @@ -1172,6 +1205,7 @@ impl Parser {
self.create_node(
AstNode::Def {
name,
type_params,
params,
in_out_types,
block,
Expand Down
Loading
Loading