Thesis
1 Introduction
\def\ifmathjax#1{#1}\def\iflatex#1{}
1.1 Warm-up
[What does a compiler do]Todo
[This thesis aims to build a framework which helps write compilers.]Todo
[Our focus is on compilers, not virtual machines or other run-time systems. We are also not concerned with parsers — there are lots of existing approaches and libraries which help writing parsers, although parsing in general is not yet a solved problem on all accounts.]Todo
[IR = the macroscopic structure of the program (i.e. the meta-model (explain what it is)) + the code of functions and/or methods (statements and expressions, basic blocks of statements, or bytecode instructions)]Todo
1.2 The challenges of writing compilers
Brain surgery? — It’s not exactly rocket science, is it?
That Mitchell & Webb Look, Series 3 — BBC Two
Compilers are an essential part of today’s software systems. Compilers translate high-level languages with complex semantics into lower-level languages. A compiler will parse the program, transform it in various ways, perform some more or less advanced static checks, and optimise the input program before producing an output in the desired target language. A compiler must be correct, reusable and fast. It must be correct because programmers are concerned with logical errors in their own code, and should not fear that the compiler introduces erroneous behaviour on its own. It must also be well-architected: extensible, because the language is likely to evolve over time; modular, in the hope that some components can be improved independently of the rest of the compiler (e.g. replacing or improving an optimisation phase, or changing the compiler’s front-end to support another input language); and more generally reusable, so that parts can be repurposed to build other compilers, or other tools (analysers, IDEs, code instrumentation and so on). Finally, a fast compiler is desirable because the programmer’s edit-build-test cycle should be as frequent as possible[ (Sandberg 1988)]Todo.
Given their broad role, the complexity of the transformations involved, and the stringent requirements, writing compilers is a difficult task. Multiple pitfalls await the compiler engineer, which we will discuss in more detail below. This thesis aims to improve the compiler-writing toolkit currently available, in order to help compiler developers produce compilers which are closer to correctness, and easier to maintain.
The overall structure of a compiler will usually include a lexer and parser, which turn the program’s source into an in-memory representation. This initial representation will often be translated into an intermediate representation (IR) better suited to the subsequent steps. At some early point, the program will be analysed for syntactic or semantic inconsistencies (ranging from missing parentheses to duplicate definitions of the same variable), and the compiler may also perform a more thorough static analysis. Finally, code in the target language or for the target architecture is generated. The translation can additionally include optimisation phases in several spots: during code generation, using locally-recognisable patterns, or earlier, for example using the results of the program-wide analysis performed separately.
We identify three pitfalls which await the compiler-writer:
It is easy to excessively reuse a single intermediate representation type, instead of properly distinguishing the specifics of the input and output types of each pass;
There is a high risk associated with the definition of large, monolithic passes, which are hard to test, debug, and extend;
The fundamental structure of the program being compiled is often a graph, yet compilers usually work on an Abstract Syntax Tree, which requires explicit handling of the backward arcs; this is a source of bugs which could be avoided by using a better abstraction.
The first two issues are prone to manifestations of some form or another of the “god object” anti-pattern (the “god object” anti-pattern describes object-oriented classes which do too much or know too much; the size of these classes tends to grow out of control, and there is usually a tight coupling between the methods of the object, which in turn means that performing small changes may require understanding the interactions between random parts of a very large file, in order to avoid breaking existing functionality). The last issue is merely caused by the choice of an abstraction which does not accurately represent the domain. We will discuss each of these ailments in more detail in the following sections, and detail the undesirable symptoms associated with them.
1.2.1 Large monolithic passes
Large, monolithic passes, which perform many transformations simultaneously, have the advantage of possibly being faster than several smaller passes chained one after another. Furthermore, as one begins writing a compiler, it is tempting to incrementally extend an initial pass to perform more work, rather than starting all over again with a new intermediate representation, and a new scaffolding to support its traversal.
However, the drawback is that large compiler passes are harder to test (as there are many more combinations of paths through the compiler’s code to test), harder to debug (as many unrelated concerns interact to some extent with each other), and harder to extend (for example, adding a new special form to the language will necessitate changes to several transformations, but if these are mingled in a single pass, the changes may be scattered through it, and interact with a significant amount of surrounding code). This higher maintenance cost also comes with another drawback: formal verification of the compiler will clearly be more difficult when large, tangled chunks of code which handle different semantic aspects are involved.
[Talk a bit about compcert here (one of the few/ the only formally verified compilers).]Todo
1.2.2 Overusing a single intermediate representation
The static analysis, optimisation and code generation phases could in principle work on the same intermediate representation. However, several issues arise from this situation.
In principle, new information gained by the static analysis may be added to the existing representation via mutation, or the optimiser could directly alter the IR. This means that the IR will initially contain holes (e.g. represented by null values), which will get filled in gradually. Manipulating these parts is then risky, as it is easy to accidentally attempt to retrieve a value before it was actually computed. Using the same IR throughout the compiler also makes it difficult for later passes to assume that some constructions have been eliminated by previous simplification passes, and correctness relies on a fixed order of execution of the passes; parts of the code which access data introduced or modified by other passes are more brittle and may be disrupted when code is refactored (for example, when moving the computation of some information to a later pass).
This situation becomes worse during the maintenance phase of the compiler’s lifecycle: when considering the data manipulated by a small portion of code (in order to fix or improve said code), it is unclear which parts are supposed to be filled in at that point, as well as which parts have been eliminated by prior simplification passes.
Furthermore, a mutable IR hinders parallel execution of compiler passes. Indeed, some compiler passes will perform global transformations or analyses, and such code may be intrinsically difficult to parallelise. Many other passes however are mere local transformations, and can readily be executed on distinct parts of the abstract syntax tree, as long as there is no need to synchronise concurrent accesses or modifications.
Using immutable intermediate representations (and performing shallow copies when updating data) can help with this second issue. However, there is more to gain if, instead of having many instances of the same type, each intermediate representation is given a distinct, precise type. The presence or absence of computed information can be known from the input and output type of a pass, instead of relying on the order of execution of the passes in order to know what the input data structure may contain.
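The idea can be sketched as follows. This is a minimal Python illustration (the thesis’s framework itself targets Typed Racket, and all names here are hypothetical) in which the output IR of a type-inference pass is a distinct type whose nodes must carry the computed information, instead of a shared IR with a nullable hole:

```python
from dataclasses import dataclass

# IR *before* the type-inference pass: nodes carry no type information,
# so there is no null-filled placeholder field to read too early.
@dataclass(frozen=True)
class Lit:
    value: int

@dataclass(frozen=True)
class Add:
    left: "Lit | Add"
    right: "Lit | Add"

# IR *after* the pass: every node must carry its computed type.
@dataclass(frozen=True)
class TypedLit:
    value: int
    type: str

@dataclass(frozen=True)
class TypedAdd:
    left: "TypedLit | TypedAdd"
    right: "TypedLit | TypedAdd"
    type: str

def infer(e):
    """Translate the untyped IR into the typed IR, building fresh
    immutable nodes rather than mutating the input."""
    if isinstance(e, Lit):
        return TypedLit(e.value, "int")
    return TypedAdd(infer(e.left), infer(e.right), "int")
```

The presence of the computed information is now visible in the pass’s output type: a later pass taking `TypedAdd` nodes cannot accidentally run before type inference.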
1.2.3 Graphs
Nontrivial programs are inherently graphs: they may contain mutually recursive functions (which directly refer to each other, and therefore will form a cycle in a representation of the program), circular and (possibly mutually) recursive datatypes may syntactically contain (possibly indirect) references to themselves, and the control flow graph of a function or method can, as its name implies, contain instructions which perform conditional or unconditional backwards branches.
However, nearly every compiler textbook will mention the use of Abstract Syntax Trees (ASTs) to represent the program. This means that a structure, which intrinsically has the shape of a graph, is encoded as a tree.
Edges in the graph which embody backward references can be made explicit in various ways:
By using a form of unique identifier: a name bearing some semantic value (e.g. the fully qualified name of the type or function that is referred to), an index into an array of nodes (e.g. the offset of an instruction in a function’s bytecode may be used to refer to it in the control flow graph), or an automatically-generated unique identifier.
Manipulation of these identifiers introduces the potential for several sorts of bugs: name clashes can occur if the chosen qualification is not sufficient to always distinguish nodes. Compiler passes which duplicate nodes (for example when specialising functions) or merge them must therefore be careful to correctly update identifiers.
Anecdotally, we can mention a bug in the "Mono.Cecil" library (which allows easy manipulation of .NET bytecode). When “resolving” a reference to a primitive type, it can happen in some cases that "Mono.Cecil" returns a Type metadata object which references a type with the correct name, but [exported]Todo from the wrong DLL library.
Alternatively, backward references may be encoded as a form of path from the referring node. De Bruijn indices can be used in such an encoding, for example.
Once again, manipulating these references is risky, and De Bruijn indices are particularly brittle: for example, when adding a wrapper around a node (i.e. adding an intermediate node on the path from the root), the De Bruijn indices used in some of the descendants of that node (but not all) must be updated. It is understandably easy to incorrectly implement updates of these indices, and a single off-by-one error can throw the graph’s representation into an inconsistent state.
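The index-shifting operation involved is the standard textbook `shift` for terms with De Bruijn indices, sketched here in Python for illustration (the thesis’s code is in Typed Racket); the `cutoff` parameter, which must be incremented when crossing a binder, is exactly where the off-by-one errors mentioned above creep in:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:          # a De Bruijn index: 0 refers to the innermost binder
    index: int

@dataclass(frozen=True)
class Lam:          # a binder; indices in its body are relative to it
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

def shift(term, by, cutoff=0):
    """Add `by` to every *free* variable of `term`.

    Indices below `cutoff` are bound inside `term` and must NOT be
    shifted; forgetting to increment the cutoff when entering a Lam
    (or incrementing it in the wrong place) silently corrupts the
    term, which is the classic De Bruijn bug."""
    if isinstance(term, Var):
        return Var(term.index + by) if term.index >= cutoff else term
    if isinstance(term, Lam):
        return Lam(shift(term.body, by, cutoff + 1))
    return App(shift(term.fun, by, cutoff), shift(term.arg, by, cutoff))
```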
The program’s representation could also contain actual pointers (thereby really representing the program as an “Abstract Syntax Graph”), using mutation to patch nodes after they are initially created.
In order to prevent undesired mutation of the graph after it is created, it is possible to “freeze” the objects contained within[references]Todo. This intuitively gives guarantees similar to those obtained from a purely immutable representation. However, the use of mutation could obstruct formal verification efforts, as some invariants will need to take into account the two phases of an object’s lifetime (during the creation of the containing graph, and after freezing it). More generally speaking, it is necessary to ensure that no mutation of objects happens during the graph construction, with the exception of the mutations required to patch cycles.
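A minimal Python sketch of this freezing discipline (hypothetical names; not the thesis’s implementation): mutation is permitted while the graph is being built, so that cycles can be patched, and refused once the node is frozen:

```python
class Node:
    """A graph node: mutable during construction, immutable once frozen."""

    def __init__(self, name):
        object.__setattr__(self, "_frozen", False)
        self.name = name
        self.neighbours = []

    def freeze(self):
        # Seal the edge list too, then forbid further attribute writes.
        self.neighbours = tuple(self.neighbours)
        object.__setattr__(self, "_frozen", True)

    def __setattr__(self, attr, value):
        if getattr(self, "_frozen", False):
            raise TypeError(f"cannot mutate frozen node {self.name!r}")
        object.__setattr__(self, attr, value)

# Construction phase: mutation patches the cycle between a and b.
a, b = Node("a"), Node("b")
a.neighbours.append(b)
b.neighbours.append(a)   # backward edge closing the cycle
a.freeze(); b.freeze()   # from here on, the graph behaves as immutable
```

The two-phase lifetime mentioned above is visible here: any invariant about `Node` must distinguish the pre-freeze state (edge lists still growing) from the post-freeze state.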
The compiler could also manipulate lazy data structures, where the actual value of a node in the graph is computed on the fly when that node is accessed.
Lazy programs are however harder to debug (Morris 1982; Nilsson and Fritzson 1993; Wadler 1998), as the computation of the various parts of the data manipulated does not occur in an intuitive order. Among other things, accidental infinite recursion could be triggered by totally unrelated code which merely reads a value.
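As an illustration of the lazy option, here is a small Python sketch (hypothetical, not the thesis’s implementation) in which each field of a node is a thunk forced on first access, allowing cycles to be expressed without any mutation:

```python
class LazyNode:
    """A graph node whose fields are computed on first access."""

    def __init__(self, **thunks):
        self._thunks = thunks   # field name -> zero-argument function
        self._cache = {}

    def __getattr__(self, field):
        # Only called when normal attribute lookup fails, so _thunks
        # and _cache (stored in __dict__) never recurse through here.
        if field not in self._cache:
            self._cache[field] = self._thunks[field]()  # forced here
        return self._cache[field]

# A two-node cycle: each thunk closes over a name defined *later*,
# which is legal because nothing is evaluated until first access.
a = LazyNode(name=lambda: "a", next=lambda: b)
b = LazyNode(name=lambda: "b", next=lambda: a)
```

Note that evaluation order is now driven by field accesses rather than by the program text, which is precisely what makes such representations harder to debug: a thunk whose computation reads back into the field being forced produces the kind of accidental infinite recursion described above.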
Finally, Higher-Order Abstract Syntax, or HOAS for short, is a technique which encodes variable bindings as anonymous functions in the host language (whose parameters reify bindings at the level of the host language). Variable references are then nothing more than actual uses of the variable at the host language’s level. Substitution, a common operation in compilers and interpreters, then becomes a simple matter of calling the anonymous function with the desired substitute. HOAS has the additional advantage that it enforces well-scopedness, as it is impossible to refer to a variable outside of its scope in the host language.
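The HOAS encoding can be illustrated in a few lines of Python (the host language stands in for Racket here; all names are hypothetical): the binder of an object-language lambda is a host-language closure, and substitution is simply function application:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Lam:
    body: Callable   # host-language function: substitute -> term

@dataclass
class App:
    fun: object
    arg: object

@dataclass
class Lit:
    value: int

def eval_term(t):
    if isinstance(t, Lit):
        return t.value
    if isinstance(t, App):
        # Substitution = calling the host-language closure.
        return eval_term(t.fun.body(t.arg))
    raise TypeError(t)

# (lambda x. x + 1): the bound variable is the closure's parameter,
# so referring to it outside its scope is a host-language scope error.
inc = Lam(lambda x: Lit(eval_term(x) + 1))
```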
Parametric HOAS, dubbed PHOAS, also allows encoding the type of the variables in the representation. [Can extra information other than the type be stored?]Todo
There are a few drawbacks with HOAS and PHOAS:
The “target” of a backward reference must be above all uses in the tree (i.e. a node may be the target of either backward references or forward references, but not a mix of both). This might not always be the case. For example, pre/post-conditions could, in an early pass in the compiler, be located outside of the normal scope of a function’s signature, but still refer to the function’s parameters. If the pre/post-condition language allows breaking encapsulation, these could even refer to some temporary variables declared inside the function.
[PHOAS naturally lends itself to the implementation of substitutions, and therefore is well-suited to the writing of interpreters. However, the representation cannot be readily traversed and accessed as would be done with normal structures, and therefore the model could be counterintuitive.]Todo
[It seems difficult to encode an arbitrary number of variables bound in a single construct (e.g. to represent bound type names across the whole program, or an arbitrary number of mutually-recursive functions declared via let … and … in …, with any number of and clauses in the compiled language.]Todo
Although some of these seem like viable solutions (e.g. explicitly freezing objects), they still involve low-level mechanisms to create the graph. When functionally replacing a node with a new version of itself, all references to it must be updated to point to the new version. Furthermore, code traversing the graph to gather information or perform updates needs to deal with cycles, in order to avoid running into an infinite loop (or infinite recursion). Finally, backward edges are not represented in the same way as forward edges, introducing an arbitrary distinction when fetching data from the neighbours of a node. This last aspect reduces the extensibility of code which manipulates graphs where access to fields is not done uniformly: supposing new features of the language to be compiled require turning a “seamless” edge into one which must be explicitly resolved in some way (e.g. because this node, in the updated IR, may now be part of cycles), this change of interface will likely break old code which relied on seamless access to that field.
We think that the compiler engineer could benefit from abstracting over these implementation details, and think of compiler passes in terms of graph transformations. Programmers using functional languages often write list transformations using functions like map, foldl, filter, zip and so on, instead of explicitly writing recursive functions.
The graph can be manipulated by updating some or all nodes of a given type, using an old-node to new-node transformation function. This transformation function could produce more than one node, by referencing the extra nodes from the replacement one. It should furthermore be possible to locally navigate through the graph, from its root and from the node currently being transformed. This interface would seamlessly handle cycles — transformations which apply over a whole collection of nodes need not be concerned with cycles — and still allow local navigation, without distinguishing backward and forward edges.
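Such an interface could look like the following Python sketch (purely illustrative, not the framework’s actual API): a single entry point rebuilds the graph, applying the old-node to new-node function to every node of the requested kind, while a memoisation table makes the traversal terminate on cycles and preserves sharing:

```python
def map_nodes(root, node_kind, transform):
    """Rebuild a graph, applying `transform` to every node of `node_kind`.

    Nodes are dicts with a 'kind' key; other fields hold values or
    edges to further nodes.  The `seen` table is keyed by object
    identity: registering the new node *before* recursing into its
    fields is what makes cycles terminate and keeps shared subgraphs
    shared in the output."""
    seen = {}

    def walk(node):
        if not isinstance(node, dict):
            return node                    # a leaf value, copied as-is
        if id(node) in seen:
            return seen[id(node)]          # cycle or shared node
        new = {}
        seen[id(node)] = new               # register first, then recurse
        for key, value in node.items():
            new[key] = walk(value)
        if new.get("kind") == node_kind:
            new.update(transform(new))     # transform returns new fields
        return new

    return walk(root)
```

The transformation function itself never sees a cycle; self-referencing edges in the input simply become self-referencing edges in the output.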
1.2.4 Expressing the data dependencies of a pass via row types
It is easy enough to test a compiler by feeding it sample programs and checking that the compiled output behaves as expected. However, triggering a specific set of conditions inside a given pass, in order to achieve reasonably complete code coverage, may prove to be a harder task: previous passes may modify the input program in unexpected ways, and obtaining a certain data configuration at some point in the compiler requires the developer to mentally execute the compiler’s code backwards from that point, in order to determine the initial conditions which will produce the desired configuration many steps later. This means that extensively testing corner cases which may occur in principle, but are the result of unlikely combinations of features in the input program, is a cumbersome task.
If the compiler consists of many small passes, whose inputs and outputs are serializable, then it becomes possible to thoroughly test a single pass in isolation, by supplying an artificial, crafted input, and checking some properties of its output.
However, a compiler written following the Nanopass philosophy will feature many small passes which read and update only a small part of their input. Specifying actual values for the irrelevant parts of the data not only makes the test cases more verbose than they need to be, but also obscures which variations of the input matter (and may thus allow the detection of new errors), and which parts of the input are mere placeholders whose actual value will not influence the outcome of the pass, aside from being copied over without changes.
It is desirable to express, in a statically verifiable way, which parts of the input are relevant, and which parts are copied verbatim (modulo updated sub-elements). Furthermore, it would be useful to be able to only specify the relevant parts of tests, and omit the rest (instead of supplying dummy values).
Row polymorphism allows us to satisfy both expectations. The irrelevant fields of a record and the irrelevant cases of a variant type can be abstracted away under a single row type variable. “Row” operations on records allow reading and updating relevant fields, while keeping the part abstracted by the row type variable intact. When invoking a compiler pass, the row type variables may be instantiated to the full set of extra fields present in the real IR, when the pass is called as part of the actual compilation; it is also possible, when the pass is called during testing, to instantiate them to an empty set of fields (or to use a single field containing a unique identifier, used to track “object identity”).
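The dynamically-checked Python analogue below may help fix the intuition (the thesis enforces this statically via row type variables in its extension of Typed Racket; all names here are hypothetical): the pass touches only the field it cares about and threads the rest of the record — the “row” — through unchanged, so a test can omit the irrelevant fields entirely:

```python
def rename_variable(node, old, new):
    """A pass that only depends on the 'name' field of its input.

    Every other field plays the role of the row type variable: it is
    copied through verbatim, whether the node is a bare test fixture
    or a full IR node with many extra fields."""
    row = {k: v for k, v in node.items() if k != "name"}
    name = node["name"]
    return {**row, "name": new if name == old else name}

# During testing: only the relevant field is supplied.
minimal = rename_variable({"name": "x"}, "x", "y")

# During actual compilation: extra fields ride along untouched.
full = rename_variable({"name": "x", "type": "int", "span": (3, 7)},
                       "x", "y")
```

In the statically typed setting, the type of `rename_variable` would record both facts: that only `name` is read and written, and that the remaining row is returned intact.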
1.2.5 Verification
[Needs a transition from the previous section, or this should be moved elsewhere.]Todo
The implementation presented in this thesis cannot be immediately extended to support end-to-end formal verification of the compiler being written. However, it helps pave the way for writing formally verified compilers: firstly, the smaller passes are easier to verify. Secondly, the use of intermediate representations which closely match the input and output data can be used, given formal semantics for each IR, to assert that a transformation pass systematically preserves the semantics of the input. Thirdly, the use of a typed language instead of the currently “untyped” Nanopass framework means that a lot of properties can be ensured by relying on the type system. Typed Racket’s type checker is not formally verified itself and would have to be trusted (alternatively, the adventurous researcher could attempt to derive well-typedness proofs automatically, by hijacking the type checker to generate traces of the steps involved, or manually, only using the type checker as a helper tool to detect and weed out issues during development). Fourthly, the explicit specification of the dependencies of passes on their input via row types is a form of frame specification, and can significantly ease the verification effort, as the engineer can rely on the fact that irrelevant parts were not modified in the output. These are statically enforced by our extension of Typed Racket’s type system, which we formalise in chapter [??]Todo. This relies on trusting Typed Racket itself once again, and on the correctness of the implementation of our translation from the extended type system to Typed Racket’s core type system. Fifthly, we provide means to express graph transformations as such, instead of working with an encoding of graphs as abstract syntax trees (or directed acyclic graphs) with explicit backward references. We are hopeful that eliminating this mismatch will be beneficial to the formal verification of the transformation passes.
These advantages would be directly available to engineers attempting a formal proof of their compiler, while trusting the correctness of Typed Racket, as well as that of our framework. The implementation of our framework is not hardened, however, and Typed Racket itself suffers from a small number of known sources of unsoundness (see https://github.com/racket/typed-racket/issues). In order to do an end-to-end verification of a compiler, it would be necessary to port our approach to a language better suited to formal verification. Alternatively, Racket could in principle be extended to help formalisation efforts. Two approaches come to mind: the first consists in proving that the macros correctly implement the abstraction which they attempt to model; the second would be to have the macros inject annotations and hints indicating properties that must be proven, in the same way that type annotations are currently inserted. These hints could then be used by the prover to generate proof obligations, which could then be solved manually or automatically.
Mixing macros and language features which help obtaining static guarantees is a trending topic in the Racket ecosystem and in other communities.
Typed Racket is still in active development, and several other projects were presented recently.
Hackett (https://github.com/lexi-lambda/hackett), mainly developed by King, is a recent effort to bring a Haskell 98-like type system to Racket.
Hackett is built on the techniques developed by Chang, Knauth and Greenman in (Chang et al. 2017), which led to the Turnstile library (https://bitbucket.org/stchang/macrotypes.git). Turnstile is a helper library which facilitates the creation of typed languages in Racket. Macros are amended with typing rules, which are used to thread type information through uses of macros, definition forms and other special forms. Type checking is therefore performed during macro-expansion, and does not rely on an external type checker which would work on the expanded code. As a result, new languages built with Racket and Turnstile are not limited to a pre-existing type system, but can rather devise their own from the ground up. This approach brings a lot of flexibility, the drawback being that more trust is put in the language designer.
The work presented in this thesis aims to follow a different path than that followed by Turnstile: we chose to implement our extended type system as an abstraction over the existing type system of Typed Racket. This means that we do not rely so much on the correctness of our typing rules: instead of verifying ourselves the well-typedness of compilers written using our framework, we inject type annotations in the expanded code, which are then verified by Typed Racket. Therefore, we are confident that type errors will be caught by Typed Racket, safe in the knowledge that the code obtained after macro-expansion is type-safe (we do use unsafe-cast on a few occasions; such a blunt instrument is however only used in cases where Typed Racket already has the required type information, but the type checker fails to deduce the equivalence between two formulations of the same type, or does not acknowledge that the term being checked has the expected type. These issues are being worked on by the developers of Typed Racket, and we hope to be able to remove these uses of unsafe-cast in later versions). This increased serenity comes at the cost of flexibility, as Typed Racket’s type system was not able to express the type constructs that we wanted to add. We therefore had to resort to a few hacks to translate our types into constructs that could be expressed using Typed Racket.
The approach of building more complex type constructs atop a small trusted kernel has been pursued by Cur (https://github.com/wilbowma/cur), developed by Bowman. Cur is a dependently typed language which permits theorem proving and verified programming. It is based on a small kernel (the Curnel), which does not contain language features which can be expressed by macro transformations. Most notably, the prover’s tactics are defined using metaprogramming tools, and are not part of the core language.
Another tool worth mentioning is Rosette (https://github.com/emina/rosette)[reference]Todo. Rosette, mainly developed by Torlak, is a solver-aided language: it tightly integrates an SMT solver with some Racket constructs, so that powerful queries can be performed and answered, for example “what input values to the function f will generate outputs which satisfy the predicate p?”. It can also generate simple functions which satisfy given conditions. These features allow it to be used both as a helper tool during development, for engineers coming from various domains, and as a verifier, as the solver can be used to assert that a function will never give an erroneous output, given a set of constraints on its input.
The idea of expressing the type of macro transformations at some level (e.g. by indicating the type of the resulting expression in terms of the type of the input terms, as is allowed by Turnstile) is not new: in 2001, Ganz, Sabry and Taha presented MacroML (Ganz et al. 2001), an experimental language which allows type checking programs before macro-expansion. However, it seems that there is interest, both in the Racket community and elsewhere, for languages with powerful metaprogramming facilities, coupled with an expressive type system. We are hopeful that this will lead to innovations concerning the formal verification of programs making heavy use of complex macros.
1.3 Summary
Once upon a time…
Unknown
1.3.1 Extensible type system
We implemented a different type system on top of Typed Racket, using macros. Macros have been used not only to extend a language’s syntax (control structures, contract annotations, and so on), but also to reduce the amount of “boilerplate” code and obtain clearer syntax for common or occasional tasks. Macros have further been used to extend the language with new paradigms, like adding an object system (CLOS (Bobrow et al. 1988)[is it really implemented using macros?]Todo, Racket classes (Flatt et al. 2006)) or supporting logic programming (Racket Datalog and Racklog, Rosette (Torlak and Bodik 2013, 2014)). In the past, Typed Racket (Tobin-Hochstadt and Felleisen 2008) has proved that a type system can be successfully fitted onto an existing “untyped” language, using macros. We implemented the type-expander library atop Typed Racket, without altering Typed Racket itself. Our type-expander library allows macros to be used in contexts where a type is expected.
This shows that an existing type system can be made extensible using macros, without altering the core implementation of the type system. We further use these type expanders to build new kinds of types which were not initially supported by Typed Racket: non-nominal algebraic types, with row polymorphism. Typed Racket has successfully been extended in the past (for example by adding type primitives for Racket’s class system, which incidentally also support row polymorphism), but these extensions required modifications to the trusted core of Typed Racket. Aside from a small hack (needed to obtain non-nominal algebraic types which remain equivalent across multiple files), our extension integrates seamlessly with other built-in types, and code using these types can use idiomatic Typed Racket features.
Typed Racket was not initially designed with this extension in mind, nor, that we know of, was it designed with the goal of being extensible. We therefore argue that a better choice of primitive types supported by the core implementation could enable many extensions without the need to resort to hacks like those needed in our case. In other words, a better design of the core types with extensibility in mind would have made our job easier.
In particular, Types in Typed Clojure (Bonnaire-Sergeant et al. 2016) support fine-grained typing of heterogeneous hash tables; this would likely allow us to build much more easily the “strong duck typing” primitives on which our algebraic data types are based, without the need to resort to hacks.
In languages making heavy uses of macros, it would be beneficial to design type systems with a well-chosen set of primitive types, on top of which more complex types can be built using macros.
Building the type system via macros atop a small kernel is an approach that has been pursued by Cur, a dependently-typed language developed with Racket, in which the tactics language is entirely built using macros, and does not depend on Cur’s trusted type-checking core.
1.3.2 Compiler-writing framework
Supports writing a compiler using many small passes (in the spirit of Nanopass)
Allows writing the compiler in a strongly-typed language
Uses immutable data structures for the Intermediate Representations (ASTs)
Supports backwards branches in the AST, making it rather an Abstract Syntax Graph (this is challenging due to the use of immutable data structures).
Provides easy manipulation of the Intermediate Representations: local navigation from node to node, global higher-order operations over many nodes, easy construction, easy serialization, with the guarantee that at no point an incomplete representation can be manipulated. These operations should handle backward arcs seamlessly.
Enforces structural invariants (either at compile-time or at run-time), and ensures via the type system that unchecked values cannot be used where a value respecting the invariant is expected.
Has extensive support for testing the compiler, by allowing the generation of random ASTs [(note that the user guides the random generation, it’s not fully automatic like quickcheck)]Todo, making it possible to read and write ASTs from files and compare them, and allows compiler passes to consume ASTs containing only the relevant fields (using row polymorphism).
1.3.3 Overview
The rest of this document is structured as follows:
Chapter 2 presents related work. It discusses approaches to extensible type systems in Section 2.1. Section 2.2 considers how structures, variants and polymorphism may exhibit different properties in different languages, and makes the case for bounded row polymorphism as a well-suited tool for building compilers. The Nanopass Compiler Framework is presented in Section 2.3, and techniques used to encode and manipulate graphs are studied in Section 2.4.
In Chapter 3, we give some example uses of the compiler framework described in this thesis. We indicate the pitfalls, bugs and boilerplate avoided thanks to its use. [Move this above the related work?]Todo
Chapter 4 gives an overview of the features of Typed Racket’s type system. It then formalises the features relevant to our work, by giving the subtyping rules, as well as the builder and accessor primitives for values of these types.
Section 5.1 explains how we implemented support for type-level macros as an extension to Typed Racket, without modifying the core implementation. We detail the reduction rules which allow translating a type making use of expanders into a plain type.
Section 5.2 discusses the flavour of algebraic datatypes which we chose to implement atop Typed Racket, as well as its extension with row types. We explain in detail our goals and constraints, and present two implementations — a first attempt, which unfortunately required verbose annotations for some “row operations”, and a second solution, which greatly reduces the number of annotations required, by using a different implementation strategy. We give formal semantics for these extensions, give the reduction rules which translate them back to Typed Racket’s type system, and show that this reduction preserves the original semantics. We then finally define a layer of syntactic sugar which allows convenient use of these new type primitives.
Section 6.1 builds upon these algebraic datatypes and row types to define a typed version of the Nanopass Compiler Framework which operates on abstract syntax trees. We explain the notions of tree nodes, mapping functions and catamorphisms, show how these interact with row typing, and give an overview of the implementation. We then extend this with detached mappings: this feature allows the user to map a function over all nodes of a given type in a graph, regardless of the structure of the graph outside of the relevant parts which are manipulated by the mapping function. This allows test data to not only omit irrelevant branches in the abstract syntax tree by omitting the field pointing to these branches, but also irrelevant parts located above the interesting nodes. In other words, this feature allows cutting and removing the top of the abstract syntax tree, and gluing together the resulting forest. We also formally describe the result of applying a set of detached or regular mapping functions to an input tree.
Section 6.2 extends this typed version of Nanopass to handle directed acyclic graphs. We start by considering concerns such as equality of nodes (for which the previous chapter assumed a predicate existed, without actually implementing it), and hash consing. This allows us to prepare the ground for the extension presented in the next chapter, namely graphs.
We can then introduce support for cycles in Section 6.3: instead of describing abstract syntax tree transformations, it then becomes possible to describe graph transformations. To this end, we introduce new kinds of nodes: placeholders, which are used during the construction of the graph; with-indices nodes, which encode references to neighbouring nodes as indices into arrays containing all nodes of a given type for the current graph; and with-promises nodes, which hide away this implementation detail by lazily resolving all references, using a uniform API. This allows all fields of a given node to be accessed in the same way, while still allowing logical cycles built atop a purely immutable representation.
We give an overview of how our implementation handles cycles, using worklists which gather and return placeholders when the mapping functions perform recursive calls, and subsequently turn the results into with-indices nodes, then into with-promises ones. We then update our notion of equality and hash consing to account for cycles, and update the formal semantics too.
Extra safety can be obtained by introducing some structural invariants which constrain the shape of the graph. For example, it is possible to ensure the well-scopedness of variable references. Another desirable property would be the absence of cycles, either to better model the IR, knowing that cyclic references are not allowed at some point by the target language, or to detect early on changes in the IR which may break code assuming the absence of cycles. A third option would be to ensure that “parent” pointers are correct, and that the node containing them is indeed referenced by the parent (i.e., ensure the presence of well-formed cycles). Section 6.4 presents an extension of our graph manipulation framework, which allows the specification of structural invariants. These can in some cases be checked statically, in other cases it may be necessary to follow a macro-enforced discipline, and as a last resort, a dynamic check may be performed.
We further explain how we use phantom types to reflect at the type level which invariants were checked on an instance of a graph. The types used to represent that an instance satisfies an invariant are chosen so that instances with stronger invariants are subtypes of instances with weaker invariants.
Finally, in Section 6.5 we succinctly present some extensions which could be added to the framework presented in the previous chapters. We discuss how it would be possible to garbage-collect unused parts of the graph when only a reference to an internal node is kept, and the root is logically unreachable. Another useful feature would be the ability to define general-purpose graph algorithms (depth-first traversal, topological sort, graph colouring, and so on), operating on a subset of the graph’s fields. This would allow performing these common operations while considering only a subgraph of the one being manipulated. Finally, we mention the possibility of implementing an α-equivalence comparison operator.
In Chapter 7, we present more examples and revisit the initial examples illustrating our goals in the light of the previous chapters.
We ported the most complete compiler written with Nanopass (a Scheme compiler) to our framework; we summarise our experience and compare our approach with Nanopass, by indicating the changes required, in particular how many additional type annotations were necessary.
Finally, we conclude and list future work directions.
2 State of the art
\def\ifmathjax#1{#1}\def\iflatex#1{}
2.1 Extending the type system via macros
Our work explores one interesting use of macros: their use to extend a programming language’s type system.
Chang, Knauth and Greenman (Chang et al. 2017) took the decision to depart from Typed Racket, and implemented a new approach, which allows type systems to be implemented as macros. Typing information about identifiers is threaded across the program at compile-time, and macros can decide whether a term is well-typed or not.
Another related project is Cur (https://github.com/wilbowma/cur), a dependent type system implemented using Racket macros.
Bracha suggests that pluggable type systems should be used (Bracha 2004). Such a system, JavaCOP, is presented by Andreae, Noble, Markstrum and Millstein (Andreae et al. 2006) as a tool which “enforces user-defined typing constraints written in a declarative and expressive rule language”.
In contrast, Typed Racket (Tobin-Hochstadt and Felleisen 2008) was implemented using macros on top of “untyped” Racket, but was not designed as an extensible system: new rules in the type system must be added to the core implementation, a system which is complex to approach.
Following work by Asumu Takikawa (https://github.com/racket/racket/pull/604), we extended Typed Racket with support for macros in the type declarations and annotations. We call this sort of macro “type expanders”, following the commonly-used naming convention (e.g. “match expanders” are macros which operate within patterns in pattern-matching forms). These type expanders allow users to easily extend the type system with new kinds of types, as long as these can be translated back to the types offered natively by Typed Racket.
While the Type Systems as Macros approach of Chang, Knauth and Greenman (Chang et al. 2017) allows greater flexibility by not relying on a fixed set of core types, it also places on the programmer the burden of ensuring that the type-checking macros are sound. In contrast, our type expanders rely on Typed Racket’s type checker, which will still catch type errors in the fully-expanded types. In other words, writing type expanders is safe, because they do not require any specific trust, and translate down to plain Typed Racket types, where any type error would be caught at that level.
An interesting aspect of our work is that the support for type expanders was implemented without any change to the core of Typed Racket. Instead, the support for type expanders itself is available as a library, which overrides special forms like define, lambda or cast, enhancing them by pre-processing type expanders, and translating back to the “official” forms from Typed Racket. It is worth noting that Typed Racket itself is implemented in a similar way: special forms like define and lambda support plain type annotations, and translate back to the “official” forms from so-called “untyped” Racket. In both cases, this approach goes with the Racket spirit of implementing languages as libraries (Tobin-Hochstadt et al. 2011).
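As a minimal sketch of this mechanism (assuming the define-type-expander form exported by our library; the HomogeneousPair name is purely illustrative), a type expander rewrites new type syntax into plain Typed Racket types before typechecking runs:

```racket
#lang typed/racket
(require type-expander
         (for-syntax racket/base))

;; Illustrative expander: (HomogeneousPair T) expands, before
;; typechecking, to the plain Typed Racket type (Pairof T T).
(define-type-expander (HomogeneousPair stx)
  (syntax-case stx ()
    [(_ t) #'(Pairof t t)]))

;; Any type error below is caught by Typed Racket on the
;; fully-expanded type, so the expander itself needs no trust.
(: p (HomogeneousPair Integer))
(define p (cons 1 2))
```

Since the expander only performs a syntactic rewrite, a mistake in its definition can at worst produce an ill-formed type, which Typed Racket then rejects as usual.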
2.2 Algebraic datatypes for compilers
I used polymorphic variants to solve the assert false problems in my lexical analyser, along with some subtyping. You have to write the typecasts explicitly, but that aside it is very powerful (a constructor can “belong” to several types etc.).
Personal communication from a friend
The phc-adt library implements algebraic datatypes (variants and structures) which are adapted to compiler-writing.
“datatype” uses nominal typing, while we use structural typing (i.e. two identical type declarations in distinct modules yield the same type in our case). This avoids the need to centralize the type definition of ASTs.
“datatype” uses closed variants, where a constructor can only belong to one variant. We simply define variants as a union of constructors, where a constructor can belong to several variants. This allows later passes in the compiler to add or remove cases of variants, without the need to duplicate all the constructors under slightly different names.
“datatype” does not support row polymorphism, or similarly the update and extension of its product types with values for existing and new fields, regardless of optional fields. We implement the latter.
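The difference regarding closed variants can be sketched with plain Typed Racket structs and unions (a hand-written approximation of what our library automates; all names are illustrative):

```racket
#lang typed/racket

;; Constructors are declared once...
(struct lit ([val : Number]) #:transparent)
(struct var ([name : Symbol]) #:transparent)
(struct closure ([free : (Listof Symbol)]) #:transparent)

;; ...and can belong to several variants: a later pass adds the
;; closure case without duplicating lit and var under new names.
(define-type SourceExpr  (U lit var))
(define-type LoweredExpr (U lit var closure))

;; SourceExpr is a subtype of LoweredExpr, so shared constructors
;; flow from one pass to the next without conversion.
(: widen (SourceExpr -> LoweredExpr))
(define (widen e) e)
```
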
[Cite the variations on variants paper (for Haskell)]Todo
2.2.1 The case for bounded row polymorphism
[Explain the “expression problem”.]Todo [Talk about the various ways in which it can be “solved”, and which tradeoffs we aim for. Mention extensible-functions, another solution to the expression problem in Racket, but with rather different tradeoffs.]Todo
We strive to implement compilers using many passes. A pass should be able to accept a real-world AST, and produce accordingly a real-world transformed AST as its output. It should also be possible to use lightweight “mock” ASTs, containing only the values relevant to the passes under test (possibly only the pass being called, or multiple passes tested from end to end). The pass should then return a corresponding simplified output AST, omitting the fields which are irrelevant to this pass (and were not part of the input). Since the results of the pass on a real input and on a test input are equivalent modulo the irrelevant fields, this allows testing the pass in isolation with simple data, without having to fill in each irrelevant field with null (and hope that they are indeed irrelevant, without a comforting guarantee that this is the case), as is commonly done when creating “mocks” to test object-oriented programs.
This can be identified as a problem similar to the “expression problem”. In our context, we do not strive to build a compiler which can be extended in an “open world”, by adding new node types to the AST, or new operations handling these nodes. Rather, the desired outcome is to allow passes to support multiple known subsets of the same AST type, from the start.
We succinctly list below some of the different sorts of polymorphism, and argue that row polymorphism is well-suited for our purpose. More specifically, bounded row polymorphism gives the flexibility needed when defining passes which keep some fields intact (without reading their value), but the boundedness ensures that changes in the input type of a pass do not go unnoticed, so that the pass can be amended to handle the new information, or its type can be updated to ignore this new information.
Subtyping (also called inclusion polymorphism, subtype polymorphism, or nominal subtyping ?): Subclasses and interfaces in C# and Java, sub-structs and union types in Typed Racket, polymorphic variants in CAML (Minsky et al. 2013b), chap. 6, sec. Polymorphic Variants
Multiple inheritance. NIT, CLOS, C++, C# interfaces, Java interfaces. As an extension in “untyped” Racket with Alexis King’s safe multimethods (https://lexi-lambda.github.io/blog/2016/02/18/simple-safe-multimethods-in-racket/).
This in principle could help in our case: AST nodes would have .withField(value) methods returning a copy of the node with the field’s value updated, or a node of a different type with that new field, if it is not present in the initial node type. This would however require the declaration of many such methods in advance, so that they can be used when needed (or with a recording mechanism like the one we use, so that between compilations the methods called are remembered and generated on the fly by a macro). Furthermore, Typed Racket lacks support for multiple inheritance on structs. It supports multiple inheritance for classes [?]Todo, but classes currently lack the ability to declare immutable fields, which in turn causes problems with occurrence typing (see the note in the “row polymorphism” point below).
Parametric polymorphism: Generics in C# and Java, polymorphic types in CAML and Typed Racket
F-bounded polymorphism: Java, C#, C++, Eiffel. Possible to mimic to some extent in Typed Racket with (unbounded) parametric polymorphism and intersection types. [Discuss how it would work/not work in our case.]Todo
Operator overloading (also called overloading polymorphism?) and multiple dispatch:
Operator overloading in C#
Nothing in Java aside from the built-in cases for arithmetic and string concatenation, but those are not extensible
C++
typeclasses in Haskell? [I’m not proficient enough in Haskell to be sure or give a detailed description, I have to ask around to double-check.]Todo
LISP (CLOS): has multiple dispatch
nothing built-in in Typed Racket.
CAML?
Coercion polymorphism (automatic casts to a given type). This includes Scala’s implicits, C# implicit coercion operators (user-extensible, but at most one coercion operator is applied automatically, so if there is a coercion operator A \rightarrow{} B, and a coercion operator B \rightarrow{} C, it is still impossible to supply an A where a C is expected without manually coercing the value), and C++’s implicit conversions, where single-argument constructors are treated as implicit conversion operators, unless annotated with the explicit keyword. Similarly to C#, C++ allows only one implicit conversion, not two or more in a chain.
Struct type properties in untyped Racket can somewhat be used to that effect, although they are closer to Java’s interfaces than to coercion polymorphism. Struct type properties are unsound in Typed Racket and are not represented within the type system, so their use is subject to caution anyway.
Coercion (downcasts). Present in most typed languages. This would not help in our case, as the different AST types are incomparable (especially since Typed Racket lacks multiple inheritance)
Higher-kinded polymorphism: Type which is parameterized by a \mathit{Type} \rightarrow{} \mathit{Type} function. Scala, Haskell. Maybe CAML?
The type expander library which we developed for Typed Racket supports \Lambda{}, used to describe anonymous type-level macros. They enable some limited form of \mathit{Type} \rightarrow{} \mathit{Type} functions, but are actually applied at macro-expansion time, before typechecking is performed, which diminishes their use in some cases. For example, they cannot cooperate with type inference. Also, any recursive use of type-level macros must terminate, unless the type “function” manually falls back to using \mathit{Rec} to create a regular recursive type. This means that a declaration like F(X) := X × F(F(X)) is not possible using anonymous type-level macros only.
As an example of this use of the type expander library, our cycle-handling code uses internally a “type traversal” macro. In the type of a node, it performs a substitution on some subparts of the type. It is more or less a limited form of application of a whole family of type functions a{}_i \rightarrow{} b{}_i, which have the same inputs a{}_i \ldots{}, part of the initial type, but different outputs b{}_i \ldots{} which are substituted in place of the a{}_i \ldots{} in the resulting type. The “type traversal” macro expands the initial type into a standard polymorphic type, which accepts the desired outputs b{}_i \ldots{} as type arguments.
Lenses. These can in a way be compared to explicit coercions, where the coercion is reversible and the accessible parts can be altered.
Structural polymorphism (also sometimes called static duck-typing): Scala, TypeScript. It is also possible in Typed Racket, using the algebraic datatypes library which we implemented. Possible to mimic in Java and C# with interfaces “selecting” the desired fields, but the interface has to be explicitly implemented by the class (i.e. at the definition site, not at the use-site).
Palmer et al. present TinyBang (Palmer et al. 2014), a typed language in which flexible manipulation of objects is possible, including adding and removing fields, as well as changing the type of a field. They implement in this way a sound, decidable form of static duck typing, with functional updates which can add new fields and replace the value of existing fields. Their approach is based on two main aspects. The first is type-indexed records supporting asymmetric concatenation: by concatenating two records r{}_1 \& r{}_2, a new record is obtained containing all the fields from r{}_1 (associated to their value in r{}_1), as well as the fields from r{}_2 which do not appear in r{}_1 (associated to their value in r{}_2). Primitive types are eliminated by allowing the use of type names as keys in the records: integers then become simply records with a int key, for example.
Dependently-typed first-class cases: pattern-matching functions are expressed as pattern \mathbin{\texttt{->}} expression, and can be concatenated with the \& operator, to obtain functions matching against different cases, possibly with a different result type for each case. The leftmost cases can shadow existing cases (i.e. the case which is used is the leftmost case for which the pattern matches successfully).
TinyBang uses an approach which is very different from the one we followed in our Algebraic Data Types library, but contains the adequate primitives to build algebraic data types which would fulfill our requirements (aside from the ability to bound the set of extra “row” fields). We note that our flexible structs, which are used within the body of node-to-node functions in passes, do support an extension operation, which is similar to TinyBang’s \&, with the left-hand side containing a constant and fixed set of fields.
Row polymorphism: apparently, quoting a post on Quora (https://www.quora.com/Object-Oriented-Programming-What-is-a-concise-definition-of-polymorphism):
Mostly only concatenative and functional languages (like Elm and PureScript) support this.
Classes in Typed Racket can have a row type argument (but classes in Typed Racket cannot have immutable fields (yet), and therefore occurrence typing does not work on class fields. Occurrence typing is an important idiom in Typed Racket, used to achieve safe but concise pattern-matching, which is a feature frequently used when writing compilers).
Our Algebraic Data Types library implements a bounded form of row polymorphism, and a separate implementation (used within the body of node-to-node functions in passes) allows unbounded row polymorphism.
[Virtual types]Todo
So-called “untyped” or “uni-typed” languages: naturally support most of the above, but without static checks.
Operator overloading can be present in “untyped” languages, but is really an incarnation of single or multiple dispatch, based on the run-time, dynamic type (as there is no static type based on which the operation could be chosen). However it is not possible in “untyped” languages and languages compiled with type erasure to dispatch on “types” with a non-empty intersection: it is impossible to distinguish the values, and they are not annotated statically with a type.
As mentioned above, Typed Racket does not have operator overloading, and since the inferred types cannot be accessed reflectively at compile-time, it is not really possible to construct it as a compile-time feature via macros. Typed Racket also uses type erasure, so the same limitation as for untyped languages applies when implementing some form of single or multiple dispatch at run-time — namely, the intersection of the types must be empty. [Here, mention (and explain in more detail later) our compile-time “empty-intersection check” feature (does not work with polymorphic variables).]Todo
[Overview of the existing “solutions” to the expression problems, make a summary table of their tradeoffs (verbosity, weaknesses, strengths).]Todo
[Compare the various sorts of subtyping and polymorphism in that light (possibly in the same table), even those which do not directly pose as a solution to the expression problem.]Todo
“Nominal types”: our tagged structures and node types are not nominal types.
The “trivial” Racket library tracks static information about the types in simple cases. The “turnstile” Racket language [is a follow-up]Todo work, and allows defining new typed Racket languages. It tracks the types of values as they are assigned to variables or passed as arguments to functions or macros. These libraries could be used to implement operator overloads based on the static type of the arguments. They could also be used to implement unbounded row polymorphism in a way that does not cause a combinatorial explosion of the size of the expanded code. [Have a look at the implementation of row polymorphism in Typed Racket classes, cite their work if there is something already published about it.]Todo
From the literate program (tagged-structure-low-level):
Row polymorphism, also known as "static duck typing", is a type system feature which allows a single type variable to be used as a placeholder for several omitted fields, along with their types. The phc-adt library supports a limited form of row polymorphism: for most operations, a set of tuples of omitted field names must be specified, thereby indicating a bound on the row type variable.
This is both a limitation of our implementation (to reduce the combinatorial explosion of possible input and output types), as well as a desirable feature. Indeed, this library is intended to be used to write compilers, and a compiler pass should have precise knowledge of the intermediate representation it manipulates. It is possible that a compiler pass may operate on several similar intermediate representations (for example a full-blown representation for actual compilation and a minimal representation for testing purposes), which makes row polymorphism desirable. It is however risky to allow as an input to a compiler pass any data structure containing at least the minimum set of required fields: changes in the intermediate representation may add new fields which should, semantically, be handled by the compiler pass. A catch-all row type variable would simply ignore the extra fields, without raising an error. Thanks to the bound which specifies the possible tuples of omitted field names, changes to the input type will raise a type error, bringing the programmer’s attention to the issue. If the new type is legit, and does not warrant a modification of the pass, the fix is easy to implement: simply adding a new tuple of possibly omitted fields to the bound (or replacing an existing tuple) will allow the pass to work with the new type. If, on the other hand, the pass needs to be modified, the type system will have successfully caught a potential issue.
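The intent can be approximated in plain Typed Racket with an explicit union type, which plays the role of the bound: the pass accepts exactly the listed node shapes, so a change to the input type surfaces as a type error instead of being silently ignored. This is a hand-written sketch with hypothetical node types; the library generates more precise (input-shape-preserving) types automatically:

```racket
#lang typed/racket

;; A real node, and a test-only mock omitting the irrelevant field.
(struct full-node ([name : Symbol] [ty : Symbol] [srcloc : String])
  #:transparent)
(struct mock-node ([name : Symbol] [ty : Symbol])
  #:transparent)

;; The union acts as the bound on omitted fields: only these two
;; shapes are accepted. If the IR gains a field, full-node changes
;; and this pass is re-checked, rather than a catch-all row variable
;; silently ignoring the new field.
(: rename-pass ((U full-node mock-node) -> (U full-node mock-node)))
(define (rename-pass n)
  (if (full-node? n)
      (full-node (prefixed (full-node-name n)) (full-node-ty n)
                 (full-node-srcloc n))
      (mock-node (prefixed (mock-node-name n)) (mock-node-ty n))))

(: prefixed (Symbol -> Symbol))
(define (prefixed s) (string->symbol (format "renamed-~a" s)))
```

The library's bounded row types additionally ensure that a mock input yields a mock output, which this coarse union type does not express.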
2.3 Writing compilers using many small passes (a.k.a. following the Nanopass Compiler Framework philosophy)
2.4 Cycles in intermediate representations of programs
[There already were a few references in my proposal for JFLA.]Todo [Look for articles about graph rewriting systems.]Todo
The following sections present the many ways in which cycles within the AST, CFG and other intermediate representations can be represented.
2.4.1 Mutable data structures
Hard to debug
When e.g. using lazy-loading, it is easy to mistakenly load a class or method after the Intermediate Representation was frozen. Furthermore, unless a .freeze() method actually enforces this conceptual change from a mutable to an immutable representation, it can be unclear at which point the IR (or parts of it) is guaranteed to be complete and its state frozen. This is another factor making maintenance of such code difficult.
We are using ML to build a compiler that does low-level optimization. To support optimizations in classic imperative style, we built a control-flow graph using mutable pointers and other mutable state in the nodes. This decision proved unfortunate: the mutable flow graph was big and complex, and it led to many bugs. We have replaced it by a smaller, simpler, applicative flow graph based on Huet’s (1997) zipper. The new flow graph is a success; this paper presents its design and shows how it leads to a gratifyingly simple implementation of the dataflow framework developed by Lerner, Grove, and Chambers (2002).
2.4.2 Unique identifiers used as a replacement for pointers
Mono uses this approach (Evain and others 2008; Köplinger et al. 2014); it is very easy to use an identifier which is supposed to reference a missing object, or an object from another version of the AST. It is also very easy to get things wrong when duplicating nodes (e.g. while specializing methods based on their caller), or when merging or removing nodes.
2.4.3 Explicit use of other common graph representations
Adjacency lists, De Bruijn indices.
Error prone when updating the graph (moving nodes around, adding, duplicating or removing nodes).
Needs manual
2.4.4 Using lazy programming languages
Lazy programming is harder to debug.
Quote (Nilsson and Fritzson 1993): Traditional debugging techniques are, however, not suited for lazy functional languages since computations generally do not take place in the order one might expect.
Quote (Nilsson and Fritzson 1993): Within the field of lazy functional programming, the lack of suitable debugging tools has been apparent for quite some time. We feel that traditional debugging techniques (e.g. breakpoints, tracing, variable watching etc.) are not particularly well suited for the class of lazy languages since computations in a program generally do not take place in the order one might expect from reading the source code.
To be usable, a language system must be accompanied by a debugger and a profiler. Just as with interlanguage working, designing such tools is straightforward for strict languages, but trickier for lazy languages.
Constructing debuggers and profilers for lazy languages is recognized as difficult. Fortunately, there have been great strides in profiler research, and most implementations of Haskell are now accompanied by usable time and space profiling tools. But the slow rate of progress on debuggers for lazy languages makes us researchers look, well, lazy.
How does one debug a program with a surprising evaluation order? Our attempts to debug programs submitted to the lazy implementation have been quite entertaining. The only thing in our experience to resemble it was debugging a multi-programming system, but in this case virtually every parameter to a procedure represents a new process. It was difficult to predict when something was going to happen; the best strategy seems to be to print out well-defined intermediate results, clearly labelled.
So-called “infinite” data structures constructed lazily have problems with equality and serialization. The latter is especially important for serializing and de-serializing Intermediate Representations for the purpose of testing, and is also very important for code generation: the backend effectively needs to turn the infinite data structure into a finite one. The Revised⁶ Report on Scheme requires the "equal?" predicate to correctly handle cyclic data structures, but efficient algorithms implementing this requirement are nontrivial (Adams and Dybvig 2008). Although any representation of cyclic data structures will have at some point to deal with equality and serialization, it is best if these concerns are abstracted away as much as possible.
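Racket itself meets this requirement of the Revised⁶ Report: its equal? terminates on cyclic data. A small sketch, using shared to build two distinct cyclic lists:

```racket
#lang racket

;; Two distinct cyclic lists, each representing (1 1 1 ...).
(define a (shared ([x (cons 1 x)]) x))
(define b (shared ([y (cons 1 y)]) y))

;; equal? terminates despite the cycles, and equates the two
;; lists because their infinite unfoldings coincide.
(equal? a b)  ; ⇒ #t
```

A naive structural recursion would loop forever here, which is precisely why the algorithms of Adams and Dybvig (2008) are needed.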
2.4.5 True graph representations using immutable data structures
The Huet zipper (Huet 1997). Implemented in untyped Racket, but not in Typed Racket (see the zippers package, https://github.com/david-christiansen/racket-zippers).
3 Goals, constraints and examples
\def\ifmathjax#1{#1}\def\iflatex#1{}
4 Typed Racket
\def\ifmathjax#1{#1}\def\iflatex#1{}
We start this section with some history: Lisp, the language with lots of parentheses, shortly following Fortran as one of the first high-level programming languages, was initially designed between 1956 and 1958, and subsequently implemented (McCarthy 1981). Dialects of Lisp generally support a variety of programming paradigms, including (but not limited to) functional programming and object-oriented programming (e.g. via CLOS, the Common Lisp Object System). One of the most prominent aspects of Lisp is homoiconicity, the fact that programs and data structures look the same. This enables programs to easily manipulate other programs, and led to the extensive use of macros. Uses of macros usually look like function applications, but, instead of invoking a target function at run-time, a macro will perform some computation at compile-time, and expand to some new code, which is injected as a replacement of the macro’s use.
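For instance, a classic macro (sketched here in plain Racket) replaces each of its uses with new code at compile-time:

```racket
#lang racket

;; swap! performs no work at run-time by itself: at compile-time,
;; each use expands into the let expression given in the template.
(define-syntax-rule (swap! a b)
  (let ([tmp a])
    (set! a b)
    (set! b tmp)))

(define x 1)
(define y 2)
(swap! x y)   ; expands to (let ([tmp x]) (set! x y) (set! y tmp))
(list x y)    ; ⇒ '(2 1)
```

No function could be given x and y in this way: the macro manipulates the program text itself, which is exactly what homoiconicity makes convenient.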
The two main dialects of Lisp are Common Lisp and Scheme. Scheme follows a minimalist philosophy, where a small core is standardised (Abelson et al. 1998; Shinn et al. 2013; Sperber et al. 2009) and subsequently extended via macros and additional function definitions.
Racket, formerly named PLT Scheme, started as a Scheme implementation. Racket evolved, and the Racket Manifesto (Felleisen et al. 2015) presents it as a “programming-language programming language”, a language which helps with the creation of small linguistic extensions as well as entirely new languages. The Racket ecosystem features many languages covering many paradigms:
The racket/base language is a full-featured programming language which mostly encourages functional programming.
racket/class provides an object-oriented system, implemented atop racket/base using macros, and can be used along with the rest of the racket/base language.
racklog is a logic programming language in the style of prolog. The Racket ecosystem also includes an implementation of datalog.
Scribble can be seen as an alternative to LaTeX, and is used to create the Racket documentation. It also supports literate programming, by embedding chunks of code in the document which are then aggregated together. This thesis is in fact written using Scribble.
slideshow is a DSL (domain-specific language) for the creation of presentations, and can be thought of as an alternative to Beamer and SliTeX.
r5rs and r6rs are implementations of the corresponding scheme standards.
Redex is a DSL which allows the specification of reduction semantics for programming languages. It features tools to explore and test the defined semantics.
Typed Racket (Tobin-Hochstadt 2010; Tobin-Hochstadt and Felleisen 2008) is a typed variant of the main racket language. It is implemented as a macro which takes over the whole body of the program. That macro fully expands all other macros in the program, and then typechecks the expanded program.
Turnstile allows the creation of new typed languages. It takes a different approach when compared to Typed Racket, and threads the type information through assignments and special forms, in order to be able to typecheck the program during expansion, instead of doing so afterwards.
In the remainder of this section, we will present the features of Typed Racket’s type system, and then present formal semantics for a subset of those, namely the part which is relevant to our work. The Typed Racket Guide and The Typed Racket Reference provide good documentation for programmers who desire to use Typed Racket; we will therefore keep our overview succinct and gloss over most details.
4.1 Overview of Typed Racket’s type system
4.1.1 Simple primitive types
Typed Racket has types matching Racket’s built-in primitive values: Number, Boolean, Char, String, Void13 The Void type contains only a single value, #<void>, and plays the same role as the void return type in C. It is the equivalent of unit in CAML and Haskell, and is often used as the return type of functions which perform side-effects. It should not be confused with Nothing, the bottom type which is not inhabited by any value, and is similar to the type of Haskell’s undefined. Nothing can be used for example as the type of functions which never return — in that way it is similar to C’s __attribute__ ((__noreturn__)). and so on.
> (ann #true Boolean) - : Boolean
#t
> 243 - : Integer [more precisely: Positive-Byte]
243
> "Hello world" - : String
"Hello world"
> #\c - : Char
#\c
; The void function produces the void value.
; Void values on their own are not printed,
; so we place it in a list to make it visible.
> (list (void)) - : (Listof Void) [more precisely: (List Void)]
'(#<void>)
For numbers, Typed Racket offers a “numeric tower” of partially-overlapping types: Positive-Integer is a subtype of Integer, which is itself a subtype of Number. Zero, the type containing only the number 0, is both a subtype of Nonnegative-Integer (integers ≥ 0) and of Nonpositive-Integer (integers ≤ 0).
Typed Racket also includes a singleton type for each primitive value of these types: we already mentioned Zero, which is an alias of the 0 type. Every number, character, string and boolean value can be used as a type, which is only inhabited by the same number, character, string or boolean value. For example, 243 belongs to the singleton type 243, which is a subtype of Positive-Integer.
> 0 - : Integer [more precisely: Zero]
0
> (ann 243 243) - : Integer [more precisely: 243]
243
> #t - : Boolean [more precisely: True]
#t
4.1.2 Pairs and lists
Pairs are the central data structure of most Lisp dialects. They are used to build linked lists of pairs, terminated by '(), the null element. The null element has the type Null, while the pairs which build the list have the type (Pairof A B), where A and B are replaced by the actual types for the first and second elements of the pair. For example, the pair built using (cons 729 #true), which contains 729 as its first element, and #true as its second element, has the type (Pairof Number Boolean), or using the most precise singleton types, (Pairof 729 #true).
> (cons 729 #true) - : (Pairof Positive-Index True)
'(729 . #t)
> '(729 . #true) - : (Pairof Positive-Index True)
'(729 . #t)
Heterogeneous linked lists of fixed length can be given a precise type by nesting the same number of pairs at the type level. For example, the list built with (list 81 #true 'hello) has the type (List Number Boolean Symbol), which is a shorthand for the type (Pairof Number (Pairof Boolean (Pairof Symbol Null))). Lists in Typed Racket can thus be seen as the equivalent of a chain of nested 2-tuples in languages like CAML or Haskell. The analog in object-oriented languages with support for generics would be a class Pair<A, B>, where the generic type argument B could be instantiated by another instance of Pair, and so on.
> (cons 81 (cons #true (cons 'hello null))) - : (Listof (U 'hello Positive-Byte True))
'(81 #t hello)
> (ann (list 81 #true 'hello) (Pairof Number (Pairof Boolean (Pairof Symbol Null)))) - : (Listof (U Boolean Complex Symbol)) [more precisely: (List Number Boolean Symbol)]
'(81 #t hello)
The type of variable-length homogeneous linked lists can be described using the Listof type operator. The type (Listof Integer) is equivalent to (Rec R (U (Pairof Integer R) Null)). The Rec type operator describes recursive types, and U describes unions. Both of these features are described below, for now we will simply say that the previously given type is a recursive type R, which can be a (Pairof Integer R) or Null (to terminate the linked list).
> (ann (range 0 5) (Listof Number)) - : (Listof Number)
'(0 1 2 3 4)
4.1.3 Symbols
Another of Racket’s primitive datatypes is symbols. Symbols are interned strings: two occurrences of a symbol produce values which are pointer-equal if the symbols are equal (i.e. they represent the same string)14 This is true with the exception of symbols created with gensym and the like. gensym produces a fresh symbol which is not interned, and therefore different from all existing symbols, and different from all symbols created in the future.
Typed Racket includes the Symbol type, to which all symbols belong. Additionally, there is a singleton type for each symbol: the type 'foo is only inhabited by the symbol 'foo.
> 'foo - : Symbol [more precisely: 'foo]
'foo
Singleton types containing symbols can be seen as similar to constructors without arguments in CAML and Haskell, and as globally unique enum values in object-oriented languages. The main difference resides in the scope of the declaration: two constructor declarations with identical names in two separate files will usually give distinct types and values. Similarly, when using the “type-safe enum” design pattern, two otherwise identical declarations of an enum will yield objects of different types. In contrast, two uses of an interned symbol in Racket and Typed Racket will produce identical values and types. A way of seeing this is that symbols are similar to constructors (in the functional programming sense) or enums which are implicitly declared globally.
> (module m1 typed/racket (define sym1 'foo) (provide sym1))
> (module m2 typed/racket (define sym2 'foo) (provide sym2))
> (require 'm1 'm2)
; The two independent uses of 'foo are identical:
> (eq? sym1 sym2) - : Boolean
#t
4.1.4 Unions
These singleton types may not seem very useful on their own. They can however be combined together with union types, which are built using the U type operator.
The union type (U 0 1 2) is inhabited by the values 0, 1 and 2, and by no other value. The Boolean type is actually defined as (U #true #false), i.e. the union of the singleton types containing the #true and #false values, respectively. The Nothing type, which is not inhabited by any value, is defined as the empty union (U). The type Any is the top type, i.e. a super-type of all other types; it can be seen as a large union of all other types, including those which will be declared later or in other units of code.
Unions of symbols are similar to variants which contain zero-argument constructors, in CAML or Haskell.
> (define v : (U 'foo 'bar) 'foo)
> v - : Symbol [more precisely: (U 'bar 'foo)]
'foo
> (set! v 'bar)
> v - : Symbol [more precisely: (U 'bar 'foo)]
'bar
; This throws an error at compile-time:
> (set! v 'oops)
eval:5:0: Type Checker: type mismatch;
mutation only allowed with compatible types
expected: (U 'bar 'foo)
given: 'oops
in: oops
A union such as (U 'ca (List 'cb Number) (List 'cc String Symbol)) can be seen as roughly the equivalent of a variant with three constructors, ca, cb and cc, where the first has no arguments, the second has one argument (a Number), and the third has two arguments (a String and a Symbol).
The main difference is that a symbol can be used as parts of several unions, e.g. (U 'a 'b) and (U 'b 'c), while constructors can often only be part of the variant used to declare them. Unions of symbols are in this sense closer to CAML’s so-called polymorphic variants (Minsky et al. 2013a) than to regular variants.
> (define-type my-variant (U 'ca (List 'cb Number) (List 'cc String Symbol)))
> (define v₁ : my-variant 'ca)
> (define v₂ : my-variant (list 'cb 2187))
> (define v₃ : my-variant (list 'cc "Hello" 'world))
Finally, it is possible to mix different sorts of types within the same union: the type (U 0 #true 'other) is inhabited by the number 0, the boolean #true, and the symbol 'other. Translating such a union to a language like CAML could be done by explicitly tagging each case of the union with a distinct constructor.
Implementation-wise, all values in the so-called “untyped” version of Racket are tagged: a few bits within the value’s representation are reserved and used to encode the value’s type. When considering the target of a pointer in memory, Racket is therefore able to determine if the pointed-to value is a number, boolean, string, symbol and so on. Typed Racket preserves these run-time tags. They can then be used to detect the concrete type of a value when its static type is a union. This detection is done simply by using Racket’s predicates: number?, string?, symbol? etc.
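As a small sketch (the function name describe is ours), these predicates allow ordinary code to discriminate the members of a mixed union such as the one above:

```racket
#lang typed/racket

;; Hypothetical example: each predicate test narrows the union type of v.
(define (describe [v : (U 0 #true 'other)]) : String
  (cond [(number? v)  "the number zero"]    ; v : 0 here
        [(boolean? v) "the boolean true"]   ; v : #true here
        [else         "the symbol 'other"])); v : 'other here
```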
4.1.5 Intersections
Intersections are the converse of unions: instead of allowing a mixture of values of different types, an intersection type, described using the ∩ type operator, only allows values which belong to all types.
The intersection type (∩ Nonnegative-Integer Nonpositive-Integer) is the singleton type 0. The intersection of (U 'a 'b 'c) and (U 'b 'c 'd) will be (U 'b 'c), as 'b and 'c belong to both unions.
; :type shows the given type, or a simplified version of it:
> (:type (∩ (U 'a 'b 'c) (U 'b 'c 'd)))
(U 'b 'c)
Typed Racket is able to reduce some intersections, such as those given above, at compile-time. However, in some cases, it is forced to keep the intersection type as-is. For example, structs (described below) can, using special properties, impersonate functions. This mechanism is similar to PHP’s __invoke, or to the ability to overload operator() in C++. Typed Racket does not handle these properties (yet), and therefore cannot determine whether a given struct type also impersonates a function or not. This means that the intersection (∩ s (→ Number String)), where s is a struct type, cannot be reduced to Nothing, because Typed Racket cannot determine whether the struct s can act as a function or not.
Another situation where Typed Racket cannot reduce the intersection is when intersecting two function types (presented below).
(∩ (→ Number String) (→ Number Symbol))
(∩ (→ Number String) (→ Boolean String))
The first intersection could seemingly be simplified to (→ Number (∩ String Symbol)), and the second one could be simplified to (→ (U Number Boolean) String); however, these equivalences have not been implemented (yet) in Typed Racket, so we do not rely on them. Note that this is not a soundness issue: it only prevents using some values at types to which they belong in principle; it cannot be exploited to assign a value to a variable with an incompatible type.
Finally, when some types are intersected with a polymorphic type variable, the intersection cannot be computed until the polymorphic type is instantiated.
When Typed Racket is able to perform a simplification, occurrences of Nothing (the bottom type) propagate outwards in some cases: pairs and struct types which contain Nothing as one of their elements are collapsed to Nothing. This propagation starts from the occurrences of Nothing in the parts of the resulting type which are traversed by the intersection operator, and collapses the containing pairs and struct types to Nothing, moving outwards until the ∩ operator itself is reached. In principle, the propagation could go on past that point, but this is not implemented yet in Typed Racket15 See Issue #552 on Typed Racket’s GitHub repository for more details on what prevents implementing a more aggressive propagation of Nothing.
The type (∩ 'a 'b) therefore gets simplified to Nothing, and the type (∩ (Pairof 'a 'x) (Pairof 'b 'x)) also simplifies to Nothing (Typed Racket initially pushes the intersection down the pairs, so that the type first becomes (Pairof (∩ 'a 'b) (∩ 'x 'x)), which is simplified to (Pairof Nothing 'x), and the occurrence of Nothing propagates outwards). However, if the user directly specifies the type (Pairof (∩ 'a 'b) Integer), it is simplified to (Pairof Nothing Integer), but the Nothing does not propagate outwards beyond the initial use of ∩.
> (:type (∩ 'a 'b))
Nothing
> (:type (∩ (Pairof 'a 'x) (Pairof 'b 'x)))
Nothing
> (:type (Pairof (∩ 'a 'b) Integer))
(Pairof Nothing Integer)
[can expand further: Integer]
A simple workaround exists: the outer type, which could be collapsed to Nothing, can be intersected again with a type of the same shape. The outer intersection will traverse both types (the desired one and the “shape”), and propagate the leftover Nothing further out.
> (:type (Pairof (∩ 'a 'b) Integer))
(Pairof Nothing Integer)
[can expand further: Integer]
> (:type (∩ (Pairof (∩ 'a 'b) Integer) (Pairof Any Any)))
Nothing
These intersections are not very interesting on their own, as in most cases it is possible to express the resulting simplified type without using the intersection operator. They become more useful when mixed with polymorphic types: intersecting a polymorphic type variable with another type can be used to restrict the actual values that may be used. The type (∩ A T), where A is a polymorphic type variable and T is a type defined elsewhere, is equivalent to the use of bounded type parameters in Java or C#. In C#, for example, the type (∩ A T) would be written using a where A : T clause.
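A sketch of this pattern (the function name is ours): intersecting the type variable with Real restricts callers to arguments that are also Reals, much like a bounded type parameter such as Java’s <A extends Number> or C#’s where A : T.

```racket
#lang typed/racket

;; Hypothetical example of a bounded polymorphic function:
;; A may only be instantiated to values that are also Reals.
(: restricted-identity (∀ (A) (→ (∩ A Real) (∩ A Real))))
(define (restricted-identity x) x)
```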
4.1.6 Structs
Racket also supports structs, which are mappings from fields to values. A struct is further distinguished by its struct type: instances of two struct types with the same name and fields, declared in separate files, can be differentiated using the predicates associated with these structs. Structs in Racket can be seen as the analog of classes containing only fields (but no methods) in C# or Java. Such classes are sometimes called “Plain Old Data (POD) Objects”. Structs belong to a single-inheritance hierarchy: instances of the descendants of a struct type are recognised by their ancestor’s predicate. When a struct inherits from another, it includes its parent’s fields, and can add extra fields of its own.
Each struct declaration within a Typed Racket program additionally declares a corresponding type.
> (struct parent ([field₁ : (Pairof String Symbol)]) #:transparent)
> (struct s parent ([field₂ : Integer] [field₃ : Symbol]) #:transparent)
> (s (cons "x" 'y) 123 'z) - : s
(s '("x" . y) 123 'z)
In Typed Racket, structs can have polymorphic type arguments, which can be used inside the types of the struct’s fields.
> (struct (A B) poly-s ([field₁ : (Pairof A B)] [field₂ : Integer] [field₃ : B]) #:transparent)
> (poly-s (cons "x" 'y) 123 'z) - : (poly-s String (U 'y 'z))
(poly-s '("x" . y) 123 'z)
Racket further supports struct type properties, which can be seen as a limited form of method definitions for a struct, thereby making them closer to real objects. The same struct type property can be implemented by many structs, and the declaration of a struct type property is therefore roughly equivalent to the declaration of an interface with a single method.
Struct type properties are often considered a low-level mechanism in Racket. Among other things, a struct type property can only be used to define a single property at a time. When multiple “methods” have to be defined at once (for example, when defining the prop:equal+hash property, which requires the definition of an equality comparison function and two hashing functions), these can be grouped together in a list of functions, which is then used as the property’s value. “Generic interfaces” are a higher-level feature which, among other things, allows the definition of multiple “methods” as part of a single generic interface, and offers a friendlier API for specifying the “generic interface” itself (i.e. what object-oriented languages call an interface), and for specifying the implementation of said interface.
Typed Racket unfortunately offers no support for struct type properties and generic interfaces for now. It is impossible to assert at the type level that a struct implements a given property, and it is for example not possible to describe the type of a function accepting any struct implementing a given property or generic interface. Finally, no type checks are performed on the body of functions bound to such properties, and no check verifies that a function implementation with the right signature is supplied to a given property. Since struct type properties and generics cannot be used in a type-safe way for now, we refrain from using these features, and only use them to implement some very common properties16 We built a thin macro wrapper which allows typechecking the implementation and signature of the functions bound to these two properties.: prop:custom-write, which is the equivalent of Java’s String toString(), and prop:equal+hash, which is equivalent to Java’s boolean equals(Object o) and int hashCode().
4.1.7 Functions
Typed Racket provides rich function types, to support some of the flexible use patterns allowed by Racket.
The simple function type below indicates that the function expects two arguments (an integer and a string), and returns a boolean:
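Such a type can be written as follows (our reconstruction of the omitted display, based on the surrounding description):

```racket
(→ Integer String Boolean)
```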
We note that unlike Haskell and CAML functions, Racket functions are not implicitly curried. To express the corresponding curried function type, one would write:
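A reconstruction of the omitted curried type, matching the description (each arrow takes a single argument):

```racket
(→ Integer (→ String Boolean))
```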
A function may additionally accept optional positional arguments, and keyword (i.e. named) arguments, both mandatory and optional:
; Mandatory string, optional integer and boolean arguments:
(->* (String) (Integer Boolean) Boolean)
; Mandatory keyword arguments:
(→ #:size Integer #:str String Boolean)
; Mandatory #:str, optional #:size and #:opt:
(->* (#:str String) (#:size Integer #:opt Boolean) Boolean)
Furthermore, functions in Racket can accept a catch-all “rest” argument, which allows for the definition of variadic functions. Typed Racket also allows expressing this at the type level, as long as the arguments covered by the “rest” clause all have the same type:
; The function accepts one integer and any number of strings:
(-> Integer String * Boolean)
; Same thing, with an optional symbol in between:
(->* (Integer) (Symbol) #:rest String Boolean)
One of Typed Racket’s main goals is to be able to typecheck idiomatic Racket programs. Such programs may include functions whose return type depends on the values of the input arguments. Similarly, case-lambda can be used to create lambda functions which dispatch to multiple behaviours based on the number of arguments passed to the function.
Typed Racket provides the case→ type operator, which can be used to describe the type of these functions:
; Allows 1 or 3 arguments, with the same return type.
(case→ (→ Integer Boolean) (→ Integer String Symbol Boolean))
; A similar type based on optional arguments allows 1, 2 or 3
; arguments in contrast:
(->* (Integer) (String Symbol) Boolean)
; The output type can depend on the input type:
(case→ (→ Integer Boolean) (→ String Symbol))
; Both features (arity and dependent output type) can be mixed:
(case→ (→ Integer Boolean) (→ Integer String (Listof Boolean)))
Another important feature, which can be found in the type system of most functional programming languages, and most object-oriented languages, is parametric polymorphism. Typed Racket allows the definition of polymorphic structs, as detailed above, as well as polymorphic functions. For example, the function cons can be considered as a polymorphic function with two polymorphic type arguments A and B, which takes an argument of type A, an argument of type B, and returns a pair of A and B.
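As a first approximation (the actual type of cons in Typed Racket is more detailed), this polymorphic type can be written:

```racket
(∀ (A B) (→ A B (Pairof A B)))
```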
Typed Racket supports polymorphic functions with multiple polymorphic type variables, such as the one shown above. Furthermore, it allows one of the polymorphic variables to be followed by ellipses, indicating a variable-arity polymorphic type (Tobin-Hochstadt 2010). The dotted polymorphic type variable can be instantiated with a tuple of types, and will be replaced with that tuple where it appears. For example, the type
can be instantiated with Number String Boolean, which would yield the type for a function accepting a list of four elements: two numbers, a string and a boolean.
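The type in question can plausibly be reconstructed from this description as follows (the Void return type is our assumption):

```racket
(∀ (B ...) (→ (List Number B ... B) Void))
;; Instantiating B ... with Number String Boolean yields:
;; (→ (List Number Number String Boolean) Void)
```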
Dotted polymorphic type variables can only appear in certain places. A dotted type variable can be used as the tail of a List type, so (List Number B ...) (a Number followed by any number of Bs) is a valid type, but (List B ... String) (any number of Bs followed by a String) is not. A dotted type variable can also be used to describe the type of a variadic function, as long as it appears after all other arguments, so (→ String B ... Void) (a function taking a String, then any number of Bs, and returning Void) is a valid type, but (→ B ... String Void) (a function taking any number of Bs, then a String, and returning Void) is not. Finally, a dotted type variable can be used to represent the last element of the tuple of returned values, for functions which return multiple values (which are described below).
When the context makes it unclear whether an ellipsis … indicates a dotted type variable, or is used to indicate a metasyntactic repetition at the level of mathematical formulas, we will write the first using τ_1 … τ_n, explicitly indicating the first and last elements, or using \overline{τ} [and we will write the second using tvar ooo]Todo
Functions in Racket can return one or several values. When the number of values returned is different from one, the result tuple can be destructured using special functions such as (call-with-values f g), which passes each value returned by f as a distinct argument to g. The special form (let-values ([(v₁ … vₙ) e]) body) binds each value returned by e to the corresponding vᵢ (the expression e must produce exactly n values). The type of a function returning multiple values can be expressed using the following notation:
(→ In₁ … Inₙ (Values Out₁ … Outₘ))
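For example (divmod is a hypothetical helper; quotient, remainder and let-values are standard Racket forms):

```racket
#lang typed/racket

;; A function returning two values at once:
(: divmod (→ Integer Positive-Integer (Values Integer Integer)))
(define (divmod n d)
  (values (quotient n d) (remainder n d)))

;; let-values destructures the two returned values:
(let-values ([(q r) (divmod 7 2)])
  (list q r))
```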
Finally, predicates (functions whose results can be interpreted as booleans) can be used to gain information about the type of their argument, depending on the result. The type of a predicate can include positive and negative filters, indicated with #:+ and #:-, respectively. The type of the string? predicate is:
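A sketch of this type, using the #:+ and #:- filter notation (our rendering of the omitted display):

```racket
(→ Any Boolean : #:+ String #:- (! String))
```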
In this notation, the positive filter #:+ String indicates that when the predicate returns #true, the argument is known to be a String. Conversely, when the predicate returns #false, the negative filter #:- (! String) indicates that the input could not (!) possibly have been a string. The information gained this way allows regular conditionals based on arbitrary predicates to work like pattern-matching:
> (define (f [x : (U String Number Symbol)])
    (if (string? x)
        ; x is known to be a String here:
        (ann x String)
        ; x is known to be a Number or a Symbol here:
        (ann x (U Number Symbol))))
The propositions do not necessarily need to refer to the value as a whole, and can instead give information about a sub-part of the value. Right now, the user interface for specifying paths can only target the left and right members of cons pairs, recursively. Internally, Typed Racket supports richer paths, and the type inference can produce filters which give information about individual structure fields, or about the result of forced promises, for example.
4.1.8 Occurrence typing
Typed Racket is built atop Racket, which does not natively support pattern matching. Instead, pattern matching forms are implemented as macros, and expand to nested uses of if.
As a result, Typed Racket needs to typecheck code with the following structure:
(λ ([v : (U Number String)])
  (if (string? v)
      (string-append v ".")
      (+ v 1)))
In this short example, the type of v is a union type including Number and String. After applying the string? predicate, the type of v is narrowed down to String in the then branch, and it is narrowed down to Number in the else branch. The type information gained thanks to the predicate comes from the filter part of the predicate’s type (as explained in Functions).
Occurrence typing only works on immutable variables and values. Indeed, if the variable is modified with set!, or if the subpart of the value stored within which is targeted by the predicate is mutable, it is possible for that value to change between the moment the predicate is executed, and the moment when the value is actually used. This places a strong incentive to mostly use immutable variables and values in Typed Racket programs, so that pattern-matching and other forms work well.
In principle, it is always possible to copy the contents of a mutated variable to a temporary one (or copy a mutable subpart of the value to a new temporary variable), and use the predicate on that temporary copy. The code in the then and else branches should also use the temporary copy, to benefit from the typing information gained via the predicate. In our experience, however, it seems that most macros which perform tasks similar to pattern-matching do not provide an easy means to achieve this copy. It therefore remains more practical to avoid mutation altogether when possible.
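A sketch of this workaround, assuming the mutable state lives in a box (the function name is ours):

```racket
#lang typed/racket

;; Hypothetical example: the box's contents may be mutated at any time,
;; so we copy them to an immutable local binding before testing.
(define (summarise [b : (Boxof (U Number String))]) : String
  (let ([v (unbox b)])            ; immutable copy of the mutable content
    (if (string? v)
        (string-append v "!")     ; v is narrowed to String here
        (number->string v))))     ; and to Number here
```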
4.1.9 Recursive types
Typed Racket allows recursive types, both via (possibly mutually-recursive) named declarations, and via the Rec type operator.
In the following examples, the types Foo and Bar are mutually recursive. The type Foo matches lists with an even number of elements alternating between Integer and String, starting with an Integer.
(define-type Foo (Pairof Integer Bar)) (define-type Bar (Pairof String (U Foo Null)))
This same type could alternatively be defined using the Rec operator. The notation (Rec R T) builds the type T, where occurrences of R are interpreted as recursive occurrences of T itself.
(Rec R (Pairof Integer (Pairof String (U R Null))))
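For instance, a value can be annotated with this recursive type directly (a sketch):

```racket
(ann (list 1 "a" 2 "b")
     (Rec R (Pairof Integer (Pairof String (U R Null)))))
```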
4.1.10 Classes
The racket/class module provides an object-oriented system for Racket. It supports the definition of a hierarchy of classes with single inheritance, interfaces with multiple inheritance, mixins and traits (methods and fields which may be injected at compile-time into other classes), method renaming, and other features.
The typed/racket/class module makes most of the features of racket/class available to Typed Racket. In particular, it defines the following type operators:
Class is used to describe the features of a class, including the signature of its constructor, as well as the public fields and methods exposed by the class. We will note that a type expressed with Class does not mention the name of the class. Any concrete implementation which exposes all (and only) the required methods, fields and constructor will inhabit the type. In other words, the types built with Class are structural, not nominal.
Object is used to describe the methods and fields which an already-built object bears.
The (Instance (Class …)) type is a shorthand for the Object type of instances of the given class type. It can be useful to describe the type of an instance of a class without repeating all the fields and methods (which could have been declared elsewhere).
In types described using Class and Instance, it is possible to omit fields which are not relevant. These fields get grouped under a single row polymorphic type variable. A row polymorphic function can, for example, accept a class with some existing fields, and produce a new class extending the existing one:
> (: add-my-field (∀ (r #:row)
                     (-> (Class (field [existing Number]) #:row-var r)
                         (Class (field [existing Number] [my-field String]) #:row-var r))))
> (define (add-my-field parent%)
    (class parent% (super-new) (field [my-field : String "Hello"])))

The small snippet of code above defines a function add-my-field which accepts a parent% class exporting at least the existing field (and possibly other fields and methods). It then creates and returns a subclass of the given parent% class, extended with the my-field field.
We consider the following class, with the required existing field, and a supplementary other field:
> (define a-class%
    (class object% (super-new) (field [existing : Integer 0] [other : Boolean #true])))

When passed to the add-my-field function, the row type variable is implicitly instantiated with the field [other Boolean]. The result of that function call is therefore a class with the three fields existing, my-field and other.
> (add-my-field a-class%) - : (Class (field (existing Number) (my-field String) (other Boolean)))
#<class:add-my-field>
These mechanisms can be used to perform reflective operations on classes like adding new fields and methods to dynamically-created subclasses, in a type-safe fashion.
The Row operator can be used along with row-inst to explicitly instantiate a row type variable to a specific set of fields and methods. The following call to add-my-field is equivalent to the preceding one, but does not rely on the automatic inference of the row type variable.
> ({row-inst add-my-field (Row (field [other Boolean]))} a-class%) - : (Class (field (existing Number) (my-field String) (other Boolean)))
#<class:add-my-field>
We will not further describe this object system here, as our work does not rely on this aspect of Typed Racket’s type system. We invite the curious reader to refer to the documentation for racket/class and typed/racket/class for more details.
We will simply note one feature which is so far missing from Typed Racket’s object system: immutability. It is not possible yet to indicate in the type of a class that a field is immutable, or that a method is pure (in the sense of always returning the same value given the same input arguments). The absence of immutability means that occurrence typing does not work on fields. After testing the value of a field against a predicate, it is not possible to narrow the type of that field, because it could be mutated by a third party between the check and future uses of the field.
4.1.11 Local type inference
Typed Racket’s type system is rich and contains many features. Among other things, it mixes polymorphism and subtyping, which notoriously make typechecking difficult.
The authors of Typed Racket claim that global type inference often produces indecipherable error messages, with a small change having repercussions on the type of terms located in other distant parts of the program.
The authors of Typed Racket suggest that type annotations are often beneficial as documentation. Since the type annotations are checked at compile-time, they additionally will stay synchronised with the code that they describe, and will not become out of date.
Instead of relying on global type inference, Typed Racket uses local type inference to determine the type of local variables and expressions. Typed Racket’s type inference can also determine the correct instantiation for most calls to polymorphic functions. It however requires type annotations in some places. For example, it is usually necessary to indicate the type of the parameters when defining a function.
4.2 Formal semantics for part of Typed Racket’s type system
\def\ifmathjax#1{#1}\def\iflatex#1{}
The following definitions and rules are copied and adjusted from (Tobin-Hochstadt 2010), with the author’s permission. Some of the notations were changed to use those of (Kent et al. 2016).
We include below the grammar, semantics and typing rules related to the minimal core of the Typed Racket language17 The core language is defined in (Tobin-Hochstadt 2010), pp. 61–70., dubbed λ_{\mathit{TS}}, including extensions which add pairs18 The extensions needed to handle pairs are described in (Tobin-Hochstadt 2010), pp. 71–75., functions of multiple arguments, variadic functions and variadic polymorphic functions19 The extensions needed to handle functions of multiple arguments, variadic functions, and variadic functions where the type of the “rest” arguments are not uniform are described in (Tobin-Hochstadt 2010), pp. 91–97., intersection types, recursive types, symbols and promises. These features have been informally described in Overview of Typed Racket’s type system.
We purposefully omit extensions which allow advanced logic reasoning when propagating information gained by complex combinations of conditionals2020 The extensions which allow advanced logic reasoning are described in (Tobin-Hochstadt 2010), pp. 75–78., refinement types2121 The extensions which introduce refinement types are described in (Tobin-Hochstadt 2010), pp. 85–89., dependent refinement types2222 Dependent refinement types are presented in (Kent et al. 2016). (which allow using theories from external solvers to reason about values and their type, e.g. using bitvector theory to ensure that a sequence of operations does not produce a result exceeding a certain machine integer size), structs and classes. These extensions are not relevant to our work2323 We informally describe a translation of our system of records into structs in section [[??]]Todo, but settle for an alternative implementation in section [[??]]Todo which does not rely on structs., and their inclusion in the following semantics would needlessly complicate things.
4.2.1 Notations
We note a sequence of elements with \overrightarrow{y}. When there is more than one sequence involved in a rule or equation, we may use the notation \overrightarrow{y}\overset{\scriptscriptstyle\,\ifmathjax{\raise1mu{n}}\iflatex{n}}{\vphantom{y}} to indicate that there are n elements in the sequence. Two sequences can be forced to have the same number of elements in that way. We represent a set of elements (an “unordered” sequence) with the notation \def\overrightbracedarrow#1{\overset{\scriptscriptstyle{\raise1mu{\{}}}{\vphantom{#1}}\overrightarrow{#1}\overset{\scriptscriptstyle{\raise1mu{\}}}}{\vphantom{#1}}}\overrightbracedarrow{y}. The use of ellipses in α \mathbf{\ldots{}} does not indicate the repetition of α. Instead, it indicates that α is a variadic polymorphic type variable: a placeholder for zero or more types which will be substituted for occurrences of α when the polymorphic type is instantiated. These ellipses appear as such in the Typed Racket source code, and are the reason we use the notation \overrightarrow{y} to indicate repetition, instead of the ellipses commonly used for that purpose. Finally, an empty sequence of repeated elements is sometimes noted ϵ.
The judgements are written following the usual convention, where Γ is the environment which associates variables to their type. The Δ environment contains the type variables within scope, and is mostly used to determine the validity of types.
The environments can be extended as follows:
The typing information \mathrm{R} associated with an expression contains the type of the expression, as well as aliasing information and other propositions which are known to be conditionally true depending on the value of the expression at run-time. These pieces of information are described in more detail below. Since the typing information \mathrm{R} is often inlined in the typing judgement, a typing judgement will generally have the following form:
In this notation, the + and - signs in \phi{}^+ and \phi{}^- are purely syntactical, and serve to distinguish the positive and negative filters, which are instances of the nonterminal \phi.
The various nonterminals used throughout the language are written in italics and are defined using the notation:
Additionally, a symbol assigned to a nonterminal may be used as a placeholder in rules and definitions, implicitly indicating that the element used to fill in that placeholder should be an instance of the corresponding nonterminal. When multiple such placeholders are present in a rule, they will be subscripted to distinguish between the different occurrences. The subscripts i, j, k and l are often used for repeated sequences.
In later chapters, we extend already-defined non-terminals using the notation:
Typing rules and other rules are described following the usual natural deduction notation.
Metafunctions (i.e. functions which operate on types as syntactical elements, or on other terms of the language) are written in a roman font. The meta-values \mathrm{true} and \mathrm{false} indicate logical truth and falsehood respectively.
Language operators are written in bold face:
Values are written using a bold italic font:
Type names start with a capital letter, and are written using a bold font:
We indicate the syntactical substitution of y with z in w using the notation w[y↦z]. When given several elements to replace, the substitution operator performs a parallel substitution (that is, w[x↦y\ y↦z] will not replace the occurrences of y introduced by the first substitution).
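As an illustration (on hypothetical terms, using the notation just defined), a parallel substitution performs both replacements simultaneously on the original term:

```latex
% Parallel substitution: the occurrence of y introduced by the
% replacement x ↦ y is not itself rewritten by y ↦ z.
(x\ y)[x↦y\ y↦z] = (y\ z) \quad\text{and not}\quad (z\ z)
```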
[Succinctly describe the other conventions used in the thesis, if any were omitted above.]Todo
[Define the meta substitution and equality operators precisely.]Todo
4.2.2 Names and bindings
In the following sections, we assume that all type variable names which occur in binding positions are unique. This assumption could be made irrelevant by explicitly renaming, in the rules below, all type variables to fresh unique ones. Performing this substitution would be a way of encoding a notion of scope and the possibility for one identifier to hide another. However, the Racket language features macros which routinely produce new binding forms. The macro system in Racket is a hygienic one, and relies on a powerful model of the notion of scope (Flatt 2016). Extending the rules below with a simplistic model of Racket’s notion of scope would not do justice to the actual system, and would needlessly complicate the rules. Furthermore, Typed Racket only typechecks fully-expanded programs. In these programs, the binding of each identifier has normally been determined2424 Typed Racket actually still determines the binding for type variables by itself, but we consider this an implementation detail..
4.2.3 Expressions
The following expressions are available in the subset of Typed Racket which we consider. These expressions include references to variables, creation of basic values (numbers, booleans, lists of pairs ending with \textbf{$\mathord{{\bfit \text{null}}}$}, symbols, promises), a variety of lambda functions with different handling of rest arguments (fixed number of arguments, polymorphic functions with a uniform list of rest arguments and variadic polymorphic functions, as well as polymorphic abstractions), a small sample of primitive functions which are part of Racket’s library and a few operations manipulating these values (function application and polymorphic instantiation, forcing promises, symbol comparison and so on).
Symbol literals are noted as s \in{} \mathcal{S} and the universe of symbols (which includes symbol literals and fresh symbols created via (\textbf{gensym})) is noted as s{}’ \in{} \mathcal{S}{}’.
4.2.4 Primitive operations (library functions)
Racket offers a large selection of library functions, which we consider as primitive operations. A few of these are listed below, and their type is given later, once the type system has been introduced. \textit{number?}, \textit{pair?} and \textit{null?} are predicates for the corresponding type. \textit{car} and \textit{cdr} are accessors for the first and second elements of a pair, which can be created using \textit{cons}. The \textit{identity} function returns its argument unmodified, and \textit{add1} returns its numeric argument plus 1. These last two functions are simply listed as examples.
4.2.5 Values
These expressions and primitive functions may produce or manipulate the following values:
The \textbf{$\mathord{{\bfit \text{list}}}$} value notation is defined as a shorthand for a \textbf{$\mathord{{\bfit \text{null}}}$}-terminated linked list of pairs.
4.2.6 Evaluation contexts
The operational semantics given below rely on the following evaluation contexts:
4.2.7 Typing judgement
The Γ ; Δ ⊢ e : \mathrm{R} typing judgement indicates that the expression e has type τ. The Γ typing environment maps variables to their type (and to extra information), while the Δ environment stores the polymorphic type variables, variadic polymorphic type variables and recursive type variables which are in scope.
Additionally, the typing judgement indicates a set of propositions \phi{}^- which are known to be true when the run-time value of e is \textbf{$\mathord{{\bfit \text{false}}}$}, and a set of propositions \phi{}^+ which are known to be true when the run-time value of e is \textbf{$\mathord{{\bfit \text{true}}}$}2525 Any other value is treated in the same way as \textbf{$\mathord{{\bfit \text{true}}}$}, as values other than \textbf{$\mathord{{\bfit \text{false}}}$} are traditionally considered as true in languages of the Lisp family.. The propositions will indicate that the value of a separate variable belongs (or does not belong) to a given type. For example, the \phi{}^- proposition \mathbf{Number}_y indicates that when e evaluates to \textbf{$\mathord{{\bfit \text{false}}}$}, the variable y necessarily holds a number.
Finally, the typing judgement can indicate with o that the expression e is an alias for a sub-element of another variable in the environment. For example, if the object o is \mathrm{car} :: \mathrm{cdr}(y), it indicates that the expression e produces the same value that (car (cdr y)) would, i.e. that it returns the second element of a (possibly improper) list stored in y.
Readers familiar with abstract interpretation can compare the \phi propositions to the Cartesian product of the abstract domains of pairs of variables. A static analyser can track the possible pairs of values contained in pairs of distinct variables, and will represent this information using an abstract domain indicating which combinations of values are possible, and which are not. Occurrence typing similarly exploits the fact that the type of other variables may depend on the value of the expression e.
4.2.8 Types
Typed Racket handles the types listed below. Aside from the top type (⊤) which is the supertype of all other types, this list includes singleton types for numbers, booleans, symbols and the \textbf{$\mathord{{\bfit \text{null}}}$} value. The types \mathbf{Number} and \mathbf{Symbol} are the infinite unions of all number and symbol singletons, respectively. Also present are function types (with fixed arguments, homogeneous rest arguments and the variadic polymorphic functions which accept heterogeneous rest arguments, as well as polymorphic abstractions), unions of other types, intersections of other types, the type of pairs and promises. The value assigned to a variadic polymorphic function’s rest argument will have a type of the form (\textbf{List}\ τ \mathbf{\ldots{}}_{α}). Finally, Typed Racket allows recursive types to be described with the \textbf{Rec} combinator.
Additionally, the \mathbf{Boolean} type is defined as the union of the \textbf{True} and \textbf{False} singleton types, and the \textbf{List} type operator is a shorthand for describing the type of \textbf{$\mathord{{\bfit \text{null}}}$}-terminated heterogeneous linked lists of pairs, with a fixed length. The \textbf{Listof} type operator is a shorthand for describing the type of \textbf{$\mathord{{\bfit \text{null}}}$}-terminated homogeneous linked lists of pairs, with an unspecified length.
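As a sketch, using the pair type noted ❬τ, σ❭, the \textbf{Null} type and the \textbf{Rec} combinator introduced above, these shorthands unfold as follows:

```latex
% List: fixed length, heterogeneous, hence nested pairs;
% Listof: unspecified length, homogeneous, hence a recursive type.
(\textbf{List}\ τ_1\ τ_2) = ❬τ_1{,}\ ❬τ_2{,}\ \textbf{Null}❭❭
(\textbf{Listof}\ τ) = (\textbf{Rec}\ r\ (\textbf{$\cup$}\ ❬τ{,}\ r❭\ \textbf{Null}))
```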
4.2.9 Filters (value-dependent propositions)
The filters associated with an expression are a set of positive (resp. negative) propositions which are valid when the expression is true (resp. false).
These propositions indicate that a specific subelement of a location has a given type.
The location can be a variable, or the special \bullet{} token, which denotes a function’s first parameter, when the propositions are associated with that function’s result. This allows us to express relations between the output of a function and its input, without referring to the actual name of the parameter, which is irrelevant. In other words, \bullet{} occurs in an α-normal form of a function’s type.
Objects, which represent aliasing information, can either indicate that the expression being considered is an alias for a sub-element of a variable, or that no aliasing information is known.
4.2.10 Objects (aliasing information)
Sub-elements are described via a chain of path elements which are used to access the sub-element starting from the variable.
4.2.11 Paths
The path concatenation operator :: is associative. The empty path ϵ is omitted from paths with one or more elements, so we write car::cdr instead of car::cdr::ϵ.
4.2.12 Path elements
Path elements can be \mathrm{car} and \mathrm{cdr}, to indicate access to a pair’s first or second element, and \mathrm{force}, to indicate that the proposition or object targets the result obtained after forcing a promise. We note that this is only sound if forcing a promise always returns the same result (otherwise, the propositions and object which held for a former result may not hold for a new one). Racket features promises which do not cache their result; these could return a different result each time they are forced, by relying on external state. However, forcing a promise is generally assumed to be an idempotent operation, and not respecting this implicit contract in production code would be bad practice. Typed Racket disallows non-cached promises altogether. We introduced a small module, delay-pure, which allows the safe creation of non-cached promises. delay-pure restricts the language to a small subset of functions and operators which are known not to perform any mutation, and prevents access to mutable variables. This ensures that the promises created this way always produce the same value, without the need to actually cache their result.
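The caching behaviour of regular Racket promises can be observed directly; this sketch (plain Racket, not part of the formalised subset) shows why forcing a cached promise is idempotent even when the delayed body performs mutation:

```racket
#lang racket
;; The body of a cached promise runs at most once: the first force
;; computes and stores the result, later forces return it as-is.
(define counter 0)
(define p (delay (begin (set! counter (add1 counter)) counter)))
(force p) ; → 1
(force p) ; → 1: the result is cached, the body is not re-run
counter   ; → 1: the side effect happened only once
```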
4.2.13 Subtyping
The subtyping judgement is ⊢ τ \mathrel{≤:} σ. It indicates that τ is a subtype of σ (or that τ and σ are the same type).
The {\mathrel{≤:}} relation is reflexive and transitive. When two or more types are all subtypes of each other, they form an equivalence class. They are considered different notations for the same type, and we note ⊢ τ \mathrel{=:} σ, whereas ⊢ τ \mathrel{\neq:} σ indicates that τ and σ are not mutually subtypes of each other (but one can be a strict subtype of the other).
The ⊥ type is a shorthand for the empty union (\cup). It is a subtype of every other type, and is not inhabited by any value. {\rm S\text{-}B{\small O}{\small T}\text{-}S{\small U}{\small B}} can be derived from {\rm S\text{-}B{\small O}{\small T}} and {\rm S\text{-}U{\small N}{\small I}{\small O}{\small N}S{\small U}{\small B}}, by constructing an empty union.
The singleton types (\textbf{Num}\ n) and (\textbf{Symbol}\ s) which are only inhabited by their literal counterpart are subtypes of the more general \mathbf{Number} or \mathbf{Symbol} types, respectively.
The following subtyping rules are concerned with function types and polymorphic types:
[{\rm S\text{-}P{\small O}{\small L}{\small Y}D\text{-}{\small Α}\text{-}E{\small Q}{\small U}{\small I}{\small V}} should use the substitution for a polydot (subst-dots?), not the usual subst.]Todo
[check the {\rm S\text{-}DF{\small U}{\small N}\text{-}F{\small U}{\small N}{\small *}} rule.]Todo
The following rules are concerned with recursive types built with the Rec combinator. The {\rm S\text{-}R{\small E}{\small C}W{\small R}{\small A}{\small P}} rule allows considering \mathbf{Number} a subtype of (\textbf{Rec}\ r\ \mathbf{Number}) for example (i.e. applying the recursive type combinator to a type which does not refer to r is a no-op), but it also allows deriving ⊢ (\textbf{Rec}\ r\ (\textbf{$\cup$}\ \ifmathjax{\hspace{-0.2ex}\unicode{x276C}\hspace{-0.2ex}}\iflatex{❬}τ{\mathbf{,}} r\ifmathjax{\hspace{-0.2ex}\unicode{x276D}\hspace{-0.2ex}}\iflatex{❭}\ \textbf{Null})) \mathrel{≤:} (\textbf{$\cup$}\ \ifmathjax{\hspace{-0.2ex}\unicode{x276C}\hspace{-0.2ex}}\iflatex{❬}τ{\mathbf{,}} ⊤\ifmathjax{\hspace{-0.2ex}\unicode{x276D}\hspace{-0.2ex}}\iflatex{❭}\ \textbf{Null}). The {\rm S\text{-}R{\small E}{\small C}E{\small L}{\small I}{\small M}} rule has the opposite effect, and is mainly useful to “upcast” members of a union containing r. It allows deriving ⊢ \textbf{Null} \mathrel{≤:} (\textbf{Rec}\ r\ (\textbf{$\cup$}\ \ifmathjax{\hspace{-0.2ex}\unicode{x276C}\hspace{-0.2ex}}\iflatex{❬}τ{\mathbf{,}} r\ifmathjax{\hspace{-0.2ex}\unicode{x276D}\hspace{-0.2ex}}\iflatex{❭}\ \textbf{Null})). The rules {\rm S\text{-}R{\small E}{\small C}S{\small T}{\small E}{\small P}} and {\rm S\text{-}R{\small E}{\small C}U{\small N}S{\small T}{\small E}{\small P}} allow unraveling a single step of the recursion, or assimilating such an unraveled step as part of the recursive type.
[TODO: renamings]Todo
The rules below describe how union and intersection types compare.
Finally, promises are handled by comparing the type that they produce when forced, and pairs are compared pointwise. Dotted list types, which usually represent the type of the value assigned to a variadic polymorphic function’s “rest” argument, are compared in a similar pointwise fashion.
4.2.14 Operational semantics
The semantics for the simplest expressions and for primitive functions are expressed using δ-rules.
The {\rm E\text{-}C{\small O}{\small N}{\small T}{\small E}{\small X}{\small T}} rule indicates that when the expression has the shape E[L], the subpart L can be evaluated and replaced by its result. The syntax E[L] indicates the replacement of the only occurrence of [\cdot{}] within an evaluation context E. The evaluation context can then match an expression with the same shape, thereby separating the L part from its context.
The next rules handle β-reduction for the various flavours of functions.
Instantiation of polymorphic abstractions is a no-op at run-time, because Typed Racket performs type erasure (no typing information subsists at run-time, aside from the implicit tags used to distinguish the various primitive data types: pairs, numbers, symbols, \textbf{$\mathord{{\bfit \text{null}}}$}, \textbf{$\mathord{{\bfit \text{true}}}$}, \textbf{$\mathord{{\bfit \text{false}}}$}, functions and promises).
For simplicity, we assume that promises only contain pure expressions, and therefore that the expression always produces the same value (modulo object identity, i.e. pointer equality issues). In practice, Typed Racket allows expressions relying on external state, and caches the value obtained after forcing the promise for the first time. The subset of the language which we present here does not contain any mutable value, and does not allow mutation of variables either, so the expression wrapped by promises is, by definition, pure. We note here that we implemented a small library for Typed Racket which allows the creation of promises encapsulating a pure expression, and whose result is not cached.
Once the evaluation context rule has been applied to evaluate the condition of an \textbf{if} expression, the evaluation continues with the then branch or with the else branch, depending on the condition’s value. Following Lisp tradition, all values other than \textbf{$\mathord{{\bfit \text{false}}}$} are interpreted as a true condition.
The \textbf{gensym} expression produces a fresh symbol s{}’ \in{} \mathcal{S}{}’, which is guaranteed to be different from all symbol literals s \in{} \mathcal{S}, and different from all previous and future symbols returned by \textbf{gensym}. The \textbf{eq?} operator can then be used to compare symbol literals and symbols produced by \textbf{gensym}.
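In plain Racket, this behaviour of \textbf{eq?} and \textbf{gensym} can be observed as follows:

```racket
#lang racket
;; Two occurrences of the same symbol literal are interned and
;; therefore eq?, whereas gensym returns a fresh, unique symbol.
(eq? 'foo 'foo)          ; → #t
(define g (gensym))
(eq? g g)                ; → #t: a gensym-ed symbol is eq? to itself
(eq? g 'foo)             ; → #f
(eq? (gensym) (gensym))  ; → #f: each call produces a fresh symbol
```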
The semantics of \textbf{map} are standard. We note here that \textbf{map} can also be used as a first-class function in Typed Racket, and the same can be achieved with the simplified semantics using the η-expansion \mathbf{\Lambda{}}(α\ β).{\bfit λ}(x_f:(α\ {\mathbf{\rightarrow{}}}\ {\mathbf{\ifmathjax{\unicode{x2772}}\iflatex{❲}}}β \;{\mathbf{;}}\; ϵ {\mathbf{{/}}} ϵ \;{\mathbf{;}}\; \emptyset{}{\mathbf{\ifmathjax{\unicode{x2773}}\iflatex{❳}}})x_l:(\textbf{Listof}\ α)).(\textbf{map}\ x_f\ x_l) of the \textbf{map} operator.
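The following Typed Racket snippet (the helper name apply-to-list is ours) shows map used both directly and through a first-class functional argument, the situation which the η-expansion above models:

```racket
#lang typed/racket
;; Direct use of map.
(map add1 '(1 2 3)) ; → '(2 3 4)

;; map used inside a polymorphic helper: the instantiation of map
;; is resolved from the helper's own type variables.
(: apply-to-list (All (a b) (-> (-> a b) (Listof a) (Listof b))))
(define (apply-to-list f l)
  (map f l))
(apply-to-list (λ ([n : Number]) (* n n)) '(1 2 3)) ; → '(1 4 9)
```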
4.2.15 Type validity rules
Polymorphic type variables are valid types if they are bound, that is, if they are present in the Δ environment. Additionally, variadic (i.e. dotted) polymorphic type variables may be present in the environment. When this is the case, they can be used as part of a (\textbf{List}\ τ \mathbf{\ldots{}}_{α}) type.
The following rules indicate that function types are valid if their use of polymorphic type variables is well-scoped.
The following rule indicates that types built using the recursive type combinator \textbf{Rec} are valid if their use of the recursive type variable r is well-scoped.
The next rules are trivial, and state that the base types are valid, or simply examine validity pointwise for unions, intersections, pairs, promises and filters.
4.2.16 Typing rules
The rules below relate the simple expressions to their type.
Below are the rules for the various flavours of lambda functions and polymorphic abstractions.
[Should the φ⁺ φ⁻ o be preserved in {\rm T\text{-}TA{\small B}{\small S}} and {\rm T\text{-}DTA{\small B}{\small S}{\small ?}}]Todo
The \vphantom{\phi}{|}_{{x}↦{z}} operation restricts the information contained within a \phi or o so that the result only contains information about the variable x, and renames it to z. When applied to a filter \phi, it corresponds to the \operatorname{abo} and \operatorname{apo} operators from (Tobin-Hochstadt 2010), pp. 65,75.
The ⊥ cases of the \operatorname{apo} operator from (Tobin-Hochstadt 2010), pp. 65,75 are covered by the corresponding cases in the \operatorname{restrict} and \operatorname{remove} operators defined below, and therefore should not need to be included in our \vphantom{\phi}{|}_{{x}↦{z}} operator.
The map function can be called like any other function, but also has a specific rule allowing a more precise result type when mapping a polymorphic function over a (\textbf{List}\ τ \mathbf{\ldots{}}_{α}). ormap, andmap and apply similarly have special rules for variadic polymorphic lists. We include the rule for map below as an example. The other rules are present in (Tobin-Hochstadt 2010), p. 96.
Below are the typing rules for the various flavours of function application and instantiation of polymorphic abstractions.
[For the inst rules, are the φ⁺ φ⁻ o preserved?]Todo
The rule for \textbf{if} uses the information gained in the condition to narrow the type of variables in the then and else branches. This is the core rule of the occurrence typing aspect of Typed Racket.
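A minimal Typed Racket example of this rule in action (the function name f is ours):

```racket
#lang typed/racket
;; In the then branch, the (number? x) condition narrows x from
;; (U Number String) to Number, so add1 typechecks; in the else
;; branch, x is narrowed to String.
(: f (-> (U Number String) Number))
(define (f x)
  (if (number? x)
      (add1 x)  ; x : Number here
      0))       ; x : String here
(f 41)  ; → 42
(f "s") ; → 0
```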
The Γ + \phi operator narrows the type of variables in the environment using the information contained within \phi.
The \operatorname{update} operator propagates the information contained within a ψ down to the affected part of the type.
The \operatorname{update} operator can then apply \operatorname{restrict} to perform the intersection of the type indicated by the \textbf{if}’s condition and the currently-known type for the subpart of the considered variable.
[How do \operatorname{restrict} and \operatorname{remove} behave on intersections?]Todo
The \operatorname{update} operator can also apply the \operatorname{remove} operator when the condition determined that the subpart of the considered variable is not of a given type.
[Δ is not available here.]Todo [The non-nested use of σ is not quite correct syntactically speaking]Todo
The \operatorname{restrict} operator and the \operatorname{simplify} operator described later both rely on \operatorname{no-overlap} to determine whether two types have no intersection, aside from the ⊥ type.
The \operatorname{combinefilter} operator is used to combine the \phi information from the two branches of the \textbf{if} expression, based on the \phi information of the condition. This allows Typed Racket to correctly interpret and and or, as well as other boolean operations, which are implemented as macros translating down to nested if conditionals. Typed Racket will therefore be able to determine that in the then branch of (if (and (string? x) (number? y)) 'then 'else), the type of x is String, and the type of y is Number. We only detail a few cases of combinefilter here; a more complete description of the operator can be found in (Tobin-Hochstadt 2010), pp. 69,75–84.
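A sketch of that example as a complete program (the names g, x and y are ours):

```racket
#lang typed/racket
;; (and p q) expands to nested ifs; combining the filters of both
;; conditions narrows x and y simultaneously in the then branch.
(: g (-> Any Any Symbol))
(define (g x y)
  (if (and (string? x) (number? y))
      (begin
        (void (string-length x)) ; x : String here
        (void (add1 y))          ; y : Number here
        'then)
      'else))
(g "hello" 42) ; → 'then
(g "hello" #f) ; → 'else
```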
4.2.17 Simplification of intersections
In some cases, intersections are simplified, and the eventual resulting ⊥ types propagate outwards through pairs (and structs, which we do not model here). The \operatorname{simplify} and \operatorname{propagate_⊥} operators show how these simplification and propagation steps are performed. The simplification step mostly consists in distributing intersections over unions and pairs, and collapsing pairs and unions which contain ⊥, for the traversed parts of the type.
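For example (a hypothetical instance of these steps), intersecting a union with \mathbf{Number} distributes the intersection over the union, collapses the non-overlapping branch to ⊥, and simplifies:

```latex
% Distribution, then collapse of the empty branch: Symbol and
% Number have no overlap, so their intersection is ⊥, which
% disappears from the union.
(\textbf{$\cup$}\ \mathbf{Number}\ \mathbf{Symbol}) \cap \mathbf{Number}
  \rightsquigarrow (\textbf{$\cup$}\ (\mathbf{Number} \cap \mathbf{Number})\ (\mathbf{Symbol} \cap \mathbf{Number}))
  \rightsquigarrow (\textbf{$\cup$}\ \mathbf{Number}\ ⊥)
  \rightsquigarrow \mathbf{Number}
```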
\operatorname{simplify}(τ) is applied pointwise in other cases:
[Apply the intersections on substituted poly types after an inst (or rely on the sutyping rule for intersections to recognise that ⊥ is a subtype of the resulting type?)]Todo.
4.2.18 δ-rules
Finally, the type and semantics of primitive functions are expressed using the δ-rules given below.
4.2.19 Soundness
Since Typed Racket is an existing language which we use for our implementation, and not a new language, we do not provide here a full proof of correctness.
We invite instead the interested reader to refer to the proof sketches given in (Tobin-Hochstadt 2010), pp. 68–84. These proof sketches only cover a subset of the language presented here, namely a language with variables, function application, functions of a single argument, pairs, booleans, numbers and \textbf{if} conditionals with support for occurrence typing. Since occurrence typing is the main focus of the proof, the other extensions aggregated here should not significantly threaten its validity.
5 Extensible type system and algebraic datatypes
\def\ifmathjax#1{#1}\def\iflatex#1{}
5.1 Extensible type systems via type-level macros
\def\ifmathjax#1{#1}\def\iflatex#1{}
5.2 Extension of Typed Racket with algebraic datatypes and row polymorphism
\def\ifmathjax#1{#1}\def\iflatex#1{}
We extend the formalisation from (Tobin-Hochstadt 2010), pp. 62, 72 and 92.
5.2.1 Notations
We use the same notations as in Formal semantics for part of Typed Racket’s type system. Additionally, we use ρ_{c} to denote a row type variable abstracting over a set of constructors, and we use \varrho{}_{f} to denote a row type variable abstracting over a set of fields. The occurrences of c and f in this context are purely syntactical, and only serve the purpose of distinguishing between the two notations — the one for constructors, and the one for fields.
We define the universe of constructor names 𝒞 as being equivalent to the set of strings of Unicode characters, and the universe of field names ℱ likewise (the distinction resides only in their intended use). Constructor and field names are compile-time constants, i.e. they are written literally in the program source.
5.2.2 Types (with ρ)
\def\ifmathjax#1{#1}\def\iflatex#1{}
We introduce two new sorts of types: constructors and records. Constructors are similar to a Typed Racket pair whose left element is a symbol, used as a tag. A constructor’s tag does not need to be declared beforehand, and can be used on-the-fly. This makes these constructors very similar to the constructors used in OCaml’s polymorphic variants (Minsky et al. 2013a), chapter 6. Records are similar to the structs available in Typed Racket, but the accessor for a field is not prefixed by the name of the struct introducing that field. This means that the same accessor can be used to access the field in all records which contain that field, whereas a different accessor would be needed for each struct type in Typed Racket. We also introduce row-polymorphic abstractions, which allow a single row type variable to range over several constructors or fields.
We further define variants as a subset of the unions allowed by Typed Racket (including unions of the constructors defined above). Variants are equivalent to the union of their cases, but guarantee that pattern matching can always be performed (for example, it is not possible in Typed Racket to distinguish the values of two function types present in the same union, and it is therefore impossible to write a pattern matching expression which handles the two cases differently). Additionally, constraints on the absence of some constructors in the row type can be specified on variants.
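The impossibility of distinguishing two function types at run time can be seen in plain Typed Racket (the function name h is ours):

```racket
#lang typed/racket
;; procedure? recognises both members of the union, and no run-time
;; predicate can tell the two function types apart, so pattern
;; matching on such a union cannot dispatch between the two cases.
(: h (-> (U (-> Number Number) (-> String String)) Symbol))
(define (h f)
  (if (procedure? f)
      'some-function ; both cases of the union land here
      'unreachable))
```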
Similarly, records can contain a set of fields. It is also possible to use a row type variable ranging over fields. The syntax -a indicates a field which could be present in the row but will not be present in the record type, and the syntax +a:τ indicates a field which will always be present, as well as its type.
5.2.3 Expressions (with ρ)
\def\ifmathjax#1{#1}\def\iflatex#1{}
We extend the syntax of expressions in Typed Racket as defined by (Tobin-Hochstadt 2010), pp. 62, 72 and 92 and presented in Formal semantics for part of Typed Racket’s type system by adding expressions related to constructors. The first syntax builds an instance of the constructor with label \kappa and the value of e. The expression (\kappa\textbf{?} e) determines whether e is an instance of the constructor with label \kappa. The expression (\mathbf{getval}_\kappa e) extracts the value stored in an instance of a constructor.
We also introduce expressions related to records. The first builds an instance of a record with the given fields. We note that the order in which the fields appear affects the order in which the sub-expressions will be evaluated. However, in the resulting value and in its type, the order of the fields is irrelevant. The second expression tests whether e is an instance of a record with the given fields present. The third expression is similar, but allows any number of extra fields to be present, while checking that the -a{}_j fields are absent. The fourth expression accesses the value of the a field stored in the given instance. The fifth expression updates an existing record instance by adding (or replacing) the field a, while the sixth removes the a field.
Finally, we define the row-polymorphic abstractions (\textbf{$\Lambda{}$${}_{\textbf{c}}$}\ (\overrightarrow{ρ_{c}})\ e) and (\textbf{$\Lambda{}$${}_{\textbf{f}}$}\ (\overrightarrow{\varrho{}_{f}})\ e) which bind row type variables hiding constructors and fields respectively. The corresponding instantiation operators are (\textbf{@${}_{\textbf{c}}$}\ e\ \overrightarrow{\varsigma_{c}}) and (\textbf{@${}_{\textbf{f}}$}\ e\ \overrightarrow{\varsigma_{f}}).
5.2.4 Values (with ρ)
\def\ifmathjax#1{#1}\def\iflatex#1{}
5.2.5 Evaluation contexts (with ρ)
\def\ifmathjax#1{#1}\def\iflatex#1{}
5.2.6 Type validity rules (with ρ)
\def\ifmathjax#1{#1}\def\iflatex#1{}
where
5.2.7 Subtyping (with ρ)
\def\ifmathjax#1{#1}\def\iflatex#1{}
Variants which do not involve a row type are nothing more than a syntactic restriction on a union of types. The variant only allows constructors to be present in the union, as specified by the type validity rule {\rm TE\text{-}V{\small A}{\small R}{\small I}{\small A}{\small N}{\small T}}.
From {\rm S\text{-}V{\small A}{\small R}{\small I}{\small A}{\small N}{\small T}\text{-}U{\small N}{\small I}{\small O}{\small N}}, we can derive that the empty variant is equivalent to the bottom type.
[propagate our types up/down unions like the primitive ones (I think Typed Racket does not do this everywhere it “should”).]Todo
5.2.8 Path elements (with ρ)
\def\ifmathjax#1{#1}\def\iflatex#1{}
[Does this need any change when adding row typing?]Todo
We extend the metafunctions for paths given in (Tobin-Hochstadt 2010), pp. 65 and 75. The \operatorname{update} metafunction is used when using filters to restrict the type of a (subpart of a) local variable in the then and else branches of a conditional.
[∷ is not defined in (Tobin-Hochstadt 2010), we have to define it.]Todo
[How should I note cleanly these removals / replacements which refer to an a and its τ or v inside the set of a{}_i?]Todo
[Also write down the simple "update" cases for row polymorphism]Todo
where
5.2.9 Typing rules (with ρ)
[Should the filter be something else than ϵ|ϵ or is the filter inferred via other rules when the “function” does not do anything special?]Todo
\operatorname{applyfilter} is defined in (Tobin-Hochstadt 2010), p. 75.
[Copy the definition of applyfilter.]Todo
[Removing fields on the ρ should not matter if the fields are present in the main part of the record (they are implicitly not in the ρ, because they are in the main part).]Todo
5.2.10 Operational Semantics (with ρ)
Instantiation of the new sorts of polymorphic abstractions is a no-op at run-time, similarly to those of Typed Racket.
[Does this need any change when adding row typing (they don’t add any rules in (Tobin-Hochstadt 2010))?]Todo
We extend the δ relation to accept in its first position not only constant functions (members of c), but also members of families of operators indexed by a constructor label or a field label, like \kappa\textbf{?}, \mathbf{getval}_\kappa and (\textbf{record?}\ \overrightarrow{a{}_i}).
[Is it really necessary to use a δ-rule for E-Ctor-GetVal ?]Todo
[This ∖ does not make sense because we remove the label a’ from a set of label+value tuples. We must define a separate mathematical operator for removal of a label+value tuple from a set based on the label.]Todo
[what to do with the = sign? The a = v sign is syntactical, but could easily be understood as a meta comparison, instead of indicating the association between the field and the value.]Todo
5.2.11 Shorthands (with ρ)
The polymorphic builder function for the \kappa constructor which intuitively corresponds to (\textbf{ctor}\ \kappa) can be written as the η-expansion of the (\textbf{ctor}\ \kappa\ e) operator:
The same applies to the predicate form of constructors:
The same applies to the accessor for a constructor’s encapsulated value:
As a convenience, we will write (\textbf{ctor}\ \kappa), \kappa\textbf{?} and \mathbf{getval}_\kappa as a shorthand for the above lambda functions.
As per the typing rules given in Typing rules (with ρ), these functions have the following types:
[Write their types here too.]Todo
The polymorphic builder function for a record which intuitively corresponds to (\textbf{record}\ \overrightarrow{a}) can be written as the η-expansion of the (\textbf{record}\ \overrightarrow{a = e}) operator:
The same applies to the predicate forms of record types:
The same applies to the accessor for a field of a record:
[Write their types here too.]Todo
As a convenience, we will write (\textbf{record}\ \overrightarrow{a}), (\textbf{record?}\ \def\overrightbracedarrow#1{\overset{\scriptscriptstyle{\raise1mu{\{}}}{\vphantom{#1}}\overrightarrow{#1}\overset{\scriptscriptstyle{\raise1mu{\}}}}{\vphantom{#1}}}\overrightbracedarrow{a{}_i}), (\textbf{record?}\ \def\overrightbracedarrow#1{\overset{\scriptscriptstyle{\raise1mu{\{}}}{\vphantom{#1}}\overrightarrow{#1}\overset{\scriptscriptstyle{\raise1mu{\}}}}{\vphantom{#1}}}\overrightbracedarrow{a{}_i}\ \def\overrightbracedarrow#1{\overset{\scriptscriptstyle{\raise1mu{\{}}}{\vphantom{#1}}\overrightarrow{#1}\overset{\scriptscriptstyle{\raise1mu{\}}}}{\vphantom{#1}}}\overrightbracedarrow{-a{}_j}) and .a as shorthands for the above lambda functions.
6 Typed nanopass
6.1 Typed nanopass on trees
6.2 Typed nanopass on DAGs
6.3 Typed nanopass on graphs
6.4 Structural invariants
6.5 Further possible extensions
7 Examples and results
8 Conclusion and future work directions
9 Bibliography
Harold Abelson, R Kent Dybvig, Christopher T Haynes, Guillermo Juan Rozas, NI Adams, Daniel P Friedman, E Kohlbecker, GL Steele, David H Bartley, Robert Halstead, and others. Revised⁵ report on the algorithmic language Scheme. Higher-Order and Symbolic Computation 11(1), pp. 7–105, 1998.
Michael D Adams and R Kent Dybvig. Efficient nondestructive equality checking for trees and graphs. In Proc. ACM SIGPLAN Notices, 2008.
Chris Andreae, James Noble, Shane Markstrum, and Todd Millstein. A framework for implementing pluggable type systems. In Proc. ACM SIGPLAN Notices, 2006.
Daniel G. Bobrow, Linda G. DeMichiel, Richard P. Gabriel, Sonya E. Keene, Gregor Kiczales, and David A. Moon. Common Lisp Object System specification. ACM SIGPLAN Notices 23(SI), pp. 1–142, 1988. http://dl.acm.org/citation.cfm?id=885632
Ambrose Bonnaire-Sergeant, Rowan Davies, and Sam Tobin-Hochstadt. Practical optional types for Clojure. In Proc. European Symposium on Programming Languages and Systems, 2016. http://link.springer.com/chapter/10.1007/978-3-662-49498-1_4
Gilad Bracha. Pluggable Type Systems. In Proc. OOPSLA Workshop on Revival of Dynamic Languages, 2004.
Luca Cardelli and Peter Wegner. On understanding types, data abstraction, and polymorphism. ACM Computing Surveys (CSUR) 17(4), pp. 471–523, 1985. http://lucacardelli.name/papers/onunderstanding.a4.pdf
Stephen Chang, Alex Knauth, and Ben Greenman. Type systems as macros. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, 2017.
Roland Ducournau. Programmation par Objets: les concepts fondamentaux. Université de Montpellier, 2014.
Matthias Felleisen, Robert Bruce Findler, Matthew Flatt, Shriram Krishnamurthi, Eli Barzilay, Jay McCarthy, and Sam Tobin-Hochstadt. The Racket manifesto. In Proc. LIPIcs–Leibniz International Proceedings in Informatics, 2015.
Matthew Flatt. Binding as sets of scopes. ACM SIGPLAN Notices 51(1), pp. 705–717, 2016.
Matthew Flatt, Robert Bruce Findler, and Matthias Felleisen. Scheme with classes, mixins, and traits. In Proc. Asian Symposium on Programming Languages and Systems, 2006. http://link.springer.com/chapter/10.1007/11924661_17
Steven E Ganz, Amr Sabry, and Walid Taha. Macros as multi-stage computations: type-safe, generative, binding macros in MacroML. ACM SIGPLAN Notices 36(10), pp. 74–85, 2001.
Roger Hindley. The principal type-scheme of an object in combinatory logic. Transactions of the American Mathematical Society 146, pp. 29–60, 1969.
Gérard Huet. The Zipper. Journal of Functional Programming 7(5), pp. 549–554, 1997.
Andrew M Kent, David Kempe, and Sam Tobin-Hochstadt. Occurrence typing modulo theories. In Proc. ACM SIGPLAN Notices, 2016.
Alexander Köplinger, Jean-Baptiste Evain, and others. Mono.Cecil documentation. 2014.
Robin Milner. A theory of type polymorphism in programming. Journal of Computer and System Sciences 17(3), pp. 348–375, 1978.
Yaron Minsky, Anil Madhavapeddy, and Jason Hickey. Real World OCaml: functional programming for the masses. 2013a.
Yaron Minsky, Anil Madhavapeddy, and Jason Hickey. Real World OCaml: functional programming for the masses. 2013b.
James H Morris. Real programming in functional languages. Functional Programming and Its Applications: An Advanced Course, pp. 129–176, 1982.
Henrik Nilsson and Peter Fritzson. Lazy algorithmic debugging: Ideas for practical implementation. In Proc. International Workshop on Automated and Algorithmic Debugging, 1993.
Jeffrey Overbey. Immutable Source-Mapped Abstract Syntax Tree: A Design Pattern for Refactoring Engine APIs. 2013.
Zachary Palmer, Pottayil Harisanker Menon, Alexander Rozenshteyn, and Scott Smith. Types for Flexible Objects. In Proc. Programming Languages and Systems, 2014. http://link.springer.com/chapter/10.1007/978-3-319-12736-1_6
Norman Ramsey and João Dias. An Applicative Control-Flow Graph Based on Huet's Zipper. Electronic Notes in Theoretical Computer Science 148(2), pp. 105–126, 2006. http://www.sciencedirect.com/science/article/pii/S1571066106001289
D. W. Sandberg. Smalltalk and Exploratory Programming. SIGPLAN Notices 23(10), pp. 85–92, 1988.
Alex Shinn, John Cowan, Arthur A Gleckler, and others. Revised⁷ report on the algorithmic language Scheme. 2013.
Michael Sperber, R Kent Dybvig, Matthew Flatt, Anton Van Straaten, Robby Findler, and Jacob Matthews. Revised⁶ report on the algorithmic language Scheme. Journal of Functional Programming 19(S1), pp. 1–301, 2009.
Sam Tobin-Hochstadt. Typed Scheme: From scripts to programs. 2010.
Sam Tobin-Hochstadt and Matthias Felleisen. The design and implementation of Typed Scheme. ACM SIGPLAN Notices 43(1), pp. 395–406, 2008. http://dl.acm.org/citation.cfm?id=1328486
Sam Tobin-Hochstadt, Vincent St-Amour, Ryan Culpepper, Matthew Flatt, and Matthias Felleisen. Languages as libraries. In Proc. ACM SIGPLAN Notices, 2011. http://dl.acm.org/citation.cfm?id=1993514
Emina Torlak and Rastislav Bodik. Growing solver-aided languages with Rosette. In Proceedings of the 2013 ACM international symposium on New ideas, new paradigms, and reflections on programming & software, 2013. http://dl.acm.org/citation.cfm?id=2509586
Emina Torlak and Rastislav Bodik. A Lightweight Symbolic Virtual Machine for Solver-aided Host Languages. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014.
Philip Wadler. Functional Programming. SIGPLAN Notices 33(8), pp. 23–27, 1998.
A Ph.C Graph library: Implementation
This library is implemented using literate programming. The implementation details are presented in the following sections. The user documentation is in the Ph.C Graph library document.
A.1 Parametric replacement of parts of data structures
A.1.1 Introduction
This utility allows functionally updating parts of data structures. The define-fold macro takes the type of the whole data structure and a list of type names associated with their predicate. It locates all literal occurrences of those type names within the data structure, and identifies those locations as the parts to replace. The type of the whole data structure is expressed as a syntactic tree. Within that syntactic tree, only the parts which are syntactically equal to one of the types to replace are considered.
As an example, suppose the whole type is (List Foo Number (Listof String)), and Foo is defined as:
(define-type Foo (Listof String))
If Foo is given as a type to replace, and its replacement type is (Listof Symbol), then the type of the result would be (List (Listof Symbol) Number (Listof String)).
The second occurrence of (Listof String), although semantically equivalent to the type to replace, Foo, will not be altered, as it is not expressed syntactically using the Foo identifier.
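The purely syntactic nature of the replacement can be illustrated with a small Python sketch (our own, unrelated to the actual macro's implementation), where types are modelled as nested tuples of symbols:

```python
# Types as nested tuples of strings; replacement is purely syntactic.

def replace_syntactic(ty, name, replacement):
    if ty == name:                 # syntactically equal to the type name
        return replacement
    if isinstance(ty, tuple):      # recurse into the syntactic tree
        return tuple(replace_syntactic(t, name, replacement) for t in ty)
    return ty                      # any other leaf is left alone

whole = ("List", "Foo", "Number", ("Listof", "String"))
result = replace_syntactic(whole, "Foo", ("Listof", "Symbol"))
# → ("List", ("Listof", "Symbol"), "Number", ("Listof", "String"));
# the literal ("Listof", "String") is untouched, even though it is
# semantically equal to Foo.
```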
(type-name type-to-replaceᵢ ...)
is the same type as
whole-type
In other words, type-name is defined as whole-type, except that each syntactic occurrence of a type-to-replaceᵢ is replaced with the corresponding type argument Tᵢ.
It also defines function-name as a function, with the type
(∀ (Aᵢ ... Bᵢ ... Acc) (→ (?@ (→ Any Boolean : Aᵢ) (→ Aᵢ Acc (Values Bᵢ Acc))) ... (→ (type-name Aᵢ ...) Acc (Values (type-name Bᵢ ...) Acc))))
We use the ?@ notation from syntax/parse/experimental/template to indicate that the function accepts a predicate, followed by an update function, followed by another predicate, and so on. For example, the function type when there are three type-to-replaceᵢ would be:
(∀ (A₁ A₂ A₃ B₁ B₂ B₃ Acc) (→ (→ Any Boolean : A₁) (→ A₁ Acc (Values B₁ Acc)) (→ Any Boolean : A₂) (→ A₂ Acc (Values B₂ Acc)) (→ Any Boolean : A₃) (→ A₃ Acc (Values B₃ Acc)) (→ (type-name A₁ A₂ A₃) Acc (Values (type-name B₁ B₂ B₃) Acc))))
The function-name function replaces all values in the whole data structure which occur in locations corresponding to a type-to-replaceᵢ in the whole-type. It expects those values to have the type Aᵢ; that is, its input type is not restricted to whole-type: any polymorphic instance of type-name is valid. Each value is passed as an argument to the corresponding update function with type (→ Aᵢ Acc (Values Bᵢ Acc)), and the result of type Bᵢ is used as a replacement.
An accumulator value, with the type Acc, is threaded through all calls to all update functions, so that the update functions can communicate state in a functional way.
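The predicate/update/accumulator protocol can be sketched in plain Python (an illustrative, dynamically-typed analogue; the actual define-fold macro generates statically-typed code specialised to the whole-type, and the function names here are ours):

```python
def fold_replace(pred, update, value, acc):
    """Replace every value satisfying pred, threading acc through all calls."""
    if pred(value):
        return update(value, acc)          # leaf to replace: call the update
    if isinstance(value, list):            # structural case: recurse in order
        out = []
        for v in value:
            v2, acc = fold_replace(pred, update, v, acc)
            out.append(v2)
        return out, acc
    return value, acc                      # anything else is left unchanged

# Replace every integer by its string form, counting the replacements:
result, count = fold_replace(
    lambda v: isinstance(v, int),
    lambda v, acc: (str(v), acc + 1),
    [1, ["x", 2], 3],
    0)
# result = ['1', ['x', '2'], '3'], count = 3
```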
A.1.2 Implementation
A.1.2.1 Caching the results of define-fold
(define-for-syntax get-f-cache (make-parameter #f)) (define-for-syntax get-τ-cache (make-parameter #f)) (define-for-syntax get-f-defs (make-parameter #f)) (define-for-syntax get-τ-defs (make-parameter #f)) (define-for-syntax (with-folds thunk) ;; TODO: should probably use bound-id instead. (parameterize ([get-f-cache (make-mutable-free-id-tree-table)] [get-τ-cache (make-mutable-free-id-tree-table)] [get-f-defs (box '())] [get-τ-defs (box '())]) (define/with-syntax thunk-result (thunk)) (with-syntax ([([f-id f-body f-type] …) (unbox (get-f-defs))] [([τ-id . τ-body] …) (unbox (get-τ-defs))]) #`(begin (define-type τ-id τ-body) … (: f-id f-type) … (define f-id f-body) … thunk-result))))
(define-template-metafunction (!replace-in-type stx) (syntax-case stx () [(_ whole-type [type-to-replaceᵢ Tᵢ] …) #`(#,(syntax-local-template-metafunction-introduce (fold-τ #'(whole-type type-to-replaceᵢ …))) Tᵢ …)]))
(define-template-metafunction (!∀-replace-in-type stx) (syntax-case stx () [(_ whole-type type-to-replaceᵢ …) (syntax-local-template-metafunction-introduce (fold-τ #'(whole-type type-to-replaceᵢ …)))]))
(define fold-τ (syntax-parser [(whole-type:type type-to-replaceᵢ:type …) #:with rec-args #'([type-to-replaceᵢ Tᵢ] …) (cached [τ- (get-τ-cache) (get-τ-defs) #'(whole-type type-to-replaceᵢ …)] (define replacements (make-immutable-free-id-tree-table (list [cons #'type-to-replaceᵢ #'Tᵢ] …))) #`(∀ (Tᵢ …) #,(syntax-parse #'whole-type #:literals (Null Pairof Listof List Vectorof Vector U tagged) «type-cases»)))]))
(begin-for-syntax (define-syntax-rule (cached [base cache defs key] . body) (begin (unless (and cache defs) (error "fold-τ and fold-f must be called within with-folds")) (if (dict-has-key? cache key) (dict-ref cache key) (let ([base #`#,(gensym 'base)]) (dict-set! cache key base) (let ([result (let () . body)]) (set-box! defs `([,base . ,result] . ,(unbox defs))) base))))))
(define-template-metafunction (!replace-in-instance stx) (syntax-case stx () [(_ whole-type [type-to-replaceᵢ predicateᵢ updateᵢ] …) #`(#,(syntax-local-template-metafunction-introduce (fold-f #'(whole-type type-to-replaceᵢ …))) {?@ predicateᵢ updateᵢ} …)]))
(define-template-metafunction (!λ-replace-in-instance stx) (syntax-case stx () [(_ whole-type type-to-replaceᵢ …) (syntax-local-introduce (fold-f #'(whole-type type-to-replaceᵢ …)))]))
(define fold-f (syntax-parser [(whole-type:type type-to-replaceᵢ:type …) #:with rec-args #'([type-to-replaceᵢ predicateᵢ updateᵢ] …) (define replacements (make-immutable-free-id-tree-table (list [cons #'type-to-replaceᵢ #'updateᵢ] …))) (define/with-syntax args #'({?@ predicateᵢ updateᵢ} …)) (cached [f- (get-f-cache) (get-f-defs) #'(whole-type type-to-replaceᵢ …)] #`[«fold-f-proc» «fold-f-type»])]))
(λ ({?@ predicateᵢ updateᵢ} …) (λ (v acc) #,(syntax-parse #'whole-type #:literals (Null Pairof Listof List Vectorof Vector U tagged) «f-cases»)))
(∀ (Aᵢ … Bᵢ … Acc) (→ (?@ (→ Any Boolean : Aᵢ) (→ Aᵢ Acc (Values Bᵢ Acc))) … (→ (!replace-in-type whole-type [type-to-replaceᵢ Aᵢ] …) Acc (Values (!replace-in-type whole-type [type-to-replaceᵢ Bᵢ] …) Acc))))
[t #:when (dict-has-key? replacements #'t) #:with update (dict-ref replacements #'t) #'(update v acc)]
[t #:when (dict-has-key? replacements #'t) #:with T (dict-ref replacements #'t) #'T]
[(Pairof X Y) #'(let*-values ([(result-x acc-x) ((!replace-in-instance X . rec-args) (car v) acc)] [(result-y acc-y) ((!replace-in-instance Y . rec-args) (cdr v) acc-x)]) (values (cons result-x result-y) acc-y))]
[(Listof X) #'(foldl-map (!replace-in-instance X . rec-args) acc v)]
[(Vectorof X) #'(vector->immutable-vector (list->vector (foldl-map (!replace-in-instance X . rec-args) acc (vector->list v))))]
[(List X Y …) #'(let*-values ([(result-x acc-x) «f-list-car»] [(result-y* acc-y*) «f-list-cdr»]) (values (cons result-x result-y*) acc-y*))]
where the replacement is applied to the car, and to the cdr as a whole (i.e. by recursion on the whole remaining list of types):
((!replace-in-instance X . rec-args) (car v) acc)
[(U Xⱼ …) #'(dispatch-union v ([type-to-replaceᵢ Aᵢ predicateᵢ] …) [Xⱼ ((!replace-in-instance Xⱼ . rec-args) v acc)] …)]
[(tagged name [fieldⱼ (~optional :colon) Xⱼ] …) #'(tagged name [fieldⱼ : (!replace-in-type Xⱼ . rec-args)] …)]
[(tagged name [fieldⱼ (~optional :colon) Xⱼ] …) #'(let*-values ([(resultⱼ acc) ((!replace-in-instance Xⱼ . rec-args) (uniform-get v fieldⱼ) acc)] …) (values (tagged name #:instance [fieldⱼ resultⱼ] …) acc))]
[else-T #'else-T]
[else-T #'(values v acc)]
where foldl-map is defined as:
(: foldl-map (∀ (A B Acc) (→ (→ A Acc (Values B Acc)) Acc (Listof A) (Values (Listof B) Acc)))) (define (foldl-map f acc l) (if (null? l) (values l acc) (let*-values ([(v a) (f (car l) acc)] [(ll aa) (foldl-map f a (cdr l))]) (values (cons v ll) aa))))
A.1.3 Putting it all together
(require racket/require phc-toolkit type-expander phc-adt "dispatch-union.rkt" (for-syntax (subtract-in racket/base subtemplate/override) subtemplate/override phc-toolkit/untyped type-expander/expander "free-identifier-tree-equal.rkt" racket/dict) (for-meta 2 racket/base) (for-meta 2 phc-toolkit/untyped) (for-meta 2 syntax/parse)) (provide (for-syntax with-folds !replace-in-type !∀-replace-in-type !replace-in-instance !λ-replace-in-instance)) «foldl-map» «with-folds» «cached» (begin-for-syntax «api» «fold-τ» «fold-f»)
A.2 Flexible functional modification and extension of records
A.2.1 Goals
Our goal here is to have strongly typed records, with row polymorphism (a Rest row type variable can range over multiple possibly-present fields), and structural type equivalence (two record types are identical if they have the same fields and the type of the fields is the same in both record types).
A.2.2 Overview
We represent a flexible record as a tree, where the leaves are field values. Every field which occurs anywhere in the program is assigned a constant index. This index determines which leaf is used to store that field’s values. In order to avoid storing in-memory a huge tree for every record, the actual fields are captured by a closure, and the tree is lazily generated (node by node) upon access.
The type for a flexible record can support row polymorphism: the types of fields which may optionally be present are represented by polymorphic type variables. Note that this means that not one but several type variables are used. In principle, each potentially-present field would need a distinct type variable, but whole potentially-present branches may be collapsed to a single type variable if no access is made to the fields within.
A.2.3 Generating the tree type with polymorphic holes
We define in this section facilities to automatically generate this list of polymorphic type variables. In order to avoid having a huge number of type variables, a branch containing only optional fields can be collapsed into a single type variable. An exception to this rule is when a field needs to be added or modified by the user code: in this case the polymorphic type variable for that field must be preserved, and the branch may not entirely be collapsed.
We take as an example the case where there are 8 fields (f1, f2, a, b, f5, f6, f7 and u) used in the whole program. Records are therefore represented as a tree of depth 4 (including the root node) with 8 leaves. A record with the fields a and b will be represented as a tree where the branch containing f1 and f2 is collapsed. Furthermore, the right branch of the root, containing f5, f6, f7 and u, will also be collapsed.
Figure 1: The tree representing the type for a record with fields a and b is ((C¹ . (a . b)) . C²), where Cⁱ indicates that a single polymorphic type variable is used to represent the type of the whole collapsed branch.
Figure 2: Updating the node u, which appears somewhere in the collapsed branch C², is not possible, because the type abstracted away by C² may be different before and after the update.
Figure 3: We therefore break apart the collapsed branch C². The tree representing the type for a record with fields a and b, and where the field u may be updated or added, is ((C¹ . (a . b)) . (C²′ . (C³ . u))). Note that u may refer to a possibly absent field (in which case the polymorphic type variable will be instantiated with None).
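The collapsing illustrated by these figures can be reproduced with a small Python sketch (our own simplification of the idx→tree procedure defined below; it only computes the shape, with "C" marking a collapsed branch):

```python
def tree_shape(present, first, after_last):
    """Shape of the (sub)tree whose leaves are the indices [first, after_last)."""
    if not any(first <= i < after_last for i in present):
        return "C"                       # no present field here: collapse
    if after_last - first == 1:
        return first                     # a single, present leaf
    mid = (first + after_last) // 2
    return (tree_shape(present, first, mid),
            tree_shape(present, mid, after_last))

# 8 fields (indices 0–7, i.e. f1 f2 a b f5 f6 f7 u); a (2) and b (3) present:
tree_shape({2, 3}, 0, 8)
# → (('C', (2, 3)), 'C'), i.e. ((C¹ . (a . b)) . C²) as in Figure 1.
```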
A.2.4 From a selection of field indices to a tree shape
procedure (phase 1)
(idx→tree none-wrapper leaf-wrapper node-wrapper vec) → r
  none-wrapper : (->d [depth exact-nonnegative-integer?] any/c)
  leaf-wrapper : (->d [i-in-vec exact-nonnegative-integer?] [leaf-idx exact-nonnegative-integer?] any/c)
  node-wrapper : (->d [left any/c] [right any/c] any/c)
  vec : (vectorof exact-nonnegative-integer?)
(define/contract (idx→tree #:none-wrapper none-wrapper #:leaf-wrapper leaf-wrapper #:node-wrapper node-wrapper #:vec vec) «idx→tree/ctc» (define (r #:depth depth #:vec-bounds vec-start vec-after-end #:leaf-bounds first-leaf after-last-leaf) (cond «idx→tree cases»)) r)
;; contract for idx→tree, as specified above (-> #:none-wrapper (->d ([depth exact-nonnegative-integer?]) [result any/c]) #:leaf-wrapper (->d ([i-in-vec exact-nonnegative-integer?] [leaf-idx exact-nonnegative-integer?]) any) #:node-wrapper (->d ([left any/c] [right any/c]) any) #:vec (vectorof exact-nonnegative-integer?) (-> #:depth exact-nonnegative-integer? #:vec-bounds exact-nonnegative-integer? exact-nonnegative-integer? #:leaf-bounds exact-nonnegative-integer? exact-nonnegative-integer? any/c))
The idx→tree function is curried in two steps; the first step returns a function r with the following signature:
procedure
(r #:depth depth #:vec-bounds vec-start vec-after-end #:leaf-bounds first-leaf after-last-leaf) → any/c
  depth : exact-nonnegative-integer?
  vec-start : exact-nonnegative-integer?
  vec-after-end : exact-nonnegative-integer?
  first-leaf : exact-nonnegative-integer?
  after-last-leaf : exact-nonnegative-integer?
The idx→tree function works by performing repeated dichotomy on the vector of present fields vec.
(cond [(= vec-start vec-after-end) (none-wrapper depth)] [(= (+ first-leaf 1) after-last-leaf) (leaf-wrapper vec-start (vector-ref vec vec-start))] [else (let* ([leftmost-right-leaf (/ (+ first-leaf after-last-leaf) 2)] [vec-left-branch-start vec-start] [vec-left-branch-after-end (find-≥ leftmost-right-leaf vec vec-start vec-after-end)] [vec-right-branch-start vec-left-branch-after-end] [vec-right-branch-after-end vec-after-end]) (node-wrapper (r #:depth (sub1 depth) #:vec-bounds vec-left-branch-start vec-left-branch-after-end #:leaf-bounds first-leaf leftmost-right-leaf) (r #:depth (sub1 depth) #:vec-bounds vec-right-branch-start vec-right-branch-after-end #:leaf-bounds leftmost-right-leaf after-last-leaf)))])
The find-≥ function performs the actual dichotomy, and finds the first index within the given bounds for which (vector-ref vec result) is greater than or equal to val (if there is none, then the upper exclusive bound is returned).
(define/contract (find-≥ val vec start end) (->i ([val exact-nonnegative-integer?] [vec vector?] ;; (sorted-vectorof exact-nonnegative-integer? <) [start (vec) (integer-in 0 (sub1 (vector-length vec)))] [end (vec start) (integer-in (add1 start) (vector-length vec))]) #:pre (val vec start) (or (= start 0) (< (vector-ref vec (sub1 start)) val)) #:pre (val vec end) (or (= end (vector-length vec)) (> (vector-ref vec end) val)) [result (start end) (integer-in start end)]) (if (= (- end start) 1) ;; there is only one element (if (>= (vector-ref vec start) val) start end) (let () (define mid (ceiling (/ (+ start end) 2))) (if (>= (vector-ref vec mid) val) (find-≥ val vec start mid) (find-≥ val vec mid end)))))
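For comparison, the same dichotomy can be written iteratively; a Python sketch of find-≥ (our own transliteration; like the contract above, it assumes the searched slice is sorted in increasing order):

```python
def find_ge(val, vec, start, end):
    """First index i in [start, end) with vec[i] >= val, else end.

    vec[start:end] is assumed to be sorted in increasing order."""
    while start < end:
        mid = (start + end) // 2
        if vec[mid] >= val:
            end = mid            # the answer is at mid or to its left
        else:
            start = mid + 1      # the answer is strictly to the right
    return start

find_ge(4, [1, 2, 4, 7], 0, 4)   # → 2
find_ge(9, [1, 2, 4, 7], 0, 4)   # → 4 (upper exclusive bound: no element ≥ 9)
```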
A.2.5 Empty branches (branches only containing missing fields)
For efficiency and to reduce memory overhead, we pre-define values for branches of depth d ∈ (range 0 n) which only contain missing fields.
;; TODO: clean this up (use subtemplate). (define-syntax defempty (syntax-parser [(_ n:nat) #:with (empty1 emptyA …) (map (λ (depth) (string->symbol (format "empty~a" (expt 2 depth)))) (range (add1 (syntax-e #'n)))) #:with (emptyB … _) #'(empty1 emptyA …) #:with (empty1/τ emptyA/τ …) (stx-map λ.(format-id % "~a/τ" %) #'(empty1 emptyA …)) #:with (emptyB/τ … _) #'(empty1/τ emptyA/τ …) (syntax-local-introduce #'(begin (define-type empty1/τ 'none) (define-type emptyA/τ (Pairof emptyB/τ emptyB/τ)) … (define empty1 : empty1/τ 'none) (define emptyA : emptyA/τ (cons emptyB emptyB)) … (provide empty1 emptyA … empty1/τ emptyA/τ …)))])) (defempty 10)
A.2.6 Creating a record builder given a set of field indices
(define-syntax record-builder (syntax-parser ;; depth ≥ 0. 0 ⇒ a single node, 1 ⇒ 2 nodes, 2 ⇒ 4 nodes and so on. [(_ depth:nat idx:nat …) #:when (not (check-duplicates (syntax->datum #'(idx …)))) (define vec (list->vector (sort (syntax->datum #'(idx …)) <))) (define arg-vec (vector-map (λ (idx) (format-id (syntax-local-introduce #'here) "arg~a" idx)) vec)) (define/with-syntax #(arg …) arg-vec) (define/with-syntax tree ((idx→tree «record-builder/wrappers» #:vec vec) #:depth (syntax-e #'depth) #:vec-bounds 0 (vector-length vec) #:leaf-bounds 0 (expt 2 (syntax-e #'depth)))) (define/with-syntax (arg/τ …) (stx-map (λ (arg) (format-id arg "~a/τ" arg)) #'(arg …))) #'(λ #:∀ (arg/τ …) ([arg : arg/τ] …) tree)]))
(idx→tree #:none-wrapper λdepth.(format-id #'here "empty~a" (expt 2 depth)) #:leaf-wrapper λi.idx.(vector-ref arg-vec i) #:node-wrapper λl.r.(list #'cons l r) #:vec vec)
A.2.7 Row typing
Row type variable identifiers are bound as transformers to instances of the ρ-wrapper struct, so that they are easily recognisable by special forms such as record.
(begin-for-syntax (struct ρ-wrapper (id)))
(begin-for-syntax (struct ρ-wrapper (id) #:property prop:procedure (λ (self stx) (raise-syntax-error (syntax-e (ρ-wrapper-id self)) "invalid use of row type variable" stx))))
The row type variable actually expands to several polymorphic type variables. In order to know which fields are relevant, we remember, while compiling the current file, which fields are used with a given row type variable. The set of field names used with a given row type variable is stored as an entry of the ρ-table.
(define-for-syntax ρ-table (make-free-id-table))
A record type which includes a row type variable is expanded to the type of a binary tree. Fields which are used with that row type variable matter, and must be explicitly indicated as optional leaves (so that operations which add or remove fields later in the body of a row-polymorphic function may refer to that leaf). Branches which only contain irrelevant leaves can be collapsed to a single polymorphic type variable. The core implementation of record types (in which the depth of the tree is explicitly given) follows. Since the code is very similar to the definition of idx→tree, the unchanged parts are dimmed below.
(define-type-expander record-type (syntax-parser [(_ depth:nat idx:nat …) #:when (not (check-duplicates (syntax->datum #'(idx …)))) (define vec (list->vector (sort (syntax->datum #'(idx …)) <))) (define arg-vec (vector-map (λ (idx) (format-id (syntax-local-introduce #'here) "arg~a" idx)) vec)) (define/with-syntax #(arg …) arg-vec) (define/with-syntax tree ((idx→tree «record-type/wrappers» #:vec vec) #:depth (syntax-e #'depth) #:vec-bounds 0 (vector-length vec) #:leaf-bounds 0 (expt 2 (syntax-e #'depth)))) (define sidekicks-len 0) (define sidekicks '()) #`(∀ (arg … #,@sidekicks) tree)]))
The results of record-type and record-builder differ in two aspects: the result of record-type is a polymorphic type, not a function, and the empty branches are handled differently. When a branch only containing none elements is encountered, it is replaced with a single polymorphic type.
(idx→tree #:none-wrapper λdepth.(begin (define sidekick (format-id #'here "row~a" sidekicks-len)) (set! sidekicks-len (add1 sidekicks-len)) (set! sidekicks (cons sidekick sidekicks)) sidekick) #:leaf-wrapper λi.idx.(vector-ref arg-vec i) #:node-wrapper λl.r.(list #'Pairof l r) #:vec vec)
Since the list of fields may be amended long after the type was initially expanded, we delay the expansion of the type. This is done by initially expanding to a placeholder type name, and later patching the expanded code to replace that placeholder with the actual expanded record type. current-patch-table maps each placeholder to an anonymous function which will produce the syntax for the desired type when called.
(define-for-syntax current-patch-table (make-parameter #f))
Type-level forms like ∀ρ, ρ-inst and record add placeholders to current-patch-table.
The form with-ρ is used to declare row type variables and collect patches in current-patch-table. The explicit declaration of row type variables is necessary because type variables which are logically “bound” with ∀ are not actually bound as identifiers during macro expansion (instead, Typed Racket performs its own resolution while typechecking, i.e. after the program is expanded). If a row type variable is introduced in the type declaration for a function, we need to be able to detect the binding when processing type-level forms within the body of the function.
(define-syntax with-ρ (syntax-parser [(_ (ρ …) . body) #'(splicing-letrec-syntax ([ρ (ρ-wrapper #'ρ)] …) (with-ρ-chain (ρ …) . body))]))
Once the bindings have been introduced via splicing-letrec-syntax, the expansion continues within the context of these identifiers, via the with-ρ-chain macro.
(define-syntax with-ρ-chain (syntax-parser [(_ (ρ …) . body) (parameterize ([current-patch-table (make-free-id-table)]) (define expanded-form (local-expand #'(begin . body) 'top-level '())) (patch expanded-form (current-patch-table)))]))
A.2.7.1 Type of a record, with multiple fields
The ∀ρ type-level form translates to a ∀ form. The ∀ form is expanded, so that uses of the row type variables within can be detected. The ∀ρ then expands to a placeholder delayed-type, which will be patched by the surrounding with-ρ.
(define-type-expander ∀ρ (syntax-parser [(_ (A:id … #:ρ ρ:id …) τ) (for ([ρ (in-syntax #'(ρ …))]) (free-id-table-set! ρ-table ρ «make-lifo-set»)) (define expanded (expand-type #'(∀ (A …) (Let ([ρ 'NOT-IMPL-YET] …) τ)))) (define/syntax-parse ({~literal tr:∀} (A′ …) . τ′) expanded) (free-id-table-set! (current-patch-table) #'delayed-type (λ (self) (delayed-∀ρ #'{(A′ …) (ρ …) τ′}))) #'delayed-type]))
When the delayed-type is replaced, the type variables associated with each row type variable are injected as extra arguments to the previously-expanded polymorphic type.
(define-for-syntax/case-args (delayed-∀ρ {(A′ …) (ρ …) τ′}) (define/syntax-parse ((ρ′ …) …) (for/list ([ρ (in-syntax #'(ρ …))]) (for/list ([ρ′ (sort («lifo-set→list» (free-id-table-ref ρ-table ρ)) symbol<?)]) ;; TODO: the format-id shouldn't be here, it should be when ;; the id is added to the list. Also #'here is probably the ;; wrong srcloc (format-id #'here "~a.~a" ρ ρ′)))) #'(∀ (A′ … ρ′ … …) . τ′))
«ρ-wrapper» «current-patch-table» «with-ρ-chain» «with-ρ» «ρ-table» «∀ρ» «delayed-∀ρ» (define-type-expander record (syntax-parser [(_ f:id … . {~or (#:ρ ρ:id) ρ:id}) (define set (free-id-table-ref! ρ-table #'ρ (λ () «make-lifo-set»))) (for ([f (in-syntax #'(f …))]) («lifo-set-add!» set (syntax->datum f))) #'(List 'f … 'ρ)])) (define-syntax ρ-inst (syntax-parser [(_ f A:id … #:ρ ρ:id …) #'TODO]))
;; ordered free-identifier-set, last added = first in the order. (cons (box '()) (mutable-set))
(λ (s v) (set-member? (cdr s) v))
(λ (s v) (unless («lifo-member?» s v) (set-box! (car s) (cons v (unbox (car s))))) (set-add! (cdr s) v))
(define-for-syntax (patch e tbl) (define (patch* e) (cond ;[(syntax-case e () ; [(x y) ; (free-id-table-ref tbl x (λ () #f)) ; (values #t ((free-id-table-ref tbl x) e))] ; [_ #f])] [(and (identifier? e) (free-id-table-ref tbl e (λ () #f))) => (λ (f) (values #t (f e)))] [(syntax? e) (let-values ([(a b) (patch* (syntax-e e))]) (if a (values a (datum->syntax e b e e)) (values a e)))] [(pair? e) (let-values ([(a1 b1) (patch* (car e))] [(a2 b2) (patch* (cdr e))]) (if (or a1 a2) (values #t (cons b1 b2)) (values #f e)))] ;; TODO: hash, prefab etc. [else (values #f e)])) (let-values ([(a b) (patch* e)]) b))
(define-for-syntax identifier<? (∘ symbol<? syntax-e))
(define-for-syntax (sort-ids ids)
  (sort (syntax->list ids) identifier<?))
(define-type-expander ρ/fields
  (syntax-parser
    [(_ (important-field …) (all-field …))
     #:with (sorted-important-field …) (sort-ids #'(important-field …))
     #:with (sorted-all-field …) (sort-ids #'(all-field …))
     #'~]))
A.2.8 Type of a record, with a single hole
In order to functionally update records, the updating functions will take a tree in which the type of a single leaf needs to be known. This means that the branches which split off along the path from the root to that leaf have to be given polymorphic types, so that in the result these branches can keep the same type as in the original value being updated.
(define-for-syntax (tree-type-with-replacement n last τ*)
  (define-values (next mod) (quotient/remainder n 2))
  (cond
    [(null? τ*) last]
    [(= mod 0)
     (tree-type-with-replacement next
                                 #`(Pairof #,last #,(car τ*))
                                 (cdr τ*))]
    [else
     (tree-type-with-replacement next
                                 #`(Pairof #,(car τ*) #,last)
                                 (cdr τ*))]))
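The bottom-up construction performed by tree-type-with-replacement can be sketched in Python (a hypothetical transliteration, with strings standing in for Typed Racket types): starting from the hole's heap index, the low bit at each step says on which side the hole lies, and the sibling subtree on the other side receives the next polymorphic type variable.

```python
def tree_type_with_replacement(n, last, taus):
    # n: heap index of the hole; last: type built so far (innermost first);
    # taus: polymorphic type variables for the sibling subtrees, one per level.
    if not taus:
        return last
    nxt, bit = divmod(n, 2)
    if bit == 0:
        # Hole in the left child: sibling type goes on the right.
        return tree_type_with_replacement(nxt, f"(Pairof {last} {taus[0]})", taus[1:])
    else:
        # Hole in the right child: sibling type goes on the left.
        return tree_type_with_replacement(nxt, f"(Pairof {taus[0]} {last})", taus[1:])

# Leaf 5 in a depth-2 tree (leaves 4..7) sits in the right child of the
# left subtree, so the type of the whole tree is:
result = tree_type_with_replacement(5, "T", ["A", "B"])
```

For index 5 this yields `(Pairof (Pairof A T) B)`, matching the shape produced by the Racket function above.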
A.2.9 Functionally updating a tree-record
A.2.9.1 Adding and modifying fields
Since we only deal with functional updates of immutable records, modifying a field amounts to discarding the old value and injecting the new value into the new, updated record.
Adding a new field is done using exactly the same operation: missing fields are denoted by a special marker value, 'NONE, while present fields are wrapped in an instance of the polymorphic struct (Some T). Adding a new field therefore simply discards the old 'NONE marker and replaces it with the new value, wrapped with Some. A field update instead discards the old instance of Some and replaces it with a new one.
(if (= i 1)
    #'(delay/pure/stateless replacement)
    (let* ([bits (to-bits i)]
           [next (from-bits (cons #t (cddr bits)))]
           [mod (cadr bits)])
      (define/with-syntax next-id (vector-ref low-names (sub1 next)))
      (if mod
          #`(replace-right (inst next-id #,@τ*-limited+T-next)
                           tree-thunk
                           replacement)
          #`(replace-left (inst next-id #,@τ*-limited+T-next)
                          tree-thunk
                          replacement))))
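The effect of these generated replacement functions can be illustrated with a small hypothetical Python sketch (not the thesis code; `replace_leaf` and `some` are invented names): a functional update copies only the path from the root to the modified leaf, and shares every branch that splits off that path with the original tree.

```python
NONE = 'NONE'

def some(v):
    # Counterpart of the (Some T) struct: wraps a present field value.
    return ('Some', v)

def replace_leaf(tree, path, new_value):
    """Return a copy of `tree` with the leaf at `path` replaced.

    `path` is a list of booleans: False = go left (car), True = go right
    (cdr). Branches off the path are reused, not copied."""
    if not path:
        return new_value                     # discard the old leaf value
    left, right = tree
    if path[0]:
        return (left, replace_leaf(right, path[1:], new_value))
    else:
        return (replace_leaf(left, path[1:], new_value), right)

# Adding a field = replacing its 'NONE' leaf with Some(value);
# updating a field = replacing its old Some(...) with a new one.
t0 = ((NONE, NONE), (NONE, NONE))
t1 = replace_leaf(t0, [False, True], some(42))   # add a field
t2 = replace_leaf(t1, [False, True], some(43))   # update the same field
```

Note that the untouched right subtree of `t1` is the very same object as in `t0`: only the spine leading to the modified leaf is rebuilt.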
(define-pure/stateless
  (: replace-right
     (∀ (A B C R)
        (→ (→ (Promise B) R (Promise C))
           (Promise (Pairof A B))
           R
           (Promise (Pairof A C)))))
  (define #:∀ (A B C R)
    (replace-right [next-id : (→ (Promise B) R (Promise C))]
                   [tree-thunk : (Promise (Pairof A B))]
                   [replacement : R])
    (delay/pure/stateless
     (let ([tree (force tree-thunk)])
       (let ([left-subtree (car tree)]
             [right-subtree (cdr tree)])
         (cons left-subtree
               (force (next-id (delay/pure/stateless right-subtree)
                               replacement))))))))
(define-pure/stateless
  (: replace-left
     (∀ (A B C R)
        (→ (→ (Promise A) R (Promise C))
           (Promise (Pairof A B))
           R
           (Promise (Pairof C B)))))
  (define #:∀ (A B C R)
    (replace-left [next-id : (→ (Promise A) R (Promise C))]
                  [tree-thunk : (Promise (Pairof A B))]
                  [replacement : R])
    (delay/pure/stateless
     (let ([tree (force tree-thunk)])
       (let ([left-subtree (car tree)]
             [right-subtree (cdr tree)])
         (cons (force (next-id (delay/pure/stateless left-subtree)
                               replacement))
               right-subtree))))))
(define-for-syntax (define-replace-in-tree low-names names rm-names τ* i depth)
  (define/with-syntax name (vector-ref names (sub1 i)))
  (define/with-syntax rm-name (vector-ref rm-names (sub1 i)))
  (define/with-syntax low-name (vector-ref low-names (sub1 i)))
  (define/with-syntax tree-type-with-replacement-name
    (gensym 'tree-type-with-replacement))
  (define/with-syntax tree-replacement-type-name
    (gensym 'tree-replacement-type))
  (define τ*-limited (take τ* depth))
  (define τ*-limited+T-next
    (if (= depth 0)
        (list #'T)
        (append (take τ* (sub1 depth)) (list #'T))))
  #`(begin
      (provide name rm-name)
      (define-type (tree-type-with-replacement-name #,@τ*-limited T)
        (Promise #,(tree-type-with-replacement i #'T τ*-limited)))
      (define-pure/stateless
        (: low-name
           (∀ (#,@τ*-limited T)
              (→ (tree-type-with-replacement-name #,@τ*-limited Any)
                 T
                 (tree-type-with-replacement-name #,@τ*-limited T))))
        (define #:∀ (#,@τ*-limited T)
          (low-name [tree-thunk : (tree-type-with-replacement-name
                                   #,@τ*-limited Any)]
                    [replacement : T])
          : (Promise #,(tree-type-with-replacement i #'T τ*-limited))
          #,«make-replace-in-tree-body»))
      (: name
         (∀ (#,@τ*-limited T)
            (→ (tree-type-with-replacement-name #,@τ*-limited Any)
               T
               (tree-type-with-replacement-name #,@τ*-limited (Some T)))))
      (define (name tree-thunk replacement)
        (low-name tree-thunk (Some replacement)))
      (: rm-name
         (∀ (#,@τ*-limited)
            (→ (tree-type-with-replacement-name #,@τ*-limited (Some Any))
               (tree-type-with-replacement-name #,@τ*-limited 'NONE))))
      (define (rm-name tree-thunk)
        (low-name tree-thunk 'NONE))))
A.2.10 Auxiliary values
The following sections reuse a few values derived from the list of fields:
(define all-fields #'(field …))
(define depth-above (ceiling-log2 (length (syntax->list #'(field …)))))
(define offset (expt 2 depth-above))
(define i*-above (range 1 (expt 2 depth-above)))
(define names
  (list->vector
   (append (map (λ (i) (format-id #'here "-with-~a" i)) i*-above)
           (stx-map (λ (f) (format-id f "with-~a" f)) #'(field …)))))
(define rm-names
  (list->vector
   (append (map (λ (i) (format-id #'here "-without-~a" i)) i*-above)
           (stx-map (λ (f) (format-id f "without-~a" f)) #'(field …)))))
(define low-names
  (list->vector
   (append (map (λ (i) (format-id #'here "-u-with-~a" i)) i*-above)
           (stx-map (λ (f) (format-id f "u-with-~a" f)) #'(field …)))))
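As a rough illustration of these derived values, the hypothetical Python function below (invented for this sketch, not part of the thesis code) computes the tree depth, the offset, the heap index of each field, and the indices of the internal nodes (`i*-above` in the Racket code); the construction of the name vectors is omitted.

```python
import math

def layout(fields):
    """Compute the binary-tree layout for a (sorted) list of fields.

    Leaves live at heap indices offset .. 2*offset - 1; internal nodes at
    1 .. offset - 1, where offset = 2 ** ceil(log2(number of fields))."""
    n = len(fields)
    depth_above = math.ceil(math.log2(n)) if n > 1 else 0
    offset = 2 ** depth_above
    # Field i (0-based) is stored at heap index offset + i.
    field_index = {f: offset + i for i, f in enumerate(fields)}
    internal_indices = list(range(1, offset))    # i*-above
    return depth_above, offset, field_index, internal_indices
```

For three fields this gives a depth of 2, an offset of 4, and leaves at heap indices 4 to 6 (index 7 stays an always-empty leaf).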
A.2.11 Type of a tree-record
(define-for-syntax (τ-tree-with-fields struct-fields fields)
  (define/with-syntax (struct-field …) struct-fields)
  (define/with-syntax (field …) fields)
  «utils»
  ;; Like in convert-from-struct
  (define lookup
    (make-free-id-table
     (for/list ([n (in-syntax all-fields)]
                [i (in-naturals)])
       (cons n (+ i offset)))))
  (define fields+indices
    (sort (stx-map λ.(cons % (free-id-table-ref lookup %))
                   #'(struct-field …))
          < #:key cdr))
  (define up (* offset 2))
  ;; Like in convert-fields, but with Pairof
  (define (f i)
    (if (and (pair? fields+indices)
             (= i (cdar fields+indices)))
        (begin0 `(Some ,(caar fields+indices))
          (set! fields+indices (cdr fields+indices)))
        (if (>= (* i 2) up) ;; DEPTH
            ''NONE
            `(Pairof ,(f (* i 2)) ,(f (add1 (* i 2)))))))
  (f 1))
A.2.12 Conversion to and from record-trees
(define-for-syntax (define-struct↔tree offset all-fields τ* struct-name fields)
  (define/with-syntax (field …) fields)
  (define/with-syntax fields→tree-name
    (format-id struct-name "~a→tree" struct-name))
  (define/with-syntax tree→fields-name
    (format-id struct-name "tree→~a" struct-name))
  (define lookup
    (make-free-id-table
     (for/list ([n (in-syntax all-fields)]
                [i (in-naturals)])
       (cons n (+ i offset)))))
  (define fields+indices
    (sort (stx-map λ.(cons % (free-id-table-ref lookup %)) fields)
          < #:key cdr))
  #`(begin
      (: fields→tree-name
         (∀ (field …)
            (→ field …
               (Promise #,(τ-tree-with-fields #'(field …) all-fields)))))
      (define (fields→tree-name field …)
        (delay/pure/stateless
         #,(convert-fields (* offset 2) fields+indices)))
      (: tree→fields-name
         (∀ (field …)
            (→ (Promise #,(τ-tree-with-fields #'(field …) all-fields))
               (Values field …))))
      (define (tree→fields-name tree-thunk)
        (define tree (force tree-thunk))
        #,(convert-back-fields (* offset 2) fields+indices))))
A.2.12.1 Creating a new tree-record
(define-for-syntax (convert-fields up fields+indices)
  (define (f i)
    (if (and (pair? fields+indices)
             (= i (cdar fields+indices)))
        (begin0 `(Some ,(caar fields+indices))
          (set! fields+indices (cdr fields+indices)))
        (if (>= (* i 2) up) ;; DEPTH
            ''NONE
            `(cons ,(f (* i 2)) ,(f (add1 (* i 2)))))))
  (f 1))
A.2.12.2 Extracting all the fields from a tree-record
We traverse the tree in preorder, and accumulate definitions naming the interesting subparts of the tree (those which contain fields).
(define-for-syntax (convert-back-fields up fields+indices)
  (define result '())
  (define definitions '())
  (define (f i t)
    (if (and (pair? fields+indices)
             (= i (cdar fields+indices)))
        (begin0 (begin (set! result (cons #`(Some-v #,t) result))
                       #t)
          (set! fields+indices (cdr fields+indices)))
        (if (>= (* i 2) up) ;; DEPTH
            #f
            (let* ([left-t (string->symbol (format "subtree-~a" (* i 2)))]
                   [right-t (string->symbol (format "subtree-~a" (add1 (* i 2))))]
                   [left (f (* i 2) left-t)]
                   [right (f (add1 (* i 2)) right-t)])
              (cond
                [(and left right)
                 (set! definitions (cons #`(define #,left-t (car #,t)) definitions))
                 (set! definitions (cons #`(define #,right-t (cdr #,t)) definitions))
                 #t]
                [left
                 (set! definitions (cons #`(define #,left-t (car #,t)) definitions))
                 #t]
                [right
                 (set! definitions (cons #`(define #,right-t (cdr #,t)) definitions))
                 #t]
                [else #f])))))
  (f 1 #'tree)
  #`(begin #,@definitions (values . #,(reverse result))))
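A runtime analogue of the traversal generated by this macro can be sketched in Python (hypothetical; the real code instead generates Racket definitions at compile time). It walks the tree in preorder and collects, in field order, the values of the leaves which hold fields of interest.

```python
def extract_fields(tree, i, up, wanted_indices):
    """Collect the Some-wrapped leaf values in preorder.

    `i` is the heap index of `tree`; leaves satisfy 2*i >= up.
    Leaves holding a field are pairs ('Some', value); empty leaves
    are the marker 'NONE'."""
    if i in wanted_indices:
        tag, value = tree              # a leaf holding a field
        return [value]
    if 2 * i >= up:
        return []                      # a leaf without a wanted field
    left, right = tree
    return (extract_fields(left, 2 * i, up, wanted_indices)
            + extract_fields(right, 2 * i + 1, up, wanted_indices))

# Depth-2 tree (leaves at heap indices 4..7), with fields at indices 4 and 7:
tree = ((('Some', 'x'), 'NONE'), ('NONE', ('Some', 'y')))
```

Here `extract_fields(tree, 1, 8, {4, 7})` recovers the two field values in order; the macro version additionally avoids naming subtrees that contain no field at all.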
A.2.13 Defining the converters and accessors for each known record type
(define-for-syntax (define-trees stx)
  (syntax-case stx ()
    [(bt-fields-id (field …) [struct struct-field …] …)
     (let ()
       «utils»
       (define ∀-types
         (map λ.(format-id #'here "τ~a" %)
              (range (add1 depth-above))))
       (define total-nb-functions (vector-length names))
       «define-trees-result»)]))
(define-for-syntax (bt-fields-type fields)
  (λ (stx)
    (syntax-case stx ()
      [(_ . fs)
       #`(∀ fs (Promise #,(τ-tree-with-fields #'fs fields)))])))
#`(begin
    (define-type-expander bt-fields-id
      (bt-fields-type #'#,(syntax-local-introduce #'(field …))))
    #,@(map λ.(define-replace-in-tree low-names names rm-names
                                      ∀-types % (floor-log2 %))
            (range 1 (add1 total-nb-functions)))
    #,@(map λ.(define-struct↔tree offset all-fields ∀-types %1 %2)
            (syntax->list #'(struct …))
            (syntax->list #'([struct-field …] …))))
A.2.13.1 Putting it all together
(struct (T) Some ([v : T]) #:transparent)
(define-type (Maybe T) (U (Some T) 'NONE))
(require delay-pure
         "flexible-with-utils.hl.rkt"
         phc-toolkit/syntax-parse
         (for-syntax (rename-in racket/base [... …])
                     syntax/parse
                     syntax/stx
                     racket/syntax
                     racket/list
                     syntax/id-table
                     racket/set
                     racket/sequence
                     racket/vector
                     racket/contract
                     type-expander/expander
                     phc-toolkit/untyped/syntax-parse)
         (for-meta 2 racket/base)
         (prefix-in tr: (only-in typed/racket ∀))
         racket/splicing)
(provide (for-syntax define-trees)
         ;; For tests:
         (struct-out Some)
         ;;DEBUG:
         (for-syntax τ-tree-with-fields)
         record-builder
         ∀ρ
         with-ρ
         record)
«maybe»
«tree-type-with-replacement»
«define-replace-in-tree»
;<define-remove-in-tree>
«convert-fields»
«convert-back-fields»
«τ-tree-with-fields»
«define-struct↔tree»
«define-trees»
«bt-fields-type»
(begin-for-syntax
  «dichotomy»
  «idx→tree»)
«empty-branches»
«record-builder»
«patch»
«record-type»
A.2.14 Utility math functions for binary tree manipulation
(require (lib "phc-graph/flexible-with-utils.hl.rkt"))
package: phc-graph
(require (for-syntax racket/base))
(provide (for-syntax to-bits from-bits floor-log2 ceiling-log2))
«to-bits»
«from-bits»
«floor-log2»
«ceiling-log2»
(module* test racket/base
  (require (for-template (submod "..")))
  (require rackunit)
  «test-to-bits»
  «test-from-bits»)
procedure
(to-bits n) → (listof boolean?)
  n : exact-nonnegative-integer?
;; 1           => 1
;; 2 3         => 10 11
;; 4 5 6 7     => 100 101 110 111
;; 89 ab cd ef => 1000 1001 1010 1011 1100 1101 1110 1111
;;
;; 1           => ε
;; 2 3         => 0 1
;; 4 5 6 7     => 00 01 10 11
;; 89 ab cd ef => 000 001 010 011 100 101 110 111
;;
;; 0           => 0
;; 1 2         => 1 10
;; 3 4 5 6     => 11 100 101 110
;; 78 9a bc de => 111 1000 1001 1010 1011 1100 1101 1110
(define-for-syntax (to-bits n)
  (reverse
   (let loop ([n n])
     (if (= n 0)
         null
         (let-values ([(q r) (quotient/remainder n 2)])
           (cons (if (= r 1) #t #f)
                 (loop q)))))))
(check-equal? (to-bits 0) '())
(check-equal? (to-bits 1) '(#t))
(check-equal? (to-bits 2) '(#t #f))
(check-equal? (to-bits 3) '(#t #t))
(check-equal? (to-bits 4) '(#t #f #f))
(check-equal? (to-bits 5) '(#t #f #t))
(check-equal? (to-bits 6) '(#t #t #f))
(check-equal? (to-bits 7) '(#t #t #t))
(check-equal? (to-bits 8) '(#t #f #f #f))
(check-equal? (to-bits 12) '(#t #t #f #f))
(check-equal? (to-bits 1024) '(#t #f #f #f #f #f #f #f #f #f #f))
procedure
(from-bits n) → exact-nonnegative-integer?
  n : (listof boolean?)
(check-equal? (from-bits '()) 0)
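The dichotomy encoding used by to-bits and from-bits can be transliterated to Python as follows (a hypothetical sketch, not the thesis code): `to_bits` writes n in binary, most significant bit first, as a list of booleans, and `from_bits` is its inverse.

```python
def to_bits(n):
    """Binary digits of n, most significant first, as booleans (0 => [])."""
    bits = []
    while n > 0:
        n, r = divmod(n, 2)
        bits.append(r == 1)
    return bits[::-1]

def from_bits(bits):
    """Inverse of to_bits: rebuild the integer from its boolean digits."""
    n = 0
    for b in bits:
        n = 2 * n + (1 if b else 0)
    return n
```

For example, `to_bits(5)` yields `[True, False, True]` (binary 101), and `from_bits` maps it back to 5.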


