Architecture Overview

The following will give a short overview of the architecture used in the parser, AST and Interpreter.

The library is divided into the public API (everything accessible through the public headers in include/MiniLua/) and the private API (everything that is not accessible by the user of the library). To properly separate the public and private API we use the PImpl technique in the public API to hide the implementation details. See below.

The library is overall divided into three parts:

  1. Parser
  2. Abstract Syntax Tree (AST)
  3. Interpreter
%3 cluster Created by User SourceCode Source Code Parser Parser SourceCode->Parser Environment Pre-Setup Environment Interpreter Interpreter Environment->Interpreter ReturnValue Return Value AST AST Parser->AST AST->Interpreter Interpreter->ReturnValue SourceChanges SourceChanges Interpreter->SourceChanges SourceChanges->SourceCode Apply to Code

Parser

This library uses Tree-Sitter as the parser. Tree-Sitter is only a parser generator library and we use our own grammar definition for Lua which is a fork of Azganoth/tree-sitter-lua.

The tree-sitter parser takes the source code provided by the user and creates a tree of nodes. We then convert this tree into our own abstract syntax tree.

Abstract Syntax Tree

The abstract syntax tree (AST) makes it a bit more convenient for the interpreter to navigate through the tree. The AST also takes care of desugaring some of the syntax. Desugaring converts more complicated syntax to simpler syntax. E.g. for loops get desugared into while loops and method definitions and calls get desugared into normal function definitions and calls.

Interpreter

The interpreter walks through the AST, generates Values for the literals and executes the code. It also takes care of handling and combining the SourceChanges.

For (almost) every AST node there is one visit_* method in the internal Interpreter. The methods take the AST node and the current environment (in form of the internal Env). We need this environment to correctly scope local variables. We create a new environment whenever there is a new block/scope and the local variable declarations are then only added to this environment. When we leave the block/scope we switch back to the outer environment.

Allocator

The interpreter keeps track of all Values and the Environments that are created. We do this to prevent memory leaks.

MemoryAllocator is the class that handles all allocation and keeps track of the values. Actually the allocator only keeps track of the tables because they are the only type that can create reference cycles, which would lead to a memory leak. Cycles can happen in the following ways:

  • Direct or indirect cycles in tables. I.e.

    -- Lua Code
    t1 = {}
    t2 = {table1 = t1}
    t1["table2"] = t2
  • Functions capture their environment but are also stored in the environment. Which also creates a cycle.

Implementation

The following sections describe some implementation details/techniques.

PImpl Technique

Also see: https://en.cppreference.com/w/cpp/language/pimpl

With the PImpl technique we hide all behaviour using a (private) nested forward declaration of a struct or class. This has two important uses:

  1. hiding the implementation details from the user
  2. breaking up cycles of incomplete types (this works because the cyclic reference are now in the cpp instead of the header)

For this to work classes using the PImpl technique can't use the private nested forward declaration. This means we have to declare but not define the methods in the header. This includes special member functions like copy-constructor and destructor. Then these methods have to be implemented in the cpp file and the only difference is basically instead of using this-> they have to use this->impl-> or just impl->.

A class with the PImpl technique usually looks like this:

// in the header
class Something {
  struct Impl;
  owning_ptr<Impl> impl; // any pointer type is ok here

public:
  // normal constructor declarations
  Something(const Something&);
  Something(Something&&) noexcept;
  Something& operator=(const Something& other);
  Something& operator=(Something&& other);
  friend void swap(Something& self, Something& other);
};

// in the cpp
struct Something::Impl {
  // fields you would have put in class Something
};
Something::Something(const Something&) = default;
Something(Something&&) noexcept = default;
Something& operator=(const Something& other) = default;
Something& operator=(Something&& other) = default;
void swap(Something& self, Something& other) {
  std::swap(self.impl, other.impl);
}

If the nested struct Impl is not default constructible you have to provide the implementation of the copy-/move-constructors and -operators manually and can't use = default.

owning_ptr<T> was chosen for the pointer type because it behaves like a normal T (but lives on the heap). It's move, copy and lifetime semantics are identical to the one of T.

It is also possible to choose another pointer type like std::unique_ptr or std::shared_ptr. You should probably avoid using raw pointer T* because lifetime management is hard with raw pointers.