I've embarked on a voyage to build a basic interpreter, and this post will document some of the things I've learned. Firstly, what is an interpreter? An interpreter processes and executes source code. It's similar to a compiler, so what is the difference? A compiler translates the entire source code into machine code up front, and you then execute that machine code. Compilers usually also optimise your code along the way. Different platforms have different architectures and therefore different kinds of machine code, which means code compiled for one computer might not run on another. A simple interpreter, by contrast, executes the source code directly, typically one statement at a time, without producing a separate machine-code program, so the same source can run on any platform that has the interpreter installed. This also means an error is only reported when the interpreter reaches the offending code. Interpreted code is often slower than compiled code.
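To make the difference concrete, here is a toy sketch in Python. The mini-language (every line must be print followed by an integer) and the function names parse_line, interpret and compile_then_run are hypothetical. compile_then_run does not generate machine code; it only stands in for a compiler's habit of checking the whole program before any of it runs, while interpret checks and runs each line in turn.

```python
# Hypothetical mini-language: every line must look like `print <integer>`.


def parse_line(line: str) -> str:
    """Check one line and return the value it should output."""
    parts = line.split()
    if len(parts) != 2 or parts[0] != "print" or not parts[1].isdigit():
        raise SyntaxError(f"cannot understand {line!r}")
    return parts[1]


def interpret(source: str) -> list[str]:
    """Interpreter-style: check and run each line immediately, stop at the first bad one."""
    output = []
    for number, line in enumerate(source.splitlines(), start=1):
        try:
            output.append(parse_line(line))  # this line runs before the next is even read
        except SyntaxError as err:
            output.append(f"line {number}: {err}")
            break
    return output


def compile_then_run(source: str) -> list[str]:
    """Compiler-style stand-in: check every line first, run only if all are valid."""
    program = []
    for number, line in enumerate(source.splitlines(), start=1):
        try:
            program.append(parse_line(line))
        except SyntaxError as err:
            return [f"line {number}: {err}"]  # nothing has been run yet
    # Every line was valid, so run the whole program: its output is each value in order.
    return program


program = "print 1\nprint 2\nprnt 3"
print(interpret(program))         # ['1', '2', "line 3: cannot understand 'prnt 3'"]
print(compile_then_run(program))  # ["line 3: cannot understand 'prnt 3'"]
```

With a typo on the third line, the line-by-line version has already produced output for the first two lines before it stops, while the check-first version reports the error without running anything.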
We'll go over some key concepts. Each line of code is made up of tokens, which are objects with a type and a value. For example, 3 would be a token with the value 3 and the type INTEGER. Breaking lines of code down into tokens is called lexical analysis, and it is carried out by a lexer. Recognizing a phrase in a stream of tokens is called parsing, and this is done by a parser. For example, a parser will understand that x = 5 means you want to store 5 in a variable called x.
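Here is a minimal sketch of a token, a lexer and a parser in Python, assuming a hypothetical language with just three token types (INTEGER, NAME and ASSIGN). The names Token, lex and parse_assignment are illustrative, not from any library. The parser only recognizes the single phrase NAME ASSIGN INTEGER and returns its pieces; storing the value in a dictionary is the step that actually executes the statement.

```python
from dataclasses import dataclass

# Token types for this sketch (illustrative names, not a standard).
INTEGER, NAME, ASSIGN = "INTEGER", "NAME", "ASSIGN"


@dataclass
class Token:
    type: str
    value: object


def lex(line: str) -> list[Token]:
    """Lexical analysis: turn a line of text into a list of tokens."""
    tokens = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch.isspace():
            i += 1  # whitespace separates tokens but is not a token itself
        elif ch.isdigit():
            start = i
            while i < len(line) and line[i].isdigit():
                i += 1
            tokens.append(Token(INTEGER, int(line[start:i])))
        elif ch.isalpha():
            start = i
            while i < len(line) and line[i].isalnum():
                i += 1
            tokens.append(Token(NAME, line[start:i]))
        elif ch == "=":
            tokens.append(Token(ASSIGN, "="))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


def parse_assignment(tokens: list[Token]) -> tuple[str, int]:
    """Parsing: recognize the phrase NAME ASSIGN INTEGER in a token stream."""
    types = [token.type for token in tokens]
    if types != [NAME, ASSIGN, INTEGER]:
        raise SyntaxError(f"expected NAME = INTEGER, got {types}")
    return tokens[0].value, tokens[2].value


print(lex("3"))  # [Token(type='INTEGER', value=3)]

variables = {}
name, value = parse_assignment(lex("x = 5"))
variables[name] = value  # executing the statement: store 5 in x
print(variables)  # {'x': 5}
```

A fuller interpreter would usually have the parser build a syntax tree and leave execution to a separate step; returning a (name, value) pair is the smallest version of that split.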
All languages need a grammar. A grammar is like a language to describe a language! A context-free grammar is a set of recursive rules, called productions, that describe which strings belong to a language. These grammars are concise and serve as a starting point for designing a programming language. Each production is made of a head and a body, written like this: num: INTEGER. Here the head is num and the body is INTEGER. A terminal is a symbol that appears in the strings generated by the grammar (in an interpreter, these are usually token types). A non-terminal is a placeholder that is eventually replaced by a sequence of terminals and other non-terminals. In the example above, num is a non-terminal and INTEGER is a terminal. The start symbol of a grammar is, by convention, the non-terminal at the head of the first rule.
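To see this vocabulary in action, the sketch below stores a small grammar as Python data and works out its start symbol, non-terminals and terminals. Apart from num: INTEGER, the grammar is a hypothetical example, expr: num PLUS expr | num, where | separates alternative bodies; expr comes first, so it is the start symbol. The match function is a naive backtracking recognizer over token types (plain strings here), not a production parser, and it would recurse forever on a left-recursive rule such as expr: expr PLUS num.

```python
# Each head maps to a list of alternative bodies; each body is a list of symbols.
# Hypothetical grammar:  expr: num PLUS expr | num      num: INTEGER
grammar = {
    "expr": [["num", "PLUS", "expr"], ["num"]],
    "num": [["INTEGER"]],
}


def start_symbol(grammar):
    """By convention, the head of the first rule (dicts keep insertion order)."""
    return next(iter(grammar))


def non_terminals(grammar):
    """Every symbol that appears as the head of some rule."""
    return set(grammar)


def terminals(grammar):
    """Symbols that appear in a body but never as a head."""
    used = {symbol for bodies in grammar.values() for body in bodies for symbol in body}
    return used - set(grammar)


def match(grammar, symbol, tokens, pos):
    """Return every position at which `symbol` can finish matching tokens[pos:]."""
    if symbol not in grammar:
        # Terminal: it must equal the next token type.
        return {pos + 1} if pos < len(tokens) and tokens[pos] == symbol else set()
    ends = set()
    for body in grammar[symbol]:  # try each alternative body for this non-terminal
        positions = {pos}
        for part in body:  # match the body's symbols left to right (recursively)
            positions = {end for p in positions for end in match(grammar, part, tokens, p)}
        ends |= positions
    return ends


def accepts(grammar, tokens):
    """True if the start symbol can derive exactly this sequence of token types."""
    return len(tokens) in match(grammar, start_symbol(grammar), tokens, 0)


print(start_symbol(grammar))                            # expr
print(sorted(terminals(grammar)))                       # ['INTEGER', 'PLUS']
print(accepts(grammar, ["INTEGER", "PLUS", "INTEGER"])) # True
print(accepts(grammar, ["INTEGER", "PLUS"]))            # False
```

Because expr appears in its own body, the rule can match 1 + 2, 1 + 2 + 3 and so on; that self-reference is what "recursive productions" means in practice.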