Scanning Code

I built a scanner as part of building a Lox interpreter. A scanner transforms source code into tokens, which are meaningful units of information for other parts of an interpreter (or compiler).

We can model a scanner as a function, constrained by the rules and design of the programming language:

scanner(sourceCode: string): Token[]

scanner iterates over sourceCode to identify substrings (lexemes) and associate each one with a meaningful category (its token type).

The following code helps illustrate how Lox code is accepted as input, to produce a list of tokens as output:

enum TokenType { VAR, IDENTIFIER, EQUAL /* ... */ }
class Token { type: TokenType; lexeme: string; /* ... */ }

const sourceCode = `var foo = "bar";`
const tokens = scanner(sourceCode)

console.log(tokens)
// console output
[
  Token { type: VAR, lexeme: 'var', // ... },
  Token { type: IDENTIFIER, lexeme: 'foo', // ...  },
  Token { type: EQUAL, lexeme: '=', // ... },
  Token { type: STRING, lexeme: '"bar"', // ... },
  Token { type: SEMICOLON, lexeme: ';', // ... },
  Token { type: EOF, lexeme: '', // ...  }
]

I’ve omitted portions of that code to focus on the core of what scanner does: transforming arbitrary text into meaningful units (more on this later).
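To make that transformation concrete, here’s a minimal, self-contained sketch of a scanner loop. This is an illustration under simplifying assumptions, not the full Lox scanner: it handles only the token types in the example above, skips error handling, and omits line tracking.

```typescript
// Simplified sketch: only the token types needed for `var foo = "bar";`.
enum TokenType {
  VAR = "VAR",
  IDENTIFIER = "IDENTIFIER",
  EQUAL = "EQUAL",
  STRING = "STRING",
  SEMICOLON = "SEMICOLON",
  EOF = "EOF",
}

class Token {
  constructor(readonly type: TokenType, readonly lexeme: string) {}
}

function scanner(sourceCode: string): Token[] {
  const tokens: Token[] = [];
  let current = 0;

  while (current < sourceCode.length) {
    const start = current;
    const c = sourceCode[current++];

    if (c === " " || c === "\t" || c === "\n") {
      continue; // whitespace produces no token
    } else if (c === "=") {
      tokens.push(new Token(TokenType.EQUAL, "="));
    } else if (c === ";") {
      tokens.push(new Token(TokenType.SEMICOLON, ";"));
    } else if (c === '"') {
      // Consume up to the closing quote; the lexeme keeps both quotes.
      while (current < sourceCode.length && sourceCode[current] !== '"') current++;
      current++; // step past the closing quote
      tokens.push(new Token(TokenType.STRING, sourceCode.slice(start, current)));
    } else if (/[A-Za-z_]/.test(c)) {
      // Consume the rest of the word, then check for reserved words.
      while (current < sourceCode.length && /[A-Za-z0-9_]/.test(sourceCode[current])) current++;
      const lexeme = sourceCode.slice(start, current);
      tokens.push(new Token(lexeme === "var" ? TokenType.VAR : TokenType.IDENTIFIER, lexeme));
    }
  }

  tokens.push(new Token(TokenType.EOF, ""));
  return tokens;
}

const out = scanner(`var foo = "bar";`);
console.log(out.map((t) => `${t.type} ${JSON.stringify(t.lexeme)}`).join("\n"));
```

The one-character-at-a-time loop with a `start`/`current` pair is the same shape the book's scanner uses; the real version adds a lookup table of keywords, number literals, multi-character operators, and error reporting.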

On using TypeScript instead of Java

I haven’t used Java in a while, and I use TypeScript almost every day. Using a different language also helped make sure I wasn’t fooling myself by copying the book’s code verbatim; thinking about how to translate between the two languages was a huge boost to understanding the material.