Calculator

Tips for navigating the slides:
  • Press O or Escape for overview mode.
  • Press the copy icon on the upper right of code blocks to copy the code

Class outline:

  • Programming languages
  • Parsing a language
  • The Calculator language
  • Evaluating a language
  • Interactive interpreters

Programming languages

Levels of languages

High-level programming language
(Python, C++, JavaScript)

Assembly language
(Hardware-specific)

Machine language
(Hardware-specific)

Machine language

The language of the machine is all 1s and 0s, often specifying the action and the memory address to act on:


                    10111001 00000000 11001010 10011010 00111011 ; Fib(ecx) loop counter
                    10001001 11001101
                    10111011 00000000 11111100 11111111 11111111
                    00100001 11011100 ; alignment necessary for convenient pointer-reset
                    00000001 11011100
                    10001101 00111100 00011100
                    11111001 ; starting conditions: both buffers zeroed, but carry-in = 1
                    10110011 01110010
                    01011000
                    00010011 00000100 00010111 ; read from a potentially-offset location (but still store to the front)
                    10001101 10110000 00000000 00110110 01100101 11000100
                    00111001 11000101 ; sets CF when (base-1) < eax.  i.e. when eax>=base
                    00001111 01000010 11000110 ; eax %= base, keeping it in the [0..base) range
                    10101011 ; Skylake: 3 uops.  Like add + non-micro-fused store.  32,909Mcycles for 100M iters (with lea/cmp, not sub/cmc)
                    11111110 11001011 ; preserves CF.  The resulting partial-flag merge on ADC would be slow on pre-SnB CPUs
                    01110101 11101100
                    00001111 10010010 11000010
                    10001001 00010111 ; store the carry-out into an extra limb beyond limbcount
                    11000001 11100010 00000010
                    10001001 11100000 ; test/setnz could work, but only saves a byte if we can somehow avoid the  or dl,al
                    00100100 00000100
                    10000111 11111100 ; only works if limbcount is even, otherwise we'd need to subtract limbcount first.
                    00100001 11011100 ; -1024 ; revert to start of buffer, regardless of offset
                    00100001 11011111
                    00000001 11010100 ; read offset in src
                    00001000 11000010 ; also set r8d if we had a source offset last time, to handle the 2nd buffer
                    11100010 11010010 ; Maybe 0.01% slower than dec/jnz overall
                    10001101 01101001 00001010
                    10110001 01110010
                    01011000 ; lodsd
                    10110011 00001001
                    10011001 ; edx=0 because eax can't have the high bit set
                    11110111 11110101 ; edx=remainder = low digit = 0..9.  eax/=10
                    10000000 11000010 00110000
                    01001111 ; store digits in MSD-first printing order, working backwards from the end of the string
                    10001000 00010111
                    11111110 11001011
                    01110101 11110011
                    11100010 11101110
                    00101001 11011010 ; edx = 1024 + 0..9 (leading digit).  +0 in the Fib(10**9) case
                    10110000 00000100 ; SYS_write
                    10001101 01011000 11111101 ; fd=1
                    10001101 01001111 00000001 ; Hard-code for Fib(10**9), which has one leading zero in the highest limb.
                    11001101 10000000 ; write(1, buf+1, 1024)
                    10001001 11011000 ; SYS_exit=1
                    11001101 10000000 ; exit(ebx=1)
                    

Code is executed directly by the hardware.

Credit Peter Cordes

Assembly language

Assembly language was introduced for (slightly) easier programming. It maps directly to machine code.

Machine code Assembly code

                                    10111001 00000000 11001010 10011010 00111011
                                    10001001 11001101
                                    10111011 00000000 11111100 11111111 11111111
                                    00100001 11011100
                                    00000001 11011100
                                    10001101 00111100 00011100
                                    11111001


                                    10110011 01110010

                                    01011000
                                    00010011 00000100 00010111
                                    

                                    mov    ecx, 1000000000
                                    mov    ebp, ecx
                                    mov    ebx, -1024
                                    and    esp, ebx
                                    add    esp, ebx
                                    lea    edi, [esp + ebx*1]
                                    stc
                                    .fibonacci:
                                    limbcount equ 114
                                    mov    bl, limbcount
                                    .digits_add:
                                    pop    eax
                                    adc    eax, [edi + edx*1]
                                    

Higher-level languages

Higher level languages:

  • provide means of abstraction such as naming, function definition, and objects
  • abstract away system details to be independent of hardware and operating system

                    if x > y:
                        z = 2
                    else:
                        z = 3
                    

Statements & expressions are either interpreted by another program or compiled (translated) into a lower-level language.

Compiled vs. interpreted

When a program is compiled, the source code is translated into machine code, and that code can be distributed and run repeatedly.

Source code → Compiler → Machine code → Output

When a program is interpreted, an interpreter runs the source code directly (without compiling it first).

Source code → Interpreter → Output

In its most popular implementation (CPython), Python programs are interpreted but have a compile step:

Source code → Compiler → Bytecode → Virtual Machine → Output

Phases of an interpreter/compiler

In order to either interpret or compile source code, a program must be written that understands that source code.

Typical phases of understanding:

Source code → Lexing → Parsing → Abstract Syntax Tree

Lexing & Parsing

Reading Scheme Lists

A Scheme list is written as elements in parentheses:


                    (<element_0> <element_1> ... <element_n>)
                    

Each <element> can be a combination or primitive.


                    (+ (* 3 (+ (* 2 4) (+ 3 5))) (+ (- 10 7) 6)) 
                    

The task of parsing a language involves turning a string representation of an expression into a structured object representing the expression.

Parsing

A parser takes text and returns an expression object.

Text Lexical
Analysis
Tokens Syntactic
Analysis
Expression
'(+ 1' '(', '+', 1 Pair('+', Pair(1, ...))

printed as

(+ 1 (- 23) (* 4 5.6))
' (- 23)' '(', '-', 23, ')'
' (* 4 5.6))' '(', '*', 4, 5.6, ')', ')'

Lexical analysis

' (* 4 5.6))''(', '*', 4, 5.6, ')', ')'
  • Iterative process
  • Checks for malformed tokens
  • Determines types of tokens
  • Processes one line at a time

Syntactic analysis

'(', '+', 1, ... → Pair('+', Pair(1, ...))
  • Tree-recursive process
  • Balances parentheses
  • Returns tree structure
  • Processes multiple lines

In scheme_reader.py, each call to scheme_read consumes the input tokens for exactly one expression.

  • Base case: symbols and numbers
  • Recursive case: read subexpressions and combine them

Pair class

The Pair class represents Scheme pairs and lists. A list is a pair whose second element is either pair or a list.


                    class Pair:


                    

                    s = Pair(1, Pair(2, Pair(3, nil)))
                    print(s)  # (1 2 3)
                    len(s)    # 3
                    

Improper lists:


                    print(Pair(1, 2))  # (1 . 2)
                    print(Pair(1, Pair(2, 3))) # (1 2 . 3)
                    len(Pair(1, Pair(2, 3))) Error!
                    

The Calculator Language

What's in a language?

A programming language has:

  • Syntax: The legal statements and expressions in the language
  • Semantics: The execution/evaluation rule for those statements and expressions

To create a new programming language, you either need a:

  • Specification: A document describe the precise syntax and semantics of the language
  • Canonical Implementation: An interpreter or compiler for the language

Calculator language syntax

The Calculator language has primitive expressions and call expressions. (That's it!)

A primitive expression is a number: 2 -4 5.6

A call expression is a combination that begins with an operator (+, -, *, /) followed by 0 or more expressions:
(+ 1 2 3) (/ 3 (+ 4 5))

Expression Expression tree Representation as pairs

                                    (* 3
                                        (+ 4 5)
                                        (* 6 7 8))
                                    
Diagram of expression tree for calculator expression Diagram of Pairs for Calculator expression

Calculator language semantics

The value of a calculator expression is defined recursively.

  • Primitive: A number evaluates to itself.
  • Call: A call expression evaluates to its argument values combined by an operator.
    • +: Sum of the arguments
    • *: Product of the arguments
    • -: If one argument, negate it. If more than one, subtract the rest from the first.
    • /: If one argument, invert it. If more than one, divide the rest from the first.
Expression Expression Tree

                                    (+ 5
                                        (* 2 3)
                                        (* 2 5 5))
                                    
Diagram of evaluated expression tree for Calculator expression

Evaluation

The eval function

The eval function computes the value of an expression.

It is a generic function that behaves according to the type of the expression (primitive or call).

Implementation Language semantics

                                    def calc_eval(exp):
                                        if isinstance(exp, (int, float)):
                                            return exp
                                        elif isinstance(exp, Pair):
                                            if isinstance(exp.first, Pair):
                                                return Pair(calc_eval(exp.first), exp.rest.map(calc_eval))
                                            else:
                                                arguments = exp.rest.map(calc_eval)
                                                return calc_apply(exp.first, arguments)
                                        else:
                                            raise TypeError
                                    

A number evaluates...
to itself

A call expression evaluates...
to its argument values combined by an operator

Applying built-in operators

The apply function applies some operation to a (Scheme) list of argument values

In calculator, all operations are named by built-in operators: +, -, *, /

Implementation Language semantics

                                        def calc_apply(operator, args):
                                            if operator == '+':
                                                return reduce(add, args, 0)
                                            elif operator == '-':
                                                ...
                                            elif operator == '*':
                                                ...
                                            elif operator == '/':
                                                ...
                                            else:
                                            raise TypeError
                                        

+
Sum of the arguments

-
...

*
...

/
...

Interactive interpreters

REPL: Read-Eval-Print Loop

The user interface for many programming languages is an interactive interpreter

  • Print a prompt
  • Read text input from the user
  • Parse the text input into an expression
  • Evaluate the expression
  • If any errors occur, report those errors, otherwise
  • Print the value of the expression and repeat

Raising exceptions

Exceptions can be raised during lexical analysis, syntactic analysis, eval, and apply.

Example exceptions

  • Lexical analysis: The token 2.3.4 raises ValueError("invalid numeral")
  • Syntactic analysis: An extra ) raises SyntaxError("unexpected token")
  • Eval: An empty combination raises TypeError("() is not a number or call expression")
  • Apply: No arguments to - raises TypeError("- requires at least 1 argument")

Handling exceptions

An interactive interpreter prints information about each error.

A well-designed interactive interpreter should not halt completely on an error, so that the user has an opportunity to try again in the current environment.