High-level programming language
(Python, C++, JavaScript)
⬇
Assembly language
(Hardware-specific)
⬇
Machine language
(Hardware-specific)
The language of the machine is all 1s and 0s, often specifying the action and the memory address to act on:
10111001 00000000 11001010 10011010 00111011 ; Fib(ecx) loop counter
10001001 11001101
10111011 00000000 11111100 11111111 11111111
00100001 11011100 ; alignment necessary for convenient pointer-reset
00000001 11011100
10001101 00111100 00011100
11111001 ; starting conditions: both buffers zeroed, but carry-in = 1
10110011 01110010
01011000
00010011 00000100 00010111 ; read from a potentially-offset location (but still store to the front)
10001101 10110000 00000000 00110110 01100101 11000100
00111001 11000101 ; sets CF when (base-1) < eax. i.e. when eax>=base
00001111 01000010 11000110 ; eax %= base, keeping it in the [0..base) range
10101011 ; Skylake: 3 uops. Like add + non-micro-fused store. 32,909Mcycles for 100M iters (with lea/cmp, not sub/cmc)
11111110 11001011 ; preserves CF. The resulting partial-flag merge on ADC would be slow on pre-SnB CPUs
01110101 11101100
00001111 10010010 11000010
10001001 00010111 ; store the carry-out into an extra limb beyond limbcount
11000001 11100010 00000010
10001001 11100000 ; test/setnz could work, but only saves a byte if we can somehow avoid the or dl,al
00100100 00000100
10000111 11111100 ; only works if limbcount is even, otherwise we'd need to subtract limbcount first.
00100001 11011100 ; -1024 ; revert to start of buffer, regardless of offset
00100001 11011111
00000001 11010100 ; read offset in src
00001000 11000010 ; also set r8d if we had a source offset last time, to handle the 2nd buffer
11100010 11010010 ; Maybe 0.01% slower than dec/jnz overall
10001101 01101001 00001010
10110001 01110010
01011000 ; lodsd
10110011 00001001
10011001 ; edx=0 because eax can't have the high bit set
11110111 11110101 ; edx=remainder = low digit = 0..9. eax/=10
10000000 11000010 00110000
01001111 ; store digits in MSD-first printing order, working backwards from the end of the string
10001000 00010111
11111110 11001011
01110101 11110011
11100010 11101110
00101001 11011010 ; edx = 1024 + 0..9 (leading digit). +0 in the Fib(10**9) case
10110000 00000100 ; SYS_write
10001101 01011000 11111101 ; fd=1
10001101 01001111 00000001 ; Hard-code for Fib(10**9), which has one leading zero in the highest limb.
11001101 10000000 ; write(1, buf+1, 1024)
10001001 11011000 ; SYS_exit=1
11001101 10000000 ; exit(ebx=1)
Code is executed directly by the hardware.
Assembly language was introduced for (slightly) easier programming. It maps directly to machine code.
Machine code | Assembly code |
---|---|
|
|
Higher level languages:
if x > y:
z = 2
else:
z = 3
Statements & expressions are either interpreted by another program or compiled (translated) into a lower-level language.
When a program is compiled, the source code is translated into machine code, and that code can be distributed and run repeatedly.
Source code → Compiler → Machine code → Output
When a program is interpreted, an interpreter runs the source code directly (without compiling it first).
Source code → Interpreter → Output
In its most popular implementation (CPython), Python programs are interpreted but have a compile step:
Source code → Compiler → Bytecode → Virtual Machine → Output
In order to either interpret or compile source code, a program must be written that understands that source code.
Typical phases of understanding:
Source code → Lexing → Parsing → Abstract Syntax Tree
A Scheme list is written as elements in parentheses:
(<element_0> <element_1> ... <element_n>)
Each <element> can be a combination or primitive.
(+ (* 3 (+ (* 2 4) (+ 3 5))) (+ (- 10 7) 6))
The task of parsing a language involves turning a string representation of an expression into a structured object representing the expression.
A parser takes text and returns an expression object.
Text | Lexical Analysis | Tokens | Syntactic Analysis | Expression |
---|---|---|---|---|
'(+ 1'
| → | '(' , '+' , 1
| → | Pair('+', Pair(1, ...))
printed as (+ 1 (- 23) (* 4 5.6)) |
' (- 23)'
| → | '(' , '-' , 23 , ')'
| ||
' (* 4 5.6))'
| → | '(' , '*' , 4 , 5.6 , ')' , ')'
|
' (* 4 5.6))'
→ '('
, '*'
, 4
, 5.6
, ')', ')'
'('
, '+'
, 1
, ... → Pair('+', Pair(1, ...))
In scheme_reader.py, each call to scheme_read
consumes the input tokens for exactly one expression.
The Pair class represents Scheme pairs and lists. A list is a pair whose second element is either pair or a list.
class Pair:
s = Pair(1, Pair(2, Pair(3, nil)))
print(s) # (1 2 3)
len(s) # 3
Improper lists:
print(Pair(1, 2)) # (1 . 2)
print(Pair(1, Pair(2, 3))) # (1 2 . 3)
len(Pair(1, Pair(2, 3))) Error!
A programming language has:
To create a new programming language, you either need a:
The Calculator language has primitive expressions and call expressions. (That's it!)
A primitive expression is a number: 2
-4
5.6
A call expression is a combination that begins with an operator (+
, -
, *
, /
) followed by 0
or more expressions:
(+ 1 2 3)
(/ 3 (+ 4 5))
Expression | Expression tree | Representation as pairs |
---|---|---|
|
The value of a calculator expression is defined recursively.
+
: Sum of the arguments
*
: Product of the arguments
-
: If one argument, negate it. If more than one, subtract the rest from the first.
/
: If one argument, invert it. If more than one, divide the rest from the first.
Expression | Expression Tree |
---|---|
|
The eval function computes the value of an expression.
It is a generic function that behaves according to the type of the expression (primitive or call).
Implementation | Language semantics |
---|---|
|
A number evaluates... A call expression evaluates... |
The apply function applies some operation to a (Scheme) list of argument values
In calculator, all operations are named by built-in operators: +, -, *, /
Implementation | Language semantics |
---|---|
|
|
The user interface for many programming languages is an interactive interpreter
Exceptions can be raised during lexical analysis, syntactic analysis, eval, and apply.
Example exceptions
ValueError("invalid numeral")
SyntaxError("unexpected token")
TypeError("() is not a number or call expression")
-
raises TypeError("- requires at least 1 argument")
An interactive interpreter prints information about each error.
A well-designed interactive interpreter should not halt completely on an error, so that the user has an opportunity to try again in the current environment.