Lab 16 - Parsing

Due by 11:59pm on 2024-03-14.

Starter Files

Download lab16.zip. Inside the archive, you will find starter files for the questions in this lab.

Topics

Calculator Language

In Project 3 you will build an interactive interpreter for the Calculator language.

The interpreter will allow the user to enter valid calculator expressions and evaluate them. For example:

calc >> (+ 3 4)
7
calc >> (+ 3 (+ 2 2))
7

To do that, we must first review the calculator language.

The Calculator language has primitive expressions, such as 2, -4, 5.6, and call expressions.

A call expression is a combination that begins with an operator (+, -, *, /) followed by 0 or more expressions. For example:

(+ 1 2)
(+ 1 2 3)
(/ 3 (+ 4 5))

Notice that the operator comes first and the operands follow. (So, instead of having (3 + 4), we have (+ 3 4).) Additionally, the operands can be another call expression.

The rules of evaluation are as follows:

Evaluate the operator/function (in other words, what function is it?)
Evaluate the operands
- If an operand is another call expression, then evaluate that call expression. For example, if the expression is (/ 3 (+ 4 5)), evaluate (+ 4 5) to get 9, then do (/ 3 9) to get 0.33
Apply the function to the operands and achieve the result

Using these rules of evaluation on the previous examples gives us,

(+ 1 2) = 3
(+ 1 2 3) = 6
(/ 3 (+ 4 5)) = 0.33

Notice that the expression (+ 1 2 3) evaluates to $(1+2)+3 = 6$. Likewise, (/ 20 5 4) evaluates to $(20 / 5) / 4 = 1$.

Practice

Evaluate the following calculator expressions (it may be useful to go to the expression tree segment if you are struggling).

(+ 3 2)
(* (- 8 4) 4)
(/ (+ 15 5) (* 2 2))
(+ 1 2 3 4 5)
(+ (- 10 2 2) (* (+ 3 2) 4))

Answers:

5
- Add 3 to 2
16
- Multiply (- 8 4) or 4 by 4
5
- Divide (+ 15 5) or 20 by (* 2 2) or 4
15
- (((1 + 2) + 3) + 4) + 5 = 15
26
- Add (- 10 2 2) or 6 to (* (+ 3 2) 4) or 20

Expression Tree

Expression Trees allow us to break down complicated calculator expressions. If we have a call expression, we represent that as a box with edges or lines leading to the operator/function and the operands. If an operand is another call expression, then we represent that as a box with its own edges.

For example, if the expression is (+ 2 2), we diagram that like so:

Additionally, if the expression is (* 3 (+ 4 5) (* 6 7 8)):

Representing Expression Trees

We can represent expression trees using slightly different version of a linked list. Instead of using the Link class, we will be using the Pair class to construct a pair list of the operators and operands. Like Link, every Pair has a .first and .rest attribute; however, all Pair lists end with nil. For example, Pair(2, nil) or Pair(1, Pair(2, nil)). Representing expressions as pair lists will make it easier to evaluate the expressions later down the road.

If necessary, refer to pair.py for the implementation for Pair and nil

To demonstrate how to represent expressions as pair lists, look at the following examples that convert a calculator expression to a tokens list and then to a pair list:

(+ 1 1)  ->  ['(', '+', '1', '1', ')']  ->  Pair('+', Pair(1, Pair(1, nil)))
#                                                      pair list of operands

(+ 1.5 1.5)  ->  ['(', '+', '1.5', '1.5', ')']  ->  Pair('+', Pair(1.5, Pair(1.5, nil)))

Notice that at the start of an expression, we immediately start with a Pair followed by the operator and a Pair list of operands where each .first value is an operand. Additionally, each operand is converted to a float or int accordingly. In the context of converting the python list to a pair list, the open parenthesis ( signifies the start of a pair list and the closing parenthesis ) signifies the end of the pair list or nil.

In the case that there is an operand is another expression, we represent that with another pair list. For example:

(* (+ 1 1) 4) --> Pair('*', Pair(Pair('+', Pair(1, Pair(1, nil))), Pair(4, nil)))
                                 ^-----------------------------^
                                             (+ 1 1)

For Q2, here's the algorithm for converting a python list of tokens to a Pair list. It uses a recursive function, called parse_tokens() that has two parameters: a list containg the tokens and an index value.

If the token at the provided index is an open parenthesis '('
1. The next token should be an operator. Store the operator in a variable.
2. If the index is not zero, then the current tokens is a start of a sub-expression, so...
  1. Call parse_tokens() by passing in the list of tokens tokens with the index incremented by 2 (this puts the token after the operator as the next one to be processed). The recursive call will return two values: a new pair list and a new updated index. Save the new pair list in a variable and update index to equal the updated index returned by the function.
  2. Reassign the variable you stored the operator in step 1.1 to a Pair object with the operator as .first and the new pair list (returned in step 1.2.1) as .rest
3. If the index of the current token is 0
  1. increment the index by 2
4. After those conditions are checked call parse_tokens() with the token list tokens and index. Save the new pair list in a variable and update index to new index returned by the function. This function call will either return the rest of the syntax tree if the input index was 0 or nil if it wasn't.
5. Return a Pair object with the variable holding the operator from 1.1 (which was possibly updated in 1.2.2) as the .first and the pair list returned in step 1.4 as the .rest along with the index returned in 1.4 (in that order).
If the token is a close parenthesis ')'
1. return a nil object and the index incremented by 1.
All other tokens should be operands for expression as integers or floating point numbers. Do this entire part in a try/except block that raises a TypeError if converting the current token to float or integer fails.
1. If the current token has a decimal point ., convert it to a float, and store it in a variable.
2. If the current token does not have a decimal point, convert it to an int, and store it in a variable.
3. Call parse_tokens() with the token list tokens and the index incremented by 1 to process the next token. Save the new pair list and update index to the new index returned.
4. Return a Pair object with the variable created in 3.1 or 3.2 as .first and the pair object returned in step 3.3 as .rest along with the index returned in step 3.3.

Required Question

Q1: Tokenize

To evaluate user input, our program needs to format that input in such a way that make it easier to evaluate. Build a tokenize() function that takes a string expression like the ones above and returns a list where each item in the list is a parenthesis, one of the four operators, or a number literal.

def tokenize(expression):
    """ Takes a string and returns a list where each item
    in the list is a parenthesis, one of the four operators (/, *, -, +),
    or a number literal.
    >>> tokenize("(+ 3 2)")
    ['(', '+', '3', '2', ')']
    >>> tokenize("(- 9 3 3)")
    ['(', '-', '9', '3', '3', ')']
    >>> tokenize("(+ 10 100)")
    ['(', '+', '10', '100', ')']
    >>> tokenize("(+ 5.5 10.5)")
    ['(', '+', '5.5', '10.5', ')']
    >>> expr = "(* (- 8 4) 4)"
    >>> tokenize(expr)
    ['(', '*', '(', '-', '8', '4', ')', '4', ')']
    >>> expr = "(* (- 6 8) (/ 18 3) (+ 10 1 2))"
    >>> tokenize(expr)
    ['(', '*', '(', '-', '6', '8', ')', '(', '/', '18', '3', ')', '(', '+', '10', '1', '2', ')', ')']
    """
    # Write your code here

Hint: The .split() or .replace() methods may be helpful here.

Q2: Parse Tokens

You will be writing this function in Homework 6, so if you have time get started on it in the lab!

Start implementing the parse_tokens() function that takes a list of tokens and an index and converts the list of tokens to a Pair list. Refer to the steps above for instructions how to implement it.

def parse_tokens(tokens, index):
    """ Takes a list of tokens and an index and converts the tokens to a Pair list

    >>> parse_tokens(['(', '+', '1', '1', ')'], 0)
    (Pair('+', Pair(1, Pair(1, nil))), 5)
    >>> parse_tokens(['(', '*', '(', '-', '8', '4', ')', '4', ')'], 0)
    (Pair('*', Pair(Pair('-', Pair(8, Pair(4, nil))), Pair(4, nil))), 9)
    """
    # Write your code here

Submit

Submit the lab16.py file on Canvas to Gradescope in the window on the assignment page.