Homework 6 - Parsing

Due by 11:59pm on 2023-11-08.

Objectives

  • Build an expression tree from a parsed calculator input expression

Starter Files

Download hw06.zip. Inside the archive, you will find the pair.py library and test files for this homework.

Introduction

In this homework, you'll build off of your work from Lab 16 and write a parse_tokens() function that takes as input the tokenized list of symbols that the tokenize() function you wrote generates and parses that into an expression tree that represents the mathematical expression the user input. Project 3 will then use these functions to build a full interpreter for the Calculator language.

Representing Expression Trees

This section is a review from lab 16.

We can represent expression trees using slightly different version of a linked list. Instead of using Link, we will be using Pair to construct a pair list of the operators and operands. Like Link, every Pair has a first and rest attribute; however, all Pair lists end with nil. For example, Pair(2, nil) or Pair(1, Pair(2, nil)). Representing expressions as pair lists will make it even easier to evaluate the expressions later down the road.

If necessary, refer to pair.py for the implementation for Pair and nil

To demonstrate how to represent expressions as pair lists, look at the following examples:

(+ 1 1) --> ['(', '+', '1', '1', ')'] --> Pair('+', Pair(1, Pair(1, nil)))
(+ 1.5 1.5) --> ['(', '+', '1.5', '1.5', ')'] --> Pair('+', Pair(1.5, Pair(1.5, nil)))

Notice that at the start of an expression, we immediately start with a Pair followed by the operator and a Pair list of operands where each .first value is an operand. Additionally, each operand is converted to a float or int accordingly. In the context of converting the python list to a pair list, the open parenthesis ( signifies the start of a pair list and the closing parenthesis ) signifies the end of the pair list or nil.

In the case that there is an operand is another expression, we represent that with another pair list. For example:

(* (- 8 4) 4) --> Pair('*', Pair(Pair('-', Pair(8, Pair(4, nil))), Pair(4, nil)))
^
(- 8 4)

Here's the algorithm for converting a python list of tokens to a Pair list. It uses a recursive function that has two parameters, the token list and an index value:

  1. If the token at the provided index is an open parenthesis '('?
    1. The next token should be an operator. Store the operator in a variable.
    2. If the index is not zero, then the current tokens is a start of a sub-expression, so...
      1. Call parse_tokens() by passing in the list of tokens with the index incremented by 2 (this puts the token after the operator as the next one to be processed). The recursive call will return two values: a new pair list and a new index respectively. Capture the new pair list returned by the function in a new variable and update index to the new index returned by the function.
      2. Set the variable you stored the operator in step 1.1 to a Pair object with the operator as .first and the new pair list (returned in step 1.2.1) as .rest
    3. If the index of the current token is 0, increment the index by 2.
    4. Call parse_tokens() with the token list and index. Update the index to new index returned by the function and capture the new pair list returned in a new variable. This function call will either return the rest of the syntax tree if the input index was 0 or nil if it wasn't.
    5. Return a Pair object with the variable from 1.1 (possibly updated in 1.2.2) as the .first and the pair list returned in step 1.4 as the .rest along with the index returned in 1.4 (in that order).
  2. If the token is a close parenthesis ')', return a nil object and the index incremented by 1.
  3. Everything else should be operands and should be integers or floating point numbers. Do this entire part in a try/except block that raises a TypeError if converting the current token to float or integer fails.
    1. If the current token has a decimal point, convert it to a float, and store it in a variable.
    2. If the current token does not have a decimal point, convert it to an int, and store it in a variable.
    3. Call parse_tokens() with the token list and the index incremented by 1 to process the next token. Capture the new pair list and update the index to the new index returned.
    4. Return a Pair object with the variable created in 3.1 or 3.2 as .first and the pair object returned in step 3.3 as .rest along with the index returned in step 3.3.

Part 1 - Getting started

Task 1 - Set up

Start by creating a new calculator.py file to hold your code in. Copy your tokenize() function from lab 16 into this file. You'll also want to import the Pair class and nil object from the pair.py file that was included in the zip file.

from pair import Pair, nil

Next, create a parse_tokens() function that takes a list of tokens and an index as input parameters:

def parse_tokens(tokens, index):
""" Takes a list of tokens and an index and converts the tokens to a Pair list

>>> parse_tokens(['(', '+', '1', '1', ')'], 0)
(Pair('+', Pair(1, Pair(1, nil))), 5)
>>> parse_tokens(['(', '*', '(', '-', '8', '4', ')', '4', ')'], 0)
(Pair('*', Pair(Pair('-', Pair(8, Pair(4, nil))), Pair(4, nil))), 9)
"""

# Write your code here

The calculator.py file will be the basis of Project 3 as well.

Task 2 - Write some tests

While not necessary, it will help you in your debugging if you have some test cases generated that you could use to verify that your code is creating the correct output. We've provided some with the zip file, but you should add a few of your own. You could write these as either doctests or pytest functions. Think about some example input expressions and what the generated tree should look like for those expressions. Write the tests to verify that you get what you expect. Now you're ready to start coding

Task 3 - Implement parse_tokens()

With the list of tokens from our tokenize() function, we need to convert that into a form we can evaluate. You are going to build a syntax tree using the Pair class (which we've provided you in the the pair.py file). Remember that each Pair has a data value called first and a link to the rest of the structures called rest. first can either be an operator, a value, or another Pair object that contains a sub-expression. rest is either a Pair object containing the next item in the expression or a nil object (also provided in pair.py) that signifies that this is the end of the expression. Our task is to convert our token list into that syntax tree.

The full algorithm for this is above in the Representing Expression Trees section. If at any point, something is not correct, for example a string where there should be numbers, an invalid operator, or extra or nor parentheses, you should have your function throw an exception reporting the error.

Since our syntax tree is a recursive structure, you'll be following a recursive process to build it as described above and you will need to encapsulate parsing logic into a recursive function (the parse_tokens() function). This function will return either a Pair or nil object and the index of the next token to process. The resulting Pair object will be the head of a linked list of Pair objects representing the syntax tree of the expression.

Before you start coding, review the algorithm to be sure you understand what is happening. You might try running the algorithm by hand on some simple expressions to make sure you know how it works. Once you understand it, add the code to your function to actually implement the algorithm. If you started the parse_tokens() function in Lab 16, you've already started on this. Copy your code over and continue from where you left off.

Turn in your work

You'll submit your calculator.py file on Canvas via Gradescope where it will be checked via the auto grader. Make sure that you haven't "hard coded" anything specific to the test data we gave you. We do not guarantee that all scenarios are tested by the test data that we have provided you.

© 2023 Brigham Young University, All Rights Reserved