Lab 12 - Testing

Due by 11:59pm on 2024-02-27.

Starter Files

Download lab12.zip. Inside the archive, you will find starter files for the questions in this lab.

Topics

Pytest

If you ever want to use your code for something important, like medical software, you need to make sure it works properly. To ensure it works, we use testing. There are several libraries and modules you can use to test your code efficiently. For this class, we are going to use pytest.

Installing Pytest

To install pytest, run one of the following:

pip install pytest

python3 -m pip install pytest

To make sure you installed it correctly, run one of the following:

pytest -h

python3 -m pytest -h

You can uninstall a Python library by typing pip uninstall <library name> or python3 -m pip uninstall <library name> into the terminal.

BYU Pytest Utils

The autograder uses pytest with an extra library containing extra testing utility tools. To run the autograder's tests locally, install the BYU pytest utils library if you have not already:

pip install byu_pytest_utils

python3 -m pip install byu_pytest_utils

Using Pytest

Let's say we are trying to test our functions square() and find_factors() in example.py:

def square(x):
    return x * x

def find_factors(n):
    factors = []
    for i in range(1, n):
        if n % i == 0:
            factors.append(i)

    return factors

We can create and run our tests using pytest by writing test functions in example.py that check if the output matches what we expect. To do this, we use the assert statement. It takes a condition that should be True; if the condition is False, it raises an AssertionError. Optionally, you can provide an error message to display when the assertion fails:

assert <condition>, "<some error message>"

For example, the test_square() and test_find_factors() functions below use assert statements:

def test_square():
    assert square(4) == 16
    assert square(0) == 0
    assert square(1/2) == 0.25, "The square of 1/2 is not 0.25"

def test_find_factors():
    assert find_factors(15) == [1,3,5,15]
    assert find_factors(20) == [1,2,4,5,10,20]

Notice that the name of each test function starts with test_, written here as test_* where the * stands for any sequence of valid identifier characters. For pytest to recognize that these functions are meant to verify our code, each function name must start with test_. At this point, we can type one of the following commands into the terminal:

pytest example.py

python3 -m pytest example.py

and it will run all the functions in example.py that start with test_* without us ever needing to call them in our code.

The output (a screenshot on the original lab page, not reproduced here) reports that test_square passes and test_find_factors fails.

We can see that find_factors() does not work.
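The failure can be traced to the loop bound: range(1, n) stops at n - 1, so n itself is never checked as a factor. Below is a minimal sketch of one possible fix, assuming the intended behavior is the one the tests expect (n counts as its own factor):

```python
def find_factors(n):
    """Return every positive factor of n, including n itself."""
    factors = []
    # range() excludes its stop value, so use n + 1 to reach n.
    for i in range(1, n + 1):
        if n % i == 0:
            factors.append(i)
    return factors
```

With this change, both assertions in test_find_factors() should pass.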

If we want to display more about the failed test case, we run pytest with the -v option.

pytest example.py -v

python3 -m pytest example.py -v

Additionally, if we only want to run one test function from a test file, we can follow the format pytest <test_file.py>::<test_function>

pytest example.py::test_square

python3 -m pytest example.py::test_square

Note: If you're having trouble running pytest, try each of the terminal commands. It's possible that only one of the commands works on your operating system.

Organization with Test Files

When there is a large amount of code in one file, it is worth moving our tests into a different file for better organization. We will move all our tests into test_example.py and import the functions from example.py:

from example import * # from example.py import everything

def test_square():
    assert square(4) == 16
    assert square(0) == 0
    assert square(1/2) == 0.25

def test_find_factors():
    assert find_factors(15) == [1,3,5,15]
    assert find_factors(20) == [1,2,4,5,10,20]

An additional benefit of doing this is that running pytest without specifying a file will cause pytest to automatically run all files in the format test_*.py or *_test.py in the current directory and subdirectories.

Running the command

pytest

python3 -m pytest

will make pytest automatically run test_example.py in this case.

Writing Tests

A very common problem in testing is figuring out how many tests you need and what inputs to test with to verify that some code works properly. On one hand, a programmer can write hundreds of tests for some code to verify it works properly at the tradeoff of the tests taking a lot of resources to execute. On the other, not having enough tests will not take a lot of resources to execute, but may fail to verify that the code works properly. One wants to find a good middle ground, and to do so requires critical thinking about what the code is intended to do. In black-box testing, a programmer only thinks about the code's inputs and the outputs and does not consider the underlying implementation.

For example, the test_square() function follows black-box testing. It provides the square() function with the argument 4 and expects 16. It does not care whether the function uses addition, multiplication, or anything else to get the right output -- just as long as the function provides the right output.

Figuring out what types of inputs to test a function with is also a hard problem that requires critical thinking about the code you are testing. Generally, there are three categories of inputs you want to consider: invalid cases, valid cases, and border cases. We will use the following specification for a factorial function as an example:

The factorial is the product of all positive integers less than or equal to n. Computing the factorial of a negative number is not valid.

Valid Cases:

These are the scenarios where the code is given a good input and executes properly. Here, this is the case where n is positive and the function returns a number. There should be at least two tests ensuring that, given n, the function provides the correct output. For example, 3! should return 6, and 5! should return 120.

Invalid Cases:

These are the scenarios where the code is given a bad input and fails. Here, this is the case where n is negative and the function raises an error. There should be a test ensuring that an exception is thrown.

Border Cases:

These are cases where behavior changes based on some condition or boundary in the code. For example, there is a boundary at zero. If n is less than zero, then an exception is thrown. If n is greater than or equal to zero, then an answer is computed. There should be three tests ensuring proper behavior when n is equal to -1, 0, and 1.
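The three categories above can be sketched as tests for the factorial specification. This is only an illustration under stated assumptions: the factorial function below is a hypothetical implementation, ValueError is an assumed choice of exception (the specification only says an exception is thrown), and 0! is taken to be 1 by the usual convention. It uses pytest.raises, which is covered under Features of Pytest.

```python
import pytest


def factorial(n):
    """Hypothetical implementation of the factorial specification."""
    if n < 0:
        # Assumed exception type; the specification does not name one.
        raise ValueError("factorial is not defined for negative numbers")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def test_factorial_valid():
    # Valid cases: good inputs with known answers.
    assert factorial(3) == 6
    assert factorial(5) == 120


def test_factorial_invalid():
    # Invalid case: a clearly negative input must raise.
    with pytest.raises(ValueError):
        factorial(-5)


def test_factorial_border():
    # -1 sits just below the boundary, 0 on it, and 1 just above it.
    with pytest.raises(ValueError):
        factorial(-1)
    assert factorial(0) == 1
    assert factorial(1) == 1
```

Keeping each category in its own test function makes it easier to tell from the pytest output which kind of input broke.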

Note: In some cases you may be working with a function that performs some mathematical computation like our square() function. Whenever testing a function like this, there is not really a valid or invalid case. Here, you should test positive numbers, negative numbers, zero, and any other potential concerns.

Features of Pytest

Approx

When dealing with floating point numbers (i.e. decimals), computers have a hard time storing particular numbers within memory. For example,

>>> 0.1 + 0.2 == 0.3
False

To compensate for this limitation, pytest has an approx function.

>>> import pytest
>>> 0.1 + 0.2 == pytest.approx(0.3)
True

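By default, approx uses a relative tolerance of 1e-6, meaning the two values may differ by up to one millionth of the expected value. Provide a second argument to change the relative tolerance. For example, a second argument of 0.1 accepts any value within 10% of the expected value.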

>>> import pytest
>>> 1.5 + 0.4 == pytest.approx(2)
False
>>> 1.5 + 0.4 == pytest.approx(2, 0.1)
True
>>> 1.5 + 0.6 == pytest.approx(2, 0.1)
True
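As a hedged sketch of how approx might look inside a test function, the example below passes the tolerance by keyword. The rel and abs keyword arguments are part of pytest.approx; the specific numbers are made up for illustration.

```python
import pytest


def test_approx_tolerances():
    # Default: relative tolerance of 1e-6.
    assert 0.1 + 0.2 == pytest.approx(0.3)

    # rel=0.1 accepts anything within 10% of the expected value (2 +/- 0.2).
    assert 1.9 == pytest.approx(2, rel=0.1)
    assert 2.3 != pytest.approx(2, rel=0.1)

    # abs=0.05 accepts anything within 0.05 of the expected value.
    assert 2.04 == pytest.approx(2, abs=0.05)
    assert 2.06 != pytest.approx(2, abs=0.05)

    # approx also compares sequences element by element.
    assert [0.1 + 0.2, 0.2 + 0.4] == pytest.approx([0.3, 0.6])
```

Naming the tolerance (rel= or abs=) tends to make a test easier to read than a bare second argument, since the reader does not have to remember which kind of tolerance comes first.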

Raises

Sometimes we design our code to raise errors. To test that our code does that, we can use pytest's raises function.

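import pytest
from math import sqrt

def square_root(x):
    # Negative inputs have no real square root, so reject them.
    if x < 0:
        raise ValueError("Negative numbers not allowed")
    return sqrt(x)

def test_square_root_raises_exception():
    # The test passes only if the code inside the with block raises ValueError.
    with pytest.raises(ValueError):
        square_root(-4)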
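Beyond checking the exception type, pytest.raises can also check the error message. The sketch below shows two ways to do that, using the match argument and the object returned by the with statement; the test names are made up for illustration.

```python
import pytest
from math import sqrt


def square_root(x):
    if x < 0:
        raise ValueError("Negative numbers not allowed")
    return sqrt(x)


def test_square_root_error_message():
    # match is a regular expression searched for in the error message.
    with pytest.raises(ValueError, match="Negative numbers"):
        square_root(-4)


def test_square_root_exception_details():
    with pytest.raises(ValueError) as excinfo:
        square_root(-4)
    # excinfo.value is the exception instance that was raised.
    assert str(excinfo.value) == "Negative numbers not allowed"
```

Checking the message is optional; it is mostly useful when a function can raise the same exception type for more than one reason.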

Required Questions

Write your code in lab12.py and your tests in test_lab12.py.

Q1: Product and Summation

Write the tests for the product and summation functions first before writing any code for the function.

Write tests for a function called product that takes in an integer parameter n. product returns the result of 1 · 2 · 3 · ... · n; however, if n is less than one or not an integer, raise a ValueError.

Additionally, write tests for a similar function called summation that takes in an integer parameter n. summation returns the result of 1 + 2 + ... + n; however, if n is less than zero or not an integer, raise a ValueError.

To check if a number is an integer, use the isinstance() function. For example,
>>> value_in_question = 5
>>> isinstance(value_in_question, float)
False
>>> isinstance(value_in_question, int)
True

When writing the tests, make sure to consider all cases. For example, product should do the following:

If n is less than one or not an integer, raise a ValueError
If n is greater than or equal to one, compute 1 · 2 · ... · n

Write tests that check if your code follows these rules by thinking of what inputs would cause each case.

Make sure to use the raises function that comes with pytest.

After writing the tests for both functions, implement both functions. When you are done, run one of the following pairs of commands in your terminal:

pytest test_lab12.py::test_summation
pytest test_lab12.py::test_product

python3 -m pytest test_lab12.py::test_summation
python3 -m pytest test_lab12.py::test_product

If you get an error, it is either due to poorly written tests or a poorly written function. If you are confident that your tests are correct, find the bug in the respective function.

Q2: Refactoring Product and Summation

You may have noticed that product and summation are very similar to each other in that they both raise a ValueError if n is less than some number or if n is not an integer. Additionally, both functions take the total of a function (add or multiply) applied on some range of values. Because of this, we can refactor our code so the functions have the same behavior but with a cleaner design.

To refactor our code, create three new functions:

product_short(n) - same behavior as product, but with a cleaner design
summation_short(n) - same behavior as summation, but with a cleaner design
accumulate(merger, initial, n)

accumulate will contain the logic of applying some function merger to initial and to each value in the range from one to n. It will then return the total after merger has been applied to each value. (merger will be either the add or the mul function.) Additionally, if n is less than initial or not an integer, raise a ValueError. For example,

>>> from operator import add, mul
>>> accumulate(add, 0, 3)  # 0 + 1 + 2 + 3
6
>>> accumulate(add, 2, 3)  # 2 + 1 + 2 + 3
8
>>> accumulate(mul, 2, 4)  # 2 * 1 * 2 * 3 * 4
48
>>> accumulate(mul, 5, 0)  # Raises a ValueError

Write tests for accumulate and then implement accumulate. (Feel free to use the examples given above in addition to the tests you write yourself.)

pytest test_lab12.py::test_accumulate

python3 -m pytest test_lab12.py::test_accumulate

Hint: Using the second example given above, add(2, 1) gives 3, then add(3, 2) gives 5, then add(5, 3) gives 8
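The running-total arithmetic in the examples can be checked with the standard library. The sketch below uses functools.reduce purely as a stand-in to illustrate the folding order; it does no input validation and is not meant as a replacement for writing accumulate yourself.

```python
from functools import reduce
from operator import add, mul

# Fold add over 1, 2, 3 starting from 2: add(add(add(2, 1), 2), 3).
running_sum = reduce(add, range(1, 4), 2)

# Fold mul over 1, 2, 3, 4 starting from 2.
running_product = reduce(mul, range(1, 5), 2)

print(running_sum)      # 8
print(running_product)  # 48
```

Values like these can serve as expected results when writing test_accumulate.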

After implementing accumulate, use the same tests from test_product and test_summation for test_product_short and test_summation_short to ensure that the new versions of each function work exactly the same. After that, implement product_short and summation_short by calling accumulate with the right arguments. product_short and summation_short should contain one line each in their function bodies.

pytest test_lab12.py::test_summation_short
pytest test_lab12.py::test_product_short

python3 -m pytest test_lab12.py::test_summation_short
python3 -m pytest test_lab12.py::test_product_short

Q3: Statistics

Your younger sibling (or cousin) was covering statistics in math class today and learned about the mean, median, mode, and standard deviation of a dataset. After working on two problems where they had to calculate each statistic by hand, they had had enough. They chose to write a program with functions that would do their homework for them; however, it does not work 😞. Your sibling has already spent more time trying to debug their program than it would have taken to complete their homework, and they are too tired to keep debugging. Now, they need your help to figure out what is wrong.

Write tests for each function they wrote -- square, sqrt, mean, etc. If the functions fail the tests, try to find the error in their code and fix it.

When fixing errors, do not delete an entire line or rewrite a function. The errors are small and should require you to add, delete, or replace a few things.

Some of their functions may work while others do not. Some functions may rely on other broken functions. To find what the expected outputs should be, rather than calculating them by hand, it is worth searching for a calculator on the web that will do it for you. Down below is a quick review of the mean, median, mode, and standard deviation of a dataset that your sibling (or cousin) used as reference.

Mean

To calculate the mean, find the sum of the dataset and divide it by the size/length of the dataset. For example, if the dataset was [1, 1, 1, 3, 4], the sum would be 10 and the size would be 5, so the mean would be 10/5 or 2.

Median

The median is the middle value of a sorted dataset. For example, if the dataset was [1, 2, 3, 4, 5], the median would be 3. If there is no middle value in the dataset because there is an even number of elements, the median would be the mean/average of the two values closest to the middle. For example, if the dataset was [1, 2, 3, 4, 5, 6], the two values closest to the middle are 3 and 4. Taking the mean/average of those numbers gives 3.5, which would be the median.

Mode

The mode is the most common element in a dataset. For example, if the dataset was [1,2,1,1], the mode of the dataset would be 1 because it appears the most times. If two elements appear the same number of times, the mode will be (for this lab) the element that appeared the most times first. For example, if the dataset was [1,1,2,2], the mode would be 1.

Standard Deviation

The standard deviation represents the amount of variation of all the values in a dataset. To calculate it, we use the following formula:

$$\sigma = \sqrt{ \frac{\sum (x_i - \mu)^2 }{n} }$$

where

$\sigma$ = standard deviation

$x_i$ = individual data value

$\mu$ = mean

$n$ = dataset's size

We can read this formula as:

1. For each data value in the dataset, find the data value minus the mean. Square that result, and add it to a sum.
2. Divide the sum by the size of the dataset.
3. Take the square root of the result from step 2.
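The steps of the formula can be worked through in a few lines of Python, and the standard library's statistics module offers reference values to compare against. This is a sketch for finding expected outputs for your tests, not the sibling's code; the dataset is the one from the Mean example.

```python
import math
import statistics

data = [1, 1, 1, 3, 4]

# Step-by-step population standard deviation, following the formula.
mu = sum(data) / len(data)                         # mean: 10 / 5 = 2.0
squared_diffs = sum((x - mu) ** 2 for x in data)   # 1 + 1 + 1 + 1 + 4 = 8.0
variance = squared_diffs / len(data)               # 8 / 5 = 1.6
sigma = math.sqrt(variance)                        # roughly 1.2649

# Standard-library equivalents, usable as reference values in tests.
assert statistics.mean(data) == 2
assert statistics.median([1, 2, 3, 4, 5, 6]) == 3.5
assert statistics.mode([1, 1, 2, 2]) == 1          # Python 3.8+: first mode encountered
assert math.isclose(sigma, statistics.pstdev(data))
```

Note that statistics.pstdev (population standard deviation) divides by n as the formula above does, whereas statistics.stdev divides by n - 1 and would give a different number.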

Hint: Whenever you are working with floating point numbers, it is good practice to use the approx() function. Additionally, remember that the optional second parameter tolerance will be helpful.

Submit

If you attend the lab, you don't have to submit anything.

If you don't attend the lab, you will have to submit working code. Submit the lab12.py and test_lab12.py files on Canvas to Gradescope in the window on the assignment page.

Grading on Gradescope

If you submit your lab to Gradescope, you will be graded on two things:

Submitting working functions
- This will require you to write tests to identify the bugs in both the functions you write and the starter functions you're given
- This will be graded with regular tests
Submitting passing tests
- You should just submit the tests you wrote as you looked for bugs in the functions
- This will be graded by running your tests to make sure they pass

Normally, the starter files come with the tests that the autograder will run. But in this case, doing so would defeat the purpose of having you write tests in the first place! So, unlike other assignments, you won't be given any tests in the starter files.

Note: Gradescope has two naming conventions. As an example, test_invert will test the actual invert function you submit, and test_test_invert will test the test_invert test you submit.

Optional Questions

Q4: Invert and Change

Write the tests for the invert and change functions first before writing any code for the functions.

Write the tests for a function invert that takes in a number x and limit as parameters. invert calculates 1/x, and if the quotient is less than the limit, the function returns 1/x; otherwise the function returns limit. However, if x is zero, the function raises a ZeroDivisionError.

Write the tests for a second function change that takes in numbers x, y and limit as parameters and returns abs(y - x) / x if it is less than the limit; otherwise the function returns the limit. If x is zero, raise a ZeroDivisionError.

Tests for Invert and Change

When writing the tests, make sure to consider all cases. For example, invert should do the following:

If 1/x is less than the limit, return 1/x
If 1/x is greater than or equal to the limit, return limit
If x is zero, raise a ZeroDivisionError

Write tests that check if your code follows these rules by thinking of what inputs would cause each case.

Now implement invert and change.

Check your work and run pytest in the terminal:

pytest

Q5: Refactor

Notice that invert and change have very similar logic in that you are dividing some numerator by x and if the result is greater than the limit then the function returns the limit. Because of this, we can refactor our code so it has the same behavior but with a cleaner design.

To do this we are going to add three new functions:

invert_short - same behavior as invert but designed differently
change_short - same behavior as change but designed differently
limited

limited will have three parameters numerator, denominator and limit. It will contain the logic of dividing a numerator by the denominator, and if the result is greater than the limit then the function returns the limit, and it returns the result otherwise. However, if the denominator is zero, it raises a ZeroDivisionError.

Now have invert_short and change_short call limited appropriately to maintain the same behavior as invert and change.

Note: invert_short and change_short should each have only one line in their bodies.

Tests for Refactor

Implement two more test functions, test_invert_short and test_change_short, that ensure that those two functions behave the same as invert and change.

Check your work and run pytest in the terminal:

pytest

Additional Info

Code Coverage

The effectiveness of tests can be assessed with code coverage, which measures the number of lines of code that have been executed. Ideally, your tests should cover most if not all of your program.

Statement Coverage

The percentage of lines executed in a program is measured with statement coverage. This is the most general type of code coverage, meaning it is the least specific. It only tells us the quantity of code that was reached, and not much else.

Branch Coverage

A more accurate depiction of how extensive your tests are can be measured with branch coverage. We can represent a program with multiple outcomes as a tree with several branches. Here's a simple example:

def greater_than_five(num):
    if num > 5:
        print("This number is greater than five.")
    elif num == 5:
        print("This number is five.")
    else:
        print("This number is less than five.")

Within this function, there are three possible branches we can take, which depend upon the parameter num. Therefore, in order to get 100% branch coverage, we must write at least three tests for greater_than_five(), changing num so that each of the three branches is executed.
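A minimal sketch of three such tests follows, one per branch. Because greater_than_five() prints instead of returning a value, the sketch captures the printed text with the standard library; printed_output is a hypothetical helper introduced only for this illustration.

```python
import io
from contextlib import redirect_stdout


def greater_than_five(num):
    if num > 5:
        print("This number is greater than five.")
    elif num == 5:
        print("This number is five.")
    else:
        print("This number is less than five.")


def printed_output(num):
    """Hypothetical helper: return what greater_than_five(num) prints."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        greater_than_five(num)
    return buffer.getvalue().strip()


def test_if_branch():
    assert printed_output(6) == "This number is greater than five."


def test_elif_branch():
    assert printed_output(5) == "This number is five."


def test_else_branch():
    assert printed_output(4) == "This number is less than five."
```

pytest's built-in capsys fixture is another way to capture printed output inside a test.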

Multiple Condition Coverage

Often, branches of code depend on multiple conditions. We can represent all the possible outcomes of the function below with a table.

def bool_operators(a,b):
    if a and b:
        print("Both are true!")
    elif a or b:
        print("One is true.")
    else:
        print("Neither are true. :(")

a	b	result
True	True	Both are true!
True	False	One is true.
False	True	One is true.
False	False	Neither are true. :(

In order to get 100% condition coverage, we would need test cases for all four of these combinations of conditions, even though two of them produce the same output.
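The table translates directly into test data. The sketch below, a plain loop over the four rows, is one possible way to cover every combination; the CASES list and the test name are made up for illustration.

```python
import io
from contextlib import redirect_stdout


def bool_operators(a, b):
    if a and b:
        print("Both are true!")
    elif a or b:
        print("One is true.")
    else:
        print("Neither are true. :(")


# One row per combination in the table: (a, b, expected output).
CASES = [
    (True, True, "Both are true!"),
    (True, False, "One is true."),
    (False, True, "One is true."),
    (False, False, "Neither are true. :("),
]


def test_bool_operators_all_combinations():
    for a, b, expected in CASES:
        # Capture what the function prints for this combination.
        buffer = io.StringIO()
        with redirect_stdout(buffer):
            bool_operators(a, b)
        assert buffer.getvalue().strip() == expected, f"failed for a={a}, b={b}"
```

If you would rather have pytest report each row as a separate test, the pytest.mark.parametrize decorator can take a list like CASES.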