Lab 22 - Plotting With MatPlotLib

Due by 11:59pm on 2023-04-11.

Starter Files

Download lab22.zip. Inside the archive, you will find starter files for the questions in this lab.

Topics

Being able to visualize big datasets can help us recognize patterns. Fortunately, the MatPlotLib library helps us create all sorts of graphs that can be used to visualize data. We are going to use the dataset from HW1 and data on brown dwarfs to demonstrate creating line plots and scatter plots and saving them as PNG files.

Installing MatPlotLib

Try one of the following to install matplotlib:

pip install matplotlib
python3 -m pip install matplotlib

Remember, you can always uninstall a library by running pip uninstall <library_name> or python3 -m pip uninstall <library_name>.
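If you are not sure whether the install worked, a quick check like the one below can tell you without opening a plot window. This is only a sketch: is_installed is a helper name made up for this example, not something matplotlib provides.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("matplotlib"):
    import matplotlib
    print("matplotlib version:", matplotlib.__version__)
else:
    print("matplotlib not found; try one of the install commands above")
```

If the second message prints, the library was probably installed for a different Python interpreter than the one you are running, which is why the python3 -m pip form is often the safer choice.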

To test if you did it right, paste the following code and run it:

import matplotlib.pyplot as plt

x_points = [1,5]
y_points = [1,5]

plt.plot(x_points, y_points)
plt.show()

A graph similar to the following should show up.

[Image: plot example]

MatPlotLib Review

There are various graphs and plots that you can create using MatPlotLib.

Plot Type Example Code
plot
x_points = [1, 5]
y_points = [1, 5]

plt.plot(x_points, y_points)
plt.show() # display the plot
scatter
x_points = [1, 2, 3, 4, 5]
y_points = [1, 3, 2, 4, 5]

plt.scatter(x_points, y_points)
plt.show() # display the plot
bar
categories = ['A', 'B', 'C', 'D']
y_points = [5, 1, 3, 1]

plt.bar(categories, y_points)
plt.show() # display the plot
histogram
frequencies = [
1,1,1,1,1,1, # 6 ones
2,2,2, # 3 twos
3, # 1 three
4,4, # 2 fours
5 # 1 five
]

plt.hist(frequencies)
plt.show() # display the plot
pie
counts = [4, 1, 2, 3]

plt.pie(counts)
plt.show() # display the plot

Saving a Graph to a File and Clearing

If you want to save a graph to an output file, you can use the plt.savefig() function. For example,

import matplotlib.pyplot as plt

x_points = [1,5]
y_points = [1,5]

plt.plot(x_points, y_points)
plt.savefig("output_file.png") # <----------

If you are saving several different graphs and plots to separate files, make sure to clear the figure after you finish saving one and before you start creating the next. Not doing so, as in the code below, will generate an image with mixed plots, because each new plot is drawn on top of the previous ones.

Code:

import matplotlib.pyplot as plt

x_points = [1,5]
y_points = [1,5]

plt.plot(x_points, y_points)
plt.savefig("output_file1.png")

plt.scatter(x_points, y_points)
plt.savefig("output_file2.png")

counts = [4, 1, 2, 3]

plt.pie(counts)
plt.savefig("output_file3.png")

Output of 'output_file3.png':

You can prevent this by using plt.clf to clear the figure.

import matplotlib.pyplot as plt

x_points = [1,5]
y_points = [1,5]

plt.plot(x_points, y_points)
plt.savefig("output_file1.png")
plt.clf() # <--------------------

plt.scatter(x_points, y_points)
plt.savefig("output_file2.png")
plt.clf() # <--------------------

counts = [4, 1, 2, 3]

plt.pie(counts)
plt.savefig("output_file3.png")
plt.clf() # <--------------------
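Since saving and clearing always go together, one option is to wrap them in a small helper so the clear step cannot be forgotten. The sketch below assumes a helper named save_and_clear (a made-up name, not part of matplotlib) and writes into a temporary folder so that it does not clutter your lab directory. It also selects the non-interactive "Agg" backend, which lets the code save files without opening a window.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: save files without opening a window
import matplotlib.pyplot as plt


def save_and_clear(filename):
    """Save the current figure to filename, then clear it for the next plot."""
    plt.savefig(filename)
    plt.clf()


out_dir = tempfile.mkdtemp()  # throwaway folder used only for this example
x_points = [1, 5]
y_points = [1, 5]

plt.plot(x_points, y_points)
save_and_clear(os.path.join(out_dir, "output_file1.png"))

plt.scatter(x_points, y_points)
save_and_clear(os.path.join(out_dir, "output_file2.png"))

# Both files exist, and the second one contains only the scatter plot.
saved_files = sorted(os.listdir(out_dir))
print(saved_files)
```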

Creating a Graph with Multiple Lines

If you want to create a graph with multiple lines, plot the y points of each line while using the same x points list. For example,

import matplotlib.pyplot as plt

x_points = [0, 1, 2, 3, 4, 5]
line1_y_points = [6, 7, 8, 9, 10, 10]
line2_y_points = [1, 2, 3, 4, 5, 6]
# more line_y_points

plt.plot(x_points, line1_y_points)
plt.plot(x_points, line2_y_points)
# more line plots
plt.show()

Colors

To add colors, you can provide an additional argument when plotting. For example, if we wanted to make the line red, we can pass 'r' when plotting,

x_points = [0, 1, 2, 3, 4, 5]
y_points = [6, 7, 8, 9, 10, 10]

plt.plot(x_points, y_points, 'r')
#        x_list    y_list    color

Color Syntax
Red 'r'
Green 'g'
Blue 'b'
Cyan 'c'
Magenta 'm'
Yellow 'y'
Black 'k'
White 'w'
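The same color codes can also be passed by name using the color keyword argument, which some people find easier to read than the positional form used in the example above. The sketch below draws one line each way and reads the colors back to confirm that both were applied. The variable names are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no window opens
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

x_points = [0, 1, 2, 3, 4, 5]

# Positional shorthand, as in the example above. plt.plot returns a list
# of line objects, so [0] picks out the single line that was drawn.
red_line = plt.plot(x_points, [6, 7, 8, 9, 10, 10], 'r')[0]

# Keyword form: same color codes, but the argument is named.
green_line = plt.plot(x_points, [1, 2, 3, 4, 5, 6], color='g')[0]

# to_rgba converts any color spec into a (red, green, blue, alpha) tuple,
# which makes the two forms easy to compare.
print(to_rgba(red_line.get_color()))
print(to_rgba(green_line.get_color()))
plt.clf()
```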

Labels

Note: TAs should skip this section if there is not enough time.

To add labels, we can use

  • plt.title(<string>)
  • plt.xlabel(<string>)
  • plt.ylabel(<string>)

Each of these functions takes in the string to display.

For example,

import matplotlib.pyplot as plt

x_points = [0, 1, 2, 3, 4, 5]
y_points = [6, 7, 8, 9, 10, 10]

plt.plot(x_points, y_points)

plt.title("Your Coolness")
plt.xlabel("Months")
plt.ylabel("Coolness Level")

plt.show()

Legends

Note: TAs should skip this section if there is not enough time.

If we want a legend that says what each line represents, we have to provide a label argument set to a string whenever we plot a line. Once that is done, we can call plt.legend() to create the legend. For example,

import matplotlib.pyplot as plt

x_points = [0, 1, 2, 3, 4, 5]
y1_points = [6, 7, 8, 9, 10, 10]
y2_points = [4, 5, 6, 5, 7, 10]

plt.plot(x_points, y1_points, label="Isaih") # <------------
plt.plot(x_points, y2_points, label="Jake") # <------------

plt.title("TA Coolness Levels")
plt.xlabel("Months")
plt.ylabel("Coolness Level")

plt.legend() # <----------------

plt.show()
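When there are more than a couple of lines, it can be tidier to keep each label next to its data and plot in a loop. The sketch below is one way to do that, assuming the data is stored in a dictionary named coolness (a name made up for this example). It reads the legend entries back to show that each label ended up in the legend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no window opens
import matplotlib.pyplot as plt

x_points = [0, 1, 2, 3, 4, 5]

# Each key is a legend label; each value is that line's y points.
coolness = {
    "Isaih": [6, 7, 8, 9, 10, 10],
    "Jake": [4, 5, 6, 5, 7, 10],
}

for name, y_points in coolness.items():
    plt.plot(x_points, y_points, label=name)

legend = plt.legend()

# Collect the text of each legend entry, in the order the lines were plotted.
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
plt.clf()
```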

Additional Info About Histograms

Note: TAs should skip this section if there is not enough time.

For project 4, you will have to specify the number of bins wanted in your histogram graph. Recall that a histogram details the frequency of some value, like numbers.

Number of Bins

By default, when you create a histogram, the graph will have 10 bins. If you want to change the number of bins, provide a second argument when plotting the histogram. For example, if we provided a 3 as the second argument, all the data would now go into 3 equal-width bins.

import matplotlib.pyplot as plt

frequencies = [1,1,1,1,1,1, 2,2,2, 3, 4,4, 5]
plt.hist(frequencies, 3)
plt.show()

Compare this histogram to the histogram plotted without the 3 argument.
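To make the comparison concrete without looking at the images, the sketch below uses the values that hist() returns (described in "Return Values of hist()" later in this lab) to count how many bins each call produced. The variable names are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no window opens
import matplotlib.pyplot as plt

frequencies = [1,1,1,1,1,1, 2,2,2, 3, 4,4, 5]

# No second argument: the default number of bins is used.
default_counts, default_edges, _ = plt.hist(frequencies)
plt.clf()

# Second argument of 3: the range 1 to 5 is split into 3 equal-width bins.
three_counts, three_edges, _ = plt.hist(frequencies, 3)
plt.clf()

print(len(default_counts), len(three_counts))
print(three_counts)
```

With 3 bins, the ones and twos land in the same bin, which is why the first bar is so much taller than it is in the default histogram.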

Bin Ranges

Additionally, you can specify the range of each bin by passing in a list of numbers as the second argument. For example, if the list is [1, 2, 3, 4, 5, 6], the first bin would be between 1 and 2 (excluding 2); the second bin would be between 2 and 3 (excluding 3); the third bin would be between 3 and 4 (excluding 4); etc. The last bin is the exception: it includes both of its edges, so here it covers 5 through 6, including 6.

import matplotlib.pyplot as plt

frequencies = [1,1,1,1,1,1, 2,2,2, 3, 4,4, 5]
plt.hist(frequencies, [1, 2, 3, 4, 5, 6])
plt.show()

Demo this code and play with changing the bin ranges.

Return Values of hist()

Additionally, the hist() function returns three things. For this class, we will only use the first two. The first item is an array of the number of items within each bin. The second item is an array of the bin edges, which has one more entry than the first because every bin needs both a left and a right edge.

For example,

import matplotlib.pyplot as plt
frequencies = [
1,1,1,1,1,1, # 6 ones
2,2,2, # 3 twos
3, # 1 three
4,4, # 2 fours
5 # 1 five
]
bin_counts, bin_nums, item = plt.hist(frequencies, [1, 2, 3, 4, 5, 6])
print(bin_counts)
print(bin_nums)

Would result in:

[6. 3. 1. 2. 1.]
[1. 2. 3. 4. 5. 6.]

(Note that every number in both results is a float.)
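Because the two results line up by position, you can pair each count with the edges of its bin. The sketch below shows one way to do this and converts the floats to ints for nicer printing. The names counts, edges, and summary are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no window opens
import matplotlib.pyplot as plt

frequencies = [1,1,1,1,1,1, 2,2,2, 3, 4,4, 5]
bin_counts, bin_nums, _ = plt.hist(frequencies, [1, 2, 3, 4, 5, 6])
plt.clf()

# Convert the floats to ints (safe here because every value is a whole number).
counts = [int(c) for c in bin_counts]
edges = [int(e) for e in bin_nums]

# Bin i runs from edges[i] to edges[i + 1], so there is always
# one more edge than there are counts.
summary = []
for i in range(len(counts)):
    summary.append((edges[i], edges[i + 1], counts[i]))
    print(f"{edges[i]} to {edges[i + 1]}: {counts[i]} value(s)")
```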

Required Questions

Plot Party In Provo!

Q1: GPA and SAT Histograms

Open and read admission-algorithms-dataset.csv and store each of the students' GPA and SAT into their own respective lists. Using those two lists, generate two histograms with one histogram displaying data of all the students' GPA and the other displaying all the students' SAT data. The GPA histogram should be saved in a file called gpa.png, and the SAT histogram should be saved in a file called sat_score.png.

Recall that the file is organized like the following

Student,SAT,GPA,Interest,High School Quality,Sem1,Sem2,Sem3,Sem4

Write your code under the plot_histogram function.

Recall the .split(<delimiter>) method of a string and remember to convert the strings that are representing numbers into floats.
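As a reminder of how that works on a single line, here is a small sketch. The header is the one shown above, but sample_row is a made-up row for illustration and is not taken from the actual dataset. Looking up the column positions from the header is just one approach; hard-coding the indexes works too.

```python
header = "Student,SAT,GPA,Interest,High School Quality,Sem1,Sem2,Sem3,Sem4"
sample_row = "Example Student,1300,3.61,10,7,95,86,91,94\n"  # hypothetical values

# Find which positions hold the SAT and GPA columns.
columns = header.split(",")
sat_index = columns.index("SAT")
gpa_index = columns.index("GPA")

# strip() removes the trailing newline before splitting on commas.
fields = sample_row.strip().split(",")

# Every field is still a string at this point, so convert before plotting.
sat = float(fields[sat_index])
gpa = float(fields[gpa_index])
print(sat, gpa)
```

When reading the real file, remember that the header line itself cannot be converted to floats, so it needs to be skipped.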

Q2: Correlation between GPA and SAT

Using the same GPA and SAT lists from the previous problem, create a scatter plot between those two lists with the GPA as the x-axis and SAT as the y-axis. Save the graph to a file called correlation.png.

Are there outliers? What is off about those students' stats?

Q3: Spectra

Open and read spectrum1.txt and spectrum2.txt. These files will contain two columns representing the wavelength and flux respectively (which detail the intensity of light of an astronomical object). On a single graph, plot both datasets as a line plot with different colors. Data from spectrum1.txt will be blue, and data from spectrum2.txt will be green. Wavelength should be on the x-axis, and Flux should be on the y-axis. Save the final graph as spectra.png.

Note: The two numbers on each line are separated by four spaces. To break a line into its two numbers, use line.split() with no arguments to get a list with the two numbers represented as strings.
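The sketch below shows why calling split() with no arguments is the convenient choice here. The sample_line values are made up for illustration and are not taken from the spectrum files.

```python
sample_line = "0.95    1.25e-10\n"  # hypothetical wavelength and flux

# With no arguments, split() treats any run of whitespace as one separator
# and ignores the trailing newline.
parts = sample_line.split()
wavelength = float(parts[0])
flux = float(parts[1])
print(parts)

# Splitting on a single space instead leaves empty strings between the numbers.
single_space_parts = sample_line.strip().split(" ")
print(single_space_parts)
```

The empty strings in the second result would cause float() to raise an error, which is the problem the no-argument form avoids.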

For those interested, these are models of brown dwarfs, which are objects bigger than planets but not quite big enough to start hydrogen fusion in their cores and become stars. Both have surface temperatures of 1700 Kelvin (about 2600 degrees Fahrenheit). Spectrum 1 has no silicate clouds and spectrum 2 has fairly dense silicate clouds. The spectra look similar but there are differences in the wavelength where the flux is emitted because while they emit the same amount of energy, the clouds block light in some regions so the energy has to come out somewhere else.

Submit

Submit the lab22.py file on Canvas to Gradescope in the window on the assignment page.


© 2023 Brigham Young University, All Rights Reserved