```python
import warnings
warnings.filterwarnings("ignore")
from datascience import *
import numpy as np
import random
```
After learning about functions in Data 8, Wayne wants to write a function that can calculate the hypotenuse of any right triangle. He wants to use his function to assign `C` to the hypotenuse of a right triangle with legs `A` and `B` (the two sides that meet at the right angle). However, he’s made a few mistakes. Which ones can you identify?

Hint: There are 5 unique issues. Assume that `numpy` has been imported as `np`.
```python
A = 3
B = 4
def hypotenuse(a, b)
    """Returns the length of the hypotenuse of a right triangle, the square root of a squared + b squared."""
    squares = make_array(side1, side2) * 2
    sum = sum(squares)
    squareroot = np.sqrt(sum)
    print(squareroot)
C = hypotenuse(A, B)
```
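For reference, the Pythagorean theorem gives the value Wayne's code should assign to `C` for the legs defined above (`A = 3`, `B = 4`):

$$C = \sqrt{A^2 + B^2} = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5$$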
Error 1: the function is missing a colon “:” after the arguments list.
```python
def hypotenuse(a, b)
    """Returns the length of the hypotenuse of a right triangle, the square root of a squared + b squared."""
    squares = make_array(side1, side2) * 2
    sum = sum(squares)
    squareroot = np.sqrt(sum)
    print(squareroot)
C = hypotenuse(A, B)
```

```
  Cell In[3], line 1
    def hypotenuse(a, b)
                        ^
SyntaxError: expected ':'
```
Error 2: The values should be squared with `**`, not `*`. Writing `* 2` doubles each side instead of squaring it.
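A quick way to see the difference is to apply both operators to the same array. This sketch uses `np.array` as a stand-in for `make_array` (an assumption so that it needs only NumPy; both produce a NumPy array here):

```python
import numpy as np

legs = np.array([3, 4])   # stand-in for make_array(3, 4)

doubled = legs * 2        # multiplies each element by 2
squared = legs ** 2       # raises each element to the power 2

print(doubled)  # [6 8]
print(squared)  # [ 9 16]
```

Only the `**` version feeds the correct values (9 and 16) into the sum.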
Error 3: We need to be consistent with our argument names so that values are assigned correctly throughout the function. We can either replace `a` and `b` with `side1` and `side2`, or vice versa.
```python
def hypotenuse(a, b):
    """Returns the length of the hypotenuse of a right triangle, the square root of a squared + b squared."""
    squares = make_array(side1, side2) ** 2
    sum = sum(squares)
    squareroot = np.sqrt(sum)
    print(squareroot)
C = hypotenuse(A, B)
```

```
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[4], line 7
      5     squareroot = np.sqrt(sum)
      6     print(squareroot)
----> 7 C = hypotenuse(A, B)

Cell In[4], line 3, in hypotenuse(a, b)
      1 def hypotenuse(a, b):
      2     """Returns the length of the hypotenuse of a right triangle, the square root of a squared + b squared."""
----> 3     squares = make_array(side1, side2) ** 2
      4     sum = sum(squares)
      5     squareroot = np.sqrt(sum)

NameError: name 'side1' is not defined
```
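Error 4: When we assign `sum` to a number, we lose the original behavior of the built-in `sum` function. We should not reuse the names of built-in functions as variable names. Here the problem shows up immediately: because `sum` is assigned inside `hypotenuse`, Python treats `sum` as a local variable throughout the function body, so the call `sum(squares)` raises an `UnboundLocalError` before the assignment ever happens.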
```python
def hypotenuse(a, b):
    """Returns the length of the hypotenuse of a right triangle, the square root of a squared + b squared."""
    squares = make_array(a, b) ** 2
    sum = sum(squares)
    squareroot = np.sqrt(sum)
    print(squareroot)
C = hypotenuse(A, B)
```

```
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
Cell In[5], line 7
      5     squareroot = np.sqrt(sum)
      6     print(squareroot)
----> 7 C = hypotenuse(A, B)

Cell In[5], line 4, in hypotenuse(a, b)
      2     """Returns the length of the hypotenuse of a right triangle, the square root of a squared + b squared."""
      3     squares = make_array(a, b) ** 2
----> 4     sum = sum(squares)
      5     squareroot = np.sqrt(sum)
      6     print(squareroot)

UnboundLocalError: cannot access local variable 'sum' where it is not associated with a value
```
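The same pitfall can be reproduced outside the worksheet with plain Python. Both function names below are hypothetical and exist only for illustration. The first function shadows `sum` and fails. The second uses a distinct name and works:

```python
def shadowed_total(values):
    # Assigning to `sum` anywhere in this function makes `sum` a local name
    # for the whole body, so the call on the right-hand side fails.
    sum = sum(values)
    return sum


def fixed_total(values):
    # A distinct variable name leaves the built-in `sum` untouched.
    total = sum(values)
    return total


try:
    shadowed_total([9, 16])
    shadowing_failed = False
except UnboundLocalError:
    shadowing_failed = True

print(shadowing_failed)      # True
print(fixed_total([9, 16]))  # 25
```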
Error 5: The function prints the value of `squareroot` but does not return it, so the value is no longer available once the function finishes. That is, we cannot assign it to a name or use it as the argument to another function! In this case, `C` will not be equal to the hypotenuse (it will actually be `None`)!
```python
def hypotenuse(a, b):
    """Returns the length of the hypotenuse of a right triangle, the square root of a squared + b squared."""
    squares = make_array(a, b) ** 2
    sum_squares = sum(squares)
    squareroot = np.sqrt(sum_squares)
    print(squareroot)
C = hypotenuse(A, B)
print(C)
```

```
5.0
None
```
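Putting all five fixes together gives a version that returns the value instead of printing it. This is a sketch: `np.array` stands in for the `datascience` function `make_array` (an assumption so that only NumPy is needed).

```python
import numpy as np


def hypotenuse(a, b):  # Error 1 fix: colon; Error 3 fix: use a and b throughout
    """Returns the length of the hypotenuse of a right triangle, the square root of a squared + b squared."""
    squares = np.array([a, b]) ** 2     # Error 2 fix: ** squares each leg
    sum_squares = sum(squares)          # Error 4 fix: don't shadow the built-in sum
    squareroot = np.sqrt(sum_squares)
    return squareroot                   # Error 5 fix: return instead of print


A = 3
B = 4
C = hypotenuse(A, B)
print(C)  # 5.0
```

Because the value is returned, `C` now holds 5.0 and can be reused in later computations.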
Write a function that takes in the following arguments:

- `tbl`: a table.
- `col`: a string, the name of a column in `tbl`.
- `n`: an int.

The function should return a table that contains the rows that have the `n` largest values in the specified column.
```python
def top_n(tbl, col, n):
    sorted_tbl = _________________________
    top_n_rows = _________________________
    return _________________________
```
```python
def top_n(tbl, col, n):
    sorted_tbl = tbl.sort(col, descending = True)
    top_n_rows = sorted_tbl.take(np.arange(n))
    return top_n_rows
```
```python
table = Table().with_columns(
    "Some Column", [10, 1, 100, 10000, 1000]
)
table
```
Some Column |
---|
10 |
1 |
100 |
10000 |
1000 |
```python
top_n(table, "Some Column", 3)
```
Some Column |
---|
10000 |
1000 |
100 |
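The same sort-then-take idea can be sketched with plain NumPy on the values from the example table. The helper name `top_n_values` is hypothetical, and this is an illustration of the logic rather than how `Table.sort` or `Table.take` are implemented:

```python
import numpy as np


def top_n_values(values, n):
    """Return the n largest values, largest first (hypothetical helper)."""
    values = np.array(values)
    # np.argsort gives positions in ascending order; reversing them gives
    # descending order, mirroring tbl.sort(col, descending = True).
    order = np.argsort(values)[::-1]
    # Keep the first n positions, mirroring .take(np.arange(n)).
    return values[order[:n]]


print(top_n_values([10, 1, 100, 10000, 1000], 3))  # [10000  1000   100]
```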
Shown below are the `chocolates` and `nutrition` tables, respectively.
chocolates
Color | Shape | Amount | Price ($) |
---|---|---|---|
Dark | Round | 4 | 1.30 |
Milk | Rectangular | 6 | 1.20 |
White | Rectangular | 12 | 2.00 |
Dark | Round | 7 | 1.75 |
Milk | Rectangular | 9 | 1.40 |
Milk | Round | 2 | 1.00 |
nutrition
Type | Calories |
---|---|
Dark | 120 |
Milk | 130 |
White | 115 |
Ruby | 120 |
```python
chocolates = Table().with_columns(
    'Color', ['Dark', 'Milk', 'White', 'Dark', 'Milk', 'Milk'],
    'Shape', ['Round', 'Rectangular', 'Rectangular', 'Round', 'Rectangular', 'Round'],
    'Amount', [4, 6, 12, 7, 9, 2],
    'Price ($)', [1.30, 1.20, 2.00, 1.75, 1.40, 1.00]
)
nutrition = Table().with_columns(
    'Type', ['Dark', 'Milk', 'White', 'Ruby'],
    'Calories', [120, 130, 115, 120]
)
```
Match the following table method calls to the resulting descriptions of tables.
Hint: Pay attention to the column names of the resulting tables! For example, what happens when you specify only column name(s) in `.group()`? What happens to the column names when you specify an aggregating function in `.group()`?
Letter | Function Call |
---|---|
A | chocolates.group("Shape") |
B | chocolates.group("Shape", max) |
C | chocolates.group(make_array("Shape", "Color"), max) |
D | chocolates.pivot("Color", "Shape", "Price ($)", max) |
E | chocolates.join("Color", nutrition, "Type") |
F | chocolates.group(make_array("Shape", "Color")) |
Number | Columns | # of Rows |
---|---|---|
1 | Shape, Color max, Amount max, Price ($) max | 2 |
2 | Shape, Dark, Milk, White | 2 |
3 | Shape, Color, Amount max, Price ($) max | 4 |
4 | Color, Shape, Amount, Price ($), Calories | 6 |
5 | Shape, count | 2 |
6 | Shape, Color, count | 4 |
A: ____________ C: ____________ E: ____________
B: ____________ D: ____________ F: ____________
A - 5: `.group()` with no aggregating function yields a table with just the column that was grouped on, and a count column.

```python
chocolates.group("Shape")
```
Shape | count |
---|---|
Rectangular | 3 |
Round | 3 |
B - 1: `.group()` with an aggregating function yields the grouped column, and the other columns of the original table with the function name added at the end of each column label.

```python
chocolates.group("Shape", max)
```
Shape | Color max | Amount max | Price ($) max |
---|---|---|---|
Rectangular | White | 12 | 2 |
Round | Milk | 7 | 1.75 |
C - 3: `.group()` with multiple columns and an aggregating function yields the columns that were grouped on, and the remaining columns with the function name added at the end of each label.

```python
chocolates.group(make_array("Shape", "Color"), max)
```
Shape | Color | Amount max | Price ($) max |
---|---|---|---|
Rectangular | Milk | 9 | 1.4 |
Rectangular | White | 12 | 2 |
Round | Dark | 7 | 1.75 |
Round | Milk | 2 | 1 |
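D - 2: `.pivot()` yields a table whose first column is the second argument to `.pivot()` (“Shape”), plus one column for each unique value in the column named by the first argument (“Dark”, “Milk”, “White” from the “Color” column). Each cell applies the aggregating function (`max`) to the “Price ($)” values for that Shape and Color. Combinations with no rows show 0.

```python
chocolates.pivot("Color", "Shape", "Price ($)", max)
```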
Shape | Dark | Milk | White |
---|---|---|---|
Rectangular | 0 | 1.4 | 2 |
Round | 1.75 | 1 | 0 |
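E - 4: `.join()` gives you all the columns from the two tables, except for the extra column from the second table that is used to join them (“Type” is dropped in the resulting table, since its values match “Color”). Only rows with a match are kept, so “Ruby”, which has no chocolates, does not appear.

```python
chocolates.join("Color", nutrition, "Type")
```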
Color | Shape | Amount | Price ($) | Calories |
---|---|---|---|---|
Dark | Round | 4 | 1.3 | 120 |
Dark | Round | 7 | 1.75 | 120 |
Milk | Rectangular | 6 | 1.2 | 130 |
Milk | Rectangular | 9 | 1.4 | 130 |
Milk | Round | 2 | 1 | 130 |
White | Rectangular | 12 | 2 | 115 |
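To see what `.join` is matching, here is a plain-Python sketch of the same lookup on the rows above. It illustrates the matching logic only; it is not how `datascience` implements `join`, and the names `chocolates_rows` and `calories_by_type` are hypothetical:

```python
# (Color, Shape, Amount, Price) for each row of chocolates.
chocolates_rows = [
    ("Dark", "Round", 4, 1.30),
    ("Milk", "Rectangular", 6, 1.20),
    ("White", "Rectangular", 12, 2.00),
    ("Dark", "Round", 7, 1.75),
    ("Milk", "Rectangular", 9, 1.40),
    ("Milk", "Round", 2, 1.00),
]
# The nutrition table as a Type -> Calories mapping.
calories_by_type = {"Dark": 120, "Milk": 130, "White": 115, "Ruby": 120}

# For each chocolate, look up its Color in the mapping and append the
# Calories. "Type" itself is not repeated, and "Ruby" never appears because
# no chocolate has that color.
joined = [
    row + (calories_by_type[row[0]],)
    for row in chocolates_rows
    if row[0] in calories_by_type
]

for row in sorted(joined):
    print(row)
```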
F - 6: `.group()` with multiple columns and no aggregating function yields the columns that were grouped on, and a count column.

```python
chocolates.group(make_array("Shape", "Color"))
```
Shape | Color | count |
---|---|---|
Rectangular | Milk | 2 |
Rectangular | White | 1 |
Round | Dark | 2 |
Round | Milk | 1 |
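The counting behind F and the max-filling behind D can both be reproduced in plain Python, which helps check the tables above by hand. This is a sketch of the logic with hypothetical names (`rows`, `counts`, `pivot`), not the `datascience` implementation:

```python
from collections import Counter

# (Color, Shape, Price) for each row of chocolates.
rows = [
    ("Dark", "Round", 1.30),
    ("Milk", "Rectangular", 1.20),
    ("White", "Rectangular", 2.00),
    ("Dark", "Round", 1.75),
    ("Milk", "Rectangular", 1.40),
    ("Milk", "Round", 1.00),
]

# Like group(make_array("Shape", "Color")): count rows per (Shape, Color) pair.
counts = Counter((shape, color) for color, shape, _ in rows)

# Like pivot("Color", "Shape", "Price ($)", max): one entry per Shape, one key
# per Color, holding the max price in that group, or 0 if the group is empty.
colors = sorted({color for color, _, _ in rows})
shapes = sorted({shape for _, shape, _ in rows})
pivot = {
    shape: {
        color: max((p for c, s, p in rows if c == color and s == shape), default=0)
        for color in colors
    }
    for shape in shapes
}

print(sorted(counts.items()))
print(pivot)
```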
The table `squirrel` below contains some information on reported squirrel sightings across the UC Berkeley campus. Each row in the `squirrel` table represents one unique squirrel sighting:
```python
squirrel_ids = [2937, 8421, 472, 239, 2937]
locations = ["Wheeler Hall", "East Asian Library", "Etcheverry Hall", "Campbell Hall", "Moffitt Library"]
days = [17, 28, 7, 4, 7]
months = [3, 9, 1, 10, 6]
years = [2024, 2022, 2024, 2023, 2021]

location_pool = [
    "Wheeler Hall", "East Asian Library", "Etcheverry Hall",
    "Campbell Hall", "Moffitt Library", "Doe Library",
    "Cory Hall", "Soda Hall", "Evans Hall", "Haas Pavilion",
    "Stanley Hall", "Physics North", "Physics South"
]

for _ in range(995):
    squirrel_ids.append(random.randint(100, 9999))
    locations.append(random.choice(location_pool))
    days.append(random.randint(1, 28))
    months.append(random.randint(1, 12))
    years.append(random.choice([2021, 2022, 2023, 2024]))

squirrel = Table().with_columns(
    "Squirrel ID", squirrel_ids,
    "Location", locations,
    "Day", days,
    "Month", months,
    "Year", years
)
squirrel.show(5)
```
Squirrel ID | Location | Day | Month | Year |
---|---|---|---|---|
2937 | Wheeler Hall | 17 | 3 | 2024 |
8421 | East Asian Library | 28 | 9 | 2022 |
472 | Etcheverry Hall | 7 | 1 | 2024 |
239 | Campbell Hall | 4 | 10 | 2023 |
2937 | Moffitt Library | 7 | 6 | 2021 |
... (995 rows omitted)
Write a line of code that evaluates to the proportion of Squirrel IDs in the table that are even.
```python
np.mean(squirrel.column("Squirrel ID") % 2 == 0)
```

(or any equivalent code)

```
0.48899999999999999
```
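The trick is that `np.mean` treats `True` as 1 and `False` as 0, so the mean of a boolean array is the proportion of `True` values. Here it is applied to just the five IDs shown in `squirrel.show(5)`; the name `ids` is introduced for illustration:

```python
import numpy as np

# The five Squirrel IDs shown in squirrel.show(5).
ids = np.array([2937, 8421, 472, 239, 2937])

is_even = ids % 2 == 0          # boolean array; only 472 is even
proportion = np.mean(is_even)   # True counts as 1, False as 0

print(proportion)  # 0.2
```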
Jessica wants to find the location where she is most likely to find a squirrel. Write a line of code that evaluates to the location with the most squirrel sightings.
```python
squirrel.group("Location").sort("count", descending = True).column("Location").item(0)
```

```
'East Asian Library'
```
Jessica is interested in how many squirrels were sighted at each location during each month. Create a table called `sightings` where each cell contains the number of squirrel sightings that occurred in 2023 at a given location during a given month. Note: Each row should correspond to a different location.
```python
squirrels_2023 = _________________________
sightings = _________________________
```
```python
squirrels_2023 = squirrel.where("Year", are.equal_to(2023))
sightings = squirrels_2023.pivot("Month", "Location")
sightings
```
Location | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Campbell Hall | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 3 | 6 | 2 | 3 |
Cory Hall | 3 | 1 | 2 | 2 | 0 | 1 | 0 | 0 | 2 | 2 | 0 | 1 |
Doe Library | 3 | 5 | 0 | 0 | 1 | 1 | 2 | 1 | 0 | 4 | 2 | 4 |
East Asian Library | 5 | 1 | 1 | 0 | 3 | 6 | 0 | 3 | 2 | 0 | 1 | 1 |
Etcheverry Hall | 1 | 1 | 1 | 2 | 1 | 0 | 1 | 0 | 1 | 0 | 3 | 2 |
Evans Hall | 2 | 0 | 3 | 1 | 2 | 2 | 1 | 1 | 3 | 1 | 2 | 1 |
Haas Pavilion | 1 | 0 | 2 | 4 | 3 | 2 | 0 | 1 | 2 | 0 | 2 | 1 |
Moffitt Library | 2 | 0 | 1 | 0 | 0 | 3 | 1 | 0 | 1 | 2 | 0 | 3 |
Physics North | 3 | 2 | 1 | 2 | 1 | 2 | 1 | 0 | 2 | 1 | 3 | 1 |
Physics South | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 3 | 4 | 2 | 2 | 2 |
... (3 rows omitted)
Jessica now has access to another table, `species`, that contains information about the species of each squirrel. Some of the rows from this table are shown below.
```python
species_pool = [
    "eastern gray squirrel",
    "western gray squirrel",
    "fox squirrel",
    "Douglas squirrel",
    "red squirrel"
]

squirrel_ids_species = [2937, 8421, 472, 239]
species_array = [
    "eastern gray squirrel",
    "fox squirrel",
    "western gray squirrel",
    "western gray squirrel"
]

for sid in squirrel.column("Squirrel ID")[4:]:
    squirrel_ids_species.append(sid)
    species_array.append(random.choice(species_pool))

species = Table().with_columns(
    "Squirrel ID", squirrel_ids_species,
    "Species", species_array
)
species.show(4)
```
Squirrel ID | Species |
---|---|
2937 | eastern gray squirrel |
8421 | fox squirrel |
472 | western gray squirrel |
239 | western gray squirrel |
... (996 rows omitted)
Write lines of code to find the least observed species in 2024. If multiple species are tied for the least sightings, find the species that comes first alphabetically.
```python
squirrels_2024 = _________________________
species_counts = _________________________
least_observed = _________________________
```
```python
squirrels_2024 = squirrel.where("Year", 2024)
species_counts = squirrels_2024.join("Squirrel ID", species).group("Species")
least_observed = species_counts.sort("Species").sort("count").column("Species").item(0)
least_observed
```

```
'red squirrel'
```
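The two-step sort works only if the second sort (by count) keeps tied rows in the order left by the first sort (alphabetical), that is, if it is a stable sort. The sketch below shows the idea with Python's built-in `sorted`, which is guaranteed to be stable. The species counts are made up for illustration and are not from the `squirrel` table:

```python
# Hypothetical (species, count) pairs; "fox squirrel" and "red squirrel" tie.
species_counts = [
    ("western gray squirrel", 38),
    ("red squirrel", 31),
    ("eastern gray squirrel", 40),
    ("fox squirrel", 31),
]

# Step 1: alphabetical order, like .sort("Species").
by_name = sorted(species_counts, key=lambda pair: pair[0])
# Step 2: ascending count; a stable sort keeps tied species alphabetical,
# like .sort("count").
by_count = sorted(by_name, key=lambda pair: pair[1])

least_observed = by_count[0][0]
print(least_observed)  # fox squirrel

# Equivalent single sort on a (count, species) key.
assert min(species_counts, key=lambda pair: (pair[1], pair[0]))[0] == least_observed
```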