import numpy as np
from datascience import *
%matplotlib inline
rent = Table().with_columns(
    "Dollars", np.append(np.append(np.append(np.ones(15) * 600, np.ones(25) * 900), np.ones(40) * 1100), np.ones(20) * 1400)
)
The table below shows the distribution of rents paid by students in Boston. The first column consists of ranges of monthly rent, in dollars. Ranges include the lower bound but not the upper bound. The second column shows the percentage of students who pay rent in each of the ranges.
Dollars | Students (%) | Height (% per dollar) |
---|---|---|
500-800 | 15 | 0.050 |
800-1000 | 25 | 0.125 |
1000-1200 | 40 | 0.200 |
1200-1600 | 20 | 0.050 |
Draw a histogram of the data. Make sure you label your axes!
rent.hist("Dollars", bins = [500, 800, 1000, 1200, 1600])
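The bar heights can also be checked by hand: on a density-scale histogram, each bar's height is the percentage of students in the bin divided by the bin's width. The following is a minimal sketch in plain Python, independent of the datascience library. It assumes the counts used to build rent (15, 25, 40, and 20 students out of 100); the names bins, percents, widths, and heights are illustrative only.

```python
# Bin edges in dollars, and the percent of students falling in each bin.
bins = [500, 800, 1000, 1200, 1600]
percents = [15, 25, 40, 20]

# Width of each bin: right edge minus left edge.
widths = [right - left for left, right in zip(bins[:-1], bins[1:])]

# Height (percent per dollar) = percent in the bin / width of the bin.
heights = [p / w for p, w in zip(percents, widths)]

for left, right, h in zip(bins[:-1], bins[1:], heights):
    print(f"[{left}, {right}): {h:.3f}% per dollar")
```

Because height times width gives back the percentage, the areas of the four bars add up to 100%.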
True or False: If we combine the [500, 800) and [800, 1000) bins together, the height of the new bin would be greater than the heights of both of the old bins. Please explain your answer.
False: When we combine bins, the height of the new bin is the average of the old bin heights, weighted by bin width. Thus, the new bin will be taller than the [500, 800) bin but shorter than the [800, 1000) bin. If we calculate the new height, it will be:
height = \(\frac{area}{width} = \frac{40\%}{(\$800 - \$500) + (\$1000 - \$800)} = 0.08\%\) per dollar
rent.hist("Dollars", bins = [500, 1000, 1200, 1600])
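The weighted-average claim can be verified numerically. This is a small sketch in plain Python, assuming the two original bar heights of 0.050% and 0.125% per dollar; the variable names are hypothetical.

```python
# Heights (percent per dollar) and widths (dollars) of the two original bins.
h1, w1 = 0.050, 800 - 500
h2, w2 = 0.125, 1000 - 800

# Merging bins preserves total area, so the new height is
# total area / total width, i.e. the width-weighted average of the heights.
combined = (h1 * w1 + h2 * w2) / (w1 + w2)

print(round(combined, 3))
print(h1 < combined < h2)
```

The combined height falls strictly between the two original heights, which is why the statement is false.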
Samiksha’s favorite activity to celebrate Fridays is buying pastries at Sheng Kee before class. She stores her purchase data in a table, pastries, to keep track of her spending. Each row represents an individual purchase. The first few rows look like this:
pastries = Table().with_columns(
    'item', ['Hot Dog Bun', 'Yudane Milk Bun', 'Summer Romance', 'Pineapple Bun', 'Ham and Cheese Croissant'],
    'category', ['Savory', 'Sweet', 'Sweet', 'Sweet', 'Savory'],
    'price', [2.75, 2.99, 2.79, 2.45, 3.15],
    'satisfaction', [8.5, 9.0, 10.0, 7.75, 7.25]
)
pastries
item | category | price | satisfaction |
---|---|---|---|
Hot Dog Bun | Savory | 2.75 | 8.5 |
Yudane Milk Bun | Sweet | 2.99 | 9 |
Summer Romance | Sweet | 2.79 | 10 |
Pineapple Bun | Sweet | 2.45 | 7.75 |
Ham and Cheese Croissant | Savory | 3.15 | 7.25 |
The table has 4 columns: item, category, price, and satisfaction.
Write a line of code to calculate the total amount Samiksha spent on pastries. Assume all of her pastry purchases are recorded in the table.
sum(pastries.column('price'))
14.130000000000001
Write a line of code to calculate the average satisfaction Samiksha felt after eating sweet pastries.
__________(pastries.__________(__________).column(__________))
np.mean(pastries.where('category', are.equal_to('Sweet')).column('satisfaction'))
8.9166666666666661
Samiksha’s budget is getting tight, and she wants to buy pastries that will give her the most satisfaction per dollar. Write lines of code that will help us achieve this.
First, create an array that contains each purchase’s satisfaction per dollar. Then, add a new column called “satisfaction per $” to the pastries table. (Hint: You can calculate a purchase’s satisfaction per dollar by dividing its satisfaction score by its price.)
score_array = pastries.__________(__________) / pastries.__________(__________)
pastries = __________.with_column(__________, __________)
score_array = pastries.column('satisfaction') / pastries.column('price')
pastries = pastries.with_column('satisfaction per $', score_array)
pastries
item | category | price | satisfaction | satisfaction per $ |
---|---|---|---|---|
Hot Dog Bun | Savory | 2.75 | 8.5 | 3.09091 |
Yudane Milk Bun | Sweet | 2.99 | 9 | 3.01003 |
Summer Romance | Sweet | 2.79 | 10 | 3.58423 |
Pineapple Bun | Sweet | 2.45 | 7.75 | 3.16327 |
Ham and Cheese Croissant | Savory | 3.15 | 7.25 | 2.30159 |
Samiksha is interested in finding the pastries in the table with the top 3 satisfaction values per dollar. Write code that will output the names of these items as an array.
pastries_sorted = pastries.__________(__________, __________)
pastries_sorted.__________(__________).column(__________)
pastries_sorted = pastries.sort('satisfaction per $', descending = True)
pastries_sorted.take(np.arange(3)).column('item')
array(['Summer Romance', 'Pineapple Bun', 'Hot Dog Bun'],
dtype='<U24')
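As a cross-check on the ranking, the same computation can be sketched in plain Python without the datascience library. The rows are copied from the pastries table; the names purchases, ranked, and top_three are illustrative only.

```python
# Rows copied from the pastries table: (item, price, satisfaction).
purchases = [
    ("Hot Dog Bun", 2.75, 8.5),
    ("Yudane Milk Bun", 2.99, 9.0),
    ("Summer Romance", 2.79, 10.0),
    ("Pineapple Bun", 2.45, 7.75),
    ("Ham and Cheese Croissant", 3.15, 7.25),
]

# Sort by satisfaction per dollar (satisfaction / price), largest first.
ranked = sorted(purchases, key=lambda row: row[2] / row[1], reverse=True)

# Keep only the item names of the first three rows.
top_three = [item for item, _, _ in ranked[:3]]
print(top_three)
```

Sorting in descending order and taking the first three rows mirrors the sort and take steps used on the table.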
The table insurance contains one row for each beneficiary that is covered by a particular insurance company:
insurance = Table.read_table("insurance.csv")
insurance.show(3)
age | bmi | smoker | region | cost |
---|---|---|---|---|
25 | 20.8 | no | southwest | 3208.79 |
25 | 30.2 | yes | southwest | 33900.7 |
62 | 32.1 | no | northeast | 1355.5 |
... (20198 rows omitted)
The table contains five columns: age, bmi, smoker, region, and cost.
In each part below, fill in the blanks to achieve the desired outputs.
A scatter plot comparing the amount paid last year vs. BMI (titles are usually written as Y vs. X) for only the beneficiaries whose costs exceeded $25,000. Each dot on the scatter plot should represent one beneficiary.
high_cost = __________.__________(__________, __________)
__________.__________(__________, __________)
high_cost = insurance.where("cost", are.above(25000))
high_cost.scatter("bmi", "cost")
Write a function that takes an age as an argument, and returns the average BMI among all beneficiaries of that age.
def average_bmi(age):
    right_age = insurance.where(__________, __________)
    bmis = right_age.__________(__________)
    avg = sum(bmis) / len(bmis)
    __________
def average_bmi(age):
    right_age = insurance.where("age", age)
    bmis = right_age.column("bmi")
    avg = sum(bmis) / len(bmis)
    return avg
average_bmi(30)
28.487799043062214
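The filter-then-average pattern can be tried without insurance.csv. The sketch below is plain Python and uses only the three rows displayed by insurance.show(3), so its result is not the real average over the full table. The helper average_bmi_from_rows is hypothetical and is not part of the worksheet.

```python
# Only the three rows displayed above, as (age, bmi) pairs.
rows = [(25, 20.8), (25, 30.2), (62, 32.1)]

def average_bmi_from_rows(rows, age):
    """Return the mean BMI among the rows whose age matches (hypothetical helper)."""
    # Step 1: keep the BMI of every row with the requested age.
    bmis = [bmi for row_age, bmi in rows if row_age == age]
    # Step 2: average the selected values.
    return sum(bmis) / len(bmis)

print(average_bmi_from_rows(rows, 25))
```

In this sketch, an age with no matching rows raises a ZeroDivisionError, since there is nothing to average.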