20 Discussion 05: Transformation, Probability & Sampling (From Summer 2025)

Slides

20.3 Simple Linear Regression

Lillian and Prabhleen were watching their favorite chemistry YouTuber, NileRed, experiment with turning gloves into grape soda and wanted to try it themselves. They ran the experiment at various temperatures, which yielded various amounts of grape soda. Since this reaction is very costly, they were only able to run it 10 times. The resulting data set of size \(n = 10\) (Yield data) contains measurements of yield taken at five different temperature levels. The variables are \(y\) = yield in liters and \(x\) = temperature in degrees Fahrenheit. Below is a scatter plot of our data.

\(\sigma_x\)	\(\sigma_y\)	\(r\)	\(\bar{x}\)	\(\bar{y}\)
\(15\)	\(0.3\)	\(0.50\)	\(75.00\)	\(3\)

20.3.1 (a)

Given the above statistics, calculate the slope (\(\hat{\theta}_1\)) and y-intercept (\(\hat{\theta}_0\)) of the line of best fit, using Mean Squared Error (MSE) as our loss function and the following model:

\[ y = \hat{\theta}_0 + \hat{\theta}_1 x \]

Answer

\(\hat{\theta}_1 = r \frac{\sigma_y}{\sigma_x}\)

\(\hat{\theta}_1 = 0.5 \frac{0.3}{15} = 0.01\)

\(\hat{\theta}_0 = \bar{y} - \hat{\theta}_1 \bar{x}\)

\(\hat{\theta}_0 = 3 - 0.01 \times 75.00 = 3 - 0.75 = 2.25\)

Note that the values of \(\sigma_x\) and \(\sigma_y\) in the table above differ slightly from those of the actual data; they were modified to make calculations slightly easier during discussion. This is why the intercept and slope of the line in the plot above do not match up exactly with the values we calculate.
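The arithmetic in part (a) can be double-checked with a short sketch. It only uses the summary statistics from the table; the function name `fit_from_summary_stats` is a hypothetical helper chosen for illustration, not something from the course materials.

```python
def fit_from_summary_stats(r, sigma_x, sigma_y, x_bar, y_bar):
    """Return (theta_0, theta_1) of the MSE-optimal line from summary statistics.

    Hypothetical helper: applies theta_1 = r * sigma_y / sigma_x and
    theta_0 = y_bar - theta_1 * x_bar, the formulas used in the answer.
    """
    theta_1 = r * sigma_y / sigma_x
    theta_0 = y_bar - theta_1 * x_bar
    return theta_0, theta_1


# Values taken from the table of summary statistics.
theta_0, theta_1 = fit_from_summary_stats(
    r=0.50, sigma_x=15, sigma_y=0.3, x_bar=75.00, y_bar=3
)
print(f"slope = {theta_1:.2f}, intercept = {theta_0:.2f}")
# prints: slope = 0.01, intercept = 2.25
```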

20.3.2 (b) (Extra)

Below, you can find a plot of the residuals from the line of best fit you calculated in part (a). What does the residual plot tell us about the relationship between x and y?

Answer

The residuals are not equally variable across all values of \(x\), which means that there is heteroscedasticity in our residuals. A straight line therefore does not describe the data well: \(y\) is likely not linear in terms of \(x\).
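As a rough illustration of how a residual plot is built and read, the sketch below fits a least-squares line and prints the residuals \(y - \hat{y}\). The data are synthetic and made up for illustration (an exactly quadratic curve at five hypothetical temperature levels); they are not the Yield data, and `least_squares_line` is a hypothetical helper name.

```python
from statistics import mean

# Synthetic, made-up data (NOT the Yield data): y is exactly quadratic in x.
xs = [55, 65, 75, 85, 95]
ys = [2.5 + 0.001 * (x - 50) ** 2 for x in xs]


def least_squares_line(xs, ys):
    """Return (theta_0, theta_1) minimizing MSE for y ~ theta_0 + theta_1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    theta_1 = sxy / sxx  # algebraically equal to r * sigma_y / sigma_x
    theta_0 = y_bar - theta_1 * x_bar
    return theta_0, theta_1


theta_0, theta_1 = least_squares_line(xs, ys)

# Residual = observed value minus the value predicted by the fitted line.
residuals = [y - (theta_0 + theta_1 * x) for x, y in zip(xs, ys)]
for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.3f}")
```

In this synthetic case the residuals are positive at both ends of the \(x\) range and negative in the middle. A line that fit well would leave residuals with no such structure.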

20.3.3 (c) (Extra)

Which of the following relations most closely represents the relationship we see between Temperature (\(x\)) and Yield (\(y\))?

A. \(y = \theta_2 x^2\)
B. \(y = \theta_2 x^2 + \theta_1 x + \theta_0\)
C. \(y = \theta_1\log{x} + \theta_0\)
D. \(y = \theta_1x + \theta_0\)

Answer

\(y = \theta_2 x^2 + \theta_1 x + \theta_0\)

Based on the shape of the original and residual plots, we can see that the data follow a curved, quadratic (\(x^2\)) pattern rather than a linear (\(x\)) or logarithmic (\(\log{x}\)) one. To choose between the two quadratic options, \(y = \theta_2 x^2\) (choice A) and \(y = \theta_2 x^2 + \theta_1 x + \theta_0\) (choice B), we can look at the intercept of the original graph: because all observations have a yield above 2.4, we expect the intercept \(\theta_0\) to be nonzero, so we choose B instead of A.
