Data 8 Discussion Notes
Introduction
Welcome to Data 8! It is one of the largest and most popular classes at UC Berkeley, and it serves as a stepping stone to courses such as CS61A and Data 100!
This chapter offers supplementary resources to accompany Wesley’s discussions presented in the Fall 2025 iteration of the UC Berkeley course Data 8: Principles and Techniques of Data Science.
Materials are updated each week alongside the live discussions, so sections that have not been updated yet may still contain notes from the Summer 2025 iteration. See my personal website for all slides I have created and presented in the past.
Learning Topics
As mentioned in the course catalog:
Foundations of data science from three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social and legal issues surrounding data analysis, including issues of privacy and data ownership.
More specifically, you will be learning topics including, but not limited to, the following:
- Python (specifically the `datascience` library)
- Visualizations (such as histograms, scatter plots, line plots, etc.)
- Probability
- Simulation and Hypothesis Testing
- Central Limit Theorem (CLT)
- Linear Regression
- k-Nearest Neighbor (kNN)
Important Websites
You might want to bookmark the following websites:
- Course Website
- Course Textbook
- My Personal Website (where slide decks are located)
- My Discussion Notes Website
General Tips
If you are wondering about my tips for succeeding in the course:
For the Midterm and Final:
Start reviewing early and avoid cramming. Consistent, spaced practice is much more effective than last-minute studying. Here are some strategies to help you prepare:
- Create a study schedule: Break down your review into manageable chunks over several days or weeks. Allocate time for each topic and stick to your plan.
- Practice with past exams: Simulate exam conditions by timing yourself and working without notes. Focus on Jeremy’s recent exams first (Su25, Sp25, Fa24, Su24), then expand to older ones if you have time. After finishing, review your mistakes and understand why you got them wrong.
- Active recall and self-testing: Instead of just reading solutions, try to solve problems from discussion worksheets, homework, labs, and projects on your own. Write out your reasoning and check your answers afterward. Knowing how to solve a question right after reading the solution is different from knowing how to solve a similar question without looking at the solution!
- Review concepts, not just procedures: If you get stuck, revisit lecture videos, slides, or the textbook to clarify underlying concepts. Understanding the ‘why’ behind each method will help you tackle unfamiliar questions.
- Join study groups or attend review sessions: Explaining concepts to others and hearing different perspectives can deepen your understanding and reveal gaps in your knowledge.
- Go over the reference sheet: Looking through formulas, definitions, and key concepts helps reinforce your memory and gives you a resource to use during the exam.
Remember, there is only so much we can test you on, so don’t stress too much. A bad score does not mean that you cannot do well in the class, and the exams have been lighter in difficulty than in previous iterations of the course. Focus on understanding the core concepts and practicing problem-solving, rather than memorizing details. Consistent, active review will help you build confidence and perform your best on exam day.
For Labs, Homeworks, and Projects:
Prioritize assignments in this order: labs, homeworks, and projects. Labs are designed to reinforce concepts from lecture and are usually more straightforward, making them a great starting point. Completing labs first will help you build a solid foundation for tackling homeworks, which tend to be more challenging and require deeper understanding. Projects are typically the most complex and time-consuming, so it’s beneficial to approach them after you’ve mastered the material from labs and homeworks.
- Labs: Focus on understanding each step and the reasoning behind the code, rather than just getting the correct answer.
- Homeworks: Try to solve each question independently before seeking help, as this will strengthen your problem-solving skills. If you get stuck, review related lecture materials or discuss the problem with classmates.
- Projects: Start early, break the project into manageable sections, and set milestones to track your progress. Collaborate with peers when allowed, but ensure you understand every part of your submission. Document your code and thought process.
If you encounter a question you don’t know how to solve, start by reviewing the relevant lecture slides for a quick refresher. The textbook is also a valuable resource, as lecture slides are often based on its content and can provide additional explanations and examples. Don’t hesitate to attend office hours—TAs and tutors are there to help you succeed and can offer guidance on difficult topics or assignments.
Consistent engagement with these resources and assignments will help you stay on track and deepen your understanding of the material.
Special Tricks
Below are some special tricks that I discovered and used as a student and, later, as a member of the course staff:
- Both `with_column` and `with_columns` work. `with_column` takes only two arguments: the first is the column name, and the second is the column array. In contrast, `with_columns` can take an arbitrary number of pairs of column names and arrays.
- Refrain from using concepts such as recursion, `while` loops, or list comprehensions, as these are not taught in this course. Using them in assignments or exams may result in points not being awarded.
- Causation is a very sensitive and strong claim to make in statistics. You should remember under which scenarios a study or experiment allows us to conclude it.
- Functions and methods that I have rarely seen used are: `Table().read_table(filename)` (useful in certain assignments), `tbl.num_columns`, `tbl.labels`, `tbl.bin(column_name_or_index)` / `tbl.bin(column_name_or_index, bins=bins)`, `tbl.split(n)`, `str.split(separator)`, `str.join(array)`, `str.replace(old_string, new_string)`, `np.diff(array)`, and `minimize(function)` (useful in kNNs and linear regression).
- Disclaimer: You should still be familiar with these methods and functions; this is only a general tip from me.
- When your code is producing errors, chances are you are encountering one of these problems:
- Forgetting to close a set of parentheses.
- Forgetting to close a string literal.
- Naming a variable differently than intended.
- Not using table operations correctly.
- Calling table operations on the result of `tbl.show()`, which displays the table rather than returning it.
- Not `return`ing anything, or `print`ing instead of `return`ing.
- Passing an incorrect set of arguments to a function.
- Performing the wrong set of operations on a data type.
- Not understanding exactly what each function does (what it takes in and what it returns).
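In place of a `while` loop or a list comprehension, the course typically repeats work with a `for` loop over `np.arange(...)` and grows an array with `np.append`. A minimal sketch using only NumPy; squaring each number is a hypothetical stand-in for whatever one iteration computes, and `np.array([])` stands in for an empty `make_array()`.

```python
import numpy as np

# Start with an empty array (in the course, make_array() plays this role).
squares = np.array([])

# Repeat a fixed number of times with a for loop over np.arange,
# appending one result per iteration with np.append.
for i in np.arange(5):
    squares = np.append(squares, i ** 2)

print(squares)  # [ 0.  1.  4.  9. 16.]
```

Note that `np.append` returns a new array rather than modifying `squares` in place, which is why the result is reassigned on every iteration.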
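Several of the rarely used functions above are ordinary Python string methods or NumPy functions. A minimal sketch of what each one takes in and returns, using hypothetical strings and numbers:

```python
import numpy as np

# str.split breaks a string into a list of pieces at each separator.
parts = "2025-09-03".split("-")  # ['2025', '09', '03']

# str.join is called on the separator and glues a sequence of strings together.
joined = "/".join(parts)  # '2025/09/03'

# str.replace swaps every occurrence of one substring for another.
cleaned = "1,234,567".replace(",", "")  # '1234567'

# np.diff returns the differences between consecutive elements,
# so the result is one element shorter than the input.
populations = np.array([100, 120, 150, 145])
changes = np.diff(populations)  # differences: 20, 30, -5
```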
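Printing instead of returning, one of the common bugs above, is easy to see in a tiny example. A minimal sketch with two hypothetical functions: the one that prints displays its value but hands back `None`, so nothing usable reaches the caller.

```python
def double_printed(x):
    # Displays the result but returns nothing, so the caller gets None.
    print(x * 2)

def double_returned(x):
    # Hands the result back to the caller so it can be stored or reused.
    return x * 2

a = double_printed(4)   # displays 8, but a is None
b = double_returned(4)  # b is 8
print(a, b)             # None 8
```

The same idea explains why chaining table operations onto `tbl.show()` fails: `show` displays the table instead of returning it.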
Future Course Suggestions
I have personally taken a variety of upper-division CS and DS courses, so if you are looking for future course suggestions, here are my recommendations:
- Interested in the coding aspect of the course:
- Interested in the data analysis aspect of the course:
- Interested in the machine learning aspect of the course:
- Interested in the probability aspect of the course:
I personally don’t think CS 10 and Data 6 are worth the time, especially after this class.
Letter of Recommendation
Over the past few semesters as a TA for Data 8, I have received a few requests for letters of recommendation. My thoughts are as follows:
- I am happy to write one, but I don’t think I am the best person to provide a strong letter of recommendation. I would say Jeremy’s letter of recommendation is much stronger than mine.
- However, if you believe I am the right person for the job, feel free to let me know or email me.