Data 100 Discussion Notes

Introduction

Welcome to Data 100, one of the most popular upper-division CS/DS classes at UC Berkeley!

This chapter offers supplementary resources to accompany Wesley’s discussions presented in the Spring 2025 iteration of the UC Berkeley course Data 100.

These materials will be updated less frequently now that I have switched to teaching Data 8. They remain available to spark interest among my current students in other courses, which is why you may also see some notes from the Summer 2025 iteration. See my personal website for all the slides I have created and presented in the past.

Learning Topics

As mentioned in the course catalog:

In this course, students will explore the data science lifecycle, including question formulation, data collection and cleaning, exploratory data analysis and visualization, statistical inference and prediction, and decision-making. This class will focus on quantitative critical thinking and key principles and techniques needed to carry out this cycle. These include languages for transforming, querying and analyzing data; algorithms for machine learning methods including regression, classification and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.

More specifically, you will be learning topics including, but not limited to, the following:

  • pandas (the industry-standard counterpart of the datascience library)
  • Regex
  • More visualizations
  • Ordinary Least Squares
  • SQL
  • Logistic Regression
  • Principal Component Analysis

Important Websites

You might want to bookmark the following websites:

General Tips

Here are my tips for succeeding in the course:

For the Midterm and Final:

Start reviewing early and avoid cramming. Consistent, spaced practice is much more effective than last-minute studying. Here are some strategies to help you prepare:

  • Create a study schedule: Break down your review into manageable chunks over several days or weeks. Allocate time for each topic and stick to your plan.
  • Practice with past exams: Simulate exam conditions by timing yourself and working without notes. Focus on Jeremy’s recent exams first (Su25, Sp25, Fa24, Su24), then expand to older ones if you have time. After finishing, review your mistakes and understand why you got them wrong.
  • Active recall and self-testing: Instead of just reading solutions, try to solve problems from discussion worksheets, homework, labs, and projects on your own. Write out your reasoning and check your answers afterward. Knowing how to solve a question right after reading the solution is different from knowing how to solve a similar question without looking at the solution!
  • Review concepts, not just procedures: If you get stuck, revisit lecture videos, slides, or the textbook to clarify underlying concepts. Understanding the ‘why’ behind each method will help you tackle unfamiliar questions.
  • Join study groups or attend review sessions: Explaining concepts to others and hearing different perspectives can deepen your understanding and reveal gaps in your knowledge.
  • Go over the reference sheet: Looking through formulas, definitions, and key concepts helps reinforce your memory and gives you a resource to use during the exam.

Remember, there is only so much we can test you on, so don’t stress too much. A bad score does not mean that you cannot do well in the class. Recent exams have been lighter in difficulty than those in previous iterations of the course. Focus on understanding the core concepts and practicing problem-solving, rather than memorizing details. Consistent, active review will help you build confidence and perform your best on exam day.

For Labs, Homeworks, and Projects:

Prioritize assignments in this order: homeworks first, then projects. Homeworks are designed to reinforce concepts from lecture and are usually more straightforward (especially part A), making them a great starting point. Projects are typically the most complex and time-consuming, so it’s best to approach them after you’ve mastered the material from the homeworks.

  • Homeworks: Try to solve each question independently before seeking help, as this will strengthen your problem-solving skills. If you get stuck, review related lecture materials or discuss the problem with classmates.
  • Projects: Start early, break the project into manageable sections, and set milestones to track your progress. Collaborate with peers when allowed, but ensure you understand every part of your submission. Document your code and thought process.

If you encounter a question you don’t know how to solve, start by reviewing the relevant lecture slides for a quick refresher. The textbook is also a valuable resource: lecture slides are often based on its content, and it can provide additional explanations and examples. Don’t hesitate to attend office hours, where TAs and tutors are there to help you succeed and can offer guidance on difficult topics or assignments.

Consistent engagement with these resources and assignments will help you stay on track and deepen your understanding of the material.

Special Tricks

Below are some special tricks that I discovered and used, first as a student and later as a member of the course staff:

  • You should be familiar with the pandas functions listed on the reference sheet. Usually, the ones tested on the exam will be included there.
  • When your code works locally but does not pass tests when you run the last cell or upload to Gradescope/Pensieve, you are most likely doing one of the following:
    • Importing external libraries that course staff did not instruct you to use. This can cause later Gradescope/Pensieve test cases to fail with a “variable not defined” error (a NameError in Python).
    • Renaming a variable while completing the assignment. Although the old variable may still exist in your environment, our tests run in a fresh, blank environment. If your code still references a name that you later deleted or renamed, it will break.
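
To make the renamed-variable pitfall concrete, here is a minimal sketch. The cell contents and the names `result`, `total`, and `run_fresh` are made up for illustration, and this is not how the course autograder is implemented; it only simulates running cells top to bottom in an empty namespace versus running one cell in a kernel that still remembers an old variable.

```python
# Hypothetical notebook cells, stored as strings so they can be replayed in order.
cells = [
    "total = sum([1, 2, 3])",   # cell 1 after a rename (it used to define `result`)
    "answer = result * 2",      # cell 2 still refers to the old name
]

def run_fresh(cells):
    """Run cells top to bottom in an empty namespace (illustrative helper)."""
    namespace = {}
    for i, source in enumerate(cells, start=1):
        try:
            exec(source, namespace)
        except NameError as err:
            return f"cell {i} failed: {err}"
    return "all cells passed"

# A live kernel may still hold the old variable from an earlier run,
# so cell 2 appears to work locally.
stale_namespace = {"result": 6}
exec(cells[1], stale_namespace)

# A fresh environment has no such leftover variable.
outcome = run_fresh(cells)
print(outcome)
```

Restarting the kernel and running all cells from the top before you submit is a simple way to catch this kind of error locally.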

Future Course Suggestions

I have personally taken a variety of upper-division CS and DS courses, so if you are looking for future course suggestions, here are my recommendations: