This course is the second of a three-course series on statistical data science; see course description.
Many thanks to Xiao-Hui Tai, who has taught STA 035A; this course is structured similarly to hers and much of the material in this syllabus is taken from the material in her syllabus.
The course webpage is here. Lecture notes, homework, supplementary materials, etc., will be posted there. Canvas will be used for lab materials and for turning in labs and homework. Solutions will be posted on Canvas. Piazza will be used for discussion (more details below).
This is a course on advanced programming and data manipulation in the R programming language. The best preparation for this course is STA 035A. If you have no experience with programming in any language, this will be a difficult course, and you will need to invest significant time in the first few weeks to familiarize yourself with R.
The more formal requirements are below.
Prerequisite(s): one of the following two options:
Pre- or co-requisite (i.e., can be concurrent):
These requirements are strict.
Lectures are Mondays, Wednesdays and Fridays, 4:10-5:00 PM, WELLMN 234
Discussions (labs) are run by the TA, Mengjie Shie, email: mjshi@ucdavis.edu
Section A01: Tuesdays, 10:00 AM - 10:50 AM, TLC 2212
Section A02: Tuesdays, 11:00 AM - 11:50 AM, TLC 2212
Office hours are posted on the course webpage.
R is a free, open-source programming language for statistical computing. RStudio is a free, open-source R programming environment. It contains a built-in code editor, many features to make working with R easier, and works the same way across different operating systems.
All of our computing work in this class will be done using R and RStudio. You will use RStudio for homework, labs and exams, so a working version of RStudio is required. You can choose to download it on your personal computer, or use UC Davis JupyterHub. You will need regular, reliable access to a computer either running an up-to-date version of R and RStudio, or with a working browser (for the JupyterHub option). If this is a problem, please let us know right away. There are resources available to support you. Some are listed here.
The room that labs are held in has computers with RStudio installed, and you may choose to use them. If you are using your own laptop, please make sure that it is charged before class.
There will be two textbooks used for the course, both of which are freely available online.
A rough schedule is as follows. This is subject to change.
Week | Topics | Reference | Notes |
---|---|---|---|
1 | Review: dplyr and tidy data | R4DS2, Ch 3+5 | |
2 | Workflow; transformations of vectors, numbers, strings | R4DS2, Ch 6 + 12-14 | HW 1 due |
3 | Regular expressions, factors, dates and times | R4DS2, Ch 15-17 | HW 2 due |
4 | Visualizations with ggplot | R4DS2, Ch 9 | Midterm 1 |
5 | Functions | R4DS2, Ch 25 | HW 3 due |
6 | Linear regression | IMS, Ch 7-8 | HW 4 due |
7 | Inference for comparing means | IMS, Ch 19-20 | HW 5 due |
8 | Inference for comparing means | IMS, Ch 19-20 | Midterm 2 |
9 | Inference for regression | IMS, Ch 24-25 | HW 6 due |
10 | Inference for regression | IMS, Ch 24-25 | HW 7 due |
11 | Final exam | | |
The grade breakdown is:
Cutoffs for letter grades are:
These thresholds may be adjusted at the end of the quarter, but only in a way that improves your grade for the course.
Labs are typically due the Monday after the lab session, at 9 PM, and will be turned in via Gradescope (accessible through Canvas). Always check the course homepage and Gradescope/Canvas to ensure the correct deadline. Labs will be completed in R Markdown format (file extension Rmd). Labs will involve writing a combination of code and written prose, and the R Markdown format allows for a combination of the two. Labs must be submitted only in PDF format, the result of calling “Knit PDF” from RStudio on your R Markdown document. Work submitted in any other format will receive a grade of 0, without exception. All code used to produce your results must be shown in your PDF file (e.g., do not use `echo = FALSE` or `include = FALSE` as options anywhere). Rmd files do not need to be submitted, but may be requested by the TA and must be available when the assignment is submitted.
No late labs will be accepted for any reason, but your lowest lab grade will be dropped so that your grade for the labs is the average of your remaining lab grades. It is highly recommended that you begin working on your labs as soon as they are released.
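As a rough illustration of the drop-lowest arithmetic, here is a minimal R sketch. The helper name `drop_lowest_mean` and the scores are made up for this example, and it assumes that exactly one score is dropped and that the remaining scores are weighted equally; the actual grade computation may differ.

```r
# Hypothetical helper (not part of any course material): average the scores
# after dropping the single lowest one.
drop_lowest_mean <- function(scores) {
  stopifnot(length(scores) >= 2)  # need at least one score left after dropping
  kept <- sort(scores)[-1]        # sort ascending, then remove the first (lowest) element
  mean(kept)
}

# Made-up lab scores, for illustration only
lab_scores <- c(85, 92, 70, 88)
drop_lowest_mean(lab_scores)
```

Here the 70 is dropped and the result is the average of 85, 88, and 92.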
Students may choose to collaborate with each other on the labs and homework, but must clearly indicate the names of all students with whom they collaborated.
Homework will typically be released on Friday after class, and will typically be due Thursday the week after, at 9 PM. Always check Gradescope/Canvas to ensure the correct due date. The same policies that apply to labs will apply to homework. Homework may contain other non-coding components, and these can be typed (use any software you are comfortable with), or written and scanned. The submission must be in PDF format.
No late homework will be accepted for any reason, but your lowest homework grade will be dropped so that your grade for the homework is the average of your remaining homework grades. It is highly recommended that you begin working on your homework as soon as it is released.
There will be two midterms and one final. The midterms will be in class, during the scheduled class times.
The lower score of the two midterms will be dropped. There will be no make-up exams. If you must miss an exam due to illness, travel, or some other reason, that exam will be the one that is dropped. For the final, if you have another final starting 30 minutes before or after the scheduled time, you may present documentation and request an accommodation to start 15 minutes before or after the scheduled time.
All labs and homework will be due at 9 PM Pacific Time, on the relevant due date. No late work will be accepted for any reason, but your lowest grade will be dropped. It is highly recommended that you begin working on your lab/homework as soon as it is released.
Students have 24 hours after receiving a grade on any assignment to contest it. Grading is consistent and we will provide detailed rubrics. If you think you deserve a different grade, prepare a strong argument and submit it by email to the TA.
Class attendance is strongly encouraged. Please be on time. If you miss a lecture for any reason, you are responsible for all material covered and any announcements made in your absence. Active participation is encouraged, both in class and on Piazza. Cell phones, laptops, and other electronic devices must be silenced in class. Laptops are to be used in class for learning purposes related to the lecture only.
All students are expected to follow the UCD Code of Academic Conduct. Any student who cheats on an assignment or exam will be referred to the Office of Student Support and Judicial Affairs and will receive an automatic failing grade on the relevant assignment. A second instance of academic dishonesty will result in a failing grade in the course. More information on the nature of dishonest academic behavior or UCD policy can be found on the website of the Office of Student Support and Judicial Affairs.
Collaboration is encouraged and students are encouraged to discuss course material with classmates. All work that is turned in, however, must be your own. If students have collaborated on labs or homework, the names of all students working together must be clearly indicated.
Please do not distribute any course materials outside of this class. This is an infringement of copyright as per UC policy. Use of sites like Course Hero and Chegg is not permitted.
You can access Piazza by clicking on the “Piazza” link on the sidebar in Canvas, or directly through this link. Students are encouraged to answer each other’s questions, and the TA will moderate by checking in every day. The quickest way to get a question answered is likely on Piazza, since anyone in class can answer. These are the rules for posting on Piazza:
If you have a question that requires more than a short paragraph to answer, labs and office hours are the best options.
Email will be used only for questions relating to private matters (accommodations, grading, emergencies, etc.). Questions about class logistics and content should be posted on Piazza, asked in class, during labs, or during office hours. If you must ask a question about a non-private matter via email, you must first document how you tried to answer the question for yourself or through other means (e.g., “I double checked the syllabus,” “There are conflicting responses on Piazza” …). Emails that do not follow these guidelines may not be answered.
Please do not send me messages on Canvas. I do not monitor my Canvas inbox.
Statistics Tutors at the Academic Assistance and Tutoring Centers provide support for RStudio. More information is available here.
Many students face different challenges during college, and it is healthy to seek support. This is a comprehensive list of resources covering general academics, health and wellness, finances, housing, career/internship, and other topics.
Health and wellness resources are available here. If you have an emergency, call 911 immediately, or go to the nearest emergency room. Mental health staff are available 24 hours a day, 7 days a week by phone at 530-752-2349. (Follow the prompts to reach a counselor.)
UC Davis is committed to educational equity in the academic setting, and in serving a diverse student body. All students who are interested in learning more about the Student Disability Center (SDC) are encouraged to contact them directly at https://sdc.ucdavis.edu, sdc@ucdavis.edu or 530-752-3184. If you are a student who requires academic accommodations, please submit your SDC Letter of Accommodation to us as soon as possible, ideally within the first two weeks of this course.