Skip to content

Commit 630f454

Browse files
authored
Update and rename 5. Regression.md to 6. Regression.md
1 parent 8e2766d commit 630f454

File tree

2 files changed

+81
-240
lines changed

2 files changed

+81
-240
lines changed

5. Regression.md

Lines changed: 0 additions & 240 deletions
This file was deleted.

6. Regression.md

Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,81 @@
1+
## Quizz
2+
1.Some people believe that musical activity (e.g. playing an instrument) enhances mathematical ability. 100 high school students were selected at random. For each student, musical activity was recorded in hours per week and mathematical ability was assessed by a test. The correlation coefficient was found to be 0.85.
3+
4+
Does the large correlation coefficient prove that musical activity enhances mathematical ability?
5+
6+
> no, The high correlation indicates a relationship between hours per week dedicated to musical activity and score on a math test, but doesn't explain the nature of the relationship. For example, older students may dedicate more time to music and receive higher scores on the math test than younger students, in which case age would be a confounder.
7+
8+
<br>
9+
10+
2. What would your answer to the previous question be if you learned that all students in the study came from the same grade?
11+
12+
> no, There could be other confounders, such as level of parental involvement.
13+
14+
3. For a group of commuters commuting to work on a given day, the correlation coefficient between a) time spent waiting at traffic signals, and b) total commuting time, was found to be 0.4. Which of the following statements about the correlation coefficient are true?
15+
16+
- **The more time a commuter spends commuting to work, the more time he spends waiting at traffic signals, on average.**
17+
18+
- If a commuter's total commuting time increases by 10 minutes, then he will spend an additional 4 minutes waiting at traffic signals, on average.
19+
20+
- **The more time a commuter spends waiting at traffic signals, the longer the total commuting time, on average.**
21+
22+
- The average commuter spent 40% of the commuting time waiting at traffic signals.
23+
24+
<br>
25+
26+
4. A study followed 1,000 children over time. The scatter plot of heights at age 1 vs. heights at age 2 looks football-shaped with a correlation coefficient r=0.8. Alice's height at age 1 is at the 80th percentile.
27+
Would you predict her height at age 2 to be below, at, or above the 80th percentile?
28+
29+
> below, Because of the regression effect.
30+
31+
<br>
32+
33+
5. In the previous question we learned that in a study of children's height, the correlation coefficient between height at age 1 vs. height at age 2 is r=0.8.
34+
35+
Predict the z-score of Alice's height at age 2. (You may use the fact that the z-score of the 80th percentile is z=0.85.)
36+
37+
> (0.8)(0.85) = 0.68, We know the regression line goes through the point ( x-mean, y-mean) and has slope r * Sy/Sx, where r is the correlation coefficient and
38+
Sx, Sy are the sample standard deviations of x-mean and y- mean, respectively. But this means that after standardizing, the regression line goes through the point (0,0)(0,0) and has slope r, so that its equation is just y = rx. Thus the predicted z-score of Alice's height at age 2 is y = (0.8)(0.85).y=(0.8)(0.85).
39+
40+
6. Questions (a)-(d) below relate to the following situation: In a biology class, both the midterm scores and the final exam scores have an average of 50 and a standard deviation of 10. The scatterplot looks football-shaped and the correlation coefficient is 0.6.
41+
Claudia would like to know what score her friend Emily got on the final.
42+
43+
Question (a): If you have no information on how Emily did on the midterm, what is your prediction for her score on the final?
44+
45+
> 50, Without Emily's midterm score, we would just use the average final score as our prediction of Emily's final score.
46+
47+
<br>
48+
49+
7. Question (b): What is the "give or take" number for your prediction from Question (a)?
50+
51+
> 10, Since we're only considering information about the final scores in our prediction, just as our predicted score is the average final score, our "give or "take" number is the standard deviation of the final scores.
52+
53+
<br>
54+
55+
8.Now you learn that Emily got exactly the mean score of 50 on the midterm.
56+
57+
Question (c): Given this information, what is your prediction for Emily's score on the final?
58+
59+
> 50,Since the distance in standard deviations of Emily's midterm score from the average midterm score is 0, the corresponding distance of Emily's predicted final score from the average final score is r * 0 =0, so that our prediction is unchanged.
60+
61+
<br>
62+
63+
9. Question (d): What is the "give or take" number for your prediction from Question (c)?
64+
65+
> 10 * sqrt{1-(0.6)^2} = 8, Since we're told the scatterplot of midterm scores vs. final scores is football-shaped, we assume we can use the normal approximation to estimate the standard deviation of the final scores for any fixed midterm score by Sy * sqrt{1-r^2}
66+
67+
<br>
68+
69+
10. A tutoring center advertises its services by stating that students who sign up improve their GPA on tests by 0.5 points on average.
70+
Is this indeed evidence that the tutoring helps or could this be due to the regression effect?
71+
72+
- The improvement proves that the tutoring helps.
73+
- **The improvement could be due to the regression effect.**
74+
75+
> Tutoring may attract mostly struggling students.
76+
77+
<br>
78+
79+
11. True or false: If an observation with large leverage has a small residual, then it is not influential.
80+
81+
> False, An observation may have a small residual because it is influential.

0 commit comments

Comments
 (0)