
Labs Feedback #2

wxindu opened this issue Sep 22, 2019 · 6 comments

wxindu commented Sep 22, 2019

Lab #1 Feedback

✔️
Good work overall

Exercise 1
Good

Exercise 2
Good. I would organize your code slightly differently than you did here; let me show you:
[screenshot: suggested code layout]
As you can see in your PDF file, your labs() call can get a bit too long and still risks being cut off. Try organizing your code the way I did here and see if you like it.
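Roughly, the idea is something like this (a sketch with placeholder variable names and label text, assuming ggplot2):

```r
library(ggplot2)

# one labs() argument per line keeps each line short enough
# to fit on the knitted PDF page
ggplot(housing, aes(x = crim, y = medv)) +   # housing is a placeholder name
  geom_point() +
  labs(
    x = "Per-capita crime rate",
    y = "Median home value ($1000s)",
    title = "Home value vs. crime rate"
  )
```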

Exercise 3
Good

Exercise 4
Good. You can also make a histogram of crim to see visually whether there are extreme values.
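A quick sketch (assuming the Boston housing data from MASS; adjust to however you loaded the data):

```r
library(MASS)  # assumption: the Boston data frame used in this lab

# a long right tail or isolated bars on the right flag extreme crim values
hist(Boston$crim, breaks = 50,
     main = "Per-capita crime rate", xlab = "crim")
```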

Exercise 5
Good

Exercise 6
Good

Exercise 7
How would the model "apply the correlations to predict the average value of a home"? Please describe in more detail how this can be done. Also, how would you decide which variables to include in your model? How would correlation between different predictors affect your decision on which ones to include?
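One place to start is the pairwise correlation matrix; a sketch (again assuming the Boston data):

```r
# correlations between every pair of variables, rounded for readability
round(cor(Boston), 2)

# predictors that are strongly correlated with each other carry overlapping
# information, so including both may add little beyond including one of them
```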


wxindu commented Sep 29, 2019

Lab #2 Feedback

✔️
Mostly fine work. Think again about the last simulation problem (exercises 9-12).

Exercise 1 ~ 6
Perfect

Exercise 7
Be careful: question 2 should be inference, and question 4 should be data description.

Exercise 9-12
A few things:

  • Think again about the x's you generated. By default, rnorm() sets the mean to 0 and the sd to 1. Is it reasonable to assume that mag is centered around 0? Try a different normal distribution.
  • How are you generating your y's? What is the relationship between x and y? It looks like you are using the estimated error term itself as your y. Think again about how to generate y's using random error; see the sketch after this list.
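A minimal sketch of the idea (the coefficients, mean, and sd below are placeholders; match them to the real data):

```r
set.seed(1)
n <- 500

# x should center where the real mag values center, not at 0
x <- rnorm(n, mean = 4.5, sd = 0.4)

# y is the systematic part plus fresh random error: the error is
# added to b0 + b1 * x, it is not used as y by itself
b0 <- -2
b1 <- 1.5
y  <- b0 + b1 * x + rnorm(n, mean = 0, sd = 1)
```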


wxindu commented Oct 15, 2019

Lab #4 Feedback

✔️ ➕
Good job overall!

Lab part:
Looks good

Problem Set:

Exercise 2
a) Good
b) Good

Exercise 3
a) Yes, the training RSS will steadily decrease as s increases until the LS estimates are reached, and from there it will remain unchanged with any further increase in s (see the small demo after this list).
b) Yes, the test RSS will decrease as s increases until the optimal complexity is reached, then increase again; once the LS estimates are reached, the test RSS will remain unchanged.
c) Good. The variance will increase steadily until the LS estimates are reached, at which point it will remain unchanged.
d) Good. The bias will decrease steadily until the LS estimates are reached, and from there it will remain unchanged.
e) Good.
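A small demo of the training-RSS pattern in (a), using simulated data and glmnet (the sizes and names here are arbitrary):

```r
library(glmnet)
set.seed(1)

n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- as.vector(X %*% rnorm(p) + rnorm(n))

fit <- glmnet(X, y, alpha = 1)                  # lasso path over lambda
train_rss <- colSums((y - predict(fit, X))^2)   # training RSS at each lambda

# small lambda corresponds to a large budget s: training RSS decreases as the
# budget grows and flattens out once the least-squares fit is reached
plot(log(fit$lambda), train_rss, type = "l",
     xlab = "log(lambda)", ylab = "training RSS")
```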

Exercise 4
a) Correct.
b) Good
c) Good
d) Good
e) Good

Exercise 5
NA

Exercise 6
Good job


wxindu commented Oct 26, 2019

Lab #5 Feedback

✔️
Review non-linear regression models.

Problem 1
Please review non-linear regression models and how to include quadratic terms in a model: make sure you always include the original variable, not only its quadratic. You need + exports + I(exports^2).
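A sketch of what the formula should look like (outcome, schooling, and my_data are placeholder names; the binomial family is an assumption based on the logistic model discussed in 2.3):

```r
# keep the linear term alongside the quadratic one
fit <- glm(outcome ~ exports + I(exports^2) + schooling,
           data = my_data, family = binomial)
summary(fit)
```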

Problem 2
Be careful: you want to change schooling back to its original value before changing exports. Otherwise your steps look right; you can use filter() and mutate() to subset and modify your dataset. Fix the model in problem 1, redo this problem, and see if the results make more sense.
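Roughly something like this (all names and values here are placeholders; type = "response" assumes the logistic model from above):

```r
library(dplyr)

# start from the original data each time, then change only the variable
# the scenario asks about, leaving schooling at its original value
scenario <- my_data %>%
  filter(country == "some_country") %>%   # hypothetical subset
  mutate(exports = exports + 10)          # hypothetical change to exports

predict(fit, newdata = scenario, type = "response")
```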
2.3
Good. Also consider the response variable of a logistic regression model: as a predictor changes, the odds of the response change multiplicatively; each one-unit increase in the predictor multiplies the odds by exp(beta).
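You can read those multiplicative effects directly from the fitted model (fit is the placeholder model from above):

```r
# exponentiated coefficients: the factor by which the odds are multiplied
# for a one-unit increase in each predictor
exp(coef(fit))
```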

Problem 3
Your steps look right. Redo the model in problem 1 and try again.

Problem 4
You have the same problem in these models as in problem 1. Always include both the original variable and its quadratic form in your model.

Problem Set

Problem 4
a) Good
b) Good
c) Good
d) Good
e) Good

Problem 6
a) Good
b) Good

Problem 7
Good job


wxindu commented Nov 12, 2019

Lab #6 Feedback

✔️ ➖
Review bootstrapping. Review confidence intervals. Be careful with your code; whenever you run into errors and cannot figure out why they happen, please ask for help.

Inventing a variable

  1. Good.
  2. Good.
  3. Good. You can create the variable partition_index with the following code (assuming dplyr is loaded):
    d <- d %>% mutate(partition_index = sample(1:5, nrow(d), replace = TRUE))
    Collaborators?

Inverting a variable

  1. Good.
  2. Good

A simple model
I don't understand your work here. Read the question again. Here the estimate of the return is simply 1/MAPE.
2. Missing

**Is simple sufficient?**
Your bootstrap code looks right. Think again about what the parameter of interest is here.
Try not to use other packages to find the confidence intervals; construct them yourself. Review how to construct a confidence interval using bootstrapping: what is the margin of error? A sketch of the general recipe is below.
Your code does not work here, which is one of the reasons the file cannot knit.
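A general-purpose sketch (the statistic function and column name are placeholders; swap in whatever the question actually asks about):

```r
set.seed(1)
B <- 1000

# placeholder: the statistic of interest computed from the data frame d
stat <- function(df) mean(df$return)

# resample rows with replacement B times and recompute the statistic
boot_est <- replicate(B, {
  resampled <- d[sample(nrow(d), replace = TRUE), ]
  stat(resampled)
})

# percentile 95% interval
quantile(boot_est, c(0.025, 0.975))

# or: estimate +/- margin of error, with the SE taken from the
# bootstrap distribution (margin of error ~ 2 * bootstrap SE)
est <- stat(d)
me  <- 2 * sd(boot_est)
c(est - me, est + me)
```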

One big happy plot
You are missing a + sign in your code here, so it returns an error. Fix that and try again.

The big picture
Missing

Exercise
Missing


wxindu commented Dec 4, 2019

Lab #7 Feedback

✔️
Good

Problem 1

  1. Good
  2. Consider the nodal purity. How does the second split affect nodal purity?
  3. Good

Problem 2
Good

Problem 3
Good job

Problem 4
Good.

Problem 5
Good. Try not to print out the whole dataset.

Problem 6
You can specify n.var = in your varImpPlot() call to display only the top several variables on the plot.
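For example (rf_fit is a placeholder for your fitted random forest):

```r
library(randomForest)

# show only the 10 most important variables
varImpPlot(rf_fit, n.var = 10, main = "Top 10 variables by importance")
```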


wxindu commented Dec 16, 2019

Lab #8 Feedback

✔️ ➕
Very good work

Building a boosted tree
Good job

Assessing predictions

  1. Good job.
  2. Very good.
  3. Great! I like your way of finding the most difficult-to-predict one.
  4. Good job.

Slow the learning
Very nice work.

Communities and Crime
Good. One little thing: Andrew should have provided both the training set and the testing set. The dataset you are using here was supposed to be the training set, so be careful. Your steps for subsetting the training and testing sets look correct, though.

Chapter 8 Exercises
5. Good
6. Good job
