Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[coursera] Proofread recaps and week 1 assignments #470

Open
wants to merge 3 commits into
base: coursera
Choose a base branch
from

Conversation

mmamedli
Copy link

No description provided.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@dniku dniku changed the title Coursera [coursera] Proofread recaps and week 1 assignments Jan 12, 2021
Copy link
Collaborator

@dniku dniku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed week 1 assignments. Haven't looked at the recaps yet.

@@ -267,7 +267,7 @@
" policy[s_i,a_i] ~ #[occurrences of s_i and a_i in elite states/actions]\n",
"\n",
" Don't forget to normalize the policy to get valid probabilities and handle the 0/0 case.\n",
" For states that you never visited, use a uniform distribution (1/n_actions for all states).\n",
" For states, that you never visited, use a uniform distribution (1/n_actions for all states).\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I remember English punctuation, this comma shouldn't be there.

"\n",
"In case CEM failed to learn how to win from one distinct starting point, it will simply discard it because no sessions from that starting point will make it into the \"elites\".\n",
"In case CEM failed to learn, how to win from one distinct starting point, it will simply discard it because no sessions from that starting point will make it into the \"elites\".\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for this comma.

"# This code creates a virtual display to draw game images on.\n",
"# It will have no effect if your machine has a monitor.\n",
"# This code creates a virtual display for drawing game images on.\n",
"# It won't have any effect if your machine has a monitor.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this block in (almost?) every notebook. I wouldn't stress that too much, but if this block is changed in one notebook, it should probably be changed similarly in all other notebooks.

Luckily, that can be done quite easily, since this block is absolutely identical across all notebooks.

@@ -82,7 +82,7 @@
" activation='tanh',\n",
")\n",
"\n",
"# initialize agent to the dimension of state space and number of actions\n",
"# initialize agent to the dimension of state space and a number of actions\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should probably be "the" here, since it's a very specific number of actions.

" * Try re-using samples from 3-5 last iterations when computing threshold and training\n",
" * Experiment with amount of training iterations and learning rate of the neural network (see params)\n",
" * Try re-using samples from 3-5 last iterations when computing threshold and during training\n",
" * Experiment with an amount of training iterations and the learning rate of the neural network (see params)\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"an amount" → "the number"?

@@ -359,7 +361,7 @@
" ax.set_xlabel('position (x)')\n",
" ax.set_ylabel('velocity (v)')\n",
" \n",
" # Sample a trajectory and draw it\n",
" # Sample the trajectory and draw it\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reasonably sure that it should be "a trajectory", since we are sampling an arbitrary trajectory, not a specific one.

"* `reset()`: reset environment to the initial state, _return first observation_\n",
"* `render()`: show current environment state (a more colorful version :) )\n",
"* `step(a)`: commit action `a` and return `(new_observation, reward, is_done, info)`\n",
"The three main methods of this environment are:\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be "an environment", since there are many possible environments, and the interface is identical for all of them.

@@ -119,7 +119,7 @@
"source": [
"### Play with it\n",
"\n",
"Below is the code that drives the car to the right. However, if you simply use the default policy, the car will not reach the flag at the far right due to gravity.\n",
"Below is the code that drives the car to the right. However, if you simply use the default policy, the car won't reach the flag at the far right due to the gravity.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this "the". My internal English language model is confused.

" # Your goal is to fix that. You don't need anything sophisticated here,\n",
" # and you can hard-code any policy that seems to work.\n",
" # Hint: think how you would make a swing go farther and faster.\n",
" # Hint: think how you would make a swing go faster and faster.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we meant "farther" here. This change doesn't alter the meaning much, though.

@dniku
Copy link
Collaborator

dniku commented Jan 12, 2021

I will also revert the metadata changes later.

@@ -23,13 +23,13 @@
"source": [
"### Part I: Jupyter notebooks in a nutshell\n",
"* You are reading this line in a jupyter notebook.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word "Jupyter" should be capitalized throughout.

" * This cell contains hypertext, while the next one contains code.\n",
"* You can __run a cell__ with code by selecting it (click) and pressing `Ctrl + Enter` to execute the code and display the output (if any).\n",
"* If you're running this on a device with no keyboard, ~~you are doing it wrong~~ use the top bar (esp. play/stop/restart buttons) to run the code.\n",
"* Behind the curtains, there's a Python interpreter, that runs the code and remembers everything, that was defined.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am reasonably sure that these commas aren't needed.

Comment on lines +66 to +67
"* Use top menu → Kernel → Interrupt (or Stop button), if you want to stop running the cell midway.\n",
"* Use top menu → Kernel → Restart (or cyclic arrow button), if interruption doesn't fix the problem (be aware, you will lose all defined variables).\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same about these commas.

@@ -103,9 +103,9 @@
"metadata": {},
"source": [
"### Part II: Loading data with Pandas\n",
"Pandas is a library that helps you load the data, prepare it and perform some lightweight analysis. The god object here is the `pandas.DataFrame` - a 2D table with batteries included. \n",
"Pandas is the library that helps you to load the data, prepare it and perform some lightweight analysis. The god object here is the `pandas.DataFrame` - a 2D table with included batteries. \n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "a" here, I think. In addition, "batteries included" is a common expression (e.g. https://www.python.org/dev/peps/pep-0206/).

Comment on lines 378 to 380
"metadata": {
"tags": []
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove these changes myself.

@@ -2303,7 +2325,7 @@
"metadata": {},
"outputs": [],
"source": [
"# plot a histogram of age and a histogram of ticket fares on separate plots\n",
"# plot the histogram of age and the histogram of ticket fares on separate plots\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "a" is fine in both cases here.

@@ -336,7 +339,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If you compute gradient from multiple losses, the gradients will add up at variables, therefore it's useful to __zero the gradients__ between iteratons."
"If you compute gradient from multiple losses, the gradients will be added to variables, therefore, it's useful to __zero the gradients__ between iteratons."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meaning is different. The gradient isn't added to variables; rather, each variable has an attached field called .grad which is where the gradients accumulate.

"* In general, PyTorch's error messages are quite helpful, read 'em before you google 'em.\n",
"* if you see nan/inf, print what happens at each iteration to find our where exactly it occurs.\n",
"* In general, PyTorch's error messages are quite helpful, read them before you google them.\n",
"* if you see nan/inf, print what happens at each iteration to find out, where exactly it occurs.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant comma?

Comment on lines +785 to +786
"* PyTorch examples - a repo, that implements many cool DL models in PyTorch - [link](https://github.com/pytorch/examples)\n",
"* Practical PyTorch - a repo, that implements some... other cool DL models... yes, in PyTorch - [link](https://github.com/spro/practical-pytorch)\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two redundant commas?

" \n",
"* If anything seems wrong, try going through one step of training and printing everything you compute.\n",
"* If something seems wrong, try going through each step of training and print everything you compute.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original text meant "try going through a single training step and print everything you compute".

Copy link
Collaborator

@dniku dniku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed weeks 2 & 3. 4-6 still pending.

@@ -6,9 +6,9 @@
"source": [
"### Markov decision process\n",
"\n",
"This week's methods are all built to solve __M__arkov __D__ecision __P__rocesses. In the broadest sense, an MDP is defined by how it changes states and how rewards are computed.\n",
"This week methods are all built to solve __M__arkov __D__ecision __P__rocesses. In the broadest sense, the MDP is defined by how it changes the states and how rewards are computed.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "an MDP" is fine here.

@@ -78,7 +78,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now use MDP just as any other gym environment:"
"We can now use the MDP just as any other gym environment:"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably MDP (written in monospace) would be better here (since it's a name).

"num_iter = 100 # maximum iterations, excluding initialization\n",
"# stop VI if new values are this close to old values (or closer)\n",
"# stop VI if new values are as close to old values (or closer)\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this isn't valid English. "As close" implies a continuation (as close as...), which is missing here.

@@ -677,7 +677,7 @@
"source": [
"### Let's visualize!\n",
"\n",
"It's usually interesting to see what your algorithm actually learned under the hood. To do so, we'll plot state value functions and optimal actions at each VI step."
"It's usually interesting to see, what your algorithm actually learned under the hood. To do so, we'll plot the state value functions and optimal actions at each VI step."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant comma?

"\n",
"<img src=https://github.com/yandexdataschool/Practical_RL/raw/master/yet_another_week/_resource/exp_replay.png width=480>\n",
"\n",
"#### Training with experience replay\n",
"1. Play game, sample `<s,a,r,s'>`.\n",
"2. Update q-values based on `<s,a,r,s'>`.\n",
"3. Store `<s,a,r,s'>` transition in a buffer. \n",
" 3. If buffer is full, delete earliest data.\n",
"4. Sample K such transitions from that buffer and update q-values based on them.\n",
" 3. If buffer is full, delete the earliest data.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the buffer?

Comment on lines +191 to +192
" An agent, that changes some of q-learning functions to implement Expected Value SARSA. \n",
" Note: this demo assumes, that your implementation of QLearningAgent.update uses get_value(next_state).\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant commas?

@@ -220,7 +220,7 @@
"source": [
"### Cliff World\n",
"\n",
"Let's now see how our algorithm compares against q-learning in case where we force agent to explore all the time.\n",
"Let's now see, how our algorithm compares against q-learning in case, where we force an agent to explore all the time.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant commas?

I'm also not sure about "an" here — it might have to be "the".

@@ -328,7 +328,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now see what did the algorithms learn by visualizing their actions at every state."
"Let's now see, what did the algorithms learn by visualizing their actions at every state."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant comma?

It should also be "what the algorithms learned" instead of "what did the algorithms learn".

@@ -395,7 +395,7 @@
"\n",
"Here are some of the things you can do if you feel like it:\n",
"\n",
"* Play with epsilon. See learned how policies change if you set epsilon to higher/lower values (e.g. 0.75).\n",
"* Play with an epsilon. See how learned policies change if you set an epsilon to the higher/lower values (e.g. 0.75).\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a very specific epsilon, so it should be "the".

On the other hand, you may regard "epsilon" as a name, in which case there should be no article.

@@ -395,7 +395,7 @@
"\n",
"Here are some of the things you can do if you feel like it:\n",
"\n",
"* Play with epsilon. See learned how policies change if you set epsilon to higher/lower values (e.g. 0.75).\n",
"* Play with an epsilon. See how learned policies change if you set an epsilon to the higher/lower values (e.g. 0.75).\n",
"* Expected Value SASRSA for softmax policy:\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you replace "SASRSA" with "SARSA" here?

Copy link
Collaborator

@dniku dniku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of redundant commas: roughty ⅔ of the comments I made were about them. Other than that, there are only a few problems here and there.

"\n",
"Ideally you should start small with maybe 1-2 hidden layers with < 200 neurons and then increase network size if agent doesn't beat the target score."
"Ideally you should start small, with maybe 1-2 hidden layers with < 200 neurons and then increase the network size, if agent doesn't beat the target score."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be a comma here, according to this English.SE answer.

"assert isinstance(list(network.modules(\n",
"))[-1], nn.Linear), \"please make sure you predict q-values without nonlinearity (ignore if you know what you're doing)\"\n",
"))[-1], nn.Linear), \"please, make sure you predict q-values without nonlinearity (ignore, if you know, what you're doing)\"\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, we don't need commas here.

Comment on lines +218 to +220
" ) == 2, \"make sure, that you predicted q-values for all actions in next state\"\n",
" assert next_state_values.data.dim(\n",
" ) == 1, \"make sure you computed V(s') as maximum over just the actions axis and not all axes\"\n",
" ) == 1, \"make sure, that you computed V(s') as maximum over just the actions axis and not all axes\"\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, it seems that commas are unnecessary here.

"\n",
"Seriously though,\n",
"* __ mean reward__ is the average reward per game. For a correct implementation it may stay low for some 10 epochs, then start growing while oscilating insanely and converges by ~50-100 steps depending on the network architecture. \n",
"* __ mean reward__ is the average reward per game. For the correct implementation it may stay low for some 10 epochs, then start growing, while oscilating insanely and converges by ~50-100 steps depending on the network architecture. \n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a few remaining problems with this line. How about:

__ mean reward__ is the average reward per game. When you implement everything correctly, it may stay low for some 10 epochs, then start growing while oscilating insanely, and converge after around 50-100 epochs depending on the network architecture.

"* If it never reaches target score by the end of for loop, try increasing the number of hidden neurons or look at the epsilon.\n",
"* __ epsilon__ - agent's willingness to explore. If you see that agent's already at < 0.01 epsilon before it's is at least 200, just reset it back to 0.1 - 0.5."
"* __ epsilon__ - agent's willingness to explore. If you see, that agent's already at < 0.01 epsilon before it's is at least 200, just reset it back to 0.1 - 0.5."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is a subordinating conjuction that is not at the beginning of the sentence, and the comma is unnecessary. However, some the sentence is unclear for other reasons. How about:

__ epsilon__ — the agent's willingness to explore. If you see that the agent is already at epsilon < 0.01 before the score is at least 200, just reset the epsilon back to 0.1 - 0.5.

"\n",
"* Optimizer: In the original paper RMSProp was used (they did not have Adam in 2013) and it can work not worse than Adam. For us Adam was default and it worked.\n",
"* Optimizer: In the original paper RMSProp was used (they did not have Adam in 2013) and it can work as good as Adam. For us Adam was a default and it worked.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"as well as"?

"\n",
"* target network update frequency: has something in common with learning rate. Too frequent updates can lead to divergence. Too rare can lead to slow leraning. For millions of total timesteps thousands of inner steps seem ok. One iteration of target network updating is an iteration of the (this time approximate) $\\gamma$-compression that stands behind Q-learning. The more inner steps it makes the more accurate is the compression.\n",
"* target network update frequency: has something in common with the learning rate. Too frequent updates can lead to divergence. Too rare can lead to slow learning. For millions of total timesteps thousands of inner steps seem ok. One iteration of target network updating is an iteration of (this time approximate) $\\gamma$-compression, that stands behind Q-learning. The more inner steps it makes the more accurate is the compression.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant comma? (in "compression, that")

@@ -1217,11 +1215,11 @@
"If you want to play with DQN a bit more, here's a list of things you can try with it:\n",
"\n",
"### Easy:\n",
"* Implementing __double q-learning__ shouldn't be a problem if you've already have target networks in place.\n",
"* Implementing __double q-learning__ shouldn't be a problem, if you already have target networks in place.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant comma?

" * You will probably need `tf.argmax` to select best actions\n",
" * Here's an original [article](https://arxiv.org/abs/1509.06461)\n",
"\n",
"* __Dueling__ architecture is also quite straightforward if you have standard DQN.\n",
"* __Dueling__ architecture is also quite straightforward, if you have standard DQN.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant comma?

"* You can now sample such transitions in proportion to the error (see [article](https://arxiv.org/abs/1511.05952)) for training.\n",
"\n",
"It's probably more convenient to explicitly declare inputs for \"sample observations\", \"sample actions\" and so on to plug them into q-learning.\n",
"\n",
"Prioritized (and even normal) experience replay should greatly reduce amount of game sessions you need to play in order to achieve good performance. \n",
"Prioritized (and even normal) experience replay should greatly reduce an amount of game sessions you need to play in order to achieve good performance. \n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"an amount" → "the number"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants