ray-project · cyk1337 · May 7, 2020
diff --git a/solutions/exercises/colab01-03.ipynb b/solutions/exercises/colab01-03.ipynb
diff --git a/solutions/exercises/colab04-05.ipynb b/solutions/exercises/colab04-05.ipynb
diff --git a/solutions/exercises/colab06-07.ipynb b/solutions/exercises/colab06-07.ipynb
diff --git a/solutions/exercises/exercise01-Introduction.ipynb b/solutions/exercises/exercise01-Introduction.ipynb
@@ -0,0 +1,325 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Exercise 1 - Simple Data Parallel Example\n",
+    "\n",
+    "**GOAL:** The goal of this exercise is to show how to run simple tasks in parallel.\n",
+    "\n",
+    "This script is too slow, and the computation is embarrassingly parallel. In this exercise, you will use Ray to execute the functions in parallel to speed it up.\n",
+    "\n",
+    "### Concept for this Exercise - Remote Functions\n",
+    "\n",
+    "The standard way to turn a Python function into a remote function is to add the `@ray.remote` decorator. Here is an example.\n",
+    "\n",
+    "```python\n",
+    "# A regular Python function.\n",
+    "def regular_function():\n",
+    "    return 1\n",
+    "\n",
+    "# A Ray remote function.\n",
+    "@ray.remote\n",
+    "def remote_function():\n",
+    "    return 1\n",
+    "```\n",
+    "\n",
+    "The differences are the following:\n",
+    "\n",
+    "1. **Invocation:** The regular version is called with `regular_function()`, whereas the remote version is called with `remote_function.remote()`.\n",
+    "2. **Return values:** `regular_function` immediately executes and returns `1`, whereas `remote_function` immediately returns an object ID (a future) and then creates a task that will be executed on a worker process. The result can be obtained with `ray.get`.\n",
+    "    ```python\n",
+    "    >>> regular_function()\n",
+    "    1\n",
+    "    \n",
+    "    >>> remote_function.remote()\n",
+    "    ObjectID(1c80d6937802cd7786ad25e50caf2f023c95e350)\n",
+    "    \n",
+    "    >>> ray.get(remote_function.remote())\n",
+    "    1\n",
+    "    ```\n",
+    "3. **Parallelism:** Invocations of `regular_function` happen **serially**, for example\n",
+    "    ```python\n",
+    "    # These happen serially.\n",
+    "    for _ in range(4):\n",
+    "        regular_function()\n",
+    "    ```\n",
+    "    whereas invocations of `remote_function` happen in **parallel**, for example\n",
+    "    ```python\n",
+    "    # These happen in parallel.\n",
+    "    for _ in range(4):\n",
+    "        remote_function.remote()\n",
+    "    ```"
+   ]
+  },
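+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The three differences above can be combined into one small sketch. It starts Ray itself so that it is self-contained, and it uses a shortened sleep as a stand-in for real work. The name `remote_sleep` is hypothetical and used only for illustration.\n",
+    "\n",
+    "```python\n",
+    "import time\n",
+    "import ray\n",
+    "\n",
+    "# Assumption: with ignore_reinit_error=True this is safe even if Ray is already running.\n",
+    "ray.init(num_cpus=4, ignore_reinit_error=True)\n",
+    "\n",
+    "@ray.remote\n",
+    "def remote_sleep(i):\n",
+    "    time.sleep(0.5)\n",
+    "    return i\n",
+    "\n",
+    "# Each call returns a future right away; the tasks run in the background.\n",
+    "futures = [remote_sleep.remote(i) for i in range(4)]\n",
+    "\n",
+    "# ray.get on a list blocks until every task has finished and\n",
+    "# returns the values in the same order as the futures.\n",
+    "values = ray.get(futures)\n",
+    "assert values == [0, 1, 2, 3]\n",
+    "```\n",
+    "\n",
+    "Calling `ray.get` inside the loop instead would wait for each task before submitting the next one, which makes the calls serial again."
+   ]
+  },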
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from __future__ import absolute_import\n",
+    "from __future__ import division\n",
+    "from __future__ import print_function\n",
+    "\n",
+    "import ray\n",
+    "import time"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Start Ray. By default, Ray does not schedule more tasks concurrently than there are CPUs. This example requires four tasks to run concurrently, so we tell Ray that there are four CPUs with `num_cpus=4`. Normally this argument is omitted and Ray determines the number of CPUs using `psutil.cpu_count()`. The argument `ignore_reinit_error=True` suppresses the error that would otherwise be raised if the cell is run more than once, and `include_webui=False` skips starting the web UI.\n",
+    "\n",
+    "The call to `ray.init` starts a number of processes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "2020-05-07 18:10:09,660\tINFO resource_spec.py:205 -- Starting Ray with 7.52 GiB memory available for workers and up to 3.77 GiB for objects. You can adjust these settings with ray.remote(memory=<bytes>, object_store_memory=<bytes>).\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "{'node_ip_address': '172.18.40.24',\n",
+       " 'redis_address': '172.18.40.24:57452',\n",
+       " 'object_store_address': '/tmp/ray/session_2020-05-07_18-10-09_652384_18687/sockets/plasma_store',\n",
+       " 'raylet_socket_name': '/tmp/ray/session_2020-05-07_18-10-09_652384_18687/sockets/raylet',\n",
+       " 'webui_url': None,\n",
+       " 'session_dir': '/tmp/ray/session_2020-05-07_18-10-09_652384_18687'}"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "ray.init(num_cpus=4, ignore_reinit_error=True, include_webui=False)"
+   ]
+  },
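+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a quick sanity check, the resources Ray registered can be inspected after start-up. This is a minimal sketch that assumes `ray.cluster_resources()` is available in the installed Ray version and that it reports a `CPU` entry.\n",
+    "\n",
+    "```python\n",
+    "import ray\n",
+    "\n",
+    "# Assumption: a repeated init is ignored because of ignore_reinit_error=True.\n",
+    "ray.init(num_cpus=4, ignore_reinit_error=True)\n",
+    "\n",
+    "# Total resources Ray knows about, returned as a dict keyed by resource name.\n",
+    "resources = ray.cluster_resources()\n",
+    "print('CPUs available to Ray:', resources.get('CPU'))\n",
+    "```\n",
+    "\n",
+    "If the reported CPU count is lower than four, the four one-second tasks in this exercise cannot all run at once and the timing check will fail."
+   ]
+  },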
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**EXERCISE:** The function below is slow. Turn it into a remote function using the `@ray.remote` decorator."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# This function is a proxy for a more interesting and computationally\n",
+    "# intensive function.\n",
+    "def slow_function(i):\n",
+    "    time.sleep(1)\n",
+    "    return i"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**EXERCISE:** The loop below takes too long. The four function calls could be executed in parallel. Instead of four seconds, it should only take one second. Once `slow_function` has been made a remote function, execute these four tasks in parallel by calling `slow_function.remote()`. Then obtain the results by calling `ray.get` on a list of the resulting object IDs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Sleep a little to improve the accuracy of the timing measurements below.\n",
+    "# We do this because workers may still be starting up in the background.\n",
+    "time.sleep(2.0)\n",
+    "start_time = time.time()\n",
+    "\n",
+    "results = [slow_function(i) for i in range(4)]\n",
+    "\n",
+    "end_time = time.time()\n",
+    "duration = end_time - start_time\n",
+    "\n",
+    "print('The results are {}. This took {} seconds. Run the next cell to see '\n",
+    "      'if the exercise was done correctly.'.format(results, duration))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Solution**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[ObjectID(73962dd385f7ffffffff0100000000c001000000), ObjectID(26acd2812046ffffffff0100000000c001000000), ObjectID(41d260bc3921ffffffff0100000000c001000000), ObjectID(f3083a7ebdbaffffffff0100000000c001000000)]\n"
+     ]
+    }
+   ],
+   "source": [
+    "@ray.remote\n",
+    "def slow_function(i):\n",
+    "    time.sleep(1)\n",
+    "    return i\n",
+    "\n",
+    "# Start the clock here; the next cell stops it after ray.get returns.\n",
+    "start_time = time.time()\n",
+    "results = [slow_function.remote(i) for i in range(4)]  # returns object IDs immediately\n",
+    "print(results)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[0, 1, 2, 3]\n",
+      "Executing the for loop took 1.010 seconds.\n",
+      "The results are: [0, 1, 2, 3]\n",
+      "Run the next cell to check if the exercise was performed correctly.\n"
+     ]
+    }
+   ],
+   "source": [
+    "results = ray.get(results)\n",
+    "print(results)\n",
+    "duration = time.time() - start_time\n",
+    "print('Executing the for loop took {:.3f} seconds.'.format(duration))\n",
+    "print('The results are:', results)\n",
+    "print('Run the next cell to check if the exercise was performed correctly.')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**VERIFY:** Run some checks to verify that the changes you made to the code were correct. Some of the checks should fail when you initially run the cells. After completing the exercises, the checks should pass."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Success! The example took 1.0102510452270508 seconds.\n"
+     ]
+    }
+   ],
+   "source": [
+    "assert results == [0, 1, 2, 3], 'Did you remember to call ray.get?'\n",
+    "assert duration < 1.1, ('The loop took {} seconds. This is too slow.'\n",
+    "                        .format(duration))\n",
+    "assert duration > 1, ('The loop took {} seconds. This is too fast.'\n",
+    "                      .format(duration))\n",
+    "\n",
+    "print('Success! The example took {} seconds.'.format(duration))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**EXERCISE:** Use the UI to view the task timeline and to verify that the four tasks were executed in parallel. You can do this as follows.\n",
+    "\n",
+    "1. Run the following cell to generate a JSON file containing the profiling data.\n",
+    "2. Download the timeline file by right-clicking on `timeline01.json` in the navigator to the left and choosing **\"Download\"**.\n",
+    "3. Open [chrome://tracing/](chrome://tracing/) in the Chrome web browser, click on the **\"Load\"** button and load the downloaded JSON file.\n",
+    "\n",
+    "To navigate within the timeline, do the following.\n",
+    "- Move around by clicking and dragging.\n",
+    "- Zoom in and out by holding **alt** and scrolling.\n",
+    "\n",
+    "**NOTE:** The timeline visualization will only work in **Chrome**."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ray.timeline(filename=\"timeline01.json\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Download `timeline01.json` and open it in [chrome://tracing](chrome://tracing) in Chrome."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.10"
+  },
+  "toc": {
+   "base_numbering": 1,
+   "nav_menu": {},
+   "number_sections": false,
+   "sideBar": true,
+   "skip_h1_title": false,
+   "title_cell": "Table of Contents",
+   "title_sidebar": "Contents",
+   "toc_cell": false,
+   "toc_position": {
+    "height": "calc(100% - 180px)",
+    "left": "10px",
+    "top": "150px",
+    "width": "382.391px"
+   },
+   "toc_section_display": true,
+   "toc_window_display": true
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}