<!DOCTYPE html>
<html>
<head>
<title>Ch. 12 - Sampling-based motion planning</title>
<meta name="Ch. 12 - Sampling-based motion planning" content="text/html; charset=utf-8;" />
<link rel="canonical" href="http://underactuated.mit.edu/planning.html" />
<script src="https://hypothes.is/embed.js" async></script>
<script type="text/javascript" src="chapters.js"></script>
<script type="text/javascript" src="htmlbook/book.js"></script>
<script src="htmlbook/mathjax-config.js" defer></script>
<script type="text/javascript" id="MathJax-script" defer
src="htmlbook/MathJax/es5/tex-chtml.js">
</script>
<script>window.MathJax || document.write('<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js" defer><\/script>')</script>
<link rel="stylesheet" href="htmlbook/highlight/styles/default.css">
<script src="htmlbook/highlight/highlight.pack.js"></script> <!-- http://highlightjs.readthedocs.io/en/latest/css-classes-reference.html#language-names-and-aliases -->
<script>hljs.initHighlightingOnLoad();</script>
<link rel="stylesheet" type="text/css" href="htmlbook/book.css" />
</head>
<body onload="loadChapter('underactuated');">
<div data-type="titlepage">
<header>
<h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a></h1>
<p data-type="subtitle">Algorithms for Walking, Running, Swimming, Flying, and Manipulation</p>
<p style="font-size: 18px;"><a href="http://people.csail.mit.edu/russt/">Russ Tedrake</a></p>
<p style="font-size: 14px; text-align: right;">
© Russ Tedrake, 2024<br/>
Last modified <span id="last_modified"></span>.<br/>
<script>
var d = new Date(document.lastModified);
document.getElementById("last_modified").innerHTML = d.getFullYear() + "-" + (d.getMonth()+1) + "-" + d.getDate();</script>
<a href="misc.html">How to cite these notes, use annotations, and give feedback.</a><br/>
</p>
</header>
</div>
<p><b>Note:</b> These are working notes used for <a
href="https://underactuated.csail.mit.edu/Spring2024/">a course being taught
at MIT</a>. They will be updated throughout the Spring 2024 semester. <a
href="https://www.youtube.com/playlist?list=PLkx8KyIQkMfU5szP43GlE_S1QGSPQfL9s">Lecture videos are available on YouTube</a>.</p>
<table style="width:100%;"><tr style="width:100%">
<td style="width:33%;text-align:left;"><a class="previous_chapter" href=policy_search.html>Previous Chapter</a></td>
<td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
<td style="width:33%;text-align:right;"><a class="next_chapter" href=feedback_motion_planning.html>Next Chapter</a></td>
</tr></table>
<script type="text/javascript">document.write(notebook_header('planning'))
</script>
<!-- EVERYTHING ABOVE THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<chapter style="counter-reset: chapter 11"><h1>Sampling-based motion planning</h1>
<p>The term "motion planning" is a quite general term which almost certainly
encompasses the dynamic programming, feedback design, and trajectory optimization
algorithms that we have already discussed. However, there are a number of algorithms
and ideas that we have not yet discussed which have grown from the idea of formulating
motion planning as a search problem -- for instance searching for a path from a start
to a goal in a graph which is too large to fit in memory. In order to deal with the
continuous domains that are common in robotics, these algorithms often rely on
randomized sampling.</p>
<!--Some, but certainly not all, of these algorithms sacrifice optimality in order to
find any path if it exists, and the notion of a planner being "complete" -- guaranteed
to find a path if one exists -- is highly valued.-->
<p>My goal for this chapter is to introduce these additional tools into our toolkit.
For robotics, they will play a particularly valuable role when the planning problem is
geometrically complex (e.g. a robot moving around obstacles in 3D) or the optimization
landscape is very nonconvex, because these are the problems where the nonlinear
trajectory optimization formulations that we've studied before will potentially suffer
badly from local minima. Many of these algorithms were developed initially for
discrete or purely kinematic planning problems; a major theme of this chapter will be
the adaptations that allow them to work for systems with dynamic constraints
(including underactuated systems).</p>
<p><elib>LaValle06</elib> is a very nice book on planning algorithms in general and on
motion planning algorithms in particular.</p>
<section><h1>Large-scale Incremental Search</h1>
<p>For decades, many AI researchers felt that the route to creating intelligent
machines was to collect large ontologies of knowledge, and then perform very
efficient search. Some notable examples from this storied history include Samuel's
checker players, automated theorem proving, <a
href="https://en.wikipedia.org/wiki/Cyc">Cyc</a>, and <a
href="https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)">Deep Blue</a>
playing chess.</p>
<p>Indeed, thanks to decades of research, planning algorithms in AI have
also scaled to impressive heights, making efficient use of heuristics and
factorizations to solve impressively large planning instances. Since 1998,
the International Conference on Automated Planning and Scheduling (ICAPS)
has been hosting <a
href="https://www.icaps-conference.org/competitions/">regular planning
competitions</a> which have helped to solidify <a
href="https://en.wikipedia.org/wiki/Planning_Domain_Definition_Language">problem
specification formats</a> and to benchmark the state of the art.</p>
<p>These algorithms have focused primarily on logical/discrete planning (although
they do support "objects", and so can have a sort of open vocabulary). In the
context of these notes, these approaches are most closely related to the
discrete-state, discrete-action, discrete-time planning problems that we discussed
to introduce <a href="dp.html#graph_search">dynamic programming as graph search</a>.
Dynamic programming is a very efficient algorithm to solve for the cost-to-go from
<i>every</i> state, but if we only need to find an (optimal) path from a single
start state to the goal, we can potentially do better. In particular, in order to
scale to very large planning instances, one of the essential ideas is
"incremental" search, which can avoid ever putting the entire (often exponentially
large and sometimes even infinite) graph into memory.</p>
<p>Is it possible to guarantee that we've found an optimal path from a start to a
goal without visiting every node? Yes! Indeed, one of the key insights that powers
these incremental algorithms is the use of <a
href="https://en.wikipedia.org/wiki/Admissible_heuristic">admissible heuristics</a>
to guide the search -- an approximate cost-to-go which is guaranteed to never
over-estimate the cost-to-go. A great example of this would be searching for
directions on a road map -- the Euclidean distance from the start to the goal ("as
the crow flies") is guaranteed to never over-estimate the true cost to go (which is
to stay on the roads). Perhaps the most famous search algorithm to leverage these
heuristics is
<a href="https://en.wikipedia.org/wiki/A*_search_algorithm">A*</a>. In robotics, we
often use online planning extensions, such as <a
href="https://en.wikipedia.org/wiki/D*">D* and D*-Lite</a>.
</p>
<algorithm><h1>Discrete A*</h1>
<p>The A* algorithm solves a shortest-path problem on a graph defined by vertices
$v \in V$, with source vertex $v_s$ and target vertex $v_t$, and (weighted)
edges defined implicitly by a function
<code>GetSuccessors(</code>$v$<code>)</code> which returns the set of vertices reachable from
$v$. We additionally define:
<ul>
<li>$g(v)$ is the optimal "cost to come" from $v_s$ to $v$,</li>
<li>$h(v)$ is the optimal "cost to go" from $v$ to $v_t$,</li>
<li>$f(v) = g(v) + h(v)$ is the cost of the optimal path from $v_s$ to
$v_t$ that is constrained to go through $v$.</li>
</ul>
Let's also define a path $p$ as a sequence of vertices (connected by edges in the graph), starting with $v_s$. We can then define the cost-to-come and cost-to-go for $p$:
<ul>
<li>$\tilde{g}(p)$ is the cost of traversing the path $p$,</li>
<li>$\tilde{h}(v)$ is a <i>"heuristic"</i> approximation of the optimal cost to go from $v$ to $v_t,$</li>
<li>$\tilde{f}(p) = \tilde{g}(p) + \tilde{h}(v)$, where $v$ is the last vertex
in the sequence $p$.</li>
</ul>
We say that a heuristic is <i>admissible</i> if it never over-estimates the cost to go: $$\tilde{h}(v) \le h(v), \forall v \in V.$$
</p>
<p>
Using these definitions, the A* algorithm can be written as follows:
</p>
<hr/>
<div style="padding-left: 20px;">
<div style="display:inline-block;width:120px;border=1px">\(p = [v_s]\)</div><i>current path</i><br/>
<div style="display:inline-block;width:120px;border=1px">\(S = \{ v_s : p \}\) </div><i>"visited" dictionary</i><br/>
<div style="display:inline-block;width:120px;border=1px">\(Q.\text{insert}(p)\) </div><i>priority queue prioritized by lowest $\tilde{f}$</i><br/>
while not \(Q.\text{empty}()\):<br/>
&nbsp;&nbsp;\(p = Q.\text{pop}()\)<br/>
&nbsp;&nbsp;\(u = p.\text{last}()\)<br/>
&nbsp;&nbsp;if \(u = v_t\), then return \(p\)<br/>
&nbsp;&nbsp;for all \(v \in\) GetSuccessors\((u)\), where \(v \notin p\):<br/>
&nbsp;&nbsp;&nbsp;&nbsp;\(p' = \text{Path}(p, v)\)<br/>
&nbsp;&nbsp;&nbsp;&nbsp;if \(v \notin S\) or \(\tilde{g}(p') < \tilde{g}(S[v])\):<br/>
      \(S[v] = p'\)<br/>
      \(Q.\text{insert}(p')\)<br/>
return failure
</div>
<hr/>
<p>A* has a remarkable property: if $\tilde{h}(v)$ is an admissible heuristic, then A* will never expand paths
for which $\tilde{f}(p) > f(v_s).$ Take a minute to appreciate that! If the
heuristic $\tilde{h}(v)$ is the optimal cost to go, then A* will only visit
vertices that are on the optimal path!</p>
<p>Unfortunately, the flip side of this is that A* will expand <i>all</i> paths in
the graph for which $\tilde{f}(p) < f(v_s).$ Stronger heuristics lead to more
performant/scalable search.</p>
</algorithm>
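<p>To make this concrete, here is a minimal Python sketch of A*. It follows the
pseudocode above, but uses the standard bookkeeping trick of storing parent
pointers (and reconstructing the path at the end) rather than storing entire
paths in the queue; <code>get_successors</code>, <code>edge_cost</code>, and
<code>h_tilde</code> are stand-ins for the problem-specific functions:</p>
<pre><code class="python">
import heapq
from itertools import count

def a_star(v_s, v_t, get_successors, edge_cost, h_tilde):
    """A* sketch. get_successors(v) returns the vertices reachable from v,
    edge_cost(u, v) is the weight of the edge from u to v, and h_tilde(v)
    is an admissible heuristic. Returns an optimal path [v_s, ..., v_t],
    or None if the target is unreachable."""
    g = {v_s: 0.0}               # best cost-to-come found so far
    parent = {v_s: None}
    tie = count()                # tie-breaker so vertices never get compared
    queue = [(h_tilde(v_s), next(tie), v_s)]  # prioritized by f = g + h

    while queue:
        f_u, _, u = heapq.heappop(queue)
        if f_u > g[u] + h_tilde(u):
            continue             # stale entry; a better path to u was found
        if u == v_t:             # reconstruct the path via parent pointers
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in get_successors(u):
            g_v = g[u] + edge_cost(u, v)
            if g_v < g.get(v, float("inf")):
                g[v] = g_v
                parent[v] = u
                heapq.heappush(queue, (g_v + h_tilde(v), next(tie), v))
    return None

# Example: a unit-cost grid defined implicitly (the graph is never stored),
# with the (admissible) Manhattan-distance heuristic.
successors = lambda v: [(v[0]+1, v[1]), (v[0]-1, v[1]),
                        (v[0], v[1]+1), (v[0], v[1]-1)]
path = a_star((0, 0), (3, 4), successors, lambda u, v: 1.0,
              lambda v: abs(v[0] - 3) + abs(v[1] - 4))
</code></pre>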
<p>In addition to their empirical performance, there are a handful of features that
one might like to understand about incremental planning algorithms like A*. A first
is the notion of <b>completeness</b>; a planning algorithm is complete if it is
guaranteed to find a feasible path from the start to the goal if one exists. A
second is the notion of <b>optimality</b> (of the returned solution); an optimal
planning algorithm is one that is guaranteed to find the optimal path. With an
admissible heuristic, A* is both complete and optimal.</p>
<p>In addition to approximate cost-to-go functions, there are numerous other
heuristics that power state-of-the-art large-scale logical search algorithms.
Another important one is factorization. For a robotics example, consider a robot
manipulating many possible objects -- it's reasonable to plan the manipulation of
one object assuming it's independent of the other objects and then to revise that
plan only when the optimal plan ends up putting two objects on intersecting paths.
Many of these heuristics are summarized nicely in the Fast Downward
paper<elib>Helmert06</elib>; Fast Downward has been at the forefront of the ICAPS
planning competitions for many years.</p>
<p>Some of the most visible success stories in deep learning today still make use of
planning. For example: DeepMind's <a
href="https://en.wikipedia.org/wiki/AlphaGo">AlphaGo</a> and <a
href="https://en.wikipedia.org/wiki/AlphaZero">AlphaZero</a> combine the planning
algorithms developed over the years for discrete games, notably <a
href="https://en.wikipedia.org/wiki/Monte_Carlo_tree_search">Monte-Carlo Tree Search
(MCTS)</a>, with learned heuristics in the form of policies and value functions.
Finding ways to connect Large-Language Models (LLMs) with planning to support
longer-term reasoning is an extremely active area of research.</p>
</section>
<section><h1>Probabilistic RoadMaps (PRMs)</h1>
<p>How can we generalize powerful incremental algorithms like A* to continuous
domains for motion planning?</p>
<p>As is typical in this literature, let us start by considering kinematic planning
problems; in these problems we only consider the positions (e.g. configurations),
$\bq$, and do not consider velocities or accelerations. In the simplest version of
the problem, any path between configurations $\bq_1$ and $\bq_2$ that is collision
free is considered feasible.</p>
<p>If we choose a fixed discretization of the configuration space, like the ones
that I used when introducing dynamic programming, then we will have sacrificed
completeness (and therefore also optimality). For instance, we could have a narrow
corridor in configuration space between obstacles -- a path could exist but it could
be easily missed by a fixed discretization. So naturally one must consider
discretization as an iterative procedure, and start by analyzing the completeness of
the planner in the limit of infinite iterations. We use the term <i>resolution
complete</i> to describe an algorithm that is complete as the resolution becomes
arbitrarily fine. One can invoke minor assumptions, like assuming that feasible
paths are not infinitely narrow, to argue that an algorithm which increases the
resolution until finding a solution (if it exists) will terminate in a finite number
of iterations.</p>
<todo>figure on a fixed discretization failing in a narrow passage</todo>
<p>More popular is a variant which is even simpler to implement, and surprisingly
effective in practice. Rather than a fixed discretization, let us sample points from
a uniform random distribution over the configuration space, and reject samples that
are in collision. We can add edges in the graph to any neighbors for which the
straight-line path to the neighbor is collision-free. This allows us to build a
"probabilistic roadmap" over the configuration space offline <elib>Amato96</elib>.
Online, when presented with a new start and a new goal, we simply connect those
points to the roadmap and then plan with A* on the graph. This algorithm is complete
in the limit of infinite samples; we use the term <i>probabilistically
complete.</i></p>
<algorithm><h1>Probabilistic RoadMap (PRM)</h1>
<p>The PRM algorithm separates the computation into two phases: an offline phase
which builds the roadmap, and an online phase which finds a path using that
roadmap from a new start to a new goal.</p>
<hr/>
def BuildRoadmap():   <i>(offline)</i>
<div style="padding-left: 20px;">
$V$ = {}, $E$ = {} <br/>
for k = 1 to K:<br/>
<div style="padding-left: 20px;">
do { $\bq$ = UniformRandom() } until $\bq$ is collision-free<br/>
$V$.insert($\bq$)<br/>
</div>
for all $\bq$ in $V$:<br/>
<div style="padding-left: 20px;">
for all $\bq_n$ in NearestNeighbors($\bq$, V):<br/>
 if {$\bq, \bq_n$} is collision free:<br/>
   $E$.insert({$\bq,\bq_n$})
</div>
</div>
<hr/>
def FindPath($V$, $E$, $\bq_s$, $\bq_t$):   <i>(online "query phase")</i>
<div style="padding-left: 20px;">
$V$.insert($\bq_s$)<br/>
$E$.insert(shortest collision-free path from $\bq_s$ to $V$)<br/>
$V$.insert($\bq_t$)<br/>
$E$.insert(shortest collision-free path from $V$ to $\bq_t$)<br/>
return DiscreteGraphSearch($V$, $E$, $\bq_s$, $\bq_t$)
</div>
</algorithm>
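<p>A minimal Python sketch of the offline phase, assuming problem-specific
helpers <code>sample_configuration()</code>, <code>collision_free(q)</code>, and
<code>segment_free(q1, q2)</code> (these names are placeholders; the edge check
is typically implemented by sampling densely along the segment, as discussed
below):</p>
<pre><code class="python">
import numpy as np

def build_roadmap(sample_configuration, collision_free, segment_free,
                  K=500, k_nearest=10):
    """PRM offline phase sketch. Returns a list of vertices V and a set of
    edges E, stored as pairs of vertex indices."""
    V = []
    while len(V) < K:                 # rejection sampling of configurations
        q = sample_configuration()
        if collision_free(q):
            V.append(q)
    E = set()
    for i, q in enumerate(V):
        # Brute-force k-nearest neighbors; efficient implementations use
        # a KD-tree or similar spatial data structure.
        distances = [np.linalg.norm(q - q_n) for q_n in V]
        for j in np.argsort(distances)[1:k_nearest + 1]:
            if segment_free(q, V[j]):
                E.add((min(i, j), max(i, j)))   # undirected edge
    return V, E
</code></pre>
<p>The online query phase then connects $\bq_s$ and $\bq_t$ to the roadmap and
calls the discrete graph search (e.g. A*) from the previous section.</p>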
<todo>PRM examples!</todo>
<p>PRMs are a simple idea but surprisingly effective in practice, at least up to,
let's say, 10 dimensions. The most expensive steps of the algorithm are the nearest
neighbor queries and the collision detection queries that are called not only on the
sampled vertices, but also for the edges. Candidate edges are typically checked for
collisions by sampling densely along the edge and performing many point collision
detection queries, but algorithms with stronger guarantees do exist (cf.
<elib>Amice23</elib>). Efficient PRM implementations optimize these queries and
parallelize them on the CPU or GPU. At TRI we have optimized implementations which
we intend to open-source in Drake, but admittedly <a
href="https://github.com/RobotLocomotion/drake/issues/14431">the issue</a> has been
open for a long time! </p>
<example><h1>How well would a PRM swing up a pendulum?</h1>
</example>
</section>
<section><h1>Rapidly-exploring Random Trees (RRTs)</h1>
<p>Thinking about how to extend the PRM to accommodate dynamic constraints leads us
to a slightly different, and also wildly popular, algorithm known as the RRT.</p>
<example><h1>Planning with a Random Tree</h1>
<p>Let us denote the data structure which contains the tree as ${\cal
T}$. The algorithm is very simple:
<ul>
<li>Initialize the tree with the start state: ${\cal T} \leftarrow \bx_0$.</li>
<li>On each iteration:
<ul>
<li>Select a random node, $\bx_{rand}$, from the tree, ${\cal T}$</li>
<li>Select a random action, $\bu_{rand}$, from a distribution over feasible actions.</li>
<li>Compute the dynamics: $\bx_{new} = f(\bx_{rand},\bu_{rand})$</li>
<li>If $\bx_{new} \in {\cal G}$ (the goal region), then terminate. Solution found!</li>
<li>Otherwise add the new node to the tree, ${\cal T} \leftarrow \bx_{new}$.</li>
</ul>
</li>
</ul>
It can be shown that this algorithm is, in fact, probabilistically
complete. However, without strong heuristics to guide the selection of
the nodes scheduled for expansion, it can be extremely inefficient. For
a simple example, consider the system $\bx[n+1] = \bx[n] + \bu[n]$ with $\bx \in
\Re^2$ and $\bu_i \in [-1,1]$. We'll start at the origin and take the goal
region to be $15 \le x_i \le 20$ for all $i$.</p>
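<p>Here is a minimal Python/NumPy sketch of this straw-man algorithm:</p>
<pre><code class="python">
import numpy as np

rng = np.random.default_rng()
nodes = [np.zeros(2)]    # tree rooted at the origin
parent = [None]          # parent[i] is the index of node i's parent

for _ in range(1000):
    i = int(rng.integers(len(nodes)))         # select a random node
    u_rand = rng.uniform(-1.0, 1.0, size=2)   # select a random action
    x_new = nodes[i] + u_rand                 # dynamics: x[n+1] = x[n] + u[n]
    if np.all((x_new >= 15) & (x_new <= 20)):
        print("Success!")                     # x_new is in the goal region
        break
    nodes.append(x_new)
    parent.append(i)
</code></pre>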
<!--
Try it yourself:
<pre><code class="matlab" testfile="testRandomTree">
T = struct('parent',zeros(1,1000),'node',zeros(2,1000)); % pre-allocate memory for the "tree"
for i=2:size(T.parent,2)
T.parent(i) = randi(i-1);
x_rand = T.node(:,T.parent(i));
u_rand = 2*rand(2,1)-1;
x_new = x_rand+u_rand;
if (15<=x_new(1) && x_new(1)<=20 && 15<=x_new(2) && x_new(2)<=20)
disp('Success!'); break;
end
T.node(:,i) = x_new;
end
clf;
line([T.node(1,T.parent(2:end));T.node(1,2:end)],[T.node(2,T.parent(2:end));T.node(2,2:end)],'Color','k');
patch([15,15,20,20],[15,20,20,15],'g')
axis([-10,25,-10,25]);
</code></pre>
-->
<p>Although this "straw-man" algorithm is probabilistically complete, it
is certainly not efficient. After expanding 1000 nodes, the tree is
basically a mess of node points all right on top of each other:</p>
<figure>
<img width="70%" src="figures/random_tree.svg"/>
</figure>
<p>We're nowhere close to the goal yet, and this is a particularly easy
problem instance.
</p>
</example>
<p>The idea of generating a tree of feasible points has clear advantages,
but it seems that we have lost the ability to mark a region of space as
having been sufficiently explored. To make randomized algorithms
effective, we will, at the very least, need some heuristic that
encourages the nodes to spread out and explore the space.
</p>
<figure>
<img width="70%" src="figures/rrt.svg"/>
</figure>
<figure>
<img width="90%" src="figures/rrt_basic.svg"/>
<figcaption>(<a href="https://manipulation.csail.mit.edu/data/rrt_basic.html">Click here to watch the animation</a>)</figcaption>
</figure>
<figure>
<img width="90%" src="figures/rrt_voronoi.svg"/>
<figcaption>(<a href="figures/rrt_voronoi.swf">Click here to watch the animation</a>)</figcaption>
</figure>
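<p>The figures above illustrate the basic RRT heuristic: sample a point
$\bx_{rand}$ uniformly over the space, find the <i>nearest</i> node already in
the tree, and extend that node a small step toward the sample. Since the
probability of a node being selected is proportional to the volume of its
Voronoi region, the tree is "pulled" toward large unexplored regions. A minimal
Python sketch of this loop (kinematic, with collision checking omitted;
<code>sample</code> and <code>in_goal</code> are problem-specific stand-ins):</p>
<pre><code class="python">
import numpy as np

def rrt(x_start, sample, in_goal, max_iterations=10000, eps=0.5):
    """Basic RRT sketch. sample() draws a uniform random point over the
    space being explored; in_goal(x) tests membership in the goal region.
    Returns the node list and parent indices (follow parent[] back from
    the final node to recover the path), or None on failure."""
    nodes = [np.asarray(x_start, dtype=float)]
    parent = [None]
    for _ in range(max_iterations):
        x_rand = sample()
        # Nearest node in the tree (brute force for clarity).
        i = int(np.argmin([np.linalg.norm(x - x_rand) for x in nodes]))
        step = x_rand - nodes[i]
        distance = np.linalg.norm(step)
        x_new = nodes[i] + min(eps, distance) * step / max(distance, 1e-9)
        nodes.append(x_new)
        parent.append(i)
        if in_goal(x_new):
            return nodes, parent
    return None
</code></pre>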
<subsection><h1>RRTs for robots with dynamics</h1>
</subsection>
<subsection><h1>Variations and extensions</h1>
<p>RRT*, RRT-sharp, RRTx, ... </p>
<p>Kinodynamic-RRT*, LQR-RRT(*)</p>
<p>Complexity bounds and dispersion limits</p>
</subsection>
</section>
<!--
<subsection><h1>Discussion</h1>
<p>Not sure yet whether randomness is fundamental here, or whether is a temporary "crutch"
until we understand geometric and dynamic planning better.</p>
</subsection>
-->
<section><h1>Decomposition methods</h1>
<p>Cell decomposition...</p>
<p>Approximate decompositions for complex environments (e.g. IRIS)</p>
</section>
<section><h1>Exercises</h1>
<exercise><h1>RRT Planning</h1>
<p>In this <script>document.write(notebook_link('planning', 'rrt_planning', link_text = 'notebook'))</script>
we will write code for the Rapidly-Exploring Random
Tree (RRT). Building on this implementation we will also implement RRT*,
a variant of RRT that converges towards an optimal solution.</p>
<ol type="a">
<li>Implement RRT</li>
<li>Implement RRT*</li>
</ol>
</exercise>
</section>
</chapter>
<!-- EVERYTHING BELOW THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<div id="references"><section><h1>References</h1>
<ol>
<li id=LaValle06>
<span class="author">Steven M. LaValle</span>,
<span class="title">"Planning Algorithms"</span>, Cambridge University Press
, <span class="year">2006</span>.
</li><br>
<li id=Helmert06>
<span class="author">Malte Helmert</span>,
<span class="title">"The fast downward planning system"</span>,
<span class="publisher">Journal of Artificial Intelligence Research</span>, vol. 26, pp. 191--246, <span class="year">2006</span>.
</li><br>
<li id=Amato96>
<span class="author">N.M. Amato and Y. Wu</span>,
<span class="title">"A randomized roadmap method for path and manipulation planning"</span>,
<span class="publisher">Proceedings of the IEEE International Conference on Robotics and Automation</span> , vol. 1, pp. 113 - 120, <span class="year">1996</span>.
</li><br>
<li id=Amice23>
<span class="author">Amice and Alexandre and Werner and Peter and Tedrake and Russ</span>,
<span class="title">"Certifying Bimanual RRT Motion Plans in a Second"</span>,
<span class="publisher">arxiv</span>, <span class="year">2023</span>.
[ <a href="http://groups.csail.mit.edu/robotics-center/public_papers/Amice23.pdf">link</a> ]
</li><br>
</ol>
</section><p/>
</div>
<table style="width:100%;"><tr style="width:100%">
<td style="width:33%;text-align:left;"><a class="previous_chapter" href=policy_search.html>Previous Chapter</a></td>
<td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
<td style="width:33%;text-align:right;"><a class="next_chapter" href=feedback_motion_planning.html>Next Chapter</a></td>
</tr></table>
<div id="footer">
<hr>
<table style="width:100%;">
<tr><td><a href="https://accessibility.mit.edu/">Accessibility</a></td><td style="text-align:right">© Russ
Tedrake, 2024</td></tr>
</table>
</div>
</body>
</html>