<!DOCTYPE html>
<html>
<head>
<title>Ch. 12 - Sampling-based motion planning</title>
<meta name="Ch. 12 - Sampling-based motion planning" content="text/html; charset=utf-8;" />
<link rel="canonical" href="http://underactuated.mit.edu/planning.html" />
<script src="https://hypothes.is/embed.js" async></script>
<script type="text/javascript" src="chapters.js"></script>
<script type="text/javascript" src="htmlbook/book.js"></script>
<script src="htmlbook/mathjax-config.js" defer></script>
<script type="text/javascript" id="MathJax-script" defer
src="htmlbook/MathJax/es5/tex-chtml.js">
</script>
<script>window.MathJax || document.write('<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js" defer><\/script>')</script>
<link rel="stylesheet" href="htmlbook/highlight/styles/default.css">
<script src="htmlbook/highlight/highlight.pack.js"></script> <!-- http://highlightjs.readthedocs.io/en/latest/css-classes-reference.html#language-names-and-aliases -->
<script>hljs.initHighlightingOnLoad();</script>
<link rel="stylesheet" type="text/css" href="htmlbook/book.css" />
</head>
<body onload="loadChapter('underactuated');">
<div data-type="titlepage">
<header>
<h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a></h1>
<p data-type="subtitle">Algorithms for Walking, Running, Swimming, Flying, and Manipulation</p>
<p style="font-size: 18px;"><a href="http://people.csail.mit.edu/russt/">Russ Tedrake</a></p>
<p style="font-size: 14px; text-align: right;">
© Russ Tedrake, 2024<br/>
Last modified <span id="last_modified"></span>.<br/>
<script>
var d = new Date(document.lastModified);
document.getElementById("last_modified").innerHTML = d.getFullYear() + "-" + (d.getMonth()+1) + "-" + d.getDate();</script>
<a href="misc.html">How to cite these notes, use annotations, and give feedback.</a><br/>
</p>
</header>
</div>
<p><b>Note:</b> These are working notes used for <a
href="https://underactuated.csail.mit.edu/Spring2024/">a course being taught
at MIT</a>. They will be updated throughout the Spring 2024 semester. <a
href="https://www.youtube.com/playlist?list=PLkx8KyIQkMfU5szP43GlE_S1QGSPQfL9s">Lecture videos are available on YouTube</a>.</p>
<table style="width:100%;"><tr style="width:100%">
<td style="width:33%;text-align:left;"><a class="previous_chapter" href=policy_search.html>Previous Chapter</a></td>
<td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
<td style="width:33%;text-align:right;"><a class="next_chapter" href=feedback_motion_planning.html>Next Chapter</a></td>
</tr></table>
<script type="text/javascript">document.write(notebook_header('planning'))
</script>
<!-- EVERYTHING ABOVE THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<chapter style="counter-reset: chapter 11"><h1>Sampling-based motion planning</h1>
<p>The term "motion planning" is a quite general term which almost certainly
encompasses the dynamic programming, feedback design, and trajectory optimization
algorithms that we have already discussed. However, there are a number of algorithms
and ideas that we have not yet discussed which have grown from the idea of formulating
motion planning as a search problem -- for instance searching for a path from a start
to a goal in a graph which is too large to fit in memory. In order to deal with the
continuous domains that are common in robotics, these algorithms often rely on
randomized sampling.</p>
<!--Some, but certainly not all, of these algorithms sacrifice optimality in order to
find any path if it exists, and the notion of a planner being "complete" -- guaranteed
to find a path if one exists -- is highly valued.-->
<p>My goal for this chapter is to introduce these additional tools into our toolkit.
For robotics, they will play a particularly valuable role when the planning problem is
geometrically complex (e.g. a robot moving around obstacles in 3D) or the optimization
landscape is very nonconvex, because these are the problems where the nonlinear
trajectory optimization formulations that we've studied before will potentially suffer
badly from local minima. Many of these algorithms were developed initially for
discrete or purely kinematic planning problems; a major theme of this chapter will be
the adaptations that allow them to work for systems with dynamic constraints
(including underactuated systems).</p>
<p><elib>LaValle06</elib> is a very nice book on planning algorithms in general and on
motion planning algorithms in particular.</p>
<section><h1>Large-scale Incremental Search</h1>
<p>For decades, many AI researchers felt that the route to creating intelligent
machines was to collect large ontologies of knowledge, and then perform very
efficient search. Some notable examples from this storied history include Samuel's
checker players, automated theorem proving, <a
href="https://en.wikipedia.org/wiki/Cyc">Cyc</a>, and <a
href="https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)">Deep Blue</a>
playing chess.</p>
<p>Indeed, thanks to decades of research, planning algorithms in AI have
also scaled to impressive heights, making efficient use of heuristics and
factorizations to solve impressively large planning instances. Since 1998,
the International Conference on Automated Planning and Scheduling (ICAPS)
has been hosting <a
href="https://www.icaps-conference.org/competitions/">regular planning
competitions</a> which have helped to solidify <a
href="https://en.wikipedia.org/wiki/Planning_Domain_Definition_Language">problem
specification formats</a> and to benchmark the state of the art.</p>
<p>These algorithms have focused primarily on logical/discrete planning (although
they do support "objects", and so can have a sort of open vocabulary). In the
context of these notes, these approaches are most closely related to the
discrete-state, discrete-action, discrete-time planning problems that we discussed
to introduce <a href="dp.html#graph_search">dynamic programming as graph search</a>.
Dynamic programming is a very efficient algorithm to solve for the cost-to-go from
<i>every</i> state, but if we only need to find an (optimal) path from a single
start state to the goal, we can potentially do better. In particular, in order to
scale to very large planning instances, one of the essential ideas is
"incremental" search, which can avoid ever putting the entire (often exponentially
large and sometimes even infinite) graph into memory.</p>
<p>Is it possible to guarantee that we've found an optimal path from a start to a
goal without visiting every node? Yes! Indeed, one of the key insights that powers
these incremental algorithms is the use of <a
href="https://en.wikipedia.org/wiki/Admissible_heuristic">admissible heuristics</a>
to guide the search -- an approximate cost-to-go which is guaranteed to never
over-estimate the cost-to-go. A great example of this would be searching for
directions on a road map -- the Euclidean distance from the start to the goal ("as
the crow flies") is guaranteed to never over-estimate the true cost to go (which is
to stay on the roads). Perhaps the most famous search algorithm to leverage these
heuristics is
<a href="https://en.wikipedia.org/wiki/A*_search_algorithm">A*</a>. In robotics, we
often use online planning extensions, such as <a
href="https://en.wikipedia.org/wiki/D*">D* and D*-Lite</a>.
</p>
<algorithm><h1>Discrete A*</h1>
<p>The A* algorithm solves a shortest-path problem on a graph defined by vertices
$v \in V$, with source vertex $v_s$ and target vertex $v_t$, and (weighted)
edges defined implicitly by a function
<code>GetSuccessors(</code>$v$<code>)</code> which returns the set of vertices reachable from
$v$. We additionally define:
<ul>
<li>$g(v)$ is the optimal "cost to come" from $v_s$ to $v$,</li>
<li>$h(v)$ is the optimal "cost to go" from $v$ to $v_t$,</li>
<li>$f(v) = g(v) + h(v)$ is the cost of the optimal path from $v_s$ to
$v_t$ that is constrained to go through $v$.</li>
</ul>
Let's also define a path $p$ as a sequence of vertices (connected by edges in the graph), starting with $v_s$. We can then define the cost-to-come and cost-to-go for $p$:
<ul>
<li>$\tilde{g}(p)$ is the cost of traversing the path $p$,</li>
<li>$\tilde{h}(v)$ is a <i>"heuristic"</i> approximation of the optimal cost to go from $v$ to $v_t,$</li>
<li>$\tilde{f}(p) = \tilde{g}(p) + \tilde{h}(v)$, where $v$ is the last vertex
in the sequence $p$.</li>
</ul>
We say that a heuristic is <i>admissible</i> if it never over-estimates the cost to go: $$\tilde{h}(v) \le h(v), \forall v \in V.$$
</p>
<p>
Using these definitions, the A* algorithm can be written as follows:
</p>
<hr/>
<div style="padding-left: 20px;">
<div style="display:inline-block;width:120px;border=1px">\(p = [v_s]\)</div><i>current path</i><br/>
<div style="display:inline-block;width:120px;border=1px">\(S = \{ v_s : p \}\) </div><i>"visited" dictionary</i><br/>
<div style="display:inline-block;width:120px;border=1px">\(Q.\text{insert}(p)\) </div><i>priority queue prioritized by lowest $\tilde{f}$</i><br/>
while not \(Q.\text{empty}()\):<br/>
&nbsp;&nbsp;\(p = Q.\text{pop}()\)<br/>
&nbsp;&nbsp;\(u = p.\text{last}()\)<br/>
&nbsp;&nbsp;if \(u = v_t\), then return \(p\)<br/>
&nbsp;&nbsp;for all \(v \in\) GetSuccessors\((u)\), where \(v \notin p\):<br/>
&nbsp;&nbsp;&nbsp;&nbsp;\(p' = \text{Path}(p, v)\)<br/>
&nbsp;&nbsp;&nbsp;&nbsp;if \(v \notin S\) or \(\tilde{g}(p') < \tilde{g}(S[v])\):<br/>
      \(S[v] = p'\)<br/>
      \(Q.\text{insert}(p')\)<br/>
return failure
</div>
<hr/>
<p>A* has a remarkable property: if $\tilde{h}(v)$ is an admissible heuristic, then A* will never expand paths
for which $\tilde{f}(p) > f(v_s).$ Take a minute to appreciate that! If the
heuristic $\tilde{h}(v)$ is the optimal cost to go, then A* will only visit
vertices that are on the optimal path!</p>
<p>Unfortunately, the flip side of this is that A* will expand <i>all</i> paths in
the graph for which $\tilde{f}(p) < f(v_s).$ Stronger heuristics lead to more
performant/scalable search.</p>
</algorithm>
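<p>To make this concrete, here is a minimal Python sketch of A*. It follows the
pseudocode above, but uses the standard bookkeeping trick of storing parent
pointers (and reconstructing the path at the end) rather than storing entire
paths in the queue; <code>get_successors</code>, <code>edge_cost</code>, and
<code>h_tilde</code> are stand-ins for the problem-specific functions:</p>
<pre><code class="python">
import heapq
from itertools import count

def a_star(v_s, v_t, get_successors, edge_cost, h_tilde):
    """A* sketch. get_successors(v) returns the vertices reachable from v,
    edge_cost(u, v) is the weight of the edge from u to v, and h_tilde(v)
    is an admissible heuristic. Returns an optimal path [v_s, ..., v_t],
    or None if the target is unreachable."""
    g = {v_s: 0.0}               # best cost-to-come found so far
    parent = {v_s: None}
    tie = count()                # tie-breaker so vertices never get compared
    queue = [(h_tilde(v_s), next(tie), v_s)]  # prioritized by f = g + h

    while queue:
        f_u, _, u = heapq.heappop(queue)
        if f_u > g[u] + h_tilde(u):
            continue             # stale entry; a better path to u was found
        if u == v_t:             # reconstruct the path via parent pointers
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in get_successors(u):
            g_v = g[u] + edge_cost(u, v)
            if g_v < g.get(v, float("inf")):
                g[v] = g_v
                parent[v] = u
                heapq.heappush(queue, (g_v + h_tilde(v), next(tie), v))
    return None

# Example: a unit-cost grid defined implicitly (the graph is never stored),
# with the (admissible) Manhattan-distance heuristic.
successors = lambda v: [(v[0]+1, v[1]), (v[0]-1, v[1]),
                        (v[0], v[1]+1), (v[0], v[1]-1)]
path = a_star((0, 0), (3, 4), successors, lambda u, v: 1.0,
              lambda v: abs(v[0] - 3) + abs(v[1] - 4))
</code></pre>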
<p>In addition to their empirical performance, there are a handful of features that
one might like to understand about incremental planning algorithms like A*. A first
is the notion of <b>completeness</b>; a planning algorithm is complete if it is
guaranteed to find a feasible path from the start to the goal if one exists. A
second is the notion of <b>optimality</b> (of the returned solution); an optimal
planning algorithm is one that is guaranteed to find the optimal path. With an
admissible heuristic, A* is both complete and optimal.</p>
<p>In addition to approximate cost-to-go functions, there are numerous other
heuristics that power state-of-the-art large-scale logical search algorithms.
Another important one is factorization. For a robotics example, consider a robot
manipulating many possible objects -- it's reasonable to plan the manipulation of
one object assuming it's independent of the other objects and then to revise that
plan only when the optimal plan ends up putting two objects on intersecting paths.
Many of these heuristics are summarized nicely in the Fast Downward
paper<elib>Helmert06</elib>; Fast Downward has been at the forefront of the ICAPS
planning competitions for many years.</p>
<p>Some of the most visible success stories in deep learning today still make use of
planning. For example: DeepMind's <a
href="https://en.wikipedia.org/wiki/AlphaGo">AlphaGo</a> and <a
href="https://en.wikipedia.org/wiki/AlphaZero">AlphaZero</a> combine the planning
algorithms developed over the years for discrete games, notably <a
href="https://en.wikipedia.org/wiki/Monte_Carlo_tree_search">Monte-Carlo Tree Search
(MCTS)</a>, with learned heuristics in the form of policies and value functions.
Finding ways to connect Large-Language Models (LLMs) with planning to support
longer-term reasoning is an extremely active area of research.</p>
</section>
<section><h1>Probabilistic RoadMaps (PRMs)</h1>
<p>How can we generalize powerful incremental algorithms like A* to continuous
domains for motion planning?</p>
<p>As is typical in this literature, let us start by considering kinematic planning
problems; in these problems we only consider the positions (e.g. configurations),
$\bq$, and do not consider velocities or accelerations. In the simplest version of
the problem, any path between configurations $\bq_1$ and $\bq_2$ that is collision
free is considered feasible.</p>
<p>If we choose a fixed discretization of the configuration space, like the ones
that I used when introducing dynamic programming, then we will have sacrificed
completeness (and therefore also optimality). For instance, we could have a narrow
corridor in configuration space between obstacles -- a path could exist but it could
be easily missed by a fixed discretization. So naturally one must consider
discretization as an iterative procedure, and start by analyzing the completeness of
the planner in the limit of infinite iterations. We use the term <i>resolution
complete</i> to describe an algorithm that is complete as the resolution becomes
arbitrarily fine. One can invoke minor assumptions, like assuming that feasible
paths are not infinitely narrow, to argue that an algorithm which increases the
resolution until finding a solution (if it exists) will terminate in a finite number
of iterations.</p>
<todo>figure on a fixed discretization failing in a narrow passage</todo>
<p>More popular is a variant which is even simpler to implement, and surprisingly
effective in practice. Rather than a fixed discretization, let us sample points from
a uniform random distribution over the configuration space, and reject samples that
are in collision. We can add edges in the graph to any neighbors for which the
straight-line path to the neighbor is collision-free. This allows us to build a
"probabilistic roadmap" over the configuration space offline <elib>Amato96</elib>.
Online, when presented with a new start and a new goal, we simply connect those
points to the roadmap and then plan with A* on the graph. This algorithm is complete
in the limit of infinite samples; we use the term <i>probabilistically
complete.</i></p>
<algorithm><h1>Probabilistic RoadMap (PRM)</h1>
<p>The PRM algorithm separates the computation into two phases: an offline phase
which builds the roadmap, and an online phase which finds a path using that
roadmap from a new start to a new goal.</p>
<hr/>
def BuildRoadmap():   <i>(offline)</i>
<div style="padding-left: 20px;">
$V$ = {}, $E$ = {} <br/>
for k = 1 to K:<br/>
<div style="padding-left: 20px;">
do { $\bq$ = UniformRandom() } until $\bq$ is collision-free<br/>
$V$.insert($\bq$)<br/>
</div>
for all $\bq$ in $V$:<br/>
<div style="padding-left: 20px;">
for all $\bq_n$ in NearestNeighbors($\bq$, V):<br/>
 if {$\bq, \bq_n$} is collision free:<br/>
   $E$.insert({$\bq,\bq_n$})
</div>
</div>
<hr/>
def FindPath($V$, $E$, $\bq_s$, $\bq_t$):   <i>(online "query phase")</i>
<div style="padding-left: 20px;">
$V$.insert($\bq_s$)<br/>
$E$.insert(shortest collision-free path from $\bq_s$ to $V$)<br/>
$V$.insert($\bq_t$)<br/>
$E$.insert(shortest collision-free path from $V$ to $\bq_t$)<br/>
return DiscreteGraphSearch($V$, $E$, $\bq_s$, $\bq_t$)
</div>
</algorithm>
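<p>A minimal Python sketch of the offline phase, assuming problem-specific
helpers <code>sample_configuration()</code>, <code>collision_free(q)</code>, and
<code>segment_free(q1, q2)</code> (these names are placeholders; the edge check
is typically implemented by sampling densely along the segment, as discussed
below):</p>
<pre><code class="python">
import numpy as np

def build_roadmap(sample_configuration, collision_free, segment_free,
                  K=500, k_nearest=10):
    """PRM offline phase sketch. Returns a list of vertices V and a set of
    edges E, stored as pairs of vertex indices."""
    V = []
    while len(V) < K:                 # rejection sampling of configurations
        q = sample_configuration()
        if collision_free(q):
            V.append(q)
    E = set()
    for i, q in enumerate(V):
        # Brute-force k-nearest neighbors; efficient implementations use
        # a KD-tree or similar spatial data structure.
        distances = [np.linalg.norm(q - q_n) for q_n in V]
        for j in np.argsort(distances)[1:k_nearest + 1]:
            if segment_free(q, V[j]):
                E.add((min(i, j), max(i, j)))   # undirected edge
    return V, E
</code></pre>
<p>The online query phase then connects $\bq_s$ and $\bq_t$ to the roadmap and
calls the discrete graph search (e.g. A*) from the previous section.</p>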
<todo>PRM examples!</todo>
<p>PRMs are a simple idea but surprisingly effective in practice, at least up to,
let's say, 10 dimensions. The most expensive steps of the algorithm are the nearest
neighbor queries and the collision detection queries that are called not only on the
sampled vertices, but also for the edges. Candidate edges are typically checked for
collisions by sampling densely along the edge and performing many point collision
detection queries, but algorithms with stronger guarantees do exist (cf.
<elib>Amice23</elib>). Efficient PRM implementations optimize these queries and
parallelize them on the CPU or GPU. At TRI we have optimized implementations which
we intend to open-source in Drake, but admittedly <a
href="https://github.com/RobotLocomotion/drake/issues/14431">the issue</a> has been
open for a long time! </p>
<example><h1>How well would a PRM swing up a pendulum?</h1>
</example>
</section>
<section><h1>Rapidly-exploring Random Trees (RRTs)</h1>
<p>Thinking about how to extend the PRM to accommodate dynamic constraints leads us
to a slightly different, and also wildly popular, algorithm known as the RRT.</p>
<example><h1>Planning with a Random Tree</h1>
<p>Let us denote the data structure which contains the tree as ${\cal
T}$. The algorithm is very simple:
<ul>
<li>Initialize the tree with the start state: ${\cal T} \leftarrow \bx_0$.</li>
<li>On each iteration:
<ul>
<li>Select a random node, $\bx_{rand}$, from the tree, ${\cal T}$</li>
<li>Select a random action, $\bu_{rand}$, from a distribution over feasible actions.</li>
<li>Compute the dynamics: $\bx_{new} = f(\bx_{rand},\bu_{rand})$</li>
<li>If $\bx_{new} \in {\cal G}$ (the goal region), then terminate. Solution found!</li>
<li>Otherwise add the new node to the tree, ${\cal T} \leftarrow \bx_{new}$.</li>
</ul>
</li>
</ul>
It can be shown that this algorithm is, in fact, probabilistically
complete. However, without strong heuristics to guide the selection of
the nodes scheduled for expansion, it can be extremely inefficient. For
a simple example, consider the system $\bx[n+1] = \bx[n] + \bu[n]$ with $\bx \in
\Re^2$ and $\bu_i \in [-1,1]$. We'll start at the origin and take the goal
region to be $15 \le x_i \le 20$ for all $i$.</p>
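<p>Here is a minimal Python/NumPy sketch of this straw-man algorithm:</p>
<pre><code class="python">
import numpy as np

rng = np.random.default_rng()
nodes = [np.zeros(2)]    # tree rooted at the origin
parent = [None]          # parent[i] is the index of node i's parent

for _ in range(1000):
    i = int(rng.integers(len(nodes)))         # select a random node
    u_rand = rng.uniform(-1.0, 1.0, size=2)   # select a random action
    x_new = nodes[i] + u_rand                 # dynamics: x[n+1] = x[n] + u[n]
    if np.all((x_new >= 15) & (x_new <= 20)):
        print("Success!")                     # x_new is in the goal region
        break
    nodes.append(x_new)
    parent.append(i)
</code></pre>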
<!--
Try it yourself:
<pre><code class="matlab" testfile="testRandomTree">
T = struct('parent',zeros(1,1000),'node',zeros(2,1000)); % pre-allocate memory for the "tree"
for i=2:size(T.parent,2)
T.parent(i) = randi(i-1);
x_rand = T.node(:,T.parent(i));
u_rand = 2*rand(2,1)-1;
x_new = x_rand+u_rand;
if (15<=x_new(1) && x_new(1)<=20 && 15<=x_new(2) && x_new(2)<=20)
disp('Success!'); break;
end
T.node(:,i) = x_new;
end
clf;
line([T.node(1,T.parent(2:end));T.node(1,2:end)],[T.node(2,T.parent(2:end));T.node(2,2:end)],'Color','k');
patch([15,15,20,20],[15,20,20,15],'g')
axis([-10,25,-10,25]);
</code></pre>
-->
<p>Although this "straw-man" algorithm is probabilistically complete, it
is certainly not efficient. After expanding 1000 nodes, the tree is
basically a mess of node points all right on top of each other:</p>
<figure>
<img width="70%" src="figures/random_tree.svg"/>
</figure>
<p>We're nowhere close to the goal yet, and this is a particularly easy
problem instance.
</p>
</example>
<p>The idea of generating a tree of feasible points has clear advantages,
but it seems that we have lost the ability to mark a region of space as
having been sufficiently explored. To make randomized algorithms
effective, we will, at the very least, need some heuristic that
encourages the nodes to spread out and explore the space.
</p>
<figure>
<img width="70%" src="figures/rrt.svg"/>
</figure>
<figure>
<img width="90%" src="figures/rrt_basic.svg"/>
<figcaption>(<a href="https://manipulation.csail.mit.edu/data/rrt_basic.html">Click here to watch the animation</a>)</figcaption>
</figure>
<figure>
<img width="90%" src="figures/rrt_voronoi.svg"/>
<figcaption>(<a href="figures/rrt_voronoi.swf">Click here to watch the animation</a>)</figcaption>
</figure>
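<p>The figures above illustrate the basic RRT heuristic: sample a point
$\bx_{rand}$ uniformly over the space, find the <i>nearest</i> node already in
the tree, and extend that node a small step toward the sample. Since the
probability of a node being selected is proportional to the volume of its
Voronoi region, the tree is "pulled" toward large unexplored regions. A minimal
Python sketch of this loop (kinematic, with collision checking omitted;
<code>sample</code> and <code>in_goal</code> are problem-specific stand-ins):</p>
<pre><code class="python">
import numpy as np

def rrt(x_start, sample, in_goal, max_iterations=10000, eps=0.5):
    """Basic RRT sketch. sample() draws a uniform random point over the
    space being explored; in_goal(x) tests membership in the goal region.
    Returns the node list and parent indices (follow parent[] back from
    the final node to recover the path), or None on failure."""
    nodes = [np.asarray(x_start, dtype=float)]
    parent = [None]
    for _ in range(max_iterations):
        x_rand = sample()
        # Nearest node in the tree (brute force for clarity).
        i = int(np.argmin([np.linalg.norm(x - x_rand) for x in nodes]))
        step = x_rand - nodes[i]
        distance = np.linalg.norm(step)
        x_new = nodes[i] + min(eps, distance) * step / max(distance, 1e-9)
        nodes.append(x_new)
        parent.append(i)
        if in_goal(x_new):
            return nodes, parent
    return None
</code></pre>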
<subsection><h1>RRTs for robots with dynamics</h1>
</subsection>
<subsection><h1>Variations and extensions</h1>
<p>RRT*, RRT-sharp, RRTx, ... </p>
<p>Kinodynamic-RRT*, LQR-RRT(*)</p>
<p>Complexity bounds and dispersion limits</p>
</subsection>
</section>
<!--
<subsection><h1>Discussion</h1>
<p>Not sure yet whether randomness is fundamental here, or whether is a temporary "crutch"
until we understand geometric and dynamic planning better.</p>
</subsection>
-->
<section><h1>Decomposition methods</h1>
<p>Cell decomposition...</p>
<p>Approximate decompositions for complex environments (e.g. IRIS)</p>
</section>
<section><h1>Exercises</h1>
<exercise><h1>RRT Planning</h1>
<p>In this <script>document.write(notebook_link('planning', 'rrt_planning', link_text = 'notebook'))</script>
we will write code for the Rapidly-Exploring Random
Tree (RRT). Building on this implementation we will also implement RRT*,
a variant of RRT that converges towards an optimal solution.</p>
<ol type="a">
<li>Implement RRT</li>
<li>Implement RRT*</li>
</ol>
</exercise>
</section>
</chapter>
<!-- EVERYTHING BELOW THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<div id="references"><section><h1>References</h1>
<ol>
<li id=LaValle06>
<span class="author">Steven M. LaValle</span>,
<span class="title">"Planning Algorithms"</span>, Cambridge University Press
, <span class="year">2006</span>.
</li><br>
<li id=Helmert06>
<span class="author">Malte Helmert</span>,
<span class="title">"The fast downward planning system"</span>,
<span class="publisher">Journal of Artificial Intelligence Research</span>, vol. 26, pp. 191--246, <span class="year">2006</span>.
</li><br>
<li id=Amato96>
<span class="author">N.M. Amato and Y. Wu</span>,
<span class="title">"A randomized roadmap method for path and manipulation planning"</span>,
<span class="publisher">Proceedings of the IEEE International Conference on Robotics and Automation</span> , vol. 1, pp. 113 - 120, <span class="year">1996</span>.
</li><br>
<li id=Amice23>
<span class="author">Amice and Alexandre and Werner and Peter and Tedrake and Russ</span>,
<span class="title">"Certifying Bimanual RRT Motion Plans in a Second"</span>,
<span class="publisher">arxiv</span>, <span class="year">2023</span>.
[ <a href="http://groups.csail.mit.edu/robotics-center/public_papers/Amice23.pdf">link</a> ]
</li><br>
</ol>
</section><p/>
</div>
<table style="width:100%;"><tr style="width:100%">
<td style="width:33%;text-align:left;"><a class="previous_chapter" href=policy_search.html>Previous Chapter</a></td>
<td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
<td style="width:33%;text-align:right;"><a class="next_chapter" href=feedback_motion_planning.html>Next Chapter</a></td>
</tr></table>
<div id="footer">
<hr>
<table style="width:100%;">
<tr><td><a href="https://accessibility.mit.edu/">Accessibility</a></td><td style="text-align:right">© Russ
Tedrake, 2024</td></tr>
</table>
</div>
</body>
</html>