diff --git a/book/figures/stochastic_force_field_pang22.png b/book/figures/stochastic_force_field_pang22.png
new file mode 100644
index 00000000..007a18bf
Binary files /dev/null and b/book/figures/stochastic_force_field_pang22.png differ
diff --git a/book/figures/vanderpol_particles.svg b/book/figures/vanderpol_particles.svg
new file mode 100644
index 00000000..5ecf8e39
--- /dev/null
+++ b/book/figures/vanderpol_particles.svg
diff --git a/book/index.html b/book/index.html
index c9d0068a..02b02f80 100644
--- a/book/index.html
+++ b/book/index.html
@@ -255,7 +255,8 @@

Table of Contents

  • Stationary Distributions
  • Extended Example: The Rimless Wheel on Rough Terrain
  • -
  • Noise models for real robots/systems.
  • +
  • Randomized smoothing of contact dynamics
  • +
  • Noise models for real robots/systems.
  • Nonlinear Planning and Control

  • Chapter 7: Dynamic Programming
diff --git a/book/stochastic.html b/book/stochastic.html
index ad055eb3..d5e01330 100644
--- a/book/stochastic.html
+++ b/book/stochastic.html
@@ -67,25 +67,23 @@

    Underactuated RoboticsMy goals for this chapter are to build intuition for the beautiful and rich behavior of nonlinear dynamical systems that are subjected to random - (noise/disturbance) inputs. So far we have focused primarily on systems - described by \[ \dot{\bx}(t) = f(\bx(t),\bu(t)) \quad \text{or} \quad \bx[n+1] - = f(\bx[n],\bu[n]). \] In this chapter, I would like to broaden the scope to - think about \[ \dot{\bx}(t) = f(\bx(t),\bu(t),\bw(t)) \quad \text{or} \quad - \bx[n+1] = f(\bx[n],\bu[n],\bw[n]), \] where this additional input $\bw$ is - the (vector) output of some random process. In other words, we can begin - thinking about stochastic systems by simply understanding the dynamics of our - existing ODEs subjected to an additional random input.

    - -

    This form is extremely general as written. $\bw(t)$ can represent - time-varying random disturbances (e.g. gusts of wind), or even constant model - errors/uncertainty. One thing that we are not adding, yet, is measurement - uncertainty. There is a great deal of work on observability and state - estimation that study the question of how you can infer the true state of the - system given noise sensor readings. For this chapter we are assuming perfect - measurements of the full state, and are focused instead on the way that + (noise/disturbance) inputs. So far we have focused primarily on systems described by + \[ \dot{\bx}(t) = f(\bx(t),\bu(t)) \quad \text{or} \quad \bx[n+1] = f(\bx[n],\bu[n]). + \] In this chapter, I would like to broaden the scope to think about \[ \dot{\bx}(t) = + f(\bx(t),\bu(t),\bw(t)) \quad \text{or} \quad \bx[n+1] = f(\bx[n],\bu[n],\bw[n]), \] + where this additional input $\bw$ is the (vector) output of some random process. In + other words, we can begin thinking about stochastic systems by simply understanding + the dynamics of our existing ODEs subjected to an additional random input.

    + +

    This form is extremely general as written. $\bw(t)$ can represent time-varying + random disturbances (e.g. gusts of wind), or even constant model errors/uncertainty. + One thing that we are not adding, yet, is measurement uncertainty (this will come + later, when we discuss state estimation and output feedback). For this chapter we are assuming + perfect measurements of the full state, and are focused instead on the way that "process noise" shapes the long-term dynamics of the system.

    -

    I will also stick primarily to discrete time dynamics for this chapter, +

    I will also stick primarily to discrete-time dynamics for this chapter, simply because it is easier to think about the output of a discrete-time random process, $\bw[n]$, than a $\bw(t)$. But you should know that all of the ideas work in continuous time, too. Also, most of our examples will take the @@ -210,15 +208,15 @@

    Underactuated Robotics -

    Here's what's completely fascinating -- even though the dynamics of any - one initial condition for this system are extremely complex, if we study - the dynamics of a distribution of states through the system, they are - surprisingly simple and well-behaved. This system is one of the rare - cases when we can write the master equation in closed - formLasota13: $$p_{n+1}(x) = \frac{1}{4\sqrt{1-x}} \left[ - p_n\left(\frac{1}{2}-\frac{1}{2}\sqrt{1-x}\right) + p_n\left(\frac{1}{2} + - \frac{1}{2}\sqrt{1-x}\right) \right].$$ Moreover, this master equation - has a steady-state solution: $$p_*(x) = \frac{1}{\pi\sqrt{x(1-x)}}.$$

    +

    Here's what's completely fascinating -- even though the dynamics of any one + initial condition for this system are extremely complex, if we study the dynamics + of a distribution of states through the system, they are surprisingly simple and + well-behaved. This system is one of the rare cases when we can write the master + equation in closed formLasota13: $$p_{n+1}(x) = \frac{1}{4\sqrt{1-x}} + \left[ p_n\left(\frac{1}{2}-\frac{1}{2}\sqrt{1-x}\right) + p_n\left(\frac{1}{2} + + \frac{1}{2}\sqrt{1-x}\right) \right].$$ Moreover, this master equation has a + steady-state solution: $$p_*(x) = \frac{1}{\pi\sqrt{x(1-x)}}, \qquad x \in [0, + 1].$$
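The fixed-point claim above can be checked numerically. The following is a minimal sketch, not the book's own code: it applies one step of the master equation to $p_*$ on an interior grid, and then compares a cloud of particles against the CDF of $p_*$, which is $F(x) = \frac{2}{\pi}\arcsin\sqrt{x}$. The map $x[n+1] = 4x[n](1-x[n])$ is an assumption inferred from the form of the master equation, and the function names, grid, sample count, and step count are all illustrative choices.

```python
import numpy as np


def p_star(x):
    """Candidate stationary density p_*(x) = 1 / (pi * sqrt(x (1 - x)))."""
    return 1.0 / (np.pi * np.sqrt(x * (1.0 - x)))


def master_step(p, x):
    """Apply one step of the closed-form master equation to the density p."""
    r = np.sqrt(1.0 - x)
    return (p(0.5 - 0.5 * r) + p(0.5 + 0.5 * r)) / (4.0 * r)


# 1) Fixed-point check on an interior grid (the density is singular at 0 and 1).
x_grid = np.linspace(0.05, 0.95, 19)
residual = np.max(np.abs(master_step(p_star, x_grid) - p_star(x_grid)))

# 2) Particle check. ASSUMPTION: the underlying map is x[n+1] = 4 x[n] (1 - x[n]).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100_000)  # start from a uniform density
for _ in range(50):
    x = 4.0 * x * (1.0 - x)

# Compare the empirical CDF at x = 0.25 against F(x) = (2 / pi) * arcsin(sqrt(x)).
empirical_cdf = np.mean(x <= 0.25)
analytic_cdf = (2.0 / np.pi) * np.arcsin(np.sqrt(0.25))
print(residual, empirical_cdf, analytic_cdf)
```

The residual should sit at the level of floating-point round-off, while the particle comparison agrees only up to sampling error.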

    Plotting the (closed-form) evolution of the master equation @@ -255,14 +253,13 @@

    Underactuated Robotics

    Stationary Distributions

    -

    In the example above, the histogram is our numerical approximation of - the probability density. The logistic map example had the remarkable - property that, although the individual trajectories of the system do - not converge, the probability distribution actually does - converge to what's known as a stationary distribution -- a fixed - point of the master equation. Instead of thinking about the dynamics of - the trajectories, we need to start thinking about the dynamics of the - distribution.

    +

    In the example above, the histogram is our numerical approximation of the + probability density. Each of those example systems had the remarkable property that, + although the individual trajectories of the system do not converge, the + probability distribution actually does converge to what's known as a + stationary distribution -- a fixed point of the master equation. Instead of + thinking about the dynamics of the trajectories, we need to start thinking about the + dynamics of the distribution.

    The most important example of this analysis is for systems with linear dynamics and additive Gaussian noise; for this case we have closed-form @@ -301,10 +298,11 @@

    Underactuated RoboticsKalman filter.

    -

    Taking it a step further, we can see that a stationary distribution for - this system is given by a mean-zero Gaussian with \[ \sigma_*^2 = - \frac{\sigma_w^2}{1-a^2}. \] Note that this distribution is well defined - when $-1 < a < 1$ (only when the system is stable).

    +

    Taking it a step further, we can see that a stationary distribution for this + system is given by a mean-zero Gaussian with \[ \sigma_*^2 = + \frac{\sigma_w^2}{1-a^2}. \] Note that this distribution is well defined when $-1 + < a < 1$. In this case, these are the same conditions we have for + deterministic stability of this system.
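As a quick sanity check on the stationary variance, here is a small sketch under stated assumptions: the scalar system is taken to be $x[n+1] = a x[n] + w[n]$ with i.i.d. $w[n] \sim \mathcal{N}(0, \sigma_w^2)$ independent of $x[n]$, so that the variance propagates as $\sigma^2[n+1] = a^2 \sigma^2[n] + \sigma_w^2$. The numerical values of `a`, `sigma_w`, and the initial variance are hypothetical.

```python
import numpy as np

# Hypothetical parameters for illustration; any -1 < a < 1 should work.
a, sigma_w = 0.8, 0.5

# Propagate the variance of the Gaussian density directly:
#   sigma^2[n+1] = a^2 sigma^2[n] + sigma_w^2.
var = 4.0  # arbitrary initial variance
for _ in range(200):
    var = a**2 * var + sigma_w**2

# Closed-form fixed point of the recursion above.
var_star = sigma_w**2 / (1.0 - a**2)

# Cross-check with particles pushed through the assumed stochastic dynamics.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=100_000)
for _ in range(200):
    x = a * x + rng.normal(0.0, sigma_w, size=x.shape)
sample_var = x.var()

print(var, var_star, sample_var)
```

Setting $|a| \geq 1$ in the same recursion makes the variance grow without bound, which is the numerical face of the condition $-1 < a < 1$.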

    @@ -317,11 +315,10 @@

    Underactuated Robotics -

    Given how rich the dynamics can be for deterministic nonlinear systems, - you can probably imagine that the possible long-term dynamics of the - probability are also extremely rich. If we simply flip the signs in the - cubic polynomial dynamics we examined above, we'll get our next - example:

    +

    Given how rich the dynamics can be for deterministic nonlinear systems, you can + probably imagine that the possible long-term dynamics of the probability density are + also extremely rich. If we simply flip the signs in the cubic polynomial dynamics + we examined above, we'll get our next example:

    The Cubic Example + Noise

    @@ -409,17 +406,16 @@

    Underactuated Robotics

    Extended Example: The Rimless Wheel on Rough Terrain

    -

    My favorite example of a meaningful source of randomness on a model +

    One of my favorite examples of a meaningful source of randomness on a model underactuated system is the rimless - wheel rolling down stochastically "rough" terrainByl08f. - Generating interesting/relevant probabilistic models of terrain in general - can be quite complex, but the rimless wheel makes it easy -- since the robot - only contacts that ground at the point foot, we can model almost arbitrary - rough terrain by simply taking the ramp angle, $\gamma$, to be a random - variable. If we restrict our analysis to rolling only in one direction (e.g. - only downhill), then we can even consider this ramp angle to be i.i.d.; - after each footstep we will independently draw a new ramp angle $\gamma[n]$ - for the next step.

    + wheel rolling down stochastically "rough" terrainByl08f. Generating + interesting/relevant probabilistic models of terrain in general can be quite + complex, but the rimless wheel makes it easy -- since the robot only contacts the + ground at the point foot, we can model almost arbitrary rough terrain by simply + taking the ramp angle, $\gamma$, to be a random variable. If we restrict our + analysis to rolling only in one direction (e.g. only downhill), then we can even + consider this ramp angle to be i.i.d.; after each footstep we will independently + draw a new ramp angle $\gamma[n]$ for the next step.

    @@ -435,6 +431,75 @@

    Underactuated Robotics +

    Randomized smoothing of contact dynamics

    + +

    It is interesting to think more generally about how stochastic dynamics interact + with the contact dynamics that we have begun to study in these notes. For the + stochastic rimless wheel we studied the dynamics on the apex-to-apex map, but now + we'd like to consider a more typical (discrete-time, with a small, fixed time step) + model of contact dynamics.

    + +

    First we have to think about the simplest reasonable model for the process + noise/dynamics. In the multibody appendix, we develop the time-stepping dynamic + models of contact as the solution to an optimization problem, which strictly + enforces contact constraints (e.g. non-penetration) at the end of every time step. + Let's use that idea again here, following the ideas developed in + Suh22a+Pang22.

    + +

    A (stochastic) block near a wall

    + +

    Consider the dynamics of an unactuated 1D block with a wall occupying $q + \leq 0$, such that the physical dynamics are the identity map if the block is in a + non-penetrating configuration, $q[n+1] = f(q[n])=q[n]$ if $q[n]\geq 0$. The + dynamics within the penetrating regime are not well-defined physically; yet, + applying the quasi-dynamic equations of motion from + Pang20b gives us a model that defines a minimal projection $\delta q$ + which gets applied to the system to project it back out of collision via: + \begin{align} \underset{\delta q}{\minimize} \; &\frac{1}{2} m (\delta q)^2, \; + \text{subject to} \\ & q + \delta q \geq 0, \end{align} which leads to the + following deterministic dynamics: \begin{equation} + \label{eq:1d_projection_solution} f(q) = q + \delta q = \begin{cases} q & \text{ + if } q \geq 0, \text{ (no penetration) }\\ 0 & \text{ otherwise. } \text{ + (penetration) } \end{cases} \end{equation} This model extends naturally to more + complicated contact systems.

    + +

    This now gives us a natural model for adding noise while respecting the + non-penetration conditions. On each time step, we will apply a Gaussian + perturbation (e.g. Brownian motion), $w[n]$, and apply the dynamics $q[n+1] = + f(q[n] + w[n]).$ In this model, if we start the system from a known initial + condition, $p_0(q) = \delta(q-q_0),$ then after one step we obtain the + distribution pictured in the bottom left:

    + +
    + + +
    (a) The block near a wall. (b) The distribution $q+w$ (green) and + $f(q+w)$ (pink). (c) The expected value of the one-step stochastic dynamics + looks like a "smoothed" version of the deterministic dynamics. (d) This has + important implications for gradient-based optimization Suh22b.
    +
    + +

    In some stochastic optimal control frameworks (and almost all reinforcement + learning algorithms), the optimization objective is specified in terms of the + expected value of the cost/reward. So it is very interesting to think about the + effect that the stochasticity has on the expected value of simulation roll-outs. + Here we see that, even after a single step, the stochasticity has the effect of + "smoothing" the hard contact dynamics, and giving a form of "contact forces at a + distance". Suh22b studied the effect that this can have on the + optimization landscape.
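The smoothing effect described above can be sketched for the 1D block. This is a minimal illustration, not the implementation from the cited papers. It assumes $w \sim \mathcal{N}(0, \sigma^2)$ and uses $f(q) = \max(q, 0)$ from the projection model, for which the expected one-step dynamics have the standard rectified-Gaussian closed form $E[f(q+w)] = q\,\Phi(q/\sigma) + \sigma\,\phi(q/\sigma)$, with derivative $\Phi(q/\sigma)$. The names `smoothed_f` and `smoothed_grad`, the value of `sigma`, and the sample count are hypothetical choices.

```python
import math

import numpy as np


def f(q):
    """Deterministic projection dynamics: identity if q >= 0, else project to 0."""
    return np.maximum(q, 0.0)


def smoothed_f(q, sigma):
    """Closed-form E[f(q + w)] for w ~ N(0, sigma^2) (mean of a rectified Gaussian)."""
    z = q / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return q * Phi + sigma * phi


def smoothed_grad(q, sigma):
    """d/dq E[f(q + w)] = Phi(q / sigma), which is nonzero even when q < 0."""
    return 0.5 * (1.0 + math.erf(q / (sigma * math.sqrt(2.0))))


sigma = 0.1  # hypothetical noise scale
rng = np.random.default_rng(0)
w = rng.normal(0.0, sigma, size=200_000)

# Compare a Monte Carlo estimate of the expected dynamics with the closed form.
results = {}
for q in (-0.2, 0.0, 0.2):
    results[q] = (float(f(q + w).mean()), smoothed_f(q, sigma))
    print(q, results[q], smoothed_grad(q, sigma))
```

Under these assumptions the deterministic map has zero gradient for every $q < 0$, while the smoothed map has gradient $\Phi(q/\sigma) > 0$ there, which is one way to read the "contact forces at a distance" remark.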

    + + What is the stationary distribution of this system? +
    + +

    Interestingly, reinforcement learning (RL) algorithms often explicitly + inject random perturbations (typically in the policy outputs) as a mechanism for + exploring the policy parameters. When coupled with a deterministic contact + simulation engine, the resulting dynamics look like the simple stochastic example + illustrated above. This is one explanation for why RL has performed surprisingly + well in problems involving contact dynamics.

    + +
    +

    Noise models for real robots/systems.

    Sensor models. Beam model from probabilistic robotics. RGB-D @@ -468,6 +533,34 @@

    Underactuated Robotics"Chaos, fractals, and noise: stochastic aspects of dynamics", Springer Science \& Business Media , vol. 97, 2013. +
    +
  • +H.J. Terry Suh and Tao Pang and Russ Tedrake, +"Bundled Gradients through Contact via Randomized Smoothing", +IEEE Robotics and Automation Letters , vol. 7 (2), pp. 4000-4007, April, 2022. +[ link ] + +

  • +
  • +Tao Pang and H.J. Terry Suh and Lujie Yang and Russ Tedrake, +"Global Planning for Contact-Rich Manipulation via Local Smoothing of Quasi-dynamic Contact Models", +IEEE Transactions on Robotics, vol. 39, no. 6, pp. 4691--4711, December, 2023. +[ link ] + +

  • +
  • +H. J. Terry Suh and Max Simchowitz and Kaiqing Zhang and Russ Tedrake, +"Do Differentiable Simulators Give Better Policy Gradients?", +Proceedings of the 39th International Conference on Machine Learning , vol. 162, pp. 20668--20696, July, 2022. +[ link ] + +

  • +
  • +Tao Pang and Russ Tedrake, +"A Convex Quasistatic Time-stepping Scheme for Rigid Multibody Systems with Contact and Friction", +IEEE International Conference on Robotics and Automation (ICRA), May, 2021. +[ link ] +