In this repository, we try to solve musculoskeletal tasks with `Double DQN reinforcement learning`, using a `transformer` model as the base model architecture.

victor369basu/MyosuiteDDQN

Musculoskeletal Simulation with Double DQN reinforcement learning

myoSarcHandObjHoldRandom-v0-model.mp4

Solved task - myoSarcHandObjHoldRandom-v0

Introduction

The dexterous human hand requires the coordination of multiple highly redundant muscles, which have complementary and antagonistic effects on various joints. The musculoskeletal hand model consists of 29 bones, 23 joints, and 39 muscle-tendon units. Our objective is to simulate the coordination of these bones and muscles for different tasks via reinforcement learning.

In this repository, we try to solve musculoskeletal tasks with Double DQN reinforcement learning, using a transformer as the base model architecture.

The goal of the challenge is to push our understanding of the physiological motor control responsible for nimble and agile movements of the human body. We focus on developing controllers for contact-rich dexterous manipulation behaviors.

Tasks

[Figure: all tasks]

Tools and Technologies

  1. MyoSuite is a collection of musculoskeletal environments and tasks simulated with the MuJoCo physics engine and wrapped in the OpenAI Gym API to enable the application of machine learning to biomechanical control problems.

  2. PyTorch is an open-source machine learning framework that accelerates the path from research prototyping to production deployment.

Double DQN

The standard DQN method has been shown to overestimate the true Q-value because its target takes a maximum over estimated Q-values. When some values are overestimated and others underestimated, the overestimated values are more likely to be selected.

Standard DQN target, where γ is the discount factor:

$$y_t = r_t + \gamma\, Q\big(s_{t+1}, \arg\max_a Q(s_{t+1}, a)\big)$$

By using two uncorrelated Q-networks we can reduce this overestimation. To save computation time, we perform gradient updates on only one of the Q-networks (parameters θ) and periodically copy its parameters into the target Q-network.

The Double DQN target then becomes (the online network selects the next action, the target network evaluates it):

$$y_t = r_t + \gamma\, Q_{\text{target}}\big(s_{t+1}, \arg\max_a Q_\theta(s_{t+1}, a)\big)$$

And the loss function is given by:

$$\mathcal{L}(\theta) = \big(y_t - Q_\theta(s_t, a_t)\big)^2$$
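A minimal sketch of the Double DQN target and loss in its standard formulation, for a single transition with a small discrete action set. Plain Python lists stand in for the two networks' Q-value outputs; the function names, the example Q-values, and the `done` handling for terminal states are illustrative assumptions, not code from this repository. γ is presumably what `--gamma` sets in the commands further down.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value; ties go to the lowest index."""
    return max(range(len(values)), key=lambda i: values[i])


def double_dqn_target(
    reward: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float,
    done: bool = False,
) -> float:
    """y_t = r_t + gamma * Q_target(s_{t+1}, argmax_a Q_theta(s_{t+1}, a))."""
    if done:
        # Terminal transition: nothing to bootstrap from (an assumption, not in the equation above).
        return reward
    next_action = argmax(q_online_next)  # the online network selects the action...
    return reward + gamma * q_target_next[next_action]  # ...the target network evaluates it


def squared_td_loss(target: float, q_online_current: Sequence[float], action: int) -> float:
    """Per-transition loss (y_t - Q_theta(s_t, a_t))^2."""
    return (target - q_online_current[action]) ** 2


# Hypothetical Q-values for a 3-action example.
q_online_next = [1.0, 3.0, 2.0]
q_target_next = [4.0, 2.5, 5.0]
y = double_dqn_target(reward=1.0, q_online_next=q_online_next,
                      q_target_next=q_target_next, gamma=0.99)
print(y)  # ≈ 3.475: action 1 chosen by the online net, valued at 2.5 by the target net
```

For comparison, a standard target in which one network both selects and evaluates the next action would, using the target values [4.0, 2.5, 5.0], give 1.0 + 0.99 × 5.0 ≈ 5.95; decoupling selection from evaluation is what damps the upward bias. In PyTorch, the periodic hard update of the target network is typically written as `target_net.load_state_dict(online_net.state_dict())`, and the loss is averaged over a minibatch (presumably the `--batch` size).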

Non-stationary task variations

Sarcopenia

Sarcopenia is a muscle disorder that occurs commonly in the elderly population (Cruz-Jentoft and Sayer (2019)) and is characterized by a reduction in muscle mass or volume. The peak in grip strength can be reduced by up to 50% from age 20 to 40 (Dodds et al. (2016)). The simulation models sarcopenia for each muscle as a 50% reduction of its maximal isometric force.
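The sarcopenia rule amounts to scaling every muscle's maximal isometric force by the same factor, 0.5. The sketch below shows only that arithmetic on a plain dictionary; the function name and the muscle names and force values are hypothetical placeholders, and MyoSuite applies the change inside its MuJoCo model rather than through anything like this helper.

```python
from typing import Dict


def apply_sarcopenia(max_isometric_force: Dict[str, float], factor: float = 0.5) -> Dict[str, float]:
    """Return a copy in which every muscle's maximal isometric force is scaled by `factor`.

    factor=0.5 corresponds to the 50% reduction described above.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must lie in [0, 1]")
    return {muscle: force * factor for muscle, force in max_isometric_force.items()}


# Placeholder muscle names and forces, not MyoSuite's actual values.
healthy = {"muscle_a": 100.0, "muscle_b": 40.0}
print(apply_sarcopenia(healthy))  # {'muscle_a': 50.0, 'muscle_b': 20.0}
```

Returning a new dictionary keeps the healthy values available, so the same policy can be evaluated on both the unimpaired and the sarcopenic variant.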

Fatigue

Muscle fatigue is a short-term (seconds to minutes) effect that occurs after sustained or repetitive voluntary movement, and it has been linked to traumas, e.g. cumulative trauma disorder (Chaffin et al. (2006)). This model is based on the idea that different types of muscle fibre contribute differently to force and differ in their resistance to fatigue (Vøllestad (1997)). The current implementation is simplified: it uses the same fatigue factor for all muscles and allows a muscle to become completely fatigued.


Tendon transfer

Unlike muscle fatigue or sarcopenia, which affect all muscles, tendon transfer surgery can target a single muscle-tendon unit. Tendon transfer surgery redirects the application point of muscle forces from one joint DoF to another. It can be used to regain functional control of a joint or limb motion after injury. One current procedure in the hand is the transfer of the Extensor Indicis Proprius (EIP) tendon to replace the Extensor Pollicis Longus (EPL) (Gelb (1995)). Rupture of the EPL can happen after a broken wrist and cause loss of control of thumb extension. MyoSuite includes a physical tendon transfer in which the application point of the EIP tendon is moved from the index finger to the thumb and the EPL is removed.


Suite

MyoSuite consists of three models: myoFinger, myoElbow and myoHand. Using these models, MyoSuite provides a rich collection of tasks ranging from simple reaching movements to contact-rich movements such as pen twirling and Baoding balls.

It also includes three musculoskeletal condition variations: sarcopenia, fatigue, and tendon transfer.

Results

In each video below, the left side shows the simulation after training the transformer model with Double DQN reinforcement learning, and the right side shows the simulation before training. The training hyper-parameters used for each task are given in the command listed under its name.

myoHandReachFixed-v0

!python main.py --env_name myoHandReachFixed-v0 --gamma 0.99 --learning_rate 0.0003 --eps 0.09 --batch 64 --epochs 200 --loss_fn mse --train True

myoHandReachFixed-v0.mp4

myoHandReachRandom-v0

!python main.py --env_name myoHandReachRandom-v0 --gamma 0.99 --learning_rate 0.00003 --eps 0.09 --batch 32 --epochs 500 --loss_fn cel --train True

myoHandReachRandom-v0.mp4

myoSarcHandPose1Fixed-v0

!python main.py --env_name myoSarcHandPose1Fixed-v0 --gamma 0.99 --learning_rate 0.00003 --eps 0.09 --batch 64 --epochs 400 --loss_fn cel --train True

myoSarcHandPose1Fixed-v0.mp4

myoHandObjHoldFixed-v0

!python main.py --env_name myoHandObjHoldFixed-v0 --gamma 0.99 --learning_rate 0.00003 --eps 0.09 --batch 64 --epochs 400 --loss_fn cel --train True

myoHandObjHoldFixed-v0.mp4

myoSarcHandObjHoldRandom-v0

!python main.py --env_name myoSarcHandObjHoldRandom-v0 --gamma 0.99 --learning_rate 0.0000003 --eps 0.09 --batch 64 --epochs 600 --loss_fn cel --train True

myoSarcHandObjHoldRandom-v0.mp4

myoHandKeyTurnFixed-v0

!python main.py --env_name myoHandKeyTurnFixed-v0 --gamma 0.99 --learning_rate 0.00003 --eps 0.09 --batch 64 --epochs 600 --loss_fn cel --train True

myoHandKeyTurnFixed-v0.mp4

myoSarcHandPenTwirlFixed-v0

!python main.py --env_name myoSarcHandPenTwirlFixed-v0 --gamma 0.5 --learning_rate 0.000003 --eps 0.09 --batch 64 --epochs 600 --loss_fn mse --train True

myoSarcHandPenTwirlFixed-v0.mp4

myoChallengeDieReorientP1-v0

!python main.py --env_name myoChallengeDieReorientP1-v0 --gamma 0.5 --learning_rate 0.000003 --eps 0.09 --batch 64 --epochs 600 --loss_fn cel --train True

myoChallengeDieReorientP1-v0.mp4

myoChallengeBaodingP1-v1

!python main.py --env_name myoChallengeBaodingP1-v1 --gamma 0.6 --learning_rate 0.0000003 --eps 0.09 --batch 64 --epochs 700 --loss_fn cel --train True

myoChallengeBaodingP1-v1.mp4

myoFatiElbowPose1D6MExoRandom-v0

!python main.py --env_name myoFatiElbowPose1D6MExoRandom-v0 --gamma 0.99 --learning_rate 0.0003 --eps 0.09 --batch 64 --epochs 400 --loss_fn mse --train True

myoFatiElbowPose1D6MExoRandom-v0.mp4
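Every training command above passes `--eps 0.09`. Assuming this is the exploration rate ε of ε-greedy action selection over a discrete action set (the README does not state how main.py uses it, or how MyoSuite's continuous muscle activations are mapped to discrete actions), the selection rule can be sketched as below. The function name is hypothetical.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(q_values: Sequence[float], eps: float,
                   rng: Optional[random.Random] = None) -> int:
    """Return a uniformly random action with probability eps, else the greedy action."""
    if rng is None:
        rng = random.Random()
    if rng.random() < eps:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda i: q_values[i])  # exploit


# With eps = 0.09, roughly 9% of steps take a random action.
rng = random.Random(0)
action = epsilon_greedy([0.1, 0.7, 0.2], eps=0.09, rng=rng)
```

Passing an explicit `random.Random` instance keeps rollouts reproducible when a seed is fixed.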

Inference

!python main.py --env_name myoHandReachFixed-v0 --train False --model_save_path ./model/

The command above loads the trained model from the directory passed as --model_save_path, runs the simulation, and saves it to a video.
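The flags used in the training and inference commands could be parsed as in the hypothetical sketch below; the defaults and the `choices` for `--loss_fn` are illustrative assumptions, not taken from the repository's main.py. One detail is worth checking in any such script: with argparse, `type=bool` converts every non-empty string to `True`, so `--train False` would still enable training. An explicit string-to-boolean converter avoids this.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'True'/'False'-style strings; argparse's type=bool would treat 'False' as True."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical defaults mirroring the first training command above.
    parser = argparse.ArgumentParser(description="Double DQN on MyoSuite (sketch)")
    parser.add_argument("--env_name", type=str, default="myoHandReachFixed-v0")
    parser.add_argument("--gamma", type=float, default=0.99)
    parser.add_argument("--learning_rate", type=float, default=3e-4)
    parser.add_argument("--eps", type=float, default=0.09)
    parser.add_argument("--batch", type=int, default=64)
    parser.add_argument("--epochs", type=int, default=200)
    parser.add_argument("--loss_fn", type=str, choices=["mse", "cel"], default="mse")
    parser.add_argument("--train", type=str2bool, default=True)
    parser.add_argument("--model_save_path", type=str, default="./model/")
    return parser


args = build_parser().parse_args(
    ["--env_name", "myoHandReachFixed-v0", "--train", "False", "--model_save_path", "./model/"]
)
```

Here `args.train` is `False`, so the script would skip training and go straight to inference.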

Conclusion

  • The transformer model performs well on most tasks, with the exception of myoChallengeDieReorient and myoChallengeBaoding.
  • The model was trained for enough epochs to learn and perform each task, but it could likely have performed better with more training episodes.
