Embodied AI x LLM Papers

This is a paper list on integrating large language models (LLMs) with embodied AI. LLMs have shown sparks of artificial general intelligence, but they are not grounded in the physical world and lack human-like embodied intelligence. Integrating LLMs with embodied AI aims to address this challenge.

Keywords Convention

Each paper is tagged with the abbreviation of the work and the domains it mainly explores.

Papers

Embodied LLM

  • A Generalist Agent. TMLR 2022.

    Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, Nando de Freitas [pdf], 2022.5

  • PaLM-E: An Embodied Multimodal Language Model. ICML 2023.

    Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence [pdf], [page], 2023.3

  • EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought. Preprint.

    Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo [pdf], 2023.5

  • Language Models Meet World Models: Embodied Experiences Enhance Language Models. Preprint.

    Jiannan Xiang, Tianhua Tao, Yi Gu, Tianmin Shu, Zirui Wang, Zichao Yang, Zhiting Hu [pdf], [page], 2023.5

  • AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation. Preprint.

    Chuhao Jin, Wenhui Tan, Jiange Yang, Bei Liu, Ruihua Song, Limin Wang, Jianlong Fu [pdf], [video], 2023.5

  • RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control. Preprint.

    Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Lisa Lee, Tsang-Wei Edward Lee, Sergey Levine, Yao Lu, Henryk Michalewski, Igor Mordatch, Karl Pertsch, Kanishka Rao, Krista Reymann, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Pierre Sermanet, Jaspiar Singh, Anikait Singh, Radu Soricut, Huong Tran, Vincent Vanhoucke, Quan Vuong, Ayzaan Wahid, Stefan Welker, Paul Wohlhart, Jialin Wu, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, Brianna Zitkovich [pdf], [page], 2023.7

  • Embodied Task Planning with Large Language Models. Preprint.

    Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, Haibin Yan [pdf], [page], 2023.7

  • Large Language Models as General Pattern Machines. CoRL 2023.

    Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng [pdf], [page], 2023.7

  • Large Language Models as Generalizable Policies for Embodied Tasks. Preprint.

    Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, Alexander Toshev [pdf], [page], 2023.10

  • Octopus: Embodied Vision-Language Programmer from Environmental Feedback. Preprint.

    Jingkang Yang, Yuhao Dong, Shuai Liu, Bo Li, Ziyue Wang, Chencheng Jiang, Haoran Tan, Jiamu Kang, Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu [pdf], [page], 2023.10

  • Vision-Language Foundation Models as Effective Robot Imitators. Preprint.

    Xinghang Li, Minghuan Liu, Hanbo Zhang, Cunjun Yu, Jie Xu, Hongtao Wu, Chilam Cheang, Ya Jing, Weinan Zhang, Huaping Liu, Hang Li, Tao Kong [pdf], [page], 2023.11

  • An Embodied Generalist Agent in 3D World. Preprint.

    Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang [pdf], [page], 2023.11


LLM for Planning, Tool Use and Beyond (w/o training)

  • Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. ICML 2022.

    Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch [pdf], [page], 2022.1

  • Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. CoRL 2022.

    Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan, Andy Zeng [pdf], [page], 2022.4

  • Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language. ICLR 2023.

    Andy Zeng, Maria Attarian, Brian Ichter, Krzysztof Choromanski, Adrian Wong, Stefan Welker, Federico Tombari, Aveek Purohit, Michael Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, Pete Florence [pdf], 2022.4

  • Inner Monologue: Embodied Reasoning through Planning with Language Models. CoRL 2022.

    Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter [pdf], [page], 2022.7

  • Code as Policies: Language Model Programs for Embodied Control. ICRA 2023.

    Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng [pdf], [page], 2022.9

  • ProgPrompt: Generating Situated Robot Task Plans using Large Language Models. ICRA 2023.

    Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, Animesh Garg [pdf], [page], 2022.9

  • LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models. Preprint.

    Chan Hee Song, Jiaman Wu, Clayton Washington, Brian M. Sadler, Wei-Lun Chao, Yu Su [pdf], [page], 2022.12

  • Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents. NeurIPS 2023.

    Zihao Wang, Shaofei Cai, Anji Liu, Xiaojian Ma, Yitao Liang [pdf], [page], 2023.2

  • Collaborating with language models for embodied reasoning. Preprint.

    Ishita Dasgupta, Christine Kaeser-Chen, Kenneth Marino, Arun Ahuja, Sheila Babayan, Felix Hill, Rob Fergus [pdf], 2023.2

  • Plan, Eliminate, and Track - Language Models are Good Teachers for Embodied Agents. Preprint.

    Yue Wu, So Yeon Min, Yonatan Bisk, Ruslan Salakhutdinov, Amos Azaria, Yuanzhi Li, Tom Mitchell, Shrimai Prabhumoye [pdf], 2023.5

  • Voyager: An Open-Ended Embodied Agent with Large Language Models. Preprint.

    Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, Anima Anandkumar [pdf], [page], 2023.5

  • Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory. Preprint.

    Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, Jifeng Dai [pdf], [page], 2023.5

  • VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models. CoRL 2023.

    Wenlong Huang, Chen Wang, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Li Fei-Fei [pdf], [page], 2023.7

  • Building Cooperative Embodied Agents Modularly with Large Language Models. Preprint.

    Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan [pdf], [page], 2023.7

  • Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach. Preprint.

    Bin Hu, Chenyang Zhao, Pu Zhang, Zihao Zhou, Yuanhang Yang, Zenglin Xu, Bin Liu [pdf], [page], 2023.8

  • JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models. Preprint.

    Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, Yitao Liang [pdf], [page], 2023.11


LLM for Guidance, Supervision and Beyond

  • Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling. ICML 2023.

    Kolby Nottingham, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi, Hannaneh Hajishirzi, Sameer Singh, Roy Fox [pdf], [page], 2023.1
