On the Road with GPT-4V(ision): Explorations of Utilizing Visual-Language Model as Autonomous Driving Agent

PJLab-ADG/GPT4V-AD-Exploration

This is the official repository for the technical report:

On the Road with GPT-4V(ision): Explorations of Utilizing Visual-Language Model as Autonomous Driving Agent.

📌 Introduction

In our report, we explore GPT-4V as an agent for autonomous driving. Here, you'll find the original test images and the detailed results demonstrating the model's capabilities in understanding complex driving scenes and making driving decisions.

🚀 Usage

Explore our findings through the categorized directories:

  • Scenario Understanding: Tests on GPT-4V's perception of its environment and fellow road-goers.
  • Reasoning: A peek into the model's advanced reasoning capabilities.
  • Serving as a Driving Agent: Scenes where GPT-4V showcases its multi-task driving skills.

Each case within these categories is accompanied by a .json file detailing the prompts and responses from GPT-4V, alongside a .png image that the model assessed.
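
As a rough illustration of working with this layout, the sketch below walks a local checkout and pairs each .json file with its .png image. It assumes that the two files of a case share a file name stem and sit in the same directory, which is not stated above; `iter_cases` is a hypothetical helper, not part of this repository. The JSON schema is not documented here, so the records are loaded as-is without assuming any field names.

```python
import json
from pathlib import Path


def iter_cases(root):
    """Yield (json_path, png_path, record) for every case found under root.

    Assumption: a case's .json and .png share a file stem and live in the
    same directory. png_path is None when no matching image exists.
    """
    for json_path in sorted(Path(root).rglob("*.json")):
        png_path = json_path.with_suffix(".png")
        with json_path.open(encoding="utf-8") as f:
            # Parsed JSON, returned unchanged (the schema is not assumed).
            record = json.load(f)
        yield json_path, (png_path if png_path.exists() else None), record
```

For example, `for json_path, png_path, record in iter_cases("path/to/checkout"): print(json_path, png_path)` lists every case along with its image, where the path is a placeholder for wherever the repository was cloned.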

🖼 Illustrations

Here's a glimpse into some of the fascinating results from our report:

  • Weather Understanding: This image showcases GPT-4V's capability to understand different weather conditions, a critical factor in autonomous driving.

    [Image: Weather Understanding]
  • Corner Cases: An illustration of how GPT-4V handles complex and unusual traffic scenarios, which are often challenging for autonomous systems.

    [Image: Corner cases]
  • Serving as a Driving Agent: A demonstration of GPT-4V showcasing its capabilities as a driving agent, making real-world decisions in various driving scenarios.

    [Image: Act as driver]

🤝 Contributions

Your thoughts and contributions are a green signal for us! 🚦

If you have suggestions or additional insights, feel free to open an issue or submit a pull request.

📜 Citation

If our repository accelerates your research, please use the following citation:

@article{wen2023road,
  title={On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving},
  author={Licheng Wen and Xuemeng Yang and Daocheng Fu and Xiaofeng Wang and Pinlong Cai and Xin Li and Tao Ma and Yingxuan Li and Linran Xu and Dengke Shang and Zheng Zhu and Shaoyan Sun and Yeqi Bai and Xinyu Cai and Min Dou and Shuanglu Hu and Botian Shi},
  journal={arXiv preprint arXiv:2311.05332},
  year={2023}
}

🌟 Other Awesome Projects from Our Team

Our team is actively involved in various innovative projects in the realm of autonomous driving. Here are some other exciting repositories that you might find interesting:

📄 License

The content of this repository is released under the MIT License.
