Awesome List Generator 📜✨

This is a Python program that automatically generates an "awesome list" for a specific keyword as a markdown file. An "awesome list" is a list of resources related to a specific topic. Currently, the resources include GitHub projects, Google Scholar articles, YouTube videos, courses, slides and presentations, software and tools and podcasts. The awesome list is automatically generated using GPT models; you can choose between different models to generate the list, such as GPT 3.5 or GPT 4.

Demo 🎥

awesome_list.mp4

Setup ⚙️

Make sure you are using Python 3.10.
Install poetry from the official site.
Install dependencies using poetry:
```
poetry install
```
Create an .env file at the root of the project and add the following environment variable:
```
OPENAI_API_KEY=<your_openai_api_key>
```

Setting up Google Cloud API Key and Custom Search Engine ID

Prerequisites

A Google account.

Instructions

Obtain a Google Cloud API Key

Visit the Google Cloud Console.
If you haven't already, create a new project by clicking on the "Select a project" dropdown at the top-right corner, then click on "NEW PROJECT".
Once your project is created and selected, navigate to the Navigation menu (three horizontal lines at the top-left corner), and then click on "APIs & Services" > "Credentials".
Click on the "Create Credentials" button and select "API key". Once created, your API key will be displayed.
Copy your API key and save it securely. You'll use this key in your application to authenticate your requests.

Set Up a Custom Search Engine and Get the Search Engine ID

Go to Google Custom Search homepage.
Click on "Create a custom search engine".
In the "Sites to search" section, you can specify websites you want to search or choose "Search the entire web" to allow broader search capabilities. However, if you choose "Search the entire web", make sure to toggle "Search only included sites" off under the "Sites to search" section.
Fill in other required fields like the name of your search engine.
Click on the "Create" button at the bottom.
Once your search engine is created, you'll be directed to a setup page. Here, find and copy the "Search engine ID" (also called "cx" in some contexts). You'll use this ID in your application to specify which custom search engine to use for queries.

Enable the Custom Search API for Your Project:

Visit the Google Cloud Console.
Navigate to "APIs & Services" > "Library".
Search for "Custom Search API".
Click on it, and you should see an "Enable" button. Click that button to enable the Custom Search API for your project.

Finally, add the following environment variables to .env file:

    GOOGLE_CLOUD_API_KEY='<google cloud api key>'
    CUSTOM_SEARCH_ENGINE_ID='<custom search engine id>'

Usage 💻

Using Streamlit UI

We've provided a Streamlit interface for running this application. To use it:

Run the Streamlit application using Poetry:
```
poetry run streamlit run streamlit_run.py
```
Open http://localhost:8501

You can easily input the necessary parameters (like model type, keyword, and description) through the UI and generate your awesome list!

Direct Code Usage

The main class used in this project is the AwesomeListGenerator. This class accepts the following parameters:

keyword: A string representing the keyword for which the awesome list will be generated.
description: A string providing a description related to the keyword.
model: A string representing the OpenAI model to be used for generating the markdown (default is " gpt-3.5-turbo-16k").
data_extraction_batch_size: An integer representing the number of data items to process in each batch (default is 10). For example, if the batch size is 10, then the data will be fetched from the data sources in batches of 10 (like 10 github projects at a time).
number_of_results: An integer representing the number of results to fetch from each data source (default is 20). the number of results to fetch from each data source (default is 20). For example, fetch 20 Github projects then process them with LLM model in batches based on data_extraction_batch_size.

After initializing the class with these parameters, invoke the save_and_return_awesome_list method to generate the markdown file. Here's an example:

# Initialize an instance of the AwesomeListGenerator
generator = AwesomeListGenerator(keyword="Your Keyword",
                                 description="Your Description",
                                 model="gpt-3.5-turbo-16k",
                                 data_extraction_batch_size=10,
                                 number_of_results=20)
# Generate and save the markdown
markdown_content = generator.save_and_return_awesome_list()

The program will generate a markdown file in the output directory named after your keyword (e.g., Your_Keyword.md). This file contains the "awesome list" generated by the program.

How It Works 🕹️

The AwesomeListGenerator program operates in two main phases: Data Scraping and Data Processing.

Data Scraping 🕸️

In the data scraping phase, the program fetches resources related to your provided keyword from multiple data sources. Currently, the resources include GitHub repositories, Google Scholar articles, YouTube videos, and podcasts. The program utilizes specialized scrapers for each source, each of which is designed to fetch the most relevant and highest quality resources.

For instance, the GitHub scraper fetches repositories that match the keyword, sorted by the number of stars (a common indicator of a repository's relevance and quality). Similarly, the Google Scholar scraper retrieves articles related to the keyword and sorted by citation count.

Data Processing 🧠

Once the data is scraped, it is passed on to the data processing phase. In this phase, the program uses the selected GPT model to process the fetched resources. The model filters and ranks the resources based on relevance to the keyword, quality of content, and potential usefulness to users. The GPT model also formats the data into a markdown list, adding necessary formatting such as links and brief descriptions.

Notably, both scraping and processing operations are executed in batches. This batch-wise operation allows the program to support as many results as needed, based on the configured number_of_results and data_extraction_batch_size. This way, you have control over the extent of data being handled at a time, ensuring efficient resource usage.

Expansion and Contributions 💡

We're looking to expand the number of data sources in the future. Here are some ideas we have in mind:

Scrape resources from Medium.
Search for related books using Google Books or Amazon API.
Fetch blog posts from dev.to and other developer-focused platforms.
Retrieve documents from preprint servers like arXiv and bioRxiv.
Extract relevant resources from online course platforms such as Coursera, Udemy, and Khan Academy.

If you're interested in contributing, you can pick one of the above tasks or propose your own ideas. We welcome all kinds of contributions and appreciate your interest in our project!

TODO Checklist

Projects Created Using Our Tool 🚀

Awesome Auto-GPT - This awesome list is dedicated to Auto-GPT, a pioneering open-source project that showcases the capabilities of the GPT-4 language model. Auto-GPT is at the forefront of AI, seamlessly chaining together Large Language Model (LLM) "thoughts" to autonomously accomplish any set objectives, thereby redefining the boundaries of what AI can do.
Awesome Automatic1111 (a1111) Stable Diffusion WebUI - An awesome list centering on the Automatic1111 (a1111) Stable Diffusion WebUI, an interactive browser interface for the Stable Diffusion generative model. This tool simplifies the process of creating realistic images from textual or visual inputs, and its user-friendly interface, built on Gradio, makes web-based interaction with machine learning models more accessible.
Awesome AI Talking Heads - This awesome list focuses on the fascinating domain of 'Talking Head Generation', which involves creating lifelike digital representations of human heads and faces. The list includes a collection of key research papers, state-of-the-art algorithms, important GitHub repositories, educational videos, insightful blogs, and more. It serves as a one-stop resource for AI researchers, computer graphics professionals, or AI enthusiasts looking to delve into the world of Talking Head Generation.

We love seeing the incredible awesome lists that our community creates. If you've used our tool to generate an awesome list, feel free to let us know, and we will feature your project here!

Support Us ❤️

Did you find this project useful? If it has brought value to you, please give us a ⭐ on GitHub. This gesture not only validates our efforts but also helps this project reach more people and continue development.

Feel free to fork the repository, contribute by submitting pull requests, or open an issue. Your feedback and contributions are always welcome!

Name		Name	Last commit message	Last commit date
Latest commit History 109 Commits
clients		clients
output		output
section_data_pipelines		section_data_pipelines
.env_example		.env_example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
awesome_list_generator.py		awesome_list_generator.py
main.py		main.py
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
section_data_extractor.py		section_data_extractor.py
section_markdown_generator.py		section_markdown_generator.py
streamlit_run.py		streamlit_run.py
utils.py		utils.py

License

Curated-Awesome-Lists/GPT-Awesome-List-Generator

Folders and files

Latest commit

History

Repository files navigation