(Constructs): Web RAG - Web Crawler, Chatting with Web Pages and Search #291

spugachev · 2024-02-28T14:42:11Z

Describe the feature

Many RAG experiences are built around websites. Users want to crawl one or more websites, extract content from the pages, schedule periodic re-crawls, and ingest the results into OpenSearch so that RAG requests can draw on website data.
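To make the crawl step concrete, here is a minimal sketch of a breadth-first crawl restricted to the starting host. It is only an illustration, not the proposed construct: `crawl`, `LinkCollector`, the `fetch` callback, and `max_pages` are hypothetical names, and fetching is injected so the sketch runs without network access. Scheduling and OpenSearch ingestion are out of scope here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl limited to the start URL's host.

    `fetch` is a caller-supplied function mapping a URL to its HTML,
    or None when the page cannot be retrieved (an assumption of this sketch).
    Returns a dict of URL -> HTML for every page successfully fetched.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

In a real deployment `fetch` would wrap an HTTP client with timeouts, robots.txt handling, and rate limiting; those concerns are deliberately left out.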

To support this scenario, a WebCrawler construct can be created. It should be capable of creating new OpenSearch indexes or using existing ones.

This construct can also be used to obtain data from websites in real time. For example, a user could ask a chatbot to summarize a specific webpage. In this case, the web crawler should extract the content of that webpage and provide it to the chatbot.
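For the real-time case, a rough sketch of the extraction step could look like the following. It assumes the page HTML has already been downloaded; `TextExtractor`, `page_to_text`, and the `max_chars` budget are hypothetical names chosen for illustration, and a production version would likely use a more robust content extractor.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring script, style, and noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def page_to_text(html, max_chars=4000):
    """Return the page's visible text, truncated to a character budget.

    The budget stands in for the chatbot's context limit (an assumption).
    """
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.parts)[:max_chars]
```

The returned string can then be placed in the chatbot prompt alongside the user's request to summarize the page.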

We should also consider web search scenarios, where users want to use a search engine to obtain results. The results found by the search engine should be parsed and returned to the chatbot.
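One possible shape for the search scenario is sketched below. The result format (dicts with `title`, `url`, and `snippet` keys) is an assumption, since the issue does not name a search engine or its response schema, and `format_search_results` is a hypothetical helper.

```python
def format_search_results(results, max_results=3):
    """Turn parsed search hits into a numbered context block for a chatbot.

    `results` is assumed to be a list of dicts with "title", "url", and
    "snippet" keys; hits without a URL are dropped.
    """
    usable = [r for r in results if r.get("url")]
    blocks = []
    for i, hit in enumerate(usable[:max_results], start=1):
        title = hit.get("title", "").strip()
        # Collapse runs of whitespace so each snippet stays on one line.
        snippet = " ".join(hit.get("snippet", "").split())
        blocks.append(f"[{i}] {title}\n{hit['url']}\n{snippet}")
    return "\n\n".join(blocks)
```

Numbering the hits lets the chatbot cite its sources by index in the answer.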

Use Case

RAG over websites

Proposed Solution

No response

Other Information

No response

Acknowledgements

I may be able to implement this feature request
This feature might incur a breaking change

krokoko · 2024-03-11T20:53:46Z

As discussed, assigning it temporarily to you @spugachev, thanks! :)

github-actions · 2024-05-11T01:28:57Z

This issue is now marked as stale because it hasn't seen activity for a while. Add a comment or it will be closed soon. If you wish to exclude this issue from being marked as stale, add the "backlog" label.

github-actions · 2024-05-18T01:29:50Z

Closing this issue as it hasn't seen activity for a while. Please add a comment @mentioning a maintainer to reopen. If you wish to exclude this issue from being marked as stale, add the "backlog" label.

spugachev added the needs-triage label Feb 28, 2024

krokoko added the RFC-proposal label and removed the needs-triage label Feb 28, 2024

emerging-tech-cdk-constructs-bot mentioned this issue Mar 1, 2024

Monthly issue metrics report #294

Closed

krokoko assigned spugachev Mar 11, 2024

github-actions bot added the stale label May 11, 2024

github-actions bot closed this as completed May 18, 2024

krokoko reopened this May 20, 2024

krokoko linked a pull request May 22, 2024 that will close this issue

feat(construct): add webcrawler construct #474

Open
