[Feature Request]: Docs to be crawler friendly, and LLM discoverable #2203
@nikhil-swamix, thanks for your explanation. There are several reasons why we use Markdoc and Next, and this is unlikely to change.
- We value user experience over bot/crawler experience (as you pointed out, anyone who needs to index the docs can use the GH markdown files).
- We want visual continuity between the Chroma docs and the hosted platform (which will be coming out soon).
- While we value your and the rest of the community's opinions, we do certain things a certain way 😀
Have you considered using a library other than requests? Have a look here for inspiration: https://python.langchain.com/v0.1/docs/integrations/document_loaders/url/
Thanks for the update, I understand it would mean changing many things. However, I accomplished my objective with a different architecture: using a WebKit engine, putting it behind a server, and receiving a rendered page, with JavaScript support, for a given URL (requests was too basic), the same way Googlebot crawls single-page apps. I considered navigating the README/docs folder directly on GitHub, but it was not scalable, as every project may or may not have a docs folder, or it may be unorganized. For that reason, I'm building an auto-documentation engine with an LLM: git clone the source code, and running this layer provides base documentation even when the codebase is poorly documented. I'm doing this on hundreds of projects, so I needed a universal solution.
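The prerender-proxy idea described above can be sketched roughly as follows. This is a hedged illustration, not the author's actual implementation: the author mentions a WebKit engine, while this sketch assumes Playwright with headless Chromium (`pip install playwright`, then `playwright install chromium`), and the function name `render_url` is invented for the example.

```python
def render_url(url: str, timeout_ms: int = 15000) -> str:
    """Return the post-JavaScript HTML of `url`, the way Googlebot sees SPAs."""
    # Lazy import so this module can be loaded without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Wait until network activity settles so client-side rendering finishes.
        page.goto(url, wait_until="networkidle", timeout=timeout_ms)
        html = page.content()  # serialized DOM after hydration
        browser.close()
        return html
```

Putting a function like this behind a small HTTP endpoint gives a crawler a fully rendered view of any URL it requests.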
Also check out https://github.com/nikhil-swamix/UniversalDB : I'm creating a meta-library that provides a uniform way to access different types of DBs, including SQL, NoSQL, and vector, and its aim is to be as Pythonic as possible and to query DBs naturally. Let me know if similar functionality would benefit the Chroma project. Just a thought.
Regards.
Describe the problem
I tried loading the docs with the requests library and parsing the result, but because of the tabbed nature of the JS and Python code examples, the page requires a browser to render.
example
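One partial workaround worth noting: Next.js pages embed their props as a JSON blob in a `<script id="__NEXT_DATA__">` tag, so a crawler can sometimes recover page data without executing JavaScript. The sketch below is illustrative only; the sample HTML and the `"title"` field are fabricated for the example, and whether Chroma's docs expose useful content this way is not confirmed in this thread.

```python
import json
import re

# Fabricated stand-in for a fetched Next.js page (real pages would come from
# requests.get(...).text); the JSON payload here is invented for illustration.
SAMPLE_HTML = """
<html><body><div id="__next">loading...</div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"title": "Getting Started"}}}
</script>
</body></html>
"""

def extract_next_data(html: str) -> dict:
    """Pull the __NEXT_DATA__ JSON blob out of a Next.js page, if present."""
    match = re.search(
        r'<script id="__NEXT_DATA__" type="application/json">(.*?)</script>',
        html,
        re.DOTALL,  # the JSON payload spans multiple lines
    )
    if not match:
        raise ValueError("no __NEXT_DATA__ script tag found")
    return json.loads(match.group(1))

data = extract_next_data(SAMPLE_HTML)
print(data["props"]["pageProps"]["title"])  # prints "Getting Started"
```

This only recovers whatever the site puts into its page props, so it does not replace true static generation; it merely lessens the need for a full browser in some cases.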
Describe the proposed solution
Static site generation: drop Next and React once and for all! They are meant for really huge projects with heavy components, not documentation, so it's not the right tech stack (my opinion; I prefer Svelte). I could copy the same content from the GitHub source .md files, but that wouldn't allow systematic discovery via crawlers, which could be indexed by LLMs for RAG purposes and prevent hallucinations.
In short:
a prerender flag / server-side rendering in Next
Alternatives considered
Hugo or another well-known SSG
Importance
nice to have
Additional Information
Minor priority, but very useful.