[Feature Request]: Docs to be crawler friendly, and LLM discoverable #2203
@nikhil-swamix, thanks for your explanation. There are several reasons why we use Markdoc and Next, and this is unlikely to change.
- We value user experience over bot/crawler experience (as you pointed out, anyone who needs to index the docs can use the GH markdown files).
- We want visual continuity between the Chroma docs and the hosted platform (which will be coming out soon).
- While we value your and the rest of the community's opinions, we do certain things a certain way 😀
Have you considered using a library other than requests? Have a look here for inspiration: https://python.langchain.com/v0.1/docs/integrations/document_loaders/url/
Thanks for the update, I understand it would mean changing many things. However, I accomplished my objective with a different architecture: using a WebKit engine, putting it behind a server, and receiving a rendered page, with JavaScript support, for a given URL (requests was too basic), the same way Googlebot crawls single-page apps. I considered navigating the README/docs folder directly on GitHub, but it was not scalable, as every project may or may not have a docs folder, or it may be unorganized. For that reason, I'm building an auto-documentation engine with an LLM: git clone the source code, and running this layer provides base documentation even when the codebase is poorly documented. I'm doing this on hundreds of projects, so I needed a universal solution.
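The prerender-proxy idea described above can be sketched roughly as follows. This is a hedged illustration, not the author's actual implementation: the author mentions a WebKit engine, while this sketch assumes Playwright with headless Chromium (`pip install playwright`, then `playwright install chromium`), and the function name `render_url` is invented for the example.

```python
def render_url(url: str, timeout_ms: int = 15000) -> str:
    """Return the post-JavaScript HTML of `url`, the way Googlebot sees SPAs."""
    # Lazy import so this module can be loaded without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Wait until network activity settles so client-side rendering finishes.
        page.goto(url, wait_until="networkidle", timeout=timeout_ms)
        html = page.content()  # serialized DOM after hydration
        browser.close()
        return html
```

Putting a function like this behind a small HTTP endpoint gives a crawler a fully rendered view of any URL it requests.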
Also check out https://github.com/nikhil-swamix/UniversalDB : I'm creating a meta-library that provides a uniform way to access different types of DBs, including SQL, NoSQL, and vector, and its aim is to be as Pythonic as possible and to query DBs naturally. Let me know if similar functionality would benefit the Chroma project. Just a thought.
Regards.
Describe the problem
I tried loading the docs with the requests library and parsing the result, but because of the tabbed nature of the JS and Python code examples, the page requires a browser to render.
example
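One partial workaround worth noting: Next.js pages embed their props as a JSON blob in a `<script id="__NEXT_DATA__">` tag, so a crawler can sometimes recover page data without executing JavaScript. The sketch below is illustrative only; the sample HTML and the `"title"` field are fabricated for the example, and whether Chroma's docs expose useful content this way is not confirmed in this thread.

```python
import json
import re

# Fabricated stand-in for a fetched Next.js page (real pages would come from
# requests.get(...).text); the JSON payload here is invented for illustration.
SAMPLE_HTML = """
<html><body><div id="__next">loading...</div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"title": "Getting Started"}}}
</script>
</body></html>
"""

def extract_next_data(html: str) -> dict:
    """Pull the __NEXT_DATA__ JSON blob out of a Next.js page, if present."""
    match = re.search(
        r'<script id="__NEXT_DATA__" type="application/json">(.*?)</script>',
        html,
        re.DOTALL,  # the JSON payload spans multiple lines
    )
    if not match:
        raise ValueError("no __NEXT_DATA__ script tag found")
    return json.loads(match.group(1))

data = extract_next_data(SAMPLE_HTML)
print(data["props"]["pageProps"]["title"])  # prints "Getting Started"
```

This only recovers whatever the site puts into its page props, so it does not replace true static generation; it merely lessens the need for a full browser in some cases.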
Describe the proposed solution
Static site generation: drop Next and React once and for all! They are meant for really huge projects with heavy components, not documentation, so it's not the right tech stack (my opinion; I prefer Svelte). I could copy the same content from the GitHub source .md files, but that wouldn't allow systematic discovery via crawlers, which could be indexed by LLMs for RAG purposes and prevent hallucinations.
In short:
a prerender flag / server-side rendering in Next
Alternatives considered
Hugo or another well-known SSG
Importance
nice to have
Additional Information
Minor priority, but very useful.