Thoughts on Pursuit and the new Registry #425

Open
hdgarrood opened this issue Feb 26, 2020 · 16 comments

@hdgarrood
Collaborator

Quite a bit of Pursuit's design followed from the constraints imposed by the fact that we didn't have a registry of our own, so this is probably a good opportunity to revisit that.

Currently, Pursuit only accepts JSON package uploads, using the schema produced by the compiler when you run purs publish, which is defined in Language.PureScript.Docs.Types. The original reason for this is twofold:

  1. I didn't want to allow publishing of arbitrary HTML from untrusted sources
  2. We need descriptions of all of the types for type search

The current architecture has a few drawbacks:

  1. The JSON format is coupled fairly tightly to compiler internals; in particular the Type data type, for representing PureScript types, appears in the JSON schema. For instance, polykinds is likely to cause breaking changes to the format. Breaking changes to the format usually necessitate regenerating the database, which I think we have done two or three times now, and it usually means that older packages can no longer be hosted on Pursuit, which is a shame.
  2. Pursuit itself is tied to a single compiler version. For example, right now, we can only show Prim docs for that particular compiler version.
  3. Pursuit has a strange caching setup where requests for static docs pages are sent to the Pursuit Yesod app first, and after the first successful request they are written out to disk, where nginx should be able to serve them without hitting the Yesod app. It is error-prone (it can both fail to invalidate caches when it should and fail to save them when it should) and complicated.
  4. We don't really need a Yesod app to serve static HTML pages, and it's a shame that Pursuit can go down and make static HTML documentation inaccessible.
  5. Of course, packages can appear in the "registry", i.e. be installable, as a result of being in either package-sets or in the bower registry, without appearing on Pursuit. On Hackage, it is possible for a package to appear even if its API docs haven't yet been successfully built. I would like Pursuit to have the same property.

I'd like to investigate splitting Pursuit into a couple of separate services: a job, probably associated with this repo, which can generate and upload static HTML docs to pursuit.purescript.org, and which also can generate search index data to upload to a search server. The static HTML docs could potentially be hosted on GH pages, and the search server would be run in DigitalOcean on our own infrastructure. That way, if the search server goes down, people can still access static HTML docs.

There are a few things we'd need to be careful of: for example, making sure that links don't break will require a bit more care since we won't be able to take advantage of Yesod's type-safe routes any more.

@hdgarrood
Collaborator Author

Also, issues #85 and #139 have languished for quite a while, and although we probably shouldn't try to address them immediately, I think if we are going to rethink Pursuit's architecture we should at least think a bit about how these features could fit in, so that we can build them later. I guess if we were going for the approach I described above, then the natural approach would be to have the search server house the database which is capable of answering queries such as "what was the earliest version of this package to define this identifier", and then perhaps the HTML generation job could query the search server's API while generating the documentation and interleave that information in the form of a "Since: v0.1.0" or something like that.
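To make the "earliest version" query concrete, here is a minimal sketch of how such a lookup could work. It assumes a hypothetical in-memory index keyed by version; the names `Version`, `PackageIndex`, `earliestVersion`, and `sinceAnnotation` are invented for illustration and are not part of Pursuit or the compiler.

```haskell
import Data.List (intercalate)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | Hypothetical version type: [0,1,0] stands for v0.1.0.
-- Lists of Int compare lexicographically, which is good enough for a sketch.
type Version = [Int]

-- | Hypothetical index: for each released version, the identifiers it defines.
type PackageIndex = Map.Map Version (Set.Set String)

-- | The earliest version defining the identifier, if any.
-- Map.toAscList yields versions in ascending order, so the first hit wins.
earliestVersion :: String -> PackageIndex -> Maybe Version
earliestVersion ident index =
  case [v | (v, idents) <- Map.toAscList index, ident `Set.member` idents] of
    v : _ -> Just v
    [] -> Nothing

-- | Render the annotation the HTML generation job could interleave.
sinceAnnotation :: Version -> String
sinceAnnotation v = "Since: v" ++ intercalate "." (map show v)

-- | Made-up example data for a single package.
exampleIndex :: PackageIndex
exampleIndex =
  Map.fromList
    [ ([0, 1, 0], Set.fromList ["map"])
    , ([0, 2, 0], Set.fromList ["map", "filter"])
    , ([1, 0, 0], Set.fromList ["map", "filter", "foldl"])
    ]

main :: IO ()
main =
  -- prints "Since: v0.2.0"
  putStrLn (maybe "not found" sinceAnnotation (earliestVersion "filter" exampleIndex))
```

Note that this reports the first appearance only; an identifier that was removed and later reintroduced would need a richer answer.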

@hdgarrood
Collaborator Author

Perhaps what we really just need to do is to define a simplified representation of PureScript types for storage in the Pursuit database, which retains just enough structure to be useful for type search, but not so much that changes to the compiler can break it. I'm thinking: remove the constructors which are not relevant to type search, such as TUnknown, Skolem, ParensInType, BinaryNoParensType, and wildcards (they should be expanded during the initial docs generation), forget about any details which we don't absolutely need, such as foralls (Forall) and kind annotations (KindedType), and potentially also represent rows in a slightly more convenient way. So perhaps the following could work as a starting point:

data Type a
  -- | A named type variable
  = TypeVar a Text
  -- | A type-level string
  | TypeLevelString a PSString
  -- | A type constructor
  | TypeConstructor a (Qualified (ProperName 'TypeName))
  -- | A type operator.
  | TypeOp a (Qualified (OpName 'TypeOpName))
  -- | A type application
  | TypeApp a (Type a) (Type a)
  -- | A binary type operator application
  | TypeOpApp a (Type a) (Type a) (Type a)
  -- | A type with a set of type class constraints
  | ConstrainedType a (Constraint a) (Type a)
  -- | A row
  | Row a [(Label, Type a)]
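As a rough illustration of the "forget the details" step, the sketch below converts a fuller type representation into a simplified one by dropping foralls, kind annotations, and parentheses. Both `FullType` and `SearchType` are hypothetical, cut-down stand-ins invented for this example (with plain `String` names and no annotation parameter); they are not the compiler's actual `Type` or the proposal above.

```haskell
-- | Hypothetical stand-in for a compiler-side type representation.
data FullType
  = FVar String
  | FCon String
  | FApp FullType FullType
  | FForall String FullType     -- forall a. ty
  | FKinded FullType FullType   -- ty :: kind
  | FParens FullType            -- (ty)
  deriving (Eq, Show)

-- | Hypothetical simplified representation kept for type search.
data SearchType
  = SVar String
  | SCon String
  | SApp SearchType SearchType
  deriving (Eq, Show)

-- | Drop everything type search does not need, keeping only the structure.
simplify :: FullType -> SearchType
simplify ty = case ty of
  FVar v -> SVar v
  FCon c -> SCon c
  FApp f x -> SApp (simplify f) (simplify x)
  FForall _ body -> simplify body    -- forget the quantifier
  FKinded inner _ -> simplify inner  -- forget the kind annotation
  FParens inner -> simplify inner    -- forget the parentheses

-- | forall a. ((Array a) :: Type)
example :: FullType
example =
  FForall "a" (FKinded (FParens (FApp (FCon "Array") (FVar "a"))) (FCon "Type"))

main :: IO ()
main =
  -- prints: SApp (SCon "Array") (SVar "a")
  print (simplify example)
```

Because the conversion happens once at docs-generation time, compiler-side changes to the fuller representation would only require updating `simplify`, not the stored data.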

@f-f
Member

f-f commented Jun 4, 2020

@hdgarrood how would you see docs-search in the picture?

I think we could actually replace the backend-side search from Pursuit with frontend-side search (using docs-search), e.g. see Starsuit

@hdgarrood
Collaborator Author

My only problem with moving search to the frontend is that you need to have every version of every package around to be able to answer queries such as "what was the earliest version that this function was in" or "how did this module's interface change between v1.0.0 and v2.0.3", and I don't think that will be feasible with frontend search. At least, the size of the index is going to become a problem much sooner than it will with backend search that way. I'd prefer not to make an architectural change that makes it harder to answer these questions if we can avoid it.

@f-f
Member

f-f commented Jun 13, 2020

@hdgarrood I think that could be solved by having a richer search index, right? I assume we don't want to enable all possible queries out there (since we'd need special support for every kind of query anyway), so adapting the search index to have facilities to answer them sounds like it could work?

@hdgarrood
Collaborator Author

No, I don’t think so - the problem with front end search is that you have to worry about the size of the index. My worry is that including enough information in the index to be able to answer these queries will cause the index to become too large much more quickly.

@f-f
Member

f-f commented Jun 13, 2020

@hdgarrood makes sense. I think at this point then a goal worth pursuing (heh, pun intended) could be to have a single codebase for frontend and backend search, so that we don't split efforts. (the assumption here is that having offline/local search is useful and desirable)
So how would you feel about having that part of Pursuit (read: the backend that answers search queries) in PureScript?

@hdgarrood
Collaborator Author

I’m not particularly keen; Pursuit’s search already exists and is written in Haskell, and also obviously the compiler is written in Haskell so I don’t want to make it more difficult to make use of nice things that the compiler can give us. For example, writing the backend in PureScript essentially rules out the possibility of using the same type search that the compiler’s typed holes feature uses.

@JordanMartinez
Contributor

While I think writing it in PureScript would make it easier for people to contribute because it's already the language they know, PS isn't that mature on the backend. So, even if this was done, wouldn't this slow down development?

@f-f
Member

f-f commented Jun 13, 2020

@JordanMartinez note that this is already done in docs-search. What are you referring to when you say "slow down development"?

@JordanMartinez
Contributor

I'm assuming that this will require implementing a server, and I feel like PS doesn't yet have as good an ecosystem for building such a thing as Haskell does. It seems like one would need to reinvent the wheel a few times, and that's what would "slow down" the development.

So, this is just a general feeling/belief I have about the situation, not something based on fact.

@kl0tl
Member

kl0tl commented Jun 17, 2020

Couldn’t we extract the type search into a PureScript package and wrap a small TCP server around it for usage by Pursuit? We could even try the native backend if Node.js is an issue.

I started to think about replicating the search index to the browser IndexedDB inside a Service Worker for supporting offline searches on Pursuit, and getting different results offline due to subtle differences between two implementations of the search wouldn’t be ideal.

Is the possibility of reusing the typed holes search really more important than reusing the same implementation for online and offline searches on Pursuit, and also for local searches in the compiler-generated documentation with a web browser or the purescript-docs-search CLI?

@hdgarrood
Collaborator Author

I think supporting both online and offline searching in Pursuit would be overly complicated for minimal benefit, so I’m not keen on that. Also, it’s not just the possibility of using typed holes search; that was just one example. To give another potential example, I want to implement comparison between module interfaces at different versions of a library inside the compiler, so that the compiler can say e.g. “this needs a major bump” when publishing a new version. That’s something we would most likely want to be able to use inside Pursuit too, which is why I think it makes the most sense for us to stay in Haskell. If we want to support searching in locally produced documentation and we think it’s really important that they behave in exactly the same way, then I would rather move that search functionality into the compiler so that it can also be used locally.
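A minimal sketch of what such an interface comparison might look like, assuming a module interface can be flattened to a map from exported names to rendered type signatures. The `Interface` and `Bump` types and the `requiredBump` function are hypothetical and are not an existing compiler feature; comparing rendered signatures as strings is a deliberate simplification.

```haskell
import qualified Data.Map.Strict as Map

-- | Hypothetical interface: exported name -> rendered type signature.
type Interface = Map.Map String String

data Bump = Patch | Minor | Major
  deriving (Eq, Ord, Show)

-- | Smallest version bump consistent with the change from old to new:
-- removing or changing an export is breaking, adding one is a minor change.
requiredBump :: Interface -> Interface -> Bump
requiredBump old new
  | removedOrChanged = Major
  | added = Minor
  | otherwise = Patch
  where
    -- An old export is broken if it is missing or has a different signature.
    removedOrChanged =
      or [Map.lookup name new /= Just sig | (name, sig) <- Map.toList old]
    -- Map.difference keeps the keys of new that are absent from old.
    added = not (Map.null (Map.difference new old))

-- | Made-up example interfaces.
v1, v2 :: Interface
v1 = Map.fromList [("head", "forall a. Array a -> Maybe a")]
v2 = Map.insert "last" "forall a. Array a -> Maybe a" v1

main :: IO ()
main =
  -- prints: Minor
  print (requiredBump v1 v2)
```

A real implementation would compare types up to renaming of bound variables rather than as strings, which is one reason to build it on the compiler's own type machinery.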

@hdgarrood
Collaborator Author

Then again I suppose we could have a hybrid backend with parts in both PureScript and Haskell. I think I need to consider what the actual API exposed by this backend would look like in a bit more detail. I’m not 100% sure we should even consider it a problem that search results might differ. For example, using typed hole search is still very appealing to me, as it would allow much better search results (since the results of typed hole search are guaranteed to type check), but of course that isn’t a possibility with frontend search. To give another example, the Pursuit backend needs to be aware of all versions of all libraries, but from the perspective of docs generated locally, you only care about the current package set. It’s not yet clear to me what that might mean concretely, but it seems plausible that these scenarios are different enough that we shouldn’t consider it a problem if they behave a bit differently.

@kl0tl
Member

kl0tl commented Jun 17, 2020

> Then again I suppose we could have a hybrid backend with parts in both PureScript and Haskell

That’s exactly my point. I’m not even suggesting that the PureScript search server handle HTTP requests itself, if a TCP server means that more things can stay in Haskell.

> To give another example, the Pursuit backend needs to be aware of all versions of all libraries, but from the perspective of docs generated locally, you only care about the current package set. It’s not yet clear to me what that might mean concretely, but it seems plausible that these scenarios are different enough that we shouldn’t consider it a problem if they behave a bit differently.

I was assuming that even when Pursuit has knowledge of all versions of all packages, it should be able to answer requests scoped to a specific package set. Am I mistaken?

@f-f
Member

f-f commented Oct 11, 2020

Reading through this issue again it seems like it's more related to Pursuit itself rather than the Registry, so I'll move it over there and clarify the title

@f-f transferred this issue from purescript/registry-dev on Oct 11, 2020
@f-f changed the title from "Thoughts on Pursuit" to "Thoughts on Pursuit and the new Registry" on Oct 11, 2020