Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial draft of search API. #2868

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Initial draft of search API. #2868

wants to merge 2 commits into from

Conversation

kentonv
Copy link
Member

@kentonv kentonv commented Feb 11, 2017

DO NOT MERGE

There are some TODO(now)s in there regarding parts I'm still thinking about, but wanted to get some initial feedback on this.

# Indexer implementation

interface IndexerSession {
# A session type specifically implemented by the indexer app. The app's UiView's newSession()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many indexer app grains are there? Just one? One per user? Maybe the are created on demand when the user manually sets up some grains as indexable?

interface GrainIndexer(Metadata) {
# Capability used to index the content of a grain.
#
# This is a one-way capability. Although GrainIndexer is implemented by the indexer and is called
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only vaguely understand what you're intending to protect against by making this a "one-way" capability. Should we be worried that capabilities could be smuggled through the metadata field of IndexableContent?

# Note that, as an optimization, Sandstorm may start out assuming that all users see exactly
# the same content, and may do all indexing as an anonymous user, perhaps with no permissions.
# However, if the app logs an ActivityEvent that specifies that it requires specific permissions
# or is visible only to certain users, Sandstorm uses that as a hint to index the path associated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that Sandstorm tries to index any grain that calls SessionContext.activity()?

using Grain = import "grain.capnp";

interface IndexingSession(Metadata) {
# This is a UiView session type, created by calling UiView.newSession().
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess this should have extends(UiSession)?

@kentonv
Copy link
Member Author

kentonv commented Feb 13, 2017

I added comments answering all of @dwrensha's questions.

@ocdtrekkie
Copy link
Collaborator

How well do you expect this to scale? Global full-text search of say... documents... seems reasonable to expect Sandstorm servers to do, but what happens if I have an eBook library with a few hundred books of text in my Sandstorm?

Will it be possible to define what type of content is being indexed? To extend the above example, if I have an eBook library, and a lot of personal documents, it may be hard to find personal documents (data I generated) I am looking for if it also returns a lot of results from books (data I have).

@kentonv
Copy link
Member Author

kentonv commented Feb 14, 2017

@ocdtrekkie I would guess that the tech can handle indexing your ebooks. Lucene is designed to be able to index millions or even billions of documents, and I assume Lucy is similar. As a rule of thumb, the index ought to be similar in size to the text it is indexing.

As for usability issues, that's a good point. I think we would handle this by adding search operators and such. This initial design doesn't really get into how those will work, but it can be extended in that direction in the future.

@ocdtrekkie ocdtrekkie added the app-platform App/Sandstorm integration features label Jan 15, 2020
# metadata format.
#
# TODO(someday): Spec out how this works. Not needed for MVP of search.
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious to me how an indexer would be able to use "generic" operators like this while treating the metadata as completely opaque.

If the plan is to not implement this right away anyway, is there a good reason to even include this method? We can always add more later...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
app-platform App/Sandstorm integration features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants