This repository has been archived by the owner on Jul 27, 2018. It is now read-only.

Exploration: Create bookmarklet for pinning content #1267

Open
6 tasks
Gozala opened this issue Jan 30, 2017 · 42 comments

@Gozala
Contributor

Gozala commented Jan 30, 2017

Before we go ahead with persisting visited sites in the navigation trail as they appeared to the user, it would make more sense to start with a less ambitious task as a case study. Given that broken URLs are the No. 1 reason why browser bookmarks and history are unreliable and users resort to pictures instead, let's allow users to select a fragment of a site and save it to the local library - kind of like pinterest.com, but not just for images but rather for arbitrary content & in a way that it's still selectable, searchable, shareable.

To do some exploratory work, let's start with the most basic implementation:

Bookmarklet

Let's start with a browser bookmarklet that:

  • Allows you to select hovered elements while the 'Shift' key is pressed, with visual feedback similar to that provided by the devtools inspector when selecting a node.
  • Allows selection range modification similar to how Mobile Safari does it.
  • Captures selected content so that it can be rendered pretty much the same as it looked when the user selected it (images, styling, etc...). It could be that a more simplistic presentation would work out better, but let's find that out.
  • Stores selected content & associated resources in IndexedDB as blobs.
  • Displays stored fragments when the bookmarklet is activated.
  • Clicking a fragment should activate the selection it was captured from. If a saved fragment is no longer on the page, provide some visual clue indicating that.

Here are the mockups:

[mockup image: saving]

Gozala added this to the Content pinning milestone Jan 30, 2017
@Gozala
Contributor Author

Gozala commented Jan 30, 2017

Status update

I have been prototyping Shift+hover as a bookmarklet in this gist:
https://gist.github.com/Gozala/58cc14aeae44bf57636108ce9fdd2d31

You can install the bookmarklet from https://runkit.io/gozala/5887e949adda540013624150/branches/master/; it just serves the code from that gist: https://runkit.com/gozala/5887e949adda540013624150

Once the bookmarklet is activated you can start using the Shift key to visually outline underlying elements. Once an element is clicked, its content gets selected. I did run into the problem that after this interaction clicking anything on the page seems to extend the selection. Given that the Selection API is not very intuitive, I suspect I might be misusing it: I suspect that I create an incomplete selection, which causes it to act in the way described above. I am trying to understand the spec in order to fix that issue before moving further with selection changes.

I'll likely face other complications when I get to storing assets like CSS and images while saving the selected fragment, as some of those could be from different origins and therefore access will be restricted.

@Gozala
Contributor Author

Gozala commented Jan 30, 2017

@edsilv dropped by the channel today and provided some interesting pointers. He has been drafting a proposal for saving / sharing trails (https://github.com/memex/trails-proposal). Turns out there is a web annotation proposal, https://www.w3.org/TR/annotation-model/, that we need to investigate as it seems really relevant. The selectors are the immediately relevant bit, as we're attempting to store & present selections from the page.

Another interesting bit is https://hypothes.is/, as they have a bookmarklet & while I have not tried it, from the description it seems to do very much the same thing as we do here. Worth looking into it & talking to the folks behind it; they might be able to offer some feedback on limitations they faced and ways to work around them. What is also interesting is that they seem to try to keep annotations even when pages have changed but the relevant fragments are still present (see https://twitter.com/edsilv/status/773413548756242432), which is also what we aim for.

@vsinha

vsinha commented Jan 31, 2017

here's some more prior art we might be able to piggyback on / learn from:
http://brettterpstra.com/2013/07/30/precise-web-clipping-to-markdown-with-bullseye/

@Gozala
Contributor Author

Gozala commented Jan 31, 2017

I have tried the things people have pointed out. Here are some of my thoughts.

bullseye

  • 👎 selection tool. In my experience its selector tool isn't really interesting (at least the way it behaves in Firefox). I think the bookmarklet we already have is better as is.
  • 👍 saving in markdown. They store fragments as markdown. If we give up on the attempt to preserve styling, that seems like a better option than saving a lot more verbose HTML where most of the markup will be somewhat obsolete. That being said, I'd still favor pursuing keeping styling, as that is an interesting exploration that would be useful in the context of trail entries. @patrykadas what do you think?
  • 👎 saving in markdown. A side effect of this approach is that sometimes pages have fallback images hidden behind primary elements. In markdown this manifests as oddly duplicated images or worse.

hypothes.is

  • 👍 Wow how come I never heard of it, I'm going to use it every day now!!!
  • 👎 They seem to store annotations on their server. I can guess why they'd do it, but I think we should keep user data with the user.
  • 👍 Playing around with it I got a better grasp of https://www.w3.org/TR/annotation-model/ spec and I'm impressed. We should totally follow the spec here.
  • 👎 Saved annotations don't seem to concern themselves with anything but text. Which I find to be unfortunate, especially for the cases where site content changes and there will be no way of seeing what was attempted to be saved.
  • 👍 Initially I was skeptical of trying to map pinned content to the actual page content, but now I think that the more relaxed approach from Hypothesis may be just fine. For example, if the user only pinned content that is just text & the text is still present in the updated page but styled differently, it seems totally worthwhile to provide the mapping. If it's a mix of text and media it's a lot more tricky, as the image could be very different, not to mention that asserting that would be difficult as well. Another concern is images used as backgrounds of elements: it may appear that just text is selected while in reality the user may be trying to save an image. I think some popular sites try to make saving images more difficult using this technique.
  • 👍 The way they highlight fragments is simple, so simple that I feel stupid to have had something else in mind. We should totally do the same until we can expose selection-like functionality through a rendering engine.

@patrykadas
Collaborator

I think the most important thing right now is simply being able to annotate / highlight content, so that it's saved, and to associate it with a particular history entry.

This way we could actually do both: connect content structurally (trails, see ancestors of the page), but also give the user the possibility to combine different interesting pieces and connect them by their meaning. Once we have annotations correlated to a particular page in the history, it would be a really powerful feature once utilized correctly.

Also, if we can draw this connection (saved piece of the content with entry in the history), we don't really need styling, as you can always see the full version.

RE: my mockups, I think we should for now just use text selection as you'd suggested and go with the 2nd one (bottom row).

@sean-roberts

sean-roberts commented Feb 1, 2017

Hello @Gozala I am a developer at Hypothesis. We caught wind that you were interested in potentially using our bookmarklet to tackle some of your annotation and highlighting needs. Let us know if you have any questions - we are spread pretty thin but we can try to answer any questions you bring up.

A bit about us, we are a non-profit and, barring confidential bits, all of our work is open source. Part of the team even played a significant role in getting the W3C spec where it is today. With that, we are trying to build a platform so that products like yours can take it and run with it to get your annotation needs knocked out and enable users to annotate the collective knowledge of the web.

Regarding some of the 👎 mentioned above:

👎 They seem to store annotations on their server. I can guess why they'd do it, but I think we should keep user data with the user.

Our backend is open source and docker based. So if you wanted to, you could host the annotations yourself. 👍

👎 Saved annotations don't seem to concern themselves with anything but text. Which I find to be unfortunate, especially for the cases where site content changes and there will be no way of seeing what was attempted to be saved.

Yes, if you are referring to being aware of particular sites like Twitter or other post-based sites. We have not quite taken the plunge on app-context-aware annotating, but it is something we'd explore further.

Let us know how we could help 👍

@Treora

Treora commented Feb 1, 2017

@Gozala, FYI: You may enjoy talking with @BigBlueHat when doing anything with the Web Annotation spec and PouchDB. Relevant might be his page-notes, annotator-pouchdb (older), and probably another dozen of his repos. Also @tilgovi's work may be useful, this small demo of his may give a feeling for the structure of an annotation target selector. There are quite a few people happy to collaborate on making reusable components for annotation/memexy tools, including myself. (all three of us have worked at hypothes.is by the way)

@Gozala
Contributor Author

Gozala commented Feb 1, 2017

Hi @sean-roberts and @Treora thanks for jumping on this thread and providing useful pointers & prior art in form of Hypothesis & W3C spec work, it is amazing to have this foundation to build upon. I could use some input based on your experience to better understand why specific decisions were made.

  1. As far as I can tell Hypothesis only attempts to capture text within the selection, ignoring styling of the selected HTML and most of the media it may contain. What was the reason for this approach? Looking at the annotator demo it seems to capture markup as well, but to be honest I'm a little lost trying to find the relevant code paths to find out if it actually could also capture resources.

  2. One thing that does not quite sit well with me (unless I'm misunderstanding) is the way Hypothesis is using selectors to find out if an annotation is still present. Can't remember if it was an XPath or CSS selector, but either way it seems that it may not be ideal. For example, if page presentation has changed but the content is still there, Hypothesis won't be able to locate annotations. Although I can't think of any better way, except you could still try to just check if the text is present in the document tree and, if it is, try to find its new location. Maybe that's what Hypothesis already does; also it seems that such a search could be a costly task.

@Gozala
Contributor Author

Gozala commented Feb 1, 2017

Our backend is open source and docker based. So if you wanted to, you could host the annotations yourself. 👍

@sean-roberts I did not mean it as a criticism, it's just that I think we would rather store them on the user's machine & not in the cloud.

👎 Saved annotations don't seem to concern themselves with anything but text. Which I find to be unfortunate, especially for the cases where site content changes and there will be no way of seeing what was attempted to be saved.

Yes, if you are referring to being aware of particular sites like Twitter or other post-based sites. We have not quite taken the plunge on app-context-aware annotating, but it is something we'd explore further.

@sean-roberts What I meant is that it's not the best fit for what we're trying to do, which is to fix the No. 1 problem, as per Mozilla user research concerning Saving / Sharing, according to which most users just resort to screenshots / mail as everything else seems unreliable for different reasons:

  • URLs don't always display same content across devices / OS / browsers / users.
  • Content under URLs can be altered or disappear.
  • Often the relevant fragment intended to be shared is buried in the content or is behind the user session, so extra instructions are required along with the URL.
  • Often providing a simple annotation along with the shared content goes a long way.
  • Very few people resort to additional tools, as there are no guarantees that the recipient has, will have, or wants to have them at hand.

With that said we want to build something like pinterest.com where arbitrary page content can be:

  1. Pinned into local library (saved so updates to page won't have effect on it) & make it searchable.
  2. Allow sharing of the pinned content.
  3. Allow annotating pinned content.

Each as a separate milestone. We'd be happy to get any help we can to do this. In the longer term we would like to use this foundation to allow annotating navigation trails.

@BigBlueHat

Greetings @Gozala! First, thank you for what you're building! I've been a distant fan for awhile. 😄 Pretty sure @Treora aims to fix that. 😉

So. Selectors and anchoring. Essentially, the Web Annotation Data Model allows for storing a set of selectors (or lists or choices or composites) for use in (re)anchoring the annotation. Those selectors can be used in a combination of ways but Hypothes.is and others store several and work through each of them until something hopefully anchors to the resource in its current state. If it doesn't, then the annotation is "orphaned" but still associated to the resource itself.
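
To make the "work through each of them" idea concrete, here is a minimal TypeScript sketch of trying stored selectors in order and treating the annotation as orphaned when none resolves. The names `PinAnnotation`, `PinSelector`, `TextSpan`, `reanchor` and the naive `resolveInText` resolver are assumptions invented for illustration; only the `TextQuoteSelector` / `TextPositionSelector` type strings and their `exact`, `start`, `end` fields follow the Web Annotation Data Model. It works on plain text rather than a DOM and is not how Hypothes.is actually implements anchoring.

```typescript
// Hypothetical, trimmed-down shapes loosely following the Web Annotation
// Data Model; only the fields this sketch needs are included.
interface QuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
}
interface PositionSelector {
  type: "TextPositionSelector";
  start: number;
  end: number;
}
type PinSelector = QuoteSelector | PositionSelector;

interface PinAnnotation {
  target: { source: string; selector: PinSelector[] };
}

interface TextSpan {
  start: number;
  end: number;
}

type AnchorResult =
  | { state: "anchored"; span: TextSpan; via: PinSelector["type"] }
  | { state: "orphaned" };

// A resolver maps one selector onto the current text, or gives up with null.
type SelectorResolver = (selector: PinSelector, text: string) => TextSpan | null;

// Try each stored selector in order; the first one that resolves wins.
function reanchor(
  annotation: PinAnnotation,
  text: string,
  resolve: SelectorResolver
): AnchorResult {
  for (const selector of annotation.target.selector) {
    const span = resolve(selector, text);
    if (span !== null) {
      return { state: "anchored", span: span, via: selector.type };
    }
  }
  // Nothing matched: the annotation stays tied to target.source only.
  return { state: "orphaned" };
}

// Deliberately naive resolver, for illustration only.
function resolveInText(selector: PinSelector, text: string): TextSpan | null {
  if (selector.type === "TextPositionSelector") {
    // Only checks that the offsets still fit; it cannot tell whether the
    // characters at those offsets are still the ones originally selected.
    const fits =
      selector.start >= 0 &&
      selector.start < selector.end &&
      selector.end <= text.length;
    return fits ? { start: selector.start, end: selector.end } : null;
  }
  if (selector.exact === "") {
    return null;
  }
  const start = text.indexOf(selector.exact);
  return start === -1 ? null : { start: start, end: start + selector.exact.length };
}
```

Passing the resolver in as a parameter keeps the fallback loop independent of how any single selector type is matched, so a stricter resolver can be swapped in without touching `reanchor`.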

Sometimes, as in editorial reviews, you may actually want an annotation to not re-anchor (i.e. "please remove this sentence."). If the text is gone in the next revision (which may have the original URL and/or a version URL), then the content owner made the change--if it does re-anchor, then someone missed something. 😃

All of what you describe is spot on and matches efforts by loads of people I'd love to connect you with (currently resisting the urge to "mention dump" them all into this thread 😉).

Relatedly, @tilgovi and I are working on getting the Apache Annotator project on its feet--which aims to implement the Web Annotation Data Model and Protocol as well as give a migration path for former Annotator.js developers. It's early stages yet, but that demo mentioned earlier is likely to lay the foundation for some of the DOM-connected bits.

I'll stop here, but I'd be more than happy to chat about this ad infinitum.

Cheers!
🎩

@BigBlueHat

Oh! Also, since you're connected to Mozilla, you might enjoy these past projects that helped inspire me along the way (and which, consequently, I was disappointed to see abandoned...):

  • Pancake - most of the screenshot links are dead or "private" to Moz (sadly)
  • Popcorn.js - video annotation (essentially); again abandoned and demos are defunct...however, I saw folks using it at REST Fest, so there's still hope for it, perhaps. 😁

Just some highlights. 😉

@Gozala
Contributor Author

Gozala commented Feb 1, 2017

@BigBlueHat thanks for more pointers! I was actually just skimming through the annotator-pouchdb code to get a sense of how all the pieces fit together. Also, unless I'm mistaken, all of these projects seem to leverage http://annotatorjs.org, don't they? I'll be looking into that next.

Would it make sense for folks interested in this subject to join our slack channel ? I also don't mind joining whatever the other good place to discuss this would be.

@sean-roberts

@Gozala sure thing.

  • We only do text (for now) because there is a lot of hidden complexity of identifying all of the relevant data to match styling. We allow markdown and such in the annotation text because we can control that as it's just primitive elements in markdown. As far as other media, like images and video, we are exploring those bits but it has interesting implications of storing copyrighted content and such. A bridge we deemed not worth crossing just yet.

  • Regarding selectors, we have a 3 pass process that is pretty good at finding text on pages. Based on these modules for text quote selector, text position selector, and xpath based range selectors.
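
As an illustration of the text quote idea only, here is a small TypeScript sketch that looks up a quoted string in plain text and uses the stored prefix / suffix context, then an optional position hint, to choose between several occurrences. The names `ContextQuote`, `QuoteMatch` and `findQuote` are hypothetical, the matching is exact rather than fuzzy, and it operates on a string instead of the DOM, so it should not be read as the behaviour of the modules referred to above.

```typescript
// Hypothetical shape: the quote plus optional surrounding context, in the
// spirit of a TextQuoteSelector's exact / prefix / suffix fields.
interface ContextQuote {
  exact: string;
  prefix?: string;
  suffix?: string;
}

interface QuoteMatch {
  start: number;
  end: number;
}

// Number of characters a and b share at their ends.
function sharedEnd(a: string, b: string): number {
  let n = 0;
  while (
    n < a.length &&
    n < b.length &&
    a.charAt(a.length - 1 - n) === b.charAt(b.length - 1 - n)
  ) {
    n += 1;
  }
  return n;
}

// Number of characters a and b share at their beginnings.
function sharedStart(a: string, b: string): number {
  let n = 0;
  while (n < a.length && n < b.length && a.charAt(n) === b.charAt(n)) {
    n += 1;
  }
  return n;
}

// Return the occurrence of quote.exact whose surroundings best match the
// stored context; ties go to the occurrence closest to positionHint, or to
// the earliest occurrence when no hint is given.
function findQuote(
  text: string,
  quote: ContextQuote,
  positionHint?: number
): QuoteMatch | null {
  if (quote.exact === "") {
    return null;
  }
  const prefix = quote.prefix === undefined ? "" : quote.prefix;
  const suffix = quote.suffix === undefined ? "" : quote.suffix;
  let best: QuoteMatch | null = null;
  let bestScore = -1;
  let bestDistance = Infinity;
  let start = text.indexOf(quote.exact);
  while (start !== -1) {
    const end = start + quote.exact.length;
    const score =
      sharedEnd(text.slice(0, start), prefix) +
      sharedStart(text.slice(end), suffix);
    const distance =
      positionHint === undefined ? 0 : Math.abs(start - positionHint);
    if (score > bestScore || (score === bestScore && distance < bestDistance)) {
      best = { start: start, end: end };
      bestScore = score;
      bestDistance = distance;
    }
    start = text.indexOf(quote.exact, start + 1);
  }
  return best;
}
```

For example, in the text `"the cat sat. the cat ran. the cat slept."`, the quote `"the cat"` with suffix `" ran"` resolves to the second occurrence, while the same quote with no context and a position hint near the end resolves to the third.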

With that said we want to build something like pinterest.com ...

So the pinterest.com approach, where it's largely a screenshot of part of the page? If that's what you're thinking about doing, the annotation stuff is probably a different direction. It's mostly focused on the references to the raw values (such as the actual text or the video content) and not the stylings/visuals that make it a part of that original page.

But if that is the type of approach you are wanting, perhaps it makes sense to allow user to select an area of the page for pinning. Then you screenshot that piece and grab any text selection inside of that area as well. Allowing you to capture styling/exact versions of the page as the user sees it and have content to make it searchable?

@Gozala
Contributor Author

Gozala commented Feb 1, 2017

@BigBlueHat funny that you mention Pancake, as @gordonbrander, the amazing UX engineer who kick-started this project, used to work on it. Sadly he's no longer with Mozilla, as he's off saving the world from hunger.

@Gozala
Contributor Author

Gozala commented Feb 1, 2017

We only do text (for now) because there is a lot of hidden complexity of identifying all of the relevant data to match styling.

Yeah, I have been considering solutions for some, which is why I'm also curious if you have a more comprehensive list of issues.

We allow markdown and such in the annotation text because we can control that as it's just primitive elements in markdown.

We chatted among the team yesterday and decided that for the first cut we might just capture selections as markdown; it would simplify styling but possibly provide a slightly better connection. We will aim to capture styling in a followup step, mainly so it's easier for the user to identify relevant pins in their own library.

As far as other media, like images and video, we are exploring those bits but it has interesting implications of storing copyrighted content and such. A bridge we deemed not worth crossing just yet.

That's part of why we don't store anything anywhere other than the user's machine, at which point it seems no different than a cache. But honestly I'm no expert here and we'd need to discuss this with legal before we can make it available. Also, can't text be copyrighted as well?

Regarding selectors, we have a 3 pass process that is pretty good at finding text on pages. Based on these modules for text quote selector, text position selector, and xpath based range selectors.

Cool, thanks for the pointers!

With that said we want to build something like pinterest.com ...

So the pinterest.com approach, where it's largely a screenshot of part of the page? If that's what you're thinking about doing, the annotation stuff is probably a different direction. It's mostly focused on the references to the raw values (such as the actual text or the video content) and not the stylings/visuals that make it a part of that original page.

No, we want to allow saving arbitrary fragments of a page that the user selected. Kind of like bookmarking that actually works and can target arbitrary content on sites, not just a URL.

But if that is the type of approach you are wanting, perhaps it makes sense to allow user to select an area of the page for pinning.

That is exactly the interaction we're aiming for.

Then you screenshot that piece and grab any text selection inside of that area as well. Allowing you to capture styling/exact versions of the page as the user sees it and have content to make it searchable?

That is one possibility we're considering, but saving as pictures isn't really ideal. We want to try & capture actual data & maybe bend the rendering engine to manage that, so that we can have different presentation depending on context. But we'll have to see, I guess.

@BigBlueHat

@Gozala I'd be happy to join a slack channel. We don't yet have a chat thing set up for Apache Annotator, but when we do, you're welcome to join! Also, I'm always find-able on IRC (moz, w3c, or freenode).

Many of the projects you'll find do (or did in some fashion) use Annotator.js. However, they've nearly all forked it in some way, and the 2.x pre-release didn't solicit enough interest to "stick"--though it's quite usable. Mostly, it's been a problem of ocean boiling--which is a key thing we hope not to do with Apache Annotator.

@tilgovi's dom-* repos are the future foundation of Apache Annotator and are currently in use (last I checked) in Hypothes.is code also. Essentially, they serve as the initial glue between a Web Annotation Selector and the DOM. Next is adding the actual highlight/selection presentation, then adding UI, etc.

We're still working out the details of how much of that Apache Annotator will provide and how much we'll leave to the wonderfully crowded world of JS & CSS frameworks. 😉

@sean-roberts

Yeah, I have been considering solutions for some, which is why I'm also curious if you have a more comprehensive list of issues.

We chatted among the team yesterday and decided that for the first cut we might just capture selections as markdown; it would simplify styling but possibly provide a slightly better connection. We will aim to capture styling in a followup step, mainly so it's easier for the user to identify relevant pins in their own library.

So, I am curious how you plan to convert selections to markdown and have that make styling easier? Because if you're going to try to match styling of the original site, you're better off to use the outerHTML of your selection instead of trying to map html to markdown and then map the stylings of every element. Which points back to your first question about issues. The main issue is that it's easier said than done to grab all of the styling for every element and then have it be presented in a view for the user. On top of that, as users get more and more annotations, your UI becomes a rather random collection of text stylings which includes more complexity to show, load, and maintain. So that's the big reason why we didn't go down that rabbit hole. We opted to keep it simple and target getting conversations going (for our users the need to match styling is almost nil).

No, we want to allow saving arbitrary fragments of a page that the user selected. Kind of like bookmarking that actually works and can target arbitrary content on sites, not just a URL.

That is one possibility we're considering, but saving as pictures isn't really ideal. We want to try & capture actual data & maybe bend the rendering engine to manage that, so that we can have different presentation depending on context.

So the suggestion I made, helped you do two things, keep style matching easy (it's in the image exactly) and store actual text fragments. Selections are largely immutable once they are made. So having the data backed behind the image, you can present the view you want (styled) and have it be searchable based on the selection. So you would store image and selection contents. The image can be in a dataURI format which is still likely to be much easier than storing html, styling rules, etc. - But again, only necessary if you think the end user wants their selection to match the visuals as well
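
A rough TypeScript sketch of the "image plus selection text" record described above: the screenshot preserves the look, the captured text keeps the pin searchable. `ScreenshotPin`, its field names and `searchPins` are hypothetical names chosen for illustration, and the simple case-insensitive substring search is an assumption rather than anything proposed in this thread.

```typescript
// Hypothetical record for one pin: the image keeps the original look,
// the text keeps it searchable.
interface ScreenshotPin {
  sourceUrl: string;
  imageDataUri: string; // e.g. "data:image/png;base64,..."
  selectedText: string;
}

// Case-insensitive substring search over the captured text only.
function searchPins(pins: ScreenshotPin[], query: string): ScreenshotPin[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") {
    return [];
  }
  return pins.filter(
    (pin) => pin.selectedText.toLowerCase().indexOf(needle) !== -1
  );
}
```

The image is never inspected by the search, which is the point of storing the selection text alongside it.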

@Gozala
Contributor Author

Gozala commented Feb 2, 2017

So, I am curious how you plan to convert selections to markdown and have that make styling easier?

You have misunderstood my comment, I said:

it would simplify styling

Meaning that we'd drop page styles & instead use general styles for used elements. Kind of capture selections in reader view style.

Because if you're going to try to match styling of the original site, you're better off to use the outerHTML of your selection instead of trying to map html to markdown and then map the stylings of every element.

Strategy I was aiming to go for is to walk DOM tree up from the selected nodes dropping all the siblings and inlining all relevant styles while also rewriting positioning to take into account dropped nodes. Then test out the algorithm on http://www.alexa.com/topsites in an attempt to get a result we'd be happy with.
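
A minimal TypeScript sketch of the pruning and inlining part of that strategy, run on a plain object tree instead of the real DOM. `PinNode`, `pruneToSelection` and `toInlineHtml` are hypothetical names; `style` stands in for styles that a real implementation would have to read from the page, and the positioning rewrite is not attempted here at all.

```typescript
// Hypothetical stand-in for a DOM element; `style` plays the role of the
// already-resolved styles of that element.
interface PinNode {
  tag: string;
  style: { [property: string]: string };
  text: string;
  children: PinNode[];
}

// Copy `node`, keeping only selected nodes (with their whole subtree) and
// the ancestors needed to reach them. Siblings off that path are dropped.
function pruneToSelection(
  node: PinNode,
  selected: PinNode[],
  insideSelection: boolean = false
): PinNode | null {
  const keepAll = insideSelection || selected.indexOf(node) !== -1;
  const children: PinNode[] = [];
  for (const child of node.children) {
    const kept = pruneToSelection(child, selected, keepAll);
    if (kept !== null) {
      children.push(kept);
    }
  }
  if (!keepAll && children.length === 0) {
    return null;
  }
  const style: { [property: string]: string } = {};
  for (const property of Object.keys(node.style)) {
    const value = node.style[property];
    if (value !== undefined) {
      style[property] = value;
    }
  }
  // A pure ancestor keeps its tag and style but loses its own text.
  return { tag: node.tag, style: style, text: keepAll ? node.text : "", children: children };
}

function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize the pruned tree with every node's style inlined as an attribute.
function toInlineHtml(node: PinNode): string {
  const declarations: string[] = [];
  for (const property of Object.keys(node.style)) {
    declarations.push(property + ": " + node.style[property]);
  }
  const attr =
    declarations.length === 0
      ? ""
      : ' style="' + escapeHtml(declarations.join("; ")) + '"';
  let inner = escapeHtml(node.text);
  for (const child of node.children) {
    inner += toInlineHtml(child);
  }
  return "<" + node.tag + attr + ">" + inner + "</" + node.tag + ">";
}
```

The sketch walks down from the root rather than up from the selection, which yields the same kept path for a tree without parent pointers.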

Which points back to your first question about issues. The main issue is that it's easier said than done to grab all of the styling for every element and then have it be presented in a view for the user.

Agreed, it's not an easy task; the question for me is whether it is doable reliably at all.

On top of that, as users get more and more annotations, your UI becomes a rather random collection of text stylings which includes more complexity to show, load, and maintain.

There is definitely a challenge in presenting it in a nice & usable way, but I suspect that's going to be a lesser issue than capturing everything as appropriate.

So that's the big reason why we didn't go down that rabbit hole. We opted to keep it simple and target getting conversations going (for our users the need to match styling is almost nil).

That makes sense, especially given that Hypothesis is solving a somewhat different problem.

So the suggestion I made, helped you do two things, keep style matching easy (it's in the image exactly) and store actual text fragments. Selections are largely immutable once they are made. So having the data backed behind the image, you can present the view you want (styled) and have it be searchable based on the selection. So you would store image and selection contents. The image can be in a dataURI format which is still likely to be much easier than storing html, styling rules, etc. - But again, only necessary if you think the end user wants their selection to match the visuals as well

You are right, it definitely makes capturing a lot simpler, but there are drawbacks: even though the library is still searchable, the actually saved content isn't really interactive (can't select, copy & paste, edit, further annotate, etc...). Which is why I still would like to see if actual capturing is going to work. Here is how I envision going about it:

  1. Start capturing selection as markdown (to preserve basic styling & images).
  2. In the followup iteration also capture the segment of the DOM tree with style inlining etc. That would allow us to present in either mode.
  3. If No 2 turns out to be unworkable, add capturing screenshots, so that we could use images in the library view, but display captured content in markdown when entered.
  4. Use something like vibrant.js to analyze captured images and try to display the markdown in a familiar palette, as it may work even better than actual styles.
  5. Maybe capture both image and simplified DOM tree and compare; if they're alike use No 2, otherwise fall back to either No 3 or No 4.

Along the way we can talk to the Servo team to find out if we could expose an API to decide what styles contribute to the presentation and in which ways. It could be that the rendering engine could provide us with all we need, as it already has the information.

@Treora

Treora commented Feb 2, 2017

Strategy I was aiming to go for is to walk DOM tree up from the selected nodes dropping all the siblings and inlining all relevant styles while also rewriting positioning to take into account dropped nodes.

If this works it would be nice to build a reusable module for this. Would perhaps fiddling a bit with window.getSelection().getRangeAt(0).cloneContents() and window.getComputedStyle(element) already do the trick?

Likewise a module could be made for the backup plan, of creating a screenshot of the area containing a selection. Perhaps using (Range.getBoundingClientRect) and then overlaying a canvas element and grabbing its contents (like so)?

I'd gladly consume (and help with) both such modules.

@davidar

davidar commented Feb 2, 2017

It seems like there's two, somewhat independent, issues here:

  • archiving a copy of a page (as it exists at a certain point in time), including styling, images, etc
  • annotating the archived copy, which doesn't need to concern itself much with the presentation as this can be extracted from the archived copy as needed

@BigBlueHat

@Treora @Gozala I'd also be more than happy to help extract the more broadly useful bits from the https://github.com/mozilla-services/pageshot/ code (or similar). It's broadly useful in and out of Web Extension code, and it's something I'd very much like Apache Annotator to provide eventually.

@Gozala
Contributor Author

Gozala commented Feb 2, 2017

It seems like there's two, somewhat independent, issues here:

  • archiving a copy of a page (as it exists at a certain point in time), including styling, images, etc
  • annotating the archived copy, which doesn't need to concern itself much with the presentation as this can be extracted from the archived copy as needed

That is an interesting spin on this, I'll have to think more about it if it can be viable solution to problem we're trying to solve. Let me try to elaborate - This work is driven by the user study that Mozilla did and I would really recommend reading it through:

Idea of capturing specific fragments, fits really well (at least in my mind) with Organize and even more so for a Synthesize use case.

Organize

I imagine organizing "pinned content" in a gallery- or magazine-styled user library that can be organized by subject & tags & captured metadata & well, it's searchable.

Synthesize

One important aspect for me with regard to organizing magazines / galleries / catalogs (whatever we end up naming that) is that it should happen without the user's direct input. Not everyone is organized & the only tool that will work for those who are not is the tool that just does it for them. In other words, "pinned content" should automatically get synthesized into magazines that are an attractive way to revisit discoveries / results of research. Sure, we need to allow users to manually organize & synthesize if they wish to do so, but that's not as important in this context.

Now if we were to attempt to build such a library into the browser, I can't really imagine an attractive and useful version of it which has catalogs presented as a series of cards of plain text, or screenshots, or full page views. To be clear, I'm not saying it's not possible, it's just that I have a hard time imagining it, but maybe @patrykadas can?

@sean-roberts

@Gozala Sorry for my confusion here but "matching styling" might need to have an exact definition set to it.

  • Are you wanting to take the text contents that are selected, find all of the elements and add their respective getComputedStyle inline in an attempt to match how the element looks in the original page?

or

  • Crawl the DOM of the selection to create a more primitive markdown version of the nodes and have a basic styling which does not attempt to look like the original page

@ianb

ianb commented Feb 2, 2017

There is certainly some overlap here with Page Shot, especially earlier directions Page Shot took. Now as we're getting close to shipping Page Shot the experience has become much more conservative, capturing just images and basic page metadata.

Page Shot included DOM freezing as part of the experience. I'm still very interested in this capability, but shipping this under the Mozilla name is difficult (security and privacy concerns). Because of this I've recently forked this to pagearchive and the code will disappear from Page Shot soon. I've only started this split so the code is still messy. I'm hoping to provide the functionality on my own server, as I feel comfortable as an individual providing a use-at-your-own-risk service.

There's a few layers to what Page Shot does. The DOM freezing happens primarily in make-static-html.js, with a little of it in extractor-worker.js. The split was between a framescript for make-static-html, which runs at high permissions, and a content worker for extractor-worker that runs with lesser permissions. (Web Extensions offer no high-permission worker that can penetrate cross-domain barriers, which is disappointing.)

Before freezing we would try to add ids to every element, to make addressability easier. Though technically creating a CSS selector to address any element (given a static DOM) is not particularly hard either.
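
As a rough illustration of the "CSS selector for any element" idea, here is a minimal sketch. It is not Page Shot's code: `selectorFor` is a hypothetical helper, it assumes ids are unique and need no CSS escaping, and it assumes the element is attached to a document (so the topmost ancestor is `:root`). It only reads `parentElement`, `children`, `id` and `localName`.

```javascript
// Sketch: build a CSS selector path for an element in a static DOM.
// Works on real DOM elements and on plain objects shaped like them.
const selectorFor = (element) => {
  const steps = [];
  let node = element;
  while (node && node.parentElement) {
    if (node.id) {
      // Assumption: ids are unique, so the path can stop here.
      // (A real implementation would escape the id, e.g. with CSS.escape.)
      steps.unshift(`#${node.id}`);
      return steps.join(" > ");
    }
    // nth-child is 1-based and counts element siblings only.
    const siblings = node.parentElement.children;
    const index = Array.prototype.indexOf.call(siblings, node) + 1;
    steps.unshift(`${node.localName}:nth-child(${index})`);
    node = node.parentElement;
  }
  // Reached the element without a parent: the document root.
  steps.unshift(":root");
  return steps.join(" > ");
};
```

In a browser, `document.querySelector(selectorFor(element))` should return the same element as long as the DOM has not changed in between, which is why the static (frozen) DOM matters here.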

The DOM freezing tries to eliminate any element that isn't visible, as we are preparing to share the content, and there may be hidden information that the user doesn't realize would be shared. That's probably incidental to these use cases. Of course we also remove scripts, which shouldn't (and generally can't) run.

Another topic is all the embedded resources in the page – images and CSS primarily. Page Shot identifies these and rewrites the page replacing each resource URL with a UUID. This makes it a simple string substitution to put those resources elsewhere (we thought about storing them, but out of laziness have only proxied them). CSS can be recursive, so proxying is an incomplete solution. I had pretty good CSS inlining happening (translating all the CSS to a single <style> tag), but it breaks in a Web Extension context.
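
The "replace each resource URL with an id, then substitute later" idea can be sketched as plain string substitution. This is only an illustration, not Page Shot's implementation: `freezeResources`, `thawResources` and the `makeId` generator are hypothetical names, and the list of resource URLs is assumed to have been collected elsewhere.

```javascript
// Escape a string so it can be used literally inside a RegExp.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Replace every literal occurrence of `from` with `to`.
const replaceEvery = (text, from, to) =>
  text.replace(new RegExp(escapeRegExp(from), "g"), () => to);

// Swap each resource URL in serialized HTML for an opaque id.
// `makeId` is a hypothetical id generator (for example a UUID function).
const freezeResources = (html, urls, makeId) => {
  const resources = {};
  // Longest URLs first, so a URL that is a prefix of another one does
  // not clobber the longer match.
  const ordered = Array.from(new Set(urls)).sort((a, b) => b.length - a.length);
  let frozen = html;
  for (const url of ordered) {
    const id = makeId();
    resources[id] = url;
    frozen = replaceEvery(frozen, url, id);
  }
  return { html: frozen, resources };
};

// Later: point every id at wherever that resource ended up
// (a proxy URL, a blob URL, ...). Assumes no id is a prefix of another,
// which holds for fixed-length ids such as UUIDs.
const thawResources = (html, locations) =>
  Object.keys(locations).reduce(
    (result, id) => replaceEvery(result, id, locations[id]),
    html
  );
```

Note that this only rewrites URLs that appear literally in the markup; resources referenced from inside CSS files would still need the recursive handling mentioned above.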

The frozen DOM is a nice context for extracting or adding other information to a page. For instance, when Page Shot had the ability to save a clipped image and save the DOM, we'd annotate the clip with its position in the DOM (finding nearby anchor nodes, and giving pixel offsets). This made it reasonable to locate the clip in the DOM in a resolution-independent manner.

We also captured the readable view of a page, as a way of trying to distill the page in some sense. But Readability.js is a really slow library and that caused problems. An area I had hoped to explore further was extracting other semantic information from pages at that moment. I haven't kept track of Fathom development, but maybe they've continued on that path. There was some mobile work around cards that also went in that direction, but I believe that was put aside.

There was also some experimental support for extracting text clips. If you selected text it would capture the selection and some surrounding context (when it worked well, the paragraph the selection was contained in). It would try to maintain limited styling, with just a whitelist of tags. I'm not sure where that code went.

The goal with saving the various layers of information was to keep many options open in terms of display. You could display an image, a text clip, you could use the Open Graph information, you can zoom into the frozen page, you have a URL to return to the real page, etc. As a counterexample, if you use the Evernote Web Clipper you have to make lots of choices about what you want to save – I wanted to create an experience in Page Shot where it would save everything it could, and treat other things as annotations (e.g., treating screenshotting as a way of highlighting a certain part of the page).

A long time ago we also tried to capture the navigational context as another annotation on the page. So if there was a search page in the history for that tab/navigation then the query would be another kind of annotation on the page, the why for the page. Two shots with serendipitous overlap might be related, and so on.

Then as all that gets collected we make a big ol' JSON file that includes everything. In some ways that object is the most interesting part, the other things are all there in the service of producing that object. That itself also opens up a question: at what moment do you create that object? The page can change, but the object can't along with it. We treat the user's intent to save something as the moment in time to save something, and if the user invokes that multiple times on the same page then we create multiple objects.

Anyway, there's a big dump of information about Page Shot/Page Archive. This is still something that interests me a great deal, even if Page Shot itself is moving in a more mechanical and less navigational direction, so it would be nice to find a home for some of its ideas.

@patrykadas
Collaborator

Now, if we were to attempt to build such a library into the browser, I can't really imagine an attractive and useful version of it that presents catalogs as a series of cards of plain text, screenshots, or full page views.

Once we can correlate a saved snippet / card to an entry in the history (i.e. a particular page within a particular trail), I think I can come up with some ideas. This would be a really nice feature, as we could show content that's connected not only structurally, but also semantically. I remember that some people requested topic-oriented browser sessions.

@BigBlueHat

Whatever gets built out of this (and the related conversations), I'd love to see it be as small, modular, and community built as possible. Too many amazing bits get lumped into larger projects and are ultimately lost in the long run. I'd very much like to not see that happen again. This stuff is too promising. 😃

@Gozala
Contributor Author

Gozala commented Feb 3, 2017

Alright, I have followed the rabbit hole of the Web Annotation spec. There are some useful bits, but some just seem over-engineered or under-engineered. As I was trying to build up an understanding of the spec, I wrote down type signatures (inline below) for Web Annotation Selectors, as my No. 1 goal would be to create a pair of functions that can translate to / from DOM Range and Web Annotation Selectors:

/* @flow */

type StringEncodedCSSSelector = string

type CSSSelector = {
  type: "CssSelector",
  value: StringEncodedCSSSelector,
  refinedBy?: Selector
}

type XPath = string

type XPathSelector = {
  type: "XPathSelector",
  value: XPath,
  refinedBy?: Selector
}


type FragmentSpecification =
  | 'http://tools.ietf.org/rfc/rfc3236'
  | 'http://tools.ietf.org/rfc/rfc3778'
  | 'http://tools.ietf.org/rfc/rfc5147'
  | 'http://tools.ietf.org/rfc/rfc3023'
  | 'http://tools.ietf.org/rfc/rfc3870'
  | 'http://tools.ietf.org/rfc/rfc7111'
  | 'http://www.w3.org/TR/media-frags/'
  | 'http://www.w3.org/TR/SVG/'
  | 'http://www.idpf.org/epub/linking/cfi/epub-cfi.html'

type FragmentSelector = {
  type: "FragmentSelector",
  conformsTo: FragmentSpecification,
  value: string
}

type TextQuoteSelector = {
  type: "TextQuoteSelector",
  exact: string,
  prefix: string,
  suffix: string,
  refinedBy?: Selector
}

type Integer = number

type TextPositionSelector = {
  type: "TextPositionSelector",
  start: Integer,
  end: Integer,
  refinedBy?: Selector
}

type DataPositionSelector = {
  type: "DataPositionSelector",
  start: Integer,
  end: Integer,
  refinedBy?: Selector
}

type SerializedSVG = string

type SVGSelector = {
  type: "SvgSelector",
  value: SerializedSVG,
  refinedBy?: Selector
}

type RangeSelector = {
  type: "RangeSelector",
  startSelector: Selector,
  endSelector: Selector,
  refinedBy?: Selector
}

type Selector =
  | FragmentSelector
  | CSSSelector
  | XPathSelector
  | TextQuoteSelector
  | TextPositionSelector
  | DataPositionSelector
  | SVGSelector
  | RangeSelector

Not everything can be entyped, due to the direction chosen by the spec. Here is the list; it would be nice to provide this feedback to the working group somehow:

  1. FragmentSelector is an odd one; essentially it's just a proxy to one of the (less privileged) selectors (from the table below) that happens to be string encoded instead of being JSON encoded like the other ones.
  2. As you can see in Example 8 (inlined below for convenience), some of the selectors can presumably outsource their parts in an RDF-like manner.
    {
        "source": "http://example.org/map1",
        "selector": {
          "type": "SvgSelector",
          "id": "http://example.org/svg1"
        }
    }
    I'm inclined to not support that, as it can't be entyped in the current form and also adds a lot of complexity
  3. I hope that the Selector Refinement example (inlined below for convenience) has an error and conformsTo is missing by mistake; otherwise I am not following the FragmentSelector spec.
    {
        "source": "http://example.org/page1",
        "selector": {
          "type": "FragmentSelector",
          "value": "para5",
          "refinedBy": {
            "type": "TextQuoteSelector",
            "exact": "Selected Text",
            "prefix": "text before the ",
            "suffix": " and text after it"
          }
        }
    }
    I'll just assume that example meant to have "conformsTo": "http://tools.ietf.org/rfc/rfc3236" there.
  4. I kind of wish that the selector composition addressed by Selector Refinement was defined as a separate selector type, or just a delimiter, instead of adding an optional property to each and every selector type, especially since it makes very little sense for some of them. But not a huge deal.
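
To make the last point a bit more concrete, here is a minimal sketch of what treating refinement as a plain sequence could look like: a nested `refinedBy` chain is flattened into an array of selectors, outermost first. `flattenRefinements` is a hypothetical helper and not part of the spec or of any library.

```javascript
// Flatten a `refinedBy` chain into a list of selectors, outermost first.
// Each entry is a copy of the original selector without `refinedBy`.
const flattenRefinements = (selector) => {
  const steps = [];
  let current = selector;
  while (current) {
    const { refinedBy, ...step } = current;
    steps.push(step);
    current = refinedBy;
  }
  return steps;
};

// The Selector Refinement example from above, as a value.
const refined = {
  type: "FragmentSelector",
  value: "para5",
  refinedBy: {
    type: "TextQuoteSelector",
    exact: "Selected Text",
    prefix: "text before the ",
    suffix: " and text after it"
  }
};

// Two steps: the FragmentSelector, then the TextQuoteSelector.
const steps = flattenRefinements(refined);
```

This only follows `refinedBy`; the `startSelector` and `endSelector` of a RangeSelector are left nested as they are.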

With that said, my plan is to implement the following (assumes the types from above):

const rangeToSelector = (range:Range):null|Selector => { ... }
const rangeFromSelector = (selector:Selector):Range => { ... }

I would like to possibly provide a parameter to configure the strategy used for selector generation. For example, it may be preferable to use TextPositionSelector over TextQuoteSelector or vice versa, or to pick a strategy for generating a CssSelector, such as preferring child offsets over class names.
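
One possible shape for such a strategy parameter, sketched on plain strings so it does not depend on the DOM. Everything here is an assumption for illustration: `describeText`, the `prefer` option and the `context` length are hypothetical names, not part of the planned API.

```javascript
// Describe the slice [start, end) of `text` either by position or by quote,
// depending on the (hypothetical) `prefer` option.
const describeText = (text, start, end, options = {}) => {
  const { prefer = "position", context = 32 } = options;
  if (prefer === "quote") {
    return {
      type: "TextQuoteSelector",
      exact: text.slice(start, end),
      // Up to `context` characters on each side, to disambiguate the quote.
      prefix: text.slice(Math.max(0, start - context), start),
      suffix: text.slice(end, end + context)
    };
  }
  return { type: "TextPositionSelector", start, end };
};

const text = "Some text before the Selected Text and text after it.";
const start = text.indexOf("Selected Text");
const end = start + "Selected Text".length;

const byPosition = describeText(text, start, end);
const byQuote = describeText(text, start, end, { prefer: "quote", context: 16 });
```

A position selector is compact but breaks as soon as the text before it changes, while a quote selector survives such edits at the cost of being ambiguous when the quoted text repeats, which is why making the choice configurable seems useful.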

@Gozala
Contributor Author

Gozala commented Feb 3, 2017

I also still need to dig into the part of the spec regarding States; it might have something relevant.

@tilgovi

tilgovi commented Feb 3, 2017

You will definitely want to look at my libraries, tilgovi/dom-anchor-text-position and tilgovi/dom-anchor-text-quote as these do exactly that conversion for those two selector types. I would jump in to help with XPath as well, if it's useful to you to have this.

The libraries don't actually return JSON-LD because they're meant to be deeply unopinionated.

@Gozala
Contributor Author

Gozala commented Feb 3, 2017

@tilgovi I was actually looking at those. I'm a little confused regarding the root argument, to be honest. Is there a place we could chat? I think that would really help.

Thanks!

@Gozala
Contributor Author

Gozala commented Feb 3, 2017

To be more clear, what I don't understand is why not just use range.startContainer. It seems like resolving it to another anchor is somewhat irrelevant in that context.

@tilgovi

tilgovi commented Feb 3, 2017 via email

@Gozala
Contributor Author

Gozala commented Feb 3, 2017

Sure. Any preference as to the tool for chat?

How about our Slack channel? Anything else works as well.

@tilgovi

tilgovi commented Feb 3, 2017

Happy to, but I don't know what Slack organization or channel. Do I need an invite?

@Gozala
Contributor Author

Gozala commented Feb 3, 2017

There is a link in the Readme, copying it here for convenience:

slack

@tilgovi

tilgovi commented Feb 3, 2017

The root argument is there to make things explicit and not rely on the browser range to be aware of the appropriate context for the operation.

You are welcome to pass the common ancestor container, or anything else. Often, I've found it helpful to measure from some root that the application finds appropriate, like the body element, or main element, or some other block that's meaningful to the application.

Again, my libraries try to be flexible, low level, and explicit wherever possible.
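
To illustrate why an explicit root matters, here is a minimal sketch of measuring a text position relative to a chosen root. It is not the code of the libraries mentioned above: `textNodes` and `offsetFromRoot` are hypothetical helpers, and the sketch assumes the boundary container is a text node (a real Range boundary can also sit in an element).

```javascript
const TEXT_NODE = 3;

// Yield every text node under `root` in document order.
function* textNodes(root) {
  if (root.nodeType === TEXT_NODE) {
    yield root;
  } else {
    for (const child of Array.from(root.childNodes || [])) {
      yield* textNodes(child);
    }
  }
}

// Convert a (text node, offset) boundary into a character offset measured
// from the start of `root`'s text. Returns null if `node` is not under `root`.
const offsetFromRoot = (root, node, offset) => {
  let position = 0;
  for (const text of textNodes(root)) {
    if (text === node) return position + offset;
    position += text.data.length;
  }
  return null;
};
```

The same boundary yields different numbers depending on the root it is measured from (the body, a main element, the common ancestor container), so a stored position is only meaningful together with the root it was computed against.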

@Gozala
Contributor Author

Gozala commented Feb 9, 2017

Status update

  • I have chatted with @tilgovi on Slack about the Web Annotation spec, and Selectors in particular. I was very confused about the RangeSelector. For example, if you have a DOM tree such as this one:

     a
    / | \
    c d e
    

    To express a selection from c's 5th char to e's 7th char from the end, you'd need to use a RangeSelector, something along the lines of the following:

    {
      "type": "RangeSelector",
      "startSelector": {
        "type": "CssSelector",
        "value": "a > c",
        "refinedBy": {
          "type": "TextPositionSelector",
          "start": 3,
          "end": 7
        }
      },
      "endSelector": {
        "type": "CssSelector",
        "value": "a > e",
        "refinedBy": {
          "type": "TextPositionSelector",
          "start": 0,
          "end": 7
        }
      }
    }

    But then selector.startSelector.refinedBy.end and selector.endSelector.refinedBy.start make very little sense, as you don't want to have an end in the startSelector and you don't want to have a start in the endSelector.

    It seemed to me that instead of using TextPositionSelector there needs to be a TextOffsetSelector, so that the above selector could be expressed in a more straightforward way
    (Note: assuming that e consists of 80 characters):

    {
      "type": "RangeSelector",
      "startSelector": {
        "type": "CssSelector",
        "value": "a > c",
        "refinedBy": {
          "type": "TextOffsetSelector",
          "offset": 3
        }
      },
      "endSelector": {
        "type": "CssSelector",
        "value": "a > e",
        "refinedBy": {
          "type": "TextOffsetSelector",
          "offset": 73
        }
      }
    }

    @tilgovi told me that the working group meant for a TextOffsetSelector to be expressed via a TextPositionSelector whose start and end values are the same, which may not be intuitive but does make sense.

  • I have decided to work on code that takes a DOM selection Range and returns a "somewhat normalized" selector as per the Web Annotation spec. By "somewhat normalized" I mean that the selector will be computed as follows:

    • Take range.commonAncestorContainer
      • If it is an Element node, generate a CssSelector for it.
      • If it is a Text node, generate a CssSelector for its parentElement.
    • Refine the generated CssSelector with a RangeSelector where startSelector is a selector for range.startContainer and endSelector is a selector for range.endContainer.
    • Both the startSelector and endSelector selectors are generated by calculating a CssSelector to a Text node, refined by a TextPositionSelector with the same start and end values.

    So the previous example would have a selector like this:

    {
      "type": "CssSelector",
      "value": ":root > a",
      "refinedBy": {
        "type": "RangeSelector",
        "startSelector": {
          "type": "CssSelector",
          "value": "c",
          "refinedBy": {
            "type": "TextPositionSelector",
            "start": 3,
            "end": 3
          }
        },
        "endSelector": {
          "type": "CssSelector",
          "value": "e",
          "refinedBy": {
            "type": "TextPositionSelector",
            "start": 73,
            "end": 73
          }
        }
      }
    }
  • I have updated the bookmarklet so that it uses the Meta key (that is, the Command key on OS X) instead of Shift for triggering the visual selector. Shift was not a good choice: if you already have a selection, clicking anything will by default extend it, which did not quite fit our intended behavior. Now you can Meta-click to select the contents of the highlighted element.

  • As it turns out, storing target elements via bounding boxes isn't ideal, because getting the desired element back from a bounding box isn't going to work all that well. Specifically, if you have elements stacked over each other, it would be non-trivial to capture the desired one. Either way, the plan was to use selectors as per the "Web Annotations" spec, so I'll start encoding selections with Selectors instead.

  • I've started writing code to turn DOM selection Range into a selector as described above. Here is the work in progress code. It would make sense to release pieces of it as separate libraries like @tilgovi did, but I'll do it once I have everything working as intended.
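
The "somewhat normalized" selector steps from the status update above can be sketched as follows. This is not the work-in-progress code referred to above, just an illustration under stated assumptions: `normalizeRange` is a hypothetical name, and `cssPath(node, relativeTo)` is a hypothetical caller-supplied function that returns a CSS selector string for `node` (relative to `relativeTo`, or to the document when it is `null`). Offsets are taken from the Range as-is.

```javascript
const ELEMENT_NODE = 1;

// A collapsed TextPositionSelector (start === end) plays the role of the
// "TextOffsetSelector" discussed above.
const collapsed = (offset) => ({
  type: "TextPositionSelector",
  start: offset,
  end: offset
});

// One boundary of the range: a CSS path refined by a collapsed position.
const boundary = (path, offset) => ({
  type: "CssSelector",
  value: path,
  refinedBy: collapsed(offset)
});

// `range` only needs the standard Range properties used below.
const normalizeRange = (range, cssPath) => {
  const container = range.commonAncestorContainer;
  // Text nodes cannot be addressed by CSS, so fall back to the parent.
  const ancestor =
    container.nodeType === ELEMENT_NODE ? container : container.parentElement;
  return {
    type: "CssSelector",
    value: cssPath(ancestor, null),
    refinedBy: {
      type: "RangeSelector",
      startSelector: boundary(
        cssPath(range.startContainer, ancestor),
        range.startOffset
      ),
      endSelector: boundary(
        cssPath(range.endContainer, ancestor),
        range.endOffset
      )
    }
  };
};
```

Keeping `cssPath` as a parameter leaves room for the different selector-generation strategies (child offsets versus class names) mentioned earlier in the thread.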

@Gozala
Contributor Author

Gozala commented Feb 9, 2017

@tilgovi If you want to help out you could maybe work on code that would take a selector as described above and produce a visual highlighting similar to the way hypothesis does it if such fragment exists on the page.

Another area we could use help with is taking a selector as described above and extracting the content from the document as markdown. In other words, mapping a DOM Range to markdown and capturing things like images. There is a good amount of existing code doing something along those lines, as pointed out in the comment above.

@BigBlueHat

@Gozala really glad you and @tilgovi connected. If possible, I'd love for you to "say hi" on the Apache Annotator mailing list. @tilgovi, I, and others are working to get that community on its feet, building and growing a code foundation around the Web Annotation specs and the Annotator.js fork-ers. 😃

Also, depending on how quickly this comes together, it might be possible to get this project listed as an implementer of the W3C Web Annotation Data Model. Here's the current test results (fwiw). The specs should be reaching "Published Recommendation" status Real Soon Now. /cc @azaroth42 @iherman

I've signed up for a Slack channel invite also in case you have other questions. Cheers!

@Gozala
Contributor Author

Gozala commented Feb 27, 2017

I did a poor job at posting updates last couple of days, but I'll try to do better - post daily updates. Here is a summary of where things are now:

  1. I have moved from gists to a full repo on github as it made more sense given the amount of code accumulated. So code for this effort lives under conservateur repo.

  2. I spent quite some time implementing range highlighting, tried multiple approaches, and tried to describe the pros and cons of each as I explored them:

    There were a few other things I tried, like replacing image URLs with a blob URL for the modified image content, but given the limitations I faced I gave up on those early enough that I did not spend time publishing them as gists. For now the last approach seems the most promising, and also the most extensible, meaning that support for more element types could be added along the way.

  3. I had an interesting meeting with @Gerben where we tried to figure out a strategy for collaboration. We took some notes, but I'm afraid they're too chaotic for anyone else to make sense of. In summary, though, we identified a set of functions that we both need and decided we can collaborate to improve, document, and test the existing implementations. Basically, get to a library that others in the space could and would want to use.

  4. On Friday I found out that my highlighter code did not work properly on GitHub. As it turns out, there were several reasons:

    1. Styling empty text nodes that are just indentation in the markup can mess up rendering. Case in point: styling the <tr>\n <td>.... text node between the tr and td elements in the GitHub source view totally messes up the layout. I ended up ignoring whitespace-only text nodes, which avoids this issue, but I would prefer a better solution (suggestions are welcome!).
    2. Turns out selecting code in github.com's code view creates a selection with multiple ranges; each line of code gets its own range.

    Turns out the DOM Selection API makes a lot more sense than it seemed to me. It also means I need to rethink some of my code to handle this properly. With some tweaks, highlighting selections does work on the github.com code view now.
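
Both findings can be sketched in a few lines. This is an illustration rather than the highlighter's actual code: `rangesOf`, `isBlank` and `highlightable` are hypothetical helpers, while `rangeCount` and `getRangeAt` are the standard Selection API members.

```javascript
// Collect every Range of a Selection instead of assuming there is only one
// (a code view may produce one range per selected line).
const rangesOf = (selection) => {
  const ranges = [];
  for (let index = 0; index < selection.rangeCount; index++) {
    ranges.push(selection.getRangeAt(index));
  }
  return ranges;
};

// True for text nodes that hold only indentation and line breaks,
// such as the "\n " between <tr> and <td>.
const isBlank = (textNode) => /^\s*$/.test(textNode.data);

// Keep only the text nodes that are safe to wrap in a highlight.
const highlightable = (textNodes) =>
  textNodes.filter((node) => !isBlank(node));
```

In a page this would be used as `rangesOf(window.getSelection())`, with each range then handled on its own. Skipping blank nodes is the workaround described above, not a complete fix: whitespace that is actually rendered (for example inside `<pre>`) would be skipped too.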

Thanks everyone for the suggestions here and on Slack and for sharing your code; it really helped! I hope to eventually contribute some of this work back.

@Gozala
Contributor Author

Gozala commented Feb 28, 2017

devlog entry

  • I have tracked down & fixed some bugs I discovered in the implementation while playing with it across different web sites. I was unable to find a case where it does not work.
  • Started entyping the final implementation code from the gist and integrating it into the conservateur repo. Ran into some annoyances due to the use of generators & Babel's dependency on regenerator. Started rewriting iterators & utils as plain iterators, but that's no fun & the code is more complex.
  • Still in progress but I intend to have highlighter working in a bookmarklet by end of the day tomorrow. Once that's done I'll switch my focus to selection extraction & presentation.
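
For a sense of the generator versus plain iterator trade-off mentioned above, here is a minimal sketch of the same depth-first walk written both ways. It is not code from the conservateur repo: `walk` and `walkIterator` are hypothetical names, and nodes are anything with an optional `childNodes` list.

```javascript
// Generator version: depth-first, parent before children.
function* walk(node) {
  yield node;
  for (const child of Array.from(node.childNodes || [])) {
    yield* walk(child);
  }
}

// Hand-written equivalent: the implicit call stack of the generator
// becomes an explicit array that `next` has to manage itself.
const walkIterator = (root) => {
  const stack = [root];
  return {
    [Symbol.iterator]() {
      return this;
    },
    next() {
      if (stack.length === 0) {
        return { done: true, value: undefined };
      }
      const node = stack.pop();
      const children = Array.from(node.childNodes || []);
      // Push in reverse so the first child is visited first.
      for (let index = children.length - 1; index >= 0; index--) {
        stack.push(children[index]);
      }
      return { done: false, value: node };
    }
  };
};
```

Both produce the same visiting order, but the second version avoids generator functions altogether, so it does not need the regenerator runtime when compiled for older targets.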

@davidar

davidar commented Apr 3, 2017

Possibly relevant: https://ctxt.io/
