This repository has been archived by the owner on Mar 11, 2021. It is now read-only.

Wishlist

Ted Han edited this page Jun 12, 2013 · 3 revisions

This is a page of ideas for the DocumentCloud platform which interested parties can add to or cherry pick from. If you'd like to help out, drop us an email or a tweet!

Annotizer

Provide journalists who have annotated a corpus of documents, and written an article from that research, with a tool that lets them paste in their article text or markup, log into DocumentCloud, highlight portions of the article, and associate a note with a sentence or phrase. The system should then output the resulting HTML and JavaScript.
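As a rough illustration of the output step only, here is a minimal Python sketch that wraps highlighted phrases in HTML spans pointing at notes. Everything in it is an assumption made for illustration: the function name `annotate_article`, the `dc-note` class, the `data-note-url` attribute, and the example note URL are hypothetical, and a real tool would also need the login, highlighting UI, and JavaScript pieces described above.

```python
from html import escape


def annotate_article(article_text, links):
    """Wrap the first occurrence of each phrase in a span that points at a note.

    `links` is a list of (phrase, note_url) pairs. The markup produced here
    (class name, data attribute) is hypothetical, not a DocumentCloud format.
    """
    # Locate the first occurrence of each phrase in the raw article text.
    spans = []
    for phrase, note_url in links:
        start = article_text.find(phrase)
        if start != -1:
            spans.append((start, start + len(phrase), note_url))
    spans.sort()

    pieces, cursor = [], 0
    for start, end, note_url in spans:
        if start < cursor:
            # Skip a phrase that overlaps an earlier highlight.
            continue
        pieces.append(escape(article_text[cursor:start]))
        pieces.append(
            '<span class="dc-note" data-note-url="{}">{}</span>'.format(
                escape(note_url, quote=True), escape(article_text[start:end])
            )
        )
        cursor = end
    pieces.append(escape(article_text[cursor:]))
    return "".join(pieces)


article = "The memo was signed in May. Officials declined to comment."
print(annotate_article(article, [("signed in May", "https://example.org/note/1")]))
```

A small script on the published page could then read each `data-note-url` and show the matching note when the reader clicks or hovers over the highlighted phrase.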

Newsworthy Docs

Use a Twitter search for DocumentCloud URLs as an index for browsing noteworthy documents and the discussions around them. Perhaps also classify the social network discussing each document.
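To make the indexing idea concrete, here is a small Python sketch that ranks documents by how many tweets mention them. It assumes the tweet texts have already been fetched and that shortened links have been expanded; it also assumes document URLs look like `documentcloud.org/documents/<id>-<slug>`. The function name `rank_documents` and the sample tweets are made up for illustration.

```python
import re
from collections import Counter

# Assumed URL shape: .../documents/<numeric id>-<slug>, optionally followed by
# ".html" or a fragment. Adjust the pattern if the real URLs differ.
DOC_URL = re.compile(
    r"https?://(?:www\.)?documentcloud\.org/documents/(\d+-[A-Za-z0-9-]+)",
    re.IGNORECASE,
)


def rank_documents(tweets):
    """Return (document_id, tweet_count) pairs, most-discussed first."""
    counts = Counter()
    for text in tweets:
        # Count each document at most once per tweet.
        for doc_id in {match.lower() for match in DOC_URL.findall(text)}:
            counts[doc_id] += 1
    return counts.most_common()


sample_tweets = [
    "Worth a read: https://www.documentcloud.org/documents/123-some-memo.html",
    "Page 2 is the key part http://documentcloud.org/documents/123-some-memo.html#document/p2",
    "Budget is out https://www.documentcloud.org/documents/456-budget.html",
]
for doc_id, count in rank_documents(sample_tweets):
    print(doc_id, count)
```

Keeping the tweet authors alongside each document id would be a natural next step toward classifying who is discussing each document.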

Comparison view

Use two viewers, plus alignment data for two documents, to link sections or notes across the viewers and highlight similarities. Alternatively, build a system to link or align two documents, either automatically (for example with Superfastmatch) or manually.
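For a sense of what automated alignment data could look like, here is a minimal Python sketch built on the standard library's `difflib.SequenceMatcher`. This is a simple stand-in for a dedicated matching service, not the approach any existing tool uses, and the function name `shared_passages` and the `min_words` threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher


def shared_passages(text_a, text_b, min_words=5):
    """Find word runs that appear in both texts.

    Returns (word_offset_in_a, word_offset_in_b, passage) tuples for every
    matching run of at least `min_words` words.
    """
    words_a, words_b = text_a.split(), text_b.split()
    # autojunk=False keeps frequent words from being ignored in long documents.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    passages = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            passage = " ".join(words_a[block.a:block.a + block.size])
            passages.append((block.a, block.b, passage))
    return passages


doc_a = "The committee finds that the agency failed to disclose the records in time"
doc_b = "Critics say the agency failed to disclose the records before the deadline"
for offset_a, offset_b, passage in shared_passages(doc_a, doc_b):
    print(offset_a, offset_b, passage)
```

The word offsets could be mapped back to pages or note positions so that selecting a passage in one viewer scrolls the other viewer to its counterpart. Word-level matching like this is slow on large documents, so a real system would likely need something faster.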

Dropbox sync

Drop files into a Dropbox folder and have them automatically upload to DocumentCloud. (This one is hard and requires mucking with DocumentCloud internals.)

NYT style inverted embedding

Rather than a viewer with a note/section index, provide an authoring workspace and/or pane in the viewer allowing for a more structured narrative surrounding the notes list. http://www.nytimes.com/interactive/2012/06/29/us/scotus-healthcare-document-annotations.html

Non-linear document exploration

Places like the WSJ have done an exemplary job of providing readers with non-traditional interfaces for exploring collections of documents. Other ways of mashing up notes, document sets, and viewers are ripe for exploration.

http://projects.wsj.com/surveillance-catalog/

Popcorn.js + DocumentCloud

DocumentCloud and the Popcorn.js project put together a basic Popcorn.js plugin allowing Popcorn users to trigger the opening and navigation of documents.

Navis-DocumentCloud Wordpress plugin

NPR's Argo / StateImpact project has created a WordPress plugin for embedding DocumentCloud documents. As it stands, when provided with a DocumentCloud URL, the plugin will create a shortcode.

New types of notes, or Media embedding into notes.

Journalists can annotate documents through DocumentCloud and use HTML in notes to provide more than just text. Images, videos, and other multimedia are all embeddable within DocumentCloud notes. The Wall Street Journal used this to embed clips of President Obama's 2013 State of the Union address in its annotated transcript of the speech. Alternative ways to thread notes with media, and to make them salient and useful to readers, are one avenue to explore.

http://online.wsj.com/article/SB10001424127887323696404578300814056031032.html

Annotating with text or multimedia is only one type of document highlighting. There are other ways that readers might want to interact with a document, or to explore by using a document as an index to other content, or vice versa.

Document browsing via entity lists and relevance.

DocumentCloud makes entities, their position(s), and their relevance scores available via the DocumentCloud API. These figures are currently only lightly used within DocumentCloud, and are not readily accessible to general readers in the viewer or in the search embed. Entities could be used as a way to browse through the pages of a document or, if retrieved as a batch, to search through a collection of documents.
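As a sketch of entity-based browsing, the Python below turns a list of entities into a reader-facing index sorted by relevance. The data shape is an assumption: each entity is represented as a dict with hypothetical `value`, `relevance`, and `pages` keys, which may not match what the API actually returns, and the sample records are invented.

```python
def build_entity_index(entities, min_relevance=0.3):
    """Return (entity, relevance, pages) tuples, most relevant first.

    `entities` is a list of dicts with hypothetical keys "value",
    "relevance" and "pages"; entities below `min_relevance` are dropped.
    """
    index = []
    for entity in entities:
        if entity["relevance"] >= min_relevance:
            pages = sorted(set(entity["pages"]))
            index.append((entity["value"], entity["relevance"], pages))
    # Highest relevance first; ties are broken alphabetically.
    index.sort(key=lambda item: (-item[1], item[0]))
    return index


# Invented sample data for illustration only.
sample_entities = [
    {"value": "Department of Energy", "relevance": 0.62, "pages": [3, 1, 3]},
    {"value": "Springfield", "relevance": 0.12, "pages": [7]},
    {"value": "Jane Doe", "relevance": 0.81, "pages": [2, 5]},
]
for value, relevance, pages in build_entity_index(sample_entities):
    print(value, relevance, pages)
```

Each row of the index could link to the listed pages in the viewer, and merging the indexes of several documents would give a rough entity-based way to search a collection.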

Topic specific entity extractors

DocumentCloud's entity extraction is primarily built on top of Thomson Reuters' black-box OpenCalais system. OpenCalais serves as a fantastic general-purpose entity extractor, but provides no facility for extracting entities specific to a domain or topic. For example, the State Decoded project uses regular expressions to identify references in a state's legal code to other sections or subsections within that legal code. Another example would be identifying the standard references or citations in court documents to court cases or statutes, which might be used to present that information to readers or journalists.
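A regular-expression extractor for legal citations might start out like the Python sketch below. The patterns are illustrative assumptions, not taken from the State Decoded project: they cover only a handful of U.S. reporter abbreviations and the basic "title U.S.C. § section" form, and real citation formats vary far more than this.

```python
import re

# Illustrative patterns only; real-world citations need many more variants.
# Case citation: volume, reporter abbreviation, first page (e.g. "410 U.S. 113").
CASE_CITATION = re.compile(
    r"\b(\d{1,3})\s+(U\.S\.|S\. ?Ct\.|F\.(?:2d|3d)?)\s+(\d{1,4})\b"
)
# Statute citation: title, "U.S.C.", section sign, section number.
STATUTE_CITATION = re.compile(r"\b(\d{1,2})\s+U\.S\.C\.\s+§+\s*(\d+[a-z]?)")


def extract_citations(text):
    """Return citations found in `text`, ordered by position."""
    found = []
    for kind, pattern in (("case", CASE_CITATION), ("statute", STATUTE_CITATION)):
        for match in pattern.finditer(text):
            found.append(
                {"type": kind, "start": match.start(), "text": match.group(0)}
            )
    return sorted(found, key=lambda citation: citation["start"])


sample = "See Roe v. Wade, 410 U.S. 113 (1973), and 18 U.S.C. § 1030."
for citation in extract_citations(sample):
    print(citation["type"], citation["text"])
```

Because each match carries its character offset, the results could be attached to a page position and shown to readers or journalists alongside the general-purpose entities.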

Better ways to display documents in a timeline

There are now a variety of tools that journalists and developers can use to present information on a timeline. Thus far, none of them accommodates embedding documents or notes especially well (or vice versa). Exploring how pieces of documents could be integrated into a timeline has fruitful possibilities.