Getting started

timrdf edited this page Sep 1, 2012 · 74 revisions

See my latest talk on DataFAQs!

What do you want to do?

I want to see results!

Head to http://aquarius.tw.rpi.edu/projects/datafaqs/ to see the 332 lodcloud datasets (those in the LOD Cloud diagram).

I want to know why you're making DataFAQs

What is quality?

We don't know -- and we want to find out. Although we have some ideas about [what makes up data quality](Data Quality), we're sure that others have different, and perhaps better, ideas. That's why DataFAQs is designed so that others can share their views on what makes "good data".

We're kicking around some ideas for using DataFAQs, and jotting notes here:

I want to evaluate others' datasets with others' existing services

Do you want to get your feet wet?

We hope you do!

This is the simplest route, since you don't have to worry about publishing data or writing an evaluation service. After you finish this, hopefully you'll want to move on to analyze your own datasets or [write an evaluation service](FAqT Service) to reflect a quality characteristic that you think is important and that others should apply to their own datasets.

Take this route:

  • [Install DataFAQs](Installing DataFAQs) on your local machine or server.
  • Set up the DATAFAQS environment variables to specify some directory locations and processing options.
  • Write an epoch configuration to [select the evaluation services to inspect](Selecting the evaluation services to apply) and the [datasets to analyze](Selecting the datasets to analyze).
  • Run an epoch to start your analyses, storing the results in a FAqT Brick.
  • Look at the results (by SPARQL-querying the FAqT Brick or using the [default views](FAqT Brick Explorer)).
  • Repeat the analysis every day, and watch the quality of your data grow!
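The last two steps come down to querying the FAqT Brick's SPARQL endpoint. As a minimal sketch using only Python's standard library -- the endpoint URL and the query below are placeholders, not the FAqT Brick's actual location or vocabulary -- this is how such a query could be built:

```python
import urllib.parse
import urllib.request

def build_sparql_request(endpoint, query):
    """Build a GET request for a SPARQL endpoint, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )

# Hypothetical endpoint and query -- substitute your FAqT Brick's
# SPARQL endpoint and a query over its actual vocabulary.
endpoint = "http://localhost:8890/sparql"
query = "SELECT DISTINCT ?dataset WHERE { ?evaluation ?p ?dataset } LIMIT 10"
request = build_sparql_request(endpoint, query)
# urllib.request.urlopen(request) would send it; omitted here.
print(request.full_url)
```

Sending the request (or pointing any SPARQL client at the same endpoint) returns the evaluation results that the [default views](FAqT Brick Explorer) render for you.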

I want to evaluate my own datasets with others' existing services

If you're publishing data, would you like to know what your audience thinks about it? Would you like to get status updates for how well your published data is doing? Would you like concrete, actionable analysis that leads you towards publishing better data?

We do too.

Take this route:

  • Listing your dataset at CKAN is a quick and easy way to announce your dataset. This will let more people find it. Plus, a bunch of systems are built to pull from CKAN's listings (including DataFAQs). So it's a win-win.
  • If you have a pile of datasets and want to avoid manually entering them into CKAN, they have a pretty simple API. Unfortunately, if you want to get into the LOD Cloud, then you have to jump through some extra hoops and follow some barely documented conventions. We think it'd be nicer to describe your datasets using [RDF to begin with](CKAN lodcloud RDF vocabulary), and let some thingamawidget submit it to CKAN for you.
  • If you use some thingamawidget to submit your datasets to CKAN, make sure you're not [missing your CKAN API key](Missing CKAN API Key).
  • DCAT Data Catalog Vocabulary - another convention from which one can find out about datasets.
  • LOD Cloud - the subset of Linked Data that is in the lodcloud CKAN group.

I want to tell people how much I like/dislike their dataset

Are you trying to use other people's data? Are they making it harder than it needs to be for you to use it? Want to let them know? After you go through the hassle of telling them, would you like it if other data publishers heeded your feedback without you having to lift a finger?

We do too.

Take this route:

I want to know how to analyze the results

We do too.

Take this route:

I want nuts and bolts

I want some background

Related work:

  • Pedantic Web Group
  • Integration tools
  • Validation tools
  • Testing apparatuses
  • frbr:lebo2012datafaqs