
HacksQueriesAndScripts

Dan Brickley edited this page Apr 5, 2016 · 3 revisions

Sometimes the best way to query schema.org's dataset is with command-line tools. This is a place to share ugly hacks:

N-Triples

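N-Triples is a simple, line-based W3C format for representing RDF graphs, with one triple per line, which makes it easy to process with grep, awk and sed. schema.org's own data/schema.rdfa is written in RDFa, so it has to be converted to N-Triples first.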

Here is a Python rdflib wrapper that parses RDFa and prints N-Triples, to use e.g. as bin/rdfa:

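#!/usr/bin/env python3
"""Parse an RDFa document (file or URL) and print its triples as N-Triples."""
import sys
from rdflib import Graph

if __name__ == '__main__':
    g = Graph()  # triples extracted from the document
    p = Graph()  # processor graph: parser warnings and errors
    u = sys.argv[1]
    print("URL: ", u)
    # Needs an rdflib installation that provides an RDFa parser plugin.
    g.parse(u, format='rdfa', pgraph=p)
    print("# errors: ")
    # print(p.serialize(format="nt", encoding="utf-8").decode("utf-8"))
    print("# data: ")
    # serialize() returns bytes when an encoding is given, so decode before printing.
    print(g.serialize(format="nt", encoding="utf-8").decode("utf-8"))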

(Uncomment the p.serialize line to print the processor graph, which reports any RDFa syntax errors.)

Awk and sed scripts against N-Triples

  • Properties whose domainIncludes is Vehicle, printed as bare names:
    rdfa ../../schema.rdfa | grep domainIncludes | grep Vehicle | awk '{print $1}' | sed 's#http://schema.org/##' | sed 's/[<>]//g'
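
The same filter can be sketched in plain Python, which is handy when the grep chain starts matching more than intended (for example, a type whose name merely contains "Vehicle"). This is a minimal sketch, not part of the schema.org tooling: the function name `properties_for_domain` and the sample lines are made up for illustration, and the regex only handles triples whose subject, predicate and object are all IRIs.

```python
import re

SCHEMA = "http://schema.org/"
# One N-Triples statement whose subject, predicate and object are all IRIs.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')


def properties_for_domain(lines, type_name):
    """Return bare names of properties whose domainIncludes is exactly type_name."""
    names = []
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match is None:
            continue  # literal-valued triples, comments, banner lines, blanks
        subject, predicate, obj = match.groups()
        if predicate == SCHEMA + "domainIncludes" and obj == SCHEMA + type_name:
            # Strip the namespace, like the sed calls in the shell version.
            if subject.startswith(SCHEMA):
                subject = subject[len(SCHEMA):]
            names.append(subject)
    return names


# Hypothetical sample input, shaped like the output of the rdfa wrapper.
sample = [
    '<http://schema.org/vehicleModelDate> <http://schema.org/domainIncludes> <http://schema.org/Vehicle> .',
    '<http://schema.org/vehicleModelDate> <http://schema.org/rangeIncludes> <http://schema.org/Date> .',
    '<http://schema.org/birthDate> <http://schema.org/domainIncludes> <http://schema.org/Person> .',
    '<http://schema.org/Vehicle> <http://www.w3.org/2000/01/rdf-schema#label> "Vehicle" .',
]
print(properties_for_domain(sample, "Vehicle"))
```

To run it against real data, pass an open file or `sys.stdin` instead of `sample`; lines that are not IRI-only triples are skipped.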

Properties with 'date' in their name

rdfa schema.rdfa | grep -i date | grep Property | awk '{print $1}'
    <http://schema.org/productionDate>
    <http://schema.org/scheduledPaymentDate>
    <http://schema.org/orderDate>
    <http://schema.org/liveBlogUpdate>
    <http://schema.org/candidate>
    <http://schema.org/dateModified>
    <http://schema.org/datePublished>
    <http://schema.org/releaseDate>
    <http://schema.org/candidate>
    <http://schema.org/endDate>
    <http://schema.org/dateline>
    <http://schema.org/startDate>
    <http://schema.org/dissolutionDate>
    <http://schema.org/uploadDate>
    <http://schema.org/vehicleModelDate>
    <http://schema.org/deathDate>
    <http://schema.org/foundingDate>
    <http://schema.org/dateCreated>
    <http://schema.org/purchaseDate>
    <http://schema.org/dateIssued>
    <http://schema.org/birthDate>
    <http://schema.org/previousStartDate>
    <http://schema.org/guidelineDate>
    <http://schema.org/dateVehicleFirstRegistered>
    <http://schema.org/datePosted>
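
The grep chain matches "date" and "Property" anywhere on the line, and the output above repeats one subject (candidate). A stricter variant can be sketched in Python: it keeps only subjects typed as rdf:Property, tests the substring against the local name alone, and de-duplicates. The function name `properties_matching` and the sample lines are hypothetical, chosen only to illustrate the idea.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
# One N-Triples statement whose subject, predicate and object are all IRIs.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')


def properties_matching(lines, needle):
    """Return sorted, unique property IRIs whose local name contains needle."""
    found = set()
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match is None:
            continue  # not an IRI-only triple
        subject, predicate, obj = match.groups()
        if predicate != RDF_TYPE or obj != RDF_PROPERTY:
            continue  # only "X rdf:type rdf:Property" statements count
        local_name = subject.rsplit("/", 1)[-1]
        if needle.lower() in local_name.lower():  # case-insensitive, like grep -i
            found.add(subject)
    return sorted(found)


# Hypothetical sample input; the repeated candidate line mimics the duplicate above.
sample = [
    '<http://schema.org/startDate> <%s> <%s> .' % (RDF_TYPE, RDF_PROPERTY),
    '<http://schema.org/candidate> <%s> <%s> .' % (RDF_TYPE, RDF_PROPERTY),
    '<http://schema.org/candidate> <%s> <%s> .' % (RDF_TYPE, RDF_PROPERTY),
    '<http://schema.org/Date> <%s> <http://www.w3.org/2000/01/rdf-schema#Class> .' % RDF_TYPE,
    '<http://schema.org/name> <%s> <%s> .' % (RDF_TYPE, RDF_PROPERTY),
]
for iri in properties_matching(sample, "date"):
    print("<%s>" % iri)
```

Names such as candidate still match, because "date" is a plain substring test, just as in the shell version.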

Properties whose name starts with 'is', shell script

./scripts/rdfa2nt data/schema.rdfa | grep "Property" | grep '#type' | grep '/is'

SPARQL

Sometimes a real query language is useful.

Links: