HacksQueriesAndScripts
Dan Brickley edited this page Apr 5, 2016
Sometimes the best way to query schema.org's dataset is via command-line tools. This is a place to share ugly hacks:
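N-Triples is a simple, line-based W3C format for representing RDF graphs. Converting an RDF source such as schema.org's data/schema.rdfa (written in RDFa) into N-Triples puts one triple per line, so the result can be filtered with grep, awk and sed.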
Here is a Python rdflib wrapper that does the conversion; install it as, e.g., bin/rdfa:
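#!/usr/bin/env python
# Print the RDF found in an RDFa document (file or URL) as N-Triples.
# Needs an older rdflib release that still includes the 'rdfa' parser.
from __future__ import print_function
import sys
from rdflib import Graph

if __name__ == '__main__':
    g = Graph()  # extracted data
    p = Graph()  # processor graph: parser warnings and errors end up here
    u = str(sys.argv[1])
    print("URL: ", u)
    g.parse(u, format='rdfa', pgraph=p)
    print("# errors: ")
    # print(p.serialize(format="nt", encoding="utf-8").decode("utf-8"))
    print("# data: ")
    sys.stdout.flush()
    # serialize() returns bytes when an encoding is given; write them as-is.
    out = getattr(sys.stdout, "buffer", sys.stdout)
    out.write(g.serialize(format="nt", encoding="utf-8"))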
(Uncomment the line that prints the processor graph p to have it report RDFa syntax errors as well.)
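Because every N-Triples line holds exactly one triple, the wrapper's output can also be taken apart in plain Python instead of awk. A minimal sketch, assuming simple lines of the form `<s> <p> <o> .` with nothing after the final dot; `parse_nt_line` and `local_name` are hypothetical helper names, not part of rdflib or the schema.org scripts:

```python
# Hypothetical helpers: split simple N-Triples lines and shorten schema.org IRIs.
SCHEMA = "http://schema.org/"


def parse_nt_line(line):
    """Split '<s> <p> <o> .' into (s, p, o); return None for blanks/comments.

    IRIs and blank node labels never contain whitespace in N-Triples, so the
    first two whitespace-separated fields are subject and predicate; the
    object (possibly a literal with spaces) is the rest, minus the final '.'.
    Lines with fewer than three fields raise ValueError.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    subject, predicate, rest = line.split(None, 2)
    if rest.endswith("."):
        rest = rest[:-1].rstrip()
    return subject, predicate, rest


def local_name(term, base=SCHEMA):
    """'<http://schema.org/Vehicle>' -> 'Vehicle', like the sed clean-up steps."""
    iri = term.strip("<>")
    return iri[len(base):] if iri.startswith(base) else iri
```

Taking element `[0]` of the returned tuple gives the same column as `awk '{print $1}'`, and `local_name` does the job of the two `sed` substitutions used on this page.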
List properties whose domainIncludes lines mention Vehicle, as bare names:
rdfa ../../schema.rdfa | grep domainIncludes | grep Vehicle | awk '{print $1}' | sed 's#http://schema.org/##' | sed 's/[<>]//g'
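The Vehicle pipeline can be written in Python too. This sketch (hypothetical function `properties_with_domain`, assuming IRI-only triples one per line, as the wrapper produces) compares the predicate and object exactly, so unlike `grep Vehicle` it does not pick up lines where "Vehicle" only appears elsewhere, such as inside the subject IRI:

```python
# Sketch: exact-match version of
#   grep domainIncludes | grep Vehicle | awk '{print $1}' | sed ... | sed ...
SCHEMA = "http://schema.org/"


def properties_with_domain(nt_lines, domain):
    """Sorted local names of subjects with schema:domainIncludes schema:<domain>."""
    wanted_p = "<%sdomainIncludes>" % SCHEMA
    wanted_o = "<%s%s>" % (SCHEMA, domain)
    found = set()  # a set also removes duplicate lines
    for line in nt_lines:
        parts = line.split()
        # Skip blank lines, comments and anything shorter than s p o.
        if len(parts) < 3 or parts[0].startswith("#"):
            continue
        subject, predicate, obj = parts[0], parts[1], parts[2]
        if predicate == wanted_p and obj == wanted_o:
            name = subject.strip("<>")
            if name.startswith(SCHEMA):
                name = name[len(SCHEMA):]
            found.add(name)
    return sorted(found)
```

To use it on real data, pass the lines of the wrapper's output, e.g. an open file of N-Triples.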
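Properties with "date" in their name, matched case-insensitively. The pattern also catches names such as liveBlogUpdate and candidate, and the same property can be listed more than once:
rdfa schema.rdfa | grep -i date | grep Property | awk '{print $1}'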
<http://schema.org/productionDate>
<http://schema.org/scheduledPaymentDate>
<http://schema.org/orderDate>
<http://schema.org/liveBlogUpdate>
<http://schema.org/candidate>
<http://schema.org/dateModified>
<http://schema.org/datePublished>
<http://schema.org/releaseDate>
<http://schema.org/candidate>
<http://schema.org/endDate>
<http://schema.org/dateline>
<http://schema.org/startDate>
<http://schema.org/dissolutionDate>
<http://schema.org/uploadDate>
<http://schema.org/vehicleModelDate>
<http://schema.org/deathDate>
<http://schema.org/foundingDate>
<http://schema.org/dateCreated>
<http://schema.org/purchaseDate>
<http://schema.org/dateIssued>
<http://schema.org/birthDate>
<http://schema.org/previousStartDate>
<http://schema.org/guidelineDate>
<http://schema.org/dateVehicleFirstRegistered>
<http://schema.org/datePosted>
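A stricter variant of the date query, as a sketch: it requires an exact `rdf:type rdf:Property` triple, removes duplicates, and applies a camelCase heuristic (our assumption, not part of the original recipe) so that "date" must either start the name or appear as "Date". On the list above that drops liveBlogUpdate and candidate while keeping the other names. `date_properties` is a hypothetical name:

```python
import re

SCHEMA = "http://schema.org/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDF_PROPERTY = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#Property>"

# Heuristic: "date" as a camelCase word, i.e. at the start or capitalised.
DATE_WORD = re.compile(r"^date|Date")


def date_properties(nt_lines):
    """Sorted, de-duplicated schema.org property names containing a 'date' word."""
    found = set()
    for line in nt_lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        subject, predicate, obj = parts[:3]
        # Keep only '<s> rdf:type rdf:Property' triples.
        if predicate != RDF_TYPE or obj != RDF_PROPERTY:
            continue
        name = subject.strip("<>")
        if not name.startswith(SCHEMA):
            continue
        name = name[len(SCHEMA):]
        if DATE_WORD.search(name):
            found.add(name)
    return sorted(found)
```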
Properties whose names start with "is", using the repository's scripts/rdfa2nt converter:
./scripts/rdfa2nt data/schema.rdfa | grep "Property" | grep '#type' | grep '/is'
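`grep '/is'` matches any name that merely starts with "is", e.g. issueNumber. This sketch (hypothetical function `is_properties`) does the same filtering on N-Triples lines and, with `strict=True`, keeps only isXxx-style names by requiring a capital letter after "is"; that rule is our assumption about what is wanted, not part of the original recipe:

```python
SCHEMA = "http://schema.org/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDF_PROPERTY = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#Property>"


def is_properties(nt_lines, strict=True):
    """Sorted local names of rdf:Property subjects that begin with 'is'.

    strict=True also requires an uppercase letter right after 'is'
    (isPartOf-style names); strict=False mirrors grep '/is'.
    """
    found = set()
    for line in nt_lines:
        parts = line.split()
        # Keep only '<s> rdf:type rdf:Property' triples.
        if len(parts) < 3 or parts[1] != RDF_TYPE or parts[2] != RDF_PROPERTY:
            continue
        name = parts[0].strip("<>")
        if not name.startswith(SCHEMA):
            continue
        name = name[len(SCHEMA):]
        if not name.startswith("is"):
            continue
        if strict and not (len(name) > 2 and name[2].isupper()):
            continue
        found.add(name)
    return sorted(found)
```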
Sometimes a real query language such as SPARQL is more useful than text filters.
Links:
- Dydra, a hosted SPARQL database, e.g. http://dydra.com/danbri/schema-org
- tests/test_graphs.py - our unit tests can be written in SPARQL