purl-fetcher

An HTTP API for querying and updating PURLs. See the API section below for docs. purl-fetcher is only a cache that lets the access portfolio efficiently index and query data such as release tags and collection memberships. It is not the canonical source for any information.

Requirements

Ruby (3.2 or greater)
bundler gem
Apache Kafka (0.10 or greater), or Docker

Installation

Clone the repository:

git clone https://github.com/sul-dlss/purl-fetcher.git
cd purl-fetcher

Install dependencies:

bundle install

Set up the database:

rake db:migrate

Developing

The API communicates with a Kafka broker to dispatch and process updates asynchronously. You can run a Kafka broker locally, or use the provided docker-compose configuration:

docker-compose up

Then, in a separate terminal, start a development API server:

bin/rails server

Making requests

You can make requests to the API using curl or a similar tool. To add an object to the database, you can first download its public Cocina JSON from production PURL:

curl https://purl.stanford.edu/bb112zx3193.json > bb112zx3193.json

Then, you can use the POST /purls/:druid endpoint to add the object to the database:

curl -X POST -H "Content-Type: application/json" -d @bb112zx3193.json http://localhost:3000/purls/bb112zx3193

Once the object has been saved to the database, a message for it is published to the Kafka topic for indexing.
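The same POST can be issued from Ruby. The following is a minimal sketch using only the standard library; the helper name `build_purl_post`, the `base_url` default, and the placeholder payload are illustrative assumptions, not part of purl-fetcher. It builds the request without sending it, so it can be run without a server.

```ruby
require "json"
require "net/http"
require "uri"

# Build (but do not send) a POST /purls/:druid request carrying public Cocina JSON.
# The helper name and base_url default are assumptions made for this example.
def build_purl_post(druid, cocina_json, base_url: "http://localhost:3000")
  uri = URI.join(base_url, "/purls/#{druid}")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request.body = cocina_json
  request
end

# Placeholder payload; in practice this is the content of bb112zx3193.json.
cocina_json = JSON.generate("placeholder" => "public Cocina JSON goes here")
request = build_purl_post("bb112zx3193", cocina_json)

puts "#{request.method} #{request.uri}"

# To send it to a running development server:
# response = Net::HTTP.start(request.uri.hostname, request.uri.port) { |http| http.request(request) }
# puts response.code
```

Keeping request construction separate from sending makes it easy to inspect the request before it goes out.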

Testing

The full test suite (with RuboCop style enforcement) can be run with the default rake task:

rake

The tests can be run without RuboCop style enforcement:

rake spec

The RuboCop style enforcement can be run without running the tests:

rake rubocop

API

Purls

GET `/purls/:druid`

GET /purls/:druid

Summary

Display a single purl

Description

The GET /purls/:druid endpoint returns a single PURL document. The PURL application uses this endpoint to determine whether an item should appear in the sitemap.

Parameters

| Name | Located In | Description | Required | Schema | Default |
| --- | --- | --- | --- | --- | --- |
| `druid` | url | Druid of a specific PURL | Yes | string, e.g. `druid:cc111dd2222` | null |
| `version` | header | Version of the API request, e.g. `version=1` | No | integer | 1 |

Example Response

{
  "druid": "druid:dd111ee2222",
  "latest_change": "2014-01-01T00:00:00Z",
  "true_targets": ["PURL sitemap"],
  "collections": ["druid:oo000oo0001"]
}
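As a sketch of how a client might consume this response, the snippet below parses the example body and checks whether a release target appears in `true_targets`. The helper `released_to?` is hypothetical and not part of purl-fetcher.

```ruby
require "json"

# Body copied from the example response for GET /purls/:druid.
body = <<~JSON
  {
    "druid": "druid:dd111ee2222",
    "latest_change": "2014-01-01T00:00:00Z",
    "true_targets": ["PURL sitemap"],
    "collections": ["druid:oo000oo0001"]
  }
JSON

# Hypothetical helper: true when the document lists the given release target.
# Array() guards against a missing true_targets key.
def released_to?(purl_doc, target)
  Array(purl_doc["true_targets"]).include?(target)
end

purl_doc = JSON.parse(body)
puts released_to?(purl_doc, "PURL sitemap") # prints true
puts released_to?(purl_doc, "SearchWorks")  # prints false
```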

POST `/purls/:druid`

POST /purls/:druid

Summary

Purl Document Update

Description

The POST /purls/:druid endpoint provides the ability to create or update a PURL document from public Cocina JSON. This endpoint is used by dor-services-app as part of SDR workflows.

Parameters

| Name | Located In | Description | Required | Schema | Default |
| --- | --- | --- | --- | --- | --- |
| `druid` | url | Druid of a specific PURL | Yes | string, e.g. `druid:cc111dd2222` | null |
| `version` | header | Version of the API request, e.g. `version=1` | No | integer | 1 |

Example Response

true

Collections

GET `/collections/:druid/purls`

GET /collections/:druid/purls

Summary

Collection Purls route

Description

The /collections/:druid/purls endpoint provides a listing of PURLs for a specific collection. This endpoint is used by the Exhibits application.

Parameters

| Name | Located In | Description | Required | Schema | Default |
| --- | --- | --- | --- | --- | --- |
| `druid` | url | Druid of a specific collection | Yes | string, e.g. `druid:cc111dd2222` | null |
| `page` | query | Request a specific page of results | No | integer | 1 |
| `per_page` | query | Limit the number of results per page | No | integer (1 - 10000) | 100 |
| `version` | header | Version of the API request, e.g. `version=1` | No | integer | 1 |

Example Response

{
  "purls": [
    {
      "druid": "druid:ee111ff2222",
      "published_at": "2013-01-01T00:00:00.000Z",
      "deleted_at": "2016-01-03T00:00:00.000Z",
      "object_type": "set",
      "catkey": "",
      "title": "Some test object number 4",
      "collections": [
        "druid:ff111gg2222"
      ],
      "true_targets": [
        "SearchWorksPreview"
      ]
    },
...
    {
      "druid": "druid:cc111dd2222",
      "published_at": "2016-01-01T00:00:00.000Z",
      "deleted_at": "2016-01-02T00:00:00.000Z",
      "object_type": "item",
      "catkey": "567",
      "title": "Some test object number 2",
      "collections": [
        "druid:ff111gg2222"
      ],
      "true_targets": [
        "SearchWorksPreview"
      ],
      "false_targets": [
        "SearchWorks"
      ]
    }
  ],
  "pages": {
    "current_page": 1,
    "next_page": null,
    "prev_page": null,
    "total_pages": 1,
    "per_page": 100,
    "offset_value": 0,
    "first_page?": true,
    "last_page?": true
  }
}
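A client that needs every PURL in a collection has to walk the pages. The sketch below assumes that `pages.next_page` holds the number of the next page and is `null` on the last page, as the example response suggests. The method `each_collection_purl` and the injected `fetch_page` callable are hypothetical; injecting the fetcher lets the loop be tried with canned data instead of a live server.

```ruby
# Walk every page of a collection listing by following pages.next_page.
# fetch_page is any callable that takes a page number and returns the
# parsed response Hash for that page.
def each_collection_purl(fetch_page)
  return enum_for(:each_collection_purl, fetch_page) unless block_given?

  page = 1
  while page
    response = fetch_page.call(page)
    response.fetch("purls").each { |purl| yield purl }
    # nil on the last page, which ends the loop.
    page = response.dig("pages", "next_page")
  end
end

# Two canned pages standing in for real API responses.
canned = {
  1 => { "purls" => [{ "druid" => "druid:ee111ff2222" }], "pages" => { "current_page" => 1, "next_page" => 2 } },
  2 => { "purls" => [{ "druid" => "druid:cc111dd2222" }], "pages" => { "current_page" => 2, "next_page" => nil } }
}

druids = each_collection_purl(->(page) { canned.fetch(page) }).map { |purl| purl["druid"] }
puts druids.inspect
```

In real use, `fetch_page` would issue `GET /collections/:druid/purls?page=N` and parse the JSON body.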

Released items

GET `/released/:tag`

Parameters

| Name | Located In | Description | Required | Schema | Default |
| --- | --- | --- | --- | --- | --- |
| `tag` | url | Tag to search for | Yes | string, e.g. `PURL%20sitemap` | null |

Summary

List the PURLs that should display on the sitemap.

Description

This endpoint is used by the PURL application to generate a sitemap.

Example Response

[
    {
      "druid": "druid:ee111ff2222",
      "updated_at": "2016-01-03T00:00:00.000Z"
    },
...
    {
      "druid": "druid:cc111dd2222",
      "updated_at": "2016-01-02T00:00:00.000Z"
    }
]
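Because the tag is part of the URL path, a tag containing spaces must be percent-encoded, as in `PURL%20sitemap`. A minimal sketch of building the path in Ruby follows; the helper `released_path` is hypothetical.

```ruby
require "erb"

# Hypothetical helper: build the request path for GET /released/:tag,
# percent-encoding the tag so that spaces become %20.
def released_path(tag)
  "/released/#{ERB::Util.url_encode(tag)}"
end

puts released_path("PURL sitemap") # prints /released/PURL%20sitemap
```

`URI.encode_www_form_component` would turn the space into `+`, which is meant for form and query data rather than path segments, so `ERB::Util.url_encode` is used here instead.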

PUT `/v1/released/:druid`

Parameters

| Name | Located In | Description | Required | Schema | Default |
| --- | --- | --- | --- | --- | --- |
| `druid` | url | Object identifier | Yes | string, e.g. `druid:bc123df4567` | null |
| `actions` | body | List of actions to take on the object. This object should contain two keys, "index" and "delete"; each value is an array of properties to release to. | Yes | object | null |

Summary

Set the release tags for an item

Description

This tells purl-fetcher to update the cache of release tags and puts messages on the appropriate Kafka streams.

Example Response

202 Accepted true
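As an illustration of the request body, the sketch below builds JSON with the "index" and "delete" keys described in the parameter table. Nesting them under a top-level `actions` key is an assumption drawn from that table, and the helper `release_actions_body` is hypothetical; check the controller before relying on this shape.

```ruby
require "json"

# Hypothetical helper: build the JSON body for PUT /v1/released/:druid.
# Assumption: the two lists are nested under a top-level "actions" key.
def release_actions_body(index: [], delete: [])
  JSON.generate("actions" => { "index" => index, "delete" => delete })
end

# Release to SearchWorksPreview and withdraw from SearchWorks.
body = release_actions_body(index: ["SearchWorksPreview"], delete: ["SearchWorks"])
puts body
```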

Administration

Reindexing

To reindex every PURL, produce a Kafka message for each one by running:

Purl.unscoped.find_in_batches.with_index do |group, batch|
  puts "Processing group ##{batch}"
  group.each(&:produce_indexer_log_message)
end

Or, to reindex only the PURLs released to SearchWorks:

Purl.target('Searchworks').find_in_batches.with_index do |group, batch|
  puts "Processing group ##{batch}"
  Racecar.wait_for_delivery do
    group.each { |purl| purl.produce_indexer_log_message(async: true) }
  end
end

Reporting

The API's internals use an ActiveRecord data model to manage various information about published PURLs. This model consists of Purl, Collection, and ReleaseTag active records. See app/models/ and db/schema.rb for details.

This approach gives administrators a couple of ways to explore the data outside of the API.

Using Rails runner

With Rails' runner, you can query the database using ActiveRecord. For example, running the Ruby in script/reports/summary.rb using:

RAILS_ENV=environment bundle exec rails runner script/reports/summary.rb

produces output like this:

Summary report as of 2016-08-24 09:52:49 -0700 on purl-fetcher-dev.stanford.edu
PURLs: 193960
Deleted PURLs: 1
Published PURLs: 193959
Published PURLs in last week: 0
Released to SearchWorks: 5

Using SQL

With Rails' dbconsole, you can query the database using SQL. For example, running the SQL in script/reports/summary.sql using:

RAILS_ENV=environment bundle exec rails dbconsole -p < script/reports/summary.sql

produces output like this:

PURLs	193960
Deleted PURLs	1
Published PURLs	193959
Published this year	9
Released to SearchWorks	5

Authentication

To generate an authentication token, run `RAILS_ENV=production bin/rails generate_token` on the production server. This uses the HMAC secret to sign the token. The task asks you to submit a value for "Account". This should be the name of the calling service, or a username if the token is to be used by a specific individual. This value is used for traceability of errors and can be seen in the "Context" section of a Honeybadger error. For example:

{"invoked_by" => "workflow-service"}
