oaicrawl

Harvest an OAI-PMH endpoint by fetching records one by one. This is a different strategy from the windowed and cached approach used in metha.

Install via go get or grab a binary from the releases page.

oaicrawl does not cache anything and writes the raw responses directly to standard output.
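In outline, the strategy is: collect all identifiers via ListIdentifiers, following resumption tokens, then issue one GetRecord request per identifier and copy the raw response to standard output. Below is a minimal sketch of that flow against a generic OAI-PMH endpoint. It is illustrative only, not oaicrawl's actual code; the function and type names are made up, and the retries and parallelism described later are left out.

package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
	"os"
)

// page captures the parts of a ListIdentifiers response we need; paths
// follow the OAI-PMH response format (root element OAI-PMH).
type page struct {
	Identifiers     []string `xml:"ListIdentifiers>header>identifier"`
	ResumptionToken string   `xml:"ListIdentifiers>resumptionToken"`
}

// listIdentifiers follows resumption tokens until all identifiers are in.
func listIdentifiers(endpoint, format string) ([]string, error) {
	var ids []string
	link := fmt.Sprintf("%s?verb=ListIdentifiers&metadataPrefix=%s", endpoint, format)
	for {
		resp, err := http.Get(link)
		if err != nil {
			return nil, err
		}
		var p page
		err = xml.NewDecoder(resp.Body).Decode(&p)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		ids = append(ids, p.Identifiers...)
		if p.ResumptionToken == "" {
			return ids, nil
		}
		link = fmt.Sprintf("%s?verb=ListIdentifiers&resumptionToken=%s",
			endpoint, url.QueryEscape(p.ResumptionToken))
	}
}

func main() {
	endpoint, format := "http://www.academicpub.org/wapoai/OAI.aspx", "oai_dc"
	ids, err := listIdentifiers(endpoint, format)
	if err != nil {
		log.Fatal(err)
	}
	for _, id := range ids {
		// One GetRecord request per identifier.
		link := fmt.Sprintf("%s?verb=GetRecord&identifier=%s&metadataPrefix=%s",
			endpoint, url.QueryEscape(id), format)
		resp, err := http.Get(link)
		if err != nil {
			log.Fatal(err) // with -b we would retry and then skip instead
		}
		io.Copy(os.Stdout, resp.Body) // raw response, nothing cached
		resp.Body.Close()
	}
}

In practice, all of the above is a single invocation: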

$ oaicrawl -f oai_dc -verbose http://www.academicpub.org/wapoai/OAI.aspx > harvest.data
...
DEBU[0025] fetched 1395 identifiers with 1 requests in 25.140568087s

Use the -b flag to crawl in a best-effort way (continue in the presence of errors):

$ oaicrawl -b -f oai_dc -verbose http://www.academicpub.org/wapoai/OAI.aspx > harvest.data
...
... worker-11 backoff [3]: ..ertasacademica.com/5318&metadataPrefix=oai_dc
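
The backoff lines correspond to failed requests being retried with growing delays, bounded by the -retry count and, presumably, the -e max elapsed time (see Usage below). Here is a hand-rolled sketch of that pattern; the name fetchWithBackoff and the initial delay are invented for illustration, and oaicrawl's actual retry logic may differ.

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

// fetchWithBackoff retries a GET with exponentially growing delays and
// gives up after maxRetries attempts or once maxElapsed has passed. In
// best-effort mode, the caller would log the failure and skip the record
// instead of aborting the whole crawl.
func fetchWithBackoff(link string, maxRetries int, maxElapsed time.Duration) ([]byte, error) {
	start := time.Now()
	delay := 500 * time.Millisecond // hypothetical initial delay
	for attempt := 1; ; attempt++ {
		resp, err := http.Get(link)
		if err == nil && resp.StatusCode == http.StatusOK {
			defer resp.Body.Close()
			return io.ReadAll(resp.Body)
		}
		if resp != nil {
			resp.Body.Close()
		}
		if attempt >= maxRetries || time.Since(start) > maxElapsed {
			return nil, fmt.Errorf("giving up on %s after %d attempts", link, attempt)
		}
		log.Printf("backoff [%d]: %s", attempt, link)
		time.Sleep(delay)
		delay *= 2 // exponential backoff
	}
}

func main() {
	b, err := fetchWithBackoff("http://zvdd.de/oai2/?verb=Identify", 3, 10*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("fetched %d bytes\n", len(b))
}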

This crawler was written for endpoints that are slightly off-standard and cannot easily be harvested in chunks.

Test it yourself (might take a day to harvest completely):

$ oaicrawl -f mets -b -verbose http://zvdd.de/oai2/
...

Usage

$ oaicrawl -h
Usage of oaicrawl:
  -b    create best effort data set
  -e duration
        max elapsed time (default 10s)
  -f string
        format (default "oai_dc")
  -retry int
        max number of retries (default 3)
  -verbose
        more logging
  -version
        show version
  -w int
        number of parallel connections (default 16)
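
The -w flag sets how many records are fetched concurrently. Conceptually, that is a fan-out of identifiers over a fixed pool of workers; here is a sketch, with fetch standing in for something like the fetchWithBackoff above (again, names and structure are illustrative, not oaicrawl's internals).

package main

import (
	"fmt"
	"sync"
)

// harvest fans identifiers out over a fixed number of workers. Failed
// fetches are simply skipped, as in best-effort mode; results are funneled
// into a single output channel for writing.
func harvest(ids []string, fetch func(string) ([]byte, error), workers int) <-chan []byte {
	jobs := make(chan string)
	out := make(chan []byte)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				if b, err := fetch(id); err == nil {
					out <- b
				}
			}
		}()
	}
	go func() {
		for _, id := range ids {
			jobs <- id
		}
		close(jobs)
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	// Stand-in fetch so the sketch runs without a network.
	fake := func(id string) ([]byte, error) { return []byte(id + "\n"), nil }
	for b := range harvest([]string{"oai:example:1", "oai:example:2"}, fake, 16) {
		fmt.Print(string(b))
	}
}

Funneling results through a single consumer keeps whole responses from interleaving on standard output.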