the MARC exporter seems to include unpublished resources regardless of parameter #2864

Open
regineheberlein opened this issue Nov 14, 2022 · 5 comments

@regineheberlein

regineheberlein commented Nov 14, 2022

When retrieving a MARC-XML representation of a resource record via API, records set to "publish"=>false are included by default. I can't override this behavior by adding the include_unpublished_marc=false parameter, either (https://archivesspace.github.io/archivesspace/api/?python#get-a-marc-21-representation-of-a-resource).

(I'm also wondering about the include_marc parameter (different parameter name than in the documentation above) in https://github.com/archivesspace/archivesspace/blob/master/backend/app/exporters/models/marc21.rb, but I can't get it to work, either.)

There's every chance that this is due to a mistake on my end, but I'm beginning to suspect that it's a bug. Thanks for looking into it!

Expected Behavior

  1. By default, resource records set to "publish"=>false should not be included in the export
  2. The parameter include_unpublished_marc=false should exclude unpublished resource records from the export

Current Behavior

Currently, unpublished resource records are exported by my GET requests against the MARC exporter endpoint, with or without the parameter.

Possible Solution

Steps to Reproduce (for bugs)

Any API get request against an unpublished resource record, e.g.

  1. @client.get("/repositories/5/resources/marc21/4265.xml")
  2. @client.get(uri, {query: { include_unpublished_marc: false }})
  3. @client.get("/repositories/5/resources/marc21/4265.xml?include_unpublished_marc=false")

Context

This results in us exposing closed records in our catalog.

Your Environment

@quoideneuf
Collaborator

@regineheberlein It looks to me like the include_unpublished_marc flag was originally added to satisfy this issue:
https://archivesspace.atlassian.net/browse/ANW-376
If I understand your request correctly, it would break the workflow for people who need to export a MARC record without the unpublished subjects and agents. Perhaps we need an additional parameter (include_unpublished_resource?). However, I am not sure what should be returned when a client requests a MARC record for an unpublished resource.

@regineheberlein
Author

@quoideneuf
Thanks for looking into this! Our use case is that we batch-export MARC records for our resource records, for purposes of loading them into our catalog. We only want to include resource records that are set to publish in ASpace. So I am, indeed, looking for nothing to be returned when I request a MARC record for an unpublished resource.

In other words, what I'm looking for is a way to include with the export only resource records that are set to publish=true. I expected the endpoint to default to that behavior; when I realized that is not the case I thought I could accomplish it by adding the parameter include_unpublished_marc=false, but even with that parameter the endpoint returns resource records set to publish=false.
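
Until the endpoint behaves that way, the check could be done on the client side. The following is only a minimal sketch, not the exporter's own logic: the method name `marc_xml_if_published` is hypothetical, and it assumes the client responds to `get(path, options)` and returns an object with `parsed` (decoded JSON) and `body` (raw response text). It reads the resource's JSON first and requests the MARC XML only when `publish` is true.

```ruby
# Sketch only. `client` is assumed to respond to get(path, options) and to
# return an object with #parsed (decoded JSON) and #body (raw response text).
# The method name is hypothetical and not part of ArchivesSpace.
def marc_xml_if_published(client, repo_id, resource_id)
  # Read the resource record itself to inspect its "publish" flag.
  record = client.get("/repositories/#{repo_id}/resources/#{resource_id}").parsed

  # Return nothing for any resource that is not explicitly published.
  return nil unless record['publish'] == true

  # Only published resources reach the MARC exporter endpoint.
  client.get(
    "/repositories/#{repo_id}/resources/marc21/#{resource_id}.xml",
    { query: { include_unpublished_marc: false } }
  ).body
end
```

Called as `marc_xml_if_published(@client, 5, 4265)`, this would return nil for an unpublished resource, at the cost of one extra request per record.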

@quoideneuf
Collaborator

@regineheberlein When you say "we batch-export MARC records", does that mean that you have some locally-written code that fetches MARC records using the resource ids at the [:GET] /repositories/:repo_id/resources/marc21/:id.xml endpoint? If that is the case, how do you build the list of resource IDs, and would it be possible to filter out unpublished resources while building that list? Or are you using a tool like this one? https://github.com/uga-libraries/ASpace_Batch_Export-Cleanup-Upload

@regineheberlein
Author

@quoideneuf Yes, we are using locally written code (here: https://github.com/pulibrary/aspace_helpers/blob/main/reports/aspace2alma/get_MARCxml.rb)
The method below gets the resource URIs. I'm not aware of a parameter that would let me filter out the unpublished resources without first knowing which ones they are:

# Collect the URI of every resource in the given repositories.
# Relies on @client being an already authenticated API client.
def get_all_resource_uris_for_repos(repos = [])
  @uris = []
  repos.each do |repo|
    endpoint = "repositories/#{repo}/resources"
    # all_ids: true asks for the full list of resource ids in the repository
    ids = @client.get(endpoint, { query: { all_ids: true } }).parsed
    ids.each do |id|
      @uris << "/#{endpoint}/#{id}"
    end
  end
  @uris
end
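
One way to filter while building the list, as asked above, might be to look up each resource and keep only the published ones. This is a rough sketch under assumptions: `select_published_uris` is a hypothetical helper, and the client is assumed to respond to `get(uri)` with an object whose `parsed` method returns the resource JSON, including its `publish` field.

```ruby
# Sketch only. Keeps the URIs whose resource JSON has "publish" => true.
# `client` is assumed to respond to get(uri) and to return an object with
# #parsed; the helper name is hypothetical.
def select_published_uris(client, uris)
  uris.select do |uri|
    # One extra request per resource, made solely to read the publish flag.
    client.get(uri).parsed['publish'] == true
  end
end
```

It could be applied to the output of the method above, e.g. `select_published_uris(@client, get_all_resource_uris_for_repos([5]))`, before any MARC record is requested.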

@quoideneuf quoideneuf self-assigned this Dec 7, 2022
@regineheberlein
Author

Hi Brian, just checking to see whether there's a timeline for this feature yet?
