Ensure licenses get inherited from parent targets that are not host-level URLs #37

Open
anjackson opened this issue Jun 22, 2023 · 1 comment

anjackson (Contributor) commented Jun 22, 2023

We have an example target which is a 'child' of another target that is open access:

https://twitter.com/rpoonline/
https://twitter.com/rpoonline/status/1544706478359068675/

The child target shows up on this Collections page as 'available only in Reading Rooms'.

At the time of writing, we've not actually crawled the child URL, but I don't think this is the cause of the issue.

The problem may instead lie with this license logic:

```python
# Determine license status:
licenses = []
if target.get('isOA', False):
    licenses = target.get("license_ids", [])
    # Use a special value to indicate an inherited license:
    if len(licenses) == 0:
        licenses = ['1000']
```

...where `isOA` is generated at...

```python
targets[tid]['isOA'] = check_oa_status(targets[tid])
```

...but inherited values of `isOA` only get populated at...

```python
# Second pass to add inherited statuses:
# FIXME Both should be inherited from all higher-level Targets. This version only inherits from hosts.
for tid in targets:
    for url in targets[tid].get('urls', []):
        parsed_uri = urlparse(url)
        base = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
        if base in oa_urls and not targets[tid]['isOA']:
            targets[tid]['isOA'] = True
            targets[tid]['inheritsOA'] = True
        if base in npld_urls and not targets[tid]['isNPLD']:
            targets[tid]['isNPLD'] = True
            targets[tid]['inheritsNPLD'] = True
```

Note the FIXME: this version only inherits from host-level records, so this looks like a long-standing issue that still needs to be resolved.
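One possible way to address the FIXME is sketched below, under some assumptions: `oa_urls` and `npld_urls` are taken to be sets of target URLs normalised to end in `/` (as the host-level `base` values are), and the helper names `ancestor_prefixes` and `inherit_statuses` are hypothetical, not part of the existing code. Instead of checking only the host root, the second pass checks every parent path prefix of each URL, so `https://twitter.com/rpoonline/status/1544706478359068675/` would inherit from `https://twitter.com/rpoonline/`.

```python
from urllib.parse import urlparse


def ancestor_prefixes(url):
    """Hypothetical helper: return the host root plus every intermediate
    path prefix of url (each ending in '/'), excluding the final segment.

    Assumes target URLs in oa_urls/npld_urls use the same trailing-slash form.
    """
    parsed = urlparse(url)
    base = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed)
    segments = [s for s in parsed.path.split('/') if s]
    prefixes = [base]
    # Grow the path one segment at a time, stopping before the last segment
    # so that a (non-root) target is never treated as its own parent.
    for i in range(1, len(segments)):
        prefixes.append(base + '/'.join(segments[:i]) + '/')
    return prefixes


def inherit_statuses(targets, oa_urls, npld_urls):
    """Hypothetical second pass: inherit OA/NPLD status from any ancestor
    target URL, not only from host-level targets."""
    for target in targets.values():
        for url in target.get('urls', []):
            for prefix in ancestor_prefixes(url):
                if prefix in oa_urls and not target['isOA']:
                    target['isOA'] = True
                    target['inheritsOA'] = True
                if prefix in npld_urls and not target['isNPLD']:
                    target['isNPLD'] = True
                    target['inheritsNPLD'] = True


# Usage with the example targets from this issue (illustrative data only):
example_targets = {
    1: {'urls': ['https://twitter.com/rpoonline/'], 'isOA': True, 'isNPLD': True},
    2: {'urls': ['https://twitter.com/rpoonline/status/1544706478359068675/'],
        'isOA': False, 'isNPLD': True},
}
inherit_statuses(example_targets,
                 oa_urls={'https://twitter.com/rpoonline/'},
                 npld_urls={'https://twitter.com/rpoonline/'})
print(example_targets[2]['isOA'], example_targets[2].get('inheritsOA'))  # True True
```

Matching on path prefixes keeps the lookup a cheap set membership test per prefix; the trade-off is that it relies on consistent URL normalisation, which the real target data may or may not guarantee.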

anjackson added the bug label Jun 22, 2023
anjackson (Contributor, Author) commented:

Hah, no, this isn't what's going on. Complete rewrite of issue needed!

anjackson changed the title from "Review what happens with seeds that have not been crawled" to "Check if license inheritance is working" on Jun 22, 2023
anjackson changed the title from "Check if license inheritance is working" to "Ensure licenses get inherited from parent targets that are not host-level URLs" on Jun 22, 2023