Exclude datasets on Tape or Disk from CLI query by default #1045

Open
mmarchegiani opened this issue Feb 22, 2024 · 5 comments
Labels
enhancement New feature or request

Comments


mmarchegiani commented Feb 22, 2024

The DataDiscoveryCLI also saves to the fileset_available.json.gz file entries that are not readable because they are stored on Tape sites.
As an example, this record ends up in the available fileset when querying the dataset /DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18NanoAODv9-106X*/NANOAODSIM:

{'root://lyogrid06.in2p3.fr:1094//dpm/in2p3.fr/home/cms/data//store/mc/RunIISummer20UL18NanoAODv9/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/106X_upgrade2018_realistic_v16_L1v1-v2/230000/00EA9563-5449-D24E-9566-98AE8E2A61AE.root': {'object_path': 'Events',
  'steps': None,
  'uuid': None}}

Note that the 'steps' field of this record is also None.

Querying DAS directly with dasgoclient shows the site where the file is stored:

$ dasgoclient -query "site file=/store/mc/RunIISummer20UL18NanoAODv9/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/106X_upgrade2018_realistic_v16_L1v1-v2/230000/00EA9563-5449-D24E-9566-98AE8E2A61AE.root"

T1_FR_CCIN2P3_Tape
[...]

It would be desirable to exclude the Tape and Disk sites by default, since these files are not readable through xrootd.

At the moment the only workaround is to exclude the offending site, either through the blacklist or with a regular expression.
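As a rough illustration of the regular-expression workaround, the sketch below filters a replica listing by site name. It is plain Python and does not use the DataDiscoveryCLI API: the `replicas` mapping, its URLs, the `T2_XX_Example` site, and the `drop_tape_sites` helper are all hypothetical, and only the `_Tape` suffix comes from the site name reported by dasgoclient above.

```python
import re

# Hypothetical replica listing: site name -> file URLs (illustrative data only).
replicas = {
    "T1_FR_CCIN2P3_Tape": ["root://tape.example//store/mc/file.root"],
    "T2_XX_Example": ["root://disk.example//store/mc/file.root"],
}

# Assumption: Tape endpoints can be recognised by a "_Tape" suffix in the site name.
TAPE_SITE = re.compile(r"_Tape$")


def drop_tape_sites(replicas_by_site):
    """Return a copy of the mapping without sites whose name matches TAPE_SITE."""
    return {
        site: files
        for site, files in replicas_by_site.items()
        if not TAPE_SITE.search(site)
    }


readable = drop_tape_sites(replicas)
```

Whether `_Disk` sites should be filtered in the same way is left open here, so the pattern only matches Tape.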

FYI @iasonkrom @valsdav

@mmarchegiani mmarchegiani added the enhancement New feature or request label Feb 22, 2024
@lgray
Collaborator

lgray commented Feb 22, 2024

Good catch - thanks!

@ikrommyd
Contributor

@valsdav, could you please take care of this when you have time?

@andrzejnovak
Collaborator

Hold up, why is Disk not reachable over xrd? Also, this should never happen silently; preferably the user should get an error with a message explaining how to opt in to skipping files.

@lgray
Collaborator

lgray commented Mar 11, 2024

Yes - agreed - an error and opt-in would be a much better outcome than silently shifting your endpoints around.
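One possible shape for the error-plus-opt-in behaviour is sketched below. This is not the coffea implementation: the `select_readable` function, the `skip_unreadable` flag, the `UnreadableReplicaError` class, and the example data are all hypothetical names chosen for illustration, and it assumes that only `_Tape` sites are unreadable.

```python
class UnreadableReplicaError(RuntimeError):
    """Raised when some files have no readable replica (hypothetical error type)."""


def select_readable(files_to_sites, skip_unreadable=False):
    """Keep only non-Tape replicas; fail loudly unless the caller opts in to skipping.

    files_to_sites maps a logical file name to the list of sites hosting it.
    """
    readable, unreadable = {}, []
    for path, sites in files_to_sites.items():
        # Assumption: a "_Tape" suffix marks a site that cannot be read over xrootd.
        usable = [site for site in sites if not site.endswith("_Tape")]
        if usable:
            readable[path] = usable
        else:
            unreadable.append(path)
    if unreadable and not skip_unreadable:
        raise UnreadableReplicaError(
            f"{len(unreadable)} file(s) have only Tape replicas; "
            "pass skip_unreadable=True to drop them from the fileset."
        )
    return readable


# Illustrative input: one file only on Tape, one with a non-Tape replica as well.
example = {
    "/store/mc/a.root": ["T1_FR_CCIN2P3_Tape"],
    "/store/mc/b.root": ["T1_FR_CCIN2P3_Tape", "T2_XX_Example"],
}
kept = select_readable(example, skip_unreadable=True)
```

With the default `skip_unreadable=False` the same call raises, so files are never dropped without the user asking for it.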

@ikrommyd
Contributor

You can select file sites one by one, if I remember correctly, but there are too many of them to do that over whole datasets.
When I created some fileset JSONs last week that had Tape sites in them, I got a TLS error when reading the files.


4 participants