Retain Folder Structure in Uploads #1828

Open
amandabee opened this issue Dec 8, 2023 · 3 comments
@amandabee

We may need to address some of this at the Mail and Scans level. Currently, though, if an agency provides responsive materials in a drive or zip file, everything they send is stripped of its folder structure. So if they send us material like:

Staff Rosters
│   roster.csv
│
└───Prior Years
    ├───2023
    │   │   roster.csv
    │   │   ...
    └───2022
        │   roster.csv
        │   ...
What ends up on the actual request will be:

│   roster.csv
│   roster.csv
│   roster.csv

Materials are often organized into a folder by case or incident, and even if the files have distinct names, the folder(s) those files sit in include relevant metadata. Currently, our process strips all of that.
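To make the collision concrete, here is a minimal Python sketch, not MuckRock's actual ingestion code. It builds an in-memory zip that mirrors the Staff Rosters example, then reads each entry's folder alongside its file name. The function names `list_files_with_folders` and `build_example_zip` are made up for illustration; only the standard-library `zipfile`, `io`, and `pathlib` modules are used.

```python
import io
import zipfile
from pathlib import PurePosixPath


def list_files_with_folders(zip_bytes):
    """Return (folder, filename) pairs for every file in a zip archive."""
    entries = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no file content
            # Zip entry names always use forward slashes.
            path = PurePosixPath(info.filename)
            entries.append((str(path.parent), path.name))
    return entries


def build_example_zip():
    """Build an in-memory zip mirroring the Staff Rosters example (hypothetical data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("Staff Rosters/roster.csv", "current")
        archive.writestr("Staff Rosters/Prior Years/2023/roster.csv", "2023")
        archive.writestr("Staff Rosters/Prior Years/2022/roster.csv", "2022")
    return buffer.getvalue()


entries = list_files_with_folders(build_example_zip())

# Keeping only the base name reproduces the flattened listing above:
# three files that can no longer be told apart.
flattened = [name for _folder, name in entries]

for folder, name in entries:
    print(f"{folder}/{name}")
```

Keeping the folder next to the file name is enough to tell the three rosters apart; how that folder would be stored or displayed on a request is a separate question.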


amandabee commented Dec 8, 2023

Allan did play with some potential visual treatments for this in a related exploration at: https://www.notion.so/muckrock/Allan-s-Thoughts-and-Ideas-afbe7c3a49b74e49acebcded8b55d0da

mitchelljkotler self-assigned this Dec 11, 2023

amandabee commented Jan 18, 2024

Here are two examples of requests where the folder structure was nuked:

  1. https://www.muckrock.com/foi/tulare-3477/2023-sb1421sb16-request-tulare-police-department-139456/

This looks like it was a direct upload by the agency, but I'm still not good at telling for sure. Because the folders were nuked, there are a lot of files named "interview_transcript.pdf" that were initially in folders.

  2. https://www.muckrock.com/foi/richmond-3396/sb1421-records-2022-123053/

SharePoint. It looks like a lot of the video files that were provided were stripped of their folder structure. I can't access the original SharePoint directory, so I don't know what the file structure looked like.


amandabee commented Jan 24, 2024

SharePoint:

  • We can potentially download a zip file, but CRP responses often include video files, so we'd be stuck with a zip file too big to download. In some cases it can be split into per-folder zip files.
  • Downloading it as a zip means you can't skim the files.
  • Downloading it as a zip would also make it very hard to pull it over to DocumentCloud.

Direct Uploads:

  • You can't upload folders; we don't support that. So the agency would have had to open individual folders. Most requests don't have 245 responsive documents. We can assess what it would take to incorporate folder support into direct uploads.
  • We would need to modify our data model to support folders on MuckRock. We can come up with an estimate, but we'd need funding to make the change to our systems. This would be a multi-week project.
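
As a rough illustration of the kind of data model change involved, here is a Python sketch using plain dataclasses. It is not MuckRock's actual schema: `UploadedFile`, `folder_path`, and `group_by_folder` are hypothetical names, and the case folder names are invented. The assumption is that each file record gains one extra field holding the folder it arrived in.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadedFile:
    """Hypothetical stand-in for a file attached to a request."""
    name: str
    folder_path: str = ""  # assumed new field; empty string means top level


def group_by_folder(files):
    """Group file names by the folder they were provided in."""
    grouped = defaultdict(list)
    for uploaded in files:
        grouped[uploaded.folder_path].append(uploaded.name)
    return dict(grouped)


# Invented example: identically named transcripts from two case folders.
files = [
    UploadedFile("interview_transcript.pdf", "Case A"),
    UploadedFile("interview_transcript.pdf", "Case B"),
    UploadedFile("cover_letter.pdf"),
]

grouped = group_by_folder(files)
```

A single path field is the smallest change that keeps identically named files distinguishable. A real implementation would also have to cover migrations, the upload form, and display, which is where the multi-week estimate comes from.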
