Spreadsheet viewer has trouble displaying large tabular files #20

Open
jggautier opened this issue Jan 24, 2023 · 2 comments

Comments

@jggautier

jggautier commented Jan 24, 2023

A depositor reported last week that the spreadsheet viewer is having trouble viewing the CSV file they uploaded to the Harvard Dataverse Repository.

Because the file is not published, I can't share it publicly, but the depositor said I could share it privately with any colleagues who want to do more digging. In the meantime, the depositor wrote that they'll add a note in the dataset or file metadata to explain the situation with the file previewer.

The file is 17.4 MB, with 10 columns and 134 rows. The cells in one of the columns have a lot of text. Once the spreadsheet viewer is able to load the preview, it doesn't display all of the columns right away, and there's no indication that the viewer is still trying to load parts of the file. This made the depositor think that the viewer would never display all of the columns.

Questions
How quickly the viewer can show the entire tabular file seems to depend at least partly on the user's internet speed and/or computer. Are those two things actually factors?

Recommendations

  • Let users know that the previewer is still trying to display the file, so that they can tell when the viewer has finished trying and has failed to display all or part of the file. I do sometimes see an error graphic in the Preview tab indicating that the preview failed to load, but not with the 17.4 MB file or other large files I've looked at.
  • Let installations set a byte size limit specifically for the spreadsheet viewer.
    • If each installation is allowed to set the byte size limit for the spreadsheet viewer, then installation admins would have to figure out which limit to set, maybe by doing some performance testing to answer questions like what's the largest tabular file that the spreadsheet viewer can display using the "average" computer and "average" internet speed (assuming that those are factors in how quickly the previewer can display tabular files).
  • Make the spreadsheet viewer show only a certain number of rows and columns, and let users know that only a certain number of rows and columns are being shown.
    • This way the size of the file doesn't matter as much.
  • Let depositors turn off the previewer for certain files.
    • This solution might scale best if enough depositors are aware of the previewer and know how to turn it off when they don't like how it displays their files. So in addition to testing this functionality with users before it's implemented, after it's implemented we would need to compare the number of depositors who turn off the previewer with the number of files that are too large to display quickly, to see whether most depositors have turned off the previewer for files that cannot be displayed on "average" computers over "average" internet speeds (if those are factors).
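The "show only a certain number of rows and columns" recommendation can be sketched in a few lines. This is a minimal Python illustration using only the standard `csv` module, not the spreadsheet viewer's actual code; the names `truncated_preview` and `PreviewResult` and the default limits are hypothetical. It stops parsing as soon as the row limit is exceeded and returns flags the UI could use to tell users that only part of the file is shown. It also raises `csv`'s default per-field limit (131072 characters), which a text-heavy column like the one described above could exceed.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class PreviewResult(NamedTuple):
    rows: List[List[str]]  # at most max_rows rows, each cut to max_cols cells
    rows_truncated: bool   # True if the file has more rows than are shown
    cols_truncated: bool   # True if any row read had more columns than are shown


def truncated_preview(stream: TextIO, max_rows: int = 100,
                      max_cols: int = 20) -> PreviewResult:
    """Read at most max_rows rows (header included) from a CSV text stream.

    Files should be opened with newline="" as the csv module recommends.
    """
    # Very long text cells can exceed csv's default field size limit.
    # Note: this setting is process-wide; 2**31 - 1 fits a C long everywhere.
    csv.field_size_limit(2**31 - 1)
    rows: List[List[str]] = []
    cols_truncated = False
    for row in csv.reader(stream):
        if len(rows) == max_rows:
            # At least one more row exists: report truncation and stop reading.
            return PreviewResult(rows, True, cols_truncated)
        if len(row) > max_cols:
            cols_truncated = True
        rows.append(row[:max_cols])
    return PreviewResult(rows, False, cols_truncated)


if __name__ == "__main__":
    sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
    preview = truncated_preview(sample, max_rows=2, max_cols=2)
    print(preview.rows)            # [['a', 'b'], ['1', '2']]
    print(preview.rows_truncated)  # True
    print(preview.cols_truncated)  # True
```

Because parsing stops early, how long this takes depends on the limits rather than on the full file size, though a browser-based viewer would still need the download itself to be ranged or streamed to get the full benefit.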
@claudiodsf

Hi, I was going to post on this same problem today, when I saw this new issue 🙃

We have the same problem on a not-yet-published dataset, which I cannot share, but I found an example on Harvard Dataverse (89.5 MB - 145 Variables, 56200 Observations).

https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/D1N0GO/3NK9D8

I agree with the proposal that there should be a limit on file size (bytes or number of observations) that the admin could configure at install time.
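A limit that can be expressed in bytes, in observations, or both could be checked roughly as follows. This is a minimal Python sketch; `PreviewLimits`, `preview_allowed`, and the example limit values are hypothetical, not existing Dataverse settings, and "MB" is taken as 10^6 bytes for the sample figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreviewLimits:
    """Hypothetical installation-wide limits; None means no limit is set."""
    max_bytes: Optional[int] = None
    max_observations: Optional[int] = None


def preview_allowed(size_bytes: int, observations: Optional[int],
                    limits: PreviewLimits) -> bool:
    """Return True only if the file is within every limit that is set."""
    if limits.max_bytes is not None and size_bytes > limits.max_bytes:
        return False
    # The observation count may be unknown (None), e.g. for a file that was not
    # ingested as tabular data; then only the byte limit can be applied.
    if (limits.max_observations is not None and observations is not None
            and observations > limits.max_observations):
        return False
    return True


if __name__ == "__main__":
    # Illustrative limits only; real values would come from performance testing.
    rows_only = PreviewLimits(max_observations=10_000)
    both = PreviewLimits(max_bytes=10_000_000, max_observations=10_000)
    print(preview_allowed(89_500_000, 56_200, rows_only))  # False
    print(preview_allowed(17_400_000, 134, rows_only))     # True
    print(preview_allowed(17_400_000, 134, both))          # False
```

The two files in this thread pull in different directions: the 89.5 MB example is caught by an observation limit, but the 17.4 MB file with only 134 rows slips through one, which is one reason to allow a byte limit as well.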

@pdurbin
Member

pdurbin commented Jan 25, 2023

It's a somewhat longstanding problem so thank you to @jggautier and @claudiodsf for getting the discussion going here. 😄

My first thought is that the next version of Dataverse (5.13 probably) will include a new feature for the external tools framework whereby tools can express "requirements" that they need to operate. Here's an example...

```json
"requirements": {
  "auxFilesExist": [
    {
      "formatTag": "NcML",
      "formatVersion": "0.1"
    }
  ]
}
```

... from this pull request:

What's going on here is that the NcML preview tool has a requirement that a certain auxiliary file be present for the eyeball to show up (to offer a preview, that is).
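To make the "requirements" idea concrete, here is a minimal Python sketch of how a host application might decide whether to show the eyeball for a tool that lists `auxFilesExist`. It is not Dataverse's implementation; the function name `aux_requirements_met`, the representation of a file's auxiliary files as `(formatTag, formatVersion)` pairs, and the rule that every listed entry must be present are all assumptions.

```python
import json
from typing import Iterable, Tuple

# The "requirements" fragment quoted above, wrapped in braces so it parses.
MANIFEST = json.loads("""
{
  "requirements": {
    "auxFilesExist": [
      {"formatTag": "NcML", "formatVersion": "0.1"}
    ]
  }
}
""")


def aux_requirements_met(manifest: dict,
                         aux_files: Iterable[Tuple[str, str]]) -> bool:
    """Return True if every required (formatTag, formatVersion) pair is present.

    aux_files lists the auxiliary files stored for one data file (assumed shape).
    A manifest without auxFilesExist imposes no requirement.
    """
    present = set(aux_files)
    required = manifest.get("requirements", {}).get("auxFilesExist", [])
    return all((req["formatTag"], req["formatVersion"]) in present
               for req in required)


if __name__ == "__main__":
    print(aux_requirements_met(MANIFEST, [("NcML", "0.1")]))  # True: show the eyeball
    print(aux_requirements_met(MANIFEST, []))                 # False: hide it
```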

Perhaps, like @jggautier suggested with "let installations set a byte size limit specifically for the spreadsheet viewer," each tool could express a size limit, something like this:

```json
"requirements": {
  "sizeLimitInBytes": 8388608
}
```

The idea would be to simply not show the eyeball for large files.
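The proposed check is simple enough to sketch. In this minimal Python illustration, `sizeLimitInBytes` is the key suggested in the comment above and should be treated as hypothetical; `show_preview_icon` is likewise an illustrative name. The example limit, 8388608, is 8 × 1024 × 1024 bytes (8 MiB), so the 17.4 MB CSV from the original report would not get an eyeball.

```python
import json

# Manifest fragment using the proposed (hypothetical) key, wrapped in braces.
MANIFEST = json.loads('{"requirements": {"sizeLimitInBytes": 8388608}}')


def show_preview_icon(manifest: dict, file_size_bytes: int) -> bool:
    """Offer the preview only if no size limit is set or the file is within it."""
    limit = manifest.get("requirements", {}).get("sizeLimitInBytes")
    return limit is None or file_size_bytes <= limit


if __name__ == "__main__":
    print(8388608 == 8 * 1024 * 1024)               # True (8 MiB)
    print(show_preview_icon(MANIFEST, 17_400_000))  # False: hide the eyeball
    print(show_preview_icon(MANIFEST, 2_000_000))   # True
```

Keeping the limit in each tool's manifest means an installation can change it without code changes, but it still needs some performance testing to pick a sensible value.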

We could get fancier, of course, as suggested above (preview only some rows), and maybe the logic should live in the spreadsheet viewer itself, but I thought I'd at least mention this new "requirements" feature.

For now, docs are here (look for "requirements"): http://preview.guides.gdcc.io/en/develop/api/external-tools.html

It was added in this PR:
