
Add file-type to file upload #118

Open
lincolnsherpa opened this issue Mar 12, 2021 · 8 comments · May be fixed by #142
Labels: pkg:api · prio:high · status:confirmed · type:bug

Comments
lincolnsherpa commented Mar 12, 2021

Hello,
While uploading a datafile, the file type is not recognised; it defaults to text/plain. Even if the contentType field is assigned manually using the set command, it reverts to the default.
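
For illustration, a minimal sketch of the behaviour described above, assuming the pyDataverse NativeApi interface and placeholder values for server, token, and dataset DOI; the "contentType" key in the metadata JSON is the field the report refers to, and whether it is honoured is exactly the problem:

```python
import json
from pyDataverse.api import NativeApi

BASE_URL = "https://demo.dataverse.org"   # placeholder
API_TOKEN = "xxxx-xxxx-xxxx"              # placeholder
DOI = "doi:10.5072/FK2/EXAMPLE"           # placeholder dataset PID

api = NativeApi(BASE_URL, API_TOKEN)

# Even with a contentType declared in the file metadata JSON,
# the uploaded file ends up stored as "text/plain".
metadata = json.dumps({"description": "CSV upload", "contentType": "text/csv"})
resp = api.upload_datafile(DOI, "data.csv", metadata, is_pid=True)
print(resp.json())
```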

@lincolnsherpa added the status:incoming label Mar 12, 2021
@skasberger changed the title from "Edit/Replace/Add fields for metadata using Pydataverse" to "Add file-type to file upload" Mar 13, 2021
@skasberger self-assigned this Mar 13, 2021
@skasberger added the pkg:api, prio:high, status:confirmed, and type:bug labels and removed the status:incoming label Mar 13, 2021
@skasberger added this to the v0.4.0 milestone Mar 13, 2021
@skasberger mentioned this issue Mar 14, 2021
@skasberger modified the milestones: v0.4.0, v0.3.1 Apr 1, 2021
landreev added a commit to landreev/pyDataverse that referenced this issue Jan 21, 2022: "…e method use the default content type when encoding the POST request if it's not supplied explicitly." (gdcc#118)
landreev commented
Yes, I can confirm that this is the case - there is currently no way to pass the mime type to upload_datafile(); AND all files uploaded via pyDataverse end up with the mime type "text/plain" (not "type unknown" or "application/octet-stream", but "text/plain" specifically!).
I believe I know why this happens; see the issue IQSS/dataverse#8344 in the main Dataverse project.
In short: inside the upload_datafile() method, when the multi-part POST form is created, NO content type is specified for the upload. This apparently fools Dataverse into defaulting to "text/plain", without attempting to use its normal type detection methods. This defaulting behavior can and should be addressed on the Dataverse side. But it would be a good idea to fix it on the pyDataverse side as well: a) provide a way to supply the mime type explicitly, and b) make it default to the standard application/octet-stream (a polite way of saying "type unknown"), like curl does, which then prompts Dataverse to at least attempt to identify the file more accurately.
I will make a PR shortly for your consideration.
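
To make the suggestion concrete, here is a rough sketch of points a) and b) against the native add-file endpoint, using requests directly; the function name and defaults here are illustrative and not taken from the PR itself:

```python
import requests

def upload_datafile_with_type(base_url, api_token, pid, filepath,
                              mime_type=None, json_data="{}"):
    """Send the file with an explicit content type in the multipart form,
    defaulting to application/octet-stream instead of omitting the type."""
    url = f"{base_url}/api/datasets/:persistentId/add?persistentId={pid}"
    content_type = mime_type or "application/octet-stream"
    with open(filepath, "rb") as f:
        files = {"file": (filepath, f, content_type)}
        data = {"jsonData": json_data}
        return requests.post(url, headers={"X-Dataverse-key": api_token},
                             files=files, data=data)
```

With the content type present in the files tuple, Dataverse either records the supplied type or, given application/octet-stream, at least attempts its own detection.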

@landreev linked a pull request Jan 21, 2022 that will close this issue
matthew-a-dunlap commented Jun 30, 2022

Fwiw, the solution proposed does not work with older versions of Dataverse (in our case 5.3). The solution we found at Odum was to add the mime type explicitly to the files.

If someone needs to support this with an older install, the work is here https://github.com/OdumInstitute/pyDataverse/tree/mime_type_upload . Note that to use this functionality you'll have to install a package in your project to get the mime type for your file. We use python-magic (and the underlying libmagic library).

I decided not to create a PR for this because my understanding of pyDataverse is that it doesn't try to support the intricacies of older Dataverse versions. But if this work is something that the community wants I can create an issue and a PR for it.
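
For reference, the detection step that workaround relies on looks roughly like this; the exact upload_datafile() signature in the Odum fork may differ, so the hand-off to the upload is only indicated in a comment:

```python
import magic  # python-magic, backed by the libmagic C library

path = "data.csv"
mime_type = magic.from_file(path, mime=True)   # e.g. "text/csv"

# Pass the detected type along with the file, e.g. as the third element
# of the multipart files tuple shown in the sketch further up:
# files = {"file": (path, open(path, "rb"), mime_type)}
```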

skasberger (Member) commented
@matthew-a-dunlap pyDataverse tries to support users on any Dataverse version, so it would be really nice to get your solution merged. The problem is that I am no longer funded, so there is no one maintaining this repo right now. It would also need proper testing and review before it can be merged (and then a later release to get it onto master).

pdurbin (Member) commented Feb 14, 2024

As discussed during the 2024-02-14 meeting of the pyDataverse working group, we are closing old milestones in favor of a new project board at https://github.com/orgs/gdcc/projects/1 and removing issues (like this one) from those old milestones. Please feel free to join the working group! You can find us at https://py.gdcc.io and https://dataverse.zulipchat.com/#narrow/stream/377090-python

@pdurbin removed this from the v0.4.0 milestone Feb 14, 2024
landreev commented
(Please note that I made a quick/trivial PR addressing this issue two years ago, #142; I haven't checked whether it's still relevant.)

pdurbin (Member) commented Feb 27, 2024

@landreev hmm, thanks, I just brought it up on Zulip: https://dataverse.zulipchat.com/#narrow/stream/377090-python/topic/Add.20file-type.20to.20file.20upload.20.23118/near/423708188

@pdurbin linked a pull request Feb 28, 2024 that will close this issue
pdurbin (Member) commented Mar 27, 2024

@lincolnsherpa hi! Nice seeing you in Braga last June. Great talk.

As @JR-1991 and I discussed (recording), we're pretty sure this has been fixed in the default (master) branch thanks to a switch from requests to httpx in #174. Are you interested in re-testing? Thanks!
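
For anyone re-testing, a minimal sketch (placeholder server, token, and DOI; assumes the current NativeApi interface) that uploads a file and then reads back the stored content type:

```python
from pyDataverse.api import NativeApi

api = NativeApi("https://demo.dataverse.org", "xxxx-xxxx")  # placeholders
DOI = "doi:10.5072/FK2/EXAMPLE"                             # placeholder

api.upload_datafile(DOI, "data.csv")

# Inspect what Dataverse stored for the new file.
dataset = api.get_dataset(DOI).json()
for f in dataset["data"]["latestVersion"]["files"]:
    print(f["dataFile"]["filename"], f["dataFile"]["contentType"])
```

If the httpx-based upload now sends a sensible content type, the printed value should no longer be text/plain.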

lincolnsherpa (Author) commented Mar 27, 2024 via email
