Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't save HTML file attachments #75

Open
davidswelt opened this issue May 17, 2017 · 9 comments
Open

Can't save HTML file attachments #75

davidswelt opened this issue May 17, 2017 · 9 comments

Comments

@davidswelt
Copy link

davidswelt commented May 17, 2017

Could support for arbitrary files, or at least HTML attachments be added?

Content type is contentType': u'text/html', and calling the "file" method on this attachment item produces errors that vary with python version, e.g. "UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-12: ordinal not in range(128)", or a ValueError in the latest PyZotero.

Would it make sense to fail gracefully or throw a documented exception?

File "./zot.py", line 1584, in dumpFiles f.write(self.zot.file(item.key)) File "/usr/local/lib/python2.7/site-packages/pyzotero/zotero.py", line 187, in wrapped_f return retrieved.json() File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 819, in json return json.loads(self.text, **kwargs) File "/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads return _default_decoder.decode(s) File "/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 384, in raw_decode raise ValueError("No JSON object could be decoded")

@urschrei
Copy link
Owner

I've just pushed v1.2.11 to PyPI, which should allow you to dump HTML snapshots. Note that these are zip archives, so they'll need to be decompressed first. If you use the .dump() method, it'll append .zip to the filename.

Let me know if this works for you – retrieval of arbitrary file attachments is supported for most common types, but I overlooked this use case. Sorry!

@davidswelt
Copy link
Author

OK, this works better. But if dump() does not adhere to the file name given, and if it does not actually return the resulting filename, how is the calling code supposed to know what the file is?

Generally, if "filename" is "foo.html", it would make much more sense to unzip it right there and then in file(). Calling code could check the new "snapshot" variable, but is this meant to be public?

I'm not using dump() by the way because it isn't present in older versions of Pyzotero.

@urschrei
Copy link
Owner

dump() will adhere to a file name if it's given, which implies that the user is familiar with the snapshot format, and that it's compressed (which I now also call out in the docs). As for unzipping automatically:

  • I could unzip the contents in the specified (or working) dir. That's going to cause difficulties, because some snapshot components are generically-named (item.css), so things will be overwritten if you're dumping several snapshots at once.
  • dump the contents into a newly-created folder under the specified or default path. But what to call it? The file returned by the API is always item.html

One possibility is dumping the snapshot contents into folders with the same name as their item key, which is predictable and easy to document.

(the snapshot variable is not meant to be public, and is going away again)

urschrei added a commit that referenced this issue May 17, 2017
This causes tests related to #75 to pass again
@davidswelt
Copy link
Author

davidswelt commented May 18, 2017 via email

@urschrei
Copy link
Owner

It turns out that in the case of snapshots, the fileName property refers to the primary file to be opened inside the archive.
As such, the file name of a snapshot is undefined by the API, although its MIME type should be application/zip (which will be fixed). This should allow consumers of the file method to know what to do with the attachment they retrieve.
As for dump, I think I'm going to default to writing the file with the attachment item key + zip, and I'll probably add an optional extract_snapshot=False keyword, which will extract into a folder of the same name if set to True (in both cases, this only applies in the absence of a user-supplied filename)

@davidswelt
Copy link
Author

As for the Zotero API, once the MIME type is fixed, this makes sense. It's good to know this filename.
For dump(), it's good to be able to expand, but for the client it's more important to know what the name of the generated file is. Consider returning this file name. Right now it returns nothing, and it's a backwards-compatible change.

@urschrei
Copy link
Owner

It's good to know this filename.
for the client it's more important to know what the name of the generated file is. Consider returning this file name.

I don't follow. To which file name are you referring?

@davidswelt
Copy link
Author

dump() decides about the filename for the file it creates (by default). Either with, or without ".zip". The calling code will want to do something with it. In my case, I want to make a URL from it, to link to this file, or maybe I will unzip it. Rather than recreating the logic that is in PyZotero (and which may be updated in the future), I'm suggesting that PyZotero's dump() function returns the file name it has created.

Am I missing something?

@urschrei
Copy link
Owner

I'm suggesting that PyZotero's dump() function returns the file name it has created.

Now I get it. Yes, that's a good approach. I'm going to leave this open until the MIME change has happened and I've landed the change.

urschrei added a commit that referenced this issue May 18, 2017
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Jun 3, 2017
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Jun 3, 2017
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Oct 8, 2017
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Oct 8, 2017
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Oct 22, 2017
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Oct 22, 2017
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Nov 2, 2017
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Nov 2, 2017
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Nov 2, 2017
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Nov 2, 2017
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Nov 20, 2017
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Nov 20, 2017
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Jan 30, 2018
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Jan 30, 2018
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Feb 3, 2018
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Feb 3, 2018
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue May 15, 2018
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue May 15, 2018
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Jun 16, 2018
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Jun 16, 2018
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Jun 17, 2018
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Jun 17, 2018
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Aug 14, 2018
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Aug 14, 2018
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Oct 21, 2018
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Oct 21, 2018
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Nov 30, 2018
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Nov 30, 2018
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Nov 30, 2018
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Nov 30, 2018
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Dec 31, 2018
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Dec 31, 2018
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Jan 1, 2019
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Jan 1, 2019
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Jan 3, 2019
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Jan 3, 2019
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Apr 27, 2019
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Apr 27, 2019
This change also alters dump() to return the path and file name, see
discussion in #75
urschrei added a commit that referenced this issue Jun 9, 2019
This causes tests related to #75 to pass again
urschrei added a commit that referenced this issue Jun 9, 2019
This change also alters dump() to return the path and file name, see
discussion in #75
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants