Iptc.Envelope.CharacterSet not reported correctly #1003
@tester0077 Arnold. I'd like to help here, however I simply don't know what this is about. And that's about what I said 7 years ago on: https://dev.exiv2.org/boards/3/topics/1288 Here's what I see. There are 3 bytes of metadata for Iptc.Envelope.CharacterSet and they are 'ESC' '%' 'G'.
There's no question that those bytes are in the file:
I can't remember what a UTF-8 Enabled DOS console is, however I think you're saying that the DOS console does not present the data correctly. Isn't that an issue with the DOS Console? I don't understand the standard you have referenced. As with the previous discussion of this matter, I don't know what is being discussed or what is expected. I believe Exiv2 is copying the data correctly from the file. Could you provide a suitable test file and relevant output from ExifTool? |
DOS Console: the default console uses a code page which does not handle UTF-8 characters. This is important because, without UTF-8 support, running exiv2 from the command line shows no output at all for the command I quoted when run against the image in question. One can set up a UTF-8 enabled console via a batch file to handle the output from exiv2, which then does show the output as in your example. Unfortunately, my example output is mangled in the GitHub window and I did not catch that the first time around.

Output for Iptc.Envelope.CharacterSet is 'correct' as far as it goes, but it ought to be translated to identify the intended character set as per the ISO standard I referenced. It is a very strange mapping of a very unusual sequence of characters to a specific character set and, for many images in my case, it does matter, although I can get away with ignoring it and defaulting to UTF-8. The Exiv2 IPTC page @ https://www.exiv2.org/iptc.html identifies it as a string of 'control functions'.

The output from Exiftool for the relevant section - Exiftool seems to output the data in an order I don't understand - but it does translate the control sequence to what seems to be the correct equivalent and FWIW, it (and utilities which use it as a base) is the only one I have found to do so, with the exception of the output from the dumpfile utility which is/was part of the Adobe XMP SDK distribution

Dumpfile output:

It also took me a good bit of time to understand what I was looking for, where I might find a reference and even how to read this reference.
and applying the necessary translation, we get The same page identifies a number of other control sequences, but labels several of these as 'deprecated'. And finally: From the Exiftool pages @ https://sno.phy.queensu.ca/~phil/exiftool/TagNames/IPTC.html#EnvelopeRecord 90 | CodedCharacterSet | string[0,32]! | (values are entered in the form "ESC X Y[, ...]". The escape sequence for UTF-8 character coding is "ESC % G", but this is displayed as "UTF8" for convenience. Either string may be used when writing. The value of this tag affects the decoding of string values in the Application and NewsPhoto records. This tag is marked as "unsafe" to prevent it from being copied by default in a group operation because existing tags in the destination image may use a different encoding. When creating a new IPTC record from scratch, it is suggested that this be set to "UTF8" if special characters are a possibility) |
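The kind of translation ExifTool is described as doing can be sketched in a few lines of C++. This is only an illustration of the idea under discussion: `describeCharacterSet` is a hypothetical helper, not an Exiv2 or ExifTool function, and the table covers only the designation sequences quoted in this thread from ISO/IEC 10646. Anything unrecognised is rendered in the readable "ESC X Y" form instead of being sent raw to the console.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Exiv2): map the raw bytes of
// Iptc.Envelope.CharacterSet to a human-readable name.
std::string describeCharacterSet(const std::string& raw) {
    // Designation sequences quoted from ISO/IEC 10646, clause 12.2.
    if (raw == "\x1b%G" || raw == "\x1b%/I") return "UTF-8";
    if (raw == "\x1b%/L") return "UTF-16BE";
    if (raw == "\x1b%/F") return "UTF-32BE";

    // Unknown sequence: show it as "ESC X Y" so that no control
    // character reaches the terminal.
    std::string readable;
    for (unsigned char c : raw) {
        if (!readable.empty()) readable += ' ';
        if (c == 0x1B) {
            readable += "ESC";
        } else if (c > 0x20 && c < 0x7F) {
            readable += static_cast<char>(c);  // printable ASCII
        } else {
            char hex[8];
            std::snprintf(hex, sizeof hex, "0x%02X", static_cast<unsigned>(c));
            readable += hex;
        }
    }
    return readable;
}
```

A caller would pass the three bytes read from the file, for example `describeCharacterSet("\x1b%G")`, and print the returned name in place of the raw value.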
Arnold. There's lots of detail here. However I don't know what you want me to change. Am I to change:
to output

Exiv2 is a library: we read the data from the file and give it to the caller. If it needs special treatment to display correctly in a DOS box, a GUI or any presentation layer, I'm not convinced that is our responsibility. For that matter, I'm not sure that Exiv2 can even determine that it is outputting to a DOS box. For sure, the library does not know. I suppose the exiv2(.exe) command-line program might be able to detect this. However the purpose of exiv2.exe is to act as a test harness for the library.
The short answer: yes, I would expect Exiv2 to translate the 'control sequence' to what it is meant to convey to the user of Exiv2 or the library, just as it 'translates' the labels, camera model, lens type etc. to something meaningful (to an English speaker :-) ). The reference to the DOS box was necessary - albeit (potentially) distracting - because Exiv2, Exiftool and even dumpfile are command line tools and as such depend to some degree on the idiosyncrasies of the 'terminal emulator' they are having to report to.
@tester0077 I'm going to close this. You're requesting something for which we have no specification. You may find somebody else willing to undertake this, however I will not. |
Well ... shrug ... your call. FWIW, this issue must have been considered in the past by Exiv2 developers.
It just doesn't seem to be used and I cannot afford to dig in deep enough to become conversant enough with the Exiv2 code just to figure out why this bit isn't used where and how it should have been. |
I apologise for saying "No" in such an abrupt manner. It's possible that the code exists. Andreas (the founder of Exiv2) included the iconv library. Team Exiv2 currently have two objectives:
I know you intend well by raising these issues. However, please remember we are a small team of volunteers. For v0.27.3, I'm working on matters which you have raised such as taglist, and README-SAMPLES.md. You are respected. However we cannot fix every concern you raise. |
Understood & no problem, Robin. |
After getting the IPTC data sorted out, I would like to record here my implementation.
Some of this code depends on wxWidgets - but can easily be adapted to other frameworks |
Thanks for this, Arnold. As Exiv2 usually builds/links iconv, I believe we already have character set conversion alternatives to wx, which avoids a build dependency between Exiv2 and (the huge) wx library. I think wstr can deal with this on MSVC/Windows. I don't want to get involved with this. Another member of the team may undertake this challenge, however my priorities are the 0.27 "dots" and working on the book/work-shop next year in Rennes. I feel that the centre of exiv2 is reading/writing/modifying metadata and I wish the library had never become involved in lens recognition, data convertors and other "data presentation/interpretation" matters.
I’d like to get Exiv2 v0.27.3 released (it was due on 2019-09-30). I’ve assigned you to review a couple of PRs. Dan’s busy moving house. When I last spoke to Luis (about 2 weeks ago), he said “hope to have more time for open-source later this year”. Can you “review and approve” your PRs? I’d really like Dan and/or Luis to do this as they always think of something smart and clever. However, it’s more important to get this stuff released.
Can you send me your email address and I will invite you to join the Team Exiv2 Chat Server on Riot/Matrix. That’s how the team discusses stuff “off-line”. It will take up very little of your time, however you can speak directly to the Team or to a team member. You’ll find this useful. You won’t find it intrusive.
exiv2-0.27.2 does not decode the data of the IPTC tag 0x005a correctly
For an image which contains the tag, the output from Exiv2 is
0x005a Envelope Iptc.Envelope.CharacterSet CharacterSet Character Set �%G <<<<<< should be translated to UTF-8
I am attaching an image which shows the output, but there was a discussion some time ago @ https://dev.exiv2.org/boards/3/topics/1288 where Robin showed the output from one of his images
C:\Users\rmills\Desktop>exiv2 -pi robin.jpg
Iptc.Envelope.ModelVersion Short 1 4
Iptc.Envelope.CharacterSet String 3 ←%G
In my case the output was from a command line such as:
D:>D:\pkg\C++\MSVC2017\exiv2-master-0.27.2\exiv2-0.27.2-Source\build32ReleaseStatic\bin\exiv2.exe -PIXxgklnt D:\wxIctest\Media\headst\indi\PerdueAnnElizabeth-GraveMarker-FAG-62151533_132727369452.jpg
Running under Windows 10,
With some searching, I have been able to track down the following bits and pieces
The output comes from code in the Exiv2 source file actions.cpp, somewhere between lines 662 and 758 or thereabouts.
The above output was taken from a UTF-8 enabled Win 10 DOS console window; without the UTF-8 facility the string �%G does not show.
Initially I found the issue in my test app, which is able to handle UTF-8 strings.
It seems this is a rather obscure part of the standards, but I have been able to find a reference in the ISO/IEC 10646 standard of 2017/12 on page 19
obtainable from: https://standards.iso.org/ittf/PubliclyAvailableStandards/c069119_ISO_IEC_10646_2017.zip
12.2 Identification of a UCS encoding scheme
When the escape sequences from ISO/IEC 2022 are used, the identification of a UCS encoding scheme (see
Clause 10) specified by this International Standard shall be by a designation sequence chosen from the following
list:
ESC 02/05 02/15 04/09
UTF-8 encoding form; UTF-8 encoding scheme
ESC 02/05 02/15 04/12
UTF-16 encoding form; UTF-16BE encoding scheme
ESC 02/05 02/15 04/06
UTF-32 encoding form; UTF-32BE encoding scheme
NOTE – The following designation sequences: ESC 02/05 02/15 04/00, ESC 02/05 02/15 04/01, ESC 02/05 02/15 04/03, ESC 02/05
02/15 04/04, ESC 02/05 02/15 04/07, ESC 02/05 02/15 04/08, ESC 02/05 02/15 04/10, ESC 02/05 02/15 04/11 used in previous
versions of this standard to identify implementation levels 1 and 2 are deprecated. The remaining designation sequences correspond
to the former level 3 which is now the only supported content definition for code unit sequences.
ESC 02/05 04/07
UTF-8 encoding form; UTF-8 encoding scheme
If such an escape sequence appears within a code unit sequence conforming to ISO/IEC 2022, it shall consist
only of the sequences of bit combinations as shown above.
If such an escape sequence appears within a code unit sequence conforming to this International Standard, it
shall be padded in accordance with Clause 11 when the identified encoding form is either UTF-16 or UTF-32.
No padding is necessary when the identified encoding form is UTF-8. See also 12.5.
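The `cc/rr` notation in the quoted clause gives the column and row of a byte in the 16x16 code table, so the byte value is `column * 16 + row`. The following minimal sketch (helper names are my own, chosen for illustration) turns a designation sequence written in that notation into raw bytes, which shows that `ESC 02/05 04/07` is exactly the `ESC % G` (0x1B 0x25 0x47) stored in the file.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Convert one "cc/rr" token (column/row of the code table) to its
// byte value: byte = column * 16 + row. Hypothetical helper name.
std::uint8_t columnRowToByte(const std::string& token) {
    const std::size_t slash = token.find('/');
    if (slash == std::string::npos)
        throw std::invalid_argument("expected cc/rr: " + token);
    const int column = std::stoi(token.substr(0, slash));
    const int row = std::stoi(token.substr(slash + 1));
    if (column < 0 || column > 15 || row < 0 || row > 15)
        throw std::out_of_range("column and row must be 0..15: " + token);
    return static_cast<std::uint8_t>(column * 16 + row);
}

// Parse a designation sequence such as "ESC 02/05 04/07" into bytes.
// "ESC" stands for the escape character, 0x1B.
std::vector<std::uint8_t> parseDesignation(const std::string& text) {
    std::vector<std::uint8_t> bytes;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        if (token == "ESC") {
            bytes.push_back(0x1B);
        } else {
            bytes.push_back(columnRowToByte(token));
        }
    }
    return bytes;
}
```

For example, `02/05` is 2 * 16 + 5 = 37 = 0x25 ('%') and `04/07` is 4 * 16 + 7 = 71 = 0x47 ('G').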
There are references to this 'conversion/translation' in convert.cpp at about line 1174, though in my limited testing this code was not accessed.
Still, I am not at all familiar with the workings of the Exiv2 code that deep in the bowels of the libraries and so I have shelved this issue at my end for now. FWIW, Exiftool does identify this 'code' correctly.
In the image I am attaching, there is no text which would require knowledge of the character set, but some other images in the same series definitely contain UTF-8 encoded strings, so that knowing the character set becomes important.
AFAIK, all this data was added by XnViewMP, which seems to stick with this character set by default. I would expect that there will be images about with different character sets, though I have not come across any others.
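To illustrate why the character set matters to a consumer of the data, here is a rough sketch of how an application might use the tag when decoding other IPTC strings. The function names are hypothetical, and the fallback to ISO-8859-1 (Latin-1) is purely an assumption for the sake of the example: nothing in this thread says which legacy encoding such files use, and defaulting to UTF-8 (as mentioned above) is an equally plausible policy.

```cpp
#include <string>

// Re-encode ISO-8859-1 (Latin-1) bytes as UTF-8. Each byte >= 0x80
// becomes a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: return an IPTC string value as UTF-8, using the
// raw bytes of Iptc.Envelope.CharacterSet to decide how to decode it.
std::string iptcValueToUtf8(const std::string& characterSet,
                            const std::string& value) {
    if (characterSet == "\x1b%G") {
        return value;  // ESC % G: the value is already UTF-8
    }
    // ASSUMPTION: treat everything else as Latin-1. A real application
    // would choose its own fallback.
    return latin1ToUtf8(value);
}
```

With the tag present, a UTF-8 string passes through untouched; without it, the same bytes would be re-encoded and come out mangled, which is the practical reason for reading the tag first.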