Make sure that DTBook XML encoding value is UTF-8 #318

Open
TamJ opened this issue Apr 7, 2015 · 10 comments

Comments

@TamJ
Contributor

TamJ commented Apr 7, 2015

It looks like 'us-ascii' is now being set as the XML encoding value. Could this be a bug from the latest XSLT/XProc updates?

@josteinaj
Member

It initially stores the file using us-ascii so that all non-ASCII characters are hex encoded; then it should set the encoding in the XML declaration to utf-8. Are there any errors or warnings in the logs?
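
The two-step approach described above can be illustrated roughly as follows. This is a minimal sketch, not the pipeline's actual code: the class `AsciiThenUtf8Sketch` and both method names are hypothetical, and it works on strings in memory rather than on files. It relies on the fact that text containing only ASCII characters is also valid UTF-8, which is why relabelling the declaration afterwards is safe.

```java
import java.util.regex.Matcher;

// Hypothetical illustration; not taken from the pipeline source.
final class AsciiThenUtf8Sketch {

    /** Replaces every non-ASCII code point with a hexadecimal character reference. */
    static String hexEncodeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
        });
        return out.toString();
    }

    /** Rewrites the encoding pseudo-attribute of a leading XML declaration, if present. */
    static String setDeclaredEncoding(String xml, String encoding) {
        return xml.replaceFirst(
            "^(<\\?xml[^>]*?encoding=)([\"'])[^\"']*\\2",
            "$1$2" + Matcher.quoteReplacement(encoding) + "$2");
    }
}
```

If the second step is skipped or fails, the content is still well-formed and correctly encoded, but the declaration keeps saying `us-ascii`, which matches the symptom reported in this issue.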

@josteinaj josteinaj added this to the 1.0.1 milestone Apr 7, 2015
@josteinaj josteinaj self-assigned this Apr 7, 2015
@TamJ
Contributor Author

TamJ commented Apr 7, 2015

No. So far I can't find anything in the logs. I've even tried validating a DTBook with 'us-ascii' encoding using the Nordic script, and no errors or warnings are shown. So it seems the files are correctly encoded, but the us-ascii value is still present in the encoding attribute.
I'll send over an example EPUB in a moment ...

@josteinaj
Member

ok, thanks.

@josteinaj
Member

I'm not able to reproduce this.

@TamJ: Which build of the migrator were you using?

@josteinaj josteinaj modified the milestones: 1.1.1, 1.1.0 May 11, 2015
@EdmarS

EdmarS commented Aug 12, 2015

We still experience the same issue in build 314. All HTML files in the EPUB 3 use utf-8 encoding, but the generated DTBook output is in us-ascii encoding.

@josteinaj
Member

I was not able to reproduce this earlier.

@EdmarS: I will send you login info to our test-server. See if you can reproduce it there.

@josteinaj
Member

Ok, we need to determine in what environment and for which books this issue occurs so that we can reproduce it. Here are some more detailed steps to collect information about the environment from a Windows install:

  1. make sure Pipeline 2 is running, and keep it running for the rest of these steps
  2. run a job that fails to set the correct encoding, and download the results as well as the detailed log file to your desktop; also, if you don't use the default options when running the job, make a note of the values used so that the run can be reproduced later
  3. go to http://localhost:9000/log and save the log to your desktop
  4. download and run dp2env.bat - I wrote this up today and haven't tested it thoroughly but it works in my Windows 7 VM at least. It will collect and store to a text file the following (shouldn't be anything too sensitive in this info, but don't post it to github, send it to me by e-mail!):
    • locale
    • cpu architecture
    • memory
    • windows version
    • java version
    • a list of all environment variables
    • a list of all running processes (only need the pipeline stuff but there's no grep in windows so all running processes will be included)
    • user permissions for the Pipeline 2 installation folder and application data folder
  5. you should now have three log files, the input file as well as the output file on your desktop, attach those to an e-mail addressed to me (or if the files are too big we'll find another way)
  6. some other info if relevant (I may have asked this before, but just to be sure...):
    • does it happen to all books or just some books in particular?
      • any idea what the difference can be with those books that do not work?
      • do they come from a particular supplier?
      • are they produced in a different way than the other books?
      • do they contain any special content that would distinguish them from other books?
      • are the books particularly large, or contain large files?
    • does it happen sporadically or does it always happen to certain books?
    • does it happen only periodically, i.e. some days it works, some days it doesn't?
    • was Pipeline 2 installed using the normal Windows installer?
    • does a reinstall help?
    • do you have administrator privileges on your computer?
    • was Pipeline 2 installed by the same user as the one who is using it? if not, do you know whether the user who installed it has administrator privileges?
    • are you running Pipeline 2 as a normal user but with administrator privileges?
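
For reference, part of the environment information listed in step 4 can also be read from inside a JVM. The sketch below is a hypothetical Java illustration, not the contents of dp2env.bat (which are not shown in this thread). It covers only locale, architecture, OS and Java versions, the default charset, JVM memory, and environment variables. It does not collect running processes or folder permissions, and the memory value is the JVM's maximum heap, not the machine's physical memory.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; class name and map keys are made up for illustration.
final class EnvReportSketch {

    /** Collects a few environment facts into an insertion-ordered map. */
    static Map<String, String> collect() {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("locale", Locale.getDefault().toString());
        info.put("os.arch", System.getProperty("os.arch"));
        info.put("os.name", System.getProperty("os.name"));
        info.put("os.version", System.getProperty("os.version"));
        info.put("java.version", System.getProperty("java.version"));
        info.put("default.charset", Charset.defaultCharset().name());
        // Maximum heap the JVM will try to use, in bytes (not physical RAM).
        info.put("jvm.maxMemoryBytes", Long.toString(Runtime.getRuntime().maxMemory()));
        // Environment variables may contain sensitive values; share them privately.
        System.getenv().forEach((name, value) -> info.put("env." + name, value));
        return info;
    }
}
```

As with the batch script, the output may include sensitive values from environment variables, so it is better sent by e-mail than posted publicly.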

@josteinaj
Member

Anders (SPSM) was able to provide a log containing an exception. I would still like to get log files from others who experience this problem so that I can compare the environments, but in any case this at least shows us where in the code the problem lies:

2015-08-18 08:04:06,927 [ERROR] com.xmlcalabash.library.DefaultStep - px:set-xml-declaration failed to read from C:\Users\Admin\AppData\Roaming\DAISY Pipeline 2\jobs\3efa03c6-e97b-4f9b-8964-fa39331fd4a2\output\output-dir\X40089A\X40089A.xml
java.nio.file.FileSystemException: C:\Users\Admin\AppData\Roaming\DAISY Pipeline 2\jobs\3efa03c6-e97b-4f9b-8964-fa39331fd4a2\output\output-dir\X40089A\X40089A.xml: The process cannot access the file because it is being used by another process.

    at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) ~[na:1.8.0_31]
    at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) ~[na:1.8.0_31]
    at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) ~[na:1.8.0_31]
    at sun.nio.fs.WindowsFileCopy.move(Unknown Source) ~[na:1.8.0_31]
    at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) ~[na:1.8.0_31]
    at java.nio.file.Files.move(Unknown Source) ~[na:1.8.0_31]
    at org.daisy.common.xproc.calabash.steps.SetXmlDeclarationProvider$SetXmlDeclaration.setXmlDeclaration(SetXmlDeclarationProvider.java:119) ~[na:na]
    at org.daisy.common.xproc.calabash.steps.SetXmlDeclarationProvider$SetXmlDeclaration.run(SetXmlDeclarationProvider.java:75) ~[na:na]
    at com.xmlcalabash.runtime.XAtomicStep.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XCompoundStep.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XChoose.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipelineCall.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XViewport.processStartElement(Unknown Source) ~[na:na]
    at com.xmlcalabash.util.ProcessMatch.traverse(Unknown Source) ~[na:na]
    at com.xmlcalabash.util.ProcessMatch.traverse(Unknown Source) ~[na:na]
    at com.xmlcalabash.util.ProcessMatch.traverse(Unknown Source) ~[na:na]
    at com.xmlcalabash.util.ProcessMatch.match(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XViewport.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XCompoundStep.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XChoose.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipelineCall.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.run(Unknown Source) ~[na:na]
    at org.daisy.common.xproc.calabash.impl.CalabashXProcPipeline.run(CalabashXProcPipeline.java:242) ~[na:na]
    at org.daisy.pipeline.job.Job.run(Job.java:216) ~[na:na]
    at org.daisy.pipeline.job.impl.DefaultJobExecutionService$1.run(DefaultJobExecutionService.java:110) ~[na:na]
    at java.lang.Thread.run(Unknown Source) ~[na:1.8.0_31]

So it attempts to change the encoding but fails because the file is already in use. It is unclear why this happens, so more debugging info (including answers to the questions I asked in the debugging instructions) is much appreciated.
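
One general way to reduce the chance of this kind of error is to make sure the rewritten file is fully closed before `Files.move` is called, since on Windows an open handle can prevent a file from being moved or replaced. The sketch below is an assumption-laden illustration, not the code in `SetXmlDeclarationProvider` (which is not shown in this thread); the class and method names are hypothetical. Whether an unclosed handle is the actual cause here is not established, and this pattern does not help if some other process or pipeline step holds the target file open.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of a close-before-move pattern.
final class CloseBeforeMoveSketch {

    /** Writes newContent to a sibling temp file, closes it, then moves it over target. */
    static void replaceFile(Path target, String newContent) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "rewrite-", ".tmp");
        try {
            // try-with-resources guarantees the writer (and its file handle)
            // is closed before the move below is attempted.
            try (BufferedWriter writer =
                     Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
                writer.write(newContent);
            }
            Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up if the write or move failed.
            Files.deleteIfExists(temp);
        }
    }
}
```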

josteinaj added a commit to daisy/pipeline-modules-common that referenced this issue Feb 12, 2016
- enabled xprocspec tests in common-utils, file-utils, fileset-utils and zip-utils
- added debug messages in Java to SetDoctype and SetXmlDeclaration
- in SetDoctype and SetXmlDeclaration, make sure the new file is closed before moving it (might help with nlbdev/nordic-epub3-dtbook-migrator#318)
- ability to force usage of XProc/XSLT implementation over Java implementation in px:copy-resource, px:file-peek, px:file-xml-peek, px:set-doctype and px:set-xml-declaration
- added test for both Java and XProc/XSLT implementation of px:set-doctype
- fixed xprocspec test for px:file-xml-peek
- moved unzip-fileset, and its tests, from zip-utils to fileset-utils to avoid a circular dependency between the two modules
  - also cleaned up some of the related filenames and step names
  - ...such as: px:unzip-fileset are now called px:fileset-unzip and are available from fileset-utils instead of zip-utils
  - bumped minor version of fileset-utils since it now has new features
josteinaj added a commit to daisy/pipeline-scripts-utils that referenced this issue Mar 30, 2016
rdeltour pushed a commit to daisy/pipeline-modules-common that referenced this issue Sep 16, 2016
@josteinaj
Member

Unfortunately not fixed by v1.2.0. It's still the same exception in the logs as previously reported.

@josteinaj
Member

josteinaj commented Sep 15, 2020

Reported again today by Martin (MTM).

We could possibly add a boolean option called, for instance, "hex-encode-non-ascii-characters", with a default value of true to preserve the current default behavior. Setting it to false would store the file directly as utf-8 and avoid the whole race condition (I think).
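
A rough sketch of what such an option could look like, assuming a hypothetical `serialize` helper (none of these names come from the pipeline). Note one deliberate simplification: when the flag is true, this sketch escapes non-ASCII characters in memory and still writes once, rather than reproducing the existing store-then-rewrite step. In both branches the declaration says utf-8 from the start, so no second pass over the file is needed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of the proposed option; not pipeline code.
final class EncodingOptionSketch {

    /** Builds the bytes to store; body is assumed to be already-serialized XML content. */
    static byte[] serialize(String body, boolean hexEncodeNonAsciiCharacters) {
        String content = hexEncodeNonAsciiCharacters ? escapeNonAscii(body) : body;
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" + content;
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    /** Stores the document in a single write, with no later rewrite or move. */
    static void store(Path target, String body, boolean hexEncodeNonAsciiCharacters)
            throws IOException {
        Files.write(target, serialize(body, hexEncodeNonAsciiCharacters));
    }

    private static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
        });
        return out.toString();
    }
}
```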
