New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compression error -2 #54
Comments
Thanks for providing this case. Lots of vector graphics, i.e. Besides, two more issues to convert this pdf:
So, unfortunately,
|
Thanks @dothinking for the clear explanation. I'm surprised this library isn't more popular than it is. The current version is already very good and I know a lot of people can benefit from it. Please let me know how I can help to resolve any of the issues you listed (I will need some guidance.) Whether resolving the bugs, testing, or otherwise. |
Thanks a lot @echan00. Some progress on this issue:
After that, any test or suggestions are appreciated. Comment on 2020-12-31: the latest PyMuPDF 1.18.5 solved this issue partly, but not perfectly, especially clipping path. |
Since inline image is supported in
xml structure results:
So, the idea is to create
|
Seems that floating picture with # -*- coding: utf-8 -*-
'''
Implement floating image based on python-docx.
- Text wrapping style: BEHIND TEXT <wp:anchor behindDoc="1">
- Picture position: top-left corner of PAGE `<wp:positionH relativeFrom="page">`.
Create a docx sample (Layout | Positions | More Layout Options) and explore the
source xml (Open as a zip | word | document.xml) to implement other text wrapping
styles and position modes per `CT_Anchor._anchor_xml()`.
'''
from docx.oxml import parse_xml, register_element_cls
from docx.oxml.ns import nsdecls
from docx.oxml.shape import CT_Picture
from docx.oxml.xmlchemy import BaseOxmlElement, OneAndOnlyOne
# refer to docx.oxml.shape.CT_Inline
class CT_Anchor(BaseOxmlElement):
"""
``<w:anchor>`` element, container for a floating image.
"""
extent = OneAndOnlyOne('wp:extent')
docPr = OneAndOnlyOne('wp:docPr')
graphic = OneAndOnlyOne('a:graphic')
@classmethod
def new(cls, cx, cy, shape_id, pic, pos_x, pos_y):
"""
Return a new ``<wp:anchor>`` element populated with the values passed
as parameters.
"""
anchor = parse_xml(cls._anchor_xml(pos_x, pos_y))
anchor.extent.cx = cx
anchor.extent.cy = cy
anchor.docPr.id = shape_id
anchor.docPr.name = 'Picture %d' % shape_id
anchor.graphic.graphicData.uri = (
'http://schemas.openxmlformats.org/drawingml/2006/picture'
)
anchor.graphic.graphicData._insert_pic(pic)
return anchor
@classmethod
def new_pic_anchor(cls, shape_id, rId, filename, cx, cy, pos_x, pos_y):
"""
Return a new `wp:anchor` element containing the `pic:pic` element
specified by the argument values.
"""
pic_id = 0 # Word doesn't seem to use this, but does not omit it
pic = CT_Picture.new(pic_id, filename, rId, cx, cy)
anchor = cls.new(cx, cy, shape_id, pic, pos_x, pos_y)
anchor.graphic.graphicData._insert_pic(pic)
return anchor
@classmethod
def _anchor_xml(cls, pos_x, pos_y):
return (
'<wp:anchor distT="0" distB="0" distL="0" distR="0" simplePos="0" relativeHeight="0" \n'
' behindDoc="1" locked="0" layoutInCell="1" allowOverlap="1" \n'
' %s>\n'
' <wp:simplePos x="0" y="0"/>\n'
' <wp:positionH relativeFrom="page">\n'
' <wp:posOffset>%d</wp:posOffset>\n'
' </wp:positionH>\n'
' <wp:positionV relativeFrom="page">\n'
' <wp:posOffset>%d</wp:posOffset>\n'
' </wp:positionV>\n'
' <wp:extent cx="914400" cy="914400"/>\n'
' <wp:wrapNone/>\n'
' <wp:docPr id="666" name="unnamed"/>\n'
' <wp:cNvGraphicFramePr>\n'
' <a:graphicFrameLocks noChangeAspect="1"/>\n'
' </wp:cNvGraphicFramePr>\n'
' <a:graphic>\n'
' <a:graphicData uri="URI not set"/>\n'
' </a:graphic>\n'
'</wp:anchor>' % ( nsdecls('wp', 'a', 'pic', 'r'), int(pos_x), int(pos_y) )
)
# refer to docx.parts.story.BaseStoryPart.new_pic_inline
def new_pic_anchor(part, image_descriptor, width, height, pos_x, pos_y):
"""Return a newly-created `w:anchor` element.
The element contains the image specified by *image_descriptor* and is scaled
based on the values of *width* and *height*.
"""
rId, image = part.get_or_add_image(image_descriptor)
cx, cy = image.scaled_dimensions(width, height)
shape_id, filename = part.next_id, image.filename
return CT_Anchor.new_pic_anchor(shape_id, rId, filename, cx, cy, pos_x, pos_y)
# refer to docx.text.run.add_picture
def add_float_picture(p, image_path_or_stream, width=None, height=None, pos_x=0, pos_y=0):
"""Add float picture at fixed position `pos_x` and `pos_y` to the top-left point of page.
"""
run = p.add_run()
anchor = new_pic_anchor(run.part, image_path_or_stream, width, height, pos_x, pos_y)
run._r.add_drawing(anchor)
# refer to docx.oxml.shape.__init__.py
register_element_cls('wp:anchor', CT_Anchor)
if __name__ == '__main__':
from docx import Document
from docx.shared import Inches, Pt
document = Document()
# add a floating image
p = document.add_paragraph()
add_float_picture(p, 'test.png', width=Inches(5.0), pos_x=Pt(20), pos_y=Pt(30))
# add text
p.add_run('Hello World'*50)
document.save('output.docx') |
Nice @dothinking, it looks like you know what the issues are exactly. I have a variety of PDFs I can help test once you're ready |
@dothinking thank you so much for your code sample! Solves my problem perfectly!!!! |
Didn't get time to this project for so long a time. New version
With this latest version, the sample pdf can be converted successfully, but still need lots of work to improve the quality of converted docx file, due to the complicated/gorgeous style. |
Wow this is a great upgrade. Thanks very much for your hard work @dothinking |
Close for now since this issue itself was resolved. Still need lots of efforts to improve the conversion quality for complicated layouts like this test file. |
Running into an error
compression error -2
. It would be great if anyone is able to provide some pointersAttached the PDF with the issue:
5_EN.pdf
Error message:
The text was updated successfully, but these errors were encountered: