Problem with the docx file after the convert #81

JoHnTsIm · 2021-05-13T16:12:32Z

Hello to the community, I'm new to programming, so thanks in advance. I run the program in PyCharm; the conversion starts and seems to work without problems (Parsing Page... -> Creating Page... etc.). But when I go to the directory where my file was saved to check whether the conversion worked, I see what is shown in the attached picture: the docx content appears as pictures, in pieces, not as text. I was wondering if you have any idea why this is happening and how to fix it.

dothinking · 2021-05-13T17:53:00Z

Hi, welcome. From the screenshot, I guess the "text" you saw is not real text. Can you copy and paste the text? It'd be great if you could upload the PDF (the one page that failed is enough) for me to test.

JoHnTsIm · 2021-05-13T18:04:50Z

one_page.pdf

The PDF I want to convert.

one_page.docx

The converted docx.

I hope this helps.

dothinking · 2021-05-13T18:13:56Z

Sorry, one limitation of pdf2docx is that it can process text-based PDFs only. Your PDF page consists of multiple image pieces, which are not OCR-ed but copied to the docx directly. The screenshot below shows the images in the PDF.
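
One way to tell in advance whether a page falls under this limitation is to check if it contains any extractable text. The following is a minimal sketch, assuming a recent PyMuPDF (the `fitz` module) is installed; `classify_page` and `classify_pdf` are hypothetical helper names for illustration, not part of pdf2docx.

```python
def classify_page(text: str, image_count: int) -> str:
    """Label a page from its extracted text and its number of embedded images."""
    if text.strip():
        return "text"        # real text is present and can be converted
    if image_count > 0:
        return "image-only"  # pictures only; would need OCR before conversion
    return "empty"


def classify_pdf(pdf_path: str) -> list:
    """Return one label per page of the PDF at pdf_path."""
    import fitz  # PyMuPDF; imported here so classify_page works without it

    with fitz.open(pdf_path) as doc:
        return [
            classify_page(page.get_text(), len(page.get_images()))
            for page in doc
        ]
```

A page labelled `image-only` would be expected to come out as pasted pictures in the docx, as described above.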

JoHnTsIm · 2021-05-13T18:37:41Z

OK, I will keep that in mind. Thanks for your quick reply. I don't know if you have already made a user interface, but I have made a basic, user-friendly interface for your program, and here is the code for it.

from pdf2docx import Converter
from tkinter import *
from tkinter.filedialog import *
from tkinter import filedialog

root = Tk()
root.title('PDF_2_Docx Converter')
root.geometry('500x500')
root.config(bg='grey')


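def pdf_file_location():
    # Ask for the source PDF; the dialog returns '' if the user cancels.
    filename = askopenfilename(filetypes=[('PDF files', '*.pdf')])
    if filename:
        file_path_pdf_entry.delete(0, END)  # replace any earlier selection
        file_path_pdf_entry.insert(0, filename)


def docx_folder_location():
    # Ask for the output folder; the converted file is always named New_DOCX.docx.
    folder_selected = filedialog.askdirectory()
    if folder_selected:
        file_path_docx_entry.delete(0, END)
        file_path_docx_entry.insert(0, folder_selected + '/New_DOCX.docx')


def convert_button_function():
    # Convert all pages of the selected PDF to the selected docx path.
    cv = Converter(file_path_pdf_entry.get())
    try:
        cv.convert(file_path_docx_entry.get(), start=0, end=None)
    finally:
        cv.close()  # release the PDF even if the conversion fails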


"""Labels"""
label1 = Label(text='PDF to Docx', font='Impact 40', bg='white', fg='#1E90FF')
label1.grid(column=2, row=1, sticky='n', pady=50, padx=120)


"""Entries"""

# PDF file entry
file_path_pdf_entry = Entry(border=5)
file_path_pdf_entry.grid(ipadx=90, ipady=4, padx=20, sticky='nw', column=2, pady=1, row=2)

# Docx file entry
file_path_docx_entry = Entry(border=5)
file_path_docx_entry.grid(column=2, ipady=4, ipadx=90, padx=20, sticky='nw', pady=70, row=3)

"""Buttons"""

# Convert Button
converter_button = Button(text='Convert', bg='#1E90FF', fg='white', font='impact 20', border=5,
                          command=convert_button_function)
converter_button.grid(padx=175, sticky='s', ipady=5, ipadx=10, column=2, row=4)

select_pdf_file = Button(text='Select PDF file', fg='black', bg='white', border=3,
                         command=pdf_file_location)
select_pdf_file.grid(column=2, sticky='ne', row=2, pady=6, padx=60)

select_new_file_folder = Button(text='Select new file folder', fg='black', bg='white', border=3,
                                command=docx_folder_location)
select_new_file_folder.grid(column=2, sticky='ne', row=3, pady=74, padx=26)


root.mainloop()

dothinking · 2021-05-14T02:58:24Z

Much appreciated. It's a good idea; I'll put a GUI into the backlog.

Would you like to improve it a bit more, e.g. convert multiple PDF files under a user-defined folder in batch mode? After that, please submit a PR so I can merge your work into this library to benefit more people.

JoHnTsIm · 2021-05-15T00:22:17Z

By batch mode, do you mean saving it as a batch file and running it? I can also make it a Windows exe. Which do you prefer?

dothinking · 2021-05-15T00:46:27Z

With your user interface, one can convert one file at a time. But one might need to convert lots of PDF files; in that case, it's more convenient to put all the PDF files in a folder, select that folder, and convert them all in one go.
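
The folder-based batch mode described here could be sketched roughly as follows. This is only an illustration under assumptions: `docx_path_for`, `find_pdfs`, and `convert_folder` are hypothetical names, and each docx is written next to its source PDF. The `Converter` calls are the same ones used in the GUI code earlier in this thread.

```python
from pathlib import Path


def docx_path_for(pdf_path: Path) -> Path:
    """Place the output next to the source, swapping the extension."""
    return pdf_path.with_suffix(".docx")


def find_pdfs(folder: Path) -> list:
    """Return the PDF files directly inside folder, sorted by path."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def convert_folder(folder: str) -> list:
    """Convert every PDF in folder; return the paths that failed."""
    from pdf2docx import Converter  # imported lazily; requires pdf2docx

    failed = []
    for pdf in find_pdfs(Path(folder)):
        try:
            cv = Converter(str(pdf))
            try:
                cv.convert(str(docx_path_for(pdf)), start=0, end=None)
            finally:
                cv.close()
        except Exception:
            failed.append(pdf)  # keep going so one bad file does not stop the batch
    return failed
```

In a GUI like the one above, the folder passed to `convert_folder` could come from `filedialog.askdirectory()`.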

dothinking self-assigned this May 13, 2021

dothinking added the question, discussion, and good issue labels May 13, 2021

JoHnTsIm closed this as completed May 14, 2021

dothinking linked a pull request May 16, 2021 that will close this issue

User Interface for the program #82

Closed
