Problem with the docx file after the convert #81

Closed
JoHnTsIm opened this issue May 13, 2021 · 7 comments

Comments

@JoHnTsIm
Contributor

Hello to the community. I'm new to programming, so thanks in advance. I run the program in PyCharm; the conversion starts and seems to work without problems (Parsing Page... -> Creating Page... etc.). But when I go to the directory where my file was saved to check whether the conversion worked, I see what is shown in the attached picture: the docx content appears as pictures, in pieces, not as text. I was wondering if you have any idea why this is happening and how to fix it.
[Attached image: problem]

@dothinking dothinking self-assigned this May 13, 2021
@dothinking dothinking added question discussion good issue Good issue labels May 13, 2021
@dothinking
Collaborator

Hi, welcome. From the screenshot, I guess the "text" you saw is not real text. Can you copy and paste it? It'd be great if you could upload the PDF (one page that failed is enough) so I can test it.

@JoHnTsIm
Contributor Author

one_page.pdf

The PDF I want to convert.

one_page.docx

The converted docx.

I hope this helps.

@dothinking
Collaborator

Sorry, one limitation of pdf2docx is that it can process text-based PDFs only. Your PDF page consists of multiple image pieces, which are not OCR-ed but copied to the docx directly. The screenshot below shows the images in the PDF.

[Attached screenshot]
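
To check in advance whether a page is text-based or made of images, something like the following could be used. This is only a rough sketch, not part of pdf2docx: the names `classify_page` and `inspect_pdf` are made up for illustration, and it assumes a reasonably recent PyMuPDF (imported as `fitz`) where `Page.get_text()` and `Page.get_images()` are available.

```python
def classify_page(text: str, image_count: int) -> str:
    """Label a page from its extracted text and its number of embedded images."""
    if text.strip():
        return "text"        # some real, selectable text was found
    if image_count > 0:
        return "image-only"  # nothing to extract, only pictures
    return "empty"


def inspect_pdf(path: str) -> list:
    """Return one label per page of the PDF at `path` (hypothetical helper)."""
    import fitz  # PyMuPDF; imported lazily so classify_page works without it

    labels = []
    with fitz.open(path) as doc:
        for page in doc:
            labels.append(classify_page(page.get_text(), len(page.get_images())))
    return labels
```

A page labelled "image-only" would be expected to come out of the conversion as pictures rather than editable text. The check is crude: a scanned page that carries a hidden OCR text layer would be labelled "text".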

@JoHnTsIm
Contributor Author

OK, I will keep that in mind. Thanks for your quick reply. I don't know if you have already made a user interface, but I have made a basic, user-friendly interface for your program. Here is the code:

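from tkinter import Tk, Label, Entry, Button, END
from tkinter import filedialog

from pdf2docx import Converter

root = Tk()
root.title('PDF_2_Docx Converter')
root.geometry('500x500')
root.config(bg='grey')


def pdf_file_location():
    # Ask for the source PDF; an empty result means the dialog was cancelled.
    filename = filedialog.askopenfilename(filetypes=[('PDF files', '*.pdf')])
    if filename:
        file_path_pdf_entry.delete(0, END)  # replace any previous path
        file_path_pdf_entry.insert(0, filename)


def docx_folder_location():
    # Ask for the output folder and append a default file name.
    folder_selected = filedialog.askdirectory()
    if folder_selected:
        file_path_docx_entry.delete(0, END)  # replace any previous path
        file_path_docx_entry.insert(0, folder_selected + "/" + 'New_DOCX.docx')


def convert_button_function():
    # Convert all pages; close the converter even if the conversion fails.
    cv = Converter(file_path_pdf_entry.get())
    try:
        cv.convert(file_path_docx_entry.get(), start=0, end=None)
    finally:
        cv.close()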


# Labels
label1 = Label(text='PDF to Docx', font='Impact 40', bg='white', fg='#1E90FF')
label1.grid(column=2, row=1, sticky='n', pady=50, padx=120)


# Entries

# PDF file entry
file_path_pdf_entry = Entry(border=5)
file_path_pdf_entry.grid(ipadx=90, ipady=4, padx=20, sticky='nw', column=2, pady=1, row=2)

# Docx file entry
file_path_docx_entry = Entry(border=5)
file_path_docx_entry.grid(column=2, ipady=4, ipadx=90, padx=20, sticky='nw', pady=70, row=3)

# Buttons

# Convert Button
converter_button = Button(text='Convert', bg='#1E90FF', fg='white', font='impact 20', border=5,
                          command=convert_button_function)
converter_button.grid(padx=175, sticky='s', ipady=5, ipadx=10, column=2, row=4)

select_pdf_file = Button(text='Select PDF file', fg='black', bg='white', border=3,
                         command=pdf_file_location)
select_pdf_file.grid(column=2, sticky='ne', row=2, pady=6, padx=60)

select_new_file_folder = Button(text='Select new file folder', fg='black', bg='white', border=3,
                                command=docx_folder_location)
select_new_file_folder.grid(column=2, sticky='ne', row=3, pady=74, padx=26)


root.mainloop()

@dothinking
Collaborator

Much appreciated. It's a good idea; I'll put a GUI into the backlog.

Would you like to improve it a bit more, e.g. convert multiple PDF files under a user-defined folder in batch mode? After that, please submit a PR so I can merge your work into this library to benefit more people.

@JoHnTsIm
Contributor Author

JoHnTsIm commented May 15, 2021

By batch mode, do you mean saving it as a batch file and running it? I can also build it as a Windows exe. Which do you prefer?

@dothinking
Collaborator

With your user interface, one can convert one file at a time. But one might need to convert lots of PDF files; in that case, it's more convenient to put all the PDF files in a folder, select that folder, and convert them all in one go.
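
The folder-based batch mode described above could look roughly like this. It is a minimal sketch, not the code that ended up in the library: `plan_batch` and `convert_folder` are hypothetical names, and the only pdf2docx calls used are the ones already shown in this thread (`Converter(...)`, `convert(...)`, `close()`).

```python
from pathlib import Path


def plan_batch(pdf_folder, docx_folder=None):
    """Pair every *.pdf in pdf_folder with the .docx path it would be written to."""
    src = Path(pdf_folder)
    dst = Path(docx_folder) if docx_folder else src  # default: same folder
    return [(pdf, dst / (pdf.stem + ".docx")) for pdf in sorted(src.glob("*.pdf"))]


def convert_folder(pdf_folder, docx_folder=None):
    """Convert each PDF in turn; return the files that failed with their errors."""
    from pdf2docx import Converter  # imported lazily; only needed for real conversions

    failures = []
    for pdf, docx in plan_batch(pdf_folder, docx_folder):
        docx.parent.mkdir(parents=True, exist_ok=True)
        cv = None
        try:
            cv = Converter(str(pdf))
            cv.convert(str(docx), start=0, end=None)
        except Exception as exc:
            # Keep going so one bad file does not stop the whole batch.
            failures.append((pdf, exc))
        finally:
            if cv is not None:
                cv.close()
    return failures
```

In the GUI, the folder returned by `filedialog.askdirectory()` could be passed straight to `convert_folder`. Note that `glob("*.pdf")` is case-sensitive on some systems, so files ending in `.PDF` may be skipped.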

@dothinking dothinking linked a pull request May 16, 2021 that will close this issue