Error: Spaces make a new line #1171

AlanReviews · 2020-11-05T20:44:32Z

For some documents whose first row contains text with spaces, the CSV file contains a newline instead of a space. For example, if the first row contains a, "Hello world", and b, and I use Tabula to capture that row, the CSV file comes out as:

a,"Hello
world",b

I expect the CSV file to be:

a, "Hello world", b
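
Until Tabula handles this itself, the exported file can be cleaned up afterwards. The following is a minimal sketch, not part of Tabula: it assumes the export is valid CSV with the broken cell still inside quotes (as in the example above), and the function name `flatten_cell_newlines` is made up for illustration. It parses the text with Python's standard `csv` module and replaces line breaks inside each cell with a single space.

```python
import csv
import io
import re


def flatten_cell_newlines(csv_text: str) -> str:
    """Replace line breaks inside CSV cells with a single space.

    Hypothetical helper for post-processing a Tabula export.
    """
    # newline="" keeps embedded \r and \n untouched so the csv module
    # can tell record separators from line breaks inside quoted cells.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Collapse each run of line breaks (plus surrounding whitespace)
        # into one space.
        writer.writerow([re.sub(r"\s*[\r\n]+\s*", " ", cell) for cell in row])
    return out.getvalue()


if __name__ == "__main__":
    sample = 'a,"Hello\nworld",b\n'
    print(flatten_cell_newlines(sample), end="")
```

The writer only quotes cells that need it, so the cleaned row is written without quotes around Hello world; spreadsheet programs read both forms the same way.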


jonardcaguioa · 2021-06-07T12:56:28Z

Hi. I can't raise an issue, so I'm posting here.

I encounter several issues when parsing lattice and stream tables. Is it possible to add a parameter such as a row or column tolerance, like in Camelot? Thanks.

nharrisanalyst · 2021-08-14T00:13:30Z

Hi, I can't raise an issue, so I am posting here.

Tabula opens to Keycloak (I don't even know what that is).

jeremybmerrill · 2021-08-14T00:42:36Z

Whatever Keycloak is, it's running on port 8080, blocking Tabula from using that port. Quit Keycloak, then restart Tabula, then Tabula should work.
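
For anyone who wants to confirm that something else is holding the port before starting Tabula, here is a minimal sketch using only Python's standard library. The helper name `port_in_use` is made up for illustration, and the check only looks at the local loopback address; port 8080 is the one mentioned above.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. when
        # another program is already listening on that port.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 8080 in use:", port_in_use(8080))
```

If this reports the port as in use while Tabula is not running, another program (Keycloak in the case above) has to be stopped first.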


dakial · 2021-09-03T12:10:22Z

Hi guys, sorry for posting this here but there is nowhere else to do it. Maybe you should enable discussions on this repo, so that tabula users can help each other.

My post here is about line breaks inside table cells, like this (this is a screen of Tabula output):

Tabula inserts those line breaks into the CSV, which causes chaos when importing into Excel: every line break starts a new row, which in turn breaks the quoted text strings, and so on.

Maybe there is a use case to keep those line breaks inside the cell but a "remove line breaks inside cells" flag would be a great feature.

Meanwhile, for my fellow users who are battling this problem: after much searching I've found a solution that works with both Notepad++ and Sublime. It is a regex find and replace (taken from a very helpful post on StackOverflow):

Use Notepad++ regex Find-and-Replace:

Find what:

(,"[^"]*?)[\r\n]+

Replace with:

$1

(There is a single space after $1.)

Repeatedly click "Replace All" until no more matches are found.

This works.
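
The same find-and-replace can be scripted. This is a rough sketch in Python that reuses the regex above unchanged; the function name `join_broken_cells` is made up for illustration. Like the editor version, it does not really parse CSV: it only catches quoted cells that follow a comma, so a broken cell in the first column is left alone. The loop plays the role of clicking "Replace All" repeatedly, because each pass joins only one line break per cell.

```python
import re

# Same pattern as in the Notepad++ recipe: a comma, an opening quote,
# some non-quote text, then one or more line-break characters.
PATTERN = re.compile(r'(,"[^"]*?)[\r\n]+')


def join_broken_cells(text: str) -> str:
    """Replace line breaks inside quoted cells with a space until none remain."""
    while True:
        # r"\1 " is the Python spelling of "$1 " (group 1 plus one space).
        text, count = PATTERN.subn(r"\1 ", text)
        if count == 0:
            return text


if __name__ == "__main__":
    print(join_broken_cells('a,"Hello\nworld",b\n'), end="")
```

To clean a whole file, read it with newline="" so the line breaks reach the regex untouched, pass the text through the function, and write the result back out.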

flywire · 2021-09-03T12:17:07Z

My post here is about line breaks inside table cells

In Excel they are entered as Alt+Enter.

undine-su-menulio · 2022-01-08T15:10:41Z

Hey! I can't raise an issue, so I am writing here. Could you please tell me what to do about a "Java Heap Error" when my PDF is only 50 KB and has only 1 page?

I've already tried multiple times to reinstall both Java and Tabula, restarted the whole computer, cleared the "Local/Temp" folder, cleared ports and changed 8080 to 9999, but nothing helps. OS: Windows 10

The PDF is 100% valid. I opened it this morning without any trouble, but then it suddenly stopped working. Since then Tabula returns "Sorry, your file upload could not be processed. Please double-check that the file you uploaded is a valid PDF file and try again", and I cannot see any of my previous files; it just shows "First time using Tabula? Welcome!"

I will be very grateful for any help 🙏

The error is the following:
2022-01-08 17:33:25.054:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@47089e5f{/,file:/C:/Users/User/AppData/Local/Temp/jetty-0.0.0.0-8080-tabula.jar-_-any-6136871881336378357.dir/webapp/,AVAILABLE}{file:/H:/Course/tabula-win-1.2.1/tabula/tabula.jar}
2022-01-08 17:33:25.056:WARN:oejsh.RequestLogHandler:main: !RequestLog
2022-01-08 17:33:25.068:INFO:oejs.ServerConnector:main: Started ServerConnector@2bb0e277{HTTP/1.1}{0.0.0.0:8080}
2022-01-08 17:33:25.069:INFO:oejs.Server:main: Started @8398ms
2022-01-08 17:33:35.840:WARN:oejs.ServletHandler:qtp1706234378-18: Error for /documents
java.lang.OutOfMemoryError: Java heap space
    at org.jruby.util.ByteList.ensure(ByteList.java:341)
    at org.jruby.util.io.EncodingUtils$1.resize(EncodingUtils.java:1197)
    at org.jruby.util.io.EncodingUtils.moreOutputBuffer(EncodingUtils.java:1513)
    at org.jruby.util.io.EncodingUtils.transcodeLoop(EncodingUtils.java:1415)
    at org.jruby.util.io.EncodingUtils.transcodeLoop(EncodingUtils.java:1312)
    at org.jruby.util.io.EncodingUtils.strTranscode0(EncodingUtils.java:936)
    at org.jruby.util.io.EncodingUtils.strTranscode(EncodingUtils.java:857)
    at org.jruby.util.io.EncodingUtils.strEncode(EncodingUtils.java:829)
    at org.jruby.RubyString.encode(RubyString.java:5398)
    at json.ext.Parser.convertEncoding(Parser.java:196)
    at json.ext.Parser.initialize(Parser.java:175)
    at json.ext.Parser$INVOKER$i$0$1$initialize.call(Parser$INVOKER$i$0$1$initialize.gen)
    at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:725)
    at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:208)
    at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:741)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:278)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:79)
    at org.jruby.RubyObject.callInit(RubyObject.java:348)
    at json.ext.Parser.newInstance(Parser.java:151)
    at json.ext.Parser$INVOKER$s$0$1$newInstance.call(Parser$INVOKER$s$0$1$newInstance.gen)
    at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:212)
    at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:208)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:338)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:183)
    at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:324)
    at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:74)
    at org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:84)
    at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:179)
    at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:165)
    at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:318)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:155)

john-harrold · 2022-01-08T15:20:49Z

Howdy. I don't have any affiliation with this project. I just follow it on github. Can you share your pdf? I can try it out on my computer and see if I can figure out what is going on.

undine-su-menulio · 2022-01-08T15:26:37Z

Thank you! Here

john-harrold · 2022-01-08T16:03:13Z

So I was able to extract the tables. Here is the result:
tabula-un1.csv

I'm running on a mac. It's weird but I cannot figure out what version of Java I have installed. Like

java --version

Says it cannot find it, but it's got to be installed somewhere because Tabula is running :). I don't want to muck around with Java and end up breaking it. Hopefully this extraction works for you.

undine-su-menulio · 2022-01-08T16:07:56Z

Thank you so much for your help! Wish you all the best things in the world and all the blessings! ☀️
Hope one day I will figure out what gets in the way in my case

ginozzzz · 2022-01-22T06:28:53Z

Great job guys! thanks so much!

now... not sure how to raise an issue because some links are broken...

Importing a table that contains '(' brackets causes the software to omit the entire line.

ictalbaniaco · 2022-08-31T17:58:35Z

Great tool, and thank you for your great work!
I just wanted to post an issue that I encountered:
I'm using tabula-win-1.2.1 on Windows 11 and I'm exporting tables from PDF files that contain only tables (same columns and rows all the way down). Each PDF file contains 90-100 pages.
The problem is that it doesn't export more than 3300 rows.
I tested it on 9 different files and the result is the same: no more than 3300 rows.
Thank you again for your work!

OldGuy1949 · 2023-11-19T18:23:37Z

Great tool! We use it to parse PDF files that appear to be the same format to us humans, but drive Tabula nuts in either stream or lattice mode. Culprits include formatting characters used by the PDF creators (different people, I assume). We find spaces, tabs and other unprintable characters in the CSV output. The files themselves always present as header, trailer and intermediate details, with page headers and trailers in the details.

May I suggest that since we humans recognize the format, allowing us to specify vertical columns (groups) and horizontal rows (elements) might remove reliance on recognizing said groups and elements. Were I a seasoned open-source developer who had time on his hands (I'm neither), I'd look at the code and see where / how this might work.

jeremybmerrill and others added 5 commits April 13, 2017 22:54

tweaks to deal with changes in the pdfbox 2.0 version of tabula-java

b340017

adds PDFBox2 thumbnail renderer; deletes JPedal

33c8588

remove require

89e93a6

remove STDERR.puts

8a10aa2

re-adds require (at teh bottom)

40fc33e

only for when we're testing thumbnail generation on the command line
