fix unicsv reader truncation differences on i386/debian #1201

tsteven4 · 2023-10-28T00:13:32Z

with 1.9.0 debian had some test failures on i386.
in the unicsv test with reference unicsv_subsecond.csv
13:20:20.060 was echoed as 13:20:20.059.
13:20:21.600 was echoed as 13:20:21.599.

The same bug also caused a failure in the track test with reference utm_subsecond_track~csv.csv.

These variations aren't serious, but this PR uses an integer arithmetic conversion instead of a floating point arithmetic to avoid any variation in the intentional truncation to milliseconds.

I also see an error in the nmea test with reference nmeadatetime.csv. This is a rounding difference. It is not fixed.
I also see an error in the xcsv test with reference datetime~xcsv.xcsv. This is a Y2038 bug. It is not fixed.

with 1.9.0 debian had some test failures on i386. in the unicsv test with reference unicsv_subsecond.csv 13:20:20.060 was echoed as 13:20:20.059. 13:20:21.600 was echoed as 13:20:21.599. These variations aren't serious, but this PR uses an integer arithmetic conversion instead of a floating point arithmetic to avoid any variation in the intentional truncation to milliseconds.

codacy-production · 2023-10-28T00:19:12Z

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation	Diff coverage
+0.00%	100.00%

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`c95e1cf`)	23128	16018	69.26%
Head commit (`e5b3093`)	23130 (+2)	16020 (+2)	69.26% (+0.00%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#1201)	4	4	100.00%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings Change summary preferences

robertlipe · 2023-10-28T01:06:35Z

unicsv.cc

@@ -297,14 +296,17 @@ UnicsvFormat::unicsv_parse_time(const char* str, QDate& date)
    str += consumed;
    date = ldate;
  }
-  int ct = sscanf(str, "%d%*1[.://]%d%*1[.://]%d%lf", &hour, &min, &sec, &frac_sec);
+  char msecstr[4];
+  int ct = sscanf(str, "%d%*1[.://]%d%*1[.://]%d.%3[0123456789]", &hour, &min, &sec, msecstr);


I know you're tweaking that's there, but this passes testo:

@@ -310,6 +311,11 @@ UnicsvFormat::unicsv_parse_time(const char* str, QDate& date) } QTime result{hour, min, sec, msec}; +#if 1 + QTime qt = QTime::fromString(QString(str).section(" ", 0, 0).trimmed(), Qt::ISODateWithMs); + qDebug() << result << qt; + assert(result == qt); +#endif if (!result.isValid()) { fatal(FatalMsg() << MYNAME << ": Invalid time parsed from string (" << str << ")."); } Which makes this a drop-in replacement for the original: diff --git a/unicsv.cc b/unicsv.cc index 92ca8504..5a24b85f 100644 --- a/unicsv.cc +++ b/unicsv.cc @@ -282,12 +282,14 @@ UnicsvFormat::unicsv_parse_date(const char* str, int* consumed) QTime UnicsvFormat::unicsv_parse_time(const char* str, QDate& date) { +#if 0 int hour; int min; int sec; int msec; - int consumed = 0; double frac_sec; +#endif + int consumed = 0; /* If we have something we're pretty sure is a date, parse that * first, skip over it, and pass that back to the caller) @@ -297,6 +299,8 @@ UnicsvFormat::unicsv_parse_time(const char* str, QDate& date) str += consumed; date = ldate; } +#if 0 + qDebug() << str; int ct = sscanf(str, "%d%*1[.://]%d%*1[.://]%d%lf", &hour, &min, &sec, &frac_sec); if (ct < 3) { fatal(FatalMsg() << MYNAME << ": Could not parse time string (" << str << ")."); @@ -310,6 +314,11 @@ UnicsvFormat::unicsv_parse_time(const char* str, QDate& date) } QTime result{hour, min, sec, msec}; +#else + QTime result; + + result = QTime::fromString(QString(str).trimmed().section(" ", 0, 0), Qt::ISODateWithMs); +#endif if (!result.isValid()) { fatal(FatalMsg() << MYNAME << ": Invalid time parsed from string (" << str << ")."); }

robertlipe · 2023-10-28T01:22:31Z

I am, of course, OK with the proposal if you like it better. I can sweep back with the second form. But I'm 95% sure I've read that Y2038 issues is one of the reasons that 32-bit support in Linux in general is going away - the issues are so deep and there are so few people left that it's predicted to be removed instead of fixed. So I'm not sure I see any needed heroics there.

…

On Fri, Oct 27, 2023 at 7:19 PM codacy-production[bot] < ***@***.***> wrote: Coverage summary from Codacy See diff coverage on Codacy <https://app.codacy.com/gh/GPSBabel/gpsbabel/coverage/pull-requests/1201?utm_source=github.com&utm_medium=coverageSummary&utm_campaign=coverageSummaryPullRequest> Coverage variation Diff coverage *+0.00%* *100.00%* Coverage variation details Coverable lines Covered lines Coverage Common ancestor commit (c95e1cf <c95e1cf> ) 23128 16018 69.26% Head commit (e5b3093 <e5b3093> ) 23130 (+2) 16020 (+2) 69.26% (*+0.00%*) *Coverage variation* is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit> Diff coverage details Coverable lines Covered lines Diff coverage Pull request (#1201 <#1201>) 4 4 *100.00%* *Diff coverage* is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100% See your quality gate settings <https://app.codacy.com/gh/GPSBabel/gpsbabel/settings/quality?utm_source=github.com&utm_medium=coverageSummary&utm_campaign=coverageSummaryGates> Change summary preferences <https://app.codacy.com/gh/GPSBabel/gpsbabel/settings/integrations?utm_source=github.com&utm_medium=coverageSummary&utm_campaign=coverageSummarySettings> — Reply to this email directly, view it on GitHub <#1201 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACCSD3ZC3FOJAVLMJTALAP3YBRFQ3AVCNFSM6AAAAAA6TQOBHCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOBTGYZTENJWG4> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

tsteven4 · 2023-10-28T12:01:10Z

It sure is cleaner, but we have been burned with QTime::fromString before, but apparently not from Qt::ISODateWithMs although I thought I had tried that back then.

gpsbabel/nmea.cc

Lines 351 to 355 in ae4ac82

    
           QTime NmeaFormat::nmea_parse_hms(const QString& str) 
        
           { 
        
             // QTime::fromString z expects 1 to 3 digit fractional part of seconds. 
        
             // It specifically does not accept 0 digits or > 3 digits. 
        
             // QTime::fromString zzz expects exactly 3 digits representing milliseconds.

Does section(" ", 0, 0) do anything?

I do see test changes, I added the last one "22,33.784805,-117.474327,"WPT022",2014/09/17,23:59:59.999999999999". So it appears QTime::fromString may be rounding, but is smart enough not to round to an illegal QTime.

Running unicsv.test
--- ./reference/unicsv_subsec~csv.csv   2023-10-28 05:45:52.092839226 -0600
+++ /tmp/gpsbabel.2699/unicsv_subsec.csv        2023-10-28 05:52:58.487210034 -0600
@@ -17,7 +17,8 @@
 16,33.784805,-117.474327,"WPT016",2014/09/17,13:20:23
 17,33.784805,-117.474327,"WPT017",2014/09/17,13:20:23.690
 18,33.784805,-117.474327,"WPT018",2014/09/17,13:20:23.699
-19,33.784805,-117.474327,"WPT019",2014/09/17,13:20:23.699
-20,33.784805,-117.474327,"WPT020",2014/09/17,13:20:23.699
-21,33.784805,-117.474327,"WPT021",2014/09/17,13:20:23.699
-22,33.784805,-117.474327,"WPT022",2014/09/17,13:20:23.699
+19,33.784805,-117.474327,"WPT019",2014/09/17,13:20:23.700
+20,33.784805,-117.474327,"WPT020",2014/09/17,13:20:23.700
+21,33.784805,-117.474327,"WPT021",2014/09/17,13:20:23.700
+22,33.784805,-117.474327,"WPT022",2014/09/17,13:20:23.700
+23,33.784805,-117.474327,"WPT022",2014/09/17,23:59:59.999

tsteven4 · 2023-10-28T12:59:37Z

QTime::fromString is quite complicated, rounding, watching for rounding up to cause an invalid QTime, potentially limiting at 23:59:59.999 or rolling over into the next day.
https://github.com/qt/qtbase/blob/6.2.4/src/corelib/time/qdatetime.cpp#L2182-L2211

tsteven4 · 2023-10-28T13:26:48Z

@robertlipe some questions:

The unicsv date and time parsing came in with e3f54a668, and has a tortured 24 year history of rounding problems.

Isn't the format specifier %1[-.//] the same as %1[-./]? (unicsv_parse_date) We are not escaping a backslash.
Isn't the format specifier %1[.://] the same as %1[.:/]? (unicsv_parse_time) We are not escaping a backslash.
I can imagine a date using a '-', '.', '/' as a separator, e.g. 10.11.2022 or 10-11-2022 or 10/11/2022, so those three separators make some sense. But I have a harder time imaging a time using '.' or '/' as a separator, e.g. 12.10.55 or 12/10/55, wouldn't everyone use 12:10:55 if they used a separator?

tsteven4 · 2023-10-28T13:29:18Z

https://www.ibm.com/docs/en/i/7.3?topic=80-timsep-time-separator-keyword-display-files suggests time separators of colon, period, comma or blank.

tsteven4 · 2023-10-28T13:30:17Z

https://docs.oracle.com/cd/E19455-01/806-0169/overview-6/index.html suggests time separators of a colon or period.

GPSBabelDeveloper · 2023-10-28T22:30:55Z

I certainly won't hard sell it. It was just the reading that regex made my head hurt. Your subsequent read of it made it hurt more. Simplified() was to fix one case with a leading space that failed. Split() was to fix a case passed in with a trailing " AM". I'm fine with the original - or anything else you bake - I was just pointing out this had been a problem for us in the past and that we had the opportunity to replace it with a simpler looking fragments that surely has NEW and exciting problems. I'm fine with taking your change and never thinking about this again.

…

On Sat, Oct 28, 2023, 8:30 AM tsteven4 ***@***.***> wrote: https://docs.oracle.com/cd/E19455-01/806-0169/overview-6/index.html suggests time separators of a colon or period. — Reply to this email directly, view it on GitHub <#1201 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC3VAD3G3GIO73JLOBPQXG3YBUCHLAVCNFSM6AAAAAA6TQOBHCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOBTHAYTCMBWG4> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

robertlipe · 2023-10-29T21:55:20Z

I"m not sure about #1 vs #2, but I agree that the // is bogus. For #3, I've never seen periods used, but the original submission (from Olaf, in Germany) said they were used there. Are they common? I have no idea. Quoting the dot may be necessary to stop it from matching any single character.

I was experimenting with letting Bard write them - since I hate writing regexes AND reading them - it produced an overkill solution that's a bit less clever and therefore, IMO, a bit easier to read:
https://g.co/bard/share/3c91a625a6ba
I like that it gives examples of wht it will and will not read. It almost writes our test cases.
This is a bit more on parity with our current parser:
https://g.co/bard/share/95e0b1fc56a6
Even if we don't use that, it clarifies that [.:] (no escaping on dot needed after all) is fine.

But I really don't want to gum this up.

tsteven4 · 2023-10-29T22:07:23Z

Are our wires crossed? How did we get from a format string to a regular expression? Bard is interesting, I have used other online regular expression checkers when I have to go there.

tsteven4 · 2023-10-29T22:08:28Z

After looking at the Qt QDateTime/QDate/QTime::fromString code I don't feel as bad about the amount of thrashing we have done trying to get these conversions correct.

robertlipe · 2023-10-29T22:15:29Z

My wires are just plain wrong. Sorry. When I scanned my mail today, that scanf string locked into my brain as a regex. The syntax is vaguely similar and that's a weapon we use elsewhere, so that sort of made sense to my impaired brain. Agreed that accepting free form, unstructured text is generally horrible. Humans just tend to not communicate in protobufs. :-) RJL

…

On Sun, Oct 29, 2023 at 5:08 PM tsteven4 ***@***.***> wrote: After looking at the Qt QDateTime/QDate/QTime::fromString code I don't feel as bad about the amount of thrashing we have done trying to get these conversions correct. — Reply to this email directly, view it on GitHub <#1201 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACCSD36MRQCNVU3KKNM45C3YB3HWPAVCNFSM6AAAAAA6TQOBHCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOBUGIZTSMZUGQ> . You are receiving this because you were mentioned.Message ID: ***@***.***>

robertlipe reviewed Oct 28, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix unicsv reader truncation differences on i386/debian #1201

fix unicsv reader truncation differences on i386/debian #1201

tsteven4 commented Oct 28, 2023

codacy-production bot commented Oct 28, 2023

robertlipe Oct 28, 2023 •

edited by tsteven4

robertlipe commented Oct 28, 2023 via email

tsteven4 commented Oct 28, 2023

tsteven4 commented Oct 28, 2023

tsteven4 commented Oct 28, 2023

tsteven4 commented Oct 28, 2023

tsteven4 commented Oct 28, 2023

GPSBabelDeveloper commented Oct 28, 2023 via email

robertlipe commented Oct 29, 2023 •

edited by tsteven4

tsteven4 commented Oct 29, 2023

tsteven4 commented Oct 29, 2023

robertlipe commented Oct 29, 2023 via email

fix unicsv reader truncation differences on i386/debian #1201

Are you sure you want to change the base?

fix unicsv reader truncation differences on i386/debian #1201

Conversation

tsteven4 commented Oct 28, 2023

codacy-production bot commented Oct 28, 2023

Coverage summary from Codacy

See diff coverage on Codacy

See your quality gate settings Change summary preferences

robertlipe Oct 28, 2023 • edited by tsteven4

Choose a reason for hiding this comment

robertlipe commented Oct 28, 2023 via email

tsteven4 commented Oct 28, 2023

tsteven4 commented Oct 28, 2023

tsteven4 commented Oct 28, 2023

tsteven4 commented Oct 28, 2023

tsteven4 commented Oct 28, 2023

GPSBabelDeveloper commented Oct 28, 2023 via email

robertlipe commented Oct 29, 2023 • edited by tsteven4

tsteven4 commented Oct 29, 2023

tsteven4 commented Oct 29, 2023

robertlipe commented Oct 29, 2023 via email

robertlipe Oct 28, 2023 •

edited by tsteven4

robertlipe commented Oct 29, 2023 •

edited by tsteven4