Support extracting end-of-line comments? #27

nickrobinson251 · 2021-10-05T14:41:02Z

Suppose we have a line like:

From PowerFlowData.jl/test/testfiles/synthetic_data_v30.raw:

     111,'STBC      ',161.00,1,    0.00,    0.00,227,   1,1.09814,  -8.327,  1 /* [STBC   1   ] */

Can we support users who want to extract the end-of-line comment "[STBC 1 ]", or even just "STBC 1"?
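A minimal sketch of that extraction in plain Julia (Base only), assuming the comment is always delimited by /* and */. The names extract_comment and clean_comment are hypothetical and not part of PowerFlowData.jl; this works on a single line of text, not on the package's actual parsing path.

```julia
# Hypothetical helpers (not part of PowerFlowData.jl): split a raw line into its
# data part and an optional trailing /* ... */ comment.
function extract_comment(line::AbstractString)
    open_idx = findfirst("/*", line)          # a UnitRange, or nothing if absent
    open_idx === nothing && return (String(line), missing)
    close_idx = findlast("*/", line)
    # If the comment is never closed, take everything after "/*".
    stop = close_idx === nothing ? lastindex(line) : prevind(line, first(close_idx))
    comment = strip(line[nextind(line, last(open_idx)):stop])
    data = rstrip(line[1:prevind(line, first(open_idx))])
    return (String(data), String(comment))
end

# Optionally drop the surrounding brackets and collapse repeated spaces,
# turning "[STBC   1   ]" into "STBC 1".
clean_comment(c::AbstractString) = join(split(strip(c, ['[', ']'])), " ")
```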


nickrobinson251 · 2022-02-09T18:03:45Z

This may require reverting to the approach we were using before #28.

nickrobinson251 · 2022-02-22T11:12:31Z

@raphaelsaavedra discovered that these actually come in two forms (from different sources):

1. The "trailing characters" case, given above, which I presumed was some kind of end-of-line comment (with /* as a comment marker):

 111,'STBC      ',161.00,1,    0.00,    0.00,227,   1,1.09814,  -8.327,  1 /* [STBC   1   ] */

2. The "extra column" case (note the final , separator):

 111,'STBC      ',161.00,1,    0.00,    0.00,227,   1,1.09814,  -8.327,  1, /* [STBC   1   ] */
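Telling the two forms apart only requires looking at what precedes the /* marker. A hypothetical helper (comment_style is not a PowerFlowData.jl function) might look like this, assuming /* never appears inside the data columns themselves:

```julia
# Hypothetical helper (not a PowerFlowData.jl function): report which of the two
# forms a line uses, by looking at what precedes the "/*" marker.
function comment_style(line::AbstractString)
    idx = findfirst("/*", line)
    idx === nothing && return :none
    before = rstrip(line[1:prevind(line, first(idx))])
    return endswith(before, ",") ? :extra_column : :trailing
end

trailing = " 111,'STBC      ',161.00,1,    0.00,    0.00,227,   1,1.09814,  -8.327,  1 /* [STBC   1   ] */"
extra    = " 111,'STBC      ',161.00,1,    0.00,    0.00,227,   1,1.09814,  -8.327,  1, /* [STBC   1   ] */"
comment_style(trailing)   # :trailing
comment_style(extra)      # :extra_column
```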

And we may need to support both.

Fortunately, I think we can support both.

- We probably need to add an extra Union{Missing,String} column to all Records (maybe this String type could be String31 or similar, or even be detected during parsing, e.g. the smallest string type possible, like CSV.jl does).
- We might want the ability to opt in to parsing them (i.e. returning them as part of the parsed data), e.g. via a comments=true keyword. This could either default to false, or default to "true if present" and then auto-detect whether comments are present (e.g. by checking the first line of the Buses data).
- For the "trailing characters" (comments) case, we need to handle hitting an invalid delimiter:
    - Currently we rely on Parsers.jl to parse the correct value from the last column (e.g. 1) but consume up to the newline character (e.g. 1 /* [STBC 1 ] */), setting the code to OK | INVALID_DELIMITER | ...; we then just use the parsed value (1) and ignore the invalid delimiter (as if it were expected).
    - Instead, we'd need to go back to treating the last column of each line differently, by passing the comment character as the delimiter.
    - Potentially this could depend on a comments::Bool keyword.
- For the "extra column" case, I guess we'd need to follow all the current last columns with a _parse_maybemissing call.
    - Potentially this could depend on a comments::Bool keyword.
- All of this is complicated slightly further when records span more than one line (e.g. Transformers, Multi-Terminal DC lines, etc.).
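The two strategies for the last column can be contrasted in a small string-based sketch. Assumptions: the last column is an integer, the comment marker is /*, and the names last_column_tolerant and last_column_commented are hypothetical. The tolerant version only imitates the OK | INVALID_DELIMITER behaviour described above; it does not call Parsers.jl.

```julia
# Hypothetical sketch; none of these names are PowerFlowData.jl or Parsers.jl API.

# Strategy A (imitating the current behaviour): parse the leading integer of the
# last field and only flag that unexpected trailing bytes were present.
function last_column_tolerant(field::AbstractString)
    s = lstrip(field)
    m = match(r"^[+-]?\d+", s)
    m === nothing && throw(ArgumentError("no integer at start of $(repr(field))"))
    rest = strip(s[ncodeunits(m.match)+1:end])
    return (value = parse(Int, m.match), invalid_delim = !isempty(rest))
end

# Strategy B: use "/*" as the delimiter for the last column, so the value and
# the comment are separated; the comment is kept only when comments=true.
function last_column_commented(field::AbstractString; comments::Bool = false)
    idx = findfirst("/*", field)
    raw = idx === nothing ? field : field[1:prevind(field, first(idx))]
    value = parse(Int, strip(raw))
    comment::Union{Missing,String} = missing   # the optional extra column
    if comments && idx !== nothing
        close_idx = findlast("*/", field)
        stop = close_idx === nothing ? lastindex(field) : prevind(field, first(close_idx))
        comment = String(strip(field[nextind(field, last(idx)):stop]))
    end
    return (value = value, comment = comment)
end
```

Both return the same value for the last column; only strategy B keeps the comment, which is what the extra Union{Missing,String} column would hold.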

raphaelsaavedra · 2022-02-22T11:19:34Z

Just guessing here, since I know very little about how this package is structured, but wouldn't it be a good idea to handle both cases in the same way? E.g. something like "if we see a comment at the end of the line, split it out into a new column", which turns case 1 into case 2.
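A minimal sketch of that normalization, assuming a plain split on ',' is acceptable (it ignores quoting, so it would break if a quoted name or a comment contained a comma). normalize_comment and split_columns are hypothetical names:

```julia
# Hypothetical pre-processing step (names are illustrative only): rewrite the
# "trailing characters" form into the "extra column" form by inserting a ','
# before "/*", so the comment becomes its own final column in both cases.
function normalize_comment(line::AbstractString)
    idx = findfirst("/*", line)
    idx === nothing && return String(line)           # no comment at all
    before = rstrip(line[1:prevind(line, first(idx))])
    endswith(before, ",") && return String(line)     # already "extra column" form
    return string(before, ", ", line[first(idx):end])
end

# Naive column split: ignores quoting, so a ',' inside a name or comment breaks it.
split_columns(line::AbstractString) = strip.(split(normalize_comment(line), ','))
```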

nickrobinson251 · 2022-02-22T11:35:58Z

The difference is in what Parsers.jl sees.

Basically, parsing works like this: the file is a big vector of bytes (a Vector{UInt8}), and we go through it "byte by byte" (well, Parsers.jl does).

We tell Parsers.jl:

(i) how to split the file into bytes which go together (e.g. that "bytes which go together" are separated by the delimiter ',', i.e. 0x2c), which is done via the Parsers.Options, and
(ii) what type those bytes should be parsed into (e.g. [0x31, 0x32, 0x33] should be an Int64), which is given by the field type for that column (i.e. we hardcode the column types in dedicated structs, e.g. Loads, and then pass this info to Parsers.jl).

Then Parsers.xparse does the heavy lifting (here's the main parsing code, which is all just "use xparse and handle what it gives us", e.g. check that it worked and store the returned value).
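As a toy stand-in for steps (i) and (ii) (plain Julia, not how Parsers.jl or xparse are actually implemented), splitting a byte vector on 0x2c and then parsing each group according to a per-column type might look like this. split_bytes is a hypothetical helper, and the column types are made up for the example:

```julia
# Toy stand-in for steps (i) and (ii); not how Parsers.jl is implemented.
const COMMA = UInt8(',')   # 0x2c

# Step (i): group bytes that "go together", i.e. those between delimiters.
function split_bytes(buf::AbstractVector{UInt8}, delim::UInt8 = COMMA)
    groups = Vector{Vector{UInt8}}()
    start = firstindex(buf)
    for i in eachindex(buf)
        if buf[i] == delim
            push!(groups, buf[start:i-1])
            start = i + 1
        end
    end
    push!(groups, buf[start:end])
    return groups
end

buf = collect(codeunits("111,161.00,  1"))
groups = split_bytes(buf)

# Step (ii): a per-column type (hard-coded here) decides how each group is parsed.
# copy(g) because String(::Vector{UInt8}) may take ownership of the bytes.
coltypes = (Int64, Float64, Int64)
parsed = [parse(T, strip(String(copy(g)))) for (T, g) in zip(coltypes, groups)]
```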

Anyway, all of this is to say that Parsers.jl will see the two cases differently. In the first case, 1 /* [STBC 1 ] */ won't be split correctly into "bytes which go together" unless we tell it how (i.e. if we say "',' is the delimiter between bytes which go together", then this won't be split up as we need it to be in step (i)). In contrast, 1, /* [STBC 1 ] */ will be split up fine by the current code... but we'd still need to add an extra String column to the structs for step (ii).
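The difference can be seen with a naive split on ',' (plain Julia string handling rather than Parsers.jl, so only an approximation of step (i)):

```julia
# Illustrative only: split each form of the line on ',' and try to parse the
# last data column (an Int) from the resulting groups.
case1 = " 111,'STBC      ',161.00,1,    0.00,    0.00,227,   1,1.09814,  -8.327,  1 /* [STBC   1   ] */"
case2 = " 111,'STBC      ',161.00,1,    0.00,    0.00,227,   1,1.09814,  -8.327,  1, /* [STBC   1   ] */"

fields1 = split(case1, ',')   # 11 groups
fields2 = split(case2, ',')   # 12 groups

# Case 1: the 11th group still contains the comment, so it is not a valid Int.
bad = tryparse(Int, strip(fields1[11]))    # nothing
# Case 2: the 11th group is just "  1"; the comment lands in a 12th group,
# which would need an extra String field in the record struct.
good = tryparse(Int, strip(fields2[11]))   # 1
extra_col = strip(fields2[12])             # "/* [STBC   1   ] */"
```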

nickrobinson251 added the "improvement" label (improvement to an existing feature) Oct 5, 2021

nickrobinson251 added the "new feature" label and removed the "improvement" label Oct 26, 2021

nickrobinson251 self-assigned this Jun 6, 2022

nickrobinson251 mentioned this issue Jun 10, 2022:

- Support Strings (multiple chars) as 'quotechars' JuliaData/Parsers.jl#117 (Closed)

This was referenced Jul 14, 2022:

- Parse trailing comments from buses data in v30 files #78 (Draft)
- Public interface for manually constructing Records? #79 (Open)
