Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Row/column of current element / error? #26

Closed
MortenMacFly opened this issue May 18, 2012 · 18 comments
Closed

Row/column of current element / error? #26

MortenMacFly opened this issue May 18, 2012 · 18 comments

Comments

@MortenMacFly
Copy link
Contributor

In TinyXML1 it was possible to query the row/column of the currrent element (e.g. in case of an error). This was very nice to show a descriptive error message so the user was able to pinpoint the error easily. Even it was possible to open an editor and jumping to that element.

I see that in TinyXML2 the Row() and Column() methods are gone. Will the return or is there an alternative / better way to obtain that information?

Note that the error must not be an XML error, but may come from a user-defined validation. So the element might be XML correct, but the content may be wrong for the application.

@leethomason
Copy link
Owner

The tricky bit is that TinyXML-2 doesn't parse the row/column on the initial parse pass. This is done for performance reasons. In the case of an error, the code could flush the parsing and figure out the current row/col, but it's a reasonably involved change. I haven't really investigated the value of row/col tracking vs. code complexity is a good tradeoff, but I obviously have concerns it isn't.

@MortenMacFly
Copy link
Contributor Author

Thank you for the fast (!) answer.

I see your point, but I think its a feature I am really missing. In one of my projects I use this very intensively. The XML files processed by the tool based on TinyXML(2) are often with errors, e.g. that an enumeration is not fulfilled. I was able to pinpoint the exact location of that error easily, so users of the tool did know exactly where the loading went wrong. Now, all I can tell is that an element is wrong. But if that element occurs several hundred times in a XML file then this is of no help. ;-)

Maybe I am the only one, but seeing this features has gone is sad. :-(

@leethomason
Copy link
Owner

Good feedback; no promises on how/when it can be done, but knowing it makes a difference certainly ups the priority. I'll take a look and see how it might be done on the current system.

@MortenMacFly
Copy link
Contributor Author

Sounds good to me... nice to see that you understand my use-case. Just take your time, I'll be patiently watching GIT. ;-)

And nevertheless: Keep up the great work, I am using this library since ages as is really fulfils all my needs and is... erm... just tiny. ;-)

@avkumar
Copy link

avkumar commented Jul 9, 2012

Hey Leethomason!! I would be glad to take up this bug ... Thanks in advance...

@leethomason
Copy link
Owner

avkumar - I'd be happy to see a pull request to address this, certainly! Just be careful to not cause the strings to be parsed/null terminated until an error occurs.

@MortenMacFly
Copy link
Contributor Author

I stumbled across this again - @avkumar: Did you tried already (did you actually pull?!)?

@reFX-Mike
Copy link

@MortenMacFly: why don't you stick to tinyXML-1 then?

@MortenMacFly
Copy link
Contributor Author

Well I do. Here, I wanted to point out that IMHO this is an important feature that has gone. Nothing else.

@leethomason
Copy link
Owner

It's a valid request, certainly, just tricky to implement. I also think there are a lot of advantages of using TinyXML-2 over -1.

@reFX-Mike
Copy link

If you want to add this, could you make sure to make it optional? Like a pre-processer #define to avoid unnecessary code for people who don't need it? Makes the compiled code faster and shorter.

I would even like something like that to leave out the printXML functionality, because in our case we only need to read and parse XML, not be able to modify or write it. Should I open another issue for that?

@leethomason
Copy link
Owner

I appreciate the desire to minimize the code, but optional features are intentionally avoided. Each #if through the code doubles the testing space. (Although OS/platform #if states are unavoidable.) That quickly becomes a problem: bugs can creep into one path but not the other, and everyone who shares/discusses the code has to clarify which path they are using. Bug inconsistency on Windows vs. Android vs. *Nix has already bit me a few times on TinyXML-2, which only has the platform #if, and the feature #if is an issue that plagues TinyXML-1.

@reFX-Mike
Copy link

Fair enough. Let's hope it doesn't add too much bloat...

@DavidEGrayson
Copy link

I think that error reporting is a very important thing, and unfortunately this library is getting it wrong. There might be an easy solution though:

TinyXML2's XMLDocument class currently has GetErrorStr1 and GetErrorStr2, which always seem to return either NULL or pointers to a location into its internal _charBuffer. It should be easy to add methods like GetErrorOffset1 and GetErrorOffset2 that return the offset of each of those strings inside the char buffer. In C++, subtracting two pointers to the same array gives you an offset.

The offset isn't as useful as a row and column number, but good text editors do provide ways to seek to an offset in a file. Additionally, TinyXML2 could provide a method that takes a offset and gives you the row and column number.

In fact, for any part of the XML document, TinyXML2 should provide ways to get a const char * pointer to it, along with a byte offset, a character offset, and row and column numbers. This would help the user to create their own error messages for higher-level errors that are not detected by TinyXML2 (e.g. elements being out of order).

If GetErrorStr1 and GetErrorStr2 are always guaranteed to be NULL or point to a place inside the char buffer, then adding this feature causes no performance penalty. If they sometimes point elsewhere, there would be a small performance penalty because you would need to add a bool that records whether they are pointing into the _charBuffer or not; I believe if they are not pointing into the char buffer, the C++ standard won't guarantee the results from pointer arithmetic on them.

@kezenator
Copy link
Contributor

I've added line number (row) support in a fork:
https://github.com/kezenator/tinyxml2

Errors report a line number, and every parsed node and attribute also has the line number stored.
Also added additional tests.

Also created pull request #503

@kezenator
Copy link
Contributor

The approach I tool was to pass a current line number through the ParseDeep and associated functions.
SkipWhitespace and ParseText have been updated to increment the line when the see a '\n'.
Also, Identify has to roll-back the line number when it detects text content starting with whitespace.

The dream.xml parse test changed from about 1.2ms to about 1.4ms on my PC (i7-6770HQ + VisualStudio 2015 Community Update 3)

@MortenMacFly
Copy link
Contributor Author

MortenMacFly commented Nov 27, 2016

If I may quote leethomason:

The way TinyXML-2 does parsing makes this substantially more difficult to implement;

I'd say with this minimal overhead I personally see more advantages than disadvantages.
Hopefully leethomason agrees and merges this into the main repo.
Than you very much, kezenator, very nice work...

@leethomason
Copy link
Owner

Was fixed some time ago; failed to close issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants