Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix WFS property format writer for multiline strings #15638

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

aboulan
Copy link

@aboulan aboulan commented Mar 13, 2024

When updating a feature through WFS, if there are any newline
characters in a string properties, these are lost (for example,
geoserver handles these newlines as whitespace).

To fix this, multiline string properties should be encoded as
CDATA block.

Fix #10127

@ahocevar ahocevar changed the title Fix #10127: Fix WFS property format writer for multiline strings Fix WFS property format writer for multiline strings Mar 13, 2024
Copy link

📦 Preview the website for this branch here: https://deploy-preview-15638--ol-site.netlify.app/.

@aboulan aboulan force-pushed the fix/wfs-property-format-multiline-strings branch from 41be1a5 to 6936d9a Compare March 13, 2024 18:05
@@ -938,7 +939,11 @@ function writeProperty(node, pair, objectStack) {
GML32.prototype.writeGeometryElement(value, pair.value, objectStack);
}
} else {
writeStringTextNode(value, pair.value);
if (typeof pair.value === 'string' && pair.value.search('\n') >= 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (typeof pair.value === 'string' && pair.value.search('\n') >= 0) {
if (typeof pair.value === 'string' && pair.value.includes('\n') >= 0) {

The above should work as well, and yield better performance because no regular expression is created.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to remove the >= 0 part as well, @ahocevar, like below?

Suggested change
if (typeof pair.value === 'string' && pair.value.search('\n') >= 0) {
if (typeof pair.value === 'string' && pair.value.includes('\n')) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks @rdewit

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When looking at the issue more closely, I wonder if it wouldn't be better to completely skip the test and always output a CDATA block.
Indeed, it is not only end of lines that are not correctly interpreted in a text node: any white space combination (space, tab, end of line) is interpreted as a single space (as per the html/xml rules), so that for example a string with multiple consecutive spaces is also not correctly handled.
But using CDATA blocks are not the perfect solution either: they cannot contain the end of block sequence ("]]>").
They also cannot contain arbitrary binary data (it must be a valid utf8 string) but this is probably not a problem since WFS property values are supposed to be valid xml text (but I don't know if the WF specs explicitly says something about this)
For simple use cases (multi-line string, which I needed initially), CDATA blocks are working great out of the box with geoserver (and any xml parser), but maybe the correct solution for the general case would be to have an API to let the user handle the serialization himself (base64 ?)
What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following may be the best compromise to catch any white space combination without using CDATA blocks for single words:
[ ' ', '\n', '\t' ].some(c => pair.value.includes(c)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me to always create a CDATA block.

@aboulan aboulan force-pushed the fix/wfs-property-format-multiline-strings branch from 6936d9a to 97fef06 Compare March 14, 2024 14:05
@M393
Copy link
Contributor

M393 commented Mar 14, 2024

If you have text with ]]> end marker included you can write multiple CDATA sections by splitting the end marker.

pair.value.split(']]>').forEach((part, i, a) => {
  if (i < a.length - 1) {
    part += ']]';
  }
  if (i > 0) {
     part = '>' + part;
  }
  writeCDATASection(value, part);
});

@aboulan aboulan force-pushed the fix/wfs-property-format-multiline-strings branch from 97fef06 to 3c13544 Compare March 15, 2024 08:25
Comment on lines 943 to 949
try {
// Generate a CDATA section to preserve whitespaces.
writeCDATASection(value, pair.value);
} catch {
// If value contains the CDATA closing marker ']]>', fallback to a text node.
writeStringTextNode(value, pair.value);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it would work, but how about

Suggested change
try {
// Generate a CDATA section to preserve whitespaces.
writeCDATASection(value, pair.value);
} catch {
// If value contains the CDATA closing marker ']]>', fallback to a text node.
writeStringTextNode(value, pair.value);
}
pair.value.split(']]>').forEach((text) => writeCDataSection(value, text));

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I thought I already made the change following @M393 's remark but failed to push correctly.
Should be good now.

  When updating a feature through WFS, if there are any newline
  characters in a string properties, these are lost (for example,
  geoserver handles these newlines as whitespace).

  To fix this, multiline string properties should be encoded as
  CDATA block.
@aboulan aboulan force-pushed the fix/wfs-property-format-multiline-strings branch from 3c13544 to 5fc0e90 Compare March 21, 2024 12:58
if (i > 0) {
part = '>' + part;
}
writeCDATASection(value, part);
Copy link
Member

@ahocevar ahocevar Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this addition could benefit from a test case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WFS property format writer handling of string values with new lines
4 participants