richttext fields set to `text/x-web-markdown` return HTML #1526

me-kell · 2022-10-25T09:24:03Z

The REST API returns the HTML-Version of richttext fields even if the content-type is set to text/x-web-markdown.

I'd expect the stored markdown version (value.raw) to be returned.

We send:

"text": {
  "content-type": "text/x-web-markdown",
  "data": "xyz **bold** normal",
  "encoding": "utf8"
}

We get:

"text": {
  "content-type": "text/x-web-markdown",
  "data": "<p>xyz <strong>bold</strong> normal</p>",
  "encoding": "utf8"
}

To reproduce it:

SITE_URL=https://6-classic.demo.plone.org
USER=manager:****   # <- set manager password here

cat <<EOF | tee config_markup.json
{
  "allowed_types": [
    {"title":"text/html","token": "text/html"},
    {"title": "text/x-web-markdown","token": "text/x-web-markdown"},
    {"title": "text/x-web-textile","token": "text/x-web-textile"}
  ],
  "default_type": {"title":"text/html","token": "text/html"},
  "markdown_extensions":[
    "markdown.extensions.fenced_code",
    "markdown.extensions.footnotes",
    "markdown.extensions.tables"
  ]
}
EOF

curl -i -X PATCH "${SITE_URL}/@controlpanels/markup" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    --data-binary "@my_document.json" \
    --user ${USER}

cat <<EOF | tee my_document.json
{
  "@type": "Document",
  "title": "MyDocument",
  "text": {
    "content-type": "text/x-web-markdown",
    "data": "xyz **bold** normal",
    "encoding": "utf8"
  }
}
EOF

ID=$(
    curl -s -X POST "${SITE_URL}" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    --data-binary "@my_document.json" \
    --user ${USER} \
    | jq -r '."@id"' \
)
echo $ID
curl -s -X GET "${ID}" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    --user ${USER} \
    | jq -r '."text"'

It returns

...
"text": {
  "content-type": "text/x-web-markdown",
  "data": "<p>xyz <strong>bold</strong> normal</p>",
  "encoding": "utf8"
}
...

The text was updated successfully, but these errors were encountered:

jensens · 2022-10-25T09:33:03Z

IMO the bug is either that the content-type is wrong or the data is wrong, because both do not match.

Having a mode to get raw or rendered is kind of feature request then. For editing one want raw, for display maybe rendered.

jensens · 2022-10-25T09:35:24Z

see also https://community.plone.org/t/markdown-or-mdx-from-plone-rest-api

me-kell · 2022-11-20T21:19:42Z

The problem seems to be in the inconsistent handling of RichTextValue in the serializer and deserializer.

plone.app.textfield.value.RichTextValue has the following properties:

raw
encoding
raw_encoded
mimeType
outputMimeType
output
output_relative_to

While the serializer classs plone.restapi.serializer.converters.RichtextDXContextConverter only returns

output_relative_to in key data
mimeType in key content_type
encoding in key encoding

The deserializer class plone.restapi.deserializer.dxfields.RichTextFieldDeserializer sets

raw from key data through html_parser.unescape
mimeType from key content_type
output from self.field.output_mime_type
encoding from key encodding

The properties

raw_encoded and outputMimeType are not handled by the serializing/deserializing process
encoding and mimeType (in the key content-type) are passed transparently by the serializing/deserializing process
The serializer returns output_relative_to in the key data while the deserializer puts the key data parsed with html_parser.unescape into raw and sets outputMimeType to self.field.output_mime_type. See table below

`value`	`serializer`	`deserializer`	`value`
`raw`		`html_parser.unescape(data)`	`raw`
`encoding`	`encoding`	`encoding`	`encoding`
`raw_encoded`			`raw_encoded`
`mimeType`	`content_type`	`content_type`	`mimeType`
`outputMimeType`		`self.field.output_mime_type`	`outputMimeType`
`output`			`output`
`output_relative_to`	`data`		`output_relative_to`

classs plone.restapi.serializer.converters.RichtextDXContextConverter

@adapter(IRichTextValue, IDexterityContent)
@implementer(IContextawareJsonCompatible)
class RichtextDXContextConverter:
    def __init__(self, value, context):
        self.value = value
        self.context = context

    def __call__(self):
        value = self.value
        output = value.output_relative_to(self.context)
        return {
            "data": json_compatible(output),
            "content-type": json_compatible(value.mimeType),
            "encoding": json_compatible(value.encoding),
        }

class plone.restapi.deserializer.dxfields.RichTextFieldDeserializer

@implementer(IFieldDeserializer)
@adapter(IRichText, IDexterityContent, IBrowserRequest)
class RichTextFieldDeserializer(DefaultFieldDeserializer):
    def __call__(self, value):
        content_type = self.field.default_mime_type
        encoding = "utf8"
        if isinstance(value, dict):
            content_type = value.get("content-type", content_type)
            encoding = value.get("encoding", encoding)
            data = value.get("data", "")
        elif isinstance(value, TUSUpload):
            content_type = value.metadata().get("content-type", content_type)
            with open(value.filepath, "rb") as f:
                data = f.read().decode("utf8")
        else:
            data = value
        value = RichTextValue(
            raw=html_parser.unescape(data),
            mimeType=content_type,
            outputMimeType=self.field.output_mime_type,
            encoding=encoding,
        )
        self.field.validate(value)
        return value

tisto · 2022-11-22T04:16:49Z

@me-kell this is something that we never implemented in plone.restapi. Therefore I'd consider this to be a feature request, not a bug. That being said, I'd be more than happy to include this feature in plone.restapi. The content negotiation mechanism allows us to handle multiple formats, not only JSON (which is the only supported option right now). Though, I'd expect this to require some effort to make it work...

jensens · 2022-11-22T08:27:38Z

Well, firstly it's a bug: mismatch between data and content-type.

Next then is a missing feature, to improve the overall situation to handle different types in the field.

me-kell · 2022-11-22T10:56:28Z

This is for different reasons a bug.

When we get a value via RESTAPI we would expect that nothing changes when we set the same value again.

This is not the case with objects having a field with RichTextValue. See how to reproduce it below.

This problem concerns not only text/x-web-markdown, but all mime-types (text/x-web-textile, text/x-rst, etc.).

And affects all content types having fields with RichtTextValue (not only RichTextField) like Document, News Item etc and other Dexterity content types.

To reproduce it

Import and define some variables

import requests
from plone import api
from plone.app.textfield.value import RichTextValue

url = 'http://localhost:8081/Plone'
headers = {'Accept': 'application/json', 'Content-Type': 'application/json'}

configure markup via plone.restapi

data = {
    "allowed_types": [{"title": "text/x-web-markdown", "token": "text/x-web-markdown"}],
    "default_type": {"title": "text/x-web-markdown", "token": "text/x-web-markdown"}
}
response = requests.patch(url + '/@controlpanels/markup', headers=headers, json=data, auth=('admin', 'admin'))

create a Document with mimeType="text/x-web-markdown" via plone.api

app._p_jar.sync()
obj = api.content.create(
    container=plone,
    type='Document',
    title='MyDocument3',
    mimeType='text/x-web-markdown',
    text=RichTextValue(
        raw="normal *italics* normal **bold** normal",
        mimeType="text/x-web-markdown",
        outputMimeType="text/x-html-safe",
        encoding='utf-8',
    )
)
transaction.commit()
print(f"[plone.api] old_text.raw={obj.text.raw}")

[plone.api] old_text.raw=normal *italics* normal **bold** normal

Get the created document via plone.restapi.

response = requests.get(url + '/' + obj.id, headers=headers, auth=('admin', 'admin'))
old_text = response.json()['text']
print(f"[plone.restapi] old_text={old_text}")

[plone.restapi] old_text={
    'content-type': 'text/x-web-markdown', 
    'data': '<p>normal <em>italics</em> normal <strong>bold</strong> normal</p>', 
    'encoding': 'utf-8'}

And simply patch the document with the same text value via plone.restapi.

response = requests.patch(
    url + '/' + obj.id,
    headers=headers,
    auth=('admin', 'admin'),
    json={'text': old_text}
)
response = requests.get(url + '/' + obj.id, headers=headers, auth=('admin', 'admin'))
new_text = response.json()['text']
print(f"[plone.restapi] new_text={new_text}")

[plone.restapi] new_text={
    'content-type': 'text/x-web-markdown', 
    'data': '<p>normal <em>italics</em> normal <strong>bold</strong> normal</p>', 
    'encoding': 'utf-8'}

The RESTAPI doesn't see any difference: old_text and new_text are the same.

But it has changed the value of the document:

app._p_jar.sync()
print(f"[plone.api] new_text.raw={obj.text.raw}")

[plone.api] new_text.raw=<p>normal <em>italics</em> normal <strong>bold</strong> normal</p>

me-kell · 2022-11-22T11:11:13Z

this is something that we never implemented in plone.restapi.

@tisto I'm not sure I understand what you are referring to.

jensens · 2022-11-22T14:58:26Z

Agree, this is indeed more than a bug of wrong content-type, this is a bug on the conceptional API design level.

me-kell · 2023-07-21T10:46:27Z

@tisto Are there any plans to fix this bug?

davisagli · 2023-07-21T16:06:26Z

@me-kell I'm willing to review a proposal for how to fix it, and a PR to implement the fix. I won't prioritize fixing it myself, since I'm not working on any projects that use non-HTML rich text fields.

sneridagh · 2024-02-24T09:28:44Z

@me-kell what @davisagli and the others are saying is that if you need that fix, you implement it and send a PR with it.

me-kell · 2024-02-24T09:44:16Z

Sorry. But I would expect that such conceptional bug would have been rejected from the beginning.

IMHO an API should pass the simplest test ("you get back exactly what you put in") which is not the case here from the beginning.

I'd very much like to fix this bug if I had the time (and the know how).

I've described the bug as good as possible so that everyone can understand what, why and where the problem is.

I'd suggest to ask the original developer and/or the maintainer to take this responsibility on code quality.

stevepiercy · 2024-02-24T10:31:54Z

I'd suggest to ask the original developer and/or the maintainer to take this responsibility on code quality.

@me-kell this comment is disrespectful, insulting, and imflamatory. It implies that the contributors to this package do not care about code quality. I strongly recommend you reconsider your words.

That unsavory business aside, a pull request is always welcome. I don't think anyone opposes adding support as you describe. The reality is that we are all volunteer, unpaid contributors to open source software. There is no single maintainer, and they often come and go. The Plone community members are the maintainers. As in all open source software projects, we add features or fix bugs because it affects our work, and do not fix them when they do not. The reason it does not get implemented to your specifications is because no one else has the need to fix it at this time. When someone does have the time and desire to implement this proposal, only then will it get done. That is how open source software works.

me-kell · 2024-02-24T11:33:24Z

@stevepiercy Sorry. No offence or disrespect intended at all.

me-kell · 2024-02-24T11:44:01Z

The reason it does not get implemented to your specifications is because no one else has the need to fix it at this time.

@stevepiercy No offence, but those are not "my specifications". I think it should be obvious that this issue is not an "enhancement" but a "bug". Like it or not. See @jensens comment above:

Agree, this is indeed more than a bug of wrong content-type, this is a bug on the conceptional API design level.

@stevepiercy May I ask the reason why did you remove the "bug" label? Was it a technical one?

jensens · 2024-02-25T10:56:48Z

The mismatch is a bug, still. To fix it on conceptual level is a feature.

davisagli · 2024-02-28T07:17:17Z

From my perspective as a maintainer:

It certainly seems like a reasonable request to have a way to get the richtext data in the raw format it is stored (especially since that is what you need in order to save changes).

I don't really care whether it's considered a bug or a feature but I do care about not introducing a breaking change for existing systems that assume data is converted to the output mimetype. So I don't think we can simply change it to use the raw value.

We could add a new key raw that has the raw data. That would mean a few more bytes being transferred. If someone is concerned about that we could make it configurable in the request (with something like Accept: application/x-json-rawrichtext)

I'd suggest to ask the original developer and/or the maintainer to take this responsibility on code quality.

You are welcome to use the software we made available for free, and even contribute to it if you can find the time. The software is made available under a license that disclaims any warranty or guarantee of quality, and I'm sorry if that's inconvenient for you but I don't feel responsible for fixing this design oversight. I don't intend to be mean about that -- and I'm not offended by you asking -- but sometimes in the past I tried to take care of too much in an open source project and then risked burnout. So now I take care of a few things now and then when I have an interest, but I'll also not feel bad about leaving some of it for others.

me-kell · 2024-02-28T10:26:25Z

@davisagli thank your for your proposal: adding a new key raw is surely a good first step.

The RichTextFieldDeserializer should also not overwrite raw with data a it is the case today:

plone.restapi/src/plone/restapi/deserializer/dxfields.py

Line 292 in cb63e1a

raw=html_parser.unescape(data),

I'm not sure if it will suffice or whether other keys like e.g. outputMimeType would be also necessary.

jensens · 2024-02-28T10:43:32Z

I don't really care whether it's considered a bug or a feature but I do care about not introducing a breaking change for existing systems that assume data is converted to the output mimetype. So I don't think we can simply change it to use the raw value.

You got me wrong. It's not about one or the other.

We first need to fix the bug to report the right mime-type (the output-type of the text-field, not the input-type).

As @me-kell said, currently it is currently

"text": {
  "content-type": "text/x-web-markdown",
  "data": "<p>xyz <strong>bold</strong> normal</p>",
  "encoding": "utf8"
}

but correct is

"text": {
  "content-type": "text/html",
  "data": "<p>xyz <strong>bold</strong> normal</p>",
  "encoding": "utf8"
}

Code is here:

plone.restapi/src/plone/restapi/serializer/converters.py

Line 188 in cb63e1a

"content-type": json_compatible(value.mimeType),

And it should be

"content-type": json_compatible(value.outputMimeType),

Then we can open a new issue about a feature to additional output the raw (input) data and it's mime-type.

me-kell · 2024-02-28T11:36:10Z

Actually the RichtTextValue object of the RichText field has two different attributes: raw and output with their corresponding mime type attributes: mimeType (for raw) and outputMimeType (for output).

Currently the RESTAPI returns output (actually output_relative_to) (as data) and mimeType (as conttent_type) which is the mimeType of raw.

And the RESTAPI currently do not allow to transparently write back data (to raw) because it parses data with html_parser.unescape.

Furthermore the RESTAPI allows to read and to change the mimeType (via content_type) but it doesn't allow to read nor to change the outputMimeType.

If the RESTAPI is meant to both read and write RichtText fields it should have read and write keys for both: raw and output (with its corresponding mime types).

If the RESTAPI only offers keys for the output attribute then it is not clear to me why to read the output attribute but write back to the raw attribute. The whole point of RichText and Transformation in Plone would then be obfuscated and unnecessary.

me-kell · 2024-02-29T10:22:37Z

AFAICS following will solve this issue:

change the key content_type to return the outputMimeType (as @jensens proposed).
adding the keys raw_encoded (or raw as @davisagli proposed) and raw_content_type.

Here is the current (wrong) mapping:

`value`	`serializer`	`comment`
`raw`
`encoding`	`encoding`
`raw_encoded`
`mimeType`	`content_type`	this is raw mimeType!
`outputMimeType`
`output`
`output_relative_to`	`data`

and the proposed mapping:

`value`	`serializer`	`comment`
`raw`
`encoding`	`encoding`
`raw_encoded`	`raw_encoded`	new key
`mimeType`	`raw_content_type`	new key
`outputMimeType`	`content_type`	corrected key
`output`
`output_relative_to`	`data`

Following is the code to test the current situation:

from urllib.parse import urlparse

import requests

URL = 'https://classic.demo.plone.org'
AUTH = ('admin', 'admin')
API_PATH_PREFIX = '++api++'


def get_restapi_url(url):
    parsed_url = urlparse(url)
    restapi_url = parsed_url._replace(path=API_PATH_PREFIX + parsed_url.path).geturl()
    return restapi_url


result = requests.patch(
    urlparse(URL)._replace(path='@controlpanels/markup').geturl(),
    headers={'Accept': 'application/json', 'Content-Type': 'application/json'},
    json={
        "allowed_types": [
            {"title": "text/html", "token": "text/html"},
            {"title": "text/x-web-markdown", "token": "text/x-web-markdown"},
            {"title": "text/x-web-textile", "token": "text/x-web-textile"}
        ],
        "default_type": {"title": "text/html", "token": "text/html"},
        "markdown_extensions": [
            "markdown.extensions.fenced_code",
            "markdown.extensions.footnotes",
            "markdown.extensions.tables"
        ]
    },
    auth=AUTH
)

my_text = {"content-type": "text/x-web-markdown", "data": "xyz **bold** normal", "encoding": "utf8"}
print(f"my_text: {my_text}")
# {'content-type': 'text/x-web-markdown', 'data': 'xyz **bold** normal', 'encoding': 'utf8'}

result_post = requests.post(
    urlparse(URL).geturl(),
    headers={'Accept': 'application/json', 'Content-Type': 'application/json'},
    json={
        "@type": "Document",
        "title": "MyDocument",
        "text": my_text
    },
    auth=AUTH
)

print(f"result_post_at_id: {result_post.json()['@id']}")
print(f"result_post_text: {result_post.json()['text']}")
# this is wrong because it returns
#     output (actually output_relative_to) (as data) and
#     mimeType (as conttent_type) which is the mimeType of raw.
# {'content-type': 'text/x-web-markdown', 'data': '<p>xyz <strong>bold</strong> normal</p>', 'encoding': 'utf8'}

result_patch = requests.patch(
    get_restapi_url(result_post.json()['@id']),
    headers={'Accept': 'application/json', 'Content-Type': 'application/json', 'Prefer': 'return=representation'},
    json={'text': {"content-type": "text/x-web-markdown", "data": "xyz **bold** normal again", "encoding": "utf8"}},
    auth=AUTH
)

print(f"result_patch_text: {result_patch.json()['text']}")
# {'content-type': 'text/x-web-markdown', 'data': '<p>xyz <strong>bold</strong> normal again</p>', 'encoding': 'utf8'}

result_get = requests.get(
    get_restapi_url(result_patch.json()['@id']),
    headers={'Accept': 'application/json'},
    auth=AUTH
)
print(f"result_get_text: {result_get.json()['text']}")
# {'content-type': 'text/x-web-markdown', 'data': '<p>xyz <strong>bold</strong> normal again</p>', 'encoding': 'utf8'}

jensens added the 01 type: bug label Oct 25, 2022

tisto added 04 type: enhancement and removed 01 type: bug labels Nov 22, 2022

jensens added 01 type: bug 21 status: confirmed and removed 04 type: enhancement labels Nov 22, 2022

This was referenced Apr 18, 2023

Deserializing richtext unescapes html-entities #1310

Open

Fix importing richtext with html entities collective/collective.exportimport#100

Merged

stevepiercy added 04 type: enhancement 01 type: bug and removed 01 type: bug labels Feb 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

richttext fields set to `text/x-web-markdown` return HTML #1526

richttext fields set to `text/x-web-markdown` return HTML #1526

me-kell commented Oct 25, 2022

jensens commented Oct 25, 2022

jensens commented Oct 25, 2022

me-kell commented Nov 20, 2022 •

edited

tisto commented Nov 22, 2022

jensens commented Nov 22, 2022

me-kell commented Nov 22, 2022 •

edited

me-kell commented Nov 22, 2022 •

edited

jensens commented Nov 22, 2022

me-kell commented Jul 21, 2023

davisagli commented Jul 21, 2023

sneridagh commented Feb 24, 2024

me-kell commented Feb 24, 2024

stevepiercy commented Feb 24, 2024

me-kell commented Feb 24, 2024

me-kell commented Feb 24, 2024 •

edited

jensens commented Feb 25, 2024

davisagli commented Feb 28, 2024

me-kell commented Feb 28, 2024 •

edited

jensens commented Feb 28, 2024 •

edited

me-kell commented Feb 28, 2024 •

edited

me-kell commented Feb 29, 2024

richttext fields set to text/x-web-markdown return HTML #1526

richttext fields set to text/x-web-markdown return HTML #1526

Comments

me-kell commented Oct 25, 2022

jensens commented Oct 25, 2022

jensens commented Oct 25, 2022

me-kell commented Nov 20, 2022 • edited

tisto commented Nov 22, 2022

jensens commented Nov 22, 2022

me-kell commented Nov 22, 2022 • edited

To reproduce it

me-kell commented Nov 22, 2022 • edited

jensens commented Nov 22, 2022

me-kell commented Jul 21, 2023

davisagli commented Jul 21, 2023

sneridagh commented Feb 24, 2024

me-kell commented Feb 24, 2024

stevepiercy commented Feb 24, 2024

me-kell commented Feb 24, 2024

me-kell commented Feb 24, 2024 • edited

jensens commented Feb 25, 2024

davisagli commented Feb 28, 2024

me-kell commented Feb 28, 2024 • edited

jensens commented Feb 28, 2024 • edited

me-kell commented Feb 28, 2024 • edited

me-kell commented Feb 29, 2024

richttext fields set to `text/x-web-markdown` return HTML #1526

richttext fields set to `text/x-web-markdown` return HTML #1526

me-kell commented Nov 20, 2022 •

edited

me-kell commented Nov 22, 2022 •

edited

me-kell commented Nov 22, 2022 •

edited

me-kell commented Feb 24, 2024 •

edited

me-kell commented Feb 28, 2024 •

edited

jensens commented Feb 28, 2024 •

edited

me-kell commented Feb 28, 2024 •

edited