Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

richttext fields set to text/x-web-markdown return HTML #1526

Open
me-kell opened this issue Oct 25, 2022 · 21 comments
Open

richttext fields set to text/x-web-markdown return HTML #1526

me-kell opened this issue Oct 25, 2022 · 21 comments

Comments

@me-kell
Copy link

me-kell commented Oct 25, 2022

The REST API returns the HTML-Version of richttext fields even if the content-type is set to text/x-web-markdown.

I'd expect the stored markdown version (value.raw) to be returned.

We send:

"text": {
  "content-type": "text/x-web-markdown",
  "data": "xyz **bold** normal",
  "encoding": "utf8"
}

We get:

"text": {
  "content-type": "text/x-web-markdown",
  "data": "<p>xyz <strong>bold</strong> normal</p>",
  "encoding": "utf8"
}

To reproduce it:

SITE_URL=https://6-classic.demo.plone.org
USER=manager:****   # <- set manager password here

cat <<EOF | tee config_markup.json
{
  "allowed_types": [
    {"title":"text/html","token": "text/html"},
    {"title": "text/x-web-markdown","token": "text/x-web-markdown"},
    {"title": "text/x-web-textile","token": "text/x-web-textile"}
  ],
  "default_type": {"title":"text/html","token": "text/html"},
  "markdown_extensions":[
    "markdown.extensions.fenced_code",
    "markdown.extensions.footnotes",
    "markdown.extensions.tables"
  ]
}
EOF

curl -i -X PATCH "${SITE_URL}/@controlpanels/markup" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    --data-binary "@my_document.json" \
    --user ${USER}

cat <<EOF | tee my_document.json
{
  "@type": "Document",
  "title": "MyDocument",
  "text": {
    "content-type": "text/x-web-markdown",
    "data": "xyz **bold** normal",
    "encoding": "utf8"
  }
}
EOF

ID=$(
    curl -s -X POST "${SITE_URL}" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    --data-binary "@my_document.json" \
    --user ${USER} \
    | jq -r '."@id"' \
)
echo $ID
curl -s -X GET "${ID}" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    --user ${USER} \
    | jq -r '."text"'

It returns

...
"text": {
  "content-type": "text/x-web-markdown",
  "data": "<p>xyz <strong>bold</strong> normal</p>",
  "encoding": "utf8"
}
...
@jensens
Copy link
Sponsor Member

jensens commented Oct 25, 2022

IMO the bug is either that the content-type is wrong or the data is wrong, because both do not match.

Having a mode to get raw or rendered is kind of feature request then. For editing one want raw, for display maybe rendered.

@jensens
Copy link
Sponsor Member

jensens commented Oct 25, 2022

@me-kell
Copy link
Author

me-kell commented Nov 20, 2022

The problem seems to be in the inconsistent handling of RichTextValue in the serializer and deserializer.

plone.app.textfield.value.RichTextValue has the following properties:

  • raw
  • encoding
  • raw_encoded
  • mimeType
  • outputMimeType
  • output
  • output_relative_to

While the serializer classs plone.restapi.serializer.converters.RichtextDXContextConverter only returns

  • output_relative_to in key data
  • mimeType in key content_type
  • encoding in key encoding

The deserializer class plone.restapi.deserializer.dxfields.RichTextFieldDeserializer sets

  • raw from key data through html_parser.unescape
  • mimeType from key content_type
  • output from self.field.output_mime_type
  • encoding from key encodding

The properties

  • raw_encoded and outputMimeType are not handled by the serializing/deserializing process
  • encoding and mimeType (in the key content-type) are passed transparently by the serializing/deserializing process
  • The serializer returns output_relative_to in the key data while the deserializer puts the key data parsed with html_parser.unescape into raw and sets outputMimeType to self.field.output_mime_type. See table below
value serializer deserializer value
raw html_parser.unescape(data) raw
encoding encoding encoding encoding
raw_encoded raw_encoded
mimeType content_type content_type mimeType
outputMimeType self.field.output_mime_type outputMimeType
output output
output_relative_to data output_relative_to

classs plone.restapi.serializer.converters.RichtextDXContextConverter

@adapter(IRichTextValue, IDexterityContent)
@implementer(IContextawareJsonCompatible)
class RichtextDXContextConverter:
    def __init__(self, value, context):
        self.value = value
        self.context = context

    def __call__(self):
        value = self.value
        output = value.output_relative_to(self.context)
        return {
            "data": json_compatible(output),
            "content-type": json_compatible(value.mimeType),
            "encoding": json_compatible(value.encoding),
        }

class plone.restapi.deserializer.dxfields.RichTextFieldDeserializer

@implementer(IFieldDeserializer)
@adapter(IRichText, IDexterityContent, IBrowserRequest)
class RichTextFieldDeserializer(DefaultFieldDeserializer):
    def __call__(self, value):
        content_type = self.field.default_mime_type
        encoding = "utf8"
        if isinstance(value, dict):
            content_type = value.get("content-type", content_type)
            encoding = value.get("encoding", encoding)
            data = value.get("data", "")
        elif isinstance(value, TUSUpload):
            content_type = value.metadata().get("content-type", content_type)
            with open(value.filepath, "rb") as f:
                data = f.read().decode("utf8")
        else:
            data = value
        value = RichTextValue(
            raw=html_parser.unescape(data),
            mimeType=content_type,
            outputMimeType=self.field.output_mime_type,
            encoding=encoding,
        )
        self.field.validate(value)
        return value

@tisto
Copy link
Sponsor Member

tisto commented Nov 22, 2022

@me-kell this is something that we never implemented in plone.restapi. Therefore I'd consider this to be a feature request, not a bug. That being said, I'd be more than happy to include this feature in plone.restapi. The content negotiation mechanism allows us to handle multiple formats, not only JSON (which is the only supported option right now). Though, I'd expect this to require some effort to make it work...

@jensens
Copy link
Sponsor Member

jensens commented Nov 22, 2022

Well, firstly it's a bug: mismatch between data and content-type.

Next then is a missing feature, to improve the overall situation to handle different types in the field.

@me-kell
Copy link
Author

me-kell commented Nov 22, 2022

This is for different reasons a bug.

When we get a value via RESTAPI we would expect that nothing changes when we set the same value again.

This is not the case with objects having a field with RichTextValue. See how to reproduce it below.

This problem concerns not only text/x-web-markdown, but all mime-types (text/x-web-textile, text/x-rst, etc.).

And affects all content types having fields with RichtTextValue (not only RichTextField) like Document, News Item etc and other Dexterity content types.

To reproduce it

Import and define some variables

import requests
from plone import api
from plone.app.textfield.value import RichTextValue

url = 'http://localhost:8081/Plone'
headers = {'Accept': 'application/json', 'Content-Type': 'application/json'}

configure markup via plone.restapi

data = {
    "allowed_types": [{"title": "text/x-web-markdown", "token": "text/x-web-markdown"}],
    "default_type": {"title": "text/x-web-markdown", "token": "text/x-web-markdown"}
}
response = requests.patch(url + '/@controlpanels/markup', headers=headers, json=data, auth=('admin', 'admin'))

create a Document with mimeType="text/x-web-markdown" via plone.api

app._p_jar.sync()
obj = api.content.create(
    container=plone,
    type='Document',
    title='MyDocument3',
    mimeType='text/x-web-markdown',
    text=RichTextValue(
        raw="normal *italics* normal **bold** normal",
        mimeType="text/x-web-markdown",
        outputMimeType="text/x-html-safe",
        encoding='utf-8',
    )
)
transaction.commit()
print(f"[plone.api] old_text.raw={obj.text.raw}")
[plone.api] old_text.raw=normal *italics* normal **bold** normal

Get the created document via plone.restapi.

response = requests.get(url + '/' + obj.id, headers=headers, auth=('admin', 'admin'))
old_text = response.json()['text']
print(f"[plone.restapi] old_text={old_text}")
[plone.restapi] old_text={
    'content-type': 'text/x-web-markdown', 
    'data': '<p>normal <em>italics</em> normal <strong>bold</strong> normal</p>', 
    'encoding': 'utf-8'}

And simply patch the document with the same text value via plone.restapi.

response = requests.patch(
    url + '/' + obj.id,
    headers=headers,
    auth=('admin', 'admin'),
    json={'text': old_text}
)
response = requests.get(url + '/' + obj.id, headers=headers, auth=('admin', 'admin'))
new_text = response.json()['text']
print(f"[plone.restapi] new_text={new_text}")
[plone.restapi] new_text={
    'content-type': 'text/x-web-markdown', 
    'data': '<p>normal <em>italics</em> normal <strong>bold</strong> normal</p>', 
    'encoding': 'utf-8'}

The RESTAPI doesn't see any difference: old_text and new_text are the same.

But it has changed the value of the document:

app._p_jar.sync()
print(f"[plone.api] new_text.raw={obj.text.raw}")
[plone.api] new_text.raw=<p>normal <em>italics</em> normal <strong>bold</strong> normal</p>

@me-kell
Copy link
Author

me-kell commented Nov 22, 2022

this is something that we never implemented in plone.restapi.

@tisto I'm not sure I understand what you are referring to.

@jensens
Copy link
Sponsor Member

jensens commented Nov 22, 2022

Agree, this is indeed more than a bug of wrong content-type, this is a bug on the conceptional API design level.

@me-kell
Copy link
Author

me-kell commented Jul 21, 2023

@tisto Are there any plans to fix this bug?

@davisagli
Copy link
Sponsor Member

@me-kell I'm willing to review a proposal for how to fix it, and a PR to implement the fix. I won't prioritize fixing it myself, since I'm not working on any projects that use non-HTML rich text fields.

@sneridagh
Copy link
Member

@me-kell what @davisagli and the others are saying is that if you need that fix, you implement it and send a PR with it.

@me-kell
Copy link
Author

me-kell commented Feb 24, 2024

Sorry. But I would expect that such conceptional bug would have been rejected from the beginning.

IMHO an API should pass the simplest test ("you get back exactly what you put in") which is not the case here from the beginning.

I'd very much like to fix this bug if I had the time (and the know how).

I've described the bug as good as possible so that everyone can understand what, why and where the problem is.

I'd suggest to ask the original developer and/or the maintainer to take this responsibility on code quality.

@stevepiercy
Copy link
Contributor

I'd suggest to ask the original developer and/or the maintainer to take this responsibility on code quality.

@me-kell this comment is disrespectful, insulting, and imflamatory. It implies that the contributors to this package do not care about code quality. I strongly recommend you reconsider your words.

That unsavory business aside, a pull request is always welcome. I don't think anyone opposes adding support as you describe. The reality is that we are all volunteer, unpaid contributors to open source software. There is no single maintainer, and they often come and go. The Plone community members are the maintainers. As in all open source software projects, we add features or fix bugs because it affects our work, and do not fix them when they do not. The reason it does not get implemented to your specifications is because no one else has the need to fix it at this time. When someone does have the time and desire to implement this proposal, only then will it get done. That is how open source software works.

@me-kell
Copy link
Author

me-kell commented Feb 24, 2024

@stevepiercy Sorry. No offence or disrespect intended at all.

@me-kell
Copy link
Author

me-kell commented Feb 24, 2024

The reason it does not get implemented to your specifications is because no one else has the need to fix it at this time.

@stevepiercy No offence, but those are not "my specifications". I think it should be obvious that this issue is not an "enhancement" but a "bug". Like it or not. See @jensens comment above:

Agree, this is indeed more than a bug of wrong content-type, this is a bug on the conceptional API design level.

@stevepiercy May I ask the reason why did you remove the "bug" label? Was it a technical one?

@jensens
Copy link
Sponsor Member

jensens commented Feb 25, 2024

The mismatch is a bug, still. To fix it on conceptual level is a feature.

@davisagli
Copy link
Sponsor Member

From my perspective as a maintainer:

It certainly seems like a reasonable request to have a way to get the richtext data in the raw format it is stored (especially since that is what you need in order to save changes).

I don't really care whether it's considered a bug or a feature but I do care about not introducing a breaking change for existing systems that assume data is converted to the output mimetype. So I don't think we can simply change it to use the raw value.

We could add a new key raw that has the raw data. That would mean a few more bytes being transferred. If someone is concerned about that we could make it configurable in the request (with something like Accept: application/x-json-rawrichtext)

I'd suggest to ask the original developer and/or the maintainer to take this responsibility on code quality.

You are welcome to use the software we made available for free, and even contribute to it if you can find the time. The software is made available under a license that disclaims any warranty or guarantee of quality, and I'm sorry if that's inconvenient for you but I don't feel responsible for fixing this design oversight. I don't intend to be mean about that -- and I'm not offended by you asking -- but sometimes in the past I tried to take care of too much in an open source project and then risked burnout. So now I take care of a few things now and then when I have an interest, but I'll also not feel bad about leaving some of it for others.

@me-kell
Copy link
Author

me-kell commented Feb 28, 2024

@davisagli thank your for your proposal: adding a new key raw is surely a good first step.

The RichTextFieldDeserializer should also not overwrite raw with data a it is the case today:

raw=html_parser.unescape(data),

I'm not sure if it will suffice or whether other keys like e.g. outputMimeType would be also necessary.

@jensens
Copy link
Sponsor Member

jensens commented Feb 28, 2024

I don't really care whether it's considered a bug or a feature but I do care about not introducing a breaking change for existing systems that assume data is converted to the output mimetype. So I don't think we can simply change it to use the raw value.

You got me wrong. It's not about one or the other.

We first need to fix the bug to report the right mime-type (the output-type of the text-field, not the input-type).

As @me-kell said, currently it is currently

"text": {
  "content-type": "text/x-web-markdown",
  "data": "<p>xyz <strong>bold</strong> normal</p>",
  "encoding": "utf8"
}

but correct is

"text": {
  "content-type": "text/html",
  "data": "<p>xyz <strong>bold</strong> normal</p>",
  "encoding": "utf8"
}

Code is here:

"content-type": json_compatible(value.mimeType),

And it should be

"content-type": json_compatible(value.outputMimeType), 

Then we can open a new issue about a feature to additional output the raw (input) data and it's mime-type.

@me-kell
Copy link
Author

me-kell commented Feb 28, 2024

Actually the RichtTextValue object of the RichText field has two different attributes: raw and output with their corresponding mime type attributes: mimeType (for raw) and outputMimeType (for output).

Currently the RESTAPI returns output (actually output_relative_to) (as data) and mimeType (as conttent_type) which is the mimeType of raw.

And the RESTAPI currently do not allow to transparently write back data (to raw) because it parses data with html_parser.unescape.

Furthermore the RESTAPI allows to read and to change the mimeType (via content_type) but it doesn't allow to read nor to change the outputMimeType.

If the RESTAPI is meant to both read and write RichtText fields it should have read and write keys for both: raw and output (with its corresponding mime types).

If the RESTAPI only offers keys for the output attribute then it is not clear to me why to read the output attribute but write back to the raw attribute. The whole point of RichText and Transformation in Plone would then be obfuscated and unnecessary.

@me-kell
Copy link
Author

me-kell commented Feb 29, 2024

AFAICS following will solve this issue:

  • change the key content_type to return the outputMimeType (as @jensens proposed).
  • adding the keys raw_encoded (or raw as @davisagli proposed) and raw_content_type.

Here is the current (wrong) mapping:

value serializer comment
raw
encoding encoding
raw_encoded
mimeType content_type this is raw mimeType!
outputMimeType
output
output_relative_to data

and the proposed mapping:

value serializer comment
raw
encoding encoding
raw_encoded raw_encoded new key
mimeType raw_content_type new key
outputMimeType content_type corrected key
output
output_relative_to data

Following is the code to test the current situation:

from urllib.parse import urlparse

import requests

URL = 'https://classic.demo.plone.org'
AUTH = ('admin', 'admin')
API_PATH_PREFIX = '++api++'


def get_restapi_url(url):
    parsed_url = urlparse(url)
    restapi_url = parsed_url._replace(path=API_PATH_PREFIX + parsed_url.path).geturl()
    return restapi_url


result = requests.patch(
    urlparse(URL)._replace(path='@controlpanels/markup').geturl(),
    headers={'Accept': 'application/json', 'Content-Type': 'application/json'},
    json={
        "allowed_types": [
            {"title": "text/html", "token": "text/html"},
            {"title": "text/x-web-markdown", "token": "text/x-web-markdown"},
            {"title": "text/x-web-textile", "token": "text/x-web-textile"}
        ],
        "default_type": {"title": "text/html", "token": "text/html"},
        "markdown_extensions": [
            "markdown.extensions.fenced_code",
            "markdown.extensions.footnotes",
            "markdown.extensions.tables"
        ]
    },
    auth=AUTH
)

my_text = {"content-type": "text/x-web-markdown", "data": "xyz **bold** normal", "encoding": "utf8"}
print(f"my_text: {my_text}")
# {'content-type': 'text/x-web-markdown', 'data': 'xyz **bold** normal', 'encoding': 'utf8'}

result_post = requests.post(
    urlparse(URL).geturl(),
    headers={'Accept': 'application/json', 'Content-Type': 'application/json'},
    json={
        "@type": "Document",
        "title": "MyDocument",
        "text": my_text
    },
    auth=AUTH
)

print(f"result_post_at_id: {result_post.json()['@id']}")
print(f"result_post_text: {result_post.json()['text']}")
# this is wrong because it returns
#     output (actually output_relative_to) (as data) and
#     mimeType (as conttent_type) which is the mimeType of raw.
# {'content-type': 'text/x-web-markdown', 'data': '<p>xyz <strong>bold</strong> normal</p>', 'encoding': 'utf8'}

result_patch = requests.patch(
    get_restapi_url(result_post.json()['@id']),
    headers={'Accept': 'application/json', 'Content-Type': 'application/json', 'Prefer': 'return=representation'},
    json={'text': {"content-type": "text/x-web-markdown", "data": "xyz **bold** normal again", "encoding": "utf8"}},
    auth=AUTH
)

print(f"result_patch_text: {result_patch.json()['text']}")
# {'content-type': 'text/x-web-markdown', 'data': '<p>xyz <strong>bold</strong> normal again</p>', 'encoding': 'utf8'}

result_get = requests.get(
    get_restapi_url(result_patch.json()['@id']),
    headers={'Accept': 'application/json'},
    auth=AUTH
)
print(f"result_get_text: {result_get.json()['text']}")
# {'content-type': 'text/x-web-markdown', 'data': '<p>xyz <strong>bold</strong> normal again</p>', 'encoding': 'utf8'}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants