Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Editor: PostTitle: Decode entities for display #20887

Closed
wants to merge 10 commits into from

Conversation

aduth
Copy link
Member

@aduth aduth commented Mar 13, 2020

Closes #20490
Related: #18616, #19955, https://core.trac.wordpress.org/ticket/11311

This pull request seeks to improve the behavior for title character encoding for users who do not have unfiltered_html capability. Without these changes, a contributing user to the site would see escape sequences for characters in the title after they have saved a post (i.e. & becomes &, shown literally in the editor's title field).

Before After
Before After

Notably, this implementation must make quite a few accommodations:

  • It must not save the title as escaped. The previous implementation of Post Title: fix special chars #18616 did escape titles, thus breaking "texturize" behavior that is expected to apply for titles, content, etc.
    • We depend on the server to escape input based on the user capabilities. In fact, this server-side escaping is what causes the issue in the first place, and why it only affects users of particular capabilities.
  • It should leave the underlying data intact. The post entities (core/data) should still report a raw title value including entities, if applicable. The implementation here could have transformed the values at the editor level. However, since this is largely a concern of display, it was decided to implement this at the level of the title component.
  • It shouldn't interfere with user input. If a user starts typing &, &a, &am, etc., the title should not suddenly update to decode that input. This is accounted for by preemptively encoding an ampersand input.
  • A user should still be able to insert HTML in the title according to their own capabilities.

Current Status:

This is largely complete. However, as may be evidenced by the size of the changes, a fair bit of unrelated refactoring was included. This could be included as part of review of these changes, or separated out to separate pull requests and rebased on this branch to make this smaller.

These include:

  • Components: Implement new component DetectFocusOutside (e14f981)
  • Components: Refactor withFocusOutside to use DetectFocusOutside (bece56c)
  • Editor: Refactor PostTitle as function component (9fc08cb)
  • Editor: PostTitle: Remove unused className has-fixed-toolbar (bec7518)
  • Edit Post: Layout: Remove unused className has-fixed-toolbar (df174b9)
  • Editor: PostTitle: Remove unused className is-selected (9eadc8e)
  • Editor: PostTitle: Refactor PostTitle label screen-reader-text as Vis… … (d80332d)

Testing Instructions:

Repeat Steps to Reproduce from #20490, both as a user with and without unfiltered_html capability (i.e. adminstrators and contributors). Note that it's not necessary to use a multisite installation. This can be tested in single-site as well.

Verify also all of the above accommodations.

Specifically, be on the lookout for edge cases concerning:

  • User input of HTML or escape sequences.
  • Title field updates at load, and after save.

@aduth
Copy link
Member Author

aduth commented Mar 13, 2020

It would probably be good to add an end-to-end test case which verifies some of the behaviors here:

  • Typing input which could be decoded should leave the input alone.
    • i.e. If I type (verbatim) "This & That", then it should be left verbatim as such.
  • Saving a post with characters that may be escaped, should present them as decoded after (a) save and (b) reload
    • i.e. If I type "This & That", then save the post, the title field should still show "This & That". Likewise for after a reload.

@github-actions
Copy link

github-actions bot commented Mar 13, 2020

Size Change: +159 B (0%)

Total Size: 857 kB

Filename Size Change
build/a11y/index.js 998 B -8 B (0%)
build/annotations/index.js 3.43 kB +1 B
build/autop/index.js 2.58 kB -1 B
build/block-directory/index.js 6.02 kB -6 B (0%)
build/block-editor/index.js 100 kB -181 B (0%)
build/block-editor/style-rtl.css 10.9 kB +133 B (1%)
build/block-editor/style.css 10.9 kB +134 B (1%)
build/block-library/index.js 111 kB +63 B (0%)
build/block-library/style-rtl.css 7.42 kB +40 B (0%)
build/block-library/style.css 7.43 kB +40 B (0%)
build/blocks/index.js 57.6 kB -118 B (0%)
build/components/index.js 191 kB -116 B (0%)
build/components/style-rtl.css 15.7 kB +2 B (0%)
build/components/style.css 15.7 kB +3 B (0%)
build/compose/index.js 6.2 kB -13 B (0%)
build/core-data/index.js 10.6 kB +5 B (0%)
build/data-controls/index.js 1.04 kB +1 B
build/data/index.js 8.2 kB -14 B (0%)
build/deprecated/index.js 771 B -1 B
build/dom/index.js 3.06 kB -7 B (0%)
build/edit-post/index.js 91.2 kB -112 B (0%)
build/edit-post/style-rtl.css 8.52 kB -100 B (1%)
build/edit-post/style.css 8.51 kB -102 B (1%)
build/edit-site/index.js 5.07 kB +427 B (8%) 🔍
build/edit-widgets/index.js 4.43 kB -16 B (0%)
build/editor/index.js 43.9 kB +111 B (0%)
build/element/index.js 4.45 kB +3 B (0%)
build/format-library/index.js 7.08 kB -10 B (0%)
build/hooks/index.js 1.93 kB +6 B (0%)
build/i18n/index.js 3.49 kB +1 B
build/is-shallow-equal/index.js 711 B +1 B
build/keyboard-shortcuts/index.js 2.3 kB -1 B
build/keycodes/index.js 1.69 kB +3 B (0%)
build/media-utils/index.js 4.84 kB -8 B (0%)
build/notices/index.js 1.58 kB +4 B (0%)
build/nux/index.js 3.01 kB +1 B
build/plugins/index.js 2.54 kB -4 B (0%)
build/primitives/index.js 1.5 kB +8 B (0%)
build/priority-queue/index.js 780 B +1 B
build/redux-routine/index.js 2.84 kB -2 B (0%)
build/rich-text/index.js 14.3 kB -5 B (0%)
build/url/index.js 4.01 kB +3 B (0%)
build/viewport/index.js 1.61 kB -6 B (0%)
build/warning/index.js 1.14 kB -1 B
ℹ️ View Unchanged
Filename Size Change
build/api-fetch/index.js 3.39 kB 0 B
build/blob/index.js 620 B 0 B
build/block-directory/style-rtl.css 760 B 0 B
build/block-directory/style.css 760 B 0 B
build/block-library/editor-rtl.css 7.23 kB 0 B
build/block-library/editor.css 7.24 kB 0 B
build/block-library/theme-rtl.css 669 B 0 B
build/block-library/theme.css 671 B 0 B
build/block-serialization-default-parser/index.js 1.65 kB 0 B
build/block-serialization-spec-parser/index.js 3.1 kB 0 B
build/date/index.js 5.37 kB 0 B
build/dom-ready/index.js 568 B 0 B
build/edit-site/style-rtl.css 2.53 kB 0 B
build/edit-site/style.css 2.53 kB 0 B
build/edit-widgets/style-rtl.css 2.58 kB 0 B
build/edit-widgets/style.css 2.58 kB 0 B
build/editor/editor-styles-rtl.css 381 B 0 B
build/editor/editor-styles.css 382 B 0 B
build/editor/style-rtl.css 3.97 kB 0 B
build/editor/style.css 3.96 kB 0 B
build/escape-html/index.js 733 B 0 B
build/format-library/style-rtl.css 502 B 0 B
build/format-library/style.css 502 B 0 B
build/html-entities/index.js 621 B 0 B
build/list-reusable-blocks/index.js 2.99 kB 0 B
build/list-reusable-blocks/style-rtl.css 226 B 0 B
build/list-reusable-blocks/style.css 226 B 0 B
build/nux/style-rtl.css 616 B 0 B
build/nux/style.css 613 B 0 B
build/server-side-render/index.js 2.55 kB 0 B
build/shortcode/index.js 1.7 kB 0 B
build/token-list/index.js 1.27 kB 0 B
build/wordcount/index.js 1.18 kB 0 B

compressed-size-action

@@ -695,6 +695,12 @@
"markdown_source": "../packages/components/src/date-time/README.md",
"parent": "components"
},
{
"title": "DetectFocusOutside",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems useful for the Gallery block. I know we merged a similar thing to unselect images lately cc @nosolosw

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nice. Yeah, it looks like the same use case (link for reference). Note that I didn't extract this logic to a reusable component there, so we can refactor gallery to use this when the PR lands.

Copy link
Member Author

@aduth aduth Mar 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nosolosw FWIW, there is the withFocusOutside higher-order component already available today:

https://github.com/WordPress/gutenberg/tree/master/packages/components/src/higher-order/with-focus-outside

This pull request "discourages" it, but it will expect to continue to exist I think. Maybe still worth waiting, in case we want to consolidate to DetectFocusOutside as the encouraged implementation.

@@ -0,0 +1,36 @@
# DetectFocusOutside
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lately, I have been wondering if it's better to avoid components for these "behavior-based" components and instead write hooks that accept "refs" or return elements (like useResizeAware). The advantage being to avoid the extra wrapper.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I'm not sure exactly how that would look in this case. useResizeAware returns an element which is expected to be rendered as a child of a component. We need to capture events bubbling from within, so it would at least need to be a wrapping element, which would basically be the same as what this implementation is already.

I do think it would be ideal if we could at least flatten the DOM hierarchy, since currently this component must add a div wrapper. We don't strictly need the div, aside from being able to capture bubbling events. I recall having some hope that in the future, React might support event handlers to be assigned to Fragment, since their synthetic events don't use the DOM anyways and it can technically be possible.

In fact, they still note this in their documentation:

key is the only attribute that can be passed to Fragment. In the future, we may add support for additional attributes, such as event handlers.

https://reactjs.org/docs/fragments.html#keyed-fragments

I think I saw later that they might be avoiding that option though (perhaps part of their new "Fire" effort).

So maybe it's not worth placing our bets on this. The alternatives might have some concerning implication on the ergonomics of the component:

  • We could have a hook which returns an object of props for the event handlers. But how do we handle a case where both the hook and the original component each need to have an e.g. onFocus event handler?
  • We could have a hook which returns a ref, and then the hook attaches DOM event handlers to the ref instance. But should we have any concern that those are DOM events, not React events, which behave (and bubble) slightly differently? We've had issues with this in the past for things like Slot/Fill bubblesVirtually.

One possible advantage of using the DOM events is that we could use the actual blur event, where the distinction between blur and focusout is basically the point of what this component is trying to implement. As far as I can recall, React counter-intuitively uses the focusout event when assigning onBlur (see facebook/react#6410, specifically facebook/react#6410 (comment)).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if a good pattern or not but on previous occasions I've wrote hooks like that:

useSomething = ( ref ) => {
   useEffect(() => {
     // do something with ref.current.
     // useEffect ensures the ref is defined
  });
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it would be good if we find a pattern here and follow it for future hooks relying on refs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the harder question is if and when we "rely on refs", including whether it makes sense here.

@@ -100,7 +96,6 @@ function Layout() {
editorSidebarOpened || pluginSidebarOpened || publishSidebarOpened;
const className = classnames( 'edit-post-layout', 'is-mode-' + mode, {
'is-sidebar-opened': sidebarIsOpened,
'has-fixed-toolbar': hasFixedToolbar,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that you think is worth mentioning on a dev note?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I saw this as largely a clean-up of classes which were no longer used in the editor, but never removed. To answer your question, I guess it depends on a combination of (a) are plugins using it and (b) are there use-cases for it? I don't have a huge concern of reverting this specific change, aside from the fact that it becomes dead code (maintenance burden, larger bundle sizes).

I do see a couple results here: https://wpdirectory.net/search/01E3J9B3K378P5PV2104804D06

But it's hard to tell if the usage is intended for has-fixed-toolbar of the layout component, vs. has-fixed-toolbar of the block.

Copy link
Contributor

@youknowriad youknowriad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work here.

@aduth
Copy link
Member Author

aduth commented Mar 16, 2020

One exception to clarify: If a user types &, it'd be escaped as &. It's related to a subsequent point about how we don't want unescaping to occur while the user is typing, but I also think it matches their intent. If we don't escape the ampersand, they may unknowingly create an HTML escape sequence. You can read about all sorts of fun associated with "ambiguous ampersands". At the end of the day, I think it would be fair to assume that for the vast majority of cases that a user inputs '&", they intend for it to be interpreted as the ampersand character. Preemptively escaping it aligns to that expectation. Unlike escaping other characters such quotes, there's not a concern of interfering with texturize. The only "concern" is that a user would not be able to create an HTML escape sequence manually if they wanted to, at least not through the editor. Maybe we could enhance this in the future such that the "Code" mode skips all of this behavior and shows the full HTML representation (neither decode nor preemptively encode ampersands). But as it stands today, we use the same component for Visual and Code modes, and don't vary any of its behaviors.

Co-Authored-By: Andrés <nosolosw@users.noreply.github.com>
@aduth
Copy link
Member Author

aduth commented Mar 17, 2020

There's some additional prior art to consider at #17994 and #17789 (cc @ellatrix, @dmsnell), particularly in regards to the intentionality of inputs provided by the user: if a user types "&", they most likely expect to see an ampersand in the editor and on the front-end, meaning that it should be escaped as &amp; for save. I'm conflicted on whether it may or may not be worth following the approach from #17994 in using escapeEditableHTML, so as to normalize this behavior between the title and post content. This would contradict one of the original "goals" of allowing users to insert HTML in a title accordance to their own capabilities. One one hand, this may not be a common case to care to support, at least if this pull request seeks to treat the title more as a visual / "rendered" representation of the title. On the other hand, the primary reason for escaping the ampersand in the current proposed implementation is largely only to avoid the effect that the input field suddenly updates in a way the user does not expect (because it runs decodeEntities on the value). In that light, we only really need to escape the ampersand. Unlike #18616, I don't expect either of these approaches would have any direct conflict with wptexturize. There may be some concern about plugin interoperability (like in #19898), but considering this behavior would apply server-side anyways for non-administrators, that may already be the case.

In some initial iterations, I had also explored the feasibility of decoding the entities once at initial load, and then not transforming the input value in any way thereafter. The problem with this approach is that it's hard to know when that "once" should be. The underlying post data can be updated for any number of reasons programmatically. For example, when a user saves their post and the server-side escapes ampersands and other characters, the post title data is updated to reflect the latest value. Should we re-run that "once" behavior? Because otherwise, we're back to the original issue. How can we know that those title updates came as a result of a save, as opposed to the post title input itself?

@dmsnell
Copy link
Contributor

dmsnell commented Mar 17, 2020

This would contradict one of the original "goals" of allowing users to insert HTML in a title accordance to their own capabilities

One helpful way I think about it is whether the editor gets in the way of that goal or not. If someone wants to enter unescaped content into the text view that WordPress will mess up on the server I don't see it as a bad thing to fix it right there in the client before going to the server. If we change what someone writes and the effect is that they become aware of it sooner before publishing their document then I consider that a win.

decoding the entities once at initial load, and then not transforming the input value in any way thereafter

I concur with your assessment. In my proposal I know it felt a little overboard but this is just a bit of an ugly job. What makes most sense to me is providing the low-level inputs into Gutenberg where we declare "the output should match what the person thinks they are typing" like the normal <input /> components, and some that declare the opposite, that "the output should match what the person is actually typing" like the <code> and <pre> components.

Somewhere probably in a lot of grunt work is finding the range of things that are valid HTML but which WordPress will nonetheless mangle. I think those things are the ones we should abstract away from block developers.

@ellatrix
Copy link
Member

Why are we not using the RichText component for the post title? As far as I know, rich text formatting in post titles is allowed in WordPress? We could disable the formatting controls, but let the user edit the HTML. This leads me to another question... Why is the post title not a block (with some restrictions?

@mtias
Copy link
Member

mtias commented Apr 9, 2020

It should definitely be a block. I think soon we can try using the block that already exists for FSE.

@aduth
Copy link
Member Author

aduth commented May 25, 2020

RichText (and, more broadly, a title block) is likely the better path forward here, so I'm going to close this attempt.

As mentioned in the original comment, there's a lot of supporting changes which could still be salvaged into unrelated improvements.

@aduth aduth closed this May 25, 2020
@aduth aduth deleted the update/post-title-decode branch May 25, 2020 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Package] Editor /packages/editor [Type] Bug An existing feature does not function as intended
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Post title renders escaped/encoded characters parsed by wp_kses filter
6 participants