Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

html: sync changes from std #208

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

LukeShu
Copy link
Contributor

@LukeShu LukeShu commented Apr 22, 2024

Before golang/go@324513b (2012-01-04) std "html" and what is now
"golang.org/x/net/html" were the same. Ever since then (well, since
golang/go@4e0749a (2012-05-29)) the escape/unescape code that they share has
been drifting apart, each receiving separate improvements.

This PR cherry-picks over all of the changes that std "html" has seen.

This is the counterpart to https://golang.org/cl/580896.

shawnps and others added 6 commits April 22, 2024 10:54
R=golang-dev, gobot, bradfitz
CC=golang-dev
https://golang.org/cl/40810044

Cherry-picked-from: golang/go@a025e1c
The html package uses some specific code to escape special characters.
Actually, the strings.Replacer can be used instead, and is much more
efficient. The converse operation is more complex but can still be
slightly optimized.

Credits to Ken Bloom (kabloom@google.com), who first submitted a
similar patch at https://codereview.appspot.com/141930043

Added benchmarks and slightly optimized UnescapeString.

benchmark                   old ns/op     new ns/op     delta
BenchmarkEscape-4           118713        19825         -83.30%
BenchmarkEscapeNone-4       87653         3784          -95.68%
BenchmarkUnescape-4         24888         23417         -5.91%
BenchmarkUnescapeNone-4     14423         157           -98.91%

benchmark                   old allocs     new allocs     delta
BenchmarkEscape-4           9              2              -77.78%
BenchmarkEscapeNone-4       0              0              +0.00%
BenchmarkUnescape-4         2              2              +0.00%
BenchmarkUnescapeNone-4     0              0              +0.00%

benchmark                   old bytes     new bytes     delta
BenchmarkEscape-4           24800         12288         -50.45%
BenchmarkEscapeNone-4       0             0             +0.00%
BenchmarkUnescape-4         10240         10240         +0.00%
BenchmarkUnescapeNone-4     0             0             +0.00%

Fixes #8697

Change-Id: I208261ed7cbe9b3dee6317851f8c0cf15528bce4
Reviewed-on: https://go-review.googlesource.com/9808
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

Cherry-picked-from: golang/go@2d9a50b
Change-Id: I129d70304ae4e4694d9217826b18b341e3834d3c
Reviewed-on: https://go-review.googlesource.com/11201
Reviewed-by: Andrew Gerrand <adg@golang.org>

Cherry-picked-from: golang/go@a3c0730
Add benchmarks for for sparsely escaped and densely escaped strings.
Then speed up the sparse unescaping part heavily by using IndexByte and
copy to skip the parts containing no escaping very fast.

Unescaping densely escaped strings slower because of
the new function call overhead. But sparsely encoded strings are seen
more often in the utf8 enabled web.

We win part of the speed back by looking up entityName differently.

	benchmark                  old ns/op    new ns/op    delta
	BenchmarkEscape                31680        31396   -0.90%
	BenchmarkEscapeNone             6507         6872   +5.61%
	BenchmarkUnescape              36481        48298  +32.39%
	BenchmarkUnescapeNone            332          325   -2.11%
	BenchmarkUnescapeSparse         8836         3221  -63.55%
	BenchmarkUnescapeDense         30639        32224   +5.17%

Change-Id: If606cb01897a40eefe35ba98f2ff23bb25251606
Reviewed-on: https://go-review.googlesource.com/10172
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

Cherry-picked-from: golang/go@5b92028
Fixes #15221

Change-Id: I9e927a2f604213338b4572f1a32d0247c58bdc60
Reviewed-on: https://go-review.googlesource.com/21798
Reviewed-by: Ian Lance Taylor <iant@golang.org>

Cherry-picked-from: golang/go@a44c425
Fixes #21194

Change-Id: Iac5187335df67f90f0f47c7ef6574de147c2ac9b
Reviewed-on: https://go-review.googlesource.com/52970
Reviewed-by: Avelino <t@avelino.xxx>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

Cherry-picked-from: golang/go@6dae588
Copy link

google-cla bot commented Apr 22, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gopherbot
Copy link
Contributor

This PR (HEAD: 51307fd) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/net/+/580855.

Important tips:

  • Don't comment on this PR. All discussion takes place in Gerrit.
  • You need a Gmail or other Google account to log in to Gerrit.
  • To change your code in response to feedback:
    • Push a new commit to the branch used by your GitHub PR.
    • A new "patch set" will then appear in Gerrit.
    • Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
    • Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
    • Multiple commits in the PR will be squashed by GerritBot.
  • The title and description of the GitHub PR are used to construct the final commit message.
    • Edit these as needed via the GitHub web interface (not via Gerrit or git).
    • You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
  • See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

@LukeShu LukeShu changed the title html: Sync changes from std html: sync changes from std Apr 22, 2024
@gopherbot
Copy link
Contributor

Message from Gopher Robot:

Patch Set 1:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/580855.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Luke Shumaker:

Patch Set 3:

(2 comments)


Please don’t reply on this GitHub thread. Visit golang.org/cl/580855.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Luke Shumaker:

Patch Set 3:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/580855.
After addressing review feedback, remember to publish your drafts!

bradfitz and others added 6 commits April 23, 2024 21:23
Saves ~105KB of heap for callers who don't use html.UnescapeString.
(EscapeString is much more common).

Also saves 70KB of binary size, because now the linker can do dead
code elimination. (because #2559 is still open and global maps always
generate init code)

Fixes #26727
Updates #6853

Change-Id: I18fe9a273097e2c7e0cb7f88205cae1bb60fa89b
Reviewed-on: https://go-review.googlesource.com/127075
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

Cherry-picked-from: golang/go@740e589
Adds a sample Fuzz test function to package html based on
https://github.com/dvyukov/go-fuzz-corpus/blob/master/stdhtml/main.go

Updates #19109
Updates #31309

Change-Id: I8c49fff8f70fc8a8813daf1abf0044752003adbb
Reviewed-on: https://go-review.googlesource.com/c/go/+/174301
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

Cherry-picked-from: golang/go@4ad1355
The comment contained a link that had a file name and ID that no longer existed, so change to the URL of the corresponding part of the latest page.

Change-Id: I74e0885aabf470facc39b84035f7a83fef9c6a8e
GitHub-Last-Rev: 5681c84d9f1029449da6860c65a1d9a128296e85
GitHub-Pull-Request: golang/go#36514
Reviewed-on: https://go-review.googlesource.com/c/go/+/214181
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

Cherry-picked-from: golang/go@52c4488
Make all our package sources use Go 1.17 gofmt format
(adding //go:build lines).

Part of //go:build change (#41184).
See https://golang.org/design/draft-gobuild

Change-Id: Ia0534360e4957e58cd9a18429c39d0e32a6addb4
Reviewed-on: https://go-review.googlesource.com/c/go/+/294430
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

Cherry-picked-from: golang/go@d4b2638
When these packages are released as part of Go 1.18,
Go 1.16 will no longer be supported, so we can remove
the +build tags in these files.

Ran go fix -fix=buildtag std cmd and then reverted the bootstrapDirs
as defined in cmd/dist/buildtool.go, which need to continue
to build with Go 1.4 for now.

Also reverted vendor and cmd/vendor, which will need
to be updated in their own repos first.

Manual changes in runtime/pprof/mprof_test.go to adjust line numbers.

For #41184.

Change-Id: Ic0f93f7091295b6abc76ed5cd6e6746e1280861e
Reviewed-on: https://go-review.googlesource.com/c/go/+/344955
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>

Cherry-picked-from: golang/go@f229e70
Convert the existing gofuzz based fuzz test to a testing.F based fuzz
test.

Change-Id: Ieae69ba7fb17bd54d95c7bb2f4ed04c323c9f15f
Reviewed-on: https://go-review.googlesource.com/c/go/+/494195
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Tobias Klauser <tobias.klauser@gmail.com>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>

Cherry-picked-from: golang/go@200a01f
@gopherbot
Copy link
Contributor

This PR (HEAD: 3a376dd) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/net/+/580855.

Important tips:

  • Don't comment on this PR. All discussion takes place in Gerrit.
  • You need a Gmail or other Google account to log in to Gerrit.
  • To change your code in response to feedback:
    • Push a new commit to the branch used by your GitHub PR.
    • A new "patch set" will then appear in Gerrit.
    • Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
    • Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
    • Multiple commits in the PR will be squashed by GerritBot.
  • The title and description of the GitHub PR are used to construct the final commit message.
    • Edit these as needed via the GitHub web interface (not via Gerrit or git).
    • You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
  • See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

@gopherbot
Copy link
Contributor

Message from Damien Neil:

Patch Set 5:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/580855.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Luke Shumaker:

Patch Set 5:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/580855.
After addressing review feedback, remember to publish your drafts!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet