
encoding/json: speed up the decoding scanner #28923

@mvdan

#5683 covered the overall speed issues of encoding and decoding JSON. It's now closed, since the package has gotten noticeably faster in the last few Go releases. For example, after a series of patches I sent during 1.12:

name           old time/op    new time/op    delta
CodeEncoder-4    7.43ms ± 0%    5.35ms ± 1%  -28.01%  (p=0.002 n=6+6)
CodeDecoder-4    30.8ms ± 1%    27.3ms ± 0%  -11.42%  (p=0.004 n=6+5)

name           old speed      new speed      delta
CodeEncoder-4   261MB/s ± 0%   363MB/s ± 1%  +38.91%  (p=0.002 n=6+6)
CodeDecoder-4  62.9MB/s ± 1%  70.9MB/s ± 1%  +12.71%  (p=0.002 n=6+6)

Notice, however, that decoding is about five times slower than encoding. While it makes sense that decoding is more work, a 5x difference suggests there are bottlenecks.

Here are the top 10 encoding functions by CPU time, as reported by go test -run=- -bench=CodeEncoder -benchtime=5s -cpuprofile=cpu.out:

      flat  flat%   sum%        cum   cum%
     3.93s 10.84% 10.84%     36.07s 99.50%  encoding/json.structEncoder.encode
     3.45s  9.52% 20.36%      3.92s 10.81%  strconv.formatBits
     3.39s  9.35% 29.71%      4.80s 13.24%  strconv.(*extFloat).ShortestDecimal
     2.50s  6.90% 36.61%      2.50s  6.90%  runtime.memmove
     2.41s  6.65% 43.26%      3.18s  8.77%  encoding/json.(*encodeState).string
     2.11s  5.82% 49.08%      2.11s  5.82%  strconv.fmtF
     1.78s  4.91% 53.99%      3.40s  9.38%  bytes.(*Buffer).WriteString
     1.36s  3.75% 57.74%         2s  5.52%  bytes.(*Buffer).WriteByte
     1.34s  3.70% 61.43%      1.34s  3.70%  bytes.(*Buffer).tryGrowByReslice (inline)
     1.32s  3.64% 65.08%      2.31s  6.37%  bytes.(*Buffer).Write

Encoding has been optimized to the point where bytes.Buffer, runtime.memmove, and strconv all appear in the top 10; I can't see any more low-hanging fruit there.

And here are the top 10 decoding functions by CPU time, as reported by go test -run=- -bench=CodeDecoder -benchtime=5s -cpuprofile=cpu.out:

      flat  flat%   sum%        cum   cum%
    3190ms 10.74% 10.74%     3240ms 10.91%  encoding/json.stateInString
    3150ms 10.60% 21.34%     8260ms 27.80%  encoding/json.(*Decoder).readValue
    2990ms 10.06% 31.40%     7660ms 25.78%  encoding/json.(*decodeState).scanWhile
    1990ms  6.70% 38.10%    21120ms 71.09%  encoding/json.(*decodeState).object
    1730ms  5.82% 43.92%     1860ms  6.26%  encoding/json.stateEndValue
    1550ms  5.22% 49.14%     2110ms  7.10%  encoding/json.state1
    1350ms  4.54% 53.69%     1350ms  4.54%  encoding/json.unquoteBytes
    1020ms  3.43% 57.12%     1080ms  3.64%  encoding/json.stateBeginValue
     920ms  3.10% 60.22%     3120ms 10.50%  encoding/json.indirect
     720ms  2.42% 62.64%      740ms  2.49%  encoding/json.stateBeginString

The big difference here is that all 10 functions are in the json package itself. In particular, the scanner's state* functions take up half of the list, including number one. And numbers two and three, readValue and scanWhile, are also part of scanning tokens.

There's another issue about speeding up the decoder, #16212. It's about doing reflect work ahead of time, like the encoder does. However, out of that top 10, only object and indirect involve any reflect work, so I doubt that a refactor for #16212 would bring substantial gains while the scanner takes vastly more time than reflection.

So I propose that we refactor or even redesign the scanner to make it faster. The main bottleneck at the moment is the step function, which must be called for every decoded byte; see the sketch after this list. I can imagine that it's slow for a few main reasons:

  • step is a function value, which can be nil - one nil check per byte
  • We can't quickly skip chunks of uninteresting bytes, such as whitespace or long quoted strings with safe characters
  • The calls cannot be inlined, even though many state* functions are very small - one call per byte
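
To make that cost concrete, here is a stripped-down sketch of the current design. The real scanner tracks more state and has more return codes, but its hot loop has roughly this shape:

type scanner struct {
	// step is reassigned on every state transition.
	step func(*scanner, byte) int
	err  error
}

const (
	scanContinue = iota // uninteresting byte, keep scanning
	scanError           // syntax error, err is set
)

func checkValid(data []byte, scan *scanner) error {
	for _, c := range data {
		// One indirect call per input byte: it cannot be
		// inlined, and the func value must be checked for nil
		// before every call.
		if scan.step(scan, c) == scanError {
			return scan.err
		}
	}
	return nil
}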

The godoc for the step field reads:

// The step is a func to be called to execute the next transition.
// Also tried using an integer constant and a single func
// with a switch, but using the func directly was 10% faster
// on a 64-bit Mac Mini, and it's nicer to read.

I tried using a function switch, and even a function jump table (sketched below), but neither gave a noticeable speed-up. In fact, both made CodeDecoder 1-5% slower. I presume the manual jump table via a [...]func(*scanner, byte) int array is slow because we still can't get rid of the nil checks. I'd hope that #5496 would help here.
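
For reference, the jump-table variant looked roughly like this. The names here are illustrative, not the actual patch, and scanContinue is as in the earlier sketch:

const (
	stBeginValue = iota
	stInString
	// ... one constant per scanner state
)

type tableScanner struct {
	state int // index into steps, replacing the step func field
	err   error
}

// Stand-ins for the package's state* transition functions.
func tableBeginValue(s *tableScanner, c byte) int { return scanContinue }
func tableInString(s *tableScanner, c byte) int   { return scanContinue }

var steps = [...]func(*tableScanner, byte) int{
	stBeginValue: tableBeginValue,
	stInString:   tableInString,
}

func (s *tableScanner) step(c byte) int {
	// Still one indirect call per byte, now behind a table index;
	// the compiler can't prove the stored func values are non-nil,
	// so the per-call nil check remains.
	return steps[s.state](s, c)
}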

I have a few ideas in mind to try to make the work per byte a bit faster, but I'm limited by the "one unknown step call per byte" design described above; one such idea is sketched below. I understand that the current design makes the JSON scanner simpler to reason about, but I wonder if there's anything we can do to make it faster while keeping it reasonably simple.
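
For example, a hypothetical helper for the string case could consume runs of safe bytes in a tight loop, handing control back to the state machine only at interesting bytes. This is an illustration of the idea, not code from the package:

func skipSafeStringBytes(data []byte, i int) int {
	for i < len(data) {
		c := data[i]
		// Stop at the closing quote, an escape, a control byte,
		// or non-ASCII, and let the state machine handle those.
		if c == '"' || c == '\\' || c < 0x20 || c >= 0x80 {
			break
		}
		i++
	}
	return i
}

The open question is how to integrate bulk skipping like this without complicating the step-based state machine.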

This issue is to track small speed-ups in the scanner, but also to discuss larger refactors to it. I'm milestoning for 1.13, since at least the small speed-ups can be merged for that release.

/cc @rsc @dsnet @bradfitz @josharian @kevinburke
