pkg/cover: symbolization #4585

Open
tarasmadan opened this issue Mar 19, 2024 · 4 comments

Comments

@tarasmadan
Collaborator

tarasmadan commented Mar 19, 2024

Is your feature request related to a problem? Please describe.
The first /cover request with lazy symbolization takes 19s.
Getting updated numbers (after 5 seconds) takes 17s.
RAM consumption is 40G.

Describe the solution you'd like
Full symbolization takes 50 seconds, which is comparable to syzkaller startup time (with QEMU).
By symbolizing all callbacks before the first /cover call, we can reduce its generation time to 3 seconds and its memory consumption to 0G.

There are 2 potential solutions:

  1. Symbolize everything in background on syzkaller start.
  2. Symbolize all callbacks after/during the kernel build process and use it as a build artefact. GZIPped data will cost ~30M.

The second approach looks better but will cost more.

@tarasmadan
Collaborator Author

@dvyukov proposed a third option: remove the addr2line dependency and parse the DWARF data directly.
His prototype:

package main

import (
	"bufio"
	"debug/dwarf"
	"debug/elf"
	"fmt"
	"io"
	"os"
	"strconv"
	"time"
)

func main() {
	start := time.Now()
	pcs := make(map[uint64]struct{})
	for s := bufio.NewScanner(os.Stdin); s.Scan(); {
		n, err := strconv.ParseUint(s.Text(), 16, 64)
		if err != nil {
			panic(err)
		}
		pcs[n] = struct{}{}
	}
	fmt.Printf("read %v pcs in %v\n", len(pcs), time.Since(start))

	f, err := elf.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	data, err := f.DWARF()
	if err != nil {
		panic(err)
	}
	matched, total := 0, 0
	for r := data.Reader(); ; {
		ent, err := r.Next()
		if err != nil {
			panic(err)
		}
		if ent == nil {
			break
		}
		if ent.Tag != dwarf.TagCompileUnit {
			panic(fmt.Errorf("found unexpected tag %v on top level", ent.Tag))
		}
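		lr, err := data.LineReader(ent)
		if err != nil {
			panic(err)
		}
		if lr == nil {
			// LineReader returns nil, nil for a compile unit without a line table.
			r.SkipChildren()
			continue
		}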
		var entry dwarf.LineEntry
		for {
			if err := lr.Next(&entry); err != nil {
				if err == io.EOF {
					break
				}
				panic(err)
			}
			total++
			if _, ok := pcs[entry.Address]; !ok {
				continue
			}
			matched++
			//fmt.Printf("pc %x %v:%v:%v\n", entry.Address, entry.File.Name, entry.Line, entry.Column)
		}
		r.SkipChildren()
	}
	fmt.Printf("total %v, matched %v\n", total, matched)
}

@dvyukov
Collaborator

dvyukov commented Apr 9, 2024

> His prototype:

It turns out not to be that easy. LineReader has info about inlined frames, but only file:line, not the function name. And we need inlined function names in both pkg/report and pkg/cover.
Inlined function names have something to do with TagInlinedSubroutine, but I have not figured out how exactly these tags should be processed. The llvm-addr2line code can be used as a reference.
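
As a rough illustration of what processing these tags amounts to: in DWARF, each TagInlinedSubroutine entry is nested under the function it was inlined into, covers its own PC ranges, and refers to the inlined function's name through AttrAbstractOrigin. The sketch below assumes those entries have already been decoded into a hypothetical `Scope` tree (this type and `inlineStack` are not part of debug/dwarf or syzkaller) and shows only the lookup step: descend into whichever child covers the PC and collect names, innermost first.

```go
package main

import "fmt"

// Scope is a hypothetical, already-decoded view of one DWARF entry:
// a TagSubprogram at the top, TagInlinedSubroutine entries nested below.
type Scope struct {
	Name     string      // function name (for inlined scopes, resolved via AttrAbstractOrigin)
	Ranges   [][2]uint64 // [low, high) PC ranges covered by this scope
	Children []*Scope    // scopes inlined into this one
}

// contains reports whether pc falls into any of the scope's PC ranges.
func (s *Scope) contains(pc uint64) bool {
	for _, r := range s.Ranges {
		if pc >= r[0] && pc < r[1] {
			return true
		}
	}
	return false
}

// inlineStack returns the function names for pc, innermost frame first,
// or nil if pc is not inside fn at all.
func inlineStack(fn *Scope, pc uint64) []string {
	if !fn.contains(pc) {
		return nil
	}
	names := []string{fn.Name}
	for cur := fn; ; {
		var next *Scope
		for _, c := range cur.Children {
			if c.contains(pc) {
				next = c
				break
			}
		}
		if next == nil {
			break
		}
		// A deeper inlined scope becomes the new innermost frame.
		names = append([]string{next.Name}, names...)
		cur = next
	}
	return names
}

func main() {
	// Made-up function layout: leaf is inlined into mid, mid into outer.
	fn := &Scope{
		Name:   "outer",
		Ranges: [][2]uint64{{0x1000, 0x1100}},
		Children: []*Scope{{
			Name:   "mid",
			Ranges: [][2]uint64{{0x1010, 0x1020}},
			Children: []*Scope{{
				Name:   "leaf",
				Ranges: [][2]uint64{{0x1014, 0x1018}},
			}},
		}},
	}
	fmt.Println(inlineStack(fn, 0x1016)) // [leaf mid outer]
	fmt.Println(inlineStack(fn, 0x1030)) // [outer]
}
```

Building the `Scope` tree from a real kernel image (reading ranges and following abstract origins) is the part left open above and is not covered by this sketch.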

@tarasmadan
Collaborator Author

Mapping file:line to a function name looks doable given the source code itself.
Any chance of getting StartLine:StartPos - EndLine:EndPos?
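
A minimal sketch of that file:line lookup, assuming the start and end lines of each function in a file have already been extracted from the source by some parser. The `FuncRange` type and `funcAtLine` are hypothetical names for this example, and the sketch ignores columns, so it cannot tell apart two functions that share a line.

```go
package main

import (
	"fmt"
	"sort"
)

// FuncRange is a hypothetical record of one function's extent in a source file.
type FuncRange struct {
	Name      string
	StartLine int // first line of the function, inclusive
	EndLine   int // last line of the function, inclusive
}

// funcAtLine finds the function containing line.
// ranges must be sorted by StartLine and must not overlap.
func funcAtLine(ranges []FuncRange, line int) (string, bool) {
	// Index of the first function that starts after line.
	i := sort.Search(len(ranges), func(i int) bool {
		return ranges[i].StartLine > line
	})
	if i == 0 {
		return "", false // line is before the first function
	}
	r := ranges[i-1]
	if line > r.EndLine {
		return "", false // line falls between two functions
	}
	return r.Name, true
}

func main() {
	// Made-up ranges for a single file.
	ranges := []FuncRange{
		{Name: "first", StartLine: 10, EndLine: 20},
		{Name: "second", StartLine: 25, EndLine: 40},
	}
	fmt.Println(funcAtLine(ranges, 30))
	fmt.Println(funcAtLine(ranges, 22))
}
```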

@dvyukov
Collaborator

dvyukov commented Apr 11, 2024

LineEntry has a Column field:
https://pkg.go.dev/debug/dwarf@go1.22.2#LineEntry
