This repository has been archived by the owner on Sep 11, 2020. It is now read-only.

CommitObjects is slow compared to equivalent git rev-list --all #1294

Open
jatindhankhar opened this issue Feb 28, 2020 · 0 comments

Comments

@jatindhankhar

I am trying to write a custom git grep wrapper using go-git.

Essentially, I am trying to replicate:

git rev-list --all | xargs git --no-pager grep -i 'search_text'

CommitObjects() is slow compared to the command git rev-list --all

To benchmark it, I used a big repo (https://github.com/odoo/odoo/) with a large number of commits.

I understand there will be some overhead from creating the custom objects that support various operations, but the current implementation of CommitObjects is 16 times slower than the raw command.

The first strange thing I noticed was that the Go implementation freezes for a few seconds after reaching commit 004a0b996ff8f269451e07346f71a129a1f3fbaf, then lists the remaining ~18-20 commits.

main.go

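package main

import (
	"fmt"

	"gopkg.in/src-d/go-git.v4"
	"gopkg.in/src-d/go-git.v4/plumbing/object"
)

func main() {
	r, err := git.PlainOpen("odoo")
	if err != nil {
		fmt.Println(err.Error())
		return
	}

	// CommitObjects returns an unsorted iterator over all commits in the repository.
	commits, err := r.CommitObjects()
	if err != nil {
		fmt.Println(err.Error())
		return
	}

	// Print the hash of every commit.
	err = commits.ForEach(func(c *object.Commit) error {
		fmt.Println(c.Hash)
		return nil
	})
	if err != nil {
		fmt.Println(err.Error())
	}
}

# go-git wrapper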
./main  16.24s user 11.88s system 103% cpu 27.224 total

# raw command 
git rev-list --all  1.67s user 0.32s system 81% cpu 2.456 total

I used hyperfine (https://github.com/sharkdp/hyperfine) to run a more rigorous benchmark than the time command, and the conclusion is the same.

hyperfine --min-runs 5 './main' 'git rev-list --all'

Benchmark #1: ./main
  Time (mean ± σ):     28.729 s ±  2.729 s    [User: 15.574 s, System: 12.378 s]
  Range (min … max):   25.745 s … 32.868 s    5 runs

Benchmark #2: git rev-list --all
  Time (mean ± σ):      1.413 s ±  0.163 s    [User: 1.174 s, System: 0.171 s]
  Range (min … max):    1.331 s …  1.704 s    5 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (1.704 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Summary
  'git rev-list --all' ran
   20.33 ± 3.04 times faster than './main'


Profiling code

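package main

import (
	"fmt"

	"github.com/pkg/profile"
	"gopkg.in/src-d/go-git.v4"
	"gopkg.in/src-d/go-git.v4/plumbing/object"
)

func main() {
	// Start profiling (CPU by default); the profile is written when main returns.
	defer profile.Start().Stop()

	r, err := git.PlainOpen("odoo")
	if err != nil {
		fmt.Println(err.Error())
		return
	}

	// CommitObjects returns an unsorted iterator over all commits in the repository.
	commits, err := r.CommitObjects()
	if err != nil {
		fmt.Println(err.Error())
		return
	}

	// Print the hash of every commit.
	err = commits.ForEach(func(c *object.Commit) error {
		fmt.Println(c.Hash)
		return nil
	})
	if err != nil {
		fmt.Println(err.Error())
	}
}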

Profile output

cpu_profiling.pdf

Am I missing something?

Is there a more performant way of iterating over commits?

P.S. The benchmark was run on a 2017 MacBook Pro:

  Model Name:	MacBook Pro
  Model Identifier:	MacBookPro14,1
  Processor Name:	Dual-Core Intel Core i5
  Processor Speed:	2.3 GHz
  Number of Processors:	1
  Total Number of Cores:	2
  L2 Cache (per Core):	256 KB
  L3 Cache:	4 MB
  Hyper-Threading Technology:	Enabled
  Memory:	8 GB