image.GetPoint() is extremely slow #408

Open
MTRNord opened this issue Feb 25, 2024 · 6 comments

MTRNord commented Feb 25, 2024

Hi.

I am trying to reimplement the PDQ hash algorithm in Go and need to read the R, G, and B values of every pixel in the image individually. Everything works fine until I call image.GetPoint. That call slows the process down dramatically: it takes on the order of seconds per iteration, which adds up to about 2 minutes for the whole file.

I am not sure why it's slow.

The code used is like this:

	numCols := image.Width()
	numRows := image.Height()
	for i := 0; i < numRows; i++ {
		for j := 0; j < numCols; j++ {
			colorArray, err := image.GetPoint(j, i)
			if err != nil {
				log.Fatalf("Error getting pixel: %v", err)
			}
			r := colorArray[0]
			g := colorArray[1]
			b := colorArray[2]
			(*luma)[i*numCols+j] = LUMA_FROM_R_COEFF*r + LUMA_FROM_G_COEFF*g + LUMA_FROM_B_COEFF*b
		}
	}

Here, image is a reference to the loaded image. The rest of the code is only loosely relevant. I verified that the GetPoint call specifically is the slow part of the loop.

Any ideas what might be going on or how to make it faster?

MTRNord commented Feb 25, 2024

Converting to Go's image.Image via ToImage and then using At instead makes the same loop run in milliseconds.

tonimelisma (Collaborator) commented

Hey @MTRNord, thanks for this. I don't have a clear answer unfortunately. Is it possible for you to try profiling the govips part?

MTRNord commented Feb 25, 2024

Hi, I ran the profiler on my original code and got this:

➜  pdqhash-go git:(main) ✗ go tool pprof scanner q0004.prof 
File: scanner
Build ID: c82775b31f8a5d58fa8cb57bc7c7660cf278fc97
Type: cpu
Time: Feb 25, 2024 at 12:26pm (CET)
Duration: 57.12s, Total samples = 60.83s (106.49%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 60s, 98.64% of 60.83s total
Dropped 89 nodes (cum <= 0.30s)
Showing top 10 nodes out of 26
      flat  flat%   sum%        cum   cum%
    36.54s 60.07% 60.07%     36.60s 60.17%  runtime.cgocall
    13.68s 22.49% 82.56%     13.68s 22.49%  [scanner]
     3.15s  5.18% 87.74%      3.15s  5.18%  [libc.so.6]
     2.91s  4.78% 92.52%      2.91s  4.78%  [libglib-2.0.so.0.7800.3]
     1.79s  2.94% 95.46%      1.79s  2.94%  [libgobject-2.0.so.0.7800.3]
     1.42s  2.33% 97.80%      1.42s  2.33%  [libvips.so.42.17.1]
     0.34s  0.56% 98.36%      0.34s  0.56%  runtime/internal/syscall.Syscall6
     0.14s  0.23% 98.59%     22.66s 37.25%  runtime._ExternalCode
     0.02s 0.033% 98.62%      0.39s  0.64%  internal/poll.(*FD).Write
     0.01s 0.016% 98.64%     37.50s 61.65%  github.com/MTRNord/pdqhash-go.(*PDQHasher).fillFloatLumaFromBufferImage

So either the FFI (cgo) call is quite expensive, or libvips itself is the slow part.
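
If the per-call overhead is the problem, one common mitigation is to cross the cgo boundary once, fetch the whole pixel buffer, and index into it from Go. The sketch below shows only the Go-side indexing and makes several assumptions: the buffer is tightly packed 8-bit data with R, G, B as the first three bands, the luma weights are the common Rec. 601 values, and lumaFromInterleaved is a hypothetical helper. How the buffer is obtained is left open; I believe govips offers ImageRef.ToBytes for this, but check it against your version.

```go
package main

import "fmt"

// Assumed luma weights; the thread does not give the actual values.
const (
	lumaR = 0.299
	lumaG = 0.587
	lumaB = 0.114
)

// lumaFromInterleaved (hypothetical helper) computes row-major luma from a
// tightly packed 8-bit buffer with `bands` samples per pixel, R, G, B first.
func lumaFromInterleaved(buf []byte, width, height, bands int) ([]float32, error) {
	if width <= 0 || height <= 0 {
		return nil, fmt.Errorf("invalid size %dx%d", width, height)
	}
	if bands < 3 {
		return nil, fmt.Errorf("need at least 3 bands, got %d", bands)
	}
	if want := width * height * bands; len(buf) != want {
		return nil, fmt.Errorf("buffer has %d bytes, expected %d", len(buf), want)
	}
	luma := make([]float32, width*height)
	for i := range luma {
		// Slice out the first three samples of pixel i; extra bands
		// (for example alpha) are ignored.
		p := buf[i*bands : i*bands+3]
		luma[i] = lumaR*float32(p[0]) + lumaG*float32(p[1]) + lumaB*float32(p[2])
	}
	return luma, nil
}

func main() {
	// Two RGB pixels: pure red, then white.
	buf := []byte{255, 0, 0, 255, 255, 255}
	luma, err := lumaFromInterleaved(buf, 2, 1, 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(luma)
}
```

With this shape the cost of the foreign call is paid once per image instead of once per pixel.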

Libvips was started like this:

vips.LoggingSettings(nil, vips.LogLevelMessage)
vips.Startup(&vips.Config{
	ConcurrencyLevel: 0,
	MaxCacheFiles:    5,
	MaxCacheMem:      50 * 1024 * 1024,
	MaxCacheSize:     100,
	ReportLeaks:      false,
	CacheTrace:       false,
	CollectStats:     false,
})
defer vips.Shutdown()

The profile is at https://github.com/MTRNord/pdqhash-go/blob/2290c5881ff302dc009ac741b92a8682a39c1a38/q0004.prof

The profile was made from the commit at https://github.com/MTRNord/pdqhash-go/tree/2290c5881ff302dc009ac741b92a8682a39c1a38, with scanner built using CC=clang CCX=clang++ go build -asan cmd/scanner.go and run using ./scanner -folder test-images/reg-test-input/labelme-subset/q0004.jpg -cpuprofile=q0004.prof

I hope that helps and if not I am happy to add more info :)

MTRNord commented Feb 25, 2024

Looking at this, I wonder whether golang/go#19574 and similar issues might be related here.

MTRNord commented Feb 25, 2024

Oh, and the problematic call is at https://github.com/MTRNord/pdqhash-go/blob/2290c5881ff302dc009ac741b92a8682a39c1a38/pdq_hasher.go#L160, which is called for every pixel in the image.

MTRNord commented Feb 25, 2024

> Converting via ToImage to the golang image.Image and then using At instead seems to make it work in milliseconds

While this works in general, I realised that it introduces a slight error (compression? quality?), which in my case makes it fail because the hashes are not generated as they should be :)
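
One possible explanation, which this thread does not confirm, is that the conversion path passes through a lossy encode and decode step. The sketch below uses only the Go standard library to show how such an error can be measured: it round-trips a synthetic image through PNG (lossless) and JPEG (lossy) and reports the largest per-channel difference. The helper names, the test pattern, and the quality setting of 75 are all hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
	"image/png"
)

// maxChannelDiff returns the largest absolute 8-bit R, G, or B difference
// between a and b, which are assumed to share the same bounds.
func maxChannelDiff(a, b image.Image) int {
	worst := 0
	r := a.Bounds()
	for y := r.Min.Y; y < r.Max.Y; y++ {
		for x := r.Min.X; x < r.Max.X; x++ {
			ar, ag, ab, _ := a.At(x, y).RGBA()
			br, bg, bb, _ := b.At(x, y).RGBA()
			diffs := [3]int{
				int(ar>>8) - int(br>>8),
				int(ag>>8) - int(bg>>8),
				int(ab>>8) - int(bb>>8),
			}
			for _, d := range diffs {
				if d < 0 {
					d = -d
				}
				if d > worst {
					worst = d
				}
			}
		}
	}
	return worst
}

// testImage builds a small opaque 16x16 colour pattern.
func testImage() *image.RGBA {
	img := image.NewRGBA(image.Rect(0, 0, 16, 16))
	for y := 0; y < 16; y++ {
		for x := 0; x < 16; x++ {
			img.Set(x, y, color.RGBA{R: uint8(x * 16), G: uint8(y * 16), B: uint8(x * y), A: 255})
		}
	}
	return img
}

// roundTripPNG encodes src as PNG and decodes it again.
func roundTripPNG(src image.Image) (image.Image, error) {
	var buf bytes.Buffer
	if err := png.Encode(&buf, src); err != nil {
		return nil, err
	}
	return png.Decode(&buf)
}

// roundTripJPEG encodes src as JPEG at the given quality and decodes it again.
func roundTripJPEG(src image.Image, quality int) (image.Image, error) {
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, src, &jpeg.Options{Quality: quality}); err != nil {
		return nil, err
	}
	return jpeg.Decode(&buf)
}

func main() {
	src := testImage()
	p, err := roundTripPNG(src)
	if err != nil {
		fmt.Println("png:", err)
		return
	}
	j, err := roundTripJPEG(src, 75)
	if err != nil {
		fmt.Println("jpeg:", err)
		return
	}
	fmt.Println("max diff after PNG round trip: ", maxChannelDiff(src, p))
	fmt.Println("max diff after JPEG round trip:", maxChannelDiff(src, j))
}
```

Running the same comparison between the values returned by GetPoint and those read from the converted image would show whether the discrepancy comes from the conversion step.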

MTRNord added a commit to MTRNord/pdqhash-go that referenced this issue Feb 25, 2024