Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bit level HasPrefix() for Int Geohash #21

Open
phrozen opened this issue Aug 20, 2019 · 5 comments
Open

Bit level HasPrefix() for Int Geohash #21

phrozen opened this issue Aug 20, 2019 · 5 comments

Comments

@phrozen
Copy link

phrozen commented Aug 20, 2019

Hey, I'm using your library for testing and I ran into a functionality issue Geohash Ints have that you might be able to help me sort out.

Geohashes (string) are able to drop precision for region search inside the area, for proximity searches I can use a lower precision geohash and search by prefix the geohash and all it's neighbors to find any location within the given area using strings.HasPrefix(s, prefix string) or the equivalent in the bytes package. This two functions work at the byte level comparison. But with Int Geohashes, the prefix search becomes imposible when precision % 8 != 0.

I'm currently using binary.BigEndian.PutUint64 to convert to a byte slice in order to store them as keys into boltdb. I can do prefix search with Int geohashes, but only for i % 8 == 0 precisions due to byte equivalence.

Is there a way to get a bit level comparison prefix search for Int geohash? I'm thinking like:

func HasPrefixWithPrecision(i, prefix, bits uint64) bool {
    ...
}

As integer geohashes grow with precision, I think the first step is do a left shift with the given precision to match the uints prefix, and then compare in bit level up to a given precision. This enables proximity searches with 64 levels of precision, which is huge compared to the current 8.

Any thoughts?

@phrozen
Copy link
Author

phrozen commented Aug 20, 2019

This is my example code but assumes both hashes were encoded at 64 level precision. This might be good enough for me, but I would like to know your thoughts on a more generic function or optimization.

package main

import (
	"fmt"
	"github.com/mmcloughlin/geohash"
)

func hasPrefix(hash, prefix, bits uint64) bool {
	// Assumes 64 bit precision on both
	bits = 64 - bits
	hash = hash >> bits
	prefix = prefix >> bits
	return (hash^prefix == 0)
}

func main() {
	// This two points are roughly 10km apart
	lat1, lon1 := 20.681933988905442, -103.46248676253741
	lat2, lon2 := 20.67443821113288, -103.38736429270001
	// Encode both to Int with default 64bit precision
	hash1 := geohash.EncodeInt(lat1, lon1)
	hash2 := geohash.EncodeInt(lat2, lon2)
	for i := 0; i <= 64; i++ {
		// Check at which precision (prefix) this points are within the
                // same given area, (results true until 22 bits).
		fmt.Println(i, "precision > ", hasPrefix(hash2, hash1, uint64(i)))
	}
}

@mmcloughlin
Copy link
Owner

It looks like you've already figured this out. To confirm two 64-bit geohashes have the same prefix you shift out the low bits and compare for equality. You could make this work for lower precision geohashes as well, you'd just take an argument in place of the 64 in your hasPrefix function.

I'm wondering whether this helper belongs in the package itself.

@phrozen
Copy link
Author

phrozen commented Aug 23, 2019

I bet it would be a great addition to the package, string/byte geohash prefix compare is covered by the standard library, but there is no bit level comparison available. If using geohash for anything else other than storing raw locations, the Int geohash needs a way to compare itself to others at lower resolution for proximity search within bounds/region. The problem I find is that AFAIK there is no way of knowing the precision an Int geohash was encoded with in the first place.

On a totally different matter (but i don't want to open another unnecessary issue), how can I know my build is actually using the ASM optimizations for amd64? is there a way to check that it is?

@mmcloughlin
Copy link
Owner

Yes I tend to agree. It's sufficiently non-obvious that a helper function would be nice.

On the assembly question, the only runtime check is whether BMI2 instruction set is available.

geohash/geohash_x86.go

Lines 13 to 24 in e0493a2

// hasBMI2 returns whether the CPU supports Bit Manipulation Instruction Set
// 2.
func hasBMI2() bool {
_, ebx, _, _ := cpuid(7, 0)
return ebx&(1<<8) != 0
}
// init determines whether to use assembly version by performing CPU feature
// check.
func init() {
useAsm = hasBMI2()
}

Are you on Linux? If so you can quickly check with:

cat /proc/cpuinfo | grep bmi2

@phrozen
Copy link
Author

phrozen commented Feb 26, 2021

I just realized I never contributed here for this specific issue and it's nagging me so hard I postponed this for so long! (Being the only open issue only makes it worse). So I'm planning to submit a PR with this specific implementation as we discussed.

// HasPrefix checks if two 64-bit integer geohashes have the same prefix 
// most significant bits.
func HasPrefix(hash1, hash2 uint64, prefix uint8) bool {
	return HasPrefixWithPrecision(hash1, hash2, prefix, 64)
}

// HasPrefixWithPrecision checks if two 64-bit integer geohashes encoded 
// with bits precision have the same prefix most significant bits.
// Always keep prefix < bits. Always true when prefix == bits.
func HasPrefixWithPrecision(hash1, hash2 uint64, prefix, bits uint8) bool {
	bits = bits - prefix
	hash1 = hash1 >> bits
	hash2 = hash2 >> bits
	return (hash1^hash2 == 0)
	// or
	// return hash1>>bits^hash2>>bits == 0
}

Placing them right below EncodeIntWithPrecision. Do you like the long version for readability? Or can we let go the extra assignments and go with the abridged version if it's clear enough? Another question, where should I place the tests geohash_test.go or extensive_test.go?

I'll have the PR ready by the weekend so we can finally close this one! My deepest apologies! Comments are encouraged.

Edit: Also, I was going to edit the README for the docs too, but i see a template so it makes me think it is being autogenerated. Are you using a tool to generated the README documentation based on code?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants