Range Query has answers lesser than expected #16

Open
samadDotDev opened this issue Sep 4, 2020 · 0 comments

An experiment on the standalone branch revealed that the circle range search query returns fewer results than are actually present in the data. The following short ExampleApp code reproduces the problem:

  def main(args: Array[String]): Unit = {

    // Turn off excessive logging from spark
    Logger.getLogger("org").setLevel(Level.OFF)
    Logger.getLogger("akka").setLevel(Level.OFF)

    val spark = SparkSession
      .builder()
      .master("local[*]")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()

    val trajs = spark.sparkContext
      .textFile("src/main/resources/trajectory.txt")
      .zipWithIndex().map(getTrajectory)
      .filter(_.points.length >= DITAConfigConstants.TRAJECTORY_MIN_LENGTH)
      .filter(_.points.length <= DITAConfigConstants.TRAJECTORY_MAX_LENGTH)
    println(s"Trajectory count: ${trajs.count()}")

    val rdd1 = new TrieRDD(trajs)
    val search = TrajectoryRangeAlgorithms.DistributedSearch

    // circle range search
    val center = Point(Array(39.9, 116.3))
    val radius = 0.1
    
    // Perform DITA's (Indexed) range search
    val ditaRangeSearch = search.search(spark.sparkContext, center, rdd1, radius)

    // Perform an exhaustive range search
    val exhaustiveRangeSearch = trajs.filter(t => t.points.forall(p => p.minDist(center) <= radius))

    println(s"Circle range search count: DITA: ${ditaRangeSearch.count()}, Exhaustive: ${exhaustiveRangeSearch.count()}")
  }

It produces the following output on the provided dataset (trajectory.txt):

Trajectory count: 5595
Circle range search count: DITA: 266, Exhaustive: 860

That is, the range search should return 860 results, but it returns only 266. The discrepancy becomes more severe when the radius is small: in some cases DITA's range query returns no results at all, even though many trajectories in the data satisfy the query.
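
To narrow down which trajectories the indexed search drops, the two result sets can be compared by ID. The following is a minimal, self-contained sketch in plain Scala without Spark or DITA. `Pt`, `Traj`, `RangeDiff`, and `missingIds` are hypothetical stand-ins introduced for illustration, not DITA classes, and the sketch assumes that the distance between a point and the center is Euclidean.

```scala
// Hypothetical stand-ins for DITA's Point and Trajectory; these are NOT the library's classes.
final case class Pt(coord: Array[Double]) {
  // Euclidean distance between two points (an assumption about what minDist computes here)
  def dist(other: Pt): Double =
    math.sqrt(coord.zip(other.coord).map { case (a, b) => (a - b) * (a - b) }.sum)
}

final case class Traj(id: Long, points: Array[Pt])

object RangeDiff {
  // Same predicate as the exhaustive filter in the report:
  // every point of the trajectory lies within `radius` of `center`.
  def withinCircle(t: Traj, center: Pt, radius: Double): Boolean =
    t.points.forall(_.dist(center) <= radius)

  // IDs that the exhaustive scan accepts but the indexed search did not return.
  def missingIds(all: Seq[Traj], indexedIds: Set[Long], center: Pt, radius: Double): Seq[Long] =
    all.filter(withinCircle(_, center, radius)).map(_.id).filterNot(indexedIds.contains)

  def main(args: Array[String]): Unit = {
    val center = Pt(Array(39.9, 116.3))
    // Toy data, not taken from trajectory.txt
    val trajs = Seq(
      Traj(0L, Array(Pt(Array(39.91, 116.31)), Pt(Array(39.92, 116.32)))),
      Traj(1L, Array(Pt(Array(39.95, 116.35)))),
      Traj(2L, Array(Pt(Array(40.5, 117.0))))
    )
    // Pretend the indexed search returned only trajectory 0
    val indexedIds = Set(0L)
    println(missingIds(trajs, indexedIds, center, 0.1))
  }
}
```

With the toy data, trajectories 0 and 1 satisfy the predicate, so trajectory 1 is reported as missing. Applied to the real data, the IDs from `ditaRangeSearch` would take the place of `indexedIds`, and inspecting a few of the missing trajectories may show whether they share a partition or a trie branch.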
