Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional aggregation columns shortcut the computation to an empty dataset #239

Open
imarios opened this issue Jan 30, 2018 · 1 comment
Open

Comments

@imarios
Copy link
Contributor

imarios commented Jan 30, 2018

so it seems that if during an aggregation the first aggregate returns null, then it doesn't even care about the next results, it just returns nothing. Going to look if this is a Spark bug and not a Frameless one. Nope, the issue seems to be on our side of the fence

case class X[A,B](a: A, b: B)
val t = TypedDataset.create(X[Option[Int],Long](None, 0)::Nil)
t.show().run()
+----+---+
|   a|  b|
+----+---+
|null|  0|
+----+---+

scala> t.agg(first(t('a)), sum(t('b))).collect().run()
res: Seq[(Option[Int], Long)] = WrappedArray()

scala> t.agg(sum(t('b)), first(t('a))).collect().run()
res: Seq[(Long, Option[Int])] = WrappedArray((0,None))

Test that fails randomly due to this issue (NonAggregateFunctionsTests.scala).

 test("Empty vararg tests") {
    import frameless.functions.aggregate._
    def prop[A : TypedEncoder, B: TypedEncoder](data: Vector[X2[A, B]]) = {
      val ds = TypedDataset.create(data)
      val frameless = ds.select(ds('a), concat(), ds('b), concatWs(":")).collect().run().toVector
      val framelessAggr = ds.agg(first(ds('a)), concat(), concatWs("x"), litAggr(2)).collect().run().toVector
      val scala = data.map(x => (x.a, "", x.b, ""))
      val scalaAggr = if (data.nonEmpty) Vector((data.head.a, "", "", 2)) else Vector.empty
      (frameless ?= scala).&&(framelessAggr ?= scalaAggr)
    }

    check(forAll(prop[Long, Long] _))
    check(forAll(prop[Option[Vector[Boolean]], Long] _))
  }
@imarios imarios added the bug label Jan 30, 2018
@imarios
Copy link
Contributor Author

imarios commented Jan 30, 2018

@OlivierBlanvillain, this was the reason master was failing. The doesn't doesn't always break only when it happens to generate an empty Option.

imarios added a commit to imarios/frameless that referenced this issue Jan 30, 2018
imarios added a commit to imarios/frameless that referenced this issue Jul 4, 2018
@imarios imarios mentioned this issue Jul 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant