Support java.sql.Date and java.sql.Timestamp so they work just as in plain Spark datasets. #205

Open
ireactsam opened this issue Nov 10, 2017 · 4 comments

ireactsam commented Nov 10, 2017

Given the following snippet:

    import frameless._
    import org.apache.spark.sql.catalyst.util.DateTimeUtils

    implicit val dateAsInt: Injection[java.sql.Date, Int] = Injection(DateTimeUtils.fromJavaDate, DateTimeUtils.toJavaDate)

    // create some df (typically read from an orc or parquet file)
    val today = new java.sql.Date(System.currentTimeMillis)
    val df = Seq((42, today)).toDF("i", "d")

    // and turn it into a TypedDataset
    case class P(i: Int, d: java.sql.Date)
    val ds = df.as[P]
    val tds = TypedDataset.create(ds)

with a plain Dataset you can write:

ds.filter(ds("d") === today).show 

+---+----------+
|  i|         d|
+---+----------+
| 42|2017-11-10|
+---+----------+

but with a TypedDataset the same filter throws an AnalysisException:

tds.filter(tds('d) === today).show().run
org.apache.spark.sql.AnalysisException: cannot resolve '(`d` = FramelessLit(2017-11-10))' due to data type mismatch: differing types in '(`d` = FramelessLit(2017-11-10))' (date and int).;;
'Filter (d#82 = FramelessLit(2017-11-10))
+- Project [_1#78 AS i#81, _2#79 AS d#82]
   +- LocalRelation [_1#78, _2#79]
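
Until literal encoding handles the injection, one possible workaround (a sketch of assumed API usage, not something from the thread) is to keep the column in its injected Int form and compare against the injected literal, so no java.sql.Date literal ever reaches Catalyst:

    // Workaround sketch: store the date as its injected Int representation
    // and compare Ints directly. Assumes a SparkSession and its implicits
    // are in scope, as in the snippet above.
    import frameless._
    import org.apache.spark.sql.catalyst.util.DateTimeUtils

    case class P2(i: Int, d: Int) // d holds DateTimeUtils.fromJavaDate(date)

    val today = new java.sql.Date(System.currentTimeMillis)
    val todayAsInt = DateTimeUtils.fromJavaDate(today)
    val tds2 = TypedDataset.create(Seq(P2(42, todayAsInt)))
    tds2.filter(tds2('d) === todayAsInt).show().run()

This trades type safety at the boundary (the case class exposes an Int, not a Date) for a comparison that Catalyst can resolve.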
imarios added the bug label on Nov 11, 2017
ireactsam (Author) commented:
I managed to define a TypedEncoder[java.sql.Date] as:

  import org.apache.spark.sql.catalyst.ScalaReflection
  import org.apache.spark.sql.catalyst.expressions.Expression
  import org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke
  import org.apache.spark.sql.types.{DataType, DateType}

  implicit val dateEncoder: TypedEncoder[java.sql.Date] = new TypedEncoder[java.sql.Date] {
    import org.apache.spark.sql.catalyst.util.DateTimeUtils
    override def nullable: Boolean = false
    override def jvmRepr: DataType = ScalaReflection.dataTypeFor[java.sql.Date]
    override def catalystRepr: DataType = DateType
    override def toCatalyst(path: Expression): Expression =
      StaticInvoke(DateTimeUtils.getClass, catalystRepr, "fromJavaDate", path :: Nil, propagateNull = true)
    override def fromCatalyst(path: Expression): Expression =
      StaticInvoke(DateTimeUtils.getClass, jvmRepr, "toJavaDate", path :: Nil, propagateNull = true)
  }

together with adding to CatalystOrdered:

implicit val framelessDateOrdered: CatalystOrdered[java.sql.Date] = of[java.sql.Date]

I can now do:

tds.filter(tds('d) === today).show().run
tds.filter(tds('d) > today).show().run

I should look into doing a PR...

imarios (Contributor) commented Feb 1, 2018

@ireactsam Interestingly, this only fails in the REPL. If you run it outside the REPL (say, in a unit test), everything works.

imarios added a commit to imarios/frameless that referenced this issue Feb 1, 2018
OlivierBlanvillain added a commit that referenced this issue Feb 1, 2018
Showing that #205 is not a bug in the current code
tejasmanohar commented:
I'm new to frameless (and shapeless, and so on). Why do we need to write out the Injection in general? Spark has a built-in implicit date encoder (org.apache.spark.sql.Encoders.DATE).
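
For reference, the built-in encoder the comment mentions is what plain Datasets rely on; a minimal sketch of how it surfaces in vanilla Spark (assumed usage, outside frameless):

    import org.apache.spark.sql.{Encoder, Encoders}

    // Spark ships an encoder for java.sql.Date out of the box:
    val dateEnc: Encoder[java.sql.Date] = Encoders.DATE
    // With spark.implicits._ in scope, case classes containing java.sql.Date
    // derive encoders automatically, which is why df.as[P] works earlier
    // in this thread without any Injection.

frameless uses its own TypedEncoder derivation instead of Spark's, which is why a type Spark handles natively can still need an Injection (or a custom TypedEncoder) on the frameless side.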

OlivierBlanvillain (Contributor) commented Jun 7, 2018

@tejasmanohar Did you see this page? One typical use case is an application built entirely around joda-time, in which case you naturally also want to manipulate joda-time values in your Spark tables.
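
As an illustration of that use case, a hypothetical Injection for joda-time (the type and the epoch-millis representation are my assumptions, not from the thread):

    // Sketch: inject org.joda.time.DateTime as epoch millis (Long),
    // assuming the application standardizes on joda-time.
    import frameless._
    import org.joda.time.DateTime

    implicit val dateTimeInjection: Injection[DateTime, Long] =
      Injection(_.getMillis, new DateTime(_))

    case class Event(id: Long, at: DateTime)
    // With the injection in scope, TypedDataset.create(Seq(Event(...)))
    // derives an encoder, storing `at` as a LongType column.

Note that the injected form (here Long) must round-trip losslessly, or equality and ordering on the column will not match equality on the original type.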
