Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] introduce new scalding-quotation module #1754

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

fwbrasil
Copy link
Contributor

@fwbrasil fwbrasil commented Dec 5, 2017

Problem

Scalding could leverage more information on the user's code to apply optimizations and provide a better description of the jobs.

Solution

Introduce the new module scalding-quotation that provides a mechanism to quote the method parameters through an implicit parameter. Example:

def flatMap[U](f: T => TraversableOnce[U])(implicit q: Quoted): TypedPipe[U]

The quoted implicit is materialized by a macro that has access to the parameters used to invoke the method. This first version has three features:

  1. It provides the class file and line number information. It has the complete path to the source file, so we could use it to render links to the source code.
  2. It extracts the text of the method call, so users have more information on a transformation without having to open the source file.
  3. It detects projections and, if the source has projection support, it can apply them automatically.

Quoted functions

The macro has access only to the code under compilation. For instance, given this code:

def phone(p: Contact) = p.phone
personTypedPipe.map(p => phone(p.contact))

The macro doesn't know the tree of phone, so it must assume that all fields are used and thus the projection is p.contact instead of p.contact.phone. To address this limitation, the user can quote functions:

val phone = Quoted.function {
  (p: Contact) = p.phone
}
personTypedPipe.map(p => phone(p.contact))

Even though the macro doesn't have the projections of phone statically, it falls back to a runtime tree that checks if the function is quoted and determines the proper projection (p.contact.phone in this case).

Quoted propagation

The quotation must be done by the methods that are called by the user code. Any internal transformation done by scalding should use the initial quoted from the user call. For instance, this TypedPipe method:

def distinct(implicit ord: Ordering[_ >: T], m: Quoted): TypedPipe[T] =
    asKeys(ord.asInstanceOf[Ordering[T]], m).sum.keys

calls other quoted methods but it doesn't materialize new Quoted instances for them, reusing the one coming from the user call.

As a safety measure, the macro rejects scalding sources that try to materialize a Quoted since that's something that shouldn't happen in scalding internally. The compilation fails with the following error if a quotation is missing and it's a scalding source file:

The quotation must happen at the level of the user-facing API. Add an implicit q: Quoted to the enclosing method. If that's not possible and the transformation doesn't introduce projections, use Quoted.internal.

As the error message informs, it's possible to use Quoted.internal as a workaround for internal transformations that don't produce projections:

def internalSum(t: TypedPipe[Int]) = {
    implicit val q: Quoted = Quoted.internal
    t.sum
}

The macro uses a simple whitelist to allow Quoted materialization for source file paths that have specific substrings ("test", "example", "tutorial" for now). It's a weak rule, but it seems good enough.

Enabling the automatic projection

The projection is done through an optimization phase that is disabled by default and can be enabled via the scalding.experimental.automatic_projection_pushdown flag. I added the experimental so we can promote the feature later if it's stable.

Note that the automatic projection happens at an earlier phase than the manual projections configuration, so it won't have an effect if users are already using the manual projection pushdown.

New typed pipe description

The previous mechanism that provided descriptions based on stack traces will now output the information from Quoted instead. Example description:

SomeSourceFile:38 map(p => p.name)

Interfaces like DrScalding that render the descriptions will need rework. The description could be long since the user's code can have an arbitrary length.

Backwards compatibility

All user-facing methods need to take the extra implicit Quoted. This makes the changeset binary-incompatible, but it's mostly source-compatible. It'll not compile only if the user has a method call that specifies implicits explicitly. Example:

// TypedPipe method
def distinct(implicit ord: Ordering[_ >: T], m: Quoted): TypedPipe[T]

// will compile
aTypedPipe.distinct

// won't compile anymore
aTypedPipe.distinct(someOrdering)

// user will need to specify the quoted parameter
aTypedPipe.distinct(someOrdering, implicitly[Quoted]) 

Notes

  • As requested by the team internally, this pull request introduces all changes at once. I'd be happy to create smaller pull requests if you like.
  • I've added the eclipse plugin to the sbt configuration and added the eclipse resources to .gitignore. It's just a convenience for me; I can remove if you prefer.
  • The macro uses internal compiler APIs. It works on 2.11, 2.12, and should work for versions before Scala 3. The Scala 3 migration path for macros is still unclear.
  • I wasn't really sold on Dagon, but transitiveDependentsOf made me realize how convenient it is :)

@dieu dieu requested review from sritchie and johnynek and removed request for sritchie December 6, 2017 00:56
Copy link
Collaborator

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exciting! Here are some small comments. Still reading.

I personally would like say 3 smaller PRs if possible, maybe one adding scalding-quotation, then 2 more using? I don't know. I will take a longer look later.

addSbtPlugin("org.scoverage" % "sbt-scoverage" % "1.5.0")
addSbtPlugin("org.xerial.sbt" % "sbt-sonatype" % "1.0")
addSbtPlugin("org.wartremover" % "sbt-wartremover" % "2.1.1")
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.2.3")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to add this, or can you put it in your personal ~/.sbt directory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's convenient for me because I work on projects that require different versions of the plugin so it can't be on ~/.sbt. It's fine, I'll revert this change.

@@ -36,7 +37,7 @@ sealed trait CoGroupable[K, +R] extends HasReducers with HasDescription with jav
/**
* This is the list of mapped pipes, just before the (reducing) joinFunction is applied
*/
def inputs: List[TypedPipe[(K, Any)]]
def inputs(implicit q: Quoted): List[TypedPipe[(K, Any)]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment why we need the q here? This is not a method users should really ever be using (only people writing backends).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had added it because FilterKeys.input creates a TypedPipe. I reverted these methods and used Quoted.internal.

@@ -48,7 +49,7 @@ sealed trait CoGroupable[K, +R] extends HasReducers with HasDescription with jav
* how to achieve, and since it is an internal function, not clear it
* would actually help anyone for it to be type-safe
*/
private[scalding] def joinFunction: (K, Iterator[Any], Seq[Iterable[Any]]) => Iterator[R]
private[scalding] def joinFunction(implicit q: Quoted): (K, Iterator[Any], Seq[Iterable[Any]]) => Iterator[R]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question. Can you add a comment to why we need q here?

@fwbrasil
Copy link
Contributor Author

fwbrasil commented Dec 7, 2017

@johnynek I think it could be three PRs:

  1. add the new scalding-quotation module
  2. add the Quoted implicits
  3. introduce automatic projection (optimization rule and TypedSource changes)

@ttim @dieu @benpence wdyt?

@isnotinvain
Copy link
Contributor

IDK if it makes sense to copy, but scalatest calls something similar a Position and may of it's methods take an implicit Position -- seems the projection use case doesn't really fit in something called Position -- I just worry that users are going to have no idea what a Quoted is and if we can come up with something more clear. Is there any value in separating positional info from introspection info (eg field accesses)? if it was called MethodIntrospection it might be more clear what it's for, not a very nice name though.

@ttim
Copy link
Collaborator

ttim commented Dec 8, 2017

I have CallSiteInformation in mind, but I also think it's not the best name.

@ttim
Copy link
Collaborator

ttim commented Dec 8, 2017

@fwbrasil I think it's a good idea to split it in a way you say.

@fwbrasil
Copy link
Contributor Author

fwbrasil commented Dec 8, 2017

@isnotinvain @ttim Quotation is a principled concept like Monoid, Monad, etc. I think it's just a question of having documentation

@fwbrasil fwbrasil changed the title introduce new scalding-quotation module [WIP] introduce new scalding-quotation module Dec 11, 2017
@fwbrasil
Copy link
Contributor Author

@johnynek @ttim @dieu @isnotinvain I've just created the first PR: #1755


private val log = LoggerFactory.getLogger(this.getClass)

def projectionMeta[T: ClassTag](superClass: Class[_]) = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we put the return type on this public method?

I'm confused where Quotations are coming into the picture here. Can you comment?

@@ -657,6 +661,24 @@ object OptimizationRules {
}
}

object ApplyProjectionPushdown extends PartialRule[TypedPipe] {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow, this is pretty nice...

Okay...

@CLAassistant
Copy link

CLAassistant commented Nov 16, 2019

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants