Cannot get properties of Elements on remote graph with scalaGraph #223

Open
TitiHl opened this issue Nov 27, 2017 · 37 comments

Comments

@TitiHl

TitiHl commented Nov 27, 2017

Hi,

Thanks for building this nice wrapper for Scala :D. I am currently using it on a remote JanusGraph, created by calling:
val scalaGraph: ScalaGraph = EmptyGraph.instance().asScala().configure(_.withRemote(DriverRemoteConnection.using(cluster, "g")))

but I found that I lose some of the syntax benefits of gremlin-scala. For example, if I want to add an edge between v1 and v2, I can no longer call:
val edge = v1 --- ("reference", metadata -> "EdgeTest", deleted -> false) --> v2
Exceptions below:

Edge additions not supported
java.lang.IllegalStateException: Edge additions not supported
	at org.apache.tinkerpop.gremlin.structure.Vertex$Exceptions.edgeAdditionsNotSupported(Vertex.java:175)
	at org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceVertex.addEdge(ReferenceVertex.java:47)
	at gremlin.scala.ScalaVertex.addEdge(ScalaVertex.scala:65)
	at gremlin.scala.SemiEdge.$minus$minus$greater(SemiEdge.scala:4)

I believe this is caused by using EmptyGraph as the underlying graph.
Referring to this example:
https://github.com/mpollmeier/gremlin-scala-examples/blob/master/dse-graph/src/test/scala/SimpleSpec.scala
I instead have to call
val a = StepLabel[Vertex]()
val b = StepLabel[Vertex]()
scalaGraph.V(v1.id).as(a).V(v2.id).as(b)
  .addE(REFERENCE).from(a).to(b)
  .property(metadata, "EdgeTest")
  .property(deleted, false)
  .iterate()

This is one example where I cannot use the nice syntax provided by gremlin-scala when working on a remote graph. Since I am still operating on a ScalaGraph, I am wondering whether I missed something, or whether there is a better way to add vertices/edges to a remote graph.

Thanks for your help in advance.
Alex

@TitiHl
Author

TitiHl commented Nov 28, 2017

I also found that when using valueMap with a remote graph, I have to call valueMap(true) on the GraphTraversalSource to get the properties: https://stackoverflow.com/questions/45764199/janusgraph-cluster-always-returns-vertex-without-properties-referencevertex
g.V().valueMap(true).toList()
But with a scalaGraph created by:
val scalaGraph: ScalaGraph = EmptyGraph.instance().asScala().configure(_.withRemote(DriverRemoteConnection.using(cluster, "g")))
there is no way to pass valueMap(true). What is the best way to get the properties of Elements using ScalaGraph in this setup?

@TitiHl TitiHl changed the title from "Question: better way to manipulate on remote graph with scalaGraph?" to "Cannot get properties of Elements on remote graph with scalaGraph" Nov 28, 2017
mpollmeier added a commit to mpollmeier/gremlin-scala-examples that referenced this issue Nov 28, 2017
@mpollmeier
Owner

As you mentioned already, EmptyGraph is the problem. Simply use org.janusgraph.core.JanusGraph and everything should be fine.

Did you see https://github.com/mpollmeier/gremlin-scala-examples/ ? It contains a JanusGraph example repo and I just added a line to prove that you can add edges. valueMap works as well, e.g. if you add println(scalaGraph.V.valueMap.toList).

@TitiHl
Author

TitiHl commented Nov 28, 2017

Hi,
Thanks for your reply. I use EmptyGraph to initialise a remote graph. I assumed I have to use EmptyGraph to connect to a remote server, since all the docs do it this way, unless I missed a way to construct the JanusGraph equivalent of an EmptyGraph.

I basically want to achieve exactly the same as the DSE example, but with JanusGraph:
https://github.com/mpollmeier/gremlin-scala-examples/blob/master/dse-graph/src/test/scala/SimpleSpec.scala
Somehow the DSE example returns a DetachedVertex, while JanusGraph returns a ReferenceVertex, on which I cannot add edges.

Thanks,
Alex

@mpollmeier
Owner

Ok, I understand the issue now. I don't have much capacity to fiddle with this myself, but just had a brief look at Janus' documentation. Have you tried to pass the remote url etc. in the config that you pass to JanusGraphFactory, rather than using EmptyGraph.withRemote? E.g.

storage.backend=cassandra
storage.hostname=localhost

http://docs.janusgraph.org/latest/configuration.html

@TitiHl
Author

TitiHl commented Nov 30, 2017

Hi,

The cluster points to the conf file for the remote JanusGraph server, and the JanusGraph server holds all the storage backend settings etc.
But you are right, maybe I can specify the JanusGraph backend explicitly so that I get something more than an EmptyGraph. I will try this out.

Thanks for your help here.
Cheers,
Alex

@rwilcox

rwilcox commented Feb 1, 2018

To follow up on this old issue, I believe I have similar problems. Well, similar in that I found I cannot use the fancy operators with remote connections (if I do, the data is not saved).

I have forked gremlin-scala-examples to show this: I'm using JanusGraph Server as my server here, but whatever it's mostly just Gremlin Server underneath: rwilcox/gremlin-scala-examples JanusGraph example for JanusGraph Server

There are three examples here:

  1. Java copied from Janusgraph's official remote example

  2. an operator ( / structure) based API example with a remote JanusGraph, copied from gremlin-scala janusgraph example

  3. a traversal API based example, where I (poorly!) try to convert the edge / vertices operator API -> traversal API.

(@TitiHl these all use JanusGraphFactory.open("inmemory") instead of EmptyGraph(), which I found too limiting ie not supporting transactions etc etc)

TL; DR:

  • connect to JanusGraph Server: val graph : ScalaGraph = JanusGraphFactory.open("inmemory").configure( _.withRemote( conf) ) line
  • graph + ( "Saturn", Key[String]("name") -> "Saturn" ) line
  • JanusGraph Server data store is unchanged

BUT, I used the addV traversal methods, like so:

  • connect to JanusGraph Server: val graph : ScalaGraph = JanusGraphFactory.open("inmemory").configure( _.withRemote( conf) )
  • graph.addV().property( Key[String]("name"), "Saturn" ).iterate() line
  • JanusGraph Server data store is changed

By going through issue history, I found #118, which has the following comment link - which is one of the reasons why graph.addV() exists at all!

ScalaGraph does have addV (I just called it addVertex). Also note that we have a nicer syntax to add vertices/edges, you might want to use that instead (it's documented on the front page (readme))

I think the difference is that addVertex on a Graph instance does not create a traversal, it operates directly on the Graph where addV operates on the traversal source and creates a traversal.

But methods like + and --- operate on ScalaGraph objects (calling addVertex), not the underlying traversal source object.

The reason why operating on a graph vs operating on a traversal is important is that the (best? only?) way to connect to Gremlin ... err JanusGraph... Server seems to be via TraversalSource's withRemote method.

@mpollmeier does this logic sound right to you? (I'm a relative newbie to this project and graph / tinkerpop in general)

@TitiHl : I have not tested the original bug with this configuration (JanusGraphFactory.open("inmemory")) vs the other , but that may solve your problem ???

In general, it would be great to somehow have the + or --- operators also work on gremlin.scala.TraversalSource objects, instead of just ScalaGraph objects. (Is there a way to force this??)
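
The structure-versus-traversal distinction described in this comment can be sketched with the plain TinkerPop Java API called from Scala. This is a minimal sketch, not gremlin-scala's implementation. It assumes TinkerPop 3.3.x (gremlin-driver) on the classpath and, for the second method only, a Gremlin Server reachable at the given host and port that exposes a traversal source named "g". The object name, method names, label and property values are made up for illustration.

```scala
import org.apache.tinkerpop.gremlin.driver.Cluster
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection
import org.apache.tinkerpop.gremlin.structure.Vertex
import org.apache.tinkerpop.gremlin.structure.util.empty.EmptyGraph
import scala.util.Try

// Hypothetical helper object, for illustration only.
object StructureVsTraversal {

  // Structure API: addVertex is called on the local Graph object.
  // EmptyGraph is only a shell, so the call fails locally and nothing
  // is ever sent to a server.
  def addViaStructureApi(): Try[Vertex] =
    Try(EmptyGraph.instance().addVertex("planet"))

  // Traversal API: the remote connection lives in the traversal source,
  // so addV is shipped to the server when the traversal is iterated.
  // Assumes a server at host:port with a traversal source called "g".
  def addViaTraversalApi(host: String, port: Int): Long = {
    val cluster = Cluster.build().addContactPoint(host).port(port).create()
    try {
      val g = EmptyGraph.instance().traversal()
        .withRemote(DriverRemoteConnection.using(cluster, "g"))
      g.addV("planet").property("name", "Saturn").iterate()
      // Count the matching vertices on the server side.
      g.V().has("planet", "name", "Saturn").count().next()
    } finally cluster.close()
  }
}
```

Calling StructureVsTraversal.addViaStructureApi() returns a Failure without any network traffic, which matches the "data is not saved" symptom when the operators end up on the graph object rather than on the traversal source.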

@mpollmeier
Owner

Interesting - I'll run this tomorrow and see if I can find a workaround.
To make sure we're on the same page: how exactly did you start janusgraph? I'm just downloading janusgraph-0.2.0-hadoop2.zip from https://github.com/JanusGraph/janusgraph/releases/.

The docs suggest to run gremlin.sh and then graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-es.properties') - is this what you did?

@rwilcox

rwilcox commented Feb 3, 2018

Awesome, thanks! Take a look at the JanusGraph Server Getting Started, but TL;DR: use bin/janusgraph.sh start <-- should work out for you

mpollmeier added a commit that referenced this issue Feb 4, 2018
when using remote graphs, Graph is actually just an empty shell and
can't be used to e.g. add elements.

open problems:
* marshalled classes can use the @id annotation to specify
the ID of a vertex, but we cannot set the id of a vertex inside a traversal
* same needs to be applied for edges

re #223
@mpollmeier
Owner

Ok so it turns out that we shouldn't use Graph to add elements, and instead always use the Traversal. This doesn't impact local graphs (one edge case though: the user cannot provide the element id), and is the only way to handle remote graphs, as you had to figure out yourself painfully.

I've made a start to change everything to use a traversal (only for vertices so far) in f740788 - let me know your thoughts.

So that I can actually test this, maybe you can help me with the following: when I run your test cases, I get the following error:

- janusgraph server ported Java (from janusgraph-server example) *** FAILED ***
java.util.concurrent.CompletionException: io.netty.handler.codec.DecoderException: 
org.apache.tinkerpop.gremlin.driver.ser.SerializationException: 
org.apache.tinkerpop.shaded.kryo.KryoException: Buffer underflow.

Any ideas what's wrong? Some missing configuration?

Thanks for bringing this back up and providing a nice project to reproduce, @rwilcox

Other random thoughts:

  • you're building up a ClusteredClient but don't actually use it...
  • IMO using janusgraphfactory.open(inmemory) is misleading, it gives you the (false) sense that you can actually use that graph instance. Use EmptyGraph instead

@rwilcox

rwilcox commented Feb 4, 2018

Woh, awesome! I'll take a look at the changes, probably tomorrow.

(And that buffer underflow error sounds familiar too - I can't place it but I'll check it out at work tomorrow... maybe there it will come to me).

I have answers to your random thoughts now:

you're building up a ClusteredClient but don't actually use it

Yes. From reading sample code, docs, and code provided to me by others on my Current Graph Database Project, I believe the ClusteredClient etc provides JanusGraph specific management features, i.e. the ability to create indexes to speed up searching, schemas, etc. But I only learned this in the last day or so. (And I don't actually do those things in the sample code, yes)

... IMO using janusgraphfactory.open(inmemory) is misleading, it gives you the (false) sense that you can actually use that graph instance. Use EmptyGraph instead

Maybe. What I believe / assume is happening is that creating a traversal off an EmptyGraph will give you only features available in generic Gremlin Server, but basing the traversal off a JanusGraph gives you JanusGraph features.

I'm super interested in what the local graph instance is used for in remote traversal situations: is it just a bootstrap mechanism or is it used somehow ie does it hold a subgraph in memory for cache reasons????? I may go ask the JanusGraph people, as my lead engineer had similar questions (ie if it is used for something like caching, that may have memory implications for mid to large graphs).

@mpollmeier
Owner

It would certainly be a good idea to use the graph instance for some local caching, but I don't think it's doing that, instead it just seems to be a bootstrap for the traversal..

@mpollmeier
Owner

@rwilcox any news re the DecoderException? Can you reproduce it?

@rwilcox

rwilcox commented Feb 5, 2018

any news re the DecoderException? Can you reproduce it?

No, and my browser history and notes didn't help either :(

@mpollmeier
Owner

'No' as in, if you run the test locally it works, and you don't get that exception?
If so, what exactly did you do? I downloaded the 0.2.0-hadoop2 release, unpacked and ran bin/janusgraph.sh -v start, and then ran the test.

@rwilcox

rwilcox commented Feb 5, 2018

'No' as in, if you run the test locally it works, and you don't get that exception?
If so, what exactly did you do? I downloaded the 0.2.0-hadoop2 release, unpacked and ran bin/janusgraph.sh -v start

Correct - on OS X 10.12 with JAVA_HOME set to a 1.8 JVM, I ran bin/janusgraph.sh -v start then ran my tests one by one in IntelliJ. No error. (Are you using JVM 1.7 or 1.9 maybe???????)

@mpollmeier
Owner

How about if you run it in sbt?

I'm on linux with java 1.8

java -version
openjdk version "1.8.0_144"
OpenJDK Runtime Environment (build 1.8.0_144-b01)
OpenJDK 64-Bit Server VM (build 25.144-b01, mixed mode)

I just freshly unpacked janusgraph and ran sbt test. Output on janusgraph console:

27043 [gremlin-server-worker-1] ERROR org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor  - Could not deserialize the Traversal instance

and sbt

*** 3 TESTS FAILED ***

@rwilcox

rwilcox commented Feb 6, 2018

Ok, super weird: sbt test gives me the error too. Was not expecting that (given my success with IntelliJ)

@rwilcox

rwilcox commented Feb 6, 2018

Ok, super interesting. In my IntelliJ test configuration there's a checkbox to "use SBT". It was off. When I checked it to be on I got the same error in IntelliJ.

I guess I can see the Scala IntelliJ plugin somehow wanting to bypass sbt for Reasons by default

@mpollmeier
Owner

That's good news, we're getting the same results :)
Let me know when you get to the bottom of the error, maybe the working setup with intellij can help? Maybe there's a difference in the classpath?

@alicefuzier

alicefuzier commented Feb 17, 2018

Hey, I'm also interested in this. I got a similar SerializationException running some slightly different code. After some digging I solved it by explicitly specifying the serializer when creating the cluster like so:

private def buildCluster() = {
    val serializer = new GryoMessageSerializerV1d0(GryoMapper.build().addRegistry(JanusGraphIoRegistry.getInstance()))
    val cluster =
      Cluster.build().addContactPoint("localhost").port(45679).serializer(serializer).create()
    cluster
  }

Hope it helps!

@apatzer

apatzer commented Mar 2, 2018

For what it's worth, I'm running into the same issues trying to connect to a new Amazon Neptune GraphDB Cluster.

val builder: Cluster.Builder = Cluster.build()
  builder.addContactPoint("my-endpoint.amazonaws.com")
  builder.port(8182)
val cluster: Cluster = builder.create()
val graph = EmptyGraph.instance().asScala().configure(_.withRemote(DriverRemoteConnection.using(cluster)))

Gives the same errors:
(Empty)Graph does not support adding vertices

@mpollmeier
Owner

@alicefuzier thanks for sharing, but that didn't fix the exception I'm getting:

io.netty.handler.codec.DecoderException: org.apache.tinkerpop.gremlin.driver.ser.SerializationException: org.apache.tinkerpop.shaded.kryo.KryoException: Buffer underflow.

I don't know much about Janus and its serialisation unfortunately.

@apatzer that's the error you get when you add a vertex with graph.addV, or graph + someCaseClass. Until this is resolved, the workaround is to add your vertex in a traversal, i.e. using the addV step in GremlinScala. Note: case classes aren't yet supported for that.

@mpollmeier
Owner

Ok I just figured out how to connect to janusgraph. Use a different serialiser.

hosts: [localhost]
port: 8182
serializer: {
    className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0,
    config: {
        ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry]
    }
}

I'll continue with the changes to allow adding vertices to a remotegraph shortly.

mpollmeier added a commit that referenced this issue Mar 24, 2018
when using remote graphs, Graph is actually just an empty shell and
can't be used to e.g. add elements.

open problems:
* marshalled classes can use the @id annotation to specify
the ID of a vertex, but we cannot set the id of a vertex inside a traversal
* same needs to be applied for edges

re #223
mpollmeier added a commit that referenced this issue Mar 25, 2018
when using remote graphs, Graph is actually just an empty shell and
can't be used to e.g. add elements.

open problems:
* marshalled classes can use the @id annotation to specify
the ID of a vertex, but we cannot set the id of a vertex inside a traversal
* same needs to be applied for edges

re #223
@mpollmeier
Owner

I just found some time to dig deeper into this. The underlying problem is that the configuration for remote is not stored in the graph instance, but in the TraversalSource. Because of that, one cannot simply call e.g. vertex.addEdge any more, because that doesn't know about the TraversalSource, and therefore the remote graph.
Since IMO the graph instance should hold that information (ScalaGraph does by holding onto the TraversalSource), I decided to add that as an implicit for the arrow DSL. I.e. from now on you need to have an implicit ScalaGraph in scope, then the arrow DSL works fine with remote and local graphs.

I just released gremlin-scala 3.3.1.2 and provided a working example for gremlin-server. I'm still fighting with janusgraph (the basic setup is here), and assume I need to release a new version for 3.3.0, since Janusgraph hasn't released anything for 3.3.1 yet.
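
A minimal sketch of what the implicit ScalaGraph requirement looks like in user code, assuming gremlin-scala 3.3.1.2 or later and the tinkergraph-gremlin dependency on the classpath. It uses a local in-memory TinkerGraph so that it runs without a server; for a remote graph the implicit value would instead be the EmptyGraph configured via withRemote, as in the other snippets in this thread. The object name, the "document" label and the key definitions are placeholders, not taken from the released example.

```scala
import gremlin.scala._
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph

// Hypothetical object, for illustration only.
object ArrowDslSketch {
  // Typed property keys, mirroring the names used in the original report.
  val Metadata = Key[String]("metadata")
  val Deleted  = Key[Boolean]("deleted")

  // Creates two vertices and one edge, then returns the number of edges.
  def run(): Int = {
    // The arrow DSL looks up this implicit to find the traversal source.
    implicit val graph: ScalaGraph = TinkerGraph.open.asScala

    val v1 = graph + "document"
    val v2 = graph + "document"

    // Without an implicit ScalaGraph in scope this line should not compile.
    v1 --- ("reference", Metadata -> "EdgeTest", Deleted -> false) --> v2

    graph.E().toList.size
  }
}
```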

@hudsonmd

hudsonmd commented Mar 29, 2018

It looks like I'm running into a similar issue even with 3.3.1.2 when interacting with a Neptune graph.

org.apache.tinkerpop.shaded.kryo.KryoException: Buffer underflow.
val cluster = Cluster.build()
      .addContactPoint(url)
      .port(port)
      .create()
implicit val g = EmptyGraph.instance().asScala
                      .configure(_.withRemote(DriverRemoteConnection.using(cluster, "g")))

object Name extends Key[String]("name")
// this succeeds
g.addV("Node").property(Name, "N/A").valueMap.head()
try {
   //this triggers the error
    g.addV("Node").property(Name, "N/A").head()
} catch {
   // error also occurs for the below expression
    case _: KryoException => g.addVertex("Node", Name.name -> "N/A")
}

Have you found any solutions other than changing the serializer? Neptune does not have such an IORegistry published as far as I can tell

@hudsonmd

hudsonmd commented Mar 29, 2018

Turns out it was user error.. You can modify my above example to add this unmodified serializer and it will function properly

 val cluster = Cluster.build()
      .addContactPoint(url)
      .port(port)
      .serializer(new GraphSONMessageSerializerV3d0())
      .create()

Thanks for the great work bringing gremlin to scala!

@mpollmeier
Owner

Quick update re JanusGraph: since its last release (0.2.0) is still based on tinkerpop 3.2.x, I can't backport the new model for handling this in remote graphs, because it relies on GraphTraversal.from(Vertex), which was only introduced in 3.3.x. I'll only provide a working JanusGraph example when they release a new version.

@jeremysears
Collaborator

A quick update... JanusGraph 0.3.0 was released on July 31, 2018. It now supports tinkerpop 3.3.3.

@mpollmeier
Owner

Finally got around to setting up a remote janusgraph example: https://github.com/mpollmeier/gremlin-scala-examples/blob/fcc048e/janusgraph/src/test/scala/SimpleSpec.scala#L55

I found debugging the serialisers non-straightforward, but here's a setup that works:

val serializer = new GryoMessageSerializerV3d0(GryoMapper.build.addRegistry(JanusGraphIoRegistry.getInstance))
val cluster = Cluster.build.addContactPoint("localhost").port(8182).serializer(serializer).create
implicit val graph = EmptyGraph.instance.asScala.configure(_.withRemote(DriverRemoteConnection.using(cluster)))

@voroninp

voroninp commented Sep 15, 2018

@mpollmeier Actually, this problem is still relevant for Amazon Neptune. At least I have no idea how to initialize the connection in a way that works.

@mpollmeier mpollmeier reopened this Sep 15, 2018
@mpollmeier
Owner

does anyone have a working setup for gremlin-groovy or gremlin-java? this isn't really a gremlin-scala specific issue..

@nkconnor

nkconnor commented Nov 1, 2018

I can't get the gremlin-server example to work. It adds vertices / edges fine as in the provided example.. but fails to retrieve any property values

@Joe29

Joe29 commented Nov 1, 2018

@nkconnor @mpollmeier For neptune I'm using the following setup. Still new to gremlin, so not sure how much of this is specific to issues with Neptune. I sort of walked it back from using idiomatic (gremlin-)scala in a lot of areas, but am still able to use the g + CC functionality.

Like @hudsonmd said, adding the following serializer is important:

val cluster = Cluster.build()
    .addContactPoint("localhost") // with ssh tunnel to Neptune
    .port(8182)
    .serializer(new GryoMessageSerializerV3d0())
    .create()

  implicit val g = EmptyGraph.instance.asScala().configure(_.withRemote(DriverRemoteConnection.using(cluster)))

And then for retrieving an item map with properties:

val userMap = g.V().has[Int](UserLabel, userIdKey, id)
        .valueMap()
        .head()

Calling valueMap before head is important if you want properties. Also, head executes the traversal and returns its first result, so I don't think something like .head().out() works.

Updates for properties in neptune seem to only work if you first delete the property. I was running into this gnarly bug where if you try to increase the property value of a Double it works, but if you try to decrease it it does not.

g.V().has(UserLabel, userIdKey, uId).properties(userPropertyKey.name).drop().iterate()
g.V().has(UserLabel, userIdKey, uId).property(userPropertyKey, myPropertyValue).iterate()

@nkconnor

nkconnor commented Nov 1, 2018

@Joe29 are you able to use .toCC[User] prior to head? Or do you work solely with the value maps?

// .toCC errors
java.lang.IllegalArgumentException: Class is not registered: gremlin.scala.GremlinScala$$Lambda$48836/1166529519
Note: To register this class use: kryo.register(gremlin.scala.GremlinScala$$Lambda$48836/1166529519.class);
	at org.apache.tinkerpop.shaded.kryo.Kryo.getRegistration(Kryo.java:484)
	at org.apache.tinkerpop.shaded.kryo.Kryo.getSerializer(Kryo.java:502)

@Joe29

Joe29 commented Nov 1, 2018

@nkconnor It's been a couple of weeks since I touched the code, but I do recall having problems with .toCC. Yeah it looks like I've got some dirty code to work around that with the value maps.

@Joe29

Joe29 commented Nov 1, 2018

@nkconnor If you want to get up and running feel free to use this, though it's not pretty (two json libs??). I'm also making some assumptions here about the type of data/case class. Let me know if you find a cleaner solution.

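import scala.collection.JavaConverters._
import spray.json._
import spray.json.DefaultJsonProtocol._
import org.json4s.DefaultFormats
import org.json4s.native.Json

  // valueMap returns each property as a list of values; keep only the first
  // value so the JSON matches the flat fields of the case class.
  def mapToJSON(map: Map[String, Any]): String = {
    val correctedMap = map.map {
      case (key, values: java.util.List[_]) if !values.isEmpty => key -> values.get(0)
      case other => other
    }

    Json(DefaultFormats).write(correctedMap)
  }

  implicit val userFormat = jsonFormat3(User)

  val m = g.V().has[Int](UserLabel, userIdKey, id)
    .valueMap()
    .head()

  // m is assumed to be a java.util.Map here, so convert it to a Scala Map first
  mapToJSON(m.asScala.toMap).parseJson.convertTo[User]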
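
The idea behind this workaround (take the first value of each property list returned by valueMap, then build the case class) can also be sketched without any JSON library. This is only a sketch under assumptions: the User fields and property names below are hypothetical, since the thread does not show the real case class, and the input is assumed to be a java.util.Map from property name to a java.util.List of values.

```scala
import scala.collection.JavaConverters._

// Hypothetical helper object, for illustration only.
object ValueMapSketch {
  // Hypothetical case class; the real User definition is not shown in the thread.
  final case class User(userId: Int, name: String, score: Double)

  // Keep the first value of every non-empty property list; pass anything else through.
  def flatten(raw: java.util.Map[String, AnyRef]): Map[String, Any] =
    raw.asScala.toMap.map {
      case (key, values: java.util.List[_]) if !values.isEmpty =>
        key -> (values.get(0): Any)
      case (key, value) =>
        key -> (value: Any)
    }

  // Build a User only if every expected property is present with the expected type.
  def toUser(props: Map[String, Any]): Option[User] =
    for {
      id    <- props.get("userId").collect { case i: Int => i }
      name  <- props.get("name").collect { case s: String => s }
      score <- props.get("score").collect { case d: Double => d }
    } yield User(id, name, score)
}
```

Returning an Option keeps a missing or mistyped property from throwing at the call site, at the cost of not saying which property was wrong.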

@nkconnor

nkconnor commented Nov 1, 2018

Thanks for the help Joe.... I'm going to look around at other graph libraries since the remote support is limited
