This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

Dataflow /var/opt/google/dataflow directory created as /var/opt/google/dataflow/dataflow #636

Open
junying1 opened this issue May 11, 2018 · 1 comment

junying1 commented May 11, 2018

Dataflow is not able to find files packaged with my classes. I load them with Class.getResource("/data.json"). The Stackdriver log shows it looking for the file at /var/opt/google/dataflow/some-random-jar-name.jar!/data.json. When I ssh into the worker's VM instance, the file is actually at /var/opt/google/dataflow/dataflow/some-random-jar-name.jar.jar. This was working as of 5/9/18.

I tested with the WordCount example straight from Apache Beam documentation: https://beam.apache.org/get-started/quickstart-java/

I followed all the steps, then added resources/data.json under src/main and added the following lines to the processElement method of WordCount.ExtractWordsFn:

try {
  String jsonStr = new Scanner(new File(WordCount.class.getResource("/data.json").getFile())).useDelimiter("\\Z").next();
  System.out.println("====================================================");
  System.out.println(jsonStr);
  System.out.println("====================================================");
} catch (Exception e) {
  e.printStackTrace();
}

Sure enough, it runs fine locally with DirectRunner, but with DataflowRunner I got the same error in Stackdriver:

message: "java.io.FileNotFoundException: file:/var/opt/google/dataflow/classes-yGX0uczTTR8A8LXakSr0JA.jar!/data.json (No such file or directory)"

While the example batch job was still running, I ssh'ed into the worker instance and checked /var/opt/google/dataflow. It contains a second "dataflow" directory, and the files are copied there. This confirms the double dataflow directory issue.
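A possible contributing factor, separate from the directory layout: the path in the error message has the shape that URL.getFile() returns for a resource inside a JAR ("file:...jar!/data.json"), and java.io.File cannot open an entry inside an archive. The sketch below only illustrates that shape. The class name JarUrlDemo and the path /tmp/example.jar are made up for illustration, and this does not explain why the same code worked until 5/9/18.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

public class JarUrlDemo {

    // Returns what URL.getFile() yields for the given URL string.
    // Hypothetical helper, not part of Beam or Dataflow.
    static String fileOf(String spec) {
        try {
            return new URL(spec).getFile();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative jar: URL of the kind getResource returns for a packaged resource.
        String path = fileOf("jar:file:/tmp/example.jar!/data.json");
        System.out.println(path);
        // The string is not a plain filesystem path, so File cannot locate it.
        System.out.println(new File(path).exists());
    }
}
```

With DirectRunner the resource is usually a plain file on disk (for example under target/classes), which would be consistent with the code working locally.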

junying1 (Author) commented

I found a workaround: use Class.getResourceAsStream to get an InputStream. For whatever reason, getResourceAsStream works as expected, while getResource still fails. For all of my purposes, an InputStream works just as well as a URL.
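A minimal sketch of this workaround, assuming data.json sits at the root of the classpath as in the WordCount test above. The class name ResourceReader and its helper methods are hypothetical and not part of Beam or Dataflow:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class ResourceReader {

    // Reads the whole stream as UTF-8 text. The "\\A" delimiter matches only
    // at the start of input, so the Scanner returns everything as one token.
    static String readAll(InputStream in) {
        try (Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name())) {
            scanner.useDelimiter("\\A");
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    // Loads a classpath resource through a stream instead of a File, so it
    // does not matter whether the resource is a plain file or a JAR entry.
    static String readResource(Class<?> anchor, String path) throws IOException {
        try (InputStream in = anchor.getResourceAsStream(path)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + path);
            }
            return readAll(in);
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(readResource(ResourceReader.class, "/data.json"));
        } catch (IOException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Inside processElement, the equivalent call would be readResource(WordCount.class, "/data.json"). getResourceAsStream returns null when the resource is missing, so the explicit check turns that into a clear error.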
