Adding an Apache Spark UDF? #5

Open
mielvds opened this issue Mar 19, 2021 · 2 comments

mielvds commented Mar 19, 2021

Hi! Would it make sense to have a small addition that makes the library usable in Apache Spark? Something along the lines of

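package com.atomgraph.etl.json;

import java.io.ByteArrayOutputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import org.apache.jena.riot.RDFFormat;
import org.apache.jena.riot.system.StreamRDF;
import org.apache.jena.riot.system.StreamRDFWriter;
import org.apache.spark.sql.api.java.UDF1;

public class Json2rdfUDF implements UDF1<String, String> {
    private static final long serialVersionUID = 1L;

    // baseURI was undefined in the first draft; passing it in via the constructor is one option
    private final String baseURI;

    public Json2rdfUDF(String baseURI) {
        this.baseURI = baseURI;
    }

    @Override
    public String call(String jsonString) throws Exception {
        Reader reader = new StringReader(jsonString);
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // UDF1<String, String> has to return a String, so serialize the triples as N-Triples
        StreamRDF rdfStream = StreamRDFWriter.getWriterStream(out, RDFFormat.NTRIPLES);
        // assumes convert() starts and finishes the stream, so the output is flushed
        new JsonStreamRDFWriter(reader, rdfStream, baseURI).convert();

        return out.toString(StandardCharsets.UTF_8.name());
    }
}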
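
A note on the serialVersionUID field in the snippet above: Spark's UDF1 interface extends java.io.Serializable, since function objects may be serialized and shipped to other JVMs. The field only pins the serialized class version and does not affect the conversion logic. Below is a minimal stand-alone sketch of that round trip with no Spark dependency. StringUdf and UpperCaseUdf are hypothetical stand-ins for illustration, not part of Spark or json2rdf.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Locale;

public class SerializableUdfSketch {

    // Hypothetical stand-in for UDF1<String, String>; like UDF1, it extends Serializable.
    interface StringUdf extends Serializable {
        String call(String input) throws Exception;
    }

    // Toy implementation; a real UDF would invoke the JSON-to-RDF conversion here.
    static class UpperCaseUdf implements StringUdf {
        // Pins the serialized form's version; without it the JVM computes one from the class shape.
        private static final long serialVersionUID = 1L;

        @Override
        public String call(String input) {
            return input.toUpperCase(Locale.ROOT);
        }
    }

    // Serializes and deserializes the function object using plain Java serialization.
    static StringUdf roundTrip(StringUdf udf) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(udf);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (StringUdf) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        StringUdf copy = roundTrip(new UpperCaseUdf());
        System.out.println(copy.call("abc"));
    }
}
```

The deserialized copy behaves like the original, so a UDF class whose fields are all serializable (such as a String base URI) should survive the same round trip.
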
@namedgraph
Member

Is it the serialVersionUID that makes the class usable in Spark? If it doesn't change the rest of the logic, then fine.

Will you make a PR?

@namedgraph
Member

@mielvds ping. What is required here?
