[Bug]: Dataflow - MongoDB-to-BigQuery batch mode failing with filter on data #1328

Open
robbycarter opened this issue Feb 12, 2024 · 4 comments
Labels
bug Something isn't working needs triage p2

Comments

@robbycarter

Related Template(s)

MongoDB-to-BigQuery

Template Version

v2

What happened?

I have a function that checks whether a field is true. If it is true, the function returns null to skip saving that document into BigQuery.

I have also tried return undefined and return "", and I keep getting the same error:

com.google.cloud.teleport.v2.common.UncaughtExceptionLogger - The template launch failed.
java.lang.IllegalArgumentException: schema can not be null

Below is a code snippet

function deliveries_transform(input_doc) {
  var doc = JSON.parse(input_doc);

  // Filter: skip documents that have a parent by returning null
  if (doc.has_parent) {
    return null;
  }

  // Return the document as a JSON string
  return JSON.stringify(doc);
}
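To rule out a problem in the UDF itself, it can be exercised outside Dataflow. The snippet below is a minimal sketch that assumes a local Node.js runtime, which may behave differently from the JavaScript engine the template uses. The sample documents and their `_id` values are made up for illustration.

```javascript
// Same UDF as in the report, repeated so this snippet runs on its own.
function deliveries_transform(input_doc) {
  var doc = JSON.parse(input_doc);

  // Filter: skip documents that have a parent by returning null
  if (doc.has_parent) {
    return null;
  }

  return JSON.stringify(doc);
}

// Hypothetical sample documents (not taken from the real collection).
var samples = [
  { _id: "a1", has_parent: true },
  { _id: "b2", has_parent: false },
  { _id: "c3" } // field missing entirely
];

// The template passes each document to the UDF as a JSON string,
// so the harness does the same.
var results = samples.map(function (doc) {
  return deliveries_transform(JSON.stringify(doc));
});

results.forEach(function (out, i) {
  console.log(samples[i]._id, "->", out === null ? "filtered" : out);
});
```

If the function filters and passes documents as expected here, the failure is more likely related to how the template handles a null result than to the UDF logic.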

I followed the filter example at this link:
https://cloud.google.com/dataflow/docs/guides/templates/create-template-udf#filter_events

The job was created using the Google Cloud console, not via the API or an SDK.

Relevant log output

[
  {
    "insertId": "",
    "jsonPayload": {
      "line": "exec.go:66",
      "message": "com.google.cloud.teleport.v2.common.UncaughtExceptionLogger - The template launch failed.\njava.lang.IllegalArgumentException: schema can not be null\n\tat org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)\n\tat org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.withSchema(BigQueryIO.java:2679)\n\tat com.google.cloud.teleport.v2.mongodb.templates.MongoDbToBigQuery.run(MongoDbToBigQuery.java:154)\n\tat com.google.cloud.teleport.v2.mongodb.templates.MongoDbToBigQuery.main(MongoDbToBigQuery.java:96)\n"
    },
    "resource": {
      "type": "dataflow_step",
      "labels": {
        "region": "",
        "project_id": "",
        "step_id": "",
        "job_name": "mongodb-to-bigquery-batch",
        "job_id": ""
      }
    },
    "timestamp": "2024-02-12T21:45:00.037010Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_name": "",
      "dataflow.googleapis.com/region": "us-east4",
      "dataflow.googleapis.com/job_id": "",
      "compute.googleapis.com/resource_id": "",
      "compute.googleapis.com/resource_type": "",
      "dataflow.googleapis.com/job_name": "mongodb-to-bigquery-batch"
    },
    "logName": "",
    "receiveTimestamp": "2024-02-12T21:45:02.855403339Z",
    "errorGroups": [
      {
        "id": "CPXppsbT8JP4nQE"
      }
    ]
  },
  {
    "insertId": "",
    "jsonPayload": {
      "message": "Error: Template launch failed: exit status 1",
      "line": "launch.go:80"
    },
    "resource": {
      "type": "dataflow_step",
      "labels": {
        "job_name": "mongodb-to-bigquery-batch",
        "job_id": "",
        "step_id": "",
        "project_id": "",
        "region": ""
      }
    },
    "timestamp": "",
    "severity": "ERROR",
    "labels": {
      "dataflow.googleapis.com/region": "",
      "dataflow.googleapis.com/job_id": "",
      "compute.googleapis.com/resource_id": "",
      "compute.googleapis.com/resource_type": "",
      "compute.googleapis.com/resource_name": "",
      "dataflow.googleapis.com/job_name": "mongodb-to-bigquery-batch"
    },
    "logName": "",
    "receiveTimestamp": "2024-02-12T21:45:02.855403339Z"
  },
  {
    "textPayload": "Error occurred in the launcher container: Template launch failed. See console logs.",
    "insertId": "xl5y9bd22ed",
    "resource": {
      "type": "dataflow_step",
      "labels": {
        "project_id": "",
        "job_id": "2024-02-12_13_43_46-15601135711795228441",
        "job_name": "mongodb-to-bigquery-batch",
        "step_id": "",
        "region": ""
      }
    },
    "timestamp": "2024-02-12T21:47:43.432514787Z",
    "severity": "ERROR",
    "labels": {
      "dataflow.googleapis.com/job_id": "2024-02-12_13_43_46-15601135711795228441",
      "dataflow.googleapis.com/region": "",
      "dataflow.googleapis.com/log_type": "",
      "dataflow.googleapis.com/job_name": "mongodb-to-bigquery-batch"
    },
    "logName": "",
    "receiveTimestamp": "2024-02-12T21:47:43.962727013Z"
  }
]
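The useful part of this dump is the escaped stack trace in the first entry. A small helper like the one below can pull out the exception line and the first frame that belongs to the template. This is only a sketch: `summarizeLaunchFailure` is a hypothetical name, not part of the template or of Cloud Logging, and it assumes the entries have the same shape as the log output above.

```javascript
// Hypothetical helper for reading exported log entries like the ones above.
function summarizeLaunchFailure(entries) {
  for (const entry of entries) {
    const message = entry.jsonPayload && entry.jsonPayload.message;

    // Only entries whose message contains Java stack frames are of interest.
    if (typeof message !== "string" || !message.includes("\n\tat ")) {
      continue;
    }

    const lines = message
      .split("\n")
      .map((line) => line.trim())
      .filter(Boolean);

    // The exception line starts with a class name ending in Exception or Error.
    const exception = lines.find((line) =>
      /^[\w.$]+(Exception|Error)\b/.test(line)
    );

    // Stack frames start with "at "; strip that prefix.
    const frames = lines
      .filter((line) => line.startsWith("at "))
      .map((line) => line.slice(3));

    // First frame inside the template's own package.
    const templateFrame = frames.find((frame) =>
      frame.startsWith("com.google.cloud.teleport.")
    );

    return { exception, topFrame: frames[0], templateFrame };
  }
  return null;
}
```

Applied to the entries above, this points at `MongoDbToBigQuery.run(MongoDbToBigQuery.java:154)` calling `BigQueryIO$Write.withSchema`, so the null schema is rejected while the template is being launched, before any documents are written.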
@robbycarter robbycarter added bug Something isn't working needs triage p2 labels Feb 12, 2024
@britz89

britz89 commented Apr 11, 2024

Hi,

I'm encountering the same issue.
If I use a "return null" statement to skip a document, I get the "schema can not be null" error.
Did anyone manage to resolve the issue?
Many thanks!

@robbycarter
Author

Hi @britz89. I have not found a fix, but I found an alternative way to skip those documents.
I pull all the data, then run a saved query that creates a new table from the import. The filter is applied in that saved query.

@britz89

britz89 commented Apr 11, 2024

So, if I understood correctly, you are pulling the full collection, storing it in a temporary table, and then filtering the rows in a subsequent step. Correct?
My requirement is to avoid a full copy of the collection, so I hope this issue will be fixed; otherwise I will have to find another way.
Thanks for your suggestion, btw!

@robbycarter
Author

Yes, that is what I am currently doing until this is fixed, because I need a working solution now. The other alternative I considered is building a custom batch template with the issue fixed.
