exportModel encounters NullPointerException #817

Open
knguyen1 opened this issue Apr 12, 2024 · 2 comments

@knguyen1

Describe the bug
Cannot generate a CSV of the model because of a NullPointerException. The generateDocs phase works just fine. Following the documentation: https://docs.zingg.ai/zingg/stepbystep/createtrainingdata/exportlabeleddata

To Reproduce
Steps to reproduce the behavior:
Run: (.venv) spark@496208741a60:/workspaces/foo-zingg-entity-resolution $ ~/zingg-0.4.0/scripts/zingg.sh --phase exportModel --conf /workspaces/foo-zingg-entity-resolution/datasets/trader/conf_no_bdid.json --location tmp --properties-file /workspaces/foo-zingg-entity-resolution/zingg.conf

Expected behavior
Should be able to export a CSV of the model.

Screenshots

24/04/12 15:35:03 INFO ClientOptions: --phase
24/04/12 15:35:03 INFO ClientOptions: exportModel
24/04/12 15:35:03 INFO ClientOptions: --conf
24/04/12 15:35:03 INFO ClientOptions: /workspaces/foo-zingg-entity-resolution/datasets/trader/conf_no_bdid.json
24/04/12 15:35:03 INFO ClientOptions: --location
24/04/12 15:35:03 INFO ClientOptions: tmp
24/04/12 15:35:03 INFO ClientOptions: --email
24/04/12 15:35:03 INFO ClientOptions: zingg@zingg.ai
24/04/12 15:35:03 INFO ClientOptions: --license
24/04/12 15:35:03 INFO ClientOptions: zinggLicense.txt
24/04/12 15:35:03 WARN ArgumentsUtil: Config Argument is /workspaces/foo-zingg-entity-resolution/datasets/trader/conf_no_bdid.json
24/04/12 15:35:03 WARN ArgumentsUtil: phase is exportModel
24/04/12 15:35:03 INFO Client: 
24/04/12 15:35:03 INFO Client: **************************************************************************
24/04/12 15:35:03 INFO Client: *            ** Note about analytics collection by Zingg AI **           *
24/04/12 15:35:03 INFO Client: *                                                                        *
24/04/12 15:35:03 INFO Client: *  Please note that Zingg captures a few metrics about application's     *
24/04/12 15:35:03 INFO Client: *  runtime parameters. However, no user's personal data or application   *
24/04/12 15:35:03 INFO Client: *  data is captured. If you want to switch off this feature, please      *
24/04/12 15:35:03 INFO Client: *  set the flag collectMetrics to false in config. For details, please   *
24/04/12 15:35:03 INFO Client: *  refer to the Zingg docs (https://docs.zingg.ai/docs/security.html)    *
24/04/12 15:35:03 INFO Client: **************************************************************************
24/04/12 15:35:03 INFO Client: 
java.lang.NullPointerException
        at java.base/java.lang.Class.forName0(Native Method)
        at java.base/java.lang.Class.forName(Unknown Source)
        at zingg.spark.core.executor.SparkZFactory.get(SparkZFactory.java:40)
        at zingg.common.client.Client.setZingg(Client.java:68)
        at zingg.common.client.Client.<init>(Client.java:46)
        at zingg.spark.client.SparkClient.<init>(SparkClient.java:29)
        at zingg.spark.client.SparkClient.getClient(SparkClient.java:68)
        at zingg.common.client.Client.mainMethod(Client.java:185)
        at zingg.spark.client.SparkClient.main(SparkClient.java:76)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1029)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Desktop (please complete the following information):

  • OS:
(.venv) spark@496208741a60:/workspaces/foo-zingg-entity-resolution $ cat /etc/os-release 
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Additional context

{
    "fieldDefinition": [
        {
            "fieldName": "data_source",
            "fields": "data_source",
            "dataType": "string",
            "matchType": "DONT_USE"
        },
        // other fields...
    ],
    "output": [
        {
            "name": "output",
            "format": "csv",
            "props": {
                "location": "/tmp/zinggOutput",
                "delimiter": ",",
                "header": true
            }
        }
    ],
    "data": [{
        "name": "salesforce",
        "format": "jdbc",
        "props": {
            "url": "jdbc:redshift://my-redshift-server:5439/my-redshift-db",
            "dbtable": "my_schema.my_table",
            "driver": "com.amazon.redshift.jdbc42.Driver",
            "user": "test",
            "password": "password123"
        }
    }],
    "labelDataSampleSize" : 0.15,
    "numPartitions": 50,
    "modelId": 101,
    "zinggDir": "/workspaces/foo-zingg-entity-resolution/models"
}

@sonalgoyal sonalgoyal self-assigned this Apr 14, 2024
@sonalgoyal
Member

Thanks for reporting this. If you are stuck, you can try reading the model folder at zinggDir/modelId/trainingData/marked using PySpark; this location will have your labeled data in Parquet format.
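
A minimal PySpark sketch of this workaround, assuming the zinggDir (/workspaces/foo-zingg-entity-resolution/models) and modelId (101) from the config posted above; the output location /tmp/labeledData is illustrative, not part of the original report:

# Read the labeled (marked) training data directly and write it out as CSV.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("exportLabeledData").getOrCreate()

# Zingg keeps labeled pairs as Parquet under zinggDir/modelId/trainingData/marked.
marked_path = "/workspaces/foo-zingg-entity-resolution/models/101/trainingData/marked"
labeled = spark.read.parquet(marked_path)

# Write a CSV copy with a header row; adjust the destination to your environment.
labeled.write.option("header", True).csv("/tmp/labeledData")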

@vikasgupta78
Collaborator

Will be handled alongside the SparkConnect change; putting it on hold for now.
