pyspark.sql.utils.IllegalArgumentException on fresh install #39

Open
gstenzel opened this issue Mar 8, 2021 · 1 comment
Labels: question (Further information is requested)

Comments


gstenzel commented Mar 8, 2021

  • Using Windows 10; the same errors occur in Ubuntu WSL
  • Java version: openjdk version "1.8.0_282" (equivalent to JDK 8)
  • Installed with pip
  • In Python 3.6, >>> import nlu runs without errors
>>> nlu.load('tokenize').predict('Each word and symbol in a sentence will generate token.') # From the homepage
Ivy Default Cache set to: C:\Users\USERNAME\.ivy2\cache
The jars for the packages stored in: C:\Users\USERNAME\.ivy2\jars
:: loading settings :: url = jar:file:/C:/Users/USERNAME/.conda/envs/py36nlp/Lib/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.johnsnowlabs.nlp#spark-nlp_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-f4225ffb-68be-4e92-a6fc-c8cf6d7928e2;1.0
        confs: [default]
        found com.johnsnowlabs.nlp#spark-nlp_2.11;2.7.5 in central
        found com.typesafe#config;1.3.0 in central
        found org.rocksdb#rocksdbjni;6.5.3 in central
        found com.amazonaws#aws-java-sdk;1.7.4 in central
        found commons-logging#commons-logging;1.1.1 in central
        found org.apache.httpcomponents#httpclient;4.2 in central
        found org.apache.httpcomponents#httpcore;4.2 in central
        found commons-codec#commons-codec;1.3 in central
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.ivy.util.url.IvyAuthenticator (file:/C:/Users/USERNAME/.conda/envs/py36nlp/Lib/site-packages/pyspark/jars/ivy-2.4.0.jar) to field java.net.Authenticator.theAuthenticator
WARNING: Please consider reporting this to the maintainers of org.apache.ivy.util.url.IvyAuthenticator
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
        found joda-time#joda-time;2.10.10 in central
        [2.10.10] joda-time#joda-time;[2.2,)
        found com.github.universal-automata#liblevenshtein;3.0.0 in central
        found com.google.code.findbugs#annotations;3.0.1 in central
        found net.jcip#jcip-annotations;1.0 in central
        found com.google.code.findbugs#jsr305;3.0.1 in central
        found com.google.protobuf#protobuf-java-util;3.0.0-beta-3 in central
        found com.google.protobuf#protobuf-java;3.0.0-beta-3 in central
        found com.google.code.gson#gson;2.3 in central
        found it.unimi.dsi#fastutil;7.0.12 in central
        found org.projectlombok#lombok;1.16.8 in central
        found org.slf4j#slf4j-api;1.7.21 in central
        found com.navigamez#greex;1.0 in central
        found dk.brics.automaton#automaton;1.11-8 in central
        found org.json4s#json4s-ext_2.11;3.5.3 in central
        found org.joda#joda-convert;1.8.1 in central
        found org.tensorflow#tensorflow;1.15.0 in central
        found org.tensorflow#libtensorflow;1.15.0 in central
        found org.tensorflow#libtensorflow_jni;1.15.0 in central
        found net.sf.trove4j#trove4j;3.0.3 in central
:: resolution report :: resolve 1184ms :: artifacts dl 28ms
        :: modules in use:
        com.amazonaws#aws-java-sdk;1.7.4 from central in [default]
        com.github.universal-automata#liblevenshtein;3.0.0 from central in [default]
        com.google.code.findbugs#annotations;3.0.1 from central in [default]
        com.google.code.findbugs#jsr305;3.0.1 from central in [default]
        com.google.code.gson#gson;2.3 from central in [default]
        com.google.protobuf#protobuf-java;3.0.0-beta-3 from central in [default]
        com.google.protobuf#protobuf-java-util;3.0.0-beta-3 from central in [default]
        com.johnsnowlabs.nlp#spark-nlp_2.11;2.7.5 from central in [default]
        com.navigamez#greex;1.0 from central in [default]
        com.typesafe#config;1.3.0 from central in [default]
        commons-codec#commons-codec;1.3 from central in [default]
        commons-logging#commons-logging;1.1.1 from central in [default]
        dk.brics.automaton#automaton;1.11-8 from central in [default]
        it.unimi.dsi#fastutil;7.0.12 from central in [default]
        joda-time#joda-time;2.10.10 from central in [default]
        net.jcip#jcip-annotations;1.0 from central in [default]
        net.sf.trove4j#trove4j;3.0.3 from central in [default]
        org.apache.httpcomponents#httpclient;4.2 from central in [default]
        org.apache.httpcomponents#httpcore;4.2 from central in [default]
        org.joda#joda-convert;1.8.1 from central in [default]
        org.json4s#json4s-ext_2.11;3.5.3 from central in [default]
        org.projectlombok#lombok;1.16.8 from central in [default]
        org.rocksdb#rocksdbjni;6.5.3 from central in [default]
        org.slf4j#slf4j-api;1.7.21 from central in [default]
        org.tensorflow#libtensorflow;1.15.0 from central in [default]
        org.tensorflow#libtensorflow_jni;1.15.0 from central in [default]
        org.tensorflow#tensorflow;1.15.0 from central in [default]
        :: evicted modules:
        commons-codec#commons-codec;1.6 by [commons-codec#commons-codec;1.3] in [default]
        joda-time#joda-time;2.9.5 by [joda-time#joda-time;2.10.10] in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   29  |   1   |   0   |   2   ||   27  |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-f4225ffb-68be-4e92-a6fc-c8cf6d7928e2
        confs: [default]
        0 artifacts copied, 27 already retrieved (0kB/17ms)
21/03/08 16:01:46 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
        at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2823)
        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2684)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
        at org.apache.spark.deploy.DependencyUtils$.org$apache$spark$deploy$DependencyUtils$$resolveGlobPath(DependencyUtils.scala:191)
        at org.apache.spark.deploy.DependencyUtils$$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:147)
        at org.apache.spark.deploy.DependencyUtils$$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:145)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
        at org.apache.spark.deploy.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:145)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$3.apply(SparkSubmit.scala:343)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$3.apply(SparkSubmit.scala:343)
        at scala.Option.map(Option.scala:146)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:343)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/03/08 16:01:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
No accepted Data type or usable columns found or applying the NLU models failed.
Make sure that the first column you pass to .predict() is the one that nlu should predict on OR rename the column you want to predict on to 'text'
If you are on Google Collab, click on Run time and try factory reset Runtime run the setup script again, you might have used too much memory
On Kaggle try to reset restart session and run the setup script again, you might have used too much memory
Full Stacktrace: see bottom
Additional info:
<class 'pyspark.sql.utils.IllegalArgumentException'> pipeline.py 1380
Stuck? Contact us on Slack! https://join.slack.com/t/spark-nlp/shared_invite/zt-lutct9gm-kuUazcyFKhuGY3_0AMkxqA

The same errors occur when running nlu.load('tokenize').predict('Each word and symbol in a sentence will generate token.')
Full stack trace:

Full Stacktrace was (<class 'pyspark.sql.utils.IllegalArgumentException'>, IllegalArgumentException('Unsupported class file major version 55', 'org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:166)
         at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:148)
         at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:136)
         at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:237)
         at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:50)
         at org.apache.spark.util.FieldAccessFinder$$anon$4$$anonfun$visitMethodInsn$7.apply(ClosureCleaner.scala:845)
         at org.apache.spark.util.FieldAccessFinder$$anon$4$$anonfun$visitMethodInsn$7.apply(ClosureCleaner.scala:828)
         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
         at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
         at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
         at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
         at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:134)
         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
         at org.apache.spark.util.FieldAccessFinder$$anon$4.visitMethodInsn(ClosureCleaner.scala:828)
         at org.apache.xbean.asm6.ClassReader.readCode(ClassReader.java:2175)
         at org.apache.xbean.asm6.ClassReader.readMethod(ClassReader.java:1238)
         at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:631)
         at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:355)
         at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:272)
         at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:271)
         at scala.collection.immutable.List.foreach(List.scala:392)
         at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:271)
         at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
         at org.apache.spark.SparkContext.clean(SparkContext.scala:2326)
         at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:820)
         at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:819)
         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
         at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
         at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:819)
         at org.apache.spark.sql.execution.python.EvalPythonExec.doExecute(EvalPythonExec.scala:89)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
         at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
         at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:391)
         at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:43)
         at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
         at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
         at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.GenerateExec.doExecute(GenerateExec.scala:80)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
         at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
         at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:391)
         at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:43)
         at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
         at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
         at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
         at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:296)
         at org.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:3263)
         at org.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:3260)
         at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
         at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
         at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
         at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
         at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3369)
         at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3260)
         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
         at py4j.Gateway.invoke(Gateway.java:282)
         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
         at py4j.commands.CallCommand.execute(CallCommand.java:79)
         at py4j.GatewayConnection.run(GatewayConnection.java:238)
         at java.base/java.lang.Thread.run(Thread.java:834)'), <traceback object at 0x0000024A29857188>)
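
The key line in this trace is IllegalArgumentException('Unsupported class file major version 55', ...): class file major version 55 corresponds to Java 11, so the JVM that pyspark launched appears to be Java 11 even though OpenJDK 8 is installed (the java.base/ stack frames and the illegal-reflective-access warnings earlier in the log also point to a Java 9+ runtime), and Spark 2.4.x only runs on Java 8. A minimal sketch (illustrative, not from the original report) to check which JVM pyspark will pick up:

```python
# Minimal sketch (hypothetical, not part of the original report): check which JVM
# pyspark will launch. Spark 2.4.x requires Java 8; class file major version 55
# corresponds to Java 11.
import os
import subprocess

print("JAVA_HOME =", os.environ.get("JAVA_HOME"))  # should point to a JDK 8 installation
subprocess.run(["java", "-version"])               # expect: openjdk version "1.8.0_..."
```
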
@maziyarpanahi (Member) commented:

You have an error in your environment:

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

Please follow these instructions step by step to make sure you have Apache Spark/Hadoop set up correctly on Windows. (Every step matters, and unfortunately it's a bit long for Windows.)

JohnSnowLabs/spark-nlp#1022

Do this before running pip install nlu pyspark==2.4.7.
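
For anyone hitting the same errors, here is a minimal sketch of what the environment needs to look like before Spark starts; the paths below are hypothetical placeholders and the linked guide remains the authoritative reference. winutils.exe must sit in %HADOOP_HOME%\bin, and JAVA_HOME must point at a JDK 8 installation, since pyspark 2.4.x does not run on Java 11.

```python
# Minimal sketch, assuming winutils.exe was downloaded to C:\hadoop\bin and a
# JDK 8 is installed under C:\Java\jdk1.8.0_282 (both paths are hypothetical).
# These variables must be set before the JVM is started, i.e. before nlu/pyspark
# creates the SparkSession.
import os

os.environ["HADOOP_HOME"] = r"C:\hadoop"            # must contain bin\winutils.exe
os.environ["JAVA_HOME"] = r"C:\Java\jdk1.8.0_282"   # Spark 2.4.x requires Java 8
os.environ["PATH"] = (
    os.path.join(os.environ["HADOOP_HOME"], "bin") + os.pathsep +
    os.path.join(os.environ["JAVA_HOME"], "bin") + os.pathsep +
    os.environ["PATH"]
)

import nlu
print(nlu.load('tokenize').predict('Each word and symbol in a sentence will generate token.'))
```

Equivalently, the same variables can be set as system environment variables before launching Python.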

@maziyarpanahi added the question label on Mar 8, 2021