
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE #1548

Open
Tradunsky opened this issue May 28, 2020 · 1 comment

Tradunsky commented May 28, 2020

Tried to reproduce the benchmark test described here:
https://dzone.com/articles/joining-a-billion-rows-20x-faster-than-apache-spar
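
For reference, roughly the kind of billion-row join the article benchmarks (my paraphrase; the table sizes and join key below are assumptions, not the article's exact code):

// Paraphrase of the benchmark shape: join a billion-row DataFrame against a
// much smaller one on a shared id column, then materialize the result.
val large = spark.range(1000L * 1000 * 1000).toDF("id")
val small = spark.range(1000L * 1000).toDF("id")
large.join(small, "id").count()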

Basically, it is about 26x slower (25.95-26.31 sec) than plain Apache Spark (0.97-0.98 sec) on my laptop:

macOS Catalina (10.15)
  Processor Name: Quad-Core Intel Core i7
  Processor Speed: 2.9 GHz
  Total Number of Cores: 4
  Memory: 16 GB
Oracle JDK 1.8.0_201
scala-sdk-2.11.8
snappydata-cluster_2.11:1.2.0 or 1.1.0

RuntimeMemoryManager org.apache.spark.memory.SnappyUnifiedMemoryManager@4c398c80 configuration:
		Total Usable Heap = 2.9 GB (3082926162)
		Storage Pool = 1470.1 MB (1541463081)
		Execution Pool = 1470.1 MB (1541463081)
		Max Storage Pool Size = 2.3 GB (2466340929)

I'm sure I have not tuned my environment well enough, but I thought it is still important to post this issue, since it is not related to Apache Spark itself but to the SnappyData Spark distribution:

val rangeData = spark.range(1000L * 1000 * 1000).toDF()
rangeData.cache()
rangeData.count()

leads to the error:

java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:863)
	at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:102)
	at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:90)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1366)
	at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:104)
	at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:468)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:704)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:41)
...
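
For context, my reading of the stack trace (not confirmed by the maintainers): DiskStore.getBytes memory-maps the whole cached block via FileChannel.map, and the JDK caps a single mapping at Integer.MAX_VALUE bytes, so any block over ~2 GB that spills to disk fails exactly like this. A standalone illustration of that JDK limit (not SnappyData code):

import java.nio.channels.FileChannel
import java.nio.file.{Files, StandardOpenOption}

// Requesting a mapping one byte larger than Integer.MAX_VALUE triggers the
// same check that fires in sun.nio.ch.FileChannelImpl.map above.
val path = Files.createTempFile("big-block", ".bin")
val ch = FileChannel.open(path, StandardOpenOption.READ, StandardOpenOption.WRITE)
ch.map(FileChannel.MapMode.READ_WRITE, 0L, Int.MaxValue.toLong + 1)
// => java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE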

Tradunsky (Author) commented:

This can also be reproduced with any other large DataFrame that does not fit into memory. The memory manager logs warn about it:

20/05/29 01:13:53 WARN SnappyUnifiedMemoryManager: Could not allocate memory for rdd_4_0 of _SPARK_CACHE_ size=1084871037. Memory pool size 2164224048
20/05/29 01:13:53 WARN MemoryStore: Not enough space to cache rdd_4_0 in memory! (computed 2.0 GB so far)
20/05/29 01:13:53 INFO MemoryStore: Memory use = 0.0 B (blocks) + 2.0 GB (scratch space shared across 1 tasks(s)) = 2.0 GB. Storage limit = 2.9 GB.
20/05/29 01:13:53 WARN BlockManager: Persisting block rdd_4_0 to disk instead.
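
One possible mitigation (an assumption on my side, not an official fix): keep every cached partition well under 2 GB, so that a block spilled to disk never needs a larger-than-2 GB memory mapping, e.g.:

// Sketch of a workaround, assuming the per-block mapping limit is the problem:
// spread the billion rows over more partitions so each cached/spilled block
// stays far below Integer.MAX_VALUE bytes.
val rangeData = spark.range(1000L * 1000 * 1000).toDF().repartition(200)
rangeData.cache()
rangeData.count()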
