
Code is not working when data size is large #1

Open
souvik82 opened this issue Jan 18, 2017 · 3 comments

Comments

@souvik82

I have a 4-node cluster: the master has 56 GB of RAM and the data nodes have 32 GB each. With a small data set of around 200 MB it works fine, but with a 10 GB data set it hangs.

Sorting of the data is also very slow: for 10 GB it takes around 2.9 minutes. Below is my spark-submit script:

spark-submit \
  --class testHbaseRDDUtil \
  --driver-memory 20G \
  --executor-memory 4G \
  --num-executors 32 \
  --jars /usr/lib/hbase-client-1.2.0-IBM-7.jar,/usr/lib/hbase-hadoop-compat-1.2.0-IBM-7.jar,/usr/lib/htrace-core-3.1.0-incubating.jar,/usr/lib/hbase-common-1.2.0-IBM-7.jar,/usr/lib/hbase-hadoop2-compat-1.2.0-IBM-7.jar,/usr/lib/hbase-protocol.jar,/usr/lib/hbase-server-1.2.0-IBM-7.jar,/usr/lib/metrics-core-2.2.0.jar,/usr/lib/hbase-annotations-1.2.0-IBM-7.jar \
  souvik-0.0.1-SNAPSHOT.jar

@zeyuanxy
Owner

Hello, sorting is accompanied by a repartition, which means a lot of data is moved between machines, and that is definitely a performance killer. Can you share the detailed metrics (or status) of your Spark job?
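
(A rough illustration of the point above, not this library's actual code: HFiles have to be written in row-key order, so a bulk load typically boils down to something like repartitionAndSortWithinPartitions, which sends every record across the network once and sorts it on the receiving side. The sketch below assumes a local Spark session and made-up (rowKey, value) pairs.)

import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

object ShuffleSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bulk-load-shuffle-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Stand-in for the (rowKey, value) pairs a bulk load would write.
    val kv = sc.parallelize(1 to 1000000).map(i => (f"row$i%08d", s"value$i"))

    // The expensive step: every record is shipped to the partition that owns
    // its key range, then sorted within that partition before being written.
    val sorted = kv.repartitionAndSortWithinPartitions(new HashPartitioner(32))

    println(sorted.count())
    spark.stop()
  }
}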

@zeyuanxy
Owner

Hi @souvik82, I've added a new interface, toHBaseBulkWithFamilies, that lets you specify the column families up front rather than iterating through all of them, which should greatly improve performance when the data size is huge. Can you try it? Thanks~
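
(A toy sketch of the general idea only: it does not show the real toHBaseBulkWithFamilies signature, and the family names and record layout below are made up. If the loader has to iterate over every family and filter the data for each one, a huge RDD gets scanned once per family; naming the families that are actually present limits the work to just those.)

import org.apache.spark.sql.SparkSession

// Toy illustration only; not the library's implementation or API.
object FamiliesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("families-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Made-up (rowKey, family, value) records; only "cf1" ever holds data.
    val records = sc.parallelize(1 to 1000000).map(i => (f"row$i%08d", "cf1", s"v$i"))

    // Iterating over every family the table defines means one filter pass per
    // family over the whole data set, even for families with no data.
    val allFamilies = Seq("cf1", "cf2", "cf3", "cf4")
    val countsAll = allFamilies.map(cf => cf -> records.filter(_._2 == cf).count())

    // Supplying the families up front does that work only for the families
    // that actually matter.
    val suppliedFamilies = Seq("cf1")
    val countsSupplied = suppliedFamilies.map(cf => cf -> records.filter(_._2 == cf).count())

    println(countsAll.mkString(", "))
    println(countsSupplied.mkString(", "))
    spark.stop()
  }
}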

@ghost

ghost commented Jun 20, 2017

How do I set the numFilesPerRegionPerFamily parameter? All I know is that I have a dozen or so GB of data to save in HBase; how should this value be tuned for that?
