Generate chunkIdx at runtime rather than as a file input #11

msashokkumar · 2018-06-29T05:06:59Z

Hi,

I have 20 bag files totaling 1 TB, all already stored in HDFS. I have a DC/OS cluster running containers (mesosphere/spark:2.3.0-2.2.1-2-hadoop-2.7), and I am trying to run ros_hadoop as a distributed service to extract data from the 20 bag files in an automated way.

Is there a way to generate the chunkIdx during runtime and pass it to newAPIHadoopFile?

If not, can the idx.bin file be stored in HDFS and passed to chunkIdx as "hdfs://...idx.bin"?

Thanks!

wiegelmann · 2018-07-02T20:06:07Z

Generating the chunkIdx at runtime should be an option. I would need a few days to investigate the matter.
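For illustration, here is a minimal sketch of how chunk boundaries could be discovered at runtime, assuming the standard rosbag 2.0 record layout (a `#ROSBAG V2.0` version line, then records of the form header length, length-prefixed `name=value` header fields, data length, data, with `op=0x05` marking a chunk). It only lists the byte offset of each chunk record; it does not reproduce the on-disk idx.bin format that ros_hadoop expects, which this thread does not describe. The function names `chunk_offsets` and `parse_header` are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Dict, List

ROSBAG_MAGIC = b"#ROSBAG V2.0\n"  # version line that opens every rosbag 2.0 file
OP_CHUNK = 0x05  # value of the 'op' header field for chunk records


def _read_exact(f: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes, raising EOFError if the file ends early."""
    buf = f.read(n)
    if len(buf) != n:
        raise EOFError(f"expected {n} bytes, got {len(buf)}")
    return buf


def parse_header(raw: bytes) -> Dict[str, bytes]:
    """Split a record header into its length-prefixed name=value fields."""
    fields: Dict[str, bytes] = {}
    pos = 0
    while pos < len(raw):
        (field_len,) = struct.unpack_from("<I", raw, pos)
        pos += 4
        # Split on the first '=' only: values may themselves contain '='.
        name, _, value = raw[pos:pos + field_len].partition(b"=")
        fields[name.decode("ascii")] = value
        pos += field_len
    return fields


def chunk_offsets(f: BinaryIO) -> List[int]:
    """Return the byte offset of every chunk record, in file order."""
    if _read_exact(f, len(ROSBAG_MAGIC)) != ROSBAG_MAGIC:
        raise ValueError("not a rosbag 2.0 file")
    offsets: List[int] = []
    while True:
        start = f.tell()
        len_bytes = f.read(4)
        if not len_bytes:  # clean end of file
            break
        if len(len_bytes) < 4:
            raise EOFError("truncated record length")
        (header_len,) = struct.unpack("<I", len_bytes)
        header = parse_header(_read_exact(f, header_len))
        (data_len,) = struct.unpack("<I", _read_exact(f, 4))
        if header.get("op") == bytes([OP_CHUNK]):
            offsets.append(start)
        f.seek(data_len, io.SEEK_CUR)  # skip the payload without reading it
    return offsets
```

Because the scan seeks past every payload instead of reading it, it touches only record headers, and it needs nothing more than a seekable binary file object. A faster variant could read `index_pos` from the bag header record and jump straight to the chunk info records at the end of the file.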

msashokkumar · 2018-07-03T00:41:07Z

Making the chunkIdx file readable from Hadoop (HDFS) would be useful as well.
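Until the index can be read from HDFS directly, one possible workaround is to keep each idx.bin next to its bag in HDFS and copy it to the driver before building the `newAPIHadoopFile` configuration. The sketch below assumes three things not confirmed in this thread: that the index is named `<bag>.idx.bin`, that the configuration key is `RosbagInputFormat.chunkIdx`, and that the index is read only on the driver when input splits are computed (if executors read it too, the file would have to exist on every node). `chunk_idx_confs` and `hdfs_get` are hypothetical helpers; `hdfs_get` shells out to the standard `hdfs dfs -get` command.

```python
import os
import posixpath
import subprocess
from typing import Callable, Dict, Iterable

# Assumed configuration key under which RosbagInputFormat looks up the index path.
CHUNK_IDX_KEY = "RosbagInputFormat.chunkIdx"


def hdfs_get(src: str, dst: str) -> None:
    """Copy one file out of HDFS with the standard `hdfs dfs -get` command."""
    subprocess.run(["hdfs", "dfs", "-get", src, dst], check=True)


def chunk_idx_confs(
    bag_paths: Iterable[str],
    cache_dir: str,
    fetch: Callable[[str, str], None] = hdfs_get,
) -> Dict[str, Dict[str, str]]:
    """Map each bag path to a conf dict pointing at a local copy of its index.

    Assumes (hypothetically) that each index sits next to its bag in HDFS
    as '<bag>.idx.bin'.
    """
    os.makedirs(cache_dir, exist_ok=True)
    confs: Dict[str, Dict[str, str]] = {}
    for bag in bag_paths:
        idx_src = bag + ".idx.bin"
        # posixpath.basename also works on hdfs:// URIs, e.g. 'HMB_4.bag.idx.bin'.
        idx_dst = os.path.join(cache_dir, posixpath.basename(idx_src))
        if not os.path.exists(idx_dst):  # reuse copies fetched by earlier runs
            fetch(idx_src, idx_dst)
        confs[bag] = {CHUNK_IDX_KEY: idx_dst}
    return confs
```

Each resulting dict would then be passed as the `conf` argument of `sc.newAPIHadoopFile` for the matching bag. The `fetch` parameter exists so the copy step can be swapped out (or faked in tests). Keying the local cache on the file name alone means two bags with the same name in different HDFS directories would collide, so a real deployment may want to mirror the directory structure instead.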

vini-almeida · 2019-03-27T11:52:36Z

Hi there,
Are there any recent updates on this?
Thank you!

msashokkumar changed the title ~~Generate chunkIdx at rather than a file input~~ Generate chunkIdx at runtime rather than a file input Jun 29, 2018

msashokkumar changed the title ~~Generate chunkIdx at runtime rather than a file input~~ Generate chunkIdx at runtime rather than as a file input Jun 29, 2018

wiegelmann added the enhancement label Jul 2, 2018

vini-almeida mentioned this issue Mar 28, 2019

Allow idx.bin file to be read either from local fs or hdfs. #19 (Open)
