Generate chunkIdx at runtime rather than as a file input #11

Open
msashokkumar opened this issue Jun 29, 2018 · 3 comments

Comments

@msashokkumar

Hi,

I have 20 bag files adding up to 1 TB. All my bag files are stored in HDFS already. I have a DC/OS cluster running containers (mesosphere/spark:2.3.0-2.2.1-2-hadoop-2.7). I am trying to run ros_hadoop as a distributed service to extract data from the 20 bag files in an automated way.

Is there a way to generate the chunkIdx during runtime and pass it to newAPIHadoopFile?

If not, can the idx.bin file be stored in HDFS and presented as "hdfs://...idx.bin" to chunkIdx?

Thanks!
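
To make the setup in the question concrete, here is a minimal sketch of how the per-bag arguments for `newAPIHadoopFile` could be assembled for many bag files. It assumes the index files have already been generated and sit in a directory readable by the job. The input format class name and the `RosbagInputFormat.chunkIdx` key are recalled from the ros_hadoop README and should be checked against your build; `build_job_args`, `build_all`, and the `<bag name>.idx.bin` naming are hypothetical helpers for illustration. It does not settle whether `chunkIdx` accepts an `hdfs://` URI, which is the open question in this issue.

```python
import posixpath

# Assumed names, recalled from the ros_hadoop README; verify against your build.
INPUT_FORMAT = "de.valtech.foss.RosbagMapInputFormat"
CHUNK_IDX_KEY = "RosbagInputFormat.chunkIdx"


def build_job_args(bag_uri, idx_dir):
    """Return keyword arguments for SparkContext.newAPIHadoopFile for one bag.

    idx_dir is a hypothetical directory holding one pre-generated
    "<bag name>.idx.bin" file per bag.
    """
    bag_name = posixpath.basename(bag_uri)
    return {
        "path": bag_uri,
        "inputFormatClass": INPUT_FORMAT,
        "keyClass": "org.apache.hadoop.io.LongWritable",
        "valueClass": "org.apache.hadoop.io.MapWritable",
        # Index path derived from the bag name (naming scheme is an assumption).
        "conf": {CHUNK_IDX_KEY: posixpath.join(idx_dir, bag_name + ".idx.bin")},
    }


def build_all(bag_uris, idx_dir):
    """Build one argument dict per bag file."""
    return [build_job_args(uri, idx_dir) for uri in bag_uris]
```

With a live `SparkContext` named `sc`, each dict could be passed as `sc.newAPIHadoopFile(**args)` and the resulting RDDs combined with `sc.union(...)`.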

@msashokkumar changed the title from "Generate chunkIdx at rather than a file input" to "Generate chunkIdx at runtime rather than a file input" Jun 29, 2018
@msashokkumar changed the title from "Generate chunkIdx at runtime rather than a file input" to "Generate chunkIdx at runtime rather than as a file input" Jun 29, 2018

@wiegelmann
Collaborator

Generating the chunkIdx at runtime should be an option. I would need a few days to investigate the matter.

@msashokkumar
Author

Making the chunkIdx file readable from Hadoop (HDFS) would be useful as well.

@vini-almeida

Hi there,
Are there any updates on this?
Thank you!
