Running the kite-sdk commands in mapreduce mode #426

Open
malathit opened this issue Dec 17, 2015 · 3 comments

Comments

@malathit

Hi,

I had a look at the Kite dataset code and found that Kite internally uses Apache Crunch to run its MapReduce pipelines.

In my case, I invoke the Kite CLI from Oozie to import JSON data. I noticed that, by default, the Crunch pipeline runs MapReduce in LocalRunner mode. How can I run it in distributed MapReduce mode instead?

Regards,
Malathi

@rdblue
Contributor

rdblue commented Dec 17, 2015

Kite will use MR on the cluster if both source and destination datasets are distributed. So Local to HDFS uses the local runner, while HDFS to Hive uses MR.
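
The rule above can be sketched as a small shell helper that predicts which runner to expect for a given source path and target dataset URI. This only illustrates the rule as stated in this comment and is not Kite's actual code: the function names are hypothetical, the file and dataset names are placeholders, and treating `hdfs:` paths and `dataset:hdfs:` / `dataset:hive:` URIs as "distributed" is an assumption.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of Kite: encodes the rule from this thread.

# Succeeds if the location looks distributed.
# Assumption: hdfs: paths, dataset:hdfs: and dataset:hive: URIs count as distributed.
is_distributed() {
  case "$1" in
    hdfs:*|dataset:hdfs:*|dataset:hive:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints "mapreduce" only when both source and destination are distributed,
# otherwise prints "local".
expected_runner() {
  if is_distributed "$1" && is_distributed "$2"; then
    echo mapreduce
  else
    echo local
  fi
}

# A bare file name has no hdfs: scheme, so the local runner is expected.
expected_runner file.csv dataset:hive:default/events
# Both ends are distributed, so MapReduce on the cluster is expected.
expected_runner hdfs:/user/me/file.csv dataset:hive:default/events
```

Under these assumptions the first call prints local and the second prints mapreduce.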

@malathit
Author

Hi,

Thanks for the reply. In my case, the source data is in HDFS and is being written to a Hive dataset that was created through Hive. But the program still runs with the LocalRunner. Any idea whether I have missed something obvious?

@rdblue
Contributor

rdblue commented Dec 18, 2015

What is the command you're running? If you don't specify hdfs:/... then Kite assumes you mean a local path. So if you run hdfs dfs -put file.csv and then run kite-dataset csv-import file.csv ... Kite will find and use the local copy instead of the one you just put in HDFS. You have to use the full URI, like this: kite-dataset csv-import hdfs:/user/me/file.csv ...
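
One way to avoid silently importing the local copy is to wrap the import in a guard that rejects any source path without an explicit hdfs: scheme. The following is a minimal sketch under stated assumptions: `require_hdfs_uri` and `import_csv` are hypothetical wrappers and not part of Kite, and `file.csv`, `/user/me`, and `dataset:hive:default/events` are placeholder names.

```shell
#!/bin/sh
# Hypothetical wrapper, NOT part of Kite: refuse to start an import
# unless the source is a fully qualified hdfs: URI.
require_hdfs_uri() {
  case "$1" in
    hdfs:*) return 0 ;;
    *) echo "not an hdfs: URI: $1" >&2; return 1 ;;
  esac
}

# Runs kite-dataset csv-import only after the source path passes the check.
# $1 = source CSV path, $2 = target dataset URI (both placeholders here).
import_csv() {
  src="$1"
  dataset="$2"
  require_hdfs_uri "$src" || return 1
  kite-dataset csv-import "$src" "$dataset"
}

# Example, on a host with the Hadoop and Kite CLIs installed:
#   hdfs dfs -put file.csv /user/me/file.csv
#   import_csv hdfs:/user/me/file.csv dataset:hive:default/events
```

With this guard, calling import_csv with a bare file.csv fails immediately instead of falling back to the local file and the local runner.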
