Commit

Update API links in MapReduce.md
SvenGroot committed Nov 14, 2023
1 parent 0c6cf9a commit 10e1265
Showing 3 changed files with 28 additions and 19 deletions.
2 changes: 1 addition & 1 deletion doc/UserGuide/JobExecution.md
@@ -81,7 +81,7 @@ Next, let's look at some of [Jumbo's features](DfsFeatures.md) in more detail.
[`IJobRunner`]: https://www.ookii.org/docs/jumbo-2.0/html/T_Ookii_Jumbo_Jet_Jobs_IJobRunner.htm
[`ITask<TInput, TOutput>.Run`]: https://www.ookii.org/docs/jumbo-2.0/html/T_Ookii_Jumbo_Jet_ITask_2.htm
[`ITask<TInput, TOutput>`]: https://www.ookii.org/docs/jumbo-2.0/html/T_Ookii_Jumbo_Jet_ITask_2.htm
[`JetClient.JobServer.CreateJob`]: https://www.ookii.org/docs/jumbo-2.0/html/!UNKNOWN!.htm
[`JetClient.JobServer.CreateJob`]: https://www.ookii.org/docs/jumbo-2.0/html/M_Ookii_Jumbo_Jet_IJobServerClientProtocol_CreateJob.htm
[`JetClient.RunJob`]: https://www.ookii.org/docs/jumbo-2.0/html/Overload_Ookii_Jumbo_Jet_JetClient_RunJob.htm
[`JetClient.WaitForJobCompletion`]: https://www.ookii.org/docs/jumbo-2.0/html/Overload_Ookii_Jumbo_Jet_JetClient_WaitForJobCompletion.htm
[`JobBuilder`]: https://www.ookii.org/docs/jumbo-2.0/html/T_Ookii_Jumbo_Jet_Jobs_Builder_JobBuilder.htm
38 changes: 21 additions & 17 deletions doc/UserGuide/MapReduce.md
@@ -4,10 +4,10 @@ The WordCount job we [created in the tutorial](Tutorial1.md) uses some features
part of regular MapReduce, like hash table aggregation. There are other features too, like the
ability to have jobs with more than two stages, that don't fit neatly into MapReduce.

It is however possible to create a normal MapReduce job. Essentially, all that’s needed is a
two-stage job where the first stage runs a map function, the second stage runs a reduce function,
and the channel between them sorts the data by key. The [`JobBuilder`](https://www.ookii.org/docs/jumbo-2.0/html/T_Ookii_Jumbo_Jet_Jobs_Builder_JobBuilder.htm)
provides methods for all these operations.
It is however possible to create a normal MapReduce job. Essentially, all that’s needed is a two-stage
job where the first stage runs a map function, the second stage runs a reduce function, and the
channel between them sorts the data by key. The [`JobBuilder`][] provides methods for all these
operations.

For example, to convert the word count sample to MapReduce, all we need to do is replace the
`AggregateCounts` function with a reduce function:
@@ -20,9 +20,9 @@ public static void ReduceWordCount(Utf8String key, IEnumerable<int> values, Reco
```

This is pretty much exactly like the reduce function you’d write in Hadoop (except for convenience
I used the `Sum` method provided by LINQ in .NET, rather than summing the values manually).
I used the [`Sum`][] method provided by LINQ in .NET, rather than summing the values manually).

The `BuildJob` function for the MapReduce version would look as follows:
The [`BuildJob`][] function for the MapReduce version would look as follows:

```C#
var input = job.Read(InputPath, typeof(LineRecordReader));
@@ -33,19 +33,23 @@ WriteOutput(counted, OutputPath, typeof(TextRecordWriter<>));
```

It starts off the same as the previous version: reads the input, and runs the map function. It then
calls [`SpillSortCombine`](https://www.ookii.org/docs/jumbo-2.0/html/M_Ookii_Jumbo_Jet_Jobs_Builder_JobBuilder_SpillSortCombine__2.htm),
which performs an external merge sort using multiple passes on the sending stage’s side, and merging
the data on the receiving stage’s side. This is identical to the sorting method used by Hadoop 1.0,
and can handle very large amounts of data without putting too much pressure on memory usage. Like
Hadoop, it’s possible to run a combiner during the sort, for which in this case we use the
`ReduceWordCount` function. Note that `SpillSortCombine` doesn’t add an extra stage, but configures
the channel to perform the sorting operation.

After sorting, we call the [`Reduce`](https://www.ookii.org/docs/jumbo-2.0/html/M_Ookii_Jumbo_Jet_Jobs_Builder_JobBuilder_Reduce__3.htm)
function to add a stage that runs the `ReduceWordCount` function, and finally we write the output as
in the previous sample.
calls [`SpillSortCombine`][], which performs an external merge sort using multiple passes on the
sending stage’s side, and merging the data on the receiving stage’s side. This is identical to the
sorting method used by Hadoop 1.0, and can handle very large amounts of data without putting too
much pressure on memory usage. Like Hadoop, it’s possible to run a combiner during the sort, for
which in this case we use the `ReduceWordCount` function. Note that [`SpillSortCombine`][] doesn’t
add an extra stage, but configures the channel to perform the sorting operation.

After sorting, we call the [`Reduce`][] function to add a stage that runs the `ReduceWordCount`
function, and finally we write the output as in the previous sample.
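
The spill, sort, combine and merge steps described above can be sketched in plain C#. This is only an illustration of the general technique, not Jumbo's implementation: it uses no Jumbo types, in-memory lists stand in for the spill files a real external sort would write to disk, each spill is sorted in a single pass, and the names `SpillSortCombineSketch`, `SumCounts`, `Spill` and `MergeAndReduce` are hypothetical. It assumes .NET 6 or later for `PriorityQueue`.

```C#
using System;
using System.Collections.Generic;
using System.Linq;

public static class SpillSortCombineSketch
{
    // Hypothetical stand-in for a word count reduce function: sums the values of one key.
    // It is used both as the combiner (sending side) and as the reducer (receiving side).
    public static int SumCounts(string key, IEnumerable<int> values) => values.Sum();

    // Sending side: collect records in a buffer; whenever the buffer is full, sort it by key,
    // combine records with equal keys, and emit the result as one sorted "spill".
    public static List<List<KeyValuePair<string, int>>> Spill(
        IEnumerable<KeyValuePair<string, int>> records, int spillSize)
    {
        var spills = new List<List<KeyValuePair<string, int>>>();
        var buffer = new List<KeyValuePair<string, int>>();
        foreach (var record in records)
        {
            buffer.Add(record);
            if (buffer.Count == spillSize)
            {
                spills.Add(SortAndCombine(buffer));
                buffer = new List<KeyValuePair<string, int>>();
            }
        }

        if (buffer.Count > 0)
            spills.Add(SortAndCombine(buffer));

        return spills;
    }

    private static List<KeyValuePair<string, int>> SortAndCombine(
        List<KeyValuePair<string, int>> buffer)
    {
        return buffer
            .GroupBy(r => r.Key, StringComparer.Ordinal)
            .OrderBy(g => g.Key, StringComparer.Ordinal)
            .Select(g => new KeyValuePair<string, int>(
                g.Key, SumCounts(g.Key, g.Select(r => r.Value))))
            .ToList();
    }

    // Receiving side: k-way merge of the sorted spills, reducing each run of equal keys.
    public static IEnumerable<KeyValuePair<string, int>> MergeAndReduce(
        IReadOnlyList<List<KeyValuePair<string, int>>> spills)
    {
        var positions = new int[spills.Count];
        // Queue element: index of a spill; priority: the key of that spill's next record.
        var queue = new PriorityQueue<int, string>(StringComparer.Ordinal);
        for (int i = 0; i < spills.Count; i++)
        {
            if (spills[i].Count > 0)
                queue.Enqueue(i, spills[i][0].Key);
        }

        bool hasKey = false;
        string currentKey = "";
        var values = new List<int>();
        while (queue.TryDequeue(out int spill, out string key))
        {
            if (hasKey && !string.Equals(key, currentKey, StringComparison.Ordinal))
            {
                // The key changed, so all values for the previous key have been seen.
                yield return new KeyValuePair<string, int>(currentKey, SumCounts(currentKey, values));
                values.Clear();
            }

            hasKey = true;
            currentKey = key;
            values.Add(spills[spill][positions[spill]].Value);
            positions[spill]++;
            if (positions[spill] < spills[spill].Count)
                queue.Enqueue(spill, spills[spill][positions[spill]].Key);
        }

        if (hasKey)
            yield return new KeyValuePair<string, int>(currentKey, SumCounts(currentKey, values));
    }
}
```

Because the combiner already sums counts within each spill, the merge step handles at most one record per key per spill, which is the reason for running a combiner during the sort.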

The end result is a job that runs almost exactly like a Hadoop job would. Which in the case of this
WordCount example is likely slower than the version we built earlier.

Next, it's time to look at [how Jumbo Jet executes jobs](JobExecution.md).

[`BuildJob`]: https://www.ookii.org/docs/jumbo-2.0/html/M_Ookii_Jumbo_Jet_Jobs_Builder_JobBuilderJob_BuildJob.htm
[`JobBuilder`]: https://www.ookii.org/docs/jumbo-2.0/html/T_Ookii_Jumbo_Jet_Jobs_Builder_JobBuilder.htm
[`Reduce`]: https://www.ookii.org/docs/jumbo-2.0/html/Overload_Ookii_Jumbo_Jet_Jobs_Builder_JobBuilder_Reduce.htm
[`SpillSortCombine`]: https://www.ookii.org/docs/jumbo-2.0/html/Overload_Ookii_Jumbo_Jet_Jobs_Builder_JobBuilder_SpillSortCombine.htm
[`Sum`]: https://learn.microsoft.com/dotnet/api/system.linq.enumerable.sum
7 changes: 6 additions & 1 deletion doc/refs.json
@@ -3,6 +3,7 @@
"#prefix": "https://www.ookii.org/docs/jumbo-2.0/html/",
"#suffix": ".htm",
"AddInput": "M_Ookii_Jumbo_IO_IMultiInputRecordReader_AddInput",
"AggregateCounts": "M_Ookii_Jumbo_Jet_Samples_AdvancedWordCount_AggregateCounts",
"AllowRecordReuseAttribute": "T_Ookii_Jumbo_Jet_AllowRecordReuseAttribute",
"ApplyJobPropertiesAndSettings": "M_Ookii_Jumbo_Jet_Jobs_BaseJobRunner_ApplyJobPropertiesAndSettings",
"AssignAdditionalPartitions": "M_Ookii_Jumbo_IO_IMultiInputRecordReader_AssignAdditionalPartitions",
@@ -48,7 +49,7 @@
"ITask<TInput, TOutput>.Run": "T_Ookii_Jumbo_Jet_ITask_2",
"IValueWriter<T>": "T_Ookii_Jumbo_IO_IValueWriter_1",
"IWritable": "T_Ookii_Jumbo_IO_IWritable",
"JetClient.JobServer.CreateJob": "!UNKNOWN!",
"JetClient.JobServer.CreateJob": "M_Ookii_Jumbo_Jet_IJobServerClientProtocol_CreateJob",
"JetClient.RunJob": "Overload_Ookii_Jumbo_Jet_JetClient_RunJob",
"JetClient.WaitForJobCompletion": "Overload_Ookii_Jumbo_Jet_JetClient_WaitForJobCompletion",
"JobBuilder": "T_Ookii_Jumbo_Jet_Jobs_Builder_JobBuilder",
@@ -89,16 +90,20 @@
"RecordStreamOptions.DoNotCrossBoundary": "T_Ookii_Jumbo_IO_RecordStreamOptions",
"RecordWriter": "T_Ookii_Jumbo_IO_RecordWriter",
"RecordWriter<T>": "T_Ookii_Jumbo_IO_RecordWriter_1",
"Reduce": "Overload_Ookii_Jumbo_Jet_Jobs_Builder_JobBuilder_Reduce",
"ReduceWordCount": "M_Ookii_Jumbo_Jet_Samples_WordCount_ReduceWordCount",
"RoundRobinMultiInputRecordReader<T>": "T_Ookii_Jumbo_IO_RoundRobinMultiInputRecordReader_1",
"Run": "M_Ookii_Jumbo_Jet_ITask_2_Run",
"RunJob": "M_Ookii_Jumbo_Jet_Jobs_IJobRunner_RunJob",
"SByte": "#system.sbyte",
"Single": "#system.single",
"SortSpillRecordWriter": "T_Ookii_Jumbo_Jet_Channels_SortSpillRecordWriter_1",
"SpillSort": "M_Ookii_Jumbo_Jet_Jobs_Builder_JobBuilder_SpillSort",
"SpillSortCombine": "Overload_Ookii_Jumbo_Jet_Jobs_Builder_JobBuilder_SpillSortCombine",
"StreamRecordReader<T>": "T_Ookii_Jumbo_IO_StreamRecordReader_1",
"StreamRecordWriter<T>": "T_Ookii_Jumbo_IO_StreamRecordWriter_1",
"String": "#system.string",
"Sum": "#system.linq.enumerable.sum",
"System.Boolean": "#system.boolean",
"System.Int64": "#system.int64",
"System.IO.Stream": "#system.io.stream",
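
Judging from the link definitions in this commit, an entry in `refs.json` appears to expand to a URL in one of two ways: a plain value is wrapped in `#prefix` and `#suffix`, while a value starting with `#` points at the .NET API documentation on Microsoft Learn. The following is a minimal sketch of that expansion under those assumptions. The tool that actually processes `refs.json` is not part of this diff, so `RefResolver`, `Resolve` and `ToLinkDefinition` are hypothetical names, and the Microsoft Learn root is inferred only from the `Sum` entry.

```C#
using System;
using System.Collections.Generic;

public static class RefResolver
{
    // Assumption: a leading "#" expands to this root, as the Sum link definition suggests.
    private const string DotNetApiRoot = "https://learn.microsoft.com/dotnet/api/";

    // Expands the refs.json value for a name into a full URL.
    public static string Resolve(IReadOnlyDictionary<string, string> refs, string name)
    {
        string target = refs[name];
        if (target.StartsWith("#", StringComparison.Ordinal))
            return DotNetApiRoot + target.Substring(1);

        return refs["#prefix"] + target + refs["#suffix"];
    }

    // Formats a Markdown reference-style link definition like those at the end of MapReduce.md.
    public static string ToLinkDefinition(IReadOnlyDictionary<string, string> refs, string name)
    {
        return "[`" + name + "`]: " + Resolve(refs, name);
    }
}
```

Under this reading, the placeholder value `!UNKNOWN!` expands to the `!UNKNOWN!.htm` link that this commit replaces in JobExecution.md.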
