|author||Andrew McDermott <firstname.lastname@example.org>||2014-02-12 16:59:54 +0000|
|committer||Andrew McDermott <email@example.com>||2014-02-12 16:59:54 +0000|
Signed-off-by: Andrew McDermott <firstname.lastname@example.org>
Diffstat (limited to 'README')
1 files changed, 61 insertions, 0 deletions
diff --git a/README b/README
new file mode 100644
@@ -0,0 +1,61 @@
+This directory contains a pre-built version of Hadoop for
+demonstrating OpenJDK-8 on aarch64 systems. This build of Hadoop
+deliberately contains no native code.
+To set up the environment, source the env.sh script:
+ $ . env.sh
+You can verify that the installation is complete by checking that
+hadoop is on your PATH and reports its version:
+ $ which hadoop
+ $ hadoop version
+The goal of TeraSort is to sort a large amount of data as fast as
+possible. The example comprises the following steps:
+ 1) Generating the input data via teragen
+ 2) Running the actual terasort on the input data
+ 3) Validating the sorted output data via teravalidate
+These steps map to the following shell scripts:
+ $ teragen <n-gigabytes> <output-filename>
+ $ terasort <input-filename> <output-filename>
+ $ teravalidate <input-filename> <output-filename>
+ $ teragen 1 teragen-1GB
+ $ terasort teragen-1GB terasort-1GB-sorted
+ $ teravalidate terasort-1GB-sorted terasort-1GB-validated
+ aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
+ aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
+ dbcount: An example job that count the pageview counts from a database.
+ grep: A map/reduce program that counts the matches of a regex in the input.
+ join: A job that effects a join over sorted, equally partitioned datasets
+ multifilewc: A job that counts words from several files.
+ pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
+ pi: A map/reduce program that estimates Pi using monte-carlo method.
+ randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
+ randomwriter: A map/reduce program that writes 10GB of random data per node.
+ secondarysort: An example defining a secondary sort to the reduce.
+ sleep: A job that sleeps at each map and reduce task.
+ sort: A map/reduce program that sorts the data written by the random writer.
+ sudoku: A sudoku solver.
+ teragen: Generate data for the terasort
+ terasort: Run the terasort
+ teravalidate: Checking results of terasort
+ wordcount: A map/reduce program that counts the words in the input files.
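As a sketch only, and not part of this commit: the three TeraSort steps described in the README can be chained in one shell function so that a failing step stops the run. The function name run_terasort_demo and the -input, -sorted and -validated suffixes are hypothetical. The teragen, terasort and teravalidate scripts are assumed to be on the PATH after sourcing env.sh, with the argument order shown in the README.

```shell
# Hypothetical helper: run generate, sort and validate in sequence.
# Usage: run_terasort_demo <n-gigabytes> <name-prefix>
run_terasort_demo() {
    if [ "$#" -ne 2 ]; then
        echo "usage: run_terasort_demo <n-gigabytes> <name-prefix>" >&2
        return 2
    fi
    size_gb="$1"
    prefix="$2"

    # Each step runs only if the previous one succeeded (&& chaining).
    teragen "$size_gb" "${prefix}-input" &&
        terasort "${prefix}-input" "${prefix}-sorted" &&
        teravalidate "${prefix}-sorted" "${prefix}-validated"
}
```

With the environment set up, calling run_terasort_demo 1 demo would generate demo-input, sort it into demo-sorted, and write the validation result to demo-validated. Chaining with && keeps the sketch short; a longer script might prefer set -e and explicit error messages.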