Diffstat (limited to 'README')
-rw-r--r--  README  61
1 files changed, 61 insertions, 0 deletions
diff --git a/README b/README
new file mode 100644
index 0000000..916aa48
--- /dev/null
+++ b/README
@@ -0,0 +1,61 @@
+This directory contains a pre-built version of Hadoop for
+demonstrating OpenJDK-8 on aarch64 systems. This build of Hadoop
+deliberately contains no native code.
+
+Setup
+=====
+
+To set up the environment, please source the env.sh script:
+
+ $ . env.sh
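+
+At minimum, the script needs to put the bundled Hadoop binaries on your
+PATH. A minimal sketch along these lines would be enough, assuming the
+Hadoop tree is unpacked next to the script and OpenJDK-8 lives in the
+usual Debian/Ubuntu location for aarch64 (the bundled env.sh is
+authoritative and may do more than this):
+
+ $ # Illustrative only -- these paths are assumptions, not the shipped env.sh.
+ $ export HADOOP_PREFIX="$PWD/hadoop"
+ $ export PATH="$HADOOP_PREFIX/bin:$PATH"
+ $ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-arm64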
+
+You can check that the installation is complete by verifying that
+hadoop is on your PATH:
+
+ $ which hadoop
+ $ hadoop version
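+
+Because this build deliberately contains no native code, Hadoop's
+checknative subcommand (available in Hadoop 2.x and later) is a useful
+extra check: with a pure-Java build, every native library should be
+reported as unavailable.
+
+ $ hadoop checknative -a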
+
+TeraSort Demo
+=============
+
+The goal of TeraSort is to sort a large amount of data as fast as
+possible. The example comprises the following steps:
+
+ 1) Generating the input data via teragen
+ 2) Running the actual terasort on the input data
+ 3) Validating the sorted output data via teravalidate
+
+Those discrete steps map to the following shell scripts:
+
+ $ teragen <n-gigabytes> <output-filename>
+ $ terasort <input-filename> <output-filename>
+ $ teravalidate <input-filename> <output-filename>
+
+For example:
+
+ $ teragen 1 teragen-1GB
+ $ terasort teragen-1GB terasort-1GB-sorted
+ $ teravalidate terasort-1GB-sorted terasort-1GB-validated
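+
+The helper scripts are thin wrappers around the stock TeraSort programs
+shipped in Hadoop's examples jar, so you can also drive Hadoop directly.
+The invocations below are a sketch: the jar location assumes a standard
+Hadoop 2.x layout, and the HADOOP_PREFIX variable comes from the setup
+sketch above. Note that teragen itself takes a row count rather than a
+size; rows are 100 bytes each, so 1 GB corresponds to 10,000,000 rows.
+
+ $ EXAMPLES=$(echo "$HADOOP_PREFIX"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar)
+ $ hadoop jar "$EXAMPLES" teragen 10000000 teragen-1GB
+ $ hadoop jar "$EXAMPLES" terasort teragen-1GB terasort-1GB-sorted
+ $ hadoop jar "$EXAMPLES" teravalidate terasort-1GB-sorted terasort-1GB-validated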
+
+Available Demos
+===============
+
+  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
+  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
+  dbcount: An example job that counts the pageview counts from a database.
+  grep: A map/reduce program that counts the matches of a regex in the input.
+  join: A job that effects a join over sorted, equally partitioned datasets.
+  multifilewc: A job that counts words from several files.
+  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
+  pi: A map/reduce program that estimates Pi using a Monte Carlo method.
+  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
+  randomwriter: A map/reduce program that writes 10GB of random data per node.
+  secondarysort: An example defining a secondary sort to the reduce.
+  sleep: A job that sleeps at each map and reduce task.
+  sort: A map/reduce program that sorts the data written by the random writer.
+  sudoku: A sudoku solver.
+  teragen: Generates data for the terasort.
+  terasort: Runs the terasort.
+  teravalidate: Checks the results of the terasort.
+  wordcount: A map/reduce program that counts the words in the input files.
+
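+These are the example programs bundled in Hadoop's examples jar, so each
+of them can also be launched directly with hadoop jar. The pi estimator
+makes a convenient smoke test because it needs no input data; the jar
+path below assumes the same standard layout as the TeraSort sketch above:
+
+ $ hadoop jar "$HADOOP_PREFIX"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 10 100000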