Philip (flip) Kromer edited this page May 9, 2012 · 1 revision

Mapper

Interface Mapper<K1,V1,K2,V2>

void       	~ map(key, val, clctr, rptr)          	| Maps a single input key/value pair into an intermediate key/value pair.
void       	~ configure(job_conf)                 	| Initializes a new instance from a JobConf.
void       	~ close()                             	| Closes this stream and releases any system resources associated with it.          
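
The configure / map / close lifecycle above can be sketched in plain Ruby. This is only an illustration, not Hadoop's Java API: `WordCountMapper`, `ArrayCollector` and `run_map` are hypothetical names, the job conf is assumed to be a plain Hash, and the reporter argument is ignored.

```ruby
# Hypothetical in-memory collector: remembers every pair it is handed.
class ArrayCollector
  attr_reader :pairs

  def initialize
    @pairs = []
  end

  def collect(key, value)
    @pairs << [key, value]
  end
end

# Hypothetical word-count mapper following the configure/map/close lifecycle.
class WordCountMapper
  # Called once before any map call; job_conf is a plain Hash in this sketch.
  def configure(job_conf)
    @separator = job_conf.fetch(:separator, /\s+/)
  end

  # Turns one input pair into zero or more intermediate pairs.
  def map(_key, val, clctr, _rptr = nil)
    val.split(@separator).each do |word|
      clctr.collect(word, 1) unless word.empty?
    end
  end

  # Called once after the last map call; nothing to release here.
  def close; end
end

# Drives a mapper over [key, value] inputs and returns the collected pairs.
def run_map(mapper, inputs, job_conf = {})
  clctr = ArrayCollector.new
  mapper.configure(job_conf)
  inputs.each { |key, val| mapper.map(key, val, clctr) }
  mapper.close
  clctr.pairs
end
```

Calling `run_map(WordCountMapper.new, [[0, "to be"]])` returns `[["to", 1], ["be", 1]]`.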

Reducer

Interface Reducer<K2,V2,K3,V3>

void       	~ reduce(key, vals_iter, clctr, rptr) 	| Reduces values for a given key.
void       	~ configure(job_conf)                 	| Initializes a new instance from a JobConf.
void       	~ close()                             	| Closes this stream and releases any system resources associated with it.          
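
A matching reduce step might look like the following Ruby sketch. It assumes the framework hands each key to `reduce` once, together with an iterator over that key's values; `SumReducer`, `HashCollector` and `run_reduce` are hypothetical names, and the grouping helper only imitates the shuffle in memory.

```ruby
# Hypothetical collector that keeps one final value per key.
class HashCollector
  attr_reader :results

  def initialize
    @results = {}
  end

  def collect(key, value)
    @results[key] = value
  end
end

# Hypothetical reducer: sums every value seen for a key.
class SumReducer
  def configure(_job_conf); end

  # vals_iter is any Enumerable/Enumerator over the values for `key`.
  def reduce(key, vals_iter, clctr, _rptr = nil)
    clctr.collect(key, vals_iter.inject(0) { |sum, val| sum + val })
  end

  def close; end
end

# Imitates the shuffle: group pairs by key, then call reduce once per key
# in sorted key order.
def run_reduce(reducer, pairs, job_conf = {})
  clctr = HashCollector.new
  reducer.configure(job_conf)
  grouped = pairs.group_by(&:first)
  grouped.keys.sort.each do |key|
    reducer.reduce(key, grouped[key].map(&:last).each, clctr)
  end
  reducer.close
  clctr.results
end
```

With the pairs `[["to", 1], ["be", 1], ["to", 1]]`, `run_reduce(SumReducer.new, pairs)` returns `{"be" => 1, "to" => 2}`.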

OutputCollector

void       	~ collect(K key, V value)             	| Adds a key/value pair to the output.

WritableComparable

int        	~ compareTo(WritableComparable o)     	| Compares this object with the specified object for order. Returns a negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object.
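
The negative / zero / positive contract maps directly onto Ruby's `<=>`. A minimal sketch, assuming a hypothetical `IntKey` wrapper class that is not part of Hadoop:

```ruby
# Hypothetical key type; compare_to mirrors the compareTo contract.
class IntKey
  include Comparable
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Negative, zero, or positive as self is less than, equal to,
  # or greater than other.
  def compare_to(other)
    value <=> other.value
  end

  # Comparable and Array#sort rely on <=>, so alias it to compare_to.
  alias_method :<=>, :compare_to
end
```

Because the framework sorts keys with this comparison, `[IntKey.new(3), IntKey.new(1)].sort.map(&:value)` gives `[1, 3]`.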

Stringifier

Stringifier interface offers two methods to convert an object to a string representation and restore the object given its string representation.

Object     	~ fromString(str)                     	| Restores the object from its string representation.
String     	~ toString(T obj)                     	| Converts the object to a string representation.
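
The round trip between object and string can be sketched in Ruby as below. `MarshalStringifier` is a hypothetical class; it assumes Ruby's `Marshal` for serialization and base64 text for the string form, which is an illustrative choice rather than what any Hadoop implementation does. `Marshal.load` should only be used on trusted input.

```ruby
# Hypothetical stringifier: object -> base64 text -> object.
class MarshalStringifier
  # Serialize with Marshal, then base64-encode without line breaks ("m0").
  def to_string(obj)
    [Marshal.dump(obj)].pack("m0")
  end

  # Decode the base64 text and rebuild the object.
  def from_string(str)
    Marshal.load(str.unpack("m0").first)
  end
end
```

For any marshalable object, `from_string(to_string(obj))` should equal `obj`.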

Counter

A named counter that tracks the progress of a map/reduce job. Counters represent global counters, defined either by the Map-Reduce framework or by applications. Each Counter is named by an Enum and has a long for the value. Counters are bunched into Groups, each comprising counters from a particular Enum class.

String     	~ getDisplayName()                    	| Get the display name of the counter.
String     	~ getName()                    
long       	~ getValue()                          	| What is the current value of this counter?
void       	~ increment(long)                     	| Increment this counter by the given value.
void       	~ setValue(long)                      	| Set this counter to the given value.
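
The counter-and-group arrangement can be modeled in a few lines of Ruby. This is a sketch under stated assumptions: `SketchCounter` and `CounterGroups` are hypothetical names, groups and counters are keyed by strings rather than Java Enums, and nothing here aggregates across tasks the way the framework does.

```ruby
# Hypothetical counter: a name, a display name, and an integer value.
class SketchCounter
  attr_reader :name, :display_name, :value

  def initialize(name, display_name = name)
    @name         = name.to_s
    @display_name = display_name.to_s
    @value        = 0
  end

  def increment(amount)
    @value += amount
  end

  def set_value(value)
    @value = value
  end
end

# Hypothetical registry: group name => counter name => counter,
# creating groups and counters on first use.
class CounterGroups
  def initialize
    @groups = Hash.new do |groups, group|
      groups[group] = Hash.new { |ctrs, name| ctrs[name] = SketchCounter.new(name) }
    end
  end

  def get_counter(group, name)
    @groups[group.to_s][name.to_s]
  end

  def incr_counter(group, name, amount)
    get_counter(group, name).increment(amount)
  end
end
```

A task would typically call something like `incr_counter("Records", "MALFORMED", 1)` each time it skips a bad record.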

RecordReader

RecordReader reads <key, value> pairs from an InputSplit. RecordReader, typically, converts the byte-oriented view of the input, provided by the InputSplit, and presents a record-oriented view for the Mapper & Reducer tasks for processing. It thus assumes the responsibility of processing record boundaries and presenting the tasks with keys and values. See Also: InputSplit, InputFormat

void       	~ close()                             	| Close this InputSplit to future operations.
K          	~ createKey()                         	| Create an object of the appropriate type to be used as a key.
V          	~ createValue()                       	| Create an object of the appropriate type to be used as a value.
long       	~ getPos()                            	| Returns the current position in the input.
float      	~ getProgress()                       	| How much of the input the RecordReader has consumed, as a fraction from 0.0 to 1.0.
boolean    	~ next(K key, V value)                	| Reads the next key/value pair from the input for processing.

Class RecordReader<KEYIN,VALUEIN> (newer org.apache.hadoop.mapreduce API)

void       	~ close()                             	| Close the record reader.
KEYIN      	~ getCurrentKey()                     	| Get the current key
VALUEIN    	~ getCurrentValue()                   	| Get the current value.
float      	~ getProgress()                       	| The current progress of the record reader through its data.
void       	~ initialize(input_split, context)    	| Called once at initialization.
boolean    	~ nextKeyValue()                      	| Read the next key, value pair.
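
To make "processing record boundaries" concrete, here is a Ruby sketch of a line-oriented reader over one byte range of a larger input. `SketchLineReader` and `read_split` are hypothetical names. The boundary rule is an assumption chosen for the sketch: a split that does not start at byte 0 skips its first (possibly partial) line, and every split keeps reading while a line starts at or before its end, so each line is read by exactly one split.

```ruby
require "stringio"

# Hypothetical reader: presents a byte range as (byte offset, line) records.
class SketchLineReader
  attr_reader :current_key, :current_value

  # data: the whole input; start/length: this split's byte range.
  def initialize(data, start, length)
    @io     = StringIO.new(data)
    @start  = start
    @stop   = start + length
    @io.pos = start
    # The previous split's reader owns the line that straddles `start`.
    @io.gets unless start.zero?
  end

  # Advance to the next record; false once the split is exhausted.
  def next_key_value
    return false if @io.eof? || @io.pos > @stop
    @current_key   = @io.pos
    @current_value = @io.gets.chomp
    true
  end

  # Fraction of the split consumed, capped at 1.0.
  def progress
    return 1.0 if @stop == @start
    [(@io.pos - @start).to_f / (@stop - @start), 1.0].min
  end

  def close
    @io.close
  end
end

# Reads every record of one split into an array of [key, value] pairs.
def read_split(data, start, length)
  reader = SketchLineReader.new(data, start, length)
  pairs = []
  pairs << [reader.current_key, reader.current_value] while reader.next_key_value
  reader.close
  pairs
end
```

For the 23-byte input `"alpha\nbeta\ngamma\ndelta\n"`, the split `(0, 8)` yields `[[0, "alpha"], [6, "beta"]]` and the split `(8, 15)` yields `[[11, "gamma"], [17, "delta"]]`, even though byte 8 falls in the middle of `beta`.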

ResetableIterator

void       	~ add(T item)                         	| Add an element to the collection of elements to iterate over.
void       	~ clear()                             	| Close datasources, but do not release internal resources.
void       	~ close()                             	| Close datasources and release resources.
boolean    	~ hasNext()                           	| True if a call to next may return a value.
boolean    	~ next(T val)                         	| Assign next value to actual.
boolean    	~ replay(T val)                       	| Assign last value returned to actual.
void       	~ reset()                             	| Set iterator to return to the start of its range.

Reporter

Counter    	~ getCounter(Enum<?> name)            	| Get the Counters.Counter identified by the given Enum.
Counter    	~ getCounter(group, name)             	| Get the Counters.Counter of the given group with the given name.
InputSplit 	~ getInputSplit()                     	| Get the InputSplit object for a map.
float      	~ getProgress()                       	| Get the progress of the task.
void       	~ incrCounter(key, amount)            	| Increments the counter identified by the key, which can be of any Enum type, by the specified amount.
void       	~ incrCounter(group, counter, amount) 	| Increments the counter identified by the group and counter name by the specified amount.
void       	~ setStatus(String status)            	| Set the status description for the task.
void       	~ progress()                          	| Report progress to the Hadoop framework.

MetricsTag

String     	~ description() 
boolean    	~ equals(Object obj) 
int        	~ hashCode() 
MetricsInfo	~ info() 
String     	~ name() 
String     	~ toString() 
String     	~ value()                             	| Get the value of the tag

Container

Container represents an allocated resource in the cluster. The ResourceManager is the sole authority to allocate any Container to applications. The allocated Container is always on a single node and has a unique ContainerId. It has a specific amount of Resource allocated. It includes details such as:

  • ContainerId for the container, which is globally unique.

  • NodeId of the node on which it is allocated.

  • HTTP uri of the node.

  • Resource allocated to the container.

  • Priority at which the container was allocated.

  • ContainerState of the container.

  • ContainerToken of the container, used to securely verify authenticity of the allocation.

  • ContainerStatus of the container.

CtnrStatus 	~ getContainerStatus()                	| Get the ContainerStatus of the container.
CtnrToken  	~ getContainerToken()                 	| Get the ContainerToken for the container.
CtnrId     	~ getId()                             	| Get the globally unique identifier for the container.
String     	~ getNodeHttpAddress()                	| Get the http uri of the node on which the container is allocated.
NodeId     	~ getNodeId()                         	| Get the identifier of the node on which the container is allocated.
Priority   	~ getPriority()                       	| Get the Priority at which the Container was allocated.
Resource   	~ getResource()                       	| Get the Resource allocated to the container.
CtnrState  	~ getState()                          	| Get the current ContainerState of the container.
String     	~ getDiagnostics()                    	| Get diagnostic messages for failed containers.
int        	~ getExitStatus()                     	| Get the exit status for the container.

Writable

A serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput. Any key or value type in the Hadoop Map-Reduce framework implements this interface.

Implementations typically implement a static read(DataInput) method which constructs a new instance, calls readFields(DataInput) and returns the instance.

class MyWritable < Writable
  field :counter,     Integer
  field :timestamp,   Long

  # create a new instance, consume contents, return the populated writable
  def self.read(raw)
    record = new
    record.read_fields(raw)
    record
  end
  
  # Serialize the fields of this object to `out`.
  def write(out) 
    out << counter << timestamp
  end
  
  # Deserialize the fields of this object from `raw`.
  # For efficiency, implementations should attempt to re-use storage in the existing object where possible.
  def read_fields(raw)
    self.counter   = raw.get_int
    self.timestamp = raw.get_long
  end
end
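
The outline above leans on helpers (`field`, `get_int`, `get_long`) that are not defined here. A self-contained variant that runs in plain Ruby might look like this. It assumes a fixed binary layout chosen for the sketch: a 32-bit then a 64-bit signed big-endian integer, packed with `Array#pack` and read from any IO-like object such as `StringIO`. The `round_trip` helper is hypothetical.

```ruby
require "stringio"

# Runnable variant of the outline: two integer fields, 12 bytes on the wire.
class MyWritable
  LAYOUT = "l>q>" # 32-bit int, then 64-bit int, both signed big-endian
  SIZE   = 12     # 4 + 8 bytes

  attr_accessor :counter, :timestamp

  # Create a new instance, consume its contents from `raw`, return it.
  def self.read(raw)
    record = new
    record.read_fields(raw)
    record
  end

  # Serialize the fields of this object to `out`.
  def write(out)
    out.write([counter, timestamp].pack(LAYOUT))
  end

  # Deserialize the fields of this object from `raw`, reusing this instance.
  def read_fields(raw)
    self.counter, self.timestamp = raw.read(SIZE).unpack(LAYOUT)
  end
end

# Hypothetical helper: write to an in-memory binary buffer and read it back.
def round_trip(writable)
  buffer = StringIO.new(String.new)
  writable.write(buffer)
  buffer.rewind
  MyWritable.read(buffer)
end
```

Reusing one instance and calling `read_fields` repeatedly, rather than calling `read` for every record, is what the note about re-using storage is pointing at.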

FilenameTemplate

Event data escape sequences

  • %{host} -- host
  • %{nanos} -- nanos
  • %{priority} -- priority string
  • %{body} -- body
  • %% -- a % character.
  • %t -- Unix time in millis
  • %a -- locale’s short weekday name (Mon, Tue, …)
  • %A -- locale’s full weekday name (Monday, Tuesday, …)
  • %b -- locale’s short month name (Jan, Feb,…)
  • %B -- locale’s long month name (January, February,…)
  • %c -- locale’s date and time (Thu Mar 3 23:05:25 2005)
  • %d -- day of month (01)
  • %D -- date; same as %m/%d/%y
  • %H -- hour (00..23)
  • %I -- hour (01..12)
  • %j -- day of year (001..366)
  • %k -- hour ( 0..23)
  • %l -- hour ( 1..12)
  • %m -- month (01..12)
  • %M -- minute (00..59)
  • %P -- locale’s equivalent of am or pm
  • %s -- seconds since 1970-01-01 00:00:00 UTC
  • %S -- second (00..60)
  • %y -- last two digits of year (00..99)
  • %Y -- year (2010)
  • %z -- +hhmm numeric timezone (for example, -0400)
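
As an illustration of how such a template could be expanded, the Ruby sketch below substitutes `%{...}` fields from an event and hands the date escapes to `Time#strftime`. `expand_template` is a hypothetical helper, not the real implementation, and the event is assumed to be a Hash with symbol keys. Two differences are handled explicitly: `%t` means Unix time in millis here (in `strftime` it is a tab), and Ruby's `strftime` always uses English day and month names rather than the locale's.

```ruby
# Hypothetical expander for the escape sequences listed above.
# event: Hash with symbol keys such as :host or :body; time: a Time.
def expand_template(template, event, time)
  template.gsub(/%\{(\w+)\}|%(.)/) do
    field  = Regexp.last_match(1)
    escape = Regexp.last_match(2)
    if field
      event.fetch(field.to_sym).to_s            # %{host}, %{body}, ...
    elsif escape == "%"
      "%"                                       # %% is a literal percent
    elsif escape == "t"
      (time.to_i * 1000 + time.usec / 1000).to_s # Unix time in millis
    else
      time.strftime("%#{escape}")               # %Y, %m, %d, %H, ...
    end
  end
end
```

For an event with `host: "web1"` at `Time.at(1336521600).utc`, the template `"logs/%{host}/%Y-%m-%d/%H"` expands to `"logs/web1/2012-05-09/00"`.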