Package org.apache.orc.mapred


This package provides convenient access to ORC files using Hadoop's MapReduce InputFormat and OutputFormat.

For reading, set the InputFormat to OrcInputFormat and your map function will receive one OrcStruct object per row. (ORC files may use any type as the root object, not only a struct; in that case the value will be the Writable type that corresponds to the root type.)

The mapping of types is:

ORC Type     Writable Type
array        OrcList
binary       BytesWritable
bigint       LongWritable
boolean      BooleanWritable
char         Text
date         DateWritable
decimal      HiveDecimalWritable
double       DoubleWritable
float        FloatWritable
int          IntWritable
map          OrcMap
smallint     ShortWritable
string       Text
struct       OrcStruct
timestamp    OrcTimestamp
tinyint      ByteWritable
uniontype    OrcUnion
varchar      Text
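
As an illustration only, the mapping above can be expressed as a small lookup in plain Java. The class name OrcWritableTypes and the method writableFor are hypothetical helpers invented for this sketch, not part of the ORC API; the sketch only returns the simple class names listed in the table and assumes type parameters such as "(10,2)" or "<int>" can be ignored when looking up the category.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of ORC): looks up the Writable class name
// that the table above lists for a given ORC type.
final class OrcWritableTypes {
    private static final Map<String, String> WRITABLE_BY_ORC_TYPE = new HashMap<>();

    static {
        WRITABLE_BY_ORC_TYPE.put("array", "OrcList");
        WRITABLE_BY_ORC_TYPE.put("binary", "BytesWritable");
        WRITABLE_BY_ORC_TYPE.put("bigint", "LongWritable");
        WRITABLE_BY_ORC_TYPE.put("boolean", "BooleanWritable");
        WRITABLE_BY_ORC_TYPE.put("char", "Text");
        WRITABLE_BY_ORC_TYPE.put("date", "DateWritable");
        WRITABLE_BY_ORC_TYPE.put("decimal", "HiveDecimalWritable");
        WRITABLE_BY_ORC_TYPE.put("double", "DoubleWritable");
        WRITABLE_BY_ORC_TYPE.put("float", "FloatWritable");
        WRITABLE_BY_ORC_TYPE.put("int", "IntWritable");
        WRITABLE_BY_ORC_TYPE.put("map", "OrcMap");
        WRITABLE_BY_ORC_TYPE.put("smallint", "ShortWritable");
        WRITABLE_BY_ORC_TYPE.put("string", "Text");
        WRITABLE_BY_ORC_TYPE.put("struct", "OrcStruct");
        WRITABLE_BY_ORC_TYPE.put("timestamp", "OrcTimestamp");
        WRITABLE_BY_ORC_TYPE.put("tinyint", "ByteWritable");
        WRITABLE_BY_ORC_TYPE.put("uniontype", "OrcUnion");
        WRITABLE_BY_ORC_TYPE.put("varchar", "Text");
    }

    private OrcWritableTypes() {
    }

    // Accepts a bare category ("int") or a parameterized type string
    // ("decimal(10,2)", "array<int>") and returns the Writable class name.
    static String writableFor(String orcType) {
        String type = orcType.trim().toLowerCase(Locale.ROOT);
        int cut = type.length();
        for (int i = 0; i < type.length(); i++) {
            char c = type.charAt(i);
            if (c == '(' || c == '<') {
                cut = i; // everything from here on is a type parameter
                break;
            }
        }
        String category = type.substring(0, cut).trim();
        String writable = WRITABLE_BY_ORC_TYPE.get(category);
        if (writable == null) {
            throw new IllegalArgumentException("Unknown ORC type: " + orcType);
        }
        return writable;
    }
}
```

For example, under these assumptions writableFor("struct<x:int,y:string>") returns "OrcStruct", which is the value type a map function would see for a file with that root type.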

For writing, set the OutputFormat to OrcOutputFormat and define the property "orc.schema" in your configuration. The property sets the type of the file as a Hive type string, such as "struct<x:int,y:string,z:timestamp>" for a row with an integer, a string, and a timestamp. You can create an example object of that type using:


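import org.apache.orc.TypeDescription;
import org.apache.orc.mapred.OrcStruct;
// Build an empty row whose fields match the schema string.
String typeStr = "struct<x:int,y:string,z:timestamp>";
OrcStruct row = (OrcStruct) OrcStruct.createValue(
    TypeDescription.fromString(typeStr));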

Please look at the OrcConf class for the configuration knobs that are available.
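
The reading and writing setup described above might be wired into a job configuration roughly as follows. This is a minimal configuration sketch, assuming the older org.apache.hadoop.mapred API with Hadoop and the ORC mapred module on the classpath. The class name OrcJobSetup, the method configure, and the directory arguments are hypothetical, the mapper and reducer classes are left out, and the property name is the one given in this document.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.orc.mapred.OrcInputFormat;
import org.apache.orc.mapred.OrcOutputFormat;
import org.apache.orc.mapred.OrcStruct;

// Hypothetical helper: configures a job that reads and writes ORC files.
public class OrcJobSetup {
    public static JobConf configure(String inputDir, String outputDir) {
        JobConf conf = new JobConf();

        // Reading: each map call receives one value per ORC row.
        conf.setInputFormat(OrcInputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(inputDir));

        // Writing: the schema property tells the writer the file's type.
        conf.setOutputFormat(OrcOutputFormat.class);
        conf.set("orc.schema", "struct<x:int,y:string,z:timestamp>");
        conf.setOutputKeyClass(NullWritable.class);
        conf.setOutputValueClass(OrcStruct.class);
        FileOutputFormat.setOutputPath(conf, new Path(outputDir));

        return conf;
    }
}
```

The output key is set to NullWritable in this sketch because the rows themselves are carried in the value; whether further settings are needed depends on the job, and OrcConf lists the available options.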