Package org.apache.orc.mapred
This package provides convenient access to ORC files using Hadoop's MapReduce InputFormat and OutputFormat.
For reading, set the InputFormat to OrcInputFormat; your map function then receives one OrcStruct object per row. (ORC files may use any type as the root object rather than a struct, in which case the value is the corresponding Writable type from the table below.)
The mapping of types is:
ORC Type | Writable Type |
---|---|
array | OrcList |
binary | BytesWritable |
bigint | LongWritable |
boolean | BooleanWritable |
char | Text |
date | DateWritable |
decimal | HiveDecimalWritable |
double | DoubleWritable |
float | FloatWritable |
int | IntWritable |
map | OrcMap |
smallint | ShortWritable |
string | Text |
struct | OrcStruct |
timestamp | OrcTimestamp |
tinyint | ByteWritable |
uniontype | OrcUnion |
varchar | Text |
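The table can also be read as a lookup keyed on the category of the type. The following is a minimal sketch in plain Java with no ORC dependency. The class name `OrcWritableTypes` and the method `writableFor` are hypothetical, and stripping `<...>` or `(...)` parameters to find the category is an assumption made for illustration, not how ORC itself parses type strings.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of ORC: mirrors the mapping table as a lookup.
public class OrcWritableTypes {
    private static final Map<String, String> WRITABLES = new HashMap<>();
    static {
        WRITABLES.put("array", "OrcList");
        WRITABLES.put("binary", "BytesWritable");
        WRITABLES.put("bigint", "LongWritable");
        WRITABLES.put("boolean", "BooleanWritable");
        WRITABLES.put("char", "Text");
        WRITABLES.put("date", "DateWritable");
        WRITABLES.put("decimal", "HiveDecimalWritable");
        WRITABLES.put("double", "DoubleWritable");
        WRITABLES.put("float", "FloatWritable");
        WRITABLES.put("int", "IntWritable");
        WRITABLES.put("map", "OrcMap");
        WRITABLES.put("smallint", "ShortWritable");
        WRITABLES.put("string", "Text");
        WRITABLES.put("struct", "OrcStruct");
        WRITABLES.put("timestamp", "OrcTimestamp");
        WRITABLES.put("tinyint", "ByteWritable");
        WRITABLES.put("uniontype", "OrcUnion");
        WRITABLES.put("varchar", "Text");
    }

    // Returns the Writable class name for an ORC type string, or null if unknown.
    // Assumption: the category is everything before the first '<' or '('.
    public static String writableFor(String orcType) {
        String t = orcType.trim().toLowerCase(Locale.ROOT);
        int cut = t.length();
        int angle = t.indexOf('<');
        int paren = t.indexOf('(');
        if (angle >= 0) cut = Math.min(cut, angle);
        if (paren >= 0) cut = Math.min(cut, paren);
        return WRITABLES.get(t.substring(0, cut).trim());
    }

    public static void main(String[] args) {
        System.out.println(writableFor("struct<x:int,y:string>"));
        System.out.println(writableFor("decimal(10,2)"));
        System.out.println(writableFor("bigint"));
    }
}
```

Note that char, string, and varchar all map to Text, so the Writable type alone does not identify the ORC type.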
For writing, set the OutputFormat to OrcOutputFormat and define the property "orc.schema" in your configuration. The property value is the schema of the file, written as a Hive type string such as "struct<x:int,y:string,z:timestamp>" for a row with an integer, a string, and a timestamp. You can create an example object using:
String typeStr = "struct<x:int,y:string,z:timestamp>";
OrcStruct row = (OrcStruct) OrcStruct.createValue(
    TypeDescription.fromString(typeStr));
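When the column list is only known at run time, the type string can be assembled rather than hard-coded. The following is a minimal sketch in plain Java with no ORC dependency. The class `SchemaStrings` and the method `structOf` are hypothetical names introduced for illustration, and the sketch assumes field names need no quoting or escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of ORC: composes a Hive struct type string.
public class SchemaStrings {
    // Builds "struct<name:type,...>" from fields in the map's iteration order.
    // Assumption: field names are simple identifiers that need no escaping.
    public static String structOf(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner(",", "struct<", ">");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            joiner.add(field.getKey() + ":" + field.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, which becomes the field order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("x", "int");
        fields.put("y", "string");
        fields.put("z", "timestamp");
        System.out.println(structOf(fields)); // prints struct<x:int,y:string,z:timestamp>
    }
}
```

The resulting string can be passed to TypeDescription.fromString in the same way as the literal in the snippet above.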
Please look at the OrcConf class for the configuration knobs that are available.
Class | Description |
---|---|
OrcInputFormat<V extends WritableComparable> | A MapReduce/Hive input format for ORC files. |
OrcKey | This type provides a wrapper for OrcStruct so that it can be sent through the MapReduce shuffle as a key. |
OrcList<E extends WritableComparable> | An ArrayList implementation that implements Writable. |
OrcMap<K extends WritableComparable,V extends WritableComparable> | A TreeMap implementation that implements Writable. |
OrcMapredRecordReader<V extends WritableComparable> | This record reader implements the org.apache.hadoop.mapred API. |
OrcMapredRecordWriter<V extends Writable> | |
OrcOutputFormat<V extends Writable> | An ORC output format that satisfies the org.apache.hadoop.mapred API. |
OrcStruct | |
OrcTimestamp | A Timestamp implementation that implements Writable. |
OrcUnion | An in-memory representation of a union type. |
OrcValue | This type provides a wrapper for OrcStruct so that it can be sent through the MapReduce shuffle as a value. |