org.apache.orc.mapred (ORC MapReduce 2.1.3 API)

package org.apache.orc.mapred

This package provides convenient access to ORC files using Hadoop's MapReduce InputFormat and OutputFormat.

For reading, set the InputFormat to OrcInputFormat; your map function will then receive one OrcStruct object per row. (ORC files may have any type as the root object, not only a struct. In that case, the value is the Writable type that corresponds to the root type.)
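As a minimal sketch of the reading side, the mapper below assumes each row is a struct with an int field x and a string field y. The field names, the class name OrcRowMapper, the helper configureInput, and the choice to emit (y, x) pairs are all illustrative assumptions. Because this package follows the org.apache.hadoop.mapred API, the map key is a NullWritable and the value is the row.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.orc.mapred.OrcInputFormat;
import org.apache.orc.mapred.OrcStruct;

/** Hypothetical mapper: emits (y, x) for each row of a struct<x:int,y:string> file. */
class OrcRowMapper extends MapReduceBase
    implements Mapper<NullWritable, OrcStruct, Text, IntWritable> {

  @Override
  public void map(NullWritable key, OrcStruct row,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    // Field values are returned as the Writable type for each ORC type.
    IntWritable x = (IntWritable) row.getFieldValue("x");
    Text y = (Text) row.getFieldValue("y");
    // ORC fields may be null; skip rows that lack either value.
    if (x != null && y != null) {
      output.collect(y, x);
    }
  }

  /** Configures a job to read ORC rows from inputDir (a placeholder path). */
  static JobConf configureInput(JobConf conf, String inputDir) {
    conf.setInputFormat(OrcInputFormat.class);
    conf.setMapperClass(OrcRowMapper.class);
    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(IntWritable.class);
    FileInputFormat.setInputPaths(conf, new Path(inputDir));
    return conf;
  }
}
```

A driver would call OrcRowMapper.configureInput(conf, "/path/to/orc") before submitting the job; the path is only a placeholder.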

The mapping of types is:

ORC Type	Writable Type
array	OrcList
binary	BytesWritable
bigint	LongWritable
boolean	BooleanWritable
char	Text
date	DateWritable
decimal	HiveDecimalWritable
double	DoubleWritable
float	FloatWritable
int	IntWritable
map	OrcMap
smallint	ShortWritable
string	Text
struct	OrcStruct
timestamp	OrcTimestamp
tinyint	ByteWritable
uniontype	OrcUnion
varchar	Text
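
To illustrate how the compound types in this mapping behave, the sketch below builds a row with an array and a map field and fills them through the ordinary java.util collection methods. The schema, the field names, and the class name OrcNestedTypesSketch are assumptions chosen for the example.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.orc.TypeDescription;
import org.apache.orc.mapred.OrcList;
import org.apache.orc.mapred.OrcMap;
import org.apache.orc.mapred.OrcStruct;

/** Hypothetical helper showing the Writable types behind array and map columns. */
class OrcNestedTypesSketch {
  static final String SCHEMA = "struct<tags:array<string>,counts:map<string,int>>";

  @SuppressWarnings("unchecked")
  static OrcStruct makeRow() {
    OrcStruct row = (OrcStruct) OrcStruct.createValue(
        TypeDescription.fromString(SCHEMA));

    // array<string> is represented by OrcList, an ArrayList of Text here.
    OrcList<Text> tags = (OrcList<Text>) row.getFieldValue("tags");
    tags.add(new Text("red"));
    tags.add(new Text("blue"));

    // map<string,int> is represented by OrcMap, a TreeMap from Text to IntWritable here.
    OrcMap<Text, IntWritable> counts =
        (OrcMap<Text, IntWritable>) row.getFieldValue("counts");
    counts.put(new Text("red"), new IntWritable(3));

    return row;
  }
}
```

The casts are unchecked because getFieldValue returns a plain WritableComparable; the element types must match the schema that was used to create the row.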

For writing, set the OutputFormat to OrcOutputFormat and define the property "orc.mapred.output.schema" (OrcConf.MAPRED_OUTPUT_SCHEMA) in your configuration. The property defines the type of the file as a Hive type string, such as "struct<x:int,y:string,z:timestamp>" for a row with an integer, a string, and a timestamp. You can create an example object using:


import org.apache.orc.TypeDescription;
import org.apache.orc.mapred.OrcStruct;

String typeStr = "struct<x:int,y:string,z:timestamp>";
OrcStruct row = (OrcStruct) OrcStruct.createValue(
    TypeDescription.fromString(typeStr));

See the OrcConf class for the available configuration options.
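
The writing setup described above might be sketched as follows. The class name OrcWriteSketch, its two helper methods, and the output path are hypothetical. The schema is set through OrcConf.MAPRED_OUTPUT_SCHEMA rather than a literal property name, on the assumption that this is the entry OrcOutputFormat consults.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.orc.OrcConf;
import org.apache.orc.TypeDescription;
import org.apache.orc.mapred.OrcOutputFormat;
import org.apache.orc.mapred.OrcStruct;
import org.apache.orc.mapred.OrcTimestamp;

/** Hypothetical helpers for writing struct<x:int,y:string,z:timestamp> rows. */
class OrcWriteSketch {
  static final String SCHEMA = "struct<x:int,y:string,z:timestamp>";

  /** Configures a job to write rows of SCHEMA to outputDir (a placeholder path). */
  static JobConf configureOutput(JobConf conf, String outputDir) {
    conf.setOutputFormat(OrcOutputFormat.class);
    // The mapred OrcOutputFormat writes NullWritable keys and row values.
    conf.setOutputKeyClass(NullWritable.class);
    conf.setOutputValueClass(OrcStruct.class);
    // Stores the schema string under the key defined by this OrcConf entry.
    OrcConf.MAPRED_OUTPUT_SCHEMA.setString(conf, SCHEMA);
    FileOutputFormat.setOutputPath(conf, new Path(outputDir));
    return conf;
  }

  /** Builds one row; createValue allocates a Writable for every field. */
  static OrcStruct makeRow(int x, String y, long epochMillis) {
    OrcStruct row = (OrcStruct) OrcStruct.createValue(
        TypeDescription.fromString(SCHEMA));
    ((IntWritable) row.getFieldValue("x")).set(x);
    ((Text) row.getFieldValue("y")).set(y);
    // OrcTimestamp extends java.sql.Timestamp, so setTime takes epoch milliseconds.
    ((OrcTimestamp) row.getFieldValue("z")).setTime(epochMillis);
    return row;
  }
}
```

A reducer or map-only task would then pass NullWritable.get() as the key and a row such as OrcWriteSketch.makeRow(1, "a", 0L) as the value to its OutputCollector.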

Classes

Class	Description
OrcInputFormat<V extends WritableComparable>	A MapReduce/Hive input format for ORC files.
OrcKey	This type provides a wrapper for OrcStruct so that it can be sent through the MapReduce shuffle as a key.
OrcList<E extends WritableComparable>	An ArrayList implementation that implements Writable.
OrcMap<K extends WritableComparable,V extends WritableComparable>	A TreeMap implementation that implements Writable.
OrcMapredRecordReader<V extends WritableComparable>	This record reader implements the org.apache.hadoop.mapred API.
OrcMapredRecordWriter<V extends Writable>
OrcOutputFormat<V extends Writable>	An ORC output format that satisfies the org.apache.hadoop.mapred API.
OrcStruct
OrcTimestamp	A Timestamp implementation that implements Writable.
OrcUnion	An in-memory representation of a union type.
OrcValue	This type provides a wrapper for OrcStruct so that it can be sent through the MapReduce shuffle as a value.
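
As a hedged illustration of the OrcKey and OrcValue wrappers, the sketch below configures a job whose map output passes ORC structs through the shuffle. The class name OrcShuffleSketch, the two schemas, and the helper methods are assumptions. It relies on OrcConf.MAPRED_SHUFFLE_KEY_SCHEMA and OrcConf.MAPRED_SHUFFLE_VALUE_SCHEMA to tell the wrappers which types to deserialize on the reduce side.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.orc.OrcConf;
import org.apache.orc.TypeDescription;
import org.apache.orc.mapred.OrcKey;
import org.apache.orc.mapred.OrcStruct;
import org.apache.orc.mapred.OrcValue;

/** Hypothetical helpers for shuffling ORC structs between map and reduce. */
class OrcShuffleSketch {
  static final String KEY_SCHEMA = "struct<y:string>";
  static final String VALUE_SCHEMA = "struct<x:int>";

  /** Declares the wrapper classes and the schemas they carry. */
  static JobConf configureShuffle(JobConf conf) {
    conf.setMapOutputKeyClass(OrcKey.class);
    conf.setMapOutputValueClass(OrcValue.class);
    OrcConf.MAPRED_SHUFFLE_KEY_SCHEMA.setString(conf, KEY_SCHEMA);
    OrcConf.MAPRED_SHUFFLE_VALUE_SCHEMA.setString(conf, VALUE_SCHEMA);
    return conf;
  }

  /** Wraps a string in a one-field struct and then in an OrcKey. */
  static OrcKey wrapKey(String y) {
    OrcStruct struct = (OrcStruct) OrcStruct.createValue(
        TypeDescription.fromString(KEY_SCHEMA));
    ((Text) struct.getFieldValue("y")).set(y);
    return new OrcKey(struct);
  }

  /** Wraps an int in a one-field struct and then in an OrcValue. */
  static OrcValue wrapValue(int x) {
    OrcStruct struct = (OrcStruct) OrcStruct.createValue(
        TypeDescription.fromString(VALUE_SCHEMA));
    ((IntWritable) struct.getFieldValue("x")).set(x);
    return new OrcValue(struct);
  }
}
```

A mapper would emit OrcShuffleSketch.wrapKey(...) and OrcShuffleSketch.wrapValue(...) pairs, and the reducer would read the wrapped structs back from the public key and value fields of the wrappers.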
