java.lang.Object

org.apache.hadoop.mapred.FileInputFormat<NullWritable,V>

org.apache.orc.mapred.OrcInputFormat<V>

All Implemented Interfaces: InputFormat<NullWritable,V>

public class OrcInputFormat<V extends WritableComparable> extends FileInputFormat<NullWritable,V>

A MapReduce/Hive input format for ORC files.

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat
FileInputFormat.Counter

Field Summary

Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LOG, NUM_INPUT_FILES

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| OrcInputFormat() | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static Reader.Options | buildOptions(Configuration conf, Reader reader, long start, long length) | Build the Reader.Options object based on the JobConf and the range of bytes. |
| RecordReader<NullWritable,V> | getRecordReader(InputSplit inputSplit, JobConf conf, Reporter reporter) | |
| protected FileStatus[] | listStatus(JobConf job) | Filter out zero-byte files so that no splits are generated for empty ORC files. |
| static boolean[] | parseInclude(TypeDescription schema, String columnsStr) | Convert a comma-separated list of column ids into a boolean array that matches the schema. |
| static void | setSearchArgument(Configuration conf, org.apache.hadoop.hive.ql.io.sarg.SearchArgument sarg, String[] columnNames) | Put the given SearchArgument into the configuration for an OrcInputFormat. |

Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, getSplits, isSplitable, makeSplit, makeSplit, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- OrcInputFormat
  
  public OrcInputFormat()
Method Details
- parseInclude
  
  public static boolean[] parseInclude(TypeDescription schema, String columnsStr)
  
  Convert a comma-separated list of column ids into a boolean array that matches the schema.
  
  Parameters:
  
  schema - the schema for the reader
  
  columnsStr - the comma-separated list of column ids
  
  Returns:
  
  a boolean array marking which columns of the schema to include
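  
  As a rough illustration of this conversion, the sketch below models it for a flat struct in plain Java. It is not the ORC implementation: `parseIncludeFlat` and `fieldCount` are hypothetical names, and the sketch assumes that each id is the zero-based position of a top-level field and that slot 0 of the result stands for the root struct. Nested types, which would occupy additional slots, are ignored.
  
  ```java
  import java.util.Arrays;

  public class IncludeSketch {
      // Hypothetical stand-in for parseInclude, limited to a flat struct
      // with fieldCount primitive fields. Assumption: slot 0 is the root
      // struct and field i occupies slot i + 1.
      static boolean[] parseIncludeFlat(int fieldCount, String columnsStr) {
          boolean[] include = new boolean[fieldCount + 1];
          include[0] = true; // the root struct is always marked
          if (columnsStr == null || columnsStr.trim().isEmpty()) {
              return include; // sketch's choice: no ids listed, only the root is marked
          }
          for (String id : columnsStr.split(",")) {
              include[Integer.parseInt(id.trim()) + 1] = true;
          }
          return include;
      }

      public static void main(String[] args) {
          // Select fields 0 and 2 of a three-field struct.
          System.out.println(Arrays.toString(parseIncludeFlat(3, "0,2")));
          // prints [true, true, false, true]
      }
  }
  ```
  
  With the real method, the schema argument determines the array length, so a schema with nested types would presumably yield a longer array than this flat model does.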
- setSearchArgument
  
  public static void setSearchArgument(Configuration conf, org.apache.hadoop.hive.ql.io.sarg.SearchArgument sarg, String[] columnNames)
  
  Put the given SearchArgument into the configuration for an OrcInputFormat.
  
  Parameters:
  
  conf - the configuration to modify
  
  sarg - the SearchArgument to put in the configuration
  
  columnNames - the list of column names for the SearchArgument
- buildOptions
  
  public static Reader.Options buildOptions(Configuration conf, Reader reader, long start, long length)
  
  Build the Reader.Options object based on the JobConf and the range of bytes.
  
  Parameters:
  
  conf - the job configuration
  
  reader - the file footer reader
  
  start - the byte offset at which to start reading
  
  length - the number of bytes to read
  
  Returns:
  
  the options to read with
- getRecordReader
  
  public RecordReader<NullWritable,V> getRecordReader(InputSplit inputSplit, JobConf conf, Reporter reporter) throws IOException
  
  Specified by:
  
  getRecordReader in interface InputFormat<NullWritable,V extends WritableComparable>
  
  Specified by:
  
  getRecordReader in class FileInputFormat<NullWritable,V extends WritableComparable>
  
  Throws:
  
  IOException
- listStatus
  
  protected FileStatus[] listStatus(JobConf job) throws IOException
  
  Filter out zero-byte files so that no splits are generated for empty ORC files.
  
  Overrides:
  
  listStatus in class FileInputFormat<NullWritable,V extends WritableComparable>
  
  Parameters:
  
  job - the job configuration
  
  Returns:
  
  a list of files that need to be read
  
  Throws:
  
  IOException
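  
  The filtering step in listStatus can be pictured with the plain-Java sketch below. It does not use the Hadoop API: `FileEntry`, `dropEmpty`, and the sample paths are hypothetical stand-ins for FileStatus and the listing returned by the parent class, chosen so the example runs without Hadoop on the classpath.
  
  ```java
  import java.util.ArrayList;
  import java.util.List;

  public class EmptyFileFilterSketch {
      // Hypothetical stand-in for a listing entry: a path plus its length in bytes.
      static final class FileEntry {
          final String path;
          final long length;

          FileEntry(String path, long length) {
              this.path = path;
              this.length = length;
          }
      }

      // Keep only entries with a non-zero length, preserving their order.
      static List<FileEntry> dropEmpty(List<FileEntry> listing) {
          List<FileEntry> kept = new ArrayList<>(listing.size());
          for (FileEntry entry : listing) {
              if (entry.length != 0) {
                  kept.add(entry);
              }
          }
          return kept;
      }

      public static void main(String[] args) {
          List<FileEntry> listing = new ArrayList<>();
          listing.add(new FileEntry("part-00000.orc", 4096L));
          listing.add(new FileEntry("part-00001.orc", 0L)); // empty file, dropped
          listing.add(new FileEntry("part-00002.orc", 128L));
          for (FileEntry entry : dropEmpty(listing)) {
              System.out.println(entry.path + " " + entry.length);
          }
          // prints "part-00000.orc 4096" and then "part-00002.orc 128"
      }
  }
  ```
  
  Dropping such files before split generation means no split, and therefore no reader, is ever created for a file that has no content to read.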
