Package org.apache.orc.mapred
Class OrcInputFormat<V extends WritableComparable>
All Implemented Interfaces:
InputFormat<NullWritable, V>
A MapReduce/Hive input format for ORC files.
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat
FileInputFormat.Counter
Field Summary
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LOG, NUM_INPUT_FILES
Constructor Summary
OrcInputFormat()
Method Summary
Modifier and Type | Method | Description
static Reader.Options | buildOptions(Configuration conf, Reader reader, long start, long length) | Build the Reader.Options object based on the JobConf and the range of bytes.
RecordReader<NullWritable,V> | getRecordReader(InputSplit inputSplit, JobConf conf, Reporter reporter) |
protected FileStatus[] | listStatus(JobConf job) | Filter out 0-byte files so that no splits are generated for empty ORC files.
static boolean[] | parseInclude(TypeDescription schema, String columnsStr) | Convert a comma-separated list of column ids into a boolean array that matches the schema.
static void | setSearchArgument(Configuration conf, org.apache.hadoop.hive.ql.io.sarg.SearchArgument sarg, String[] columnNames) | Put the given SearchArgument into the configuration for an OrcInputFormat.

Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, getSplits, isSplitable, makeSplit, makeSplit, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
Constructor Details
OrcInputFormat
public OrcInputFormat()
Method Details
parseInclude
public static boolean[] parseInclude(TypeDescription schema, String columnsStr)
Convert a comma-separated list of column ids into a boolean array that matches the schema.
Parameters:
schema - the schema for the reader
columnsStr - the comma-separated list of column ids
Returns:
a boolean array
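The conversion can be sketched with the JDK alone. This is a minimal, hypothetical model rather than the ORC implementation: it assumes ORC's pre-order type-id numbering, in which the root struct has id 0 and each top-level field occupies a contiguous block of ids (the field plus any nested children), and it assumes the ids in `columnsStr` are indexes of top-level fields. The real method takes a `TypeDescription`; here the schema is reduced to a hypothetical `fieldSizes` array giving the number of type ids each top-level field spans, and `IncludeSketch` is an illustrative name only.

```java
import java.util.Arrays;

final class IncludeSketch {
    // Hypothetical stand-in for parseInclude: fieldSizes[i] is the number of
    // type ids spanned by top-level field i (1 for a primitive column, more
    // for nested types). Type id 0 is the root struct.
    static boolean[] parseInclude(int[] fieldSizes, String columnsStr) {
        if (columnsStr == null) {
            return null; // assumption: null means "no projection, read everything"
        }
        // First type id of each top-level field, in pre-order.
        int[] firstId = new int[fieldSizes.length];
        int nextId = 1;
        for (int i = 0; i < fieldSizes.length; i++) {
            firstId[i] = nextId;
            nextId += fieldSizes[i];
        }
        boolean[] include = new boolean[nextId];
        include[0] = true; // the root struct is always included
        if (columnsStr.trim().isEmpty()) {
            return include;
        }
        for (String idString : columnsStr.split(",")) {
            int field = Integer.parseInt(idString.trim());
            // Mark the field and every nested child type it contains.
            Arrays.fill(include, firstId[field], firstId[field] + fieldSizes[field], true);
        }
        return include;
    }

    public static void main(String[] args) {
        // struct<a:int,b:struct<x:int,y:string>,c:string>
        // type ids: 0=root, 1=a, 2=b, 3=b.x, 4=b.y, 5=c
        int[] fieldSizes = {1, 3, 1};
        // prints [true, false, true, true, true, true]
        System.out.println(Arrays.toString(parseInclude(fieldSizes, "1,2")));
    }
}
```

Selecting top-level field 1 (`b`) pulls in its nested children `b.x` and `b.y`, which is why the sketch marks whole id ranges rather than single positions.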
setSearchArgument
public static void setSearchArgument(Configuration conf, org.apache.hadoop.hive.ql.io.sarg.SearchArgument sarg, String[] columnNames)
Put the given SearchArgument into the configuration for an OrcInputFormat.
Parameters:
conf - the configuration to modify
sarg - the SearchArgument to put in the configuration
columnNames - the list of column names for the SearchArgument
buildOptions
public static Reader.Options buildOptions(Configuration conf, Reader reader, long start, long length)
Build the Reader.Options object based on the JobConf and the range of bytes.
Parameters:
conf - the job configuration
reader - the file footer reader
start - the byte offset at which to start reading
length - the number of bytes to read
Returns:
the options to read with
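The `start` and `length` parameters describe a split's byte range within the file. As far as we know, ORC's record reader interprets such a range at stripe granularity: a stripe is read by the split whose range contains the stripe's first byte, so every stripe is read exactly once even when split boundaries fall in the middle of a stripe. The JDK-only sketch below illustrates that selection rule under this assumption; `StripeRangeSketch`, `selectStripes`, and the stripe offsets are hypothetical, and the sketch does not reproduce what `buildOptions` does with the configuration.

```java
import java.util.ArrayList;
import java.util.List;

final class StripeRangeSketch {
    // Returns the indexes of stripes whose first byte falls in [start, start + length).
    // stripeOffsets is a hypothetical list of stripe start offsets in file order.
    static List<Integer> selectStripes(long[] stripeOffsets, long start, long length) {
        List<Integer> selected = new ArrayList<>();
        long end = start + length;
        for (int i = 0; i < stripeOffsets.length; i++) {
            long offset = stripeOffsets[i];
            if (offset >= start && offset < end) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long[] stripeOffsets = {3, 1000, 2000, 3000};
        // Two splits that cut the file at byte 1500, inside stripe 1.
        System.out.println(selectStripes(stripeOffsets, 0, 1500));    // [0, 1]
        System.out.println(selectStripes(stripeOffsets, 1500, 2000)); // [2, 3]
    }
}
```

Stripe 1 starts at byte 1000, inside the first split, so only the first split reads it even though its bytes extend past 1500.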
getRecordReader
public RecordReader<NullWritable,V> getRecordReader(InputSplit inputSplit, JobConf conf, Reporter reporter) throws IOException
Specified by:
getRecordReader in interface InputFormat<NullWritable, V extends WritableComparable>
Specified by:
getRecordReader in class FileInputFormat<NullWritable, V extends WritableComparable>
Throws:
IOException
listStatus
protected FileStatus[] listStatus(JobConf job) throws IOException
Filter out 0-byte files so that no splits are generated for empty ORC files.
Overrides:
listStatus in class FileInputFormat<NullWritable, V extends WritableComparable>
Parameters:
job - the job configuration
Returns:
a list of files that need to be read
Throws:
IOException
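The real method filters the Hadoop `FileStatus` entries produced by `FileInputFormat.listStatus`. The JDK-only sketch below shows the same filtering idea on a local directory using `java.nio.file`. `NonEmptyFileLister` is a hypothetical name, and unlike the Hadoop version it ignores the recursion and input-path-filter settings of `FileInputFormat` and only looks at files directly inside one directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class NonEmptyFileLister {
    // Lists the regular files directly inside dir, skipping 0-byte files so
    // that a split planner never generates splits for them.
    static List<Path> listNonEmpty(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                if (Files.isRegularFile(p) && Files.size(p) > 0) {
                    result.add(p);
                }
            }
        }
        result.sort(null); // natural Path ordering, for a deterministic result
        return result;
    }
}
```

For a directory holding an empty `empty.orc` and a 3-byte `data.orc`, `listNonEmpty` returns only the path to `data.orc`.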