Class OrcInputFormat<V extends WritableComparable>

java.lang.Object
org.apache.hadoop.mapred.FileInputFormat<NullWritable,V>
org.apache.orc.mapred.OrcInputFormat<V>
All Implemented Interfaces:
InputFormat<NullWritable,V>

public class OrcInputFormat<V extends WritableComparable> extends FileInputFormat<NullWritable,V>
A MapReduce/Hive input format for ORC files.
  • Constructor Details

    • OrcInputFormat

      public OrcInputFormat()
  • Method Details

    • parseInclude

      public static boolean[] parseInclude(TypeDescription schema, String columnsStr)
      Convert a comma-separated list of column ids into a boolean array that marks which columns of the schema to include.
      Parameters:
      schema - the schema for the reader
      columnsStr - the comma-separated list of column ids
      Returns:
      a boolean array
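
      The conversion described above can be illustrated with a small, library-free sketch. It is not the ORC implementation: the class IncludeMaskSketch and the method includeMask are hypothetical names, and the sketch assumes a flat schema in which index 0 of the result stands for the root struct and top-level field i maps to index i + 1. Nested types, which would span a range of indexes, are not modeled.

```java
import java.util.Arrays;

/** Simplified stand-in for the include-mask idea; NOT the ORC implementation. */
final class IncludeMaskSketch {

    /**
     * Builds an include mask for a flat schema with fieldCount top-level fields.
     * Assumption: index 0 is the root struct and top-level field i is index i + 1.
     */
    static boolean[] includeMask(int fieldCount, String columnsStr) {
        if (columnsStr == null) {
            return null; // assumption: no list supplied means no mask at all
        }
        boolean[] mask = new boolean[fieldCount + 1];
        mask[0] = true; // the root struct is always included
        if (columnsStr.trim().isEmpty()) {
            return mask; // empty list: only the root is marked
        }
        for (String id : columnsStr.split(",")) {
            int field = Integer.parseInt(id.trim());
            mask[field + 1] = true; // shift by one to skip the root slot
        }
        return mask;
    }

    public static void main(String[] args) {
        // Four top-level fields, selecting fields 0 and 2.
        System.out.println(Arrays.toString(includeMask(4, "0,2")));
        // prints [true, true, false, true, false]
    }
}
```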
    • setSearchArgument

      public static void setSearchArgument(Configuration conf, org.apache.hadoop.hive.ql.io.sarg.SearchArgument sarg, String[] columnNames)
      Put the given SearchArgument into the configuration for an OrcInputFormat.
      Parameters:
      conf - the configuration to modify
      sarg - the SearchArgument to put in the configuration
      columnNames - the list of column names for the SearchArgument
    • buildOptions

      public static Reader.Options buildOptions(Configuration conf, Reader reader, long start, long length)
      Build the Reader.Options object from the job configuration and the byte range to read.
      Parameters:
      conf - the job configuration
      reader - the file footer reader
      start - the byte offset at which to start reading
      length - the number of bytes to read
      length - the number of bytes to read
      Returns:
      the options to read with
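
      How a start/length pair can narrow what a reader processes may be easier to see in a library-free sketch. The sketch below is an illustration only, under the assumption that a unit of data (for ORC, a stripe) belongs to the range that contains its first byte, so that adjacent ranges never read the same unit twice. The class SplitRangeSketch, the method stripesInRange, and the offsets are hypothetical and do not come from the ORC API.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrates byte-range selection; NOT the ORC implementation. */
final class SplitRangeSketch {

    /**
     * Returns the indexes of the stripes whose starting offset lies in
     * [start, start + length). Assumption: a stripe is owned by the range
     * that contains its first byte.
     */
    static List<Integer> stripesInRange(long[] stripeOffsets, long start, long length) {
        // Guard against overflow when length is very large (e.g. Long.MAX_VALUE).
        long end = (Long.MAX_VALUE - start < length) ? Long.MAX_VALUE : start + length;
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < stripeOffsets.length; i++) {
            if (stripeOffsets[i] >= start && stripeOffsets[i] < end) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long[] offsets = {3L, 1000L, 2000L, 3000L}; // made-up stripe offsets
        System.out.println(stripesInRange(offsets, 0L, 1500L));    // prints [0, 1]
        System.out.println(stripesInRange(offsets, 1500L, 2500L)); // prints [2, 3]
    }
}
```

      With two adjacent ranges, each stripe is picked up exactly once, which is the property a split-based input format relies on.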
    • getRecordReader

      public RecordReader<NullWritable,V> getRecordReader(InputSplit inputSplit, JobConf conf, Reporter reporter) throws IOException
      Specified by:
      getRecordReader in interface InputFormat<NullWritable,V extends WritableComparable>
      Specified by:
      getRecordReader in class FileInputFormat<NullWritable,V extends WritableComparable>
      Throws:
      IOException
    • listStatus

      protected FileStatus[] listStatus(JobConf job) throws IOException
      Filter out zero-byte files so that no splits are generated for empty ORC files.
      Overrides:
      listStatus in class FileInputFormat<NullWritable,V extends WritableComparable>
      Parameters:
      job - the job configuration
      Returns:
      a list of files that need to be read
      Throws:
      IOException
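
      The filtering step can be sketched without Hadoop types. In this minimal illustration a map from path to length stands in for the FileStatus array; the class EmptyFileFilterSketch, the method nonEmptyFiles, and the file names are hypothetical and serve only to show the idea of dropping zero-length entries before splits are computed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrates dropping zero-byte files; NOT the ORC implementation. */
final class EmptyFileFilterSketch {

    /** Keeps only the paths whose recorded length is greater than zero. */
    static List<String> nonEmptyFiles(Map<String, Long> sizesByPath) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : sizesByPath.entrySet()) {
            if (entry.getValue() > 0L) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order so the output is predictable.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("part-00000.orc", 4096L);
        sizes.put("part-00001.orc", 0L); // empty file, should be skipped
        sizes.put("part-00002.orc", 128L);
        System.out.println(nonEmptyFiles(sizes));
        // prints [part-00000.orc, part-00002.orc]
    }
}
```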