All Classes and Interfaces

Class
Description
Statistics about the ACID operations in an ORC file.
 
 
Defines a batch filter that can operate on a VectorizedRowBatch and filter rows by using the selected vector to determine the eligible rows.
The top-level interface that the reader uses to read the columns from the ORC file.
Statistics for binary columns.
BloomFilter is a probabilistic data structure for set membership checks.
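A minimal sketch of that membership check, assuming the org.apache.orc.util.BloomFilter class sized for roughly 10,000 expected entries; negative answers are definite, positive answers may rarely be false positives.

import org.apache.orc.util.BloomFilter;

public class BloomFilterSketch {
  public static void main(String[] args) {
    // Size the filter for roughly 10,000 expected entries.
    BloomFilter filter = new BloomFilter(10_000);
    filter.addString("alice");
    filter.addLong(42L);

    System.out.println(filter.testString("alice")); // true
    System.out.println(filter.testLong(42L));       // true
    // "bob" was never added: usually false, occasionally a false positive.
    System.out.println(filter.testString("bob"));
  }
}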
Bare metal bit set implementation.
 
 
This class represents the fix from ORC-101, which changed the bloom filter from using the JVM's default character set to always using UTF-8.
Statistics for boolean columns.
 
 
The sections of the stripe that we have read.
Builds a list of buffer chunks.
 
Under the covers, char is written to ORC the same way as string.
Statistics for all collection types, such as Map and List.
Statistics that are available for all types of columns.
The API for compression codecs for ORC.
An enumeration that lists the generic compression algorithms that can be applied to ORC files.
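As a rough illustration, one of these compression algorithms is typically chosen through the writer options; the output path below is hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.CompressionKind;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class CompressionSketch {
  public static void main(String[] args) throws Exception {
    TypeDescription schema = TypeDescription.fromString("struct<x:int>");
    // Pick ZLIB from the enumeration of generic compression algorithms.
    Writer writer = OrcFile.createWriter(new Path("compressed.orc"),
        OrcFile.writerOptions(new Configuration())
            .setSchema(schema)
            .compress(CompressionKind.ZLIB));
    writer.close();
  }
}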
Convert ORC tree readers.
Overrides methods like checkEncoding to pass through to the convert TreeReader.
This class has routines to work with encryption within ORC files.
 
A high-performance set implementation used to support fast set membership testing, using Cuckoo hashing.
The API for masking data during column encryption for ORC.
To create a DataMask, users should go through this API.
An interface to provide override data masks for sub-columns.
Providers can provide one or more kinds of data masks.
The standard DataMasks can be created using this shortcut.
Information about the DataMask used to mask the unencrypted data.
An abstract data reader that IO formats can use to read bytes from underlying storage.
 
 
Statistics for DATE columns.
 
Conversion utilities between the hybrid Julian/Gregorian calendar and the proleptic Gregorian calendar.
Writer for short decimals in ORCv2.
Statistics for decimal columns.
An identity data mask for decimal types.
 
Interface to define the dictionary used for encoding values in columns of specific types such as string, char, and varchar.
 
The interface for visitors.
The information about each node.
 
 
Statistics for float and double columns.
An identity data mask for floating point types.
 
A class that is a growable array of bytes.
Dynamic int array that uses primitive types and chunks to avoid copying large numbers of integers when it resizes.
Information about a key used for column encryption in an ORC file.
TreeWriter that handles column encryption.
Information about a column encryption variant.
Thrown when an invalid file format is encountered.
Deprecated.
Use OrcTail instead.
The factory for getting the proper version of the Hadoop shims.
The Julian-Gregorian hybrid calendar system.
A date in the British Cutover calendar system.
This is an in-memory implementation of KeyProvider.
 
 
Implements a stream over an encrypted, but uncompressed stream.
 
Implements a stream over an uncompressed stream.
Statistics for all of the integer columns, such as byte, short, int, and long.
Interface for reading integers.
 
Interface for writing integers.
This is copied from the commons-io project to cut the dependency on old Hadoop.
A data mask for list types that applies the given masks to its children, but doesn't mask at this level.
 
An identity data mask for integer types.
A data mask for map types that applies the given masks to its children, but doesn't mask at this level.
 
 
A mask factory framework that automatically builds a recursive mask.
The Provider for all of the built-in data masks.
Deprecated.
A memory manager that keeps a global context of how many ORC writers there are and manages the memory between them.
 
Implements a memory manager that keeps a global context of how many ORC writers there are and manages the memory between them.
Murmur3 is the successor to the Murmur2 fast non-cryptographic hash algorithm.
Masking routine that converts every value to NULL.
 
A clone of the Hadoop codec pool for ORC, because ORC has its own codecs.
Defines the configuration properties that ORC understands.
Contains factory methods to read or write ORC files.
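A minimal sketch of those factory methods, assuming a hypothetical local file example.orc and the Hadoop Configuration/Path classes that the API builds on.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class OrcFileSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    TypeDescription schema = TypeDescription.fromString("struct<x:int,y:string>");

    // Create a (here empty) file through the writer factory method.
    Writer writer = OrcFile.createWriter(new Path("example.orc"),
        OrcFile.writerOptions(conf).setSchema(schema));
    writer.close();

    // Open the same file back through the matching reader factory method.
    Reader reader = OrcFile.createReader(new Path("example.orc"),
        OrcFile.readerOptions(conf));
    System.out.println("rows: " + reader.getNumberOfRows());
  }
}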
Create a version number for the ORC file format, so that we can add non-forward-compatible changes in the future.
Options for creating ORC file writers.
Records the version of the writer in terms of which bugs have been fixed.
 
This defines the input for any filter operation.
This defines the input for any filter operation.
The output stream for writing to ORC files.
Record the information about each column encryption variant.
This interface separates the physical layout of ORC files from the higher level details.
The target of an output stream.
Service to determine Plugin filters to be used during read.
 
An interface used for seeking to a row index.
An interface for recording positions in a stream.
 
The interface for reading ORC files.
Options for creating a RecordReader.
 
This tracks the keys for reading encrypted columns.
Stores the state of whether we've tried to decrypt a local key using this key.
Information about an encrypted column.
 
 
A row-by-row iterator for ORC files.
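A minimal sketch of that iteration (batch by batch), assuming a hypothetical file example.orc whose first column has an integer type.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.RecordReader;

public class OrcReadSketch {
  public static void main(String[] args) throws Exception {
    Reader reader = OrcFile.createReader(new Path("example.orc"),
        OrcFile.readerOptions(new Configuration()));
    RecordReader rows = reader.rows();
    VectorizedRowBatch batch = reader.getSchema().createRowBatch();
    // Each call fills the batch with the next run of rows.
    while (rows.nextBatch(batch)) {
      LongColumnVector col0 = (LongColumnVector) batch.cols[0];
      for (int r = 0; r < batch.size; ++r) {
        System.out.println(col0.vector[r]);
      }
    }
    rows.close();
  }
}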
Stateless methods shared between RecordReaderImpl and EncodedReaderImpl.
 
Masking strategy that hides most string and numeric values based on Unicode character categories.
A reader that reads a sequence of bytes.
A streamFactory that writes a sequence of bytes.
A reader that reads a sequence of integers.
A reader that reads a sequence of lightweight compressed integers.
A streamFactory that writes a sequence of integers.
A writer that performs lightweight compression over a sequence of integers.
 
Infer and track the evolution between the schema as stored in the file and the schema that has been requested by the reader.
 
Wrapper class for the selected vector that centralizes the convenience functions.
 
 
Masking strategy that masks String, Varchar, Char, and Binary types as a SHA-256 hash.
 
The name of a stream within a stripe.
 
The compression and encryption options for writing a stream.
This class provides an adaptor so that tools that want to read an ORC file from an FSDataInputStream can do so.
 
Statistics for string columns.
Uses a hash table to represent a dictionary.
A red-black tree that stores strings.
 
Information about the stripes in an ORC file that is provided by the Reader.
This class handles parsing the stripe information and handling the necessary filtering and selection.
 
The statistics for a stripe.
 
Handles the Struct rootType for batch handling.
A data mask for struct types that applies the given masks to its children, but doesn't mask at this level.
 
Statistics for Timestamp columns.
 
Factory for creating ORC tree readers.
A reader for string columns that are dictionary encoded in the current stripe.
A reader for string columns that are direct encoded in the current stripe.
A tree reader that will read string columns.
The interface for the type-specific column writers.
 
The parent class of all of the writers for each column.
This is the description of the types in an ORC file.
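A minimal sketch of building such a type description, both from its string form and programmatically; the schema itself is made up for the example.

import org.apache.orc.TypeDescription;

public class TypeDescriptionSketch {
  public static void main(String[] args) {
    // Parse a schema from its string representation.
    TypeDescription parsed =
        TypeDescription.fromString("struct<name:string,age:int,scores:array<double>>");

    // Build the equivalent schema with the factory methods.
    TypeDescription built = TypeDescription.createStruct()
        .addField("name", TypeDescription.createString())
        .addField("age", TypeDescription.createInt())
        .addField("scores", TypeDescription.createList(TypeDescription.createDouble()));

    System.out.println(parsed.equals(built)); // expected: true
  }
}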
 
Specify the version of the VectorizedRowBatch that the user desires.
A pretty printer for TypeDescription.
A data mask for union types that applies the given masks to its children, but doesn't mask at this level.
 
Deprecated.
This will be removed in future releases.
 
Under the covers, varchar is written to ORC the same way as string.
A filter that operates on the supplied VectorizedRowBatch and updates the selections.
Base implementation for Dictionary.VisitorContext used to traverse all nodes in a dictionary.
The interface for writing ORC files.
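A minimal sketch of writing rows through this interface, assuming a hypothetical output path people.orc and a made-up two-column schema; rows are accumulated into a VectorizedRowBatch and flushed with addRowBatch.

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class OrcWriteSketch {
  public static void main(String[] args) throws Exception {
    TypeDescription schema = TypeDescription.fromString("struct<name:string,age:int>");
    Writer writer = OrcFile.createWriter(new Path("people.orc"),
        OrcFile.writerOptions(new Configuration()).setSchema(schema));

    VectorizedRowBatch batch = schema.createRowBatch();
    BytesColumnVector name = (BytesColumnVector) batch.cols[0];
    LongColumnVector age = (LongColumnVector) batch.cols[1];

    for (int i = 0; i < 3; ++i) {
      int row = batch.size++;
      name.setVal(row, ("person-" + i).getBytes(StandardCharsets.UTF_8));
      age.vector[row] = 20 + i;
      // Flush whenever the batch fills up.
      if (batch.size == batch.getMaxSize()) {
        writer.addRowBatch(batch);
        batch.reset();
      }
    }
    if (batch.size != 0) {
      writer.addRowBatch(batch);
    }
    writer.close();
  }
}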
An ORC file writer.
An ORCv2 file writer.
The ORC internal API to the writer.