All Classes and Interfaces
Class
Description
Statistics about the ACID operations in an ORC file
Defines a batch filter that can operate on a VectorizedRowBatch and filter rows by using the
selected vector to determine the eligible rows.
The top level interface that the reader uses to read the columns from the
ORC file.
Statistics for binary columns.
BloomFilter is a probabilistic data structure for set membership check.
Bare metal bit set implementation.
This class represents the fix from ORC-101 where we fixed the bloom filter
from using the JVM's default character set to always using UTF-8.
Statistics for boolean columns.
The sections of stripe that we have read.
Builds a list of buffer chunks.
Under the covers, char is written to ORC the same way as string.
Statistics for all collection types, such as Map and List.
Statistics that are available for all types of columns.
The API for compression codecs for ORC.
An enumeration that lists the generic compression algorithms that
can be applied to ORC files.
Convert ORC tree readers.
Override methods like checkEncoding to pass-thru to the convert TreeReader.
This class has routines to work with encryption within ORC files.
A high-performance set implementation used to support fast set membership testing,
using Cuckoo hashing.
The API for masking data during column encryption for ORC.
To create a DataMask, the users should come through this API.
An interface to provide override data masks for sub-columns.
Providers can provide one or more kinds of data masks.
The standard DataMasks can be created using this short cut.
Information about the DataMask used to mask the unencrypted data.
An abstract data reader that IO formats can use to read bytes from underlying storage.
Statistics for DATE columns.
Conversion utilities from the hybrid Julian/Gregorian calendar to/from the
proleptic Gregorian.
Writer for short decimals in ORCv2.
Statistics for decimal columns.
An identity data mask for decimal types.
Interface to define the dictionary used for encoding values in columns
of specific types like string, char, varchar, etc.
The interface for visitors.
The information about each node.
Statistics for float and double columns.
An identity data mask for floating point types.
A class that is a growable array of bytes.
Dynamic int array that uses primitive types and chunks to avoid copying
large number of integers when it resizes.
Information about a key used for column encryption in an ORC file.
TreeWriter that handles column encryption.
Information about a column encryption variant.
Thrown when an invalid file format is encountered.
Deprecated.
The factory for getting the proper version of the Hadoop shims.
The Julian-Gregorian hybrid calendar system.
A date in the British Cutover calendar system.
This is an in-memory implementation of KeyProvider.
Implements a stream over an encrypted, but uncompressed stream.
Implements a stream over an uncompressed stream.
Statistics for all of the integer columns, such as byte, short, int, and
long.
Interface for reading integers.
Interface for writing integers.
This is copied from commons-io project to cut the dependency
from old Hadoop.
A data mask for list types that applies the given masks to its
children, but doesn't mask at this level.
An identity data mask for integer types.
A data mask for map types that applies the given masks to its
children, but doesn't mask at this level.
A mask factory framework that automatically builds a recursive mask.
The Provider for all of the built-in data masks.
Deprecated.
A memory manager that keeps a global context of how many ORC
writers there are and manages the memory between them.
Implements a memory manager that keeps a global context of how many ORC
writers there are and manages the memory between them.
Murmur3 is the successor to the Murmur2 fast non-cryptographic hash algorithms.
Masking routine that converts every value to NULL.
A clone of the Hadoop codec pool for ORC, because ORC has its own codecs.
Define the configuration properties that Orc understands.
Contains factory methods to read or write ORC files.
Create a version number for the ORC file format, so that we can add
non-forward compatible changes in the future.
Options for creating ORC file writers.
Records the version of the writer in terms of which bugs have been fixed.
This defines the input for any filter operation.
This defines the input for any filter operation.
The output stream for writing to ORC files.
Record the information about each column encryption variant.
This interface separates the physical layout of ORC files from the higher
level details.
The target of an output stream.
Service to determine Plugin filters to be used during read.
An interface used for seeking to a row index.
An interface for recording positions in a stream.
The interface for reading ORC files.
Options for creating a RecordReader.
This tracks the keys for reading encrypted columns.
Store the state of whether we've tried to decrypt a local key using this
key or not.
Information about an encrypted column.
A row-by-row iterator for ORC files.
Stateless methods shared between RecordReaderImpl and EncodedReaderImpl.
Masking strategy that hides most string and numeric values based on Unicode
character categories.
A reader that reads a sequence of bytes.
A streamFactory that writes a sequence of bytes.
A reader that reads a sequence of integers.
A reader that reads a sequence of light weight compressed integers.
A streamFactory that writes a sequence of integers.
A writer that performs light weight compression over sequence of integers.
Infer and track the evolution between the schema as stored in the file and
the schema that has been requested by the reader.
Wrapper class for the selected vector that centralizes the convenience functions.
Masking strategy that masks String, Varchar, Char and Binary types
as a SHA-256 hash.
The name of a stream within a stripe.
The compression and encryption options for writing a stream.
This class provides an adaptor so that tools that want to read an ORC
file from an FSDataInputStream can do so.
Statistics for string columns.
Uses a hash table to represent a dictionary.
A red-black tree that stores strings.
Information about the stripes in an ORC file that is provided by the Reader.
This class handles parsing the stripe information and handling the necessary
filtering and selection.
The statistics for a stripe.
Handles the Struct rootType for batch handling.
A data mask for struct types that applies the given masks to its
children, but doesn't mask at this level.
Statistics for Timestamp columns.
Factory for creating ORC tree readers.
A reader for string columns that are dictionary encoded in the current
stripe.
A reader for string columns that are direct encoded in the current
stripe.
A tree reader that will read string columns.
The writers for the specific writers of each type.
The parent class of all of the writers for each column.
This is the description of the types in an ORC file.
Specify the version of the VectorizedRowBatch that the user desires.
A pretty printer for TypeDescription.
A data mask for union types that applies the given masks to its
children, but doesn't mask at this level.
Deprecated.
This will be removed in a future release.
Under the covers, varchar is written to ORC the same way as string.
A filter that operates on the supplied VectorizedRowBatch and updates the selections.
Base implementation for Dictionary.VisitorContext, used to traverse
all nodes in a dictionary.
The interface for writing ORC files.
An ORC file writer.
An ORCv2 file writer.
The ORC internal API to the writer.
OrcTail instead
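
Several entries in this index (the batch filter, the selected-vector wrapper, and the filter that "updates the selections") rely on the idea of a selected vector: rather than copying surviving rows, a filter records the indices of the rows that are still eligible. The sketch below illustrates that idea only. `MiniBatch`, `filter`, and `eligibleRows` are hypothetical names invented for this example and are not part of the ORC or Hive API; the field names `selected`, `selectedInUse`, and `size` are chosen to resemble the usual row-batch layout, which is an assumption here.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;

/** Illustrative sketch of selected-vector filtering; not the real ORC classes. */
public class SelectedVectorSketch {

    /** Hypothetical, simplified batch: one long column plus selection state. */
    static final class MiniBatch {
        final long[] values;     // column data, never moved or copied by filters
        final int[] selected;    // indices of eligible rows, valid for [0, size)
        boolean selectedInUse;   // false means rows 0..size-1 are all eligible
        int size;                // number of eligible rows

        MiniBatch(long[] values) {
            this.values = values;
            this.selected = new int[values.length];
            this.size = values.length;
        }
    }

    /** Keeps only rows whose value passes the predicate by rewriting the selected vector. */
    static void filter(MiniBatch batch, LongPredicate keep) {
        int kept = 0;
        for (int i = 0; i < batch.size; i++) {
            // Resolve the physical row index, honoring any earlier filter.
            int row = batch.selectedInUse ? batch.selected[i] : i;
            if (keep.test(batch.values[row])) {
                // Safe in place: kept <= i, so unread entries are never overwritten.
                batch.selected[kept++] = row;
            }
        }
        batch.selectedInUse = true;
        batch.size = kept;
    }

    /** Returns the physical indices of the rows that are still eligible. */
    static int[] eligibleRows(MiniBatch batch) {
        if (batch.selectedInUse) {
            return Arrays.copyOf(batch.selected, batch.size);
        }
        int[] all = new int[batch.size];
        for (int i = 0; i < all.length; i++) {
            all[i] = i;
        }
        return all;
    }

    public static void main(String[] args) {
        MiniBatch batch = new MiniBatch(new long[] {5, 12, 7, 30, 18});
        filter(batch, v -> v > 6);       // drops row 0
        filter(batch, v -> v % 2 == 0);  // drops row 2 as well
        System.out.println(Arrays.toString(eligibleRows(batch))); // prints [1, 3, 4]
    }
}
```

Because filters only shrink the index list, several filters can be chained over the same batch without touching the column data, which is the property the entries above appear to depend on.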