If your company or tool uses ORC, please let us know so that we can update this page.
ORC files have always supported reading and writing from Hadoop’s MapReduce, but with the ORC 1.1.0 release it is now easier than ever because you no longer need to pull in Hive’s exec jar and all of its dependencies. OrcStruct now also implements WritableComparable and can be serialized through the MapReduce shuffle.
Apache Hive was the original use case and home for ORC. ORC’s strong type system, advanced compression, column projection, predicate push down, and vectorization support make Hive perform better with ORC than with any other format for your data.
Apache NiFi is adding support for writing ORC files.
Apache Pig added support for reading and writing ORC files in Pig 0.14.0.
Apache Spark has added support for reading and writing ORC files, including column projection and predicate push down.
EEL is a Scala big data API that supports reading and writing data for various file formats and storage systems, including to and from ORC. It is designed as an in-process, low-level API for manipulating data. Data is lazily streamed from source to sink and can be transformed with standard Scala operations such as map, flatMap and filter, which makes it especially suited for ETL-style applications. EEL supports ORC predicate and projection pushdowns and correctly handles conversions from other formats, including complex types such as maps, lists or nested structs. A typical use case would be to extract data from JDBC to ORC files housed in HDFS, or directly into Hive tables backed by the ORC file format.
With more than 300 PB of data, Facebook was an early adopter of ORC and quickly put it into production.
The Trino team has done a lot of work integrating ORC into their SQL engine.
Timber adopted ORC for its S3-based logging platform that stores petabytes of log data. ORC has been key in ensuring a fast, cost-effective strategy for persisting and querying that data.
HPE Vertica has contributed significantly to the ORC C++ library. ORC is a significant part of Vertica SQL-on-Hadoop (VSQLoH), which brings the performance, reliability, and standards compliance of the Vertica Analytic Database to the Hadoop ecosystem.