ORC 1.8.0 Released

The ORC team is excited to announce the release of ORC v1.8.0.

New Features and Notable Changes:

  • ORC-450 Support selecting list indices without materializing list items
  • ORC-824 Add column statistics for List and Map
  • ORC-1004 Java ORC writer supports the selection vector
  • ORC-1075 Support reading ORC files with no column statistics
  • ORC-1125 Support decoding decimals in RLE
  • ORC-1136 Optimize reads by combining multiple reads without significant separation into a single read
  • ORC-1138 Seek vs Read Optimization
  • ORC-1172 Add row count limit config for one stripe
  • ORC-1212 Upgrade protobuf-java to 3.17.3
  • ORC-1220 Set min.hadoop.version to 2.7.3
  • ORC-1248 Redefine Hadoop dependency for Apache ORC 1.8.0
  • ORC-1256 Publish test-jar to maven central
  • ORC-1260 Publish shaded-protobuf classifier artifacts
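
ORC-1136 above describes combining reads that are not significantly separated into a single read. The general idea can be sketched in Java as follows. This is an illustration only, not ORC's implementation: the class name ReadCoalescer, the {offset, length} array layout, and the maxGap parameter are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ReadCoalescer {
    /**
     * Merges byte ranges, given as {offset, length} pairs, whenever the gap
     * between consecutive ranges is at most maxGap bytes.
     * Hypothetical helper for illustration; not taken from the ORC code base.
     */
    static List<long[]> coalesce(List<long[]> ranges, long maxGap) {
        List<long[]> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((long[] r) -> r[0]));

        List<long[]> merged = new ArrayList<>();
        for (long[] r : sorted) {
            long start = r[0];
            long end = r[0] + r[1];
            if (!merged.isEmpty()) {
                long[] last = merged.get(merged.size() - 1);
                long lastEnd = last[0] + last[1];
                if (start - lastEnd <= maxGap) {
                    // Close enough (or overlapping): extend the previous read.
                    last[1] = Math.max(lastEnd, end) - last[0];
                    continue;
                }
            }
            // Copy the range so the caller's arrays are never modified.
            merged.add(new long[] {start, end - start});
        }
        return merged;
    }
}
```

With a maxGap of 16, the ranges {0, 100} and {110, 50} would be issued as one read covering {0, 160}, while a distant range such as {1000, 10} stays separate. The trade-off is reading a few unneeded bytes in exchange for fewer I/O requests.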

Improvements:

  • ORC-825 Use Empty Array For Collections toArray
  • ORC-826 Do Not Use Collection Contains/Get
  • ORC-828 Improve Fetch Data Set Process
  • ORC-829 Optimize Serialization percentileBits
  • ORC-831 Do Not Copy String When Flushing Dictionary
  • ORC-833 RunLengthIntegerReaderV2 Calculate Batch Size Once
  • ORC-834 Do Not Convert to String in DecimalFromTimestampTreeReader
  • ORC-835 Cache TRUE/FALSE Bytes in StringGroupFromBooleanTreeReader
  • ORC-836 StringGroupFromDoubleTreeReader Use Double toString
  • ORC-837 Reuse HiveDecimalWritable in ConvertTreeReaderFactory
  • ORC-838 Simplify compareTo/equals/putBuffer of ByteBufferAllocatorPool
  • ORC-840 Remove Superfluous Array Fill in RecordReaderImpl
  • ORC-841 Remove Superfluous Array Fill in StringHashTableDictionary
  • ORC-842 Remove newKey from StringHashTableDictionary
  • ORC-844 Improve hashCode Methods
  • ORC-847 Do Not Create Empty Array in StringGroupFromBinaryTreeReader
  • ORC-852 Allow DynamicByteArray to Return a ByteBuffer
  • ORC-853 Optimize writeDouble Implementation
  • ORC-855 Remove Unused isRepeating from RunLengthIntegerReaderV2
  • ORC-865 Bump opencsv from 3.9 to 5.5.1
  • ORC-883 Dependency Audit and QA
  • ORC-897 Optimize loop termination condition in readerIsCompatible method
  • ORC-935 Bump commons-csv from 1.8 to 1.9.0
  • ORC-937 Replace deprecated method
  • ORC-958 Convert command support overwrite option
  • ORC-969 Evaluate SearchArguments using file and stripe level stats
  • ORC-975 Avoid double counting closestFixedBits in percentileBits method
  • ORC-982 Extract checkstyle to a single file, help newcomers check code style
  • ORC-988 Bump opencsv from 5.5.1 to 5.5.2
  • ORC-992 Reached max repeat length, we can directly decide to use DELTA encoding
  • ORC-1005 Make the Java and C++ implementations of determineEncoding in RunLengthIntegerWriterV2 consistent
  • ORC-1007 Fix a warning from the shade plugin
  • ORC-1013 Renaming a parameter in constructors of TreeWriter’s derived classes
  • ORC-1014 Add details when we get IOExceptions from file system
  • ORC-1020 Improve orc::RleDecoderV2::nextDirect
  • ORC-1027 Filter processing to allow filter injections that cannot be represented via SArgs
  • ORC-1047 Handle quoted field names during string schema parsing
  • ORC-1077 Remove commons-codec dependency and use java.util.Base64
  • ORC-1099 Extend ReadIntent to support MAP and UNION type
  • ORC-1101 Improve malformed STRUCT handling
  • ORC-1122 Add buffer to decode the whole run in RleDecoderV2
  • ORC-1137 Improve float/double conversion in DoubleColumnReader::next()
  • ORC-1149 Bump slf4j.version to 1.7.36
  • ORC-1150 Improve RowReaderImpl::computeBatchSize()
  • ORC-1152 Support encoding short decimals in RLEv2
  • ORC-1156 Update opencsv to 5.6
  • ORC-1163 Bump zookeeper from 3.7.0 to 3.8.0
  • ORC-1169 Use Hadoop 3.3.2 on Java 17+
  • ORC-1178 Use hadoop 3.3.3 on Java 17+
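
ORC-1077 above replaces the commons-codec dependency with java.util.Base64. For readers making the same change in their own code, a minimal sketch of the JDK API follows. The class and method names (Base64RoundTrip, encode, decode) are hypothetical and only illustrate the standard encoder and decoder; they are not taken from ORC.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64RoundTrip {
    /** Encodes raw bytes with the basic (RFC 4648) Base64 alphabet. */
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    /** Decodes basic Base64 text back into bytes. */
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        String encoded = encode("ORC".getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded); // prints "T1JD"
        System.out.println(new String(decode(encoded), StandardCharsets.UTF_8)); // prints "ORC"
    }
}
```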

ORC 1.7.6 Released

The ORC team is excited to announce the release of ORC v1.7.6.

The bug fixes:

  • ORC-1204 ORC MapReduce writer to flush when long arrays
  • ORC-1205 nextVector should invoke ensureSize when reusing vectors
  • ORC-1215 Remove a wrong NotNull annotation on value of setAttribute
  • ORC-1222 Upgrade tools.hadoop.version to 2.10.2
  • ORC-1227 Use Constructor.newInstance instead of Class.newInstance
  • ORC-1228 Fix setAttribute to handle null value
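
ORC-1227 above switches from Class.newInstance to Constructor.newInstance. Class.newInstance has been deprecated since Java 9, partly because it propagates exceptions thrown by the constructor without wrapping them. The replacement pattern can be sketched as follows; the Instantiator class and its create method are hypothetical names for illustration, not ORC code.

```java
import java.lang.reflect.InvocationTargetException;

final class Instantiator {
    /** Creates an instance through the no-argument constructor of the given type. */
    static <T> T create(Class<T> type) {
        try {
            // Preferred over the deprecated type.newInstance(): failures inside
            // the constructor arrive wrapped in InvocationTargetException.
            return type.getDeclaredConstructor().newInstance();
        } catch (InvocationTargetException e) {
            throw new IllegalStateException(
                "Constructor of " + type.getName() + " failed", e.getCause());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Cannot instantiate " + type.getName(), e);
        }
    }
}
```

For example, Instantiator.create(StringBuilder.class) would return a new, empty StringBuilder.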

The test changes:

  • ORC-932 Bump byte-buddy from 1.10.19 to 1.11.12 (#842)
  • ORC-1169 Use Hadoop 3.3.2 on Java 17+ (#1113)
  • ORC-1178 Use Hadoop 3.3.3 on Java 17+ (#1129)
  • ORC-1193 Bump parquet.version to 1.12.3
  • ORC-1207 Upgrade Spark to 3.3.0
  • ORC-1210 Upgrade maven to 3.8.6
  • ORC-1234 Upgrade objenesis to 3.2 in Spark benchmark
  • ORC-1235 Bump avro.version to 1.11.1
  • ORC-1240 Update site README to use apache/orc-dev DockerHub image
  • ORC-1241 Use apache/orc-dev DockerHub repository in Docker tests
  • ORC-1244 Upgrade byte-buddy to 1.12.13 in branch-1.7
  • ORC-1245 Use Hadoop 3.3.4 on Java 17+ and benchmark

The documentation changes:

  • MINOR: Update DOAP with new releases (#1127)
  • ORC-900 Update doap_orc.rdf for Apache Projects page (#806)
  • ORC-1231 Update supported OS list in building.md
  • ORC-1237 Remove a wrong image link to article-footer.png
  • ORC-1238 Update DOAP with 1.7.5

The tasks:

  • ORC-1185 Add merge_orc_pr.py
  • ORC-1187 Use main instead of master in merge_orc_pr.py
  • ORC-1213 Use https in ThirdpartyToolchain.cmake
  • ORC-1226 Add a deprecation warning for Hadoop 2.7.2 and below

ORC 1.7.5 Released

The ORC team is excited to announce the release of ORC v1.7.5.

The bug fixes:

  • ORC-1151 Fix ColumnWriter for non-UTC Timestamp columns
  • ORC-1160 Fix seekToRow can’t seek within selected row group
  • ORC-1133 Fix csv-import tool options
  • ORC-1183 Upgrade gson to 2.9.0
  • ORC-1186 Limit family in aarch64 profile
  • ORC-1188 Fix ORC_PREFER_STATIC_ZLIB

The improvements:

  • ORC-1198 Add a new PhysicalFsWriter constructor with FSDataOutputStream parameter
  • ORC-1199 Use Google mirror of Maven Central as the primary

The test changes:

  • ORC-1155 Add Ubuntu 22.04 to docker tests
  • ORC-1154 Bump hive.version from 3.1.2 to 3.1.3
  • ORC-1161 Add MacOS 12 and remove MacOS 10
  • ORC-1174 Add Ubuntu 22.04 to GitHub Action
  • ORC-1182 Use slf4j-simple instead of deprecated slf4j-log4j12
  • ORC-1184 Use Hadoop 3.3.3 in benchmark module
  • ORC-1189 Update README.md and help command message in benchmark module and .gitignore
  • ORC-1190 Fix ORCWriterBenchMark dumpDir initialization
  • ORC-1191 Updated TLC Taxi Benchmark Dataset
  • ORC-1192 Use orc.zstd instead of orc.none
  • ORC-1196 Add Spark benchmark integration tests to GHA
  • ORC-1201 Remove Debian 9 from Docker Tests

The documentation changes:

  • Add ASF verification instruction link

Pavan Lanka added as committer

The ORC PMC is happy to add Pavan Lanka as an ORC committer for the work on introducing LazyIO of non-filter columns and optimizing stripe index and data reads.

Thank you for your work on ORC, Pavan!

ORC adds Yiqun Zhang to PMC

The Apache ORC Project Management Committee (PMC) is happy to announce that Yiqun Zhang has joined us as a new member of the PMC. Yiqun has contributed consistently as a committer and has participated in both major and maintenance releases by actively helping the release managers test the release candidates.

Please welcome Yiqun to the ORC PMC!

ORC 1.7.4 Released

The ORC team is excited to announce the release of ORC v1.7.4.

The bug fixes:

  • ORC-1120 Remove C++ library limitation about write version
  • ORC-1121 Fix column conversion check bug that causes column filters not to work
  • ORC-1127 Add missing version of UNSTABLE-PRE-2.0
  • ORC-1146 Float category missing check if the statistic sum is a finite value
  • ORC-1147 Use isNaN instead of isFinite to determine whether values contain NaN
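
The distinction behind ORC-1146 and ORC-1147 above is that a value can be non-finite without being NaN. The following Java sketch shows the difference between the two checks using only JDK methods. The class and method names (NanCheck, isNaN, isNotFinite) are hypothetical, and the release notes do not state how ORC applies these checks internally.

```java
final class NanCheck {
    /** True only when the value is NaN. */
    static boolean isNaN(double value) {
        return Double.isNaN(value);
    }

    /** True for NaN and also for positive or negative infinity. */
    static boolean isNotFinite(double value) {
        return !Double.isFinite(value);
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        System.out.println(isNaN(inf));       // prints "false"
        System.out.println(isNotFinite(inf)); // prints "true"
    }
}
```

Treating "not finite" as "contains NaN" would therefore misclassify an infinite value as NaN.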

The improvements:

  • ORC-236 Support UNION type in Java Convert tool
  • ORC-1116 Fix csv-import tool when exporting long bytes
  • ORC-1123 Add estimationMemory method for writer

The test changes:

  • ORC-1145 Add Java 18 to GitHub Action CI
  • ORC-1118 Support Java 17 and ARM64 docker tests

The documentation changes:

  • ORC-1117 Add Dask page at Using in Python section
  • ORC-1119 Remove timestamp from ORC API docs

ORC 1.6.14 Released

The ORC team is excited to announce the release of ORC v1.6.14.

The bug fixes:

  • ORC-1121 Fix column conversion check bug that causes column filters not to work
  • ORC-1146 Float category missing check if the statistic sum is a finite value
  • ORC-1147 Use isNaN instead of isFinite to determine whether values contain NaN

The ‘tests’ fixes:

  • ORC-1016 Use openssl@1.1 in GitHub Action MacOS CIs
  • ORC-1113 Remove CentOS 8 from docker-based tests

Quanlong Huang added as committer

The ORC PMC is happy to add Quanlong Huang as an ORC committer for the work on ORC C++ library and Apache Impala integration.

Thank you for your work on ORC, Quanlong!

ORC 1.7.3 Released

The ORC team is excited to announce the release of ORC v1.7.3.

The ‘bug’ fixes:

  • ORC-1060 Reduce memory usage when vectorized reading dictionary string encoding columns
  • ORC-1065 Fix IndexOutOfBoundsException in ReaderImpl.extractFileTail
  • ORC-1067 [C++] Upgrade ZSTD to 1.5.1
  • ORC-1078 Row group end offset doesn’t accommodate all the blocks
  • ORC-1081 Fix heap-use-after-free in SearchArgumentBuilderImpl::end()
  • ORC-1087 [C++] Handle unloaded seek positions when seeking in an uncompressed chunk
  • ORC-1092 [C++] Upgrade LZ4 to version 1.9.3
  • ORC-1102 [C++] Upgrade ZSTD to 1.5.2

The ‘tools’ improvements:

  • ORC-1055 [C++] Add the timezone option for the csv-import tool
  • ORC-1082 Improve FileDump and JsonFileDump to be robust on missing column statistics
  • ORC-1092 [C++] Support specifying type ids or column names in cpp tools

The ‘documentation’ patches:

  • ORC-1050 Update ORC site README.md and release process page
  • ORC-1069 Update building.md
  • ORC-1071 Update ‘adopters’ page
  • ORC-1091 Add ‘Tests’ section at ORC ‘develop’ page
  • ORC-1112 Add ‘Using with Python’ web page
  • ORC-1114 Update ‘Using with Python’ page with ‘PyArrow’ 7.0.0

The ‘task’ patches:

  • ORC-1070 Upgrade site docker image to use Ubuntu 20.04
  • ORC-1072 Add ‘Stale’ GitHub Action job
  • ORC-1094 Enable GitHub issues tab
  • ORC-1095 Deprecate ‘UnknownFormatException’

The ‘tests’ fixes:

  • ORC-875 Add GitHub Action job for Windows Server 2019
  • ORC-878 Bump auto-service from 1.0-rc7 to 1.0
  • ORC-881 Bump slf4j.version from 1.7.30 to 1.7.32
  • ORC-989 Bump checkstyle from 8.45.1 to 9.0
  • ORC-993 Bump junit.version from 5.7.2 to 5.8.0
  • ORC-1018 Bump checkstyle from 9.0 to 9.0.1
  • ORC-1033 Bump junit.version from 5.8.0 to 5.8.1
  • ORC-1044 Bump reproducible-build-maven-plugin to 0.14
  • ORC-1048 Bump checkstyle from 9.0.1 to 9.1
  • ORC-1052 Bump avro.version from 1.10.2 to 1.11.0
  • ORC-1057 Bump junit.version from 5.8.1 to 5.8.2
  • ORC-1061 Bump checkstyle from 9.1 to 9.2
  • ORC-1066 Bump guava from 30.1.1-jre to 31.0.1-jre
  • ORC-1068 [C++] Stabilize HAS_POST_2038 test
  • ORC-1073 Remove appveyor.yml
  • ORC-1076 Remove Travis PR Builder Link from README.md
  • ORC-1079 Add Linux Clang 11 GitHub Action test coverage
  • ORC-1080 Remove .travis.yml
  • ORC-1084 Bump checkstyle from 9.2 to 9.2.1
  • ORC-1086 Bump reproducible-build-maven-plugin from 0.14 to 0.15
  • ORC-1090 Disable Clang 13.0-specific compilation warnings
  • ORC-1093 Remove debian8 specific code in run-one.sh
  • ORC-1096 Bump slf4j.version to 1.7.33
  • ORC-1103 Use Maven 3.8.4
  • ORC-1104 Use Spark 3.2.1 in benchmark
  • ORC-1105 fetch-data.sh should use zsh instead of bash
  • ORC-1106 Use transitive commons-lang3 dependency in bench module
  • ORC-1107 Fix NPE at benchmark data schema loading
  • ORC-1108 Use RawLocalFileSystem to skip checksum files during benchmark data generation
  • ORC-1109 Use zstd instead of none in the default compress option
  • ORC-1111 Bump build-helper-maven-plugin from 3.2.0 to 3.3.0
  • ORC-1113 Remove CentOS 8 from docker-based tests
  • ORC-1115 Suppress Illegal reflective access warnings on Java9+ Tests

ORC 1.6.13 Released

The ORC team is excited to announce the release of ORC v1.6.13.

The bug fixes:

  • ORC-1065 Fix IndexOutOfBoundsException in ReaderImpl.extractFileTail
  • ORC-1078 Row group end offset doesn’t accommodate all the blocks

The ‘tests’ fixes:

  • ORC-875 Add GitHub Action job for Windows Server 2019
  • ORC-941 Move MacOS 10.15/11.5 test from Travis to GitHub Action
  • ORC-1079 Add Linux Clang 11 GitHub Action test coverage
  • ORC-1080 Remove .travis.yml

ORC 1.7.2 Released

The ORC team is excited to announce the release of ORC v1.7.2.

The bug fixes:

  • ORC-492 Avoid potential ArrayIndexOutOfBoundsException when getting WriterVersion
  • ORC-1053 Fix time zone offset precision when convert tool converts LocalDateTime to Timestamp is not consistent with the internal default precision of ORC
  • ORC-1041 Use memcpy during LZO decompression
  • ORC-1059 Align findColumns behaviour between 1.6 and 1.7 release

The ‘tools’ improvements:

  • ORC-1012 Support specifying columns in orc-scan
  • ORC-1017 Add sizes tool to determine and display the sizes of each column in a set of files
  • ORC-1023 Support writing bloom filters in ConvertTool

The ‘tests’ fixes:

  • ORC-915 Remove io.netty.netty from Spark benchmark
  • ORC-938 Bump netty-all from 4.1.42.Final to 4.1.66.Final
  • ORC-948 Add hive benchmark integration tests
  • ORC-957 Bump netty-all from 4.1.66.Final to 4.1.67.Final
  • ORC-1021 Add -fno-omit-frame-pointer in DEBUG and RELWITHDEBINFO builds
  • ORC-1051 Update benchmark dependencies

ORC 1.7.1 Released

The ORC team is excited to announce the release of ORC v1.7.1.

The bug fixes of ORC 1.7:

  • ORC-879 Flaky Test for TestJsonReader
  • ORC-1000 Use Java 17 in GitHub Action
  • ORC-1002 Add java17 profile for Java17 unit testing
  • ORC-1008 Overflow detection code is incorrect in IntegerColumnStatisticsImpl
  • ORC-1009 [C++] Missing string include causes build failure with MSVC++
  • ORC-1010 Bump tzdata from tzdata-2020e-1.tar.xz to tzdata-2021b-1.tar.xz
  • ORC-1011 Activate java17 profile automatically
  • ORC-1015 Update OrcFile.WriterOptions::memory javadoc
  • ORC-1016 Use openssl@1.1 in GitHub Action MacOS CIs
  • ORC-1024 BloomFilter hash computation is inconsistent between Java and C++ clients
  • ORC-1029 Could not load ‘org.apache.orc.DataMask.Provider’ when using orc encryption and spark executor with multi cores!
  • ORC-1030 Java Tools Recover File command does not accurately find OrcFile.MAGIC
  • ORC-1032 Bump parquet.version from 1.12.0 to 1.12.2
  • ORC-1034 The search byte array algorithm is incorrectly implemented in FileDump.java
  • ORC-1035 backupDataPath may be incorrect in recoverFile
  • ORC-1036 Due to tzdata upgrade, the fixed download links in CI are often not working
  • ORC-1037 Bump spark.version from 3.1.2 to 3.2.0
  • ORC-1039 Make FileDump.recoverFile handle side files only if they exist
  • ORC-1040 Add Debian 11 docker test
  • ORC-1042 Ignore unused-function C++ compile warning on CentOS 7
  • ORC-1043 Fix C++ conversion compilation error in CentOS 7
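
ORC-1008 above concerns overflow detection when integer statistics are accumulated. One common way to detect overflow of a long sum in Java is Math.addExact, sketched below. This is a generic technique under stated assumptions, not the fix applied in IntegerColumnStatisticsImpl; the class OverflowSafeSum and its methods are hypothetical.

```java
final class OverflowSafeSum {
    private long sum = 0;
    private boolean overflow = false;

    /** Adds a value; once an overflow is seen, the sum is no longer updated. */
    void add(long value) {
        if (overflow) {
            return;
        }
        try {
            // Math.addExact throws ArithmeticException instead of wrapping around.
            sum = Math.addExact(sum, value);
        } catch (ArithmeticException e) {
            overflow = true;
        }
    }

    /** False once any addition has overflowed. */
    boolean isSumDefined() {
        return !overflow;
    }

    /** The accumulated sum; meaningful only while isSumDefined() is true. */
    long getSum() {
        return sum;
    }
}
```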

ORC 1.6.12 Released

The ORC team is excited to announce the release of ORC v1.6.12.

The bug fixes of ORC 1.6.12:

  • ORC-1008 Overflow detection code is incorrect in IntegerColumnStatisticsImpl
  • ORC-1010 Bump tzdata from tzdata-2020e-1.tar.xz to tzdata-2021b-1.tar.xz
  • ORC-1024 BloomFilter hash computation is inconsistent between Java and C++ clients
  • ORC-1029 Could not load ‘org.apache.orc.DataMask.Provider’ when using orc encryption and spark executor with multi cores!
  • ORC-1034 The search byte array algorithm is incorrectly implemented in FileDump.java
  • ORC-1035 backupDataPath may be incorrect in recoverFile
  • ORC-1036 Due to tzdata upgrade, the fixed download links in CI are often not working
  • ORC-1040 Add Debian 11 docker test
  • ORC-1042 Ignore unused-function C++ compile warning on CentOS 7
  • ORC-1043 Fix C++ conversion compilation error in CentOS 7

ORC adds William Hyun to PMC

On behalf of the Apache ORC Project Management Committee (PMC), it gives me great pleasure to announce that William Hyun has joined the PMC. William has led several areas including Java 17/Apple Silicon support, Java Tools improvement, Code quality improvement using static analysis, CI/Docker test coverage improvement, and Apache ORC 1.7 migration support at Apache Arrow/Druid/Iceberg.

Please join me in welcoming William to the ORC PMC!

ORC 1.7.0 Released

The ORC team is excited to announce the release of ORC v1.7.0.

The new features of ORC 1.7:

  • ORC-377 Support Snappy compression in C++ Writer
  • ORC-577 Support row-level filtering
  • ORC-716 Build and test on Java 17-EA
  • ORC-731 Improve Java Tools
  • ORC-742 LazyIO of non-filter columns
  • ORC-751 Implement Predicate Pushdown in C++ Reader
  • ORC-755 Introduce OrcFilterContext
  • ORC-757 Add Hashtable implementation for dictionary
  • ORC-780 Support LZ4 Compression in C++ Writer
  • ORC-797 Allow writers to get the stripe information
  • ORC-818 Build and test in Apple Silicon
  • ORC-861 Bump CMake minimum requirement to 2.8.12
  • ORC-867 Upgrade hive-storage-api to 2.8.1
  • ORC-984 Save the software version that wrote each ORC file

Known issues:

ORC 1.6.11 Released

The ORC team is excited to announce the release of ORC v1.6.11.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.5.13 Released

The ORC team is excited to announce the release of ORC v1.5.13.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

ORC 1.6.10 Released

The ORC team is excited to announce the release of ORC v1.6.10.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.6.9 Released

The ORC team is excited to announce the release of ORC v1.6.9.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.6.8 Released

The ORC team is excited to announce the release of ORC v1.6.8.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

William Hyun added as committer

The ORC PMC is happy to add William Hyun as an ORC committer for the work on improving ORC’s code quality and integration with Apache Spark and Apache Iceberg.

Thank you for your work on ORC, William!

ORC adds Panagiotis Garefalakis to PMC

On behalf of the Apache ORC Project Management Committee (PMC), it gives me great pleasure to announce that Panagiotis Garefalakis has joined the PMC. Panagiotis has radically improved the integration between Hive and ORC.

Please join me in welcoming Panagiotis to the ORC PMC!

ORC 1.6.7 Released

The ORC team is excited to announce the release of ORC v1.6.7.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.6.6 Released

The ORC team is excited to announce the release of ORC v1.6.6.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.6.5 Released

The ORC team is excited to announce the release of ORC v1.6.5.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.5.12 Released

The ORC team is excited to announce the release of ORC v1.5.12.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

ORC 1.6.4 Released

The ORC team is excited to announce the release of ORC v1.6.4.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

  • ORC-667 Positional mapping for nested struct types should not be applied by default

ORC 1.5.11 Released

The ORC team is excited to announce the release of ORC v1.5.11.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-667 Positional mapping for nested struct types should not be applied by default

ORC 1.5.10 Released

The ORC team is excited to announce the release of ORC v1.5.10.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

ORC 1.6.3 Released

The ORC team is excited to announce the release of ORC v1.6.3.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.5.9 Released

The ORC team is excited to announce the release of ORC v1.5.9.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

ORC adds Dongjoon Hyun to PMC

On behalf of the Apache ORC Project Management Committee (PMC), it gives me great pleasure to announce that Dongjoon Hyun has joined the PMC. Dongjoon has radically improved the integration between Spark and ORC.

Please join me in welcoming Dongjoon to the ORC PMC!

ORC 1.4.5 Released

The ORC team is excited to announce the release of ORC v1.4.5.

The new features of ORC 1.4:

  • ORC-72 Add benchmark code for file formats.
  • ORC-87 Fix timestamp statistics in C++.
  • ORC-150 Add tool to convert from JSON.
  • ORC-151 Reduce the size of tools.jar.
  • ORC-174 Create a nohive variant of the jars.

Known issues:

ORC 1.6.2 Released

The ORC team is excited to announce the release of ORC v1.6.2.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.5.8 Released

The ORC team is excited to announce the release of ORC v1.5.8.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

ORC 1.6.1 Released

The ORC team is excited to announce the release of ORC v1.6.1.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

  • ORC-571 ArrayIndexOutOfBoundsException in StripePlanner.readRowIndex

ORC 1.5.7 Released

The ORC team is excited to announce the release of ORC v1.5.7.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

ORC 1.6.0 Released

The ORC team is excited to announce the release of ORC v1.6.0.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-555 IllegalArgumentException when reading files with large footers

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

  • ORC-571 ArrayIndexOutOfBoundsException in StripePlanner.readRowIndex

ORC 1.5.6 Released

The ORC team is excited to announce the release of ORC v1.5.6.

Users are advised that as of ORC 1.5.6, ORCReaders that aren’t used to create RecordReaders should be closed.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-525 Users must close ORC Readers after use

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

Renat Vailiullin and Sandeep More added as committers

The ORC PMC is happy to add Renat Vailiullin and Sandeep More as ORC committers. Renat has done a lot of work to improve the Windows builds and Sandeep has been working on the data masking and statistics.

Thank you for your work on ORC, Renat and Sandeep!

ORC 1.5.5 Released

The ORC team is excited to announce the release of ORC v1.5.5.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

ORC adds Gang Wu to PMC

On behalf of the Apache ORC Project Management Committee (PMC), it gives me great pleasure to announce that Gang Wu has joined the PMC. Gang has been doing great work on the C++ code base.

Please join me in welcoming Gang to the ORC PMC!

Dongjoon Hyun added as committer

The ORC PMC is happy to add Dongjoon Hyun as an ORC committer for the work on improving ORC’s integration with Spark.

Thank you for your work on ORC, Dongjoon!

ORC 1.5.4 Released

The ORC team is excited to announce the release of ORC v1.5.4.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

ORC 1.5.3 Released

The ORC team is excited to announce the release of ORC v1.5.3.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

ORC 1.5.2 Released

The ORC team is excited to announce the release of ORC v1.5.2.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

ORC 1.5.1 Released

The ORC team is excited to announce the release of ORC v1.5.1.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

ORC 1.5.0 Released

The ORC team is excited to announce the release of ORC v1.5.0.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-367 Boolean columns are read incorrectly when using seek.

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

ORC 1.4.4 Released

The ORC team is excited to announce the release of ORC v1.4.4.

The new features of ORC 1.4:

  • ORC-72 Add benchmark code for file formats.
  • ORC-87 Fix timestamp statistics in C++.
  • ORC-150 Add tool to convert from JSON.
  • ORC-151 Reduce the size of tools.jar.
  • ORC-174 Create a nohive variant of the jars.

Known issues:

ORC 1.4.3 Released

The ORC team is excited to announce the release of ORC v1.4.3.

The new features of ORC 1.4:

  • ORC-72 Add benchmark code for file formats.
  • ORC-87 Fix timestamp statistics in C++.
  • ORC-150 Add tool to convert from JSON.
  • ORC-151 Reduce the size of tools.jar.
  • ORC-174 Create a nohive variant of the jars.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

ORC 1.4.2 Released

The ORC team is excited to announce the release of ORC v1.4.2.

The new features of ORC 1.4:

  • ORC-72 Add benchmark code for file formats.
  • ORC-87 Fix timestamp statistics in C++.
  • ORC-150 Add tool to convert from JSON.
  • ORC-151 Reduce the size of tools.jar.
  • ORC-174 Create a nohive variant of the jars.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.4.1 Released

The ORC team is excited to announce the release of ORC v1.4.1.

The new features of ORC 1.4:

  • ORC-72 Add benchmark code for file formats.
  • ORC-87 Fix timestamp statistics in C++.
  • ORC-150 Add tool to convert from JSON.
  • ORC-151 Reduce the size of tools.jar.
  • ORC-174 Create a nohive variant of the jars.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.3.4 Released

The ORC team is excited to announce the release of ORC v1.3.4.

The new features of ORC 1.3:

  • ORC-58 Split C++ Reader into Reader and RowReader
  • ORC-120 Add backwards compatibility mode for schema evolution.
  • ORC-124 Fast decimal improvements
  • ORC-128 Add ability to get statistics from writer

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC adds Eugene and Deepak to PMC

On behalf of the Apache ORC Project Management Committee (PMC), it gives me great pleasure to announce that Eugene Koifman and Deepak Majeti have joined the PMC. Eugene has done critical work on ACID and Deepak has been doing great work on the C++ code base.

Please join me in welcoming Eugene and Deepak to the ORC PMC!

Deepak Majeti added as committer

The ORC PMC is happy to add Deepak Majeti as an ORC committer for his work on the C++ ORC reader, including both contributions and reviews of others’ patches. Thank you for your work on ORC, Deepak!

ORC 1.4.0 Released

The ORC team is excited to announce the release of ORC v1.4.0.

The new features of ORC 1.4:

  • ORC-72 Add benchmark code for file formats.
  • ORC-87 Fix timestamp statistics in C++.
  • ORC-150 Add tool to convert from JSON.
  • ORC-151 Reduce the size of tools.jar.
  • ORC-174 Create a nohive variant of the jars.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.3.3 Released

The ORC team is excited to announce the release of ORC v1.3.3.

The new features of ORC 1.3:

  • ORC-58 Split C++ Reader into Reader and RowReader
  • ORC-120 Add backwards compatibility mode for schema evolution.
  • ORC-124 Fast decimal improvements
  • ORC-128 Add ability to get statistics from writer

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.3.2 Released

The ORC team is excited to announce the release of ORC v1.3.2.

The new features of ORC 1.3:

  • ORC-58 Split C++ Reader into Reader and RowReader
  • ORC-120 Add backwards compatibility mode for schema evolution.
  • ORC-124 Fast decimal improvements
  • ORC-128 Add ability to get statistics from writer

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.3.1 Released

The ORC team is excited to announce the release of ORC v1.3.1.

The new features of ORC 1.3:

  • ORC-58 Split C++ Reader into Reader and RowReader
  • ORC-120 Add backwards compatibility mode for schema evolution.
  • ORC-124 Fast decimal improvements
  • ORC-128 Add ability to get statistics from writer

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.3.0 Released

The ORC team is excited to announce the release of ORC v1.3.0.

The new features of ORC 1.3:

  • ORC-58 Split C++ Reader into Reader and RowReader
  • ORC-120 Add backwards compatibility mode for schema evolution.
  • ORC-124 Fast decimal improvements
  • ORC-128 Add ability to get statistics from writer

Known issues:

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC adds Gopal Vijayaraghavan to PMC

On behalf of the Apache ORC Project Management Committee (PMC), it gives me great pleasure to announce that Gopal Vijayaraghavan has joined the PMC. Gopal has done an amazing job at speeding up ORC in many ways.

Please join me in welcoming Gopal to the ORC PMC!

Congratulations Gopal!

ORC adds new committers

As part of the removal of the ORC code base from Hive, the ORC PMC has offered to make any existing Hive committers into ORC committers. The new ORC committers coming from Hive are:

  • Aihua Xu
  • Ashutosh Chauhan
  • Carl Steinbach
  • Chaoyu Tang
  • Chinna Rao Lalam
  • Daniel Dai
  • Eugene Koifman
  • Ferdinand Xu
  • Jason Dere
  • Jesus Camacho Rodriguez
  • Jimmy Xiang
  • Lars Francke
  • Matthew McCline
  • Mithun Radhakrishnan
  • Naveen Gangam
  • Pengcheng Xiong
  • Rajesh Balamohan
  • Rui Li
  • Sergio Pena
  • Siddharth Seth
  • Vaibhav Gumashta
  • Wei Zheng
  • Yongzhi Chen

ORC 1.2.3 Released

The ORC team is excited to announce the release of ORC v1.2.3. This release fixes some bugs in the Java schema evolution code.

The new features of ORC 1.2:

  • ORC-54 Evolve schemas based on field name rather than index
  • ORC-84 Create a separate java tool module.
  • ORC-77 and ORC-81 Implement LZO and LZ4 compression codecs.
  • ORC-92 Add support for nested column id selection in C++
  • ORC-69 Add batch option support in orc-scan tools.

Important fixes:

  • HIVE-14214 ORC schema evolution and predicate push down do not work together.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.2.2 Released

The ORC team is excited to announce the release of ORC v1.2.2.

The new features of ORC 1.2:

  • ORC-54 Evolve schemas based on field name rather than index
  • ORC-84 Create a separate java tool module.
  • ORC-77 and ORC-81 Implement LZO and LZ4 compression codecs.
  • ORC-92 Add support for nested column id selection in C++
  • ORC-69 Add batch option support in orc-scan tools.

Important fixes:

  • HIVE-14214 ORC schema evolution and predicate push down do not work together.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.2.1 Released

The ORC team is excited to announce the release of ORC v1.2.1.

The new features of ORC 1.2:

  • ORC-54 Evolve schemas based on field name rather than index
  • ORC-84 Create a separate java tool module.
  • ORC-77 and ORC-81 Implement LZO and LZ4 compression codecs.
  • ORC-92 Add support for nested column id selection in C++
  • ORC-69 Add batch option support in orc-scan tools.

Important fixes:

  • HIVE-14214 ORC schema evolution and predicate push down do not work together.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.2.0 Released

The ORC team is excited to announce the release of ORC v1.2.0.

The new features of ORC 1.2:

  • ORC-54 Evolve schemas based on field name rather than index
  • ORC-84 Create a separate java tool module.
  • ORC-77 and ORC-81 Implement LZO and LZ4 compression codecs.
  • ORC-92 Add support for nested column id selection in C++
  • ORC-69 Add batch option support in orc-scan tools.

Important fixes:

  • HIVE-14214 ORC schema evolution and predicate push down do not work together.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-101 Bloom filters for string and decimal use inconsistent encoding

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.1.2 Released

The ORC team is excited to announce the release of ORC v1.1.2. This release contains the Java reader and writer and the native C++ ORC reader and tools.

The major new features in ORC 1.1 are:

  • ORC-1 Copy the Java ORC code from Hive.
  • ORC-10 Fix the C++ reader to correctly read timestamps from timezones with different daylight savings rules.
  • ORC-52 Add mapred and mapreduce connectors.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • HIVE-14214 Schema evolution and predicate pushdown don’t work together.

  • ORC-101 Bloom filters for string and decimal use inconsistent encoding

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

File format benchmark

I gave a talk at Hadoop Summit San Jose 2016 about a file format benchmark that I’ve contributed as ORC-72. The benchmark focuses on real data sets that are publicly available. The data sets represent a wide variety of use cases:

  • NYC Taxi Data - very dense data with mostly numeric types
  • GitHub Archives - very sparse data with a lot of complex structure
  • Sales - a real production schema from a sales table with a synthetic generator

The benchmarks look at a set of three very common use cases:

  • Full table scan - read all columns and rows
  • Column projection - read some columns, but all of the rows
  • Column projection and predicate push down - read some columns and some rows

You can see the slides here:

File Format Benchmarks: Avro, JSON, ORC, & Parquet
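
To make the three use cases concrete, here is a minimal sketch in Python over a toy in-memory columnar layout. It is not the ORC-72 benchmark code and does not use any ORC API; the function names, the `stripes` structure, and the sample taxi-like rows are all hypothetical, chosen only to illustrate how column projection and min/max-based predicate push down reduce the data that has to be read.

```python
# Toy columnar layout: each "stripe" stores columns separately and keeps
# min/max statistics, loosely mimicking what a columnar format records.
# All names here are hypothetical and are not part of any ORC API.

def make_stripes(rows, stripe_size):
    """Split row dicts into stripes of column arrays plus min/max stats."""
    stripes = []
    for start in range(0, len(rows), stripe_size):
        chunk = rows[start:start + stripe_size]
        columns = {name: [r[name] for r in chunk] for name in chunk[0]}
        stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
        stripes.append({"columns": columns, "stats": stats})
    return stripes

def full_scan(stripes):
    """Full table scan: read all columns and all rows."""
    out = []
    for s in stripes:
        names = list(s["columns"])
        n = len(s["columns"][names[0]])
        for i in range(n):
            out.append({name: s["columns"][name][i] for name in names})
    return out

def project(stripes, wanted):
    """Column projection: read only the wanted columns, but every row."""
    out = []
    for s in stripes:
        cols = [s["columns"][name] for name in wanted]
        out.extend(zip(*cols))
    return out

def project_with_pushdown(stripes, wanted, column, lo, hi):
    """Projection plus predicate push down: return wanted columns for rows
    with lo <= column <= hi, skipping stripes whose min/max cannot match."""
    out, stripes_read = [], 0
    for s in stripes:
        smin, smax = s["stats"][column]
        if smax < lo or smin > hi:
            continue  # statistics prove no row in this stripe matches
        stripes_read += 1
        keys = s["columns"][column]
        cols = [s["columns"][name] for name in wanted]
        for i, key in enumerate(keys):
            if lo <= key <= hi:
                out.append(tuple(c[i] for c in cols))
    return out, stripes_read

# Hypothetical sample data: 12 rows split into 3 stripes of 4 rows each.
rows = [{"trip_id": i, "fare": 2.5 * i, "passengers": 1 + i % 4} for i in range(12)]
stripes = make_stripes(rows, stripe_size=4)
```

In this sketch, `project_with_pushdown(stripes, ["trip_id", "fare"], "trip_id", 5, 6)` only has to open the one stripe whose `trip_id` range overlaps 5 to 6, which is the effect the third use case is meant to measure.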

ORC 1.1.1 Released

The ORC team is excited to announce the release of ORC v1.1.1. This release contains the Java reader and writer and the native C++ ORC reader and tools.

The major new features in ORC 1.1 are:

  • ORC-1 Copy the Java ORC code from Hive.
  • ORC-10 Fix the C++ reader to correctly read timestamps from timezones with different daylight savings rules.
  • ORC-52 Add mapred and mapreduce connectors.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • HIVE-14214 Schema evolution and predicate pushdown don’t work together.

  • ORC-101 Bloom filters for string and decimal use inconsistent encoding

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.1.0 Released

The ORC team is excited to announce the release of ORC v1.1.0. This release contains the Java reader and writer and the native C++ ORC reader and tools.

The major new features in ORC 1.1 are:

  • ORC-1 Copy the Java ORC code from Hive.
  • ORC-10 Fix the C++ reader to correctly read timestamps from timezones with different daylight savings rules.
  • ORC-52 Add mapred and mapreduce connectors.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • HIVE-14214 Schema evolution and predicate pushdown don’t work together.

  • ORC-101 Bloom filters for string and decimal use inconsistent encoding

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.0.0 Released

The ORC team is excited to announce the release of ORC v1.0.0. This release contains the native C++ ORC reader and some tools.

The major features:

  • Portable pure C++ ORC reader
  • The C++ reader is known to work on:
    • CentOS and RHEL 5, 6, and 7
    • Debian 6 and 7
    • Ubuntu 12 and 14
    • Mac OS 10.10 and 10.11
  • A file-contents command that prints the contents of the file as JSON records.
  • A file-metadata command that prints the metadata of the file.
  • Docker files for building and testing on various Linux distributions.
  • Memory estimation for the reader.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-10 When moving ORC files between timezones, different daylight savings rules will cause timestamps to shift in the C++ reader.

ORC adds Aliaksei Sandryhaila to PMC

On behalf of the Apache ORC Project Management Committee (PMC), it gives me great pleasure to announce that Aliaksei Sandryhaila has joined the Apache ORC PMC. He has done a lot of good work on ORC and I’m looking forward to more.

Please join me in welcoming Aliaksei to the ORC PMC!

Congratulations Aliaksei!

ORC adopts new logo

The ORC project has adopted a new logo. We hope you like it.

[Image: ORC logo]

Other great options included a big white hand on a black shield. :)

ORC adds 7 committers

The ORC project management committee today added seven new committers for their work on ORC. Welcome all!

  • Gunther Hagleitner
  • Aliaksei Sandryhaila
  • Sergey Shelukhin
  • Gopal Vijayaraghavan
  • Stephen Walkauskas
  • Kevin Wilfong
  • Xuefu Zhang

ORC becomes an Apache Top Level Project

Today Apache ORC became a top level project at the Apache Software Foundation. This is a major step forward for the project and reflects its momentum.

Back in January 2013, we created ORC files as part of the initiative to massively speed up Apache Hive and improve the storage efficiency of data stored in Apache Hadoop. We added it as a feature of Hive for two reasons:

  1. To ensure that it would be well integrated with Hive
  2. To ensure that storing data in ORC format would be as simple as adding “stored as ORC” to your table definition.

In the last two years, many of the features that we’ve added to Hive, such as vectorization, ACID, predicate push down and LLAP, have supported ORC first, with other storage formats following later.

The growing use and acceptance of ORC has encouraged additional Hadoop execution engines, such as Apache Pig, Map-Reduce, Cascading, and Apache Spark, to support reading and writing ORC. However, there are concerns that depending on the large Hive jar that contains ORC pulls in a lot of other projects that Hive depends on. To better support these non-Hive users, we decided to split off from Hive and become a separate project. This will not only allow us to support Hive, but also provide a much more streamlined jar, documentation, and help for users outside of Hive.

Although Hadoop and its ecosystem are largely written in Java, there are a lot of applications in other languages that would like to natively access ORC files in HDFS. Hortonworks, HP, and Microsoft are developing a pure C++ ORC reader and writer that enables C++ applications to read and write ORC files efficiently without Java. That code will also be moved into Apache ORC and released together with the Java implementation.