The ORC team is excited to announce the release of ORC v2.0.0.
- Released: 8 March 2024
- Source code: orc-2.0.0.tar.gz
- GPG Signature signed by Dongjoon Hyun (34F0FC5C)
- Git tag: rel/release-2.0.0
- Maven Central: ORC 2.0.0
- SHA 256: 9107730919c29eb3…
- Fixed issues: ORC-2.0.0
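
A downloaded source tarball can be checked against the published SHA 256 value before use. The sketch below is one minimal way to do that with Python's standard library; the helper names `sha256_of` and `matches_published` are illustrative, not part of any ORC tooling. The digest shown above is truncated, so the full value has to be taken from the release's checksum file.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs are not loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare a file's digest with a published full-length hex digest.

    `published_hex` must be the complete 64-character value; a truncated
    digest such as the one shown in these notes will never match.
    """
    return sha256_of(path) == published_hex.strip().lower()
```

A typical call would be `matches_published("orc-2.0.0.tar.gz", full_digest)`, where `full_digest` is a hypothetical variable holding the complete published value.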
New Features and Notable Changes:
- ORC-998: Refactor compression output buffer within OutStream for better portability
- ORC-1088: Support ZSTD_JNI and column compress to set compression level
- ORC-1100: Support vcpkg
- ORC-1251: Use Hadoop Vectored IO
- ORC-1387: [C++] Support schema evolution from decimal to numeric/decimal
- ORC-1440: Check for protobuf config based module
- ORC-1463: Support brotli codec
- ORC-1507: Use Zulu JDK distribution and switch from 21-ea to 21
- ORC-1512: Drop Java 8/11 and make Java 17 by default
- ORC-1531: Create orc-format module and repo
- ORC-1545: Use orc-format 1.0.0-SNAPSHOT
- ORC-1546: Use orc-format 1.0.0-alpha
- ORC-1547: Spin-off ORC Format
- ORC-1551: Use orc-format 1.0.0-beta
- ORC-1572: Use Apache ORC Format 1.0.0
- ORC-1585: [C++] Add orc-format_ep as a dependency of orc
Improvements:
- ORC-1459: Mark DataBuffer::size() and DataBuffer::capacity() as const
- ORC-1460: specification: Clarify how dictionary entries are sorted
- ORC-1461: Mark Int128::getHighBits() and Int128::getLowBits() as const
- ORC-1472: Replace deprecated method in TestMurmur3.java
- ORC-1479: Enhance example usage message to use Uber jar
- ORC-1481: [C++] Better error message when TZDB is unavailable
- ORC-1504: Add lower bound check in get API for DynamicIntArray
- ORC-1506: Replacing deprecated valueOf() with recommended forNumber()
- ORC-1509: Auto grant contributor role to first-time contributors
- ORC-1520: Remove JDK 8 settings from pom
- ORC-1567: Add the `-ignoreExtension` configuration to the `sizes` and `count` commands of orc-tools
- ORC-1570: Add `supportVectoredIO` API to `HadoopShimsCurrent` and use it
- ORC-1571: Supports displaying raw data size in the meta command of orc-tools
- ORC-1577: Use ZSTD as the default compression
- ORC-1580: Change default DataBuffer constructor to use reserve instead of resize
- ORC-1595: Add a short-cut to skip tiny inputs for ZstdCodec.compress
- ORC-1596: Remove redundant `Zstd.isError` JNI usage
- ORC-1597: Set bloom filter fpp to 1%
- ORC-1600: Reduce getStaticMemoryManager sync block in OrcFile
- ORC-1601: Reduce get HadoopShims sync block in HadoopShimsFactory
- ORC-1610: Reduce the number of hash computations in CuckooSetBytes
- ORC-1613: Zstd decompression supports direct buffer
- ORC-1631: Supports summary output in sizes command
- ORC-1637: [C++] Port conan recipe from upstream conan center
- ORC-1638: Avoid System.exit(0) in count command
- ORC-1639: [C++] Reduce unnecessary compiler flags in CMake
- ORC-1641: Remove `sourceFileExcludes` from `maven-javadoc-plugin`
- ORC-1642: Avoid `System.exit(0)` in `scan` command
- ORC-1593: Set orc.compression.zstd.level to 3 by default
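
Several items above change writer defaults (ORC-1577, ORC-1593, ORC-1597). As a rough sketch, the snippet below collects those defaults under the configuration keys named in these notes and renders them as `key=value` lines, so a job can pin them explicitly or override one of them. The helper `render_properties` and the properties-style output are illustrative assumptions; how settings actually reach the writer depends on the engine in use, and the `ZLIB` override is only an example codec value.

```python
# Defaults taken from the list above (ORC-1577, ORC-1593, ORC-1597).
# The 1% bloom filter fpp is written as the fraction 0.01.
ORC_2_0_DEFAULTS = {
    "orc.compress": "ZSTD",
    "orc.compression.zstd.level": "3",
    "orc.bloom.filter.fpp": "0.01",
}


def render_properties(overrides=None):
    """Merge overrides into the 2.0.0 defaults and render key=value lines."""
    settings = dict(ORC_2_0_DEFAULTS)
    settings.update(overrides or {})
    # Sort keys so the output is stable across runs.
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


# Example: keep the new zstd level and fpp, but choose a different codec.
print(render_properties({"orc.compress": "ZLIB"}))
```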
Bug Fixes:
- ORC-634: Fix the json output for double NaN and infinite
- ORC-1455: [C++] Fix build failure on non-x86 with unused macro in CpuInfoUtil.cc
- ORC-1473: Zero-copy zeroCopyReadRanges and releaseBuffer bugs
- ORC-1476: Maven build fail with unsupported platform: protoc-3.17.3-osx-aarch_64.exe
- ORC-1480: [C++] Build failed when the BUILD_CPP_ENABLE_METRICS is ON
- ORC-1500: [C++] The partition field does not support English special characters
- ORC-1528: When using the orc.min.disk.seek.size configuration to read extremely large ORC files, a java.nio.BufferOverflowException may occur.
- ORC-1553: Reading information from Row group, where there are 0 records of SArg column
- ORC-1563: Fix orc.bloom.filter.fpp default value and orc.compress notes of Spark and Hive config docs
- ORC-1568: Use `readDiskRanges` if `orc.use.zerocopy` is enabled
- ORC-1575: Use ASF Archive URL instead of Download URL
- ORC-1578: Fix SparkBenchmark according to SPARK-40918
- ORC-1588: Fix incorrect Decimal assert in LeafFilterFactory
- ORC-1602: [C++] limit compression block size
Tasks:
- ORC-1422: Setting version to 2.0.0-SNAPSHOT
- ORC-1434: Remove `org.apache.hadoop` from `dependabot.yml`
- ORC-1484: Use JIRA_ACCESS_TOKEN in `merge_orc_pr.py`
- ORC-1485: Enable checkstyle checks for test classes
- ORC-1486: Fix checkstyle violations for tests in orc-core module
- ORC-1492: Fix checkstyle violations for tests in `mapreduce`, `tools`, `bench` modules
- ORC-1496: Use iterator to suggest backporting branches
- ORC-1515: Skip publishing orc-example module
- ORC-1516: Fix minor typo in comments in IOUtils
- ORC-1518: Remove findbugs folders
- ORC-1529: Fix minor typos in pom.xml
- ORC-1530: Rename variables in RecordReaderUtils.ChunkReader#create
- ORC-1535: Remove generated Java docs from source tree
- ORC-1536: Remove `hive-storage-api` link from `maven-javadoc-plugin`
- ORC-1540: Remove MacOS 11 from GitHub Action CI
- ORC-1542: Use `Pattern Matching for instanceof` (JEP-394)
- ORC-1549: Update `libhdfspp.tar.gz` by adding `#include <cstdint>`
- ORC-1569: Remove HadoopShimsPre2_3, HadoopShimsPre2_6, HadoopShimsPre2_7 classes
- ORC-1579: Add `ASF Generative Tooling Guidance` to PR template
- ORC-1591: Lower log level from INFO to DEBUG in *ReaderImpl/WriterImpl/PhysicalFsWriter
- ORC-1592: Suppress KeyProvider missing log
- ORC-1598: Close reader in orc-examples
- ORC-1604: Deprecate non-utf8 bloom filter for Java writer
Tests:
- ORC-1003: Recover java-examples-test
- ORC-1409: Add stream order description in ORC spec.
- ORC-1432: Add MacOS 13 GitHub Action Job
- ORC-1474: Replaced deprecated getMinimum/Maximum in TestColumnStatistics
- ORC-1475: [C++] ConvertColumnReader.TestConvertNumericToStringVariant fails when compiled with unsigned char
- ORC-1477: Remove unused imports from Test classes
- ORC-1478: Add Unit Test for org.apache.orc.impl.DynamicIntArray
- ORC-1510: Fix package for TestOrcUtils and add more test cases
- ORC-1541: Add `Ubuntu 24.04 LTS` Docker Test
- ORC-1555: Simplify `fedora37` docker image
- ORC-1556: Add `Rocky Linux 9` Docker Test
- ORC-1557: Add GitHub Action CI for `Docker Test`
- ORC-1558: Remove `ubuntu22_jdk=21` and `ubuntu22_jdk=21_cc=clang` test combinations from `docker/os-list.txt`
- ORC-1574: Update `GitHub Action` YAML files in `branch-2.0`
- ORC-1586: Fix IllegalAccessError when SparkBenchmark runs on JDK17
- ORC-1607: Fix `testDoubleNaNAndInfinite` to use `TestFileDump.checkOutput`
- ORC-1614: Set ByteBuffer limit in TestBrotli test
- ORC-1618: Disable building tests for snappy
- ORC-1619: Add `MacOS 14` to GitHub Action
- ORC-1620: Add Apple Silicon Test Coverage
- ORC-1621: Switch to `oraclelinux9` from `rocky9`
- ORC-1623: Use `directOut.put(out)` instead of `directOut.put(out.array())` in `TestZstd` test
- ORC-1630: Test using VectoredIO of hadoop to read ORC
- ORC-1632: Add test for count command
- ORC-1633: Add test for sizes command
- ORC-1643: Add test for `scan` command
Build and dependency changes:
- ORC-870: Unpin and upgrade `jmh` to 1.37
- ORC-1423: Bump build-helper-maven-plugin to 3.4.0
- ORC-1424: Bump maven-assembly-plugin to 3.6.0
- ORC-1425: Bump checkstyle to 10.11.0
- ORC-1427: Use Hadoop 3.3.5 in `tools` module
- ORC-1429: Upgrade Maven to 3.8.8
- ORC-1430: Use Hadoop 3.3.5 shaded clients
- ORC-1431: Use parquet 1.13.1 in bench module
- ORC-1437: Bump checkstyle to 10.12.0
- ORC-1438: Bump auto-service to 1.1.0
- ORC-1439: Bump guava to 32.0.0-jre
- ORC-1442: Update guava to 32.0.1
- ORC-1445: Bump snappy-java to 1.1.10.1 in `bench` module
- ORC-1448: Bump auto-service to 1.1.1
- ORC-1456: Update Hadoop to 3.3.6
- ORC-1466: Bump junit to 5.10.0
- ORC-1467: Upgrade `commons-lang3` to 3.13.0
- ORC-1468: Bump `opencsv` to 5.8
- ORC-1469: Update guava to 32.1.2
- ORC-1470: Update maven-shade-plugin to 3.5.0
- ORC-1493: Bump `byte-buddy` to 1.14.6
- ORC-1502: Upgrade Maven to 3.9.4
- ORC-1508: Upgrade slf4j to 2.0.9
- ORC-1513: Upgrade `snappy` to 1.1.10.4
- ORC-1514: Remove zookeeper runtime dependency
- ORC-1517: Bump snappy-java to 1.1.10.5 in bench module
- ORC-1521: Bump com.google.guava:guava to 32.1.3-jre
- ORC-1522: Bump commons-cli:commons-cli to 1.6.0
- ORC-1523: Bump `maven-checkstyle-plugin` to 3.3.1
- ORC-1524: Bump `maven-shade-plugin` to 3.5.1
- ORC-1526: Bump spotbugs-maven-plugin to 4.8.1.0
- ORC-1527: Bump `junit` to 5.10.1
- ORC-1533: Upgrade `commons-lang3` to 3.14.0
- ORC-1534: Upgrade `build-helper-maven-plugin` to 3.5.0
- ORC-1537: Unpin and upgrade `spotless` to 2.41.0
- ORC-1538: Unpin and upgrade `maven-dependency-plugin` to 3.6.1
- ORC-1543: Bump `spotless-maven-plugin` to 2.41.1
- ORC-1544: Unpin and upgrade `protobuf-java` to 3.25.1
- ORC-1550: Upgrade `Maven` to 3.9.6
- ORC-1562: Bump `com.google.guava:guava` to 33.0.0-jre
- ORC-1565: Bump slf4j.version to 2.0.10
- ORC-1566: Make Brotli dependency optional
- ORC-1576: Upgrade spark.jackson.version to 2.15.2 in bench module
- ORC-1581: Bump `slf4j.version` to 2.0.11
- ORC-1582: Bump `protobuf-java` to 3.25.2
- ORC-1605: Upgrade `brotli4j` to 1.16.0
- ORC-1616: Upgrade `aircompressor` to 0.26
- ORC-1624: Upgrade Spark to 3.5.1
- ORC-1626: Upgrade `Mockito` to 5.10 and `byte-buddy` to `1.14.11`
- ORC-1627: Unpin `scala-library`
- ORC-1628: Bump `protobuf-java` to 3.25.3
Documentation:
- ORC-994: Fix javadoc so that it doesn’t put files into the source tree
- ORC-1471: Updated README.md to use maven 3.8.8
- ORC-1491: Update Python documentation with PyArrow 13.0.0 and Dask 2023.8.1
- ORC-1503: Update README.md to use maven 3.9.4
- ORC-1552: Update README.md to use maven 3.9.6
- ORC-1564: Add Java ORC configuration documentation
- ORC-1584: Remove README about Proto subdirectory
- ORC-1587: Fix usage command of SparkBenchmark document
- ORC-1599: Add zstd compression level and windowlog in Java configuration documentation
- ORC-1612: Document available encodings at `orc.compress`
- ORC-1625: Switch to oraclelinux9 from rocky9 in `README`