The ORC team is excited to announce the release of ORC v1.8.0.
- Released: 3 September 2022
- Source code: orc-1.8.0.tar.gz
- GPG Signature signed by William Hyun (DECDFA29)
- Git tag: rel/release-1.8.0
- Maven Central: ORC 1.8.0
- SHA 256: 859d78bfded98405…
- Fixed issues: ORC-1.8.0
New Feature and Notable Changes:
- ORC-450 Support selecting list indices without materializing list items
- ORC-824 Add column statistics for List and Map
- ORC-1004 Java ORC writer supports the selection vector
- ORC-1075 Support reading ORC files with no column statistics
- ORC-1125 Support decoding decimals in RLE
- ORC-1136 Optimize reads by combining multiple reads without significant separation into a single read
- ORC-1138 Seek vs Read Optimization
- ORC-1172 Add row count limit config for one stripe
- ORC-1212 Upgrade protobuf-java to 3.17.3
- ORC-1220 Set min.hadoop.version to 2.7.3
- ORC-1248 Redefine Hadoop dependency for Apache ORC 1.8.0
- ORC-1256 Publish test-jar to maven central
- ORC-1260 Publish shaded-protobuf classifier artifacts
Improvements:
- ORC-825 Use Empty Array For Collections toArray
- ORC-826 Do Not Use Collection Contains/Get
- ORC-828 Improve Fetch Data Set Process
- ORC-829 Optimize Serialization percentileBits
- ORC-831 Do Not Copy String When Flushing Dictionary
- ORC-833 RunLengthIntegerReaderV2 Calculate Batch Size Once
- ORC-834 Do Not Convert to String in DecimalFromTimestampTreeReader
- ORC-835 Cache TRUE/FALSE Bytes in StringGroupFromBooleanTreeReader
- ORC-836 StringGroupFromDoubleTreeReader Use Double toString
- ORC-837 Reuse HiveDecimalWritable in ConvertTreeReaderFactory
- ORC-838 Simplify compareTo/equals/putBuffer of ByteBufferAllocatorPool
- ORC-840 Remove Superfluous Array Fill in RecordReaderImpl
- ORC-841 Remove Superfluous Array Fill in StringHashTableDictionary
- ORC-842 Remove newKey from StringHashTableDictionary
- ORC-844 Improve hashCode Methods
- ORC-847 Do Not Create Empty Array in StringGroupFromBinaryTreeReader
- ORC-852 Allow DynamicByteArray to Return a ByteBuffer
- ORC-853 Optimize writeDouble Implementation
- ORC-855 Remove Unused isRepeating from RunLengthIntegerReaderV2
- ORC-865 Bump opencsv from 3.9 to 5.5.1
- ORC-883 Dependency Audit and QA
- ORC-897 optimization loop termination condition in readerIsCompatible method
- ORC-935 Bump commons-csv from 1.8 to 1.9.0
- ORC-937 Replace deprecated method
- ORC-958 Convert command support overwrite option
- ORC-969 Evaluate SearchArguments using file and stripe level stats
- ORC-975 Avoid double counting closestFixedBits in percentileBits method
- ORC-982 Extract checkstyle to a single file, help newcomers check code style
- ORC-988 Bump opencsv from 5.5.1 to 5.5.2
- ORC-992 Reached max repeat length, we can directly decide to use DELTA encoding
- ORC-1005 Make that the java and C++ implementations of determineEncoding in RunLengthIntegerWriterV2 are consistent.
- ORC-1007 Fix a warning from the shade plugin
- ORC-1013 Renaming a parameter in constructors of TreeWriter’s derived classes
- ORC-1014 Add details when we get IOExceptions from file system
- ORC-1020 Improve orc::RleDecoderV2::nextDirect
- ORC-1027 Filter processing to allow filter injections that cannot be represented via SArgs
- ORC-1047 Handle quoted field names during string schema parsing
- ORC-1077 Remove commons-codec dependency and use java.util.Base64
- ORC-1099 Extend ReadIntent to support MAP and UNION type
- ORC-1101 Improve malformed STRUCT handling
- ORC-1122 Add buffer to decode the whole run in RleDecoderV2
- ORC-1137 Improve float/double conversion in DoubleColumnReader::next()
- ORC-1149 Bump slf4j.version to 1.7.36
- ORC-1150 Improve RowReaderImpl::computeBatchSize()
- ORC-1152 Support encoding short decimals in RLEv2
- ORC-1156 Update opencsv to 5.6
- ORC-1163 Bump zookeeper from 3.7.0 to 3.8.0
- ORC-1169 Use Hadoop 3.3.2 on Java 17+
- ORC-1178 Use hadoop 3.3.3 on Java 17+