Package org.apache.orc.impl
Class Utf8Utils
java.lang.Object
org.apache.orc.impl.Utf8Utils
Constructor Summary
Constructor
Utf8Utils()
Method Summary
Modifier and Type   Method
                    Description
static int          charLength(byte[] data, int offset, int length)
static int          findLastCharacter(byte[] text, int from, int until)
                    Find the start of the last character that ends in the current string.
static int          getCodePoint(byte[] source, int from, int len)
                    Get the code point at a given location in the byte array.
static boolean      isUtfStartByte(byte b)
                    Checks if b is the first byte of a UTF-8 character.
static int          truncateBytesTo(int maxCharLength, byte[] data, int offset, int length)
                    Return the number of bytes required to read at most maxCharLength characters in full from a UTF-8 encoded byte array given by data[offset:offset+length].
Constructor Details
Utf8Utils
public Utf8Utils()
Method Details
charLength
public static int charLength(byte[] data, int offset, int length)
truncateBytesTo
public static int truncateBytesTo(int maxCharLength, byte[] data, int offset, int length)
Return the number of bytes required to read at most maxCharLength characters in full from a UTF-8 encoded byte array given by data[offset:offset+length]. This method does not validate the UTF-8 data; it gives correct results for input that is already valid UTF-8.
Parameters:
maxCharLength - the maximum number of characters to read
data - the UTF-8 encoded bytes
offset - the index of the first byte to consider
length - the number of bytes to consider
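The byte count described for truncateBytesTo can be illustrated with a small, self-contained sketch. This is not ORC's implementation: the class Utf8TruncateSketch and the method truncateBytes are hypothetical names, and the sketch assumes the input is valid UTF-8, where every character begins with a byte that does not have the bit pattern 10xxxxxx.

```java
import java.nio.charset.StandardCharsets;

class Utf8TruncateSketch {
    // Hypothetical stand-in for the documented behavior: returns how many bytes
    // of data[offset .. offset+length) hold at most maxChars complete characters.
    // Assumes the bytes are already valid UTF-8; nothing is validated here.
    static int truncateBytes(int maxChars, byte[] data, int offset, int length) {
        int chars = 0;
        for (int i = 0; i < length; i++) {
            // A byte starts a character unless it is a continuation byte (10xxxxxx).
            boolean startsChar = (data[offset + i] & 0xC0) != 0x80;
            if (startsChar) {
                if (chars == maxChars) {
                    return i; // the next character would exceed the limit
                }
                chars++;
            }
        }
        return length; // everything fits
    }

    public static void main(String[] args) {
        // "h", "e with acute accent" (2 bytes), "l", "l", "o": 6 bytes, 5 characters.
        byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(truncateBytes(2, bytes, 0, bytes.length));  // 3
        System.out.println(truncateBytes(10, bytes, 0, bytes.length)); // 6
    }
}
```

In this sketch the result is a byte count relative to offset, so a caller would copy or read data[offset .. offset+result) to obtain the truncated text.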
isUtfStartByte
public static boolean isUtfStartByte(byte b)
Checks if b is the first byte of a UTF-8 character.
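For readers unfamiliar with the encoding: in UTF-8, every byte after the first byte of a multi-byte character has the bit pattern 10xxxxxx, so a start byte is any byte that does not match that pattern. The sketch below shows one way such a check could be written; Utf8StartByteSketch and isStartByte are hypothetical names, not part of this class.

```java
import java.nio.charset.StandardCharsets;

class Utf8StartByteSketch {
    // Hypothetical helper: true when b is not a continuation byte (10xxxxxx).
    // Masking with 0xC0 keeps only the top two bits of the byte.
    static boolean isStartByte(byte b) {
        return (b & 0xC0) != 0x80;
    }

    public static void main(String[] args) {
        // U+00E9 encodes as the two bytes 0xC3 0xA9.
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(isStartByte(bytes[0])); // true
        System.out.println(isStartByte(bytes[1])); // false
    }
}
```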
findLastCharacter
public static int findLastCharacter(byte[] text, int from, int until)
Find the start of the last character that ends in the current string.
Parameters:
text - the UTF-8 bytes of the string
from - the first byte location
until - the last byte location
Returns:
the index of the start of the last character
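A backward scan is one natural way to locate the start of the last character. The sketch below is an illustration under assumptions rather than the library's code: it treats until as an inclusive index, and it throws IllegalArgumentException when no start byte is found, which is a choice made for this example. The names Utf8LastCharSketch and lastCharStart are hypothetical.

```java
class Utf8LastCharSketch {
    // Hypothetical helper: walks backwards from 'until' (assumed inclusive)
    // down to 'from' and returns the index of the first start byte it meets.
    static int lastCharStart(byte[] text, int from, int until) {
        for (int pos = until; pos >= from; pos--) {
            // Continuation bytes look like 10xxxxxx; anything else starts a character.
            if ((text[pos] & 0xC0) != 0x80) {
                return pos;
            }
        }
        // Behavior chosen for this sketch when the range holds only continuation bytes.
        throw new IllegalArgumentException("no character start in range");
    }

    public static void main(String[] args) {
        // 'a', 'b', then U+20AC encoded as 0xE2 0x82 0xAC.
        byte[] text = {(byte) 'a', (byte) 'b', (byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        System.out.println(lastCharStart(text, 0, 3)); // 2
        System.out.println(lastCharStart(text, 0, 1)); // 1
    }
}
```

A typical use is cutting a byte buffer at a size limit without splitting a multi-byte character: scan back from the limit and cut at the returned index.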
getCodePoint
public static int getCodePoint(byte[] source, int from, int len)
Get the code point at a given location in the byte array.
Parameters:
source - the bytes of the string
from - the offset to start at
len - the number of bytes in the character
Returns:
the code point
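To make the relationship between the byte length and the code point concrete, here is a minimal decoding sketch. It follows the standard UTF-8 layout (the lead byte carries the high bits and each continuation byte contributes six more) and is not taken from ORC; Utf8CodePointSketch and decode are hypothetical names, and rejecting lengths outside 1 to 4 is an assumption of this example.

```java
class Utf8CodePointSketch {
    // Hypothetical helper: decodes the character of 'len' bytes starting at 'from'.
    // Assumes valid UTF-8 and that len matches the actual encoded length.
    static int decode(byte[] source, int from, int len) {
        if (len < 1 || len > 4) {
            throw new IllegalArgumentException("unsupported length: " + len);
        }
        if (len == 1) {
            return source[from] & 0x7F; // single byte: 0xxxxxxx
        }
        // The lead byte has 'len' one-bits and a zero; keep the remaining low bits.
        int codePoint = source[from] & (0xFF >> (len + 1));
        for (int i = 1; i < len; i++) {
            // Each continuation byte (10xxxxxx) adds its low six bits.
            codePoint = (codePoint << 6) | (source[from + i] & 0x3F);
        }
        return codePoint;
    }

    public static void main(String[] args) {
        byte[] euro = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC}; // U+20AC
        System.out.println(Integer.toHexString(decode(euro, 0, 3))); // 20ac
    }
}
```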