Class Utf8Utils

java.lang.Object
org.apache.orc.impl.Utf8Utils

public final class Utf8Utils extends Object
  • Constructor Summary

    Constructors
    Constructor
    Description
    Utf8Utils()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static int
    charLength(byte[] data, int offset, int length)
     
    static int
    findLastCharacter(byte[] text, int from, int until)
    Find the start of the last character that ends in the current string.
    static int
    getCodePoint(byte[] source, int from, int len)
    Get the code point at a given location in the byte array.
    static boolean
    isUtfStartByte(byte b)
    Checks if b is the first byte of a UTF-8 character.
    static int
    truncateBytesTo(int maxCharLength, byte[] data, int offset, int length)
    Return the number of bytes required to read at most maxCharLength characters in full from a UTF-8 encoded byte array provided by data[offset:offset+length].

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • Utf8Utils

      public Utf8Utils()
  • Method Details

    • charLength

      public static int charLength(byte[] data, int offset, int length)
    • truncateBytesTo

      public static int truncateBytesTo(int maxCharLength, byte[] data, int offset, int length)
      Return the number of bytes required to read at most maxCharLength characters in full from a UTF-8 encoded byte array provided by data[offset:offset+length]. This method does not validate the data, but operates correctly on data that is already valid UTF-8.
      Parameters:
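      maxCharLength - the maximum number of characters to read
      data - the UTF-8 encoded bytes
      offset - the index in data of the first byte to consider
      length - the number of bytes to consider, starting at offset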
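The truncation described above can be sketched as a single pass that counts character start bytes. This is an illustrative re-implementation under stated assumptions, not the ORC source: the class `TruncateSketch` and the helper `isStartByte` are hypothetical names, and a "character" is assumed to mean one UTF-8 encoded code point.

```java
import java.nio.charset.StandardCharsets;

final class TruncateSketch {
    // Assumption: a byte starts a character unless it is a continuation byte (10xxxxxx).
    static boolean isStartByte(byte b) {
        return (b & 0xC0) != 0x80;
    }

    // Hypothetical sketch: counts start bytes and stops at the first
    // character that would exceed maxCharLength.
    static int truncateBytesTo(int maxCharLength, byte[] data, int offset, int length) {
        int chars = 0;
        for (int i = 0; i < length; i++) {
            if (isStartByte(data[offset + i])) {
                chars++;
                if (chars > maxCharLength) {
                    return i; // byte i begins the first character that does not fit
                }
            }
        }
        return length; // every character fits
    }

    public static void main(String[] args) {
        // "h\u00e9llo" is 5 characters but 6 bytes, because U+00E9 takes 2 bytes.
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        int n = truncateBytesTo(2, data, 0, data.length);
        // n == 3: 'h' (1 byte) plus U+00E9 (2 bytes)
        System.out.println(n);
    }
}
```

Because the result is a byte count measured from offset, the truncated text is data[offset:offset+n] and never ends in the middle of a multi-byte character, provided the input is valid UTF-8.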
    • isUtfStartByte

      public static boolean isUtfStartByte(byte b)
      Checks if b is the first byte of a UTF-8 character.
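One common way to perform this check relies on the UTF-8 rule that continuation bytes always have the bit pattern 10xxxxxx. The sketch below assumes that approach; the class and method names are hypothetical, and the page itself does not state how ORC implements the test.

```java
final class StartByteSketch {
    // Assumption: any byte whose top two bits are not "10" begins a character.
    // This covers ASCII bytes (0xxxxxxx) and multi-byte lead bytes (11xxxxxx).
    static boolean isStartByte(byte b) {
        return (b & 0xC0) != 0x80;
    }

    public static void main(String[] args) {
        System.out.println(isStartByte((byte) 0x41)); // 'A', a single-byte character
        System.out.println(isStartByte((byte) 0xC3)); // lead byte of a 2-byte sequence
        System.out.println(isStartByte((byte) 0xA9)); // continuation byte
    }
}
```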
    • findLastCharacter

      public static int findLastCharacter(byte[] text, int from, int until)
      Find the start of the last character that ends in the current string.
      Parameters:
      text - the bytes of the UTF-8 string
      from - the first byte location
      until - the last byte location
      Returns:
      the index of the first byte of the last character
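A backward scan is one way to locate that start byte. The following is a minimal sketch, not the ORC implementation. It assumes that until is an inclusive index, as "the last byte location" suggests, and that a range containing no start byte is reported with an exception. The class name `LastCharSketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class LastCharSketch {
    // Hypothetical sketch: walk backward from 'until' (assumed inclusive)
    // until a non-continuation byte is found.
    static int findLastCharacter(byte[] text, int from, int until) {
        for (int pos = until; pos >= from; pos--) {
            if ((text[pos] & 0xC0) != 0x80) {
                return pos;
            }
        }
        // Assumption: a range with no start byte is treated as an error.
        throw new IllegalArgumentException("no character start found in range");
    }

    public static void main(String[] args) {
        // 'a' is 1 byte (index 0); U+20AC is 3 bytes (indexes 1 to 3).
        byte[] text = "a\u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(findLastCharacter(text, 0, text.length - 1)); // 1
    }
}
```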
    • getCodePoint

      public static int getCodePoint(byte[] source, int from, int len)
      Get the code point at a given location in the byte array.
      Parameters:
      source - the bytes of the string
      from - the offset to start at
      len - the number of bytes in the character
      Returns:
      the code point
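Decoding a code point from a known byte count follows directly from the UTF-8 bit layout: the lead byte contributes its low payload bits and each continuation byte contributes six more. The sketch below assumes that standard layout and sequence lengths of 1 to 4 bytes. It is a hypothetical illustration (class `CodePointSketch`), not the ORC source, and like the documented method it does not validate its input.

```java
import java.nio.charset.StandardCharsets;

final class CodePointSketch {
    // Hypothetical sketch: 'len' is the number of bytes in the character at 'from'.
    static int getCodePoint(byte[] source, int from, int len) {
        switch (len) {
            case 1: // 0xxxxxxx
                return source[from] & 0x7F;
            case 2: // 110xxxxx 10xxxxxx
                return ((source[from] & 0x1F) << 6)
                        | (source[from + 1] & 0x3F);
            case 3: // 1110xxxx 10xxxxxx 10xxxxxx
                return ((source[from] & 0x0F) << 12)
                        | ((source[from + 1] & 0x3F) << 6)
                        | (source[from + 2] & 0x3F);
            case 4: // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                return ((source[from] & 0x07) << 18)
                        | ((source[from + 1] & 0x3F) << 12)
                        | ((source[from + 2] & 0x3F) << 6)
                        | (source[from + 3] & 0x3F);
            default:
                // Assumption: other lengths are rejected.
                throw new IllegalArgumentException("unsupported length: " + len);
        }
    }

    public static void main(String[] args) {
        byte[] euro = "\u20ac".getBytes(StandardCharsets.UTF_8); // U+20AC, 3 bytes
        System.out.println(Integer.toHexString(getCodePoint(euro, 0, euro.length))); // 20ac
    }
}
```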