Dask
How to install
Dask also supports Apache ORC.
pip3 install "dask[dataframe]==2023.8.1"
pip3 install pandas pyarrow
How to write and read an ORC file
In [1]: import pandas as pd
In [2]: import dask.dataframe as dd
In [3]: pf = pd.DataFrame(data={"col1": [1, 2, 3], "col2": ["a", "b", None]})
In [4]: dd.to_orc(dd.from_pandas(pf, npartitions=2), path="/tmp/orc")
Out[4]: (None, None)
In [5]: dd.read_orc(path="/tmp/orc").compute()
Out[5]:
   col1  col2
0     1     a
1     2     b
0     3  <NA>
In [6]: dd.read_orc(path="/tmp/orc", columns=["col1"]).compute()
Out[6]:
   col1
0     1
1     2
0     3
The 10 Minutes to Dask page provides a short overview.