Yup, these are all reasons to prefer column orientation over row orientation for analytics workloads. In my opinion data locality trumps everything, but compression and fast transmission are definitely very nice too.
Until recently, pandas was built on NumPy, whose arrays are row-major by default, and this was a major bottleneck. A lot of pandas's strange API is apparently due to working around row orientation. See e.g. this article by Wes McKinney, creator of pandas: https://wesmckinney.com/blog/apache-arrow-pandas-internals/#:~:text=Arrow’s%20C%2B%2B%20implementation%20provides%20essential,optimized%20for%20analytical%20processing%20performance
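
To make the locality point concrete, here's a rough sketch (my own toy example, not from the article) that compares how NumPy lays out the same 2D table in row-major vs column-major order. The table shape and the variable names are made up for illustration.

```python
import numpy as np

# Hypothetical table: 1,000 rows, 4 float64 columns.
rows, cols = 1_000, 4
row_major = np.zeros((rows, cols), order="C")  # NumPy's default layout
col_major = np.asfortranarray(row_major)       # same values, column-contiguous

# Strides = bytes to step to the next element along (row axis, column axis).
print(row_major.strides)  # (32, 8): one column's values sit 32 bytes apart
print(col_major.strides)  # (8, 8000): one column's values are adjacent

# Pulling out one column: strided view vs contiguous view.
print(row_major[:, 0].flags["C_CONTIGUOUS"])  # False
print(col_major[:, 0].flags["C_CONTIGUOUS"])  # True
```

Scanning a single column in the row-major array touches every row's bytes, while in the column-major array it reads one contiguous run, which is the access pattern analytics queries tend to want.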