Practical Parquet Engineering

ebook Definitive Reference for Developers and Engineers

By Richard Johnson

cover image of Practical Parquet Engineering

Sign up to save your library

With an OverDrive account, you can save your favorite libraries for at-a-glance information about availability. Find out more about OverDrive accounts.

   Not today

Find this title in Libby, the library reading app by OverDrive.

Download Libby on the App Store Download Libby on Google Play

Search for a digital library with this title

Title found at these libraries:

Library Name Distance
Loading...

"Practical Parquet Engineering"
"Practical Parquet Engineering" is an authoritative and comprehensive guide to mastering the design, implementation, and optimization of Apache Parquet, the industry-standard columnar storage format for big data analytics. Beginning with the architectural fundamentals, the book elucidates Parquet's design philosophy and core principles, providing a nuanced understanding of its logical and physical models. Readers will benefit from in-depth comparisons to alternative formats like ORC and Avro, along with explorations of schema evolution, metadata management, and the unique benefits of self-describing storage—making this an essential reference for anyone seeking to build resilient and efficient data infrastructure.
Moving from theory to hands-on application, the book offers actionable best practices for both writing and querying Parquet at scale. Topics such as file construction, encoding strategies, compression, and partitioning are addressed with precision, alongside nuanced guidance for language-specific implementations and optimizing data pipelines in distributed and cloud environments. Advanced chapters cover real-world performance tuning, including benchmarking, profiling, cache strategies, and troubleshooting complex bottlenecks in production. Readers will also learn how to leverage Parquet's rich metadata and statistics for query acceleration, and how to integrate seamlessly with modern analytics frameworks like Spark, Presto, and Hive.
Addressing emerging requirements around security, compliance, and data quality, "Practical Parquet Engineering" goes beyond functionality to cover data governance, encryption, access control, and regulatory mandates like GDPR and HIPAA. Dedicated chapters on validation, testing, and quality management socialize industry-strength patterns for ensuring correctness and resilience. The book culminates in advanced topics, custom engineering extensions, and a diverse suite of case studies from enterprise data lakes, global analytics, IoT, and hybrid-cloud architectures, making it an indispensable resource for data engineers, architects, and technical leaders aiming to future-proof their data platforms with Parquet.

Practical Parquet Engineering