High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark

$49.99

Author: Karau, Holden
Binding: Paperback
Page Count: 356
Publish Date: July 11, 2017
ISBN10: 1491943203
Language: English

SKU / Item #: 174713
Categories: Books, Computer Systems/Programming, Data Science - Data Analytics
Tags: Computers, D:R, Data Analytics, Data Base Management, Data Science, Data Warehousing, Holden Karau, O'Reilly Media, Open Source, Paperback, Programming, PUB201707

By: Karau, Holden

Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources.

Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you’ll also learn how to make it sing.

With this book, you’ll explore:

How Spark SQL’s new interfaces improve performance over Spark’s RDD data structure
The choice between data joins in Core Spark and Spark SQL
Techniques for getting the most out of standard RDD transformations
How to work around performance issues in Spark’s key/value pair paradigm
Writing high-performance Spark code without Scala or the JVM
How to test for functionality and performance when applying suggested improvements
Using Spark MLlib and Spark ML machine learning libraries
Spark’s Streaming components and external community packages

Authors: Holden Karau, Rachel Warren
Binding Type: Paperback
Publisher: O’Reilly Media
Published: 07/11/2017
Pages: 356
Weight: 1.2 lbs
Size: 9.20h x 7.00w x 0.70d
ISBN: 9781491943205
Language: English

Related products

An A-Z Guide to Healing Foods: A Shopper’s Reference

Data Science: The Ultimate Guide to Data Analytics, Data Mining, Data Warehousing, Data Visualization, Regression Analysis, Database

Deep Economy: The Wealth of Communities and the Durable Future

Becoming a Data Head: How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning

Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python