9 - ** Learning Spark **

Learning Spark:
Lightning-Fast Data Analytics (2nd edition)
by Jules S. Damji, Brooke Wenig, Tathagata Das, and Denny Lee (2020)



https://learning.oreilly.com/library/view/learning-spark-2nd/9781492050032/

GitHub: https://github.com/databricks/LearningSparkV2

Discord channel: #spark
Discord server: https://bit.ly/35RhGXM

Ad-Hoc Meetups

Sign up here or via Discord.

Interested in doing ad-hoc meetup

 Casey

 Amish

Status on reading

Chapters    Casey    Amish
1-3
4
5
6
7
8
9
10
11
12


Notes on reading

Chapters

Notes

3

To convert DBC files to ipynb (Jupyter notebook) files, Amish found the following solution via a Google search during our meetup: https://davewentzel.com/content/ConvertDatabricksDBCtoipynb/ It has not been tested yet, but it looks promising. The book recommends signing up for the Databricks Community Edition, but Patrick and I found that the website now forces you into a 14-day free trial instead. Hopefully this conversion will let us convert the DBC file in the book’s code repo so that we can work with the Jupyter notebooks independently.
Upon further investigation, this conversion requires a Databricks personal access token, which in turn requires signing up for the Databricks service.

June 28, 2022 - Given the nature of the book's material, we have decided not to hold weekly meetups on it. Instead, we will try the ad-hoc meetup approach described here: Los Gatos Reading Group Library. Participants will share their progress in the book with each other and occasionally call for an ad-hoc meetup, either here or via Discord.

Chapter 3
Apache Spark’s Structured APIs

June 21, 2022

Presenter

Spark: What’s Underneath an RDD?
Structuring Spark
The DataFrame API
- Spark’s Basic Data Types
- Spark’s Structured and Complex Data Types
- Schemas and Creating DataFrames
- Columns and Expressions
- Rows
- Common DataFrame Operations
- End-to-End DataFrame Example
The Dataset API
DataFrames Versus Datasets
Spark SQL and the Underlying Engine

Casey, Amish, Patrick (discussion of the chapter as a whole)


Chapters 1 and 2
Introduction to Apache Spark
Downloading Apache Spark and Getting Started

June 7, 2022

Presenter

The Genesis of Spark
What Is Apache Spark?

Casey

Unified Analytics
The Developer’s Experience

Amish

Step 1: Downloading Apache Spark
Step 2: Using the Scala or PySpark Shell
Step 3: Understanding Spark Application Concepts

Amish

Transformations, Actions, and Lazy Evaluation
The Spark UI

Casey

Your First Standalone Application

Casey
