9 - **Learning Spark**
Learning Spark: Lightning-Fast Data Analytics (2nd edition)
by Jules S. Damji, Brooke Wenig, Tathagata Das, and Denny Lee (2020)
GitHub: https://github.com/databricks/LearningSparkV2
Discord channel: #spark
Discord server: Join the Los Gatos Reading Group Discord Server!
Ad-Hoc Meetups
Sign up here or via Discord.
Interested in doing an ad-hoc meetup |
---|
Casey |
Amish |
Status on reading
Chapters | Casey | Amish | |
---|---|---|---|
1-3 | | | |
4 | | | |
5 | | | |
6 | | | |
7 | | | |
8 | | | |
9 | | | |
10 | | | |
11 | | | |
12 | | | |
Notes on reading
Chapters | Notes |
---|---|
3 | To convert DBC files to ipynb (Jupyter notebook) files, Amish found the following solution via a Google search during our meetup: https://davewentzel.com/content/ConvertDatabricksDBCtoipynb/ It has not been tested yet, but it looks promising. The book recommends signing up for the Databricks Community Edition, but Patrick and I found that the website now forces you into a 14-day free trial instead. Hopefully this solution will let us convert the DBC file in the book’s code repo so that we can work with the Jupyter notebooks independently. |
June 28, 2022 - Given the nature of the book's material, we have decided not to hold weekly meetups on it. Instead, we will try the ad-hoc meetup approach described in the Los Gatos Reading Group Library. Participants will mark their progress in the book here and occasionally call for an ad-hoc meetup, either here or via Discord.
Chapter 3 (June 21, 2022) | Presenter |
---|---|
Spark: What’s Underneath an RDD? | Casey, Amish, Patrick (discussion of the chapter as a whole) |
Chapters 1 and 2 (June 7, 2022) | Presenter |
---|---|
The Genesis of Spark | Casey |
Unified Analytics | Amish |
Step 1: Downloading Apache Spark | Amish |
Transformations, Actions, and Lazy Evaluations | Casey |
Your First Standalone Application | Casey |