lesnomaui.blogg.se

Dlow shuffle part 1 download

The data is downloaded as raw CSV files, which Spark can load easily. If you download 10+ years of data from the Bureau of Transportation Statistics (that is, 120+ one-month CSV files from the site), the files collectively represent 30+ GB of data. For a commercial-scale Spark cluster, 30 GB of text data is trivial. However, if you are running Spark on the ODROID XU4 cluster or in local mode on a Mac laptop, 30+ GB of text data is substantial.
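To get a feel for how much data has actually been downloaded before pointing Spark at it, the files can be tallied with a few lines of plain Python. This is only a rough sketch: the function names and the idea of keeping all monthly files in one directory are assumptions made for illustration, not something the original post prescribes.

```python
from pathlib import Path


def summarize_csv_dir(data_dir):
    """Return (file_count, total_bytes) for the CSV files directly inside data_dir.

    Only files whose names end in lowercase ".csv" are counted, and
    subdirectories are not searched.
    """
    files = sorted(Path(data_dir).glob("*.csv"))
    total_bytes = sum(f.stat().st_size for f in files)
    return len(files), total_bytes


def average_file_size_gb(total_gb, file_count):
    """Average size per file in GB for a given total and number of files."""
    return total_gb / file_count


# Using the figures from the text: roughly 30 GB spread over 120 monthly files.
print(average_file_size_gb(30, 120))
```

With the numbers quoted above, each monthly file averages about a quarter of a gigabyte. In PySpark, a directory of such files can typically be read in one call, for example `spark.read.csv(data_dir, header=True)`, assuming every file carries a header row.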

This data analysis project explores what insights can be derived from the Airline On-Time Performance data set collected by the United States Department of Transportation. The data can be downloaded in one-month chunks from the Bureau of Transportation Statistics website.

UPDATE – I have a more modern version of this post with larger data sets available here.











