
Java 8 vs. Scala: Part I – DZone Performance

Source: Java 8 vs. Scala: Part I – DZone Performance

How to learn Scala – Codacy | Blog

A list of resources for getting started with Scala. Become a (better) Scala developer with a list of resources, tutorials and best practices.

Source: How to learn Scala – Codacy | Blog

Atom Packages for WordPress developers — WPism

This set of resources is fundamental for whoever is starting to develop for WordPress and has decided to make use of the hackable IDE. Here are some of the most appealing WordPress Atom packages, which make development easier with support for functions, filters, and action hooks.

Source: Atom Packages for WordPress developers — WPism

10 Best Eclipse Shortcuts – DZone Java

Looking for the best Eclipse shortcuts? Here are the top 10. These are for all Eclipse aficionados who want to use their favourite IDE at its best.

Source: 10 Best Eclipse Shortcuts – DZone Java

Apache Spark Bitesize: What is RDD

As the title suggests, this is a quick post clarifying the core abstraction in Apache Spark: the RDD, or Resilient Distributed Dataset. The RDD is the fundamental data structure in Spark: an immutable, distributed collection of elements. In simple terms, it is the way Spark represents a set of data spread across multiple machines. As per the formal definition:

RDDs are fault-tolerant, parallel data structures that let users explicitly persist intermediate results in memory, control their partitioning to optimize data placement, and manipulate them using a rich set of operators.
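To make the definition more concrete, here is a minimal sketch (Scala API) that touches each property it names: persisting an intermediate result, controlling partitioning, and applying operators. The object name, application name, local master setting and partition count are assumptions chosen for illustration; they do not come from the definition itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical object name, used only for this sketch.
object RddPropertiesSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two worker threads instead of on a real cluster.
    val conf = new SparkConf().setAppName("rdd-properties-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val numbers = sc.parallelize(1 to 10)

    // Operators: a transformation returns a new RDD; `numbers` is never modified.
    val squares = numbers.map(n => n * n)

    // Persistence: ask Spark to keep this intermediate result in memory for reuse.
    squares.persist(StorageLevel.MEMORY_ONLY)

    // Partitioning: redistribute the data across 4 partitions (arbitrary choice).
    val repartitioned = squares.repartition(4)

    // Actions trigger the actual computation.
    println(repartitioned.partitions.length)
    println(squares.reduce(_ + _))

    sc.stop()
  }
}
```

Because RDDs are immutable, each step yields a new RDD, and Spark can recompute a lost partition from the steps that produced it.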

It is possible to create RDDs in two different ways: 1) by calling the parallelize method of the SparkContext class (JavaSparkContext in the Java API) in the driver program; 2) by referencing a dataset which resides on an external storage system.

Here is an example of how to create a parallelised collection via the Scala API:

val myCollection = List(1,3,6,8,9)
val myDistributedCollection = sc.parallelize(myCollection)
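The snippet above assumes that sc already exists, as it does in the Spark shell. In a standalone application the context has to be created first. The following is a rough sketch of one way to do that; the object name, application name, local master setting and the partition count of 2 are assumptions for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name, used only for this sketch.
object ParallelizeSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, using as many worker threads as there are cores.
    val conf = new SparkConf().setAppName("parallelize-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val myCollection = List(1, 3, 6, 8, 9)

    // The optional second argument sets how many partitions the data is split into.
    val myDistributedCollection = sc.parallelize(myCollection, 2)

    // collect() is an action that brings the elements back to the driver.
    println(myDistributedCollection.collect().mkString(", "))

    sc.stop()
  }
}
```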

This, instead, is an example of how to reference external datasets (Scala API):

val distFile = sc.textFile("myFile.csv")
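With textFile, each element of the resulting RDD is one line of the file. As a hedged sketch, the example below reads the same file and splits each line on commas. It assumes that myFile.csv exists in the working directory and is a simple comma-separated file with no quoted fields; the object name and the local master setting are also assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name, used only for this sketch.
object TextFileSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode; on a cluster the path would typically point to shared storage.
    val conf = new SparkConf().setAppName("textfile-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Each element of distFile is a single line of text from the file.
    val distFile = sc.textFile("myFile.csv")

    // Naive CSV parsing: split on commas (does not handle quoted fields).
    val rows = distFile.map(line => line.split(","))

    // Nothing is read until an action such as count() or take() runs.
    println(distFile.count())
    rows.take(3).foreach(fields => println(fields.mkString(" | ")))

    sc.stop()
  }
}
```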

In the next Apache Spark Bitesize, I will be covering RDD operations: transformations and actions.

Further resources:

  • M. Zaharia et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, 2012. Available at:
  • Apache Spark – Quick Start:
  • SparkHub:


