Apache Spark Bitesize: What is RDD

As the title suggests, this is meant to be a quick post clarifying the core abstraction in Apache Spark: the RDD, or Resilient Distributed Dataset. The RDD is the fundamental data structure in Spark: an immutable, distributed collection of elements. In simple terms, it is the way Spark represents a set of data that is spread across multiple machines. As per the formal definition:

RDDs are fault-tolerant, parallel data structures that let users explicitly persist intermediate results in memory, control their partitioning to optimize data placement, and manipulate them using a rich set of operators.

It is possible to create RDDs in two different ways: 1) by calling the parallelize method of the SparkContext class (JavaSparkContext in the Java API) on an existing collection in the driver program; 2) by referencing a dataset that resides on an external storage system.

Here is an example of how to create a parallelised collection via the Scala API:

val myCollection = List(1,3,6,8,9)
val myDistributedCollection = sc.parallelize(myCollection)

This, instead, is an example of how to reference external datasets (Scala API):

val distFile = sc.textFile("myFile.csv")

In the next Apache Spark Bitesize, I will be covering RDD operations: transformations and actions.

Further resources:

  • M. Zaharia et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, 2012. Available at: https://people.eecs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
  • Apache Spark – Quick Start: http://spark.apache.org/docs/latest/quick-start.html
  • SparkHub: https://sparkhub.databricks.com/

 

 


Ray Kurzweil Plans to Create a Mind at Google—and Have It Serve You | MIT Technology Review

The technologist speaks about an ambitious plan to build a powerful artificial intelligence.

Source: Ray Kurzweil Plans to Create a Mind at Google—and Have It Serve You | MIT Technology Review

Differences between Arrays and ArrayLists in Java

CIDOC CRM at ResearchSpace (British Museum)

Alternative way to install Apache Jena in Eclipse…with a but!

Being a long-term Semantic Web aficionado, I have used Apache Jena with Eclipse several times for manipulating and parsing RDF models. I originally installed it by downloading the Windows-compatible distro from http://jena.apache.org/download/index.cgi and importing the libs into my Eclipse project. I have recently found an Eclipse Library Plugin for Jena 2.0 and I was rather excited to be able to work with Jena without fetching the latest Jena distro and adding it to my Build Path, but the excitement did not last long. The plugin is unfortunately for Jena 2.0 and it does not seem to be compatible with the most recent Eclipse versions (e.g. Mars 4.5). Oh well, back to the known route, then! :)

(Related) Interesting resources:

Using Jena with Eclipse

JenaTools

Video: Comparable Interface

Great Introduction to JUnit

How to add shortcut keys for java code in eclipse – Stack Overflow

Say I type “sout”, the intellisense should expand it to “System.out.println()”. Is there a way to adding such templates?

Source: How to add shortcut keys for java code in eclipse – Stack Overflow

Reverse a String: 4 potential approaches

This is probably one of the most common Java programming interview questions, and there are at least 4 different approaches that can be taken to tackle it. We will illustrate them with some examples in Java.

1) Use the built-in StringBuffer reverse() method:

public class ReverseStringBuffer
{
    public static void main(String args[])
    {
        String originalString = "MyAwesomeString";
        String reversedString = new StringBuffer(originalString).reverse().toString();
        System.out.println("This is the original String " + originalString +
            "\n" + "This is the reversed String " + reversedString);
    }
}

2) Use the built-in StringBuilder reverse() method:

public class ReverseStringBuilder
{
    public static void main(String args[])
    {
        String anotherOriginalString = "AmazingString";
        StringBuilder anotherReversedString = 
            new StringBuilder(anotherOriginalString).reverse();
        System.out.println("This is the original String " + anotherOriginalString +
            "\n" + "This is the reversed String " + anotherReversedString);
    }
}

3) Convert the String into a char array and loop through it backwards, printing each character:

public class ReverseStringCharArray
{
    public static void main(String args[])
    {
        String myString = "ThisIsMyAwesomeString";
        char[] stringToChar = myString.toCharArray();
        System.out.println(stringToChar.length);
        for(int i = stringToChar.length - 1; i >= 0; i--)
        {
            System.out.print(stringToChar[i]);
        }
    }
}

A similar implementation can be rewritten as a utility method:
 
public class ReverseStringUtility
{
    public static void main(String args[])
    {
        String term = "HappyString";
        // Here we are calling the reverse utility method
        String reversedTerm = reverse(term);
        System.out.println("This is the original String " + term +
            "\n" + "This is the reversed String " + reversedTerm);
    }

    public static String reverse(String sourceTerm)
    {
        // Nothing to reverse for a null or empty input
        if(sourceTerm == null || sourceTerm.isEmpty())
        {
            return sourceTerm;
        }
        String reversedString = "";
        // Walk the input from its last character to its first
        for(int i = sourceTerm.length() - 1; i >= 0; i--)
        {
            reversedString = reversedString + sourceTerm.charAt(i);
            System.out.println("This prints the reversed string " +
                reversedString + " \n " +
                " whereas this prints the individual character " + sourceTerm.charAt(i));
        }
        return reversedString;
    }
}
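
The utility above builds a new String on every pass through the loop. One possible variation, which is not part of the original four approaches, is to swap the characters of the char array in place, working from both ends towards the middle. The sketch below assumes a hypothetical class name, ReverseStringInPlace, and treats the input as a plain sequence of char values:

```java
public class ReverseStringInPlace
{
    // Hypothetical helper (an illustration, not one of the four approaches):
    // reverses the input by swapping characters from both ends of the array.
    public static String reverse(String sourceTerm)
    {
        // Nothing to reverse for a null or empty input
        if (sourceTerm == null || sourceTerm.isEmpty())
        {
            return sourceTerm;
        }
        char[] chars = sourceTerm.toCharArray();
        // Move the two indices towards each other until they meet
        for (int left = 0, right = chars.length - 1; left < right; left++, right--)
        {
            char tmp = chars[left];
            chars[left] = chars[right];
            chars[right] = tmp;
        }
        return new String(chars);
    }

    public static void main(String args[])
    {
        // prints gnirtSyppaH
        System.out.println(reverse("HappyString"));
    }
}
```

Only one array and one final String are allocated here, whereas repeated concatenation creates an intermediate String for every character.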

4) Use recursion to reverse the entered String. The method reverses the remainder of the String (sourceTerm.substring(1)) and then appends the first character (sourceTerm.charAt(0)) to the result. If the String is null or no longer than 1 character, the recursion halts and the String is returned as is.

public class ReverseStringRecursiveMethod
{
    public static void main(String args[])
    {
        String term = "AnotherString";
        //prints the result of the call to the recursive method
        System.out.println(recursiveMethod(term)); 
    }

    public static String recursiveMethod(String sourceTerm)
    {
        if ( (null == sourceTerm) || (sourceTerm.length() <= 1) )
        {
            return sourceTerm;
        }
        
        return recursiveMethod(sourceTerm.substring(1)) + sourceTerm.charAt(0);
    }
}
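
As a quick sanity check, the four approaches can be placed side by side to confirm that they agree. The following is only a sketch: the class name ReverseStringComparison and the helper names are hypothetical, and the char array variant collects its output in a StringBuilder instead of printing each character.

```java
public class ReverseStringComparison
{
    // Approach 1: built-in StringBuffer reverse()
    public static String withStringBuffer(String s)
    {
        return new StringBuffer(s).reverse().toString();
    }

    // Approach 2: built-in StringBuilder reverse()
    public static String withStringBuilder(String s)
    {
        return new StringBuilder(s).reverse().toString();
    }

    // Approach 3: walk the char array from the last index to the first
    public static String withCharArray(String s)
    {
        char[] chars = s.toCharArray();
        StringBuilder result = new StringBuilder(chars.length);
        for (int i = chars.length - 1; i >= 0; i--)
        {
            result.append(chars[i]);
        }
        return result.toString();
    }

    // Approach 4: reverse the remainder, then append the first character
    public static String withRecursion(String s)
    {
        if (s == null || s.length() <= 1)
        {
            return s;
        }
        return withRecursion(s.substring(1)) + s.charAt(0);
    }

    public static void main(String args[])
    {
        String term = "AnotherString";
        // each of the four lines prints gnirtSrehtonA
        System.out.println(withStringBuffer(term));
        System.out.println(withStringBuilder(term));
        System.out.println(withCharArray(term));
        System.out.println(withRecursion(term));
    }
}
```

For a short input such as "abc", the recursive call unfolds as withRecursion("bc") + 'a', then (withRecursion("c") + 'b') + 'a', which yields "cba".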

Keyboard Shortcuts – RStudio Support

Print this and have it in front of you when using RStudio. It is priceless! Source: Keyboard Shortcuts – RStudio Support
