APIs for Spark Development: Java vs Scala

Posted by icorda

Apache Spark is an open source cluster computing platform for data processing tasks.
It extends the MapReduce model popularized by Apache Hadoop and introduces concepts such as stream processing (Spark Streaming) and iterative computation (for machine learning tasks).
Apache Spark was originally written in Scala and ships with interactive shells, or REPLs (Read-Eval-Print Loop), for Scala and Python. It provides the following APIs:
- Java
- Scala
- Python

Spark APIs: Java vs Scala

- Java is more verbose than Scala and more error-prone; lambdas and streams are supported only from Java 8
- Java is the more established programming language, with many experts on the market
- Java and Scala are fully interoperable, with implicit conversions between the major collection types
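
To make the point about lambdas and streams concrete, here is a minimal sketch of a word count written twice in Java: once in the loop-based style that was the norm before Java 8, and once with lambdas and the Stream API. It uses plain JDK collections as a stand-in for a Spark dataset, so it is not Spark code; the class and method names (`WordCountSketch`, `countWithLoops`, `countWithStreams`) are made up for this illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical class for illustration; plain JDK collections, no Spark dependency.
final class WordCountSketch {

    // Loop-based style typical of code written before Java 8:
    // explicit iteration and a mutable map updated by hand.
    static Map<String, Long> countWithLoops(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // skip empty tokens from blank or indented lines
                }
                Long current = counts.get(word);
                counts.put(word, current == null ? 1L : current + 1L);
            }
        }
        return counts;
    }

    // Java 8 style: the same logic as a pipeline of lambdas.
    static Map<String, Long> countWithStreams(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("spark scala java", "java spark", "spark");
        // Both variants should produce the same word counts.
        System.out.println(countWithLoops(lines).equals(countWithStreams(lines)));
        System.out.println(countWithStreams(lines));
    }
}
```

The stream version reads much closer to the chained transformations that Spark code is usually written in, which is roughly why Java 8 narrows, but does not close, the conciseness gap with Scala.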

- The blend of functional and object-oriented features makes Scala highly scalable
- Functions are objects too: every value is an object and every operation is a method call
- Scala's type inference contributes to more readable programs
- Scala's traits tame multiple inheritance
- Scala combines conciseness with advanced static typing

Conclusion

- Scala's strengths lie in scalability, conciseness and advanced static typing, together with full Java interoperability
- Scala can have a challenging learning curve and still has a limited community presence compared to Java and Python
- Scala developers are still a niche, whereas the market for Java and Python professionals is rich
- Python remains a strong choice because of the easy transition from OOP languages and the number of available statistical and data science libraries

Posted on May 13, 2017.