APIs for Spark Development: Java vs Scala
- Apache Spark is an open-source cluster computing platform for data processing tasks
- It integrates with the Apache Hadoop ecosystem and adds capabilities such as stream processing (Spark Streaming) and iterative computation (e.g. machine learning workloads)
- Apache Spark was originally written in Scala and ships with interactive Scala and Python shells (REPLs: Read-Evaluate-Print Loop). It includes the following APIs:
- Java
- Scala
- Python
Spark APIs: Java vs Scala
Java
- Java is less concise, more verbose, and more error-prone: support for lambdas and streams only arrived with Java 8
- A more established programming language, with plenty of experts in the market
- Full Java/Scala interoperability, including implicit conversions between the major collection types
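The verbosity gap mentioned above is easy to see in plain Java. The sketch below (class and method names are my own, for illustration only) contrasts the pre-Java-8 anonymous-class style with the Java 8 lambda/stream style for the same mapping:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LambdaVerbosity {

    public static void main(String[] args) {
        System.out.println(wordLengths(Arrays.asList("spark", "scala", "java")));
    }

    static List<Integer> wordLengths(List<String> words) {
        // Before Java 8: an anonymous inner class was required for a simple function
        // (shown here only for comparison; it is not used below).
        Function<String, Integer> oldStyle = new Function<String, Integer>() {
            @Override
            public Integer apply(String s) {
                return s.length();
            }
        };
        // Since Java 8: the same function as a method reference in a stream pipeline
        return words.stream().map(String::length).collect(Collectors.toList());
    }
}
```

The lambda form closes much of the conciseness gap with Scala, but it only became available with Java 8.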
Scala
- The blend of functional and object-oriented features makes Scala highly scalable
- No distinction between an object and a function: every value is an object and every operation is a method call
- Scala's type inference contributes to more readable programs
- Scala's traits tame multiple inheritance
- Scala combines conciseness with advanced static typing
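A minimal Scala sketch of the points above (the names `Greeter`, `Person`, and `ScalaFeatures` are illustrative, not Spark APIs): a trait mixing a concrete method into a class, an inferred variable type, and an arithmetic operator written explicitly as a method call:

```scala
// A trait can carry both abstract members and concrete implementations,
// giving mixin-style reuse without the pitfalls of multiple class inheritance.
trait Greeter {
  def name: String                      // abstract member
  def greet: String = s"Hello, $name"   // concrete method mixed into subclasses
}

class Person(val name: String) extends Greeter

object ScalaFeatures {
  def main(args: Array[String]): Unit = {
    val p = new Person("Spark")         // the type of p is inferred as Person
    println(p.greet)                    // prints "Hello, Spark"
    println((1).+(2))                   // every operation is a method call: + on Int
  }
}
```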
Scala API at work: Loading CSV Files
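A sketch of what loading a CSV file looks like in the Scala API, assuming Spark 2.x with the DataFrame API; the file name `people.csv` and the application name are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object LoadCsvScala {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("LoadCsvScala")
      .master("local[*]")             // run locally for illustration
      .getOrCreate()

    val df = spark.read
      .option("header", "true")       // treat the first line as column names
      .option("inferSchema", "true")  // guess column types from the data
      .csv("people.csv")              // placeholder path

    df.printSchema()
    df.show(5)
    spark.stop()
  }
}
```

Type inference (`val spark`, `val df`) and the fluent builder style keep the Scala version compact.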
Java API at work: Loading CSV Files
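The equivalent sketch in the Java API, under the same assumptions (Spark 2.x, placeholder file name `people.csv`); note the explicit `Dataset<Row>` type annotation that Scala's inference makes unnecessary:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class LoadCsvJava {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("LoadCsvJava")
                .master("local[*]")             // run locally for illustration
                .getOrCreate();

        Dataset<Row> df = spark.read()
                .option("header", "true")       // treat the first line as column names
                .option("inferSchema", "true")  // guess column types from the data
                .csv("people.csv");             // placeholder path

        df.printSchema();
        df.show(5);
        spark.stop();
    }
}
```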
Final Remarks
- Scala's strengths lie in scalability, conciseness, and advanced static typing, together with full Java interoperability
- Scala can have a challenging learning curve and still has a limited community presence compared to Java and Python
- Scala developers remain a niche compared with the rich market of Java and Python professionals
- Python remains a strong choice thanks to the easy transition from other OOP languages and the number of available statistical and data science libraries
Posted on May 13, 2017, in Uncategorized.