Author Archives: icorda

Clearing the Confusion: AI vs Machine Learning vs Deep Learning Differences

This is perhaps the most basic question beginners ask when they start learning about Machine Learning and Deep Learning.

Read Parquet Files with SparkSQL

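SparkSQL is a Spark module for working with structured data, and it can also be used to read columnar data formats such as Parquet files. Here are a number of useful commands that can be run from the spark-shell (they use the Spark 1.x SQLContext API; in Spark 2.x and later the shell also exposes a ready-made SparkSession called spark, so spark.read.parquet works without creating a context):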

// Set the context

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Read the Parquet file in HDFS and print its schema

val df = sqlContext.read.parquet("hdfs://user/myfolder/part-r-00033.gz.parquet")
df.printSchema()

// Show the top 10 rows of data from the Parquet file (false = do not truncate long values)

df.show(10, false)

// Convert to JSON and print out the content of 1 record

df.toJSON.take(1).foreach(println)


Jenkins Best Practices – Practical Continuous Deployment in the Real World — GoDaddy Open Source HQ

Source: Jenkins Best Practices – Practical Continuous Deployment in the Real World — GoDaddy Open Source HQ

Java Beans and DTOs

DTO (Data Transfer Object)

A Data Transfer Object is a pattern whose aim is to transport data between the layers and tiers of a program. A DTO should contain NO business logic.

import java.util.List;

public class UserDTO {
    String firstName;
    String lastName;
    List<String> groups;

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    public List<String> getGroups() {
        return groups;
    }

    public void setGroups(List<String> groups) {
        this.groups = groups;
    }
}
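To make the idea of transporting data between layers concrete, here is a minimal sketch, assuming a hypothetical User domain entity in the persistence layer and a mapper method at the layer boundary (User, toDto and DtoDemo are illustrative names, not part of any framework). The business rule stays on the entity, while the DTO only carries state. The classes are nested in one file only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch: in a real project these classes would live in separate files/layers.
public class DtoDemo {

    // Hypothetical domain entity (persistence/business layer): it may carry behaviour.
    static class User {
        private final String firstName;
        private final String lastName;
        private final List<String> groups;

        User(String firstName, String lastName, List<String> groups) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.groups = groups;
        }

        String getFirstName() { return firstName; }
        String getLastName() { return lastName; }
        List<String> getGroups() { return groups; }

        // Business rule: it stays on the entity, never on the DTO.
        boolean isAdmin() { return groups.contains("admin"); }
    }

    // The DTO: state plus accessors, no business logic.
    static class UserDTO {
        private String firstName;
        private String lastName;
        private List<String> groups;

        public String getFirstName() { return firstName; }
        public void setFirstName(String firstName) { this.firstName = firstName; }
        public String getLastName() { return lastName; }
        public void setLastName(String lastName) { this.lastName = lastName; }
        public List<String> getGroups() { return groups; }
        public void setGroups(List<String> groups) { this.groups = groups; }
    }

    // Hypothetical mapper at the layer boundary: copies entity state into a DTO.
    static UserDTO toDto(User user) {
        UserDTO dto = new UserDTO();
        dto.setFirstName(user.getFirstName());
        dto.setLastName(user.getLastName());
        // Defensive copy so the DTO never shares a mutable list with the entity.
        dto.setGroups(new ArrayList<>(user.getGroups()));
        return dto;
    }

    public static void main(String[] args) {
        User user = new User("Ada", "Lovelace", List.of("admin", "dev"));
        UserDTO dto = toDto(user);
        System.out.println(dto.getFirstName() + " " + dto.getLastName() + " " + dto.getGroups());
    }
}
```

Only the DTO travels to the next layer, so callers there can read the data without gaining access to isAdmin() or any other entity behaviour.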

Java Beans

Java Beans are classes that follow certain conventions or, even better, a Sun/Oracle standard: the JavaBeans specification.

Essentially, Java Beans adhere to the following:

  • all properties are private (and they are accessed through getters and setters);
  • they have a public zero-argument constructor (aka default constructor);
  • they implement the Serializable interface.

The main reason why we use Java Beans is encapsulation: the state is kept in private fields and exposed only through getters and setters.

public class BeanClassExample implements java.io.Serializable {

  private int id;

  // no-arg constructor
  public BeanClassExample() {
  }

  public int getId() {
    return id;
  }

  public void setId(int id) {
    this.id = id;
  }
}
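The conventions matter because tools discover bean properties reflectively from the getter/setter names. As a small illustration, the sketch below runs the standard java.beans.Introspector API (part of the java.desktop module) over a copy of BeanClassExample; BeanIntrospectionDemo and propertyNames are illustrative names.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class BeanIntrospectionDemo {

    // Follows the bean conventions: private state, public no-arg constructor,
    // getter/setter pair and Serializable.
    public static class BeanClassExample implements Serializable {
        private static final long serialVersionUID = 1L;
        private int id;

        public BeanClassExample() {
        }

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
    }

    // Lists the properties discovered purely from the getX/setX naming pattern.
    static List<String> propertyNames(Class<?> beanClass) {
        try {
            // Stop at Object.class so the "class" property inherited from getClass() is skipped.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Introspection failed for " + beanClass, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(propertyNames(BeanClassExample.class)); // [id]
    }
}
```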

So, yeah, what is the real difference, if any?

In a nutshell, Java Beans follow strict conventions (as discussed above) and contain no behaviour (as opposed to state), except for storage, retrieval, serialization and deserialization. JavaBeans is indeed a specification, while DTO (Data Transfer Object) is a pattern in its own right. It is perfectly acceptable to use a Java Bean to implement the DTO pattern.
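To illustrate that last point, here is a minimal sketch of a Java Bean doing the job of a DTO: it is serialized to bytes, as it might be before crossing a tier boundary, and rebuilt on the other side. CustomerBean, BeanAsDtoDemo and roundTrip are hypothetical names, and plain Java serialization stands in for whatever transport a real application would use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BeanAsDtoDemo {

    // Hypothetical bean used as a DTO: private state, public no-arg constructor,
    // getters/setters, Serializable, and no business logic.
    public static class CustomerBean implements Serializable {
        private static final long serialVersionUID = 1L;
        private String firstName;
        private String lastName;

        public CustomerBean() {
        }

        public String getFirstName() { return firstName; }
        public void setFirstName(String firstName) { this.firstName = firstName; }
        public String getLastName() { return lastName; }
        public void setLastName(String lastName) { this.lastName = lastName; }
    }

    // Serializes the bean to bytes (the sending side) and deserializes it again
    // (the receiving side), returning the rebuilt copy.
    static CustomerBean roundTrip(CustomerBean bean) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(bean);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CustomerBean) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        CustomerBean bean = new CustomerBean();
        bean.setFirstName("Eliott");
        bean.setLastName("Brown");
        CustomerBean copy = roundTrip(bean);
        System.out.println(copy.getFirstName() + " " + copy.getLastName()); // Eliott Brown
    }
}
```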

Avro is amazing!

Why Avro For Kafka Data?

Scala, give me a break :)

I have recently been asked whether it is possible to use break (and continue as well) in a loop in Scala, and it occurred to me that I had never come across such a case. Coming from Java, I do know how to employ break and continue in a while loop, for example, so why would it be different in Scala, considering that it runs on the JVM? It is actually a bit more complicated than that. Although Scala does not have break and continue keywords, it does offer similar functionality through scala.util.control.Breaks.

Here is an example of how to use break from the Breaks class:

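import java.io._
import scala.util.control.Breaks._

val in = new BufferedReader(new InputStreamReader(System.in))

breakable {
  while (true) {
    println("? ")
    val line = in.readLine()
    // stop on an empty line or at end of input (readLine returns null)
    if (line == null || line.isEmpty) break
  }
}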

In Java, the above would correspond to this:

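BufferedReader in =
   new BufferedReader(new InputStreamReader(System.in));
while (true) {
   System.out.println("? ");
   String line = in.readLine();
   // compare contents with isEmpty(), not ==, which compares references
   if (line == null || line.isEmpty()) break;
}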

The breakable function has been available since Scala 2.8; before that, we would have tackled the issue mostly through two approaches:

  • by adding a boolean variable indicating whether the loop should keep running;
  • by rewriting the loop as a recursive function.
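Both workarounds are language-agnostic, so for comparison with the Java snippet above, here is a sketch of each in Java. To keep it testable, it reads from a List<String> instead of standard input and, like the examples above, stops at the first empty line; LoopWithoutBreak and its method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopWithoutBreak {

    // Approach 1: a boolean flag records whether the loop should keep going.
    static List<String> collectWithFlag(List<String> lines) {
        List<String> collected = new ArrayList<>();
        boolean keepGoing = true;
        int i = 0;
        while (keepGoing && i < lines.size()) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                keepGoing = false; // plays the role of "break"
            } else {
                collected.add(line);
                i++;
            }
        }
        return collected;
    }

    // Approach 2: the loop rewritten as a recursive function; "breaking" means not recursing.
    static List<String> collectRecursively(List<String> lines, int index, List<String> acc) {
        if (index >= lines.size() || lines.get(index).isEmpty()) {
            return acc; // base case ends the "loop"
        }
        acc.add(lines.get(index));
        return collectRecursively(lines, index + 1, acc);
    }

    public static void main(String[] args) {
        List<String> input = List.of("first", "second", "", "never read");
        System.out.println(collectWithFlag(input));                          // [first, second]
        System.out.println(collectRecursively(input, 0, new ArrayList<>())); // [first, second]
    }
}
```

Java does not eliminate tail calls, so the recursive version needs one stack frame per line read; the Scala compiler, by contrast, turns self-recursive tail calls into loops (and the @tailrec annotation makes it verify that it can), which is what made the recursive rewrite a practical alternative in Scala.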

Happy Scala programming 🙂


2 minutes to spare: Apache NiFi on Mac

As a Mac user, I usually run Apache NiFi using one of two approaches:

  • by standing up a Docker container;
  • by downloading and installing it locally on your Mac.

Running a NiFi Container

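You can install Docker on Mac via Homebrew (the cask installs the Docker desktop app, including the engine; the plain docker formula only provides the command-line client):

brew install --cask docker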

Alternatively, it is possible to download Docker Community Edition (CE), an easy-to-install desktop app for building, packaging and testing dockerised apps, which includes tools such as the Docker command line, Docker Compose and Docker Notary.

After installing Docker, the following command pulls the NiFi image:

docker pull apache/nifi:1.5.0

Next, we can start a container from the image and watch it run:

docker run -p 8080:8080 apache/nifi:1.5.0

Downloading and Installing NiFi locally

Installing Apache NiFi on Mac is quite straightforward, as follows:

brew install nifi

This assumes that you have Homebrew installed. If that is not the case, this is the command you will need:

ruby -e "$(curl -fsSL" < /dev/null 2> /dev/null

Here is where NiFi has been installed:


Some basic operations can be done with these commands:

bin/ run, it runs in the foreground,

bin/ start, it runs in the background

bin/ status, it checks the status

bin/ stop, it stops NiFi

The next step, whichever approach you took at the beginning, is to verify that your local or dockerised NiFi instance is running. This is as simple as visiting the following URL:


Happy Nif-ing 🙂

Machine Learning’s ‘Amazing’ Ability to Predict Chaos


Download SQUID – Your News Buddy

So, you want to build a bot with NodeJs?


I have used Node.js in a number of projects, in conjunction with the module bundler Webpack and the automation toolkit Gulp, but I still wanted to experiment with something different that would bring out the advantages of such a server-side platform. I remembered that the Microsoft Bot Framework employs Node.js for its Bot Builder SDK and, why not, building bots sounds interesting! I have actually found out that there are a few books focusing specifically on building bots with Node.js, and that seemed like a fun task. The choice then became clear: let’s use Node.js and Twit, a Twitter API client for Node, to build a Twitter bot that simply sends a query to the Twitter API, receives a response containing the results of the search, and then retweets the most recent tweet returned. Let’s see what we need to achieve this!

Set up a dedicated Twitter account for your Bot

Bots often get banned from Twitter, so it is recommended to create a Twitter account, perhaps with a secondary email address, specifically for this experiment. It is highly recommended that you do not use your “official” Twitter account, as it is likely to be short-lived. After your new account is activated, go to the Twitter Developer Center and sign in with your new details. You might also want to have a look around and, in particular, read through the documentation on how to get started with the Twitter Developer Platform and how to create your first app.


Create a Twitter App

From your account, you will be able to see the newly created app. After creating the application, look for ‘Keys and Access Tokens’ and click on ‘Generate Token Actions’. Make a copy of the details below, as you will be using them later in your bot's configuration:

  • Consumer Key
  • Consumer Secret
  • Access Token
  • Access Token Secret

The Part where you code

You will interact with your newly created Twitter App via a Node.js library called Twit. Create a new project folder in your Dev directory (ideally where your Git repositories reside):

mkdir twitter-bot
cd twitter-bot
npm init


This will kick off a utility that will take you through the process of creating the package.json file.

{
  "name": "twitter-bot",
  "version": "1.0.0",
  "description": "sample twitter bot",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "ilacorda",
  "license": "ISC"
}


Then you will need to install Twit, the Twitter API client for Node, which supports both the REST and the Streaming APIs.

npm install twit

and create a file called

touch bot.js

This will be your main application file, that is, the entry point of your app. You will also need a third file called


where you will paste the following:

  • Consumer Key
  • Consumer Secret
  • Access Token
  • Access Token Secret

It will look like this:


module.exports = {
  consumer_key: '',
  consumer_secret: '',
  access_token: '',
  access_token_secret: ''
};


Your directory structure should look as follows:


   |- bot.js
   |- config.js
   |- package.json


The part where you make the Bot do something

The next step is to make your bot query the most recent tweets. We will need to write a function that finds the latest tweets according to the query passed as a parameter. To do so, we need to initialise a params object holding a q property that will refine our searches. In our case, we are targeting tweets with the hashtags #nodejs and #Nodejs:


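var Twit = require('twit');
var config = require('./config.js');
var Twitter = new Twit(config);
var retweet = function() {
  var params = {
    q: '#nodejs, #Nodejs',
    result_type: 'recent',
    lang: 'en'
  };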


The property

result_type: 'recent'

instructs the Search API to return only the most recent matching tweets rather than the popular ones (the other accepted values are popular and mixed, the latter being the default). We can use


, which accepts three arguments: API endpoint, params object (defined by us) and a callback.

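  Twitter.get('search/tweets', params, function(err, data) {
    // if there are no errors and at least one tweet was found
    if (!err && data.statuses.length > 0) {
      // fetch the ID of the most recent tweet
      var retweetId = data.statuses[0].id_str;
      // instruct Twitter to retweet it
      Twitter.post('statuses/retweet/:id', {
        id: retweetId
      }, function(err, response) {
        if (err) {
          // if there was an error while retweeting
          console.log('Something went wrong while retweeting: ' + err.message);
        } else {
          console.log('Retweeted tweet ' + retweetId);
        }
      });
    } else {
      console.log('No results were returned');
    }
  });
};

// run the bot once on start-up
retweet();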


To post or to retweet the tweet the bot has found, we have used the method to post to any of the REST API endpoints.


To run your bot, simply type the following command in a terminal:

node bot.js

Alternatively, it is possible to use

npm scripts

which, in a nutshell, are scripts whose goal is to automate repetitive tasks. This requires modifying the file


by adding the following lines of code:


  "scripts": {
    "start": "node bot.js",
    "test": "echo \"Error: no test specified\" && exit 1"
  },

and then you can type

npm start

An Oracle JDBC Client

A while ago I was tasked with writing a small application to connect to an Oracle Database and perform a set of simple queries. For this task, I employed the DAO (Data Access Object) pattern and a corresponding DAO interface. A basic Java client, in turn, instantiates the DAO class, which implements the DAO interface. Here is the application in detail:
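The shape of the DAO pattern can be sketched independently of JDBC. This is a minimal illustration, not the application's actual code: Customer, CustomerDao and InMemoryCustomerDao are hypothetical names, and an in-memory list stands in for the CUSTOMERS table so that the example runs without an Oracle database. Because the client codes against the interface, a JDBC-backed implementation could be swapped in without changing it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class DaoPatternDemo {

    // Hypothetical model object.
    static final class Customer {
        final String firstName;
        final String lastName;

        Customer(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        @Override
        public String toString() { return firstName + " " + lastName; }
    }

    // DAO interface: the standard operations clients may perform on the model.
    interface CustomerDao {
        void save(Customer customer);
        List<Customer> findByName(String firstName, String lastName);
    }

    // One implementation; a JDBC-backed class could implement the same interface.
    static final class InMemoryCustomerDao implements CustomerDao {
        private final List<Customer> rows = new ArrayList<>();

        @Override
        public void save(Customer customer) { rows.add(customer); }

        @Override
        public List<Customer> findByName(String firstName, String lastName) {
            return rows.stream()
                    .filter(c -> c.firstName.equals(firstName) && c.lastName.equals(lastName))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        CustomerDao dao = new InMemoryCustomerDao(); // the client only sees the interface
        dao.save(new Customer("Eliott", "Brown"));
        dao.save(new Customer("Jane", "Doe"));
        System.out.println(dao.findByName("Eliott", "Brown")); // [Eliott Brown]
    }
}
```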

Oracle DB Client 

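[code language="java"]
package oracledb.connection.client;

import oracledb.connection.dao.OracleDB_DAO;

public class OracleConnectionClient {

    public static void main(String[] args) throws Exception {
        OracleDB_DAO dao = new OracleDB_DAO();
        dao.readPropertiesFile();
        dao.openConnection();
        try {
            dao.getDBCurrentTime();
        } finally {
            dao.closeConnection();
        }
    }
}
[/code]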

The Data Access Object (DAO) implementation. The method readPropertiesFile() parses a properties file containing the access credentials and DB connection details.
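A minimal sketch of that parsing step, assuming the property keys dburl, dbuser and dbpassword that the DAO reads; DbPropertiesDemo and DbSettings are hypothetical names, and the sample URL and scott/tiger credentials are placeholders. Because it accepts any InputStream, the same method works with getResourceAsStream in the application and with an in-memory stream in a quick check. A null stream (resource not found) and missing keys are reported as IOExceptions instead of surfacing later as NullPointerExceptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class DbPropertiesDemo {

    // Holder for the three connection settings the DAO needs.
    static final class DbSettings {
        final String url;
        final String user;
        final String password;

        DbSettings(String url, String user, String password) {
            this.url = url;
            this.user = user;
            this.password = password;
        }
    }

    // Parses the settings; fails fast if the stream or a required key is missing.
    static DbSettings load(InputStream in) throws IOException {
        if (in == null) {
            throw new IOException("Properties resource not found on the classpath");
        }
        Properties props = new Properties();
        try (InputStream stream = in) {
            props.load(stream); // Properties.load(InputStream) reads ISO-8859-1 text
        }
        return new DbSettings(require(props, "dburl"), require(props, "dbuser"), require(props, "dbpassword"));
    }

    private static String require(Properties props, String key) throws IOException {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IOException("Missing property: " + key);
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        String sample = "dburl=jdbc:oracle:thin:@localhost:1521:XE\ndbuser=scott\ndbpassword=tiger\n";
        DbSettings s = load(new ByteArrayInputStream(sample.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(s.url + " as " + s.user);
    }
}
```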


[code language="java"]
package oracledb.connection.dao;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.sql.*;
import java.util.Properties;
import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

public class OracleDB_DAO implements OracleDB_DAO_Interface {

    public static String SAMPLE_SELECT_QUERY = "SELECT * FROM CUSTOMERS WHERE FirstName = 'Eliott' AND LastName = 'Brown'";

    private static String driverClass = "oracle.jdbc.driver.OracleDriver";
    private Connection connection;
    private static String dbUrl;
    private static String userName;
    private static String password;

    // name of the properties file on the classpath
    static String resourceName = "";

    /**
     * Read the properties file and initialise the DAO
     * @throws IOException
     * @throws ClassNotFoundException
     */
    public void readPropertiesFile() throws IOException, ClassNotFoundException {

        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        Properties props = new Properties();
        try (InputStream resourceStream = loader.getResourceAsStream(resourceName)) {
            if (resourceStream == null) {
                throw new IOException("Properties file not found on the classpath: " + resourceName);
            }
            props.load(resourceStream);
        }

        // Return the properties
        dbUrl = props.getProperty("dburl");
        userName = props.getProperty("dbuser");
        password = props.getProperty("dbpassword");

        // Load the JDBC driver class
        Class.forName(driverClass);
    }

    @Override
    public void openConnection() throws SQLException {
        // get the connection to the database
        System.out.println("Establishing the Connection to the Database");
        try {
            connection = DriverManager.getConnection(dbUrl, userName, password);
        } catch (SQLException ex) {
            System.err.println("Could not connect to " + dbUrl + ": " + ex.getMessage());
            throw ex;
        }
    }

    @Override
    public void closeConnection() throws SQLException {
        if (connection != null) {
            // close the connection
            connection.close();
        }
    }

    @Override
    public ResultSet getFirstNameAndLastNameFromCustomers() throws SQLException, IOException {
        // assign the query to a variable
        String sql = SAMPLE_SELECT_QUERY;
        // create the statement and execute the query; both are closed automatically
        try (Statement stmt = connection.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            // copy the rows into a disconnected CachedRowSet, so they survive the statement being closed
            CachedRowSet rows = RowSetProvider.newFactory().createCachedRowSet();
            rows.populate(rs);
            System.out.println("Fetched " + rows.size() + " rows for getFirstNameAndLastNameFromCustomers");
            // write the rows to sample.csv
            try (PrintWriter csvWriter = new PrintWriter(new File("sample.csv"))) {
                csvWriter.println("FirstName,LastName");
                while (rows.next()) {
                    csvWriter.println(rows.getString("FirstName") + "," + rows.getString("LastName"));
                }
            }
            // rewind so the caller can iterate over the rows again
            rows.beforeFirst();
            return rows;
        }
    }

    @Override
    public String getDBCurrentTime() throws SQLException, IOException {
        String dateTime = null;
        // create the statement; statement and result set are closed automatically
        try (Statement stmt = connection.createStatement();
             ResultSet rst = stmt.executeQuery("select SYSDATE from dual")) {
            while (rst.next()) {
                dateTime = rst.getString(1);
            }
        }
        System.out.println("This prints the dateTime from the DB " + dateTime);
        return dateTime;
    }
}
[/code]

The DAO Interface that defines the standard operations to be performed on a model object:

[code language="java"]
package oracledb.connection.dao;

import java.io.IOException;
import java.sql.ResultSet;
import java.sql.SQLException;

public interface OracleDB_DAO_Interface {

    /**
     * Open the DAO connection
     * @throws SQLException
     */
    void openConnection() throws SQLException;

    /**
     * Close the connection
     * @throws SQLException
     */
    void closeConnection() throws SQLException;

    /**
     * Get the ResultSet from the select query
     * @return the matching customer rows
     * @throws SQLException
     * @throws IOException
     */
    ResultSet getFirstNameAndLastNameFromCustomers() throws SQLException, IOException;

    /**
     * Get the current time via a DB query
     * @return the database SYSDATE as a string
     * @throws SQLException
     * @throws IOException
     */
    String getDBCurrentTime() throws SQLException, IOException;
}
[/code]