Author Archives: icorda

Setting up your Deep Learning Environment (Mac)

So, you have embarked on your Deep Learning journey and perhaps you are navigating through the concepts of Gradient Descent, Back-propagation and so forth. After all the theory, you are eager to get your environment ready to do some actual ‘deep learning hard work’, but you have no idea where to start. You are in the right place, then. This short tutorial has been put together for Mac users (sorry, Windows aficionados) and will provide you with what you need to get started.

Yes, you need Python!

You surely know that Python is the key programming language when it comes to Machine and Deep Learning. First, make sure you have our beloved Homebrew:

/usr/bin/ruby -e “$(curl -fsSL"

Install Python 3 (with this version, pip3 will be automatically installed)

brew install python3

Virtual Environment

In order to keep things clean and contain all your deep learning related dependencies in one space, it is useful to use virtual environments.

pip3 install virtualenv virtualenvwrapper

You will also need to modify your bash profile file:

vim ~/.bash_profile

by adding the following:

# virtualenv and virtualenvwrapper
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3
source /usr/local/bin/

Next step is to create a virtual environment for your deep learning project:

mkvirtualenv cv -p python3

This will create a virtual environment named cv. To leave the environment, type the command deactivate.

Some Additional Dependencies

You will also need to install cmake to be able to use dlib, a C++ toolkit containing Machine Learning algorithms:

brew install cmake

Additionally, you will need to download X11 to display the image output from both dlib and OpenCV.

Let’s install the real stuff

Activate your virtual environment by typing the following:

workon cv

Some additional dependencies should be taken care of:

pip install numpy h5py pillow scikit-image

Finally, we can install OpenCV:

pip install opencv-python

Then, we will be installing Dlib, Tensorflow and Keras:

pip install dlib
pip install tensorflow
pip install keras

Keras, in particular, is a user-friendly, beginner-oriented library for Machine Learning and Deep Learning models that runs on top of TensorFlow. Happy Machine Learning modelling 🙂


Clearing the Confusion: AI vs Machine Learning vs Deep Learning Differences

Perhaps the most basic question for beginners when learning about Machine Learning and Deep Learning.

Read Parquet Files with SparkSQL

SparkSQL is a Spark module for working with structured data, and it can also be used to read columnar data formats such as Parquet files. Here are a number of useful commands that can be run from the spark-shell:

#Set the context

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

#Read the parquet file in HDFS and print its schema

val df = sqlContext.read.parquet("hdfs://user/myfolder/part-r-00033.gz.parquet")
df.printSchema

#Show the top 10 rows of data from the parquet file

df.show(10, false)

#Convert to JSON and print out the content of 1 record


Jenkins Best Practices – Practical Continuous Deployment in the Real World — GoDaddy Open Source HQ


Java Beans and DTOs

DTO (Data Transfer Object)

Data Transfer Object is a pattern whose aim is to transport data between the layers and tiers of a program. A DTO should contain NO business logic.

import java.util.List;

public class UserDTO {
    private String firstName;
    private String lastName;
    private List<String> groups;

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    public List<String> getGroups() {
        return groups;
    }

    public void setGroups(List<String> groups) {
        this.groups = groups;
    }
}

Java Beans

Java Beans are classes that follow certain conventions or, even better, they are Sun/Oracle standards/specifications as explained here:

Essentially, Java Beans adhere to the following:

  • all properties are private (and they are accessed through getters and setters);
  • they have zero-arg constructors (aka default constructors)
  • they implement the Serializable Interface

The main reason why we use Java Beans is to encapsulate

import java.io.Serializable;

public class BeanClassExample implements Serializable {

  private int id;

  //no-arg constructor
  public BeanClassExample() {
  }

  public int getId() {
    return id;
  }

  public void setId(int id) {
    this.id = id;
  }
}

So, yeah what is the real difference? If any?

In a nutshell, Java Beans follow strict conventions (as discussed above) and contain no behaviour (as opposed to state), except for storage, retrieval, serialization and deserialization. A Java Bean is indeed a specification, while a DTO (Data Transfer Object) is a pattern in its own right. It is more than acceptable to use a Java Bean to implement the DTO pattern.

Avro is amazing!

Why Avro For Kafka Data?

Scala, give me a break :)

I was recently asked whether it is possible to use break (and continue as well) in a loop with Scala, and it occurred to me that I had never come across such a case. Coming from Java, I do know how to employ break and continue in a while loop, for example, so why would it be different in Scala, considering that it is built on top of the JVM? It is actually a bit more complicated than that. Although Scala does not have the keywords break and continue, it does offer similar functionality through scala.util.control.Breaks.

Here is an example of how to use break from the Class Breaks, as follows:

import java.io.{BufferedReader, InputStreamReader}
import scala.util.control.Breaks._

val in = new BufferedReader(new InputStreamReader(System.in))

breakable {
  while (true) {
    println("? ")
    if (in.readLine() == "") break
  }
}

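In Java, the above would correspond to this:

BufferedReader in =
   new BufferedReader(new InputStreamReader(System.in));
while (true) {
  String line = in.readLine();
  // compare string contents, not references; also stop at end of input
  if (line == null || line.isEmpty()) break;
}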

The breakable function has been available since Scala 2.8. Before that, we would have tackled the issue mostly through 2 approaches:

  • by adding a boolean variable indicating whether the loop should keep running;
  • by re-writing the loop as a recursive function.

Happy Scala programming 🙂


2 minutes to spare: Apache NiFi on Mac

As a Mac user, I usually run Apache NiFi using one of two approaches:

  • by standing up a Docker container;
  • by downloading and installing it locally on my Mac.

Running a NiFi Container

You can install Docker on Mac via Homebrew:

brew install docker

Alternatively, it is possible to download the Docker Community Edition (CE), an easy-to-install desktop app for building, packaging and testing dockerised apps, which includes tools such as the Docker command line, Docker Compose and Docker Notary.

After installing Docker, this will let you pull the NiFi image:

docker pull apache/nifi:1.5.0

Next, we can start a container from that image and watch it run:

docker run -p 8080:8080 apache/nifi:1.5.0

Downloading and Installing NiFi locally

Installing Apache NiFi on Mac is quite straightforward, as follows:

brew install nifi

This assumes that you have Homebrew installed. If that is not the case, this is the command you will need:

ruby -e "$(curl -fsSL" < /dev/null 2> /dev/null

Here is where NiFi has been installed:


Some basic operations can be done with these commands:

bin/nifi.sh run, it runs in the foreground

bin/nifi.sh start, it runs in the background

bin/nifi.sh status, it checks the status

bin/nifi.sh stop, it stops NiFi

Next step, whatever approach you took at the beginning, is to verify that your NiFi installation/dockerised version is running. This is as simple as visiting the following URL:


Happy Nif-ing 🙂

Machine Learning’s ‘Amazing’ Ability to Predict Chaos


Download SQUID – Your News Buddy

So, you want to build a bot with NodeJs?


I have used Node.js in a number of projects, in conjunction with the module bundler Webpack and the automation toolkit Gulp, but I still wanted to experiment with something different that would bring out the advantages of such a server-side platform. I remembered that the Microsoft Bot Framework employs Node.js for its Bot Builder SDK, so why not build a bot? I also found out that there are a few books focusing specifically on building bots with Node.js, and that seemed like a fun task. The choice then became clear: let’s use Node.js and Twit, a Twitter API client for Node, to build a Twitter bot that sends a query to the Twitter API, receives a response containing the search results, and then retweets the most recent tweet returned. Let’s see what we need to achieve this!

Set up a dedicated Twitter account for your Bot

Bots often get banned from Twitter, so it is recommended to create a separate Twitter account, perhaps with a secondary email address, specifically for this experiment. It is highly recommended that you do not use your “official” Twitter account, as the account running the bot is likely to be short-lived. After your new account is activated, go to the Twitter Developer Center and sign in with your new details. You might also want to have a look around and, in particular, read through the documentation on how to get started with the Twitter Developer Platform and how to create your first app.


Create a Twitter App

From your account, you will be able to see the newly created app from here. After creating the application, look for ‘Keys and Access Tokens’ and click on ‘Generate Token Actions’. Make a copy of the details below, as you will be using them later in your configuration file:

  • Consumer Key
  • Consumer Secret
  • Access Token
  • Access Token Secret

The Part where you code

You will interact with your newly created Twitter App via a Node.js library called Twit. Create a new project folder in your Dev directory (ideally the directory structure where your git installation resides):

mkdir twitter-bot
cd twitter-bot
npm init


This will kick off a utility that will take you through the process of creating the package.json file.

{
  "name": "twitter-bot",
  "version": "1.0.0",
  "description": "sample twitter bot",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "ilacorda",
  "license": "ISC"
}


Then you will need to install Twit, the Twitter API client for Node that supports both the REST and Streaming APIs.

npm install twit

and create a file called bot.js:

touch bot.js

This will be your main application file, that is, the entry point of your app. You will also need a third file, config.js, where you will paste the following:

  • Consumer Key
  • Consumer Secret
  • Access Token
  • Access Token Secret

It will look like this:


module.exports = {
  consumer_key: '',
  consumer_secret: '',
  access_token: '',
  access_token_secret: ''
};


Your directory structure should look as follows:


   |- bot.js
   |- config.js
   |- package.json
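
Since config.js starts out with empty strings, it can be handy to check that all four credentials have been filled in before the bot tries to talk to Twitter. The snippet below is only a minimal sketch of such a check: missingKeys and REQUIRED_KEYS are hypothetical names introduced here for illustration and are not part of Twit.

```javascript
// Hypothetical helper, not part of Twit: lists the credential fields
// that are missing or still empty in a config object.
var REQUIRED_KEYS = [
  'consumer_key',
  'consumer_secret',
  'access_token',
  'access_token_secret'
];

function missingKeys(config) {
  return REQUIRED_KEYS.filter(function (key) {
    // a field counts as missing when it is absent, not a string, or blank
    return typeof config[key] !== 'string' || config[key].trim() === '';
  });
}

// Example with a made-up, partially filled config object
var sampleConfig = { consumer_key: 'abc', consumer_secret: '', access_token: 'xyz' };
console.log(missingKeys(sampleConfig));
```

In bot.js, one possible use would be to call missingKeys(require('./config')) at startup and stop with a clear message if the returned array is not empty.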


The part where you make the Bot do something

The next step is to make your bot query the most recent tweets. We will need to write a function that finds the latest tweets matching the query passed as a parameter. To do so, we need to initialise a params object holding a q property that will refine our searches. In our case, we are targeting tweets with the hashtags #nodejs and #Nodejs:


var retweet = function() {
  var params = {
    q: '#nodejs, #Nodejs',
    result_type: 'recent',
    lang: 'en'
  };
  // the search and retweet calls go here, inside this function
};


The property

result_type: 'recent'

instructs the search to return only the most recent matching tweets, rather than the most popular ones. To run the search we can use Twitter.get, which accepts three arguments: the API endpoint, the params object (defined by us) and a callback.

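// Assumes the client was created earlier in bot.js, for example:
//   var Twit = require('twit');
//   var Twitter = new Twit(require('./config'));
Twitter.get('search/tweets', params, function(err, data) {
  // if there are no errors
  if (!err) {
    // fetch the ID of the most recent tweet
    var retweetId = data.statuses[0].id_str;
    // instruct Twitter to retweet it
    Twitter.post('statuses/retweet/:id', {
      id: retweetId
    }, function(err, response) {
      if (response) {
        // the retweet request got a response
      }
      // if there was an error while tweeting
      if (err) {
        console.log('No results were returned');
      }
    });
  }
});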


To retweet the tweet the bot has found, we have used the Twitter.post method, which can post to any of the REST API endpoints.
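
One thing to watch out for: reading data.statuses[0].id_str throws if the search comes back with no tweets at all. A small guard can make that case explicit. The following is a minimal sketch, assuming the response has the statuses array used above; pickRetweetId is a hypothetical helper name, and the sample objects are hand-written rather than real API output.

```javascript
// Hypothetical helper: returns the id_str of the first status in a
// search response, or null when nothing usable came back.
function pickRetweetId(data) {
  if (!data || !Array.isArray(data.statuses) || data.statuses.length === 0) {
    return null;
  }
  var first = data.statuses[0];
  return first && typeof first.id_str === 'string' ? first.id_str : null;
}

// Hand-written samples shaped like the part of the response the bot reads
var withResults = { statuses: [{ id_str: '123' }, { id_str: '122' }] };
var withoutResults = { statuses: [] };

console.log(pickRetweetId(withResults));
console.log(pickRetweetId(withoutResults));
```

Inside the search callback, the bot could then call Twitter.post only when pickRetweetId(data) returns something other than null.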


In order to run your bot, simply type the following command in the terminal:

node bot.js

Alternatively, it is possible to use npm scripts. In a nutshell, they are scripts whose goal is to automate repetitive tasks. This requires modifying the package.json file by adding the following lines:

  "scripts": {
    "start": "node bot.js"
  }

and then you can type

npm start