
Go Primer: & and * operators

I have recently enrolled on a Udemy course called GetGoing: Introduction to Golang, as I wanted to sink my teeth into the marvellous world of Go: its simplicity and its power to harness concurrency. One of the first concepts you are likely to get confused about when starting to learn Go is the use of the two operators & and *.

The operator & (the address-of operator) is placed in front of a variable and returns its memory address.

For example:

myVariable := "aBeautifulString"
anotherVariable := &myVariable

The result is a memory address, in this fashion: 0xc00008a000

The operator * (aka the dereference operator) is placed in front of a variable that holds a memory address and resolves it to the value stored at that address. For example:

aString := "thisIsAString"
anotherString := *&aString

The result will be "thisIsAString"

There is also an additional case where the operator * is placed in front of a type, e.g. *string and it becomes part of the type declaration stating that the variable holds a pointer to a string. For example:

var str *string
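To see the three uses side by side, here is a minimal sketch. The helper appendSuffix is a hypothetical name introduced only for illustration; it is not part of the course material.

```go
package main

import "fmt"

// appendSuffix is a hypothetical helper: it receives a pointer to a string
// and modifies the value that the pointer refers to.
func appendSuffix(s *string, suffix string) {
	*s = *s + suffix
}

func main() {
	myVariable := "aBeautifulString"
	anotherVariable := &myVariable // anotherVariable has type *string

	fmt.Println(anotherVariable)  // a memory address; the exact value varies per run
	fmt.Println(*anotherVariable) // aBeautifulString

	// Writing through the pointer changes the original variable.
	*anotherVariable = "changed"
	fmt.Println(myVariable) // changed

	// A declared but unassigned pointer is nil until it is given an address.
	var str *string
	fmt.Println(str == nil) // true

	str = &myVariable
	appendSuffix(str, "!")
	fmt.Println(myVariable) // changed!
}
```

Passing &myVariable to a function is the usual reason to reach for these operators: the function can then change the caller's variable rather than a copy of it.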

Let’s spend 2 minutes at The Bash Script Corner

A while ago, I was tasked with writing a bash script that would do the following:

  • Read the username, password and host values from a Kubernetes ConfigMap
  • Read the Postgres username, password, host and database values
  • Replace variables with corresponding values in the application.yml file

#! /usr/bin/env sh

echo "Mapping the env variables to the values from Kubernetes"


#Sanity check
if [ $# -eq 0 ]; then
    echo "No argument was provided, however the script requires 1 argument to successfully run"
    exit 1
fi

#Postgres Env Variables
export DB_USERNAME=$(kubectl get configmap "$STAGING_ENV-$POSTGRES_CORE" -o jsonpath="{.data.username}")
export DB_PASSWORD=$(kubectl get secret "$STAGING_ENV-$POSTGRES_CORE" -o jsonpath="{.data.password}" | base64 -D)
export DB_HOSTNAME=$(kubectl get configmap "$STAGING_ENV-$POSTGRES_CORE" -o jsonpath="{}")
export DB_NAME=$(kubectl get configmap "$STAGING_ENV-$POSTGRES_CORE" -o jsonpath="{}")

export APPLICATION_YML_LOCATION=src/main/resources/application.yml

echo "Start replacing postgres env variables values with sed"
echo "End replacing postgres env variables values with sed"

New from Satellite 2020: GitHub Discussions, Codespaces, securing code in private repositories, and more – The GitHub Blog

Source: New from Satellite 2020: GitHub Discussions, Codespaces, securing code in private repositories, and more – The GitHub Blog

Which Java Microservice Framework Should You Choose in 2020?

Source: Which Java Microservice Framework Should You Choose in 2020?

Installing and Configuring Hadoop on Mac

So, now you want to find out a bit more about Hadoop, an open source framework for storing and processing large datasets in a distributed environment. Before tackling the installation and configuration of Hadoop on your beloved Mac, let’s clarify a few important points that will help you navigate the world of Hadoop.

Hadoop’s Components

There are essentially 4 components that form the core of Apache Hadoop:

  • HDFS, aka Hadoop Distributed File System; HDFS is the primary data storage system used by Hadoop applications. It employs a NameNode and DataNode architecture.
  • MapReduce, aka the distributed data processing framework of Apache Hadoop;

    The MapReduce algorithm consists of 2 main stages:

    • Map stage − This is the input stage, where data is in the form of a file or directory and is stored in the Hadoop file system (HDFS). The mapper's job is to process the data and split it into several chunks.
    • Reduce stage − This stage is the combination of the Shuffle stage and the Reduce stage. The Reducer’s job is to process the data that comes from the mapper.
      • During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate servers in the cluster.
      • The framework manages all the details of data-passing such as issuing tasks, verifying task completion, and copying data around the cluster between the nodes.
      • Most of the computing takes place on nodes that hold the data on their local disks, which reduces network traffic.
      • After completion of the given tasks, the cluster collects and reduces the data to form an appropriate result, and sends it back to the Hadoop server.
  • Hadoop Common, which is a set of pre-defined utilities and libraries employed by other modules within the Hadoop ecosystem;
  • YARN (Yet Another Resource Negotiator). YARN is the cluster resource management layer of the Apache Hadoop ecosystem, which schedules jobs and assigns resources. The main idea behind the birth of YARN was to resolve issues such as scalability and resource utilisation within a cluster. YARN's ResourceManager has 2 core components: the Scheduler and the ApplicationsManager. The Scheduler is responsible for allocating resources to the various running applications, but it does not monitor or track the status of those applications.

    Conversely, the applications manager is responsible for accepting job submissions.
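The Map, Shuffle and Reduce stages described above can be sketched with a word count, the customary first MapReduce example. This is a single-process, in-memory illustration in Go, not the Hadoop API; all function names (mapStage, shuffle, reduceStage, wordCount) are assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// pair is a key/value record emitted by the map stage.
type pair struct {
	key   string
	value int
}

// mapStage emits a (word, 1) pair for every word in one line of input.
func mapStage(line string) []pair {
	var out []pair
	for _, w := range strings.Fields(strings.ToLower(line)) {
		out = append(out, pair{key: w, value: 1})
	}
	return out
}

// shuffle groups all emitted values by key, so that each key
// reaches the reduce stage together with every one of its values.
func shuffle(pairs []pair) map[string][]int {
	grouped := make(map[string][]int)
	for _, p := range pairs {
		grouped[p.key] = append(grouped[p.key], p.value)
	}
	return grouped
}

// reduceStage collapses the values of a single key into one result.
func reduceStage(values []int) int {
	sum := 0
	for _, v := range values {
		sum += v
	}
	return sum
}

// wordCount wires the three stages together. In a real cluster the
// map and reduce calls would run as tasks on different nodes.
func wordCount(lines []string) map[string]int {
	var emitted []pair
	for _, line := range lines {
		emitted = append(emitted, mapStage(line)...)
	}
	result := make(map[string]int)
	for key, values := range shuffle(emitted) {
		result[key] = reduceStage(values)
	}
	return result
}

func main() {
	counts := wordCount([]string{"big data is big", "data is everywhere"})

	// Sort the keys so the output order is stable.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, counts[k])
	}
}
```

The point of the split is that mapStage and reduceStage hold no shared state, which is what lets a framework like Hadoop run many copies of each in parallel.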

When Hadoop is the right choice

  • For processing large datasets;
  • For storing a variety of data formats (see the concept of Data Variety as one of the 3 V’s in Big Data);
  • For parallel data processing (yes, and that is exactly what MapReduce helps you with).

When Hadoop is NOT the right choice

  • When your dataset is not big enough, meaning that you can work well with RDBMS solutions;
  • For processing data stored in relational databases;
  • For processing real-time as well as graph-based data.

Hadoop Mac Installation

Before installing Hadoop, you should ask yourself what kind of Hadoop cluster, and therefore what kind of installation, you require. To make it easy, here are 3 of the most common installation types:

  • Local or Standalone Mode. In this way, Hadoop is configured to run in a non-distributed manner as a single Java process running on your computer;
  • Pseudo Distributed Mode (also known as single-node cluster). This is similar to the standalone mode, but each Hadoop daemon runs as its own Java process on a single node. This is what is often called near-production mode.
  • Fully Distributed Mode. This is the production mode of Hadoop, where multiple nodes will be running at the same time. In such a setting, data will be distributed across several nodes and processing will be done on each node.

Setting Up SSH on MacOS

Before installing Hadoop, we need to make sure that SSH is working properly on your machine, by running the following:

ssh localhost

If it returns this:

ssh: connect to host localhost port 22: Connection refused

it means that Remote Login is off, which you can confirm with:

sudo systemsetup -getremotelogin
Remote Login: off

In order to enable the remote login, run the following:

sudo systemsetup -setremotelogin on

SSH keys will then need to be generated:

ssh-keygen -t rsa
$ cat ~/.ssh/ >> ~/.ssh/authorized_keys

Install Hadoop via HomeBrew

We are going to install Hadoop via HomeBrew, as follows:

brew install hadoop

Hadoop Configuration on Mac

Configuring Hadoop requires a number of steps.


The file is located at


The following line should be changed from






The file is located at


and add the below configuration inside




The file is located at


and is blank by default; add the configuration below




The file is located at


add the following:


Before running Hadoop, format HDFS:

$ hdfs namenode -format

To start Hadoop, you can use the following 2 commands:

Both scripts are available at:


Kafka and Zookeeper: main concepts

What is Kafka

Apache Kafka is a distributed real-time streaming platform whose primary use cases are those requiring high throughput, reliability, and replication characteristics that are hard to achieve with comparable performance using technologies like JMS, RabbitMQ, and AMQP.

Generally speaking, a Big Data streaming platform offers 3 main capabilities:

  • Publishing and subscribing to streams of records, similar to a message queue or enterprise messaging system;
  • Storing streams of records in a fault-tolerant durable way;
  • Processing streams of records as they occur.

Kafka’s Applications and Case Studies

Some of the companies that are using Apache Kafka in their respective use cases are as follows:

  • LinkedIn: Apache Kafka is used at LinkedIn for activity data streaming and operational metrics. This data powers various products such as LinkedIn News Feed and LinkedIn Today.
  • Twitter uses Kafka as a part of its Storm (now Heron, actually) stream-processing infrastructure. Here is an account of Twitter’s Kafka adoption.
  • Foursquare: Kafka powers online-to-online and online-to-offline messaging at Foursquare. It is used to integrate Foursquare monitoring and production systems with Foursquare- and Hadoop-based offline infrastructures.

Kafka: main concepts

A Kafka cluster primarily has 5 main components:

  • Topic: A topic is a category or feed name to which messages are published by the message producers. In Kafka, topics are partitioned and each partition is represented by the ordered immutable sequence of messages. A Kafka cluster maintains the partitioned log for each topic. Each message in the partition is assigned a unique sequential ID called the offset.
  • Broker: A Kafka cluster consists of one or more servers, each of which may run one or more server processes; each such process is called a broker. Topics are created within the context of broker processes.
  • Zookeeper: It serves as the coordination interface between the Kafka broker and consumers. From the Hadoop Wiki: ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers (we call these registers znodes), much like a file system.
  • Producers: They publish data to the topics by choosing the appropriate partition within the topic. For load balancing, the allocation of messages to the topic partition can be done in a round-robin fashion or using a custom defined function.
  • Consumers: They are the applications or processes that subscribe to topics and process the feed of published messages.
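To make the relationship between topics, partitions, offsets and round-robin producers more concrete, here is a toy in-memory model in Go. It is only a sketch of the concepts listed above and does not use any Kafka client library; the type topic and the functions newTopic, publish and read are hypothetical names.

```go
package main

import "fmt"

// topic is a toy model of a partitioned log: each partition is an
// ordered, append-only sequence of messages.
type topic struct {
	name       string
	partitions [][]string
	next       int // partition that the next round-robin publish will use
}

// newTopic creates a topic with n empty partitions.
func newTopic(name string, n int) *topic {
	return &topic{name: name, partitions: make([][]string, n)}
}

// publish appends msg to the next partition in round-robin order and
// returns the partition index plus the offset assigned to the message.
func (tp *topic) publish(msg string) (partition, offset int) {
	partition = tp.next
	tp.next = (tp.next + 1) % len(tp.partitions)
	tp.partitions[partition] = append(tp.partitions[partition], msg)
	offset = len(tp.partitions[partition]) - 1
	return partition, offset
}

// read returns every message in a partition from the given offset
// onwards, which is how a consumer resumes from where it left off.
func (tp *topic) read(partition, offset int) []string {
	log := tp.partitions[partition]
	if offset < 0 || offset >= len(log) {
		return nil
	}
	return log[offset:]
}

func main() {
	events := newTopic("page-views", 2)
	for _, msg := range []string{"m0", "m1", "m2", "m3", "m4"} {
		p, o := events.publish(msg)
		fmt.Printf("%s -> partition %d, offset %d\n", msg, p, o)
	}

	// A consumer that has already processed offset 0 of partition 0
	// continues from offset 1.
	fmt.Println(events.read(0, 1)) // [m2 m4]
}
```

Note that offsets are only unique within a partition, so ordering is guaranteed per partition rather than across the whole topic.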

What is Zookeeper

ZooKeeper is a centralised service for maintaining configuration information, naming, providing distributed synchronisation and group services. In a nutshell, ZooKeeper is a coordination interface that allows communication between Kafka and the consumer. Although its namespace looks much like a normal filesystem, the main difference lies in the concept of the znode. Every znode is identified by a path, that is, a sequence of names separated by a slash (/).

  • At the highest level there is a root znode, denoted by “/”, under which there are 2 logical namespaces, namely config and workers.
  • The config namespace is used for centralized configuration management and the workers namespace is used for naming.
  • Under the config namespace, each znode can store up to 1MB of data. The main purpose of such a structure (also called the ZooKeeper Data Model) is to store synchronized data and describe the metadata of the znode.
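The znode hierarchy described above can be pictured as a small tree keyed by path. The following Go sketch is an in-memory illustration only, not the ZooKeeper client API; the znode type, its create and get methods, and the example path /config/db-url are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// maxZnodeData mirrors the 1MB per-znode limit mentioned above.
const maxZnodeData = 1 << 20

// znode is a toy model of a node in the hierarchy: unlike a plain
// directory, it can hold data as well as children.
type znode struct {
	data     []byte
	children map[string]*znode
}

// newZnode returns an empty znode, used here as the root "/".
func newZnode() *znode {
	return &znode{children: map[string]*znode{}}
}

// create adds a znode at an absolute path. The parent must already exist.
func (root *znode) create(path string, data []byte) error {
	if len(data) > maxZnodeData {
		return errors.New("data exceeds the 1MB znode limit")
	}
	if !strings.HasPrefix(path, "/") || path == "/" {
		return errors.New("path must be absolute and must not be the root")
	}
	parts := strings.Split(strings.Trim(path, "/"), "/")
	node := root
	for _, name := range parts[:len(parts)-1] {
		child, ok := node.children[name]
		if !ok {
			return fmt.Errorf("parent %q does not exist", name)
		}
		node = child
	}
	last := parts[len(parts)-1]
	if _, exists := node.children[last]; exists {
		return fmt.Errorf("znode %q already exists", path)
	}
	node.children[last] = &znode{data: data, children: map[string]*znode{}}
	return nil
}

// get walks the path from the root and returns the data stored there.
func (root *znode) get(path string) ([]byte, bool) {
	node := root
	for _, name := range strings.Split(strings.Trim(path, "/"), "/") {
		if name == "" {
			continue
		}
		child, ok := node.children[name]
		if !ok {
			return nil, false
		}
		node = child
	}
	return node.data, true
}

func main() {
	root := newZnode()
	fmt.Println(root.create("/config", nil))  // <nil>
	fmt.Println(root.create("/workers", nil)) // <nil>
	fmt.Println(root.create("/config/db-url", []byte("postgres://example")))

	data, ok := root.get("/config/db-url")
	fmt.Println(string(data), ok)

	// Creating a child under a missing parent is rejected.
	fmt.Println(root.create("/missing/child", nil))
}
```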

Where to go from here

Lots of resources can be found online; here are just a few to begin your journey with distributed messaging services:

Apache Kafka Home

Apache Kafka Github Repo

Apache Kafka for Beginners

Big Data Messaging with Kafka

Apache Zookeeper HomePage

Apache Zookeeper GitHub Repo

Spring Cloud Zookeeper

How to configure Zookeeper



Setting up your Deep Learning Environment (Mac)

So, you have embarked on your Deep Learning journey and perhaps you are navigating through the concepts of Gradient Descent, Back-propagation and so forth. After all the theory, you are eager to get your environment ready to do some actual ‘deep learning hard work’ and you have no idea where to start. You are in the right place then. This short tutorial has been put together for Mac users (sorry Windows aficionados) and will provide you with what you need to get started.

Yes, you need Python!

You surely know that Python is the key programming language when it comes to Machine and Deep Learning. First, make sure you have our beloved Homebrew installed:

/usr/bin/ruby -e “$(curl -fsSL"

Install Python 3 (with this version, pip3 will be automatically installed)

brew install python3

Virtual Environment

In order to keep things clean and contain all your deep learning related dependencies in one space, it is useful to use virtual environments.

pip3 install virtualenv virtualenvwrapper

You will also need to modify your bash profile file:

vim ~/.bash_profile

by adding the following:

# virtualenv and virtualenvwrapper
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3
source /usr/local/bin/

Next step is to create a virtual environment for your deep learning project:

mkvirtualenv cv -p python3

This will create a virtual environment named cv. To leave the environment, type the command deactivate.

Some Additional Dependencies

You will also need to install cmake to be able to use dlib, a C++ toolkit containing Machine Learning algorithms:

brew install cmake

Additionally, you will need to download X11 to display the image outputs from both dlib and opencv.

Let’s install the real stuff

Situate yourself inside your virtual environment by typing the following:

workon cv

Some additional dependencies should be taken care of:

pip install numpy h5py pillow scikit-image

Finally, we can install OpenCV:

pip install opencv-python

Then, we will be installing Dlib, Tensorflow and Keras:

pip install dlib
pip install tensorflow
pip install keras

Keras, in particular, is a user-friendly, beginner-friendly library for Machine Learning and Deep Learning models that runs on top of Tensorflow. Happy Machine Learning modelling 🙂


Clearing the Confusion: AI vs Machine Learning vs Deep Learning Differences

Perhaps the most basic question for beginners when learning about Machine Learning and Deep Learning.

Read Parquet Files with SparkSQL

SparkSQL is a Spark module for working with structured data, and it can also be used to read columnar data formats such as Parquet files. Here are a number of useful commands that can be run from the spark-shell:

// Set the context

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Read the parquet file in HDFS and print its schema

val df = sqlContext.read.parquet("hdfs://user/myfolder/part-r-00033.gz.parquet")
df.printSchema()

// Show the top 10 rows of data from the parquet file, without truncating long values

df.show(10, false)

// Convert to JSON and print out the content of 1 record

df.toJSON.take(1).foreach(println)


Jenkins Best Practices – Practical Continuous Deployment in the Real World — GoDaddy Open Source HQ

Source: Jenkins Best Practices – Practical Continuous Deployment in the Real World — GoDaddy Open Source HQ