Author Archives: icorda
Monorepos
At work, I was recently introduced to the idea of monorepos. While developers who have been writing code for microservices tend to think of a microservice architecture as a number of codebases somehow interacting with each other, there is another school of thought that pushes the idea of organising code in a single repository. Do not fret: that does not mean we want to create a monolithic application, tightly coupled, i.e. a giant and far-from-modular piece of software. It only means that, instead of having the code reside in different repositories and having to tweak and make changes in each one while making sure that integration tests are still green and happy, everything lives in one codebase, which generally has the following features:
- Only build the services or cmds that are modified in a pushed commit;
- Build all services and/or cmds that are affected by changes in common code (i.e. pkg);
- Build all services and/or cmds that are affected by changes in vendor code.
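For concreteness, here is a sketch of one common Go monorepo layout; the directory names are illustrative, not prescriptive, but they match the pkg and vendor conventions mentioned above:

```
monorepo/
├── services/        # deployable microservices, one directory each
│   ├── users/
│   └── billing/
├── cmd/             # CLI entry points
├── pkg/             # shared library code used by several services
└── vendor/          # vendored third-party dependencies
```

A CI pipeline would then diff the pushed commit against the previous one and rebuild only the services whose directories (or whose shared dependencies under pkg and vendor) were touched.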
https://medium.com/goc0de/how-to-golang-monorepo-4f62320a01fd
https://circleci.com/blog/monorepo-dev-practices/
https://hardyantz.medium.com/getting-started-monorepo-golang-application-with-bazel-370ed1069b4f
ApacheCon 2021 has arrived!
ApacheCon, the global Apache conference, has arrived, and it has landed in your very own living room. That is possibly one of the few times in the past year and a half where you might find yourself saying “Thank you COVID!”. ApacheCon is not only fully online, but it is also free (unless you want to donate to the Apache Software Foundation, which is all well and good). Here is the link to register:
https://www.apachecon.com/acah2021/
Have fun!
Go Primer: & and * operators
I have recently enrolled in a Udemy course called GetGoing: Introduction to Golang, as I wanted to sink my teeth into the marvellous world of Go: its simplicity and its power to harness concurrency. One of the first concepts you are likely to get confused about when starting to learn Go is the use of the two operators & and *.
The operator & is placed in front of a variable and returns its memory address. For example:
myVariable := "aBeautifulString"
anotherVariable := &myVariable
fmt.Println(anotherVariable)
The result is a memory address, in this fashion: 0xc00008a000
The operator * (aka the dereference operator) is placed in front of a variable that holds a memory address and resolves it to the value stored at that address. For example:
aString := "thisIsAString"
anotherString := *&aString
The result will be "thisIsAString"
There is also an additional case where the operator * is placed in front of a type, e.g. *string, and it becomes part of the type declaration, stating that the variable holds a pointer to a string. For example:
var str *string
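Putting the snippets above together, here is a small, self-contained program; the deref helper is my own, purely for illustration:

```go
package main

import "fmt"

// deref returns the value stored at the address p points to.
func deref(p *string) string {
	return *p
}

func main() {
	myVariable := "aBeautifulString"

	// & takes the address of myVariable; anotherVariable has type *string.
	anotherVariable := &myVariable
	fmt.Println(anotherVariable) // a memory address, e.g. 0xc00008a000

	// * resolves the address back to the stored value.
	fmt.Println(deref(anotherVariable)) // prints aBeautifulString

	// Writing through the pointer changes the original variable.
	*anotherVariable = "aDifferentString"
	fmt.Println(myVariable) // prints aDifferentString

	// A declared-but-unassigned pointer is nil; dereferencing it would panic.
	var str *string
	fmt.Println(str == nil) // prints true
}
```

Note the last case: declaring `var str *string` gives you a nil pointer, so you must assign it an address before dereferencing it.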
Let’s spend 2 minutes at The Bash Script Corner
A while ago, I was tasked with writing a bash script that would do the following:
- Read the Postgres username, host and database values from a Kubernetes ConfigMap, and the password from a Secret;
- Replace the placeholder variables in the application.yml file with the corresponding values.
#! /usr/bin/env sh
echo "Mapping the env variables to the values from Kubernetes"
STAGING_ENV="$1"
#Sanity check
if [ $# -eq 0 ]; then
echo "No argument was provided, however the script requires 1 argument to successfully run"
exit 1
fi
POSTGRES_CORE="postgres-core"
#Postgres Env Variables
export DB_USERNAME=$(kubectl get configmap "$STAGING_ENV"-$POSTGRES_CORE -o jsonpath="{.data.username}")
export DB_PASSWORD=$(kubectl get secret "$STAGING_ENV"-$POSTGRES_CORE -o jsonpath="{.data.password}" | base64 -D) # -D is the macOS flag; GNU base64 uses -d
export DB_HOSTNAME=$(kubectl get configmap "$STAGING_ENV"-$POSTGRES_CORE -o jsonpath="{.data.host}")
export DB_NAME=$(kubectl get configmap "$STAGING_ENV"-$POSTGRES_CORE -o jsonpath="{.data.name}")
export APPLICATION_YML_LOCATION=src/main/resources/application.yml
echo "Start replacing postgres env variables values with sed"
sed -i '' "s/DB_USERNAME:postgres/DB_USERNAME:$DB_USERNAME/g" $APPLICATION_YML_LOCATION
sed -i '' "s/DB_PASSWORD:password/DB_PASSWORD:$DB_PASSWORD/g" $APPLICATION_YML_LOCATION
sed -i '' "s/DB_HOSTNAME:localhost/DB_HOSTNAME:$DB_HOSTNAME/g" $APPLICATION_YML_LOCATION
sed -i '' "s/DB_NAME:core-db/DB_NAME:$DB_NAME/g" $APPLICATION_YML_LOCATION
echo "End replacing postgres env variables values with sed"
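The sed substitutions can be tried out in isolation. This sketch (the file path is invented for the demo) creates a minimal application.yml and performs the same kind of replacement the script does; note the -i.bak form, which keeps a backup and works with both GNU and BSD sed:

```shell
# Create a minimal application.yml containing a placeholder.
printf 'username: DB_USERNAME:postgres\n' > /tmp/application.yml

DB_USERNAME="alice"
# Same substitution the script performs; -i.bak writes a backup file
# and is portable between GNU sed (Linux) and BSD sed (macOS).
sed -i.bak "s/DB_USERNAME:postgres/DB_USERNAME:$DB_USERNAME/g" /tmp/application.yml

cat /tmp/application.yml
```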
Installing and Configuring Hadoop on Mac
So, now you want to find out a bit more about Hadoop, an open-source framework for storing and processing large datasets in a distributed environment. Before tackling the installation and configuration of Hadoop on your beloved Mac, let’s clarify a few important points that will help you navigate the world of Hadoop.
Hadoop’s Components
There are essentially 4 components that form the core of Apache Hadoop:
- HDFS, aka the Hadoop Distributed File System; HDFS is the primary data storage system used by Hadoop applications. It employs a NameNode and DataNode architecture to store files in a distributed fashion across the nodes of a cluster.
- MapReduce, aka the distributed data processing framework of Apache Hadoop;
The MapReduce algorithm consists of 2 main stages:
- Map stage − This is the input stage, where data is in the form of a file or directory and is stored in the Hadoop file system (HDFS). The mapper is responsible for processing the data and splitting it into several chunks.
- Reduce stage − This stage is the combination of the Shuffle stage and the Reduce stage. The Reducer’s job is to process the data that comes from the mapper.
- During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate servers in the cluster.
- The framework manages all the details of data-passing such as issuing tasks, verifying task completion, and copying data around the cluster between the nodes.
- Most of the computing takes place on nodes with data on local disks that reduces the network traffic.
- After completion of the given tasks, the cluster collects and reduces the data to form an appropriate result, and sends it back to the Hadoop server.
- Hadoop Common, a set of pre-defined utilities and libraries employed by the other modules within the Hadoop ecosystem;
- YARN (Yet Another Resource Negotiator). YARN is the cluster resource management layer of the Apache Hadoop ecosystem, which schedules jobs and assigns resources. The main idea behind the birth of YARN was to resolve issues such as scalability and resource utilisation within a cluster. YARN has 2 core components: the Scheduler and the ApplicationsManager. The Scheduler is responsible for allocating resources to the various running applications, but it does not perform monitoring or status tracking for them. Conversely, the ApplicationsManager is responsible for accepting job submissions.
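The Map and Reduce stages described above can be sketched in plain Go (outside Hadoop, purely for illustration) as the classic word-count example, where the mapper emits words and the reducer groups and counts them:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mapStage splits each input line into individual words,
// mimicking what a Hadoop mapper emits as (word, 1) pairs.
func mapStage(lines []string) []string {
	var words []string
	for _, line := range lines {
		words = append(words, strings.Fields(line)...)
	}
	return words
}

// reduceStage groups the mapper output by key and sums the counts,
// mimicking the shuffle + reduce stages.
func reduceStage(words []string) map[string]int {
	counts := make(map[string]int)
	for _, w := range words {
		counts[w]++
	}
	return counts
}

func main() {
	lines := []string{"hello world", "hello hadoop"}
	counts := reduceStage(mapStage(lines))

	// Print the counts in a stable (sorted) order.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s %d\n", k, counts[k])
	}
}
```

In a real cluster, the framework would run many mappers and reducers in parallel on different nodes and handle the shuffle between them; the toy version above just shows the data flow.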
When Hadoop is the right choice
- For processing large datasets;
- For storing a variety of data formats (see the concept of Data Variety as one of the 3 V’s of Big Data);
- For parallel data processing (yes, and that is exactly what MapReduce helps you with).
When Hadoop is NOT the right choice
- When your dataset is not big enough, meaning that you can work well with RDBMS solutions;
- For processing data stored in relational databases;
- For processing real-time as well as graph-based data.
Hadoop Mac Installation
Before installing Hadoop, you should ask yourself what kind of Hadoop cluster, and therefore installation, you would require. To make it easy, here are the 3 most common installation types:
- Local or Standalone Mode. In this mode, Hadoop is configured to run in a non-distributed manner, as a single Java process running on your computer;
- Pseudo-Distributed Mode (also known as single-node cluster). This is similar to the standalone mode, but all Hadoop daemons run as separate processes on a single node. This is what is often called near-production mode;
- Fully Distributed Mode. This is the production mode of Hadoop, where multiple nodes run at the same time. In such a setting, data is distributed across several nodes and processing is done on each node.
Setting Up SSH on MacOS
Before installing Hadoop, you need to make sure that SSH is working properly on your machine, by running the following:
ssh localhost
If it returns this:
ssh: connect to host localhost port 22: Connection refused
it means that remote login is off, which you can confirm by running:
sudo systemsetup -getremotelogin
Remote Login: off
In order to enable the remote login, run the following:
sudo systemsetup -setremotelogin on
SSH keys will then need to be generated:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Install Hadoop via HomeBrew
We are going to install Hadoop via Homebrew, as follows:
brew install hadoop
This installs Hadoop (version 3.2.1 at the time of writing) at:
/usr/local/Cellar/hadoop/3.2.1
Hadoop Configuration on Mac
Configuring Hadoop requires a number of steps.
Edit
hadoop-env.sh
The file is located at
/usr/local/Cellar/hadoop/3.2.1/libexec/etc/hadoop/hadoop-env.sh
The following line should be changed from
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
to
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
Edit
Core-site.xml
The file is located at
/usr/local/Cellar/hadoop/3.2.1/libexec/etc/hadoop/core-site.xml
Add the configuration below inside the existing <configuration> tags:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
Edit
mapred-site.xml
The file is located at
/usr/local/Cellar/hadoop/3.2.1/libexec/etc/hadoop/mapred-site.xml
which by default is blank; add the configuration below:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
</configuration>
Edit
hdfs-site.xml
The file is located at
/usr/local/Cellar/hadoop/3.2.1/libexec/etc/hadoop/hdfs-site.xml
add the following:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
A replication factor of 1 is the usual choice for a single-node (pseudo-distributed) setup, since there is only one DataNode to hold each block.
Before running Hadoop, format HDFS:
hdfs namenode -format
To Start Hadoop, you can use the following 2 commands:
start-dfs.sh
start-yarn.sh
Both scripts are available at:
/usr/local/Cellar/hadoop/3.2.1/sbin