Category Archives: Uncategorized

Read Parquet Files with SparkSQL

SparkSQL is a Spark module for working with structured data, and it can also be used to read columnar data formats such as Parquet files.  Here are a number of useful commands that can be run from the spark-shell:

// Set the context
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Read the Parquet file from HDFS and print its schema
val df = sqlContext.read.parquet("hdfs://user/myfolder/part-r-00033.gz.parquet")
df.printSchema()

// Show the top 10 rows of data from the Parquet file, without truncating columns
df.show(10, false)

// Convert to JSON and print out the content of 1 record
df.toJSON.take(1).foreach(println)

Jenkins Best Practices – Practical Continuous Deployment in the Real World — GoDaddy Open Source HQ

Source: Jenkins Best Practices – Practical Continuous Deployment in the Real World — GoDaddy Open Source HQ

Java Beans and DTOs

DTO (Data Transfer Object)

Data Transfer Object is a pattern whose aim is to transport data between layers and tiers of a program. A DTO should contain NO business logic.

public class UserDTO {
    String firstName;
    String lastName;
    List<String> groups;

    public String getFirstName() {
        return firstName;
    }
    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }
    public String getLastName() {
        return lastName;
    }
    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    public List<String> getGroups() {
        return groups;
    }
    public void setGroups(List<String> groups) {
        this.groups = groups;
    }
}

Java Beans

Java Beans are classes that follow certain conventions, or even better, they are Sun/Oracle standards/specifications, as explained here:

https://www.oracle.com/technetwork/java/javase/documentation/spec-136004.html

Essentially, Java Beans adhere to the following:

  • all properties are private (and are accessed through getters and setters);
  • they have a zero-argument constructor (aka default constructor);
  • they implement the Serializable interface.

The main reason why we use Java Beans is encapsulation: all state is kept private and exposed only through getters and setters.

public class BeanClassExample implements java.io.Serializable {

  private int id;

  //no-arg constructor
  public BeanClassExample() {
  }

  public int getId() {
    return id;
  }

  public void setId(int id) {
    this.id = id;
  }
}

So, yeah what is the real difference? If any?

In a nutshell, Java Beans follow strict conditions (as discussed above) and contain no behaviour (as opposed to state), except for storage, retrieval, serialization and deserialization. A Java Bean is indeed a specification, while DTO (Data Transfer Object) is a pattern in its own right. It is more than acceptable to use a Java Bean to implement the DTO pattern.
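As a minimal sketch of that last point, here is a Java Bean used as a DTO and round-tripped through serialization; the class name and field are illustrative, not taken from the original examples:

```java
import java.io.*;

// A Java Bean (private state, zero-arg constructor, Serializable)
// used as a DTO and round-tripped through Java serialization.
public class UserBean implements Serializable {
    private static final long serialVersionUID = 1L;
    private String firstName;

    public UserBean() { }  // zero-arg constructor

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }

    public static void main(String[] args) throws Exception {
        UserBean original = new UserBean();
        original.setFirstName("Eliott");

        // serialize to bytes...
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(original);
        }

        // ...and deserialize back: the state survives the round trip
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            UserBean copy = (UserBean) in.readObject();
            System.out.println(copy.getFirstName());
        }
    }
}
```

The zero-arg constructor and Serializable are exactly what let the bean be rebuilt on the other side of a transport boundary.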

Avro is amazing!

Why Avro For Kafka Data?

Scala, give me a break :)

I have recently been asked whether it is possible to use break (and continue as well) in a loop with Scala, and it occurred to me that I have never come across such a case. Coming from Java, I do know how to employ break and continue in a while loop, for example, so why would it be different in Scala, considering that it builds on top of the JVM? It is actually a bit more complicated than that. Although Scala does not specifically have the keywords break and continue, it does offer similar functionality through scala.util.control.Breaks.

Here is an example of how to use break from the Class Breaks, as follows:

import scala.util.control.Breaks._
import java.io._

val in = new BufferedReader(new InputStreamReader(System.in))

breakable {
  while (true) {
    println("? ")
    if (in.readLine() == "") break
  }
}

In Java, the above would correspond to this:

BufferedReader in =
    new BufferedReader(new InputStreamReader(System.in));
while (true) {
    System.out.println("? ");
    if (in.readLine().equals("")) break;
}

The breakable function has been available from Scala 2.8 onwards; before that, we would have tackled the issue mostly through 2 approaches:

  • by adding a boolean variable indicating whether the loop should keep running;
  • by rewriting the loop as a recursive function.
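Since this post already compares with Java, here is a sketch of both workarounds expressed in Java; the in-memory list stands in for the reader input, and all names are illustrative:

```java
import java.util.Arrays;
import java.util.List;

public class LoopWithoutBreak {

    // Approach 1: a boolean flag controls the loop instead of break
    static int countUntilEmptyFlag(List<String> lines) {
        int count = 0;
        boolean running = true;
        int i = 0;
        while (running && i < lines.size()) {
            if (lines.get(i).equals("")) {
                running = false;  // the flag replaces break
            } else {
                count++;
            }
            i++;
        }
        return count;
    }

    // Approach 2: the loop rewritten as a recursive function,
    // where "stopping the recursion" plays the role of break
    static int countUntilEmptyRec(List<String> lines, int i) {
        if (i >= lines.size() || lines.get(i).equals("")) return 0;
        return 1 + countUntilEmptyRec(lines, i + 1);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "", "c");
        System.out.println(countUntilEmptyFlag(input));   // counts up to the empty line
        System.out.println(countUntilEmptyRec(input, 0)); // same result, recursively
    }
}
```

The recursive form translates especially naturally back into Scala, where tail recursion is idiomatic.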

Happy Scala programming 🙂

 

2 minutes to spare: Apache NiFi on Mac

As a Mac user, I usually run Apache NiFi using one of two approaches:

  • by standing up a Docker container;
  • by downloading and installing it locally on the Mac.

Running a NiFi Container

You can install Docker on Mac via Homebrew:

brew install docker

Alternatively, it is possible to download Docker Community Edition (CE): an easy-to-install desktop app for building, packaging and testing dockerised apps, which includes tools such as the Docker command line, Docker Compose and Docker Notary.

After installing Docker, you can pull the NiFi image:

docker pull apache/nifi:1.5.0

Next, we can start the image and watch it run:

docker run -p 8080:8080 apache/nifi:1.5.0

Downloading and Installing NiFi locally

Installing Apache NiFi on Mac is quite straightforward, as follows:

brew install nifi

This assumes that you have Homebrew installed. If that is not the case, this is the command you will need:

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" < /dev/null 2> /dev/null

Here is where NiFi has been installed:

/usr/local/Cellar/nifi/your-version-of-nifi

Some basic operations can be done with these commands:

bin/nifi.sh run: runs NiFi in the foreground

bin/nifi.sh start: runs NiFi in the background

bin/nifi.sh status: checks the status

bin/nifi.sh stop: stops NiFi

The next step, whichever approach you took at the beginning, is to verify that your NiFi installation or dockerised version is running. This is as simple as visiting the following URL:

localhost:8080/nifi

Happy Nif-ing 🙂

Machine Learning’s ‘Amazing’ Ability to Predict Chaos


An Oracle JDBC Client

A while ago I was tasked with writing a small application to connect to an Oracle database and perform a set of simple queries. For this task, I employed the DAO (Data Access Object) pattern with a corresponding DAO interface. A basic Java client, in turn, instantiates the DAO class, which implements the DAO interface. The application in its internal details follows:

Oracle DB Client 

[code language="java"]
package oracledb.connection.client;

import oracledb.connection.dao.OracleDB_DAO;

public class OracleConnectionClient {

    public static void main(String[] args) throws Exception {
        OracleDB_DAO dao = new OracleDB_DAO();
        dao.readPropertiesFile();
        dao.openConnection();
        dao.getDBCurrentTime();
        dao.getFirstNameAndLastNameFromCustomers();
        dao.closeConnection();
    }
}
[/code]
The Data Access Object (DAO) implementation follows. The method [code]readPropertiesFile()[/code] parses a properties file containing the access credentials and DB connection details.
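As a minimal sketch of what that parsing does, here the same java.util.Properties mechanics are shown against an in-memory string rather than the classpath resource; the keys match those used in the DAO, while the values are made up:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PropertiesSketch {
    public static void main(String[] args) throws IOException {
        // Stand-in for dbconnection.properties on the classpath;
        // the values here are purely illustrative.
        String content =
                "dburl=jdbc:oracle:thin:@localhost:1521:XE\n"
              + "dbuser=scott\n"
              + "dbpassword=tiger\n";

        Properties props = new Properties();
        // same load() call the DAO uses on its InputStream
        props.load(new StringReader(content));

        System.out.println(props.getProperty("dburl"));
        System.out.println(props.getProperty("dbuser"));
    }
}
```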

 

[code language="java"]
package oracledb.connection.dao;

import java.io.*;
import java.sql.*;
import java.util.Properties;

public class OracleDB_DAO implements OracleDB_DAO_Interface {

    public static String SAMPLE_SELECT_QUERY =
            "SELECT * FROM CUSTOMERS WHERE FirstName = 'Eliott' AND LastName = 'Brown'";

    private static String driverClass = "oracle.jdbc.driver.OracleDriver";
    private Connection connection;
    private static String dbUrl;
    private static String userName;
    private static String password;

    static String resourceName = "dbconnection.properties";

    /**
     * Read the properties and initialise the DAO.
     *
     * @throws IOException
     * @throws ClassNotFoundException
     */
    public void readPropertiesFile() throws IOException, ClassNotFoundException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        Properties props = new Properties();
        try (InputStream resourceStream = loader.getResourceAsStream(resourceName)) {
            props.load(resourceStream);
        }

        // Read the connection properties
        dbUrl = props.getProperty("dburl");
        userName = props.getProperty("dbuser");
        password = props.getProperty("dbpassword");

        // Load the JDBC driver class
        Class.forName(driverClass);
    }

    /*
     * (non-Javadoc)
     *
     * @see oracledb.connection.dao.OracleDB_DAO_Interface#openConnection()
     */
    @Override
    public void openConnection() throws SQLException {
        // get the connection to the database
        System.out.println("Establishing the Connection to the Database");
        try {
            connection = DriverManager.getConnection(dbUrl, userName, password);
            System.out.println(connection);
        } catch (SQLException ex) {
            ex.printStackTrace();
        }
    }

    /*
     * (non-Javadoc)
     *
     * @see oracledb.connection.dao.OracleDB_DAO_Interface#closeConnection()
     */
    @Override
    public void closeConnection() throws SQLException {
        if (connection != null) {
            // close the connection
            connection.close();
        }
    }

    /*
     * (non-Javadoc)
     *
     * @see oracledb.connection.dao.OracleDB_DAO_Interface#
     * getFirstNameAndLastNameFromCustomers()
     */
    @Override
    public ResultSet getFirstNameAndLastNameFromCustomers() throws SQLException, IOException {
        // create the statement
        Statement stmt = connection.createStatement();
        // execute the query
        ResultSet rs = stmt.executeQuery(SAMPLE_SELECT_QUERY);
        System.out.println("This prints the ResultSet for getFirstNameAndLastNameFromCustomers " + rs);
        // note: do NOT close the statement here, as that would also
        // close the ResultSet being returned; the caller is responsible
        // for closing both once done
        return rs;
    }

    /*
     * (non-Javadoc)
     *
     * @see oracledb.connection.dao.OracleDB_DAO_Interface#getDBCurrentTime()
     */
    @Override
    public String getDBCurrentTime() throws SQLException, IOException {
        String dateTime = null;
        // create the statement
        Statement stmt = connection.createStatement();
        ResultSet rst = stmt.executeQuery("select SYSDATE from dual");
        while (rst.next()) {
            dateTime = rst.getString(1);
        }
        System.out.println("This prints the dateTime from the DB " + dateTime);
        // close the resultset and statement
        rst.close();
        stmt.close();
        return dateTime;
    }
}
[/code]
The DAO Interface that defines the standard operations to be performed on a model object:

[code language="java"]
package oracledb.connection.dao;

import java.io.IOException;
import java.sql.ResultSet;
import java.sql.SQLException;

public interface OracleDB_DAO_Interface {

    /**
     * Open the DAO connection.
     *
     * @throws SQLException
     */
    void openConnection() throws SQLException;

    /**
     * Close the connection.
     *
     * @throws SQLException
     */
    void closeConnection() throws SQLException;

    /**
     * Get the ResultSet from the select query.
     *
     * @throws SQLException
     * @throws IOException
     */
    ResultSet getFirstNameAndLastNameFromCustomers() throws SQLException, IOException;

    /**
     * Get the current time via a DB query.
     *
     * @return
     * @throws SQLException
     * @throws IOException
     */
    String getDBCurrentTime() throws SQLException, IOException;

}
[/code]
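The DAO above closes its JDBC resources by hand; since Java 7, try-with-resources does this automatically for anything AutoCloseable (Connection, Statement and ResultSet all are). A minimal sketch of the idea, using a hypothetical stand-in resource so it runs without a database:

```java
public class TryWithResourcesSketch {

    // A stand-in for a JDBC Connection/Statement; purely illustrative.
    static class FakeResource implements AutoCloseable {
        private final String name;
        FakeResource(String name) {
            this.name = name;
            System.out.println("open " + name);
        }
        @Override
        public void close() {
            System.out.println("close " + name);
        }
    }

    public static void main(String[] args) {
        // Resources are closed automatically, in reverse order of
        // creation, even if the body throws an exception.
        try (FakeResource conn = new FakeResource("connection");
             FakeResource stmt = new FakeResource("statement")) {
            System.out.println("work");
        }
    }
}
```

With real JDBC types the same shape removes the need for an explicit closeConnection() call on the happy path.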

12+ must have Atom packages to work in JavaScript | Void Canvas

Source: 12+ must have Atom packages to work in JavaScript | Void Canvas

IE11 and String.prototype.includes() in Angular directives

I just came across an interesting behaviour with Angular directives and IE11. Apparently IE11 does not work with the function String.prototype.includes(), for example:

[code language=”html”]
<div ng-if="str.includes('test')" class="someClass"><span>Some text</span></div>
[/code]

where str == 'Sometest'.

Browser compatibility for includes() is an issue: it is not supported in IE11 and is generally poor across IE, so it is highly recommended to use this instead:

[code language=”html”]
<div ng-if="str.indexOf('test') >= 0" class="someClass"><span>Some text</span></div>
[/code]

The indexOf method returns the index of the first occurrence of the value passed in; if the value is not found, it returns -1.

Further documentation:

MDN documentation for includes

MDN documentation for indexOf