
Read Parquet Files with SparkSQL

SparkSQL is a Spark module for working with structured data, and it can also be used to read columnar data formats such as Parquet. Here are a number of useful commands that can be run from the spark-shell:

#Set the context

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
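
#Spark 2.x note: SparkSession replaces SQLContext, and a Spark 2.x spark-shell already exposes one as the built-in spark object, so the same read looks like this (a minimal sketch for newer versions):

val df2 = spark.read.parquet("hdfs://user/myfolder/part-r-00033.gz.parquet")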

#Read the Parquet file from HDFS and print its schema

val df = sqlContext.read.parquet("hdfs://user/myfolder/part-r-00033.gz.parquet")
df.printSchema()

#Show the top 10 rows of data from the Parquet file

df.show(10, false)
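
#Since this is SparkSQL, the DataFrame can also be queried with plain SQL by registering it as a temporary table first (a sketch using the Spark 1.x API to match the SQLContext above; the table name parquet_data and column myColumn are placeholders, so substitute a real column from printSchema)

df.registerTempTable("parquet_data")
sqlContext.sql("SELECT myColumn FROM parquet_data LIMIT 10").show()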

#Convert to JSON and print out the content of 1 record

df.toJSON.take(1).foreach(println)
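
#The DataFrame API can slice the data without SQL as well (a sketch; myColumn is again a placeholder column name)

df.select("myColumn").filter(df("myColumn").isNotNull).show(10)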

Jenkins Best Practices – Practical Continuous Deployment in the Real World — GoDaddy Open Source HQ

Source: Jenkins Best Practices – Practical Continuous Deployment in the Real World — GoDaddy Open Source HQ