
Read Parquet Files with SparkSQL

SparkSQL is a Spark module for working with structured data, and it can also be used to read columnar data formats such as Parquet. Here are a number of useful commands that can be run from the spark-shell:

#Set the context

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
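
#Spark 2.x note: SparkSession replaces SQLContext, and a Spark 2.x spark-shell already exposes one as the built-in spark object, so the same read looks like this (a minimal sketch for newer versions):

val df2 = spark.read.parquet("hdfs://user/myfolder/part-r-00033.gz.parquet")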

#Read the Parquet file from HDFS and print its schema

val df = sqlContext.read.parquet("hdfs://user/myfolder/part-r-00033.gz.parquet")
df.printSchema()

#Show the top 10 rows of data from the Parquet file

df.show(10, false)
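
#Since this is SparkSQL, the DataFrame can also be queried with plain SQL by registering it as a temporary table first (a sketch using the Spark 1.x API to match the SQLContext above; the table name parquet_data and column myColumn are placeholders, so substitute a real column from printSchema)

df.registerTempTable("parquet_data")
sqlContext.sql("SELECT myColumn FROM parquet_data LIMIT 10").show()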

#Convert to JSON and print out the content of 1 record

df.toJSON.take(1).foreach(println)
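
#The DataFrame API can slice the data without SQL as well (a sketch; myColumn is again a placeholder column name)

df.select("myColumn").filter(df("myColumn").isNotNull).show(10)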

Jenkins Best Practices – Practical Continuous Deployment in the Real World — GoDaddy Open Source HQ

Source: Jenkins Best Practices – Practical Continuous Deployment in the Real World — GoDaddy Open Source HQ