Monthly Archives: June 2013

The Oxford English Dictionary and Big Data

The Oxford English Dictionary has announced its latest update, which includes 1200 newly revised and updated words. Among them, the term ‘Big Data’ appears with the following definition: “data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges”. Interestingly, the earliest use of the term dates back to 1980, when the sociologist Charles Tilly, in his working paper “The old new social history and the new old social history”, wrote that “none of the big questions has actually yielded to the bludgeoning of the big-data people.” Although Tilly certainly did not use “Big Data” in its current sense, he was referring to Lawrence Stone’s discussion of the use of quantitative methods in historical research to make insightful generalizations about the past. It seems to me that the methodological barriers and debates between the Sciences (Pure and Applied; forgive me for such an improper differentiation of scientific practice) and the Social Sciences have found one more common ground. Slightly sidetracked by some reminiscences from my Philosophy degree, I was reminded of the influential lecture “The Two Cultures” by the novelist Charles Percy Snow. Will the “Big Data Culture” be the new “lingua franca”?

The entire article can be read at:

Big Data News: A Revolution Indeed – Forbes.

New book on Solr is now published: Instant Apache Solr for Indexing Data How-to

My book on Solr is now published « Outer Thoughts.

CERN and Big Data Analytics: Solving the Mysteries of The Universe With Big Data

Last month, the Chief Technology Officer at CERN presented at the Big Data Innovation Summit in London. His talk, ‘Solving the Mysteries of the Universe with Big Data’, is now available to watch on demand:


Solving the Mysteries of The Universe With Big Data

CERN is one of the world’s largest and most respected centres for scientific research. Its business is fundamental physics, finding out what the Universe is made of and how it works. CERN operates the Large Hadron Collider (LHC), the world’s largest and most powerful particle accelerator where the ATLAS and CMS experiments recently announced their observations of a particle consistent with the long-sought Higgs boson.

The particle’s detection has set the worldwide scientific community buzzing, but behind the success of the work undertaken at the LHC lies a story of Big Data success that is truly ground-breaking.

Data handling on a massive scale is essential to achieve such results. CERN operates the Worldwide LHC Computing Grid (WLCG), which combines the IT power of tens of thousands of computers distributed across more than 150 computer centres to meet the needs of the LHC experiments. The rapid increase in performance of the LHC accelerator is having an impact on the computing requirements since it increases the rate, complexity and quantity of data that the LHC experiments need to store, distribute and process. Around 30 Petabytes of data is stored annually.

CERN is actively investigating a number of new approaches and technologies that will help ensure it continues to meet the extreme IT needs of the LHC over its foreseen 15-year lifetime. This presentation will explain how the LHC data is managed today and the future directions being investigated with leading IT companies and research organisations around the world.

Practical Data Science with R – New Book

For those who have been struggling with R for statistical analysis and wish they had a rich, well-articulated written recipe for doing Data Science with R, this might be of interest. I have skimmed the table of contents and had a quick look at the free chapter, and this is certainly a great addition to the existing books on Analytics, Statistics, Machine Learning, Data Science and R. It seems to emphasise the multi-faceted role of the Data Scientist and the process as a whole: from gathering requirements, loading and examining data to building, validating and deploying models to production.


Practical Data Science with R, Nina Zumel and John Mount. The article below is from

In addition, here is the Manning page for the book (with subscription to early access):

The Companies That Matter Most in Data

A list of the companies most involved in Big Data and Big Data Analytics, together with a compilation of 42 big start-ups that matter in the Big Data arena. Great sources for keeping track of the key players in the market:

DBTA 100: The Companies That Matter Most in Data – Big Data News.


Do You Have What It Takes To Be a Data Scientist?

Another interesting article about the top qualities/skills that a Data Scientist ought to have. To quote Anjul Bhambhri, vice president of big data products at IBM:

 “A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It’s almost like a Renaissance individual who really wants to learn and bring change to an organization.”

The entire article can be read here:

Do You Have What It Takes To Be a Data Scientist? | Jigsaw Academy | Training for careers in Analytics.