Data is Getting Bigger and Bigger – Hadoop
Since 2010, Hadoop has been generating a lot of buzz among organizations and data scientists alike. It came into the limelight when data scientists and other data professionals working with terabyte-scale datasets started experimenting with and deploying it. Although Hadoop's design traces back to the MapReduce model that Google had long used to process its own large datasets, it gained wider popularity only after 2010. Another reason sometimes cited is the inability of MongoDB to gain traction among developers and organizations.
What Does the TDWI Survey Say?
TDWI conducted a survey of 48 respondents who said they had deployed or used HDFS (the Hadoop Distributed File System). It found that HDFS and a few add-ons are the most used Hadoop products. This can be attributed to the fact that most Hadoop applications run on HDFS as the base platform.
There are some Hadoop add-ons that are layered over HDFS:
- MapReduce (69%) – It is used for the distributed processing of hand-coded logic, whether for analytics or for fast data loading and ingestion.
- Hive (60%) – It is used for projecting structure onto Hadoop data, so that the data can be queried using an SQL-like language called HiveQL.
- HBase (54%) – It is used for simple record-store database functions against HDFS data.
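To make the idea of "hand-coded logic" in MapReduce more concrete, here is a minimal sketch of the map, shuffle and reduce steps as a word count. It runs in a single Python process and does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. On a real cluster the framework would distribute these steps across many machines reading from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; not data from the survey.
    sample = ["big data big cluster", "data on the cluster"]
    print(word_count(sample))
```

The mapper and reducer are the parts a developer writes by hand; the grouping step in between is what the framework normally handles.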
Some Hadoop products that are gaining popularity are:
- Mahout – Over 50% of the respondents said they would try or deploy Mahout.
- R – Coming in a close second, about 44% of the respondents said they would try or deploy R in the coming year.
- ZooKeeper (40%), HCatalog (40%) and Oozie (40%) also generated a lot of enthusiasm among the respondents.
There is a lot of action in the Hadoop space, and more and more developers are keen to take up Hadoop. Organizations are interested because of the cost and other advantages that Hadoop offers over traditional relational database technologies. More technology and analytics vendors are also bringing out Hadoop-based solutions to make the most of open source technologies.