In a statement Thursday, the virtualisation juggernaut said the toolkit will allow enterprises to deploy a Hadoop cluster, along with common Hadoop components such as Apache Pig and Apache Hive, on VMware’s vSphere virtualisation platform in minutes.
VMware is also working with the Apache Hadoop community to contribute extensions that will make key components “virtualisation-aware” to support elastic scaling and improve Hadoop’s performance in virtual environments.
Apache Hadoop is an open source platform commonly used by large enterprises in the growing area of big data processing, where complex data sets are broken down into smaller chunks for analysis by clusters of computers to derive key business insights. It is based on MapReduce, a programming model conceived by Google to cope with the scale of building its web search indexes.
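To illustrate the MapReduce model the article refers to, here is a minimal word-count sketch in plain Python. It is not Hadoop code; it only mimics the two phases on a single machine: a map step that emits (key, value) pairs from each input chunk, and a reduce step that groups pairs by key and aggregates them. All names here are illustrative.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: each input chunk is processed independently,
    # emitting a (word, 1) pair for every word it contains.
    # In Hadoop, these chunks would be spread across cluster nodes.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group the emitted pairs by key (word)
    # and aggregate the values (here, summing the counts).
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data", "big clusters process big data"]
result = reduce_phase(map_phase(docs))
# result maps each word to its total count across all chunks,
# e.g. "big" appears three times in the two documents above.
```

Because each map call touches only its own chunk and the reduce step only sees grouped pairs, both phases can be parallelised across many machines, which is the property Hadoop exploits.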
“By decoupling Apache Hadoop nodes from the underlying physical infrastructure, VMware can bring the benefits of cloud infrastructure – rapid deployment, high-availability, optimal resource utilization, elasticity, and secure multi-tenancy – to Hadoop,” it said.
Tony Baer, principal analyst at technology consultancy Ovum, said: “Hadoop must become friendly with the technologies and practices of enterprise IT if it is to become a first-class citizen within enterprise IT infrastructure. The resource-intensive nature of large Big Data clusters makes virtualisation an important piece that Hadoop must accommodate”.
“VMware’s involvement with the Apache Hadoop project and its new Serengeti Apache project are critical moves that could provide enterprises the flexibility that they will need when it comes to prototyping and deploying Hadoop,” Baer added.
Earlier this month, VMware partnered with Hortonworks to develop a high-availability architecture that allows companies to run Hortonworks’ Hadoop clusters on vSphere. In April, it also acquired big data start-up Cetas, which provides analytics applications on top of Hadoop.