Wednesday, August 8, 2012

How Mozilla keeps its MySQL database tidy

At Mozilla, an open source company, developers make a lot of different versions of software code each day, and part of Sheeri Cabral's job is to keep track of them all: which ones work, which don't, how many times they've been downloaded, and which have a bug that needs to be fixed.



To do that, the makers of the Firefox browser use MySQL, a common open source relational database system, which organizes information in tables. A few months ago Cabral, who is a database administrator and architect for Mozilla and a MySQL community contributor, began running into issues as the database grew to over 100GB. "If that database doesn't work, the downloads aren't available," she says, emphasizing the importance of the database.


Typically the solution to such a problem is to throw more compute capacity at the server housing the database, or potentially to switch from spinning hard drives to solid state drives, says Paul Burns, an analyst at Neovise. But Cabral found a different solution: Instead of keeping one single database, Mozilla has in effect virtualized its database by splitting it into a group of clusters, each holding a portion of the data. With technology from a company named ScaleBase, when a query is made the ScaleBase software identifies the cluster where the data is stored, so the entire database doesn't have to be searched. This speeds performance without adding hardware. "This is not an easy thing to do," Cabral says, "but they seem to have done it and it's working."

ScaleBase was born out of an Israeli consultancy a few years ago. After receiving several requests from customers to help scale MySQL databases for Web-based and mobile applications, the consultancy tested the idea of sharding the database, or splitting it up into smaller bite-size chunks. It worked for a variety of customers, so a business was born to sell the product on a wider scale, says Paul Campaniello, VP of global marketing for ScaleBase. "People have virtualized machines, storage, operating systems," he says. "No one has really virtualized the MySQL database yet."

After receiving venture capital funding two years ago, the company brought ScaleBase into general availability (GA) this year. Since then it has built out its management team, including bringing on now-Executive Chairman Ram Mester, a former vice president in IBM's information management division, where he led the database management, security and optimization practices.

ScaleBase describes its flagship Data Traffic Manager as a load balancing tool that sits between an application and the backend database used to store data for the program. When the software is used for the first time, it automatically analyzes the database and partitions it into multiple instances. Once a query is made, it routes the client request to the appropriate instance within the database. Pricing is based on the size of the database being managed.
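
The routing idea can be illustrated with a minimal sketch. This is not ScaleBase's algorithm, which the article does not describe; it assumes a simple hash-based scheme, and the shard names, the sharding key, and the functions `shard_for` and `route` are all hypothetical.

```python
import hashlib

# Hypothetical shard names; the article does not describe Mozilla's actual layout.
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Map a sharding key to one shard using a stable hash.

    Assumption: hash-based placement. A real product may instead use
    ranges, lookup tables, or another policy.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def route(key: str, query: str) -> tuple[str, str]:
    """Return the (shard, query) pair a proxy layer would forward.

    Only the shard that owns the key is contacted, so the other
    shards are never searched for this request.
    """
    return shard_for(key), query


if __name__ == "__main__":
    # Hypothetical key and table name, for illustration only.
    shard, sql = route("build-12345", "SELECT * FROM builds WHERE id = %s")
    print(shard, sql)
```

The same key always maps to the same shard, which is what lets a layer between the application and the database send each request to a single instance rather than to all of them.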

ScaleBase's sharding technique is not a novel concept, but it is one of the first implementations of the technology in databases, and specifically MySQL databases, says Burns, the Neovise analyst. "These databases haven't traditionally been something that you break up, but ScaleBase takes a sharding algorithm to it and makes multiple copies of the data on different servers," he says. "They've made sharding easy to do and automated it." The technology could be helpful for anyone running a MySQL database, which is common in the open source world, and it could be especially helpful when those databases begin to scale to large sizes, Burns says.

Cabral and Burns have some reservations, though. The open source community has a strong do-it-yourself attitude. Some open source MySQL database administrators may not be interested in purchasing a product to handle the functionality and would instead build solutions in-house or rely on an open source community to supply the technology. Cabral says she explored that option, but there just weren't open source community tools available with the functionality that ScaleBase had. ScaleBase could expand by supporting other open source databases, or even by branching out to manage other types of databases, including tackling the growing big data problem of unstructured data.