
RainStor on EMC Centera – looks/smells/feels and performs like a database

Mark Cusack on 14 October 2009

We’ve recently completed a large-scale benchmarking exercise measuring the performance of RainStor on EMC Centera. EMC Centera is arguably the industry-standard storage platform for data archiving, but it is more commonly associated with unstructured data such as emails, documents and images. The aim of this exercise was to demonstrate that RainStor on EMC Centera could be used to archive massive volumes of structured data while providing efficient and scalable SQL query performance. In essence, we wanted to demonstrate query performance comparable to a traditional database, but without the management and tuning overhead. We’ll be writing a full white paper in the coming weeks describing the environment and results in more detail, but here is a brief summary.

The RainStor virtual file system is built on top of the SNIA XAM standard, so running on EMC Centera simply involves using EMC’s VIM (Vendor Interface Module). Getting the 16-blade Gen 4 Centera rack into the lab was significantly harder! The test hardware comprised a mix of 8-core Linux blades, all with 16 GB of RAM and connected via 1 Gbit Ethernet: HP DL585, DELL SC1345 and HP DL380. The mix of chipsets and clock speeds enabled us to really test our scale-up capabilities in a heterogeneous environment.

We tested our query performance on data sets from different types of business applications, including investment banking back office systems, retail data warehouses, social security data marts, SMS/MMS/CDR archives and security log and event management systems. The data volumes ranged from 100 million to 8 billion records, stored in up to 125 individual tables and occupying as much as 2 TB in delimited format. The SQL queries ranged in complexity from 350 lines of SQL over 10 tables with multiple nested views down to 3 lines of SQL covering just 1 table. The NParchive compression rates varied from 10X through to 40X (the 10X figure corresponds to the SMS archiving, which is dominated by the text payload).
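To put those compression rates in perspective, here is a rough back-of-the-envelope sketch of the stored footprint they would imply for a 2 TB delimited data set. It is purely illustrative: the post does not say which ratio applied to the largest data set, and the helper name `compressed_size_gb` and the decimal convention (1 TB = 1000 GB) are our own assumptions.

```python
def compressed_size_gb(raw_tb: float, ratio: float) -> float:
    """Return the compressed footprint in GB, assuming 1 TB = 1000 GB."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_tb * 1000.0 / ratio


RAW_TB = 2.0  # largest delimited data set size quoted above

# The two ends of the quoted 10X to 40X compression range.
for ratio in (10, 40):
    print(f"{ratio}X -> {compressed_size_gb(RAW_TB, ratio):.0f} GB")
```

Under these assumptions the footprint would fall somewhere between 200 GB and 50 GB.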

The headline is that, across the board, RainStor on Centera proved to be as efficient as RainStor on DAS/NAS/SAN, importing up to 13 billion records/day on a single 8-core server while providing excellent query performance. In terms of scale up, RainStor on EMC Centera also really shines: we saw near-perfect scalability for both query and import across multiple servers, with no sign of slowing down. When comparing query performance against a well-known OLTP database using local storage, the results were broadly similar, ranging from a 2X slowdown to a 10X speedup.
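For readers who think in per-second rates, the peak import figure converts as follows. This is a simple arithmetic sketch: the function name `records_per_second` is ours, and the per-core number assumes the load is spread evenly over all 8 cores, which the benchmark summary does not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def records_per_second(records_per_day: float) -> float:
    """Convert a daily import volume into an average per-second rate."""
    return records_per_day / SECONDS_PER_DAY


PEAK_RECORDS_PER_DAY = 13e9  # peak import rate quoted above
CORES = 8                    # single 8-core server

rate = records_per_second(PEAK_RECORDS_PER_DAY)
print(f"{rate:,.0f} records/s per server")
# Assumes an even spread across cores (not stated in the post).
print(f"{rate / CORES:,.0f} records/s per core")
```

That works out to roughly 150,000 records per second sustained over a full day.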

In summary, we’ve been able to demonstrate that RainStor on EMC Centera delivers impressive import and query performance with hardware-assured immutability, while also supporting low-cost scalability and massive data compression. Look out for the white paper containing the detailed results in the coming weeks!
