Dive into the Data Lake
Submitted by Fred Kohout on
EMC Isilon helps your customers rise with the tide of Big Data.
NOTE: This is a first look at the solutions we announced on July 8 and what they mean for our partners. Stay tuned for additional segments focused on VMAX3 and XtremIO.
No matter how many times I see the numbers, I’m still blown away by the projected growth in unstructured (aka “Big”) data: from 4.4 zettabytes last year (just writing “zettabytes” is a statement in and of itself!) to 44 zettabytes in 2020, according to IDC. That means data will roughly double every two years.
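A quick back-of-the-envelope check of that claim, assuming IDC's 2013 baseline: tenfold growth over seven years works out to a doubling time of just over two years.

```python
import math

# IDC figures cited above (assumed baseline year 2013, target year 2020)
start_zb, end_zb = 4.4, 44.0
years = 2020 - 2013  # 7 years

# Tenfold growth requires log2(10) ≈ 3.32 doublings
doublings = math.log2(end_zb / start_zb)

doubling_time = years / doublings  # ≈ 2.1 years
print(f"Data doubles roughly every {doubling_time:.1f} years")
# → Data doubles roughly every 2.1 years
```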
This represents an enormous – and potentially game-changing – opportunity for businesses that can 1) make all their data accessible in an efficient and protected way, and 2) apply analytics to gain business insight for competitive advantage. But how can they do that when existing data is stored in unconnected “silos” or “islands of data” on different platforms using different protocols? Businesses have cobbled together workarounds, such as copying the data across platforms and then performing multiple analyses. That approach is inefficient, however – a big problem at a time when IT budgets are under the microscope and line-of-business leaders expect IT to help drive business success, not just keep the lights on.
That’s where EMC Isilon’s concept of the data lake comes in – and where the new Isilon solutions announced earlier this month can help you jumpstart conversations with your customers so you can make sure they don’t drown beneath the rising wave of Big Data.
An Isilon data lake consolidates multiple, disparate islands of storage into a single cohesive and unified data repository. The advantages are immense:
- A high degree of interoperability, since it supports many different protocols and interfaces. For example, the Isilon Scale-Out Data Lake natively supports SMB, NFS, File Transfer Protocol (FTP), and Hypertext Transfer Protocol (HTTP) for traditional workloads, object storage for next-generation cloud apps, and HDFS for emerging workloads like Hadoop analytics. On top of this, we’ve worked with key strategic partners to create solutions for emerging challenges, including VCE Converged Infrastructure, Hadoop Big Data analytics, media service from Atmos, and Data Lake “aaS” from Rackspace.
- Consistent, strong security, because policies are centrally and uniformly managed across the data lake, without the sprawl (and potential vulnerabilities) of per-silo policies and protections. This is particularly important given that a lot of Big Data consists of highly confidential and heavily regulated information, such as medical and financial records.
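To make the multi-protocol idea concrete, here is a minimal sketch of how one file in a consolidated repository can be addressed through several protocol front-ends at once. The hostname, port, and file path are hypothetical; the `/ifs` root reflects how OneFS exposes its single file system. The point is that the bytes live in one place – only the access method differs, so there is no copying data between silos before analyzing it.

```python
# Sketch: one file in the data lake, many protocol addresses.
# "isilon.example.com" and the sample path are made-up placeholders.
CLUSTER = "isilon.example.com"

def access_uris(path: str) -> dict:
    """Return protocol-specific addresses for a single stored file.

    Same data on disk; SMB clients, NFS mounts, HTTP downloads, and
    Hadoop jobs (via HDFS) all reach it without a separate copy.
    """
    path = path.lstrip("/")
    smb_path = path.replace("/", "\\")
    return {
        "nfs":  f"{CLUSTER}:/ifs/{path}",              # mount as an NFS export
        "smb":  f"\\\\{CLUSTER}\\ifs\\{smb_path}",     # open as a Windows share
        "http": f"http://{CLUSTER}/ifs/{path}",        # fetch over HTTP
        "hdfs": f"hdfs://{CLUSTER}:8020/{path}",       # read in place from Hadoop
    }

for proto, uri in access_uris("analytics/clickstream/2014-07.log").items():
    print(f"{proto:>4}: {uri}")
```

Contrast this with the siloed world described above, where the same analysis would first require copying the log file onto whichever platform speaks each protocol.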
A key point: the Isilon Data Lake is a scale-out solution. Given the inevitable growth of data, and the need to control costs, an organization must be able to start with a right-sized system and then add capacity as needed. Scale-out is what Isilon does: the solution can grow quickly and efficiently in both capacity and performance. A single Isilon cluster can scale from 18TB to 20PB of capacity and up to 200GB/sec of throughput. That should cover the biggest of Big Data!
Don’t forget that when having these conversations with your customers, you also have two new Isilon nodes to serve as the backbone for a data lake or other Big Data need. Your recommendation will likely depend on your customer’s use case:
- If they have strong transactional requirements (say, in creating media content or processing high-volume financial transactions), then the S210 is the ticket.
- If the need is for high concurrent throughput (such as for content streaming or Hadoop analytics), then the X410 comes into play.
These new solutions put a lot of new use cases into play for you. I encourage you to check out this Data Lake White Paper and visit the Redefine Possible site for more background on how EMC Isilon can help your customers meet their Big Data needs.