David Isenberg wrote his famous and controversial paper, The Rise of the Stupid Network, in 1997. It's a short and historically interesting read. If you have never read it, follow the link now. It will take you less than 10 minutes. If you want the CliffsNotes version, the gist of his paper is copied below:
JUST DELIVER THE BITS, STUPID
A new network "philosophy and architecture," is replacing the vision of an Intelligent Network. The vision is one in which the public communications network would be engineered for "always-on" use, not intermittence and scarcity. It would be engineered for intelligence at the end-user's device, not in the network. And the network would be engineered simply to "Deliver the Bits, Stupid," not for fancy network routing or "smart" number translation.
Fundamentally, it would be a Stupid Network.
I've thought about corollaries in storage for many years. Networks and storage are very different. Storage is much more tightly coupled with data management in a way that networks will never be. Data management takes intelligence to make sure everything gets put in its optimal place where it can be accessed again, while complying with corporate governance, legal requirements and workers' expectations. Networks don't really have these sorts of long-term consequences, so apples-to-apples comparisons aren't very useful.
But that doesn't mean there wouldn't be ways to eliminate unnecessary aspects of storage and lower costs enormously. As soon as data protection and management could be done without needing specialized storage equipment to do the job, that equipment would be eliminated. Cloud storage changes things radically for the storage industry, especially inventions like StorSimple's cloud-integrated storage (CiS) and a solution like Microsoft's hybrid cloud storage. But StorSimple was a startup and Microsoft isn't a storage company, so it wouldn't start becoming obvious that sweeping changes were afoot until a major storage vendor came along to make it happen.
That's where EMC's ViPR software comes in. EMC refers to it as software-defined storage, which was predictable, but necessary for them. FWIW, Greg Schulz does a great job going through what was announced on his StorageIO blog.
One of the things ViPR does is provide an out-of-band virtualization layer, described in Greg's blog, that opens the door to using less-expensive, stupid storage and protecting the data on it with some other global, intelligent system. This sort of design has never been very successful and it will be interesting to see if EMC can make it work this time.
The aspects of ViPR that are most interesting are its cloud elements - those that are expected initially and those that have been strongly hinted at, including:
- It runs as a VSA (virtual storage appliance), which means it is a storage controller that runs as a virtual machine, including as a virtual machine in the cloud.
- It will include access to object storage as a back end, which is how "real" storage works in the cloud, unlike AWS' EBS
- It can use cloud APIs, which is obviously a cloud thing
If EMC wants their technology to run in the cloud, and it's clear they do, they need all three of these things. For instance, consider remote replication to the cloud - how would the data replicated to the cloud be stored there? To a piece of hardware? No. Using storage network/device commands? No. To what target? The back end of a hypothetical EMC VSA in the cloud uses object storage services and cloud APIs. They could have a VSA that uses iSCSI to a facility like EBS, but that would be like putting the contents of a container ship on rowboats. So a VSA that accesses object storage services using cloud APIs is the only sensible way. It is a clear signal that ViPR will be their version of CiS. They probably won't call it that, but that's beside the point.
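To make that concrete, here is a minimal sketch of what a cloud-resident VSA back end has to do: persist replicated blocks as objects through a REST-style object API instead of writing to a device or a block target. The endpoint, bucket and key scheme are hypothetical illustrations of the pattern, not EMC's or any cloud's actual interface.

```python
import hashlib

import requests  # generic HTTP client; any REST-capable library would do

OBJECT_STORE = "https://objects.example-cloud.net"  # hypothetical endpoint
BUCKET = "replica-target"                           # hypothetical container

def replicate_block(volume_id: str, lba: int, data: bytes) -> str:
    """Persist one replicated block as a cloud object rather than
    writing it to a physical device or a network block target."""
    # The object store knows nothing about LBAs, so the VSA's own
    # metadata must map (volume_id, lba) -> object key.
    key = hashlib.sha256(data).hexdigest()
    resp = requests.put(f"{OBJECT_STORE}/{BUCKET}/{key}", data=data, timeout=30)
    resp.raise_for_status()
    return key  # recorded in the VSA's metadata for later reads
```

The point of the sketch is the mapping: block semantics live in the VSA's metadata, while the cloud only ever sees opaque objects.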
The important question is what happens to data protection after ViPR is made fully cloud-capable. Once you start using cloud services for data protection, there are a few things that immediately become obvious:
- You don't need separate data protection equipment any more because you are using a cloud service
- You can actually use incremental-forever data protection schemes
- You want to use primary dedupe and compression to reduce the amount of cloud traffic required
- You maintain a hybrid cloud metadata system that identifies all data whether it's on premises or in the cloud (see the sketch after this list)
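As a rough illustration of that last item, here is a minimal sketch of what a hybrid cloud metadata system implies: one catalog that records where every piece of data lives, so lookups work the same whether the bytes are on premises or in the cloud. All of the names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Location(Enum):
    ON_PREMISES = "on-premises"
    CLOUD = "cloud"

@dataclass
class CatalogEntry:
    item_id: str        # e.g. a content hash or snapshot chunk id
    location: Location  # where the authoritative copy lives right now
    container: str      # local volume name or cloud container name

class HybridCatalog:
    """A single namespace over both tiers: callers ask for data by id,
    and the catalog answers where to fetch it from."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def record(self, entry: CatalogEntry) -> None:
        self._entries[entry.item_id] = entry

    def locate(self, item_id: str) -> CatalogEntry:
        return self._entries[item_id]  # a KeyError means unprotected data
```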
Those are all things that hybrid cloud storage from Microsoft does today by the way, but that's beside the point too. What's interesting is what will happen to EMC's sizeable data protection business - how will that be converted to cloud solutions and what value can they add that enhances cloud storage services? The technologies they have available for hybrid cloud data protection are already mostly in place and there will undoubtedly be a transformation for Data Domain products in the years to come, but these are the sorts of things they need to figure out over time.
It's going to be a slow transition for the storage industry, but EMC has done what it usually does - it made the first bold move and is laying the groundwork for what's to come. It will be interesting to watch how the rest of the storage industry responds.
If you are involved with managing your company's storage infrastructure, you might be tired of hearing about how your company can use IaaS to improve software development. It might sound promising, but as a storage person you know it won't help you solve your worst storage problems, such as backup and data growth.
It’s probably not clear how enterprise cloud storage, like Windows Azure Storage, with its longer-than-local latencies and less-than-local bandwidth can be used to manage storage. After all, storage management typically involves transferring a lot of data in as short a period of time as possible. It’s clear that if enterprise cloud storage is going to help solve your data center storage problems, a number of things in the equation need to change. But what would those things be?
For starters, there has to be a way to lighten the workload of daily data protection so you are uploading less data. Another necessity is to make cloud storage available to systems and applications in a way that aligns better with its performance characteristics. This means finding ways to integrate enterprise cloud storage as something other than a long-distance storage container on the other side of a "cloud chasm," the way cloud gateway products do. Two ideas for reducing the volume of daily data uploads are to work only with changed data (also called deltas) and to use data reduction technologies like deduplication and compression. Limiting uploads to deltas can work for backup, but is problematic on the restore side if you have to download hundreds or even thousands of virtual backup tapes to achieve a full restore. Restores are always much more difficult than backups due to the many-to-one relationship of the media involved: many tapes are used and far more data is processed than necessary to create the final restored image. Data reduction can certainly help, but these techniques are only effective up to the point where the time needed to upload the reduced data exceeds the backup window. So lightening the workload generates incremental benefits, but only up to a point.
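Here is a minimal sketch of those two ideas combined: detect only the chunks that changed since the last run and compress them before upload. The chunk size and hash index are illustrative assumptions, not any product's actual implementation.

```python
import hashlib
import zlib

CHUNK = 64 * 1024  # 64 KiB chunks; real systems choose their own size

def changed_chunks(current: bytes, previous_hashes: dict[int, str]):
    """Yield (offset, compressed_chunk) for every chunk that differs
    from the previous run, updating the hash index as it goes."""
    for offset in range(0, len(current), CHUNK):
        chunk = current[offset:offset + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        if previous_hashes.get(offset) != digest:  # new or changed chunk
            previous_hashes[offset] = digest
            yield offset, zlib.compress(chunk)     # reduce before upload

# First run uploads everything; subsequent runs upload only the deltas.
index: dict[int, str] = {}
first = list(changed_chunks(b"A" * 200_000, index))
second = list(changed_chunks(b"A" * 200_000, index))  # -> empty list
```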
Sometimes it helps to look at things from the other end of the telescope. So instead of thinking about longer latencies, think about how SSDs are being used in the hybrid storage model (not to be confused with hybrid cloud), where the most active data is stored on SSDs and the rest of the data is stored on rotating disks. Now add enterprise cloud storage to the mix and consider using it for the opposite end of the activity spectrum - storing dormant, unstructured data. Most companies have a large amount of this stuff, filling up their storage arrays, getting backed up unnecessarily and lengthening recovery times during restores. What would happen if this dormant data were no longer on-premises and didn't need to be backed up any longer? Offloading dormant data to enterprise cloud storage lightens the backup load and helps you deal with data growth. It's not enough by itself, but it's a big step in the right direction.
Another assumption that needs to be challenged is that backup is the only technology that can protect data from a disaster. It's the best choice we've had, but that doesn't mean something new couldn't be better. For instance, an alternative to backup is snapshot technology, which is widely used to periodically capture deltas and is much faster and easier to use for restoring data. The fatal shortcoming of snapshots has always been that they reside on the array alongside live data - if the array fails or is destroyed, the snapshots are lost too. For that reason, on-premises snapshots are inadequate for disaster protection.
But what if on-premises storage could take daily snapshots and upload them to enterprise cloud storage and what if those cloud snapshots could be mounted the same as on-array snapshots for restoring data? This certainly satisfies the off-site requirements for disaster recovery protection and is a scenario where uploading deltas every day can be very successful. All that’s needed is a way to know which files would need to be downloaded for a full restore.
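The sketch below shows what that missing piece might look like: a manifest that records which cloud objects hold each file in a snapshot, so a restore fetches only what it needs. The manifest format and object names are hypothetical.

```python
import json

def build_manifest(snapshot_id: str, files: dict[str, list[str]]) -> bytes:
    """Record, per file, which cloud objects hold its data."""
    return json.dumps({"snapshot": snapshot_id, "files": files}).encode()

def objects_for_restore(manifest: bytes, wanted: list[str]) -> set[str]:
    """Return the minimal set of cloud objects needed to restore `wanted`."""
    entries = json.loads(manifest)["files"]
    return {obj for path in wanted for obj in entries[path]}

# Restoring one file touches two objects, not the entire snapshot.
m = build_manifest("snap-2013-01-07", {
    "finance/q4.xlsx": ["obj-00017", "obj-00018"],
    "archive/old.pst": ["obj-00002"],
})
print(objects_for_restore(m, ["finance/q4.xlsx"]))
```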
That scenario is what Hybrid Cloud Storage from Microsoft is all about. It takes the Cloud-integrated Storage technology that arrived with the StorSimple acquisition and combines it with Windows Azure Storage. It puts enterprise cloud storage technology in your data center, where it filters dormant data and uploads it to the cloud as well as creating daily snapshots that are also uploaded to the cloud. That's a whole different approach to managing backup and data growth. The cloud is not a disk drive "over there" somewhere; it is right next to you, helping to solve your most vexing storage problems.
You might be thinking “how do I locate data after it has been uploaded to the cloud and how do I mount and restore it?” The answer is metadata, a topic that will be discussed in my next blog post.
(This post was originally published on my TechNet blog, Hybrid Cloud Storage.)
As the saying goes, "when you’re up to your neck in alligators, it’s easy to forget that the initial objective was to drain the swamp."
IT professionals are plenty familiar with compelling interruptions that need to be dealt with quickly, but keep them from getting high-priority work done. That's one of the reasons IT leaders are looking for SaaS solutions - to decrease the potential for technology alligators to delay the projects they are being measured on.
A big advantage of SaaS is offloading the infrastructure needed to run applications in-house. In a traditional data center, application workload changes may require infrastructure changes that have secondary impacts that disrupt productivity on other applications and systems. SaaS circumvents both primary and secondary infrastructure impacts by isolating the application and its infrastructure on an external site. The SaaS provider keeps customers up to date on the newest capabilities but also manages the bug fixes, workloads and all the infrastructure elements needed. That leaves a lot more time for the IT team to focus on delivering the technology solutions that business leaders want. SaaS is a terrific solution, but there are many applications - especially line-of-business applications - that are not available or do not otherwise fit the SaaS model. The whole concept of Hybrid IT is based on this reality. SaaS works great for some things but not others.
Hybrid Cloud Storage is similar to SaaS in several ways. It is an infrastructure enabler that transfers time-consuming management tasks, processes and their secondary impacts to the cloud. As with SaaS, there are some applications Hybrid Cloud Storage is not a good fit for, such as low-latency transaction processing, but there are many where it works extremely well.
Managing storage capacity growth is a great example of a time-consuming storage management process that data center managers know well. As application workloads grow, storage capacity is consumed, threatening the ability to meet service levels. Storage administration is largely an exercise in planning and implementing the response to this endless cycle. Sometimes it involves upgrading the capacity in arrays and re-balancing workloads, sometimes it involves migrating workloads with virtualization technologies, sometimes it involves acquiring additional arrays and sometimes it involves all of these. My friends at 3PAR used to call this Storage Tetris. The process becomes increasingly difficult over time until there are few options left but to acquire additional arrays, along with the associated costs of data center footprint, power and data protection they impose. When you couple this dynamic with the limited lifespan of most storage products, it's easy to see why storage consumes such a large part of the IT team's attention and budget.
Hybrid Cloud Storage circumvents this capacity-growth cycle by uploading dormant data to the cloud and freeing capacity on-premises for new, active data and workloads. This is a fundamentally different approach than traditional storage where data consumes primary storage capacity regardless of whether it is being used or not. Dormant, unstructured data is very difficult to manage with traditional storage, but is automatically and transparently managed by Hybrid Cloud Storage. At some point the cloud storage containers used with Hybrid Cloud Storage run out of capacity too but they have scaling limits that are many times larger than on-premises arrays. That means capacity management in the cloud is done far less frequently and because it happens in the cloud it does not have secondary impacts on other applications and systems on-premises. Tetris is much easier when there is a lot of space to work with and when you don't have to worry about upsetting other workloads in the mix.
Putting data on Hybrid Cloud Storage also transfers the associated costs of footprint and power to the cloud. For some corporate data centers, this isn't that big a deal, but for others it's critical to get more done with the facility limitations they have. Also, when you consider the additional data protection hardware that is typically needed to back up the data stored on new arrays, the ability to move backup data to the cloud is an important secondary benefit of Hybrid Cloud Storage (as opposed to being a secondary problem).
I will have other blog posts soon that discuss the significant changes that Hybrid Cloud Storage brings to data protection in much greater detail.
This post first appeared on the Hybrid Storage Blog
The excitement around software-defined networking (SDN) this year has had a domino effect on the rest of the IT infrastructure industry and spawned many discussions about the future of the industry, including the implications for companies like Cisco, EMC and VMware. A couple of days ago, Christos Karamanolis from VMware published a blog post saying he thinks 2013 will be the year for software-defined storage (SDS). That got me thinking.
I don't know about 2013 being the year for SDS, but I suspect 2013 will be the year of SDN and SDS hype and confusion. It's bad enough having one marketing battle royale (SDN), but having two of them at the same time will drive many of us crazy. I shudder to think where the whole thing will stop - SD Zombies?
Here's how I see things shaping up for SDS next year:
- There will be many SDS-hyped products, whether or not those products resemble what you think SDS is
- Other products that seem like they should be SDS won't be sold as such
- Most SDS products won't have a construct for separating control planes and data planes like SDN products do
- If any do, they will have a difficult time explaining how it works
- Those that don't will have a difficult time explaining what their definition of "software defined" is
- People will become hopelessly mired in differentiating virtual storage from SDS
- Different vendors' definitions of "software defined" will contradict each other
- VMware will continue to talk about SD-everything because they spent all that money on Nicira
- People inside and outside Microsoft will be upset with me for writing this post
- I will find myself blogging about how certain Microsoft products are great examples of SDS
This post first appeared on the Hybrid Cloud Storage blog.
This post was initially published on the Hybrid Cloud Storage blog.
On December 5th Microsoft announced a pricing reduction for Windows Azure Storage. One of the more noticeable aspects of the announcement was the breakdown of storage costs between Geo Replicated Storage and Locally Redundant Storage. To summarize, Geo Replicated Storage costs approximately 28% more for the additional service of replicating your data to a remote secondary Azure data center. When you understand the details of how Azure Storage works, that means six copies of your data are stored - three locally and three remotely. This is an extremely robust design where an awful lot has to go wrong to lose data, and it is part of the reason why Windows Azure Storage has such an excellent track record.
If you are considering a Hybrid Cloud Storage solution using StorSimple Cloud-integrated Storage (CiS) and Windows Azure Storage, my advice is that you plan to use Geo Replicated Storage. The additional 28% price premium for Geo Replication is a small amount to pay for remote replication with automated failover. If you compare the cost for Azure's Geo Replication with other forms of data replication that conservatively double the cost of storage, it is an incredible bargain.
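A quick back-of-the-envelope comparison makes the point. The per-GB rate below is an illustrative assumption, not a quoted Azure price; the 28% premium and the conservative 2x figure for conventional replication come from the discussion above.

```python
def monthly_cost(gb: int, rate_per_gb: float) -> float:
    """Simple capacity-times-rate cost model."""
    return gb * rate_per_gb

base = monthly_cost(10_000, 0.09)  # 10 TB at an assumed $0.09/GB/month
geo = base * 1.28                  # ~28% premium for Geo Replicated Storage
diy = base * 2.0                   # conventional replication: at least 2x

print(f"Locally redundant: ${base:,.2f}/month")
print(f"Geo replicated:    ${geo:,.2f}/month")
print(f"DIY replication:   ${diy:,.2f}/month or more")
```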
So here is how the connections and data flows work with a CiS Hybrid Cloud Storage solution. Thanks to Avkash Chauhan for posting about this previously in his blog - the graphic below came from there.
When data is uploaded by the on-premises CiS solution to Azure Storage, three copies of the data are written to separate domains within the primary data center and an acknowledgement is sent to the on-premises CiS. Some time afterwards, possibly several minutes later, the data is replicated to the secondary data center, where another three copies are written to three different domains. This is done transparently in the background, without involving the CiS system in any way.
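Here is a minimal, self-contained sketch of that flow, using stand-in classes rather than Azure's actual internals. The essential behavior is that the acknowledgement depends only on the three synchronous local copies, while the three remote copies happen later in the background.

```python
import threading
import time

class DataCenter:
    """Stand-in for an Azure data center with three fault domains."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.domains: list[list[bytes]] = [[], [], []]

    def write_three_copies(self, data: bytes) -> None:
        for domain in self.domains:  # three separate domains
            domain.append(data)

def write_geo_replicated(data: bytes, primary: DataCenter,
                         secondary: DataCenter) -> str:
    primary.write_three_copies(data)       # synchronous local writes
    ack = "acknowledged"                   # CiS sees success at this point

    def geo_replicate() -> None:           # background, minutes later
        time.sleep(0.1)                    # stands in for replication lag
        secondary.write_three_copies(data) # three more copies, remote

    threading.Thread(target=geo_replicate, daemon=True).start()
    return ack
```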
With CiS-powered Hybrid Cloud Storage, uploads to Windows Azure Storage occur when nightly CiS Cloud Snapshots are taken, but they also happen when inactive data is tiered to Azure Storage. Under normal conditions, the amount of traffic between the on-premises CiS and Azure Storage is negligible. Exceptions occur during the initial Cloud Snapshot for a volume, when the entire volume's data is snapped, or during DR scenarios, when a lot of data may need to be downloaded from Azure Storage to CiS. If you are concerned about the amount of bandwidth that might be consumed by Hybrid Cloud Storage traffic, CiS provides scheduled bandwidth throttling. Many of our customers use it to ensure they have all the bandwidth they need for other production applications. Geo Replication between Azure data centers does not consume bandwidth between the customer site and the primary Azure data center, so there is no need to avoid Geo Replication in order to conserve bandwidth.
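For a sense of what scheduled throttling amounts to, here is a minimal sketch of a schedule-driven bandwidth cap. The time windows, rates and names are hypothetical; this is not CiS's actual configuration syntax.

```python
from datetime import datetime, time as clock

# (start, end, cap in Mbit/s): throttle hard during business hours.
THROTTLE_SCHEDULE = [
    (clock(8, 0), clock(18, 0), 10),     # leave headroom for production
    (clock(18, 0), clock(23, 59), 100),  # evenings: open it up
]

def current_cap(now: clock, default_mbps: int = 100) -> int:
    """Return the upload cap in effect at the given time of day."""
    for start, end, cap in THROTTLE_SCHEDULE:
        if start <= now < end:
            return cap
    return default_mbps

print(current_cap(datetime.now().time()))
```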
When you think about the economics of cloud storage, make sure to include the incredible value of Geo Replication.
The term "hybrid cloud" has been defined many different ways. At Microsoft hybrid cloud refers to data center functionality that spans on-premises and cloud service boundaries. At least that's how I'm understanding it now after having been part of the company for a few weeks. To clarify my perspective, my appreciation of cloud is slotted narrowly into IAAS functionality and the things that are likely to appeal to data center types. In this context hybrid cloud services will augment the things that customers are already doing on-premises with the cloud offloading tasks and workloads that are under-served on-premises. Where data center operations are concerned, the cloud represents a new kind of enterprise plug-in. If you think this sounds like poppycock, keep reading because I'll tell you how it is already being used this way every day by a growing number of companies.
One of the misunderstandings people have about enterprise cloud storage is that it must be similar to consumer file sharing apps like Dropbox, Box or Microsoft's own SkyDrive. To begin with, much of enterprise storage operates at the block level, and if you are going to offload enterprise storage you need to provide block-level functionality. As for file sharing, data center managers are not looking to share corporate data as much as they are looking to secure it. BTW, I fully expect to get comments here about the great virtues of file sharing for enterprises. Rest assured, there are probably few companies that use the cloud for file sharing as much as Microsoft does internally with SkyDrive and SharePoint, but that's not what I'm discussing in this post.
StorSimple developed technology called Cloud-integrated Storage (CiS) that is implemented as a SAN appliance that acts like a hybrid cloud storage plug-in for enterprise storage. CiS packages and indexes blocks along with accompanying metadata and stores them in the cloud. These block packages may be generated by snapshots or as archives that need to be stored for an extended period of time or as dormant unstructured data that is no longer being accessed and can be vacated to reclaim on-premises storage capacity. Different customers use this technology every day because their backup systems are under-serving them, their archiving processes are too cumbersome and they don't want to use tier 1 storage for data that is no longer active. The thing that is a little bit hard for some to understand about CiS is that the data transfers to the cloud are all automated, requiring no effort on the part of system and storage administrators.
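As a rough structural illustration of what "packages and indexes blocks along with accompanying metadata" might mean, here is a minimal sketch. The field names are my own guesses for exposition, not StorSimple's actual format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class BlockPackage:
    """A self-describing unit: blocks plus the metadata needed to find
    and interpret them later, wherever the bytes happen to live."""
    volume_id: str
    block_range: tuple[int, int]  # first and last logical block
    source: str                   # "snapshot" | "archive" | "tiered"
    created: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    payload: bytes = b""          # deduplicated, compressed block data
```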
The other key to understanding the plug-in nature of CiS is that the ability to access and download data from cloud storage is also transparent because data in the cloud is either viewable in an online file system or mountable as a snapshot the same way local snapshots are mounted for restoring older versions of files. I'll explain how that all works in future blog posts, but for now I'll say it's a function of the metadata system in CiS.
Cloud-integrated Storage really is different. It breaks the mold for enterprise storage by seamlessly integrating on-premises enterprise storage with Windows Azure Storage services and the incredibly valuable Geo Redundant Storage it provides. CiS doesn't do everything you might want it to, but the things it does well are revolutionary.
FYI: This blog post is a shortened version of a white paper that was published by StorSimple.
Note: This blog post first appeared on the Microsoft TechNet blog, Hybrid Cloud Storage
Our relationships define our trajectory. Often it's luck, but we place odds on investing more heavily in the relationships that inspire us, and from there synergistic things happen that make overcoming obstacles much easier. StorSimple is a company that is fortunate to have found its way into many good relationships, and we have benefitted a great deal from them in a relatively short time.
In October, we shared the news that Microsoft had reached an agreement to acquire StorSimple. Today, the team is incredibly excited, and we are very pleased to update our customers and partners that Microsoft has completed the acquisition.
The synergy of solving problems with customers and cloud providers is as thrilling today as we hoped it would be when the team first started working on StorSimple. As we move forward with Microsoft, we will expand our development and go-to-market teams with the goal of seeing cloud-integrated storage (CiS) become a best practice for enterprise data centers across the globe. We look forward to building up the relationships we have and making new ones going forward.
StorSimple has been on the leading edge of enabling hybrid clouds that empower enterprise data and applications. We now see a way to continue that success and deliver it as a building block for cloud-integrated IT architectures. With the help of our team members, the extended Microsoft family and our customers, we hope to deliver on that vision - over and over again in the years to come.
StorSimple has defined the frontier on how cloud can empower enterprise IT. We have done so by pioneering the integration of on-premises and cloud storage in a single solution that delivers the functions of primary storage, backup, archival and disaster recovery with automated data management. We are excited at the opportunity to continue our work with Microsoft to bring this innovation to their Cloud OS vision.
Looking back over the last three and a half years, after long nights and days of innovation, and early customer deployments followed by mainstream enterprise adoption, it became clear that the combination of StorSimple and Windows Azure delivers a best-in-class cloud-integrated storage solution that enables customers to use public cloud services as a seamless extension of their IT infrastructure. The result is that a wide range of customers, from non-profit organizations to Fortune 500 companies and government agencies, have adopted this solution at an amazing rate.
On behalf of the StorSimple team and our investors (Ignition Partners, Index Ventures, Mayfield Fund and Redpoint Ventures), we are thrilled about the scale and business agility that we’ll now be able to deliver for our customers and partners.
To learn more about this announcement, please visit the Microsoft press release and blog.
Considering the budget resources data storage takes in most companies, it's not surprising that IT executives are wondering if the cloud can help them manage their storage budgets more effectively. The assumption is that enterprise cloud storage can provide pay-as-you-grow economics for storage - something that has never existed - but first there needs to be an enabling technology that works reliably and can do the job.
StorSimple's Cloud-integrated Storage (CiS) has been helping customers do this for over a year now and has proven its mettle working in the data centers of large enterprises. CiS on-premises SAN storage exports iSCSI LUNs to servers and connects on the back, or device, side to enterprise cloud storage for storing snapshot, backup, archive and unstructured, dormant data. CiS is a hybrid SAN array with flash SSD and hard disk layers, but it adds a third, slower, higher-latency enterprise cloud storage layer. Its 3-tier design is perfectly matched for managing the masses of unstructured data that IT workers wrestle with.
Metadata is packaged with application data in a CiS system, forming data objects that have embedded location and usage information. As data is written by applications, it's put in the top SSD tier; as it ages and its access frequency drops, it gets moved to the hard disk tier. As it ages further and goes dormant, the data becomes a candidate for storing in cloud storage. When the CiS system reaches pre-determined capacity thresholds, it initiates cloud transfers to put inactive data in enterprise cloud storage. The on-premises capacity that was used by the data is returned to the system and is available to be used again.
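A minimal sketch of that tiering decision follows. The watermark, dormancy window and coldest-first ordering are illustrative assumptions, not the CiS system's published policy.

```python
import time

HIGH_WATER_MARK = 0.80            # start tiering at 80% full (assumed)
DORMANT_SECONDS = 90 * 24 * 3600  # untouched for ~90 days (assumed)

def select_for_cloud(objects: list[dict], used_fraction: float) -> list[dict]:
    """Pick dormant data objects to vacate to the cloud once the
    on-premises capacity threshold is crossed."""
    if used_fraction < HIGH_WATER_MARK:
        return []                 # plenty of local headroom; do nothing
    now = time.time()
    dormant = [o for o in objects if now - o["last_access"] > DORMANT_SECONDS]
    return sorted(dormant, key=lambda o: o["last_access"])  # coldest first
```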
Recapturing on-premises storage capacity by using the cloud is one of the Holy Grails of enterprise storage management, because it also moves capital budget funds to the operating budget. Capacity upgrades with enterprise cloud storage are small incremental purchases, compared to the capital-stretching upgrades with traditional enterprise arrays, where customers have to buy capacity in advance.
Both SAN and NAS storage capacity can be shifted to CiS and enterprise cloud storage. With NAS, customers move their existing NAS file shares or volumes to a file server connected to a CiS system. As the CiS system fills, unstructured, dormant data on it is tiered to the cloud. With SAN, administrators can use Storage vMotion or manual means to move volumes to the CiS system. The figure below shows this process.
A CiS system offloads inactive data and tiers it to the cloud
Managing storage capacity by moving data to CiS and enterprise cloud storage helps IT managers avoid volume-full emergencies and VM sprawl scenarios. Instead of being crisis-driven, they have a systematic method for managing storage capacity. CiS and enterprise cloud storage give storage administrators the capacity headroom they've always wanted.
The capacity efficiencies of CiS begin on-premises by de-duplicating and compressing data in the system, and these efficiencies carry over to cloud storage. Data reduction effectiveness depends on the data, but customers using CiS in virtual server environments can expect reduction ratios from 3x to 5x. Notice that this ratio is for primary storage, not backups.
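For illustration, here is a minimal sketch of how a ratio like that arises from deduplication plus compression. It uses generic content hashing and zlib, not CiS's actual algorithms; real ratios depend entirely on the data.

```python
import hashlib
import zlib

def reduction(chunks: list[bytes]) -> float:
    """Deduplicate identical chunks, compress the survivors, and return
    the overall reduction ratio (raw bytes / stored bytes)."""
    raw = sum(len(c) for c in chunks)
    unique = {hashlib.sha256(c).hexdigest(): c for c in chunks}
    stored = sum(len(zlib.compress(c)) for c in unique.values())
    return raw / stored

# Virtual server images share many identical OS blocks across VMs, which
# is why primary-storage ratios in the 3x-5x range are realistic there.
```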
The bottom line is that StorSimple CiS customers are able to avoid buying more expensive storage capacity than they would have otherwise. Not only that, but because data protection through CloudSnap™ is integrated with CiS, they also don't buy additional backup hardware and software to cover the added capacity. It's a big win for many of them.
Cloud storage is clearly seeing tremendous growth and adoption across all segments of business and government customers – from mid-market companies to large enterprises and state/federal government organizations.
A key enabler of enterprise adoption of cloud-based storage services is the emergence of premises-based storage systems that integrate cloud storage with existing applications.
Across these cloud-enabling storage systems, there are some capabilities that are similar – such as the translation of cloud storage APIs like SOAP or REST to block-based storage protocols such as iSCSI, as well as de-duplication and compression for performance and capacity optimization.
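To make the translation idea concrete, here is a minimal sketch of a block read served from object storage: the volume is carved into fixed-size extents, each stored as one object, with a dictionary standing in for the REST object store. The extent size and key scheme are illustrative assumptions.

```python
BLOCK_SIZE = 512 * 1024  # one object per fixed-size extent (illustrative)
SECTOR = 512             # block-protocol transfer size

class DictObjectStore:
    """Stand-in for a REST object store: get/put by string key."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        return self._objects.get(key, b"\x00" * BLOCK_SIZE)

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

def object_key(volume: str, lba: int) -> str:
    """Map a logical block address onto the extent object that holds it."""
    return f"{volume}/extent-{(lba * SECTOR) // BLOCK_SIZE:08d}"

def read_sector(store: DictObjectStore, volume: str, lba: int) -> bytes:
    """An iSCSI-style sector read satisfied by an object GET."""
    extent = store.get(object_key(volume, lba))
    offset = (lba * SECTOR) % BLOCK_SIZE
    return extent[offset:offset + SECTOR]
```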
But there *are* core differences between products that are merely "gateways" and true enterprise storage that is fully integrated with the cloud (an ESG report and a Taneja Group report call them "cloud-integrated storage").
What are those key differences? I'd put them into 3 major categories, along with why each matters to customers:
1) Primary enterprise storage vs. just cloud proxy for backup/archive data
Example: StorSimple solutions provide full primary storage capabilities – up to 100TB of on-premise storage capacity with auto-tiering to SSDs, SAS + cloud, etc. – to enable primary storage for enterprise applications
Why Matters: you can converge your on-premise primary storage + backup/archive infrastructure with the cloud, saving 60-80% overall TCO – rather than just porting data to the cloud for backup/archive with limited savings
2) Integrated data lifecycle management with the cloud vs. simple proxy of data to the cloud
Example: StorSimple uses application-consistent Cloud Snapshots to provide snapshots locally and in cloud for backup, archive and DR – all without requiring 3rd party backup software
Why Matters: you can eliminate your backup software and support costs; gateways still require you to purchase backup software + support + licenses
3) Disaster recovery and business continuity – cloud-integrated storage enables premise-based applications to directly mount cloud volumes and access needed blocks directly
Example: StorSimple solutions can mount their Cloud Snapshots in the cloud and enable premise applications to access only their needed objects in minutes or hours, vs. cloud gateways, which require download of the full cloud volume, which can take days/weeks to complete
Why Matters: pretty obvious – RTO is radically improved, as is business continuity…
In short, cloud storage is here to stay, and cloud-enabling storage systems will only help to accelerate that adoption. But storage teams need to dive deeper into products and architectures to understand the full spectrum of benefits – and savings – they can get from leveraging the full pie of cloud services + cloud-integrated enterprise storage vs. a single slice of a cloud gateway.