How is tiering different from caching?
Posted by Joel Christner on Tue, Sep 21, 2010 @ 03:40 PM
Customers considering a cloud storage gateway or a cloud-integrated storage device such as StorSimple may be misled into believing that all such devices are the same. Truth be told, there are a number of fundamental differences in the architectures these devices use. These differences may not be evident at first glance, but they manifest themselves as either undesirable behavior or unwanted complexity in configuration and ongoing management.
One such difference is whether the device performs pure "caching" or true "tiered storage". On the surface, many people ask, "Aren't these the same thing?" While the two seem to perform a similar function, they are in fact quite different technologies.
Caching is the function of retaining a copy of information in a repository that is smaller yet faster than its terminal storage. Caching is everywhere - from your processor to the controller of your traditional storage system - and it's not necessarily a bad thing. However, in the context of cloud storage, it may not be the appropriate architecture.
Caches fundamentally operate in one of two modes: write-through or write-back.
Write-through operation allows the cache to retain a copy of information, but forces the information to be stored persistently in its terminal storage before the operation is acknowledged. In the case of a cloud storage gateway, this means that the data must be written to the cloud storage service provider before the IO is considered complete, so every write IO suffers the penalty of limited bandwidth, packet loss, latency, and other performance-impacting factors. The benefit of write-through operation is that your data is guaranteed to be coherent, meaning that the data stored in the cache is the same as the data stored in the cloud.
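To make the write-through path concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular gateway is built: `CloudStore` is a hypothetical in-memory stand-in for the cloud provider that simply counts "round trips", and `WriteThroughCache` is a hypothetical small LRU cache. The point to notice is that `write()` cannot return its acknowledgement until the cloud store has the data.

```python
from collections import OrderedDict


class CloudStore:
    """Hypothetical stand-in for the cloud provider (terminal storage).

    A real gateway would issue network requests here; this sketch keeps
    blocks in a dict and counts round trips so the WAN cost is visible.
    """

    def __init__(self):
        self.blocks = {}
        self.round_trips = 0

    def put(self, key, data):
        self.round_trips += 1  # stands in for one WAN round trip
        self.blocks[key] = data

    def get(self, key):
        self.round_trips += 1
        return self.blocks[key]


class WriteThroughCache:
    """Small LRU cache in front of a CloudStore, in write-through mode."""

    def __init__(self, store, capacity=4):
        self.store = store
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> data, least recently used first

    def _remember(self, key, data):
        self.entries[key] = data
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            # Evicting is always safe: the cloud already holds every block.
            self.entries.popitem(last=False)

    def write(self, key, data):
        # Persist to terminal storage FIRST; the host IO waits on the WAN.
        self.store.put(key, data)
        self._remember(key, data)
        return "ack"  # acknowledged only after the cloud has the data

    def read(self, key):
        if key in self.entries:  # hit: served locally
            self.entries.move_to_end(key)
            return self.entries[key]
        data = self.store.get(key)  # miss: fetched over the WAN
        self._remember(key, data)
        return data


store = CloudStore()
cache = WriteThroughCache(store)
cache.write("blk1", b"hello")
print(store.round_trips)                        # 1: the write needed the cloud
print(store.blocks["blk1"] == cache.read("blk1"))  # True: cache and cloud agree
```

Every write costs a round trip to the cloud, which is exactly the performance penalty described above, while the cache and the cloud never disagree.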
Write-back operation also allows the cache to retain a copy of information, but it lets the cache acknowledge the operation early: once the information is written to the cache, the operation is acknowledged, and the data is written back to the terminal storage (the cloud) at a later point in time. With a write-back cache, the WAN isn't in the path of every host IO, so performance doesn't suffer nearly as much. However, coherency is not guaranteed: the cache may be dirty, meaning it contains data that has not yet been written to the cloud.
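The write-back trade-off can be sketched the same way. Again this is a hypothetical illustration, not any vendor's implementation: `CloudStore` is an in-memory stand-in that counts round trips, and `WriteBackCache` tracks a `dirty` set of blocks that have been acknowledged but not yet written to the cloud. Eviction is omitted for brevity; a real write-back cache would have to flush a dirty block before evicting it.

```python
class CloudStore:
    """Hypothetical stand-in for the cloud provider (terminal storage)."""

    def __init__(self):
        self.blocks = {}
        self.round_trips = 0

    def put(self, key, data):
        self.round_trips += 1  # stands in for one WAN round trip
        self.blocks[key] = data


class WriteBackCache:
    """Write-back cache: acknowledge as soon as the data is in the cache."""

    def __init__(self, store):
        self.store = store
        self.entries = {}   # key -> latest data
        self.dirty = set()  # keys whose latest data has not reached the cloud

    def write(self, key, data):
        self.entries[key] = data
        self.dirty.add(key)
        return "ack"  # no WAN round trip on the host IO path

    def is_coherent(self, key):
        # Coherent only if the cloud already holds the cached version.
        return key not in self.dirty

    def flush(self):
        """Write dirty blocks back to the cloud "at a later point in time"."""
        for key in sorted(self.dirty):
            self.store.put(key, self.entries[key])
        self.dirty.clear()


store = CloudStore()
cache = WriteBackCache(store)
cache.write("blk1", b"hello")
print(store.round_trips, cache.is_coherent("blk1"))  # 0 False: cache is dirty
cache.flush()
print(store.round_trips, cache.is_coherent("blk1"))  # 1 True
```

The acknowledgement no longer waits on the WAN, but until `flush()` runs, the only copy of the new data lives in the cache, which is the coherency gap described above.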
Most cloud storage gateways put this complexity in your lap: which mode do you use? Would you rather compromise on performance or on coherency?
A tiered storage architecture is different from caching. With a caching architecture, the cloud storage repository is considered your terminal storage. With a tiered cloud storage architecture, the cloud storage service is nothing more than another layer or tier, and the cloud-integrated storage appliance is your primary storage. Each piece of data is committed to a local tier of storage, and the system itself manages the re-layout of that data according to principles that we all know and love from information lifecycle management or hierarchical storage management. This includes transparently moving data from a faster tier to a slower tier, or vice versa. The beauty of tiering is that you don't have to compromise on performance or on coherency the way you would with a caching-centric device.

This is the type of architecture provided by StorSimple, and our customers are enjoying the benefits of coherent data along with the performance of primary and nearline storage for the applications we are focusing on. It is implemented through an algorithm we've patented called "Weighted Storage Layout", which optimizes storage for the most important data (the working set) by utilizing the storage tiers integrated within our appliance.
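The tiering idea can also be sketched in a few lines. To be clear, this is NOT StorSimple's Weighted Storage Layout, whose details are not described here; it swaps in a simple, generic placement policy (rank blocks by access count, breaking ties by most recent use) purely for illustration. The tier names (`ssd`, `hdd`, `cloud`) and their capacities are hypothetical. What the sketch does show is the structural difference from a cache: each block lives in exactly one tier, writes are committed locally, and `relayout()` (standing in for a background task) moves data between tiers.

```python
class TieredStore:
    """Toy tiered store: every block lives in exactly one tier."""

    # Hypothetical layout, fastest first: (tier name, capacity in blocks).
    # None means unbounded (the cloud tier).
    TIERS = (("ssd", 2), ("hdd", 2), ("cloud", None))

    def __init__(self):
        self.tiers = {name: {} for name, _ in self.TIERS}
        self.heat = {}       # key -> access count (no decay, for simplicity)
        self.last_used = {}  # key -> logical time of last access
        self.clock = 0

    def _touch(self, key):
        self.clock += 1
        self.heat[key] = self.heat.get(key, 0) + 1
        self.last_used[key] = self.clock

    def _locate(self, key):
        for name, blocks in self.tiers.items():
            if key in blocks:
                return name
        raise KeyError(key)

    def write(self, key, value):
        # Commit to the fastest local tier; drop any older copy elsewhere,
        # so there is never a second, stale version to fall out of sync.
        for blocks in self.tiers.values():
            blocks.pop(key, None)
        self.tiers["ssd"][key] = value
        self._touch(key)
        return "ack"  # no WAN on the write path

    def read(self, key):
        # Served from whichever tier currently holds the block.
        tier = self._locate(key)
        self._touch(key)
        return self.tiers[tier][key], tier

    def relayout(self):
        # Rank hottest first, then fill tiers in order, moving data as needed.
        ranked = sorted(self.heat, reverse=True,
                        key=lambda k: (self.heat[k], self.last_used[k]))
        start = 0
        for name, capacity in self.TIERS:
            end = len(ranked) if capacity is None else start + capacity
            for key in ranked[start:end]:
                current = self._locate(key)
                if current != name:
                    self.tiers[name][key] = self.tiers[current].pop(key)
            start = end


store = TieredStore()
for key in ["a", "b", "c", "d", "e"]:
    store.write(key, key.encode())  # every write lands on the local SSD tier
store.relayout()                    # "a" is coldest, so it is demoted to cloud
print(store.read("a"))              # (b'a', 'cloud')
store.read("a")
store.relayout()                    # "a" is now the hottest block
print(store.read("a"))              # (b'a', 'ssd')
```

A real system would weigh far more than access counts and would migrate data asynchronously, but the invariant is the same: the appliance's index is authoritative and there is only one current copy of each block, so there is no dirty cache to reconcile.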