<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>Permabit Technology Corporation</title>
        <link>http://permabit.dciginc.com/</link>
        <description>Permabit Enterprise Archive is the only enterprise-class, disk-based storage system to archive petabytes of information at a fraction of the cost of tape. The system combines space-saving compression and deduplication with multi-petabyte scalability to provide Scalable Data Reduction&#8482; (SDR).</description>
        <language>en</language>
        <copyright>Copyright 2008</copyright>
        <lastBuildDate>Wed, 13 Aug 2008 05:00:00 -0600</lastBuildDate>
        <generator>http://www.sixapart.com/movabletype/</generator>
        <docs>http://www.rssboard.org/rss-specification</docs>
        
        <item>
    	    <author>
	        <name>Jerome M. Wendt</name>
        	<uri>http://www.dciginc.com/about/jeromemwendt</uri>
	    </author>
            <title>Is It Time to Swap Corporate File Servers for Disk-Based Archiving Systems?</title>
            <description><![CDATA[<p>A frequently reiterated statistic is the high percentage of infrequently accessed or static data that resides on production storage systems - <a href="http://www.permabit.com/why/why-permabit.asp"><b><u><font color="#6699cc">up to 80%</font></u></b></a> according to some estimates. In fact, a <a href="http://www.usenix.org/events/usenix08/tech/full_papers/leung/leung_html/index.html"><b><u><font color="#6699cc">recent joint study</font></u></b></a> conducted over three months by researchers from NetApp and the University of California and presented at <a href="http://www.usenix.org/events/usenix08/tech/"><b><u><font color="#6699cc">USENIX 2008</font></u></b></a> found that over 90% of the 22 TB of data stored on two enterprise file servers was rarely accessed after it was stored. Specifically, 66% of the files were re-opened only once and 95% were re-opened fewer than five times.</p>
<p>This infrequent access of production data stored on corporate file servers raises a serious question about the type of storage systems that companies should select to host their file server data going forward. While using enterprise file servers with high-performance disk drives certainly makes sense for hosting data that is frequently accessed, this report can lead one to extrapolate that the vast majority of data now found on corporate file servers does not fall into this category.</p>
<p>The study also calls into question the need for high-performance storage systems to handle many of the day-to-day file service needs of most organizations, specifically for applications such as mail servers and home directories. Instead, alternative storage systems such as the <a href="http://www.permabit.com/">Permabit</a> <a href="http://www.permabit.com/products/data-center-series.asp">Enterprise Archive</a> that present a file system interface and perform archiving functions may be better fits in these environments.</p>
<p>I am not suggesting that companies should take such a leap without first doing some research or that all corporate data should be moved onto these systems. But if the 90%+ infrequently accessed statistic holds true across multiple companies, not just the two file servers examined in this study, it merits companies taking a serious look at the activity on their own file servers to determine whether certain disk-based archiving systems are a more appropriate primary file serving target for certain applications.</p>
<p>Mail servers and home directories, as shown in <a href="http://www.usenix.org/events/usenix08/tech/full_papers/leung/leung_html/index.html">Table 4</a> of this study, are specific candidates that companies may consider moving from current corporate file servers to disk-based archiving storage systems. Practical reasons why the Permabit Enterprise Archive storage system would support the workloads of these specific corporate functions include:</p>
<ul>
<li>It presents CIFS, NFS and WebDAV <a href="http://www.permabit.com/faq/faq.asp">interfaces</a> to corporate servers</li>
<li>It can store data as a regular file (does not need to be a WORM/archived format)</li>
<li>The system is managed through a web browser</li>
<li>It supports multiple concurrent connections from multiple clients</li>
<li>Its read, write and read/write I/O throughput ratios are in line with the results published in this study</li></ul>
<p>It is perhaps more for this last reason than any other that companies have largely stayed with high-performance file servers and avoided competitive solutions. Companies lack knowledge about the performance characteristics of these workloads on their corporate file servers. As a result, they do not move these directories to alternative storage systems for fear of enraging their user base. This study serves to point out that mail server and home directory workloads are not so demanding that companies cannot consider using a reasonably high-performance disk-based archiving system in this role. As brought out in the first paragraph of Section 4.7 of the study, most files (76.1%) are only opened by one client and 92.7% of files are only ever opened by two or fewer clients. This illustrates that the behavior of these files is more in line with the behavior of files stored on archiving systems, not corporate file servers, anyway.</p>
<p>One should not hastily draw too many conclusions from a study that only looked at two enterprise file servers in one company. However, it should give companies pause going forward. File server performance and the ability of file servers to fit seamlessly into corporate LANs are major corporate concerns. But if the majority of the files stored on these file servers are accessed fewer than five times, with most files only accessed by one client after they are stored, it raises the question, "Is it time to use disk-based archiving storage systems as primary file servers for a majority of the company's files since that is essentially the role that current corporate file servers are serving anyway?"</p>]]></description>
            <link>http://permabit.dciginc.com/2008/08/is-it-time-to-swap-corporate-f.html</link>
            <guid>http://permabit.dciginc.com/2008/08/is-it-time-to-swap-corporate-f.html</guid>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Archiving</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Networked Storage</category>
            
            <pubDate>Wed, 13 Aug 2008 05:00:00 -0600</pubDate>
        </item>
        
        <item>
    	    <author>
	        <name>Jerome M. Wendt</name>
        	<uri>http://www.dciginc.com/about/jeromemwendt</uri>
	    </author>
            <title>Is Your Vendor Really Giving You What You Need in WORM Storage?</title>
            <description><![CDATA[<p>WORM (Write Once Read Many) technology is often viewed by users as a ubiquitous technology. Though WORM is available on many types of storage systems today (whether they use disk, tape or optical), a company may fail to recognize that about the only aspect of WORM that these storage system vendors agree upon is the words that make up the acronym. Beyond that, how WORM is implemented and managed long term on each storage system can vary significantly. As a result, a company cannot and should not assume that a specific vendor's implementation of WORM will meet all of its application needs.</p>
<p>A typical motivation for implementing WORM in the first place is satisfying some legal or statutory requirement. Court systems and/or prosecuting attorneys want proof that electronically stored corporate data requested as part of a legal e-discovery request or hold was kept in an unaltered, original state. Defense attorneys also need similar levels of assurance that the documents they produce to meet these legal e-discovery requests or holds are not compromised in any way, lest questions arise later about the authenticity of the data that the company produces.</p>
<p>To address these types of scenarios, a company will procure a storage system that supports WORM on the presumption that it can satisfy all of these legal requirements. However, this storage system then tends to become the logical choice to serve as the target for all of a company's archiving requirements - compliance-related or otherwise. Since WORM-based disk storage systems generally have a low cost per GB (assuming they use SATA disk drives), it also makes financial sense to move other infrequently accessed data from production storage systems to the WORM-based disk storage system.</p>
<p>While this lowers storage costs, an aspect that companies may overlook is how the WORM-based storage system ingests the data and manages it once it is stored on the system. This is where WORM-based storage systems diverge. Some, like the <a href="http://www.emc.com/products/family/emc-centera-family.htm">EMC Centera</a>, require that applications use proprietary APIs to access and store data on them. Data stored on such a system may meet specific compliance requirements, but the system is not easily accessible by other applications and retrieving data from it has become notoriously problematic. Other systems are designed to meet internal corporate retention requirements but <b><i>do not</i></b> store data in a state that satisfies rules like <a href="http://www.sec.gov/rules/interp/34-47806.htm">SEC 17a-4</a>. As a final alternative, a few WORM-based storage systems give a company the flexibility to set policies so it can store data either way.</p>
<p>The inherent advantages of a policy-based WORM storage system like Permabit's <a href="http://www.permabit.com/products/data-center-series.asp">Enterprise Archive</a> merit a company putting this type of storage system at the top of its list. Policy-based WORM storage systems meet whatever type of data retention requirements that a company may have, whether that is "Compliance WORM" (not even an administrator can delete a volume), "Enterprise WORM" (an administrator can delete all files on a volume except for those files under retention within the volume), or even turning WORM "off" on that volume so the storage system is just used as a target for archived files. In these circumstances, a company can use a single storage system for any and all of its archiving needs by setting WORM policies on volumes according to individual application requirements.</p>
<p>However, perhaps the largest benefit that policy-based WORM storage systems provide stems from how companies adopt them. Companies often first bring in a system like this to solve a specific need. Only after that need is met do they take a step back and begin to understand all of the ways they can potentially use it for other applications. Often companies don't know what data they have or how archiving can potentially benefit them. Policy-based WORM storage systems like the Permabit Enterprise Archive can first appear as a file server to these applications, so companies can initially store data as normal files (read/write mode) and then change them to a WORM format later on as their needs, policies and/or regulatory criteria change.</p>
<p>WORM is not the ubiquitous feature on storage systems that companies may believe it is. There are subtle but important differences in how each storage system vendor implements WORM, and these differences ultimately affect the integrity of the data stored on the storage system and the other ways the company can internally use that storage system for its application requirements. Companies that have these more diversified archiving needs (and what company doesn't?) need to look beyond whether or not a storage system simply supports WORM and works with an application. Rather, companies need to verify what options the storage system provides to archive corporate data (WORM or otherwise) and whether it provides sufficient WORM options to meet all of a company's application archiving needs.</p>]]></description>
            <link>http://permabit.dciginc.com/2008/07/is-your-vendor-really-giving-y.html</link>
            <guid>http://permabit.dciginc.com/2008/07/is-your-vendor-really-giving-y.html</guid>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Archiving</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Data Retention</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Governance Risk and Compliance</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Legal Hold</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Litigation Readiness</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Networked Storage</category>
            
            <pubDate>Thu, 31 Jul 2008 05:00:00 -0600</pubDate>
        </item>
        
        <item>
    	    <author>
	        <name>Jerome M. Wendt and Kelly Polanski</name>
        	<uri>http://www.dciginc.com</uri>
	    </author>
            <title>Permabit&apos;s RAIN-EC Architecture Ready for Petascale Archived Data Stores</title>
            <description><![CDATA[<font size="3">
<p>As previously discussed in a <a href="http://permabit.dciginc.com/2008/05/archiving-deduplication-and-sa.html">DCIG blog entry</a> on the real impact of losing 1 bit in 100 trillion, non-recoverable bit errors on SATA disk drives have the potential to become an all-too-frequent problem as organizations dramatically increase the scale of their disk stores. Imagine what will happen when data stores expand to petabyte sizes with more frequent access.</p>
<p>Right now commercial data stores are on track to achieve the petascale range sooner rather than later. According to multiple sources, data collected and stored is doubling every year for most businesses, a rate of growth that has held fairly constant over time. In the 1990s, a 100 GB database was large enough to stress most systems - back when disk scanning speeds were 30 MB/s and database tools were relatively immature. In the current decade, terascale data stores are already common - and managing 100 GB is now considered somewhat trivial. In the coming decade, truly massive petascale systems can be expected to dwarf today's large multi-terabyte stores - requiring a similar leap in the technology being used to store and retrieve the data.</p>
<p>The gap between a TB of data and a PB of data is significant. This gap amplifies technology "glitches" that may be considered trivial in terascale stores but will become intolerable at petascale levels. We can catch glimpses of the problems presented by examining how the scientific community has handled petascale datasets, such as in the report </font><a href="http://www-db.cs.wisc.edu/cidr/cidr2005/papers/P06.pdf"><u><font color="#0000ff" size="3">"<i>Lessons Learned from Managing a Petabyte</i>"</font></u></a><font size="3"> by scientists from the Stanford Linear Accelerator Center. Microsoft researchers have also looked at this problem while working with scientists at </font><a href="http://arxiv.org/ftp/cs/papers/0208/0208013.pdf"><u><font color="#0000ff" size="3">Johns Hopkins University</font></u></a><font size="3">, storing images from the <a href="http://www.sdss.org/">Sloan Digital Sky Survey</a>.</p>
<p>What these sources report in common are challenges not only with moving the data onto storage fast enough, but also with numerous attendant processes, including: </p>
<ul>
<li>Finding and retrieving individual objects from within the massive number of objects stored - cited in one case as requiring three years for a single node to scan the objects held in a single PB of storage using 2005 technology</li>
<li>Ensuring sufficient parallel processing access to enable hundreds of people and applications to have simultaneous access to the data store</li>
<li>Ensuring availability of data with sufficient resiliency and redundancy</li>
<li>In short, every step from data collection to data mining and distribution is made more difficult by the scale of the data</li></ul>
<p>All of which makes Permabit's products, designed to solve this entire range of problems, all the more impressive. <a href="http://www.permabit.com/">Permabit</a> addresses the problems of managing petascale disk stores effectively with its advanced <a href="http://www.permabit.com/products/rain-ec.asp">RAIN-EC</a> methodology, and does so using inexpensive hardware components. RAIN-EC was introduced as part of the Permabit <a href="http://www.permabit.com/products/data-center-series.asp">Enterprise Archive Data Center</a> Series, launched in early 2008. It builds on the RAIN grid architecture by providing a methodology for efficiently and effectively protecting data without paying large performance and scalability penalties while maintaining brief times (just a few milliseconds) to seek and find an object from within a massive store. </p>
<p>Permabit clearly understands the threat of potential data loss that using petascale archive systems presents to companies using current RAID and SATA drive technologies. Through its RAIN-EC technology and grid storage architecture found in its Enterprise Archive Data Center Series product, Permabit offers a viable means to overcome this threat and make it possible for companies to plan for the coming decade of petascale archive data stores. Permabit Enterprise Archive more effectively protects archived data stores than any currently available technology and provides companies the secure, scalable foundation that they need today to meet the needs of their enterprise archives tomorrow.</p></font>]]></description>
            <link>http://permabit.dciginc.com/2008/07/permabits-rainec-architecture.html</link>
            <guid>http://permabit.dciginc.com/2008/07/permabits-rainec-architecture.html</guid>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Archiving</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Data Retention</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Grid Storage</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Networked Storage</category>
            
            <pubDate>Tue, 15 Jul 2008 05:00:00 -0600</pubDate>
        </item>
        
        <item>
    	    <author>
	        <name>Jerome M. Wendt</name>
        	<uri>http://www.dciginc.com/about/jeromemwendt</uri>
	    </author>
            <title>Archived and Backup Data Stores: Why They Aren&apos;t Mixed</title>
            <description><![CDATA[<p>Last week's <a href="http://blocksandfiles.com/article/5669"><b><u><font color="#6699cc">announcement</font></u></b></a> that yet another vendor has adapted its deduplicating system to support archive retention as a new feature can mislead companies into concluding that if this appliance works for backup, it is suitable for archiving as well. Many companies are frugal when it comes to storage purchases, so if they can buy a disk-based appliance that addresses both their archiving and backup needs, they may be tempted to do so.</p>
<p>However, it is easy to forget that the data management requirements for archived and backup data are very different. Archived and backup data sets may share some common traits, such as retaining data for long periods of time, storing data that is very similar and storing lots of it. However, these similarities can overshadow substantial differences that exist between disk-based products and their intended purposes. The differences are not always immediately apparent, especially in circumstances where the integrity of the data is concerned. Here are some key features that appliances specifically designed for archiving need to provide:</p>
<ul>
<li><strong>Preserve Data Immutability.</strong> Immutability is a feature that is easy to overlook when evaluating appliances that do both archiving and backup. Data is only stored in an immutable state when it cannot be modified after it is created. Data immutability comes into play if and when companies need to provide guarantees to outside regulatory agencies that their archived data meets specific regulatory and compliance requirements. SEC <a href="http://www.sec.gov/rules/final/34-44992.htm">Rule 17a-4</a> is a good example. If the data is not archived in an immutable state, guess what? You are not in compliance. If you carefully read through press releases and publicly available information about disk-based deduplicating appliances intended for both archiving and backup, you find little or no mention of their ability to deliver on SEC Rule 17a-4 functionality.</li>
<li><strong>Flexibility.</strong> Part of the challenge with implementing a disk-based archive solution today is trying to fully understand what a company's corporate policy may be or what regulatory requirements it may need to adhere to today or in the future. This makes deciding what feature set is critical in an archive a difficult task. With the built-in <a href="http://www.permabit.com/products/permabit-worm-technology.asp">WORM</a> capability that comes standard in the Permabit <a href="http://www.permabit.com/products/data-center-series.asp">Enterprise Archive</a>, that decision does not need to be made up front. Volumes can be created as read/write and converted to WORM at any point in the future without requiring any separate or different software or hardware. This allows you to deploy archiving sooner rather than later to begin protecting your assets immediately.</li>
<li><strong>No Administrative Privileges to Make Changes.</strong> Disk-based deduplicating appliances intended for backup give administrators more authority to make changes and modify the data that is stored on these systems, as companies need that flexibility when storing backup data sets. But the same administrative flexibility that is needed to manage backup data becomes a risk when managing archived data that may need to remain immutable. Archiving systems include options such that once data is retained to meet certain corporate compliance requirements, even administrators cannot make changes to the data.</li>
<li><strong>Scalability.</strong> Once a commitment is made to archive data to a disk-based repository, plans need to be made about how that repository can dynamically grow to handle any future data growth. Many backup appliances simply fill up and then you have to start over with a new device. The Permabit Enterprise Archive can scale to 3 petabytes of raw storage due to its flexible grid storage architecture and can continue to grow without requiring data migrations or forklift upgrades.</li></ul>
<p>Although the temptation to put all of your archiving and backup data into one data bucket is strong, especially when budgets are tight, don't assume this is the right answer for you. Systems designed to retain archived data address very different internal corporate processes and company requirements. These systems are concerned with satisfying regulatory requirements - internal or external - and providing guarantees to concerned parties that archived organizational data is stored in an immutable state. In these circumstances, creating separate archived and backup data stores is both a technical and business necessity.</p>]]></description>
            <link>http://permabit.dciginc.com/2008/07/archived-and-backup-data-store.html</link>
            <guid>http://permabit.dciginc.com/2008/07/archived-and-backup-data-store.html</guid>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Archiving</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Electronic Discovery</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Governance Risk and Compliance</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Legal Hold</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Litigation Readiness</category>
            
            <pubDate>Thu, 03 Jul 2008 13:25:00 -0600</pubDate>
        </item>
        
        <item>
    	    <author>
	        <name>Jerome M. Wendt</name>
        	<uri>http://www.dciginc.com/about/jeromemwendt</uri>
	    </author>
            <title>The Tipping Point for Managing Archived Data is Here</title>
            <description><![CDATA[<p>One would think that at some point organizations would reach the tipping point for storage consumption and that year-over-year storage capacity growth rates of 30%, 50%, 100% or more would come to an end, or at least slow down. If so, it hasn't occurred yet and, if anything, it shows every sign of continuing for the foreseeable future. Nowhere is this more evident than with the amount of data that companies need to archive and retain.</p>
<p>Even as the amount of production data that companies generate continues to grow, the amount of data that they need to archive and retain is growing at an even faster clip driven by new regulatory and corporate compliance requirements. As enforcement of these laws starts to occur, the question becomes, "How do companies concurrently adapt to the challenges of information growth and regulatory compliance while keeping infrastructure and management costs down and data online, available and protected?"</p>
<p>In <a href="http://permabit.dciginc.com/2008/05/archiving-deduplication-and-sa.html">previous blog entries</a> I examined how some RAID-based systems have potential pitfalls when protecting tens and hundreds of TBs of data - especially when it comes to deduplicated, archived data. However, of equal concern is how you scale many of these RAID-based storage systems over time while controlling management costs, minimizing risk, preserving the integrity of the data (critical when data is archived!) and still taking advantage of faster, lower cost technologies as they become available. The answer is that often you cannot.</p>
<p>In fact, something almost paradoxical is occurring in data storage management as it relates to archiving. The need to archive data is often difficult to determine while the underlying storage on which it resides continues to drop in cost. This puts companies in a precarious situation - they can't afford to summarily discard corporate data, but neither can they afford to dedicate additional IT staff and resources to manage it.</p>
<p>It is this void that products like <a href="http://www.permabit.com/">Permabit</a>'s <a href="http://www.permabit.com/products/enterprise-archive.asp">Enterprise Archive</a> are specifically designed to fill. In upcoming blog entries, I will expand upon each of these points, but here are some key business benefits that Enterprise Archive provides to answer specific corporate concerns around the management and protection of archived data:</p>
<ul>
<li><strong>Keeps Management Focused on the Business, not the Technology.</strong> Enterprise Archive uses an underlying storage grid architecture that can independently scale performance or capacity using off-the-shelf servers and storage. This enables companies to economically follow technology curves (increases in performance and storage capacity) while minimizing time spent administering the solution. Data migrations and technology refreshes that are normally expensive distractions to the business are handled automatically and seamlessly as part of Permabit's Enterprise Archive storage grid architecture.</li>
<li><strong>Eliminates Questions about Long Term Data Protection and Growth.</strong> Permabit Enterprise Archive uses its <a href="http://www.permabit.com/products/rain-ec.asp">RAIN-EC</a> and <a href="http://www.permabit.com/products/sdr.asp">SDR</a> (Scalable Data Reduction) technologies, which are critical to protecting corporate data, preserving its integrity and minimizing storage consumption. RAIN-EC addresses the specific data protection shortcomings of RAID-based storage systems that become exposed as companies scale these systems into tens and hundreds of TBs, while also ensuring that the data remains accessible into the future. SDR deduplicates like data chunks and incorporates compression to further help control data growth and reduce costs.</li>
<li><strong>Uses Standard Network Sharing Protocols.</strong> Nothing is more annoying than to buy a solution and find out it only works with one software product, or that to make it work with other software products, you need to purchase additional software licenses or make programmatic changes to the software. Permabit's Enterprise Archive appears as a network filer (<a href="http://www.webopedia.com/TERM/C/CIFS.htm">CIFS</a>, <a href="http://www.webopedia.com/TERM/N/NFS.html">NFS</a> or <a href="http://www.webopedia.com/TERM/W/WebDAV.html">WebDAV</a> interfaces) to your archiving applications. This simplifies setup and maintenance while facilitating the introduction of Enterprise Archive into existing corporate network infrastructures.</li></ul>
<p>While there is no end to storage growth in sight, the tipping point for economically storing archived data and managing it long term is already upon an increasing number of enterprises. It is these companies that need to look to new, more innovative designs such as Permabit's Enterprise Archive that confront head-on the paradox of rapidly growing archived data stores while keeping storage capacity and IT management costs under control. In my next blog entry, I'll delve more deeply into Enterprise Archive's storage grid architecture and how it uses RAIN-EC to address these technical concerns.</p>]]></description>
            <link>http://permabit.dciginc.com/2008/06/the-tipping-point-for-managing.html</link>
            <guid>http://permabit.dciginc.com/2008/06/the-tipping-point-for-managing.html</guid>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Archiving</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Data Reduction</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Data Retention</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Deduplication</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Networked Storage</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Storage Systems</category>
            
            <pubDate>Tue, 24 Jun 2008 11:00:00 -0600</pubDate>
        </item>
        
        <item>
    	    <author>
	        <name>Jerome M. Wendt</name>
        	<uri>http://www.dciginc.com/about/jeromemwendt</uri>
	    </author>
            <title>SATA Bit Error Rate Column Puts Some RAID-Based Storage Systems Vendors on the Defensive</title>
            <description><![CDATA[<p>The <a href="http://www.computerworld.com/">Computerworld</a> column I wrote a few weeks ago on the topic of "<a href="http://www.computerworld.com/action/article.do?command=viewArticleBasic&amp;taxonomyName=storage&amp;articleId=9090838&amp;taxonomyId=19&amp;intsrc=kc_feat">A Bit of a Flaw with SATA disk drives</a>" sparked quite a bit of debate around just how safe data is on today's RAID-based storage systems that use SATA disk drives. A series of <a href="http://www.computerworld.com/comments/node/9090838">comments</a> appeared on Computerworld's site where the column appeared as well as on a <a href="http://www.nabble.com/Flaw-with-SATA-disks---not-suitable-for-deduplication-environment-td17667420.html">forum</a> at <a href="http://www.nabble.com/">Nabble</a>'s web site. Also, at least one storage system vendor felt obligated to send me its <a href="http://www.datadomain.com/resources/701300000008ifK_lp.html">white paper</a> that explains how its RAID-based storage system accounts for this bit error rate problem on SATA disk drives.</p>
<p>Now before everyone starts worrying that data on their current storage systems is in imminent danger, relax. I have managed hundreds of TBs and about three dozen storage systems from multiple storage vendors in enterprise environments. In that time and since, I have neither experienced nor talked to anyone who has been impacted by the type of bit error on SATA drives that I described in the aforementioned column (though I suspect someone out there has been adversely affected). My personal experience, anecdotal evidence and general user feedback indicate to me that companies do not need to worry unnecessarily about the integrity of their data. But as data stores grow, companies need to be aware of and educate themselves about this important topic.</p>
<p>I believe it is fair to say that storage system providers themselves do not always know for sure just how reliable their RAID configurations really are in <i>your</i> environment. Maybe they do in their test labs and in their data centers but beyond that, they are just making educated guesses about how well their systems will perform based on ideal conditions. However, in my many years as an end-user and analyst, I have encountered very few companies that run their data centers under ideal conditions. One only needs to look at one of the <a href="http://www.dciginc.com/2008/06/datacenter-management-101-part-i-cable-manage.html"><u><font color="#0000ff">pictures</font></u></a> posted on&nbsp;DCIG Analyst Tim Anderson's blog entry to understand that.</p>
<p>Another case in point - remember for how many years manufacturers and disk drive vendors told us that SATA disk drives only failed once every million hours (or about once every 114 years)? Then all of a sudden a <a href="http://www.cs.cmu.edu/~bianca/fast/index.html">study</a> came out of <a href="http://www.cmu.edu/index.shtml">Carnegie Mellon</a>'s Computer Science Department about a year ago that confirmed what many of us users have felt in our guts for years - that the manufacturers' posted disk drive failure rates, when applied to the real world environments that the rest of us operate in, are at best suspect.</p>
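<p>As a rough check on the "million hours" figure above, the short Python sketch below converts a data sheet MTTF into years and into an implied annual failure rate. It assumes a constant failure rate (an exponential lifetime model) and drives that run around the clock; the function names are illustrative only and do not come from any vendor tool.</p>

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def mttf_in_years(mttf_hours: float) -> float:
    """Convert a data sheet MTTF in hours to years."""
    return mttf_hours / HOURS_PER_YEAR


def annual_failure_rate(mttf_hours: float) -> float:
    """Probability that one drive fails within a year of continuous use,
    assuming a constant failure rate (exponential lifetime model)."""
    return -math.expm1(-HOURS_PER_YEAR / mttf_hours)


def expected_failures_per_year(drive_count: int, mttf_hours: float) -> float:
    """Expected number of drive failures per year across a population."""
    return drive_count * annual_failure_rate(mttf_hours)


if __name__ == "__main__":
    mttf = 1_000_000  # hours, the data sheet figure quoted above
    print(f"MTTF in years: {mttf_in_years(mttf):.1f}")
    print(f"Implied annual failure rate: {annual_failure_rate(mttf):.2%}")
    print(f"Expected failures per year in 1,000 drives: "
          f"{expected_failures_per_year(1000, mttf):.1f}")
```

<p>Under this model a million-hour MTTF implies that a bit under 1% of drives fail per year, which gives a yardstick against which failure rates observed in the field can be compared.</p>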
<p>So now what is happening? Storage systems vendors and their RAID technologies are under the gun and they are feeling a little heat from the Computerworld column that I posted. While they are quick to defend their technologies in forums and white papers, what other assurances or guarantees do users really have? Disk drive manufacturers have disclosed for some time the bit error rate on their respective FC and SATA drives. So is it time for similar disclosures on the reliability of RAID-based storage systems? </p>
<p>As systems scale into the hundreds of TBs or even PBs, as <a href="http://www.permabit.com/">Permabit</a>'s <a href="http://www.permabit.com/products/enterprise-archive.asp">Enterprise Archive</a> does (though&nbsp;Enterprise Archive&nbsp;is a grid storage system, not RAID-based), these types of disclosures become more important. Is the error rate one in 100 trillion, as some SATA disk drive manufacturers claim? Is it one in 100 quadrillion? Or, when put to the test, will these systems also fail up to 4% of the time in some real world situations, and do they have a mechanism to recover from failure rates at those levels? </p>
<p>No matter what the manufacturers say, their RAID systems are probably more prone to failure in the real world than even they realize or can document. Because of this, users are wise to factor in some margin of error in terms of how well vendors' RAID systems really perform and recover from disk drive failures versus what the vendors claim. This becomes more important as these storage systems scale into the tens and hundreds of TBs and manage more disk drives with ever more capacity. In these circumstances, new grid storage architectures that don't rely on RAID might be a better fit in your environment.</p>
<p>Now I don't profess to have all of the answers, but don't assume your storage system vendor does either. There is no reason now, any more so than there has been in the past, for you to place inherent trust in the RAID architecture included with your storage system. As storage systems scale to manage ever more disk drives with ever larger capacities and then deduplicate the data stored on them, you are placing more than your faith in these storage systems; you are betting your company's future on them. </p>
<p>So query your vendors to see if they can provide satisfactory answers about how well their systems manage your data and more importantly, how they can recover it when you need it. Then match their responses to your internally documented application SLAs and&nbsp;user expectations. If you find that your vendors' answers don't match your reality, it might be an indication that it is time to evaluate new storage technologies that better&nbsp;match your high capacity, long term data storage needs.</p>]]></description>
            <link>http://permabit.dciginc.com/2008/06/sata-bit-error-rate-column-put.html</link>
            <guid>http://permabit.dciginc.com/2008/06/sata-bit-error-rate-column-put.html</guid>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Archiving</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Data Retention</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Deduplication</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Grid Storage</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Networked Storage</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Storage Systems</category>
            
            <pubDate>Wed, 11 Jun 2008 07:45:00 -0600</pubDate>
        </item>
        
        <item>
    	    <author>
	        <name>Jerome M. Wendt</name>
        	<uri>http://www.dciginc.com/about/jeromemwendt</uri>
	    </author>
            <title>Archiving, Deduplication and SATA Disk Drives; The Possibilities are &quot;Frightening&quot;</title>
            <description><![CDATA[<p>It's 2008 and one would think that disk-based storage systems are beyond the point of catastrophic outages and/or data loss as a result of disk drive failures. The prevalent use of RAID in its many forms for disk drive protection in storage systems would seem like ample insurance against the loss of data. However, a careful examination of the facts exposes the flaws in assuming that RAID alone is sufficient as a means of data protection, especially when used in conjunction with today's high capacity SATA disk drives.</p>
<p>A <a href="http://www.cs.cmu.edu/~bianca/fast/index.html"><u><font color="#0000ff">study</font></u></a> published in 2007 by Bianca Schroeder and Garth A. Gibson of <a href="http://www.cmu.edu/index.shtml">Carnegie Mellon</a>'s Computer Science Department shows that disk drives in the field fail significantly more often than their data sheet mean time to failure (MTTF) figures imply, with as many as 4% of drives failing per year in some production storage systems. However, what this study does not examine is the risk associated with recovering data when these disk drive failures do occur.</p>
<p>Much ado is made of the possibility of a second drive failing as well as the time it takes to recover a SATA disk drive with a capacity of 500 GB or greater. However another risk that receives much less attention is the non-recoverable bit error rate of SATA drives that neither RAID nor more spare disk drives in storage systems can correct. </p>
<p>SATA disk drive manufacturers <a href="http://www.seagate.com/docs/pdf/datasheet/disc/ds_dmax.pdf"><u><font color="#0000ff">publish</font></u></a> on their SATA disk drive specification sheets how often non-recoverable bit errors will occur, expressed as a bit error rate (BER). Though these BERs may vary by vendor, it does not take very long to uncover that some vendors' SATA disk drives experience a non-recoverable bit error as frequently as once for every 10 - 12 TB of data read (though it is stated on data sheets in terms like "1 per 10^14" bits). These errors result in the complete loss of the block of data where the bit error occurs,&nbsp;when and if reconstruction of that block is attempted, whether or not RAID is used to protect it. This error rate is deemed so high that a team of researchers at Microsoft used the term "frightening" in a <a href="ftp://ftp.research.microsoft.com/pub/tr/TR-2005-166.pdf"><u><font color="#0000ff">December 2005 technical report</font></u></a> to describe this possibility of data loss. </p>
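<p>To see how a data sheet figure such as "1 per 10^14" relates to terabytes, the Python sketch below does the unit conversion. It assumes decimal terabytes (10^12 bytes), which is how drive vendors typically count capacity; the function name is made up for illustration.</p>

```python
BITS_PER_BYTE = 8
BYTES_PER_TB = 10**12  # decimal terabyte, as drive vendors count capacity


def tb_read_per_error(bits_per_error: float) -> float:
    """Average terabytes read per non-recoverable bit error."""
    return bits_per_error / BITS_PER_BYTE / BYTES_PER_TB


if __name__ == "__main__":
    for exponent in (14, 15, 16):
        print(f"1 per 10^{exponent} bits: one error per "
              f"{tb_read_per_error(10**exponent):,.1f} TB read")
```

<p>A rate of 1 per 10^14 bits works out to one error per 12.5 TB read on average, in line with the range quoted above.</p>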
<p>But why doesn't RAID protect against these unrecoverable bit errors? RAID is designed to protect against the failure of individual disk drives, not individual bits of data. So when a bit of data becomes corrupted, RAID offers no means to recover from it because it is not checking for faults at that level; RAID "assumes" the data is protected. This flaw is exposed when a disk drive fails in a RAID group, since the data on the replacement drive is reconstructed from the existing data on the surviving drives. </p>
<p>If, as existing data is read and then copied to a new drive, the controller cannot read the data, the copy of that bit of data to the replacement disk drive can never complete and data loss occurs. The odds of this occurring increase as the capacity of SATA disk drives increases. As 1 TB SATA drives become more readily available and companies configure RAID sets as 7+1 (7 disk drives in an array group plus 1 for parity), the possibility of data loss assuming the loss of one drive reaches about 1 in 10 (based on manufacturers' specifications). </p>
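<p>The rebuild risk described above can be estimated with a simple model. The Python sketch below assumes that every surviving drive must be read in full, that bit errors are independent, and that they occur at exactly the data sheet rate; real drives and controllers will differ, and the function name is hypothetical.</p>

```python
import math

BITS_PER_TB = 8 * 10**12  # decimal terabytes


def rebuild_read_error_probability(surviving_drives: int,
                                   drive_capacity_tb: float,
                                   bits_per_error: float) -> float:
    """Chance of at least one non-recoverable read error while reading
    every surviving drive in full during a rebuild.

    Assumes independent bit errors at the data sheet rate.
    """
    bits_read = surviving_drives * drive_capacity_tb * BITS_PER_TB
    per_bit_error = 1.0 / bits_per_error
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits_read * math.log1p(-per_bit_error))


if __name__ == "__main__":
    for exponent in (14, 15, 16):
        p = rebuild_read_error_probability(7, 1.0, 10**exponent)
        print(f"7 x 1 TB drives, 1 error per 10^{exponent} bits: {p:.1%}")
```

<p>The estimate is very sensitive to the error rate assumed. For seven 1 TB drives this model gives roughly 43% at 1 per 10^14 bits and about 5% at 1 per 10^15 bits, so a figure of about 1 in 10 sits between the two and should be read as an order-of-magnitude estimate.</p>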
<p>So let's put this possibility of data loss in perspective and consider when it matters. The loss of 1 bit out of every 100 trillion bits sounds pretty insignificant (and frankly, it is). This forces companies to decide when losing a bit of data matters, and how important the loss of an individual, seemingly insignificant&nbsp;bit of data is to their business operations. </p>
<p>Two examples that immediately come to mind: deduplication and archived data.</p>
<p>Deduplication puts a premium on each bit of data since a bit of data is only stored once but may be used in the reconstruction of tens, hundreds or even thousands of files during restores. As a result, losing a single bit of data can have catastrophic consequences from a recovery standpoint. If just one bit happens to be corrupt, it may impact the recovery of not just an individual file but multiple files,&nbsp;depending on the affected bit.</p>
<p>The loss of a single bit of data in archived data not only can preclude companies from recovering multiple files that share the same bit of data, it also brings into question whether or not the archived data will be available when a company needs it. Archived data may be stored for years or even decades and copied many times over that period of time. During that time, a company may store hundreds of terabytes or even petabytes of information, so data loss goes from a remote possibility to almost a certainty, as Microsoft's researchers encountered.</p>
<p>All of this goes to prove the point that companies need a new architecture that goes beyond the RAID technology found in today's disk-based storage systems when storing this amount of information for the timeframes that are typical for archiving. RAID protection, and even making multiple copies of the same data, is no longer a guarantee that the individual bits of data that comprise a file are adequately protected or preserved. </p>
<p>In these circumstances, new disk-based storage systems such as Permabit's <a href="http://www.permabit.com/products/enterprise-archive.asp">Enterprise Archive</a> and its <a href="http://www.permabit.com/products/rain-ec.asp">RAIN-EC</a> architecture take steps to mitigate the possibility of the loss of individual bits of data so they are preserved and protected long term. In an upcoming blog entry, I'll take a closer look at how&nbsp;RAIN-EC addresses this issue in&nbsp;Permabit's Enterprise Archive solution.</p>]]></description>
            <link>http://permabit.dciginc.com/2008/05/archiving-deduplication-and-sa.html</link>
            <guid>http://permabit.dciginc.com/2008/05/archiving-deduplication-and-sa.html</guid>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Archiving</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Data Retention</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Deduplication</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Networked Storage</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Storage Systems</category>
            
            <pubDate>Fri, 30 May 2008 07:00:00 -0600</pubDate>
        </item>
        
        <item>
    	    <author>
	        <name>Jerome M. Wendt</name>
        	<uri>http://www.dciginc.com/about/jeromemwendt</uri>
	    </author>
            <title>Massive Scalability Requires More Than Just a Massive Storage System</title>
            <description><![CDATA[<font style="FONT-SIZE: 0.8em" size="2">
<p>In a <a href="http://permabit.dciginc.com/2008/05/scalable-grid-storage-architec.html">previous blog entry</a>, I took a look at some of the factors driving the need for a scalable architecture that meets today's enterprise archiving needs. In this blog entry, I'll take a closer look at how <a href="http://www.permabit.com/">Permabit</a>'s <a href="http://www.permabit.com/products/enterprise-archive.asp">Enterprise Archive</a> delivers on some of these requirements.</p></font>
<p>In the past, and even to a large extent today, the idea of building a massively scalable storage unit for your archived records was pretty simple to understand - build a bigger building or knock out a wall and put on an addition. Even today, companies too often equate archiving with keeping more boxes of paper and tape in ever bigger buildings with their records management provider. </p>
<p>However, as companies move towards archiving data on disk-based storage systems, they can't always just build bigger buildings or knock down walls. If anything, companies want to store more data in a smaller footprint. Making it more complicated, companies are creating exponentially more data than they were 10, 5 and even 2 years ago and keeping it for longer periods of time. Factor in mobile devices that manipulate existing data and create new data, plus the increasing use of video in corporations, and the result is millions, billions and even trillions of file-based data elements that create thousands of terabytes of data. </p>
<p>In the past, disk-based storage systems contributed to the problem since they were designed to support daily operations, not retain data for long periods of time. This limitation forced companies to take great care in selecting a platform that scales to meet their data archiving and storage capacity needs short and long term. As disk-based storage systems become the preferred platform for storing archived data, selecting a system may become more difficult since the decision has more far-reaching implications. </p>
<p>Specific concerns that a company may have about using a disk-based storage system for archiving include:</p>
<ul>
<li>Cost</li>
<li>Data migrations</li>
<li>A single disk's finite storage capacity</li>
<li>Implementing new disk storage technologies as they become available</li>
<li>Storage system upgrades</li>
<li>System management and efficiency</li></ul>
<p><a href="http://www.permabit.com/">Permabit</a>'s <a href="http://www.permabit.com/products/enterprise-archive.asp">Enterprise Archive</a> addresses these specific reservations that companies may have around using disk to store their archived data since "massively scalable" no longer means bigger buildings but better data reduction and data retention technology in the same size footprint. Permabit's Enterprise Archive uses a grid storage architecture that facilitates easy storage capacity expansion and technology refresh as well as supports transparent advances in storage system technology (faster processors, higher speed controllers, faster network interconnects, larger capacity disk drives, deduplication, etc.) in its design.</p>
<p>Enterprise Archive's grid storage architecture uses nodes that are based on off-the-shelf hardware components (servers and disk drives). Companies can then upgrade to newer, faster processor and storage technologies simply by introducing new nodes into the Permabit Enterprise Archive configuration. The process is seamless, requiring neither user involvement nor interruption. Once new nodes are introduced, data is automatically redistributed from existing nodes to the new nodes or, if companies want to replace an existing node with a new node, all of its data is migrated to the new node. This configuration eliminates the normal management hassles, resource impact and costs (internal and external) of data migrations that technology refreshes introduce while providing a transparent upgrade path to new storage system hardware technologies.</p>
<p>Enterprise Archive's grid storage architecture is based on Permabit's <a href="http://www.permabit.com/products/rain-ec.asp">RAIN-EC</a> technology, which helps to ratchet up its maximum storage capacity. The <a href="http://www.permabit.com/products/data-center-series.asp">Data Center</a> series of Permabit's Enterprise Archive scales to 96 TB in individual grids, and up to 32 grids can then be pooled together for a total system capacity of 3 PB. However, when one starts to think in terms of archiving data for 10, 30 or 50 years or longer, 3 PB may still seem insufficient. But then again, there are increased drive capacities on the horizon.</p>
<p>That's where&nbsp;another component of Permabit's Enterprise Archive feature set comes into play: <a href="http://www.permabit.com/products/sdr.asp">Scalable Data Reduction</a> (SDR). SDR performs data deduplication on data as it is ingested by the Permabit Enterprise Archive by comparing incoming segments of data with existing data segments. Storing like segments only once eliminates redundant data on the Permabit Enterprise Archive and provides companies with a higher net effective storage capacity than just the raw storage provided by the storage nodes. </p>
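<p>The general idea of comparing incoming segments with existing ones can be sketched in a few lines of Python. This is a generic, simplified illustration of hash-based deduplication and not a description of how SDR is implemented; the fixed segment size, the use of SHA-256 fingerprints and the class name are all assumptions made for the example.</p>

```python
import hashlib


class DedupStore:
    """Toy content-addressed store that keeps each distinct segment once."""

    def __init__(self, segment_size: int = 4096):
        self.segment_size = segment_size  # hypothetical fixed segment size
        self.segments = {}  # fingerprint -> segment bytes
        self.files = {}     # file name -> ordered list of fingerprints

    def ingest(self, name: str, data: bytes) -> None:
        """Split data into segments and store only segments not seen before."""
        recipe = []
        for start in range(0, len(data), self.segment_size):
            segment = data[start:start + self.segment_size]
            fingerprint = hashlib.sha256(segment).hexdigest()
            self.segments.setdefault(fingerprint, segment)
            recipe.append(fingerprint)
        self.files[name] = recipe

    def restore(self, name: str) -> bytes:
        """Rebuild a file from its list of segment fingerprints."""
        return b"".join(self.segments[fp] for fp in self.files[name])

    def stored_bytes(self) -> int:
        """Bytes actually held after deduplication."""
        return sum(len(segment) for segment in self.segments.values())


if __name__ == "__main__":
    store = DedupStore(segment_size=4)
    store.ingest("a.txt", b"AAAABBBBCCCC")
    store.ingest("b.txt", b"AAAABBBBDDDD")
    print(store.stored_bytes())  # 16 bytes held for 24 bytes ingested
    print(store.restore("b.txt") == b"AAAABBBBDDDD")
```

<p>In this toy example two 12-byte files share two of their three segments, so only 16 bytes are held for 24 bytes ingested. It also shows why each stored segment matters: both files depend on the same shared segments to be restored.</p>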
<p>SDR also drives down costs since existing storage capacity is used more effectively. Further,&nbsp;by coupling&nbsp;SDR with Permabit Enterprise Archive's grid storage capacity, larger capacity drives can be regularly introduced to increase Enterprise Archive's raw storage capacity, which further contributes to keeping storage costs at a minimum.</p>
<p>In these respects, Permabit's Enterprise Archive gives companies a means to massively scale a disk-based storage system without needing a massive building in which to store their archived data. By combining a grid architecture with its SDR deduplication and Permabit's <a href="http://www.permabit.com/products/rain-ec.asp">RAIN-EC</a> technology (subject of another blog entry soon), Enterprise Archive delivers the inherent attributes of infinite capacity, simplified management and long life that storage systems intended for managing archived data require. </p>
<p>However, disk-based systems designed for archive also need to address other concerns such as preserving the integrity of the data (digital fingerprinting),&nbsp;withstanding the possibility of multiple simultaneous hardware failures&nbsp;and providing&nbsp;content certificates to ensure the data satisfies legal audits. I'll take a closer look at those features in&nbsp;an upcoming&nbsp;blog entry.</p>]]></description>
            <link>http://permabit.dciginc.com/2008/05/massive-scalability-requires-m.html</link>
            <guid>http://permabit.dciginc.com/2008/05/massive-scalability-requires-m.html</guid>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Archiving</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Data Reduction</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Data Retention</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Deduplication</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Grid Storage</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Networked Storage</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Storage Systems</category>
            
            <pubDate>Thu, 22 May 2008 05:00:00 -0600</pubDate>
        </item>
        
        <item>
    	    <author>
	        <name>Jerome M. Wendt</name>
        	<uri>http://www.dciginc.com/about/jeromemwendt</uri>
	    </author>
            <title>Scalable Grid Storage Architecture a Prerequisite for Enterprise Data Archiving</title>
            <description><![CDATA[<p>One can hardly have a conversation about storage management these days without the topic of archiving surfacing. Part of the reason that archiving is commanding more attention is that as companies create and keep ever greater amounts of referential data on their production storage systems, they are creating a host of new problems.</p>
<p>Some of the problems are obvious. Retaining more data drives up storage costs across the board from extra capacity needed on production storage systems to the need for more capacity in backup. Though deduplication helps to take some of the sting out of disk-based storage costs and using disk as a backup target helps shorten the backup windows, this is only a short term reprieve in coping with existing corporate data and storage management problems, not the final solution.</p>
<p>Most companies fail to understand the financial, technical and legal liabilities that keeping this amount of data online presents. <a href="http://www.forrester.com/rb/analyst/andrew_reichman">Andrew Reichman</a>, a Senior Analyst with <a href="http://www.forrester.com/rb/research">Forrester Research</a>, conducted studies over the last couple of years that illustrate some of the risks that unmanaged data presents to companies. His findings with Forrester Research included:</p>
<ul>
<li>Storage budgets are flat or declining</li>
<li>The #1 reason companies are buying more storage capacity is because it is easier to throw more capacity at the problem than to understand and deal with the problem</li>
<li>Companies find it very difficult to find good storage people</li>
<li>Data&nbsp;is growing 60% year-over-year while storage costs are dropping only 20% year-over-year</li></ul>
<p>Actual experiences and&nbsp;expenditures&nbsp;will vary by company but these findings clearly illustrate that companies are approaching a point where they&nbsp;must proactively manage their data and start to separate production data from stale or infrequently accessed data. Moving stale data from primary storage to secondary storage reduces backup windows since less data is backed up during full backups. Companies can also reduce the amount of money they pay for secondary storage since, instead of procuring high cost Fibre Channel storage systems, they can purchase lower cost, higher capacity storage systems to house this type of data.</p>
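<p>The growth and cost figures in the findings above can be combined into a rough spending projection. The Python sketch below assumes that storage spend is simply capacity multiplied by cost per unit of capacity and that all data stays on the same tier of storage; both are simplifications, and the function names are illustrative.</p>

```python
def annual_spend_multiplier(data_growth: float, cost_decline: float) -> float:
    """Year-over-year change in spend when capacity grows by data_growth
    while the cost per unit of capacity falls by cost_decline."""
    return (1 + data_growth) * (1 - cost_decline)


def relative_spend(years: int,
                   data_growth: float = 0.60,
                   cost_decline: float = 0.20) -> float:
    """Spend after the given number of years, relative to today's spend."""
    return annual_spend_multiplier(data_growth, cost_decline) ** years


if __name__ == "__main__":
    yearly = annual_spend_multiplier(0.60, 0.20)
    print(f"Spend grows by {yearly - 1:.0%} per year")
    for years in (1, 3, 5):
        print(f"After {years} year(s): {relative_spend(years):.2f}x today's spend")
```

<p>With 60% data growth and a 20% cost decline, spend under this model rises about 28% a year, which is hard to reconcile with flat or declining storage budgets unless some of that data moves to lower cost storage.</p>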
<p>However, before anyone runs down to <a href="http://www.compusa.com/">CompUSA</a> to start buying hard drives and servers, companies need to put some thought into the type of storage system that they are going to use to host archived data. Companies should also weigh the wisdom of using a product from their existing storage system vendor's portfolio as their storage system for archived data. The challenge that most companies will find in this situation is that while most storage systems using off-the-shelf SATA disk drives&nbsp;will cost less than those using FC disk drives, companies cannot forget to factor in the technical challenges, legal liabilities and financial costs associated with long term data archiving and retention. </p>
<p>Selecting a system that has a scalable and flexible architecture and also satisfies external legal compliance requirements, by retaining data for the appropriate time without keeping it too long and thereby exposing companies to new risks, is a separate issue. In the next blog entry, I'll take a closer look at:</p>
<ul>
<li>How <a href="http://www.permabit.com/products/enterprise-archive.asp">Permabit's Enterprise Archive</a> delivers on these specific concerns</li>
<li>What specific features it has; and</li>
<li>Partnerships that&nbsp;<a href="http://www.permabit.com/">Permabit</a> has&nbsp;in place to address corporate concerns around implementing and managing archived data short and long term</li></ul>]]></description>
            <link>http://permabit.dciginc.com/2008/05/scalable-grid-storage-architec.html</link>
            <guid>http://permabit.dciginc.com/2008/05/scalable-grid-storage-architec.html</guid>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Archiving</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Data Retention</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Deduplication</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Grid Storage</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Networked Storage</category>
            
            <pubDate>Mon, 12 May 2008 05:00:00 -0600</pubDate>
        </item>
        
    </channel>
</rss>