Feb 1, 2006 12:00 PM
By Jay Ankeney
Technology trends in storing and protecting file-based AV content.
Networked storage systems like Hitachi’s Lightning 9980V can provide security and order when sharing information across an organization.
A wise man once said that data out of context is chaos; data within context is information; information given purpose is wisdom; and wisdom shared is intelligence.
Yet while stored intelligence can lead to productivity, sharing it over a network is increasingly challenging for IT departments in today’s corporate environments. And lost intelligence is the bane of a failed networked storage infrastructure.
So this technology trends article will take a convivial look at the current options for storing and sharing that information. Rather than becoming techno-geeky, the approach will be to provide an understanding of today’s trends for information network architecture and its future directions. After all, increasing numbers of people behind computers are generating growing amounts of information and sharing it either as data or audiovisual media throughout a corporation. With this in mind, a company’s ability to access, distribute, and protect that information can determine whether that sharing results in intelligence or chaos.
THE HISTORY OF DATA STORAGE
To start at the beginning, the modern concept of data storage actually began in 1801, when Joseph Marie Jacquard improved on a 1728 concept by the French inventor Falcon to create pasteboard punch cards whose rows of holes controlled the patterns woven by a loom. During his work from 1837 to 1871, Charles Babbage planned to use similar cards to feed information into his never-completed Analytical Engine, and Ada Lovelace (for whom the programming language Ada is named) is said to have written a program for the engine to calculate Bernoulli numbers.
By the 1890s paper tape was being used to store information for automatic processing. Some of us still remember threading perforated strips of tape into early DEC computers like the PDP-11 to boot them. Reels of magnetic tape became the main medium for storing data and software applications through the 1960s, but because tape must be read and written sequentially, reaching a particular record was slow and awkward.
Then in September 1956, IBM introduced magnetic hard disk storage with the IBM 350 (which used 50 internal disks for random-access storage), and modern computer storage had arrived. The floppy disk was introduced on the IBM System/370 in 1971, but once Seagate Technology introduced the ST-506, the first 5.25in. hard drive, with a capacity of 5MB in 1980, most computer designers adopted hard disk technology for their storage requirements.
The problem with an individual hard drive, of course, is that all your data is in one basket. It has to be accessed from one source, and if that source fails you lose your data. But in 1978 Norman Ken Ouchi at IBM was awarded a patent for a system that used disk mirroring and parity protection to recover and preserve data stored in a failed memory unit. By 1987 researchers at the University of California at Berkeley had started to investigate multiple-drive arrays, calling them RAID arrays. Originally, that stood for “Redundant Arrays of Inexpensive Disks,” although today “independent” is often substituted for “inexpensive.” RAID arrays can provide more capacity and faster access by striping data across several disks, and they add protection by devoting some disk space to redundant information. In a parity-based array, for example, if a single drive fails, the data blocks and the parity block from the working drives can be combined to reconstruct the missing data.
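The parity trick is simple enough to sketch in a few lines of Python. This is a minimal illustration assuming plain bytewise XOR parity, the scheme used by the parity-based RAID levels; the function names and four-byte “blocks” are hypothetical, and real controllers do this in hardware or firmware across much larger stripes.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must be the same length")
    # zip(*blocks) yields one tuple of byte values per position in the block.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks: list[bytes]) -> bytes:
    """The parity block is the XOR of every data block in the stripe."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Recover one lost data block by XOR-ing the survivors with the parity."""
    return xor_blocks(surviving_blocks + [parity])


# Hypothetical stripe: three data blocks on three drives, parity on a fourth.
stripe = [b"VIDE", b"O-CL", b"IP01"]
parity = compute_parity(stripe)

# Simulate losing the second drive, then rebuild its block from the others.
survivors = [stripe[0], stripe[2]]
print(rebuild_missing(survivors, parity))  # b'O-CL'
```

Because XOR-ing a value twice cancels it out, XOR-ing the surviving blocks with the parity leaves exactly the block that was lost. The same arithmetic only works for one missing block per stripe, which is why a single parity scheme protects against a single drive failure.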
Today, RAID arrays come in a number of levels, each striking a different balance among capacity, drive subsystem bandwidth, and protection. Bandwidth is an important consideration for networking video information. The RAID hierarchy starts with RAID 0, which stripes data across disks with no redundancy at all; it is sometimes confused with JBOD, for “Just a Bunch of Disks,” which simply concatenates disks together. RAID 1 introduces mirroring, whereby the data is copied to two or more identical disks. RAID 2 stripes data at the bit (rather than block) level and uses a Hamming code for error correction. Despite its potential for high data transfer rates, it is the only original level of RAID not currently in use.
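Striping is easiest to see as an address calculation. The sketch below maps a logical block number onto a RAID 0 array; the function name is hypothetical, and the “stripe unit” (how many consecutive blocks go to one disk before moving to the next) is an assumed parameter, since controllers let administrators choose it.

```python
def raid0_locate(logical_block: int, num_disks: int, stripe_unit: int = 1) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    stripe_unit is the number of consecutive blocks written to one disk
    before the array moves on to the next disk.
    """
    if num_disks < 1 or stripe_unit < 1:
        raise ValueError("need at least one disk and a positive stripe unit")
    chunk, within = divmod(logical_block, stripe_unit)  # which chunk, and where inside it
    disk = chunk % num_disks                            # chunks rotate across the disks
    row = chunk // num_disks                            # how many full passes came before
    return disk, row * stripe_unit + within


# Four disks, two-block stripe units: blocks 0-1 land on disk 0, 2-3 on disk 1, ...
for lb in range(10):
    print(lb, raid0_locate(lb, num_disks=4, stripe_unit=2))
```

Consecutive chunks land on different drives, so a large sequential read keeps every drive busy at once; that is where RAID 0’s speed comes from. Each block also lives on exactly one drive, which is why losing any drive loses data.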
RAID 3 (very rare) uses byte-level striping and a dedicated parity drive to facilitate data recovery in case of a drive failure. RAID 4 is similar but stripes at the block level, so it can service multiple read requests simultaneously; every write, however, must update the single parity drive. RAID 5 uses block-level striping and distributes parity across all the drives along with the data, so lost information can be reconstructed without a dedicated parity drive. RAID 5 has become popular because its redundancy is inexpensive: however many drives the array contains, parity consumes only one drive’s worth of capacity.
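The difference between RAID 4’s dedicated parity drive and RAID 5’s distributed parity comes down to where the parity block sits in each stripe. The sketch below uses one simple rotation (parity starts on the last disk and shifts one disk to the left on each stripe); it is an illustrative assumption, as real controllers implement several rotation variants, and the function name is hypothetical.

```python
def raid5_stripe_layout(stripe: int, num_disks: int) -> list[str]:
    """Return 'P' for the disk holding parity and 'D' for data disks in one stripe.

    Parity starts on the last disk and moves one disk to the left on each
    successive stripe, so no single drive absorbs every parity write.
    """
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    parity_disk = (num_disks - 1) - (stripe % num_disks)
    return ["P" if d == parity_disk else "D" for d in range(num_disks)]


for s in range(4):
    print(s, " ".join(raid5_stripe_layout(s, num_disks=4)))
# 0 D D D P
# 1 D D P D
# 2 D P D D
# 3 P D D D
```

In a RAID 4 layout the “P” would sit in the same column on every row, so every write would queue up on that one drive; rotating it spreads the parity workload evenly.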
But it doesn’t stop there. RAID 6 adds a second, independent distributed parity scheme for an additional level of protection, so the array can survive two simultaneous drive failures. There are also nested RAID levels, including RAID 10 (RAID 1+0), RAID 50 (RAID 5+0), and RAID 100 (RAID 10+0), in which one RAID array uses other RAID arrays as its basic elements instead of physical disks. Recently we’ve also seen a growing number of proprietary RAID levels from various manufacturers.
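Each level trades raw capacity for protection, and the trade is easy to quantify. This sketch computes usable space for an array of identical drives; the function name is hypothetical, and it assumes RAID 1 keeps a full copy on every drive and RAID 10 is built from two-way mirrored pairs (other configurations are possible).

```python
def usable_capacity(level: str, num_disks: int, disk_gb: int) -> int:
    """Usable capacity in GB of an array built from identical drives.

    Assumptions: RAID 1 mirrors the same data onto every drive, and RAID 10
    is built from two-way mirrored pairs.
    """
    # level -> (minimum drives, number of drives' worth of space left for data)
    layouts = {
        "0": (1, num_disks),        # striping only, no redundancy
        "1": (2, 1),                # every drive holds a full copy
        "5": (3, num_disks - 1),    # one drive's worth of distributed parity
        "6": (4, num_disks - 2),    # two independent parity blocks per stripe
        "10": (4, num_disks // 2),  # striping across mirrored pairs
    }
    if level not in layouts:
        raise ValueError(f"unsupported RAID level: {level}")
    min_disks, data_disks = layouts[level]
    if num_disks < min_disks:
        raise ValueError(f"RAID {level} needs at least {min_disks} drives")
    if level == "10" and num_disks % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return data_disks * disk_gb


for level in ("0", "1", "5", "6", "10"):
    print(f"RAID {level}: {usable_capacity(level, 8, 500)} GB usable from eight 500GB drives")
```

With eight hypothetical 500GB drives, RAID 5 gives up one drive’s worth of space and RAID 6 two, while RAID 10 gives up half; in return RAID 6 tolerates any two failures and RAID 10 gets fast, parity-free writes.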
You could attach the RAID array directly to a server for direct-attached storage (DAS), but that would just give you an island of information. DAS has traditionally been implemented over a parallel SCSI interface, with the server sending SCSI commands straight to the storage.
SCSI worked fine inside the storage unit itself, letting the operating system manage the RAID disks, but it was of limited use for sharing storage because parallel SCSI cabling is limited to less than 100ft., and a single SCSI bus cannot connect more than 16 devices. Even Ultra-640 SCSI has a bandwidth of just 640MBps.
To improve the sharing of stored data, IBM created a proprietary networking architecture called Systems Network Architecture (SNA) in 1974. Although SNA is still used extensively in financial networks by banks and by many government agencies, it is a licensed protocol, meaning royalties must be paid to IBM for its use.
An alternative is the local area network, or LAN. LANs were developed in the late ’70s to create high-speed file-sharing links between several large central computers at one site. Of the many competing network technologies created at this time, Ethernet and ARCNET were the most popular. Current LANs are most likely to be based on switched Ethernet or Wi-Fi technology running at 10Mbps to 1000Mbps. Ethernet, standardized as IEEE 802.3, has become the most widespread LAN standard and has largely replaced the alternatives. Ethernet segments are restricted in length by the type of cabling used; for example, a 10Base-5 coax segment has a maximum length of 500m (about 1,640ft.). Greater length can be obtained by using an Ethernet repeater, which takes the signal from one Ethernet cable and repeats it onto another.
With a LAN, a single site could have dozens of computers connected, but the spread of LANs was hindered by the proliferation of incompatible network protocols and disagreement over how best to share resources. Soon after its introduction in 1983, NetWare dominated the personal computer LAN business. But for enterprise-level file sharing, integrators needed to cast a far larger net. That net evolved into what we now call TCP/IP (Transmission Control Protocol/Internet Protocol), a widely deployed suite of protocols for sharing data across networks. Today, TCP/IP is the primary protocol suite for the transmission of information over the Internet.
Figure 1: In NAS systems, many clients running different operating systems can access the same centralized storage.
The first popular approach for sharing information throughout a corporate facility was network-attached storage (NAS). With NAS, the host uses a file system device driver to access data using file-access protocols such as Network File System (NFS) for UNIX or Common Internet File System (CIFS) for Windows systems. NAS systems interpret commands from these protocols and perform the internal file and device I/O operations necessary to execute them.
NAS provides a structure in which many clients running different operating systems can access the same centralized storage, which means that security, management, and backup of the data can also be centralized. Also, when the need arises to expand the network, the IT administrator can simply add another NAS device.
Proponents of network-attached storage cite benefits such as its low buy-in cost, easy installation and training, and server-like management simplicity. NAS systems are generally accessed over a TCP/IP network, which lets multiple computers share the same storage space at once and lets the hard disks or RAID arrays behind them be managed centrally, minimizing overhead.
STORAGE AREA NETWORK
The storage area network (SAN) was developed as an even broader architecture than NAS. A SAN transmits data at the block level rather than as files and is typically built from Fibre Channel disk arrays. Fibre Channel started in 1988, with ANSI standard approval in 1994, as a way to simplify the complicated connectors employed by the HIPPI (High-Performance Parallel Interface) system then in use in similar roles. More recently, SAN has been challenged by NAS running over Gigabit Ethernet, since the number of clients that can simultaneously connect to NAS is limited only by the network’s topology. Today, Fibre Channel can transmit data at 1Gbps, 2Gbps, or 4Gbps. An 8Gbps standard is being developed, and a 10Gbps standard has been ratified, but no products based on that standard are available yet.
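For video work, those link rates translate directly into how long it takes to move a file. The sketch below is rough arithmetic under stated assumptions: it uses the standard line rates of 1, 2, and 4Gbps Fibre Channel (1.0625, 2.125, and 4.25 gigabaud), accounts for their 8b/10b encoding (8 data bits for every 10 bits on the wire), and ignores frame headers and other protocol overhead. The function name and the 20GB file size are hypothetical.

```python
def transfer_seconds(file_gb: float, line_rate_gbaud: float) -> float:
    """Approximate time to move a file over one Fibre Channel link direction.

    Fibre Channel at 1, 2 and 4 Gbps uses 8b/10b encoding, so only 8 of
    every 10 bits on the wire carry data. Frame and protocol overhead are
    ignored, so real transfers will take somewhat longer.
    """
    payload_gbps = line_rate_gbaud * 8 / 10  # usable data rate in Gbps
    return file_gb * 8 / payload_gbps        # gigabytes -> gigabits, then divide


# Line rates for 1, 2 and 4 Gbps Fibre Channel, in gigabaud.
for name, baud in (("1Gbps FC", 1.0625), ("2Gbps FC", 2.125), ("4Gbps FC", 4.25)):
    print(f"{name}: about {transfer_seconds(20.0, baud):.0f} s for a 20GB file")
```

Each doubling of the link rate halves the transfer time, which is why the jump from 2Gbps to 4Gbps matters for backup windows on large media libraries.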
In the Fibre Channel-based SAN architecture, storage devices are networked to each other and to a server or cluster of servers that acts as an access point to the SAN. In addition, the SAN system includes some form of permission control, or “token passing system,” that allows one (and only one) server to write to the array at any given time. Any number of users, however, can read from the SAN simultaneously.
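The single-writer rule can be modeled in a few lines. This is a toy sketch, not how any particular SAN or shared file system implements its token: the class and method names are hypothetical, blocks are kept in a Python dictionary, and only the token itself is protected by a lock.

```python
import threading
from typing import Dict, Optional


class SharedVolume:
    """Toy model of a SAN volume guarded by a single write token."""

    def __init__(self) -> None:
        self._token_lock = threading.Lock()  # protects the token and the block map
        self._writer: Optional[str] = None   # server currently holding the token
        self._blocks: Dict[int, bytes] = {}

    def acquire_token(self, server: str) -> bool:
        """Grant the token if it is free (or already ours); refuse otherwise."""
        with self._token_lock:
            if self._writer in (None, server):
                self._writer = server
                return True
            return False

    def release_token(self, server: str) -> None:
        with self._token_lock:
            if self._writer != server:
                raise PermissionError(f"{server} does not hold the write token")
            self._writer = None

    def write(self, server: str, block: int, data: bytes) -> None:
        with self._token_lock:
            if self._writer != server:
                raise PermissionError(f"{server} must hold the token to write")
            self._blocks[block] = data

    def read(self, block: int) -> bytes:
        # Reads need no token: any number of clients may read at once.
        return self._blocks.get(block, b"")


vol = SharedVolume()
vol.acquire_token("ingest-server")
vol.write("ingest-server", 0, b"clip")
print(vol.acquire_token("edit-server"))  # False: another server holds the token
print(vol.read(0))                       # b'clip': reading needs no token
```

Serializing writes through one token keeps two servers from updating the same blocks at once, while leaving reads, the common case for playout and editing, unrestricted.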
SAN can be very useful when a relatively small number of servers need high-speed, deterministic access to a pool of storage. That is why Fibre Channel SAN architecture is the type of networked storage generally preferred for most networked video server installations. In addition, Fibre Channel can connect devices up to a distance of 75 miles. One drawback, however, is that SANs require expensive Fibre Channel host bus adapters (HBAs) in each server and potentially expensive Fibre Channel switches for interconnectivity.
On a SAN, storage devices can communicate with one another over their own separate network, making it possible to back up all the separate depositories of information without tying up the standard network infrastructure with gigabytes of data. This is often called serverless backup or third-party copying. In addition, the switched 4Gbps full-duplex capabilities of Fibre Channel fabrics can significantly improve backup and restore performance compared with NAS architecture. Fibre Channel SANs can be made accessible to servers across a LAN or wide area network (WAN), or even across a metropolitan area network (MAN) to cover longer distances.
Figure 2: In the Fibre Channel-based SAN architecture, storage devices are networked to each other and to a server or cluster of servers that acts as an access point to the SAN.
The latest generation of networked storage is called IP-SAN. This concept combines the ease of deployment and ease of use of Ethernet NAS with the functionality and scalability of Fibre Channel SAN. IP-SAN merges the file storage/sharing of NAS and the block-level storage/sharing capabilities of SAN but at a significantly lower cost. Like NAS, IP-SAN can be built on Gigabit Ethernet network infrastructures that have already been deployed, although it is most commonly implemented as a separate network off the main shared network infrastructure for security reasons. An IP-SAN network is managed either locally or remotely as a single realm and can scale to petabytes in size.
An IP-SAN uses Internet SCSI, or iSCSI, to carry block-level storage traffic over IP. iSCSI is an official standard, ratified in February 2003 by the Internet Engineering Task Force, that carries the SCSI protocol over a TCP/IP network. It enables any machine on an IP network (the initiator) to contact a remote dedicated server (the target) and perform block I/O on it just as it would with a local hard disk. Many predict that iSCSI will eventually displace Fibre Channel, since Ethernet data rates are currently increasing faster than those of Fibre Channel and similar disk-attachment technologies.
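What travels inside iSCSI is ordinary SCSI: the initiator wraps the same command descriptor blocks (CDBs) a local disk would receive into TCP/IP packets. As a minimal sketch, the code below builds the 10-byte SCSI READ(10) CDB that asks a target for a run of blocks; the function name is hypothetical, and the iSCSI PDU header, login, and session handling that would surround it are deliberately omitted.

```python
import struct

READ_10 = 0x28  # SCSI operation code for READ(10)


def build_read10_cdb(lba: int, num_blocks: int) -> bytes:
    """Build a SCSI READ(10) command descriptor block (CDB).

    Layout (big-endian): opcode, flags byte, 4-byte logical block address,
    group number, 2-byte transfer length in blocks, control byte.
    """
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("READ(10) addresses at most 2**32 blocks")
    if not 0 <= num_blocks <= 0xFFFF:
        raise ValueError("READ(10) transfers at most 65535 blocks")
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, num_blocks, 0)


cdb = build_read10_cdb(lba=2048, num_blocks=16)
print(cdb.hex())  # 28000000080000001000
```

Because the command itself is unchanged, a target can treat a request arriving over Gigabit Ethernet exactly as it would one arriving over a local SCSI bus; only the transport underneath differs.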
By separating the controllers from the data repositories, an IP-SAN saves IT departments from buying more infrastructure than they currently use; they add controllers only as needed. Because an IP-SAN is based on iSCSI, it can be implemented simply, using only a software iSCSI initiator (such as Microsoft’s) on a host server, a target iSCSI storage device, and a Gigabit Ethernet switch to deliver block-level storage over IP. This simplicity means that IP-SANs based on iSCSI can be installed for less than half the cost of Fibre Channel SAN technology and can result in up to 60 percent savings in storage administration costs over a DAS system with the same capacity.
STANDARDIZING STORAGE MANAGEMENT
The Storage Networking Industry Association (SNIA), a vendor-neutral trade organization for network storage architectures, has found that corporations are expanding their storage capacity by 30 percent annually. The SNIA predicts that through 2008 worldwide demand for direct-attached storage (DAS) will increase only 28 percent, while capacities for network-attached storage (NAS) and storage area networks (SAN) will grow 62 percent annually.
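Annual growth rates like these compound quickly. The sketch below shows what 30 percent and 62 percent annual growth mean in practice, assuming (as a simplification) that each rate holds constant year over year; the 10TB starting point and the function names are hypothetical.

```python
import math


def projected_capacity(current_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after compounding growth, e.g. annual_growth=0.30 for 30 percent."""
    return current_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years for capacity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# Hypothetical starting point: 10TB of corporate storage today.
for rate in (0.30, 0.62):
    print(f"{rate:.0%} growth: {projected_capacity(10, rate, 3):.1f}TB after 3 years, "
          f"doubling every {doubling_time_years(rate):.1f} years")
```

At 30 percent a year, storage roughly doubles every two and a half years; at 62 percent it doubles in well under two, which is the pressure behind the interoperability problem described next.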
However, since each storage network architecture depends on proprietary devices from different manufacturers, the new challenge for IT departments will be interfacing these systems with each other. For example, each device in today’s SAN has its own disparate management interface, which can be a real pitfall for administrators and systems integrators. To smooth the integration process, the SNIA has initiated its Storage Management Initiative Specification (SMI-S) to help reduce costs associated with multi-vendor storage management. SMI-S is the first initiative in the storage market that works to develop and standardize interoperable storage management technologies and aggressively promote them to the storage, networking, and user communities. Its goal is to consolidate the storage management function in order to encourage IT administrators to adopt products that conform to SMI-S guidelines. More information is available at www.snia.org/smi.