The electronic storage of applications and data has been an essential aspect of IT from the very beginning. It would make no sense at all if the many, regularly produced results of “data processing” always had to be converted into a printable format and archived in the traditional way. That would be counter-productive in terms of modern technology and hardly beneficial to business success.
Computing and storage clearly belong together so that we can always access the results of IT processes. However, storage methods have changed constantly in the relatively short history of IT, and today we have several technical possibilities that seem to exist side by side on an equal footing.
The basic principle is always the same: users and applications retrieve stored data, either in part or in full. The type of data also plays a role: it could be databases, texts, photos, or sound and video recordings. Access and security checks are a further factor, since not every employee should have the same access to all data. The media and network connections used also determine who can access the data and in what way. The media are tape, hard disk, flash or printouts; for networks, speed, bandwidth, latency and distance are the important factors. As far as network technologies are concerned, we have to distinguish between Ethernet, Fibre Channel, SAS, SATA, PCIe and InfiniBand.
Storage technology all began with block storage. Here data is usually stored on the computer itself in fixed-size blocks (for example of 512 bytes). It is stored “raw”, without any higher-level metadata recording, for example, the data format, the data type, or the creator or owner of the data. The operating system regulates access to the data and presents it as a mounted drive volume. On this basis, applications and file systems decide how blocks are addressed, combined and modified.
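To make block addressing concrete, here is a minimal Python sketch that simulates a tiny block device with an ordinary temporary file. It assumes 512-byte blocks, as in the example above; the helpers `write_block` and `read_block` are hypothetical names. The byte offset of a block is simply its logical block address (LBA) multiplied by the block size, and the simulated “device” holds nothing but raw bytes, with no metadata about what they mean.

```python
import tempfile

BLOCK_SIZE = 512  # fixed block size, as in the 512-byte example


def write_block(dev, lba: int, data: bytes) -> None:
    """Write exactly one block at logical block address `lba` (hypothetical helper)."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("block devices transfer whole blocks only")
    dev.seek(lba * BLOCK_SIZE)  # byte offset = LBA x block size
    dev.write(data)


def read_block(dev, lba: int) -> bytes:
    """Read exactly one block at logical block address `lba` (hypothetical helper)."""
    dev.seek(lba * BLOCK_SIZE)
    return dev.read(BLOCK_SIZE)


# Simulate an 8-block "disk": just zero-filled raw bytes, no file system, no metadata.
with tempfile.TemporaryFile() as disk:
    disk.write(bytes(8 * BLOCK_SIZE))
    payload = b"customer record".ljust(BLOCK_SIZE, b"\0")
    write_block(disk, 3, payload)
    block3 = read_block(disk, 3)
    block2 = read_block(disk, 2)

print(block3.rstrip(b"\0"))  # b'customer record'
```

Interpreting those bytes (as a table row, a file fragment, and so on) is left entirely to the database or file system sitting on top.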
Block storage is considered particularly suitable as primary storage for high-performance applications, for example for structured information from databases or business intelligence, large virtual volumes, or applications based on Java or PHP.
The application first writes a data block, which then passes through a software or hardware initiator and is forwarded either via a server-level DAS connection or via a network-based SAN connection. Nowadays, the following protocols can be used with DAS (Direct Attached Storage): SATA, SAS, FC (Fibre Channel) or NVMe (Non-Volatile Memory Express). With a SAN (Storage Area Network), the connections run either via Ethernet with iSCSI or NVMe-oF, or via Fibre Channel with FCP or NVMe-oF. The storage controller then receives the blocks, and the data is written to the relevant device as a data block.
According to SNIA, the speed varies depending on the block interface used. Each entry lists the interface, where it is used, the adapter and the maximum speed:
- SATA: DAS, on-board, 6 Gb/s
- SAS (SCSI): DAS, on-board, 12 Gb/s
- Thunderbolt: DAS, on-board, 40 Gb/s
- NVMe: DAS, on-board, about 16 GB/s (PCIe 3.0 x16)
- Fibre Channel (FCP/NVMe-oF): DAS/SAN/WAN (FCIP), HBA, 32 Gb/s
- Ethernet (iSCSI/iSER/NVMe-oF): DAS/SAN/WAN, NIC and offload adapter, 100 Gb/s
- InfiniBand (SRP/iSER/NVMe-oF): SAN, HCA, 100 Gb/s
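What such nominal rates mean in practice can be estimated with simple arithmetic. The sketch below assumes every rate is a raw line rate in gigabits per second (SATA III at 6 Gb/s and SAS-3 at 12 Gb/s, for instance) and deliberately ignores line encoding and protocol overhead, so the results are best-case lower bounds, not measured figures. The key step is the conversion from bytes to bits, a factor of 8.

```python
# Nominal line rates in gigabits per second (Gb/s). Raw signalling rates only:
# encoding overhead (e.g. 8b/10b on SATA/SAS) and protocol overhead are ignored.
LINE_RATES_GBIT = {
    "SATA": 6,
    "SAS": 12,
    "Fibre Channel": 32,
    "Thunderbolt": 40,
    "Ethernet": 100,
    "InfiniBand": 100,
}


def transfer_seconds(size_gigabytes: float, rate_gbit: float) -> float:
    """Best-case time to move `size_gigabytes` GB over a link of `rate_gbit` Gb/s."""
    return size_gigabytes * 8 / rate_gbit  # 1 byte = 8 bits


for name, rate in LINE_RATES_GBIT.items():
    print(f"{name:14s} {rate:4d} Gb/s -> 100 GB in {transfer_seconds(100, rate):6.1f} s")
```

Moving 100 GB takes roughly 133 s at 6 Gb/s but only 8 s at 100 Gb/s, which is why the interface and protocol matter as much as the storage medium.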
It would therefore be wrong to regard block storage as simply slow or outdated. Its performance depends on other elements and, above all, on the protocols that have been added to this technology over time, the most important in recent years being NVMe. Enterprises should also take a close look at their preferred suppliers: to what extent have they integrated these new capabilities and protocols, and what support do they offer?
At the security level, iSCSI offers CHAP authentication in every iSCSI installation, for example. IPsec secures the connection channels, and VLANs logically isolate storage traffic from other data traffic. According to SNIA, large iSCSI-based SANs should be physically separated from LANs to improve storage quality.
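CHAP itself is a simple challenge-response scheme. The following sketch computes the response as defined in RFC 1994, MD5 over the one-byte identifier, the shared secret and the challenge, which is the algorithm iSCSI implementations are required to support. The shared secret and identifier are hypothetical example values, and the surrounding iSCSI login exchange is omitted.

```python
import hashlib
import hmac
import os


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """CHAP response per RFC 1994: MD5(identifier || secret || challenge)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


# Hypothetical shared secret, configured on initiator and target beforehand.
SHARED_SECRET = b"example-chap-secret"

# Target side: send an identifier and a fresh random challenge to the initiator.
identifier = 7
challenge = os.urandom(16)

# Initiator side: prove knowledge of the secret without ever transmitting it.
response = chap_response(identifier, SHARED_SECRET, challenge)

# Target side: recompute the expected value and compare in constant time.
expected = chap_response(identifier, SHARED_SECRET, challenge)
authenticated = hmac.compare_digest(response, expected)
print("initiator authenticated:", authenticated)
```

Because the secret never crosses the wire and each challenge is random, a recorded response cannot simply be replayed. CHAP does not encrypt the data itself, which is why the text pairs it with IPsec.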
Fibre Channel provides WWN-based access control (FCP) for the stored content. Switch zoning and LUN masking are also supported. With the appropriate set-up, storage networks can therefore offer authentication and in-flight encryption. Switches can be configured for minimal access rights and restricted connections. A Fibre Channel installation should always be physically separated from LANs.
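Zoning and LUN masking act as two successive filters. The sketch below models both as plain Python lookup tables. The WWPNs, zone names and LUN numbers are hypothetical, real switches and arrays configure these rules through their own management tools, and masking is simplified here to ignore per-target-port rules. An initiator reaches a LUN only if it shares a zone with the array port and the masking table exposes that LUN to it.

```python
DB_HBA = "10:00:00:90:fa:00:00:01"      # hypothetical initiator WWPNs
BACKUP_HBA = "10:00:00:90:fa:00:00:02"
ARRAY_PORT = "50:06:01:60:00:00:00:aa"  # hypothetical storage-array target port

# Switch zoning: only ports that share a zone can see each other in the fabric.
ZONES = {
    "zone_db": {DB_HBA, ARRAY_PORT},
    "zone_backup": {BACKUP_HBA, ARRAY_PORT},
}

# LUN masking on the array: which LUNs each initiator is allowed to see.
LUN_MASKING = {
    DB_HBA: {0, 1},
    BACKUP_HBA: {2},
}


def same_zone(a: str, b: str) -> bool:
    """True if both ports are members of at least one common zone."""
    return any(a in members and b in members for members in ZONES.values())


def can_access(initiator: str, target: str, lun: int) -> bool:
    """Both checks must pass: fabric zoning first, then LUN masking."""
    return same_zone(initiator, target) and lun in LUN_MASKING.get(initiator, set())


print(can_access(DB_HBA, ARRAY_PORT, 0))      # True
print(can_access(BACKUP_HBA, ARRAY_PORT, 0))  # False: LUN 0 is masked for this HBA
```

Unknown initiators get an empty set by default, so anything not explicitly configured sees nothing, in line with the minimal-access principle.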
According to Dell EMC, block storage offers “better performance and speed than file level storage systems. Each block volume can be treated as an independent disk drive and controlled by external Server OS.” Block storage is described as the typical “native storage interface of most storage media at the driver level”. For example, “a hard disk driver writes out and reads in blocks by their block address on the formatted disk.” Many applications therefore use block storage for their persistent I/O operations, including most relational databases such as Oracle and IBM's DB2.
Files are similar in construction to folders or archives for paper documents, photos or other collected items. They bundle documents together according to specified criteria so that they are easy to access later. Besides documents from word processors such as Word, use cases also include clustered databases, big data and various types of media.
Here are some properties of files: they have names and are generally byte-addressable. A file handle keeps track of the I/O operations performed on an open file. Finally, files are organised in named file systems or directories, which can contain structured subdirectories made up of logical, virtual or physical elements.
A file is then located by its path, the route through this directory hierarchy to the file.
Like all file systems, a file system for storage has a hierarchical structure. Each stored file, frequently consisting of unstructured data, can be described with a path leading to it and can therefore be opened again. Metadata with information about a file such as a brief description, size, author or date is stored along with the file itself. Network-Attached Storage (NAS) has proved to be the most secure method of sharing files among users of a network.
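The hierarchy, the path and the per-file metadata can be illustrated with Python's standard `pathlib` module. In this sketch the directory names and file content are hypothetical. The file system itself records metadata such as size and modification time, and the open file handle tracks the current read position.

```python
import tempfile
import time
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Hypothetical hierarchy: <root>/projects/2024/report.txt
    path = Path(root) / "projects" / "2024" / "report.txt"
    path.parent.mkdir(parents=True)
    path.write_bytes(b"quarterly figures\n")

    # The path is the route through the directory tree to the file.
    parts = path.relative_to(root).parts  # ('projects', '2024', 'report.txt')

    # Metadata maintained by the file system alongside the content.
    meta = path.stat()
    size = meta.st_size                   # 18 bytes
    modified = time.ctime(meta.st_mtime)

    # A file handle tracks the position of subsequent I/O on the open file.
    with path.open("rb") as fh:
        first = fh.read(9)                # b'quarterly'
        offset = fh.tell()                # 9

print(parts, size, modified, first, offset)
```

Reading is byte-addressed: the handle's offset advances by exactly the number of bytes read, independent of how the file system maps those bytes to blocks underneath.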
File systems are usually organised locally within a LAN, where they achieve high performance. If users are spread across a WAN, extra effort is required for performance and management. The vendor NetApp has built up a strong market position, particularly in file storage.
Virtual layers with various file systems are built on top of a physical layer of storage devices such as tape, disk, flash or persistent memory, and are linked with one another by logical layers. SNIA points out that there are already hundreds of virtual file systems with different characteristics, including EXT3, EXT4, JFS, ZFS and GPFS.
And then there are distributed file systems such as NFS or SMB, which work with the major operating systems and are designed to simplify access to files across networks. Their aim is to make working with a distributed file system as simple as working with a local one, abstracting away the underlying physical network structure. Even under distributed conditions, file-system performance can match that of block storage and compete with iSCSI. File storage is therefore also well suited to storing virtual machines (VMs) and containers. Although pNFS (parallel NFS) enables parallel access, its abstraction layer is more extensive, which can lead to higher latencies than with block storage.
Under certain circumstances, especially with video and audio files, the accumulated metadata can grow considerably, but compression or deduplication can reduce it again.
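Deduplication works by storing identical content only once. A minimal sketch, assuming fixed-size 4 KiB chunks (many products use variable-size chunking instead) and SHA-256 as the content fingerprint: each chunk is hashed, only chunks with a previously unseen hash are stored, and the list of hashes (the “recipe”) is enough to rebuild the original data.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into chunks, keep each distinct chunk once, return the hash recipe."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only new content is actually stored
        recipe.append(digest)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data by concatenating chunks in recipe order."""
    return b"".join(store[d] for d in recipe)


store = {}
data = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE  # 4 chunks, only 2 distinct
recipe = deduplicate(data, store)
stored = sum(len(c) for c in store.values())
print(len(data), "bytes logical,", stored, "bytes stored")  # 16384 bytes logical, 8192 bytes stored
```

The saving depends entirely on how much repetition the data contains; highly repetitive metadata benefits most.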
According to IBM, object storage is well suited to storing unstructured data in a flat environment: “Object storage is a hierarchy-free method of storing data, typically used in the cloud. Unlike other data storage methods, object-based storage does not use a directory tree. Discrete units of data (objects) exist at the same level in a storage pool. Each object has a unique, identifying name that an application uses to retrieve it. Additionally, each object may have metadata that is retrieved with it.” (IBM website, “Organizing unstructured data in a flat environment”)
Object storage was developed for access at the application level using an API rather than at the user level. According to IBM, object storage can accommodate virtually any quantity of data without requiring partitioning of the data set. A lack of hierarchy means there are no bottlenecks created by complex directory systems. And object storage systems have mechanisms to ensure data consistency, enabling automatic data replication, rolling updates, and no downtime.
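The flat namespace described by IBM can be sketched as a dictionary from unique keys to objects that carry their own metadata. This is a hypothetical in-memory model, not any vendor's API: the slashes in a key are just characters, and listing “folders” is merely a prefix match over the flat key space.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """In-memory sketch of a flat object store: one namespace, no directory tree."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        # Each object is addressed only by its unique key.
        self._objects[key] = StoredObject(data, dict(metadata))

    def get(self, key: str) -> StoredObject:
        # Data and metadata are retrieved together.
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list:
        # "Folders" are only a naming convention: a prefix match on flat keys.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ObjectStore()
bucket.put("videos/2024/keynote.mp4", b"\x00\x01", content_type="video/mp4", owner="marketing")
bucket.put("videos/2024/teaser.mp4", b"\x00\x02", content_type="video/mp4")
obj = bucket.get("videos/2024/keynote.mp4")
print(obj.metadata["owner"])            # marketing
print(bucket.list_keys("videos/2024/"))
```

Since lookups go straight from key to object, there is no directory tree to traverse or lock, which is the property behind the “no bottlenecks” argument.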
Some object storage systems also use erasure coding to protect data by distributing it across remote locations. Where this is not possible with local object storage, a backup at hardware or operating-system level is recommended instead.
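Production object stores usually rely on Reed-Solomon-style erasure codes that survive the loss of several fragments. The sketch below swaps in the simplest possible erasure code, a single XOR parity fragment as in RAID 5, to show the principle: the data is split into k fragments plus one parity fragment, which could be placed at different locations, and any one lost fragment can be rebuilt from the survivors. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal fragments (zero-padded) plus one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: list, lost: int) -> bytes:
    """Rebuild the single missing fragment by XOR-ing all surviving ones."""
    survivors = [f for i, f in enumerate(fragments) if i != lost]
    return reduce(xor_bytes, survivors)


data = b"object payload spread over sites"  # 32 bytes -> 4 fragments of 8 bytes
frags = encode(data, 4)

damaged = list(frags)
original = damaged[2]
damaged[2] = None                # e.g. one site is unavailable
rebuilt = recover(damaged, 2)
print(rebuilt == original)       # True
```

Storing four data fragments plus one parity fragment costs 25 % extra capacity, compared with 100 % for a full replica, which is the main attraction of erasure coding.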
Market for object storage / IDC
In a survey of 450 customers, the market researchers at IDC found that object-based storage (OBS) is mainly used for backup, disaster recovery, archiving and active archives: “Newer use case such as data analytics for unstructured data, media streaming, and web serving are deployed on off-premises OBS and are expected to be revenue-generating workloads in the future.”
The author works as a journalist for com! professional, ictk.ch and TechTarget. He has published several books on storage topics.