In general, distributed file systems are IT solutions that allow multiple users to access and share data in what appears to be a single, seamless storage pool. The back-end systems that enable them follow one of a few architectural patterns: client-server, which tends to be the most common; cluster-based architectures, which are most useful in large data centers; and decentralized file systems.
These architectures comprise multiple back-end systems connected over a network, with middleware orchestrating file storage and employing many techniques to keep the “distributed” system’s performance in line with users’ needs. In this way, the distributed system has a service capacity, and the load on that service is the total demand from all active users. When load approaches or exceeds that capacity, performance degrades and the system shows signs of lag or service outages.
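As a rough illustration of that capacity-versus-load relationship, the sketch below models utilization as aggregate client demand divided by service capacity. The throughput figures and thresholds are invented for the example, not measurements from any real system.

```python
# Hypothetical illustration: service capacity vs. aggregate client load.
# All numbers are made up for the example.

CAPACITY_MBPS = 10_000                               # total throughput the DFS can sustain
client_demand_mbps = [1200, 3400, 2800, 2100, 900]   # per-client demand

load = sum(client_demand_mbps)
utilization = load / CAPACITY_MBPS

print(f"Aggregate load: {load} MB/s ({utilization:.0%} of capacity)")
if utilization >= 1.0:
    print("Load exceeds capacity: expect outages or rejected requests.")
elif utilization >= 0.8:
    print("Load nearing capacity: expect latency (lag) to climb.")
else:
    print("Healthy headroom.")
```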
The chief benefit is that sharing data is fundamental to distributed systems and therefore forms the basis for many distributed applications. Specifically, distributed file systems are a proven way to accommodate data sharing between multiple processes securely and reliably over long periods. This makes them ideal as a foundational layer for distributed systems and applications.
Distributed systems form the modern concept of “the cloud” and support the idea that the cloud is essentially limitless in storage capacity. These systems can expand behind the scenes to match any growth in demand. They can manage massive volumes of information, safeguard its integrity, and ensure its availability to users 99.9995% of the time. And in that small sliver of downtime, there are contingencies upon contingencies in place. For cloud data centers, this is their business, so they are able to benefit from economies of scale more readily than enterprises or smaller businesses that deploy their own distributed systems.
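To put that availability figure in perspective, a quick back-of-the-envelope calculation converts 99.9995% uptime into expected downtime per year. This is a sketch of the arithmetic only, not a service-level guarantee from any particular provider.

```python
# Convert an availability percentage into expected downtime per year.
availability = 0.999995                 # 99.9995% uptime, as cited above
minutes_per_year = 365.25 * 24 * 60     # ~525,960 minutes

downtime_minutes = (1 - availability) * minutes_per_year
print(f"Expected downtime: about {downtime_minutes:.1f} minutes per year")
# -> roughly 2.6 minutes of downtime per year
```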
Enterprises and small businesses may deploy their own distributed file systems to facilitate business operations regionally, or even globally. For instance, distributed systems may support private clouds, parallel computing, and even real-time control systems. Municipalities deploy real-time traffic control and monitoring systems to better manage commuter times, all made possible by DFS-supported applications. Sophisticated parallel computing models run across many participating computing systems in collaborations that process large data sets, as in astronomical calculations where a single computer simply cannot do the work.
While they are popularly understood to be file sharing technologies, distributed file systems are characterized by several features beyond sharing data. The most desirable DFS characteristics are outlined below.
Technically, distributed file systems need to achieve several goals to produce the effect of active file sharing for multiple users. The basic distributed model connects multiple local file systems together by mounting them, then abstracts a storage-management layer away from the user-interface layer, hiding how files are optimally stored on the underlying infrastructure so that connecting clients see the multiple systems as one. From the user’s perspective, the expectation is often simultaneous access to shared data, always in its freshest version. These demands raise many considerations for system designers.
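That abstraction means an application reads and writes through ordinary file paths and never needs to know which back-end machine actually holds the data. The sketch below illustrates the idea; /mnt/dfs is an assumed mount point where a distributed file system has already been mounted.

```python
from pathlib import Path

# Assumed mount point where a DFS share is already mounted.
# To the application this is just a directory; the storage layer
# decides where the bytes physically live.
SHARE = Path("/mnt/dfs/projects")

report = SHARE / "q3-report.txt"
report.write_text("Draft written by one client...\n")

# Another client on a different machine, with the same share mounted,
# reads the same logical path and sees the shared file.
print(report.read_text())
```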
To address many of these considerations, system designers use different DFS mechanisms, including:
A distributed file system refers to a group of systems that work together, acting as a single shared file system. In general terms, a distributed system is governed in part by a network file system. Several varieties of network file systems are in use, but they all allow a remote host to mount and interact with file systems over a network. The terms distributed file system and network file system are often used interchangeably.
More specifically, Network File System (NFS) is a protocol developed by Sun Microsystems that has since become the de facto standard file-sharing protocol in Unix and Linux environments. The most widely supported version is NFS version 3 (NFSv3). For example, Windows Server uses the NFS protocol to allow file transfers between systems running Windows and non-Windows systems such as Unix or Linux. The latest version, NFS version 4 (NFSv4), works through firewalls and on the Internet.
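As a concrete, hypothetical example of how a client attaches an NFS export, the sketch below shells out to the standard Linux mount command. The server name, export path, and mount point are placeholders, and the script assumes root privileges and an installed NFS client.

```python
import subprocess

# Hypothetical values: replace with a real NFS server, export, and mount point.
SERVER = "nfs-server.example.com"
EXPORT = "/srv/shared"
MOUNT_POINT = "/mnt/shared"

# Mount the export using NFSv4 (requires root and NFS client utilities).
subprocess.run(
    ["mount", "-t", "nfs4", f"{SERVER}:{EXPORT}", MOUNT_POINT],
    check=True,
)

# From here on, files under /mnt/shared behave like local files.
```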
The main benefit of distributed file systems is connecting multiple systems over a network to expand storage capacity while maximizing user access. Subsequent benefits include:
Block storage can be compared with two other common storage formats: file storage and object storage. Each format aims to store, organize, and provide access to data in ways that benefit certain applications. For instance, file storage, commonly seen on desktop computers as a file-and-folder hierarchy, presents information intuitively to users. That intuitive format, though, can hamper operations when data becomes voluminous. Block storage and object storage each help overcome this scaling problem in their own ways. Block storage does so by “chunking” data into arbitrarily sized blocks that software can manage easily, but it provides little information about file contents, leaving that to the application to determine. Object storage decouples data from the application, using metadata as the organizing method, which allows object stores to span multiple systems while remaining easy to locate and access.
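To make the block-versus-object distinction concrete, here is a small, self-contained sketch (not any particular product’s API) that splits a payload into fixed-size blocks and, separately, stores the same data as a single object alongside descriptive metadata.

```python
import hashlib

data = b"example payload " * 1000   # stand-in for a file's contents

# Block storage view: raw, fixed-size chunks with no knowledge of content.
BLOCK_SIZE = 4096
blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
print(f"{len(blocks)} blocks of up to {BLOCK_SIZE} bytes each")

# Object storage view: the whole payload plus metadata describing it,
# so the object can be found and understood independently of any application.
obj = {
    "key": "reports/2024/q3-report.txt",     # hypothetical object name
    "data": data,
    "metadata": {
        "content-type": "text/plain",
        "sha256": hashlib.sha256(data).hexdigest(),
        "owner": "analytics-team",
    },
}
print(obj["metadata"]["sha256"][:16], "...")
```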
Three distributed file system architectures are in use today. Client-server file systems are the most common and most readily available. Decentralized file systems are often found in peer-to-peer, community-based networks. And cluster-based file systems are useful in large data centers.