This is a summary of the Hadoop Distributed File System white paper ("The Hadoop Distributed File System," Shvachko, Kuang, Radia, and Chansler, IEEE MSST 2010).
Abstract
- The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications.
- The paper describes the architecture of HDFS and reports on the experience of using HDFS to manage 25 petabytes of enterprise data at Yahoo!.
Introduction
- Hadoop provides a distributed file system and a framework for the analysis and transformation of very large data sets using the MapReduce paradigm (see the word-count sketch after this list).
- Hadoop is an Apache project; all components are available via the Apache open source license.
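To make the MapReduce paradigm concrete, below is the canonical word-count job written against the Hadoop MapReduce Java API. This is an illustrative sketch rather than anything from the white paper itself; the input and output paths taken from `args[0]` and `args[1]` are placeholders.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in each input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts gathered for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Placeholder HDFS input and output paths.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged as a jar, a job like this would typically be launched with `hadoop jar wordcount.jar WordCount <input> <output>`, with HDFS holding both the input splits read by the mappers and the output written by the reducers.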