
This is a summary of the Hadoop Distributed File System white paper, which can be found at this link.

Abstract

  • The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably and to stream them at high bandwidth to user applications.
  • We describe the architecture of HDFS and report on the experience of using HDFS to manage 25 petabytes of enterprise data at Yahoo!.

Introduction

  • Hadoop provides a distributed file system and a framework for the analysis and transformation of very large data sets using the MapReduce paradigm.
  • Hadoop is an Apache project; all components are available via the Apache open source license.
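To make the MapReduce paradigm mentioned above concrete, here is a minimal, self-contained sketch of the classic word-count job. This is an illustration of the programming model only, not Hadoop's actual Java API: the `map_phase` and `reduce_phase` functions are hypothetical names, and a real Hadoop job would distribute these phases across a cluster and read input from HDFS.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input record.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group the emitted pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data", "big clusters"]
result = reduce_phase(map_phase(docs))
# result == {"big": 2, "data": 1, "clusters": 1}
```

In a real Hadoop deployment the framework handles the shuffle step (grouping by key) between the map and reduce phases, so user code only supplies the two functions.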