This is a summary of the Hadoop Distributed File System white paper.

Abstract

  • The Hadoop Distributed File System (HDFS) is designed to store very large datasets reliably and to stream those datasets to user applications at high bandwidth.
  • We describe the architecture of HDFS and report on the experience of using HDFS to manage 25 petabytes of enterprise data at Yahoo!.

Introduction

  • Hadoop provides a distributed file system and a framework for the analysis and transformation of very large data sets using the MapReduce paradigm.
  • Hadoop is an Apache project; all components are available via the Apache open source license.
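The MapReduce paradigm mentioned above splits an analysis into a map step that emits key-value pairs, a shuffle step that groups pairs by key, and a reduce step that aggregates each group. Below is a minimal, local Python sketch of the classic word-count example; it is purely illustrative and uses none of Hadoop's actual APIs (all function names here are made up for the sketch).

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each group; here, sum the counts per word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["hadoop stores data", "hadoop analyzes data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'analyzes': 1}
```

In real Hadoop, the map and reduce functions run in parallel across the cluster, and HDFS supplies the input splits and stores the output, but the three-phase shape is the same.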