Big Data

Explain Hadoop Distributed File Systems (HDFS)

Difficulty: unrated

Source: bregman-arie/devops-exercises by Arie Bregman

Answer

  • Distributed file system providing high aggregate bandwidth across the cluster.
  • For a user it looks like a regular file system structure but behind the scenes it's distributed across multiple machines in a cluster
  • Typical file size is TB and it can scale and supports millions of files
  • It's fault tolerant which means it provides automatic recovery from faults
  • It's best suited for running long batch operations rather than live analysis