The computers (referred to hereafter as ‘nodes’) that perform the bulk of the computation on the Big Data Cluster are the so-called ‘worker nodes’. Each node has two 18-core Intel Xeon Gold 6140 Skylake CPUs (2.3 GHz clock speed, 24.75 MB L3 cache, 6 memory channels, 140 W power), for a total of 36 cores per node. Each node has 384 GB of memory (12 x 32 GB DDR4 at 2666 MT/s) and 48 TB (12 x 4 TB) of SATA hard disk storage, and the nodes are interconnected by a 10 Gb/s Ethernet network. These 8 worker nodes, together with two other nodes and the ancillary hardware that supports the network, compose the entirety of the Big Data Cluster. The nodes were purchased as part of a larger acquisition in 2018.
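As a quick sanity check, the aggregate resources implied by these per-node specifications across the eight worker nodes can be tallied as follows (a simple sketch; all figures are taken directly from the description above):

```python
# Per-node specifications from the description above
sockets_per_node = 2
cores_per_socket = 18
memory_gb_per_node = 384   # 12 x 32 GB DDR4
disk_tb_per_node = 48      # 12 x 4 TB SATA
worker_nodes = 8

total_cores = worker_nodes * sockets_per_node * cores_per_socket
total_memory_gb = worker_nodes * memory_gb_per_node
total_disk_tb = worker_nodes * disk_tb_per_node

print(total_cores, total_memory_gb, total_disk_tb)  # 288 3072 384
```

In aggregate, the worker nodes therefore provide 288 cores, about 3 TB of memory, and 384 TB of raw disk capacity.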
Types of Nodes
The Big Data Cluster contains several types of nodes, which fall into four main categories by usage.
- Management node – There is one management node, which is reserved for administration of the cluster. It is not available to users.
- Edge/Login node – Users work on this node directly. This is where users log in, access files, write and compile code, and submit jobs to be run on the worker nodes. Furthermore, this is the only node that may be accessed via SSH/SCP from outside of the cluster. It serves as the interface between the Big Data Cluster and the outside network.
- Worker nodes – These nodes are where the majority of computing on the cluster takes place. There are two types of worker nodes: DataNodes and Name Nodes. Users normally do not interact with any worker node directly, except for the Name Node.
- Name Node – This worker node is designated as the main access point to the Hadoop Distributed File System (HDFS) and manages the file system namespace. Worker Node #2 is the Name Node for this Big Data Cluster.
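Because user interaction with HDFS goes through the Name Node, routine file operations are issued with the standard `hdfs dfs` client from the Edge/Login node. A minimal sketch (the `/user/$USER` home-directory convention and the directory name are assumptions for illustration):

```shell
# List your HDFS home directory (path convention is an assumption)
hdfs dfs -ls /user/$USER

# Create a working directory and report its space usage in HDFS
hdfs dfs -mkdir -p /user/$USER/projects
hdfs dfs -du -h /user/$USER
```

These commands run on the Edge/Login node; the HDFS client contacts the Name Node for namespace operations, while the data itself resides on the DataNodes.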
There are a few storage systems attached to the cluster. Here we describe the areas that are relevant to users.
UMBC AFS Storage Access
Your AFS partition is the directory where your personal files are stored when you use the DoIT computer labs or the gl.umbc.edu login nodes. The UMBC-wide /afs can be accessed from the Big Data Edge/Login node.
The Edge/Login node has a local /scratch directory, approximately 500 GB in size. This space is shared among all users and is intended as the staging location from which large data transfers into HDFS originate.
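The intended workflow, then, is to stage large files in /scratch on the Edge/Login node and push them into HDFS from there. A hedged sketch, where the file name, the source path, and the HDFS destination path are all hypothetical:

```shell
# Stage a large file in the shared /scratch area (source path is hypothetical)
cp /path/to/mydata.csv /scratch/$USER/

# Copy it from local /scratch into HDFS, then remove the scratch copy,
# since /scratch is a small shared space
hdfs dfs -put /scratch/$USER/mydata.csv /user/$USER/
rm /scratch/$USER/mydata.csv
```

Cleaning up after the transfer matters here: at roughly 500 GB shared among all users, /scratch fills quickly if staged copies are left behind.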