Glossary

The following is a list of commonly used terms and acronyms, and their definitions when used in the context of the MIT Supercloud.

First is a visual labeling the portions of the system with the terminology we tend to use for each piece.


CPU
The Central Processing Unit (CPU) is the part of a computer that executes software programs.  CPU refers to an individual silicon chip, such as an Intel Xeon E5 or an AMD Opteron.  A CPU contains one or more cores.  Also known as a processor or socket.

Core
A core is the smallest computation unit that can run a program.

GPU
A Graphics Processing Unit (GPU) is a specialized device originally designed to render graphics for display.  Each compute node can host one or more GPUs.  Modern GPUs have many simple compute cores and are widely used for parallel processing.
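For example, a Slurm batch script can request GPUs with the generic resource option; the GPU type name and program name below are only placeholders and may differ on your system.

    #!/bin/bash
    # Example batch script requesting one GPU (the "volta" type name is a
    # placeholder; check your system's documentation for the exact string).
    #SBATCH --gres=gpu:volta:1
    #SBATCH -o gpu_job.log-%j

    # my_gpu_program is a placeholder for your own GPU-enabled executable
    ./my_gpu_program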

Group Shared Directory
A directory, created upon user request, where members of a group can share files with one another.  Since a user's home directory is accessible only to that user, a group shared directory is the only mechanism for users to share files.

HPC
High Performance Computing (HPC) refers to the practice of aggregating computing power to achieve higher performance than would be possible using a typical computer.

Job
A job is a separately executable unit of work whose resources are allocated and shared.  Users create job submission scripts to ask the scheduler for resources (cores, a specific processor type, etc.).  The scheduler places the requests in a queue and allocates the requested resources.
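As a minimal sketch, a job submission script might look like the following; the script and program names are placeholders.

    #!/bin/bash
    # submit_job.sh -- minimal example batch script (placeholder names)
    #SBATCH -o myjob.log-%j    # write output to a log file named with the job ID
    #SBATCH -n 4               # request 4 tasks (cores)

    # my_program is a placeholder for your own executable
    ./my_program

The script is handed to the scheduler with sbatch submit_job.sh, and the job then waits in the queue until the requested resources are available.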

Job Array
According to the Slurm documentation:  “Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily”.   Job arrays are useful for applying the same processing routine to a collection of multiple inputs, data, or files.  Job arrays offer a very simple way to submit a large number of independent processing jobs.
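For example, the sketch below (with placeholder file names) runs the same program on 100 numbered input files, one per array task.

    #!/bin/bash
    # array_job.sh -- one array task per input file (placeholder names)
    #SBATCH --array=1-100              # 100 independent tasks
    #SBATCH -o array_job.log-%A-%a     # separate log per task (job ID, task ID)

    # SLURM_ARRAY_TASK_ID tells each task which input it should process
    ./my_program input_${SLURM_ARRAY_TASK_ID}.dat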

Job Slot
A processor core for running a job.

LLGrid Beta
LLGrid Beta is a collection of software packages that are released as a beta test on the Supercloud.  The beta software packages are ones that Supercloud users have requested but are not included in the Supercloud system image.

bwedx
An online course platform whose courses use the Supercloud system for exercises.

Login Node
The login node controls user access to a parallel computer.  Users usually connect to login nodes via SSH to compile and debug their code, review their results, do some simple tests, and submit their interactive and batch jobs to the scheduler.

Modules
An open source software management tool used at most HPC facilities.  Modules enable users to selectively pick the software they want and add it to their environment.  Using the module command, you can manipulate your environment to gain access to new software or different versions of a package.
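Typical module commands look like the following; the package name is only an example, and module avail shows what is actually installed on your system.

    # List the software packages available on the system
    module avail

    # Add a package to your environment ("anaconda" is an example name;
    # the available module names and versions vary by system)
    module load anaconda

    # Show what is loaded, or remove a package from your environment
    module list
    module unload anaconda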

Node
A stand-alone computer where jobs are run.  Each node is connected to other compute nodes via a fast network interconnect.  While accessible via interactive jobs, compute nodes are not meant to be accessed directly by users.
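For example, with plain Slurm an interactive job that opens a shell on a compute node can be requested as sketched below; site-specific wrappers for interactive jobs may also be available.

    # Request one core on a compute node and start an interactive shell there
    srun -n 1 --pty bash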

Process
An independent computation running on a computer.  Processes have their own address space and may create threads that will share their address space.  Processes must use interprocess communication to communicate with other processes.

Slurm
Simple Linux Utility for Resource Management (SLURM) is a job scheduler which coordinates the running of many programs on a shared facility.  Slurm is used on the MIT Supercloud system.  It replaced the SGE scheduler.
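A few common Slurm commands, shown with placeholder arguments:

    sbatch submit_job.sh    # submit a batch script to the scheduler
    squeue -u $USER         # list your jobs currently in the queue
    scancel 12345           # cancel a job by its job ID (12345 is a placeholder)
    sinfo                   # show the state of partitions and nodes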

Socket
A computational unit packaged as a single unit, usually a single chip, and often called a processor.  Modern sockets carry many cores.

SSH
Secure Shell (SSH) is a protocol for securely accessing remote computers.  Based on the client-server model, users with an SSH client can access a remote computer.  Some operating systems, such as Linux and macOS, have a built-in SSH client; others can use one of many publicly available clients.  For Windows, we recommend PuTTY or Cygwin for SSH.
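For example, from a terminal on Linux or macOS you connect with the ssh client; the username and hostname below are placeholders, so use the values given in your account setup instructions.

    # Replace USERNAME and HOSTNAME with the values for your account
    ssh USERNAME@HOSTNAME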

Thread
Threads are lightweight processes which exist within a single operating system process.  Threads share the address space of the process that created them and can communicate directly with other threads in the same process.