Student Projects

About Us

Our team creates powerful supercomputers for modeling and analyzing the most complex problems and the largest data sets to enable revolutionary discoveries and capabilities.  Many of these capabilities have been developed and published in partnership with amazing students (see our Google Scholar page).

Listed below are a wide range of potential projects in AI, Mathematics, Green Computing, Supercomputing Systems, Online Learning, and Connection Science.  If you are interested in any of these projects, please send us email at supercloud@mit.edu (please avoid using ChatGPT or another LLM to write your email).

  • Mathematics of Big Data & Machine Learning
    Big Data describes a new era in the digital age where the volume, velocity, and variety of data created across a wide range of fields is increasing at a rate well beyond our ability to analyze the data.  Machine Learning has emerged as a powerful tool for transforming this data into usable information.  Many technologies (e.g., spreadsheets, databases, graphs, matrices, deep neural networks, …) have been developed to address these challenges.  The common theme amongst these technologies is the need to store and operate on data as tabular collections instead of as individual data elements.  This project explores the common mathematical foundation of these tabular collections (associative arrays), which applies across a wide range of applications and technologies.  Associative arrays unify and simplify Big Data and Machine Learning.  Understanding these mathematical foundations enables seeing past the differences that lie on the surface of Big Data and Machine Learning applications and technologies and leveraging their core mathematical similarities to solve the hardest Big Data and Machine Learning challenges.
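    As a small illustration of the core idea (a sketch only, not the D4M implementation; the class name and keys below are made up for this example), an associative array can be modeled as a mapping from row/column string keys to values, with element-wise addition and an array product that behaves like sparse matrix multiplication:

      from collections import defaultdict

      class AssocArray:
          """Toy associative array: maps (row, col) string keys to numeric values."""
          def __init__(self, triples=()):
              self.data = {(r, c): v for r, c, v in triples}

          def __add__(self, other):
              # Element-wise addition over the union of keys.
              out = AssocArray()
              for k in set(self.data) | set(other.data):
                  out.data[k] = self.data.get(k, 0) + other.data.get(k, 0)
              return out

          def matmul(self, other):
              # Array product: sum over shared inner keys, like a sparse matrix multiply.
              out = defaultdict(float)
              for (r, k1), v1 in self.data.items():
                  for (k2, c), v2 in other.data.items():
                      if k1 == k2:
                          out[(r, c)] += v1 * v2
              result = AssocArray()
              result.data = dict(out)
              return result

      # Rows and columns are labeled by strings, not integer indices.
      A = AssocArray([("alice", "doc1", 1), ("bob", "doc2", 2)])
      B = AssocArray([("doc1", "math", 3), ("doc2", "ml", 4)])
      print((A + A).data)        # element-wise: values are doubled
      print(A.matmul(B).data)    # {('alice', 'math'): 3.0, ('bob', 'ml'): 8.0}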
  • Catastrophe vs Conspiracy: Heavy Tail Statistics
    Heavy-tail distributions, where the probability decays slower than exp(-x), are a natural result of multiplicative processes and play a central role in many of today’s most important problems (pandemics, climate, weather, finance, wealth distribution, social media, …).  Computer networks are among the most notable examples of heavy-tail distributions; their celebrated discovery led to the creation of the new field of Network Science.  However, this observation brings with it the recognition that many cyber detection systems use light-tail statistical tests for which there may be no combination of thresholds that can result in acceptable operator probability-of-detection (Pd) and probability-of-false-alarm (Pfa).  This Pd/Pfa paradox is consistent with the lived experience of many cyber operators, and a possible root cause is the potential incompatibility of light-tail statistical tests on heavy-tail data.  The goal of this effort is to develop the necessary educational and training tools for effectively understanding and applying heavy-tail distributions in a cyber context.
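    A minimal simulation sketch of the paradox (illustrative only; the tail index, sample size, and thresholds are arbitrary choices): a sigma-based threshold calibrated for light-tail (Gaussian) data badly underestimates the false-alarm rate when the background is heavy-tailed:

      import numpy as np

      rng = np.random.default_rng(0)
      n = 1_000_000

      # Benign "background" traffic drawn from a heavy-tail (Pareto) distribution,
      # rescaled to zero mean and unit variance so a z-score threshold is comparable.
      alpha = 2.5                        # tail index; the tail decays like x**(-alpha)
      pareto = rng.pareto(alpha, n)
      x = (pareto - pareto.mean()) / pareto.std()

      gauss = rng.standard_normal(n)

      # A light-tail test: flag anything above k standard deviations.
      for k in (3, 5, 8):
          pfa_gauss = (gauss > k).mean()
          pfa_heavy = (x > k).mean()
          print(f"threshold {k} sigma: Pfa(Gaussian) ~ {pfa_gauss:.2e}, "
                f"Pfa(heavy-tail) ~ {pfa_heavy:.2e}")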
  • Abstract Algebra of Cyberspace
    Social media, e-commerce, streaming video, e-mail, cloud documents, web pages, traffic flows, and network packets fill vast digital lakes, rivers, and oceans that we each navigate daily. This digital hyperspace is an amorphous flow of data supported by continuous streams that stretch standard concepts of type and dimension. The unstructured data of digital hyperspace can be elegantly represented, traversed, and transformed via the mathematics of hypergraphs, hypersparse matrices, and associative array algebra. This work will explore a novel mathematical concept, the semilink, that combines pairs of semirings to provide the essential operations for network/graph analytics, database operations, and machine learning.
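    The semilink itself is the new object to be developed, but the semiring machinery it builds on can be sketched in a few lines (illustrative Python; the function and variable names are made up for this example): one generic matrix product yields different analytics depending on which semiring is plugged in:

      import math

      def semiring_matmul(A, B, add, mul, zero):
          """Generic matrix product over a semiring (add, mul, zero)."""
          n, m, p = len(A), len(B), len(B[0])
          C = [[zero] * p for _ in range(n)]
          for i in range(n):
              for k in range(m):
                  for j in range(p):
                      C[i][j] = add(C[i][j], mul(A[i][k], B[k][j]))
          return C

      # Weighted adjacency matrix of a 3-node graph (inf = no edge).
      INF = math.inf
      W = [[0, 3, INF],
           [INF, 0, 1],
           [2, INF, 0]]

      # (min, +) semiring: squaring W gives shortest paths using at most two hops.
      two_hop = semiring_matmul(W, W, min, lambda a, b: a + b, INF)
      print(two_hop)   # e.g. entry [0][2] becomes 4 via the path 0 -> 1 -> 2

      # (+, *) on a 0/1 adjacency matrix would instead count paths, and
      # (max, min) gives maximum-bottleneck (widest) paths.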
  • Mathematical Underpinnings of Associative Array Algebra
    Semirings have found success as an algebraic structure that can support the variety of data types and operations used by those working with graphs, matrices, spreadsheets, and databases, and they form the mathematical foundation of the associative array algebra of D4M and the matrix algebra of GraphBLAS.  Mirroring the fact that module theory has many but not all of the structural guarantees of vector space theory, semimodule theory has some but not all of the structural guarantees of module theory.  The added generality of semirings allows semimodule theory to consider structures wholly unlike rings and fields, such as Boolean algebras and the max-plus algebra.  By focusing on these special cases, which are diametrically opposed to the traditional ring and field cases, analogs of standard linear algebra such as eigenanalysis and solving linear systems can be developed.  This work will further explore the theory of semirings in the form of solving linear systems, carrying out eigenanalysis, and graph algorithms.
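    For a concrete feel of one such special case (a sketch only; the matrix values are made up), here is the max-plus analog of matrix multiplication, where the diagonal of a matrix power tracks maximum-weight closed walks and the maximum cycle mean plays the role of the eigenvalue:

      import numpy as np

      NEG_INF = -np.inf   # the max-plus "zero"

      def maxplus_matmul(A, B):
          """Matrix product over the max-plus semiring: (max, +) replaces (+, *)."""
          n, p = A.shape[0], B.shape[1]
          C = np.full((n, p), NEG_INF)
          for i in range(n):
              for j in range(p):
                  C[i, j] = np.max(A[i, :] + B[:, j])
          return C

      # Weighted adjacency matrix of a small graph (NEG_INF = no edge).
      A = np.array([[NEG_INF, 2.0, NEG_INF],
                    [NEG_INF, NEG_INF, 3.0],
                    [1.0, NEG_INF, NEG_INF]])

      # Entry (i, i) of the k-th max-plus power is the maximum weight of a
      # length-k closed walk through node i; the only 3-cycle here has weight 6.
      A3 = maxplus_matmul(maxplus_matmul(A, A), A)
      print(np.diag(A3))      # [6. 6. 6.]
      # The max-plus eigenvalue of this matrix is the maximum cycle mean, 6/3 = 2.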
  • MIT/Stanford Next Generation Operating System
    The goal of the MIT/Stanford DBOS (the DBMS-oriented Operating System) is to build a completely new operating system (OS) stack for distributed systems. Currently, distributed systems are built on many instances of a single-node OS like Linux with entirely separate cluster schedulers, distributed file systems, and network managers. DBOS uses a distributed transactional DBMS as the basis for a scalable cluster OS. We have shown that such a database OS can do scheduling, file management, and inter-process communication with competitive performance to existing systems. It can additionally provide significantly better analytics and dramatically reduce code complexity by building core OS services from standard database queries, while implementing low-latency transactions and high availability only once. We are currently working on building a complete end-to-end prototype of DBOS.  This project will explore implementing next-generation cyber analytics within DBOS.
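    To illustrate the flavor of "OS services as database queries" (a toy sketch with a hypothetical table layout, not the actual DBOS schema or API), task placement can be expressed as an ordinary transactional SQL query:

      import sqlite3

      # Toy illustration of OS state held in database tables (hypothetical schema):
      # scheduling becomes an ordinary SQL query inside a transaction.
      db = sqlite3.connect(":memory:")
      db.executescript("""
      CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, free_cores INTEGER);
      CREATE TABLE tasks (task_id INTEGER PRIMARY KEY, cores INTEGER, node_id INTEGER);
      INSERT INTO nodes VALUES (1, 4), (2, 16), (3, 8);
      INSERT INTO tasks (task_id, cores) VALUES (101, 6);
      """)

      # "Scheduler": place the pending task on the node with the most free cores
      # that can satisfy the request.
      with db:
          row = db.execute("""
              SELECT node_id FROM nodes
              WHERE free_cores >= (SELECT cores FROM tasks WHERE task_id = 101)
              ORDER BY free_cores DESC LIMIT 1
          """).fetchone()
          db.execute("UPDATE tasks SET node_id = ? WHERE task_id = 101", (row[0],))
          db.execute("UPDATE nodes SET free_cores = free_cores - 6 WHERE node_id = ?", (row[0],))

      print(db.execute("SELECT * FROM tasks").fetchall())   # [(101, 6, 2)]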
  • Supercomputing and Cloud interoperability
    Shared supercomputing resources typically offer a limited set of hardware and software combinations that researchers can leverage. At the same time, the absolute number of resources offered is also physically limited. Commercial cloud providers can offer an avenue to leverage additional resources as well as new or unique hardware as needs arise. Thus, having the technology to seamlessly transition between a shared resource such as the MIT SuperCloud and a commercial cloud provider (AWS, Microsoft Azure, Google Cloud) can significantly increase user productivity and enable new research. This project aims to:
    • Make the MIT SuperCloud software stack available as a deployable image on commercial cloud providers
    • Develop tools to seamlessly transition between SuperCloud and cloud as requirements change
    • Enable sponsors and funding agencies to provide a standard AI stack that can be leveraged by performers and the broader community
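    A minimal sketch of what such a seamless transition might look like from the user's side (hypothetical; the submit function and the cloud back-end are placeholders, and only the Slurm path uses a real command):

      import shutil
      import subprocess

      def submit(script_path, backend="auto"):
          """Submit a batch job either to the local Slurm scheduler or to a cloud
          queue. The cloud path is a placeholder -- the real project would call a
          provider-specific service (AWS, Azure, Google Cloud)."""
          if backend == "auto":
              backend = "slurm" if shutil.which("sbatch") else "cloud"
          if backend == "slurm":
              # Standard Slurm submission; returns e.g. "Submitted batch job 12345".
              return subprocess.run(["sbatch", script_path],
                                    capture_output=True, text=True).stdout
          # Hypothetical cloud hand-off: stage the same image and script to the provider.
          raise NotImplementedError("cloud submission backend not yet implemented")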
  • Performance Tuning of Large-Scale Cluster Management Systems
    Modern supercomputers rely on a collection of open-source and bespoke software to handle node and user provisioning, system configuration, configuration persistence, change management, monitoring and metrics gathering, and imaging.  The MIT SuperCloud system’s routine monthly maintenance includes a full reimage and reinstall of the operating system, all software, and all configuration files.  This ensures the reliability of our imaging system and maintains a consistent state for our users, preventing the accumulation of incidental changes that could complicate troubleshooting and interfere with the running of user jobs.  The frequency with which we reimage nodes necessitates that the process be streamlined and optimized such that node reinstallation is as quick and reliable as possible.  This project would explore methods to refine our node installation procedures and search for new efficiencies, furthering our ability to manage and maintain very large systems.
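    One small piece of such an investigation could be instrumenting the fan-out itself (a sketch; the node names and the ssh command are stand-ins for the real provisioning trigger) to see where the reinstall pipeline loses time:

      import concurrent.futures
      import subprocess
      import time

      NODES = [f"node{i:03d}" for i in range(1, 5)]   # placeholder node names

      def reimage(node):
          """Kick off a (hypothetical) reimage command on one node and time it."""
          start = time.time()
          # Stand-in for the real provisioning call.
          subprocess.run(["ssh", node, "echo", "reimage"], capture_output=True)
          return node, time.time() - start

      # Fan the operation out across nodes and surface the slowest stragglers.
      with concurrent.futures.ThreadPoolExecutor(max_workers=64) as pool:
          timings = dict(pool.map(reimage, NODES))
      print(sorted(timings.items(), key=lambda kv: -kv[1])[:3])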
  • Datacentric AI
    The Datacentric AI project aims to develop revolutionary data-centric systems that can enable edge-to-datacenter scale computing while also providing high performance and accuracy for AI tasks, high productivity for AI developers using the system, and self-driven resource management of underlying complex systems. Rapidly evolving technologies such as new computing architectures, AI frameworks, supercomputing systems, cloud, and data management are the key enablers of AI, and the speed at which they develop is outpacing the ability of AI practitioners to leverage them optimally. As this compute capability, AI frameworks, and data diversity have grown, AI models have also evolved from traditional feed-forward or convolutional networks that employ computationally simple layers to more complex networks that use differential equations to model physical phenomena. These new classes of algorithms and massive model architectures need new types of data-centric systems that can help map the novel computing requirements to increasingly complex hardware platforms such as quantum processors, neuromorphic processors, and datacenter-scale chips. A data-centric system would need revolutionary operating systems, ML-enhanced data management, highly parallel algorithms, and workload-aware schedulers that can automatically map workloads to heterogeneous hardware platforms. By developing technologies to address these needs, this project aims to provide Lincoln and DoD researchers with the tools required by future AI systems.
  • Parallel Python Programming
    There is a plethora of libraries that enable parallel programming with the Python programming language, but little has been done with the partitioned global array semantics (PGAS) approach.  Using the PGAS approach, one can deploy a parallel capability that provides good speed-up without sacrificing the ease of programming in Python. This project will explore the scalability and performance of the preliminary implementation of PGAS in Python, compare its performance with other libraries available for Python parallel programming, and potentially pursue further performance optimization of the current PGAS implementation.
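    The flavor of the approach can be sketched with mpi4py (illustrative only; the preliminary PGAS implementation referenced above has its own API): each rank owns one block of a logically global array, computes on it locally, and communication happens only when the global view is needed:

      from mpi4py import MPI
      import numpy as np

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      N = 1_000_000                       # global array length
      counts = [N // size + (r < N % size) for r in range(size)]
      lo = sum(counts[:rank])             # this rank's slice of the global index space
      local = np.arange(lo, lo + counts[rank], dtype=np.float64)

      local = np.sqrt(local)              # purely local computation, no communication

      # Re-assemble the distributed array on rank 0 (only needed for output/verification).
      gathered = comm.gather(local, root=0)
      if rank == 0:
          global_array = np.concatenate(gathered)
          print(global_array.shape, global_array[:3])

    Run with, e.g., mpirun -n 4 python pgas_sketch.py (the file name is just an example); each rank touches only its own slice of the global index space.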
  • 3D Visualization of Supercomputer Performance
    There are a number of data collection and visualization tools to assist in the real time performance analysis of High Performance Computing(HPC) systems but there is a need to analyze past performance for systems troubleshooting and system behavior analysis. Optimizing HPC systems for processing speed, power consumption, and network optimization can be difficult to do in real time so a system to use collected data to “rerun” system performance would be advantageous. Gaming engines, like Unity 3D, can be used to build virtual system representations and run scenarios using historical or manufactured data to identify system failures or bottlenecks and fine tuned to optimize performace metrics.
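    The "rerun" idea can be sketched independently of the game engine (the replay helper and sample records below are hypothetical; in the real system a Unity 3D scene would consume the records): pace historical, time-stamped samples back out at an accelerated rate so a visualization can be driven from the past:

      import time

      def replay(events, speedup=60.0):
          """Re-play time-stamped monitoring records at an accelerated rate.
          `events` is an iterable of (timestamp_seconds, record) pairs, sorted by time."""
          events = iter(events)
          prev, record = next(events)
          yield record
          for ts, record in events:
              time.sleep(max(ts - prev, 0) / speedup)   # compress real time by `speedup`
              prev = ts
              yield record

      # Example: three synthetic samples spaced one minute apart, replayed in ~2 seconds.
      samples = [(0, {"node": "n1", "load": 0.2}),
                 (60, {"node": "n1", "load": 0.9}),
                 (120, {"node": "n1", "load": 0.4})]
      for rec in replay(samples):
          print(rec)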
  • Data Analytics and 3D Game Development
    The LLSC operates and maintains a large number of High Performance Computing clusters for general-purpose accelerated discovery across many research domains. The operation of these systems requires access to detailed information regarding the status of system schedulers, storage arrays, compute nodes, network data, and data center conditions. These data collections amount to over 80 million data points per day. Effectively correlating this volume of data into actionable information requires innovative approaches and tools. The LLSC has developed a 3D Monitoring and Management (M&M) platform by leveraging Unity3D to render the physical data center space and assets into a virtual environment, which strives to provide a holistic view of the HPC resources in a human-digestible format. Our goal is to achieve a level of situational awareness that enables the operations team to identify and correct issues before they negatively impact the user experience. Some near-term goals are to fold the innovative Green Data Center challenge work and data into the M&M system to enable the identification of the carbon impacts of different job profiles across a heterogeneous compute environment.
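    As a starting point for the carbon-impact goal, the basic accounting is simple enough to sketch (all constants below are illustrative placeholders, not measured LLSC or grid values):

      def job_carbon_kg(avg_power_w, runtime_h, pue=1.3, grid_gco2_per_kwh=400):
          """Rough carbon estimate for one job: node power draw integrated over the
          runtime, scaled by data-center overhead (PUE) and grid carbon intensity.
          All constants here are illustrative placeholders."""
          energy_kwh = (avg_power_w / 1000.0) * runtime_h * pue
          return energy_kwh * grid_gco2_per_kwh / 1000.0   # grams -> kilograms

      # e.g. a 4-hour job averaging 350 W on one node:
      print(f"{job_carbon_kg(350, 4):.2f} kg CO2e")   # ~0.73 kg with these assumptions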
  • Predicting future training needs
    The ability to provide necessary documentation and training for our researchers requires that we understand the suite of applications, workflows, and software tools currently in use and that we develop insight into future trends in applications, workflow development, and software tool selection. To do this, we need to collect and analyze data from jobs run on the LLSC-SuperCloud systems, researcher help requests, the educational platform, and the user database.
    Current projects in this area include reviewing the data collection and storage methods and developing a data set that captures the information needed to provide insight into our training and researcher support needs. With respect to our user base, we would like to understand who they are, which research domains they represent, how often they use the system, how much of the system they use, and what their usage patterns look like over time. This knowledge can then be aligned with our training needs to identify education and training gaps, design a prioritized suite of new examples, create updates to existing examples, and predict the value of offering focused micro-lessons.
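    A small sketch of the kind of roll-up this enables (the job records and field names below are hypothetical; the real data would come from scheduler accounting, help requests, and the user database):

      import pandas as pd

      # Hypothetical job-accounting extract.
      jobs = pd.DataFrame({
          "user":   ["alice", "alice", "bob", "carol", "carol", "carol"],
          "domain": ["bio",   "bio",   "ml",  "ml",    "ml",    "ml"],
          "nodes":  [1, 2, 16, 1, 1, 4],
          "gpu":    [False, False, True, True, True, False],
      })

      # Usage profile per research domain: how many users, how big the jobs are,
      # and how often GPUs are requested -- a starting point for spotting training gaps.
      profile = jobs.groupby("domain").agg(
          users=("user", "nunique"),
          jobs=("user", "size"),
          median_nodes=("nodes", "median"),
          gpu_fraction=("gpu", "mean"),
      )
      print(profile)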
  • Evaluating Training Effectiveness
    Another research area for the ETO team focuses on using data to evaluate the effectiveness of our education and training modules. This research effort is related to evaluating the impact of informal training on the researcher’s HPC understanding and growth. Using data from our courses and from researcher use of the supercomputing system, we ask: do researchers use the system effectively? Have they aligned their workflow with one of the canonical HPC workflows, are they requesting the proper system resources, and are they using all that they have requested? For more information see: A Data Driven Approach to Informal HPC Training Evaluation
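    One concrete measure of "using all that they have requested" can be sketched directly from job records (hypothetical field names; the real values would come from scheduler accounting and node monitoring):

      import pandas as pd

      # Hypothetical per-job records: cores requested by the researcher versus the
      # average cores actually busy while the job ran.
      jobs = pd.DataFrame({
          "user":            ["alice", "alice", "bob", "bob"],
          "cores_requested": [48, 48, 96, 96],
          "cores_used_avg":  [40.1, 44.3, 9.8, 12.5],
      })

      jobs["utilization"] = jobs["cores_used_avg"] / jobs["cores_requested"]
      per_user = jobs.groupby("user")["utilization"].mean()

      # Researchers whose jobs use a small fraction of what they request are good
      # candidates for targeted follow-up training on resource requests.
      print(per_user[per_user < 0.5])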
  • Gamification of HPC learning modules
    Educational games add a level of engagement to training that can aid in the development of intuition about HPC concepts. This effort focuses on evaluating the use of educational games in informal learning environments. Questions to consider include: which aspects of our education and training resources are appropriate for games versus in-class educational activities, what is the balance between development effort and educational value, what data should we collect, and how do we confirm that the game supports the stated learning outcomes? Outcomes from this effort include capturing best practices for developing educational games for HPC, development tool recommendations, and development guidelines for small teams from conception to release.
  • HPC Teaching and Learning
    High Performance Computing (aka supercomputing) is an ever-evolving field. While the team has created learning modules and teaching examples for a number of long-standing workflows, we need to review existing learning modules with an eye toward recent and future trends in the applications that will require HPC resources. For example, the recent explosion of Machine Learning and Artificial Intelligence applications, many of which require supercomputing resources, necessitates extension of our existing learning module suite. This project will focus on creating new learning materials that will engage the learner while meeting the learning objectives. The delivery method for the materials is flexible (e.g., example code, learning modules, educational games, and demonstrations); however, the materials must use andragogical best practices and include evaluation criteria and an evaluation plan.