Q&A: Designing and Building UAlbany's AI Supercomputer

Chief Information Officer Brian Heaton, left, seen here in the Data Center with President Havidán Rodríguez, first came to UAlbany as an undergraduate biology major. Then his hobby interest in computers took him in a new direction. (Photo by Brian Busher)

By Jordan Carleo-Evangelist

ALBANY, N.Y. (Nov. 14, 2024) — The University’s new AI supercomputer wouldn’t be possible without the complex IT infrastructure that supports it. That infrastructure is built and maintained by UAlbany’s Information Technology Services, overseen by Chief Information Officer Brian Heaton.

Heaton and his staff of 117 were integral to the NVIDIA DGX system’s design and construction, and the supercomputer is housed in the ITS Data Center off Fuller Road.

In addition to the 192 NVIDIA Tensor Core GPUs that power it, the new AI system is networked within the Data Center by about three miles of fiber-optic cable — roughly the distance of one loop around University Drive. From start to finish, design and construction took about 20 months.

Beyond computing hardware, the University’s investments in AI Plus have included nine additional technical and research support positions within ITS to help faculty make the most of the new NVIDIA and IBM artificial intelligence clusters installed over the last year.

Heaton is a two-time UAlbany alumnus (BS in biology ’93, MA in geography ’98) who has worked in ITS for nearly three decades. He told us what makes a supercomputer a supercomputer, what it took to design the system and how running it is a bit like running a time-share.

How would you explain a supercomputer to a kindergartener?

A supercomputer is like having a lot of friends helping you with a big job. It's a very powerful computer made up of many smaller computers working together to solve really big problems quickly. The smaller computers in a supercomputer help solve problems much faster than a regular computer.

What was the most challenging part of this project?

The biggest challenge was designing an AI system that would be able to do the many different things that our researchers would ask of it, knowing full well that we’re a big university with diverse computational needs. The whole premise of AI Plus is to make AI available to everyone on campus. We met with researchers from across the institution to hear about the projects they’re going to work on. At the end of that process, we had to design something that accommodates all those use-cases, not just some of them. 

Some people are just looking to get their research done or teach their class, and they don’t need to know the fine detail of AI computations. For them, we built a system with an interface that allows any researcher to use it without stumbling over technical issues. But on the flip side, there are many researchers who have been doing AI for a long time, and they need command-line access to the system. They want to roll up their sleeves and get right down into it. We offer that, too. We’re accommodating the full range of interest and ability.

UAlbany researchers have been using AI for years. What makes the new system different?

Generally, what has happened in the past is our researchers have used grant funding to buy much smaller standalone AI systems for their projects. They have not had the funding to buy anything of the magnitude that we just installed. From a university perspective, that approach is extremely inefficient. You may have a number of small AI systems, but they’re all siloed. Somebody can’t say, “I’d like to use the free time on Professor Y’s AI system,” or “I would like to use the computing capability of three or six of those islands to run a massive job.” You can’t do any of that because none of those systems were designed with that type of scalability in mind. 

Our large system offers that flexibility. Researchers who need the dedicated, always-at-their-fingertips access that they’re accustomed to having on their island AI machine can now use that grant funding to buy priority access to our larger system. We’ve joked many times that we’re essentially running a time-share business — with the goal of giving more faculty access to extremely advanced computing as efficiently as possible.

How will the new ITS support positions assist faculty with their research and teaching?

We knew that building the system alone without the staff in place to get the most out of it would be a missed opportunity. So we hired infrastructure automation engineers and a data storage engineer — the folks who are in the back room managing and keeping the system humming along and healthy at a high level of performance. We also hired research technology analysts who will work directly with faculty when they get onboarded to the system to help them figure out what their needs are, what tools they might need and what systems are best for them — including advising on how many GPUs they might need for their project. We’ve also hired AI developer analysts. That’s an IT support position at a different skill level that has the capability of working with researchers who are on the very advanced end of the technical spectrum and need a more advanced level of support. The final piece of the puzzle was adding an educational technology analyst position to focus on helping faculty bring AI into a wide range of academic courses. 

How did you transition from being an undergrad biology major to running the IT systems for a major research university?

In my youth, I had an interest in becoming a cardiologist. In my freshman year of college, I discovered that my natural reaction to the visuals associated with blood was less than optimal. While I finished that degree, I no longer had my earlier excitement and drive in that professional direction. One day my landlord noticed how I repurposed the dining room in my apartment with a fancy computer console and covered an entire wall with topographic maps of the Capital Region. I’ve always had a hobby interest in computers and maps. Unbeknownst to me, his New York State agency was involved in a research collaboration with UAlbany’s Department of Geography and Planning. He recommended that I explore their master's program since it lends itself to careers that involve using computers to analyze and present mapping data — today's Google Maps, for example. I did exactly that, earned that degree and was hired by the Geography Department along the way as a research analyst using computers to analyze satellite imagery and aerial photography — similar to today’s Google Earth. Eventually ITS hired me because those particular computer skills were increasing in demand. Years went by as I moved through many roles and positions in ITS, and they never figured out how to get rid of me.