HPC4AI is a cloud-HPC system targeting AI workloads. It is designed for research and allows end-to-end configuration, from hardware to OS to cloud to service. The portfolio of services includes VMs with storage and virtual clusters with access to the OpenStack API (IaaS & PaaS).
The OpenStack cloud system provides over 2,400 physical cores, 60 TB of RAM, 120 GPUs (NVIDIA T4/V100/A40), 25 Gb/s networking, and 4 storage classes with different characteristics.
The SLURM HPC system consists of 68 Intel nodes (32 cores, 128 GB RAM per node, 100 Gb/s Omni-Path), 4 Arm nodes (80-core Ampere Altra, 512 GB RAM, 2x NVIDIA A100 GPUs, 2x BlueField-2 DPUs, 100 Gb/s InfiniBand), 4 Intel nodes (40 cores, 1 TB RAM, T4 and V100 GPUs, 56 Gb/s InfiniBand), and 2 HPC storage systems (BeeGFS and all-flash Lustre).
APPLICATIONS AND SERVICES
HPC4AI High-Performance Cloud
● Deployment of computing resources based on a project/request submitted via the form on the project website; access to resources via remote console through a dedicated web service
● Use of computing and storage services
● Scientific and technical support for the design and development of new applications and services
● Support for porting, integration and optimization of scientific applications on the cloud platform
● Support for experimentation (research and innovation) in different areas of Computer Science: high-performance applications, high-frequency streaming, Big Data, Machine and Deep Learning
● Hosting and customization of systems (hardware, cloud stack, applications) in “co-design” mode, with the possibility of customization throughout the software stack (subject to a specific scientific collaboration agreement)
SLURM HPC system
● HPC applications: scientific applications on a single CPU+GPU node (R, MATLAB, C/C++, Java), MPI applications, benchmarking, on-demand job queue systems
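As a sketch of how MPI jobs are typically submitted to a SLURM system like this one, a minimal batch script might look as follows. The partition name, module name, executable, and resource sizes are illustrative assumptions, not HPC4AI's actual configuration:

```shell
#!/bin/bash
#SBATCH --job-name=mpi-example      # job name shown in the queue
#SBATCH --nodes=2                   # number of nodes (illustrative)
#SBATCH --ntasks-per-node=32       # one MPI rank per core on a 32-core node
#SBATCH --time=01:00:00             # wall-clock limit
#SBATCH --partition=cpu             # partition name is an assumption

# Load the MPI environment (module names vary per site).
module load openmpi

# Launch the MPI application across the allocated nodes.
srun ./my_mpi_app
```

Such a script would be submitted with `sbatch job.sh` and monitored with `squeue -u $USER`; GPU nodes are usually requested with an additional `--gres=gpu:N` directive.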
Cloud zone based on OpenStack technology, hosted in a Tier III-equivalent data center
- In total, 2000+ Intel CPU cores, 72 GPUs, 24+ TiB RAM, 2+ PB of storage, and a backup system with versioning on mixed flash/NVMe/SSD/SAS technology
- 16 nodes with 4 NVIDIA Turing T4 GPUs per node
- 2 nodes with 4 NVIDIA V100 SXM2 GPUs per node
- 3 storage systems with different classes of security, reliability, and speed, plus an integrated backup system, for a total of around 3 PB
HPC4AI can be used for research, innovation, and pre-commercial development by academic and industrial users.
● UNITO and UNITA Alliance researchers using HPC4AI for research projects benefit from a 50% discount; UNITO researchers with no active grants, Ph.D. students, and MSc students can access HPC4AI for free (up to a reasonable configuration size). Non-UNITO academic users and industrial users are subject to this price table.
● The organizing committee of relevant initiatives (e.g. hackathons for students) and non-profit associations are eligible for free usage (subject to availability and scientific committee evaluation).
● Cloud resources are billed monthly according to allocation; HPC resources are billed according to effective usage.
Other contacts:
Sergio Rabellino: sergio.rabellino@unito.it
Marco Aldinucci: marco.aldinucci@unito.it