HPC and AI Workloads Drive Storage System Design

Many organizations are tied to outdated storage systems that cannot meet the needs of HPC and AI workloads. Designing high-throughput, highly scalable HPC storage systems requires expert planning and configuration. The Dell Validated Designs for HPC Storage solution offers a way to quickly upgrade antiquated storage….

GigaIO Introduces 32 GPU Single-Node Supercomputer

Carlsbad, California, July 13, 2023 – GigaIO, a provider of workload-defined infrastructure for AI and technical computing, recently announced that it has configured 32 AMD Instinct MI210 accelerators in a single-node server using the company's FabreX PCIe memory fabric. Available today, the 32-GPU engineered solution, called SuperNODE, is designed to offer a simplified system capable of […]

22,000 GPUs: Inflection AI Building 22 exaFLOPS Generative AI Cluster

Palo Alto-based startup Inflection AI said yesterday that it is building the world's largest AI cluster, consisting of 22,000 NVIDIA H100 Tensor Core GPUs and delivering 22 exaFLOPS of performance. The company also said it has raised $1.3 billion in a funding round led by Microsoft, Reid Hoffman, Bill Gates, Eric Schmidt and new investor NVIDIA, […]

Monster API Says Its Platform Cuts AI Development Costs Up to 90%

Palo Alto, Calif., June 8, 2023 – Today Monster API is launching its platform, which gives developers access to GPU infrastructure and pre-trained AI models at a lower cost than other cloud-based options and is designed for ease of use and scalability. The platform uses decentralized computing to help developers build AI applications efficiently, saving […]

Purdue Announces GPU Expansion of Gilbreth HPC Cluster

April 27, 2023, West Lafayette, IN — The Rosen Center for Advanced Computing (RCAC) at Purdue University has added 104 new NVIDIA A100 GPUs to the Gilbreth community HPC cluster. Based on Dell PowerEdge R7525 compute nodes, each with 0.5 TB of RAM, two NVIDIA A100 Tensor Core GPUs, and 100 Gbps HDR InfiniBand, this expansion […]

Microsoft Introduces Generative AI VM on Azure with Scaling up to Thousands of GPUs

Microsoft today introduced the ND H100 v5 VM on the Azure cloud, a virtual machine for developing generative AI applications. The VM can scale from eight to thousands of NVIDIA H100 GPUs with Quantum-2 InfiniBand networking, Microsoft said, and the adoption of H100s, NVIDIA's latest data center GPUs, will accelerate performance for AI models over […]

ClearML Certified to Run NVIDIA AI Enterprise Software Suite

Tel Aviv — March 7, 2023 – ClearML, an open-source MLOps platform, today announced it has been certified to run NVIDIA AI Enterprise, an end-to-end platform for building accelerated production AI. ClearML said the certification makes its MLOps platform more efficient across workflows, enabling optimization of NVIDIA GPUs. It also ensures that ClearML is compatible with and optimized for NVIDIA DGX […]

Conventional Wisdom Watch: Matsuoka & Co. Take on 12 Myths of HPC

A group of HPC thinkers, including the estimable Satoshi Matsuoka of the RIKEN Center for Computational Science in Japan, has come together to challenge common lines of thought that they say have become, to varying degrees, accepted wisdom in HPC. In a paper entitled "Myths and Legends of High-Performance Computing," appearing this week on arXiv […]

Relief for the Solution Architect: Pushing Back on HPC Cluster Complexity with Warewulf and Apptainer

[SPONSORED CONTENT] How did you, at heart and by training a research scientist, financial analyst or product design engineer doing multi-physics CAE, end up as a… systems administrator? You set out to be one thing and became something else entirely. You finished school and began working with some hefty HPC-class clusters. One […]

Overcoming Challenges to Deep Learning Infrastructure

With use cases like computer vision, natural language processing, predictive modeling, and much more, deep learning (DL) enables far-reaching applications that change how technology can impact human existence. The possibilities are vast, and we've only scratched the surface of its potential. When designing a deep learning infrastructure, be aware of three significant obstacles: scalability, customizing for each workload, and optimizing workload performance.