H100 GPU Rental

From $1.33/hr - NVIDIA's Most Powerful GPU for AI

The NVIDIA H100 Tensor Core GPU delivers unprecedented acceleration for AI, data analytics, and HPC workloads. Built on the Hopper architecture with 80GB HBM3 memory and 4th generation Tensor Cores, the H100 is purpose-built for large-scale AI training, inference, and scientific computing. Deploy instantly on Spheron's infrastructure with InfiniBand connectivity for maximum performance.

Technical Specifications

| Specification | Value |
| --- | --- |
| GPU Architecture | NVIDIA Hopper |
| VRAM | 80 GB HBM3 |
| Memory Bandwidth | 3.35 TB/s |
| Tensor Cores | 4th Generation |
| CUDA Cores | 16,896 |
| FP64 Performance | 34 TFLOPS |
| FP32 Performance | 67 TFLOPS |
| TF32 Performance | 989 TFLOPS (with sparsity) |
| FP16 Performance | 1,979 TFLOPS (with sparsity) |
| INT8 Performance | 3,958 TOPS (with sparsity) |
| System RAM | 116 GB DDR4 |
| vCPUs | 26 vCPUs |
| Storage | 2.4 TB NVMe SSD |
| Network | InfiniBand Available |
| TDP | 700W |

Ideal Use Cases

🤖 LLM Training

Train state-of-the-art language models with billions of parameters efficiently using Transformer Engine and FP8 precision; a minimal training sketch follows the list below.

  • GPT-style models up to 175B+ parameters
  • BERT and transformer-based architectures
  • Multi-modal models combining text, image, and audio
  • Fine-tuning foundation models for specific domains
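
For example, here is a minimal sketch of an FP8 forward/backward pass with NVIDIA Transformer Engine on an H100. It assumes the transformer-engine package is installed; the layer size and batch shape are illustrative, not a recommended configuration.

```python
# Minimal FP8 training step with NVIDIA Transformer Engine (sizes are illustrative).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

model = te.Linear(4096, 4096, bias=True).cuda()             # TE layer with FP8 support
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

inp = torch.randn(1024, 4096, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):  # forward pass runs in FP8
    out = model(inp)
out.sum().backward()                                        # backward as usual
```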

AI Inference at Scale

Deploy production AI inference workloads with industry-leading performance and low latency for real-time applications; a minimal serving sketch follows the list below.

  • Real-time chatbots and conversational AI
  • Recommendation systems serving millions of users
  • Computer vision for autonomous systems
  • Natural language processing APIs
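
As one common option, a model can be served on a single H100 with vLLM; a minimal offline-inference sketch is below. The model name and sampling settings are illustrative assumptions, not Spheron defaults.

```python
# Minimal offline LLM inference with vLLM (model and settings are illustrative).
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")      # any HF-compatible model
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Explain InfiniBand in one paragraph."], params)
print(outputs[0].outputs[0].text)
```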

🎨 Generative AI & Diffusion Models

Create and deploy generative AI applications including image, video, and text generation at production scale; a minimal image-generation sketch follows the list below.

  • Stable Diffusion and DALL-E style image generation
  • Video synthesis and editing AI models
  • Text-to-speech and voice cloning
  • Code generation and completion models
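
For instance, a minimal text-to-image sketch with Hugging Face diffusers might look like this; the model ID and prompt are illustrative.

```python
# Minimal Stable Diffusion image generation with diffusers (model/prompt illustrative).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

image = pipe("an astronaut riding a horse, photorealistic").images[0]
image.save("astronaut.png")
```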

🔬 High-Performance Computing (HPC)

Accelerate scientific computing, simulations, and data analytics workloads with double-precision floating point; a minimal FP64 timing sketch follows the list below.

  • Molecular dynamics and drug discovery
  • Weather and climate modeling
  • Computational fluid dynamics
  • Quantum chemistry simulations
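
As a quick sanity check of double-precision throughput, a sketch like the following times an FP64 matrix multiply in PyTorch; the matrix sizes are arbitrary.

```python
# Minimal FP64 matmul timing on the GPU (sizes are arbitrary).
import torch

a = torch.randn(8192, 8192, dtype=torch.float64, device="cuda")
b = torch.randn(8192, 8192, dtype=torch.float64, device="cuda")

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
c = a @ b
end.record()
torch.cuda.synchronize()                        # wait for the kernel to finish
print(f"FP64 matmul: {start.elapsed_time(end):.1f} ms")
```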

Pricing Comparison

| Provider | Price/hr | vs. Spheron |
| --- | --- | --- |
| Spheron (Best Value) | $1.33/hr | - |
| RunPod | $2.40/hr | 1.8x more expensive |
| Vultr | $2.99/hr | 2.2x more expensive |
| Nebius | $3.08/hr | 2.3x more expensive |
| Lambda Labs | $3.29/hr | 2.5x more expensive |
| CoreWeave | $4.25/hr | 3.2x more expensive |
| Azure | $6.98/hr | 5.2x more expensive |
| AWS | $7.00/hr | 5.3x more expensive |
| Google Cloud | $11.06/hr | 8.3x more expensive |

Performance Benchmarks

| Benchmark | H100 Speedup | Baseline |
| --- | --- | --- |
| LLM Training (GPT-3 175B) | 3.5x faster | A100 80GB |
| BERT Large Training | 4.2x faster | A100 80GB |
| ResNet-50 Inference | 3.8x faster | A100 80GB |
| Stable Diffusion (512x512) | 2.9x faster | A100 80GB |
| DLRM Training | 4.5x faster | A100 80GB |
| T5 Model Training | 3.2x faster | A100 80GB |

InfiniBand Configuration for Multi-GPU Clusters

Spheron's H100 GPUs come with InfiniBand connectivity for ultra-low-latency multi-GPU training, making them ideal for distributed workloads that need high-bandwidth GPU-to-GPU communication. A minimal distributed-training sketch follows the feature list below.

  • 400 Gb/s InfiniBand connectivity per GPU
  • NVIDIA ConnectX-7 network adapters
  • RDMA over Converged Ethernet (RoCE) support
  • GPUDirect RDMA for zero-copy GPU memory access
  • Optimized for NCCL collective operations
  • Sub-microsecond latency for GPU-to-GPU communication
  • Automatic network topology detection
  • Support for NVIDIA NVLink and NVSwitch configurations
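
A minimal PyTorch DDP sketch over the NCCL backend, which transparently uses InfiniBand and GPUDirect RDMA when available, is shown below; the model and data are placeholders. Launch with, e.g., `torchrun --nproc_per_node=8 train.py`.

```python
# Minimal DDP training step over NCCL (model and data are placeholders).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")      # NCCL rides on InfiniBand/GPUDirect RDMA
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = DDP(torch.nn.Linear(1024, 1024).cuda(), device_ids=[local_rank])
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(32, 1024, device="cuda")
loss = model(x).pow(2).mean()                # dummy loss for illustration
loss.backward()                              # gradients all-reduce over NCCL
opt.step()
dist.destroy_process_group()
```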

Frequently Asked Questions

What makes the H100 better than the A100?

The H100 features the next-generation Hopper architecture with 4th gen Tensor Cores, offering up to 4x faster training and 30x faster inference compared to A100. It includes 80GB of faster HBM3 memory (vs HBM2e), Transformer Engine with FP8 precision support, and significantly improved memory bandwidth at 3.35 TB/s. The H100 is specifically optimized for large language models and transformer-based architectures.

Can I use H100 for inference workloads?

Absolutely! The H100 excels at both training and inference. With Transformer Engine and FP8 precision, H100 delivers industry-leading inference performance, especially for large language models. However, for pure inference workloads at scale, you might also consider our A100 or L40S options which offer excellent cost-performance ratios.

Do you support multi-GPU configurations?

Yes! Spheron supports multi-GPU configurations up to 8x H100 per node with InfiniBand networking. We offer both NVLink-connected GPUs for maximum bandwidth and InfiniBand clusters for distributed training across multiple nodes. For large-scale training, we provide bare metal clusters of up to 10 nodes (80x H100 GPUs). Our infrastructure is optimized for frameworks like PyTorch DDP, DeepSpeed, and Megatron-LM.

What deep learning frameworks are supported?

All major frameworks are supported: PyTorch, TensorFlow, JAX, MXNet, and more. We provide pre-configured Docker images with CUDA 12.1+, cuDNN, NCCL, and optimized libraries. You can also bring your own Docker images or use custom environments.

How does InfiniBand improve training performance?

InfiniBand provides 400 Gb/s bandwidth with sub-microsecond latency, essential for multi-GPU distributed training. With GPUDirect RDMA, data transfers directly between GPU memory across nodes without CPU involvement, dramatically reducing communication overhead. This is crucial for large model training where gradient synchronization can become a bottleneck.
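
To verify that NCCL is actually using InfiniBand and GPUDirect RDMA, a few environment variables are commonly set before initializing the process group. The values below are illustrative; consult the NCCL documentation for your installed version.

```python
# Commonly used NCCL settings for InfiniBand/GPUDirect (values are illustrative).
import os

os.environ["NCCL_DEBUG"] = "INFO"          # log which transport NCCL selects
os.environ["NCCL_IB_DISABLE"] = "0"        # keep the InfiniBand transport enabled
os.environ["NCCL_NET_GDR_LEVEL"] = "SYS"   # allow GPUDirect RDMA across the system
# Set these before torch.distributed.init_process_group(backend="nccl").
```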

What's the minimum rental period?

There's no minimum! Spheron charges by the hour with per-minute billing granularity. Rent an H100 for just an hour to test your workload, or keep it running for months. You only pay for what you use with no long-term contracts or commitments.

Can I get persistent storage with my H100 instance?

Not at the moment. Each H100 instance comes with 2.4 TB of high-speed NVMe SSD storage, but attaching additional persistent volumes that survive instance termination is not currently supported.

How quickly can I deploy an H100 instance?

H100 instances are typically ready in 60-90 seconds. Our infrastructure is pre-warmed and optimized for rapid deployment. You can provision, configure, and start training on an H100 in under 2 minutes using the Spheron app.

What regions are H100s available in?

H100 GPUs are currently available in the US, Europe, and Canada. We're continuously expanding capacity and regions. Check our app or contact sales for specific region requirements.

Do you offer support for production deployments?

Yes! We provide 24/7 technical support for production workloads. Our team has deep expertise in GPU infrastructure and can help troubleshoot issues with GPU VMs and bare metal servers. Enterprise customers get dedicated support channels and SLA guarantees. Book a call with our team.

Can I run H100 on Spot instances? What are the risks?

Yes, Spheron offers Spot instances for H100 at significantly reduced rates (up to 70% savings). However, Spot instances can be interrupted when demand increases. Key risks:

  • Job interruption during training or inference
  • Loss of unsaved state or checkpoints
  • Needing to restart from the last saved checkpoint

Best practices: checkpoint frequently (every 15-30 minutes; see the sketch below), use Spot for fault-tolerant workloads, save model weights off the instance regularly, and prefer Spot for development and testing rather than production inference. For critical production workloads or multi-day training jobs, we recommend dedicated instances with SLA guarantees.
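
A minimal checkpointing loop along those lines might look like the following; the save path, interval, and model are illustrative placeholders.

```python
# Periodic checkpointing so a Spot interruption costs only the time since the
# last save (path, interval, and model are placeholders).
import time
import torch

model = torch.nn.Linear(1024, 1024).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
CKPT_PATH = "/workspace/ckpt.pt"             # hypothetical save location
SAVE_EVERY = 15 * 60                         # seconds between saves

last_save = time.time()
for step in range(100_000):
    x = torch.randn(32, 1024, device="cuda") # placeholder batch
    loss = model(x).pow(2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
    if time.time() - last_save > SAVE_EVERY:
        torch.save({"step": step,
                    "model": model.state_dict(),
                    "opt": opt.state_dict()}, CKPT_PATH)
        last_save = time.time()
```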

Ready to Get Started with H100?

Deploy your H100 GPU instance in minutes. No contracts, no commitments. Pay only for what you use.


Spheron

Made with ❤️ from UAE

Start Building Now