Building a production-grade AI infrastructure knowledge system.
GPU clusters, RDMA networks, compilers and inference engines.
3D parallelism, ZeRO optimization, multi-node scaling.
vLLM, TensorRT-LLM and production inference stack.
GPU architecture, HBM, heterogeneous computing
AI cluster design and benchmarking
InfiniBand, RoCE, NCCL optimization
Triton, TVM and kernel fusion
Megatron-LM, DeepSpeed, 3D parallelism
vLLM, TensorRT-LLM serving