We are seeking an AI Cloud Platform System Engineer to build, scale, and optimize our LLM training, inference, and data platform. The role spans distributed training systems, GPU/CPU compute optimization, inference framework optimization, and the data platform that supports training and inference. You will ensure a resilient, cost-efficient platform for both training and production inference workloads, leveraging Kubernetes-native solutions.
Key Responsibilities
Distributed Training/Inference Platform Development
Platform & System Optimization
Kubernetes-Centric Development
Preferred Qualifications
Technical Skills
Education & Soft Skills