We are seeking a Hardware Solutions Engineer to design, validate, and support customized server and infrastructure solutions for customers across AI, HPC, and enterprise environments. This role bridges customer requirements with engineering execution, ensuring scalable, high-performance, and reliable hardware solutions.
Key Responsibilities
Design, architect, and implement high-performance computing (HPC) and AI/ML infrastructure solutions, including GPU-accelerated clusters, high-speed storage, and low-latency networking.
Lead the end-to-end solution lifecycle from pre‑sales architecture through system integration, validation, and deployment in customer environments.
Collaborate with sales and customers to translate requirements into scalable AI/HPC system designs, including compute, GPU topology, interconnect (InfiniBand/Ethernet), and storage architectures.
Drive OEM server and appliance configuration, including BOM definition, firmware/BIOS tuning, thermal/power considerations, and manufacturability alignment.
Work closely with manufacturing and integration teams to ensure system‑level validation, rack integration, burn‑in testing, and production readiness for complex server appliances.
Deliver technical presentations, demos, and proofs of concept (POCs) focused on AI workloads (training/inference), HPC simulations, and data‑intensive applications.
Support cluster bring‑up and optimization, including OS provisioning, workload managers (e.g., Slurm), container platforms, and GPU software stacks (CUDA, drivers, AI frameworks).
Provide advanced troubleshooting and performance tuning across compute, GPU, storage, and networking subsystems.
Perform on‑site or remote deployment support, including rack‑level integration, cluster commissioning, and acceptance testing.
Develop and maintain technical documentation, including solution architectures, integration guides, and manufacturing/test procedures.
Interface with cross‑functional teams (engineering, manufacturing, support) to resolve complex system‑level issues and improve product quality.
Stay current with emerging technologies in AI infrastructure, HPC architectures, GPU platforms, and data center design.
Qualifications
Bachelor’s degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience).
Hands‑on experience with HPC clusters, AI/ML infrastructure, or GPU‑accelerated systems.
Strong knowledge of server architecture (x86/ARM), CPU/GPU platforms (NVIDIA/AMD), memory, storage, and networking.
Experience with high‑speed interconnects (InfiniBand, RDMA, NVLink, high‑performance Ethernet).
Familiarity with OEM server platforms and system integration, including BIOS/BMC, firmware, and hardware validation.
Understanding of manufacturing and system integration processes (rack integration, burn‑in, QA, and deployment workflows).
Experience with Linux environments, scripting (Bash/Python), and system automation tools.
Knowledge of AI/HPC software stacks (CUDA, Kubernetes, Slurm, Docker, AI frameworks like PyTorch/TensorFlow) is highly desirable.
Strong analytical, troubleshooting, and performance optimization skills across the full system stack.
Excellent communication skills with the ability to work across customers, sales, and engineering/manufacturing teams.
Ability to operate both independently and in cross‑functional environments.
Preferred Skills
Experience designing or deploying AI training clusters or large‑scale HPC environments.
Exposure to liquid cooling, high‑density rack design, or power/thermal optimization.
Familiarity with OEM/ODM workflows and contract manufacturing environments.
Experience with benchmarking and performance tuning (e.g., MLPerf, IO benchmarks, MPI workloads).