STNINC V2

At STN, we don't just adapt to the digital future; we engineer it. Our mission is to help organizations thrive in a rapidly evolving technology landscape through strategic insight, cutting-edge solutions, and a security-first mindset. We provide end-to-end services spanning cloud consulting, AI infrastructure, and enterprise security, enabling secure, scalable, and future-ready transformation.

As trusted advisors, we align IT investments with business outcomes that drive performance and growth. We start with deep strategic engagement and deliver tailored solutions built for long-term impact.

Our approach is innovation-led and rooted in cybersecurity, with a focus on leveraging the right technologies to solve real-world challenges. We invest in our people and foster a culture of growth, inclusion, and purpose, because we believe empowered teams build transformative technology.

Overview
The AI Infrastructure Engineer will be responsible for designing, deploying, and maintaining robust infrastructure systems tailored for AI and machine learning operations. This role focuses on ensuring seamless performance, scalability, and reliability in distributed computing environments. You'll collaborate with data scientists, ML engineers, and DevOps teams to support large-scale AI training and inference pipelines.

Key Responsibilities

Design and implement AI infrastructure solutions, including cluster management, resource allocation, and workload orchestration for high-performance computing (HPC) environments.
Deploy, configure, and troubleshoot containerized applications using Kubernetes across various flavors (e.g., vanilla Kubernetes, Amazon EKS, Google GKE, Azure AKS, and on-premises setups).
Manage job scheduling and resource management using Slurm for efficient utilization of GPU clusters in AI training workflows.
Optimize Ubuntu-based systems for AI workloads, including kernel tuning, security hardening, and performance monitoring.
Integrate and maintain NVIDIA GPU technologies, ensuring compatibility between the CUDA toolkit and AI frameworks such as TensorFlow and PyTorch.
Monitor system performance, identify bottlenecks, and implement automation scripts for infrastructure provisioning and scaling.
Collaborate on disaster recovery planning, security compliance, and cost optimization for cloud and on-premises AI infrastructure.
Stay updated on emerging technologies in AI infrastructure and contribute to best practices documentation.

Experience & Qualifications

Required

Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
Proven expertise as an Ubuntu specialist, with hands-on experience in system administration, networking, and scripting (e.g., Bash, Python) on Ubuntu servers.
Extensive experience with Kubernetes in all major flavors, including cluster setup, scaling, networking (e.g., CNI plugins), and security (e.g., RBAC, Pod Security Standards enforced through Pod Security Admission).
Strong proficiency in Slurm for managing HPC clusters, including job submission, queue configuration, and integration with GPU resources.
3+ years of experience in infrastructure engineering, preferably in AI/ML or HPC environments.
Familiarity with cloud platforms (AWS, GCP, Azure) and container orchestration tools.
Excellent problem-solving skills and the ability to work in a fast-paced, collaborative environment.

Preferred

NVIDIA certifications (e.g., NVIDIA Certified Professional in Data Center GPU Management or CUDA Programming) are a strong plus.
Experience with other HPC schedulers (e.g., PBS, LSF) or AI-specific tools like Kubeflow.
Knowledge of infrastructure-as-code tools (e.g., Terraform, Ansible) and CI/CD pipelines.
Background in AI model deployment, monitoring tools (e.g., Prometheus, Grafana), or edge computing.

Compensation

Full-Time, Exempt
Salary: $145K-$195K, DOE (depending on experience)

Benefits

Health Coverage – Medical, Dental & Vision
Health and Dependent Care FSAs available
401(k) Plan
Unlimited Paid Time Off (PTO)
Observed Holidays Paid
Cell Phone Allowance
Collaborative, growth-driven culture
