ML Support Engineer, IT Engineer Staff
3 weeks ago
Company:
Job Area:
Information Technology Group, Information Technology Group > IT Engineering
General Summary:
We are seeking a highly skilled Technical Support Engineer specializing in Machine Learning (ML) operations, Kubernetes, container technologies, and Run:AI. In this role, you will be responsible for providing technical and operational support for customers leveraging GPU computing platforms to optimize and manage AI/ML workloads, particularly in Kubernetes-based environments. The ideal candidate will have deep expertise in Kubernetes orchestration and GPU management, as well as a solid understanding of how these address AI/ML operations at scale.
Key Responsibilities
- Kubernetes Orchestration & Resource Management: Serve as the subject matter expert for Kubernetes and container orchestration. Guide customers through the design and deployment of Kubernetes clusters tailored for AI/ML use cases, helping them effectively manage workloads through Run:AI. Ensure optimal resource allocation, including GPU sharing, node management, and job scheduling across clusters.
- Cluster Monitoring & Optimization: Monitor and tune Kubernetes clusters to ensure they are optimized for AI/ML workloads. Provide support on managing Kubernetes autoscaling, resource quotas, and performance monitoring of distributed ML models running on Kubernetes clusters via the Run:AI platform.
- GPU Troubleshooting & Incident Response: Diagnose and resolve complex issues involving dependencies between GPU drivers and software, NVIDIA toolkit errors, or GPU component failures.
- Run:AI Platform Support: Provide expert support for the Run:AI platform, assisting customers with the deployment, configuration, and management of Kubernetes clusters that handle AI/ML workloads. This includes setting up the platform, configuring resource pools (GPU, CPU), and optimizing Kubernetes namespaces to ensure proper orchestration of workloads.
- Workload Optimization on Kubernetes: Assist customers in optimizing dynamic resource allocation for their AI/ML workloads by utilizing the Run:AI scheduler in conjunction with Kubernetes's native tools. Help manage job preemption, scheduling priorities, and horizontal scaling of workloads across clusters.
- Kubernetes Troubleshooting & Incident Response: Diagnose and resolve complex issues related to Kubernetes cluster management, including pod failures, node connectivity issues, and namespace misconfigurations. Provide support in handling incidents such as job contention, GPU misallocation, and failed containerized workloads, ensuring smooth operation across the entire Kubernetes environment.
- Integration Support: Help customers integrate Run:AI into their existing Kubernetes-based ML infrastructure. Ensure seamless operation of AI/ML pipelines, covering data flow, distributed training, and model deployment. Troubleshoot issues arising from the interaction between Run:AI, Kubernetes, and other ML tools (e.g., TensorFlow, PyTorch, Kubeflow).
- Security and Best Practices in Kubernetes: Advise customers on security best practices for Kubernetes clusters handling sensitive ML workloads, such as secure pod communications, role-based access control (RBAC), and resource isolation for multi-tenant clusters. Ensure Kubernetes and containerized environments are secure and compliant with organizational policies.
- Collaboration with HQ: Work closely with the engineering and product teams in HQ, providing feedback on Kubernetes-related issues, cluster optimization features, and improvements to the Run:AI platform. Escalate complex issues and contribute to ongoing platform development.
- Training & Documentation: Develop training materials and deliver technical workshops on using Run:AI in Kubernetes environments. Maintain up-to-date documentation on best practices for configuring and managing Kubernetes clusters for AI/ML workloads, focusing on high availability, performance, and security.
Minimum Qualifications:
• 4+ years of IT-related work experience with a Bachelor's degree.
OR
• 7+ years of IT-related work experience without a Bachelor's degree.
Physical Requirements:
• Frequently transports and installs equipment up to 20 lbs.
Requirements
- 3+ years of experience in technical support roles with strong expertise in Kubernetes administration, container orchestration, and AI/ML workload management.
- 1+ year of general GPU administration experience, addressing driver conflicts, hardware failures, and performance issues.
- In-depth knowledge of Kubernetes (CKA or CKAD certification highly preferred), including core components such as the kubelet, kube-apiserver, and kube-scheduler.
- Proficiency in Kubernetes resource management (e.g., CPU/GPU allocation, pods, services, and namespaces) and troubleshooting common Kubernetes issues in production environments.
- Experience with configuration management tools (Puppet, Chef, Ansible) and Kubernetes management platforms such as Rancher is a plus.
- Experience with the Run:AI platform or similar tools for ML workload optimization (e.g., Kubeflow, MLflow, Slurm) in Kubernetes environments.
- Hands-on experience with Docker and containerized environments for AI/ML operations, including distributed training, scaling, and deployment.
- Strong understanding of ML frameworks (e.g., TensorFlow, PyTorch) and how they interact with Kubernetes clusters for model training and deployment.
- Excellent analytical, communication, and problem-solving skills.
- Ability to manage priorities in a fast-paced environment and collaborate within a matrix organization.
*References to a particular number of years of experience are for indicative purposes only. Applications from candidates with equivalent experience will be considered, provided that the candidate can demonstrate an ability to fulfill the principal duties of the role and possesses the required competencies.
Applicants: Qualcomm is an equal opportunity employer. If you are an individual with a disability and need an accommodation during the application/hiring process, rest assured that Qualcomm is committed to providing an accessible process. You may e-mail or call Qualcomm's toll-free number found . Upon request, Qualcomm will provide reasonable accommodations to support individuals with disabilities to be able to participate in the hiring process. Qualcomm is also committed to making our workplace accessible for individuals with disabilities. (Keep in mind that this email address is used to provide reasonable accommodations for individuals with disabilities. We will not respond here to requests for updates on applications or resume inquiries).
Qualcomm expects its employees to abide by all applicable policies and procedures, including but not limited to security and other requirements regarding protection of Company confidential information and other confidential and/or proprietary information, to the extent those requirements are permissible under applicable law.
To all Staffing and Recruiting Agencies: Our Careers Site is only for individuals seeking a job at Qualcomm. Staffing and recruiting agencies and individuals being represented by an agency are not authorized to use this site or to submit profiles, applications or resumes, and any such submissions will be considered unsolicited. Qualcomm does not accept unsolicited resumes or applications from agencies. Please do not forward resumes to our jobs alias, Qualcomm employees or any other company location. Qualcomm is not responsible for any fees related to unsolicited resumes/applications.