- Oversee the design, implementation, and maintenance of IT systems that support operational activities, ensuring high availability and performance of GPU resources.
- Provide technical guidance across complex infrastructure projects.
- Develop and execute operational strategies that align with the company’s goals for GPU-as-a-Service, focusing on scalability, efficiency, and reliability.
- Lead and mentor a diverse team of technology professionals, facilitating a culture of innovation, accountability, and continuous improvement.
- Manage relationships with key vendors and third-party service providers to ensure compliance with service level agreements (SLAs) and industry standards.
- Identify opportunities for process improvements across operations. Implement best practices to enhance productivity, reduce costs, and improve service quality.
- Work closely with product development, sales, and marketing teams to ensure seamless integration of services and alignment with customer needs.
- Ensure all operations comply with relevant laws, regulations, and industry standards related to data protection and service delivery.
Requirements
- Bachelor’s degree in Computer Science or a related technical field.
- 10+ years of proven experience in an operations or technology leadership role within the IT or cloud services industry.
- Strong understanding of GPU technologies and cloud computing principles.
- Demonstrated experience in managing complex IT systems and operational processes.
- Exceptional analytical and troubleshooting skills.
- Solid understanding of Kubernetes environments and the ability to debug issues within them.
- Familiarity with energy-efficient computing and sustainable data center operations.
- Proven ability to manage priorities in a dynamic, fast-paced environment.
- Hands-on expertise and comprehensive knowledge of CPU/GPU clusters and platforms.
- Exceptional communication skills, capable of discussing both technical and non-technical topics with diverse audiences.
- Strong interpersonal skills, with a proven ability to develop professional relationships across business and technical teams.
- Ability to manage multiple projects simultaneously while maintaining attention to detail.
- Knowledge of operating and managing processes on CPU/GPU clusters.
- Strategic thinker with the ability to implement innovative solutions that drive business success.
- Excellent documentation skills to effectively articulate technical designs, issues, procedures, and assessments.