Job Title: CloudOps Lead Engineer
Location: Plano, TX
Role Overview
We are looking for a motivated, hands-on CloudOps Lead Engineer to join our team and play a critical role in deploying, operating, and maintaining our cloud-based products. This role is essential to ensuring smooth deployments, high system reliability, and strong operational performance across our environments.
The ideal candidate has a strong “want-to-learn” mindset, is eager to deeply understand our ecosystem and architecture, and demonstrates ownership, leadership, and the ability to drive initiatives independently.
Key Responsibilities
Cloud Operations & Deployments
Support and manage cloud deployments to ensure reliable, repeatable, and high-quality releases.
Own day-to-day cloud operational health, including uptime, performance, and stability.
Work closely with engineering teams to support product deployments and operational readiness.
Automation & Scripting
Design, develop, and maintain automation scripts to improve deployment quality, efficiency, and consistency.
Continuously enhance CI/CD and operational workflows through scripting and tooling.
Reduce manual effort and operational risk through automation-first approaches.
Monitoring & System Health
Implement and maintain monitoring, alerting, and logging to ensure system health and performance.
Proactively identify issues, perform root cause analysis, and drive permanent fixes.
Ensure systems are scalable, resilient, and performant.
Architecture & Ecosystem Understanding
Develop a deep understanding of the platform architecture, cloud ecosystem, and dependencies.
Contribute to operational best practices, standards, and continuous improvement initiatives.
Act as a self-starter who can independently identify gaps and propose solutions.
LLM & AI Enablement
Apply Large Language Models (LLMs) in cloud operations use cases such as automation, observability, diagnostics, or operational intelligence.
Stay current with advancements in LLMs and AI-driven tooling and apply them pragmatically within the Cloud Ops domain.
Collaborate with engineering teams to integrate LLM-based capabilities into operational workflows.
Required Skills & Qualifications
Technical Skills
Hands-on experience with cloud platforms (Azure).
Strong scripting skills (e.g., Python, Bash, PowerShell, or similar).
Experience with deployment pipelines, automation, and monitoring tools.
Solid understanding of cloud infrastructure, networking, and application operations.
LLM & AI Experience
Practical experience working with Large Language Models (LLMs).
Familiarity with applying LLMs to engineering or operational workflows is required.
Professional Attributes
Strong desire to learn and deeply understand complex systems.
Self-starter with the ability to take ownership and drive initiatives independently.
Demonstrated leadership, accountability, and a problem-solving mindset.
Strong collaboration and communication skills.