Senior Platform Operations Engineer (Remote Opportunity)
Hyatt is seeking an enthusiastic Senior Platform Operations Engineer to join the Platform Operations team, a group of 7 other professionals who love what they do. In this role, you will collaborate closely with the broader Platform, NOC, and Product teams and be instrumental in keeping Hyatt a leading hospitality company by maintaining and supporting our various platforms. You'll be part of a team that is passionate about diversity, equity, and inclusion; nurturing curiosity and new skills; and building connections across the organization with stakeholders, colleagues, and guests.
Who We Are
At Hyatt, we believe in the power of belonging and cultivating a culture of care where our colleagues become family. Since 1957, our colleagues and our guests have served as the heart of our business and made Hyatt one of the best hospitality brands in the world, with more than 1,200 hotel, all-inclusive, and wellness resort properties in 71 countries across six continents. As we continue to grow, we never lose sight of what's most important: People.
We are in a time of extraordinary transformation. Passion for personal travel combined with the explosive growth of the global business has underpinned our growth for years. Hyatt is at the epicenter of the evolution of travel, and we are looking for passionate changemakers to be a part of our journey. At the heart of Hyatt is our shared belief that hospitality is more than just a job: it's a career for people who care.
How We Care for Our People
Well-being is the ultimate realization of our purpose - we care for people so they can be their best. We believe this focus on our colleagues is the key to our success and we've earned a place on Fortune's prestigious "100 Best Companies to Work For®" for the last eight years, ranking No. 16 in 2022.
We're proud to offer exceptional corporate benefits which include:
- Annual allotment of free hotel stays at Hyatt hotels globally
- Flexible work schedule and location
- Work-life benefits including well-being initiatives such as a complimentary Headspace subscription and a discount at the on-site fitness center
- A global family assistance policy with paid time off following the birth or adoption of a child as well as financial assistance for adoption
- Paid Time Off, Medical, Dental, Vision, 401(k) with company match
Our Commitment to Diversity, Equity, and Inclusion
Our success is underpinned by our diverse, equitable, and inclusive culture. We are committed to diversity across the board, from whom we hire and develop to the organizations we support and those we buy from and work with.
Being part of Hyatt means always having space to be you. Our global teams are a mosaic of cultures, ethnicities, genders, ages, abilities, and identities. We constantly strive to reflect the world we care for with teams that achieve and grow together. To learn more about our commitments to DE&I, please visit the Why Hyatt section of the Hyatt career page.
Who You Are
As our ideal candidate, you understand the power and purpose of our Culture of Care and embody our core values of Empathy, Inclusion, Integrity, Experimentation, Respect, and Well-being. You enjoy working with a close, fun team, are results-driven, and want a variety of opportunities to develop personally and professionally.
Primary responsibilities include providing technical support and maintenance of containers on Kubernetes. The ideal person for this position is skilled at and passionate about solving technical problems while also focusing on the guest and colleague experience. The Platform Operations team is responsible for the daily support and maintenance of the Hyatt.com production and test environments. The team ensures Hyatt.com is operating within expected page response times, error rates, and availability. Team members report incidents and problems to internal Hyatt Business and Product Owners as well as external vendors.
Platform Operations works closely with the development team and third-party vendors to identify and resolve critical production issues. The team also works with the Platform Engineering team on test and production deployments using Agile methodologies. Team members collaborate to establish new technical operational strategies in cooperation with hosting vendors and internal technology teams. Individuals with an "automate everything" mentality will thrive in Platform Operations, where there is a strong push to automate operational procedures, deployments, and repeatable tasks.
- Manage and tune Docker containers and Kubernetes pods/deployments.
- Assist development in troubleshooting issues with containers.
- Conduct root cause analysis in close collaboration with the development and engineering team.
- Provide Tier 3 Production 24x7 on-call support (rotation) supporting various technologies.
- Establish and monitor Key Performance Indicators (KPIs) for the website and applications (microservices).
- Implement and measure operational strategies to improve site reliability and availability.
- Remediate all security vulnerabilities pertaining to products managed by the team.
- Support software releases using established automated procedures.
- Automate repeatable and frequent operational processes and procedures.
- Participate in Change, Incident, and Problem management.
- Foster a culture of deep learning through blameless post-mortems to improve the shared goal of reliability across services.
- Improve and document operational and troubleshooting procedures.
- Provide technical guidance and training to other team members.
Qualifications
- 5+ years' hands-on experience managing and troubleshooting issues on Docker and Kubernetes.
- Hands-on experience with infrastructure as code using Ansible, Helm charts, etc.
- 5+ years' experience with Nginx or a similar tool.
- 5+ years' experience supporting modern web and mobile application environments.
- 4+ years' experience tuning and troubleshooting application performance for memory, CPU, and thread usage, and database pooling.
- 4+ years' experience with Bash, Python, Perl, or a similar scripting language.
- 4+ years' experience with an APM tool such as Datadog.
- 4+ years' experience with a log aggregator tool.
- Experience with SaltStack or a similar configuration management tool.
- Working knowledge of incident, problem, and change management best practices.
- Strong business and technical acumen, with the ability to communicate effectively with others.
- Strong organizational and prioritization skills.
- Working knowledge of CDNs, load balancers, and DNS.
The position responsibilities outlined above are in no way to be construed as all-encompassing. Other duties, responsibilities, and qualifications may be required and/or assigned as necessary.