Site Reliability Engineer
Victoria, Australia · Permanent · Hybrid
Job description
The Site Reliability Engineer plays a critical role in ensuring the reliability, availability, performance and security of Champion Data’s infrastructure and cloud services.
Reporting to the Head of Security & Infrastructure, this role sits within the DevSecOps function and works closely with software engineering teams to ensure the platforms powering our sports data ecosystem operate reliably and efficiently.
The successful candidate will help manage and evolve the infrastructure that underpins Champion Data’s data platforms, internal systems and global operations. This includes monitoring platform health, responding to operational issues, improving automation, and strengthening the security and resilience of our environments.
Job requirements
- 3+ years' experience in infrastructure engineering, systems administration, DevOps or Site Reliability Engineering roles
- Hands-on experience operating and maintaining production infrastructure environments
- Practical experience working with AWS and Azure cloud services
- Experience administering Windows Server and Linux environments
- Strong understanding of monitoring, alerting and operational incident management
- Experience working with networking concepts including managed network devices, routing and wireless environments
- Experience working with automation or scripting to improve operational efficiency
- Familiarity with infrastructure-as-code concepts and tooling
- Strong analytical and troubleshooting skills with the ability to diagnose complex technical issues
- Excellent communication and collaboration skills when working with engineering teams and internal stakeholders
- A proactive mindset, focused on preventing problems rather than simply responding to them
- An interest in continuously improving platform reliability, automation and operational efficiency
Job responsibilities
- Infrastructure Reliability & Operations: Monitor, maintain and improve the reliability, availability and performance of infrastructure that supports Champion Data’s products and internal systems. Identify potential issues early and proactively implement improvements.
- Cloud Infrastructure Management: Support and maintain cloud environments, primarily within Azure and AWS, with a supporting role in GCP and OCI, ensuring systems are secure, resilient and operating efficiently.
- Monitoring & Observability: Design, implement and maintain monitoring, alerting and logging systems that provide visibility into platform health and performance. Focus on proactive detection of issues before they impact users.
- Automation & Infrastructure as Code: Automate operational tasks wherever possible and support infrastructure-as-code practices to ensure environments are repeatable, maintainable and scalable.
- Systems Administration: Administer and maintain Windows and Linux servers, virtual environments and supporting infrastructure components.
- Network & Connectivity Support: Support core networking components including firewalls, LAN, WAN, internet connectivity and wireless services used across Champion Data offices and infrastructure environments.
- Security & Patch Management: Stay across emerging security vulnerabilities and ensure systems are patched, maintained and aligned with internal security policies and standards.
- Operational Support: Respond to operational alerts and incidents across infrastructure systems and support engineering teams when platform issues arise.
- Disaster Recovery & Resilience: Participate in backup validation, disaster recovery planning and resilience testing to ensure critical systems can recover effectively from failures.
- Infrastructure Lifecycle Management: Assist with the maintenance, upgrade and decommissioning of infrastructure as systems evolve or reach end-of-life.
- Collaboration with Engineering Teams: Work closely with developers and technical leads to bridge the gap between development and operations, ensuring systems are designed and deployed with reliability in mind.
Job benefits
- Mindful Me Days: Four days per year dedicated to your wellbeing
- Flexible Working: Our team works globally in a hybrid environment, including two weeks per year when you can work remotely from anywhere in the world.
- AUS Office: Access to a newly renovated office in Southbank, Melbourne.
- Our Spaces: All of our offices have great end-of-journey facilities, including showers, a wellness/prayer room and games.
- Employee Assistance Programs: 24/7 access to financial support and counselling
- Wellness Incentive: To help support your home office setup or contribute towards a wellness activity/service of your choosing
- Get to the Game: Funds towards tickets for you and your family/friends to attend sporting events globally.
- Advance You Learning and Development: Annual budget to spend on learning and development activities, plus Brain Block Time, which is time set aside each month to learn or study.
- LinkedIn Learning: Access to LinkedIn Learning, an external learning platform
- Social Events: Including an annual Family Day & Volunteer Day
- Paid Parental Leave