Platform Reliability & Scalability: Build and maintain a centralised Kubernetes-based platform to support large-scale, highly available IoT and telecommunications workloads.
GitOps & Automation: Implement and manage GitOps workflows using ArgoCD and GitLab CI/CD pipelines to automate infrastructure provisioning and application delivery.
Infrastructure as Code: Develop and maintain Terraform modules, Ansible playbooks, and script-based automation (e.g., Python) to ensure consistent, auditable, and reproducible infrastructure deployments.
Monitoring & Observability: Configure and enhance monitoring, logging, and alerting systems (e.g., Prometheus, Grafana) to ensure proactive detection and resolution of platform issues.
System Optimisation & Troubleshooting: Analyse and optimise performance across Linux, Kubernetes, and AWS components; troubleshoot and resolve production incidents efficiently.
Security & Compliance: Apply DevSecOps best practices, enforce access control, and support secure platform design.
Platform User Support: Collaborate with internal teams that use the platform, providing technical guidance, onboarding support, troubleshooting assistance, and CI/CD best-practice consultations.
Collaboration & Continuous Improvement: Participate in design reviews, retrospectives, and cross-team discussions to drive operational excellence and evolve platform capabilities.