Hiring organization

Experience
2 - 5 years
Industry
IT-Software Services
Role Category
Programming & Design
Role
DevOps Engineer
Employment Type
Full-time
Job Location
Bangalore, India
Date posted
November 18, 2021
Position title
DevOps Cloud Engineer
Description
- Provide 24/7 production support services, ensuring optimal reliability and performance of systems and infrastructure across Enterprise Suites for the Supply Chain domain.
- Provide on-call support on a rotating shift basis.
Role
Day-to-Day Live System Support
- Detect anomalies by periodically scanning monitoring tools on live production systems.
- Respond to alerts from monitors within the expected SLAs.
- Run the high-severity incident triage process via Slack or BigPanda (or a similar tool).
- Triage and resolve incidents, following predefined runbooks/automation.
- Recommend fixes for new issues, and identify patterns by observing the history of events.
- Automate manual processes to show iterative improvement.
- Document ‘Tribal’ knowledge gained and SOPs as Runbooks.
- Update Confluence when out-of-date information is discovered through triage of events.
- Perform post-incident analysis and review, providing timelines and any monitor information available.
- Perform On-Call Support for High Severity events (SEV 1, 2).
- Monitor Incident/Request queues and resolve lower priority issues (SEV 3, 4) within the SLAs.
- Partner in warranty/renewal support.
- Create/enhance production monitors and dashboards where needed to support the team’s efforts.
- Perform patching activities.
Development & Automation
- Support the Reliability Team's development efforts.
- Automate manual processes within the Reliability Team's scope of activities, including any automation that simplifies the team's operational activities.
- Detect anomalies by periodically scanning monitoring tools on live production systems.
- Provide recommendations/PR fixes for new issues.
- Create/enhance production monitors and dashboards where needed to support the team’s efforts.
- Support the Reliability Operations team with incident management on an as-needed basis.
Key skills
- Good exposure to and understanding of DevOps concepts, tools, and technologies on the operations side (Jenkins, Git, Docker, Ansible).
- Good communication skills and confidence in dealing with end users at various levels, from senior to junior, to coordinate the troubleshooting of reported issues.
- Programming and scripting knowledge is good to have (e.g., Python, Go, Shell, Perl, Bash).