hireejobs

Watson AI Site Reliability Engineering Team Manager

2.00 to 5.00 Years   Bangalore   28 Dec, 2020
Job Location: Bangalore
Education: Not Mentioned
Salary: Not Disclosed
Industry: IT - Software
Functional Area: General / Other Software
Employment Type: Full-time

Job Description

Ready to grow your career in the cloud? Do you like the feeling that you are making a difference? This is your chance to be a leader of a dynamic team of talented professionals deploying and maintaining innovative, industry-leading, cloud-based software.

Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE is a key role in our growing and dynamic IBM Watson Cognitive AI, Planning Analytics, and Cognos Analytics business on Cloud. This leadership role is focused on managing a group of SREs who deploy, maintain, and automate a wide range of operational tasks for the IBM Watson Cognitive AI, Planning Analytics, and Cognos Analytics services on IBM Cloud environments. You will work collaboratively with the entire global cloud organization and IBM vendors to support, maintain, and operationally improve the reliability of the application.

The Watson AI, Planning and Cognos Analytics Site Reliability Engineer Manager is responsible for:

  • Ensuring that the team of local SREs provides optimal production environment support and deployment for Watson AI, Planning and Cognos Analytics services in the IBM Cloud public regions and dedicated environments.
  • Driving the incident management process and supporting a blameless post-mortem culture.
  • Partnering with development teams to improve services via rigorous testing and release procedures.
  • Overseeing a team of developers working on automation for deployments, upgrades and self-remediation.
  • Ensuring that the local team is aware of and adheres to IBM Cloud processes and security/compliance initiatives.
  • Developing metrics and reports that drive improvements in the availability of the Watson AI, Planning and Cognos Analytics services as well as in SRE team effectiveness.
Required Technical and Professional Expertise
  • 2+ years of experience managing a team of developers working on software engineering, software development, or system operations
  • Ability to multi-task and solve operational issues prior to and during customer-impacting events
  • Strong communication skills: ability to communicate (often via Slack and Webex) observations and ideas for diagnosing and preventing issues, or for improving SRE processes to shorten diagnosis and resolution
  • Ability to observe operational support techniques and make improvements to SRE processes
  • Capability to work in a global, multicultural and diverse environment
  • Ability to work AP shift hours (22:00-06:00 UTC from March to October, 23:00-07:00 UTC from November to February)
  • Ability to work as Emergency Response Manager regularly during the AP shift, and on weekends on a rotation basis (once every 5 weeks)
  • Experience with Agile methodologies, including sprint planning, and with GitHub Enterprise and ZenHub
Preferred Technical and Professional Expertise
  • Experience with customer escalations and/or operations war rooms
  • Experience with troubleshooting issues in production systems
  • Experience with DevOps engineering or SRE
  • Experience using Watson AI services (especially Watson Assistant and Watson Discovery)
  • Experience with cloud technologies such as Docker, Kubernetes and OpenShift
  • Experience working with the IBM Cloud (Bluemix) UI/CLI
  • Knowledge of the IBM Cloud stack (IAM, Cloud Foundry, ALB, Ingress, Cerberus, etc.)
  • Knowledge of COS and ICD database services (e.g. Postgres, etcd, RabbitMQ, Redis, Elastic)
  • Knowledge of Networking (HTTP, DataPower, TLS, Akamai, DNS) to troubleshoot network issues
  • Hands-on experience using source control (Git, GitHub) and CI/CD pipelines (Jenkins, Ghenkins, Tekton, etc.)
  • Experience with developing monitoring for production components and instrumenting code for observability using New Relic, LogDNA, Sysdig, Prometheus

Key skills:
java, academics, acp, algorithms, android, software development tools, new relic, data science, it operations, audio mastering, sprint planning, software business
