Job Location | Hyderabad |
Education | Not Mentioned |
Salary | Not Disclosed |
Industry | Banking / Financial Services |
Functional Area | Service / Installation / Repair |
Employment Type | Full-time |
As a Site Reliability Engineer (SRE) you will help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure and reducing work through automation. You'll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment you'll take the lead on relevant projects, supported by an organization that provides the support and mentorship you need to learn and grow. As an SRE you'll be focused on running better production applications and systems.

As an SRE, you are responsible for the following:
- Develop, test and debug automated tasks (apps, systems, infrastructure).
- Troubleshoot minor incidents and contribute to resolution through post-mortems.
- Participate in the application or service development lifecycle through code contributions.
- Engage with tools and operations teams to address failure patterns and incidents.
- Develop automation tools for efficient, noiseless alerting and for reducing toil and technical debt.
- Conduct performance tests, and document and/or identify application optimizations.
- Own and drive incident management bridge calls and chats with production management, application development, infrastructure teams, and senior leadership, with the purpose of remediating customer-impacting incidents quickly.
- Establish strong command and control of an incident, with clear accountability and methodical evaluation of complex issue scenarios.
- Adhere competently and reliably to critical processes and procedures, and escalate appropriately in support of production incidents.
- Apply technical and environmental knowledge and experience to develop and drive appropriate work streams, forming paths to resolution.
- Distribute clear and concise communications, summarizing incidents and the business/customer experience for a wide group of technical and non-technical audiences.
- Provide detailed notes on highly visible production issues (P1-level tickets) on a timely basis to the production support staff and executive management.
- Ensure incident data is accurately captured and documented in the incident recording tools.
- Prepare appropriate materials and follow-ups for hand-off to the Root Cause Analysis phase of the Problem Management process.
- Provide additional support for any quarterly releases, conversions or projects as required.
- Work as part of a global follow-the-sun team, providing 24 x 7 production support coverage on a rotating basis.

Qualifications:
- Bachelor's degree or equivalent experience in a software engineering discipline.
- Proficiency in at least one software language (e.g. Python, Java, Go, etc.).
- Understanding of the software delivery lifecycle.
- Expertise in application, data and infrastructure architecture disciplines.
- Advanced knowledge of one or more infrastructure components (e.g. networking, cloud services, orchestration tools, containerization, compute and storage systems).
- Capable of managing service-level changes to a system or service.
- Hands-on experience with AbInitio, ETL, cloud deployment, monitoring, and ops analysis tools such as Kubernetes, Prometheus, Elasticsearch, Grafana, Kibana, Splunk, DynaTrace, etc.
- Change and Incident Management Tool (ServiceNow).
- 6+ years of experience within a technology environment is required.
- Relevant incident management experience in an enterprise-scale environment.
- Technical pedigree must include one or more of the following: Mainframe, Java, AbInitio, IMS, Oracle and DB2.
- Extensive customer service and client interaction skills.
- Possess critical thinking and troubleshooting skills.
- Ability to think and act independently to resolve production issues.
- Must display a history of achieving goals in a high-performance environment.
- Advanced analytical skills.
- Must be able to multitask in a fast-paced environment utilizing multiple tools,
Key skills: root cause analysis, root cause, thinking big, customer service, automation tools, critical thinking, production support, problem management