Truminds Software System - Site Reliability Engineer - Chef/Ansible/APM Tools

3.00 to 9.00 Years | Bangalore | 20 Oct, 2021
Job Location: Bangalore
Education: Not Mentioned
Salary: Not Disclosed
Industry: IT - Software
Functional Area: General / Other Software, Site Engineering / Project Management
Employment Type: Full-time

Job Description

Site Reliability Engineer
Name of the Organization: Truminds Software Systems
Location: Bangalore/Bengaluru
Position Title: SRE Engineer
Years of Experience: 3+ years

Mandatory Skills:
- Relevant experience as an SRE would be an added advantage.
- Good understanding of uplifting maturity (application engineering practices and Ops).
- Understanding of software delivery lifecycles, particularly Agile/Lean and DevOps.
- Proven experience handling large-scale, growing infrastructure across data centres and heterogeneous cloud platforms.
- Team player with good communication and problem-solving skills.

Job Duties and Responsibilities:
- The DISH IT team is looking for a highly motivated, talented, and experienced SRE Specialist to be part of the Site Reliability Engineering team.
- As a Site Reliability Engineer (SRE), the person will be responsible for both uplifting and maintaining our evolving technology platforms, infrastructure, and technology controls. The role includes developing and engineering solutions that maximize system reliability and automation.
The role will address three dimensions:
- Tools coverage: assess tools coverage and ensure sufficient monitoring is in place to enable mature observability and data-driven decision making.
- Defining and educating engineering teams: processes, procedures, guide rails, and best practices.
- Culture: inculcate the culture of high-performing teams and adopt ways of working influenced by SRE.

The role will work with a global team responsible for a mission-critical business function and will partner with Infrastructure, DevOps, and Core practices teams (such as Security, Identity, ProdOps, Cloud Platform, and Tools) to identify and implement automation opportunities that drive down toil, reduce technical debt, and improve system reliability.

Key Responsibilities:
- Own the infrastructure and APM, and work with DevOps teams to build, release, monitor, and run services to improve service reliability.
- Write software to automate API-driven tasks at scale and contribute to the product codebase in Java, JS, React, Node, Go, and Python.
- Work with Ansible, Puppet, Chef, Terraform, or another configuration management/orchestration suite; know where it is broken, work towards fixing it, and explore new alternatives.
- Define and accelerate the implementation of support processes, tools, and best practices.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system reliability.
- Handle cross-team performance issues, from identifying the cause and determining areas of improvement to driving those actions to closure.
- Baseline the performance and maturity of DevOps processes, tools maturity and coverage, metrics, technology, and engineering practices.
- Define, measure, and improve reliability metrics (SLO/SLI), observability (monitoring, logging, and tracing solutions), and Ops processes (incident and problem management); streamline and automate release management.
- Be a strong believer in automation, bringing sustained continuous improvement by automating toil and runbooks and by improving the ability of applications to auto-heal, leading to improved reliability.

Experience to Include:
- Knowledge in one or more of the following key areas: Ops maturity (performance testing, monitoring, operations - SIP), APM, performance benchmarking, software design and lifecycle (planning, from discovery to provisioning), and Infosec (including compliance and security).
- Good understanding of, and implementation experience with, the 12-factor app principles.
- Experience building monitoring/metrics and alerting tools (APM tools), including a custom dashboard for each application stack against the supported environment.
- Expertise with Python-related technologies and frameworks.
- Experience with Unix/Linux OS internals and administration or networking, and subject matter expertise in at least one cloud computing infrastructure: GCP, Azure, or AWS.
- Familiarity with handling:
  A) Containerization: Kubernetes, Docker, Rancher, etc.
  B) Kafka, YARN, Elasticsearch, etc.
  C) Source code management and implementation of security best practices.
  D) Tech stack: Python, Falcon, Elasticsearch, MongoDB, AWS (SQS, S3), MapReduce.
  E) Data science (AI/ML) and analytics to predict failures and operational issues.
- Be a subject matter expert, able to upskill/cross-skill engineering teams on SRE principles, tools, and execution.
- Troubleshoot, debug, and diagnose operational issues and drive them to closure.
- Monitor the health of Dish-Sling services, and define and track reliability metrics.

Educational Qualification:
- B.E./B.Tech. in Computer Science/IT or MCA would be preferred.
- Excellent oral and written communication skills.

Key Skills:
cloud computing, problem solving, software design, release management, technology platforms, written communication, continuous improvement, reliability engineering

© 2019 Hireejobs All Rights Reserved