
Observability Engineer

5.00 to 7.00 Years   Kolkata   24 Jun, 2021
Job Location: Kolkata
Education: Not Mentioned
Salary: Not Disclosed
Industry: Banking / Financial Services
Functional Area: Quality (QA-QC)
Employment Type: Full-time

Job Description

Site Reliability Engineering (SRE) at DBS combines software and systems engineering to build, run, and maintain high-performance, distributed, fault-tolerant, and resilient financial systems. Site Reliability Engineers focus on ensuring a joyful customer journey.

As a Site Reliability Engineer, you will fill a mission-critical role, ensuring that our systems are healthy, monitored, automated, fault-tolerant, and designed to scale. You will work closely with engineering teams to continually improve our production services, facilitating fast delivery of new products and reducing downtime.

Key Responsibilities and Deliverables:

  • Drive Site Reliability Engineering agenda to improve availability, reliability, and performance of services
  • Drive observability for our applications.
  • Drive "optimise operate" initiatives, for example the reduction of operational toil.
  • Work with application teams in setting up SLI, SLO and Error budget for their applications
  • Work with enterprise team in deploying SRE enablers/initiatives.
  • Build, maintain, and improve our monitoring tools infrastructure platform used across the company.
  • APM agent setup for Applications in a large, multi-methodology organization (Server and/or Data Center)
  • Administer monitoring applications and toolsets, and take ownership of performance, availability, and capacity planning.
  • Perform installations, upgrades, migrations, and add-on installations for different APM agents.
  • Determine ways to optimize and improve APM onboarding workflows, and identify where functionality can or cannot meet user requests.
  • Analyses data collected to learn about the performance of infrastructure, applications and services.
  • Responds to outages with prompt and efficient resolution and communicates issues via the proper support channels.
  • Gathers requirements and functional specifications from support teams in order to build monitors and dashboards that provide insight into the health and performance of systems.
  • Provides tuning to existing monitors by identifying trends, oddities and potential bottlenecks and triggering thresholds or actionable alerts.
  • Collaborates with other engineering teams on application integration with monitoring tools through the build-out of analytics (reporting), visualization (dashboards), and alerting (correlation).
  • Learns new technologies and how to monitor them.
  • Advocate to the business on monitoring concepts and capabilities and how to use these systems.
  • Provides on-call support of our toolsets.
  • Researches, evaluates, and recommends new technologies and provides a roadmap for future monitoring capabilities.
  • Assists with planning and executing disaster recovery solutions and business continuity planning for our monitoring and collaboration tools.
  • Ensures support documentation is current for system configuration.
  • Provides technical guidance and training to the NOC staff around monitoring solutions and reviews monitoring tasks completed.
Requirements:
  • 5+ years of experience in systems administration with UNIX/Linux-based operating systems.
  • 5+ years of experience managing APM system deployment, monitoring, scaling, and debugging is desirable.
  • Understands key SRE concepts such as toil, SLI, SLO, error budgets, MTTD, MTTR, etc.
  • 5+ years of experience with mainstream, centralized, enterprise-class monitoring systems such as Datadog, AppDynamics, or Dynatrace.
  • Understanding of system administration principles (Monitoring, Network, Storage, Scripting).
  • Experience with APM agent setup and onboarding on a variety of platforms, including Amazon Web Services, VMware Tanzu, and OpenShift.
  • Must have experience in Server Consolidation and Virtualization.
  • Must have experience in managing Container based architecture such as Docker and Kubernetes.
  • Operational experience in High Availability and Disaster Recovery environments, such as load balancing, clusters, data replication, etc.
  • Experience with infrastructure automation tools or coding/scripting (e.g., Ansible, Terraform, Puppet, Chef, Python, Go, PowerShell).
  • Familiar with LDAP, Active Directory, and single sign-on implementations. Knowledge of ITIL best practices.
  • Experience working with ServiceNow Incident, Change, and Knowledge management tools.
  • Ability to work autonomously with minimal supervision and manage multiple issues at once.
  • Familiar with PKI and web application tools such as Apache Tomcat, HTTPD, and Nginx.
  • Familiarity with the Splunk query language, SQL, and/or metrics tooling (StatsD, Prometheus, collectd, InfluxDB, Graphite, Grafana, and CloudWatch) is a plus.
  • Strong interpersonal and communication skills, with the ability to form good relationships with other technology teams through day-to-day support and project work.

Keyskills :
system administration, high availability, knowledge management, active directory, automation tools, apache tomcat, systems engineering, load balancing, server consolidation, on-call support

