Current jobs related to AWS Incident Management Team Lead - Dublin, Dublin City - Amazon


  • Dublin, Dublin City, Ireland Amazon Full time

    AWS is at the forefront of providing high availability for Amazon Web Services. Our team's expertise in large-scale event and incident management enables us to make customer-impacting events shorter and less frequent. We achieve this through automated tooling that quickly identifies the cause of an issue, helping mitigate its impact. Much of our engineer time...


  • Dublin, Dublin City, Ireland Amazon Full time

    As a Software Development Manager on the AWS Incident Management team, you will lead the development and implementation of automated tooling roadmaps to detect and resolve issues within AWS infrastructure. About Us: AWS Incident Tooling is at the heart of high availability for Amazon Web Services. We make customer-impacting events shorter and less frequent by...


  • Dublin, Dublin City, Ireland Amazon Full time

    Incident Management Engineer, AWS Incident Detection and Response. Job ID: 2917202 | Amazon Web Services New Zealand Limited. Sales, Marketing and Global Services (SMGS) is responsible for driving revenue, adoption, and growth from the largest and fastest-growing small- and mid-market accounts to enterprise-level customers, including the public sector. The AWS Global...


  • Dublin, Dublin City, Ireland Amazon Full time

    Job Overview: AWS Incident Management is a critical function within the AWS organization, responsible for preventing and responding to availability and security issues across all AWS services. As a Senior Software Development Manager, you will play a key role in defining and delivering business priorities for the AWS Incident Management team. You will work...


  • Dublin, Dublin City, Ireland TN Ireland Full time

    A company overview is essential for understanding our mission and values. At TN Ireland, we own the design, planning, delivery, and operation of all AWS global infrastructure. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely...


  • Dublin, Dublin City, Ireland Amazon Full time

    About the Role: A Support Engineer on the AWS Incident Response team will lead projects and build processes to reduce the duration, frequency, and impact of issues within the AWS and Amazon infrastructure. You will also spend a portion of your time directing the resolution of high-visibility incidents by leading conference calls and teams across the globe....


  • Dublin, Dublin City, Ireland Amazon Full time

    Software Development Manager, AWS Incident Tooling & Response. Job ID: 2830638 | Amazon Development Centre Ireland Limited. AWS Resilience owns services that prevent and respond to availability and security issues for all AWS services. In other words, we're the people who keep the cloud running. We work on the most challenging problems, with constant new...


  • Dublin, Dublin City, Ireland beBee Careers Full time

    Job Description: A Software Development Manager is sought to lead a team of engineers in developing and maintaining automated tooling for incident response within AWS infrastructure. The successful candidate will oversee the roadmap and delivery of these tools, ensuring seamless collaboration with cross-functional teams and driving improvements based on...


  • Dublin, Dublin City, Ireland ENGINEERINGUK Full time

    Software Development Manager, AWS Incident Tooling & Response. DESCRIPTION: AWS Resilience owns services that prevent and respond to availability and security issues for all AWS services. In other words, we're the people who keep the cloud running. We work on the most challenging problems, with constant new services and possible failure modes to prevent - and...

AWS Incident Management Team Lead

3 weeks ago


Dublin, Dublin City, Ireland Amazon Full time
About AWS Incident Tooling & Response
At Amazon, we own services that prevent and respond to availability and security issues for all AWS services. Our mission is to keep the cloud running as new services launch constantly and bring new failure modes to prevent.
We're a diverse team of software engineers, security experts, operations managers, and other vital roles working together to deliver the highest standards for safety, security, and availability.

Our automated tooling quickly identifies the cause of an issue and helps mitigate its impact. We make customer-impacting events shorter and less frequent by detecting large-scale events early and providing the tooling to enable fast mitigation.

We also provide our solutions to other AWS groups so they can manage their own events. It's an exciting time to join our team as we grow and expand our offerings.

About this role

You will manage automated tooling roadmaps and delivery for the detection and resolution of issues within AWS infrastructure. You will work closely with the incident response team and with leadership to gather new requirements.

Based on lessons learned from past incidents, you will drive further improvements into our automation, tooling, and processes so that the next event is shorter or avoided entirely.

You will coordinate across project teams to expand the use of our tooling to additional areas across Amazon.

Main responsibilities:

  • Define and Deliver Business Priorities: You will be a key contributor to, and owner of, the direction of the AWS Incident Management team. You will define, plan, track, and deliver on strategic goals for the team, while ensuring that the team remains unblocked and focused.
  • Cross-Site, Cross-Team Coordination: You will be responsible for coordinating with your counterparts and sister teams to ensure that a clear communication channel exists between the AWS Incident Tooling and Response teams.
  • Performance Management/Team Health: You will own all facets of performance and career management for the team. You will keep your team's operational load manageable and as low as possible.

What you'll bring:

  • Knowledge of engineering practices and patterns for the full software/hardware/network development life cycle, including coding standards, code reviews, source control management, build processes, testing, certification, and live-site operations.
  • Experience in engineering team management.
  • Experience in leading the definition and development of multitier web services.
  • Experience partnering with product and program management teams.
  • Experience communicating with users, other technical teams, and senior leadership to collect requirements, describe software product features, technical designs, and product strategy.