Intermediate Site Reliability Engineer, Database Operations

4 weeks ago


Dublin, Ireland GitLab Full time

GitLab is an open-core software company that develops the AI-powered DevSecOps Platform used by more than 100,000 organizations. The Database Operations team owns the lifecycle of the PostgreSQL database engine for GitLab.com, focusing on reliability, scalability, performance, and security of the database engine and its supporting services.
GitLab.com is the largest GitLab instance and presents unique challenges. The experience of our team feeds back into other engineering groups and our customers running self-managed installations.
Responsibilities
Automate operational tasks across environments (e.g., package updates, configuration changes, provisioning of user-facing services).
Respond to platform emergencies, alerts, and escalations from Customer Support.
Minimize manual effort in managing software lifecycles (operating systems, etc.).
Develop a fully automated, multi-environment observability stack and extend it to predict capacity needs based on usage patterns.
Plan new service roll-outs and capacity management, and work with users to optimize resource consumption.
As an SRE You Will
Work on database reliability and performance for GitLab.com within the SRE team and in product collaborations.
Analyze solutions and implement best practices for PostgreSQL clusters and components.
Improve observability of database metrics and meet database objectives.
Collaborate with peer SREs to roll out changes and mitigate production incidents.
Provide on-call support on rotation and database expertise to engineering teams (migrations, queries, performance).
Automate database infrastructure and build self-service tooling; use the GitLab product to operate GitLab.com efficiently.
Plan the growth of GitLab's database infrastructure, and design and maintain core components that scale to hundreds of thousands of concurrent users.
Support and debug database production issues across services and stack levels; ensure monitoring and alerting target symptoms rather than outages.
Document actions for repeatability and automation.
Projects You Could Work On
Database administration review (backups, performance tuning) and automation of replica setup and backup monitoring.
Build self-service tools for engineers using GitLab ChatOps; provide technical assistance on database design and tuning.
Review migrations and recommend query/schema changes for performance; participate in production incident mitigation.
Contribute to infrastructure design and scalability focusing on data storage; plan next steps to scale the database.
Define specifications for future database requirements including enhancements, upgrades, and capacity planning.
Qualifications / Requirements
Experience running PostgreSQL in high-growth, large production environments, using both self-managed deployments (VM, Kubernetes with PostgreSQL Operators) and DBaaS offerings.
Hands-on experience using PostgreSQL internals data to design, build, and troubleshoot systems.
Experience with infrastructure automation, orchestration, and configuration management (Chef, Ansible, Puppet, Terraform).
Solid understanding of SQL and PL/pgSQL.
Significant experience in a large SaaS distributed systems production environment.
Strong written and verbal English communication skills; ability to collaborate asynchronously.
Documentation-focused mindset and ability to deliver quickly with iteration.
Proactive, go-for-it attitude and a desire to fix problems when you see them.
Strong data modeling and data structure design skills.
Bonus: programming skills (backend engineer), preferably Ruby and/or Go.
Bonus: experience with ClickHouse or other modern OLAP databases.
Seniority level Associate
Employment type Full-time
Job function Engineering and Information Technology
GitLab is an equal opportunity employer. If you require accommodation during the recruiting process or have questions about location eligibility, contact our Talent Acquisition team.



