Cloud Network Operations Engineer
While candidates in the listed location(s) are encouraged for this role, candidates in other locations will be considered. This role is hybrid.
We're growing fast and attracting the best talent in the world. Bricksters — as we call ourselves — are a special mix of smart, curious, quick thinkers. If you ask a Brickster what they love about working here, you'll likely hear about our culture.
We are seeking an experienced Network Operations Center (NOC) engineer to join our team. The successful candidate will be responsible for monitoring critical Databricks infrastructure and developing monitoring tools and alerting dashboards. They will also work closely with stakeholders to investigate and resolve incidents, perform root cause analysis, and propose solutions to increase the reliability and stability of the Databricks platform.
The impact you will have:
- Monitor critical infrastructure, triage alerts to proactively identify incidents, and work with stakeholders to resolve them.
- Investigate incidents and propose solutions to improve platform reliability and stability.
- Perform root cause analysis for recurring incidents and provide proactive solutions.
- Develop tooling or automate processes to improve platform monitoring and alerting.
- Contribute to software development efforts to improve overall service reliability and stability.
- Communicate with internal stakeholders, including executive staff, to provide incident analysis.
- Participate in war rooms and temporary communication channels during outages.
- Demonstrate cross-functional leadership and establish ownership of incidents and outages.
- Multitask on several incidents and/or projects at once.
What we look for:
- 3 years of experience as a NOC, SRE, or DevOps engineer
- Knowledge of cloud technologies such as Azure, AWS, and GCP
- Hands-on experience with monitoring, logging, and alerting tools
- Hands-on experience with containers and orchestration technologies
- Automation and scripting skills
- Linux systems administration skills
- Knowledge of managing incidents
- Excellent communication skills
- Technical degree or equivalent experience
- Willingness to learn the Databricks products
Databricks is the data and AI company. More than 9,000 organizations worldwide — including Comcast, Condé Nast, and over 50% of the Fortune 500 — rely on the Databricks Lakehouse Platform to unify their data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe. Founded by the original creators of Apache Spark™, Delta Lake and MLflow, Databricks is on a mission to help data teams solve the world’s toughest problems. To learn more, follow Databricks on Twitter, LinkedIn and Facebook.
Our Commitment to Diversity and Inclusion
At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics.
If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone.