Data Engineer

About Nebulaworks

Nebulaworks is a consulting and engineering firm founded, built, and managed by engineers, for engineers. Our mission is to be the best in the world at creating high-performance engineering teams where members are inspired to collaborate openly, incentivized to gather new knowledge and skills, and value simplicity when solving difficult problems.

We believe technological and cultural change starts when the focus is on developing the team’s skillset, collaboration, and process to deliver software. The tools come and go, but the right team is invaluable. They possess the foundational knowledge and expertise to reason about any problem, propose numerous reasonable solutions, and identify a low-risk path to proving their hypothesis.

Job Purpose

As a Data Engineer at Nebulaworks, you will be pivotal in managing and analyzing large datasets crucial for advancements in life sciences. This role involves collaborating with cloud engineers and scientists, providing essential data management, machine learning, and data lifecycle expertise to support our cutting-edge research and discovery projects.

You Will

Build distributed, scalable, and reliable data pipelines that ingest and process data at scale and in real time.
Collaborate with other teams to design, develop, and deploy data tools that support both operations and product use cases.
Perform offline analysis of large data sets using components of a big data software ecosystem.
Evaluate and advise on technical aspects of open work requests in the product backlog with the product lead.
Own product features from the development phase through to production deployment.
Evaluate big data technologies and prototype solutions to improve data processing architecture.
Develop and maintain specialist knowledge of database concepts, object and data modeling techniques, and design principles, and a detailed knowledge of database architectures, software, and facilities.
Apply Agile methodologies in project management.
Deploy and maintain database storage infrastructure in the AWS cloud.
Troubleshoot and determine the root cause of complex data provenance issues, metadata issues, and engineering questions, which may involve interfacing with technical staff across multiple organizations and with differing levels of expertise.
Investigate, evaluate, test, and recommend technical solutions for future systems.
Develop tools and procedures to monitor and automate system tasks on servers and clusters.
Create data environments and data sets to serve a wide range of data users, including but not limited to Data Scientists, Data Analysts, and Business Analysts.
Code, build, and deploy using Git, Jenkins, and AWS.
Handle large, complex data sets to meet functional and non-functional business requirements.
Implement methods for enhancing data reliability, efficiency, and quality.
Use programming languages such as Python (required) and R for data tasks.
Apply expertise in RDBMS, NoSQL, JSON, and other database technologies.
Develop and maintain machine learning models aligned with business objectives.
Collaborate with data scientists in implementing advanced machine learning algorithms.
Adapt to dynamic working environments, managing multiple priorities.

Requirements

You value driving all changes through version control
You value all team members being hands-on and directly contributing to business outcomes
Proven experience as a Data Engineer or in a similar role.
Strong understanding of machine learning principles and application in data processes.
Experience in building and deploying machine learning models.
Expertise in Python, object-oriented programming, and SQL database management.
Familiarity with AWS cloud-based solutions.
Excellent problem-solving, communication, and analytical skills.
Bachelor’s degree in Data Science, Computer Science, IT, or related field; Master’s is a plus.

Nice to Haves

Expertise in scientific data, especially within life sciences, is a significant plus.
Data engineering or machine learning-related certification
Mastery of Continuous Integration, Continuous Delivery, and Continuous Deployment approaches and processes
Experience with functional programming languages (Haskell, etc.)
Experience with containers and container platforms (Kubernetes (EKS, GKE, AKS), Docker, etc.)
Experience across multiple major cloud platforms (AWS, Azure, GCP)
Expertise in advanced deployment patterns (blue/green, feature flags, canary, etc.)
Experience with Twelve-Factor application architectures
Experience managing large-scale infrastructure supporting numerous teams and applications

Why Apply:

Hybrid work: Irvine office and work from home
Competitive salary and compensation package
401K with match
Employee Incentive Plan (EIP)
Health, dental, and medical coverage

Posting Statement:

All applicants must be legally permitted to work in the United States without a visa; we cannot provide new or continuing visa sponsorship opportunities at this time.
Nebulaworks is an Equal Employment Opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, age, national origin, protected veteran status, disability status, sexual orientation, gender identity or expression, marital status, genetic information, or any other characteristic protected by law.
We ask candidates to remove such information from their resumes before applying, to help make Nebulaworks an inclusive environment for everyone.

To apply:

$ ssh -p 23234 <github_user>@sshapply.nebulaworks.com
$ scp -P 23234 <resume>.pdf <github_user>@sshapply.nebulaworks.com:resume.pdf

Note: You must use an SSH key that’s associated with the same user on GitHub.