Data Pipeline Engineer


Premium Job From Heni Publishing

Recruiter

Heni Publishing

Listed on

6th July 2021

Location

London

Salary/Rate

Competitive salary offered depending on level of experience

Type

Permanent

Start Date

ASAP

Overview

Our team delivers cutting-edge machine learning and data science solutions to challenging problems across the art, crypto-art, legal, and social media sectors. We collect huge amounts of data by reviewing websites, scanning books, and recording videos.

We are looking to grow our engineering team to manage the large amounts of data produced on the web, social media, and blockchains. You would join a strong team of 30+ people (data scientists, engineers, and analysts) in a fast-paced, collaborative environment.

As a data engineer, you will be responsible for:

Writing data pipelines in Python or R to extract data from blockchain & service APIs into relational databases.

Developing tools to monitor existing data pipelines and web scrapers.

Developing tools to monitor data quality in various databases.

Improving the data quality of the databases by working on hotfixes with the help of QA & fellow data engineers/data scientists.

Deploying machine learning models behind a secured, public-facing API via Kubernetes/Kubeflow/Ray Serve.

Writing and securing public-facing APIs on Kubernetes to allow external partners to access our services.

Working with data scientists to deploy the following on Kubernetes/AWS cloud:

building datasets and training models

monitoring the prediction performance of models already deployed and serving

online machine learning pipelines

Monitoring load on databases and suggesting ways to optimise SQL queries.

Mentoring and developing junior members of the team.

Innovating and suggesting new ways of doing things.

Requirements

BSc/MSc in computer science or a quantitative field

Experience in:

3+ years of Python/R and SQL.

Building and deploying data/machine learning pipelines.

Deploying jobs and services on Kubernetes.

Working with AWS EC2, RDS, S3.

GitHub - PRs, Git Flow, etc.

CI/CD - Concourse/GitHub Workflows/GitOps

Backend API development for model/data serving

Familiarity with machine learning, blockchain, and data science is highly beneficial (although this is not a data science/machine learning role).

Motivation to work with dirty data and clean it.
