Data Pipeline Engineer
Our team delivers cutting-edge machine learning and data science solutions to challenging problems across the art, crypto-art, law, and social media sectors. We collect huge amounts of data by reviewing websites, scanning books, and recording videos.
We are looking to grow our engineering team to manage the large amounts of data produced on the web, social media, and blockchains. You would be joining a strong team of 30+ people (data scientists, engineers, and analysts) in a fast-paced and collaborative environment.
As a data engineer, you will be responsible for:
- Writing data pipelines in Python or R to extract data from blockchain & service APIs into relational databases.
- Developing tools to monitor existing data pipelines and web scrapers.
- Developing tools to monitor data quality in various databases.
- Improving the data quality of the databases by working on hotfixes with the help of QA & fellow data engineers/data scientists.
- Deploying machine learning models on a secured public-facing API via Kubernetes/Kubeflow/Ray Serve.
- Writing and securing public-facing APIs on Kubernetes to allow external partners to access our services.
- Working with data scientists to deploy the following on Kubernetes/AWS:
  - building datasets and training models
  - monitoring the prediction performance of models already deployed and serving
  - online machine learning pipelines
- Monitoring load on databases and suggesting ways to optimise SQL queries.
- Mentoring and developing junior members of the team.
- Innovating and suggesting new methods of doing things.
Requirements:
- BSc/MSc in computer science or a quantitative field.
- Experience in:
  - Python/R and SQL (3+ years).
  - Building and deploying data/machine learning pipelines.
  - Deploying jobs and services on Kubernetes.
  - Working with AWS EC2, RDS, and S3.
  - GitHub (PRs, Git Flow, etc.).
  - CI/CD (Concourse, GitHub Workflows, GitOps).
  - Backend API development for model/data serving.
- Familiarity with machine learning, blockchain, and data science.
  - Experience in any of these is highly beneficial (although this is not a data science/machine learning role).
- Motivation to work with dirty data and to clean it.
Contact Name: Lai Kay Man