HPC Software Engineer
net annual basic salary + other benefits
Job reference: VN21-58
HPC Software Engineer: cloud-based deployment for on-demand high-performance computing in Destination Earth
Contract Duration: Two years
Deadline for applications: 31/01/2022
ECMWF is the European Centre for Medium-Range Weather Forecasts. It is an intergovernmental organisation created in 1975 by a group of European nations and is today supported by 34 Member and Co-operating States, mostly in Europe. The Centre's mission is to serve and support its Member and Co-operating States and the wider community by developing and providing world-leading global numerical weather prediction. ECMWF functions as a 24/7 research and operational centre with a focus on medium- and long-range predictions and holds one of the largest meteorological archives in the world. The success of its activities relies primarily on the talent of its scientists, strong partnerships with its Member and Co-operating States and the international community, some of the most powerful supercomputers in the world, and the use of innovative technologies such as machine learning across its operations.
Over the years, ECMWF has also developed a strong partnership with the European Union. For the past seven years it has been an entrusted entity for the implementation and operation of the Climate Change and Atmosphere Monitoring Services of Copernicus, the Earth observation component of the EU's Space Programme, as well as a contributor to the Copernicus Emergency Management Service. The collaboration extends to other areas of work, including High Performance Computing and the development of digital tools that enable ECMWF to extend its provision of data and products covering weather, climate, air quality, fire and flood prediction and monitoring.
ECMWF has recently become a multi-site organisation, with its headquarters in Reading, UK, where it has been based since its creation, a new data centre in Bologna, Italy, and new offices in Bonn, Germany.
About Destination Earth (DestinE)
ECMWF is expected to be a major partner, together with ESA and EUMETSAT, in the implementation of the Destination Earth (DestinE) initiative, starting later in 2021. The objective of the European Commission's DestinE initiative is to deploy several highly accurate thematic digital replicas of the Earth, called Digital Twins. The Digital Twins will help monitor and predict environmental change and human impact, in order to develop and test scenarios that support sustainable development and the corresponding European policies for the Green Deal.
DestinE will thus contribute to revolutionising the European capability to monitor and predict our changing planet, complementing existing national and European efforts such as those provided by the national meteorological services and the Copernicus Services. It will be run in several phases, the first of which, the implementation phase, covers the period from the end of 2021 to mid-2024. Further phases are foreseen (subject to funding) that will operationalise the digital twins, scale up system production, and add applications and new twin options.
DestinE covers several demanding digital technology aspects required to develop, implement and operate the two high-priority digital twins: one on weather-induced and geophysical extremes, and one on climate change adaptation. ECMWF will be responsible for the delivery of these digital twins, which will rely on complex Earth-system simulation models, on data assimilation methods that fuse simulations and observations through inverse modelling, and on the integration of observations and models from sectors such as water and food management, renewable energies, and socio-economic risk and disaster management.
These science components require advanced digital technology solutions to maximise the efficiency of computing and data handling on extreme-scale infrastructures, and to adapt and operate these infrastructures across different heterogeneous architectures within a federated framework. This federated framework includes the Core Platform and the Data Lake, developed, deployed and operated by ESA and EUMETSAT respectively.
The DestinE developments take forward the long-term investments of the ECMWF Member States in building a unique European prediction capability and will support the further advancement of Member States' services and the Copernicus Services.
Summary of the role
ECMWF has an exciting opportunity to help build the required digital infrastructure for DestinE in collaboration with partners throughout Europe. This role will support the delivery of containerised HPC workflows on third-party cloud-based compute platforms.
DestinE is a distributed system of autonomous service providers tied together by a specified set of interfaces. The work anticipated for this position will explore novel pathways for HPC deployment and data production. It is envisaged that containerised HPC workflows will need to be developed and deployed on EuroHPC infrastructure.
The HPC applications team is part of the High-Performance Computing and Storage Section in ECMWF's Computing Department. The team provides in-depth knowledge and expertise to support ECMWF developers, advising and assisting in writing, maintaining, debugging and optimising the suites of demanding scientific codes used by ECMWF. The other two teams in the section are responsible for maintaining ECMWF's petascale high-performance computing systems, helping developers make the most efficient use of these systems, and providing the online and archive data storage systems.
Whilst this position will be based at the ECMWF HQ in Reading, United Kingdom, there will be strong collaboration with staff working on DestinE in Bonn, Germany, and with the Platform & Services teams in Bologna, Italy, and it is anticipated that visits to both sites will be required.
Main duties and key responsibilities
Supporting the adaptation of existing HPC codes and workflows to cloud-based systems
Designing and developing prototypes for containerised workflows (including complex, compute-intensive MPI/hybrid applications) on a range of HPC platforms (including EuroHPC systems)
Deploying, supporting and benchmarking the workflows on HPC and multi-cloud environments
Liaising closely with DestinE partners ESA and EUMETSAT, as well as with other partners and users, to identify and resolve performance and functional issues with the HPC workflows
Developing tooling to ease the deployment of HPC applications and workflows across different cloud infrastructures
Personal attributes
Excellent interpersonal and communication skills
Ability and willingness to collaborate with internal and external experts
Strong analytical and problem-solving skills, with a proactive continuous improvement approach
Self-motivated and able to work with minimal supervision
Ability to maintain effective communication and documentation with the rest of the team and a distributed project partner community
Dedication, passion, and enthusiasm to succeed both individually and within a team
Highly organised with the capacity to work on a diverse range of tasks to tight deadlines in a matrix management environment
Education, experience, knowledge and skills (including language)
A university degree (EQF Level 6) or equivalent industry experience
Demonstrable experience in some of the following is required:
Experience with HPC environments and the deployment of large-scale parallel applications
Experience in designing and developing applications in an operational Linux Cloud environment
Experience with developing and deploying to HPC clusters in a Cloud environment (e.g., AWS ParallelCluster, Azure CycleCloud)
Good understanding of cloud architectures and cloud platforms
Experience with running scientific workloads on Cloud platforms
Proven track record in software engineering
Experience with batch schedulers (e.g., SLURM, PBS), parallel filesystems (e.g., Lustre, GPFS, BeeGFS), and parallel programming and profiling tools would be an advantage
Candidates must be able to work effectively in English and interviews will be conducted in English
Good knowledge of one of the Centre's other working languages (French or German) is not required but would be an advantage
Demonstrable knowledge and skills in some of the following is required:
Cloud Native (e.g., Kubernetes, Docker, Singularity)
Cloud IaaS (e.g., Amazon, Google, Microsoft, Oracle)
Good programming and scripting skills (any high-level language, e.g., C/C++, Fortran, Java, Python, Julia, Bash)