How cloud computing and Copernicus can support the sustainable development community

Event date: 17 November 2020

Over the last few years, there has been a step-change in the capabilities of cloud-based systems for collecting and processing large volumes of data, such as the regular images collected by Earth Observation (EO) satellites. This blog describes how the Earth Observation for Sustainable Development Laboratory (EO4SD Lab) leverages these capabilities to give users the ability to generate large volumes of geospatial data products.

The key goal of the EO4SD Lab is to put the power of EO data in the hands of end users from Development Agencies and International Finance Institutions (IFIs), so that they can make effective, information-driven decisions in development aid scenarios around the world. The EO4SD Lab project is part of the wider European Space Agency (ESA) EO4SD initiative to support the uptake of EO-derived information in sustainable development across a wide range of thematic areas. EO4SD Lab complements other EO4SD projects in that it directly provides users with the tools to create their own bespoke geospatial products to meet their specific needs.

Two parallel technical advances in recent years have made cloud-based EO data exploitation platforms like EO4SD Lab viable.

Firstly, there has been a significant increase in available EO data in the last few decades. A key factor in this is the launch of the European Commission’s Copernicus programme and its Sentinel satellites. This constellation of satellites enables observations of any location on the globe at weekly or more frequent intervals. As such, the limiting factor is no longer the availability of data, but the ability to access, process and analyse it.

Secondly, there has been a ramp-up in cloud-computing capability. Previously, EO users were expected to download and analyse satellite data on their own computing infrastructure, which required suitable data storage and processing power. With the wealth of EO data now available, this approach is becoming increasingly limiting. Instead, systems such as EO4SD Lab and the ESA Thematic Exploitation Platforms (TEPs) programme use scalable cloud-based hardware to co-locate processing capability with the EO data in the same virtual environment. By giving users open, simple-to-use access through a web portal, this approach brings the users to the data, rather than the data to the users. A key benefit of this cloud-computing approach is on-demand scalability, which allows processing capability to be provisioned dynamically, only when it is needed. This significantly reduces the ICT costs for end users.
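To make the cost argument concrete, here is a minimal back-of-the-envelope sketch in Python. It compares the node-hours billed by a cluster sized for peak load and left running with those billed when nodes are provisioned hour by hour. The workload figures, the per-node capacity and the function names are illustrative assumptions, not EO4SD Lab measurements, and the model ignores start-up time and billing granularity.

```python
import math


def node_hours_fixed(peak_jobs_per_hour, jobs_per_node_hour, hours):
    """Node-hours billed when a cluster is sized for peak load and left on."""
    nodes = math.ceil(peak_jobs_per_hour / jobs_per_node_hour)
    return nodes * hours


def node_hours_on_demand(hourly_jobs, jobs_per_node_hour):
    """Node-hours billed when nodes are provisioned each hour to match demand."""
    return sum(math.ceil(jobs / jobs_per_node_hour) for jobs in hourly_jobs)


# Hypothetical 24-hour workload: one burst of processing requests, idle otherwise.
hourly_jobs = [0] * 8 + [40, 120, 120, 40] + [0] * 12
capacity = 10  # assumed number of jobs one node completes per hour

fixed = node_hours_fixed(max(hourly_jobs), capacity, len(hourly_jobs))
on_demand = node_hours_on_demand(hourly_jobs, capacity)
print(f"fixed cluster: {fixed} node-hours, on demand: {on_demand} node-hours")
```

With this bursty workload the fixed cluster is billed for 288 node-hours against 32 on demand. The gap narrows as demand becomes steadier, so the saving depends on how intermittent the processing actually is.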

The EO4SD Lab uses a range of open source software tools to enable such capabilities. The figure below shows the overall technology stack and some of the key pre-existing components used.

Figure: EO4SD Lab core elements

There are four key EO4SD Lab technological components:

  • The underlying ICT infrastructure, including the dynamic processing capabilities, is provided by the CREODIAS Infrastructure-as-a-Service (IaaS). CREODIAS is one of the Copernicus Data and Information Access Services (DIAS) platforms, which allow users to discover, manipulate, process and download Copernicus data and information. CREODIAS also provides the majority of the EO data, ensuring fast, direct access to the data to be processed. OpenStack is used to control the deployment of this infrastructure.
  • To enable the efficient, scalable processing of EO data, containerisation technology is used. Containers are particularly relevant as they allow distinct processing activities to be started and executed independently of one another. This both reduces the constraints on processing activities and ensures resources are used efficiently. Within the EO4SD Lab, open source software (OSS) tools such as Argo and Kubernetes are used for container orchestration, with the containers themselves being created using Docker.
  • EO data processing and product generation use a range of commonly used languages, tools and packages. EO4SD Lab provides a data processing service portfolio for generating a range of derived products, such as land cover, vegetation indices and water parameters. These have been built using applications such as Orfeo ToolBox and ESA SNAP, or programming languages such as Python. Through its developer tab, the EO4SD Lab also lets users upload their own scripts written in a wide range of languages, enabling them to create their own bespoke services.
  • Visualisation of geospatial data is enabled through two key OSS tools. GeoServer is a commonly used open-source server that allows users to share, process and edit geospatial data. It publishes data from any major spatial data source using open standards and is the key tool for displaying geospatial data through the GeoBrowser. Complementing this is GeoNode, which is a web-based application and platform for developing geospatial information systems (GIS) and for deploying spatial data infrastructures (SDI). Within the EO4SD Lab, GeoNode, through the EO Wiki interface, provides users with searchable access to the various geospatial data layers generated and the ability to combine these layers into maps for further analysis and sharing. Additionally, the EO4SD Lab provides users with access to full software packages such as SNAP and QGIS for further data analysis, visualisation and processing.
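As an illustration of what publishing data "using open standards" can look like from a client's side, the following Python sketch builds a request URL for the OGC Web Map Service (WMS) 1.1.1 GetMap operation, one of the standards GeoServer implements. The server address, the layer name and the helper function are hypothetical placeholders; nothing here is taken from the EO4SD Lab deployment.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width=512, height=512):
    """Build an OGC WMS 1.1.1 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    """
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": "",  # empty value requests the layer's default style
        "srs": "EPSG:4326",
        "bbox": ",".join(str(v) for v in bbox),
        "width": width,
        "height": height,
        "format": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer name, used only for illustration.
url = build_getmap_url(
    "https://example.org/geoserver/wms",
    "demo:land_cover",
    (30.0, -5.0, 35.0, 0.0),
)
print(url)
```

Because the request is a plain HTTP URL, the same layer can be displayed in a web map, loaded into a desktop tool such as QGIS, or fetched from a script without any server-specific client code.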

Other supporting OSS components provide the key portal functionality: JupyterHub for code sharing, Keycloak for single sign-on and user management, and Elastic Stack for index searching and data analytics.

Ultimately, EO4SD Lab is fusing cloud and Copernicus to create a fully scalable portal suitable for all user levels, giving them an online suite of tools and services to create their own individual and distinct data products. This not only helps to make the most effective use of Europe’s investment in EO data collection, but also opens up the potential for EO information to be applied in developing countries, where IT infrastructure is more likely to be outdated and limited bandwidth may prevent effective access to large data streams.