ESSI – Earth & Space Science Informatics

ESSI1.1 – Informatics in Oceanography and Ocean Science

EGU2020-873 | Displays | ESSI1.1

The new online Black Sea Oceanographic Database

Elena Zhuk, Maxim Vecalo, and Andrey Ingerov

We present the latest improvements to the Black Sea Oceanographic Database (BSOD), which provides online access to hydrological and hydro-chemical data, taking into account user priorities, data types, and the methods and timing of data access.

Following an analysis of freely available DBMSs, the PostgreSQL object-relational DBMS was selected for archiving the data in the BSOD. PostgreSQL provides high performance, reliability and the ability to handle large data volumes. Moreover, PostgreSQL can work with GIS objects through the PostGIS extension and has built-in support for poorly structured data in JSON format. The development also had to provide the capability to select large data sets according to criteria specified through metadata selection. Taking these two requirements into account, the part of the database responsible for accessing the metadata was designed for interactive transaction processing (OLTP access pattern), while the part responsible for archiving the in-situ data was developed according to a “star” schema, which is typical for the OLAP access pattern.

After analyzing the oceanographic in-situ observations, the following main entities were identified: Cruise, Ship, Station, Measurements, as well as Measured parameters and the relationships between them. A set of attributes was compiled for each of the entities and the tables were designed. The BSOD includes the following:

- Metadata tables: cruises, ships, stations, stations_parameters.

- Data tables: measurements.

- Vocabularies: constructed using the SeaDataCloud BODC parameter vocabularies.

- Reference data tables: GEBCO, EDMO, p01_vocabulary, p02_vocabulary, p06_vocabulary, l05_vocabulary.
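
To make the OLTP/OLAP split concrete, the sketch below shows what the "star" part of such a design can look like in PostgreSQL, with measurements as the central fact table referencing the metadata dimensions. It is an illustrative assumption only; the table and column names are not the actual BSOD definitions.

```python
# Illustrative sketch (not the actual BSOD schema): a PostGIS-enabled metadata
# table plus a "measurements" fact table, created through psycopg2.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS stations (
    station_id   SERIAL PRIMARY KEY,
    cruise_id    INTEGER NOT NULL,            -- links to the cruises metadata table
    obs_time     TIMESTAMP NOT NULL,
    geom         GEOMETRY(Point, 4326)        -- requires the PostGIS extension
);

CREATE TABLE IF NOT EXISTS measurements (     -- fact table of the star schema
    station_id   INTEGER REFERENCES stations(station_id),
    parameter_id INTEGER NOT NULL,            -- links to a P01-based vocabulary entry
    depth_m      REAL,
    value        DOUBLE PRECISION,
    qc_flag      SMALLINT
);
"""

with psycopg2.connect("dbname=bsod") as conn:     # connection string is illustrative
    with conn.cursor() as cur:
        cur.execute(DDL)
```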

To provide online access to the Black Sea Oceanographic Database, a user interface (UI) was implemented. It was developed using the jQuery and Mapbox GL JavaScript libraries and provides visual data selection by date period, cruise, parameters such as temperature, salinity, oxygen, nitrates, nitrites and phosphates, and other metadata.

Acknowledgements: the work was carried out in the framework of the Marine Hydrophysical Institute of the Russian Academy of Sciences task No. 0827-2018-0002.

Keywords: Black Sea, oceanographic database, PostgreSQL, online data access, geo-information system.

How to cite: Zhuk, E., Vecalo, M., and Ingerov, A.: The new online Black Sea Oceanographic Database, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-873, https://doi.org/10.5194/egusphere-egu2020-873, 2020.

EGU2020-2004 | Displays | ESSI1.1

VM-ADCP backscatter data management using QGIS

Paola Picco, Roberto Nardini, Sara Pensieri, Roberto Bozzano, Luca Repetti, and Maurizio Demarte

VM-ADCPs (Vessel Mounted Acoustic Doppler Current Profilers) are regularly operated on board several research vessels with the aim of providing 3-D ocean current fields. Along with ocean currents, these instruments also measure the acoustic backscatter profile at a known frequency, which can be of great advantage for other environmental investigations such as the study of zooplankton migrations. The presence of zooplankton can be detected as a variation of acoustic backscatter with depth showing periodic (diurnal or semidiurnal) variability, related to the vertical migration of these organisms. GIS has proven to be a powerful tool to manage the huge amount of VM-ADCP backscatter data obtained during oceanographic campaigns. Moreover, it makes it possible to extract relevant information on zooplankton distribution and abundance, even when the monitoring strategy of the experiment does not completely meet the temporal and spatial resolution required for these studies. The application described here has been developed in QGIS and tested on the Ligurian Sea (Mediterranean Sea). In order to make data from instruments operating at different frequencies and sampling set-ups comparable, echo intensity data are converted into volume backscatter strength and corrected for the slant range. Using high-resolution bathymetry rasters acquired and processed by the Italian Hydrographic Institute allows anomalously high backscatter values due to the presence of the bottom to be discarded. Another advantage of the GIS is the possibility to easily separate night-time from daytime data and map their spatial distribution, as well as to separate data from the surface and the deeper layers. All possible combinations can then be visualised and analysed.
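
The echo-intensity-to-backscatter conversion mentioned above is commonly performed with the Deines (1999) working formula; the sketch below illustrates it with placeholder instrument constants, since the actual values depend on the VM-ADCP frequency, configuration and calibration used in this work.

```python
# Hedged sketch of the Deines (1999) working formula for converting ADCP echo
# intensity (counts) into volume backscatter strength Sv [dB]. All instrument
# constants are placeholders, not the values used by the authors.
import numpy as np

def echo_to_sv(E, r, tx_temp, pulse_len=8.0, power_w=25.0,
               C=-139.3,     # instrument constant [dB] (placeholder)
               Kc=0.45,      # counts-to-dB conversion factor (placeholder)
               Er=40.0,      # noise floor / reference echo level [counts] (placeholder)
               alpha=0.048): # sound absorption coefficient [dB/m] (placeholder)
    """Volume backscatter strength Sv [dB] for echo intensity E at slant range r [m],
    with tx_temp the temperature at the transducer [degC]."""
    return (C + 10 * np.log10((tx_temp + 273.16) * r**2)
            - 10 * np.log10(pulse_len) - 10 * np.log10(power_w)
            + 2 * alpha * r + Kc * (E - Er))

# Example: a bin at 50 m slant range returning 120 counts, 15 degC at the transducer
print(echo_to_sv(E=120.0, r=50.0, tx_temp=15.0))
```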

How to cite: Picco, P., Nardini, R., Pensieri, S., Bozzano, R., Repetti, L., and Demarte, M.: VM-ADCP backscatter data management using QGIS, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2004, https://doi.org/10.5194/egusphere-egu2020-2004, 2020.

EGU2020-8073 | Displays | ESSI1.1

Delivering marine data from the cloud using the SeaDataCloud Discovery and Access service

Peter Thijsse, Dick Schaap, and Michele Fichaut

SeaDataNet is an operational pan-European infrastructure for managing marine and ocean data; its core partners are National Oceanographic Data Centres (NODCs) and oceanographic data focal points from 34 coastal states in Europe. Currently SeaDataNet gives discovery and access to more than 2.3 million data sets for physical oceanography, chemistry, geology, geophysics, bathymetry and biology from more than 650 data originators. The population has increased considerably through cooperation with and involvement in many associated EU projects and initiatives such as EMODnet. The SeaDataNet infrastructure has been set up in a series of projects over the last two decades. The SeaDataNet core services and marine data management standards are currently being upgraded in the EU Horizon 2020 ‘SeaDataCloud’ project, which runs for 4 years from 1st November 2016. The upgraded services include a move “to the cloud” via a strategic and technical cooperation of the SeaDataNet consortium with the EUDAT consortium of e-infrastructure service providers. This is an important step into the EOSC domain.

One of the main components of SeaDataNet is the CDI Data Discovery and Access service, which provides users access to marine data from 100 connected data centres. The previous version of the CDI service was appreciated for harmonising the datasets, but also had some flaws in the usability and performance of its interface. Under SeaDataCloud the CDI Data Discovery and Access service has now been upgraded by introducing a central data buffer in the cloud that is continuously synchronised by replication from the data centres. The “data cache” itself is hosted and horizontally synchronised between 5 EUDAT e-data centres. During the implementation of the replication process, additional quality control mechanisms have been included on the central metadata and associated data in the buffer.

In October 2019 the public launch of the operational production version of the upgraded CDI Data Discovery and Access service took place. The user interface has been completely redeveloped, upgraded, reviewed and optimised, offering a very efficient query and shopping experience with great performance. Also, the import process for new and updated CDI metadata and associated data sets has been redesigned, successfully introducing cloud technology.

The upgraded user interface has been developed and tested in close cooperation with the users. It now also includes the “MySeaDataCloud” concept, in which various services are offered to meet the latest demands of users: e.g. saving searches, sharing data searches, and eventually even pushing data into the SDC VRE. The user interface and machine-to-machine interfaces have improved the overall quality, performance and ease-of-use of the CDI service for both human users and machine processes.

The presentation will provide more technical background on the upgrading of the CDI Data Discovery and Access service and on adopting the cloud. It will report on the current release (https://cdi.seadatanet.org), demonstrate the wealth of data, present the experiences of developing services in the cloud, and demonstrate the advantages of this system for the scientific community.

How to cite: Thijsse, P., Schaap, D., and Fichaut, M.: Delivering marine data from the cloud using the SeaDataCloud Discovery and Access service, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8073, https://doi.org/10.5194/egusphere-egu2020-8073, 2020.

EGU2020-10292 | Displays | ESSI1.1

New BioGeoChemical products provided by the Copernicus Marine Service

Virginie Racapé, Vidar Lien, Nilsen Jan Even Øie, Havard Vindenes, Leonidas Perivoliotis, and Seppo Kaitala

The Copernicus Marine service is a “one-stop-shop” providing freely available operational data on the state of the marine environment for use by marine managers, advisors, and scientists, as well as intermediate and end users in marine businesses and operations. The Copernicus Marine service offers operationally updated and state-of-the-art products that are well documented and transparent. The European Commission’s long-term commitment to the Copernicus program offers long-term visibility and stability of the Copernicus Marine products. Furthermore, Copernicus Marine offers a dedicated service desk, in addition to training sessions and workshops.

Here, we present the in situ biogeochemical data products distributed by the Copernicus Marine Service since 2018. They offer available chlorophyll-a, oxygen and nutrient data collected across the globe. These products integrate observations aggregated from the Regional EuroGOOS consortia (Arctic-ROOS, BOOS, NOOS, IBI-ROOS, MONGOOS) and Black Sea GOOS, as well as from SeaDataNet2 National Data Centres (NODCs), JCOMM global systems (Argo, GOSUD, OceanSITES, GTSPP, DBCP) and the Global Telecommunication System (GTS) used by the Met Offices.

The in situ near-real-time biogeochemical product is updated every month, whereas the reprocessed product is updated twice per year. Products are delivered in NetCDF4 format compliant with the CF 1.7 standard and with well-documented quality control procedures.
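
As an illustration of what reading one of these CF-compliant NetCDF files typically looks like, the sketch below uses xarray; the file name and the variable/flag naming convention (DOX2, DOX2_QC, flag value 1 = good data) are assumptions for illustration and should be checked against the product documentation.

```python
# Hedged sketch: open one in situ BGC product file and keep only "good" oxygen data.
import xarray as xr

ds = xr.open_dataset("GL_PR_PF_example.nc")         # hypothetical product file name
oxygen = ds["DOX2"].where(ds["DOX2_QC"] == 1)       # assumed variable and QC flag names
print(oxygen.attrs.get("units"), float(oxygen.mean()))
```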

How to cite: Racapé, V., Lien, V., Jan Even Øie, N., Vindenes, H., Perivoliotis, L., and Kaitala, S.: New BioGeoChemical products provided by the Copernicus Marine Service, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10292, https://doi.org/10.5194/egusphere-egu2020-10292, 2020.

Access to marine data is a key issue for the EU Marine Strategy Framework Directive and the EU Marine Knowledge 2020 agenda, which includes the European Marine Observation and Data Network (EMODnet) initiative. EMODnet aims at assembling European marine data, data products and metadata from diverse sources in a uniform way.

The EMODnet Bathymetry project has been active since 2008 and has developed Digital Terrain Models (DTM) for the European seas, which are published at regular intervals, each time improving quality and precision and expanding functionalities for viewing, using, and downloading. The DTMs are produced from survey and aggregated data sets that are referenced with metadata adopting the SeaDataNet Catalogue services. SeaDataNet is a network of major oceanographic data centres around the European seas that manage, operate and further develop a pan-European infrastructure for marine and ocean data management. The latest EMODnet Bathymetry DTM release also includes Satellite Derived Bathymetry and has a grid resolution of 1/16 arcminute (circa 125 meters), covering all European sea regions. Use has been made of circa 9400 gathered survey datasets, composite DTMs and SDB bathymetry. Catalogues and the EMODnet DTM are published at the dedicated EMODnet Bathymetry portal, which includes a versatile DTM viewing and downloading service.

As part of the expansion and innovation, more focus has been directed towards bathymetry for near-coastal waters and coastal zones, and Satellite Derived Bathymetry data have been produced and included to fill gaps in coverage of the coastal zones. The Bathymetry Viewing and Download service has been upgraded to provide a multi-resolution map and to include versatile 3D viewing. Moreover, best estimates of the European coastline have been determined for a range of tidal levels (HAT, MHW, MSL, Chart Datum, LAT), making use of a tidal model for Europe. In addition, a Quality Index layer has been formulated with indicators derived from the source data, which can be queried in the Bathymetry Viewing and Download service. Finally, extra functionality has been added to the mechanism for downloading DTM tiles in various formats and special high-resolution DTMs for areas of interest.

This results in many users visiting the portal, browsing the DTM viewer, downloading the DTM tiles and making use of the OGC web services to integrate the EMODnet Bathymetry into their own applications.
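
As a hedged illustration of the OGC web services route, the sketch below requests a map tile from the EMODnet Bathymetry WMS with OWSLib. The service URL is an assumption based on the public portal and should be verified there; the layer is simply taken from the capabilities document rather than hard-coded.

```python
# Hedged sketch: fetch an EMODnet Bathymetry DTM image through WMS with OWSLib.
from owslib.wms import WebMapService

wms = WebMapService("https://ows.emodnet-bathymetry.eu/wms", version="1.3.0")  # assumed endpoint
print(sorted(wms.contents)[:5])                     # list a few advertised DTM layers
layer = next(iter(wms.contents))                    # pick any layer, for illustration only
img = wms.getmap(layers=[layer], srs="EPSG:4326",
                 bbox=(-5.0, 45.0, 0.0, 50.0),      # lon/lat box over the Bay of Biscay
                 size=(512, 512), format="image/png")
open("emodnet_dtm.png", "wb").write(img.read())
```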

The presentation will highlight key details of the EMODnet Bathymetry DTM production process and the Bathymetry portal with its extensive functionality.

How to cite: Schaap, D. M. A. and Schmitt, T.: EMODnet Bathymetry – further developing a high resolution digital bathymetry for European seas, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10296, https://doi.org/10.5194/egusphere-egu2020-10296, 2020.

EGU2020-13614 | Displays | ESSI1.1

Speeding-up data analysis: DIVAnd interpolation tool in the Virtual Research Environment

Charles Troupin, Alexander Barth, Merret Buurman, Sebastian Mieruch, Léo Bruvry Lagadec, Themis Zamani, and Peter Thijsse

A typical hurdle faced by scientists when it comes to processing data is the installation and maintenance of software tools: installation procedures are sometimes poorly documented, and there are often several dependencies that may create incompatibility issues. In order to make the life of scientists and experts easier, a Virtual Research Environment (VRE) is being developed in the frame of the SeaDataCloud project.

The goal is to provide them with a computing environment where the tools are already deployed and datasets are available for direct processing. In the context of SeaDataCloud, the tools are:

  • WebODV, able to perform data reading, quality checking and subsetting, among many other operations.
  • DIVAnd, for the spatial interpolation of in situ measurements.
  • A visualisation toolbox for both the input data and the output gridded fields.

DIVAnd 

DIVAnd (Data-Interpolating Variational Analysis in n dimensions) is a software tool designed to generate a set of gridded fields from in situ observations. The code is written in Julia (https://julialang.org/), a high-performance programming language particularly suitable for the processing of large matrices.

The code, developed and improved on a regular basis, is distributed via the hosting platform GitHub: https://github.com/gher-ulg/DIVAnd.jl. It has supported Julia 1.0 since version 2.1.0 (September 2018).

Notebooks

Along with the source code, a set of Jupyter notebooks describing the different steps for the production of a climatology is provided, with an increasing level of complexity: https://github.com/gher-ulg/Diva-Workshops/tree/master/notebooks.

Deployment in the VRE

JupyterHub (https://jupyter.org/hub) is a multi-user server for Jupyter notebooks. It has proven an adequate solution for allowing several users to work simultaneously with the DIVAnd tool, and it offers different ways to isolate the users. The approach selected in the frame of this project is Docker containers, in which the software tools, as well as their dependencies, are stored. This solution allows multiple copies of a container to be run efficiently on a system and also makes it easier to perform the deployment in the VRE. The authentication step is also managed by JupyterHub.

Docker container

The Docker container is distributed via Docker Hub (https://hub.docker.com/r/abarth/divand-jupyterhub) and includes the installation of:

  • The Julia language (currently version 1.3.1);
  • Libraries and tools such as netCDF, unzip, git;
  • Various Julia packages such as PyPlot (plotting library), NCDatasets (manipulation of netCDF files) and DIVAnd.jl;
  • The most recent version of the DIVAnd notebooks.

All in all, Docker allows one to provide a standardized computing environment to all users and significantly helped the development of the VRE.

How to cite: Troupin, C., Barth, A., Buurman, M., Mieruch, S., Bruvry Lagadec, L., Zamani, T., and Thijsse, P.: Speeding-up data analysis: DIVAnd interpolation tool in the Virtual Research Environment, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13614, https://doi.org/10.5194/egusphere-egu2020-13614, 2020.

The European Open Science Cloud (EOSC) is an initiative launched by the European Commission in 2016, as part of the European Cloud Initiative. EOSC aims to provide a virtual environment with open and seamless services for storage, management, analysis and re-use of research data, across borders and scientific disciplines, leveraging and federating the existing data infrastructures.

Following its launch, several calls have been published and several projects have been funded to develop (parts of) the EOSC, such as, for example, ENVRI-FAIR. For the marine domain a dedicated call was launched as part of ‘The Future of Seas and Oceans Flagship Initiative’, combining the interests of developing a thematic marine EOSC cloud and serving the Blue Economy, Marine Environment and Marine Knowledge agendas.

The winning H2020 Blue-Cloud project is dedicated to marine data management and it is coordinated by Trust-IT with MARIS as technical coordinator. The aims are:

  • To build and demonstrate a Pilot Blue Cloud by combining distributed marine data resources, computing platforms, and analytical services;
  • To develop services for supporting research to better understand & manage the many aspects of ocean sustainability;
  • To develop and validate a number of demonstrators of relevance for marine societal challenges;
  • To formulate a roadmap for expansion and sustainability of the Blue Cloud infrastructure and services.

The project will federate leading European marine data management infrastructures (SeaDataNet, EurOBIS, Euro-Argo, Argo GDAC, EMODnet, ELIXIR-ENA, EuroBioImaging, CMEMS, C3S, and ICOS-Marine) and horizontal e-infrastructures (EUDAT, DIAS, D4Science) to capitalise on what already exists and to develop and deploy the Blue Cloud. The federation will be at the levels of data resources, computing resources and analytical service resources. A Blue Cloud data discovery and access service will be developed to facilitate the sharing of multi-disciplinary datasets with users. A Blue Cloud Virtual Research Environment (VRE) will be established so that computing and analytical services can be shared and combined for specific applications.

This innovation potential will be explored and unlocked by developing five dedicated Demonstrators as Virtual Labs together with excellent marine researchers. There is already a large portfolio of existing services managed by the Blue Cloud founders which will be activated and integrated to serve the Blue-Cloud. 

The modular architecture of the VRE will allow scalability and sustainability for near-future expansions, such as connecting additional infrastructures, implementing more and advanced blue analytical services, configuring more dedicated Virtual Labs, and targeting more (groups of) users.

The presentation will describe the vision of the Blue-Cloud framework, the Blue-Cloud data discovery and access service (to find and retrieve data sets from a diversified array of key marine data infrastructures dealing with physics, biology, biodiversity, chemistry, and bio genomics), the Blue-Cloud VRE (to facilitate collaborative research using a variety of data sets and analytical tools, complemented by generic services such as sub-setting, pre-processing, harmonizing, publishing and visualization). The technical architecture of Blue-Cloud will be presented via 5 real-life use-cases to demonstrate the impact that such innovation can have on science and society.

How to cite: Garavelli, S. and Schaap, D. M. A.: Blue-Cloud: Developing a marine thematic EOSC cloud to explore and demonstrate the potential of cloud based open science in the domain of ocean sustainability, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-16449, https://doi.org/10.5194/egusphere-egu2020-16449, 2020.

EGU2020-20215 | Displays | ESSI1.1

Browser based state-of-the-art software for automated data reduction, quality control and dissemination for marine carbon data

Benjamin Pfeil, Steve Jones, Maren Karlsen, Camilla Stegen Landa, Rocio Castano Primo, Alex Vermeulen, and Oleg Mirzov

Essential Ocean Variable Inorganic Carbon observations collected from instruments at sea are typically processed by individual PIs before being submitted to data centres and other data archives. Often this work is done on an ad-hoc basis using unpublished, self-built software, and published in unique formats. This conflicts with the Interoperability and Reusability aspects of the FAIR data principles: such data require significant reformatting efforts by data centres and/or end users, and reproducibility is impossible without a full record of the processing performed and the QC decisions made by PIs. The manual nature of this process also implies an additional workload for PIs, who need to submit their data to multiple archives/data products. There is a clear need to standardise the data workflow from measurement to publication using common, open-source, documented tools whose algorithms are fully accessible and whose processing is recorded for full transparency.

The Ocean Thematic Centre of the European Research Infrastructure ICOS (Integrated Carbon Observation System) is developing QuinCe, a browser-based tool for uploading, processing, automatic and manual quality control, and publication of data from underway pCO₂ systems on ships and moorings. Data can be uploaded directly from instruments in any text format, where it is standardised and processed using algorithms approved by the scientific community. Automatic QC algorithms can detect many obvious data errors; afterwards PIs can perform full quality control of the data following Standard Operating Procedures and best practices. All records of QC decisions, with enforced explanatory notes, are recorded by the software to enable full traceability and reproducibility. The final QCed dataset can be downloaded by the PI, and is sent to the ICOS Carbon Portal and the SOCAT project for publication. The ICOS Carbon Portal integrates marine data with ICOS data from the ecosystem and atmosphere on a regional scale, and the data are integrated via SOCAT in the annual Global Carbon Budgets of the Global Carbon Project, where they inform policy/decision makers, the scientific community and the general public.
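
To illustrate the kind of automatic check such a tool applies (a generic sketch, not QuinCe's actual code), the snippet below runs a simple range test on underway pCO₂ records and assigns WOCE-style flags together with an enforced explanatory note.

```python
# Generic sketch of an automatic range-check QC step for underway pCO2 data;
# thresholds and flag conventions are illustrative, not QuinCe's implementation.
import pandas as pd

GOOD, QUESTIONABLE, BAD = 2, 3, 4                  # WOCE-style flag values

def range_check(values, lower=100.0, upper=800.0):
    """Flag pCO2 values [uatm] outside a plausible range and record why."""
    flags = pd.Series(GOOD, index=values.index)
    notes = pd.Series("", index=values.index)
    out_of_range = (values < lower) | (values > upper)
    flags[out_of_range] = BAD
    notes[out_of_range] = f"pCO2 outside {lower}-{upper} uatm"
    return flags, notes

df = pd.DataFrame({"pco2": [385.2, 401.7, 1250.0, 72.3]})
df["flag"], df["note"] = range_check(df["pco2"])
print(df)
```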

For platforms with operational data flows, the data are transmitted directly from ship to shore; QuinCe processes, quality controls and publishes near-real-time data to the ICOS Carbon Portal and to the Copernicus Marine Environment Monitoring Service In Situ TAC as soon as they are received, with no human intervention, greatly reducing the time from measurement to data availability.

Full metadata records for instruments are kept and maintained at the ICOS Carbon Portal, utilising existing standardised vocabularies and version control to maintain a complete history. The correct metadata for any given dataset is available at any time, and can be converted to any required format, allowing compliance with the United Nations Sustainable Development Goal 14.3.1 methodology ‘average marine acidity (pH) measured at agreed suite of representative sampling stations’ and ICOS data relevant to SDG 14.3 is distributed to IOC UNESCO’s IODE. While much of this work is currently performed manually, international efforts are underway to develop fully automated systems and these will be integrated as they become available.

How to cite: Pfeil, B., Jones, S., Karlsen, M., Stegen Landa, C., Castano Primo, R., Vermeulen, A., and Mirzov, O.: Browser based state-of-the-art software for automated data reduction, quality control and dissemination for marine carbon data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20215, https://doi.org/10.5194/egusphere-egu2020-20215, 2020.

Marine and ocean data represent a significant resource that can be used to improve the global knowledge of the seas. A huge amount of data is produced every day by ocean observations all around Europe. The ability to leverage this valuable potential depends on the capacity of the already established European (EU) ocean data infrastructures to support new needs in the field of ocean data management and to adopt the emerging technologies.

The SeaDataNet e-infrastructure (https://www.seadatanet.org), built up since the early 2000s, plays an important role for marine scientists and other ocean stakeholder communities, giving access to more than 2.2 million multidisciplinary harmonised marine and ocean data sets, coming mainly from the European seas and collected by more than 110 data centres, and offering data products and metadata services. Thanks to the 4-year SeaDataCloud Horizon 2020 project, started on 1 November 2016, the development of a more efficient electronic infrastructure, kept up to date and offering new services based on cloud and High Performance Computing (HPC) technologies, has been addressed; it has renewed the original SeaDataNet Information Technology (IT) architecture. The collaboration with the EUDAT consortium, composed of a number of research communities and large European computer and data centres, enabled the migration of the data storage and services into the cloud environment. New instruments, such as High-Frequency Radar (HFR), flow cytometer and glider data, have been standardised in agreement with the respective user communities. Furthermore, a Virtual Research Environment will support research collaboration.

The SDN infrastructure is focused on historical digital ocean data and also supports the management of data streams from sensors, based on the Sensor Web Enablement (SWE) standards of the Open Geospatial Consortium (OGC).

Harmonisation of ocean data allows more countries to use the data for scientific research and for decision-making purposes, but data re-use also depends on the trust that the ocean scientific community places in the data. The latter issue involves a well-defined process of data quality checks. In SDN, data producers have to label each individual measurement with a value according to the SDN Quality Check (QC) flags, following the specific procedures presented in the SDN QC guideline (https://www.seadatanet.org/Standards/Data-Quality-Control). Furthermore, a range of checks is carried out on the data as part of the data product generation process to improve the overall quality.
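
The sketch below shows how a data user might act on those per-measurement flags; the flag meanings are paraphrased from the SeaDataNet L20 scheme and should be checked against the guideline linked above.

```python
# Sketch: interpret SeaDataNet-style QC flags and keep only usable measurements.
SDN_QC_FLAGS = {                                   # paraphrased L20 meanings
    0: "no QC performed", 1: "good value", 2: "probably good value",
    3: "probably bad value", 4: "bad value", 5: "changed value",
    6: "below detection", 7: "in excess", 8: "interpolated value", 9: "missing value",
}

def keep_usable(records, accepted=(1, 2)):
    """Keep measurements flagged as good or probably good."""
    return [(value, flag) for value, flag in records if flag in accepted]

profile = [(13.2, 1), (13.4, 2), (99.9, 4), (None, 9)]   # (temperature, SDN flag)
print(keep_usable(profile))
```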

A relevant issue that limits data re-use is that some researchers are reluctant to share their own data; a way to encourage them is to give them the right acknowledgement for the work done by means of data citation. For this reason, a Digital Object Identifier (DOI) minting service is freely available from the SDN portal for every data producer that shares their data. In addition, data versioning is available on the cloud platform for reproducible analysis.

The SeaDataCloud project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement Nº 730960.

How to cite: Pecci, L., Fichaut, M., and Schaap, D.: Enhancing SeaDataNet e-infrastructure for ocean and marine data, new opportunities and challenges to foster data re-use, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20337, https://doi.org/10.5194/egusphere-egu2020-20337, 2020.

EGU2020-21387 | Displays | ESSI1.1

Cultivating a mutually beneficial ocean science data management relationship with Brexit Nations

Andrew Conway, Adam Leadbetter, and Tara Keena

Integration of data management systems is a persistent problem in European projects that span multiple agencies. Months, if not years, of project time are often expended on the integration of disparate database structures, data types, methodologies and outputs. Moreover, this work is usually confined to a single effort, meaning it is needlessly repeated on subsequent projects. The legacy effect of removing these barriers could therefore yield monetary and time savings for all involved, far beyond a single cross-jurisdictional project.

The European Union’s INTERREG VA Programme has funded the COMPASS project to better manage marine protected areas (MPAs) in peripheral areas. Involving five organisations spread across two nations, the project has developed a cross-border network for marine monitoring. Three of those organisations are UK-based and bound for Brexit (the Agri-Food and Biosciences Institute, Marine Scotland Science and the Scottish Association for Marine Science). With that network under construction, significant effort has been placed on harmonising data management processes and procedures between the partners.

A data management quality management framework (DM-QMF) was introduced to guide this harmonisation and ensure adequate quality controls would be enforced. As lead partner on data management, the Irish Marine Institute (MI) initially shared guidelines for infrastructure, architecture and metadata. The implementation of those requirements was then left to the other four partners, with the MI acting as facilitator. This led to the following being generated for each process in the project:

Data management plan: Information on how and what data were to be generated as well as where it would be stored. 

Flow diagrams: Diagrammatic overview of the flow of data through the project. 

Standard Operating Procedures: Detailed explanatory documents on the precise workings of a process.

Data management processes were allowed to evolve naturally out of the need to adhere to this set standard. Organisations were able to work within their operational limitations, without being required to alter their existing procedures, but were encouraged to learn from each other. Very quickly it was found that there were similarities in processes where previously it was thought there were significant differences. This process of sharing data management information has created mutually beneficial synergies and enabled the convergence of procedures within the separate organisations.

The downstream data management synergies that COMPASS has produced have already taken effect. Sister INTERREG VA projects, SeaMonitor and MarPAMM, have felt the benefits. The same data management systems cultivated as part of the COMPASS project are being reused, while the groundwork in creating strong cross-border channels of communication and cooperation is saving significant amounts of time in project coordination.

Through data management, personal and institutional relationships have been strengthened, both of which should persist beyond the project terminus in 2021, well into a post-Brexit Europe. The COMPASS project has been an exemplar of how close collaboration can persist and thrive in a changing political environment, in spite of the ongoing uncertainty surrounding Brexit.

How to cite: Conway, A., Leadbetter, A., and Keena, T.: Cultivating a mutually beneficial ocean science data management relationship with Brexit Nations, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21387, https://doi.org/10.5194/egusphere-egu2020-21387, 2020.

EGU2020-21908 | Displays | ESSI1.1

EMODnet: FAIR and open source marine data, digital products and services

Jan-Bart Calewaert, Kate Larkin, Conor Delaney, Andree Anne Marsan, and Tim Collart

Unlocking the potential of big ocean data relies on Findable, Accessible, Interoperable and Reusable (FAIR) data. This is a core principle of the European Marine Observation and Data network (EMODnet), a leading long-term marine data service provider funded by the EU. Over 150 organizations deliver harmonized data through seven portals spanning bathymetry, geology, physics, chemistry, biology, seabed habitats and human activities, together with a central portal. Recent data and data products include a high-resolution digital terrain model bathymetry product, digital vessel density maps, marine litter maps and products on geological features. International use cases include sustainable fisheries management and offshore wind farm development. The EMODnet Data Ingestion Service enhances data sharing, and the EMODnet Associated Partnership Scheme offers benefits for industry and wider stakeholders. Increasingly, EMODnet is interacting and collaborating with other key marine data initiatives in Europe and globally. This includes collaborations with the Copernicus Marine Service (CMEMS), SeaDataCloud and others to develop the pilot Blue Cloud as a marine component of the European Open Science Cloud (EOSC), as well as with China, the USA and international organisations such as IODE/IOC. This contribution will provide an update on EMODnet developments, with a future outlook considering the main challenges and opportunities, and will touch upon key collaborations with other marine data initiatives in Europe and globally.

How to cite: Calewaert, J.-B., Larkin, K., Delaney, C., Marsan, A. A., and Collart, T.: EMODnet: FAIR and open source marine data, digital products and services, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21908, https://doi.org/10.5194/egusphere-egu2020-21908, 2020.

ESSI1.12 – Innovative Evaluation Frameworks and Platforms for Weather and Climate Research

EGU2020-729 | Displays | ESSI1.12 | Highlight

A Conceptual Framework for Modelling the Climate Change and its Impacts within a River Basin using Remote Sensing data

Sathyaseelan Mayilvahanam, Sanjay Kumar Ghosh, and Chandra Shekhar Prasad Ojha

Abstract

In general, modelling climate change and its impacts within a hydrological unit builds an understanding of the system and of its behaviour under various model constraints. Climate change and global warming studies are still in a research and development phase because of their complex and dynamic nature. The IPCC 5th Assessment Report on global warming states that in the 21st century there may be an increase in temperature of the order of ~1.5°C. This transient climate may cause significant impacts or discrepancies in the water availability of a hydrological unit, which may lead to severe impacts in countries with large populations such as India and China. Remote sensing datasets play an essential role in modelling climatic changes for a river basin at different spatial and temporal scales. This study aims to propose a conceptual framework for the above-defined problem, with an emphasis on remote sensing datasets. The framework involves five entities: the data component, the process component, the impact component, the feedback component and the uncertainty component.

The framework flow begins with the data component, which involves two significant inputs: hydro-meteorological data and land-hydrology data. The essential attributes of the hydro-meteorological data are precipitation, temperature, relative humidity, wind speed and solar radiation. These datasets may be obtained from in-situ or satellite-based measurements and analysed with empirical or statistical methods. Mathematical models applied to long historical climate records may provide knowledge on climate change detection and trends. The meteorological data derived from satellites may have a measurable bias with respect to the in-situ data. The satellite-based land-hydrology data component involves various attributes such as topography, soil, vegetation, water bodies, other land use / land cover, soil moisture and evapotranspiration. The process component involves complex land-hydrology processes that may be well established and modelled by customizable hydrological models; here, we emphasise the use of remote-sensing-based model parameter values in the equations, either directly or indirectly. The land-atmosphere process component likewise involves various complex processes taking place in this zone, which may be well established and solved by customizable atmospheric weather models. The land components play a significant role in modelling climate change, because land processes may trigger global warming through various anthropogenic agents.

The main objective of this framework is to emphasise climate change impacts using remote sensing; hence, the impact component plays an essential role in this conceptual framework. The climate change impact within a river basin at various spatial and temporal scales is identified using different hydrological responses. The feedback entity is the most sensitive part of this framework, because it may alter the climate forcing either positively or negatively. An uncertainty component handles the uncertainty in the model framework. The highlight of this conceptual framework is the use of remote sensing datasets in climate change studies. A limitation is that verifying the correctness of the remote sensing data against in-situ data at every location is not feasible.
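
Where the framework notes a measurable bias between satellite-derived and in-situ meteorological data, the kind of pre-processing step implied can be illustrated as below (synthetic numbers, purely for illustration).

```python
# Illustrative bias check and simple additive correction of a satellite estimate
# against co-located in-situ gauges (synthetic daily rainfall values).
import numpy as np

gauge = np.array([2.1, 0.0, 5.4, 12.3, 3.8])        # in-situ daily rainfall [mm]
satellite = np.array([2.9, 0.4, 6.1, 10.8, 4.9])    # co-located satellite estimate [mm]

bias = np.mean(satellite - gauge)                   # mean additive bias
corrected = satellite - bias                        # bias-removed series
rmse = np.sqrt(np.mean((corrected - gauge) ** 2))
print(f"bias = {bias:.2f} mm/day, RMSE after correction = {rmse:.2f} mm/day")
```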

How to cite: Mayilvahanam, S., Ghosh, S. K., and Ojha, C. S. P.: A Conceptual Framework for Modelling the Climate Change and its Impacts within a River Basin using Remote Sensing data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-729, https://doi.org/10.5194/egusphere-egu2020-729, 2020.

EGU2020-1612 | Displays | ESSI1.12

Building Web Processing Services with Birdhouse

Carsten Ehbrecht, Stephan Kindermann, Ag Stephens, and David Huard

The Web Processing Service (WPS) is an OGC interface standard to provide processing tools as Web Service.
The WPS interface standardizes the way processes and their inputs/outputs are described,
how a client can request the execution of a process, and how the output from a process is handled.

Birdhouse tools enable you to build your own customised WPS compute service
in support of remote climate data analysis.

Birdhouse offers you:

  • A Cookiecutter template to create your own WPS compute service.
  • An Ansible script to deploy a full-stack WPS service.
  • A Python library, Birdy, suitable for Jupyter notebooks to interact with WPS compute services.
  • An OWS security proxy, Twitcher, to provide access control to WPS compute services.

Birdhouse uses the PyWPS Python implementation of the Web Processing Service standard.
PyWPS is part of the OSGeo project.

The Birdhouse tools are used by several partners and projects.
A Web Processing Service will be used in the Copernicus Climate Change Service (C3S) to provide subsetting
operations on climate model data (CMIP5, CORDEX) as a service to the Climate Data Store (CDS).
The Canadian non-profit organization Ouranos is using a Web Processing Service to provide climate index
calculations to be used remotely from Jupyter notebooks.

In this session we want to show how a Web Processing Service can be used with the Freva evaluation system.
Freva plugins can be made available as processes in a Web Processing Service. These plugins can be run
using a standard WPS client from a terminal and Jupyter notebooks with remote access to the Freva system.
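
A minimal sketch of what such a remote call looks like from a notebook with Birdy is shown below; the service URL and the process name are hypothetical, since the methods exposed by WPSClient depend on the processes a given service offers.

```python
# Hedged sketch: call a WPS process from a Jupyter notebook with Birdy.
# Both the endpoint URL and the "subset_bbox" process are hypothetical examples.
from birdy import WPSClient

wps = WPSClient("https://example.org/wps")          # hypothetical WPS endpoint
resp = wps.subset_bbox(                             # hypothetical process name
    resource="https://example.org/data/tas_EUR-11.nc",
    lon0=5.0, lon1=15.0, lat0=45.0, lat1=55.0)
output_url, = resp.get()                            # URL of the resulting NetCDF file
print(output_url)
```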

We want to emphasise the integrational aspects of the Birdhouse tools: supporting existing processing frameworks
to add a standardized web service for remote computation.

Links:

  • http://bird-house.github.io
  • http://pywps.org
  • https://www.osgeo.org/
  • http://climate.copernicus.eu
  • https://www.ouranos.ca/en
  • https://freva.met.fu-berlin.de/

How to cite: Ehbrecht, C., Kindermann, S., Stephens, A., and Huard, D.: Building Web Processing Services with Birdhouse, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1612, https://doi.org/10.5194/egusphere-egu2020-1612, 2020.

EGU2020-3501 | Displays | ESSI1.12 | Highlight

The Arctic Ocean Observation Operator for 6.9 GHz (ARC3O)

Clara Burgard, Dirk Notz, Leif T. Pedersen, and Rasmus T. Tonboe

The diversity in sea-ice concentration observational estimates retrieved from brightness temperatures measured from space is a challenge for our understanding of past and future sea-ice evolution as it inhibits reliable climate model evaluation and initialisation. To address this challenge, we introduce a new tool: the Arctic Ocean Observation Operator (ARC3O). 

ARC3O allows us to simulate brightness temperatures at 6.9 GHz at vertical polarisation from standard output of an Earth System Model to be compared to observations from space at this frequency. We use simple temperature and salinity profiles inside the snow and ice column based on the output of the Earth System Model to compute these brightness temperatures. 

In this study, we evaluate ARC3O by simulating brightness temperatures based on three assimilation runs of the MPI Earth System Model (MPI-ESM) assimilated with three different sea-ice concentration products. We then compare these three sets of simulated brightness temperatures to brightness temperatures measured by the Advanced Microwave Scanning Radiometer Earth Observing System (AMSR-E) from space. We find that they differ up to 10 K in the period between October and June, depending on the region and the assimilation run. However, we show that these discrepancies between simulated and observed brightness temperature can be mainly attributed to the underlying observational uncertainty in sea-ice concentration and, to a lesser extent, to the data assimilation process, rather than to biases in ARC3O itself. In summer, the discrepancies between simulated and observed brightness temperatures are larger than in winter and locally reach up to 20 K. This is caused by the very large observational uncertainty in summer sea-ice concentration but also by the melt-pond parametrisation in MPI-ESM, which is not necessarily realistic. 

ARC3O is therefore capable of realistically translating the simulated Arctic Ocean climate state into one observable quantity for a more comprehensive climate model evaluation and initialisation, an exciting perspective for further developing this and similar methods.

How to cite: Burgard, C., Notz, D., Pedersen, L. T., and Tonboe, R. T.: The Arctic Ocean Observation Operator for 6.9 GHz (ARC3O), EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-3501, https://doi.org/10.5194/egusphere-egu2020-3501, 2020.

EGU2020-4658 | Displays | ESSI1.12

Integrating e-infrastructures for remote climate data processing

Christian Pagé, Wim Som de Cerff, Maarten Plieger, Alessandro Spinuso, Iraklis Klampanos, Malcolm Atkinson, and Vangelis Karkaletsis

Accessing and processing large volumes of climate data has nowadays become a particularly challenging task for end users, due to the rapidly increasing volumes being produced and made available. Access to climate data is crucial for sustaining research and performing climate change impact assessments. These activities have a strong societal impact, as climate change affects almost all economic and social sectors and requires them to adapt.

The whole climate data archive is expected to reach a volume of 30 PB in 2020 and up to 2000 PB in 2024 (estimated), evolving from 0.03 PB (30 TB) in 2007 and 2 PB in 2014. Data processing and analysis must now take place remotely for the users: users typically have to rely on heterogeneous infrastructures and services between the data and their physical location. Developers of Research Infrastructures have to provide services to those users, hence having to define standards and generic services to fulfil those requirements.

It will be shown how the DARE eScience Platform (http://project-dare.eu) will help developers to develop needed services more quickly and transparently for a large range of scientific researchers. The platform is designed for efficient and traceable development of complex experiments and domain-specific services. Most importantly, the DARE Platform integrates the following e-infrastructure services: the climate IS-ENES (https://is.enes.org) Research Infrastructure front-end climate4impact (C4I: https://climate4impact.eu), the EUDAT CDI (https://www.eudat.eu/eudat-collaborative-data-infrastructure-cdi) B2DROP Service, as well as the ESGF (https://esgf.llnl.gov). The DARE Platform itself can be deployed by research communities on local, public or commercial clouds, thanks to its containerized architecture.

More specifically, two distinct use cases for the climate science domain will be presented. The first will show how an open-source software package to compute climate indices and indicators (icclim: https://github.com/cerfacs-globc/icclim) is leveraged using the DARE Platform to enable users to build their own workflows. The second use case will demonstrate how more complex tools, such as an extra-tropical and tropical cyclone tracking software (https://github.com/cerfacs-globc/cyclone_tracking), can easily be made available to end users by infrastructure and front-end software developers.
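
For orientation, the snippet below sketches the kind of computation icclim performs, here the ETCCDI summer-days index (SU) from daily maximum temperature; the exact function and argument names vary between icclim releases, and the file names are placeholders.

```python
# Hedged sketch of an icclim call computing the ETCCDI "summer days" index (SU);
# argument names should be checked against the installed icclim version.
import icclim

icclim.index(
    index_name="SU",                                 # days with tasmax > 25 degC
    in_files="tasmax_day_EUR-11_1971-2000.nc",       # placeholder input file
    var_name="tasmax",
    slice_mode="year",                               # one value per year
    out_file="SU_1971-2000.nc",                      # placeholder output file
)
```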

How to cite: Pagé, C., Som de Cerff, W., Plieger, M., Spinuso, A., Klampanos, I., Atkinson, M., and Karkaletsis, V.: Integrating e-infrastructures for remote climate data processing, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-4658, https://doi.org/10.5194/egusphere-egu2020-4658, 2020.

EGU2020-8543 | Displays | ESSI1.12

CDOs for CMIP6 and Climate Extremes Indices

Fabian Wachsmann

The Climate Data Operators [1] tool kit (CDO) is a widely used infrastructure software developed and maintained at the Max Planck Institute for Meteorology (MPI-M). It comprises a large number of command-line operators for gridded data, including statistics, interpolation and arithmetic. Users benefit from the extensive support facilities provided by the MPI-M and the DKRZ.

As a part of the sixth phase of the Coupled Model Intercomparison Project (CMIP6), the German Federal Ministry of Education and Research (BMBF) is funding activities promoting the use of the CDOs for CMIP6 data preparation and analysis.  

The operator ‘cmor’ has been developed to enable users to prepare their data according to the CMIP6 data standard. It is part of the web-based CMIP6 post-processing infrastructure [2] which is developed at DKRZ and used by different Earth System Models. The CDO metadata and its data model have been expanded to include the CMIP6 data standard so that users can use the tool for project data evaluation.

As a second activity, operators for 27 climate extremes indices, which were defined by the Expert Team on Climate Change Detection and Indices (ETCCDI), have been integrated into the tool. As with CMIP5, the ETCCDI climate extremes indices will be part of CMIP6 model analyses due to their robustness and straightforward interpretation.
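
As an illustration of how such indices can be computed, the snippet below calls one of the long-standing ECA/ETCCDI operators through the CDO Python bindings (the separate "cdo" package); the file names are placeholders.

```python
# Sketch: compute the ETCCDI frost-days index with the CDO operator eca_fd,
# driven from Python via the cdo bindings. File names are placeholders.
from cdo import Cdo

cdo = Cdo()
cdo.eca_fd(input="tasmin_day_model_1971-2000.nc",
           output="frost_days_1971-2000.nc")
```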

This contribution provides an insight into advanced CDO application and offers ideas for post-processing optimization. 

[1] Schulzweida, U. (2019): CDO user guide. code.mpimet.mpg.de/projects/cdo , last access: 01.13.2020.

[2] Schupfner, M. (2020):  The CMIP6 Data Request WebGUI. c6dreq.dkrz.de , last access: 01.13.2020.

How to cite: Wachsmann, F.: CDOs for CMIP6 and Climate Extremes Indices, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8543, https://doi.org/10.5194/egusphere-egu2020-8543, 2020.

An important aspect of an Earth Systems Science Prediction System (ESSPS) is to describe and predict the behaviour of contaminants in different environmental compartments following severe accidents at chemical and nuclear installations. Such an ESSPS could be designed as a platform allowing the integration of models describing atmospheric, hydrological and oceanographic processes, the physical-chemical transformation of the pollutants in the environment, the contamination of the food chain, and finally the overall exposure of the population to harmful substances. Such a chain of connected simulation models, needed to describe the consequences of severe accidents in the different phases of an emergency, should use different input data ranging from real-time online meteorological data to long-term numerical weather prediction or ocean data.

One example of an ESSPS is the Decision Support Systems JRODOS for off-site emergency management after nuclear emergencies. It integrates many different simulation models, real-time monitoring, regional GIS information, source term databases, and geospatial data for population and environmental characteristics.

The development of the system started in 1992, supported by the European Commission’s RTD Framework programmes. As it attracted more and more end users, the technical basis of the system had to be considerably improved. For this, Java was selected as a high-level software language suitable for the development of distributed, cross-platform, enterprise-quality applications. On the other hand, a great deal of scientific computational software is available only as C/C++/FORTRAN packages. Moreover, it is a common scenario that some outputs of model A should act as inputs of model B, but the two models do not share common exchange containers and/or are written in different programming languages.

To combine the flexibility of the Java language with the speed and availability of scientific codes, and to be able to connect different computational codes into one chain of models, the notion of distributed wrapper objects (DWO) has been introduced. DWO provides logical, visual and technical means for the integration of computational models into the core of the system, even if the models and the system use different programming languages. The DWO technology allows various levels of interactivity, including pull- and push-driven chains, user interaction support, and sub-model calls. All DWO data exchange is realized in memory and does not involve disk I/O operations, thus eliminating redundant reader/writer code and minimizing slow disk access. These features bring more stability and performance to an ESSPS that is used for decision support.
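
The general pattern can be illustrated in a language-agnostic way; the Python sketch below (not the actual Java/DWO implementation) shows two wrapped models chained so that the outputs of model A are handed to model B entirely in memory, without intermediate files.

```python
# Illustrative sketch of the wrapper-object idea (not the actual DWO code):
# each wrapper adapts a computational code to a common in-memory exchange container.
from typing import Callable, Dict

class ModelWrapper:
    """Adapts a model's native interface to a shared dict-based container."""
    def __init__(self, run_fn: Callable[[Dict[str, float]], Dict[str, float]]):
        self.run_fn = run_fn

    def run(self, inputs: Dict[str, float]) -> Dict[str, float]:
        return self.run_fn(inputs)

# Hypothetical stand-ins for wrapped dispersion and deposition codes
def dispersion_model(inp):
    return {"air_concentration": inp["release_rate"] / max(inp["wind_speed"], 0.1)}

def deposition_model(inp):
    return {"ground_deposition": 0.01 * inp["air_concentration"]}

chain = [ModelWrapper(dispersion_model), ModelWrapper(deposition_model)]

state = {"release_rate": 5.0e6, "wind_speed": 4.0}   # illustrative units: Bq/s, m/s
for model in chain:
    state.update(model.run(state))                   # outputs of A feed B, in memory

print(state["ground_deposition"])
```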

The current status of the DWO realization in JRODOS is presented focusing on the added value compared to traditional integration of different simulation models into one system.

How to cite: Trybushnyi, D., Raskob, W., Ievdin, I., Müller, T., Pylypenko, O., and Zheleznyak, M.: Flexible Java based platform for integration of models and datasets in Earth Systems Science Prediction Systems: methodology and implementation for predicting spreading of radioactive contamination from accidents, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9532, https://doi.org/10.5194/egusphere-egu2020-9532, 2020.

EGU2020-13105 | Displays | ESSI1.12

Web-based post-processing workflow composition for CMIP6

Martin Schupfner and Fabian Wachsmann

CMIP6 defines a data standard as well as a data request (DReq) in order to facilitate analysis across results from different climate models. For most model output, post-processing is required to make it CMIP6 compliant. The German Federal Ministry of Education and Research (BMBF) is funding a project [1] providing services which help with the production of quality-assured CMIP6 compliant data according to the DReq. 

 

In that project, a web-based GUI [2] has been developed which guides the modelers through the different steps of the data post-processing workflow, allowing to orchestrate the aggregation, diagnostic and standardizing of the model data in a modular manner. Therefor the website provides several functionalities:
1. A DReq generator, based on Martin Juckes’ DreqPy API [3], can be used to tailor the DReq according to the envisaged experiments and supported MIPs. Moreover, the expected data volume can be calculated.

2. The mapping between variables of the DReq and of the raw model output can be specified. These specifications (model variable names, units, etc.) may include diagnostic algorithms and are stored in a database. 

3. The variable mapping information can be retrieved as a mapping table (MT). Additionally, this information can be used to create post-processing script fragments. One of the script fragments contains processing commands based on the diagnostic algorithms entered into the mapping GUI, whereas the other rewrites the (diagnosed) data in a CMIP6 compliant format. Both script fragments use the CDO tool kit [4] developed at the Max Planck Institute for Meteorology, namely the CDO expr and cmor [5] operators. The latter makes use of the CMOR3 library [6] and parses the MT. The script fragments are meant to be integrated into CMIP6 data workflows or scripts. A template for such a script, that allows for a modular and flexible process control of the single workflow steps, will be included when downloading the script fragments.

4. User specific metadata can be generated, which supply the CDO cmor operator with the required and correct metadata as specified in the CMIP6 controlled vocabulary (CV).
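
The DReq generator described in point 1 builds on the dreqPy API; a minimal sketch of loading and querying the data request is shown below, with section labels taken from the dreqPy documentation and to be checked against the installed version.

```python
# Hedged sketch: load the CMIP6 data request with dreqPy and count some sections;
# the section labels ('mip', 'CMORvar') follow the dreqPy documentation.
from dreqPy import dreq

dq = dreq.loadDreq()                     # load the bundled data request XML
print(len(dq.coll['mip'].items))         # number of MIPs in the request
print(len(dq.coll['CMORvar'].items))     # number of requested CMOR variables
```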

 

[1] National CMIP6 Support Activities. https://www.dkrz.de/c6de , last access 9.1.2020.

[2] Martin Schupfner (2018): CMIP6 Data Request WebGUI. https://c6dreq.dkrz.de/ , last access 9.1.2020.

[3] Martin Juckes (2018): Data Request Python API. Vers. 01.00.28. http://proj.badc.rl.ac.uk/svn/exarch/CMIP6dreq/tags/latest/dreqPy/docs/dreqPy.pdf , last access 9.1.2020.  

[4] Uwe Schulzweida (2019): CDO User Guide. Climate Data Operators. Vers. 1.9.8. https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf , last access 9.1.2020.

[5] Fabian Wachsmann (2017): The cdo cmor operator. https://code.mpimet.mpg.de/attachments/19411/cdo_cmor.pdf , last access 9.1.2020.

[6] Denis Nadeau (2018): CMOR version 3.3. https://cmor.llnl.gov/pdf/mydoc.pdf , last access 9.1.2020.

How to cite: Schupfner, M. and Wachsmann, F.: Web-based post-processing workflow composition for CMIP6, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13105, https://doi.org/10.5194/egusphere-egu2020-13105, 2020.

EGU2020-13306 | Displays | ESSI1.12 | Highlight

CMIP model evaluation with the ESMValTool v2.0

Axel Lauer, Fernando Iglesias-Suarez, Veronika Eyring, and the ESMValTool development team

The Earth System Model Evaluation Tool (ESMValTool) has been developed with the aim of taking model evaluation to the next level by facilitating analysis of many different ESM components, providing well-documented source code and scientific background of implemented diagnostics and metrics and allowing for traceability and reproducibility of results (provenance). This has been made possible by a lively and growing development community continuously improving the tool supported by multiple national and European projects. The latest version (2.0) of the ESMValTool has been developed as a large community effort to specifically target the increased data volume of the Coupled Model Intercomparison Project Phase 6 (CMIP6) and the related challenges posed by analysis and evaluation of output from multiple high-resolution and complex ESMs. For this, the core functionalities have been completely rewritten in order to take advantage of state-of-the-art computational libraries and methods to allow for efficient and user-friendly data processing. Common operations on the input data such as regridding or computation of multi-model statistics are now centralized in a highly optimized preprocessor written in Python. The diagnostic part of the ESMValTool includes a large collection of standard recipes for reproducing peer-reviewed analyses of many variables across atmosphere, ocean, and land domains, with diagnostics and performance metrics focusing on the mean-state, trends, variability and important processes, phenomena, as well as emergent constraints. While most of the diagnostics use observational data sets (in particular satellite and ground-based observations) or reanalysis products for model evaluation some are also based on model-to-model comparisons. This presentation introduces the diagnostics newly implemented into ESMValTool v2.0 including an extended set of large-scale diagnostics for quasi-operational and comprehensive evaluation of ESMs, new diagnostics for extreme events, regional model and impact evaluation and analysis of ESMs, as well as diagnostics for emergent constraints and analysis of future projections from ESMs. The new diagnostics are illustrated with examples using results from the well-established CMIP5 and the newly available CMIP6 data sets.
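
As a hedged illustration of the centralized preprocessor, the snippet below applies two ESMValCore preprocessor functions to a single CMIP file outside of a recipe; function and argument names follow the ESMValCore preprocessor module, and the file name is a placeholder.

```python
# Hedged sketch: apply ESMValCore preprocessor functions directly to one file.
# In normal use these steps are declared in an ESMValTool recipe; argument names
# should be checked against the installed ESMValCore version.
import iris
from esmvalcore.preprocessor import regrid, area_statistics

cube = iris.load_cube("tas_Amon_MODEL_historical_r1i1p1f1_185001-201412.nc")  # placeholder
cube = regrid(cube, target_grid="2x2", scheme="linear")   # put data on a common grid
global_mean = area_statistics(cube, operator="mean")      # area-weighted global mean
print(global_mean.shape)
```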

How to cite: Lauer, A., Iglesias-Suarez, F., Eyring, V., and development team, T. E.: CMIP model evaluation with the ESMValTool v2.0, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13306, https://doi.org/10.5194/egusphere-egu2020-13306, 2020.

EGU2020-14745 | Displays | ESSI1.12

ESMValTool pre-processing functions for eWaterCycle

Fakhereh Alidoost, Jerom Aerts, Bouwe Andela, Jaro Camphuijsen, Nick van De Giesen, Gijs van Den Oord, Niels Drost, Yifat Dzigan, Ronald van Haren, Rolf Hut, Peter C. Kalverla, Inti Pelupessy, Stefan Verhoeven, Berend Weel, and Ben van Werkhoven

eWaterCycle is a framework in which hydrological modelers can work together in a collaborative environment. In this environment, they can, for example, compare and analyze the results of models that use different sources of (meteorological) forcing data. The final goal of eWaterCycle is to advance the state of FAIR (Findable, Accessible, Interoperable, and Reusable) and open science in hydrological modeling.

Comparing hydrological models has always been a challenging task. Hydrological models exhibit great complexity and diversity in the exact methodologies applied, competing hypotheses of hydrologic behavior, technology stacks, and programming languages used in those models. Pre-processing of forcing data is one of the roadblocks that was identified during the FAIR Hydrological Modelling workshop organized by the Lorentz Center in April 2019. Forcing data can be retrieved from a wide variety of sources with discrepant variable names and frequencies, and spatial and temporal resolutions. Moreover, some hydrological models make specific assumptions about the definition of the forcing variables. The pre-processing is often performed by various sets of scripts that may or may not be included with model source codes, making it hard to reproduce results. Generally, there are common steps in the data preparation among different models. Therefore, it would be a valuable asset to the hydrological community if the pre-processing of FAIR input data could also be done in a FAIR manner.

Within the context of the eWaterCycle II project, a common pre-processing system has been created for hydrological modeling based on ESMValTool (Earth System Model Evaluation Tool). ESMValTool is a community diagnostic and performance metrics tool developed for the evaluation of Earth system models. The ESMValTool pre-processing functions cover a broad range of operations on data before diagnostics or metrics are applied; for example, vertical interpolation, land-sea masking, re-gridding, multi-model statistics, temporal and spatial manipulations, variable derivation and unit conversion. The pre-processor performs these operations in a centralized, documented and efficient way. The current pre-processing pipeline of eWaterCycle using ESMValTool consists of hydrological model-specific recipes and supports ERA5 and ERA-Interim data provided by the ECMWF (European Centre for Medium-Range Weather Forecasts). The pipeline starts with the downloading and CMORization (Climate Model Output Rewriter) of input data. Then a recipe is prepared to find the data and run the preprocessors. When ESMValTool runs a recipe, it will also run the diagnostic script that contains model-specific analysis to derive required forcing variables, and it will store provenance information to ensure transparency and reproducibility. In the near future, the pipeline will be extended to include Earth observation data, as these data are paramount to the data assimilation in eWaterCycle.

In this presentation we will show how using the pre-processor from ESMValTool for hydrological modeling connects hydrology and climate sciences and increases the impact and sustainability of ESMValTool.

How to cite: Alidoost, F., Aerts, J., Andela, B., Camphuijsen, J., van De Giesen, N., van Den Oord, G., Drost, N., Dzigan, Y., van Haren, R., Hut, R., Kalverla, P. C., Pelupessy, I., Verhoeven, S., Weel, B., and van Werkhoven, B.: ESMValTool pre-processing functions for eWaterCycle, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-14745, https://doi.org/10.5194/egusphere-egu2020-14745, 2020.

EGU2020-17472 | Displays | ESSI1.12

ESMValCore: analyzing CMIP data made easy

Bouwe Andela, Lisa Bock, Björn Brötz, Faruk Diblen, Laura Dreyer, Niels Drost, Paul Earnshaw, Veronika Eyring, Birgit Hassler, Nikolay Koldunov, Axel Lauer, Bill Little, Saskia Loosveldt-Tomas, Lee de Mora, Valeriu Predoi, Mattia Righi, Manuel Schlund, Javier Vegas-Regidor, and Klaus Zimmermann

The Earth System Model Evaluation Tool (ESMValTool) is a free and open-source community diagnostic and performance metrics tool for the evaluation of Earth system models participating in the Coupled Model Intercomparison Project (CMIP). Version 2 of the tool (Righi et al. 2019, www.esmvaltool.org) features a brand new design, consisting of ESMValCore (https://github.com/esmvalgroup/esmvalcore), a package for working with CMIP data, and ESMValTool (https://github.com/esmvalgroup/esmvaltool), a package containing the scientific analysis scripts. This new version has been specifically developed to handle the increased data volume of CMIP Phase 6 (CMIP6) and the related challenges posed by the analysis and the evaluation of output from multiple high-resolution or complex Earth system models. The tool also supports CMIP5 and CMIP3 datasets, as well as a large number of re-analysis and observational datasets that can be formatted according to the same standards (CMOR) on-the-fly or through scripts currently included in the ESMValTool package.

At the heart of this new version is the ESMValCore software package, which provides a configurable framework for finding CMIP files using a “data reference syntax”, applying commonly used pre-processing functions to them, running analysis scripts, and recording provenance. Numerous pre-processing functions, e.g. for data selection, regridding, and statistics are readily available and the modular design makes it easy to add more. The ESMValCore package is easy to install with relatively few dependencies, written in Python 3, based on state-of-the-art open-source libraries such as Iris and Dask, and widely used standards such as YAML, NetCDF, CF-Conventions, and W3C PROV. An extensive set of automated tests and code quality checks ensure the reliability of the package. Documentation is available at https://esmvaltool.readthedocs.io.

The ESMValCore package uses human-readable recipes to define which variables and datasets to use, how to pre-process that data, and what scientific analysis scripts to run. The package provides convenient interfaces, based on the YAML and NetCDF/CF-convention file formats, for running diagnostic scripts written in any programming language. Because the ESMValCore framework takes care of running the workflow defined in the recipe in parallel, most analyses run much faster, with no additional programming effort required from the authors of the analysis scripts. For example, benchmarks show a factor of 30 speedup with respect to version 1 of the tool for a representative recipe on a 24-core machine. A large collection of standard recipes and associated analysis scripts is available in the ESMValTool package for reproducing selected peer-reviewed analyses. The ESMValCore package can also be used with any other script that implements its easy-to-use interface. All pre-processing functions of the ESMValCore can also be used directly from any Python program. These features allow for use by a wide community of scientific users and developers with different levels of programming skills and experience.
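As a hedged illustration of the direct Python use mentioned above, the sketch below applies two ESMValCore preprocessor functions to an Iris cube; the file name, grid and region are placeholders, and exact argument names may differ between ESMValCore versions.

```python
# Sketch of calling ESMValCore preprocessor functions directly from Python.
# File name and parameter values are placeholders; check the ESMValCore
# documentation (https://esmvaltool.readthedocs.io) for the exact signatures
# of the installed version.
import iris
from esmvalcore.preprocessor import regrid, extract_region

cube = iris.load_cube("tas_Amon_MODEL_historical_r1i1p1f1_gn_185001-201412.nc")

# Regrid to a regular 2x2 degree grid using bilinear interpolation.
cube = regrid(cube, target_grid="2x2", scheme="linear")

# Cut out a North Atlantic box (degrees east / north; illustrative values).
cube = extract_region(cube,
                      start_longitude=280.0, end_longitude=350.0,
                      start_latitude=20.0, end_latitude=70.0)

print(cube.summary(shorten=True))
```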

Future plans involve extending the public Python API (application programming interface) from just preprocessor functions to include all functionality, including finding the data and running diagnostic scripts. This would make ESMValCore suitable for interactive data exploration from a Jupyter Notebook.

How to cite: Andela, B., Bock, L., Brötz, B., Diblen, F., Dreyer, L., Drost, N., Earnshaw, P., Eyring, V., Hassler, B., Koldunov, N., Lauer, A., Little, B., Loosveldt-Tomas, S., de Mora, L., Predoi, V., Righi, M., Schlund, M., Vegas-Regidor, J., and Zimmermann, K.: ESMValCore: analyzing CMIP data made easy, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17472, https://doi.org/10.5194/egusphere-egu2020-17472, 2020.

EGU2020-18454 | Displays | ESSI1.12

A simulator for the CLARA-A2 cloud climate data record and its application to assess EC-Earth polar cloudiness

Salomon Eliasson, Karl Göran Karlsson, and Ulrika Willén

One of the primary purposes of satellite simulators is to emulate the inability of retrievals, based on visible and infrared sensors, to detect subvisible clouds from space by removing them from the model. The current simulators in the COSP rely on a single visible cloud optical depth (τ)-threshold (τ=0.3) applied globally to delineate cloudy and cloud-free conditions. However, in reality, the cloud sensitivity of a retrieval varies regionally.

This presentation describes the satellite simulator for the CLARA-A2 climate data record (CDR). The CLARA simulator takes into account the variable skill in cloud detection of the CLARA-A2 CDR using long/lat-gridded values separated by daytime and nighttime, which enable it to filter out clouds from climate models that would be undetectable by observations. We introduce two methods of cloud mask simulation, one that depends on a spatially variable τ-threshold and one that uses the cloud probability of detection (POD) as a function of the model τ and long/lat. The gridded POD values are from the CLARA-A2 validation study by Karlsson and Hakansson (2018).

Both methods replicate the relative ease or difficulty for cloud retrievals, depending on the region and illumination. They increase the cloud sensitivity where the cloud retrievals are relatively straightforward, such as over mid-latitude oceans, and they decrease the sensitivity where cloud retrievals are notoriously tricky, such as where thick clouds may be inseparable from cold, snow-covered surfaces, as well as in areas with an abundance of broken and small-scale cumulus clouds such as the atmospheric subsidence regions over the ocean.

The CLARA simulator, together with the International Satellite Cloud Climatology Project (ISCCP) simulator of the COSP, is used to assess Arctic clouds in the EC-Earth climate model compared to the CLARA-A2 and ISCCP H-Series CDRs. Compared to CLARA-A2, EC-Earth generally underestimates cloudiness in the Arctic. However, compared to ISCCP and its simulator, the opposite conclusion is reached. Based on EC-Earth, this paper shows that the simulated cloud mask of CLARA-A2 is more representative of the CDR than using a global optical depth threshold, such as used by the ISCCP simulator. The simulator substantially improves the simulation of the CLARA-A2-detected clouds compared to a global optical depth threshold, especially in the polar regions, by accounting for the variable cloud detection skill over the year.
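A schematic numpy sketch of the second method described above, in which a gridded probability of detection decides which model clouds would be seen by the retrieval; all arrays and values are synthetic, and this is not the simulator's actual code.

```python
# Schematic illustration of a POD-based cloud mask filter: model clouds are
# kept only with the probability that the satellite retrieval would have
# detected them at that grid cell and optical depth. All values are synthetic.
import numpy as np

rng = np.random.default_rng(42)

nlat, nlon = 90, 180
model_tau = rng.gamma(shape=1.5, scale=2.0, size=(nlat, nlon))  # synthetic cloud optical depth
model_cloudy = model_tau > 0.01                                  # synthetic model cloud mask

# Hypothetical gridded POD lookup as a function of tau bin and location,
# e.g. derived from a validation study (here: random values for illustration).
tau_bins = np.array([0.0, 0.3, 1.0, 3.0, 10.0, np.inf])
pod = rng.uniform(0.3, 1.0, size=(len(tau_bins) - 1, nlat, nlon))

# Look up the POD for each grid cell according to its model tau.
bin_idx = np.clip(np.digitize(model_tau, tau_bins) - 1, 0, len(tau_bins) - 2)
cell_pod = np.take_along_axis(pod, bin_idx[None, :, :], axis=0)[0]

# Keep a model cloud only if a random draw falls below the detection probability.
simulated_mask = model_cloudy & (rng.uniform(size=(nlat, nlon)) < cell_pod)

print("model cloud fraction:      ", model_cloudy.mean())
print("simulated (detected) cover:", simulated_mask.mean())
```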

How to cite: Eliasson, S., Karlsson, K. G., and Willén, U.: A simulator for the CLARA-A2 cloud climate data record and its application to assess EC-Earth polar cloudiness, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18454, https://doi.org/10.5194/egusphere-egu2020-18454, 2020.

EGU2020-19181 | Displays | ESSI1.12

ESMValTool - introducing a powerful model evaluation tool

Valeriu Predoi, Bouwe Andela, Lee De Mora, and Axel Lauer

The Earth System Model eValuation Tool (ESMValTool) is a powerful community-driven diagnostics and performance metrics tool. It is used for the evaluation of Earth System Models (ESMs) and allows for routine comparisons of either multiple model versions or observational datasets. ESMValTool's design is highly modular and flexible so that additional analyses can easily be added; in fact, this is essential to encourage the community-based approach to its scientific development. A set of standardized recipes for each scientific topic reproduces specific diagnostics or performance metrics that have demonstrated their importance in ESM evaluation in the peer-reviewed literature. Scientific themes include selected Essential Climate Variables, a range of known systematic biases common to ESMs such as coupled tropical climate variability, monsoons, Southern Ocean processes, continental dry biases and soil hydrology-climate interactions, as well as atmospheric CO2 budgets, tropospheric and stratospheric ozone, and tropospheric aerosols. We will outline the main functional characteristics of ESMValTool Version 2; we will also introduce the reader to the current set of diagnostics and the methods they can use to contribute to its development.

How to cite: Predoi, V., Andela, B., De Mora, L., and Lauer, A.: ESMValTool - introducing a powerful model evaluation tool, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19181, https://doi.org/10.5194/egusphere-egu2020-19181, 2020.

EGU2020-19298 | Displays | ESSI1.12

Integrating Model Evaluation and Observations into a Production-Release Pipeline

Philipp S. Sommer, Ronny Petrik, Beate Geyer, Ulrike Kleeberg, Dietmar Sauer, Linda Baldewein, Robin Luckey, Lars Möller, Housam Dibeh, and Christopher Kadow

The complexity of Earth System and Regional Climate Models represents a considerable challenge for developers. Tuning, but also improving, one aspect of a model can unexpectedly decrease the performance of others and introduce hidden errors. Reasons are in particular the multitude of output parameters and the shortage of reliable and complete observational datasets. One possibility to overcome these issues is a rigorous and continuous scientific evaluation of the model. This requires standardized model output and, most notably, standardized observational datasets. Additionally, in order to reduce the extra burden for the single scientist, this evaluation has to be as close as possible to the standard workflow of the researcher, and it needs to be flexible enough to adapt to new scientific questions.

We present the Free Evaluation System Framework (Freva) implementation within the Helmholtz Coastal Data Center (HCDC) at the Institute of Coastal Research in the Helmholtz-Zentrum Geesthacht (HZG). Various plugins into the Freva software, namely the HZG-EvaSuite, use observational data to perform a standardized evaluation of the model simulation. We present a comprehensive data management infrastructure that copes with the heterogeneity of observations and simulations. This web framework comprises a FAIR and standardized database of both large-scale and in-situ observations, exported to a format suitable for data-model intercomparisons (particularly netCDF following the CF-conventions). Our pipeline links the raw data of the individual model simulations (i.e. the production of the results) to the finally published results (i.e. the released data).

Another benefit of the Freva-based evaluation is the enhanced exchange between the different compartments of the institute, particularly between the model developers and the data collectors, as Freva contains built-in functionalities to share and discuss results with colleagues. We will furthermore use the tool to strengthen the active communication with the data and software managers of the institute to generate or adapt the evaluation plugins.

How to cite: Sommer, P. S., Petrik, R., Geyer, B., Kleeberg, U., Sauer, D., Baldewein, L., Luckey, R., Möller, L., Dibeh, H., and Kadow, C.: Integrating Model Evaluation and Observations into a Production-Release Pipeline , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19298, https://doi.org/10.5194/egusphere-egu2020-19298, 2020.

EGU2020-21666 | Displays | ESSI1.12

Freva - Free Evaluation System Framework - New Aspects and Features

Christopher Kadow, Sebastian Illing, Oliver Kunst, Thomas Schartner, Jens Grieger, Mareike Schuster, Andy Richling, Ingo Kirchner, Henning Rust, Ulrich Cubasch, and Uwe Ulbrich

The Free Evaluation System Framework (Freva - freva.met.fu-berlin.de) is a software infrastructure for standardized data and tool solutions in Earth system science. Freva runs on high performance computers to handle customizable evaluation systems of research projects, institutes or universities. It combines different software technologies into one common hybrid infrastructure, including all features present in the shell and web environment. The database interface satisfies the international standards provided by the Earth System Grid Federation (ESGF). Freva indexes different data projects into one common search environment by storing the meta data information of the self-describing model, reanalysis and observational data sets in a database. This implemented meta data system with its advanced but easy-to-handle search tool supports users, developers and their plugins to retrieve the required information. A generic application programming interface (API) allows scientific developers to connect their analysis tools with the evaluation system independently of the programming language used. Users of the evaluation techniques benefit from the common interface of the evaluation system without any need to understand the different scripting languages. Facilitation of the provision and usage of tools and climate data automatically increases the number of scientists working with the data sets and identifying discrepancies. The integrated webshell (shellinabox) adds a degree of freedom in the choice of the working environment and can be used as a gateway to the research project's HPC. Plugins are able to integrate their results, e.g. post-processed data, into the database of the user. This allows, for example, post-processing plugins to feed statistical analysis plugins, which fosters an active exchange between plugin developers of a research project. Additionally, the history and configuration sub-system stores every analysis performed with the evaluation system in a database. Configurations and results of the tools can be shared among scientists via the shell or web system. Therefore, plugged-in tools benefit from transparency and reproducibility. Furthermore, if configurations match while starting an evaluation plugin, the system suggests using results already produced by other users – saving CPU/h, I/O, disk space and time. The efficient interaction between different technologies improves the Earth system modeling science framed by Freva.
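The configuration-matching behaviour described above can be illustrated with a generic sketch (not Freva's implementation): hash a plugin's configuration and look it up in a history store before recomputing.

```python
# Generic illustration of configuration-based result reuse, as described above.
# This is not Freva's implementation; it only sketches the idea of hashing a
# plugin configuration and checking a history store before recomputing.
import hashlib
import json

history = {}  # stands in for the history/configuration database

def config_hash(plugin_name, config):
    """Stable hash over plugin name and configuration."""
    payload = json.dumps({"plugin": plugin_name, "config": config}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def run_plugin(plugin_name, config, compute):
    """Reuse a previous result if an identical configuration was already run."""
    key = config_hash(plugin_name, config)
    if key in history:
        print(f"Reusing result of an earlier {plugin_name} run (saves CPU hours).")
        return history[key]
    result = compute(config)          # the actual analysis would happen here
    history[key] = result             # record the run for later reuse and provenance
    return result

# Example: two identical requests, only the first one is computed.
cfg = {"variable": "tas", "experiment": "historical", "season": "DJF"}
run_plugin("mean_state", cfg, compute=lambda c: f"plot_for_{c['variable']}.png")
run_plugin("mean_state", cfg, compute=lambda c: f"plot_for_{c['variable']}.png")
```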

New Features and aspects of further development and collaboration are discussed.

 

How to cite: Kadow, C., Illing, S., Kunst, O., Schartner, T., Grieger, J., Schuster, M., Richling, A., Kirchner, I., Rust, H., Cubasch, U., and Ulbrich, U.: Freva - Free Evaluation System Framework - New Aspects and Features, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21666, https://doi.org/10.5194/egusphere-egu2020-21666, 2020.

EGU2020-22155 | Displays | ESSI1.12

Climate Index Metadata and its Implementation

Klaus Zimmermann and Lars Bärring

Climate indices play an important role in the practical use of climate and weather data. Their application spans a wide range of topics, from impact assessment in agriculture and urban planning, through indispensable advice in the energy sector, to important evaluation in the climate science community. Several widely used standard sets of indices exist through long-standing efforts of WMO and WCRP Expert Teams (ETCCDI and ET-SCI), as well as European initiatives (ECA&D) and more recently Copernicus C3S activities. They, however, focus on the data themselves, leaving much of the metadata to the individual user. Moreover, these core sets of indices lack a coherent metadata framework that would allow for the consistent inclusion of the new indices that continue to be proposed.

In the meantime, the treatment of metadata in the wider community has received much attention. Within the climate community efforts such as the CF convention and the much-expanded scope and detail of metadata in CMIP6 have improved the clarity and long-term usability of many aspects of climate data a great deal.

We present a novel approach to metadata for climate indices. Our format describes the existing climate indices consistent with the established standards, adding metadata along the lines of existing metadata specifications. The formulation of these additions in a coherent framework encompassing most of the existing climate index standards allows for its easy extension and inclusion of new climate indices as they are developed.

We also present Climix, a new Python software for the calculation of indices based on this description. It can be seen as an example implementation of the proposed standard and features high-performance calculations based on state-of-the-art infrastructure, such as Iris and Dask. This way, it offers shared memory and distributed parallel and out-of-core computations, enabling the efficient treatment of large data volumes as incurred by the high resolution, long time-series of current and future datasets.
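To make the kind of calculation concrete, a hedged Iris-based sketch of a simple threshold index (annual count of days with maximum temperature above 25 °C) follows; it illustrates what an index package like Climix computes but is not Climix's own API.

```python
# Illustrative only: an annual "summer days" style index (daily maximum
# temperature above 25 degC) computed with Iris, to show the kind of
# calculation a climate index package performs. Not Climix's own API.
import iris
import iris.analysis
import iris.coord_categorisation

cube = iris.load_cube("tasmax_day_MODEL_historical.nc")  # placeholder file name

# Group the daily data by calendar year.
iris.coord_categorisation.add_year(cube, "time", name="year")

# Count the number of days per year exceeding the threshold (25 degC = 298.15 K).
summer_days = cube.aggregated_by(
    "year", iris.analysis.COUNT, function=lambda values: values > 298.15
)
summer_days.rename("number_of_summer_days")

print(summer_days)
```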

How to cite: Zimmermann, K. and Bärring, L.: Climate Index Metadata and its Implementation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22155, https://doi.org/10.5194/egusphere-egu2020-22155, 2020.

ESSI1.15 – Towards SMART Monitoring and Integrated Data Exploration of the Earth System

EGU2020-21816 | Displays | ESSI1.15 | Highlight

The challenge of sensor selection, long-term sensor operation and data evaluation in inter-institutional long-term monitoring projects (lessons learned in the MOSES project)

Philipp Fischer, Madlen Friedrich, Markus Brand, Uta Koedel, Peter Dietrich, Holger Brix, Dorit Kerschke, and Ingeborg Bussmann

Measuring environmental variables over longer times in coastal marine environments is a challenge with regard to sensor maintenance and the data processing of continuously produced comprehensive datasets. In the project “MOSES” (Modular Observation Solutions for Earth Systems), this procedure became even more complicated because seven large Helmholtz centers from the research field Earth and Environment (E&E) within the framework of the German Ministry of Education and Research (BMBF) work together to design and construct a large-scale monitoring network across Earth compartments to study the effects of short-term events on long-term environmental trends. This requires the development of robust and standardized automated data acquisition and processing routines to ensure reliable, accurate and precise data.

Here, the results of two intercomparison workshops on sensor accuracy and precision for selected environmental variables are presented. Environmental sensors which were to be used in MOSES campaigns on hydrological extremes (floods and droughts) in the Elbe catchment and the adjacent coastal areas in the North Sea in 2019 to 2020 were compared for selected parameters (temperature, salinity, chlorophyll-a, turbidity and methane) in the same experimentally controlled water body, assuming that all sensors provide comparable data. Results were analyzed with respect to individual sensor accuracy and precision relative to an “assumed” real value as well as with respect to a cost versus accuracy/precision index for measuring specific environmental data. The results show that the accuracy and precision of sensors do not necessarily correlate with their price and that low-cost sensors may provide the same or even higher accuracy and precision than the highest-priced sensor types.
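A small worked example (with synthetic numbers, not the workshop data) of the accuracy, precision and cost-versus-performance comparison discussed above.

```python
# Synthetic illustration of the accuracy/precision/cost comparison described
# above: bias against an assumed reference value, standard deviation as
# precision, and a simple cost-versus-performance index. All numbers are invented.
import numpy as np

reference = 17.50  # assumed "real" water temperature in degC (synthetic)

sensors = {
    # name: (price in EUR, repeated readings in degC) -- all values synthetic
    "low_cost_A":  (150.0,  np.array([17.46, 17.53, 17.49, 17.52, 17.48])),
    "mid_range_B": (900.0,  np.array([17.55, 17.58, 17.54, 17.57, 17.56])),
    "high_end_C":  (4200.0, np.array([17.51, 17.44, 17.57, 17.40, 17.60])),
}

for name, (price, readings) in sensors.items():
    accuracy = readings.mean() - reference          # systematic offset (bias)
    precision = readings.std(ddof=1)                # spread of repeated readings
    # Simple index: combined error per 1000 EUR spent (purely illustrative).
    index = (abs(accuracy) + precision) / (price / 1000.0)
    print(f"{name:12s} bias={accuracy:+.3f} degC  "
          f"precision={precision:.3f} degC  cost index={index:.3f}")
```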

How to cite: Fischer, P., Friedrich, M., Brand, M., Koedel, U., Dietrich, P., Brix, H., Kerschke, D., and Bussmann, I.: The challenge of sensor selection, long-term sensor operation and data evaluation in inter-institutional long-term monitoring projects (lessons learned in the MOSES project), EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21816, https://doi.org/10.5194/egusphere-egu2020-21816, 2020.

EGU2020-9338 | Displays | ESSI1.15

Land-Use/Land-Cover Changes and Their Influence on Urban Thermal Environment in Zhengzhou City During the Period of 1986 to 2026

P. Liu, R. Han, and L. Yang

Rapid urbanization has become a major urban sustainability concern due to environmental impacts such as the development of urban heat islands (UHI) and the reduction of urban security states. To date, most research on urban sustainability development has focused on dynamic change monitoring or the characterization of the UHI state, while there is little literature on UHI change analysis. In addition, there has been little research on the impact of land use and land cover changes (LULCCs) on the UHI, especially research that simulates future trends of LULCCs, UHI change, and the dynamic relationship between LULCCs and the UHI. The purpose of this research is to design a remote sensing based framework that investigates and analyses how the LULCCs in the process of urbanization affect the thermal environment. In order to assess and predict the impact of LULCCs on the urban heat environment, multi-temporal remotely sensed data from 1986 to 2016 were selected as source data, and Geographic Information System (GIS) methods such as the CA-Markov model were employed to construct the proposed framework. The results show that (1) there was substantial urban expansion during the 40-year study period; (2) the largest movement of the urban center of gravity was from the north-northeast (NNE) to the west-southwest (WSW) direction; (3) the dominant temperatures in the research area were at the middle, sub-high and high levels; (4) there was a higher changing frequency and range from east to west; and (5) there was a significant negative correlation between land surface temperature and vegetation, and a significant positive correlation between temperature and human settlement.

How to cite: Liu, P., Han, R., and Yang, L.: Land-Use/Land-Cover Changes and Their Influence on Urban Thermal Environment in Zhengzhou City During the Period of 1986 to 2026, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9338, https://doi.org/10.5194/egusphere-egu2020-9338, 2020.

EGU2020-22587 | Displays | ESSI1.15

Deep neural networks for total organic carbon prediction and data-driven sampling

Everardo González Ávalos and Ewa Burwicz

Over the past decade deep learning has been used to solve a wide array of regression and classification tasks. Compared to classical machine learning approaches (k-Nearest Neighbours, Random Forests, …), deep learning algorithms excel at learning complex, non-linear internal representations, in part due to the highly over-parametrised nature of their underlying models; however, this advantage often comes at the cost of interpretability. In this work we used deep neural networks to construct a global total organic carbon (TOC) seafloor concentration map. By implementing softmax distributions on implicitly continuous data (regression tasks), we were able to obtain probability distributions to assess prediction reliability. A variation of dropout called Monte Carlo Dropout is also used during the inference step, providing a tool to model prediction uncertainties. We used these techniques to create a model information map, which is a key element in developing new data-driven sampling strategies for data acquisition.
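A minimal Keras sketch of the Monte Carlo Dropout step described above: dropout is kept active at inference time and repeated stochastic forward passes yield a mean prediction and an uncertainty estimate; the architecture, feature dimension and data are placeholders, not the authors' model.

```python
# Minimal Monte Carlo Dropout sketch (illustrative architecture and data only):
# dropout stays active at inference time, and the spread of repeated stochastic
# forward passes is used as a per-sample uncertainty estimate.
import numpy as np
import tensorflow as tf

n_features = 12                      # placeholder number of predictor features
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(n_features,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1),        # predicted TOC concentration (regression)
])
model.compile(optimizer="adam", loss="mse")

# Synthetic training data standing in for gridded predictor/TOC pairs.
x_train = np.random.rand(1000, n_features).astype("float32")
y_train = np.random.rand(1000, 1).astype("float32")
model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)

# Monte Carlo Dropout: call the model with training=True so dropout is applied.
x_new = np.random.rand(5, n_features).astype("float32")
passes = np.stack([model(x_new, training=True).numpy() for _ in range(100)])

prediction = passes.mean(axis=0)     # ensemble mean prediction
uncertainty = passes.std(axis=0)     # spread = per-sample uncertainty estimate
print(prediction.ravel(), uncertainty.ravel())
```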

How to cite: González Ávalos, E. and Burwicz, E.: Deep neural networks for total organic carbon prediction and data-driven sampling, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22587, https://doi.org/10.5194/egusphere-egu2020-22587, 2020.

EGU2020-19631 | Displays | ESSI1.15

Implementing FAIR in a Collaborative Data Management Framework

Angela Schäfer, Norbert Anselm, Janik Eilers, Stephan Frickenhaus, Peter Gerchow, Frank Oliver Glöckner, Antonie Haas, Isabel Herrarte, Roland Koppe, Ana Macario, Christian Schäfer-Neth, Brenner Silva, and Philipp Fischer

Today's fast digital growth made data the most essential tool for scientific progress in Earth Systems Science. Hence, we strive to assemble a modular research infrastructure comprising a collection of tools and services that allow researchers to turn big data into scientific outcomes.

Major roadblocks are (i) the increasing number and complexity of research platforms, devices, and sensors, (ii) the heterogeneous project-driven requirements towards, e.g., satellite data, sensor monitoring, quality assessment and control, processing, analysis and visualization, and (iii) the demand for near real time analyses.

These requirements have led us to build a generic and cost-effective framework O2A (Observation to Archive) to enable, control, and access the flow of sensor observations to archives and repositories.

By establishing O2A within major cooperative projects like MOSES and Digital Earth in the research field Earth and Environment of the German Helmholtz Association, we extend research data management services, computing powers, and skills to connect with the evolving software and storage services for data science. This fully supports the typical scientific workflow from its very beginning to its very end, that is, from data acquisition to final data publication. 

The key modules of O2A's digital research infrastructure, established by AWI to enable Digital Earth science, implement the FAIR principles:

  • Sensor Web, to register sensor applications and capture controlled meta data before and alongside any measurement in the field
  • Data ingest, allowing researchers to feed data into storage systems and processing pipelines in a prepared and documented way, at best in controlled NRT data streams
  • Dashboards, allowing researchers to find and access data and share and collaborate among partners
  • Workspace, enabling researchers to access and use data with research software in a cloud-based virtualized infrastructure that allows researchers to analyse massive amounts of data on the spot
  • Archiving and publishing data via repositories and Digital Object Identifiers (DOI).

How to cite: Schäfer, A., Anselm, N., Eilers, J., Frickenhaus, S., Gerchow, P., Glöckner, F. O., Haas, A., Herrarte, I., Koppe, R., Macario, A., Schäfer-Neth, C., Silva, B., and Fischer, P.: Implementing FAIR in a Collaborative Data Management Framework, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19631, https://doi.org/10.5194/egusphere-egu2020-19631, 2020.

EGU2020-19648 | Displays | ESSI1.15

From source to sink - Sustainable and reproducible data pipelines with SaQC

David Schäfer, Bert Palm, Lennart Schmidt, Peter Lünenschloß, and Jan Bumberger

The number of sensors used in the environmental system sciences is increasing rapidly, and while this trend undoubtedly provides a great potential to broaden the understanding of complex spatio-temporal processes, it comes with its own set of new challenges. The flow of data from a source to its sink, from sensors to databases, involves many, usually error prone intermediate steps. From the data acquisition with its specific scientific and technical challenges, over the data transfer from often remote locations to the final data processing, all carry great potential to introduce errors and disturbances into the actual environmental signal.

Quantifying these errors becomes a crucial part of the later evaluation of all measured data. While many large environmental observatories are moving from manual to more automated ways of data processing and quality assurance, these systems are usually highly customized and hand-written. This approach is non-ideal in several ways: First, it is a waste of resources as the same algorithms are implemented over and over again, and second, it imposes great challenges to reproducibility. If the relevant programs are made available at all, they expose all problems of software reuse: correctness of the implementation, readability and comprehensibility for future users, as well as transferability between different computing environments. Besides these problems, which relate to software development in general, another crucial factor comes into play: the end product, a processed and quality-controlled data set, is closely tied to the current version of the programs in use. Even small changes to the source code can lead to vastly differing results. If this is not approached responsibly, data and programs will inevitably fall out of sync.

The presented software, the 'System for automated Quality Control (SaQC)' (www.ufz.git.de/rdm-software/saqc), helps to either solve, or massively simplify the solution to, the presented challenges. As a mainly no-code platform with a large set of implemented functionality, SaQC lowers the entry barrier for the non-programming scientific practitioner without sacrificing the possibility of fine-grained adaptation to project-specific needs. The text-based configuration allows easy integration into version control systems and thus opens the opportunity to use well-established software for data lineage. We will give a short overview of the program's unique features and showcase possibilities to build reliable and reproducible processing and quality assurance pipelines for real-world data from a spatially distributed, heterogeneous sensor network.
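For illustration, a small pandas sketch of the kind of declarative range and gap check such a pipeline applies; the configuration structure, function and flag names are invented for this example and do not reproduce SaQC's own syntax.

```python
# Illustrative only: a declarative range/gap check in the spirit of a
# quality-control pipeline. The configuration structure and flag values are
# invented for this sketch and do not reproduce SaQC's own configuration syntax.
import numpy as np
import pandas as pd

# Synthetic 10-minute soil moisture series with an outlier and a gap.
index = pd.date_range("2020-01-01", periods=12, freq="10min")
data = pd.DataFrame({"soil_moisture": [31.2, 31.1, 31.3, 250.0, 31.0, np.nan,
                                       30.9, 30.8, 31.0, 31.1, 30.7, 30.9]},
                    index=index)

# Text-like configuration: variable name -> list of checks (hypothetical format).
config = {"soil_moisture": [("range", {"min": 0.0, "max": 100.0}),
                            ("missing", {})]}

def apply_checks(frame, config):
    flags = pd.DataFrame("OK", index=frame.index, columns=frame.columns)
    for var, checks in config.items():
        for name, params in checks:
            if name == "range":
                bad = (frame[var] < params["min"]) | (frame[var] > params["max"])
                flags.loc[bad, var] = "BAD_RANGE"
            elif name == "missing":
                flags.loc[frame[var].isna(), var] = "MISSING"
    return flags

print(apply_checks(data, config))
```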

How to cite: Schäfer, D., Palm, B., Schmidt, L., Lünenschloß, P., and Bumberger, J.: From source to sink - Sustainable and reproducible data pipelines with SaQC, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19648, https://doi.org/10.5194/egusphere-egu2020-19648, 2020.

EGU2020-9251 | Displays | ESSI1.15

An integrative framework for data-driven investigation of environmental systems

Daniel Eggert and Doris Dransch

Environmental scientists aim at understanding not only single components but entire systems; one example is the flood system, where scientists investigate the conditions, drivers and effects of flood events and the relations between them. Investigating environmental systems with a data-driven research approach requires linking a variety of data, analytical methods, and derived results.


Several obstacles exist in the current scientific work environment that hinder scientists from easily creating these links: distributed and heterogeneous data sets, separated analytical tools, discontinuous analytical workflows, as well as isolated views of data and data products. We address these obstacles with the exception of distributed and heterogeneous data, since this is part of other ongoing initiatives.


Our goal is to develop a framework supporting the data-driven investigation of environmental systems. First, we integrate separated analytical tools and methods by means of a component-based software framework. Furthermore, we allow for seamless and continuous analytical workflows by applying the concept of digital workflows, which also demands the aforementioned integration of separated tools and methods. Finally, we provide integrated views of data and data products through interactive visual interfaces with multiple linked views. The combination of these three concepts from computer science allows us to create a digital research environment that enables scientists to create the initially mentioned links in a flexible way. We developed a generic concept for our approach, implemented a corresponding framework and finally applied both to realize a “Flood Event Explorer” prototype supporting the comprehensive investigation of a flood system.


In order to implement a digital workflow, our approach starts by precisely defining the workflow’s requirements. We mostly do this by conducting informal interviews with the domain scientists. The defined requirements also include the needed analytical tools and methods, as well as the utilized data and data products. For technically integrating the needed tools and methods, our software framework provides a modularization approach based on a messaging system. This allows us to create custom modules or wrap existing implementations and tools. The messaging system (e.g. Apache Pulsar) then connects these individual modules. This enables us to combine multiple methods and tools into a seamless digital workflow. The described approach of course demands the proper definition of interfaces to modules and data sources. Finally, our software framework provides multiple generic visual front-end components (e.g. tables, maps and charts) to create interactive linked views supporting the visual analysis of the workflow’s data.
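As a hedged sketch of the messaging-based module coupling described above, the following uses the Apache Pulsar Python client to pass one message between two modules; the broker URL, topic name and payload are placeholders, not the Flood Event Explorer's actual interfaces.

```python
# Sketch of coupling two analysis modules through an Apache Pulsar topic.
# Broker URL, topic and payload are placeholders; the actual framework defines
# its own module interfaces on top of the messaging layer.
import json
import pulsar

client = pulsar.Client("pulsar://localhost:6650")   # assumes a locally running broker

# Module B (e.g. a visualization component) subscribes to the shared topic first.
consumer = client.subscribe("discharge-peaks", subscription_name="visualization-module")

# Module A publishes an intermediate result (values invented for illustration).
producer = client.create_producer("discharge-peaks")
producer.send(json.dumps({"gauge": "example_gauge", "peak_m3s": 4500}).encode("utf-8"))

# Module B receives the message and continues the workflow.
msg = consumer.receive(timeout_millis=10000)
print("received:", json.loads(msg.data().decode("utf-8")))
consumer.acknowledge(msg)

client.close()
```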

How to cite: Eggert, D. and Dransch, D.: An integrative framework for data-driven investigation of environmental systems, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9251, https://doi.org/10.5194/egusphere-egu2020-9251, 2020.

EGU2020-1412 | Displays | ESSI1.15

Surface Temperature Monitoring by Satellite Thermal Infrared Imagery at Mayon Volcano of Philippines, 1988-2019

H.-P. Chan and K. Konstantinou

Mayon Volcano on eastern Luzon Island is the most active volcano in the Philippines. It is named and renowned as the "perfect cone" for its symmetric conical shape and has recorded eruptions over 50 times in the past 500 years. Geographically, the volcano is surrounded by eight cities and municipalities with 1 million inhabitants. Currently, its activity is monitored daily by on-site observations such as seismometers installed on Mayon's slopes, as well as electronic distance meters (EDMs), precise leveling benchmarks, and portable fly spectrometers. Compared to existing direct on-site measurements, satellite remote sensing is currently assuming an essential role in understanding the whole picture of volcanic processes. The vulnerability to volcanic hazards is high for Mayon given that it is located in an area of high population density on Luzon Island. However, the satellite remote sensing method and dataset have not been integrated into Mayon’s hazard mapping and monitoring system, despite abundant open-access satellite dataset archives. Here, we perform multiscale and multitemporal monitoring based on the analysis of a nineteen-year Land Surface Temperature (LST) time series derived from satellite-retrieved thermal infrared imagery. Both Landsat thermal imagery (with 30-meter spatial resolution) and MODIS (Moderate Resolution Imaging Spectroradiometer) LST products (with 1-kilometer spatial resolution) are used for the analysis. The Ensemble Empirical Mode Decomposition (EEMD) is applied as the decomposition tool to decompose oscillatory components of various timescales within the LST time series. The physical interpretation of decomposed LST components at various periods is explored and compared with Mayon’s eruption records. Results show that annual-period components of LST tend to lose their regularity following an eruption, and amplitudes of short-period LST components are very responsive to the eruption events. The satellite remote sensing approach provides more insights at larger spatial and temporal scales on this renowned active volcano. This study not only presents the advantages and effectiveness of satellite remote sensing on volcanic monitoring but also provides valuable surface information for exploring the subsurface volcanic structures in Mayon.
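A hedged sketch of the EEMD decomposition step, here using the PyEMD package (an assumption, since the study's own implementation is not specified) on a synthetic LST-like monthly series.

```python
# Illustrative EEMD decomposition of a synthetic LST-like time series using the
# PyEMD package (package choice is an assumption; the study's own tooling is
# not specified). An annual cycle plus trend and noise stand in for real data.
import numpy as np
from PyEMD import EEMD

months = np.arange(19 * 12, dtype=float)              # ~19 years of monthly samples
annual = 10.0 * np.sin(2 * np.pi * months / 12.0)     # annual LST cycle (synthetic)
trend = 0.005 * months                                # weak warming trend (synthetic)
noise = np.random.normal(scale=1.5, size=months.size)
lst = 300.0 + annual + trend + noise                  # synthetic LST in kelvin

eemd = EEMD(trials=50)                                # ensemble of noise-assisted EMD runs
imfs = eemd.eemd(lst, months)                         # intrinsic mode functions (IMFs)

print(f"EEMD extracted {imfs.shape[0]} oscillatory components (IMFs)")
# Components with roughly annual and shorter periods could then be compared
# against eruption records, as described in the abstract.
```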

How to cite: Chan, H.-P. and Konstantinou, K.: Surface Temperature Monitoring by Satellite Thermal Infrared Imagery at Mayon Volcano of Philippines, 1988-2019, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1412, https://doi.org/10.5194/egusphere-egu2020-1412, 2020.

EGU2020-3049 | Displays | ESSI1.15

Catchment scale prediction of soil moisture trends from Cosmic Ray Neutron Rover Surveys using machine learning

Erik Nixdorf, Marco Hannemann, Uta Ködel, Martin Schrön, and Thomas Kalbacher

Soil moisture is a critical hydrological component for determining hydrological state conditions and a crucial variable in controlling land-atmosphere interaction including evapotranspiration, infiltration and groundwater recharge.

At the catchment scale, the spatial-temporal distribution of soil moisture is highly variable due to the influence of various factors such as soil heterogeneity, climate conditions, vegetation and geomorphology. Among the various existing soil moisture monitoring techniques, the application of vehicle-mounted Cosmic Ray Neutron Sensors (CRNS) allows monitoring soil moisture noninvasively by surveying larger regions within a reasonable time. However, measured data and their corresponding footprints are often allocated along the existing road network, leaving inaccessible parts of a catchment unobserved, and surveying larger areas at short intervals is often hindered by limited manpower.

In this study, data from more than 200 000 CRNS rover readings measured over different regions of Germany within the last 4 years have been employed to characterize the trends of soil moisture distribution in the 209 km2 Mueglitz River Basin in Eastern Germany. Subsets of the data have been used to train three different supervised machine learning algorithms (multiple linear regression, random forest and artificial neural network) based on 85 independent relevant dynamic and stationary features derived from public databases. The Random Forest model outperforms the other models (R2 ≈ 0.8), relying on day-of-year, altitude, air temperature, humidity, soil organic carbon content and soil temperature as the most influential predictors.

After training and testing the models, CRNS records for each day of the last decade are predicted on a 250 × 250 m grid of the Mueglitz River Basin using the same type of features. Derived CRNS record distributions are compared with both spatial soil moisture estimates from a hydrological model and point estimates from a sensor network operated during spring 2019. After variable standardization, preliminary results show that the applied Random Forest model is able to reproduce the spatio-temporal trends estimated by the hydrological model and the point measurements. These findings demonstrate that training machine learning models on large, domain-unspecific datasets of CRNS records using spatially and temporally available predictors has the potential to fill measurement gaps and to improve predictions of soil moisture dynamics at the catchment scale.
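A schematic scikit-learn sketch of the Random Forest step described above, with synthetic features standing in for the 85 predictors; feature names, sample sizes and model settings are illustrative, not the study's configuration.

```python
# Schematic Random Forest regression in the spirit of the study: synthetic
# predictors stand in for the 85 dynamic and stationary features, and feature
# importances are inspected after fitting. All settings are illustrative only.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_samples, n_features = 5000, 85
X = pd.DataFrame(rng.random((n_samples, n_features)),
                 columns=[f"feature_{i:02d}" for i in range(n_features)])
# Synthetic target loosely depending on a few features (stand-in for CRNS counts).
y = 3 * X["feature_00"] - 2 * X["feature_01"] + rng.normal(0, 0.3, n_samples)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0, n_jobs=-1)
model.fit(X_train, y_train)

print("R2 on held-out data:", round(r2_score(y_test, model.predict(X_test)), 3))
top = sorted(zip(model.feature_importances_, X.columns), reverse=True)[:5]
print("most influential predictors:", [name for _, name in top])
```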

How to cite: Nixdorf, E., Hannemann, M., Ködel, U., Schrön, M., and Kalbacher, T.: Catchment scale prediction of soil moisture trends from Cosmic Ray Neutron Rover Surveys using machine learning, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-3049, https://doi.org/10.5194/egusphere-egu2020-3049, 2020.

EGU2020-5028 | Displays | ESSI1.15

Modeling methane from the North Sea region with ICON-ART

Christian Scharun, Roland Ruhnke, Jennifer Schröter, Michael Weimer, and Peter Braesicke

Methane (CH4) is the second most important greenhouse gas after CO2 affecting global warming. Various sources (e.g. fossil fuel production, agriculture and waste, biomass burning and natural wetlands) and sinks (the reaction with the OH-radical as the main sink contributes to tropospheric ozone production) determine the methane budget. Due to its long lifetime in the atmosphere methane can be transported over long distances.

Disused and active offshore platforms can emit methane, the amount being difficult to quantify. In addition, explorations of the sea floor in the North Sea showed a release of methane near the boreholes of both oil- and gas-producing platforms. The basis of this study is the established emission database EDGAR (Emission Database for Global Atmospheric Research), an inventory that includes methane emission fluxes in the North Sea region. While methane emission fluxes in the EDGAR inventory and platform locations match for most of the oil platforms, almost all of the gas platform sources are missing from the database. We develop a method for estimating the missing sources based on the EDGAR emission inventory.

In this study the global model ICON-ART (ICOsahedral Nonhydrostatic model - Aerosols and Reactive Trace gases) will be used. ART is an online-coupled model extension for ICON that includes chemical gases and aerosols. One aim of the model is the simulation of interactions between the trace substances and the state of the atmosphere by coupling the spatiotemporal evolution of tracers with atmospheric processes. ICON-ART sensitivity simulations are performed with inserted and adjusted sources to assess their influence on the methane and OH-radical distribution on regional (North Sea) and global scales.

How to cite: Scharun, C., Ruhnke, R., Schröter, J., Weimer, M., and Braesicke, P.: Modeling methane from the North Sea region with ICON-ART, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5028, https://doi.org/10.5194/egusphere-egu2020-5028, 2020.

EGU2020-10239 | Displays | ESSI1.15 | Highlight

Management and dissemination of global high-resolution agro-hydrological model simulation data from the Virtual Water Values project

Wolfgang Kurtz, Stephan Hachinger, Anton Frank, Wolfram Mauser, Jens Weismüller, and Christine Werner

The ViWA (Virtual Water Values) project aims to provide a global-scale assessment of the current usage of water resources, of the efficiency of water use and of agricultural yields as well as the flow and trade of ‘virtual’ water across country boundaries. This is achieved by establishing a global management and monitoring system which combines high-resolution (1 km2) agro-hydrological model simulations with information from high-resolution remote-sensing data from Copernicus satellites. The monitoring system is used to judge the progress in achieving water-related UN sustainable development goals on the local and global scale. Specific goals of the project are, for example, to:

  • evaluate possible inefficiencies of the current water use in agriculture, industry and water management and its economic consequences.
  • assess the vulnerability of agriculture and ecosystems to climate variability with a special emphasis on water availability.
  • identify regional hot-spots of unsustainable water use and to analyze possible institutional obstacles for a sustainable and efficient water use.
  • identify trade-offs between the commercial water use and protection of ecosystem services.

A cornerstone for reaching these project goals is high-resolution global ensemble simulations with an agro-hydrological model for a variety of crop types and management practices. These simulations provide the relevant information on agricultural yields and water demands at different scales. In this context, a considerable amount of data is generated, and subsets of these data might also be of direct relevance for different external interest groups.

In this presentation, we describe our approach for managing the simulation data, with a special focus on possible strategies for data provisioning to interested stakeholders, scientists, practitioners and the general public. We will give an overview of the corresponding simulation and data storage workflows on the utilized HPC systems, and we will discuss methods for providing the data to the different interest groups. Among other aspects, we address findability (in the sense of the FAIR principles) of simulation results for the scientific community in indexed search portals through proper metadata annotation. We also discuss a prototypical interactive web portal for visualizing, subsetting and downloading selected parts of the data set.

How to cite: Kurtz, W., Hachinger, S., Frank, A., Mauser, W., Weismüller, J., and Werner, C.: Management and dissemination of global high-resolution agro-hydrological model simulation data from the Virtual Water Values project, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10239, https://doi.org/10.5194/egusphere-egu2020-10239, 2020.

EGU2020-12328 | Displays | ESSI1.15

Assessment of Multiplatform Satellite Image Frequency for Crop Health Monitoring

Valeriy Kovalskyy and Xiaoyuan Yang

Imagery products are critical for digital agriculture as they help deliver value and insights to growers. The use of publicly available satellite data feeds by digital agriculture companies helps keep imagery services affordable for a broader base of farmers. Optimal use of public and private imagery data sources plays a critical role in the success of image-based services for agriculture.

At the Climate Corporation we have established a program focused on intelligence about the satellite image coverage and frequency expected in different geographies and times of the year, which is becoming critical for the global expansion of the company. In this talk we report the results of our analysis of publicly available imagery data sources for key agricultural regions of the globe. We also demonstrate how these results can guide commercial imagery acquisition decisions with a case study in Brazil, where some growers run the risk of going through the growing season without receiving imagery from a given satellite if relying on a single source of satellite imagery. The study clearly shows the validity of the approaches taken, as the results matched factual image deliveries to within single digits of percent coverage at the regional level. Our analysis also captured realistic temporal and spatial details of changes in image frequency from the addition of alternative satellite imagery sources to the production stream. The optimization of imagery acquisitions enables filling data gaps for research and development. At the same time, it contributes to delivering greater value for growers in crop health monitoring and other image-based services.

How to cite: Kovalskyy, V. and Yang, X.: Assessment of Multiplatform Satellite Image Frequency for Crop Health Monitoring, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12328, https://doi.org/10.5194/egusphere-egu2020-12328, 2020.

EGU2020-22594 | Displays | ESSI1.15

Machine learning as supporting method for UXO mapping and detection

Daniela Henkel, Everardo González Ávalos, Mareike Kampmeier, Patrick Michaelis, and Jens Greinert

Marine munitions, or unexploded ordnance (UXO), were massively disposed of in coastal waters after World War II; they are still being introduced into the marine environment during war activities and military exercises. UXO detection and removal has gained great interest during the ongoing efforts to install offshore wind parks for energy generation as well as cable routing through coastal waters. Additionally, 70 years after the World War II munition dumping events, more and more chemical and conventional munition is rusting away, increasing the risk of toxic contamination.

The general detection methodology includes high-resolution multibeam mapping, hydroacoustic sub-bottom mapping, electromagnetic surveys with gradiometers, as well as visual inspections by divers or remotely operated vehicles (ROVs). Using autonomous underwater vehicles (AUVs) for autonomous inspections with multibeam, camera and EM systems is the next technological step in acquiring meaningful high-resolution data independently of a mother ship. However, it would be beneficial for the use of such technology to be able to better predict potential hot spots of munition targets and distinguish them from other objects such as rocks, small artificial constructions or metallic waste (wires, barrels, etc.).

The above-mentioned predictor layers could be utilized for machine learning with different, already existing and accessible algorithms. The structure of the data is highly similar to image data, an area where neural networks are the benchmark. As a first approach we therefore trained convolutional neural networks in a supervised manner to detect seafloor areas contaminated with UXO. For this we manually annotated known UXO locations as well as known non-UXO locations to generate a training dataset, which was later augmented by rotating and flipping each annotated tile. We achieved a high accuracy with this approach using only a subset of the data sources mentioned above as input layers. We also explored the use of further input layers and larger training datasets, and their impact on performance. This is a good example of machine learning enabling us to classify large areas in a short time and with minimal need for manual annotation.
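A hedged sketch of the rotation/flip augmentation and a minimal CNN classifier of the kind described above; tile size, the number of input layers and the architecture are assumptions, not the study's setup.

```python
# Illustrative sketch: dihedral augmentation (rotations and flips) of annotated
# seafloor tiles and a minimal CNN classifier for UXO / non-UXO. Tile size,
# number of input layers and architecture are assumptions for this example.
import numpy as np
import tensorflow as tf

def augment_tile(tile):
    """Return the 8 rotated/flipped variants of a (H, W, C) tile."""
    variants = []
    for k in range(4):
        rotated = np.rot90(tile, k)
        variants.append(rotated)
        variants.append(np.flip(rotated, axis=1))
    return variants

# Synthetic stand-ins for annotated tiles (e.g. bathymetry + backscatter layers).
n_tiles, size, channels = 200, 64, 3
tiles = np.random.rand(n_tiles, size, size, channels).astype("float32")
labels = np.random.randint(0, 2, n_tiles)            # 1 = UXO, 0 = non-UXO

x = np.stack([v for t in tiles for v in augment_tile(t)])
y = np.repeat(labels, 8)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(size, size, channels)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=2, batch_size=32, verbose=0)
```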

How to cite: Henkel, D., González Ávalos, E., Kampmeier, M., Michaelis, P., and Greinert, J.: Machine learning as supporting method for UXO mapping and detection, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22594, https://doi.org/10.5194/egusphere-egu2020-22594, 2020.

EGU2020-11084 | Displays | ESSI1.15

Significance and implementation of SMART Monitoring Tools

Uta Koedel, Peter Dietrich, Erik Nixdorf, and Philipp Fischer

The term “SMART Monitoring” is often used in digital projects to survey and analyze data flows in near-real time or real time. The term is also adopted in the project Digital Earth (DE), which was jointly launched in 2018 by the eight Helmholtz centers of the research field Earth and Environment (E&E) within the framework of the German Ministry of Education and Research (BMBF). Within DE, the “SMART monitoring” sub-project aims at developing workflows and processes to make scientific parameters and the related datasets SMART, which means specific, measurable, accepted, relevant, and trackable.

“SMART Monitoring” in DE comprises a combination of hardware and software tools to enhance the traditional sequential monitoring approach - where data are analyzed and processed step by step from the sensor towards a repository - into an integrated analysis approach where information on the measured value, together with the status of each sensor and possibly relevant auxiliary sensor data in a sensor network, is available and used in real time to enhance the sensor output concerning data accuracy, precision, and availability. Thus, SMART Monitoring could be defined as a computer-enhanced monitoring network with automatic data flow control from individual sensors in a sensor network to databases, enhanced by automated (machine learning) and near-real-time interactive data analyses/exploration using the full potential of all available sensors within the network. In addition, “SMART monitoring” aims to support a better adjustment of sensor settings and monitoring strategies in time and space through iterative feedback.

This poster presentation will show general concepts, workflows, and possible visualization tools based on examples that support the SMART Monitoring idea.

How to cite: Koedel, U., Dietrich, P., Nixdorf, E., and Fischer, P.: Significance and implementation of SMART Monitoring Tools, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11084, https://doi.org/10.5194/egusphere-egu2020-11084, 2020.

EGU2020-22618 | Displays | ESSI1.15

Towards easily accessible interactive big-data analysis on supercomputers

Katharina Höflich, Martin Claus, Willi Rath, Dorian Krause, Benedikt von St. Vieth, and Kay Thust

Demand on high-end high performance computer (HPC) systems by the Earth system science community today encompasses not only the handling of complex simulations but also machine and deep learning as well as interactive data analysis workloads on large volumes of data. This poster addresses the infrastructure needs of large-scale interactive data analysis workloads on supercomputers. It lays out how to enable optimizations of existing infrastructure with respect to accessibility, usability and interactivity and aims at informing decision making about future systems. To enhance accessibility, options for distributed access, e.g. through JupyterHub, will be evaluated. To increase usability, the unification of working environments via the operation and the joint maintenance of containers will be explored. Containers serve as a portable base software setting for data analysis application stacks and allow for long-term usability of individual working environments and repeatability of scientific analysis. Aiming for interactive big-data analysis on HPC will also help the scientific community in utilizing increasingly heterogeneous supercomputers, since the modular data-analysis stack already contains solutions for seamless use of various architectures such as accelerators. However, to enable day-to-day interactive work on supercomputers, the inter-operation of workloads with quick turn-around times and highly variable resource demands needs to be understood and evaluated. To this end, scheduling policies on selected HPC systems are reviewed with respect to existing technical solutions such as job preemption, utilizing the resiliency features of parallel computing toolkits like Dask. Presented are preliminary results focussing on the aspects of usability and interactive use of HPC systems on the basis of typical use cases from the ocean science community.
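As a hedged sketch of what this implies in practice, the example below launches Dask workers as batch jobs with dask-jobqueue and drives them from an interactive client (for instance inside a JupyterHub session); the queue name, resource requests and computation are placeholders.

```python
# Sketch of interactive, resilient data analysis on an HPC system using
# dask-jobqueue: Dask workers are submitted as (possibly preemptible) batch jobs
# and a client drives them interactively, e.g. from a Jupyter notebook. Queue
# name, resource requests and the computation are placeholders.
import dask.array as da
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    queue="compute",            # placeholder partition/queue name
    cores=24,                   # cores per batch job
    memory="64GB",              # memory per batch job
    walltime="01:00:00",
)
cluster.scale(jobs=4)           # submit four worker jobs; Dask tolerates losing some

client = Client(cluster)

# Example analysis: mean over a large chunked array (stand-in for ocean model output).
field = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
print(field.mean().compute())

client.close()
cluster.close()
```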

How to cite: Höflich, K., Claus, M., Rath, W., Krause, D., von St. Vieth, B., and Thust, K.: Towards easily accessible interactive big-data analysis on supercomputers, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22618, https://doi.org/10.5194/egusphere-egu2020-22618, 2020.

EGU2020-11117 | Displays | ESSI1.15 | Highlight

Going beyond FAIR to increase data reliability

Uta Koedel and Peter Dietrich

The FAIR principle is on its way to becoming a conventional standard for all kinds of data. However, it is often forgotten that this principle does not consider data quality or data reliability issues. If data quality is not sufficiently described, misinterpretation and misuse of these data in joint analyses can lead to false scientific conclusions. Hence, a statement about data reliability is an essential component of secondary data processing and joint interpretation efforts. Information on data reliability, uncertainty, and quality, as well as on the devices used, is essential and needs to be introduced, or even implemented, in the workflow from the sensor to the database if data are to be considered in a broader context.

In the past, many publications have shown that identical devices at the same location do not necessarily provide the same measurement data. Likewise, statistical quantities and confidence intervals that would allow the reliability of the data to be assessed are rarely given in publications. Many secondary users of measurement data assume that calibration data and measurements of auxiliary variables are sufficient to estimate data reliability. However, even if some devices require on-site field calibration, this does not mean that the data are comparable. Heat, cold, and internal processes in electronic components can lead to differences between measurements recorded with devices of the same type at the same location, especially as the devices themselves become increasingly complex.
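A minimal sketch, with assumed example readings, of the kind of statistical summary that would make such comparisons explicit, reporting each device's mean together with a 95% confidence interval:

# Compare two devices of the same type at the same location by reporting a
# mean with a 95% confidence interval instead of a bare value.
import numpy as np
from scipy import stats

def mean_with_ci(samples, confidence=0.95):
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean()
    sem = stats.sem(samples)
    half_width = sem * stats.t.ppf((1 + confidence) / 2.0, len(samples) - 1)
    return mean, (mean - half_width, mean + half_width)

device_a = [21.4, 21.6, 21.5, 21.7, 21.5]   # hypothetical temperature readings
device_b = [21.9, 22.1, 22.0, 22.2, 22.0]

print(mean_with_ci(device_a))
print(mean_with_ci(device_b))  # non-overlapping intervals point to a device bias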

Data reliability can be increased by addressing data uncertainty within the FAIR principles. The poster presentation will show the importance of comparative measurements, the information needed for the application of proxy-transfer functions, and suitable uncertainty analyses for databases.

How to cite: Koedel, U. and Dietrich, P.: Going beyond FAIR to increase data reliability, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11117, https://doi.org/10.5194/egusphere-egu2020-11117, 2020.

ESSI2.1 – Metadata, Data Models, Semantics, and Collaboration

EGU2020-3663 | Displays | ESSI2.1

Ensuring Scientific Reproducibility within the Earth Observation Community: Standardized Algorithm Documentation for Improved Scientific Data Understanding

Aaron Kaulfus, Kaylin Bugbee, Alyssa Harris, Rahul Ramachandran, Sean Harkins, Aimee Barciauskas, and Deborah Smith

Algorithm Theoretical Basis Documents (ATBDs) accompany Earth observation data generated from algorithms. ATBDs describe the physical theory, mathematical procedures and assumptions made for the algorithms that convert radiances received by remote sensing instruments into geophysical quantities. While ATBDs are critical to scientific reproducibility and data reuse, there have been technical, social and informational issues surrounding the creation and maintenance of these key documents. A standard ATBD structure has been lacking, resulting in inconsistent documents of varying levels of detail. Due to the lack of a minimum set of requirements, there has been very little formal guidance on the ATBD publication process. Additionally, ATBDs have typically been provided as static documents that are not machine readable, making search and discovery of the documents and the content within the documents difficult for users. To address the challenges surrounding ATBDs, NASA has prototyped the Algorithm Publication Tool (APT), a centralized cloud-based publication tool that standardizes the ATBD content model and streamlines the ATBD authoring process. This presentation will describe our approach in developing a common information model for ATBDs and our efforts to provide ATBDs as dynamic documents that are available for both human and machine utilization. We will also include our vision for APT within the broader NASA Earth science data system and how this tool may assist in standardizing and easing the ATBD creation and maintenance process.
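Purely as a hypothetical illustration of what a machine-readable ATBD record could look like once a common information model is enforced (the field names below are assumptions, not the actual APT content model):

# Hypothetical, simplified ATBD record: structured fields instead of a static
# document make the content searchable by both humans and machines.
import json

atbd_record = {
    "title": "Example Retrieval Algorithm",
    "version": "1.0",
    "scientific_theory": "Radiative transfer relates observed radiance to ...",
    "assumptions": ["clear-sky conditions", "plane-parallel atmosphere"],
    "inputs": [{"name": "radiance", "unit": "W m-2 sr-1 um-1"}],
    "outputs": [{"name": "surface_temperature", "unit": "K"}],
}

print(json.dumps(atbd_record, indent=2))  # serialisable, hence indexable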

How to cite: Kaulfus, A., Bugbee, K., Harris, A., Ramachandran, R., Harkins, S., Barciauskas, A., and Smith, D.: Ensuring Scientific Reproducibility within the Earth Observation Community: Standardized Algorithm Documentation for Improved Scientific Data Understanding, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-3663, https://doi.org/10.5194/egusphere-egu2020-3663, 2020.

EGU2020-18976 | Displays | ESSI2.1

Managing the knowledge created by the users through Geospatial User Feedback system. The NEXTGEOSS use case

Alaitz Zabala Torres, Joan Masó Pau, and Xavier Pons

The first approach to metadata was based on the producer's point of view, since producers were responsible for documenting and sharing metadata about their products. Since 2012 (starting in the EU FP7 GeoViQua project), the Geospatial User Feedback (GUF) approach, an OGC standard since 2016, has described the user perspective on datasets and services. In the past, users of the data gained knowledge about and with the data, but they lacked the means to easily and automatically share this knowledge in a formal way.

In the EU H2020 NextGEOSS project, the NiMMbus system has been matured as an interoperable solution to manage and store feedback items following the OGC GUF standard. NiMMbus can be used as a component for any geospatial portal, and, so far, has been integrated in several H2020 project catalogues or portals (NextGEOSS, ECOPotential, GeoEssential and GroundTruth2.0).

User feedback metadata complements producer's metadata and adds value to the resource description in a geospatial portal by collecting the knowledge gained by the user while using the data for the purpose originally foreseen by the producer or an innovative one.

The current GEOSS platform provides access to a vast number of data resources. To truly assist decision making, however, GEOSS wants to add a knowledge base. We believe that the NiMMbus system is a significant NextGEOSS contribution in this direction.

This communication describes how to extend the GUF to provide a set of knowledge elements and connect them to the original data, creating a network of knowledge. These elements can be citations (publications and policy briefs), quality indications (QualityML vocabulary and ISO 19157), usage reports (code and analytical processes), etc. NiMMbus offers tools to create different levels of feedback, starting with comments, providing citations or extracting quality indicators for the different quality classes (positional, temporal and attribute accuracy, completeness, consistency), and sharing them with other users as part of the user feedback and usage report. Usage reports in the GUF standard can be extended to include code fragments that other users can apply to reproduce a previous usage. For example, in the ECOPotential Protected Areas from Space map browser (continued in the H2020 e-Shape project), a vegetation index optimal for observing phenological blooms can be encoded by a user as a layer calculation using a combination of original Sentinel-2 bands. The portal stores this as JavaScript code (serialized as JSON) that describes which layers and formula were used. Once users have validated the new layer, they can decide to make it available to everyone by publishing it as open-source JavaScript code in the NiMMbus system. From then on, any other user of the portal can import and use it. As the usage description is a full feedback item, the user creating the dynamic layer can also describe any other related information, such as comments, or advertise a related publication.
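A hypothetical sketch of what such a serialised usage report might contain; the keys and formula below are illustrative assumptions, not the actual NiMMbus or GUF encoding:

# Illustrative serialisation of a user-defined dynamic layer attached to a
# feedback item; structure and values are assumptions for illustration only.
import json

dynamic_layer = {
    "feedback_type": "usage_report",
    "target_dataset": "Sentinel-2 L2A imagery over a protected area",
    "layers_used": ["B04", "B08"],
    "formula": "(B08 - B04) / (B08 + B04)",   # an NDVI-like index
    "comment": "Index tuned to observe a phenological bloom",
    "licence": "open source, reusable by other portal users",
}

print(json.dumps(dynamic_layer, indent=2))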

The system shifts the focus to the users of the data and complements the producers' documentation with the richness of the knowledge that users gain in their data-driven research. In addition to augmenting GEOSS data, the system enables a social network of knowledge.

How to cite: Zabala Torres, A., Masó Pau, J., and Pons, X.: Managing the knowledge created by the users trough Geospatial User Feedback system. The NEXTGEOSS use case, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18976, https://doi.org/10.5194/egusphere-egu2020-18976, 2020.

EGU2020-19636 | Displays | ESSI2.1

Advances in Collaborative Documentation Support for CMIP6

Charlotte Pascoe, David Hassell, Martina Stockhause, and Mark Greenslade
The Earth System Documentation (ES-DOC) project aims to nurture an ecosystem of tools & services in support of Earth System documentation creation, analysis and dissemination. Such an ecosystem enables the scientific community to better understand and utilise Earth system model data.
The ES-DOC infrastructure that allows Coupled Model Intercomparison Project Phase 6 (CMIP6) modelling groups to describe their climate models and make the documentation available online has been available for 18 months. More recently, the automatic generation of documentation for every published simulation means that every CMIP6 dataset within the Earth System Grid Federation (ESGF) is now immediately connected, via a “further info URL”, to the ES-DOC description of the entire workflow that created it.
The further info URL is a landing page from which all of the CMIP6 documentation relevant to the data may be accessed, including the experimental design, model formulation and ensemble description, as well as links to the data citation information.
These DOI landing pages are part of the Citation Service, provided by DKRZ. Data citation information is also available independently through the ESGF Search portal, the DataCite search or Google’s dataset search. The Citation Service provides users of CMIP6 data with the formal citation that should accompany any use of the datasets that comprise their analysis.
ES-DOC services and the Citation Service form a CMIP6 project collaboration and depend upon structured documentation provided by the scientific community. Structured scientific metadata has an important role in science communication; however, its creation and collation exact a cost in time, energy and attention. We discuss progress towards a balance between the ease of information collection and the complexity of our information-handling structures.
 
CMIP6: https://pcmdi.llnl.gov/CMIP6/
ES-DOC: https://es-doc.org/
Further Info URL: https://es-doc.org/cmip6-ensembles-further-info-url

Citation Service: http://cmip6cite.wdc-climate.de

How to cite: Pascoe, C., Hassell, D., Stockhause, M., and Greenslade, M.: Advances in Collaborative Documentation Support for CMIP6, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19636, https://doi.org/10.5194/egusphere-egu2020-19636, 2020.

EGU2020-9412 | Displays | ESSI2.1

Semantic harmonization of geoscientific data sets using Linked Data and project specific vocabularies

Martin Schiegl, Gerold W. Diepolder, Abdelfettah Feliachi, José Román Hernández Manchado, Christine Hörfarter, Olov Johansson, Andreas-Alexander Maul, Marco Pantaloni, László Sőrés, and Rob van Ede

In the geosciences, where nomenclature has naturally grown from regional approaches with limited cross-border harmonization, descriptive texts whose meaning is not conclusively clarified in the international context are often used to code data. This leads to difficulties when cross-border datasets are compiled. On the one hand, this is caused by the national-language, regional and historical descriptions in geological map legends; on the other hand, it is related to the interdisciplinary orientation of the geosciences, e.g. when concepts adopted from different fields have different meanings. Consistent use and interpretation of data according to international standards creates the potential for semantic interoperability, and datasets then fit into international data infrastructures. But what if interpretation against international standards is not possible, because no such standard exists or existing standards are not applicable? Then efforts can be made to create machine-readable data using knowledge representations based on Semantic Web and Linked Data principles.

By making concepts referenceable via uniform identifiers (HTTP URIs) and crosslinking them to other resources published on the web, Linked Data offers the context necessary to clarify the meaning of concepts. This modern technology and approach ideally complements mainstream GIS (Geographic Information System) and relational database technologies in making data findable and semantically interoperable.

The GeoERA project (Establishing the European Geological Surveys Research Area to deliver a Geological Service for Europe, https://geoera.eu/) therefore provides the opportunity to clarify expert knowledge and terminology in the form of project-specific vocabulary concepts at a scientific level and to use them to code data in datasets. At the same time, parts of these vocabularies might later be included in international standards (e.g. INSPIRE or GeoSciML), if desired. So-called “GeoERA Project Vocabularies” are open collections of knowledge that may, for example, also contain deprecated, historical or only regionally relevant terms. Ideally, the sum of all vocabularies results in a knowledge database of bibliographically referenced terms developed through scientific projects. Due to the consistent application of Semantic Web and Linked Data standards, nothing stands in the way of further use by modern technologies such as AI.

Project Vocabularies could also form an initial part of a future EGDI (European Geological Data Infrastructure, http://www.europe-geology.eu/) knowledge graph. They are restricted to linguistically labelled concepts, described in SKOS (Simple Knowledge Organization System) plus metadata properties, with a focus on scientific reusability. To extend this knowledge graph, they could additionally be supplemented by RDF data files to support project-related applications and functionality.
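A minimal sketch of how a project vocabulary concept could be published as SKOS Linked Data, assuming the rdflib library and an illustrative (non-official) namespace:

# Build one SKOS concept with multilingual labels and an illustrative
# crosslink to an international CGI vocabulary term, then print it as Turtle.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, SKOS

EX = Namespace("https://example.org/geoera/vocab/")   # hypothetical base URI
g = Graph()

concept = EX["GraniteSensuLato"]
g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.prefLabel, Literal("granite (sensu lato)", lang="en")))
g.add((concept, SKOS.altLabel, Literal("Granit", lang="de")))
# Illustrative link into an international vocabulary; the exact target URI
# would be chosen by the vocabulary stewards.
g.add((concept, SKOS.broader,
       URIRef("http://resource.geosciml.org/classifier/cgi/lithology/granitoid")))

print(g.serialize(format="turtle"))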

How to cite: Schiegl, M., Diepolder, G. W., Feliachi, A., Hernández Manchado, J. R., Hörfarter, C., Johansson, O., Maul, A.-A., Pantaloni, M., Sőrés, L., and van Ede, R.: Semantic harmonization of geoscientific data sets using Linked Data and project specific vocabularies, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9412, https://doi.org/10.5194/egusphere-egu2020-9412, 2020.

EGU2020-10227 | Displays | ESSI2.1

Towards an ontology based conceptual model, establishing maximum interoperability for interactive and distributed processing of geoscientific information

Rainer Haener, Henning Lorenz, Sylvain Grellet, Marc Urvois, and Eberhard Kunz

This study presents an approach to establishing Conceptual Interoperability for autonomous, multidisciplinary systems participating in Research Infrastructures, Early Warning, or Risk Management Systems. Although promising implementations already exist, true interoperability is far from being achieved. Therefore, reference architectures and principles of Systems-of-Systems are adapted for a fully specified, yet implementation-independent Conceptual Model, establishing interoperability to the highest possible degree. The approach utilises use cases and requirements from geological information processing and modelling within the European Plate Observing System (EPOS).

Conceptual Interoperability can be accomplished by enabling Service Composability. Unlike integration, composability allows interactive data processing and, beyond that, evolving systems that enable interpretation and evaluation by any potential participant. Integrating data from different domains often leads to monolithic services that are implemented only for a specific purpose (Stovepipe System). Consequently, composability is essential for collaborative information processing, especially in modern interactive computing and exploration environments. A major design principle for achieving composability is Dependency Injection, allowing flexible combinations (Loose Coupling) of services that implement common, standardised interfaces (abstractions). Another decisive factor for establishing interoperability is Metamodels of data models that specify data and semantics regardless of their domain, based on a common, reusable approach. Thus, data from different domains can be represented by one common encoding that, e.g., abstracts landslides (geophysical models) or buildings (urban planning) based on their geometry. An indispensable part of a Conceptual Model is detailed semantics, which requires not only terms from Domain-Controlled Vocabularies but also ontologies providing qualified statements about the relationships between data and associated concepts. This is of major importance for evolutionary systems that are able to comprehend and react to state changes. Maximum interoperability also requires strict modularisation for a clear separation of semantics, metadata and the data itself.
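A minimal, illustrative sketch of these two principles, a shared abstraction plus dependency injection, in which data sources can be exchanged without touching the consuming analysis:

# Services implement a common abstraction and are injected into a composed
# workflow (loose coupling); purely illustrative, not an EPOS component.
from abc import ABC, abstractmethod

class FeatureSource(ABC):                      # the shared abstraction
    @abstractmethod
    def get_features(self, bbox):
        ...

class BoreholeService(FeatureSource):
    def get_features(self, bbox):
        return [{"type": "borehole", "bbox": bbox}]

class LandslideService(FeatureSource):
    def get_features(self, bbox):
        return [{"type": "landslide", "bbox": bbox}]

class CompositeAnalysis:
    def __init__(self, sources):               # dependency injection
        self.sources = sources

    def run(self, bbox):
        return [f for s in self.sources for f in s.get_features(bbox)]

analysis = CompositeAnalysis([BoreholeService(), LandslideService()])
print(analysis.run(bbox=(5.0, 45.0, 6.0, 46.0)))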

Conceptual models for geological information that are governed by the described principles, and their implementations, are still far away. Moreover, the route to achieving such models is not straightforward. They span a multitude of communities and are far too complex for conventional implementation in project form. A first step could be applying modern design principles to new developments in the various scientific communities and joining the results under a common stewardship like the Open Geospatial Consortium (OGC). Recently, a Metamodel has been developed within the OGC’s Borehole Interoperability Experiment (BoreholeIE), initiated and led by the French Geological Survey (BRGM). It combines the ISO standard for linear referencing (ISO 19148:2012) for localisation along borehole paths with the adaptation of different encodings of borehole logs based on well-established OGC standards. Further developments aim at correlating borehole logs, geological or geotechnical surveys, and geoscientific models. Since the results of surveys are often only available as non-schematised interpretations in text form, interoperability requires formal classifications, which can be derived from machine learning methods applied to the interpretations. As part of a Conceptual Model, such classifications can be used for an automated exchange of standard-conformant borehole logs or to support the generation of expert opinions on soil investigations.

How to cite: Haener, R., Lorenz, H., Grellet, S., Urvois, M., and Kunz, E.: Towards an ontology based conceptual model, establishing maximum interoperability for interactive and distributed processing of geoscientific information, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10227, https://doi.org/10.5194/egusphere-egu2020-10227, 2020.

EGU2020-10600 | Displays | ESSI2.1

Information Model Governance for Diverse Disciplines

John S. Hughes and Daniel J. Crichton

The PDS4 Information Model (IM) Version 1.13.0.0 was released for use in December 2019. The ontology-based IM remains true to its foundational principles found in the Open Archive Information System (OAIS) Reference Model (ISO 14721) and the Metadata Registry (MDR) standard (ISO/IEC 11179). The standards generated from the IM have become the de-facto data archiving standards for the international planetary science community and have successfully scaled to meet the requirements of the diverse and evolving planetary science disciplines.

A key foundational principle is the use of a multi-level governance scheme that partitions the IM into semi-independent dictionaries. The governance scheme first partitions the IM vertically into three levels, the common, discipline, and project/mission levels. The IM is then partitioned horizontally across both discipline and project/mission levels into individual Local Data Dictionaries (LDDs).

The Common dictionary defines the classes used across the science disciplines such as product, collection, bundle, data formats, data types, and units of measurement. The dictionary resulted from a large collaborative effort involving domain experts across the community. An ontology modeling tool was used to enforce a modeling discipline, for configuration management, to ensure consistency and extensibility, and to enable interoperability. The Common dictionary encompasses the information categories defined in the OAIS RM, specifically data representation, provenance, fixity, identification, reference, and context. Over the last few years, the Common dictionary has remained relatively stable in spite of requirements levied by new missions, instruments, and more complex data types.

Since the release of the Common dictionary, the creation of a significant number of LDDs has proved the effectiveness of multi-level, steward-based governance. This scheme is allowing the IM to scale to meet the archival and interoperability demands of the evolving disciplines. In fact, an LDD development “cottage industry” has emerged that required improvements to the development processes and configuration management.  An LDD development tool now allows dictionary stewards to quickly produce specialized LDDs that are consistent with the Common dictionary.

The PDS4 Information Model is a world-class knowledge-base that governs the Planetary Science community's trusted digital repositories. This presentation will provide an overview of the model and additional information about its multi-level governance scheme including the topics of stewardship, configuration management, processes, and oversight.

How to cite: Hughes, J. S. and Crichton, D. J.: Information Model Governance for Diverse Disciplines, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10600, https://doi.org/10.5194/egusphere-egu2020-10600, 2020.

EGU2020-12079 | Displays | ESSI2.1 | Highlight

Enabling Science at Scale

Jeff de La Beaujardiere

The geosciences are facing a Big Data problem, particularly in the areas of data Volume (huge observational datasets and numerical model outputs), Variety (large numbers of disparate datasets from multiple sources with inconsistent standards), and Velocity (need for rapid processing of continuous data streams). These challenges make it difficult to perform scientific research and to make decisions about serious environmental issues facing our planet. We need to enable science at the scale of our large, disparate, and continuous data.

One part of the solution relates to infrastructure, such as by making large datasets available in a shared environment co-located with computational resources so that we can bring the analysis code to the data instead of copying data. The other part relies on improvements in metadata, data models, semantics, and collaboration. Individual datasets must have comprehensive, accurate, and machine-readable metadata to enable assessment of their relevance to a specific problem. Multiple datasets must be mapped into an overarching data model rooted in the geographical and temporal attributes to enable us to seamlessly find and access data for the appropriate location and time. Semantic mapping is necessary to enable data from different disciplines to be brought to bear on the same problem. Progress in all these areas will require collaboration on technical methods, interoperability standards, and analysis software that bridges information communities -- collaboration driven by a willingness to make data usable by those outside of the original scientific discipline.

How to cite: de La Beaujardiere, J.: Enabling Science at Scale, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12079, https://doi.org/10.5194/egusphere-egu2020-12079, 2020.

EGU2020-2117 | Displays | ESSI2.1

Dealing with Semantic Heterogeneity of Geographic Concepts: A Geometric Algebra-Based Encoding Method

A conceptual consensus, as well as a unified representation, on a certain geographic concept across multiple contexts can be of great significance for the communication, retrieval, combination, and reuse of geographic information and knowledge. However, a geographic concept is a rich synthesis of semantics, semiotics, and quality (e.g., vagueness or approximation). The generation, representation, calculation, and application of a given geographic concept can consequently be highly heterogeneous, especially considering different interests, domains, languages, etc. In light of these semantic heterogeneity problems, coding core concepts uniquely can be a lighter-weight alternative to traditional ontology-based methods, because numeric codes can symbolize consensus on a concept across domains and even languages. Consequently, this paper proposes a unified semantic model, together with an encoding framework, for the representation, reasoning, and computation of geographic concepts based on geometric algebra (GA). In this method, a geographic concept is represented as a collection of semantic elements, which can be further encoded based on its hierarchical structure, and all of the semantic information of the concept is preserved across the encoding process. On the basis of the encoding result, semantic information can be inferred backward by well-defined operators, and semantic similarity can be computed for information inference as well as semantic association retrieval. In the case study, the implementation of the proposed framework shows that this GA-based semantic encoding model can be a promising method for the unified expression, reasoning, and calculation of geographic concepts and can reasonably be regarded as a prospective lighter-weight alternative solution to semantic heterogeneity.
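As a deliberately simplified stand-in for the GA-based encoding (plain set overlap instead of geometric algebra multivectors), the following toy sketch illustrates the underlying idea of representing a concept as a bundle of semantic elements and computing a similarity between concepts:

# Toy illustration only: concepts as bundles of semantic elements, similarity
# as set overlap. The paper's actual method encodes the elements and their
# hierarchy with geometric algebra, which this sketch does not attempt.
def encode(concept_elements):
    return frozenset(concept_elements)

def similarity(a, b):
    return len(a & b) / len(a | b)            # Jaccard overlap

river = encode({"waterbody", "flowing", "natural", "linear"})
canal = encode({"waterbody", "flowing", "artificial", "linear"})
lake  = encode({"waterbody", "standing", "natural", "areal"})

print(similarity(river, canal))   # 0.6   -> closer concepts
print(similarity(river, lake))    # 0.33  -> more distant concepts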

How to cite: Wu, F., Gao, H., and Yu, Z.: Dealing with Semantic Heterogeneity of Geographic Concepts: A Geometric Algebra-Based Encoding Method, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2117, https://doi.org/10.5194/egusphere-egu2020-2117, 2020.

EGU2020-2375 | Displays | ESSI2.1

Data Management at CEDA

Kate Winfield

Sending data to a secure long-term archive is increasingly a necessity for science projects due to funding body and publishing requirements. It is also good practice for long-term scientific aims and enables the preservation and re-use of valuable research data. The Centre for Environmental Data Analysis (CEDA) hosts a data archive holding vast amounts of atmospheric and Earth observation data from sources including aircraft campaigns, satellites, pollution monitoring, automatic weather stations, climate models, etc. The CEDA archive currently holds 14 PB of data in over 250 million files, which makes it challenging to discover and access specific data. In order to manage this, it is necessary to use standard formats and descriptions of the data. This poster will explore best practice in data management at CEDA and show the tools used to archive and share data.

How to cite: Winfield, K.: Data Management at CEDA, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2375, https://doi.org/10.5194/egusphere-egu2020-2375, 2020.

EGU2020-5131 | Displays | ESSI2.1

Open access to geological information and 3D modelling data sets in the European Plate Observing System platform (EPOS)

Marc Urvois, Sylvain Grellet, Abdelfettah Feliachi, Henning Lorenz, Rainer Haener, Christian Brogaard Pedersen, Martin Hansen, Luca Guerrieri, Carlo Cipolloni, and Mary Carter

The European Plate Observing System (EPOS, www.epos-ip.org) is a multidisciplinary pan-European research infrastructure for solid Earth science. It integrates a series of domain-specific service hubs such as the Geological Information and Modelling Technical Core Service (TCS GIM), dedicated to providing access to data, data products and services on European boreholes, geological and geohazards maps and mineral resources, as well as a catalogue of 3D models. These are hosted by European Geological Surveys and national research organisations.

Even though interoperability implementation frameworks are well described and widely used (ISO, OGC, IUGS/CGI, INSPIRE, …), it proved difficult for several data providers to deploy from the outset the OGC services supporting the full semantic definition (OGC Complex Feature) required to discover and view millions of geological entities. Instead, data are first collected and exposed using a simpler yet standardised description (GeoSciML Lite and EarthResourceML Lite). Subsequently, the more complex data flows are deployed with the corresponding semantics.

This approach was applied to design and implement the European Borehole Index and associated web services (View-WMS and Discovery-WFS) and extended to 3D Models. TCS GIM exposes to EPOS Central Integrated Core Services infrastructure a metadata catalogue service, a series of “index services”, a codeList registry and a Linked Data resolver. These allow EPOS end users to search and locate boreholes, geological maps and features, 3D models, etc., based on the information held by the index services.

In addition to these services, TCS GIM focussed particularly on sharing European geological data using the Linked Data approach. Each instance is associated with a URI and points to other information resources, also using URIs. The Linked Data principles ensure the best semantic description (e.g. URIs to shared codeList registry entries) and also enrich an initial “information seed” (e.g. a set of Borehole entries matching a search) with more content (e.g. URIs to more Features or a more complex description). As a result, this pattern, combining Simple Features and Linked Data, has a positive effect on the IT architecture: interoperable services are simpler and faster to deploy and there is no need to harvest a full OGC Complex Feature dataset. This architecture is also more scalable and sustainable.
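As a hedged sketch of how a client might query one of these index services, the following uses OWSLib against a hypothetical WFS endpoint; the real service URL and feature type name may differ:

# Query a GeoSciML-Lite-style borehole index over a standard WFS.
# Endpoint URL and typename are assumptions for illustration.
from owslib.wfs import WebFeatureService

wfs = WebFeatureService("https://example.org/gim/wfs", version="2.0.0")
response = wfs.getfeature(
    typename=["gsmlb:BoreholeView"],          # hypothetical borehole view layer
    bbox=(5.0, 45.0, 10.0, 50.0),             # area of interest
    maxfeatures=50,
)
print(response.read()[:500])                  # raw GML, ready for further parsing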

The European Geological Services codeList registries have been enriched with new vocabularies as part of the European Geoscience Registry. In compliance with the relevant European INSPIRE rules, this registry is now part of the INSPIRE Register Federation, the central access point to the repository for vocabularies and resources. The European Geoscience Registry is available for reuse and extension by other geoscientific projects.

During the EPOS project, this approach has been developed and implemented for the Borehole and Model data services. The TCS GIM team provided feedback on INSPIRE through the Earth Science Cluster, contributed to the creation of the OGC GeoScience Domain Working Group in 2017 and to the launch of the OGC Borehole Interoperability Experiment in 2018, and proposed evolutions to the OGC GeoSciML and IUGS/CGI EarthResourceML standards.

How to cite: Urvois, M., Grellet, S., Feliachi, A., Lorenz, H., Haener, R., Brogaard Pedersen, C., Hansen, M., Guerrieri, L., Cipolloni, C., and Carter, M.: Open access to geological information and 3D modelling data sets in the European Plate Observing System platform (EPOS), EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5131, https://doi.org/10.5194/egusphere-egu2020-5131, 2020.

EGU2020-6324 | Displays | ESSI2.1

A classification and predictive model of the complex REE mineral system

Hassan Babaie and Armita Davarpanah

We model the intermittent, non-linear interactions and feedback loops of the complex rare earth element (REE) mineral system by applying the self-organized criticality concept. Our semantic knowledge model (the REE_MinSys ontology) represents the dynamic primary and secondary processes that occur over a wide range of spatial and temporal scales and produce the emergent REE deposits and their geometry, tonnage, and grade. These include the scale-invariant, out-of-equilibrium geodynamic and magmatic processes that lead to the formation of orthomagmatic (carbonatite, alkaline igneous rocks) and syn- and post-magmatic hydrothermal REE deposits. The ontology also represents the redistribution of the REE from these primary ores by metamorphic fluids and/or post-depositional surface and supergene processes in sedimentary basins, fluvial channels, coastal areas, and/or the regolith around or above them. The ontology applies concepts of complex systems theory to represent the spatial and spatio-temporal elements of the REE mineral system, such as source, driver, threshold barriers, trigger, avalanche, conduit, relaxation, critical point attractor, and self-organization, for the alkaline igneous, iron oxide (a subcategory of IOCG), hydrothermal, marine placer, alluvial placer (including paleo-placer), phosphorite, laterite, and ion-adsorption clay REE deposits. The ontology is instantiated with diverse data drawn from globally distributed, well-studied small to giant REE deposits to build the REE_MinSys knowledge base. Users can query the knowledge base to extract explicit and inferred facts about each type of REE deposit, for example by asking: “Which rare earth elements are in REE phosphate deposits?” or “Which rare earth elements are largely explored in REE placer deposits?” Data from the knowledge base will be divided into training and testing sets after they are preprocessed and trends and data patterns are identified through data-analytical procedures. The training and test datasets will be used to build models applying machine learning algorithms to predict potential REE deposits of different kinds in unexposed or covered areas.
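As a minimal sketch of how such a competency question can be answered, the following builds a toy graph with an illustrative namespace (not the REE_MinSys ontology itself) and runs a SPARQL query over it:

# Answer "Which rare earth elements are in REE phosphate deposits?" against a
# toy in-memory knowledge base using rdflib and SPARQL.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("https://example.org/ree/")    # hypothetical namespace
g = Graph()
g.add((EX.DepositA, RDF.type, EX.PhosphoriteDeposit))
g.add((EX.DepositA, EX.containsElement, EX.Neodymium))
g.add((EX.DepositA, EX.containsElement, EX.Yttrium))

query = """
PREFIX ex: <https://example.org/ree/>
SELECT ?element WHERE {
    ?deposit a ex:PhosphoriteDeposit ;
             ex:containsElement ?element .
}
"""
for row in g.query(query):
    print(row.element)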

How to cite: Babaie, H. and Davarpanah, A.: A classification and predictive model of the complex REE mineral system, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-6324, https://doi.org/10.5194/egusphere-egu2020-6324, 2020.

EGU2020-7058 | Displays | ESSI2.1

Enabling Data Reuse Through Semantic Enrichment of Instrumentation

Robert Huber, Anusuriya Devaraju, Michael Diepenbroek, Uwe Schindler, Roland Koppe, Tina Dohna, Egor Gordeev, and Marianne Rehage

Pressing environmental and societal challenges demand the reuse of data on a much larger scale. Central to improvements on this front are approaches that support structured and detailed descriptions of published data. In general, the reusability of scientific datasets, such as measurements generated by instruments, observations collected in the field, and model simulation outputs, requires information about the contexts through which they were produced. These contexts include the instrumentation, methods, and analysis software used. In current data curation practice, data providers often put significant effort into capturing descriptive metadata about datasets. Nonetheless, metadata about instruments and methods provided by data authors are limited and in most cases unstructured.

The ‘Interoperability’ principle of FAIR emphasizes the importance of using formal vocabularies to enable machine-understandability of data and metadata, and establishing links between data and related research entities to provide their contextual information (e.g., devices and methods). To support FAIR data, PANGAEA is currently elaborating workflows to enrich instrument information of scientific datasets utilizing internal as well as third party services and ontologies and their identifiers. This abstract presents our ongoing development within the projects FREYA and FAIRsFAIR as follows:

  • Integrating the AWI O2A (Observations to Archives) framework and associated suite of tools within PANGAEA’s curatorial workflow as well as semi-automatized ingestion of observatory data.
  • Linking data with their observation sources (devices) by recording the persistent identifiers (PID) from the O2A sensor registry system (sensor.awi.de) as part of the PANGAEA  instrumentation database.
  • Enriching device and method descriptions of scientific data by annotating them with appropriate vocabularies, such as the NERC device type and device vocabularies or scientific methodology classifications (a schematic example follows this list).
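Schematic example referred to above; the identifiers, URLs and field names are illustrative assumptions rather than actual PANGAEA records:

# Illustrative enrichment of a dataset's device metadata with a sensor
# registry PID and a NERC vocabulary term; values are placeholders.
import json

device_annotation = {
    "dataset": "doi:10.1594/PANGAEA.XXXXXX",                        # placeholder DOI
    "device": {
        "registry_pid": "https://hdl.handle.net/XXXX/sensor-1234",  # hypothetical PID
        "device_type": {
            "label": "CTD",
            "vocabulary_uri": "http://vocab.nerc.ac.uk/collection/L05/current/130/",
        },
    },
}

print(json.dumps(device_annotation, indent=2))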

In our contribution, we will also outline the challenges to be addressed in enabling FAIR vocabularies of instruments and methods. These include questions regarding the reliability and trustworthiness of third-party ontologies and services, challenges in content synchronisation across linked resources, and implications for the FAIRness levels of datasets, such as dependencies on interlinked data sources and vocabularies.

We will show to what extent adapting, harmonizing and controlling the vocabularies used, as well as the identifier systems shared between data provider and data publisher, improves the findability and reusability of datasets while keeping the curatorial overhead as low as possible. This use case is a valuable example of how improving interoperability through harmonization efforts, though initially problematic and labour-intensive, can benefit a multitude of stakeholders in the long run: data users, publishers, research institutes, and funders.

How to cite: Huber, R., Devaraju, A., Diepenbroek, M., Schindler, U., Koppe, R., Dohna, T., Gordeev, E., and Rehage, M.: Enabling Data Reuse Through Semantic Enrichment of Instrumentation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7058, https://doi.org/10.5194/egusphere-egu2020-7058, 2020.

EGU2020-7937 | Displays | ESSI2.1

A Lightweight, Microservice-Based Research Data Management Architecture for Large Scale Environmental Datasets

Alexander Götz, Johannes Munke, Mohamad Hayek, Hai Nguyen, Tobias Weber, Stephan Hachinger, and Jens Weismüller

LTDS ("Let the Data Sing") is a lightweight, microservice-based Research Data Management (RDM) architecture which augments previously isolated data stores ("data silos") with FAIR research data repositories. The core components of LTDS include a metadata store as well as dissemination services such as a landing page generator and an OAI-PMH server. As these core components were designed to be independent from one another, a central control system has been implemented, which handles data flows between components. LTDS is developed at LRZ (Leibniz Supercomputing Centre, Garching, Germany), with the aim of allowing researchers to make massive amounts of data (e.g. HPC simulation results) on different storage backends FAIR. Such data can often, owing to their size, not easily be transferred into conventional repositories. As a result, they remain "hidden", while only e.g. final results are published - a massive problem for reproducibility of simulation-based science. The LTDS architecture uses open-source and standardized components and follows best practices in FAIR data (and metadata) handling. We present our experience with our first three use cases: the Alpine Environmental Data Analysis Centre (AlpEnDAC) platform, the ClimEx dataset with 400TB of climate ensemble simulation data, and the Virtual Water Value (ViWA) hydrological model ensemble.

How to cite: Götz, A., Munke, J., Hayek, M., Nguyen, H., Weber, T., Hachinger, S., and Weismüller, J.: A Lightweight, Microservice-Based Research Data Management Architecture for Large Scale Environmental Datasets, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7937, https://doi.org/10.5194/egusphere-egu2020-7937, 2020.

EGU2020-9750 | Displays | ESSI2.1

A modular approach to cataloguing oceanographic data

Adam Leadbetter, Andrew Conway, Sarah Flynn, Tara Keena, Will Meaney, Elizabeth Tray, and Rob Thomas

The ability to access and search metadata for marine science data is a key requirement both for meeting fundamental principles of data management (making data Findable, Accessible, Interoperable and Reusable) and for meeting domain-specific, community-defined standards and legislative requirements placed on data publishers. Therefore, in the sphere of oceanographic data management, the need for a modular approach to data cataloguing designed to meet a number of requirements can be clearly seen. In this paper we describe a data cataloguing system developed and in use at the Marine Institute, Ireland, to meet the needs of legislative requirements including the European Spatial Data Infrastructure (INSPIRE) and Marine Spatial Planning directives.

The data catalogue described here makes use of a metadata model focussed on the oceanographic domain. It comprises a number of key classes which will be described in detail in the paper, but which include:

  • Dataset - combine many different parameters, collected at multiple times and locations, using different instruments
  • Dataset Collection - provides a link between a Dataset Collection Activity and a Dataset, as well as linking to the Device(s) used to sample the environment for a given range of parameters. An example of a Dataset Collection may be the Conductivity-Temperature-Depth profiles taken on a research vessel survey allowing the individual sensors to be connected to the activity and the calibration of those sensors to be connected with the associated measurements. 
  • Dataset Collection Activity - a specialised dataset to cover such activities as research vessel cruises; or the deployments of  moored buoys at specific locations for given time periods
  • Platform - an entity from which observations may be made, such as a research vessel or a satellite
  • Programme - represents a formally recognized scientific effort receiving significant funding, requiring large scale coordination
  • Device - aimed at providing enough metadata for a given instance of an instrument to provide a skeleton SensorML record
  • Organisation - captures the details of research institutes, data holding centres, monitoring agencies, governmental and private organisations, that are in one way or another engaged in oceanographic and marine research activities, data & information management and/or data acquisition activities

The data model makes extensive use of controlled vocabularies to ensure both consistency and interoperability in the content of attribute fields for the Classes outlined above.
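As a schematic illustration only (the catalogue itself is implemented as a Drupal module, as described below), the relationships between some of these classes and their controlled-vocabulary attributes can be sketched as simple typed records; all field names and values here are assumptions:

# Schematic sketch of Device / Dataset Collection Activity / Dataset Collection
# relationships with controlled-vocabulary attributes; not the Drupal module.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Device:
    name: str
    device_type_uri: str                      # controlled vocabulary term

@dataclass
class DatasetCollectionActivity:
    label: str                                # e.g. a research vessel survey
    platform: str

@dataclass
class DatasetCollection:
    activity: DatasetCollectionActivity
    devices: List[Device] = field(default_factory=list)

ctd = Device("CTD profiler", "http://vocab.nerc.ac.uk/collection/L05/current/130/")
survey = DatasetCollectionActivity("Example survey", platform="research vessel")
collection = DatasetCollection(activity=survey, devices=[ctd])
print(collection)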

The data model has been implemented in a module for the Drupal open-source web content management system, and the paper will provide details of this application.

How to cite: Leadbetter, A., Conway, A., Flynn, S., Keena, T., Meaney, W., Tray, E., and Thomas, R.: A modular approach to cataloguing oceanographic data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9750, https://doi.org/10.5194/egusphere-egu2020-9750, 2020.

EGU2020-12281 | Displays | ESSI2.1

Modern Scientific Metadata Management: Atmospheric Radiation Measurement (ARM) Facility Data Center

Maggie Davis, Richard Cederwall, Giri Prakash, and Ranjeet Devarakonda

 

Atmospheric Radiation Measurement (ARM), a U.S. Department of Energy (DOE) scientific user facility, is a key geophysical data source for national and international climate research. Utilizing a standardized schema that has evolved since ARM's inception in 1989, the ARM Data Center (ADC) processes over 1.8 petabytes of stored data across more than 10,000 data products. Data sources include ARM-owned instruments, as well as field campaign datasets, Value-Added Products, evaluation data to test new instrumentation or models, Principal Investigator data products, and external data products (e.g., NASA satellite data). In line with FAIR principles, a team of metadata experts classifies instruments and defines spatial and temporal metadata to ensure accessibility through ARM Data Discovery. To enhance geophysical metadata collaboration across American and European organizations, this work will summarize the processes and tools which enable the management of ARM data and metadata. For example, this presentation will highlight recent enhancements in field-campaign metadata workflows to handle the ongoing Multidisciplinary Drifting Observatory for the Study of Arctic Climate (MOSAiC) data. Other key elements of the ARM Data Center include the architecture of ARM data transfer and storage processes, the evaluation of data quality, and the ARM consolidated databases. We will also discuss tools developed for identifying and recommending datastreams, and enhanced DOI assignments for all data types, to assist an interdisciplinary user base in selecting, obtaining, and using data as well as citing the appropriate data source for reproducible atmospheric and climate research.

How to cite: Davis, M., Cederwall, R., Prakash, G., and Devarakonda, R.: Modern Scientific Metadata Management: Atmospheric Radiation Measurement (ARM) Facility Data Center , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12281, https://doi.org/10.5194/egusphere-egu2020-12281, 2020.

EGU2020-14755 | Displays | ESSI2.1

WMO Hydrological Observing System (WHOS) broker: implementation progress and outcomes

Enrico Boldrini, Paolo Mazzetti, Stefano Nativi, Mattia Santoro, Fabrizio Papeschi, Roberto Roncella, Massimiliano Olivieri, Fabio Bordini, and Silvano Pecora

The WMO Hydrological Observing System (WHOS) is a service-oriented System of Systems (SoS) linking hydrological data providers and users by enabling harmonized and real time discovery and access functionalities at global, regional, national and local scale. WHOS is being realized through a coordinated and collaborative effort amongst:

  • National Hydrological Services (NHS) willing to publish their data to the benefit of a larger audience,
  • Hydrologists, decision makers, app and portal authors willing to gain access to world-wide hydrological data,
  • ESSI-Lab of CNR-IIA responsible for the WHOS broker component: a software framework in charge of enabling interoperability amongst the distributed heterogeneous systems belonging to data providers (e.g. data publishing services) and data consumers (e.g. web portals, libraries and apps),
  • WMO Commission of Hydrology (CHy) providing guidance to WMO Member countries in operational hydrology, including capacity building, NHSs engagement and coordination of WHOS implementation.

In the last years two additional WMO regional programmes have been targeted to benefit from WHOS, operating as successful applications for others to follow:

  • Plata river basin,
  • Arctic-HYCOS.

Each programme operates with a “view” of the whole WHOS, a virtual subset composed only by the data sources that are relevant to its context.

WHOS-Plata is currently brokering data sources from the following countries:

  • Argentina (hydrological & meteorological data),
  • Bolivia (meteorological data; hydrological data expected in the near future),
  • Brazil (hydrological & meteorological data),
  • Paraguay (meteorological data; hydrological data in process),
  • Uruguay (hydrological & meteorological data).

WHOS-Arctic is currently brokering data sources from the following countries:

  • Canada (historical and real time data),
  • Denmark (historical data),
  • Finland (historical and real time data),
  • Iceland (historical and real time data),
  • Norway (historical and real time data),
  • Russia (historical and real time data),
  • United States (historical and real time data).

Each data source publishes its data online according to specific hydrological service protocols and/or APIs (e.g. CUAHSI HydroServer, USGS Water Services, FTP, SOAP, REST API, OData, WAF, OGC SOS, …). Each service protocol and API in turn implies support for a specific metadata and data model (e.g. WaterML, CSV, XML , JSON, USGS RDB, ZRXP, Observations & Measurements, …).

WHOS broker implements mediation and harmonization of all these heterogeneous standards, in order to seamlessly support discovery and access of all the available data to a growing set of data consumer systems (applications and libraries) without any implementation effort for them:

  • 52North Helgoland (through SOS v.2.0.0),
  • CUAHSI HydroDesktop (through CUAHSI WaterOneFlow),
  • National Water Institute of Argentina (INA) node.js WaterML client (through CUAHSI WaterOneFlow),
  • DAB JS API (through DAB REST API),
  • USGS GWIS JS API plotting library (through RDB service),
  • R scripts (through R WaterML library),
  • C# applications (through CUAHSI WaterOneFlow),
  • UCAR jOAI (through OAI-PMH/WIGOS metadata).

In particular, the support of WIGOS metadata standard provides a set of observational metadata elements for the effective interpretation of observational data internationally.

In addition to metadata and data model heterogeneity, WHOS also needs to tackle semantic heterogeneity. The WHOS broker makes use of a hydrology ontology (made available as a SPARQL endpoint) to augment WHOS discovery capabilities (e.g. to obtain translations of a hydrology search parameter in multiple languages).
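A hedged sketch of such a multilingual lookup, assuming a hypothetical SPARQL endpoint URL and a SKOS-style ontology structure:

# Retrieve all language variants of the label of a hydrology search term
# from a SPARQL endpoint; endpoint URL and ontology layout are assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.org/whos/hydro-ontology/sparql")
sparql.setQuery("""
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?label WHERE {
        ?concept skos:prefLabel "discharge"@en ;
                 skos:prefLabel ?label .
    }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["label"]["value"], binding["label"].get("xml:lang"))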

Technical documentation for exercising the WHOS broker is already available online, while the official public launch, with a dedicated WMO WHOS web portal, is expected shortly.

How to cite: Boldrini, E., Mazzetti, P., Nativi, S., Santoro, M., Papeschi, F., Roncella, R., Olivieri, M., Bordini, F., and Pecora, S.: WMO Hydrological Observing System (WHOS) broker: implementation progress and outcomes, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-14755, https://doi.org/10.5194/egusphere-egu2020-14755, 2020.

EGU2020-15226 | Displays | ESSI2.1

OntoGeonous-GS: Implementation of an ontology for the geologic structures from the IUGS CGI and INSPIRE standards

Alizia Mantovani, Vincenzo Lombardo, and Fabrizio Piana

This contribution regards the encoding of an ontology for the GeologicStructure class. This is one of the sections of OntoGeonous, a larger ontology for the geosciences principally devoted to the representation of the knowledge contained in geological maps; the others regard the Geologic Unit, Geomorphologic Feature and Geologic Event. OntoGeonous is developed by the University of Turin, Department of Computer Sciences, and the Institute of Geosciences and Earth Resources of the National Research Council of Italy (CNR-IGG).

The encoding of the knowledge is based on the definitions and hierarchical organization of the concepts proposed by the international standards: the GeoSciML directive(1) and the INSPIRE Data Specification on Geology(2) drive the architecture at the more general levels, while the broader/narrower representation in the CGI vocabularies(3) provides the internal taxonomies of the specific sub-ontologies.

The first release of OntoGeonous had a complete hierarchy for the GeologicUnit class, which is partly different from the organization of knowledge of the international standard, and taxonomies for GeologicStructure, GeologicEvent and GeomorphologicFeature. The encoding process of OntoGeonous is presented in Lombardo et al. (2018) and in the WikiGeo website(4), while a method of application to the geological maps is presented in Mantovani et al (2020).

This contribution shows how the international standards guided the encoding of the sub-ontology for GeologicStructure, and the innovations introduced in the general organization of OntoGeonous compared to its first release. The main differences come from the analysis of the UML schemata for the GeologicStructure subclasses(5): first, the presence of the FoldSystem class inspired the creation of a more general class for associations of features; second, the attempt to describe the NonDirectionalStructure class led us to group all the remaining classes into a new class with opposite characteristics. Similar modifications have been made throughout the GeologicStructure ontology.

Our intent is to improve the formal description of geological knowledge in order to practically support the use of ontology-driven data models in the geological mapping task.



References

 

Lombardo, V., Piana, F., Mimmo, D. (2018). Semantics–informed geological maps: Conceptual modelling and knowledge encoding. Computers & Geosciences. 116. 10.1016/j.cageo.2018.04.001. 

 

Mantovani, A., Lombardo, V., Piana, F. (2020). Ontology-driven representation of knowledge for geological maps. (Submitted)

 

(1) http://www.geosciml.org. 

(2) http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/INSPIRE_DataSpecification_GE_v3.0.pdf 

(3) http://resource.geosciml.org/def/voc/

(4) https://www.di.unito.it/wikigeo/index.php?title=Pagina_principale

(5) http://www.geosciml.org/doc/geosciml/4.1/documentation/html/EARoot/EA1/EA1/EA4/EA4/EA356.htm

 

How to cite: Mantovani, A., Lombardo, V., and Piana, F.: OntoGeonous-GS: Implementation of an ontology for the geologic structures from the IUGS CGI and INSPIRE standards, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15226, https://doi.org/10.5194/egusphere-egu2020-15226, 2020.

EGU2020-15466 | Displays | ESSI2.1

Using standards to model delayed mode sensor processes

Alexandra Kokkinaki, Justin Buck, Emma Slater, Julie Collins, Raymond Cramer, and Louise Darroch

Ocean data are expensive to collect. Data reuse saves time and accelerates the pace of scientific discovery. For data to be re-usable the FAIR principles reassert the need for rich metadata and documentation that meet relevant community standards and provide information about provenance.

Approaches to describing sensor observations are often inadequate at meeting FAIR: they are prescriptive, with a limited set of attributes, and make little or no provision for really important metadata about sensor observations later in the data lifecycle.

As part of the EU ENVRIplus project, our work aimed at capturing the delayed-mode data curation process taking place at the National Oceanography Centre’s British Oceanographic Data Centre (BODC). Our solution uses unique URIs, OGC SWE standards and controlled vocabularies, commencing from the submitted originator's input and ending with the archived and published dataset.

The BODC delayed-mode process is an example of a physical system that is composed of several components, such as sensors and other computational processes, for example an algorithm to compute salinity or absolute winds. All components are described in SensorML, identified by unique URIs and associated with the relevant datastreams, which in turn are exposed on the web via ERDDAP using unique URIs.
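A hedged sketch of how a published datastream could then be retrieved programmatically, assuming a hypothetical ERDDAP server URL and dataset identifier and using the erddapy client:

# Access a delayed-mode datastream exposed via ERDDAP; server, dataset id,
# variable names and constraint values are assumptions for illustration.
from erddapy import ERDDAP

e = ERDDAP(server="https://erddap.example.org/erddap", protocol="tabledap")
e.dataset_id = "example_delayed_mode_ctd"
e.variables = ["time", "depth", "temperature", "practical_salinity"]
e.constraints = {"time>=": "2019-01-01T00:00:00Z"}

df = e.to_pandas()                             # tabular data with metadata headers
print(df.head())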

In this paper we intend to share our experience in using OGC standards and ERDDAP to model the above-mentioned process and publish the associated datasets in a unified way. The benefits attained allow greater automation of data transfer, easy access to large volumes of data from a chosen sensor, more precise capture of data provenance, and standardization, and pave the way towards greater FAIRness of the sensor data and metadata, with a focus on the delayed-mode processing.

How to cite: Kokkinaki, A., Buck, J., Slater, E., Collins, J., Cramer, R., and Darroch, L.: Using standards to model delayed mode sensor processes, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15466, https://doi.org/10.5194/egusphere-egu2020-15466, 2020.

EGU2020-18848 | Displays | ESSI2.1

Continuous ocean monitoring from sensor arrays on the UK large research vessels

Louise Darroch, Juan Ward, Alexander Tate, and Justin Buck

More than 40% of the human population live within 100 km of the sea. Many of these communities intimately rely on the oceans for their food, climate and economy. However, the oceans are increasingly being adversely affected by human-driven activities such as climate change and pollution. Many targeted, marine monitoring programmes (e.g. GOSHIP, OceanSITES) and pioneering observing technologies (e.g. autonomous underwater vehicles, Argo floats) are being used to assess the impact humans are having on our oceans. Such activities and platforms are deployed, calibrated and serviced by state-of-the-art research ships, multimillion-pound floating laboratories which operate diverse arrays of high-powered, high-resolution sensors around the clock (e.g. sea-floor depth, weather, ocean current velocity and hydrography etc.). These sensors, coupled with event and environmental metadata provided by the ships' logs and crew, are essential for understanding the wider context of the science they support, as well as directly contributing to crucial scientific understanding of the marine environment and key strategic policies (e.g. United Nations' Sustainable Development Goal 14). However, despite their high scientific value and cost, these data streams are not routinely brought together from UK large research vessels in coordinated, reliable and accessible ways that are fundamental to ensuring user trust in the data and any products generated from the data.

The National Oceanography Centre (NOC) and British Antarctic Survey (BAS) are currently working together to improve the integrity of the data management workflow from sensor systems to end-users across the UK Natural Environment Research Council (NERC) large research vessel fleet, making cost-effective use of vessel time while improving the FAIRness of data from these sensor arrays. The solution is based upon an Application Programming Interface (API) framework with endpoints tailored towards different end-users, such as scientists on board the vessels as well as the public on land. Key features include:

  • Sensor triage using real-time automated monitoring systems, assuring sensors are working correctly and only the best data are output;
  • Standardised digital event logging systems allowing data quality issues to be identified and resolved quickly;
  • Novel open-source data transport formats that are embedded with well-structured metadata, common standards and provenance information (such as controlled vocabularies and persistent identifiers), reducing ambiguity and enhancing interoperability across platforms;
  • An open-source data processing application that applies quality control to international standards (SAMOS or IOOS QARTOD);
  • Digital notebooks that manage and capture the processing applied to data, putting the data into context;
  • Democratisation and brokering of data through open data APIs (e.g. ERDDAP, Sensor Web Enablement), allowing end-users to discover and access data, layer their own tools or generate products to meet their own needs;
  • Unambiguous provenance that is maintained throughout the data management workflow using instrument persistent identifiers, part of the latest recommendations by the Research Data Alliance (RDA).
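As a minimal illustration of the kind of automated quality control applied to international standards, the sketch below implements a QARTOD-style gross range test with assumed flag thresholds; it is not the fleet's actual processing application:

# QARTOD-style gross range test assigning the standard flags
# 1 = good, 3 = suspect, 4 = fail to a stream of observations.
import numpy as np

def gross_range_flags(values, fail_span, suspect_span):
    values = np.asarray(values, dtype=float)
    flags = np.ones(values.shape, dtype=int)                  # 1 = good
    flags[(values < suspect_span[0]) | (values > suspect_span[1])] = 3
    flags[(values < fail_span[0]) | (values > fail_span[1])] = 4
    return flags

sea_temp = [12.1, 12.3, 29.5, 12.2, -45.0]     # hypothetical underway record
print(gross_range_flags(sea_temp, fail_span=(-5, 40), suspect_span=(0, 25)))
# -> [1 1 3 1 4]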

Access to universally interoperable oceanic data, with known quality and provenance, will empower a broad range of stakeholder communities, creating opportunities for innovation and impact through data use, re-use and exploitation.

How to cite: Darroch, L., Ward, J., Tate, A., and Buck, J.: Continuous ocean monitoring from sensor arrays on the UK large research vessels, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18848, https://doi.org/10.5194/egusphere-egu2020-18848, 2020.

EGU2020-19895 | Displays | ESSI2.1

Towards an interoperability framework for observable property terminologies

Barbara Magagna, Gwenaelle Moncoiffe, Anusuriya Devaraju, Pier Luigi Buttigieg, Maria Stoica, and Sirko Schindler

In October 2019, a new working group (InteroperAble Descriptions of Observable Property Terminology or I-ADOPT WG1) officially launched its 18-month workplan under the auspices of the Research Data Alliance (RDA) co-led by ENVRI-FAIR2 project members. The goal of the group is to develop a community-wide, consensus framework for representing observable properties and facilitating semantic mapping between disjoint terminologies used for data annotation. The group has been active for over two years and comprises research communities, data centers, and research infrastructures from environmental sciences. The WG members have been heavily involved in developing or applying terminologies to semantically enrich the descriptions of measured, observed, derived, or computed environmental data. They all recognize the need to enhance interoperability between their efforts through the WG’s activities.

Ongoing activities of the WG include gathering user stories from research communities (Task 1), reviewing related terminologies and current annotation practices (Task 2) and - based on this - defining and iteratively refining requirements for a community-wide semantic interoperability framework (Task 3). Much like a generic blueprint, this framework will be a basis upon which terminology developers can formulate local design patterns while at the same time remaining globally aligned. This framework will assist interoperability between machine-actionable complex property descriptions observed across the environmental sciences, including Earth, space, and biodiversity science. The WG will seek to synthesize well-adopted but still disparate approaches into global best practice recommendations for improved alignment. Furthermore, the framework will help mediate between generic observation standards (O&M3, SSNO4, SensorML5, OBOE6, ..) and current community-led terminologies and annotation practices, fostering harmonized implementations of observable property descriptions. Altogether, the WG’s work will boost the Interoperability component of the FAIR principles (especially principle I3) by encouraging convergence and by enriching the terminologies with qualified references to other resources. We envisage that this will greatly enhance the global effectiveness and scope of tools operating across terminologies. The WG will thus strengthen existing collaborations and build new connections between terminology developers and providers, disciplinary experts, and representatives of scientific data user groups. 

In this presentation, we introduce the working group to the EGU community, and invite them to join our efforts. We report the methodology applied, the results from our first three tasks and the first deliverable, namely a catalog of domain-specific terminologies in use in environmental research, which will enable us to systematically compare existing resources for building the interoperability framework. 

1https://www.rd-alliance.org/groups/interoperable-descriptions-observable-property-terminology-wg-i-adopt-wg
2https://envri.eu/home-envri-fair/
3https://www.iso.org/standard/32574.html
4https://www.w3.org/TR/vocab-ssn/
5https://www.opengeospatial.org/standards/sensorml
6https://github.com/NCEAS/oboe/

How to cite: Magagna, B., Moncoiffe, G., Devaraju, A., Buttigieg, P. L., Stoica, M., and Schindler, S.: Towards an interoperability framework for observable property terminologies, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19895, https://doi.org/10.5194/egusphere-egu2020-19895, 2020.

EGU2020-20448 | Displays | ESSI2.1

Supporting Users to Find Appropriate Visualizations of Spatio-Temporal Open Data Sets

Jan Schulte, Laura Helene Zepner, Stephan Mäs, Simon Jirka, and Petra Sauer

Over the last few years, a broad range of open data portals has been set up. The aim of these portals is to improve the discoverability of open data resources and to strengthen the re-use of data generated by public agencies as well as research activities.

Often, such open data portals offer an immense amount of different types of data that may be relevant for a user. Thus, in order to facilitate the efficient and user-friendly exploration of available data sets, it is essential to visualize the data as quickly and easily as possible. While the visualization of static data sets is already well covered, selecting appropriate visualization approaches for potentially highly-dynamic spatio-temporal data sets is often still a challenge.

Within our contribution, we will introduce a preliminary study conducted by the mVIZ project, which is funded by the German Federal Ministry of Transport and Digital Infrastructure as part of the mFUND programme. This project introduces a methodology to support the selection and creation of user-friendly visualizations for data discoverable via open data portals such as the mCLOUD. During this process, specific consideration is given to properties and metadata of the datasets as input for a decision workflow that suggests appropriate visualization types. A resulting guideline will describe the methodology and serve as a basis for the conception, extension or improvement of visualization tools and for their integration into open data portals.

The project focuses particularly on the creation of an inventory of open spatiotemporal data in open data portals as well as an overview of available visualization and analysis tools, the development of a methodology for selecting appropriate visualizations for the spatio-temporal data, and the development of a demonstrator for supporting the visualization of selected data sets.

How to cite: Schulte, J., Zepner, L. H., Mäs, S., Jirka, S., and Sauer, P.: Supporting Users to Find Appropriate Visualizations of Spatio-Temporal Open Data Sets, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20448, https://doi.org/10.5194/egusphere-egu2020-20448, 2020.

EGU2020-21258 | Displays | ESSI2.1

Scaling metadata catalogues with web-based software version control and integration systems

Tara Keena, Adam Leadbetter, Andrew Conway, and Will Meaney

The ability to access and search metadata for marine science data is a key requirement for answering fundamental principles of data management (making data Findable, Accessible, Interoperable and Reusable) and also in meeting domain-specific, community defined standards and legislative requirements placed on data publishers. One of the foundations of effective data management is appropriate metadata cataloguing; the storing and publishing of descriptive metadata for end users to query online. However, with ocean observing systems constantly evolving and the number of autonomous platforms and sensors growing, the volume and variety of data is constantly increasing, therefore metadata catalogue volumes are also expanding. The ability for data catalogue infrastructures to scale with data growth is a necessity, without causing significant additional overhead, in terms of technical infrastructure and financial costs. 

To address some of these challenges, GitHub and Travis CI offer a potential solution for maintaining scalable data catalogues and hosting a variety of file types, all with minimal overhead costs.

GitHub is a repository hosting platform for version control and collaboration, and can be used with documents, computer code, or many file formats

GitHub Pages is a static website hosting service designed to host web pages directly from a GitHub repository

Travis CI is a hosted, distributed continuous integration service used to build and test projects hosted at GitHub 

GitHub supports the implementation of a data catalogue as it stores metadata records of different formats in an online repository which is openly accessible and version controlled. The base metadata of the data catalogue in the Marine Institute is ISO 19115/19139-based XML, which complies with the INSPIRE implementing rules for metadata. However, using Travis CI, hooks can be provided to build additional metadata records and formats from this base XML, which can also be hosted in the repository (a build step of this kind is sketched below). These formats include:

DataCite metadata schema - allowing a completed data description entry to be exported in support of the minting of Digital Object Identifiers (DOI) for published data

Resource Description Framework (RDF) - as part of the semantic web and linked data

Ecological Metadata Language (EML) - for Global Biodiversity Information Facility (GBIF) – which is used to share information about where and when species have been recorded

Schema.org XML – which creates a structured data mark-up schema to increase search engine optimisation (SEO)

HTML - the standard mark-up language for web pages, which can be used to represent the XML as web pages for end users to view the catalogue online

 As well as hosting the various file types, GitHub Pages can also render the generated HTML pages as static web pages. This allows users to view and search the catalogue online via a generated static website. 

The functionality GitHub has to host and version control metadata files, and render them as web pages, allows for an easier and more transparent generation of an online data catalogue while catering for scalability, hosting and security.
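
The sketch below illustrates, under stated assumptions, the kind of build step such a CI hook could run: deriving a Schema.org JSON-LD record from a base ISO 19139 XML file. The element paths and file names are illustrative and do not reproduce the Marine Institute's actual scripts.

```python
# Minimal sketch (assumption, not the Marine Institute's build script):
# derive a Schema.org JSON-LD record from a base ISO 19139 XML metadata file.
import json
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

def iso_to_schema_org(iso_xml_path: str) -> dict:
    root = ET.parse(iso_xml_path).getroot()
    # Pull the first title and abstract found anywhere in the record.
    title = root.findtext(".//gmd:title/gco:CharacterString",
                          default="", namespaces=NS)
    abstract = root.findtext(".//gmd:abstract/gco:CharacterString",
                             default="", namespaces=NS)
    return {"@context": "https://schema.org", "@type": "Dataset",
            "name": title, "description": abstract}

if __name__ == "__main__":
    # "record.xml" is a placeholder file name for a record in the repository.
    print(json.dumps(iso_to_schema_org("record.xml"), indent=2))
```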



How to cite: Keena, T., Leadbetter, A., Conway, A., and Meaney, W.: Scaling metadata catalogues with web-based software version control and integration systems , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21258, https://doi.org/10.5194/egusphere-egu2020-21258, 2020.

EGU2020-21522 | Displays | ESSI2.1

OpenSearch API for Earth observation DataHub service

Jovanka Gulicoska, Koushik Panda, and Hervé Caumont

OpenSearch is a de-facto standard specification and a collection of technologies that allow publishing of search results in a format suitable for syndication and aggregation. It is a way for websites and search engines to publish search results in a standard and accessible format.

Evolved through extensions within an international standards organisation, the Open Geospatial Consortium (OGC), OpenSearch has become a reference for making queries to a repository that contains Earth Observation information, for sending and receiving structured, standardized search requests and results, and for allowing syndication of repositories. In this evolved form it is a shared API used by many applications, tools, portals and sites in the Earth sciences community. The OGC OpenSearch extensions have been implemented for the NextGEOSS DataHub following the OGC standards and have been validated to be fully compatible with them.

The OGC OpenSearch extensions implemented for CKAN, the open-source software solution supporting the NextGEOSS DataHub, add the standardized metadata models and the OpenSearch API endpoints that allow the indexing of distributed EO data sources (currently over 110 data collections) and make these available to client applications to perform queries and retrieve results. This allowed the development of a simple user interface as part of the NextGEOSS DataHub Portal, which implements the two-step search mechanism (leveraging data collection metadata and data product metadata) and translates the filtering done by users into a matching OpenSearch query. The user interface can render a general description document that contains information about the collections available on the NextGEOSS DataHub, and then retrieve a more detailed description document for each collection separately.

For generating the structure of the description documents and the result feed, we use CKAN's templates, complemented by additional files that are responsible for listing all available parameters and their options and for validating the query before it is executed. The search endpoint that returns the results feed uses existing CKAN API calls to perform the validation and retrieve the filtered results, taking into consideration the parameters of the user's search. An example of the resulting two-step query pattern is sketched below.
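
The following sketch illustrates the two-step search pattern from a client's point of view. The base URL, endpoint paths and parameter names are hypothetical placeholders rather than the actual NextGEOSS DataHub API.

```python
# Illustrative sketch of the two-step OpenSearch pattern described above.
# Endpoint paths, parameter names and values are hypothetical placeholders.
import requests

BASE = "https://catalogue.example.org/opensearch"   # placeholder base URL

# Step 1: fetch the general description document listing available collections.
top_level = requests.get(f"{BASE}/description.xml", timeout=30)

# Step 2: query one collection with standard OpenSearch-style parameters
# (free text, temporal extent, bounding box), returning an Atom results feed.
params = {
    "q": "sentinel",
    "start": "2020-05-01T00:00:00Z",
    "end": "2020-05-08T00:00:00Z",
    "bbox": "-10,35,5,45",      # west,south,east,north
    "rows": 20,
}
results = requests.get(f"{BASE}/search.atom", params=params, timeout=30)
print(top_level.status_code, results.status_code)
```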

The current NextGEOSS DataHub implementation therefore provides a user interface for users who are not familiar with Earth observation data collections and products, so that they can easily create queries and access the results. Moreover, the NextGEOSS project partners are constantly adding data connectors and collecting new data sources that will become available through the OGC OpenSearch extensions API. This will allow NextGEOSS to provide a variety of data for users and accommodate their needs.

 

NextGEOSS is a H2020 Research and Development Project from the European Community under grant agreement 730329.

How to cite: Gulicoska, J., Panda, K., and Caumont, H.: OpenSearch API for Earth observation DataHub service, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21522, https://doi.org/10.5194/egusphere-egu2020-21522, 2020.

EGU2020-21882 | Displays | ESSI2.1

Research products across space missions – a prototype for central storage, visualization and usability

Mario D'Amore, Andrea Naß, Martin Mühlbauer, Torsten Heinen, Mathias Boeck, Jörn Helbert, Torsten Riedlinger, Ralf Jaumann, and Guenter Strunz

For planetary sciences, the main archives providing access to mission data are ESA's Planetary Science Archive (PSA) and the Planetary Data System (PDS) nodes in the USA. Along with recent and upcoming planetary missions, the amount of different data (remote sensing/in-situ data, derived products) increases constantly and serves as the basis for scientific research resulting in derived scientific data and information. Within missions to Mercury (BepiColombo), the Outer Solar System moons (JUICE), and asteroids (NASA's DAWN), one way of scientific analysis, the systematic mapping of surfaces, has received new impulses, also in Europe. These systematic surface analyses are based on the numeric and visual comparison and combination of different remote sensing data sets, such as optical image data, spectral-/hyperspectral sensor data, radar images, and/or derived products like digital terrain models. The analyses mainly result in map figures, data, and profiles/diagrams, and serve to describe research investigations within scientific publications.

To handle these research products on a par with the missions' base data in the main archives, web-based geographic information systems have become a common means of imparting spatial knowledge to all kinds of users in recent years. Consequently, further platforms and initiatives have emerged that handle planetary data within web-based GIS, services, and/or virtual infrastructures. These systems are built either upon proprietary software environments or, more commonly, upon a well-established stack of open-source software such as PostgreSQL, GeoServer (a server for sharing geospatial data) and a graphical user interface based on JavaScript. Applicable standards developed by the Open Geospatial Consortium (OGC), such as the Web Map Service (WMS) and the Web Feature Service (WFS), serve as the interface between the user interface and the server-based data storage.
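
As a minimal illustration of how a front end or any OGC client talks to such a stack, the sketch below issues a standard WMS 1.3.0 GetMap request. The GeoServer endpoint and layer name are hypothetical placeholders, not the prototype's actual services.

```python
# Sketch of a standard WMS 1.3.0 GetMap request of the kind such a stack
# serves; endpoint and layer name are hypothetical placeholders.
import requests

WMS_ENDPOINT = "https://example.org/geoserver/planetary/wms"   # placeholder

params = {
    "service": "WMS",
    "version": "1.3.0",
    "request": "GetMap",
    "layers": "planetary:geologic_units",   # hypothetical layer
    "crs": "EPSG:4326",
    "bbox": "-30,0,30,60",                  # lat/lon axis order in WMS 1.3.0
    "width": 512,
    "height": 512,
    "format": "image/png",
}

response = requests.get(WMS_ENDPOINT, params=params, timeout=60)
with open("getmap.png", "wb") as f:
    f.write(response.content)
```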

 

This contribution presents a prototypical system for the structured storage and visualization of planetary data compiled and developed within, or with the contribution of, the Institute for Planetary Research (PF, DLR). It enables user groups to store and spatially explore research products centrally and sustainably, across multiple missions and scientific disciplines [1].

 

Technically, the system is based on two components: 1) an infrastructure that provides data storage and management capabilities as well as OGC-compliant interfaces for collaborative and web-based data access services, such as the EOC Geoservice [2]; 2) UKIS (Environmental and Crisis Information Systems), a framework developed at DFD for the implementation of geoscientific web applications [3]. Substantially, the prototype is based on a recent approach developed within PF [4], in which an existing database established at the Planetary Spectroscopy Laboratory (PSL), handling different kinds of spatial data, is combined with a vector-based data collection of thematic, mainly geologic and geomorphologic, mapping results [5].

 

An information system of this kind is essential to ensure the efficient and sustainable utilization of the information already obtained and published. This is considered a prerequisite for guaranteeing a continuous and long-term use of scientific information and knowledge within institutional frameworks.

 

[1] Naß, et al (2019) EPSC #1311

[2] Dengler et al. (2013) PV 2013, elib.dlr.de/86351/

[3] Mühlbauer (2019) dlr.de/eoc/UKIS/en/

[4] Naß, d ’Amore, Helbert (2017) EPSC #646-1

[5] Naß, Dawn Science Team (2019) EPSC #1304

How to cite: D'Amore, M., Naß, A., Mühlbauer, M., Heinen, T., Boeck, M., Helbert, J., Riedlinger, T., Jaumann, R., and Strunz, G.: Research products across space missions – a prototype for central storage, visualization and usability, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21882, https://doi.org/10.5194/egusphere-egu2020-21882, 2020.

EGU2020-22021 | Displays | ESSI2.1

An open-source database and collections management system for fish scale and otolith archives

Elizabeth Tray, Adam Leadbetter, Will Meaney, Andrew Conway, Caoimhín Kelly, Niall O’Maoileidigh, Elvira De Eyto, Siobhan Moran, and Deirdre Brophy

Scales and otoliths (ear stones) from fish are routinely sampled for age estimation and stock management purposes. Growth records from scales and otoliths can be used to generate long-term time series data, and in combination with environmental data, can reveal species-specific population responses to a changing climate. Additionally, scale and otolith microchemical data can be utilized to investigate fish habitat usage and migration patterns. A common problem associated with biological collections is that while sample intake grows, long-term digital and physical storage is rarely a priority. Material is often collected to meet short-term objectives and resources are seldom committed to maintaining and archiving collections. As a consequence, precious samples are frequently stored in many different and unsuitable locations, and may become lost or separated from associated metadata. The Marine Institute’s ecological research station in Newport, Ireland, holds a multi-decadal (1928-2020) collection of scales and otoliths from various fish species, gathered from many geographic locations. Here we present an open-source database and archiving system to consolidate and digitize this collection, and show how this case-study infrastructure could be used for other biological sample collections. The system follows the FAIR (Findable, Accessible, Interoperable and Reusable) open data principles, and includes a physical repository, sample metadata catalogue, and image library.

How to cite: Tray, E., Leadbetter, A., Meaney, W., Conway, A., Kelly, C., O’Maoileidigh, N., De Eyto, E., Moran, S., and Brophy, D.: An open-source database and collections management system for fish scale and otolith archives, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22021, https://doi.org/10.5194/egusphere-egu2020-22021, 2020.

ESSI2.10 – Data Integration: Enabling the Acceleration of Science Through Connectivity, Collaboration, and Convergent Science

EGU2020-11709 | Displays | ESSI2.10

Streamlining Oceanic Biogeochemical Dataset Assembly in Support of Global Data Products

Eugene Burger, Benjamin Pfeil, Kevin O'Brien, Linus Kamb, Steve Jones, and Karl Smith

Data assembly in support of global data products, such as GLODAP, and submission of data to national data centers to support long-term preservation demand significant effort. This is in addition to the effort required to perform quality control on the data prior to submission. Delays in data assembly can negatively affect the timely production of scientific indicators that are dependent upon these datasets, including products such as GLODAP. What if data submission, metadata assembly and quality control could all be rolled into a single application? To support more streamlined data management processes in the NOAA Ocean Acidification Program (OAP), we are developing such an application. This application has the potential to serve a broader community.

This application addresses the need for data contributing to analysis and synthesis products to be of high quality, well documented, and accessible from the applications scientists prefer to use. The Scientific Data Integration System (SDIS) application developed by the PMEL Science Data Integration Group allows scientists to submit their data in a number of formats. Submitted data are checked for common errors. Metadata are extracted from the data and can then be complemented into a complete metadata record using the integrated metadata entry tool, which collects rich metadata that meets the carbon science community's requirements. Quality control for standard biogeochemical parameters, still under development, will be integrated into the application. The quality control routines will be implemented in close collaboration with colleagues from the Bjerknes Climate Data Centre (BCDC) within the Bjerknes Centre for Climate Research (BCCR). This presentation will highlight the capabilities that are now available as well as the implementation of the archive automation workflow and its potential use in support of GLODAP data assembly efforts.

How to cite: Burger, E., Pfeil, B., O'Brien, K., Kamb, L., Jones, S., and Smith, K.: Streamlining Oceanic Biogeochemical Dataset Assembly in Support of Global Data Products, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11709, https://doi.org/10.5194/egusphere-egu2020-11709, 2020.

EGU2020-20966 | Displays | ESSI2.10

NextGEOSS data hub and platform - connecting data providers with geosciences communities

Bente Bye, Elnaz Neinavaz, Alaitz Zabala, Joan Maso, Marie-Francoise Voidrot, Barth De Lathouwer, Nuno Catarino, Pedro Gonzalves, Michelle Cortes, Koushik Panda, Julian Meyer-Arnek, and Bram Janssen

The geosciences communities share common challenges related to the effective use of the vast and growing amount of data as well as the continuous development of new technology. There is therefore great potential in learning from the experiences and knowledge acquired across the various fields. The H2020 project NextGEOSS is building a European data hub and platform to support the Earth observation communities with a set of tools and services through the platform. The suite of tools on the platform allows scalability, interoperability and transparency in a flexible way, well suited to serve a multifaceted interdisciplinary community. NextGEOSS is developed with and for multiple communities, the NextGEOSS pilots. This has resulted in, and continues to provide, transfer of experience and knowledge along the whole value chain from data provision to applications and services based on multiple sources of data. We will introduce the NextGEOSS data hub and platform and show some illustrative examples of the exchange of knowledge that facilitates faster uptake of data and advances in the use of new technology. An onboarding system benefits existing and new users. A capacity building strategy is an integral part of both the onboarding and the individual services, which will be highlighted in this presentation.

How to cite: Bye, B., Neinavaz, E., Zabala, A., Maso, J., Voidrot, M.-F., De Lathouwer, B., Catarino, N., Gonzalves, P., Cortes, M., Panda, K., Meyer-Arnek, J., and Janssen, B.: NextGEOSS data hub and platform - connecting data providers with geosciences communities, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20966, https://doi.org/10.5194/egusphere-egu2020-20966, 2020.

EGU2020-12386 | Displays | ESSI2.10 | Highlight

Data dissemination best practices and challenges identified through NOAA’s Big Data Project

Meredith Richardson, Ed Kearns, and Jonathan O'Neil

Through satellites, ships, radars, and weather models, the National Oceanic and Atmospheric Administration (NOAA) generates and handles tens of terabytes of data per day. Many of NOAA’s key datasets have been made available to the public through partnerships with Google, Microsoft, Amazon Web Services, and more as part of the Big Data Project (BDP). This movement of data to the Cloud has enabled access for researchers from all over the world to vast amounts of NOAA data, initiating a new form of federal data management as well as exposing key challenges for the future of open-access data. NOAA researchers have run into challenges in providing “analysis-ready” datasets that researchers from varying fields can easily access, manipulate, and use for different purposes. This issue arises as there is no agreed-upon format or method of transforming traditional datasets for the cloud across research communities, with each scientific field or start-up expressing differing data formatting needs (cloud-optimized, cloud-native, etc.). Some possible solutions involve changing data formats into those widely used throughout the visualization community, such as Cloud-Optimized GeoTIFF. Initial findings have led NOAA to facilitate roundtable discussions with researchers, public and private stakeholders, and other key members of the data community, to encourage the development of best practices for the use of public data on commercial cloud platforms. Overall, by uploading NOAA data to the Cloud, the BDP has led to the recognition and ongoing development of new best practices for data authentication and dissemination and the identification of key areas for targeting collaboration and data use across scientific communities.

How to cite: Richardson, M., Kearns, E., and O'Neil, J.: Data dissemination best practices and challenges identified through NOAA’s Big Data Project, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12386, https://doi.org/10.5194/egusphere-egu2020-12386, 2020.

EGU2020-5972 | Displays | ESSI2.10 | Highlight

Data Systems to Enable Open Science: The Joint ESA-NASA Multi-Mission Algorithm and Analysis Platform’s Data Ecosystem
not presented

Kaylin Bugbee, Aaron Kaulfus, Aimee Barciauskas, Manil Maskey, Rahul Ramachandran, Dai-Hai Ton That, Chris Lynnes, Katrina Virts, Kel Markert, and Amanda Whitehurst

The scientific method within the Earth sciences is rapidly evolving. Ever-increasing data volumes require new methods for processing and understanding data, while an almost 60-year Earth observation record makes more data-intensive retrospective analyses possible. These new methods of data analysis are made possible by technological innovations and interdisciplinary scientific collaborations. While scientists are beginning to adopt new technologies and collaborations to more effectively conduct data-intensive research, both the data information infrastructure and the supporting data stewardship model have been slow to change. Standard data products are generated by a processing system and are then ingested into local archives. These local archive centers then provide metadata to a centralized repository for search and discovery. Each step in the data process occurs independently and on different siloed components. Similarly, the data stewardship process has a well-established but narrow view of data publication that may be too constrained for an ever-changing data environment. To overcome these obstacles, a new approach is needed for both the data information infrastructure and stewardship models. The data ecosystem approach offers a solution to these challenges by placing an emphasis on the relationships between data, technologies and people. In this presentation, we present the Joint ESA-NASA Multi-Mission Algorithm and Analysis Platform’s (MAAP) data system as a forward-looking ecosystem solution. We will present the components needed to support the MAAP data ecosystem along with the key capabilities the MAAP data ecosystem supports. These capabilities include the ability for users to share data and software within the MAAP, the creation of analysis-optimized data services, and the creation of an aggregated catalog for data discovery. We will also explore our data stewardship efforts within this new type of data system, which include developing a data management plan and a level-of-service plan.

How to cite: Bugbee, K., Kaulfus, A., Barciauskas, A., Maskey, M., Ramachandran, R., Ton That, D.-H., Lynnes, C., Virts, K., Markert, K., and Whitehurst, A.: Data Systems to Enable Open Science: The Joint ESA-NASA Multi-Mission Algorithm and Analysis Platform’s Data Ecosystem, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5972, https://doi.org/10.5194/egusphere-egu2020-5972, 2020.

EGU2020-4616 | Displays | ESSI2.10 | Highlight

Documentation of climate change data supporting cross-domain data reuse

Martina Stockhause, Mark Greenslade, David Hassell, and Charlotte Pascoe

Climate change data and information are among those of the highest interest for cross-domain researchers, policy makers and the general public. Serving climate projection data to these diverse users requires detailed and accessible documentation.

Thus, the CMIP6 (Coupled Model Intercomparison Project Phase 6) data infrastructure consists not only of the ESGF (Earth System Grid Federation) as the data dissemination component but additionally of ES-DOC (Earth System Documentation) and the Citation Service for describing the provenance of the data. These services provide further information on the data creation process (experiments, models, …) and data reuse (data references and licenses) and connect the data to other external resources like research papers.

The contribution will present documentation of the climate change workflow around the furtherInfoURL page serving as an entry point. The challenges are to collect quality-controlled information from the international research community in different infrastructure components and to display these pieces of information seamlessly alongside one another on the furtherInfoURL page.

 

References / Links:

  • CMIP6: https://pcmdi.llnl.gov/CMIP6/
  • ES-DOC: https://es-doc.org/
  • Citation Service: http://cmip6cite.wdc-climate.de

How to cite: Stockhause, M., Greenslade, M., Hassell, D., and Pascoe, C.: Documentation of climate change data supporting cross-domain data reuse, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-4616, https://doi.org/10.5194/egusphere-egu2020-4616, 2020.

EGU2020-8638 | Displays | ESSI2.10

Managing collaborative research data for integrated, interdisciplinary environmental research

Michael Finkel, Albrecht Baur, Tobias K.D. Weber, Karsten Osenbrück, Hermann Rügner, Carsten Leven, Marc Schwientek, Johanna Schlögl, Ulrich Hahn, Thilo Streck, Olaf A. Cirpka, Thomas Walter, and Peter Grathwohl

The consistent management of research data is crucial for the success of long-term and large-scale collaborative research. Research data management is the basis for efficiency, continuity, and quality of the research, as well as for maximum impact and outreach, including the long-term publication of data and their accessibility. Both funding agencies and publishers increasingly require this long-term and open access to research data. Joint environmental studies typically take place in a fragmented research landscape of diverse disciplines; the researchers involved often show a variety of attitudes towards and previous experiences with common data policies, and the extensive variety of data types in interdisciplinary research poses particular challenges for collaborative data management. We present organizational measures, data and metadata management concepts, and technical solutions that form a flexible research data management framework which allows for efficiently sharing the full range of data and metadata among all researchers of the project, and for smooth publishing of selected data and data streams to publicly accessible sites. The concept is built upon data type-specific and hierarchical metadata using a common taxonomy agreed upon by all researchers of the project. The framework’s concept has been developed along the needs and demands of the scientists involved, and aims to minimize their effort in data management, which we illustrate from the researchers’ perspective by describing their typical workflow from the generation and preparation of data and metadata to the long-term preservation of data, including their metadata.

How to cite: Finkel, M., Baur, A., Weber, T. K. D., Osenbrück, K., Rügner, H., Leven, C., Schwientek, M., Schlögl, J., Hahn, U., Streck, T., Cirpka, O. A., Walter, T., and Grathwohl, P.: Managing collaborative research data for integrated, interdisciplinary environmental research, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8638, https://doi.org/10.5194/egusphere-egu2020-8638, 2020.

EGU2020-7115 | Displays | ESSI2.10 | Highlight

Data Management for Early Career Scientists – How to Tame the Elephant

Laia Comas-Bru and Marcus Schmidt

Data Management can be overwhelming, especially for Early Career Scientists. In order to give them a kick-start, the World Data System (WDS) organised a 3-day EGU-sponsored workshop on current achievements and future challenges in November 2019 in Paris. The purpose of the workshop was to provide Early Career Scientists with practical skills in data curation and management through a combination of practical sessions, group discussions and lectures. Participants were introduced to what research data are and to the common vocabulary used during the workshop. Later, a World Café session provided an opportunity to discuss individual challenges in data management and expectations of the workshop in small groups of peers. Lectures and discussions evolved around Open Science, Data Management Plans (DMP), data exchange, copyright and plagiarism, the use of Big Data, ontologies and cloud platforms in Science. Finally, the roles and responsibilities of the WDS as well as its WDS Early Career Researcher Network were discussed. Wrapping up the workshop, attendees were walked through what a data repository is and how repositories obtain their certifications. This PICO presentation, given by two attendees of the workshop, will showcase the main topics of discussion on data management and curation, provide key examples with special emphasis on the importance of creating a DMP at an early stage of a research project, and share practical tools and advice on how to make data management more accessible.

How to cite: Comas-Bru, L. and Schmidt, M.: Data Management for Early Career Scientists – How to Tame the Elephant , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7115, https://doi.org/10.5194/egusphere-egu2020-7115, 2020.

EGU2020-22052 | Displays | ESSI2.10

Towards Seamless Planetary-Scale Services

Peter Baumann and the Peter Baumann

Collaboration requires some minimum of common understanding, in the case of Earth data in particular common principles making data interchangeable, comparable, and combinable. Open standards help here; in the case of Big Earth Data, specifically the OGC/ISO Coverages standard. This unifying framework establishes a common model for regular and irregular grids, point clouds, and meshes, and in particular for spatio-temporal datacubes. Services grounded in such a common understanding are more uniform to access and handle, thereby implementing a principle of "minimal surprise" for users visiting different portals. Further, data combination and fusion benefit from canonical metadata allowing alignment, e.g., between 2D DEMs, 3D satellite image time series, and 4D atmospheric data.

The EarthServer federation is an open data center network offering dozens of petabytes of critical data varieties, such as radar and optical Copernicus data, atmospheric data, elevation data, and thematic cubes like global sea ice. Data centers like the DIASs and CODE-DE, research organizations, companies, and agencies have teamed up in EarthServer. Strictly based on OGC standards, an ecosystem of data has been established that is available to users as a single pool, in particular for efficient distributed data fusion irrespective of data location.

The underlying datacube engine, rasdaman, enables location-transparent federation: clients can submit queries to any node, regardless of where data sit. Query evaluation is optimized automatically, including multi-data fusion of data residing on different nodes. Hence, users perceive one single, common information space. Thanks to the open standards, a broad spectrum of open-source and proprietary clients can utilize this federation, ranging from OpenLayers and NASA WorldWind over QGIS and ArcGIS to Python and R.
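
A minimal sketch of the kind of OGC WCPS query a client can submit to any federation node is shown below. The service URL, coverage name and axis labels are hypothetical placeholders, not actual EarthServer datacubes.

```python
# Sketch of an OGC WCPS request sent to a WCS endpoint; service URL,
# coverage name and axis labels are hypothetical placeholders.
import requests

ENDPOINT = "https://example.org/rasdaman/ows"   # placeholder federation node

wcps_query = """
for $c in (MeanSeaSurfaceTemperature)
return encode($c[Lat(30:60), Long(-30:10), ansi("2020-05")], "image/png")
"""

response = requests.get(
    ENDPOINT,
    params={"service": "WCS", "version": "2.0.1",
            "request": "ProcessCoverages", "query": wcps_query},
    timeout=120,
)
with open("slice.png", "wb") as f:
    f.write(response.content)
```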

In our talk we present technology, services, and governance of this unique intercontinental line-up of data centers. A demo will show distributed datacube fusion live.

How to cite: Baumann, P. and the Peter Baumann: Towards Seamless Planetary-Scale Services , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22052, https://doi.org/10.5194/egusphere-egu2020-22052, 2020.

EGU2020-18428 | Displays | ESSI2.10

Selection and integration of Earth Observation-based data for an operational disease forecasting system

Eleanor A Ainscoe, Barbara Hofmann, Felipe Colon, Iacopo Ferrario, Quillon Harpham, Samuel JW James, Darren Lumbroso, Sajni Malde, Francesca Moschini, and Gina Tsarouchi

The current increase in the volume and quality of Earth Observation (EO) data being collected by satellites offers the potential to contribute to applications across a wide range of scientific domains. It is well established that there are correlations between characteristics that can be derived from EO satellite data, such as land surface temperature or land cover, and the incidence of some diseases. Thanks to the reliable frequent acquisition and rapid distribution of EO data it is now possible for this field to progress from using EO in retrospective analyses of historical disease case counts to using it in operational forecasting systems.

However, bringing together EO-based and non-EO-based datasets, as is required for disease forecasting and many other fields, requires carefully designed data selection, formatting and integration processes. Similarly, it requires careful communication between collaborators to ensure that the priorities of that design process match the requirements of the application.

Here we will present work from the D-MOSS (Dengue forecasting MOdel Satellite-based System) project. D-MOSS is a dengue fever early warning system for South and South East Asia that will allow public health authorities to identify areas at high risk of disease epidemics before an outbreak occurs in order to target resources to reduce spreading of epidemics and improve disease control. The D-MOSS system uses EO, meteorological and seasonal weather forecast data, combined with disease statistics and static layers such as land cover, as the inputs into a dengue fever model and a water availability model. Water availability directly impacts dengue epidemics due to the provision of mosquito breeding sites. The datasets are regularly updated with the latest data and run through the models to produce a new monthly forecast. For this we have designed a system to reliably feed standardised data to the models. The project has involved a close collaboration between remote sensing scientists, geospatial scientists, hydrologists and disease modelling experts. We will discuss our approach to the selection of data sources, data source quality assessment, and design of a processing and ingestion system to produce analysis-ready data for input to the disease and water availability models.

How to cite: Ainscoe, E. A., Hofmann, B., Colon, F., Ferrario, I., Harpham, Q., James, S. J., Lumbroso, D., Malde, S., Moschini, F., and Tsarouchi, G.: Selection and integration of Earth Observation-based data for an operational disease forecasting system, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18428, https://doi.org/10.5194/egusphere-egu2020-18428, 2020.

EGU2020-8937 | Displays | ESSI2.10

Realizing Maximum Transparency of Oceanographic Data Processing and Data Quality Control for Different End-User Communities

Manuela Köllner, Mayumi Wilms, Anne-Christin Schulz, Martin Moritz, Katrin Latarius, Holger Klein, Kai Herklotz, and Kerstin Jochumsen

Reliable data are the basis for successful research and scientific publishing. Open data policies assure the availability of publicly financed field measurements to the public, and thus to all interested scientists. However, the variety of data sources and the availability or lack of detailed metadata mean that each scientist must invest considerable effort in deciding whether the data are usable for their own research topic. Data end-user communities have different requirements regarding metadata detail and data handling during data processing. For data-providing institutes or agencies, knowing these needs is essential if they want to reach a wide range of end-user communities.

The Federal Maritime and Hydrographic Agency (BSH, Bundesamt für Seeschifffahrt und Hydrographie, Hamburg, Germany) collects a large variety of field data in physical and chemical oceanography, regionally focused on the North Sea, Baltic Sea, and North Atlantic. Data types range from vertical profiles, time series and underway measurements to real-time or delayed-mode data from moored or ship-based instruments. Along with other oceanographic data, the BSH provides all physical data via the German Oceanographic Data Center (DOD). It is crucial to aim for maximum reliability of the published data to enhance their usage, especially in the scientific community.

Here, we present our newly established data processing and quality control procedures using agile project management and workflow techniques, and outline their implementation in metadata and accompanying documentation. To enhance the transparency of data quality control, we will apply a detailed quality flag along with the common data quality flag. This detailed quality flag, established by Mayumi Wilms within the research project RAVE Offshore service (research at alpha ventus), enables data end-users to review the results of several individual quality control checks done during processing and thus to easily identify whether the data are usable for their research.
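
The sketch below is a schematic illustration of the idea of carrying per-test results alongside an aggregate flag; the test names and flag values are illustrative assumptions and do not reproduce the detailed flag scheme developed within RAVE Offshore service.

```python
# Schematic illustration only: combine individual QC test results into an
# overall flag plus a detailed per-test flag string. Test names and flag
# values (1 = good, 4 = bad, a common convention) are assumptions.
def flag_value(value, tests):
    """Return the overall flag, a detailed flag string, and per-test results."""
    results = {name: test(value) for name, test in tests.items()}  # True = passed
    overall = 1 if all(results.values()) else 4
    detailed = "".join("1" if passed else "4" for passed in results.values())
    return overall, detailed, results

tests = {
    "range":    lambda v: -2.0 <= v <= 35.0,  # plausible sea temperature (degC)
    "spike":    lambda v: abs(v) < 1e3,       # placeholder spike criterion
    "gradient": lambda v: True,               # placeholder; needs neighbours in practice
}

print(flag_value(12.3, tests))   # e.g. (1, '111', {...})
```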

How to cite: Köllner, M., Wilms, M., Schulz, A.-C., Moritz, M., Latarius, K., Klein, H., Herklotz, K., and Jochumsen, K.: Realizing Maximum Transparency of Oceanographic Data Processing and Data Quality Control for Different End-User Communities, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8937, https://doi.org/10.5194/egusphere-egu2020-8937, 2020.

In Europe, the Marine Strategy Framework Directive (MSFD) seeks to achieve a good environmental status of European marine waters and to protect the resource base on which economic and social activities related to the sea depend. With this legislative tool the European Parliament recognizes the vital importance of managing human activities that have an impact on the marine environment, integrating the concepts of environmental protection and sustainable use.

The MSFD establishes a monitoring programme of different descriptors for continuous evaluation and periodic updating of the objectives. In Spain, the Ministry of Ecological Transition (MITECO) is responsible for and coordinates the implementation of the MSFD, but it is the Spanish Institute of Oceanography (IEO) that carries out the research and study of the different indicators and therefore the tasks of collecting oceanographic data.

The Geographic Information Systems Unit of the IEO is responsible for storing, cleaning and standardizing these data by including them in the IEO Spatial Data Infrastructure (IDEO). IDEO offers useful and advanced tools to discover and manage the spatial and non-spatial oceanographic data that the IEO holds. To facilitate access to IDEO, the IEO Geoportal was developed, which essentially contains a catalogue of metadata and access to different IEO web services and data viewers.

Some examples of priority datasets for the MSFD are: species and habitat distribution, distribution of commercially exploited fish and shellfish species, nutrients, chlorophyll a, dissolved oxygen, spatial extent of seabed loss, contaminants, litter, noise, etc.

The correct preparation and harmonization of these datasets following the Implementing Rules adopted under the INSPIRE Directive is essential to ensure that the Spatial Data Infrastructures (SDI) of the member states are compatible and interoperable in the community context.

The INSPIRE Directive was created to make relevant, harmonized and high-quality geographic information available in a way that supports the formulation, implementation, monitoring and evaluation of European Union policies with an impact or a territorial dimension.

The geographic datasets, together with their corresponding metadata, constitute the cartographic base on which the information collected for updating the continuous evaluation of the different MSFD descriptors is structured.

Thus, although these datasets are intended for use by the public institutions responsible for decision-making on the management of the marine environment, they can also be very useful for a wide range of stakeholders and can be reused for multiple purposes.

With all this in mind, the INSPIRE Directive is extremely relevant and essential for the tasks required by the MSFD, as it is for our projects related to the Maritime Spatial Planning (MSP) Directive.

How to cite: Bruque, G. and Tello, O.: Managing oceanographic data for the Marine Strategy Framework Directive in Spain supported by the Spatial Data Infrastructure of the Spanish Institute of Oceanography (IEO) and the INSPIRE Directive, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21975, https://doi.org/10.5194/egusphere-egu2020-21975, 2020.

EGU2020-9686 | Displays | ESSI2.10

Python-based Multidimensional and Parallel Climate Model Data Analysis in ECAS

Regina Kwee, Tobias Weigel, Hannes Thiemann, Karsten Peters, Sandro Fiore, and Donatello Elia

This contribution highlights the Python xarray technique in the context of climate-specific applications (typical formats are NetCDF, GRIB and HDF).

We will see how to use in-file metadata and why they are so powerful for data analysis, in particular by looking at community-specific problems; for example, one can select data purely by coordinate variable names. ECAS, the ENES Climate Analytics Service available at the Deutsches Klimarechenzentrum (DKRZ), will help by enabling faster access to the high-volume simulation data output from climate modeling experiments. In this respect, we can also make use of "dask", which was developed for parallel computing and works smoothly with xarray. This is extremely useful when we want to fully exploit the advantages of our supercomputer.
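
A minimal sketch of this style of analysis is given below, assuming a CF-compliant NetCDF file with a hypothetical name; it shows label-based selection on coordinate variables and a dask-backed, lazily evaluated computation.

```python
# Minimal sketch (not the ECAS code): label-based selection on a CMIP-style
# NetCDF file with xarray, using dask-backed chunks for parallel evaluation.
import xarray as xr

# Hypothetical file name; any CF-compliant NetCDF file works the same way.
ds = xr.open_dataset("tas_Amon_example.nc", chunks={"time": 120})

# Select purely on coordinate variable names and labels, not array indices.
subset = ds["tas"].sel(lat=slice(30, 60), lon=slice(0, 30),
                       time=slice("1990-01", "1999-12"))

# Lazy computation: the seasonal climatology is only evaluated on .compute(),
# which dask parallelizes across the available cores.
climatology = subset.groupby("time.season").mean("time").compute()
print(climatology)
```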

Our fully integrated service offers an interface via Jupyter notebooks (ecaslab.dkrz.de). We provide an analysis environment without the need for costly transfers, accessing CF-standardized data files that are all accessible via the ESGF portal on our nodes (esgf-data.dkrz.de). We can analyse, for example, CMIP5, CMIP6 and Grand Ensemble data as well as observation data. ECAS was developed in the frame of the European Open Science Cloud (EOSC) hub.

How to cite: Kwee, R., Weigel, T., Thiemann, H., Peters, K., Fiore, S., and Elia, D.: Python-based Multidimensional and Parallel Climate Model Data Analysis in ECAS, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9686, https://doi.org/10.5194/egusphere-egu2020-9686, 2020.

EGU2020-15961 | Displays | ESSI2.10

Automatic quality control and quality control schema in the Observation to Archive

Brenner Silva, Najmeh Kaffashzadeh, Erik Nixdorf, Sebastian Immoor, Philipp Fischer, Norbert Anselm, Peter Gerchow, Angela Schäfer, and Roland Koppe and the Computing and data center

The O2A (Observation to Archive) is a data-flow framework for heterogeneous sources, including multiple institutions and scales of Earth observation. In the O2A, once data transmission is set up, processes are executed to automatically ingest (i.e. collect and harmonize) and quality control data in near real-time. We consider a web-based sensor description application to support transmission and harmonization of observational time-series data. We also consider a product-oriented quality control, where a standardized and scalable approach should integrate the diversity of sensors connected to the framework. A review of the literature and of marine and terrestrial observation networks is underway to allow us, for example, to characterize quality tests in use for generic and specific applications. In addition, we use a standardized quality flag scheme to support both user and technical levels of information. In our outlook, a quality score should accompany the quality flag to indicate the overall plausibility of each individual data value or to measure the flagging uncertainty. In this work, we present concepts under development and give insights into the data ingest and quality control currently operating within the O2A framework.
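
As a minimal sketch of one such generic quality test (not the O2A implementation itself), the function below applies a vectorized range check and assigns flags following a common convention (1 = good, 4 = bad, 9 = missing), which is an assumption rather than the O2A flag scheme.

```python
# Minimal sketch: a vectorized range test assigning one quality flag per value.
# Flag values follow a common convention and are an assumption here.
import numpy as np

def range_test(values, lower, upper):
    values = np.asarray(values, dtype=float)
    flags = np.where((values >= lower) & (values <= upper), 1, 4)
    flags = np.where(np.isnan(values), 9, flags)   # 9 = missing value
    return flags

series = [7.9, 8.1, 35.2, np.nan, 8.0]             # placeholder pH-like values
print(range_test(series, lower=6.0, upper=9.0))    # [1 1 4 9 1]
```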

How to cite: Silva, B., Kaffashzadeh, N., Nixdorf, E., Immoor, S., Fischer, P., Anselm, N., Gerchow, P., Schäfer, A., and Koppe, R. and the Computing and data center: Automatic quality control and quality control schema in the Observation to Archive, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15961, https://doi.org/10.5194/egusphere-egu2020-15961, 2020.

High-level satellite remote sensing products of the Earth's surface play an irreplaceable role in global climate change studies, hydrological cycle modeling and water resources management, and environmental monitoring and assessment. Earth surface high-level remote sensing products released by NASA, ESA and other agencies are routinely derived from a single remote sensor. Due to cloud contamination and limitations of the retrieval algorithms, remote sensing products derived from a single sensor suffer from incompleteness, low accuracy and limited consistency in space and time. Some land surface remote sensing products, such as soil moisture products derived from passive microwave remote sensing data, have too coarse a spatial resolution to be applied at local scale. Fusion and downscaling are effective ways of improving the quality of satellite remote sensing products.

We developed a Bayesian spatio-temporal geostatistics-based framework for the fusion and downscaling of multiple remote sensing products. Compared to existing methods, the presented method has two major advantages. The first is that the method was developed in the Bayesian paradigm, so the uncertainties of the multiple remote sensing products being fused or downscaled can be quantified and explicitly expressed in the fusion and downscaling algorithms. The second advantage is that spatio-temporal autocorrelation is exploited in the fusion approach, so that more complete products can be produced by geostatistical estimation.
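
A generic formulation of Bayesian multi-product fusion, given here only to illustrate the paradigm and not the authors' specific spatio-temporal model, is:

```latex
% Generic Bayesian fusion of K satellite products y_1..y_K for a latent field
% x at location s and time t (illustrative formulation, not the authors' model):
\[
  p\bigl(x(s,t) \mid y_1,\dots,y_K\bigr)
  \;\propto\;
  p\bigl(x(s,t)\bigr)\,\prod_{k=1}^{K} p\bigl(y_k \mid x(s,t)\bigr),
\]
% where the prior p(x(s,t)) encodes spatio-temporal autocorrelation and each
% likelihood p(y_k | x) carries the uncertainty of product k.
```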

This method has been applied to the fusion of multiple satellite AOD products, multiple satellite SST products and multiple satellite LST products, and to the downscaling of 25 km spatial resolution soil moisture products. The results were evaluated in terms of both spatio-temporal completeness and accuracy.

How to cite: Bo, Y.: Bayesian Spatio-temporal Geostatistics-based Method for Multiple Satellite Products Fusion and Downscaling, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20814, https://doi.org/10.5194/egusphere-egu2020-20814, 2020.

EGU2020-8973 | Displays | ESSI2.10

Space debris monitoring based on inter-continental stereoscopic detections

Alessandro Sozza, Massimo Cencini, Leonardo Parisi, Marco Acernese, Fabio Santoni, Fabrizio Piergentili, Stefania Melillo, and Andrea Cavagna

The monitoring of space debris and satellites orbiting the Earth is an essential topic in space surveillance. The impact of debris, even of small size, against active space installations causes serious damage, malfunctions and potential service interruptions. Collision-avoidance manoeuvres are often performed, but they require increasingly complex protocols. The density of space debris is now so high that even astronomical observations are often degraded by it. Although it does not affect space weather, it may interfere with weather satellites.
We have developed an innovative experimental technique based on stereometry at intercontinental scale to obtain simultaneous images from two optical observatories, installed in Rome (at the Urbe Airport and in Collepardo on the Apennines) and in Malindi (Kenya). From the observations on Earth, it is possible to reconstruct the three-dimensional position and velocity of the objects. The distance between the two observatories is crucial for an accurate reconstruction. In particular, we have considered the sites of Urbe and Collepardo, with a baseline of 80 km, to detect objects in low Earth orbit (LEO), while we have considered a baseline of 6000 km, between Urbe and Malindi, to observe objects in geostationary orbit (GEO).
We will present the validation of the three-dimensional reconstruction method via a fully synthetic procedure that propagates the satellite trajectory, using the SGP4 model and TLE data (provided by NASA), and generates synthetic photographs of the satellite passage from the two observatories. We will then compare the synthetic results with the experimental results obtained using real optical systems. The procedure can be automated to identify unknown space objects and even generalized to an arbitrary number of observation sites. The identified debris will be added to the DISCOS catalogue (Database and Information System Characterizing Objects in Space) owned by the European Space Agency (ESA) to improve space surveillance and the ability to intervene in the case of potential risks.
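
A minimal sketch of TLE-based propagation with the SGP4 model is given below, using the open-source `sgp4` Python package; it illustrates the general approach rather than the authors' pipeline, and the TLE strings must be supplied by the caller.

```python
# Minimal sketch (assumption, not the authors' pipeline): propagate a satellite
# state with the SGP4 model from a caller-supplied two-line element set.
from sgp4.api import Satrec, jday

def propagate(tle_line1: str, tle_line2: str,
              year, month, day, hour, minute, second):
    """Return position and velocity (TEME frame, km and km/s) at a UTC epoch."""
    sat = Satrec.twoline2rv(tle_line1, tle_line2)
    jd, fr = jday(year, month, day, hour, minute, second)
    err, position, velocity = sat.sgp4(jd, fr)
    if err != 0:
        raise RuntimeError(f"SGP4 propagation error code {err}")
    return position, velocity

# Usage: propagate(line1, line2, 2020, 5, 4, 12, 0, 0.0) with a real TLE
# taken from a public catalogue.
```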

How to cite: Sozza, A., Cencini, M., Parisi, L., Acernese, M., Santoni, F., Piergentili, F., Melillo, S., and Cavagna, A.: Space debris monitoring based on inter-continental stereoscopic detections, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8973, https://doi.org/10.5194/egusphere-egu2020-8973, 2020.

For any country and for human society, ensuring that mining is controlled and rational is important and meaningful work. Otherwise, illegal mining and unreasonable abandonment will cause waste and loss of resources. Being convenient, cheap and near-instantaneous, remote sensing technology makes it possible to automatically monitor mining activity at large scale.

We propose a mining change detection framework based on multitemporal remote sensing images. In this framework, the status of a mine is divided into mining in progress and stopped mining. Based on multitemporal GF-2 satellite data and mining data from Beijing, China, we have built a mine mining change dataset (BJMMC dataset), which includes two types of change: from mining to mining, and from mining to discontinued mining. We then implement a new type of semantic change detection based on convolutional neural networks (CNNs), which involves intuitively inserting semantics into the detected change regions.

We applied our method to mining monitoring of the Beijing area in a further year, and, combined with GIS data and field work, the results show that our proposed monitoring method has outstanding performance on the BJMMC dataset.
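
For illustration only, the sketch below shows a minimal Siamese-style CNN for bi-temporal change classification; the architecture, band count and label set are assumptions and do not reproduce the network used for the BJMMC dataset.

```python
# Illustrative sketch only: a minimal Siamese CNN for bi-temporal change
# classification, not the network described above.
import torch
import torch.nn as nn

class SiameseChangeNet(nn.Module):
    def __init__(self, num_classes=3):
        # Hypothetical label set: no change, mining -> mining (ongoing),
        # mining -> discontinued mining.
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64 * 2, num_classes)

    def forward(self, img_t1, img_t2):
        # Shared-weight encoder applied to both acquisition dates.
        f1 = self.encoder(img_t1).flatten(1)
        f2 = self.encoder(img_t2).flatten(1)
        return self.classifier(torch.cat([f1, f2], dim=1))

# Example: two 4-image batches of 3-band, 256x256 patches (bands illustrative).
model = SiameseChangeNet()
logits = model(torch.randn(4, 3, 256, 256), torch.randn(4, 3, 256, 256))
print(logits.shape)  # torch.Size([4, 3])
```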

How to cite: Li, C.: Automatic Monitoring of Mines Mining based on Multitemporal Remote Sensing Image Change Detection, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12012, https://doi.org/10.5194/egusphere-egu2020-12012, 2020.

EGU2020-16632 | Displays | ESSI2.10

Estimation of Vegetation Proportion Cover to Improve Land Surface Emissivity

Elnaz Neinavaz, Andrew K. Skidmore, and Roshanak Darvishzadeh

Precise estimation of land surface emissivity (LSE) is essential to predict land surface energy budgets and land surface temperature, as LSE is an indicator of material composition. Several approaches to LSE estimation employing remote sensing data exist; however, the prediction of LSE remains a challenging task. Among the existing approaches for calculating LSE, the NDVI threshold method appears to hold well over vegetated areas. To apply the NDVI threshold method, it is necessary to know the proportion of vegetation cover (Pv). This research aims to investigate the impact of the prediction accuracy of Pv on the estimation of LSE over a forest ecosystem. In this regard, a field campaign coinciding with a Landsat-8 overpass was undertaken for the mixed temperate forest of the Bavarian Forest National Park in southeastern Germany. In situ measurements of Pv were made for 37 plots. Four vegetation indices, namely NDVI, the variable atmospherically resistant index, the wide dynamic range vegetation index, and the three-band gradient difference vegetation index, were applied to predict Pv for further use in LSE computation. Unlike previous studies, which suggested that the variable atmospherically resistant index can estimate Pv with higher prediction accuracy than NDVI over agricultural areas, our results showed that the prediction accuracy of Pv is not different when using NDVI over the forest (R²CV = 0.42, RMSECV = 0.06). Pv was estimated with the lowest accuracy using the wide dynamic range vegetation index (R²CV = 0.014, RMSECV = 0.197) and the three-band gradient difference vegetation index (R²CV = 0.032, RMSECV = 0.018). The results of this study also revealed that variation in the prediction accuracy of Pv has an impact on the results of the LSE calculation.
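
For reference, the widely used form of the NDVI threshold method can be sketched as below; the soil and vegetation NDVI thresholds and emissivity values are illustrative assumptions, not the coefficients used in this study.

```python
# Sketch of the commonly used NDVI threshold formulation; thresholds and
# emissivity values are illustrative assumptions, not this study's values.
import numpy as np

def ndvi_threshold_emissivity(ndvi, ndvi_soil=0.2, ndvi_veg=0.5,
                              eps_veg=0.99, eps_soil=0.97):
    """Return (Pv, emissivity) computed from an NDVI array."""
    pv = np.clip((ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil), 0.0, 1.0) ** 2
    emissivity = eps_veg * pv + eps_soil * (1.0 - pv)
    return pv, emissivity

pv, lse = ndvi_threshold_emissivity(np.array([0.15, 0.35, 0.65]))
print(pv, lse)
```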

How to cite: Neinavaz, E., Skidmore, A. K., and Darvishzadeh, R.: Estimation of Vegetation Proportion Cover to Improve Land Surface Emissivity, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-16632, https://doi.org/10.5194/egusphere-egu2020-16632, 2020.

ESSI2.11 – Earth/Environmental Science Applications on HPC and Cloud Infrastructures

EGU2020-11280 | Displays | ESSI2.11 | Highlight

Heterogeneous cloud-supercomputing framework for daily seismic noise source inversion

Alexey Gokhberg, Laura Ermert, Jonas Igel, and Andreas Fichtner

The study of ambient seismic noise sources and their time- and space-dependent distribution is becoming a crucial component of the real-time monitoring of various geosystems, including active fault zones and volcanoes, as well as geothermal and hydrocarbon reservoirs. In this context, we have previously implemented a combined cloud - HPC infrastructure for production of ambient source maps with high temporal resolution. It covers the entire European continent and the North Atlantic, and is based on seismic data provided by the ORFEUS infrastructure. The solution is based on the Application-as-a-Service concept and includes (1) acquisition of data from distributed ORFEUS data archives, (2) noise source mapping, (3) workflow management, and (4) front-end Web interface to end users.

We present the new results of this ongoing project conducted with support of the Swiss National Supercomputing Centre (CSCS). Our recent goal has been transitioning from mapping the seismic noise sources towards modeling them based on our new method for near real-time finite-frequency ambient seismic noise source inversion. To invert for the power spectral density of the noise source distribution of the secondary microseisms we efficiently forward model global cross-correlation wavefields for any noise distribution. Subsequently, a gradient-based iterative inversion method employing finite-frequency sensitivity kernels is implemented to reduce the misfit between synthetic and observed cross correlations.

During this research we encountered substantial challenges related to the large data volumes and high computational complexity of involved algorithms. We handle these problems by using the CSCS massively parallel heterogeneous supercomputer "Piz Daint". We also apply various specialized numeric techniques which include: (1) using precomputed Green's functions databases generated offline with Axisem and efficiently extracted with Instaseis package and (2) our previously developed high performance package for massive cross correlation of seismograms using GPU accelerators. Furthermore, due to the inherent restrictions of supercomputers, some crucial components of the processing pipeline including the data acquisition and workflow management are deployed on the OpenStack cloud environment. The resulting solution combines the specific advantages of the supercomputer and cloud platforms thus providing a viable distributed platform for the large-scale modeling of seismic noise sources.
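
The authors' GPU correlation package is not shown here; as a purely conceptual illustration of the cross-correlation step for a single station pair, a SciPy-based sketch might look as follows (the traces are synthetic stand-ins).

    import numpy as np
    from scipy.signal import correlate

    fs = 1.0                              # sampling rate in Hz (assumption)
    rng = np.random.default_rng(0)
    trace_a = rng.standard_normal(86400)                                  # one day of noise at station A
    trace_b = np.roll(trace_a, 120) + 0.5 * rng.standard_normal(86400)    # delayed, noisier copy at station B

    # FFT-based cross-correlation and the corresponding lag axis in seconds
    cc = correlate(trace_a, trace_b, mode="full", method="fft")
    lags = np.arange(-len(trace_b) + 1, len(trace_a)) / fs

    print("maximum correlation at lag (s):", lags[np.argmax(cc)])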

How to cite: Gokhberg, A., Ermert, L., Igel, J., and Fichtner, A.: Heterogeneous cloud-supercomputing framework for daily seismic noise source inversion, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11280, https://doi.org/10.5194/egusphere-egu2020-11280, 2020.

EGU2020-13518 | Displays | ESSI2.11

Supporting Multi-cloud Model Execution with VLab

Mattia Santoro, Paolo Mazzetti, Nicholas Spadaro, and Stefano Nativi

The VLab (Virtual Laboratory), developed in the context of the European projects ECOPOTENTIAL and ERA-PLANET, is a cloud-based platform to support the activity of environmental scientists in sharing their models. The main challenges addressed by VLab are: (i) minimization of interoperability requirements in the process of model porting (i.e. to simplify as much as possible the process of publishing and sharing a model for model developers) and (ii) support for multiple programming languages and environments (it must be possible to port models developed in different programming languages and using an arbitrary set of libraries).

In this presentation we describe the VLab architecture and, in particular, how it enables a multi-cloud deployment approach and the benefits this brings.

Deploying VLab on different cloud environments allows model execution where it is most convenient, e.g. depending on the availability of required data (move code to data).

This was implemented in the web application for Protected Areas, developed by the Joint Research Centre of the European Commission (EC JRC) in the context of the EuroGEOSS Sprint to Ministerial activity and demonstrated at the last GEO-XVI Plenary meeting in Canberra. The web application demonstrates the use of Copernicus Sentinel data to calculate Land Cover and Land Cover change in a set of Protected Areas belonging to different ecosystems. Based on the user's selection of satellite products, the available cloud platforms on which the model can run are presented along with their data availability for the selected products. After the platform is selected, the web application uses the VLab APIs to launch the EODESM (Earth Observation Data for Ecosystem Monitoring) model (Lucas and Mitchell, 2017), monitor the execution status and retrieve the output.

To date, VLab has been tested on the following cloud platforms: Amazon Web Services, three of the 4+1 Copernicus DIAS platforms (namely ONDA, Creodias and Sobloo) and the European Open Science Cloud (EOSC).

Another scenario empowered by this multi-platform deployment feature is letting users choose the computational platform and use their own credentials to request the needed computational resources. Finally, it is also possible to exploit this feature to benchmark different cloud platforms with respect to their performance.

 

References

Lucas, R. and A. Mitchell (2017). "Integrated Land Cover and Change Classifications". In: The Roles of Remote Sensing in Nature Conservation, pp. 295–308.

 

How to cite: Santoro, M., Mazzetti, P., Spadaro, N., and Nativi, S.: Supporting Multi-cloud Model Execution with VLab, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13518, https://doi.org/10.5194/egusphere-egu2020-13518, 2020.

EGU2020-12904 | Displays | ESSI2.11

Accelerated hydrologic modeling: ParFlow GPU implementation

Jaro Hokkanen, Jiri Kraus, Andreas Herten, Dirk Pleiter, and Stefan Kollet

  ParFlow is known as a numerical model that simulates the hydrologic cycle from the bedrock to the top of the plant canopy. The original codebase provides an embedded Domain-Specific Language (eDSL) for generic numerical implementations with support for supercomputer environments (distributed memory parallelism), on top of which the hydrologic numerical core has been built.
  In ParFlow, the newly developed optional GPU acceleration is built directly into the eDSL headers such that, ideally, parallelizing all loops in a single source file requires only a new header file. This is possible because the eDSL API is used for looping, allocating memory, and accessing data structures. The decision to embed GPU acceleration directly into the eDSL layer resulted in a highly productive and minimally invasive implementation.
  This eDSL implementation is based on C host language and the support for GPU acceleration is based on CUDA C++. CUDA C++ has been under intense development during the past years, and features such as Unified Memory and host-device lambdas were extensively leveraged in the ParFlow implementation in order to maximize productivity. Efficient intra- and inter-node data transfer between GPUs rests on a CUDA-aware MPI library and application side GPU-based data packing routines.
  The current, moderately optimized ParFlow GPU version runs a representative model up to 20 times faster on a node with 2 Intel Skylake processors and 4 NVIDIA V100 GPUs compared to the original version of ParFlow, where the GPUs are not used. The eDSL approach and ParFlow GPU implementation may serve as a blueprint to tackle the challenges of heterogeneous HPC hardware architectures on the path to exascale.

How to cite: Hokkanen, J., Kraus, J., Herten, A., Pleiter, D., and Kollet, S.: Accelerated hydrologic modeling: ParFlow GPU implementation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12904, https://doi.org/10.5194/egusphere-egu2020-12904, 2020.

EGU2020-17296 | Displays | ESSI2.11 | Highlight

Copernicus Data Infrastructure NRW

Arne de Wall, Albert Remke, Thore Fechner, Jan van Zadelhoff, Andreas Müterthies, Sönke Müller, Adrian Klink, Dirk Hinterlang, Matthias Herkt, and Christoph Rath

The Competence Center Remote Sensing of the State Agency for Nature, Environment and Consumer Protection North Rhine-Westphalia (LANUV NRW, Germany) uses data from the Earth observation infrastructure Copernicus to support nature conservation tasks. Large amounts of data and computationally intensive processing chains (ingestion, pre-processing, analysis, dissemination) as well as satellite and in-situ data from many different sources have to be processed to produce statewide information products. Other state agencies and larger local authorities of NRW have similar requirements. Therefore, the state computing center (IT.NRW) has started to develop a Copernicus Data Infrastructure in NRW in cooperation with LANUV, other state authorities and partners from research and industry to meet their various needs. 

The talk presents the results of a pilot project in which the architecture of a Copernicus infrastructure node for the common Spatial Data Infrastructure of the state was developed. It is largely based on cloud technologies (i.a. Docker, Kubernetes). The implementation of the architectural concept included, as a use case, an effective data analysis procedure for monitoring orchards in North Rhine-Westphalia. In addition to Sentinel 1 and Sentinel 2 data, the new Copernicus Data Infrastructure processes digital terrain models, digital surface models and LIDAR-based data products. Finally, we will discuss the experience gained, lessons learned, and conclusions for further developments of the Copernicus Data Infrastructure in North Rhine-Westphalia.

How to cite: de Wall, A., Remke, A., Fechner, T., van Zadelhoff, J., Müterthies, A., Müller, S., Klink, A., Hinterlang, D., Herkt, M., and Rath, C.: Copernicus Data Infrastructure NRW, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17296, https://doi.org/10.5194/egusphere-egu2020-17296, 2020.

EGU2020-18749 | Displays | ESSI2.11

Towards a scalable framework for earth science simulation models, using asynchronous many-tasks

Kor de Jong, Derek Karssenberg, Deb Panja, and Marc van Kreveld

Computer models are built with a specific purpose (or scope) and runtime platform in mind. The purpose of an earth science simulation model might be to predict the spatio-temporal distribution of fresh water resources at a continental scale, and the runtime platform might be a single CPU core in a single desktop computer running one of the popular operating systems. Model size and complexity tend to increase over time, for example due to the availability of more detailed input data. At some point, such models need to be ported to more powerful runtime platforms, containing more cores or nodes that can be used in parallel. This complicates the model code and requires additional skills of the model developer.

Designing models requires the knowledge of domain experts, while developing models requires software engineering skills. By providing facilities for representing state variables and a set of generic modelling algorithms, a modelling framework makes it possible for domain experts without a background in software engineering to create and maintain models. An example of such a modelling framework is PCRaster [3], and examples of models created with it are the PCRGLOB-WB global hydrological and water resources model [2], and the PLUC high resolution continental scale land use change model [4].

Models built using a modelling framework are portable to all runtime platforms on which the framework is available. Ideally, this includes all popular runtime platforms, ranging from shared memory laptops and desktop computers to clusters of distributed memory nodes. In this work we look at an approach for designing a modelling framework for the development of earth science models using asynchronous many-tasks (AMT). AMT is a programming model that can be used to write software in terms of relatively small tasks, with dependencies between them. During the execution of the tasks, new tasks can be added to the set. An advantage of this approach is that it allows for a clear separation of concerns between the model code and the code for executing work. This allows models to be expressed using traditional procedural code, while the work is performed asynchronously, possibly in parallel and distributed.

We designed and implemented a distributed array data structure and an initial set of modelling algorithms on top of an implementation of the AMT programming model called HPX [1]. HPX provides a single C++ API for defining asynchronous tasks and their dependencies, which execute locally or on remote nodes. We performed experiments to gain insights into the scalability of the individual algorithms and of simple models in which these algorithms are combined.

In the presentation we will explain key aspects of the AMT programming model, as implemented in HPX, how we used the programming model in our framework, and the results of our scalability experiments of models built with the framework.
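
HPX itself is C++; to give a flavour of the asynchronous many-tasks idea in a few lines, the sketch below uses Python's Dask delayed interface as a rough analogue, building a small task graph over array chunks (a per-chunk mean standing in for a modelling algorithm) that the scheduler may execute in parallel. This is a conceptual illustration, not the authors' framework.

    import numpy as np
    from dask import delayed, compute

    @delayed
    def load_chunk(seed, size=1_000_000):
        """Task: produce one chunk of a (distributed) state variable."""
        return np.random.default_rng(seed).random(size)

    @delayed
    def local_mean(chunk):
        """Task: a per-chunk operation, standing in for a modelling algorithm."""
        return chunk.mean()

    @delayed
    def combine(partial_means):
        """Task: reduce partial results; depends on all local_mean tasks."""
        return float(np.mean(partial_means))

    chunks = [load_chunk(seed) for seed in range(8)]    # independent tasks
    partials = [local_mean(c) for c in chunks]          # one task per chunk
    result = combine(partials)                          # task with 8 dependencies
    print(compute(result)[0])                           # the scheduler runs the graph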

References
[1] HPX V1.3.0. http://doi.acm.org/10.1145/2676870.2676883, May 2019.

[2] E. H. Sutanudjaja et al. PCR-GLOBWB 2: a 5 arc-minute global hydrological and water resources model. Geoscientific Model Development Discussions, pages 1–41, dec 2017.

[3] The PCRaster environmental modelling framework. https://www.pcraster.eu

[4] PLUC model. https://github.com/JudithVerstegen/PLUC_Mozambique

How to cite: de Jong, K., Karssenberg, D., Panja, D., and van Kreveld, M.: Towards a scalable framework for earth science simulation models, using asynchronous many-tasks, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18749, https://doi.org/10.5194/egusphere-egu2020-18749, 2020.

EGU2020-21390 | Displays | ESSI2.11 | Highlight

In Situ and In Transit Computing for Large Scale Geoscientific Simulation

Sebastian Friedemann, Bruno Raffin, Basile Hector, and Jean-Martial Cohard

In situ and in transit computing is an effective way to place postprocessing and preprocessing tasks for large scale simulations on the high performance computing platform. The resulting proximity between the execution of preprocessing, simulation and postprocessing makes it possible to reduce I/O by bypassing slow and energy-inefficient persistent storage. This allows workflows consisting of heterogeneous components, such as simulation, data analysis and visualization, to scale to modern massively parallel high performance platforms. Reordering the workflow components opens up a manifold of new advanced data processing possibilities for research. In situ and in transit computing are therefore vital for advances in the domain of geoscientific simulation, which relies on the increasing amount of sensor and simulation data available.

In this talk, different in situ and in transit workflows, especially those that are useful in the field of geoscientific simulation, are discussed. Furthermore our experiences augmenting ParFlow-CLM, a physically based, state-of-the-art, fully coupled water transfer model for the critical zone, with FlowVR, an in situ framework with a strict component paradigm, are presented.
This allows shadowed in situ file writing, in situ online steering and in situ visualization.

In situ frameworks can further be coupled to data assimilation tools.
In the ongoing EoCoE-II project we propose to embed data assimilation codes into an in transit computing environment. This is expected to enable ensemble-based data assimilation for continental-scale hydrological simulations with multiple thousands of ensemble members.

How to cite: Friedemann, S., Raffin, B., Hector, B., and Cohard, J.-M.: In Situ and In Transit Computing for Large Scale Geoscientific Simulation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21390, https://doi.org/10.5194/egusphere-egu2020-21390, 2020.

EGU2020-19280 | Displays | ESSI2.11

Filesystem and object storage for climate data analytics in private clouds with OpenStack

Ezequiel Cimadevilla Alvarez, Aida Palacio Hoz, Antonio S. Cofiño, and Alvaro Lopez Garcia

Data analysis in climate science has been traditionally performed in two different environments, local workstations and HPC infrastructures. Local workstations provide a non scalable environment in which data analysis is restricted to small datasets that are previously downloaded. On the other hand, HPC infrastructures provide high computation capabilities by making use of parallel file systems and libraries that allow to scale data analysis. Due to the great increase in the size of the datasets and the need to provide computation environments close to data storage, data providers are evaluating the use of commercial clouds as an alternative for data storage. Examples of commercial clouds are Google Cloud Storage and Amazon S3, although cloud storage is not restricted to commercial clouds since several institutions provide private or hybrid clouds. These providers use systems known as “object storage” in order to provide cloud storage, since they offer great scalability and storage capacity compared to POSIX file systems found in local or HPC infrastructures.

Cloud storage systems based on object storage are incompatible with existing libraries and data formats used by the climate community to store and analyse data. Legacy libraries and data formats include netCDF and HDF5, which assume that the underlying storage is a file system rather than an object store. However, new libraries such as Zarr try to solve the problem of storing multidimensional arrays both in file systems and in object stores.
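
As a small illustration of the point about Zarr, the sketch below writes a chunked array to a local directory store; under the assumption that an object store is reachable via an fsspec/s3fs mapping, the same code can target cloud storage by swapping the store (the bucket name in the comment is a placeholder).

    import numpy as np
    import zarr

    # Local directory store; an object store could be used instead, e.g. (assumption):
    #   import s3fs
    #   fs = s3fs.S3FileSystem(anon=False)
    #   store = s3fs.S3Map("my-bucket/temperature.zarr", s3=fs)
    store = zarr.DirectoryStore("temperature.zarr")

    # Daily 1-degree global field, chunked so that reads and writes map to individual objects
    z = zarr.open(store, mode="w", shape=(365, 180, 360),
                  chunks=(30, 90, 90), dtype="f4")
    z[0, :, :] = np.random.rand(180, 360).astype("f4")  # write one time slice

    print(z.info)  # chunking and storage summary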

In this work we present a private cloud infrastructure built upon OpenStack which provides both file system and object storage. The infrastructure also provides an environment, based on JupyterHub, to perform remote data analysis close to the data. This has several advantages from the users' perspective. First, users are not required to deploy the software and tools needed for the analysis. Second, it provides a remote environment where users can perform scalable data analytics. And third, there is no need to download huge amounts of data to the user's local computer before running the analysis.

How to cite: Cimadevilla Alvarez, E., Palacio Hoz, A., Cofiño, A. S., and Lopez Garcia, A.: Filesystem and object storage for climate data analytics in private clouds with OpenStack, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19280, https://doi.org/10.5194/egusphere-egu2020-19280, 2020.

EGU2020-1631 | Displays | ESSI2.11

Chemical speciation in GPU for the parallel resolution of reactive transport problems.

Pablo Gamazo, Lucas Bessone, Julián Ramos, Elena Alvareda, and Pablo Ezzatti

Reactive transport modelling (RTM) involves the resolution of the partial differential equation that governs the transport of multiple chemical components, and of several algebraic equations that account for chemical interactions. Since RTM can be very computationally demanding, especially when considering long-term and/or large-scale scenarios, several efforts have been made over the last decade to parallelize it. Most works have focused on implementing domain decomposition techniques for distributed memory architectures, and some effort has also been made for shared memory architectures. Despite recent advances in GPU computing, only a few works explore this architecture for RTM, and they mainly focus on the implementation of parallel sparse matrix solvers for the component transport. Solving the component transport consumes an important amount of time during simulation, but another time-consuming part of RTM is the chemical speciation, a process that has to be performed multiple times during the resolution of each time step over all nodes (or discrete elements of the mesh). Since speciation involves local calculations, it is a priori a very attractive process to parallelize. However, to the authors' knowledge, no work in the literature explores chemical speciation parallelization on GPU. One of the reasons behind this might be the fact that the unknowns and the number of chemical equations acting at each node may differ and can change dynamically in time. This can be a drawback for the single-instruction multiple-data paradigm, since it might lead to the resolution of several systems of potentially different sizes across the domain. In this work we use a general formulation that allows chemical speciation to be solved efficiently on GPU. This formulation allows different primary species to be considered for each node of the mesh and allows the precipitation of new mineral species and their complete dissolution while keeping the number of components constant.
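
The speciation equations themselves are not reproduced in the abstract; purely to illustrate why per-node chemical calculations map well onto data-parallel hardware, the sketch below applies a vectorized Newton iteration to a toy scalar equation at many mesh nodes simultaneously with NumPy (a GPU array library such as CuPy could replace NumPy under the same pattern; the equation is an arbitrary stand-in, not the authors' formulation).

    import numpy as np

    # Toy local equation per node: f(x) = x**3 + a*x - b = 0, with node-dependent a, b
    rng = np.random.default_rng(1)
    n_nodes = 100_000
    a = rng.uniform(0.5, 2.0, n_nodes)
    b = rng.uniform(0.1, 5.0, n_nodes)

    x = np.ones(n_nodes)                  # initial guess at every node
    for _ in range(25):                   # same instruction stream for all nodes
        f = x**3 + a * x - b
        df = 3.0 * x**2 + a
        x -= f / df                       # Newton update, vectorized over nodes

    print("max residual:", np.max(np.abs(x**3 + a * x - b)))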

How to cite: Gamazo, P., Bessone, L., Ramos, J., Alvareda, E., and Ezzatti, P.: Chemical speciation in GPU for the parallel resolution of reactive transport problems., EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1631, https://doi.org/10.5194/egusphere-egu2020-1631, 2020.

EGU2020-1632 | Displays | ESSI2.11

Performance Evaluation of different time schemes for a Nonlinear diffusion equation on multi-core and many core platforms

Lucas Bessone, Pablo Gamazo, Julián Ramos, and Mario Storti

GPU architectures are characterized by abundant computing capacity relative to memory bandwidth. This makes them well suited to temporally explicit problems with compact spatial discretizations. Most works using GPUs focus on the parallelization of solvers for the linear equation systems generated by the numerical methods. However, to obtain good performance in numerical applications using GPUs it is crucial to work, preferably, with codes based entirely on the GPU. In this work we solve a 3D nonlinear diffusion equation using the finite volume method on Cartesian meshes. Two different time schemes are compared, explicit and implicit, considering for the latter the Newton method and a Conjugate Gradient solver for the system of equations. Each scheme is evaluated on CPU and GPU using different metrics to measure performance, accuracy, calculation speed and mesh size. To evaluate the convergence properties of the different schemes with respect to the spatial and temporal discretization, an arbitrary analytical solution is proposed, which satisfies the differential equation for a suitably chosen source term.
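
To make the comparison concrete, a minimal explicit finite-volume step for a nonlinear diffusion equation is sketched below in Python (1D for brevity; the study itself uses 3D Cartesian meshes and also an implicit Newton/Conjugate Gradient scheme, which is not shown, and the quadratic diffusivity is an illustrative choice).

    import numpy as np

    def diffusivity(u):
        """Nonlinear diffusivity D(u); the quadratic form is an illustrative choice."""
        return 1.0 + u**2

    def explicit_fv_step(u, dx, dt):
        """One explicit finite-volume update of du/dt = d/dx( D(u) du/dx )."""
        d_face = 0.5 * (diffusivity(u[:-1]) + diffusivity(u[1:]))   # face-averaged D
        flux = -d_face * (u[1:] - u[:-1]) / dx                      # physical flux at interior faces
        u_new = u.copy()
        u_new[1:-1] -= dt / dx * (flux[1:] - flux[:-1])             # boundary cells held fixed (Dirichlet) for simplicity
        return u_new

    nx = 200
    dx = 1.0 / nx
    u = np.exp(-((np.linspace(0, 1, nx) - 0.5) / 0.05) ** 2)        # initial bump
    dt = 0.2 * dx**2 / (2.0 * diffusivity(u).max())                 # conservative explicit stability limit
    for _ in range(500):
        u = explicit_fv_step(u, dx, dt)
    print("max u after diffusion:", u.max())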

How to cite: Bessone, L., Gamazo, P., Ramos, J., and Storti, M.: Performance Evaluation of different time schemes for a Nonlinear diffusion equation on multi-core and many core platforms, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1632, https://doi.org/10.5194/egusphere-egu2020-1632, 2020.

EGU2020-17708 | Displays | ESSI2.11

A Web-Based Virtual Research Environment for Marine Data

Merret Buurman, Sebastian Mieruch, Alexander Barth, Charles Troupin, Peter Thijsse, Themis Zamani, and Naranyan Krishnan

Like most areas of research, the marine sciences are undergoing an increased use of observational data from a multitude of sensors. As it is cumbersome to download, combine and process the increasing volume of data on the individual researcher's desktop computer, many areas of research turn to web- and cloud-based platforms. In the scope of the SeaDataCloud project, such a platform is being developed together with the EUDAT consortium.

The SeaDataCloud Virtual Research Environment (VRE) is designed to give researchers access to popular processing and visualization tools and to commonly used marine datasets of the SeaDataNet community. Some key aspects such as user authentication, hosting input and output data, are based on EUDAT services, with the perspective of integration into EOSC at a later stage.

The technical infrastructure is provided by five large EUDAT computing centres across Europe, where operational environments are heterogeneous and spatially far apart. The processing tools (pre-existing as desktop versions) are developed by various institutions of the SeaDataNet community. While some of the services interact with users via command line and can comfortably be exposed as JupyterNotebooks, many of them are very visual (e.g. user interaction with a map) and rely heavily on graphical user interfaces.

In this presentation, we will address some of the issues we encountered while building an integrated service out of the individual applications, and present our approaches to deal with them.

Heterogeneity in operational environments and dependencies is easily overcome by using Docker containers. Leveraging processing resources all across Europe is so far the most challenging part. Containers are easily deployed anywhere in Europe, but the heavy dependence on (potentially shared) input data, and the possibility that the same data may be used by various services at the same time or in quick succession, mean that data synchronization across Europe has to take place at some point of the process. Designing a synchronization mechanism that does this without conflicts or inconsistencies, or coming up with a distribution scheme that minimizes the synchronization problem, is not trivial.

Further issues came up during the adaptation of existing applications for server-based operation. This includes topics such as containerization, user authentication and authorization and other security measures, but also the locking of files, permissions on shared file systems and exploitation of increased hardware resources.

How to cite: Buurman, M., Mieruch, S., Barth, A., Troupin, C., Thijsse, P., Zamani, T., and Krishnan, N.: A Web-Based Virtual Research Environment for Marine Data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17708, https://doi.org/10.5194/egusphere-egu2020-17708, 2020.

ICT technologies play an increasing role in almost every aspect of the environmental sciences. The adoption of new technologies, however, consumes an increasing amount of scientists' time, which they could better spend on their actual research. Not adopting new technologies will lead to biased research, since many researchers are not familiar with the possibilities and methods available through modern technology. This dilemma can only be resolved by close collaboration and scientific partnership between researchers and IT experts. In contrast to traditional IT service provision, IT experts have to understand the scientific problems and methods of the scientists in order to help them select and adapt suitable services. Furthermore, a sound partnership helps towards good scientific practice, since the IT experts can ensure the reproducibility of the research by professionalizing workflows and applying FAIR data principles. We elaborate on this dilemma with examples from an IT center's perspective, and sketch a path towards unbiased research and the development of new IT services that are tailored for the scientific community.

How to cite: Frank, A. and Weismüller, J.: Scientific Partnership - a new level of collaboration between environmental scientists and IT specialists, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5005, https://doi.org/10.5194/egusphere-egu2020-5005, 2020.

EGU2020-8868 | Displays | ESSI2.11

ESM-Tools: A common infrastructure for modular coupled earth system modelling

Dirk Barbi, Nadine Wieters, Luisa Cristini, Paul Gierz, Sara Khosravi, Fatemeh Chegini, Joakim Kjellson, and Sebastian Wahl

Earth system and climate modelling involves the simulation of processes on a large range of scales, and within very different components of the earth system. In practice, component models from different institutes are mostly developed independently, and then combined using a dedicated coupling software.

This procedure not only leads to a wildly growing number of available versions of model components as well as coupled setups, but also to a specific way of obtaining and operating many of these. This can be a challenging problem (and potentially a huge waste of time), especially for inexperienced researchers or for scientists aiming to change to a different model system, e.g. for intercomparisons.

In order to define a standard way of downloading, configuring, compiling and running modular ESMs on a variety of HPC systems, AWI and partner institutions develop and maintain the open-source ESM-Tools software (https://www.esm-tools.net). Our aim is to provide standard solutions to typical problems occurring within the workflow of model simulations, such as calendar operations, data postprocessing and monitoring, sanity checks, sorting and archiving of output, and script-based coupling (e.g. ice sheet models, isostatic adjustment models). The user only provides a short (30-40 line) runscript containing the strictly necessary experiment-specific definitions, while the ESM-Tools execute the phases of a simulation in the correct order. A user-friendly API ensures that more experienced users have full control over each of these phases and can easily add functionality. A GUI has been developed to provide a more intuitive approach to the modular system, and also to add a graphical overview of the available models and combinations.

Since revision 2 (released on March 19th 2019), the ESM-Tools have been entirely re-written, separating the implementation of actions (written in Python 3) from all the information we have on models, coupled setups, software tools, HPC systems etc., which is kept in clearly structured YAML configuration files. This has been done to reduce maintenance problems and to ensure that even inexperienced scientists can easily edit configurations, or add new models or software, without any programming experience. Since revision 3 the ESM-Tools support four ocean models (FESOM1, FESOM2, NEMO, MPIOM), three atmosphere models (ECHAM6, OpenIFS, ICON), two BGC models (HAMOCC, REcoM), an ice sheet model (PISM) and an isostatic adjustment model (VILMA), as well as standard settings for five HPC systems. For the future we plan to add interfaces to regional models and soil/hydrology models.

The Tools currently have more than 70 registered users from 5 institutions, and more than 40 people have contributed to either model configurations or functionality.

How to cite: Barbi, D., Wieters, N., Cristini, L., Gierz, P., Khosravi, S., Chegini, F., Kjellson, J., and Wahl, S.: ESM-Tools: A common infrastructure for modular coupled earth system modelling, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8868, https://doi.org/10.5194/egusphere-egu2020-8868, 2020.

EGU2020-11080 | Displays | ESSI2.11

Experiments on Machine Learning Techniques for Soil Classification Using Sentinel-2 Products

Victor Bacu, Teodor Stefanut, and Dorian Gorgan

Agricultural management relies on good, comprehensive and reliable information on the environment and, in particular, the characteristics of the soil. The soil composition, humidity and temperature can fluctuate over time, leading to migration of plant crops, changes in the schedule of agricultural work, and the treatment of soil by chemicals. Various techniques are used to monitor soil conditions and agricultural activities but most of them are based on field measurements. Satellite data opens up a wide range of solutions based on higher resolution images (i.e. spatial, spectral and temporal resolution). Due to this high resolution, satellite data requires powerful computing resources and complex algorithms. The need for up-to-date and high-resolution soil maps and direct access to this information in a versatile and convenient manner is essential for pedology and agriculture experts, farmers and soil monitoring organizations.

Unfortunately, satellite image processing and interpretation are very particular to each area, time and season, and must be calibrated with real field measurements that are collected periodically. In order to obtain a fairly good accuracy of soil classification at very high resolution, without interpolating from an insufficient number of measurements, prediction based on artificial intelligence techniques can be used. The use of machine learning techniques is still largely unexplored, and one of the major challenges is the scalability of the soil classification models in three main directions: (a) adding new spatial features (i.e. satellite wavelength bands, geospatial parameters, spatial features); (b) scaling from local to global geographical areas; (c) temporal complementarity (i.e. building up the soil description from samples of satellite data acquired over time: in spring, in summer, in another year, etc.).

The presentation analyses some experiments and highlights the main issues in developing a soil classification model based on Sentinel-2 satellite data, machine learning techniques and high-performance computing infrastructures. The experiments mainly concern the feature and temporal scalability of the soil classification models. The research is carried out using the HORUS platform [1] and the HorusApp application [2], [3], which allows experts to scale the computation over cloud infrastructure.
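
The abstract does not fix a specific classifier; as a hedged, minimal illustration of the pixel-wise approach, the sketch below trains a random forest on Sentinel-2 band values against field-sampled soil classes (all arrays are synthetic placeholders, not the authors' data or model).

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Synthetic stand-in: 2000 sampled pixels, 10 Sentinel-2 bands, 4 soil classes
    rng = np.random.default_rng(42)
    X = rng.random((2000, 10))            # band reflectances per sampled pixel
    y = rng.integers(0, 4, 2000)          # soil class labels from field calibration

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_train, y_train)
    print("overall accuracy:", accuracy_score(y_test, clf.predict(X_test)))

    # The fitted model can then be applied to every pixel of a (bands x rows x cols) scene:
    scene = rng.random((10, 100, 100))
    soil_map = clf.predict(scene.reshape(10, -1).T).reshape(100, 100)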

 

References:

[1] Gorgan D., Rusu T., Bacu V., Stefanut T., Nandra N., “Soil Classification Techniques in Transylvania Area Based on Satellite Data”. World Soils 2019 Conference, 2 - 3 July 2019, ESA-ESRIN, Frascati, Italy (2019).

[2] Bacu V., Stefanut T., Gorgan D., “Building soil classification maps using HorusApp and Sentinel-2 Products”, Proceedings of the Intelligent Computer Communication and Processing Conference – ICCP, in IEEE press (2019).

[3] Bacu V., Stefanut T., Nandra N., Rusu T., Gorgan D., “Soil classification based on Sentinel-2 Products using HorusApp application”, Geophysical Research Abstracts, Vol. 21, EGU2019-15746, 2019, EGU General Assembly (2019).

How to cite: Bacu, V., Stefanut, T., and Gorgan, D.: Experiments on Machine Learning Techniques for Soil Classification Using Sentinel-2 Products , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11080, https://doi.org/10.5194/egusphere-egu2020-11080, 2020.

EGU2020-11372 | Displays | ESSI2.11 | Highlight

High performance computing of waves, currents and contaminants in rivers and coastal areas of seas on multi-processors systems and GPUs

Maxim Sorokin, Mark Zheleznyak, Sergii Kivva, Pavlo Kolomiets, and Oleksandr Pylypenko

Shallow water flows in coastal areas of seas, rivers and reservoirs are usually simulated by 2-D depth-averaged models. However, the need for fine resolution of the computational grids and the large scale of the modelled areas require, in practical applications, the use of HPC algorithms and hardware. We present a comparison of the computational efficiency of the developed parallel 2-D modeling system COASTOX on CPU-based multi-processor systems and on GPUs.

The hydrodynamic module of COASTOX is based on the nonlinear shallow water equations (SWE), which describe currents and long waves, including tsunami, river flood waves and wake waves generated by large vessels in shallow coastal areas. A special pressure term in the momentum equations, which depends on the shape of the vessel's draft, is used for wave generation by moving vessels. The currents generated by wind waves in marine nearshore areas are described by including wave-radiation stress terms in the SWE. Sediment and pollutant transport are described by 2-D advection-diffusion equations with sink-source terms describing sedimentation-erosion and water-bottom contaminant exchange.

The model equations are solved by the finite volume method on rectangular grids or on unstructured grids with triangular cells. The SWE solution scheme is Godunov-type, explicit, conservative and has the TVD property. Second order in time and space is achieved by a Runge-Kutta predictor-corrector method and by using different methods for calculating fluxes at the predictor and corrector steps. The transport equation schemes are simple upwind and first order in time and space.

The model is parallelized for computations on multi-core CPU systems using a domain decomposition approach with halo boundary structures and message-passing updates. To decompose an unstructured model grid, the METIS graph partitioning library is used. Halo values are updated with MPI using non-blocking send and receive functions.
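
COASTOX itself is not written in Python; as a conceptual sketch only, the following mpi4py fragment shows the non-blocking halo update pattern described above for a 1D domain decomposition (run with, e.g., "mpirun -n 4 python halo.py"; the array contents are placeholders).

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    n_local = 1000                           # interior cells owned by this rank
    u = np.full(n_local + 2, float(rank))    # one halo cell on each side

    left = rank - 1 if rank > 0 else MPI.PROC_NULL
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    # Post non-blocking receives into the halo cells and sends of the boundary cells
    reqs = [
        comm.Irecv(u[0:1], source=left, tag=0),
        comm.Irecv(u[-1:], source=right, tag=1),
        comm.Isend(u[1:2], dest=left, tag=1),
        comm.Isend(u[-2:-1], dest=right, tag=0),
    ]
    MPI.Request.Waitall(reqs)                # halos now hold the neighbours' boundary values

    # ... the interior update (e.g. flux computation) can now use u[0] and u[-1] ...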

For computations on GPU the model is parallelized using the OpenACC directive-based programming interface. The numerical schemes of the model are implemented as loops over cells, nodes and faces with independent iterations, owing to the explicitness and locality of the schemes. OpenACC directives inserted in the model code therefore tell the compiler which loops may be computed in parallel.

The efficiency of the developed parallel algorithms is demonstrated for CPU and GPU computing systems by the following applications:

  1. Simulation of river flooding of July 2008 extreme flood on Prut river (Ukraine).
  2. Modeling of ship waves caused by tanker passage on the San Jacinto river near Barbours Cut Container Terminal (USA) and loads on moored container ship.
  3. Simulation of the consequences of the breaks of the dikes constructed on the heavy contaminated floodplain of the Pripyat River upstream Chernobyl Nuclear Power Plant.

For parallel performance testing we use a Dell 7920 workstation with two Intel Xeon Gold 6230 20-core processors and an NVIDIA Quadro RTX 5000 GPU. Multi-core computation is up to 17.3 times faster than a single core, with a parallel efficiency of 43%. For large computational grids (about a million nodes or more), the GPU is 47.5-79.6 times faster than a single core and 3-4.6 times faster than the full workstation.

How to cite: Sorokin, M., Zheleznyak, M., Kivva, S., Kolomiets, P., and Pylypenko, O.: High performance computing of waves, currents and contaminants in rivers and coastal areas of seas on multi-processors systems and GPUs, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11372, https://doi.org/10.5194/egusphere-egu2020-11372, 2020.

EGU2020-13637 | Displays | ESSI2.11

Performance analysis and optimization of a TByte-scale atmospheric observation database

Clara Betancourt, Sabine Schröder, Björn Hagemeier, and Martin Schultz

The Tropospheric Ozone Assessment Report (TOAR) created one of the world’s largest databases for near-surface air quality measurements. More than 150 users from 35 countries have accessed TOAR data via a graphical web interface (https://join.fz-juelich.de) or a REST API (https://join.fz-juelich.de/services/rest/surfacedata/) and downloaded station information and aggregated statistics of ozone and associated variables. All statistics are calculated online from the hourly data that are stored in the database to allow for maximum user flexibility (it is possible, for example, to specify the minimum data capture criterion that shall be used in the aggregation). Thus, it is of paramount importance to measure and, if necessary, optimize the performance of the database and of the web services, which are connected to it. In this work, two aspects of the TOAR database service infrastructure are investigated: Performance enhancements by database tuning and the implementation of flux-based ozone metrics, which – unlike the already existing concentration based metrics – require meteorological data and embedded modeling.

The TOAR database is a PostgreSQL V10 relational database hosted on a virtual machine and connected to the JOIN web server. In the current set-up the web services trigger SQL queries, and the resulting raw data are transferred on demand to the JOIN server and processed locally to derive the requested statistical quantities. We tested the following measures to increase the database performance: optimal definition of indices, server-side programming in PL/pgSQL and PL/Python, on-line aggregation to avoid the transfer of large data volumes, and query enhancement using PostgreSQL's explain/analyze tool. Through a combination of the above-mentioned techniques, the performance of JOIN can be improved by 20-70%.
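
As an illustration of the on-line (server-side) aggregation idea, the snippet below pushes a monthly-mean query into PostgreSQL instead of transferring the raw hourly rows; the connection parameters and the table and column names are hypothetical placeholders, not the actual TOAR schema.

    import psycopg2

    conn = psycopg2.connect(host="localhost", dbname="toar_demo",
                            user="reader", password="secret")   # placeholder credentials
    cur = conn.cursor()

    # Aggregate on the database server: only twelve rows per station and year travel over the wire
    cur.execute("""
        SELECT date_trunc('month', observed_at) AS month,
               avg(value) AS monthly_mean,
               count(*)   AS n_hours
        FROM hourly_ozone                -- hypothetical table
        WHERE station_id = %s AND observed_at >= %s AND observed_at < %s
        GROUP BY month
        ORDER BY month
    """, (1234, "2015-01-01", "2016-01-01"))

    for month, monthly_mean, n_hours in cur.fetchall():
        print(month, monthly_mean, n_hours)

    cur.close()
    conn.close()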

Flux-based ozone metrics are necessary for an accurate quantification of ozone damage to vegetation. In contrast to the already available concentration-based metrics of ozone, they require the input of meteorological and soil data, as well as a consistent parametrization of vegetation growing seasons and the inclusion of a stomatal flux model. Embedding this model in the TOAR database will make a global assessment of stomatal ozone fluxes possible for the first time. This requires new query patterns, which need to merge several variables onto a consistent time axis, as well as more elaborate calculations, which are presently coded in FORTRAN.

The presentation will show the results of the performance tuning and discuss the pros and cons of the various ways in which the ozone flux calculations can be implemented.

How to cite: Betancourt, C., Schröder, S., Hagemeier, B., and Schultz, M.: Performance analysis and optimization of a TByte-scale atmospheric observation database, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13637, https://doi.org/10.5194/egusphere-egu2020-13637, 2020.

EGU2020-17031 | Displays | ESSI2.11

A Python-oriented environment for climate experiments at scale in the frame of the European Open Science Cloud

Donatello Elia, Fabrizio Antonio, Cosimo Palazzo, Paola Nassisi, Sofiane Bendoukha, Regina Kwee-Hinzmann, Sandro Fiore, Tobias Weigel, Hannes Thiemann, and Giovanni Aloisio

Scientific data analysis experiments and applications require software capable of handling domain-specific and data-intensive workflows. The increasing volume of scientific data is further exacerbating these data management and analytics challenges, pushing the community towards the definition of novel programming environments for dealing efficiently with complex experiments, while abstracting from the underlying computing infrastructure. 

ECASLab provides a user-friendly data analytics environment to support scientists in their daily research activities, in particular in the climate change domain, by integrating analysis tools with scientific datasets (e.g., from the ESGF data archive) and computing resources (i.e., Cloud and HPC-based). It combines the features of the ENES Climate Analytics Service (ECAS) and the JupyterHub service, with a wide set of scientific libraries from the Python landscape for data manipulation, analysis and visualization. ECASLab is being set up in the frame of the European Open Science Cloud (EOSC) platform - in the EU H2020 EOSC-Hub project - by CMCC (https://ecaslab.cmcc.it/) and DKRZ (https://ecaslab.dkrz.de/), which host two major instances of the environment. 

ECAS, which lies at the heart of ECASLab, enables scientists to perform data analysis experiments on large volumes of multi-dimensional data by providing a workflow-oriented, PID-supported, server-side and distributed computing approach. ECAS consists of multiple components, centered around the Ophidia High Performance Data Analytics framework, which has been integrated with data access and sharing services (e.g., EUDAT B2DROP/B2SHARE, Onedata), along with the EGI federated cloud infrastructure. The integration with JupyterHub provides a convenient interface for scientists to access the ECAS features for the development and execution of experiments, as well as for sharing results (and the experiment/workflow definition itself). ECAS parallel data analytics capabilities can be easily exploited in Jupyter Notebooks (by means of PyOphidia, the Ophidia Python bindings) together with well-known Python modules for processing and for plotting the results on charts and maps (e.g., Dask, Xarray, NumPy, Matplotlib, etc.). ECAS is also one of the compute services made available to climate scientists by the EU H2020 IS-ENES3 project. 

Hence, this integrated environment represents a complete software stack for the design and run of interactive experiments as well as complex and data-intensive workflows. One class of such large-scale workflows, efficiently implemented through the environment resources, refers to multi-model data analysis in the context of both CMIP5 and CMIP6 (i.e., precipitation trend analysis orchestrated in parallel over multiple CMIP-based datasets).
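
Inside the JupyterHub environment, the Python stack mentioned above can be combined in a few lines; the sketch below is a generic, hedged example (the file paths, dimension names and the variable name "pr" are placeholders) of opening a multi-file CMIP-style dataset with xarray/Dask and plotting an area-averaged yearly series.

    import xarray as xr
    import matplotlib.pyplot as plt

    # Placeholder paths to one model's monthly precipitation files (CMIP-style netCDF)
    ds = xr.open_mfdataset("pr_Amon_MODEL_historical_*.nc", combine="by_coords",
                           chunks={"time": 120})          # lazy, Dask-backed arrays

    pr_yearly = (ds["pr"]                                  # variable name is an assumption
                 .mean(dim=["lat", "lon"])                 # simple (unweighted) area average
                 .groupby("time.year").mean())             # annual means

    pr_yearly.plot(marker="o")
    plt.ylabel("area-mean precipitation")
    plt.title("Yearly mean precipitation (illustrative)")
    plt.show()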

How to cite: Elia, D., Antonio, F., Palazzo, C., Nassisi, P., Bendoukha, S., Kwee-Hinzmann, R., Fiore, S., Weigel, T., Thiemann, H., and Aloisio, G.: A Python-oriented environment for climate experiments at scale in the frame of the European Open Science Cloud, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17031, https://doi.org/10.5194/egusphere-egu2020-17031, 2020.

EGU2020-17925 | Displays | ESSI2.11

OCRE: the game changer of Cloud and EO commercial services usage by the European research community

José Manuel Delgado Blasco, Antonio Romeo, David Heyns, Natassa Antoniou, and Rob Carrillo

The OCRE project, an H2020 project funded by the European Commission, aims to increase the usage of cloud and EO services by the European research community by making EUR 9.5M of EC funds available, removing barriers to service discovery and providing services that are free at the point of use.

After one year of activity, the OCRE project has completed the gathering of requirements from the European research community and, in Q1 2020, launched the tenders for the cloud and EO services.

In the first part of 2020 these tenders will be closed and contracts will be awarded to companies to offer the services for which requirements were collected by the project during 2019. The selection of such services will be based on the requirements gathered during the activities carried out by OCRE in 2019, through online surveys, face-to-face events and interviews, among others. Additionally, OCRE team members have participated in workshops and conferences to promote the project and raise awareness of the possibilities offered by OCRE to both the research and service provider communities.

In 2020, consumption of the services will start, and OCRE will distribute vouchers to individual researchers and institutions via known research organisations, which will evaluate the incoming requests and distribute funds from the European Commission regularly.

This presentation will provide an overview of the possibilities offered by OCRE to researchers interested in boosting their activities using commercial cloud services.

How to cite: Delgado Blasco, J. M., Romeo, A., Heyns, D., Antoniou, N., and Carrillo, R.: OCRE: the game changer of Cloud and EO commercial services usage by the European research community, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17925, https://doi.org/10.5194/egusphere-egu2020-17925, 2020.

EGU2020-20342 | Displays | ESSI2.11

CONDE: Climate simulation ON DEmand using HPCaaS

Diego A. Pérez Montes, Juan A. Añel, and Javier Rodeiro

CONDE (Climate simulation ON DEmand) is the final result of our work and research on climate and meteorological simulations over an HPC-as-a-Service (HPCaaS) model. In our architecture we run very large climate ensemble simulations using an adapted WRF version that is executed on demand and can be deployed on different cloud computing environments (such as Amazon Web Services, Microsoft Azure or Google Cloud), using BOINC as middleware for task execution and result gathering. We also present some basic examples of applications and experiments that verify that the simulations run in our system are correct and produce valid results.

How to cite: Pérez Montes, D. A., Añel, J. A., and Rodeiro, J.: CONDE: Climate simulation ON DEmand using HPCaaS, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20342, https://doi.org/10.5194/egusphere-egu2020-20342, 2020.

ESSI2.12 – MATLAB-based programs, applications and technical resources for Geoscience Research

EGU2020-2479 | Displays | ESSI2.12

MATLAB tools for the post-processing of GRACE temporal gravity field solutions

Dimitrios Piretzidis and Michael Sideris

We present a collection of MATLAB tools for the post-processing of temporal gravity field solutions from the Gravity Recovery and Climate Experiment (GRACE) satellite mission. GRACE final products are in the form of monthly sets of spherical harmonic coefficients and have been extensively used by the scientific community to study the land surface mass redistribution that is predominantly due to ice melting, glacial isostatic adjustment, seismic activity and hydrological phenomena. Since the launch of GRACE satellites, a substantial effort has been made to develop processing strategies and improve the surface mass change estimates.

The MATLAB software presented in this work is developed and used by the Gravity and Earth Observation group at the Department of Geomatics Engineering, University of Calgary. A variety of techniques and tools for the processing of GRACE data are implemented, tested and analyzed. Some of the software capabilities are: filtering of GRACE data using decorrelation and smoothing techniques, conversion of gravity changes into mass changes on the Earth's spherical, ellipsoidal and topographical surface, implementation of forward modeling techniques for the estimation and removal of long-term trends due to ice mass melting, basin-specific averaging in the spatial and spectral domains, time series smoothing and decomposition techniques, and data visualization.

All tools offer different levels of parameterization in order to assist both expert users and non-specialists. Such software makes it easier to compare different GRACE processing methods and parameter choices, leading to optimal strategies for the estimation of surface mass changes and to the standardization of GRACE data post-processing. It could also facilitate the use of GRACE data by non-geodesists.
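
As one small example of the smoothing step, a Python sketch of isotropic Gaussian (Jekeli-type) filter weights is shown below, using the recursion commonly quoted in the GRACE literature; it is an illustrative re-implementation rather than the group's MATLAB code, and the normalization convention should be checked against the reference used.

    import numpy as np

    def gaussian_weights(lmax, radius_km, earth_radius_km=6371.0):
        """Degree-dependent Gaussian smoothing weights w_l (w_0 = 1), Jekeli-type recursion."""
        b = np.log(2.0) / (1.0 - np.cos(radius_km / earth_radius_km))
        w = np.zeros(lmax + 1)
        w[0] = 1.0
        w[1] = (1.0 + np.exp(-2.0 * b)) / (1.0 - np.exp(-2.0 * b)) - 1.0 / b
        for l in range(1, lmax):
            w[l + 1] = -(2.0 * l + 1.0) / b * w[l] + w[l - 1]
            if w[l + 1] < 0.0:           # the recursion becomes unstable at high degrees
                w[l + 1:] = 0.0
                break
        return w

    lmax = 60
    w = gaussian_weights(lmax, radius_km=300.0)

    # Apply to monthly spherical harmonic coefficient arrays C_lm, S_lm (random stand-ins here)
    C = np.random.randn(lmax + 1, lmax + 1)
    S = np.random.randn(lmax + 1, lmax + 1)
    C_smooth = w[:, None] * C
    S_smooth = w[:, None] * S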

How to cite: Piretzidis, D. and Sideris, M.: MATLAB tools for the post-processing of GRACE temporal gravity field solutions, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2479, https://doi.org/10.5194/egusphere-egu2020-2479, 2020.

Geoscientists from the University of Potsdam reconstruct environmental changes in East Africa over the past five million years. Micro-organisms such as diatoms and rotifers, clay minerals and pollen, thousands of years old, help to reconstruct large lakes and braided rivers, dense forests and hot deserts, high mountains and deep valleys. This is the habitat of our ancestors, members of a complicated family tree or network, of which only one single species, Homo sapiens, has survived. MATLAB is the tool of choice for analyzing these complicated and extensive data sets, extracted from up to 300 m long drill cores, from satellite images, and from the fossil remains of humans and other animals. The software is used to detect and classify important transitions in climate time series, to detect objects and quantify materials in microscope and satellite imagery, to predict river networks from digital terrain models, and to model lake-level fluctuations from environmental data. The advantage of MATLAB is the use of multiple methods within one single tool. Not least because of this, the software is also becoming increasingly popular in Africa, as shown by the program of an international summer school series in Africa and Germany on collecting, processing, and presenting geo-bio-information.

How to cite: Trauth, M. H.: Dust storms, blackouts and 50°C in the shade: Exploring the Roots of Humankind with MATLAB, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-4803, https://doi.org/10.5194/egusphere-egu2020-4803, 2020.

Middle Miocene sediments are the most important productive oil zone in the Sidri Member within the Belayim oil field. The Belayim oil field is one of the well-known oil fields in Egypt and is located on the eastern side of the Gulf of Suez. The Sidri Member consists of shales, sandstones and limestone, with a net pay thickness ranging from 5 to 60 m. The oil-saturated sandstone layers are coarse grained and poorly sorted, and are classified into sub-litharenite, lithic arkose and arkose microfacies with several diagenetic features. This study measured and collected petrophysical data from sandstone core samples and well logging of drilling sites to evaluate the oil potentiality and reservoir characteristics of the Sidri Member. The collected petrophysical data are porosity, permeability, water and oil saturation, resistivity, and grain and bulk density. MATLAB tools were used to analyze the extensive dataset, quantify the correlation trends and visualize the spatial distribution. The porosity values range from 2% to 30% and show a very good positive correlation with horizontal permeability (0 to 1,300 md). The porosity, as well as the type and radius of pore throats, shows an important relationship with permeability and fluid saturation. The petrophysical characteristics of the Sidri sandstones are controlled by the depositional texture, clay-rich matrix and diagenetic features. This study distinguished poor, fair, and good to excellent reservoir intervals in the Sidri Member. The best reservoir quality is recorded in the well-sorted sand layers with little clay matrix in the lower part of the Sidri Member. Their petrophysical characteristics are high porosity (20% to 30%), high permeability (140 to 1250 md), high oil saturation (20% to 78%), low water saturation (13% to 36%), moderate to high resistivity and relatively low grain density. The hydrocarbon production rates reported from the Sidri reservoirs correlate well with the petrophysical characteristics described in this study.

How to cite: Fathy, D. and Lee, E. Y.: Petrophysical data analysis using MATLAB tools for the middle Miocene sediments in the Gulf of Suez, Egypt, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7253, https://doi.org/10.5194/egusphere-egu2020-7253, 2020.

EGU2020-7368 | Displays | ESSI2.12

TETGAR_C: a three-dimensional (3D) provenance plot and calculation tool for detrital garnets

Wolfgang Knierzinger, Michael Wagreich, and Eun Young Lee

We present a new interactive MATLAB-based visualization and calculation tool (TETGAR_C) for assessing the provenance of detrital garnets in a four-component (tetrahedral) plot system (almandine–pyrope–grossular–spessartine). The chemistry of more than 2,600 garnet samples was evaluated and used to create various subfields in the tetrahedron that correspond to calc-silicate rocks, felsic igneous rocks (granites and pegmatites) as well as metasedimentary and metaigneous rocks of various metamorphic grades. These subfields act as reference structures facilitating the assignment of garnet chemistries to source lithologies. An integrated function calculates whether a point is located in a subfield or not. Moreover, TETGAR_C determines the distance to the closest subfield. Compared with conventional ternary garnet discrimination diagrams, this provenance tool enables a more accurate assessment of potential source rocks by reducing the overlap of specific subfields and offering quantitative testing of garnet compositions. In particular, a much clearer distinction between garnets from greenschist-facies rocks, amphibolite-facies rocks, blueschist-facies rocks and felsic igneous rocks is achieved. Moreover, TETGAR_C enables a distinction between metaigneous and metasedimentary garnet grains. In general, metaigneous garnet tends to have higher grossular content than metasedimentary garnet formed under similar P–T conditions.
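
TETGAR_C itself is MATLAB-based; to indicate how a four-component garnet composition maps into a tetrahedral plot, the hedged Python sketch below converts normalized almandine-pyrope-grossular-spessartine fractions to 3D coordinates via barycentric weighting of the tetrahedron vertices, and tests subfield membership with a Delaunay (convex-hull) check. The subfield corner compositions used here are made-up placeholders, not the published reference fields.

    import numpy as np
    from scipy.spatial import Delaunay

    # Vertices of a regular tetrahedron: almandine, pyrope, grossular, spessartine end-members
    V = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.5, np.sqrt(3) / 2, 0.0],
                  [0.5, np.sqrt(3) / 6, np.sqrt(6) / 3]])

    def to_xyz(alm, prp, grs, sps):
        """Barycentric (compositional) coordinates -> Cartesian point inside the tetrahedron."""
        comp = np.column_stack([alm, prp, grs, sps]).astype(float)
        comp /= comp.sum(axis=1, keepdims=True)       # normalize each composition to 100 %
        return comp @ V

    garnets = to_xyz(alm=[62, 40], prp=[20, 5], grs=[10, 35], sps=[8, 20])

    # Membership test against a (placeholder) subfield defined by its corner compositions
    subfield_corners = to_xyz(alm=[80, 60, 60, 50], prp=[10, 30, 10, 20],
                              grs=[5, 5, 25, 20], sps=[5, 5, 5, 10])
    inside = Delaunay(subfield_corners).find_simplex(garnets) >= 0
    print(inside)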

How to cite: Knierzinger, W., Wagreich, M., and Lee, E. Y.: TETGAR_C: a three-dimensional (3D) provenance plot and calculation tool for detrital garnets, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7368, https://doi.org/10.5194/egusphere-egu2020-7368, 2020.

This study quantifies compaction trends of Jurassic-Quaternary sedimentary units in the Perth Basin, and applies the trends to reconstruct the sedimentation and subsidence history with 2D and 3D models. BasinVis 2.0, a MATLAB-based program, as well as MATLAB 3D surface plotting functions, ‘Symbolic Math’ and ‘Curve Fitting’ toolboxes are used to analyze well data. The data were collected from fourteen industry wells and IODP Site U1459 in a study area (200x70 km2) on an offshore part of the basin, which were arranged for four successive stratigraphic units; Cattamarra, Cadda, Yarragadee, and post-breakup sequences. The Perth Basin is a large north-south elongated sedimentary basin extending offshore and onshore along the rifted continental margin of southwestern Australia. It is a relatively under-explored region, despite being an established hydrocarbon producing basin. The basin has developed by multiple episodes of rifting, drifting and breakup of Greater Indian, Australian and Antarctic plates since the Permian. The basin consists of faulted structures, which are filled by Late Paleozoic to Cenozoic sedimentary rocks and sediments. After deltaic-fluvial and shallow marine deposition until early Cretaceous time, carbonate sedimentation has prevailed in the basin, which is related to the post-rift subsidence and the long-term northward drift of the Australian plate.

High-resolution porosity data from Site U1459 and well Houtman-1 were examined to estimate best-fitting compaction trends with linear, single-term and two-term exponential equations. In the compaction trend plot of Site U1459 (post-breakup Cenozoic carbonates), the linear and single-term exponential trends are relatively alike, while the two-term exponential trend shows an abrupt change near the seafloor due to highly varying porosity. The compaction trends at well Houtman-1 (Jurassic sandstones) are alike in the estimated interval; however, the initial porosities are quite low and differ between fits. In the compilation plot of the two wells, the two-term exponential trend represents the porosity distribution better, capturing a trend change (at the risk of overfitting the estimation) at the lithologic transition from carbonates to sandstones. The abrupt trend change suggests that a piece-wise combination of multiple compaction trends is suitable for the Perth Basin. The compaction trends are used to quantify the sedimentation profile and subsidence curves at Site U1459. 2D and 3D models of unit thickness, sedimentation rate and subsidence of the study area are reconstructed by applying the exponential trend to the stratigraphic data of the industry wells. The models are visualized using Ordinary Kriging spatial interpolation. The results allow us to compare differences between compacted (present) and decompacted (original) units through depth and age. The compaction trend has an impact on thickness restoration as well as subsidence analysis. The differences become larger with increasing depth due to the increasing compaction effect during burial, and other factors can deviate the compaction trend further through time. This highlights the fact that the restoration of strongly compacted (usually deeper or older) layers is crucial to reconstructing sedimentation systems and basin evolution, which has often been underestimated in academic and industry settings. This study suggests that researchers apply an appropriate compaction trend estimated from on-site data for basin reconstruction and modelling.
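
To illustrate the kind of trend estimation described above, the following sketch fits a single-term exponential compaction trend, phi(z) = phi0 * exp(-z/k), to synthetic porosity–depth data; it uses Python/SciPy instead of the MATLAB Curve Fitting Toolbox employed in the study, and all values are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def single_exp(z, phi0, k):
    """Single-term exponential compaction trend, phi(z) = phi0 * exp(-z / k)."""
    return phi0 * np.exp(-z / k)

# Synthetic porosity-depth data standing in for the well measurements; a
# two-term variant would simply add a second exponential term.
depth = np.linspace(0.0, 3000.0, 50)     # metres below seafloor
rng = np.random.default_rng(0)
porosity = 0.55 * np.exp(-depth / 1500.0) + 0.02 * rng.normal(size=depth.size)

# Fit the trend; p0 gives rough initial guesses for phi0 and the decay depth k.
(phi0, k), _ = curve_fit(single_exp, depth, porosity, p0=(0.5, 1000.0))
print(f"phi0 = {phi0:.3f}, decay depth k = {k:.0f} m")
```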

How to cite: Lee, E. Y.: Quantitative analysis for compaction trend and basin reconstruction of the Perth Basin, Australia: Limitations, uncertainties and requirements, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8825, https://doi.org/10.5194/egusphere-egu2020-8825, 2020.

The presented work focuses on disaster risk management of cities which are prone to natural hazards. Based on aerial imagery of regions in the Caribbean islands captured by drones, we show how to process the images and automatically identify the roof material of individual structures using a deep learning model. Deep learning refers to a machine learning technique using deep artificial neural networks. Unlike other techniques, deep learning does not necessarily require feature engineering but may process raw data directly. The outcome of this assessment can be used for steering risk mitigation measures, creating hazard risk maps or advising municipal bodies and aid organizations on investing their resources in rebuilding and reinforcement. The data at hand consist of images in BigTIFF format and GeoJSON files including the building footprints, unique building IDs and roof material labels. We demonstrate how to use MATLAB and its toolboxes for processing large image files that do not fit in computer memory. Based on this, we train a deep learning model to classify the roof material present in the images. We achieve this by subjecting a pretrained ResNet-18 neural network to transfer learning. Training is further accelerated by means of GPU computing. The baseline model achieves an accuracy of 74% on a validation data set. Further tuning of hyperparameters is expected to improve accuracy significantly.
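
The transfer-learning step can be sketched as follows; the example uses PyTorch/torchvision rather than the authors' MATLAB workflow, and the class count, hyperparameters and data loader are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_ROOF_CLASSES = 5  # placeholder for the number of roof material labels

# Load a ResNet-18 pretrained on ImageNet and replace its final layer, so only
# the new classification head starts from random weights.
model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False          # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, NUM_ROOF_CLASSES)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

def train_one_epoch(train_loader):
    """Training skeleton; `train_loader` would yield image tiles cropped from
    the large scenes using the building footprints, with roof labels."""
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```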

How to cite: Bomberg, S. and Goel, N.: Supporting risk management in the Caribbean by application of Deep Learning for object classification of aerial imagery with MATLAB, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9232, https://doi.org/10.5194/egusphere-egu2020-9232, 2020.

Geoscience is a highly interdisciplinary field of study, drawing upon conclusions from physics, chemistry and many other academic disciplines. Over the course of the last decades, computer science has become an integral component of geoscientific research. This coincides with the rising popularity of the open-source movement, which helped to develop better tools for collaboration on complex software projects across physical distances and academic boundaries.

However, while the technical frameworks supporting interdisciplinary work between geoscience and computer science exist, there are still several hurdles one must take in order to achieve successful collaborations. This work summarizes the lessons learned from the development of BasinVis from the perspective of a computer science collaborator. BasinVis is a modular open-source application that aims to allow geoscientists to analyze and visualize sedimentary basins in a comprehensive workflow. A particular development goal was to introduce the advances of 2D and 3D visualization techniques to the quantitative analysis of the stratigraphic setting and subsidence of sedimentary basins based on well data and/or stratigraphic profiles.

Development of BasinVis started in 2013, with its first release as a MATLAB GUI application in 2016. Apart from functionality, one of the major problems to solve in this period was the alignment of research goals and methodology, which may diverge greatly between geoscience and computer science. Examples of this would be to clarify the scientific terminologies of each field early on and to clearly establish the expected results of the application in terms of mathematical accuracy and uncertainty (a concept that may catch computer scientists off guard).

How to cite: Novotny, J.: The Development of BasinVis: Lessons learned from an open-source collaboration of geoscience and computer science, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15695, https://doi.org/10.5194/egusphere-egu2020-15695, 2020.

We report a MATLAB code for the stochastic optimization of in situ horizontal stress magnitudes from wellbore wall image and sonic logging data in a vertical borehole. In undeformed sedimentary formations, one of the principal stresses is commonly assumed to be vertical, and its magnitude (σv) is simply related to the gravitational overburden. The two horizontal far-field principal stresses (σH and σh) are then theoretically constrained by the relationship between the breakout width (or angular span) and rock compressive strength at a given depth. However, this deterministic relationship yields indeterminate solutions for the two unknown stresses. Instead of using the deterministic relationship between their average values in an interval of borehole, we introduce probabilistic distributions of rock strength and breakout width in the interval. This method optimizes the complete set of in situ principal stresses (σH, σh, and σv) by minimizing an objective function. For the rock failure model, we use a true triaxial failure criterion referred to as the modified Wiebols and Cook criterion, which incorporates all three principal stresses. This criterion is expressed in the form of an implicit function with two equation parameters: the uniaxial compressive strength UCS and the internal friction coefficient μ. The Weibull distribution model of UCS in a borehole section (~30 m interval) is obtained from the wellbore sonic logging data using the relation between UCS and P-wave velocity. The value of μ is assumed to be constant at 0.6 based on a previous experimental study. The breakout model is established based on the probabilistic distribution of rock strength at the margins of the breakout for a uniform set of far-field stresses. The inverse problem is solved with a MATLAB algorithm that optimizes by choosing the best-fit set of far-field stresses in a stress polygon. This process also enables one to evaluate the statistical reliability in terms of sensitivity and uncertainty. The stochastic optimization process is demonstrated using borehole images and sonic logging data obtained from Integrated Ocean Drilling Program (IODP) Hole C0002A, a vertical hole near the seaward margin of the Kumano Basin offshore the Kii Peninsula, southwest Japan.
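
The structure of such a stochastic grid search can be conveyed with the sketch below; for brevity it substitutes a simple Kirsch hoop-stress/uniaxial-strength breakout condition for the modified Wiebols and Cook criterion, ignores the frictional limits of the stress polygon, and uses invented input values, so it is an assumption-laden illustration rather than the authors' algorithm.

```python
import numpy as np

def breakout_width_deg(sH, sh, ucs):
    """Breakout angular width (degrees) around a vertical borehole, using the
    simplified condition that the Kirsch hoop stress exceeds the UCS:
    sigma_theta(theta) = (sH + sh) - 2*(sH - sh)*cos(2*theta)."""
    c = np.clip((sH + sh - ucs) / (2.0 * (sH - sh)), -1.0, 1.0)
    theta_edge = 0.5 * np.degrees(np.arccos(c))   # measured from the sH azimuth
    return 180.0 - 2.0 * theta_edge               # 0 = no failure, 180 = full wall

# Hypothetical inputs (MPa): Weibull-distributed UCS standing in for the
# sonic-log-derived strength model, and an "observed" mean breakout width.
rng = np.random.default_rng(1)
ucs_samples = 60.0 * rng.weibull(5.0, size=500)
observed_mean_width = 70.0

best = None
for sH in np.arange(55.0, 120.0, 1.0):
    for sh in np.arange(30.0, sH, 1.0):           # enforce sh <= sH only
        widths = breakout_width_deg(sH, sh, ucs_samples)
        misfit = (widths.mean() - observed_mean_width) ** 2
        if best is None or misfit < best[0]:
            best = (misfit, sH, sh)

print(f"best-fit sigma_H = {best[1]:.0f} MPa, sigma_h = {best[2]:.0f} MPa")
```

In the actual method, the full probabilistic distributions of strength and breakout width, the vertical stress and the stress-polygon limits constrain the otherwise indeterminate stress pair; this toy misfit on the mean width alone does not.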

How to cite: Song, I. and Chang, C.: Stochastic optimization using MATLAB code for the determination of in situ horizontal stress magnitudes from wellbore logging data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17594, https://doi.org/10.5194/egusphere-egu2020-17594, 2020.

EGU2020-18878 | Displays | ESSI2.12 | Highlight

From ZMAP to ZMAP7: Fast-forwarding 25 years of software evolution

Celso Reyes and Stefan Wiemer

Science is a transient discipline. Research priorities change with funding, students come and go, and faculty move on. What remains are the published and unpublished articles, as well as the data and code they depend upon. Few projects outlive their immediate use, and even projects with thousands of hours of investment grow stagnant and irrelevant, causing confusion and draining time from the following generation of researchers.

However, a moderate investment in maintenance may save many hours of headache down the road. As a case study, I present practical lessons from the recent overhaul of ZMAP v6 to ZMAP7. ZMAP is a set of MATLAB tools driven by a graphical user interface, designed to help seismologists analyze catalog data. It debuted in 1994 as a collection of scripts written in MATLAB 4. The last official update was 7 years later, with the formal release of ZMAP v6 (in 2001). This version was the agglomeration of code written by a host of scientists and scientists-in-training as they tackled their various independent projects. In a way, ZMAP is already a success story, having survived 26 years of alternating development and stagnation, and it is still in use around the world. Dozens of research papers have used ZMAP over time, with the 2001 publication having 825 Google Scholar citations.

With the release of MATLAB R2014b, changes to the graphics engine had rendered ZMAP 6 largely unusable. Over the interim, not only had both the MATLAB language and toolboxes changed significantly, but so had common programming practices. The ZMAP7 project started as a “simple” graphical retrofit, but has evolved to leverage modern MATLAB’s new capabilities and updated toolboxes. Targeting a recent version of MATLAB (R2018a) while forgoing backward language compatibility opened up a wide array of tools for use and also extended the expected lifetime of ZMAP. A subset of the techniques employed follows:

All changes were tracked in Git, providing snapshots and a safety net while avoiding a proliferation of folders. Code was simultaneously changed across the entire project using regular expressions. Variables were renamed according to their purpose. Unreachable files were removed to reduce the maintenance burden. Scripts and global variables were transformed into functions and classes, providing robust error checking and improving readability. Quoted scripts were extracted into functions where MATLAB itself could help evaluate their correctness. Wrapper functions allowed for global behavior changes and instrumentation. Time-consuming activities, such as determining UI placement for dialog boxes, were automated.
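
The "scripts and globals to functions" step can be illustrated generically; the fragment below is shown in Python rather than MATLAB, with invented variable names, purely to convey the pattern.

```python
# Before: script-style code relying on module-level "globals"
# (invented names, standing in for legacy shared catalog variables).
catalog = []          # implicitly shared state
min_magnitude = 2.5   # implicitly shared configuration

def count_events_script_style():
    # depends on whatever the globals happen to contain at call time
    return sum(1 for event in catalog if event["mag"] >= min_magnitude)

# After: the same logic as a pure function with explicit inputs,
# which is easier to test, document and reuse.
def count_events(catalog, min_magnitude=2.5):
    """Count catalog events at or above a magnitude threshold."""
    return sum(1 for event in catalog if event["mag"] >= min_magnitude)

print(count_events([{"mag": 3.1}, {"mag": 1.9}], min_magnitude=2.5))  # 1
```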

Though updates still occur irregularly, approximately 2000 hours have been devoted to the project, or one year of work by a full-time employee. This investment may be amortized through application speed-up and improved reliability for researchers across the globe, as well as through the transparency and reproducibility provided by open-source, version-controlled code. ZMAP7 is hosted on GitHub, where the community is welcome to keep up with the latest developments. More information about ZMAP7 can be accessed from the main SED page: http://www.seismo.ethz.ch/en/research-and-teaching/products-software/software/ZMAP/.

How to cite: Reyes, C. and Wiemer, S.: From ZMAP to ZMAP7: Fast-forwarding 25 years of software evolution, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18878, https://doi.org/10.5194/egusphere-egu2020-18878, 2020.

EGU2020-19260 | Displays | ESSI2.12 | Highlight

The TopoToolbox v2.4: new tools for topographic analysis and modelling

Dirk Scherler and Wolfgang Schwanghart

The TopoToolbox v2 (TT2; available at https://github.com/wschwanghart/topotoolbox) (Schwanghart and Scherler, 2014) is a set of functions for the analysis of digital elevation models (DEM) in the MATLAB programming environment. Its functionality is mainly developed along the lines of hydrological and geomorphic terrain analysis, complemented with a wide range of functions for visual display, including a class for swath profiles. Fast and efficient algorithms in TopoToolbox form the backbone of the numerical landscape evolution model TTLEM (Campforts et al., 2017). In this presentation, we will demonstrate new functionalities that are part of the upcoming release v 2.4: DIVIDEobj and PPS.

DIVIDEobj is a numerical class to store, analyze and visualize drainage divide networks. Drainage divide networks are derived from flow directions and a stream network. We will present the extraction and analysis of the drainage divide network of the Big Tujunga catchment, CA, to illustrate its functionality and the associated analysis tools. PPS is a class to explore, analyze and model spatial point processes on or alongside river networks. Specifically, PPS provides access to a set of statistical tools to work with inhomogeneous Poisson point processes that facilitate the statistical modelling of phenomena such as river bank failures, landslide dams, or wood jams at the regional scale.

Campforts, B., Schwanghart, W., and Govers, G.: Accurate simulation of transient landscape evolution by eliminating numerical diffusion: the TTLEM 1.0 model, Earth Surface Dynamics, 5, 47-66. https://doi.org/10.5194/esurf-5-47-2017, 2017.

Schwanghart, W., and Scherler, D.: Short Communication: TopoToolbox 2 – MATLAB-based software for topographic analysis and modeling in Earth surface sciences, Earth Surf. Dynam., 2, 1-7, https://doi.org/10.5194/esurf-2-1-2014, 2014.

How to cite: Scherler, D. and Schwanghart, W.: The TopoToolbox v2.4: new tools for topographic analysis and modelling, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19260, https://doi.org/10.5194/egusphere-egu2020-19260, 2020.

EGU2020-20519 | Displays | ESSI2.12 | Highlight

Examples of DAFNE application to multi-temporal and multi-frequency remote sensed images and geomorphic data for accurate flood mapping.

Annarita D'Addabbo, Alberto Refice, Francesco Lovergine, and Guido Pasquariello

DAFNE (Data Fusion by Bayesian Network) is a MATLAB-based open-source toolbox conceived to produce flood maps from remotely sensed and other ancillary information through a data fusion approach [1]. It is based on Bayesian Networks and is composed of five modules, which can be easily modified or upgraded to meet different user needs. DAFNE provides probabilistic flood maps as output products, i.e., for each pixel in a given output map, the probability that the corresponding area has been reached by the inundation is reported. Moreover, if remotely sensed images have been acquired on different days during a flood event, DAFNE allows the temporal evolution of the inundation to be followed.

It is well known that flood scenarios are typical examples of complex situations in which different factors have to be considered to provide an accurate and robust interpretation of the situation on the ground [2]. In particular, the combined analysis of multi-temporal and multi-frequency SAR intensity and coherence trends, together with optical data and other ancillary information, can be particularly useful to map flooded areas characterized by different land cover and land use [3]. Here, a recent upgrade is presented that allows multi-frequency SAR intensity images, such as X-band, C-band and L-band images, to be considered as input data.

Three different inundation events have been considered as application examples: for each one, multi-temporal probabilistic flood maps have been produced by combining multi-temporal and multi-frequency SAR intensity images (such as COSMO-SkyMed, Sentinel-1 and ALOS-2 images), InSAR coherence and optical data (such as Landsat 5 or high-resolution images), together with geomorphic and other ground information. Experimental results show good capabilities of producing accurate flood maps with computational times compatible with near-real-time application.
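
The underlying idea of probabilistic fusion can be conveyed with a deliberately reduced two-layer, conditionally independent (naive Bayes) sketch; all likelihoods below are invented, and the code is not the DAFNE Bayesian network itself.

```python
# Invented per-pixel likelihoods P(evidence | class) for two evidence layers,
# e.g. low SAR backscatter and low InSAR coherence over a flooded surface.
p_sar_given_flood, p_sar_given_dry = 0.85, 0.20
p_coh_given_flood, p_coh_given_dry = 0.70, 0.30
prior_flood = 0.10

def posterior_flood(sar_low: bool, coh_low: bool) -> float:
    """Posterior flood probability assuming conditionally independent evidence."""
    l_flood = ((p_sar_given_flood if sar_low else 1 - p_sar_given_flood)
               * (p_coh_given_flood if coh_low else 1 - p_coh_given_flood)
               * prior_flood)
    l_dry = ((p_sar_given_dry if sar_low else 1 - p_sar_given_dry)
             * (p_coh_given_dry if coh_low else 1 - p_coh_given_dry)
             * (1 - prior_flood))
    return l_flood / (l_flood + l_dry)

print(f"P(flood | low backscatter, low coherence) = {posterior_flood(True, True):.2f}")
```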

 

[1] A. D’Addabbo, A. Refice, F. Lovergine, G. Pasquariello, DAFNE: A Matlab toolbox for Bayesian multi-source remote sensing and ancillary data fusion, with application to flood mapping. Computers & Geosciences, 112 (2018), 64–75.

[2] A. Refice et al., IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 7, pp. 2711–2722, 2014.

[3] A. D’Addabbo et al., “A Bayesian Network for Flood Detection combining SAR Imagery and Ancillary Data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 6, pp. 3612–3625, 2016.

 

How to cite: D'Addabbo, A., Refice, A., Lovergine, F., and Pasquariello, G.: Examples of DAFNE application to multi-temporal and multi-frequency remote sensed images and geomorphic data for accurate flood mapping., EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20519, https://doi.org/10.5194/egusphere-egu2020-20519, 2020.

EGU2020-22607 | Displays | ESSI2.12

GLImER: a MATLAB-based tool to image global lithospheric structure

Stéphane Rondenay, Lucas Sawade, and Peter Makus

Project GLImER (Global Lithospheric Imaging using Earthquake Recordings) aims to conduct a global survey of lithospheric interfaces using converted teleseismic body waves. Data from permanent and temporary seismic networks worldwide are processed automatically to produce global maps of key interfaces (crust–mantle boundary, intra-lithospheric interfaces, lithosphere–asthenosphere boundary). In this presentation, we reflect on the challenges associated with automating the analysis of converted waves and the potential of the resulting data products to be used in novel imaging approaches. A large part of the analysis and the visualization is carried out via MATLAB-based applications. The main steps of the workflow include signal processing for quality control of the input data and earthquake source normalization, mapping of the data to depth for image generation, and interactive 2-D/3-D plotting for visualization. We discuss how these various tools, especially the visualization ones, can be used for both research and education purposes.

How to cite: Rondenay, S., Sawade, L., and Makus, P.: GLImER: a MATLAB-based tool to image global lithospheric structure, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22607, https://doi.org/10.5194/egusphere-egu2020-22607, 2020.

EGU2020-22668 | Displays | ESSI2.12

Advances in Scientific Visualization of Geospatial Data with MATLAB

Lisa Kempler and Steve Schäfer

Data visualization plays an essential role in conveying complex relationships in real-world physical systems. As geoscience and atmospheric data quantities and data sources have increased, so, too, have the corresponding capabilities in MATLAB and Mapping Toolbox for analyzing and visualizing them. The talk will present end-to-end geospatial analysis workflows, from data access to visualization to publication. It will include software demonstrations of how to process and visualize large out-of-memory data, including accessing remote and cloud-based file systems; working with multiple data formats and sources, such as Web Map Service (WMS), for visualizing publicly accessible geospatial information; and applying new 2-D and 3-D high-resolution mapping visualizations. MATLAB live notebooks combining code, pictures, graphics, and equations will show the use of interactive notebooks for capturing, teaching, and sharing research.

How to cite: Kempler, L. and Schäfer, S.: Advances in Scientific Visualization of Geospatial Data with MATLAB, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22668, https://doi.org/10.5194/egusphere-egu2020-22668, 2020.

ESSI2.19 – Management and integration of environmental observation data

EGU2020-17516 | Displays | ESSI2.19

MOSAiC goes O2A - Arctic Expedition Data Flow from Observations to Archives

Antonia Immerz and Angela Schaefer and the AWI Data Centre MOSAiC Team

During the largest polar expedition in history, which started in September 2019, the German research icebreaker Polarstern spends a whole year drifting with the ice through the Arctic Ocean. The MOSAiC expedition takes the closest look ever at the Arctic, even throughout the polar winter, to gain fundamental insights and unique on-site data for a better understanding of global climate change. Hundreds of researchers from 20 countries are involved. Scientists all around the globe will use the in situ data both instantaneously in near-real-time mode and long afterwards, taking climate research to a completely new level. Hence, proper data management, sampling strategies defined beforehand, and monitoring of the actual data flow, as well as processing, analysis and sharing of data during and long after the MOSAiC expedition, are the most essential tools for scientific gain and progress.

To prepare for that challenge, we adapted and integrated the research data management framework O2A “Data flow from Observations to Archives” to the needs of the MOSAiC expedition, on board Polarstern as well as on land for data storage and access at the Alfred Wegener Institute Computing and Data Center in Bremerhaven, Germany. Our O2A framework assembles a modular research infrastructure comprising a collection of tools and services. These components allow researchers to register all necessary sensor metadata beforehand, linked to automated data ingestion; to ensure and monitor the data flow; and to process, analyze, and publish data, turning the most valuable and uniquely gained Arctic data into scientific outcomes. The framework further allows for the integration of data obtained with discrete sampling devices into the data flow.

These requirements have led us to adapt the generic and cost-effective O2A framework to enable, control, and access the flow of sensor observations to archives in a cloud-like infrastructure on board Polarstern and later on to land-based repositories for international availability.

Major roadblocks of the MOSAiC-O2A data flow framework are (i) the increasing number and complexity of research platforms, devices, and sensors, (ii) the heterogeneous, interdisciplinarily driven requirements regarding, e.g., satellite data, sensor monitoring, in situ sample collection, quality assessment and control, processing, analysis and visualization, and (iii) the demand for near-real-time analyses on board as well as on land with limited satellite bandwidth.

The key modules of O2A's digital research infrastructure established by AWI implement the FAIR principles:

  • SENSORWeb, to register sensor applications and sampling devices and capture controlled metadata before and alongside any measurements in the field
  • Data ingest, allowing researchers to feed data into storage systems and processing pipelines in a prepared and documented way, at best in controlled near real-time data streams
  • Dashboards allowing researchers to find and access data and share and collaborate among partners
  • Workspace enabling researchers to access and use data with research software utilizing a cloud-based virtualized infrastructure that allows researchers to analyze massive amounts of data on the spot
  • Archiving and publishing data via repositories and Digital Object Identifiers (DOI)

How to cite: Immerz, A. and Schaefer, A. and the AWI Data Centre MOSAiC Team: MOSAiC goes O2A - Arctic Expedition Data Flow from Observations to Archives, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17516, https://doi.org/10.5194/egusphere-egu2020-17516, 2020.

EGU2020-10431 | Displays | ESSI2.19

Implementing a new data acquisition system for the advanced integrated atmospheric observation system KITcube

Martin Kohler, Mahnaz Fekri, Andreas Wieser, and Jan Handwerker

KITcube (Kalthoff et al., 2013) is a mobile advanced integrated observation system for the measurement of meteorological processes within a volume of 10x10x10 km3. A large variety of instruments, from in-situ sensors to scanning remote sensing devices, is deployed during campaigns. The simultaneous operation and real-time instrument control needed for maximum instrument synergy require real-time data management designed to cover the various user needs: safe data acquisition, fast loading, compressed storage, easy data access, monitoring and data exchange. Large volumes of raw and semi-processed data of various types, from simple ASCII time series to high-frequency multi-dimensional binary data, provide abundant information but make the integration and efficient management of such data volumes a challenge.
Our data processing architecture is based on open source technologies and involves the following five sections: 1) Transferring: Data and metadata collected during a campaign are stored on a file server. 2) Populating the database: A relational database is used for time series data and a hybrid database model for very large, complex, unstructured data. 3) Quality control: Automated checks for data acceptance and data consistency. 4) Monitoring: Data visualization in a web-application. 5) Data exchange: Allows the exchange of observation data and metadata in specified data formats with external users.
The implemented data architecture and workflow is illustrated in this presentation using data from the MOSES project (http://moses.eskp.de/home).
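
A minimal sketch of the kind of automated acceptance and consistency check performed in the quality control step could look as follows; thresholds and variable names are invented and do not reflect the actual KITcube checks.

```python
import numpy as np

def quality_flags(values, valid_min, valid_max, max_step):
    """Flag a 1-D time series: 0 = ok, 1 = out of physical range, 2 = spike.

    `valid_min`/`valid_max` bound the physically plausible range and
    `max_step` is the largest accepted jump between consecutive samples;
    all thresholds are instrument-specific and invented here.
    """
    values = np.asarray(values, dtype=float)
    flags = np.zeros(values.shape, dtype=int)
    flags[(values < valid_min) | (values > valid_max)] = 1
    step = np.abs(np.diff(values, prepend=values[0]))
    flags[(flags == 0) & (step > max_step)] = 2
    return flags

# Example: flags the out-of-range value and the spike-like recovery after it.
print(quality_flags([12.1, 12.3, 99.0, 12.4], valid_min=-40, valid_max=45, max_step=5))
```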

References:

KITcube - A mobile observation platform for convection studies deployed during HyMeX.
Kalthoff, N.; Adler, B.; Wieser, A.; Kohler, M.; Träumner, K.; Handwerker, J.; Corsmeier, U.; Khodayar, S.; Lambert, D.; Kopmann, A.; Kunka, N.; Dick, G.; Ramatschi, M.; Wickert, J.; Kottmeier, C.
2013. Meteorologische Zeitschrift, 22 (6), 633–647. doi:10.1127/0941-2948/2013/0542 

How to cite: Kohler, M., Fekri, M., Wieser, A., and Handwerker, J.: Implementing a new data acquisition system for the advanced integrated atmospheric observation system KITcube, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10431, https://doi.org/10.5194/egusphere-egu2020-10431, 2020.

EGU2020-3708 | Displays | ESSI2.19

Implementing FAIR principles for dissemination of data from the French OZCAR Critical Observatory network: the Theia/OZCAR information system

Isabelle Braud, Véronique Chaffard, Charly Coussot, Sylvie Galle, and Rémi Cailletaud

OZCAR-RI, the French Critical Zone Research Infrastructure, gathers 20 observatories sampling various compartments of the Critical Zone, each of which has historically developed its own data management and distribution system. However, these efforts have generally been conducted independently. This has led to a very heterogeneous situation, with different levels of development and maturity of the systems and a general lack of visibility of data from the entire OZCAR-RI community. To overcome this difficulty, a common information system (Theia/OZCAR IS) was built to make these in situ observations FAIR (Findable, Accessible, Interoperable, Reusable). The IS will also make the data visible in the European eLTER-RI (European Long-Term Ecosystem Research Research Infrastructure), to which OZCAR-RI contributes.

The IS architecture was designed after consulting the users, data producers and IT teams involved in data management. A common data model, including all the requested information and based on several metadata standards, was defined to set up information flows between the observatories' information systems and the Theia/OZCAR IS. Controlled vocabularies were defined to develop a data discovery web portal offering a faceted search with various criteria, including variable names and categories that were harmonized in a thesaurus published on the web. The communication will describe the IS architecture, the pivot data model and the open-source solutions used to implement the data portal that allows data discovery. It will also present future steps to implement data downloading and interoperability services that will allow a full implementation of the FAIR principles.

How to cite: Braud, I., Chaffard, V., Coussot, C., Galle, S., and Cailletaud, R.: Implementing FAIR principles for dissemination of data from the French OZCAR Critical Observatory network: the Theia/OZCAR information system, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-3708, https://doi.org/10.5194/egusphere-egu2020-3708, 2020.

EGU2020-5210 | Displays | ESSI2.19

Solutions for providing web-accessible, semi-standardised ecosystem research site information

Christoph Wohner, Johannes Peterseil, Tomáš Kliment, and Doron Goldfarb

There are a number of systems dedicated to the storage of information about ecosystem research sites, often used for the management of such facilities within research networks or research infrastructures. If such systems provide interfaces for querying this information, these interfaces and especially their data formats may vary greatly with no established data format standard to follow.

DEIMS-SDR (Dynamic Ecological Information Management System - Site and Dataset Registry; https://deims.org) is one such service that allows registering and discovering long-term ecosystem research sites, along with the data gathered at those sites and networks associated with them. We present our approach to make the hosted information openly available via a REST-API. While this allows flexibility in the way information is structured, it also follows interoperability standards and specifications that provide clear rules on how to parse this information.

The REST-API follows the OpenAPI 3.0 specification, including the usage of JSON schemas for describing the exact structure of available records. In addition, DEIMS-SDR also issues persistent, unique and resolvable identifiers for sites independent of the affiliation with research infrastructures or networks.
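
A client-side query against such a REST-API could look like the sketch below; the base path and response structure are assumptions based on the public deims.org service rather than a documented guarantee.

```python
import requests

BASE_URL = "https://deims.org/api"   # assumed base path of the DEIMS-SDR REST-API

def list_sites():
    """Fetch the list of registered site records (structure simplified here)."""
    response = requests.get(f"{BASE_URL}/sites", timeout=30)
    response.raise_for_status()
    return response.json()

sites = list_sites()
print(f"{len(sites)} site records returned")
```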

The flexible design of the DEIMS-SDR data model and the underlying REST-API based approach provide a low threshold for incorporating information from other research domains within the platform itself as well as integrating its exposed metadata with third party information through external means.

How to cite: Wohner, C., Peterseil, J., Kliment, T., and Goldfarb, D.: Solutions for providing web-accessible, semi-standardised ecosystem research site information, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5210, https://doi.org/10.5194/egusphere-egu2020-5210, 2020.

EGU2020-8671 | Displays | ESSI2.19

Put your models in the web - less painful

Nils Brinckmann, Massimiliano Pittore, Matthias Rüster, Benjamin Proß, and Juan Camilo Gomez-Zapata

Today's Earth-related scientific questions are more complex and more interdisciplinary than ever, so much so that it is extremely challenging for single-domain experts to master all the different aspects of a problem at once. As a consequence, modular and distributed frameworks are increasingly gaining momentum, since they allow the collaborative development of complex, multidisciplinary processing solutions.

A technical implementation focus lies on the use of modern web technologies with their broad variety of standards, protocols and available development frameworks. RESTful services - one of the main drivers of the modern web - are often suboptimal for the implementation of complex scientific processing solutions. In fact, while they offer great flexibility, they also tend to be bound to very specific formats (and are often poorly documented).

With the introduction of the Web Processing Service (WPS) specifications, the Open Geospatial Consortium (OGC) proposed a standard for the implementation of a new generation of computing modules that overcomes most of the drawbacks of the RESTful approach. The WPS allows a flexible and reliable specification of input and output formats as well as the exploration of a service's capabilities via the GetCapabilities and DescribeProcess operations.

The main drawback of the WPS approach with respect to RESTful services is that the latter can be easily implemented in any programming language, while the efficient integration of WPS currently relies mostly on Java, C and Python implementations. In the framework of Earth science research, we are often confronted with a plethora of programming languages and coding environments. Converting already existing complex scientific programs into a language suitable for WPS integration can be a daunting effort and may even introduce additional errors due to conflicts and misunderstandings between the original code authors and the developers working on the WPS integration. Also, the maintenance of these hybrid processing components is often very difficult, since most scientists are not familiar with web programming technologies and, conversely, the web developers cannot (or do not have the time to) get adequately acquainted with the underlying science.

Facing these problems in the context of the RIESGOS project we developed a framework for a Java-based WPS server able to run any kind of scientific code scripts or command line programs. The proposed approach is based on the use of Docker containers encapsulating the running processes, and Docker images to manage all necessary dependencies.

A simple set of ASCII configuration files provides all the information needed for WPS integration: how to call the program, how to pass input parameters - including command line arguments and input files - and how to interpret the output of the program - both from stdout and from serialized files. A set of predefined format converters is provided, and we also include extension mechanisms to allow maximum flexibility.
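
The general pattern of running an encapsulated command-line program inside a container and capturing its output can be sketched as follows; the image name, command and mount point are invented, and the actual framework is Java-based and driven by its own configuration files.

```python
import subprocess

def run_containerized(image, command, workdir_mount):
    """Run a command-line program inside a Docker container and return stdout.

    `image`, `command` and the mounted working directory are placeholders;
    input files are exchanged through the bind mount, results via stdout.
    """
    result = subprocess.run(
        ["docker", "run", "--rm", "-v", f"{workdir_mount}:/data", image, *command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Hypothetical usage: a hazard script that prints its results to stdout.
stdout = run_containerized(
    image="example/hazard-script:latest",
    command=["python3", "/data/run_model.py", "--lat", "-33.4", "--lon", "-70.6"],
    workdir_mount="/tmp/wps-job-1234",
)
print(stdout)
```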

The result is an encapsulated, modular, safe and extensible architecture that allows scientists to expose their scientific programs on the web with little effort, and to collaboratively create complex, multidisciplinary processing pipelines.

How to cite: Brinckmann, N., Pittore, M., Rüster, M., Proß, B., and Gomez-Zapata, J. C.: Put your models in the web - less painful, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8671, https://doi.org/10.5194/egusphere-egu2020-8671, 2020.

EGU2020-9983 | Displays | ESSI2.19

Improving future optical Earth Observation products using transfer learning

Peter Kettig, Eduardo Sanchez-Diaz, Simon Baillarin, Olivier Hagolle, Jean-Marc Delvit, Pierre Lassalle, and Romain Hugues

Pixels covered by clouds in optical Earth Observation images are not usable for most applications. For this reason, only images delivered with reliable cloud masks are eligible for automated or massive analysis. Current state-of-the-art cloud detection algorithms, both physical models and machine learning models, are specific to a mission or mission type, with limited transferability. A new model has to be developed every time a new mission is launched. Machine learning may overcome this problem and, in turn, obtain state-of-the-art or even better performance by training the same algorithm on datasets from different missions. However, simulating products for upcoming missions is not always possible, and actual products are not available in sufficient quantity to create a training dataset until well after launch. Furthermore, labelling data is time-consuming. Therefore, even by the time enough data are available, manually labelled data might not be available at all.

 

To overcome this bottleneck, we propose a transfer-learning-based method using the available products of the current generation of satellites. These existing products are gathered in a database that is used to train a deep convolutional neural network (CNN) solely on those products. The trained model is applied to images from other - unseen - sensors and the outputs are evaluated. We avoid manual labelling by automatically producing the ground data with existing algorithms. Only a few semi-manually labelled images are used for qualifying the model, and even those samples need very few user inputs. This drastic reduction of user input limits subjectivity and reduces the costs.

 

We provide an example of such a process by training a model to detect clouds in Sentinel-2 images, using as ground-truth the masks of existing state-of-the-art processors. Then, we apply the trained network to detect clouds in previously unseen imagery of other sensors such as the SPOT family or the High-Resolution (HR) Pleiades imaging system, which provide a different feature space.

The results demonstrate that the trained model is robust to variations within the individual bands resulting from different acquisition methods and spectral responses. Furthermore, the addition of geo-located auxiliary data that is independent of the platform, such as digital elevation models (DEMs), as well as simple synthetic bands such as the NDVI or NDSI, further improves the results.
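
The synthetic bands mentioned above are simple normalized band differences; a minimal sketch of their computation, assuming band arrays already resampled to a common grid, is:

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index, (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red + eps)

def ndsi(green, swir, eps=1e-6):
    """Normalized Difference Snow Index, (Green - SWIR) / (Green + SWIR)."""
    return (green - swir) / (green + swir + eps)

# Toy reflectance arrays standing in for resampled optical bands.
nir = np.array([[0.45, 0.30]]); red = np.array([[0.10, 0.25]])
green = np.array([[0.20, 0.35]]); swir = np.array([[0.15, 0.05]])
print(ndvi(nir, red))
print(ndsi(green, swir))
```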

In the future, this approach opens up the possibility of being applied to new CNES missions, such as Microcarb or CO3D.

How to cite: Kettig, P., Sanchez-Diaz, E., Baillarin, S., Hagolle, O., Delvit, J.-M., Lassalle, P., and Hugues, R.: Improving future optical Earth Observation products using transfer learning, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9983, https://doi.org/10.5194/egusphere-egu2020-9983, 2020.

EGU2020-13338 | Displays | ESSI2.19

Design and Development of Interoperable Cloud Sensor Services to Support Citizen Science Projects

Henning Bredel, SImon Jirka, Joan Masó Pau, and Jaume Piera

Citizen Observatories are becoming a more and more popular source of input data in many scientific domains. This includes for example research on biodiversity (e.g. counts of specific species in an area of interest), air quality monitoring (e.g. low-cost sensor boxes), or traffic flow analysis (e.g. apps collecting floating car data).

For the collection of such data, different approaches exist. Besides frameworks providing re-usable software building blocks (e.g. the wq framework, Open Data Kit), many projects rely on custom developments. However, these solutions are mainly focused on providing the necessary software components; further work is necessary to set up the required IT infrastructure. In addition, aspects such as interoperability are usually less considered, which often leads to the creation of isolated information silos.

In our presentation, we will introduce selected activities of the European H2020 project COS4CLOUD (Co-designed citizen observatories for the EOS-Cloud). Among other objectives, COS4CLOUD aims at providing re-usable services for setting up Citizen Observatories based on the European Open Science (EOS) Cloud. We will especially discuss how it will make use of interoperability standards such as the Sensor Observation Service (SOS), SensorThings API as well as Observations and Measurements (O&M) of the Open Geospatial Consortium (OGC).

As a result, COS4CLOUD will not only facilitate the collection of Citizen Observatory data by reducing the work necessary to set-up a corresponding IT infrastructure. It will also support the exchange and integration of Citizen Observatory data between different projects as well as the integration with other authoritative data sources. This shall increase the sustainability of data collection efforts as Citizen Science data may be used as input for many data analysis processes beyond the project that originally collected the data.

How to cite: Bredel, H., Jirka, S., Masó Pau, J., and Piera, J.: Design and Development of Interoperable Cloud Sensor Services to Support Citizen Science Projects, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13338, https://doi.org/10.5194/egusphere-egu2020-13338, 2020.

EGU2020-14903 | Displays | ESSI2.19

Providing a user-friendly outlier analysis service implemented as open REST API

Doron Goldfarb, Johannes Kobler, and Johannes Peterseil

As outliers in any data set may have detrimental effects on further scientific analysis, the measurement of any environmental parameter and the detection of outliers within these data are closely linked. However, outlier analysis is complicated, as the definition of an outlier is controversially discussed and thus - until now - vague. Nonetheless, multiple methods have been implemented to detect outliers in data sets. The application of these methods often requires some statistical know-how.

The present use case, developed as a proof-of-concept implementation within the EOSC-Hub project, is dedicated to providing a user-friendly outlier analysis web service via an open REST API, processing environmental data either provided via a Sensor Observation Service (SOS) or stored as data files in a cloud-based data repository. It is driven by an R script performing the different operation steps consisting of data retrieval, outlier analysis and final data export. To cope with the vague definition of an outlier, the outlier analysis step applies numerous statistical methods implemented in various R packages.

The web service encapsulates the R script behind a REST API which is described by a dedicated OpenAPI specification defining two distinct access methods (i.e. SOS- and file-based) and the required parameters to run the R script. This formal specification is subsequently used to automatically generate a server stub based on the Python Flask framework, which is customized to execute the R script on the server whenever an appropriate web request arrives. The output is currently collected in a ZIP file which is returned after each successful web request. The service prototype is designed to be operated using generic resources provided by the European Open Science Cloud (EOSC) and the European Grid Initiative (EGI) in order to ensure sustainability and scalability.
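
A reduced sketch of such a server stub wrapping an R script could look like this; the endpoint name, parameters and script path are placeholders and do not correspond to the actual EOSC-Hub service.

```python
import subprocess
import tempfile
from pathlib import Path

from flask import Flask, jsonify, request, send_file

app = Flask(__name__)
R_SCRIPT = "outlier_analysis.R"   # placeholder path to the analysis script

@app.route("/outliers/file", methods=["POST"])
def outliers_from_file():
    """Accept an uploaded data file, run the R script and return a ZIP archive."""
    upload = request.files["data"]
    workdir = Path(tempfile.mkdtemp())
    input_path = workdir / upload.filename
    upload.save(str(input_path))

    result_zip = workdir / "results.zip"
    proc = subprocess.run(
        ["Rscript", R_SCRIPT, str(input_path), str(result_zip)],
        capture_output=True, text=True,
    )
    if proc.returncode != 0:
        return jsonify({"error": proc.stderr}), 500
    return send_file(str(result_zip), as_attachment=True)

if __name__ == "__main__":
    app.run(port=8080)
```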

Due to its user-friendliness and open availability, the presented web-service will facilitate access to standardized and scientifically-based outlier analysis methods not only for individual scientists but also for networks and research infrastructures like eLTER. It will thus contribute to the standardization of quality control procedures for data provision in distributed networks of data providers.

 

Keywords: quality assessment, outlier detection, web service, REST-API, eLTER, EOSC, EGI, EOSC-Hub

How to cite: Goldfarb, D., Kobler, J., and Peterseil, J.: Providing a user-friendly outlier analysis service implemented as open REST API, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-14903, https://doi.org/10.5194/egusphere-egu2020-14903, 2020.

Hydrological analyses generally require information from locations across a river system, and knowledge of how these locations are linked within that system. Hydrological monitoring data, e.g. from sensors or samples of river flow status and water quality, and datasets on factors influencing this status, e.g. sewage treatment inputs, riparian land use, lakes, abstractions, etc., are increasingly available as open datasets, sometimes via web-based APIs. However, retrieving information based on location within the river system, whether for data discovery or for direct analysis, is complex, and is therefore not a common feature of APIs for hydrological data.

We demonstrate an approach to extracting datasets based on river connectivity, using a digital river network for the UK converted to a directed graph and the Python NetworkX package. This approach enables very rapid identification of upstream and downstream reaches and features for sites of interest, with speeds suitable for on-the-fly analysis. We describe how such an approach could be deployed within an API for data discovery and data retrieval, and demonstrate linking data availability information, capturing observed properties and time series metadata from large sensor networks, in a JSON-LD format based on concepts drawn from SSN/SOSA and INSPIRE EMF. This approach has been applied to identify up- and downstream water quality monitoring sites for lakes within the UK Lakes Database for nutrient retention analysis, and to produce hierarchical datasets of river flow gauging stations to aid network understanding.
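
The core graph operation can be sketched with NetworkX as follows; the toy reach identifiers are invented and edges point downstream.

```python
import networkx as nx

# Toy river network: edges point downstream (reach -> next reach).
G = nx.DiGraph()
G.add_edges_from([
    ("headwater_A", "reach_1"), ("headwater_B", "reach_1"),
    ("reach_1", "reach_2"), ("tributary_C", "reach_2"),
    ("reach_2", "outlet"),
])

site = "reach_2"
upstream = nx.ancestors(G, site)      # every reach draining to the site
downstream = nx.descendants(G, site)  # every reach the site drains to

print(sorted(upstream))    # ['headwater_A', 'headwater_B', 'reach_1', 'tributary_C']
print(sorted(downstream))  # ['outlet']
```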

How to cite: Fry, M. and Rosecký, J.: Graph-based river network analysis for rapid discovery and analysis of linked hydrological data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17318, https://doi.org/10.5194/egusphere-egu2020-17318, 2020.

EGU2020-19393 | Displays | ESSI2.19

SIMILE: An integrated monitoring system to understand, protect and manage sub-alpine lakes and their ecosystem

Daniele Strigaro, Massimiliano Cannata, Fabio Lepori, Camilla Capelli, Michela Rogora, and Maria Brovelli

Lakes are an invaluable natural and economic resource for the Insubric area, the geographical region between the Po River (Lombardy, Italy) and Monte Ceneri (Ticino, Switzerland). However, increased anthropic activity and climate change impacts are more and more threatening the health of these resources. In this context, universities and local administrations of the two regions that share the trans-boundary lakes joined their efforts and started a project, named SIMILE, to develop a system for monitoring the lakes’ status, providing updated and continuous information to support the management of the lakes. This project results from a pluriannual collaboration between the two countries, Switzerland and Italy, formalized in the CIPAIS commission (www.cipais.org). The aim is to introduce an innovative information system based on the combination of an advanced automatic and continuous observation system, high-resolution remote sensing data processing, citizen science, and ecological and physical models. The project will capitalize on the knowledge and experience of the resource managers with the creation of a Business Intelligence platform based on several interoperable geospatial web services. The use of open software and data will facilitate its adoption and will contribute to keeping the costs adequately limited. The project, started a few months ago, is presented and discussed here.

How to cite: Strigaro, D., Cannata, M., Lepori, F., Capelli, C., Rogora, M., and Brovelli, M.: SIMILE: An integrated monitoring system to understand, protect and manage sub-alpine lakes and their ecosystem, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19393, https://doi.org/10.5194/egusphere-egu2020-19393, 2020.

EGU2020-19453 | Displays | ESSI2.19

Accessing environmental time series data in R from Sensor Observation Services with ease

Daniel Nüst, Eike H. Jürrens, Benedikt Gräler, and Simon Jirka

Time series data of in-situ measurements is the key to many environmental studies. The first challenge in any analysis typically arises when the data needs to be imported into the analysis framework. Standardisation is one way to lower this burden. Unfortunately, relevant interoperability standards might be challenging for non-IT experts as long as they are not dealt with behind the scenes of a client application. One standard to provide access to environmental time series data is the Sensor Observation Service (SOS) specification published by the Open Geospatial Consortium (OGC). SOS instances are currently used in a broad range of applications such as hydrology, air quality monitoring, and ocean sciences. Data sets provided via an SOS interface can be found around the globe from Europe to New Zealand.

The R package sos4R (Nüst et al., 2011) is an extension package for the R environment for statistical computing and visualization, which has been demonstrated to be a powerful tool for conducting and communicating geospatial research (cf. Pebesma et al., 2012). sos4R comprises a client that can connect to an SOS server. The user can use it to query data from SOS instances using simple R function calls. It provides a convenience layer for R users to integrate observation data from data access servers compliant with the SOS standard without any knowledge about the underlying technical standards. To further improve the usability for non-SOS experts, a recent update to sos4R includes a set of wrapper functions, which remove complexity and technical language specific to OGC specifications. This update also features specific consideration of the OGC SOS 2.0 Hydrology Profile and thereby opens up a new scientific domain.

In our presentation we illustrate use cases and examples building upon sos4R that ease the access of time series data in an R and Shiny context. We demonstrate how the abstraction provided by the client library makes sensor observation data more accessible, and further show how sos4R allows the seamless integration of distributed observation data, i.e., across organisational boundaries, into transparent and reproducible data analysis workflows.

References

Nüst D., Stasch C., Pebesma E. (2011) Connecting R to the Sensor Web. In: Geertman S., Reinhardt W., Toppen F. (eds) Advancing Geoinformation Science for a Changing World. Lecture Notes in Geoinformation and Cartography, Springer.

Pebesma, E., Nüst, D., & Bivand, R. (2012). The R software environment in reproducible geoscientific research. Eos, Transactions American Geophysical Union, 93(16), 163–163.

How to cite: Nüst, D., Jürrens, E. H., Gräler, B., and Jirka, S.: Accessing environmental time series data in R from Sensor Observation Services with ease, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19453, https://doi.org/10.5194/egusphere-egu2020-19453, 2020.

EGU2020-21575 | Displays | ESSI2.19

Flood Monitoring using ACube - An Austrian Data Cube Solution

Claudio Navacchi, Bernhard Bauer-Marschallinger, and Wolfgang Wagner

Geospatial data come in various formats and originate from different sensors and data providers. This poses a challenge to users when aiming to combine or simultaneously access them. To overcome these obstacles, an easy-to-use data cube solution was designed for the Austrian user community and gathers various relevant and near real-time datasets. Here we show how such a system can be used for flood monitoring. 

In 2018, a joint project between the Earth Observation Data Centre for Water Resource Monitoring (EODC), TU Wien and BOKU led to the emergence of the Austrian Data Cube (ACube). ACube implements the generic Python software from the Open Data Cube, but further tailors it to the national needs of Austrian ministries, universities and smaller companies. With user-driven input coming from all these partners, datasets and metadata attributes have been defined to facilitate query operations and data analysis. A focus was put on high-resolution remote sensing data from the Copernicus programme. This includes C-band radar backscatter, various optical bands, Surface Soil Moisture (SSM), Normalized Difference Vegetation Index (NDVI), Leaf Area Index (LAI), Fraction of Absorbed Photosynthetically Active Radiation (fAPAR), and monthly composites with pixel spacings varying between 10 and 500 m. Static data like a digital elevation model (DEM), i.e. the EU-DEM, also reside next to the aforementioned dynamic datasets. Moreover, ACube offers different possibilities for data visualisation through QGIS or JupyterHub and, most importantly, enables access to a High Performance Computing (HPC) environment connected to petabyte-scale storage.
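
Querying such a cube typically follows the Open Data Cube pattern sketched below; the product name, spatial extent and resolution are hypothetical and not actual ACube identifiers.

```python
import datacube

dc = datacube.Datacube(app="acube-example")

# Hypothetical product name and spatial/temporal extent over eastern Austria.
ds = dc.load(
    product="s1_backscatter_example",
    x=(16.0, 16.5), y=(48.0, 48.4),          # longitude / latitude extent
    time=("2019-05-01", "2019-05-31"),
    output_crs="EPSG:32633", resolution=(-10, 10),
)
print(ds)
```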

The ACube, as a centralised platform and interface to high-resolution datasets, prepares the ground for many applications, e.g., land cover classification, snow melt monitoring, grassland yield estimation, landslide and flood detection. With a focus on the latter use case, first analyses based on Sentinel-1 radar backscatter data have already shown promising results. A near-real-time fusion of radar, optical and ancillary data (DEM, land cover, etc.) through machine learning techniques could further improve the indication of flood events. Building a dedicated web service, relying on the latest data and the HPC environment in the background, is foreseen as an upcoming action. Such an emergency service would provide much potential for authorities and users to assess damages and to determine vulnerability to progressing flooding.


The study received funding from the cooperative R&D FFG ASAP 14 project 865999 "Austrian Data Cube".

How to cite: Navacchi, C., Bauer-Marschallinger, B., and Wagner, W.: Flood Monitoring using ACube - An Austrian Data Cube Solution , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21575, https://doi.org/10.5194/egusphere-egu2020-21575, 2020.

EGU2020-21714 | Displays | ESSI2.19

Evolution of data infrastructure for effective integration and management of environmental and ecosystem data

Siddeswara Guru, Gerhard Weis, Wilma Karsdorp, Andrew Cleland, Jenny Mahuika, Edmond Chuc, Javier Sanchez Gonzalez, and Mosheh Eliyahu

The Terrestrial Ecosystem Research Network (TERN) is Australia's national research infrastructure to observe, monitor and support the study and forecasting of continental-scale ecological changes. TERN data are classified under two themes: Ecology and Biogeophysical.

The Ecology theme relates predominantly to plot-based ecological observations conducted as one-off or repeated surveys and sensor-based measurements. The Biogeophysical theme comprises point-based time-series eddy-covariance micrometeorological measurements from flux towers, and continental- and regional-scale gridded data products related to remote sensing, soil and landscape ecology.

Integrating and querying data from different data sources is complicated. Furthermore, the advancement of technology has transformed the mode of data collection. For instance, mobile sensors (drones) of various sizes are increasingly used to sample the environment. The user-centric data handling mechanisms of different types of datasets are dissimilar, requiring heterogeneous data management practices alongside easy access to data for users, bundled with tools and platforms to interrogate, access, analyse and share analysis pipelines.

TERN is developing a data e-infrastructure to support holistic capabilities that not only store, curate and distribute data, but also enable processing based on user needs, link consistent data to various analysis tools and pipelines, and support the acquisition of data skills. The infrastructure will allow collaboration with other national and international data infrastructures and ingest data from partners, including state and federal government institutes, by adopting domain standards for metadata and data management and publication.

For effective management of plot-based ecology data, we have developed an ontology based on O&M and the Semantic Sensor Network Ontology, with an extension to support basic concepts of ecological sites and sampling. In addition, controlled vocabularies for observed properties and observation procedures, together with standard lists for taxa, geology, soils, etc., will supplement the ontology.

The biogeophysical data are managed using domain standards for data and metadata management. Each of the data products is represented in a standard file format and hosted via OGC-standard web services. All datasets are described and catalogued using ISO standards. An overarching discovery portal allows users to search, access and interact with data collections. Users can interact with data at the collection level, on a spatial map, and via web services and Application Programming Interfaces (APIs).

TERN has also developed a cloud-based virtual desktop environment, CoESRA, accessible from a web browser to enable easy access to the computing platform with tools for the ecosystem science community. The advantage is that it allows access to all TERN data in a compute environment for performing analysis and synthesis activities from a single managed platform.

How to cite: Guru, S., Weis, G., Karsdorp, W., Cleland, A., Mahuika, J., Chuc, E., Sanchez Gonzalez, J., and Eliyahu, M.: Evolution of data infrastructure for effective integration and management of environmental and ecosystem data , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21714, https://doi.org/10.5194/egusphere-egu2020-21714, 2020.

ESSI3.1 – Free and Open Source Software (FOSS) and Cloud-based Technologies to Facilitate Collaborative Science

EGU2020-4203 | Displays | ESSI3.1

Open Source Platform for Federated Spatiotemporal Analysis

Thomas Huang

In recent years, NASA has invested significantly in developing an Analytics Center Framework (ACF) to encapsulate scalable computational and data infrastructures and to harmonize data, tools and computing resources for scientific investigations. Since 2017, Apache’s Science Data Analytics Platform (SDAP) (https://sdap.apache.org) has been adopted by various NASA-funded projects, including the NASA Sea Level Change Portal, the GRACE and GRACE-FO missions, and the CEOS Ocean Variables Enabling Research and Applications for GEO (COVERAGE) Initiative. While much of the existing approach to Earth science analysis focuses on collocating all the relevant data under one cloud-based system, this open source platform empowers global data centers to take a federated analytics approach. With the growing community of SDAP centers, it is now possible for researchers to interactively analyze observational and model data hosted at different centers without having to collocate or download the data to their own computing environment. This talk discusses the application of this professional open source big data analytics platform to establish a growing community of SDAP-based ACFs that enable distributed spatiotemporal analysis from any platform, using any programming language.
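
A minimal sketch of how a researcher might query a federated, SDAP-style analysis service over HTTP follows; the base URL, endpoint path, dataset name and parameter names are assumptions for illustration only, not the documented SDAP API (see https://sdap.apache.org for the actual interface).

```python
import requests

# Hypothetical SDAP-style time-series request; all names below are illustrative.
BASE_URL = "https://sdap.example.org/nexus"
params = {
    "ds": "sea_surface_temperature",          # example dataset short name
    "minLat": 30, "maxLat": 45,
    "minLon": -80, "maxLon": -60,
    "startTime": "2019-01-01T00:00:00Z",
    "endTime": "2019-12-31T23:59:59Z",
}
response = requests.get(f"{BASE_URL}/timeSeriesSpark", params=params, timeout=120)
response.raise_for_status()
print(response.json())
```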

How to cite: Huang, T.: Open Source Platform for Federated Spatiotemporal Analysis, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-4203, https://doi.org/10.5194/egusphere-egu2020-4203, 2020.

EGU2020-20868 | Displays | ESSI3.1

An Interoperable Low-Code Modelling Framework for Integrated Spatial Modelling

Alexander Herzig, Jan Zoerner, John Dymond, Hugh Smith, and Chris Phillips

An Interoperable Low-Code Modelling Framework for Integrated Spatial Modelling


Alexander Herzig, Jan Zoerner, John Dymond, Hugh Smith, Chris Phillips
Manaaki Whenua – Landcare Research New Zealand


Modelling complex environmental systems, such as earth surface processes, requires the representation and quantification of multiple individual but connected processes. In the Smarter Targeting Erosion Control (STEC) research programme, we are looking to improve understanding of where erosion occurs, how much and what type of sediment is produced and by which processes, how sediment moves through catchments, and how erosion and sediment transport can be targeted and mitigated cost-effectively. Different research groups involved in the programme will develop different model components representing different processes. To be able to assess the impact of sediment on water quality attributes in the river and to develop effective erosion control measures, the individual models need to be integrated into a composite model.
In this paper we focus on the technical aspects and seamless integration of individual model components, utilising the Basic Model Interface (BMI, Peckham et al. 2013) as an interoperability standard, and on the extension of the LUMASS spatial modelling environment into a BMI-compliant model coupling framework. LUMASS provides a low-code visual development environment for building complex hierarchical system dynamics models that can be run in HPC environments and supports sequential and parallel processing of large datasets. Each model developed in the framework can be exposed to other models and frameworks through the BMI-compliant LUMASS engine, without requiring any additional programming, thus greatly simplifying the development of interoperable model components. Here, we concentrate on the integration of BMI-compliant external model components and how they are coupled into the overall model structure.
In the STEC programme, we use LUMASS both for the implementation of model components representing individual soil erosion processes, such as landslides, earthflows and surficial erosion, and for the integration (i.e. coupling) of other (external) BMI-compliant model components into a composite model. Using available (prototype) models, we will demonstrate how LUMASS' visual development environment can be used to build interoperable, integrated component models with very little coding required.
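
The runnable sketch below illustrates the coupling pattern that BMI enables: a caller drives two components through initialize/update/finalize and exchanges one variable per time step. The ToyBmiComponent class and the variable names are stand-ins for illustration only, not LUMASS or STEC components, and only the subset of BMI methods used in the loop is implemented.

```python
import numpy as np


class ToyBmiComponent:
    """Minimal stand-in for a BMI-compliant component (illustrative only)."""

    def __init__(self, var_name, n_cells=100, dt=1.0, t_end=10.0):
        self._values = {var_name: np.zeros(n_cells)}
        self._time, self._dt, self._t_end = 0.0, dt, t_end

    def initialize(self, config_file):          # config file ignored in this toy
        self._time = 0.0

    def update(self):
        for v in self._values.values():
            v += np.random.rand(v.size) * self._dt   # fake process dynamics
        self._time += self._dt

    def finalize(self):
        pass

    def get_current_time(self):
        return self._time

    def get_end_time(self):
        return self._t_end

    def get_value(self, name, dest):             # BMI 2.0 style: fill a caller-owned buffer
        dest[:] = self._values[name]
        return dest

    def set_value(self, name, values):
        self._values[name][:] = values


# Couple two components by exchanging one variable each time step.
erosion = ToyBmiComponent("sediment__erosion_flux")
transport = ToyBmiComponent("sediment__input_flux")
erosion.initialize("erosion.cfg")
transport.initialize("transport.cfg")

flux = np.empty(100)
while erosion.get_current_time() < erosion.get_end_time():
    erosion.update()
    erosion.get_value("sediment__erosion_flux", flux)
    transport.set_value("sediment__input_flux", flux)
    transport.update()

erosion.finalize()
transport.finalize()
```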

Peckham SD, Hutton EWH, Norris B 2013. A component-based approach to integrated modelling in the geosciences: The design of CSDMS. Computers & Geosciences 53: 3–12. http://dx.doi.org/10.1016/j.cageo.2012.04.002

LUMASS: https://bitbucket.org/landcareresearch/lumass 

How to cite: Herzig, A., Zoerner, J., Dymond, J., Smith, H., and Phillips, C.: An Interoperable Low-Code Modelling Framework for Integrated Spatial Modelling, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20868, https://doi.org/10.5194/egusphere-egu2020-20868, 2020.

EGU2020-11674 | Displays | ESSI3.1

Unlocking modern nation-scale LiDAR datasets with FOSS – the Laserchicken framework

Meiert W. Grootes, Christiaan Meijer, Zsofia Koma, Bouwe Andela, Elena Ranguelova, and W. Daniel Kissling

LiDAR, as a remote sensing technology enabling the rapid 3D characterization of an area from an air- or spaceborne platform, has become a mainstream tool in the (bio)geosciences and related disciplines. For instance, LiDAR-derived metrics are used for characterizing vegetation type, structure, and prevalence and are widely employed across ecosystem research, forestry, and ecology/biology. Furthermore, these types of metrics are key candidates in the quest for Essential Biodiversity Variables (EBVs) suited to quantifying habitat structure, reflecting the importance of this property in assessing and monitoring the biodiversity of flora and fauna, and consequently in informing policy to safeguard it in the light of climate change and human impact.

In all these use cases, the power of LiDAR point cloud datasets resides in the information encoded within the spatial distribution of LiDAR returns, which can be extracted by calculating domain-specific statistical/ensemble properties of well-defined subsets of points.  

Facilitated by technological advances, the volume of point cloud data sets provided by LiDAR has steadily increased, with modern airborne laser scanning surveys now providing high-resolution, (super-)national scale datasets, tens to hundreds of terabytes in size and encompassing hundreds of billions of individual points, many of which are available as open data.

Representing a trove of data and, for the first time, enabling the study of ecosystem structure at meter resolution over the extent of tens to hundreds of kilometers, these datasets represent highly valuable new resources. However, their scientific exploitation is hindered by the scarcity of Free Open Source Software (FOSS) tools capable of handling the challenges of accessing, processing, and extracting meaningful information from massive multi-terabyte datasets, as well as by the domain-specificity of any existing tools.

Here we present Laserchicken, a FOSS, user-extendable, cross-platform Python tool for extracting user-defined statistical properties of flexibly defined subsets of point cloud data, aimed at enabling efficient, scalable, and distributed processing of multi-terabyte datasets. Laserchicken can be seamlessly employed on computing architectures ranging from desktop systems to distributed clusters, and supports standard point cloud and geo-data formats (LAS/LAZ, PLY, GeoTIFF, etc.), making it compatible with a wide range of (FOSS) tools for geoscience.
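
As a conceptual illustration only (this is not the Laserchicken API itself), the sketch below computes one typical neighbourhood metric, the standard deviation of return heights within a fixed radius of each target point, using numpy and scipy on synthetic data.

```python
import numpy as np
from scipy.spatial import cKDTree

# Synthetic point cloud: x, y, z in metres; targets are x, y locations of interest.
rng = np.random.default_rng(42)
points = rng.random((100_000, 3)) * [1000.0, 1000.0, 30.0]
targets = rng.random((500, 2)) * 1000.0

# Index the horizontal coordinates and gather all returns within 5 m of each target.
tree = cKDTree(points[:, :2])
neighbourhoods = tree.query_ball_point(targets, r=5.0)

# Example metric: standard deviation of heights per neighbourhood,
# a simple proxy for vertical vegetation structure.
std_z = np.array([points[idx, 2].std() if idx else np.nan for idx in neighbourhoods])
print(std_z[:10])
```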

The Laserchicken feature extraction tool is complemented by a FOSS Python processing pipeline tailored to the scientific exploitation of massive nation-scale point cloud datasets, together forming the Laserchicken framework.

The ability of the Laserchicken framework to unlock nation-scale LiDAR point cloud datasets is demonstrated on the basis of its use in the eEcoLiDAR project, a collaborative project between the University of Amsterdam and the Netherlands eScience Center. Within the eEcoLiDAR project, Laserchicken has been instrumental in defining classification methods for wetland habitats, as well as in facilitating the use of high-resolution vegetation structure metrics in modelling species distributions at national scales, with preliminary results highlighting the importance of including this information.

The Laserchicken Framework rests on FOSS, including the GDAL and PDAL libraries as well as numerous packages hosted on the open source Python Package Index (PyPI), and is itself also available as FOSS (https://pypi.org/project/laserchicken/ and https://github.com/eEcoLiDAR/ ).

How to cite: Grootes, M. W., Meijer, C., Koma, Z., Andela, B., Ranguelova, E., and Kissling, W. D.: Unlocking modern nation-scale LiDAR datasets with FOSS – the Laserchicken framework, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11674, https://doi.org/10.5194/egusphere-egu2020-11674, 2020.

EGU2020-10341 | Displays | ESSI3.1

NetCDF in the Cloud: modernizing storage options for the netCDF Data Model with Zarr

Ward Fisher and Dennis Heimbigner

NetCDF has historically offered two different storage formats for the netCDF data model: files based on the original netCDF binary format, and files based on the HDF5 format. While this has proven effective in the past for traditional disk storage, it is less efficient for modern cloud-focused technologies such as those provided by Amazon S3, Microsoft Azure, IBM Cloud Object Storage, and other cloud service providers. As with the decision to base the netCDF Extended Data Model and File Format on the HDF5 technology, we do not want to reinvent the wheel when it comes to cloud storage. There are a number of existing technologies that the netCDF team can use to implement native object storage capabilities. Zarr enjoys broad popularity within the Unidata community, particularly among our Python users. By integrating support for the latest Zarr specification (while not locking ourselves into a specific version), we will be able to provide the broadest support for data written by other software packages that use it.
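
For illustration, the snippet below writes a small gridded dataset to a Zarr store with xarray and zarr; it demonstrates the chunked, cloud-friendly object layout that motivates this work, rather than the netCDF-C Zarr interface itself.

```python
import numpy as np
import xarray as xr

# Build a small example dataset with CF-like dimensions.
ds = xr.Dataset(
    {"t2m": (("time", "lat", "lon"), np.random.rand(10, 90, 180).astype("float32"))},
    coords={
        "time": np.arange(10),
        "lat": np.linspace(-89.0, 89.0, 90),
        "lon": np.linspace(-179.0, 179.0, 180),
    },
)
ds.to_zarr("example_t2m.zarr", mode="w")      # each chunk becomes a separate object
reopened = xr.open_zarr("example_t2m.zarr")   # lazy, chunk-wise access on read
print(reopened)
```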

How to cite: Fisher, W. and Heimbigner, D.: NetCDF in the Cloud: modernizing storage options for the netCDF Data Model with Zarr, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10341, https://doi.org/10.5194/egusphere-egu2020-10341, 2020.

EGU2020-10939 | Displays | ESSI3.1

ERDA: External Version-Controlled Research Data

Willi Rath, Carsten Schirnick, and Claas Faber

This presentation will detail the design, implementation, and operation of ERDA, which is a collection of external version-controlled research datasets, of multiple synchronized deployments of the data, of a growing set of minimal examples using the datasets from various deployments, of stand-alone tools to create, maintain, and deploy new datasets, and of documentation targeting different audiences (users, maintainers, developers).

ERDA was designed with the following principles in mind: Provide clear data provenance and ensure long-term availability, minimize effort for adding data and make all contents available to all users immediately, ensure unambiguous referencing and develop transparent versioning conventions, embrace mobility of scientists and target independence from the infrastructure of specific institutions.

The talk will show how the data management is done with Git-LFS, demonstrate how data repositories are rendered from human-readable data, and give an overview of the versioning scheme that is applied.

How to cite: Rath, W., Schirnick, C., and Faber, C.: ERDA: External Version-Controlled Research Data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10939, https://doi.org/10.5194/egusphere-egu2020-10939, 2020.

EGU2020-16661 | Displays | ESSI3.1

Event-driven Processing of Earth Observation Data

Matthes Rieke, Sebastian Drost, Simon Jirka, and Arne Vogt

Earth Observation data have become available in continuously increasing quality as well as spatial and temporal coverage. To deal with the massive amounts of data, the WaCoDiS project aims at developing an architecture that allows their automated processing. The project focuses on the development of innovative water management analytics services based on Earth Observation data such as that provided by the Copernicus Sentinel missions. The goal is to improve hydrological modelling, including but not limited to: a) identification of the catchment areas responsible for pollutant and sediment inputs; b) detection of turbidity sources in water bodies and rivers. The central contribution is a system architecture design following the microservice architecture pattern: small components fulfil different tasks and responsibilities (e.g. managing processing jobs, data discovery, process scheduling and execution). In addition, processing algorithms, which are encapsulated in Docker containers, can be easily integrated using the OGC Web Processing Service interface. The orchestration of the different components builds a fully functional ecosystem that is ready for deployment on single machines as well as on cloud infrastructures such as a Copernicus DIAS node or commercial cloud environments (e.g. Google Cloud Platform, Amazon Web Services). All components are encapsulated within Docker containers.

The different components are loosely coupled and react to messages and events which are published on a central message broker component. This allows flexible scaling and deployment of the system; for example, the management components can run in physically different locations than the processing algorithms. Thus, the system reduces manual work (e.g. identification of relevant input data, execution of algorithms) and minimizes the required interaction of domain users. Once a processing job is registered within the system, the user can track its status (e.g. when it was last executed, whether an error occurred) and will eventually be informed when new processing results are available.
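
The sketch below illustrates this message-driven pattern with a RabbitMQ client (pika): a processing microservice subscribes to the broker and reacts when new input data are announced. The broker host, exchange and routing key are hypothetical, not actual WaCoDiS identifiers.

```python
import pika

# Connect to the (hypothetical) central message broker.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="broker.example.org"))
channel = connection.channel()
channel.exchange_declare(exchange="eo-events", exchange_type="topic")

# Each microservice binds its own queue to the events it is interested in.
queue = channel.queue_declare(queue="", exclusive=True).method.queue
channel.queue_bind(exchange="eo-events", queue=queue, routing_key="data.available.sentinel2")


def on_new_data(ch, method, properties, body):
    # A processing microservice would trigger its containerised algorithm here,
    # e.g. by submitting a WPS Execute request for the newly available scene.
    print("New input data event:", body)


channel.basic_consume(queue=queue, on_message_callback=on_new_data, auto_ack=True)
channel.start_consuming()
```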

In summary, this work aims to develop a system that allows the automated and event-driven creation of Earth Observation products. It is suitable to run on Copernicus DIAS nodes or on dedicated environments such as a Kubernetes cluster.

In our contribution, we will present the event-driven processing workflows within the WaCoDiS system that enable the automation of water-management-related analytics services. In addition, we will focus on architectural details of the microservice-oriented system design and discuss different deployment options.

How to cite: Rieke, M., Drost, S., Jirka, S., and Vogt, A.: Event-driven Processing of Earth Observation Data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-16661, https://doi.org/10.5194/egusphere-egu2020-16661, 2020.

EGU2020-19989 | Displays | ESSI3.1

MAAP: The Mission Algorithm and Analysis Platform: A New Virtual and Collaborative Environment for the Scientific Community

Clement Albinet, Sebastien Nouvellon, Björn Frommknecht, Roger Rutakaza, Sandrine Daniel, and Carine Saüt

The ESA-NASA multi-Mission Algorithm and Analysis Platform (MAAP) is dedicated to the BIOMASS [1], NISAR [2] and GEDI [3] missions. This analysis platform will be a virtual open and collaborative environment. The main goal is to bring together data centres (Earth Observation and non-Earth Observation data), computing resources and hosted processing in order to better address the needs of scientists and federate the scientific community.

The MAAP will provide functions to access data and metadata from different sources, such as Earth observation satellite data from science missions; visualisation functions to display the results of the system processing (trends, graphs, maps ...) and the results of statistical and analysis tools; collaborative functions to share data, algorithms and ideas between MAAP users; and processing functions, including development environments and an orchestration system for creating and running processing chains from official algorithms.

Currently, the MAAP is in its pilot phase. The architecture for the MAAP pilot foresees two independent elements, one developed by ESA, one developed by NASA, unified by a common user entry point. Both elements will be deployed on Cloud infrastructures. Interoperability between the elements is envisaged for data discovery, data access and identity and access management.

The ESA element architecture is based on technical solutions including: microservices, Docker images and Kubernetes; cloud-based virtual development environments (such as Jupyter or Eclipse Che) for MAAP algorithm developers; and a framework to create, run and monitor chains of algorithms containerised as Docker images. Interoperability between the ESA and NASA elements will be based on CMR (NASA Common Metadata Repository) and on services based on OGC standards (such as WMS/WMTS, WCS and WPS), secured with the OAuth2 protocol.
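
As an illustration of the metadata layer referenced above, the snippet below runs a simple collection search against NASA's public Common Metadata Repository (CMR); the keyword and page size are only an example query, and the response fields shown are those of the CMR JSON search format as we understand it.

```python
import requests

# Search CMR for collections matching a keyword.
response = requests.get(
    "https://cmr.earthdata.nasa.gov/search/collections.json",
    params={"keyword": "GEDI", "page_size": 5},
    timeout=60,
)
response.raise_for_status()
for entry in response.json()["feed"]["entry"]:
    print(entry.get("short_name"), "-", entry.get("title"))
```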

This presentation focuses on the pilot platform and how interoperability between the NASA and ESA elements will be achieved. It also gives insight into the architecture of the ESA element and the technical implementation of this virtual environment. Finally, it presents the very first achievements of the pilot platform and the lessons learned from it.

 

REFERENCES

[1] T. Le Toan, S. Quegan, M. Davidson, H. Balzter, P. Paillou, K. Papathanassiou, S. Plummer, F. Rocca, S. Saatchi, H. Shugart and L. Ulander, “The BIOMASS Mission: Mapping global forest biomass to better understand the terrestrial carbon cycle”, Remote Sensing of Environment, Vol. 115, No. 11, pp. 2850-2860, June 2011.

[2] P.A. Rosen, S. Hensley, S. Shaffer, L. Veilleux, M. Chakraborty, T. Misra, R. Bhan, V. Raju Sagi and R. Satish, "The NASA-ISRO SAR mission - An international space partnership for science and societal benefit", IEEE Radar Conference (RadarCon), pp. 1610-1613, 10-15 May 2015.

[3] https://science.nasa.gov/missions/gedi

How to cite: Albinet, C., Nouvellon, S., Frommknecht, B., Rutakaza, R., Daniel, S., and Saüt, C.: MAAP: The Mission Algorithm and Analysis Platform: A New Virtual and Collaborative Environment for the Scientific Community, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19989, https://doi.org/10.5194/egusphere-egu2020-19989, 2020.

EGU2020-10915 | Displays | ESSI3.1

Structural features extracted from voxelised full-waveform LiDAR using the open source software DASOS for detecting dead standing trees

Milto Miltiadou, Maria Prodromou, Athos Agapiou, and Diofantos G. Hadjimitsis

DASOS is an open source software package developed by the authors of this abstract to support the usage of full-waveform (FW) LiDAR data. Traditional LiDAR systems record only a few peak point returns, while FW LiDAR systems digitize the entire backscattered signal returned to the instrument into discrete waveforms. Each waveform consists of a set of equally spaced waveform samples. Extracting peak points from waveforms reduces the data volume and allows them to be embedded into existing workflows; nevertheless, this approach discretizes the data. In recent studies, voxelization of FW LiDAR data has been used increasingly. The open source software DASOS uses voxelization for the interpretation of FW LiDAR data and has four main functionalities: (1) extraction of 2D metrics, e.g. height and density; (2) reconstruction of 3D polygonal meshes from the data; (3) alignment with hyperspectral imagery for generating metrics aligned with the FW LiDAR data and coloured polygonal meshes; and (4) extraction of local features using 3D windows, e.g. the standard deviation of heights within a 3D window.
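
To illustrate the voxelisation concept (this is a conceptual sketch, not the DASOS implementation), the example below bins waveform samples into a regular 3-D grid with numpy and derives a per-voxel mean intensity.

```python
import numpy as np

# Synthetic waveform samples: columns are x, y, z (metres) and intensity.
rng = np.random.default_rng(0)
samples = np.column_stack([rng.random((1_000_000, 3)) * [100.0, 100.0, 40.0],
                           rng.random(1_000_000)])

# Map each sample to a voxel index on a regular grid.
origin = samples[:, :3].min(axis=0)
voxel_size = 1.0                                             # metres
idx = np.floor((samples[:, :3] - origin) / voxel_size).astype(int)

# Accumulate intensity sums and counts per voxel.
shape = tuple(idx.max(axis=0) + 1)
sums = np.zeros(shape)
counts = np.zeros(shape)
np.add.at(sums, tuple(idx.T), samples[:, 3])
np.add.at(counts, tuple(idx.T), 1)

# Per-voxel mean intensity (zero where a voxel received no samples).
mean_intensity = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
print(mean_intensity.shape)
```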

Here, we not only present the functionalities of DASOS but also show how the extraction of complex structural features from local areas (3D windows) can be used to improve forest inventories. In Southern Australia, dead trees play a substantial role in managing biodiversity since they are more likely to contain hollows and consequently shelter native, protected species. The study area is a native River Red Gum (Eucalyptus camaldulensis) forest. Eucalypt trees are difficult to delineate due to their irregular shapes and multiple trunk splits. Using field data, positive (dead standing trees) and negative (live trees) samples were defined, and for each sample multiple features were extracted using 3D windows from DASOS. With 3D object detection, it was shown that it is possible to detect dead standing trees without tree delineation. The study was further improved with the introduction of multi-scale 3D windows for categorizing trees according to their height and performing a three-pass detection, one for each size category. By cross-validating the results, it was shown that the multi-scale 3D-window approach further improved the detection of dead standing Eucalypt trees. The extraction of structural features using DASOS and the methodology implemented could be applied to further forest-related applications.

The project ‘FOREST’ (OPPORTUNITY/0916/0005) is co-financed by the European Regional Development Fund and the Republic of Cyprus through the Research Innovation Foundation.

How to cite: Miltiadou, M., Prodromou, M., Agapiou, A., and Hadjimitsis, D. G.: Structural features extracted from voxelised full-waveform LiDAR using the open source software DASOS for detecting dead standing trees , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10915, https://doi.org/10.5194/egusphere-egu2020-10915, 2020.

EGU2020-6897 | Displays | ESSI3.1

Application of Python Script in National Geographic Conditions Monitoring Project of China

Zhaokun Zhai, Jianjun Liu, and Yin Gao

Geographic conditions monitoring is an important mission in the geosciences. Its aim is to study, analyze and describe national conditions from a geographic point of view. The National Geographic Conditions Monitoring Project of China, based on remote sensing and geospatial information technology, has acquired large-scale and varied geographic data across China, such as remote sensing images, land cover information and geographic condition elements. The goal of this project is to build the National Geographic Conditions Monitoring Database, which is intended to offer reliable fundamental geoinformation for government decision-making. It plays an important role in natural resources supervision, environmental protection and emergency management, and it also contributes to the development of the geosciences. However, as China is such a huge country, a large quantity of data is produced by many institutions and companies, which makes it difficult to complete data quality checks manually before importing the data into the Oracle Spatial database. In addition, many institutions request data every year, which also takes a considerable amount of time.

Python is an open source programming language with a friendly, clear syntax that is easy to learn, and it offers a large number of standard and third-party libraries. Based on Python, we developed numerous scripts for this project. For geodatabase construction, we developed scripts to check the collected data, mainly including directory checks, structure checks, attribute checks and topology checks, to ensure the data are standardized and correct. Spatial analysis and statistical calculations can also be completed rapidly and accurately using Python scripts. For product supply, we also developed scripts that automatically distribute data from the database for any requested region.
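
A simplified sketch of such automated checks is shown below, using geopandas; the file name, the required attribute fields and the check logic are illustrative assumptions, not the project's actual schema or scripts.

```python
import geopandas as gpd

# Load one collected layer (hypothetical file name).
gdf = gpd.read_file("land_cover_patch.shp")

# Attribute check: required fields must be present (hypothetical field names).
required_fields = {"CC", "TAG", "AREA"}
missing = required_fields - set(gdf.columns)
if missing:
    print("Attribute check failed, missing fields:", sorted(missing))

# Basic topology check: flag invalid geometries.
invalid = gdf[~gdf.geometry.is_valid]
print(f"Topology check: {len(invalid)} invalid geometries")

# Structure check example: the coordinate reference system must be defined.
if gdf.crs is None:
    print("Structure check failed: coordinate reference system is undefined")
```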

Tools are critical to the progress and development of science. The application of Python scripts improves the efficiency of our work and helps ensure that the project is successfully completed on time every year. Geographic data covering the whole country are obtained, which contributes to economic and social development and to national strategic decision-making and planning. The source code of these scripts is public, which also helps to optimize and improve them. We believe open source software will play a greater role in the future, and geoscience will benefit as geographic data are processed and analyzed using open source software.

How to cite: Zhai, Z., Liu, J., and Gao, Y.: Application of Python Script in National Geographic Conditions Monitoring Project of China, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-6897, https://doi.org/10.5194/egusphere-egu2020-6897, 2020.

EGU2020-21619 | Displays | ESSI3.1

era5cli: The command line tool to download ERA5 data

Jaro Camphuijsen, Ronald van Haren, Yifat Dzigan, Niels Drost, Fakhareh Alidoost, Bouwe Andela, Jerom Aerts, Berend van Weel, Rolf Hut, and Peter Kalverla

With the release of the ERA5 dataset, worldwide high resolution reanalysis data became available with open access for public use. The Copernicus CDS (Climate Data Store) offers two options for accessing the data: a web interface and a Python API. Automated downloading of the data through the API, however, requires advanced knowledge of Python and a considerable amount of work. To make this process easier, we developed era5cli.

The command line interface tool era5cli enables automated downloading of ERA5 using a single command. All variables and options available in the CDS web form are available for download in an efficient way. Both the monthly and hourly datasets are supported. Besides automation, era5cli adds several useful functionalities to the download pipeline.

One of the key options in era5cli is to spread one download command over multiple CDS requests, resulting in higher download speeds. Files can be saved in both GRIB and netCDF format with automatic, yet customizable, file names. The `info` command lists the correct names of the available variables and the pressure levels for 3D variables. For debugging and testing, the `--dryrun` option can be selected to return only the CDS request. An overview of all available options, including instructions on how to configure your CDS account, is available in our documentation. Source code is available on https://github.com/eWaterCycle/era5cli.
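
For comparison, a single CDS request of the kind that era5cli automates can be expressed with the underlying cdsapi Python library roughly as follows; the variable and date selection are only an example, and a valid ~/.cdsapirc file with CDS credentials is assumed.

```python
import cdsapi

# One hourly ERA5 request against the Climate Data Store.
client = cdsapi.Client()
client.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "reanalysis",
        "variable": "2m_temperature",
        "year": "2019",
        "month": "01",
        "day": ["01", "02"],
        "time": ["00:00", "12:00"],
        "format": "netcdf",
    },
    "era5_t2m_jan2019.nc",
)
```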

In this PICO presentation we will provide an overview of era5cli, as well as a short introduction on how to use era5cli.

How to cite: Camphuijsen, J., van Haren, R., Dzigan, Y., Drost, N., Alidoost, F., Andela, B., Aerts, J., van Weel, B., Hut, R., and Kalverla, P.: era5cli: The command line tool to download ERA5 data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21619, https://doi.org/10.5194/egusphere-egu2020-21619, 2020.

EGU2020-8583 | Displays | ESSI3.1

Integrated field-agent based modelling using the LUE scientific data base

Oliver Schmitz, Kor de Jong, and Derek Karssenberg

The heterogeneous nature of environmental systems poses a challenge to researchers constructing environmental models. Many simulation models of integrated systems need to incorporate phenomena that are represented as spatially and temporally continuous fields as well as phenomena that are modelled as spatially and temporally bounded agents. Examples include moving animals (agents) interacting with vegetation (fields) or static water reservoirs (agents) as components of hydrological catchments (fields). However, phenomena bounded in space and time have particular properties mainly because they require representation of multiple (sometimes mobile) objects that each exist in a small subdomain of the space-time domain of interest. Moreover, these subdomains of objects may overlap in space and time such as interleaving branches due to tree crown growth. Efficient storage and access of different types of phenomena requires an approach that integrates representation of fields and objects in a single data model.

We develop the open-source LUE data model that explicitly stores and separates domain information, i.e. where phenomena exist in the space-time domain, and property information, i.e. what attribute value the phenomenon has at a particular space-time location, for a particular object. Notable functionalities are support for multiple spatio-temporal objects, time domains, objects linked to multiple space and time domains, and relations between objects. The design of LUE is based on the conceptual data model of de Bakker (2017) and implemented as a physical data model using HDF5 and C++ (de Jong, 2019). Our LUE data model is part of a new modelling language implemented in Python, allowing for operations accepting both fields and agents as arguments, and therefore resembling and extending the map algebra approach to field-agent modelling.
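
As a purely illustrative HDF5 layout, not the actual LUE physical data model, the sketch below uses h5py to show the separation of domain (where/when) and property (what) information for an agent phenomenon and a field phenomenon; all group and dataset names are hypothetical.

```python
import h5py
import numpy as np

rng = np.random.default_rng(1)
with h5py.File("field_agent_example.h5", "w") as f:
    # Agent phenomenon: mobile objects with a space-time track and a property.
    animals = f.create_group("phenomena/animals")
    animals.create_dataset("object_id", data=np.arange(5))
    animals.create_dataset("domain/location", data=rng.random((5, 10, 2)))   # x, y per time step
    animals.create_dataset("property/body_mass", data=rng.random((5, 10)))

    # Field phenomenon: a continuous raster property over a fixed spatial extent.
    vegetation = f.create_group("phenomena/vegetation")
    vegetation.create_dataset("domain/extent", data=[0.0, 0.0, 1000.0, 1000.0])
    vegetation.create_dataset("property/biomass", data=rng.random((10, 200, 200)))
```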

We present the conceptual and physical data models and illustrate their usage by implementing a spatial agent-based model simulating changes in human nutrition. We thereby consider the interaction between personal demand and the supply of healthy food by nearby stores, as well as the influence of the agents' social networks.


References:

de Bakker, M. P., de Jong, K., Schmitz, O., & Karssenberg, D. (2017). Design and demonstration of a data model to integrate agent-based and field-based modelling. Environmental Modelling & Software, 89, 172–189. https://doi.org/10.1016/j.envsoft.2016.11.016

de Jong, K., & Karssenberg, D. (2019). A physical data model for spatio-temporal objects. Environmental Modelling & Software. https://doi.org/10.1016/j.envsoft.2019.104553

LUE source code repository: https://github.com/pcraster/lue/

How to cite: Schmitz, O., de Jong, K., and Karssenberg, D.: Integrated field-agent based modelling using the LUE scientific data base, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8583, https://doi.org/10.5194/egusphere-egu2020-8583, 2020.

EGU2020-19423 | Displays | ESSI3.1

Borehole Data Management System: a web interface for borehole data acquisition

Massimiliano Cannata, Milan Antonovic, Nils Oesterling, and Sabine Brodhag

The shallow underground is of primary importance in governing and planning the territories where we live. In fact, the uppermost 500 meters below the ground surface are affected by a growing number of anthropic activities such as construction, extraction of drinking water, mining of mineral resources, installation of geothermal probes, etc. Borehole data are therefore essential, as they reveal at specific locations the vertical sequence of geological layers, which in turn provides an understanding of the geological conditions we can expect in the shallow underground. Unfortunately, data are rarely available in a FAIR way, that is, as the acronym specifies, Findable, Accessible, Interoperable and Reusable.

Most of the time, data, particularly those collected in the past, are in the form of static data reports that describe the stratigraphy and the related characteristics; these data are generally available as paper documents or static files such as PDFs or images (.ai). While very informative, these documents are not searchable, not interoperable and not easily reusable, since they require a non-negligible amount of time for data integration. Sometimes, data are archived in databases. This certainly improves the findability and accessibility of the data but still does not address the interoperability requirement, and therefore combining data from different sources remains a problematic task. To enable FAIR borehole data and facilitate their management by different entities (public or private), swisstopo (www.swisstopo.ch) has funded the development of a web application named Borehole Data Management System (BDMS) [1] that adopts the borehole data model [2] implemented by the Swiss Geological Survey.

Among the benefits of adopting a standard model we can identify:

  • Enhance the exchange, the usage and quality of the data
  • Reach data harmonization (level of detail, precise definitions, relationships and dependencies among the data),
  • Establish a common language between stakeholders

The Borehole Data Management System (BDMS) was developed using the latest free and open source technologies. The new application integrates some of today's best OSGeo projects and is available as a modular open source solution on GitHub, ready to use in a Docker container available on Docker Hub. Through two types of authorization, Explorer users are able to search the BDMS for specific boreholes, navigate a configurable, user-friendly map, apply filters, explore the stratigraphy layers of each borehole and export all the data as Shapefiles, CSV or PDF. Editors are able to manage the information in detail and publish the results after passing a validation process.

 

Links

[1] http://geoservice.ist.supsi.ch/docs/bdms/index.html

[2] https://www.geologieportal.ch/en/knowledge/lookup/data-models/borehole-data-model.html 

How to cite: Cannata, M., Antonovic, M., Oesterling, N., and Brodhag, S.: Borehole Data Management System: a web interface for borehole data acquisition, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19423, https://doi.org/10.5194/egusphere-egu2020-19423, 2020.

EGU2020-1325 | Displays | ESSI3.1

Urban Thematic Exploitation Platform - supporting urban research with EO data processing, integrative data analysis and reporting

Felix Bachofer, Thomas Esch, Jakub Balhar, Martin Boettcher, Enguerran Boissier, Mattia Marconcini, Annekatrin Metz-Marconcini, Michal Opletal, Fabrizio Pacini, Tomas Soukup, Vaclav Svaton, and Julian Zeidler

Urbanization is among the most relevant global trends affecting climate and the environment, as well as the health and socio-economic development of the majority of the global population. As such, it poses a major challenge for the current urban population and the well-being of the next generation. To understand how to take advantage of opportunities and properly mitigate the negative impacts of this change, we need precise and up-to-date information on urban areas. The Urban Thematic Exploitation Platform (UrbanTEP) is a collaborative system which focuses on the processing of Earth observation (EO) data and on delivering multi-source information on trans-sectoral urban challenges.

The UrbanTEP is developed to provide end-to-end and ready-to-use solutions for a broad spectrum of users (service providers, experts and non-experts) to extract unique information and indicators required for urban management and sustainability. Key components of the system are an open, web-based portal connected to distributed high-level computing infrastructures and providing key functionalities for

i) high-performance data access and processing,

ii) modular and generic state-of-the art pre-processing, analysis, and visualization,

iii) customized development and sharing of algorithms, products and services, and

iv) networking and communication.

The service and product portfolio provides access to the archives of Copernicus and Landsat missions, Datacube technology, DIAS processing environments, as well as premium products like the World Settlement Footprint (WSF). External service providers, as well as researchers can make use of on-demand processing of new data products and the possibility of developing and deploying new processors. The onboarding of service providers, developers and researchers is supported by the Network of Resources program of the European Space Agency (ESA) and the OCRE initiative of the European Commission.

In order to provide end-to-end solutions, the VISAT tool on UrbanTEP allows users to analyze and visualize project-related geospatial content and to develop storylines that communicate research output to customers and stakeholders effectively. Multiple visualizations (scopes) are already predefined. One available scope, for example, illustrates the exploitation of the WSF-Evolution dataset by analyzing settlement and population development for South-East Asian countries from 1985 to 2015 in the context of the Sustainable Development Goal (SDG) indicator 11.3.1. Other open scopes focus on urban green, functional urban areas, land use and urban heat island modelling.

How to cite: Bachofer, F., Esch, T., Balhar, J., Boettcher, M., Boissier, E., Marconcini, M., Metz-Marconcini, A., Opletal, M., Pacini, F., Soukup, T., Svaton, V., and Zeidler, J.: Urban Thematic Exploitation Platform - supporting urban research with EO data processing, integrative data analysis and reporting , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1325, https://doi.org/10.5194/egusphere-egu2020-1325, 2020.

EGU2020-10967 | Displays | ESSI3.1

CGI GeoData360: a cloud-based scalable production platform for big data-driven solutions of Earth Observation and Geospatial services.

Chandra Taposeea-Fisher, Andrew Groom, Jon Earl, and Peter Van Zetten

Our ability to observe the Earth is transforming, with substantially more satellite imagery and geospatial data fuelling big data-driven opportunities to better monitor and manage the Earth and its systems. CGI’s GeoData360 solves common technical challenges for those aiming to exploit these new opportunities.

Reliable monitoring solutions that run efficiently at scale require substantial ICT resources and more sophisticated data processing capabilities that can be complex and costly. Cloud-based resources enable new approaches using large, multi-tenant infrastructures, allowing solutions to benefit from massive infrastructural resources otherwise unattainable for the individual user. GeoData360 makes these opportunities accessible to a wide user base.

GeoData360 is CGI’s cloud-hosted production platform for Earth Observation (EO) and Geospatial services. GeoData360 is designed for long running, large scale production pipelines as a Platform-as-a-Service. It supports deep customisation and extension, enabling production workflows that consume large volumes of EO and Geospatial data to run cost efficiently at scale.

GeoData360 is fully scalable, works dynamically and optimises the use of infrastructure resources available from commercial cloud providers, whilst also reducing elapsed processing times. It has the advantage of being portable and securely deployable within public or private cloud environments. Its operational design provides the reliable, consistent performance needed for commercially viable services. The platform is aimed at big data, with production capabilities applicable to services based on EO imagery and other Geospatial data (climate data, meteorological data, points, lines, polygons etc.). GeoData360 has been designed to support cost effective production, with applications using only the resources that are required.

CGI has already used GeoData360 as enabling technology on EO and non-EO initiatives, benefitting from: (1) granularity, with containerisation at the level of the individual processing step, allowing increased flexibility, efficient testing and implementation, and improved optimisation potential for dynamic scaling; (2) standardisation, with a centralised repository of standardised processing steps enabling efficient re-use for rapid prototyping; (3) orchestration and automation, by linking process steps into complete processing workflows, enabling the granular approach and reducing operational costs; (4) dynamic scaling, for processing resources and for storage; (5) inbuilt monitoring with graphical feedback providing transparency on system performance, allowing system control to be maintained for highly automated workflows; (6) data access, with efficient access to online archives; (7) security, with access control and protection for third-party Intellectual Property. Example initiatives that benefit from GeoData360 include PASSES (Peatland Assessment in SE Asia via Satellite) and HiVaCroM (High Value Crop Monitoring). Both initiatives have used GeoData360 to enable data-intensive production workflows to be deployed and run at national to regional scales.

GeoData360 solves the challenges of providing production-ready offerings: reliability, repeatability, traceability and monitoring. Our solution addresses the scaling issues inherent in batch processing large volumes of bulky data and decouples the algorithms from the underlying infrastructure. GeoData360 provides a trusted component in the development, deployment and successful commercialisation of big data-driven solutions.

How to cite: Taposeea-Fisher, C., Groom, A., Earl, J., and Van Zetten, P.: CGI GeoData360: a cloud-based scalable production platform for big data-driven solutions of Earth Observation and Geospatial services., EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10967, https://doi.org/10.5194/egusphere-egu2020-10967, 2020.

EGU2020-21974 | Displays | ESSI3.1

The Earth Observation Time Series Analysis Toolbox (EOTSA) - An R package with WPS, Web-Client and Spark integration

Ulrich Leopold, Benedikt Gräler, Henning Bredel, J. Arturo Torres-Matallana, Philippe Pinheiro, Mickaël Stefas, Thomas Udelhoven, Jeroen Dries, Bernard Valentin, Leslie Gale, Philippe Mougnaud, and Martin Schlerf

We present an implementation of a time series analysis toolbox for remote sensing imagery in R which has been largely funded by the European Space Agency within the PROBA-V MEP Third Party Services project. The toolbox is developed according to the needs of the time series analysis community. The data is provided by the PROBA-V mission exploitation platform (MEP) at VITO. The toolbox largely builds on existing specialized R packages and functions for raster and time series analysis combining these in a common framework.

In order to ease access to and usage of the toolbox, it has been deployed in the MEP Spark cluster to bring the algorithms to the data. All functions are also wrapped in a Web Processing Service (WPS) using 52°North's WPS4R extension for interoperability across web platforms. The WPS can be orchestrated in the Automatic Service Builder (ASB) developed by Space Applications. Hence, the space-time analytics developed in R can be integrated into a larger workflow, potentially integrating external data and services. The WPS provides a web client, including a preview of the results in a map window, for usage within the MEP. Results are offered for download or through Web Mapping and Web Coverage Services (WMS, WCS) provided through a GeoServer instance.
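
A hedged sketch of how the WPS-wrapped processes could be discovered from a Python client using owslib follows; the endpoint URL is a placeholder rather than the actual MEP service address.

```python
from owslib.wps import WebProcessingService

# Placeholder endpoint: the actual MEP WPS address is not given in the abstract.
wps = WebProcessingService("https://mep.example.org/wps", version="1.0.0")
wps.getcapabilities()

# List the R-based processes exposed through WPS4R.
for process in wps.processes:
    print(process.identifier, "-", process.title)
```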

Through its interoperability features, the EOTSA toolbox contributes to collaborative science.

How to cite: Leopold, U., Gräler, B., Bredel, H., Torres-Matallana, J. A., Pinheiro, P., Stefas, M., Udelhoven, T., Dries, J., Valentin, B., Gale, L., Mougnaud, P., and Schlerf, M.: The Earth Observation Time Series Analysis Toolbox (EOTSA) - An R package with WPS, Web-Client and Spark integration, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21974, https://doi.org/10.5194/egusphere-egu2020-21974, 2020.

ESSI3.2 – The evolving Open and FAIR ecosystem for Solid Earth and Environmental sciences: challenges, opportunities, and other adventures

EGU2020-10349 | Displays | ESSI3.2 | Highlight

The Role of Data Systems to Enable Open Science

Rahul Ramachandran, Kaylin Bugbee, and Kevin Murphy

Open science is a concept that represents a fundamental change in scientific culture. This change is characterized by openness, where research objects and results are shared as soon as possible, and by connectivity to a wider audience. Understanding of what Open Science actually means differs among stakeholders.

Thoughts on Open Science fall into four distinct viewpoints. The first viewpoint strives to make science accessible to a larger community by focusing on allowing non-scientists to participate in the research process through citizen science projects and by more effectively communicating research results to the broader public. The second viewpoint considers providing equitable knowledge access to everyone by not only considering access to journal publications but also to other objects in the research process such as data and code. The third viewpoint focuses on making both the research process and the communication of results more efficient. There are two aspects to this viewpoint, a social and a technical component: the social component is driven by the need to tackle complex problems that require collaboration and a team approach to science, while the technical component focuses on creating tools, services and especially scientific platforms to make the scientific process more efficient. Lastly, the fourth viewpoint strives to develop new metrics to measure scientific contributions that go beyond the current metrics derived solely from scientific publications and to consider contributions from other research objects such as data, code or knowledge sharing through blogs and other social media communication mechanisms.

Technological change is a factor in all four of these viewpoints on Open Science. New capabilities in compute, storage, methodologies, publication and sharing enable technologists to serve as primary drivers for Open Science by providing more efficient technological solutions. Sharing knowledge, information and other research objects such as data and code has become easier with new modalities of sharing available to researchers. In addition, technology is enabling the democratization of science at two levels. First, researchers are no longer constrained by a lack of the infrastructure resources needed to tackle difficult problems. Second, citizen science projects now involve the public at different steps of the scientific process, from data collection to analysis.

This presentation investigates the four described viewpoints on Open Science from the perspective of any large organization involved in scientific data stewardship and management. The presentation will list possible technological strategies that organizations may adopt to further align with all aspects of the Open Science movement.

How to cite: Ramachandran, R., Bugbee, K., and Murphy, K.: The Role of Data Systems to Enable Open Science, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10349, https://doi.org/10.5194/egusphere-egu2020-10349, 2020.

EGU2020-13291 | Displays | ESSI3.2

Building the Foundations for Open Applied Earth System Science in ENVRI-FAIR

Ari Asmi, Daniela Franz, and Andreas Petzold

The EU project ENVRI-FAIR builds on the Environmental Research Infrastructure (ENVRI) community, which includes the principal European producers and providers of environmental research data and research services. The ENVRI community integrates the four subdomains of the Earth system: Atmosphere, Ocean, Solid Earth, and Biodiversity/Terrestrial Ecosystems. The environmental research infrastructures (RIs) contributing to ENVRI-FAIR have developed comprehensive expertise in their fields of research, but their integration across the boundaries of applied subdomain science is still not fully developed. However, this integration is critical for improving our current understanding of the major challenges to our planet, such as climate change and its impacts on the whole Earth system, our ability to respond to and predict natural hazards, and our understanding and prevention of ecosystem loss.

 

ENVRI-FAIR targets the development and implementation of the technical framework and policy solutions needed to make subdomain boundaries irrelevant for environmental scientists and to prepare Earth system science for the new paradigm of Open Science. Harmonization and standardization activities across disciplines, together with the implementation of joint data management and access structures at RI level, facilitate the strategic coordination of observation systems required for truly interdisciplinary science. ENVRI-FAIR will finally create an open access hub for environmental data and services provided by the contributing environmental RIs, utilizing the European Open Science Cloud (EOSC) as Europe's answer to the transition to Open Science.

 

How to cite: Asmi, A., Franz, D., and Petzold, A.: Building the Foundations for Open Applied Earth System Science in ENVRI-FAIR, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13291, https://doi.org/10.5194/egusphere-egu2020-13291, 2020.

EGU2020-18570 | Displays | ESSI3.2

Sustainable FAIR Data management is challenging for RIs and it is challenging to solid Earth scientists

Massimo Cocco, Daniele Bailo, Keith G. Jeffery, Rossana Paciello, Valerio Vinciarelli, and Carmela Freda

Interoperability has long been an objective for research infrastructures dealing with research data to foster open access and open science. More recently, FAIR principles (Findability, Accessibility, Interoperability and Reusability) have been proposed. The FAIR principles are now reference criteria for promoting and evaluating openness of scientific data. FAIRness is considered a necessary target for research infrastructures in different scientific domains at European and global level.

Solid Earth RIs have long been committed to engaging the scientific communities involved in data collection, standardization and quality management, as well as to providing metadata and services for qualification, storage and accessibility. They are working to adopt FAIR principles, thus addressing the onerous task of turning these principles into practices. To make FAIR principles a reality in terms of service provision for data stewardship, some RI implementers in EPOS have proposed a FAIR-adoption process leveraging a four-stage roadmap that reorganizes the FAIR principles to better fit the mindset of scientists and RI implementers. The roadmap considers FAIR principles as requirements in the software development life cycle, and reorganizes them into data, metadata, access services and use services. Both the implementation and the assessment of the level of "FAIRness", by means of questionnaires and metrics, are thereby made simpler and closer to scientists' day-to-day work.

FAIR data and service management is demanding, requiring resources and skills, and, more importantly, it needs sustainable IT resources. For this reason, FAIR data management is challenging for many research infrastructures and data providers seeking to turn FAIR principles into reality through viable and sustainable practices. FAIR data management also includes implementing services to access data as well as to visualize, process, analyse and model them for generating new scientific products and discoveries.

FAIR data management is challenging to Earth scientists because it depends on their perception of finding, accessing and using data and scientific products: in other words, the perception of data sharing. The sustainability of FAIR data and service management is not limited to financial sustainability and funding; rather, it also includes legal, governance and technical issues that concern the scientific communities.

In this contribution, we present and discuss some of the main challenges that need to be urgently tackled in order to run and operate FAIR data services in the long term, as also envisaged by the European Open Science Cloud initiative: a) sustainability of the IT solutions and resources to support practices for FAIR data management (i.e., PID usage and preservation, including costs for operating the associated IT services); b) reusability, which on the one hand requires clear and tested methods to manage heterogeneous metadata and provenance, while on the other hand can be considered a frontier research field; c) FAIR service provision, which presents many open questions related to the application of FAIR principles to services for data stewardship, and to services for the creation of data products that take FAIR raw data as input, for which it is not yet clear how FAIRness compliance of the resulting data products can be guaranteed.

How to cite: Cocco, M., Bailo, D., Jeffery, K. G., Paciello, R., Vinciarelli, V., and Freda, C.: Sustainable FAIR Data management is challenging for RIs and it is challenging to solid Earth scientists, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18570, https://doi.org/10.5194/egusphere-egu2020-18570, 2020.

EGU2020-13475 | Displays | ESSI3.2

Status and challenges of FAIR data principles for a long-term repository

Chad Trabant, Rick Benson, Rob Casey, Gillian Sharer, and Jerry Carter

The data center of the National Science Foundation's Seismological Facility for the Advancement of Geoscience (SAGE), operated by IRIS Data Services, has evolved over the past 30 years to address the data accessibility needs of the scientific research community. In recent years, a broad call for adherence to FAIR data principles has prompted repositories to increase their activity in support of them. As these principles are well aligned with the needs of data users, many of the FAIR principles are already supported and actively promoted by IRIS. Standardized metadata and data identifiers support findability. Open and standardized web services enable a high degree of accessibility. Interoperability is ensured by offering data in a combination of rich, domain-specific formats in addition to simple, text-based formats. The use of open, rich (domain-specific) format standards enables a high degree of reuse. Further advancement towards these principles includes the introduction and dissemination of DOIs for data, and the introduction of Linked Data support, via JSON-LD, allowing scientific data brokers, catalogers and generic search systems to discover data. Naturally, some challenges remain, such as: the granularity and mechanisms needed for persistent IDs for data; the reality that metadata are updated with corrections (which has implications for FAIR data principles); and the complexity of data licensing in a repository with data contributed by individual PIs, national observatories, and international collaborations. In summary, IRIS Data Services is well along the path of adherence to FAIR data principles, with more work to do. We will present the current status of these efforts and describe the key challenges that remain.
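
As an example of the standardized, open web services mentioned above, the snippet below retrieves an hour of waveform data through the FDSN web services using ObsPy; the network, station, channel and time window are arbitrary choices for illustration.

```python
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

# Request one hour of broadband vertical-component data from the IRIS FDSN service.
client = Client("IRIS")
t0 = UTCDateTime("2020-01-01T00:00:00")
stream = client.get_waveforms(network="IU", station="ANMO", location="00",
                              channel="BHZ", starttime=t0, endtime=t0 + 3600)
print(stream)
```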

How to cite: Trabant, C., Benson, R., Casey, R., Sharer, G., and Carter, J.: Status and challenges of FAIR data principles for a long-term repository, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13475, https://doi.org/10.5194/egusphere-egu2020-13475, 2020.

EGU2020-4169 | Displays | ESSI3.2 | Highlight

Practical data sharing with tangible rewards through publication in ESSD

David Carlson, Kirsten Elger, Jens Klump, Ge Peng, and Johannes Wagner

Envisioned as one solution to the data challenges of the International Polar Year (2007-2008), the Copernicus data journal Earth System Science Data (ESSD) has developed into a useful and rewarding data-sharing option for an unprecedented array of researchers. ESSD has published peer-reviewed descriptions of more than 500 easily and freely accessible data products, from more than 4000 data providers archiving their products at more than 100 data centres. ESSD processes and products provide a useful step toward Findable, Accessible, Interoperable, Reusable (FAIR) expectations, but also a caution about implementation.

For ESSD, findable and accessible derive from the journal’s consistent mandate for open access coupled with useful title, author, abstract and full-text search functions on the publisher’s website (which lead users quickly to data sources) and excellent (but varied) topical, geographic, textual and chronologic search functions of host data centres. Due to an intense focus on data reliability and reusability during peer review of data descriptions, ESSD-referenced data products achieve very high standards of accessibility and reusability. ESSD experience over an amazing variety of data products suggests that ‘interoperability’ depends on the intended use of the data and experience of users. Many ESSD-published products adopt a shared grid format compatible with climate models. Other ESSD products, for example in ocean biogeochemistry or land agricultural cultivation, adopt or even declare interoperable terminologies and new standards for expression of uncertainty. Very often an ESSD publication explicitly describes data collections intended to enhance interoperability within a specific user community, through a new database for example. For a journal that prides itself on diversity and quality of its products published in service to a very broad array of oceanographic, terrestrial, atmospheric, cryospheric and global research communities, the concept of interoperability remains elusive.

Implementing open access to data has proven difficult. FAIR principles give us guidelines on the technical implementation of open data. However, ESSD's experience (involving publisher, data providers, reviewers and data centres) in achieving very high impact factors (we consider these metrics as indicators of the use and reuse of data products published via ESSD) can serve as a guide to the pursuit of the FAIR principles. For most researchers, data handling remains confusing and unrewarding. Data centres vary widely in capability, resources and approaches; even the 'best' (busiest) may change policies or practices according to internal needs, independent of external standards, or may unexpectedly go out of service. Software and computation resources grow and change rapidly, with simultaneous advances in open and proprietary tools. National mandates often conflict with international standards. Although we contend that ESSD represents one sterling example of promoting findable, accessible, interoperable and reusable data of high quality, we caution that those objectives remain a nebulous goal for any institution - in our case a data journal - whose measure of success remains a useful service to a broad research community.

How to cite: Carlson, D., Elger, K., Klump, J., Peng, G., and Wagner, J.: Practical data sharing with tangible rewards through publication in ESSD, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-4169, https://doi.org/10.5194/egusphere-egu2020-4169, 2020.

EGU2020-8463 | Displays | ESSI3.2

AtMoDat: Improving the reusability of ATmospheric MOdel DATa with DataCite DOIs paving the path towards FAIR data

Daniel Neumann, Anette Ganske, Vivien Voss, Angelina Kraft, Heinke Höck, Karsten Peters, Johannes Quaas, Heinke Schluenzen, and Hannes Thiemann

The generation of high-quality research data is expensive. The FAIR principles were established to foster the reuse of such data for the benefit of the scientific community and beyond. Publishing research data with metadata and DataCite DOIs in public repositories makes them findable and accessible (FA of FAIR). However, DOIs and basic metadata do not guarantee that the data are actually reusable without discipline-specific knowledge: if data are saved in proprietary or undocumented file formats, if detailed discipline-specific metadata are missing, and if quality information on the data and metadata is not provided. In this contribution, we present ongoing work in the AtMoDat project, a consortium of atmospheric scientists and infrastructure providers, which aims at improving the reusability of atmospheric model data.
  
Consistent standards are necessary to simplify the reuse of research data. Although standardization of file structure and metadata is well established for some subdomains of the earth system modeling community – e.g. CMIP –, several other subdomains lack such standardization. Hence, scientists from the Universities of Hamburg and Leipzig and infrastructure operators cooperate in the AtMoDat project in order to advance standardization for model output files in specific subdomains of the atmospheric modeling community. Starting from the demanding CMIP6 standard, the aim is to establish an easy-to-use standard that is at least compliant with the Climate and Forecast (CF) conventions. In parallel, an existing netCDF file convention checker is extended to check for the new standards. This enhanced checker is designed to support the creation of compliant files and thus lower the hurdle for data producers to comply with the new standard. The transfer of this approach to further sub-disciplines of the earth system modeling community will be supported by a best-practice guide and other documentation. A showcase of a standard for the urban atmospheric modeling community will be presented in this session. The standard is based on CF Conventions and adapts several global attributes and controlled vocabularies from the well-established CMIP6 standard.
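
As an illustration of the kind of check such a tool performs, the following minimal Python sketch (not the AtMoDat checker itself) inspects a netCDF file for a handful of CF-recommended global attributes; the file name and attribute list are placeholders chosen for this example.

    # Illustrative only: check a netCDF file for a few CF-recommended global
    # attributes. This is a toy example, not the AtMoDat convention checker.
    from netCDF4 import Dataset

    REQUIRED_GLOBAL_ATTRS = ["Conventions", "title", "institution", "source", "history"]

    def check_global_attrs(path):
        with Dataset(path) as nc:
            present = set(nc.ncattrs())      # names of global attributes in the file
        missing = [a for a in REQUIRED_GLOBAL_ATTRS if a not in present]
        for attr in missing:
            print(f"missing global attribute: {attr}")
        return not missing

    if __name__ == "__main__":
        check_global_attrs("model_output.nc")   # hypothetical file name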
  
Additionally, the AtMoDat project aims at introducing a generic quality indicator into the DataCite metadata schema to foster further reuse of data. This quality indicator should require a discipline-specific implementation of a quality standard linked to the indicator. We will present the concept of the generic quality indicator in general and in the context of urban atmospheric modeling data.

How to cite: Neumann, D., Ganske, A., Voss, V., Kraft, A., Höck, H., Peters, K., Quaas, J., Schluenzen, H., and Thiemann, H.: AtMoDat: Improving the reusability of ATmospheric MOdel DATa with DataCite DOIs paving the path towards FAIR data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8463, https://doi.org/10.5194/egusphere-egu2020-8463, 2020.

“We kill people based on metadata.” (Gen. Michael V. Hayden, 2014) [1]

Over the past fifteen years, a number of persistent identifier (PID) systems have been built to help identify the stakeholders and their outputs in the research process and scholarly communication. Transparency is a fundamental principle of science, but it can be in conflict with the right to privacy. The development of Knowledge Graphs (KG), however, introduces completely new, and possibly unintended, uses of publication metadata that require critical discussion. In particular, when personal data, such as those linked to ORCID identifiers, are used and linked with research artefacts and personal information, KGs make it possible to identify personal as well as collaborative networks of individuals. This ability to analyse KGs may be used in a harmful way. It is a sad fact that in some countries, personal relationships or research in certain subject areas can lead to discrimination, persecution or prison. We must, therefore, become aware of the risks and responsibilities that come with networked data in KGs.

The trustworthiness of PID systems and KGs has so far been discussed in technical and organisational terms. The inclusion of personal data requires a new definition of ‘trust’ in the context of PID systems and Knowledge Graphs which should also include ethical aspects and consider the principles of the General Data Protection Regulation.

New, trustworthy technological approaches are required to ensure proper maintenance of privacy. As a prerequisite, the level of interoperability between PID systems needs to be enhanced. Further, new methods and protocols need to be defined which enable secure and prompt cascading update or delete actions on personal data across PID systems as well as knowledge graphs.

Finally, new trustworthiness criteria must be defined which allow the identification of trusted clients for the exchange of personal data instead of the currently practised open data policy which can be in conflict with legislation protecting privacy and personal data.

[1] https://www.nybooks.com/daily/2014/05/10/we-kill-people-based-metadata/

How to cite: Huber, R. and Klump, J.: The Dark Side of the Knowledge Graph - How Can We Make Knowledge Graphs Trustworthy?, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13071, https://doi.org/10.5194/egusphere-egu2020-13071, 2020.

EGU2020-9207 | Displays | ESSI3.2

FAIR access to soil and agricultural research data: The BonaRes Data Repository

Carsten Hoffmann, Xenia Specka, Nikolai Svoboda, and Uwe Heinrich

In the frame of the joint research project BonaRes (“Soil as a sustainable resource for the bioeconomy”, bonares.de), a data repository was set up to upload, manage, and provide soil, agricultural and accompanying environmental research data. Research data are stored in the repository over the long term in a consistent way, based on open and widely used standards. Data visibility, as well as accessibility, reusability, and interoperability with international data infrastructures, is fostered by rich descriptions with standardized metadata and by DOI allocation.

The specially developed metadata schema combines all elements from DataCite and INSPIRE. Metadata are entered via an online metadata editor and include thesauri (AGROVOC, GEMET), use licenses (Creative Commons: CC-BY for research data, CC-0 for metadata), lineage elements and data access points (geodata portal with OGC services). The repository thus meets the requirements of the FAIR principles for research data.
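
To illustrate what such a combined record can look like, the sketch below shows a hypothetical, heavily simplified metadata record as a Python dictionary, mixing DataCite-style elements with an INSPIRE-style lineage statement and license information; the field names and values are examples, not the actual BonaRes schema.

    # Illustrative only: a simplified metadata record combining DataCite-style
    # elements with a lineage statement and license info. Not the BonaRes schema.
    record = {
        "identifier": {"identifierType": "DOI", "identifier": "10.xxxx/example"},   # placeholder DOI
        "titles": [{"title": "Example long-term field experiment dataset"}],
        "creators": [{"creatorName": "Doe, Jane", "affiliation": "Example Institute"}],
        "subjects": [{"subject": "soil fertility", "subjectScheme": "AGROVOC"}],
        "rightsList": [{"rights": "CC BY 4.0",
                        "rightsURI": "https://creativecommons.org/licenses/by/4.0/"}],
        "lineage": "Plot-level yields aggregated from annual field trials (example text).",
    }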

In this paper we present and discuss functionalities and elements of the BonaRes Data Repository, show a typical data workflow from data owner to data (re-)user, demonstrate data accessibility and citeability, and introduce to central data policy elements, e.g. embargo times and licenses. Finally we provide an outlook of the planned integration and linkage with other soil-agricultural repositories within a government-funded comprehensive national research data infrastructure NFDI (NFDI4Agri, Germany).

How to cite: Hoffmann, C., Specka, X., Svoboda, N., and Heinrich, U.: FAIR access to soil and agricultural research data: The BonaRes Data Repository , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9207, https://doi.org/10.5194/egusphere-egu2020-9207, 2020.

EGU2020-10057 | Displays | ESSI3.2

Putting the INGV data policy into practice: considerations after the first-year experience

Mario Locati, Francesco Mariano Mele, Vincenzo Romano, Placido Montalto, Valentino Lauciani, Roberto Vallone, Giuseppe Puglisi, Roberto Basili, Anna Grazia Chiodetti, Antonella Cianchi, Massimiliano Drudi, Carmela Freda, Maurizio Pignone, and Agata Sangianantoni

The Istituto Nazionale di Geofisica e Vulcanologia (INGV) has a long tradition of sharing scientific data, well before the Open Science paradigm was conceived. In the last thirty years, a great deal of geophysical data generated by research projects and monitoring activities were published on the Internet, though encoded in multiple formats and made accessible using various technologies.

To organise such a complex scenario, a working group (PoliDat) for implementing an institutional data policy operated from 2015 to 2018. PoliDat published three documents: in 2016, the data policy principles; in 2017, the rules for scientific publications; in 2018, the rules for scientific data management. These documents are available online in Italian, and English (https://data.ingv.it/docs/).

According to a preliminary data survey performed between 2016 and 2017, nearly 300 different types of INGV-owned data were identified. In the survey, the compilers were asked to declare all the available scientific data, differentiating them by the level of intellectual contribution: level 0 identifies raw data generated by fully automated procedures, level 1 identifies data products generated by semi-automated procedures, level 2 is related to data resulting from scientific investigations, and level 3 is associated with integrated data resulting from complex analysis.

A Data Management Office (DMO) was established in November 2018 to put the data policy into practice. The DMO’s first goal was to design and establish a Data Registry aimed at satisfying the highly differentiated requirements of both internal and external users, at both scientific and managerial levels. The Data Registry is defined as a metadata catalogue, i.e., a container of data descriptions, not of the data themselves. In addition, the DMO supports other activities dealing with scientific data, such as checking contracts, providing advice to the legal office in case of litigation, interacting with the INGV Data Transparency Office, and, in more general terms, supporting the adoption of the Open Science principles.

An extensive set of metadata has been identified to accommodate multiple metadata standards. At first, a preliminary set of metadata describing each dataset is compiled by the authors using a web-based interface, then the metadata are validated by the DMO, and finally, a DataCite DOI is minted for each dataset, if not already present. The Data Registry is publicly accessible via a dedicated web portal (https://data.ingv.it). A pilot phase aimed at testing the Data Registry was carried out in 2019 and involved a limited number of contributors. To this aim, a top-priority data subset was identified according to the relevance of the data to the mission of INGV and the completeness of the already available information. The Directors of the Departments of Earthquakes, Volcanoes, and Environment supervised the selection of the data subset.
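
For illustration, the sketch below shows the general shape of a DOI registration request against the public DataCite REST API; it is not INGV’s actual workflow, and the DOI prefix, metadata values, landing-page URL and credentials are placeholders.

    # Illustrative only: the general shape of minting a DOI via the DataCite
    # REST API (a test endpoint, api.test.datacite.org, exists for trials).
    import requests

    payload = {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": "10.xxxx/example-dataset-001",          # placeholder prefix/suffix
                "titles": [{"title": "Example dataset registered in a data registry"}],
                "creators": [{"name": "Doe, Jane"}],
                "publisher": "Example Institute",
                "publicationYear": 2020,
                "types": {"resourceTypeGeneral": "Dataset"},
                "url": "https://data.example.org/dataset/001",  # placeholder landing page
                "event": "publish",                             # register as findable
            },
        }
    }

    resp = requests.post(
        "https://api.datacite.org/dois",
        json=payload,
        headers={"Content-Type": "application/vnd.api+json"},
        auth=("REPOSITORY_ID", "PASSWORD"),                     # placeholder credentials
    )
    resp.raise_for_status()
    print(resp.json()["data"]["id"])                            # the registered DOI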

The pilot phase helped to test and adjust the decisions made and procedures adopted during the planning phase, and allowed us to fine-tune the tools for data management. During the next year, the Data Registry will enter its production phase and will be open to contributions from all INGV employees.

How to cite: Locati, M., Mele, F. M., Romano, V., Montalto, P., Lauciani, V., Vallone, R., Puglisi, G., Basili, R., Chiodetti, A. G., Cianchi, A., Drudi, M., Freda, C., Pignone, M., and Sangianantoni, A.: Putting the INGV data policy into practice: considerations after the first-year experience, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10057, https://doi.org/10.5194/egusphere-egu2020-10057, 2020.

EGU2020-12001 | Displays | ESSI3.2

Building a sustainable international research data infrastructure - Lessons learnt in the IGSN 2040 project

Jens Klump, Kerstin Lehnert, Lesley Wyborn, and Sarah Ramdeen and the IGSN 2040 Steering Committee

Like many research data infrastructures, the IGSN Global Sample Number started as a research project. The rapid uptake of IGSN in the last five years as well as the expansion of diversity of use cases, in particular beyond the geosciences, mean that IGSN has outgrown its current structure as implemented in 2011, and the technology is in urgent need of a refresh. The expected exponential growth of the operation requires the IGSN Implementation Organization (IGSN e.V.) to better align the organisation and technical architecture.

In 2018, the Alfred P. Sloan Foundation awarded a grant to redesign and improve the IGSN, to “achieve a trustworthy, stable, and adaptable architecture for the IGSN as a persistent unique identifier for material samples, both technically and organizationally, that attracts, facilitates, and satisfies participation within and beyond the Geosciences, that will be a reliable component of the evolving research data ecosystem, and that is recognized as a trusted partner by data infrastructure providers and the science community alike.” 

IGSN is not the first PID service provider to make the transition from project to product and there are lessons to be learnt from other PID services. To this end, the project invited experts in the field of research data infrastructures and facilitated workshops to develop an organisational and technological strategy and roadmap towards long-term sustainability of the IGSN. 

To be sustainable, a research data infrastructure like IGSN has to have a clearly defined service or product, underpinned by a scalable business model and technical system. We used the Lean Canvas to define the IGSN services. The resulting definition of service components helped us define IGSN user communities, cost structures and potential income streams. The workshop discussions had already highlighted the conflicting aims between offering a comprehensive service and keeping services lean to reduce their development and operational costs. Building on the Lean Canvas, the definition of a minimum viable product helped to define the role of the IGSN e.V. and the roles of actors offering value-added services based on IGSN outside of the core operation.

How to cite: Klump, J., Lehnert, K., Wyborn, L., and Ramdeen, S. and the IGSN 2040 Steering Committee: Building a sustainable international research data infrastructure - Lessons learnt in the IGSN 2040 project, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12001, https://doi.org/10.5194/egusphere-egu2020-12001, 2020.

The U.S. National Oceanic and Atmospheric Administration’s (NOAA) National Centers for Environmental Information (NCEI) stewards one of the world’s largest and most diverse collections of environmental data. The longevity of this organization has led to a great diversity of digital and physical data in multiple formats and media. NCEI strives to develop and implement processes, guidance, tools and services to facilitate the creation and preservation of independently understandable data that is open and FAIR (Findable, Accessible, Interoperable, Reusable).

The Foundations for Evidence-Based Policymaking Act (Evidence Act) (PL 115-435), which includes the Open, Public, Electronic, and Necessary Government Data (OPEN) Act (2019), requires all U.S. Federal data to be shared openly. Meeting the requirements of the Evidence Act, FAIR and OPEN poses many challenges. One challenge is that the requirements are not static; they evolve over time based on the data lifecycle, changes within the designated user community (e.g., user needs and skills) and the transition to new technologies such as the cloud. Consistently measuring and documenting compliance is another challenge.

NCEI is tackling the challenges of ensuring our data holdings meet the requirements of OPEN, FAIR and the Evidence Act in multiple areas through the consistent implementation of community best practices, knowledge of current and potential user communities, and elbow grease.

This presentation will focus on NCEI’s experiences with taking data beyond independently understandable to meeting the Evidence Act, FAIR, and OPEN.

 

How to cite: Ritchey, N.: NOAA/NCEI‘s Challenges in Meeting New Open Data Requirements, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12419, https://doi.org/10.5194/egusphere-egu2020-12419, 2020.

EGU2020-13285 | Displays | ESSI3.2

The challenging research data management support in the interdisciplinary cluster of excellence CliCCS

Ivonne Anders, Andrea Lammert, and Karsten Peters

In 2019 the Universität Hamburg was awarded funding for 4 clusters of excellence in the Excellence Strategy of the Federal and State Governments. One of these clusters, funded by the German Research Foundation (DFG), is “CliCCS – Climate, Climatic Change, and Society”. The scientific objectives of CliCCS are achieved within three intertwined research themes, on the Sensitivity and Variability in the Climate System, Climate-Related Dynamics of Social Systems, and Sustainable Adaption Scenarios. Each theme is structured into multiple projects addressing its sub-objectives. More than 200 researchers from the Universität Hamburg, as well as from connected research centers and partner institutions, are involved, and almost all of them use and, above all, produce new data.

Research data is produced with great effort and is therefore one of the valuable assets of scientific institutions. It is part of good scientific practice to make research data freely accessible and available in the long term as a transparent basis for scientific statements.

Within the interdisciplinary cluster of excellence CliCCS, the types of research data are very diverse. The data range from the output of physical, dynamical ocean and atmosphere models, to measurement data from the coastal area, to survey and interview data from the field of sociology.

The German Climate Computing Center (DKRZ) takes care of the research data management and supports the researchers in creating data management plans, adhering to naming conventions or simply finding the optimal repository in which to publish the data. The goal is not only to store and archive the data in the long term, but also to ensure the quality of the data and thus to facilitate potential reuse.

How to cite: Anders, I., Lammert, A., and Peters, K.: The challenging research data management support in the interdisciplinary cluster of excellence CliCCS, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13285, https://doi.org/10.5194/egusphere-egu2020-13285, 2020.

EGU2020-15358 | Displays | ESSI3.2

EPOS Multi-scale laboratories Data Services & Trans-national access program

Richard Wessels and Otto Lange and the EPOS TCS Multi-scale laboratories Team

EPOS (European Plate Observing System) is an ESFRI Landmark and European Research Infrastructure Consortium (ERIC). The EPOS Thematic Core Service Multi-scale laboratories (TCS MSL) represents a community of European solid Earth sciences laboratories including high temperature and pressure experimental facilities, electron microscopy, micro-beam analysis, analogue tectonic and geodynamic modelling, paleomagnetism, and analytical laboratories.

Participants and collaborating laboratories from Belgium, Bulgaria, France, Germany, Italy, Norway, Portugal, Spain, Switzerland, The Netherlands, and the UK are already organized in the TCS MSL. Unaffiliated European solid Earth sciences laboratories are welcome and encouraged to join the growing TCS MSL community. Members of the TCS MSL are also represented in the EPOS Sustainability Phase (SP).

Laboratory facilities are an integral part of Earth science research. The diversity of methods employed in such infrastructures reflects the multi-scale nature of the Earth system and is essential for the understanding of its evolution, for the assessment of geo-hazards, and for the sustainable exploitation of geo-resources.

Although experimental data from these laboratories often provide the backbone for scientific publications, they are often only available as supplementary information to research articles. As a result, much of the collected data remains unpublished, inaccessible, and often not preserved for the long term.  

The TCS MSL is committed to making Earth science laboratory data Findable, Accessible, Interoperable, and Reusable (FAIR). For this purpose, the TCS MSL has developed an online portal that brings together DOI-referenced data publications from research data repositories related to the TCS MSL context (https://epos-msl.uu.nl/).

In addition, the TCS MSL has developed a Trans-national access (TNA) program that allows researchers and research teams to apply for physical or remote access to the participating EPOS MSL laboratories. Three pilot calls were launched in 2017, 2018, and 2019, with a fourth call scheduled for 2020. The pilot calls were used to develop and refine the EPOS-wide TNA principles and to initialize an EPOS brokering service, where information on each facility offering access will be available for the user and where calls for proposals are advertised. Access to the participating laboratories is currently supported by national funding or in-kind contributions. Based on the EPOS Data policy & TNA General Principles, access to the laboratories is regulated by common rules and a transparent policy, including procedures and mechanisms for application, negotiation, proposal evaluation, user feedback, use of laboratory facilities and data curation.

Access to EPOS Multi-scale laboratories is a unique opportunity to create new synergy, collaboration and innovation, in a framework of trans-national access rules.

An example of such a successful collaboration is between MagIC and EPOS TCS MSL. This collaboration will allow paleomagnetic data and metadata to be exchanged between EPOS and the MagIC (https://www.earthref.org/MagIC) database. Such collaborations are beneficial to all parties involved and support the harmonization and integration of data at a global scale.

How to cite: Wessels, R. and Lange, O. and the EPOS TCS Multi-scale laboratories Team: EPOS Multi-scale laboratories Data Services & Trans-national access program, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15358, https://doi.org/10.5194/egusphere-egu2020-15358, 2020.

EGU2020-18398 | Displays | ESSI3.2

Towards FAIR GNSS data: challenges and open problems

Anna Miglio, Carine Bruyninx, Andras Fabian, Juliette Legrand, Eric Pottiaux, Inge Van Nieuwerburgh, and Dries Moreels

Nowadays, we measure positions on Earth’s surface thanks to Global Navigation Satellite Systems (GNSS) e.g. GPS, GLONASS, and Galileo. Activities such as navigation, mapping, and surveying rely on permanent GNSS tracking stations located all over the world.
The Royal Observatory of Belgium (ROB) maintains and operates a repository containing data from hundreds of GNSS stations belonging to the European GNSS networks (e.g. EUREF, Bruyninx et al., 2019). 

ROB’s repository contains GNSS data that are openly available and rigorously curated. The curation information includes detailed GNSS station descriptions (e.g. location, pictures, and data author) as well as quality indicators for the GNSS observations.

However, funders and research policy makers are progressively asking for data to be made Findable, Accessible, Interoperable, and Reusable (FAIR) and therefore to increase data transparency, discoverability, interoperability, and accessibility.

In particular, within the GNSS community, there is no shared agreement yet on the need for making data FAIR. Therefore, turning GNSS data FAIR presents many challenges and, although FAIR data has been included in EUREF’s strategic plan, no practical roadmap has been implemented so far. We will illustrate the specific difficulties and the need for an open discussion that also includes other communities working on FAIR data.

For example, making GNSS data easily findable and accessible would require attributing persistent identifiers to the data. It is worth noting that the International GNSS Service (IGS) is only now beginning to consider the attribution of DOIs (Digital Object Identifiers) to GNSS data, mainly to allow data citation and acknowledgement of data providers. Some individual GNSS data repositories are using DOIs (such as UNAVCO, USA). Are DOIs the only available option or are there more suitable types of URIs (Uniform Resource Identifiers) to consider?

The GNSS community would greatly benefit from FAIR data practices, as at present, (almost) no licenses have been attributed to GNSS data, data duplication is still an issue, historical provenance information is not available because of data manipulations in data centres, citation of the data providers is far from the rule, etc.

To move further along the path towards FAIR GNSS data, one would need to implement standardised metadata models to ensure data interoperability, but, as several metadata standards are already in use in various scientific disciplines, which one to choose?

Then, to facilitate the reuse (and long-term preservation) of GNSS data, all metadata should be properly linked to the corresponding data and additional metadata, such as provenance and license information. The latter is a good example up for discussion: despite the fact that a ‘CC BY’ license is already assigned to some of the GNSS data, other licenses might need to be enabled.

 

Bruyninx C., Legrand J., Fabian A., Pottiaux E. (2019) “GNSS Metadata and Data Validation in the EUREF Permanent Network”. GPS Sol., 23(4), https://doi.org/10.1007/s10291-019-0880-9

How to cite: Miglio, A., Bruyninx, C., Fabian, A., Legrand, J., Pottiaux, E., Van Nieuwerburgh, I., and Moreels, D.: Towards FAIR GNSS data: challenges and open problems, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18398, https://doi.org/10.5194/egusphere-egu2020-18398, 2020.

The European Plate Observing System EPOS is the single coordinated framework for solid Earth science data, products and services on a European level. As one of the science domain structures within EPOS, EPOS Seismology brings together the three large European infrastructures in seismology, ORFEUS for seismic waveform data & related products, EMSC for parametric earthquake information, and EFEHR for seismic hazard and risk information. Across these three pillars, EPOS Seismology provides services to store, discover and access seismological data and products from raw waveforms to elaborated hazard and risk assessment. The initial data and product contributions come from academic institutions, government offices, or (groups of) individuals, and are generated as part of academic research as well as within officially mandated monitoring or assessment activities. Further products are then elaborated based on those initial inputs by small groups or specific institutions, usually mandated for these tasks by 'the community'. This landscape of coordinated data and products services has evolved in a largely bottom-up fashion over the last decades, and led to a framework of generally free and open data, products and services, for which formats, standards and specifications continue to be emerging and evolving from within the community under a rather loose global coordination.

The advent of FAIR and Open concepts and the push towards their (formalized) implementation from various directions has stirred up this traditional setting. While the obvious benefits of FAIR and Open have been readily accepted in the community, issues and challenges are surfacing in their practical application. How can we ensure (or enforce) appropriate attribution of all involved actors through the whole data life-cycle, and what actually is appropriate? How do we ensure end-to-end reproducibility and where do we draw the practical limits to it? What approach towards licensing should we take for which products and services, and what are the legal / downstream implications? How do we best use identifiers and which ones actually serve the intended purpose? And finally, how do we ensure that effort is rewarded, that best practices are followed, and that misbehavior is identified and potentially sanctioned?

In this contribution we present how the community organization behind EPOS Seismology is discussing these issues, what approaches towards addressing them are being considered, and where we today see the major hurdles on the way towards a truly fair FAIR and Open environment.

How to cite: Haslinger, F. and Consortium, E. S.: Staying fair while being FAIR - challenges with FAIR and Open data and services for distributed community services in Seismology, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18847, https://doi.org/10.5194/egusphere-egu2020-18847, 2020.

Researchers are increasingly expected by funders and journals to make their data available for reuse as a condition of publication. At Springer Nature, we feel that publishers must support researchers in meeting these additional requirements, and must recognise the distinct opportunities data holds as a research output. Here, we outline some of the varied ways that Springer Nature supports research data sharing and report on key outcomes.

Our staff and journals are closely involved with community-led efforts, like the Enabling FAIR Data initiative and the COPDESS 2014 Statement of Commitment [1-4]. The Enabling FAIR Data initiative, which was endorsed in January 2019 by Nature and Scientific Data, and by Nature Geoscience in January 2020, establishes a clear expectation that Earth and environmental sciences data should be deposited in FAIR [5] Data-aligned community repositories, when available (and in general purpose repositories otherwise). In support of this endorsement, Nature and Nature Geoscience require authors to share and deposit their Earth and environmental science data, and Scientific Data has committed to progressively updating its list of recommended data repositories to help authors comply with this mandate.

In addition, we offer a range of research data services, with various levels of support available to researchers in terms of data curation, expert guidance on repositories and linking research data and publications.

We appreciate that researchers face potentially challenging requirements in terms of the ‘what’, ‘where’ and ‘how’ of sharing research data. This can be particularly difficult for researchers to negotiate given the huge diversity of policies across different journals. We have therefore developed a series of standardised data policies, which have now been adopted by more than 1,600 Springer Nature journals.

We believe that these initiatives make important strides in challenging the current replication crisis and addressing the economic [6] and societal consequences of data unavailability. They also offer an opportunity to drive change in how academic credit is measured, through the recognition of a wider range of research outputs than articles and their citations alone. As signatories of the San Francisco Declaration on Research Assessment [7], Nature Research is committed to improving the methods of evaluating scholarly research. Research data in this context offers new mechanisms to measure the impact of all research outputs. To this end, Springer Nature supports the publication of peer-reviewed data papers through journals like Scientific Data. Analyses of citation patterns demonstrate that data papers can be well-cited, and offer a viable way for researchers to receive credit for data sharing through traditional citation metrics. Springer Nature is also working hard to improve support for direct data citation. In 2018 a data citation roadmap developed by the Publishers Early Adopters Expert Group was published in Scientific Data [8], outlining practical steps for publishers to work with data citations and associated benefits in transparency and credit for researchers. Using examples from this roadmap, its implementation and supporting services, we outline how a FAIR-led data approach from publishers can help researchers in the Earth and environmental sciences to capitalise on new expectations around data sharing.

__

  1. https://doi.org/10.1038/d41586-019-00075-3
  2. https://doi.org/10.1038/s41561-019-0506-4
  3. https://copdess.org/enabling-fair-data-project/commitment-statement-in-the-earth-space-and-environmental-sciences/
  4. https://copdess.org/statement-of-commitment/
  5. https://www.force11.org/group/fairgroup/fairprinciples
  6. https://op.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1
  7. https://sfdora.org/read/
  8. https://doi.org/10.1038/sdata.2018.259

How to cite: Smith, G. and Hufton, A.: Beyond article publishing - support and opportunities for researchers in FAIR data sharing, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17073, https://doi.org/10.5194/egusphere-egu2020-17073, 2020.

Many research papers include results based on data that is analyzed using a computational analysis implemented, e.g., in R. Publishing these materials is perceived as being good scientific practice and essential for the scientific progress. For these reasons, organizations that provide funding increasingly demand applicants to outline data and software management plans as part of their research proposals. Furthermore, the author guidelines for paper submissions more often include a section on data availability, and some reviewers reject submissions that do not contain the underlying materials without good reason [1]. This trend towards open and reproducible research puts some pressure on authors to make the source code and data used to produce the computational results in their scientific papers accessible. Despite these developments, publishing reproducible manuscripts is difficult and time-consuming. Moreover, simply providing access to code scripts and data files does not guarantee computational reproducibility [2]. Fortunately, several projects work on applications to assist authors in publishing executable analyses alongside papers considering the requirements of the aforementioned stakeholders. The chief contribution of this poster is a review of software solutions designed to solve the problem of publishing executable computational research results [3]. We compare the applications with respect to aspects that are relevant for the involved stakeholders, e.g., provided features and deployment options, and also critically discuss trends and limitations. This comparison can be used as a decision support by publishers who want to comply with reproducibility principles, editors and program committees who would like to add reproducibility requirements to the author guidelines, applicants of research proposals in the process of creating data and software management plans, and authors looking for ways to distribute their work in a verifiable and reusable manner. We also include properties related to preservation relevant for librarians dealing with long-term accessibility of research materials.

 

References:

1) Stark, P. B. (2018). Before reproducibility must come preproducibility. Nature, 557(7706), 613-614.

2) Konkol, M., Kray, C., & Pfeiffer, M. (2019). Computational reproducibility in geoscientific papers: Insights from a series of studies with geoscientists and a reproduction study. International Journal of Geographical Information Science, 33(2), 408-429.

3) Konkol, M., Nüst, D., & Goulier, L. (2020). Publishing computational research - A review of infrastructures for reproducible and transparent scholarly communication. arXiv preprint arXiv:2001.00484.

How to cite: Konkol, M., Nüst, D., and Goulier, L.: Publishing computational research – A review of infrastructures for reproducible and transparent scholarly communication, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17013, https://doi.org/10.5194/egusphere-egu2020-17013, 2020.

EGU2020-22423 | Displays | ESSI3.2

Geographical scientific publications in ORBi, the ULiège institutional repository: analysis of the socio-economic influencing factors of downloads

Simona Stirbu

EGU2020-19682 | Displays | ESSI3.2

Designing services that are more than FAIR with User eXperience (UX) techniques

Carl Watson, Paulius Tvaranavicius, and Rehan Kaleem

EGU2020-16456 | Displays | ESSI3.2

Data download speed test for CMIP6 model output: preliminary results

Yufu Liu, Zhehao Ren, Karen K.Y. Chan, and Yuqi Bai

The World Climate Research Programme (WCRP) facilitates analysis and prediction of Earth system change for use in a range of practical applications of direct relevance, benefit and value to society. WCRP initialized the Coupled Model Intercomparison Project (CMIP) in 1995. The aim of CMIP is to better understand past, present and future climate changes arising from natural, unforced variability or in response to changes in radiative forcing in a multi-model context.

The volume of climate model output data being produced during this sixth phase of CMIP (CMIP6) is expected to reach 40–60 PB. It is still not clear whether researchers worldwide will experience serious problems when downloading such a huge volume of data. This work addresses the issue by performing data download speed tests for all the CMIP6 data nodes.

A Google Chrome-based data download speed test website (http://speedtest.theropod.tk) was implemented. It leverages the Allow CORS: Access-Control-Allow-Origin extension to access each CMIP6 data node. The test consists of four steps: installing and enabling the Allow CORS extension in Chrome, performing the data download speed test for all the CMIP6 data nodes, presenting the test results, and uninstalling the extension. The speed test is performed by downloading a chunk of a model output data file from the THREDDS data server of each data node.
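
A minimal command-line sketch of the same idea, assuming a hypothetical file URL on a data node’s THREDDS server, is shown below; it times the transfer of a fixed-size byte range with Python’s requests library (servers that ignore the Range header simply return the full file). The URL and chunk size are placeholders, and this is not the browser-based tool described above.

    # Illustrative only: time the download of a byte range from a hypothetical
    # file URL served by a THREDDS data server.
    import time
    import requests

    TEST_URL = "https://esgf.example.org/thredds/fileServer/cmip6/example.nc"  # placeholder
    CHUNK_BYTES = 8 * 1024 * 1024   # request an 8 MB test chunk

    start = time.monotonic()
    resp = requests.get(TEST_URL,
                        headers={"Range": f"bytes=0-{CHUNK_BYTES - 1}"},
                        timeout=30)
    elapsed = time.monotonic() - start

    received = len(resp.content)
    print(f"received {received / 1e6:.1f} MB in {elapsed:.2f} s "
          f"({received / 1e6 / elapsed:.2f} MB/s)")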

Researchers from 11 countries have performed this test in 24 cities against all 26 CMIP6 data nodes. The fastest transfer speed was 124 MB/s, and the slowest was 0 MB/s because of connection timeouts. Data transfer speeds in developed countries (United States, the Netherlands, Japan, Canada, Great Britain) are significantly faster than those in developing countries (China, India, Russia, Pakistan). In developed countries the mean data transfer speed is roughly 80 Mb/s, equal to the median US residential broadband speed provided by cable or fiber (FCC Measuring Fixed Broadband - Eighth Report), but in developing countries the mean transfer speed is usually much slower, roughly 9 Mb/s. Data transfer speed was significantly faster when the data node and test site were both in developed countries, for example when downloading data from IPSL, DKRZ or GFDL at Wolvercote, UK.

Although further tests are definitely needed, these preliminary results clearly show that actual data download speeds vary dramatically across countries and data nodes. This suggests that ensuring smooth access to CMIP6 data is still challenging.

How to cite: Liu, Y., Ren, Z., Chan, K. K. Y., and Bai, Y.: Data download speed test for CMIP6 model output: preliminary results, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-16456, https://doi.org/10.5194/egusphere-egu2020-16456, 2020.

ESSI3.5 – Breaking down the silos: enabling Open and convergent research and e-infrastructures to answer global challenges

EGU2020-13497 | Displays | ESSI3.5 | Highlight

e-infrastructures and natural hazards. The Center of Excellence for Exascale in Solid Earth (ChEESE)

Arnau Folch, Josep de la Puente, Laura Sandri, Benedikt Halldórsson, Andreas Fichtner, Jose Gracia, Piero Lanucara, Michael Bader, Alice-Agnes Gabriel, Jorge Macías, Finn Lovholt, Alexandre Fournier, Vadim Monteiller, and Soline Laforet

The Center of Excellence for Exascale in Solid Earth (ChEESE; https://cheese-coe.eu) is promoting the use of upcoming Exascale and extreme performance computing capabilities in the area of Solid Earth by harnessing institutions in charge of operational monitoring networks, tier-0 supercomputing centers, academia, hardware developers and third parties from SMEs, industry and public governance. The challenging scientific ambition is to prepare 10 European open-source flagship codes to solve Exascale problems in computational seismology, magnetohydrodynamics, physical volcanology, tsunamis, and data analysis. Preparation for Exascale considers inter-kernel aspects of simulation workflows such as data management and sharing following the FAIR principles, I/O, post-processing and visualization. The project is articulated around 12 Pilot Demonstrators (PDs) in which flagship codes are used for near real-time seismic simulations and full-wave inversion, ensemble-based volcanic ash dispersal forecasts, faster-than-real-time tsunami simulations and physics-based hazard assessments for earthquakes, volcanoes and tsunamis. This is a first step towards enabling operational e-services requiring extreme HPC for urgent computing, early warning forecasts of geohazards, hazard assessment and data analytics. Additionally, and in collaboration with the European Plate Observing System (EPOS), ChEESE will promote and facilitate the integration of HPC services to widen the access to codes and foster the transfer of know-how to Solid Earth user communities. In this regard, the project aims at acting as a hub to foster HPC across the Solid Earth Community and related stakeholders and to provide specialized training on services and capacity building measures.

How to cite: Folch, A., de la Puente, J., Sandri, L., Halldórsson, B., Fichtner, A., Gracia, J., Lanucara, P., Bader, M., Gabriel, A.-A., Macías, J., Lovholt, F., Fournier, A., Monteiller, V., and Laforet, S.: e-infrastructures and natural hazards. The Center of Excellence for Exascale in Solid Earth (ChEESE), EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13497, https://doi.org/10.5194/egusphere-egu2020-13497, 2020.

EGU2020-18279 | Displays | ESSI3.5

Building a Multimodal topographic dataset for flood hazard modelling and other geoscience applications

Dietmar Backes, Norman Teferle, and Guy Schumann

In remote sensing, benchmark and CalVal datasets are routinely provided by learned societies and professional organisations such as the Committee for Earth Observation Satellites (CEOS), European Spatial Data Research (EuroSDR) and International Societies for Photogrammetry and Remote Sensing (ISPRS). Initiatives are often created to serve specific research needs. Many valuable datasets disappear after the conclusion of such projects even though the original data or the results of these investigations might have significant value to other scientific communities that might not have been aware of the projects. Initiatives such as FAIR data (Findable, Accessible, Interoperable, Re-usable) or the European Open Science Cloud (EOSC) aim to overcome this situation and preserve scientific data sets for wider scientific communities.

Motivated by increased public interest following the emerging effects of climate change on local weather and rainfall patterns, the field of urban flood hazard modelling has developed rapidly in recent years. New sensors and platforms, from highly agile Earth Observation (EO) satellites to small low-altitude drones and terrestrial mobile mapping systems, are able to provide high-resolution topographic data. The question arises as to which type of topographic information is most suitable for realistic and accurate urban flood modelling, and whether current methodologies are able to exploit the increased level of detail contained in such data.

In the presented project, we aim to assemble a topographic research data repository to provide multimodal 3D datasets to optimise and benchmark urban flood modelling. The test site chosen is located in the south of Luxembourg in the municipality of Dudelange, which provides a typical European landscape with rolling hills and urban, agricultural but also renaturalised areas over a local stream catchment. The region has been affected by flash flooding following heavy rainfall events in the past.

The assembled datasets were derived from LiDAR and photogrammetric methodologies and consist of topographic surface representations ranging from medium-resolution DEMs with 10 m GSD to highly dense point clouds derived from drone photogrammetry. The data were collected from spaceborne, traditional airborne, low-altitude drone as well as terrestrial platforms. The datasets are well documented with adequate meta information describing their origin, currency, quality and accuracy. Raw data are provided where intellectual property rights permit dissemination. Terrain models and point clouds are generally cleaned of blunders using standard methods and manual inspection. However, elaborate cleaning and filtering is left to the investigators to allow optimisation towards the requirements of their methodologies. Additional value-added terrain representations, e.g. generated through data fusion approaches, are also provided.

It is the intention of the project team to create a ‘living data set’ following the FAIR data principles. The expensive and comprehensive data set collected for flood hazard mapping could also be valuable to other scientific communities. Results of ongoing work should be integrated, and newly collected data layers will keep the research repository relevant and up to date. Sharing this well-maintained dataset amongst any interested research community will maximize its value.

How to cite: Backes, D., Teferle, N., and Schumann, G.: Building a Multimodal topographic dataset for flood hazard modelling and other geoscience applications, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18279, https://doi.org/10.5194/egusphere-egu2020-18279, 2020.

EGU2020-12417 | Displays | ESSI3.5

GAGE Facility Geodetic Data Archive: Discoverability, Accessibility, Interoperability & Attribution

James Riley, Charles Meertens, David Mencin, Kathleen Hodgkinson, Douglas Ertz, David Maggert, Dan Reiner, Christopher Crosby, and Scott Baker

The U.S. National Science Foundation’s Geodesy Advancing Geosciences (GAGE) Facility, operated by UNAVCO, is building systems and adopting practices in support of more comprehensive data discovery, search and access capabilities across our various geodetic data holdings. As a World Data Center, the GAGE Facility recognizes the need for interoperability of its Earth data holdings in its archives, as represented by the FAIR Data Principles. To this end, web services, both as back-end and front-end resources, are being developed to provide new and enhanced capabilities. 

UNAVCO is exploring international standards, such as ISO Geographic information Metadata and the Open Geospatial Consortium’s (OGC) web services, that have been in development for decades to help facilitate interoperability. Through various collaborations, UNAVCO seeks to develop and promote infrastructure, metadata, and interoperability standards for the community. We are participating in the development of the next version of GeodesyML, led by Geoscience Australia, which will leverage standards and help codify metadata practices for the geodetic community. New web technologies, like Linked Data, are arising to augment these standards and provide greater connectivity and interoperability of structured data; UNAVCO has implemented Schema.org metadata for its datasets and partnered with EarthCube’s Project 418/419 and Google Dataset Search. Persistent identifiers are being adopted, with DOIs for datasets, while RORs for organizational affiliation and ORCID iDs for identity and access management and usage metrics are being explored. As UNAVCO investigates these various technologies and practices, which remain in various states of acceptance and implementation, we share our experiences to date.
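
As an example of the Schema.org approach, the sketch below builds a minimal, hypothetical Dataset description of the kind that can be embedded as JSON-LD in a dataset landing page for dataset search indexing; the names, identifiers and URLs are placeholders rather than actual GAGE Facility records.

    # Illustrative only: a minimal Schema.org "Dataset" description rendered
    # as JSON-LD. All values are placeholders.
    import json

    dataset_jsonld = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": "Example continuous GNSS station position time series",
        "description": "Daily position estimates for an example GNSS station.",
        "identifier": "https://doi.org/10.xxxx/example",    # placeholder DOI
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "creator": {"@type": "Organization", "name": "Example Facility"},
        "distribution": {
            "@type": "DataDownload",
            "contentUrl": "https://data.example.org/example_station.csv",
            "encodingFormat": "text/csv",
        },
    }

    print(json.dumps(dataset_jsonld, indent=2))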

How to cite: Riley, J., Meertens, C., Mencin, D., Hodgkinson, K., Ertz, D., Maggert, D., Reiner, D., Crosby, C., and Baker, S.: GAGE Facility Geodetic Data Archive: Discoverability, Accessibility, Interoperability & Attribution, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12417, https://doi.org/10.5194/egusphere-egu2020-12417, 2020.

EGU2020-12193 | Displays | ESSI3.5

Putting Data to Work: ESIP and EarthCube working together to transform geoscience

Erin Robinson

EGU2020-12718 | Displays | ESSI3.5

Converging Seismic and Geodetic Data Services

Jerry A Carter, Charles Meertens, Chad Trabant, and James Riley

One of the fundamental tenets of the Incorporated Research Institutions for Seismology’s (IRIS’s) mission is to “Promote exchange of seismic and other geophysical data … through pursuing policies of free and unrestricted data access.”  UNAVCO also adheres to a data policy that promotes free and unrestricted use of data.  A major outcome of these policies has been to reduce the time that researchers spend finding, obtaining, and reformatting data.  While rapid, easy access to large archives of data has been successfully achieved in seismology, geodesy and many other distinct disciplines, integrating different data types in a converged data center that promotes interdisciplinary research remains a challenge.  This challenge will be addressed in an integrated seismological and geodetic data services facility that is being mandated by the National Science Foundation (NSF).  NSF’s Seismological Facility for the Advancement of Geoscience (SAGE), which is managed by IRIS, will be integrated with NSF’s Geodetic Facility for the Advancement of Geoscience (GAGE), which is managed by UNAVCO.  The combined data services portion of the facility, for which a prototype will be developed over the next two to three years, will host a number of different data types including seismic, GNSS, magnetotelluric, SAR, infrasonic, hydroacoustic, and many others.  Although IRIS and UNAVCO have worked closely for many years on mutually beneficial projects and have shared their experience with each other, combining the seismic and geodetic data services presents challenges to the well-functioning SAGE and GAGE data facilities that have served their respective scientific communities for more than 30 years. This presentation describes some preliminary thoughts and guiding principles to ensure that we build upon the demonstrated success of both facilities and how an integrated GAGE and SAGE data services facility might address the challenges of fostering interdisciplinary research. 
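
As an example of the kind of open web-service access already offered on the seismological side, the short sketch below retrieves a waveform segment from the existing IRIS FDSN web services using ObsPy; the station, channel and time window are arbitrary examples, and this is not the interface of the future integrated facility.

    # Illustrative only: fetch a 10-minute waveform segment from the IRIS FDSN
    # web services with ObsPy. Codes and times are example values.
    from obspy import UTCDateTime
    from obspy.clients.fdsn import Client

    client = Client("IRIS")
    t0 = UTCDateTime("2020-01-01T00:00:00")
    st = client.get_waveforms(network="IU", station="ANMO", location="00",
                              channel="BHZ", starttime=t0, endtime=t0 + 600)
    print(st)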

How to cite: Carter, J. A., Meertens, C., Trabant, C., and Riley, J.: Converging Seismic and Geodetic Data Services, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12718, https://doi.org/10.5194/egusphere-egu2020-12718, 2020.

AuScope is the national provider of research infrastructure to the earth and geospatial sciences communities in Australia. Funded through the NCRIS scheme since 2006 we have invested heavily in a diverse suite of infrastructures in that time, from VLBI telescopes to geochronology laboratories, and national geophysical data acquisitions to development of numerical simulation and inversion codes.

Each of these programs, and the communities they support, has different requirements relating to data structures, data storage, compute and access, and as a result there has been a tendency in the past to build bespoke, discipline-specific data systems. This approach limits the opportunities for cross-domain research activity and investigation.

AuScope recently released our plans to build an Australian Downward Looking Telescope (or DLT).  This will be a distributed observational, characterisation and computational infrastructure providing the capability for Australian geoscientists to image and understand the composition of the Australian Plate with unprecedented fidelity.

The recent development of an investment plan for the construction of this National Research Infrastructure has allowed our community to reassess existing data delivery strategies and architectures to bring them in line with current international best practice.

Here we present the proposed e-infrastructure that will underpin the DLT. This FAIR data platform will facilitate open and convergent research across the geosciences and will underpin efforts currently underway to connect international research infrastructures, including EPOS, AuScope, IRIS and others, to create a global research infrastructure network for earth science.

How to cite: Rawling, T.: Geoscience data interoperability through a new lens: how designing a telescope that looks down changed our view of data., EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12673, https://doi.org/10.5194/egusphere-egu2020-12673, 2020.

EGU2020-18310 | Displays | ESSI3.5

Evolution and Future Architecture for the Earth System Grid Federation

Philip Kershaw, Ghaleb Abdulla, Sasha Ames, Ben Evans, Tom Landry, Michael Lautenschlager, Venkatramani Balaji, and Guillaume Levavasseur

The Earth System Grid Federation (ESGF) is a globally distributed e-infrastructure for the hosting and dissemination of climate-related data.  ESGF was originally developed to support the community in the analysis of CMIP5 (5th Coupled Model Intercomparison Project) data in support of the 5th Assessment report made by the IPCC (Intergovernmental Panel on Climate Change).  Recognising the challenge of the large volumes of data concerned and the international nature of the work, a federated system was developed linking together a network of collaborating data providers around the world. This enables users to discover, download and access data through a single unified system such that they can seamlessly pull data from these multiple hosting centres via a common set of APIs.  ESGF has grown to support over 16000 registered users and besides the CMIPs, supports a range of other projects such as the Energy Exascale Earth System Model, Obs4MIPS, CORDEX and the European Space Agency’s Climate Change Initiative Open Data Portal.

Over the course of its evolution, ESGF has pioneered technologies and operational practice for distributed systems, including solutions for federated search, metadata modelling and capture, identity management and large-scale replication of data. Now that ESGF is in its tenth year of operation, a major review of the system architecture is underway. For this next generation system, we will be drawing from our experience and lessons learnt running an operational e-infrastructure but also considering other similar systems and initiatives. These include, for example, ESA’s Earth Observation Exploitation Platform Common Architecture, outputs from recent OGC Testbeds, and Pangeo (https://pangeo.io/), a community and software stack for the geosciences. Drawing from our own recent pilot work, we look at the role of cloud computing, with its impact on deployment practice and hosting architecture, but also new paradigms for massively parallel data storage and access, such as object store. The cloud also offers a potential point of entry for scientists without access to large-scale computing, analysis, and network resources. As trusted international repositories, the major national computing centres that host and replicate large corpuses of ESGF data have increasingly been supporting a broader range of domains and communities in the Earth sciences. We explore the critical role of standards for connecting data and the application of FAIR data principles to ensure free and open access and interoperability with other similar systems in the Earth Sciences.
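
As an example of the existing federated search interface, the sketch below queries the esg-search endpoint of one public ESGF index node for a few CMIP6 datasets; the node URL and facet values are illustrative, and any ESGF index node exposes the same API.

    # Illustrative only: query the ESGF federated search API for CMIP6 datasets.
    import requests

    params = {
        "project": "CMIP6",
        "variable_id": "tas",           # near-surface air temperature
        "experiment_id": "historical",
        "format": "application/solr+json",
        "limit": 5,
    }
    resp = requests.get("https://esgf-node.llnl.gov/esg-search/search",
                        params=params, timeout=30)
    resp.raise_for_status()
    for doc in resp.json()["response"]["docs"]:
        print(doc["id"])                # dataset identifiers returned by the index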

How to cite: Kershaw, P., Abdulla, G., Ames, S., Evans, B., Landry, T., Lautenschlager, M., Balaji, V., and Levavasseur, G.: Evolution and Future Architecture for the Earth System Grid Federation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18310, https://doi.org/10.5194/egusphere-egu2020-18310, 2020.

The geoscience disciplines are either gathering or generating data in ever-increasing volumes. To ensure that the science community and society reap the utmost benefits in research and societal applications from such rich and diverse data resources, there is a growing interest in broad-scale, open data sharing to foster myriad scientific endeavors. However, open access to data is not sufficient; research outputs must be reusable and reproducible to accelerate scientific discovery and catalyze innovation.

As part of its mission, Unidata, a geoscience cyberinfrastructure facility, has been developing and deploying data infrastructure and data-proximate scientific workflows and analysis tools using cloud computing technologies for accessing, analyzing, and visualizing geoscience data.

Specifically, Unidata has developed techniques that combine robust access to well-documented datasets with easy-to-use tools, using workflow technologies. In addition to fostering the adoption of technologies like pre-configured virtual machines through Docker containers and Jupyter notebooks, other computational and analytic methods are enabled via “Software as a Service” and “Data as a Service” techniques with the deployment of the Cloud IDV, AWIPS Servers, and the THREDDS Data Server in the cloud. The collective impact of these services and tools is to enable scientists to use the Unidata Science Gateway capabilities not only to conduct their research but also to share and collaborate with other researchers, advancing the intertwined goals of Reproducibility of Science and Open Science and, in the process, truly enabling “Science as a Service”.
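
For example, a minimal sketch of data-proximate access through a THREDDS Data Server, using Unidata’s Siphon library against the publicly accessible Unidata THREDDS catalogue, might look as follows; the catalogue URL is illustrative and any TDS catalogue could be substituted.

    # Illustrative only: browse a THREDDS Data Server catalogue with Siphon.
    from siphon.catalog import TDSCatalog

    cat = TDSCatalog("https://thredds.ucar.edu/thredds/catalog/catalog.xml")
    print(list(cat.catalog_refs)[:5])            # a few top-level catalogue references

    # Descend into the first referenced catalogue and show how any datasets it
    # contains advertise their access services (OPeNDAP, HTTPServer, etc.).
    sub = list(cat.catalog_refs.values())[0].follow()
    for ds in list(sub.datasets.values())[:3]:
        print(ds.name, sorted(ds.access_urls))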

Unidata has implemented the aforementioned services on the Unidata Science Gateway (http://science-gateway.unidata.ucar.edu), which is hosted on the Jetstream cloud, a cloud-computing facility funded by the U.S. National Science Foundation. The aim is to give geoscientists an ecosystem that includes data, tools, models, workflows, and workspaces for collaboration and sharing of resources.

In this presentation, we will discuss our work to date in developing the Unidata Science Gateway and the services hosted therein, as well as our future directions in response to the increasing expectations from funders and scientific communities that data and services be Open and FAIR (Findable, Accessible, Interoperable, Reusable). In particular, we will discuss how Unidata is advancing data and software transparency, open science, and reproducible research. We will share our experiences of how the geoscience and information science communities are using the data, tools and services provided through the Unidata Science Gateway to advance research and education in the geosciences.

How to cite: Ramamurthy, M.: A Cloud-based Science Gateway for Enabling Science as a Service to Facilitate Open Science and Reproducible Research, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10761, https://doi.org/10.5194/egusphere-egu2020-10761, 2020.

EGU2020-20708 | Displays | ESSI3.5

ENVRI knowledge base: A community knowledge base for research, innovation and society

XiaoFeng Liao, Doron Goldfarb, Barbara Magagna, Markus Stocker, Peter Thijsse, Dick Schaap, and Zhiming Zhao

The Horizon 2020 ENVRI-FAIR project brings together 14 European environmental research infrastructures (ENVRI) to develop solutions to improve the FAIRness of their data and services, and eventually to connect the ENVRI community with the European Open Science Cloud (EOSC). It is thus essential to share the reusable solutions while RIs are tackling common challenges in improving their FAIRness, and to continually assess the FAIRness of ENVRI (meta)data services as they are developed. 
The FAIRness assessment is, however, far from trivial. On the one hand, the task relies on gathering the required information from RIs, e.g. information about the metadata and data repositories operated by RIs, the kind of metadata standards repositories implement, the use of persistent identifier systems. Such information is gathered using questionnaires whose processing can be time-consuming. On the other hand, to enable efficient querying, processing and analysis, the information needs to be machine-actionable and curated in a knowledge base.
Besides acting as a general resource to learn about RIs, the ENVRI knowledge base (KB) supports RI managers in identifying current gaps in their RI’s implementation of the FAIR Data Principles. For instance, an RI manager can interrogate the KB to discover whether a data repository of the RI uses a persistent identifier service or if the repository is certified according to some scheme. Once a gap has been identified, the KB can support the RI manager in exploring the solutions implemented by other RIs.
By linking questionnaire information to training resources, the KB also supports the discovery of materials that provide hands-on demonstrations for how state-of-the-art technologies can be used and implemented to address FAIR requirements. For instance, if an RI manager discovers that the metadata of one of the RI’s repositories does not include machine-readable provenance, the ENVRI KB can inform the manager about available training material demonstrating how the PROV Ontology can be used to implement machine-readable provenance in systems. Such demonstrators can be highly actionable as they can be implemented in Jupyter and executed with services such as mybinder. Thus, the KB can seamlessly integrate the state of FAIR implementation in RIs with actionable training material and is therefore a resource that is expected to contribute substantially to improving ENVRI FAIRness.
The ENVRI KB is implemented using the W3C Recommendations developed within the Semantic Web Activity, specifically RDF, OWL, and SPARQL. To effectively expose its content to RI communities, ranging from scientists to managers, and to other stakeholders, the ENVRI KB will need a customisable user interface for context-aware information discovery, visualisation, and content update. The current prototype can be accessed at kb.oil-e.net.
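
As an illustration of how such machine-actionable content might be queried, the sketch below sends a SPARQL query from Python; the endpoint URL, vocabulary namespace and property names are invented placeholders and do not reflect the actual ENVRI KB schema.

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Placeholder endpoint and vocabulary: adjust to the real KB schema.
    sparql = SPARQLWrapper("https://kb.example.org/sparql")
    sparql.setQuery("""
        PREFIX ex: <http://example.org/envri#>
        SELECT ?repository ?pidSystem WHERE {
            ?repository a ex:DataRepository ;
                        ex:usesPersistentIdentifierSystem ?pidSystem .
        } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print(row["repository"]["value"], row["pidSystem"]["value"])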

How to cite: Liao, X., Goldfarb, D., Magagna, B., Stocker, M., Thijsse, P., Schaap, D., and Zhao, Z.: ENVRI knowledge base: A community knowledge base for research, innovation and society, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20708, https://doi.org/10.5194/egusphere-egu2020-20708, 2020.

EGU2020-19050 | Displays | ESSI3.5

EPOS ICS Data Portal

Carmela Freda, Rossana Paciello, Jan Michalek, Kuvvet Atakan, Daniele Bailo, Keith Jeffery, Matt Harrison, Massimo Cocco, and Epos Team

The European Plate Observing System (EPOS) addresses the problem of homogeneous access to heterogeneous digital assets in geoscience of the European tectonic plate. Such access opens new research opportunities. Previous attempts have been limited in scope and required much human intervention. EPOS adopts an advanced Information and Communication Technologies (ICT) architecture driven by a catalog of rich metadata. The architecture, together with the challenges encountered and the solutions adopted, is presented. The EPOS ICS Data Portal introduces a new way of working for cross-disciplinary research, and this multidisciplinary approach opens new possibilities for both students and teachers. The EPOS portal can be used either to explore the available datasets or to facilitate the research itself. It can also be very instructive in teaching, for example by demonstrating scientific use cases.

EPOS is a European project building a pan-European infrastructure for accessing solid Earth science data. The now-completed EPOS-IP project included 47 partners plus 6 associate partners from 25 countries across Europe, as well as several international organizations. However, the community contributing to the EPOS integration plan is larger than the official partnership of the EPOS-IP project, because additional countries are represented through the international organizations and because several research institutions are involved within each country.

The recently developed EPOS ICS Data Portal provides access to data and data products from ten different geoscientific areas: Seismology, Near Fault Observatories, GNSS Data and Products, Volcano Observations, Satellite Data, Geomagnetic Observations, Anthropogenic Hazards, Geological Information and Modelling, Multi-scale laboratories and Geo-Energy Test Beds for Low Carbon Energy.

The presentation focusses on the EPOS ICS Data Portal, which provides information about the datasets available from the Thematic Core Services (TCS) and access to them. We demonstrate not only the features of the graphical user interface but also the underlying architecture of the whole system.

How to cite: Freda, C., Paciello, R., Michalek, J., Atakan, K., Bailo, D., Jeffery, K., Harrison, M., Cocco, M., and Team, E.: EPOS ICS Data Portal, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19050, https://doi.org/10.5194/egusphere-egu2020-19050, 2020.

EGU2020-18842 | Displays | ESSI3.5

EPOS-Norway – Integration of Norwegian geoscientific data into a common e-infrastructure

Jan Michalek, Kuvvet Atakan, Christian Rønnevik, Tor Langeland, Ove Daae Lampe, Gro Fonnes, Svein Mykkeltveit, Jon Magnus Christensen, Ulf Baadshaug, Halfdan Pascal Kierulf, Bjørn-Ove Grøtan, and Odleiv Olesen

The European Plate Observing System (EPOS) is a European project building a pan-European infrastructure for accessing solid Earth science data, now governed by EPOS ERIC (European Research Infrastructure Consortium). The EPOS-Norway project (EPOS-N; RCN Infrastructure Programme, Project no. 245763) is a Norwegian project funded by the Research Council of Norway. The aims of the EPOS-N project are divided into four work packages, one of which concerns integrating Norwegian geoscientific data into an e-infrastructure. The other three work packages cover management of the project, improving geoscientific monitoring in the Arctic, and establishing a Solid Earth Science Forum to communicate progress within the geoscientific community and to provide feedback to the development group of the e-infrastructure.

Among the six EPOS-N project partners, five institutions actively participate and provide data – the University of Bergen (UIB), the University of Oslo (UIO), the Norwegian Mapping Authority (NMA), the Geological Survey of Norway (NGU) and NORSAR. The data to be integrated are divided into categories according to thematic field: seismology, geodesy, geological maps and geophysical data. Before the data can be integrated into the e-infrastructure, their formats need to follow the international standards already developed by the geoscientific communities around the world. Metadata are stored in the Granularity Database tool and are easily accessible to other tools via a dedicated API. At present, 33 Data, Data Products, Software and Services (DDSS) are described in the EPOS-N list.

We present the Norwegian approach to integrating geoscientific data into the e-infrastructure, closely following the EPOS ERIC development. The sixth partner in the project – NORCE Norwegian Research Centre AS – specializes in data visualization and is developing the EPOS-N Portal, a web-based graphical user interface built on the Enlighten-web software that allows users to visualize and analyze cross-disciplinary data. Expert users can launch the visualization software through a web-based programming interface (Jupyter Notebook) for processing of the data. The seismological waveform data (provided by UIB and NORSAR) will be available through the EIDA system, seismological data products (receiver functions, earthquake catalogues and macroseismic observations) as individual datasets or through a web service, GNSS data products (provided by NMA) as standalone files, and geological and geophysical (magnetic, gravity anomaly) maps (provided by NGU) as WMS web services or standalone files. Integration of some specific geophysical data, such as georeferenced cross-sections, which are of particular interest for visualization together with other geoscientific data, is still under discussion.
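
To give a flavour of the kind of standardized access foreseen for the waveform data, the sketch below retrieves an hour of data through an FDSN/EIDA-style web service using ObsPy; the node name and the network, station and channel codes are illustrative assumptions only.

    from obspy import UTCDateTime
    from obspy.clients.fdsn import Client

    # Illustrative request against an FDSN/EIDA-style service.
    client = Client("GFZ")  # any EIDA node known to ObsPy could be used here
    t0 = UTCDateTime("2019-06-01T00:00:00")
    stream = client.get_waveforms(network="NS", station="BER", location="*",
                                  channel="BHZ", starttime=t0, endtime=t0 + 3600)
    print(stream)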

Constant user feedback is achieved through dedicated workshops. Various use cases are defined by users and have been tested in these workshops. Collected feedback is being used for further development and improvements of the EPOS-N Portal.

How to cite: Michalek, J., Atakan, K., Rønnevik, C., Langeland, T., Lampe, O. D., Fonnes, G., Mykkeltveit, S., Magnus Christensen, J., Baadshaug, U., Kierulf, H. P., Grøtan, B.-O., and Olesen, O.: EPOS-Norway – Integration of Norwegian geoscientific data into a common e-infrastructure, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18842, https://doi.org/10.5194/egusphere-egu2020-18842, 2020.

EGU2020-7627 | Displays | ESSI3.5

NGIC: turning concepts into reality

Nikolay Miloshev, Petya Trifonova, Ivan Georgiev, Tania Marinova, Nikolay Dobrev, Violeta Slabakova, Velichka Milusheva, and Todor Gurov

The National Geo-Information Center (NGIC) is a distributed research infrastructure funded by the Bulgarian National Roadmap for Scientific Infrastructure (2017-2023). It operates across a variety of disciplines, such as geophysics, geology, seismology, geodesy, oceanology, climatology and soil science, providing data products and services. Created as a partnership between four institutes working in the field of Earth observation – the National Institute of Geophysics, Geodesy and Geography (NIGGG), the National Institute of Meteorology and Hydrology (NIMH), the Institute of Oceanology (IO) and the Geological Institute (GI) – and two institutes competent in ICT – the Institute of Mathematics and Informatics (IMI) and the Institute of Information and Communication Technologies (IICT) – the NGIC consortium serves as the primary community of data collectors for national geoscience research. Beyond science, NGIC aims to support decision makers in the prevention and protection of the population from natural and anthropogenic risks and disasters.

Individual NGIC partners originated independently and differ from one another in management and disciplinary scope. The conceptual model of the NGIC system architecture is therefore based on a federated structure in which the partners retain their independence and contribute to the development of the common infrastructure through their data and the research they carry out. The basic conceptual model of the architecture uses both service and microservice concepts and may be altered according to the specifics of the organizational environment and the development goals of the NGIC information system. It consists of three layers: the “Sources” layer, containing the providers of Data, Data products, Services and Software (DDSS); the “Interoperability” layer, regulating access, automating the discovery and selection of DDSS, and collecting data from the sources; and the “Integration” layer, which produces integrated data products.

The diversity of NGIC’s data, data products, and services is a major strength and of high value to its users like governmental institutions and agencies, research organizations and universities, private sector enterprises, media and the public. NGIC will pursue collaboration with initiatives, projects and research infrastructures for Earth observation to enhance access to an integrated global data resource.

How to cite: Miloshev, N., Trifonova, P., Georgiev, I., Marinova, T., Dobrev, N., Slabakova, V., Milusheva, V., and Gurov, T.: NGIC: turning concepts into reality, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7627, https://doi.org/10.5194/egusphere-egu2020-7627, 2020.

EGU2020-22592 | Displays | ESSI3.5

Federated and intelligent datacubes

Chris Atherton, Peter Löwe, and Torsten Heinen

As a species, we face unprecedented environmental challenges that threaten our existing way of life. We are still learning to understand our planet, although we have a good idea of how it works. The pace of research needs to accelerate to provide information to decision makers and to respond better to our societal challenges. To do this we need to move towards leveraging large datasets to speed up research, as proposed by Jim Gray in ‘The Fourth Paradigm’. In the world of research infrastructures we need to provide a means for scientists to access vast amounts of research data from multiple data sources in an easy and efficient way. The European Open Science Cloud (EOSC) is addressing this, but we are only scratching the surface when it comes to unleashing the full potential of the scientific community. Datacubes have recently emerged as a technology in the Environmental and Earth system domain for storing imagery data in a way that makes it easier and quicker for scientists to perform their research. But at the data volumes now being considered, there are many challenges in curating, hosting, and funding this information at a single centralised centre. Our proposal seeks to leverage the existing infrastructures of the National Research and Education Networks (NRENs) to store national repositories of regional Environmental and Earth system domain data, and to share these with scientists in an open, federated but secure way, conforming to FAIR principles. This would provide redundancy, data sovereignty and scalability for hosting global environmental datasets in an exascale world.

How to cite: Atherton, C., Löwe, P., and Heinen, T.: Federated and intelligent datacubes, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22592, https://doi.org/10.5194/egusphere-egu2020-22592, 2020.

Driven by the European INSPIRE Directive, which establishes an infrastructure for spatial information in Europe, the number of national data sources in Europe that are open to the public, or at least to science, continues to grow.

However, challenges remain in enabling easy access for society and science to these previously unavailable data silos through standardized web services, as defined by the Open Geospatial Consortium (OGC). This is crucial to ensure sustainable data generation and reuse according to the FAIR principles (Findable, Accessible, Interoperable, Reusable).

We report on an interdisciplinary application, using spatial data to improve longitudinal surveys in the social sciences, involving building plans encoded in CityGML, PostGIS, MapServer and R.

The Socio-economic Panel (SOEP), part of the German Institute for Economic Research (DIW Berlin), provides longitudinal data on persons living in private households across Germany. Recently, the SOEP sampled households in certain neighborhoods within cities, areas of the so-called „Soziale Stadt“ (social town). Because these areas are spatially restricted, spatially referenced data had to be used. Information at the level of census tiles provided by the Federal Statistical Office was used to form regional clusters.

Within these clusters, addresses spatially referenced by the German Federal Agency for Cartography and Geodesy (BKG) have been sampled. In this way, we made sure the addresses lie within the neighborhoods to be surveyed. As this procedure turned out to reduce the organizational burden for the survey research institute as well as for the interviewers, while at the same time allowing random household samples to be generated, it is being considered for future use. Yet addresses can belong to residential buildings as well as to cinemas or hotels.

To address this obstacle, we evaluate the use of the 3D Building Models provided by the German Federal Agency for Cartography and Geodesy (BKG). These data are distributed as compressed archives for the 16 German states, each containing a very large number of CityGML files with LoD1 data sets for buildings. The large storage footprint of these data sets makes their reuse by social scientists using standard statistical software (such as R or Stata) on desktop computers difficult at best. This is overcome by providing flexible access to Areas of Interest (AOI) through OGC web services (WMS/WFS) backed by a PostGIS database. The ingestion process is based on the new GMLAS driver of the GDAL/OGR software project for Complex Features encoded in the Geography Markup Language (GML) according to application schemas.
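
A minimal sketch of how an Area of Interest might be requested from such a WFS with Python is given below; the service URL, feature type name, bounding box and output format are placeholders rather than the actual endpoint of the service described here.

    from owslib.wfs import WebFeatureService

    # Placeholder WFS endpoint backed by the PostGIS database.
    wfs = WebFeatureService(url="https://geodata.example.org/buildings/wfs",
                            version="2.0.0")
    response = wfs.getfeature(typename=["bldg:Building"],
                              bbox=(13.30, 52.45, 13.35, 52.50),  # lon/lat AOI
                              outputFormat="application/gml+xml; version=3.2")
    with open("aoi_buildings.gml", "wb") as f:
        f.write(response.read())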

How to cite: Löwe, P., Gebel, T., and Steinhauer, H. W.: From Silos to FAIR Services: Interoperable application of geospatial data for longitudinal surveys in the Social Sciences., EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18956, https://doi.org/10.5194/egusphere-egu2020-18956, 2020.

EGU2020-10778 | Displays | ESSI3.5

Implementing FAIR Principles in the IPCC Assessment Process

Martin Juckes, Anna Pirani, Charlotte Pascoe, Robin Matthews, Martina Stockhause, Bob Chen, and Xing Xiaoshi

The Assessment Reports of the Intergovernmental Panel on Climate Change (IPCC) have provided the scientific basis underpinning far-reaching policy decisions. The reports also have a huge influence on public debate about climate change. The IPCC is responsible neither for the evaluation of climate data and related emissions and socioeconomic data and scenarios, nor for the provision of advice on policy (reports must be “neutral, policy-relevant but not policy-prescriptive”). These omissions may appear unreasonable at first sight, but they are part of the well-tested structure that enables the creation of authoritative reports on the complex and sensitive subject of climate change. The responsibility for evaluation of climate data and related emissions and socioeconomic data and scenarios remains with the global scientific community. The IPCC has the task of undertaking an expert, objective assessment of the state of scientific knowledge as expressed in the scientific literature. The exclusion of responsibility for providing policy advice from the IPCC remit allows the IPCC to stay clear of discussions of political priorities.

These distinctions and limitations influence the way in which the findable, accessible, interoperable, and reusable (FAIR) data principles are applied to the work of the IPCC Assessment. There are hundreds of figures in the IPCC Assessment Reports, showing line graphs, global or regional maps, and many other displays of data and information. These figures are put together by the authors using data resources which are described in the scientific literature that is being assessed. The figures are there to illustrate or clarify points raised in the text of the assessment. Increasingly, the figures also provide quantitative information which is of critical importance for many individuals and organisations which are seeking to exploit IPCC knowledge. 

This presentation will discuss the process of implementing the FAIR data principles within the IPCC assessment process. It will also review both the value of the FAIR principles to the IPCC authors and the IPCC process and the value of the FAIR data products that the process is expected to generate.

How to cite: Juckes, M., Pirani, A., Pascoe, C., Matthews, R., Stockhause, M., Chen, B., and Xiaoshi, X.: Implementing FAIR Principles in the IPCC Assessment Process, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10778, https://doi.org/10.5194/egusphere-egu2020-10778, 2020.

EGU2020-5756 | Displays | ESSI3.5

Digital Earth in a Transformed Society

Paolo Mazzetti, Stefano Nativi, and Changlin Wang

Last September, about 400 delegates from all over the world gathered in Florence, Italy, to attend the 11th International Symposium on Digital Earth (ISDE11). The Opening Plenary session (held in the historic Salone dei Cinquecento at Palazzo Vecchio) included a celebration ceremony for the 20th anniversary of the International Symposium on Digital Earth, which was initiated in Beijing, China, in November 1999 by the Chinese Academy of Sciences (CAS).

In the framework of ISDE11, about 30 sessions illustrated the various challenges and opportunities in building a Digital Earth. They included five Grand Debates and Plenary sessions dealing with issues related to: “Trust and Ethics in Digital Earth”; “Digital Earth for United Nations Sustainable Development Goals (SDGs)”; “ISDE in a Transformed Society”; “Challenges and Opportunities of Digital Transformation”; and “New Knowledge Ecosystems.” Moreover, ISDE11 endorsed and approved a new Declaration by the International Society for the Digital Earth (i.e. the 2019 ISDE Florence Declaration) that, after 10 years, lays the path to a new definition of Digital Earth that will be finalized in the first months of 2020.

This presentation will discuss the main outcomes of ISDE11 as well as the future vision of Digital Earth in a Transformed Society.

How to cite: Mazzetti, P., Nativi, S., and Wang, C.: Digital Earth in a Transformed Society, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5756, https://doi.org/10.5194/egusphere-egu2020-5756, 2020.

The European Commission (EC) puts forward a European approach to artificial intelligence and robotics. It deals with technological, ethical, legal and socio-economic aspects to boost the EU's research and industrial capacity and to put AI at the service of European citizens and the economy.

Artificial intelligence (AI) has become an area of strategic importance and a key driver of economic development. It can bring solutions to many societal challenges from treating diseases to minimising the environmental impact of farming. However, socio-economic, legal and ethical impacts have to be carefully addressed.

It is essential to join forces in the EU to stay at the forefront of this technological revolution, to ensure competitiveness and to shape the conditions for its development and use (ensuring respect of European values). In this framework, the EC and the Member States published a “Coordinated Plan on Artificial Intelligence” (COM(2018) 795) on the development of AI in the EU. The Coordinated Plan includes the recognition of common indicators to monitor AI uptake and development in the Union and the success rate of the strategies in place, with the support of the AI Watch instrument developed by the EC. AI Watch is therefore monitoring and assessing European AI landscapes, from driving forces to technology developments, from research to market, and from data ecosystems to applications.

The presentation will first introduce the main AI Watch methodology and tasks. It will then focus on AI Watch's interest in monitoring and understanding the impact of AI on geosciences research and innovation (see, for example, climate change studies). Finally, a proposal to connect the EGU community (in particular the ESSI division) with AI Watch will be introduced.

How to cite: Nativi, S. and Craglia, M.: European Commission AI Watch initiative: Artificial Intelligence uptake and the European Geosciences Community, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5691, https://doi.org/10.5194/egusphere-egu2020-5691, 2020.

EGU2020-18121 | Displays | ESSI3.5

Performance gains in an ESM using parallel ad-hoc file systems

Stefan Versick, Ole Kirner, Jörg Meyer, Holger Obermaier, and Mehmet Soysal

Earth System Models (ESMs) have become much more demanding over recent years. Modelled processes have become more complex, and more and more processes are being included in the models. In addition, model resolutions have increased in order to improve weather and climate forecasts. This requires faster high-performance computers (HPC) and better I/O performance.

Within our Pilot Lab Exascale Earth System Modelling (PL-EESM) we analyse the performance of the ESM EMAC using a standard Lustre file system for output and compare it to the performance using a parallel ad-hoc overlay file system. We will show the impact for two scenarios: one with today's standard amount of output and one with artificially heavy output simulating future ESMs.

An ad-hoc file system is a private parallel file system which is created on demand for an HPC job using node-local storage devices, in our case solid-state disks (SSDs). It exists only during the runtime of the job, so output data have to be moved to a permanent file system before the job finishes. Quasi in-situ data analysis and post-processing can improve performance, as it may reduce the amount of data that has to be stored, saving disk space and time during the transfer of data to permanent storage. We will show first tests of quasi in-situ post-processing.
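
A minimal sketch of what such quasi in-situ post-processing could look like is given below: the raw output is reduced (here to monthly means) on the node-local ad-hoc file system, and only the reduced file is copied to permanent storage; paths and variable handling are illustrative assumptions, not the actual PL-EESM workflow.

    import shutil
    import xarray as xr

    adhoc_file = "/adhoc_fs/job_output/emac_raw_output.nc"   # on node-local SSDs
    permanent_dir = "/lustre/project/archive/"               # permanent file system

    # Reduce the data while the ad-hoc file system still exists.
    ds = xr.open_dataset(adhoc_file)
    reduced = ds.resample(time="1M").mean()                  # much smaller than the raw output
    reduced_file = "/adhoc_fs/job_output/emac_monthly_mean.nc"
    reduced.to_netcdf(reduced_file)

    # Only the reduced file is transferred before the job (and with it the
    # ad-hoc file system) ends, saving disk space and transfer time.
    shutil.copy(reduced_file, permanent_dir)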

How to cite: Versick, S., Kirner, O., Meyer, J., Obermaier, H., and Soysal, M.: Performance gains in an ESM using parallel ad-hoc file systems, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18121, https://doi.org/10.5194/egusphere-egu2020-18121, 2020.

EGU2020-8275 | Displays | ESSI3.5

SPOT World Heritage catalogue: 30 years of SPOT 1-to-5 observation

Julien Nosavan, Agathe Moreau, and Steven Hosford

The SPOT 1-to-5 satellites collected more than 30 million images all over the world during the 30 years from 1986 to 2015, which represents a remarkable historical dataset. The SPOT World Heritage (SWH) programme is a CNES initiative to preserve this SPOT 1-to-5 archive, open it up, and generate positive impact from it by providing new enhanced products to the general public.

Preservation has been ensured for years by archiving the raw data (GERALD format) in the CNES long-term archive service (STAF), while the commercial market was served by images provided by our commercial partner Airbus. SWH opens a new era with the aim of providing and sharing a new SPOT 1-to-5 archive at image level. The chosen image product is the well-known 1A SCENE product (DIMAP format), which has been one of the SPOT references for years. As a reminder, a 1A SCENE is a 60 km x 60 km GEOTIFF image including initial radiometric corrections for instrument distortions. Image resolution ranges from 20 m to 5 m depending on the SPOT satellite/instrument (2.5 m using the SPOT 5 THR on-ground processing mode).
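
For orientation, a downloaded 1A SCENE image can be inspected with standard tooling as sketched below; the file name is a placeholder for a product obtained from the SWH catalogue.

    import rasterio

    # Placeholder file name for a 1A SCENE GEOTIFF product.
    with rasterio.open("SPOT5_1A_SCENE_example.tif") as src:
        print(src.count, "band(s),", src.width, "x", src.height, "pixels")
        print("pixel size:", src.res)
        band1 = src.read(1)  # first band as a NumPy array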

This new SWH-1A archive is currently composed of 17 million images, which were first extracted from STAF magnetic tapes over a period of one year and then processed to 1A level using the standard processing chain on the CNES High Processing Center (~432 processing cores). In parallel, additional images acquired by partner receiving stations are being retrieved to ensure that the archive is as exhaustive as possible.

The SPOT 1-to-5 1A archive will be accessible through a dedicated CNES SWH Web catalogue based on REGARDS software which is a CNES Open Source generic tool (GPLv3 license) used to manage data preservation and distribution in line with OAIS (Open Archival Information System) and FAIR (Findable, Accessible, Interoperable, Reusable) paradigms.

Once authenticated, and in accordance with the SWH licence of use, users will be able to query the catalogue and download products, either manually or using APIs supporting OpenSearch requests.
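
A minimal sketch of such an API-based catalogue query is shown below; the base URL and parameter names are assumptions for illustration and do not describe the published SWH interface.

    import requests

    # Placeholder OpenSearch-style endpoint and parameters.
    SWH_SEARCH = "https://swh.example.cnes.fr/opensearch/search"
    params = {
        "platform": "SPOT5",
        "startDate": "2005-01-01",
        "completionDate": "2005-12-31",
        "box": "7.5,47.8,8.5,48.6",   # lon/lat bounding box
        "maxRecords": 10,
    }
    results = requests.get(SWH_SEARCH, params=params, timeout=60).json()
    for feature in results.get("features", []):
        print(feature["id"], feature["properties"].get("title"))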

This paper presents the architecture of the whole SPOT preservation process, from processing chains to data distribution, with a first introduction to the SWH catalogue.

The last part of the presentation deals with examples of use cases foreseen for this SPOT dataset.

How to cite: Nosavan, J., Moreau, A., and Hosford, S.: SPOT World Heritage catalogue: 30 years of SPOT 1-to-5 observation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8275, https://doi.org/10.5194/egusphere-egu2020-8275, 2020.

EGU2020-12650 | Displays | ESSI3.5

An International Cooperation Practice on the Analysis of Carbon Satellites data

Lianchong Zhang, Guoqing Li, Jing Zhao, and Jing Li

Carbon satellite data are an essential part of greenhouse gas observation and play a critical role in global climate change assessment. Existing carbon data analysis e-science platforms are constrained by restrictions in distributed resource management and by tightly coupled service interoperability. These barriers currently offer no support for facilitating cross-disciplinary exploration and application, which has hindered the development of international cooperation. In 2018, the Cooperation on the Analysis of carbon SAtellites data (CASA), a new international scientific programme, was approved by the Chinese Academy of Sciences (CAS). So far, more than nine research institutions have joined this cooperation. The result is demonstrated in a global XCO2 dataset based on the TanSat satellite.

How to cite: Zhang, L., Li, G., Zhao, J., and Li, J.: An International Cooperation Practice on the Analysis of Carbon Satellites data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12650, https://doi.org/10.5194/egusphere-egu2020-12650, 2020.

ESSI3.6 – Best Practices and Realities of Research Data Repositories

EGU2020-1422 | Displays | ESSI3.6

Geophysics data management at the UK Polar Data Centre

Alice Fremand

The UK Polar Data Centre (UK PDC, https://www.bas.ac.uk/data/uk-pdc/) is the focal point for Arctic and Antarctic environmental data management in the UK. Part of the Natural Environment Research Council (NERC) and based at the British Antarctic Survey (BAS), the UK PDC coordinates the management of polar data from UK-funded research and supports researchers in complying with national and international data legislation and policy.

Reflecting the multidisciplinary nature of polar science, the datasets handled by the data centre are extremely diverse. Geophysics datasets include bathymetry, aerogravity, aeromagnetics and airborne radar depth soundings. These data provide information about seabed topography, the Earth's geological structure and ice thickness. The datasets are used in a large variety of scientific research and projects at BAS. For instance, the significant seabed multibeam coverage of the Southern Ocean enables BAS to be a major contributor to multiple international projects such as the International Bathymetric Chart of the Southern Ocean (IBCSO) and Seabed 2030. It is therefore crucial for the UK Polar Data Centre (PDC) to develop robust procedures to manage these data.

In the last few months, the procedures to preserve, archive and distribute all these data have been revised and updated to comply with the recommendations of the Standing Committee on Antarctic Data Management (SCADM) and the requirements of CoreTrustSeal for a future certification. The goal is to develop standard ways to publish FAIR (Findable, Accessible, Interoperable and Reusable) data and to set up workflows for long-term preservation of, and access to, UK PDC holdings.

How to cite: Fremand, A.: Geophysics data management at the UK Polar Data Centre, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1422, https://doi.org/10.5194/egusphere-egu2020-1422, 2020.

EGU2020-13237 | Displays | ESSI3.6 | Highlight

Towards a Specialized Environmental Data Portal: Challenges and Opportunities

Ionut Iosifescu-Enescu, Gian-Kasper Plattner, Dominik Haas-Artho, David Hanimann, and Konrad Steffen

EnviDat – www.envidat.ch – is the institutional Environmental Data portal of the Swiss Federal Institute for Forest, Snow and Landscape Research WSL. Launched in 2012 as a small project to explore possible solutions for a generic WSL-wide data portal, it has since evolved into a strategic initiative at the institutional level tackling issues in the broad areas of Open Research Data and Research Data Management. EnviDat demonstrates our commitment to accessible research data in order to advance environmental science.

EnviDat actively implements the FAIR (Findability, Accessibility, Interoperability and Reusability) principles. Core EnviDat research data management services include the registration, integration and hosting of quality-controlled, publication-ready data from a wide range of terrestrial environmental systems, in order to provide unified access to WSL’s environmental monitoring and research data. The registration of research data in EnviDat results in the formal publication with permanent identifiers (EnviDat own PIDs as well as DOIs) and the assignment of appropriate citation information.

Innovative EnviDat features that contribute to the global system of modern documentation and exchange of scientific information include: (i) a DataCRediT mechanism designed for specifying data authorship (Collection, Validation, Curation, Software, Publication, Supervision), (ii) the ability to enhance published research data with additional resources, such as model codes and software, (iii) in-depth documentation of data provenance, e.g., through a dataset description as well as related publications and datasets, (iv) unambiguous and persistent identifiers for authors (ORCIDs) and, in the medium-term, (v) a decentralized “peer-review” data publication process for safeguarding the quality of available datasets in EnviDat.

More recently, the EnviDat development has been moving beyond the set of core features expected from a research data management portal with a built-in publishing repository. This evolution is driven by the diverse requirements of researchers for a specialized environmental data portal that formally cuts across the five WSL research themes (forest, landscape, biodiversity, natural hazards, and snow and ice) and that concerns all research units and central IT services.

Examples of such recent requirements for EnviDat include: (i) immediate access to data collected by automatic measurements stations, (ii) metadata and data visualization on charts and maps, with geoservices for large geodatasets, and (iii) progress towards linked open data (LOD) with curated vocabularies and semantics for the environmental domain.

There are many challenges associated with the developments mentioned above. However, they also represent opportunities for further improving the exchange of scientific information in the environmental domain. Geospatial technologies in particular have the potential to become a central element of any specialized environmental data portal, triggering the convergence of publishing repositories and geoportals. Ultimately, these new requirements demonstrate the raised expectations that institutions and researchers have towards the future capabilities of research data portals and repositories in the environmental domain. With EnviDat, we are ready to take up these challenges over the years to come.

How to cite: Iosifescu-Enescu, I., Plattner, G.-K., Haas-Artho, D., Hanimann, D., and Steffen, K.: Towards a Specialized Environmental Data Portal: Challenges and Opportunities, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13237, https://doi.org/10.5194/egusphere-egu2020-13237, 2020.

EGU2020-13531 | Displays | ESSI3.6 | Highlight

Towards publishing soil and agricultural research data: the BonaRes DOI

Nikolai Svoboda, Xenia Specka, Carsten Hoffmann, and Uwe Heinrich

The German research initiative BonaRes (“Soil as a sustainable resource for the bioeconomy”, funded by the Federal Ministry of Education and Research, BMBF) was launched in 2015 with a duration of nine years and continuation envisaged. BonaRes includes 10 collaborative soil research projects and, additionally, the BonaRes Centre.

Within the BonaRes Data Centre (an important infrastructure component of the planned NFDI4Agri), diverse research data, mostly with an agricultural and soil science background, are collected from the BonaRes collaborative projects and from external scientists. After a possible embargo expires, all data are made available in standardized form for free reuse via the BonaRes Repository. Once the administrative and technical infrastructure has been established, the Data Centre provides services for scientists in all areas of data management. The focus is on the publication of research data (e.g. long-term experiments, field trials, model results) to ensure availability and citability and thus foster scientific reuse. Available data can be accessed via the BonaRes Repository, for instance: https://doi.org/10.20387/BonaRes-BSVY-R418.

Due to the high diversity of agricultural data provided via our repository, we have developed individually tailored strategies to make the data citable for 1) finalized datasets, 2) regularly updated datasets, and 3) data collections with related tables. The challenge is that the authors' rights (CC-BY license) must be preserved while a user-friendly citation of even large amounts of data is ensured. We will present our BonaRes DOI concept by means of use cases and look forward to discussing it with the professional community.

How to cite: Svoboda, N., Specka, X., Hoffmann, C., and Heinrich, U.: Towards publishing soil and agricultural research data: the BonaRes DOI, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13531, https://doi.org/10.5194/egusphere-egu2020-13531, 2020.

EGU2020-17116 | Displays | ESSI3.6 | Highlight

Mirror, mirror…is GBIF the FAIRest of them all?

Kyle Copas

GBIF—the Global Biodiversity Information Facility—and its network of more than 1,500 institutions maintain the world's largest index of biodiversity data (https://www.gbif.org), containing nearly 1.4 billion species occurrence records. This infrastructure offers a model of best practices, both technological and cultural, that other domains may wish to adapt or emulate to ensure that its users have free, FAIR and open access to data.

The availability of community-supported data and metadata standards in the biodiversity informatics community, combined with the adoption (in 2014) of open Creative Commons licensing for data shared with GBIF, established the necessary preconditions for the network's recent growth.

But GBIF's development of a data citation system based on the use of DOIs (Digital Object Identifiers) has established an approach that uses unique identifiers to create direct links between scientific research and the underlying data on which it depends. The resulting state-of-the-art system tracks uses and reuses of data in research and credits data citations back to individual datasets and publishers, helping to ensure the transparency of biodiversity-related scientific analyses.

In 2015, GBIF began issuing a unique Digital Object Identifier (DOI) for every data download. This system resolves each download DOI to a landing page containing 1) the taxonomic, geographic, temporal and other search parameters used to generate the download; 2) a quantitative map of the underlying datasets that contributed to the download; and 3) a simple citation to be included in works that rely on the data.
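
As an illustration, the metadata behind a download DOI can be retrieved programmatically from the public GBIF API as sketched below; the download key is a made-up placeholder and the response field names should be checked against the current GBIF API documentation.

    import requests

    key = "0001234-200101000000000"  # placeholder download key
    resp = requests.get(f"https://api.gbif.org/v1/occurrence/download/{key}",
                        timeout=60)
    resp.raise_for_status()
    info = resp.json()
    print("DOI:", info.get("doi"))
    print("Records:", info.get("totalRecords"))
    print("Search filter:", info.get("request", {}).get("predicate"))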

When authors cite these download DOIs, they in effect assert direct links between scientific papers and the underlying data. Crossref registers these links through Event Data, enabling GBIF to track citation counts automatically for each download, dataset and publisher. These counts expand to display a bibliography of all research reuses of the data. This system improves the incentives for institutions to share open data by providing quantifiable measures that demonstrate the value and impact of sharing data for others' research.

GBIF is a mature infrastructure that supports a wide pool of researchers, who publish on average two peer-reviewed journal articles relying on these data every day. That said, the citation-tracking and -crediting system has room for improvement. At present, 21% of papers using GBIF-mediated data provide DOI citations, which represents a 30% increase over 2018. Through outreach to authors and collaboration with journals, GBIF aims to continue this trend.

In addition, members of the GBIF network are seeking to extend citation credit to individuals through tools like Bloodhound Tracker (https://www.bloodhound-tracker.net), using persistent identifiers from ORCID and Wikidata. This approach provides a compelling model for the scientific and scholarly benefits of treating individual data records from specimens as micro- or nanopublications: first-class research objects that advance both FAIR data and open science.

How to cite: Copas, K.: Mirror, mirror…is GBIF the FAIRest of them all?, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17116, https://doi.org/10.5194/egusphere-egu2020-17116, 2020.

The Enabling FAIR Data project is an international, community-driven effort in the Earth, space, and environmental sciences that promotes depositing the data and software supporting our research in a community-accepted, trusted repository and citing them in the paper. Journals will no longer accept data placed only in the supplemental information of the paper. The supplement is not an archive and does not provide the necessary information about the data, nor is there any way to discover the data separately from the paper. Repositories provide the critical infrastructure of our research ecosystem, managing and preserving data and software for future researchers to discover and use.

As signatories of the Enabling FAIR Data Commitment Statement, repositories agree to comply with its defined tenets. Not all repositories provide the same level of services to researchers or to their data holdings. Many researchers find it difficult to select the right repository and to understand the process for depositing their data. Through better coordination between journals and repositories, journals can guide researchers to the right repository for deposition. This is a significant benefit to authors, but unintended challenges result as well. Here we will discuss the Enabling FAIR Data project, its successes, and the continued effort necessary to make sure our data are treated as a “world heritage.”

How to cite: Stall, S.: Enabling FAIR Data - The Importance of our Scientific Repositories, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17993, https://doi.org/10.5194/egusphere-egu2020-17993, 2020.

EGU2020-22432 | Displays | ESSI3.6

The AuScope Geochemistry Laboratory Network

Alex Prent, Brent McInnes, Andy Gleadow, Suzanne O'Reilly, Samuel Boone, Barry Kohn, Erin Matchan, and Tim Rawling

AuScope is an Australian consortium of Earth science institutes cooperating to develop national research infrastructure. AuScope received federal funding in 2019 to establish the AuScope Geochemistry Laboratory Network (AGN), with the objectives of coordinating FAIR-based open data initiatives, supporting user access to laboratory facilities, and strengthening analytical capability on a national scale.

Activities underway include an assessment of best practices for researchers to register samples using the International Geo Sample Number (IGSN) system, in combination with prescribed minima for metadata collection. Initial activities will focus on testing metadata schemas on high-value datasets such as geochronology (SHRIMP U-Pb, Curtin University), geochemistry (Hf isotopes, Macquarie University) and low-temperature thermochronology analyses (fission track/U-He, University of Melbourne). Collectively, these datasets will lead to a geochemical data repository in the form of an Isotopic Atlas eResearch Platform available to the public via the AuScope Discovery Portal. Over time, the repository will aggregate a large volume of publicly funded geochemical data, providing a key resource for quantitatively understanding the evolution of the Earth system processes that have shaped the Australian continent and its resources.

How to cite: Prent, A., McInnes, B., Gleadow, A., O'Reilly, S., Boone, S., Kohn, B., Matchan, E., and Rawling, T.: The AuScope Geochemistry Laboratory Network, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22432, https://doi.org/10.5194/egusphere-egu2020-22432, 2020.

EGU2020-22533 | Displays | ESSI3.6

Best Practices: The Value and Dilemma of Domain Repositories

Kerstin Lehnert, Lucia Profeta, Annika Johansson, and Lulin Song

Modern scientific research requires open and efficient access to well-documented data to ensure transparency and reproducibility, and to build on existing resources to solve scientific questions of the future. Open access to the results of scientific research - publications, data, samples, code - is now broadly advocated and implemented in policies of funding agencies and publishers because it helps build trust in science, galvanizes the scientific enterprise, and accelerates the pace of discovery and creation of new knowledge. Domain specific data facilities offer specialized services for data curation that are tailored to the needs of scientists in a given domain, ensuring rich, relevant, and consistent metadata for meaningful discovery and reuse of data, as well as data formats and encodings that facilitate data access, data integration, and data analysis for disciplinary and interdisciplinary applications. Domain specific data facilities are uniquely poised to implement best practices that ensure not only the Findability and Accessibility of data under their stewardship, but also their Interoperability and Reusability, which requires detailed data type specific documentation of methods, including data acquisition and processing steps, uncertainties, and other data quality measures. 

The dilemma for domain repositories is that the rigorous implementation of such best practices requires substantial effort and expertise, which becomes a challenge when usage of the repository outgrows its resources. Rigorous implementation of best practices can also frustrate users, who are asked to revise and improve their data submissions, and may drive them to deposit their data in other, often general, repositories that do not perform such rigorous review and therefore minimize the burden of data deposition.

We will report on recent experiences of EarthChem, a domain-specific data facility for the geochemical and petrological science community. EarthChem is recommended by publishers as a trusted repository for the preservation and open sharing of geochemical data. With the implementation of the FAIR data principles at multiple journals publishing geochemical and petrological research over the past year, the number, volume, and diversity of data submitted to the EarthChem Library have grown dramatically, challenging existing procedures and resources that do not scale to the new level of usage. Curators are challenged to meet users' expectations of immediate data publication and DOI assignment, and to process submissions that include new data types, are poorly documented, or contain code, images, and other digital content outside the scope of the repository. We will discuss possible solutions ranging from tiered data curation support and collaboration with other data repositories to engagement with publishers and editors to enhance guidance and education of authors.

How to cite: Lehnert, K., Profeta, L., Johansson, A., and Song, L.: Best Practices: The Value and Dilemma of Domain Repositories, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22533, https://doi.org/10.5194/egusphere-egu2020-22533, 2020.

EGU2020-16466 | Displays | ESSI3.6 | Highlight

Long-tail data curation in the times of the FAIR Principles and Enabling FAIR Data – challenges and best practices from GFZ Data Services

Damian Ulbricht, Kirsten Elger, Boris Radosavljevic, and Florian Ott

Following the FAIR principles, research data should be Findable, Accessible, Interoperable and Reusable. Publishing research output under these principles requires generating machine-readable metadata and using persistent identifiers for cross-linking with descriptive articles, related software for processing, or physical samples that were used to derive the data. In addition, research data should be indexed with domain keywords to facilitate discovery. Software solutions are required that help scientists generate metadata, since metadata models tend to be complex and serialisation into a format for metadata dissemination is a difficult task, especially in the long-tail communities.

GFZ Data Services is a domain repository for geoscience data, hosted at the GFZ German Research Centre for Geosciences, that has been assigning DOIs to data and scientific software since 2004. The repository has a focus on the curation of long-tail data but also provides DOI minting services for several global monitoring networks/observatories in geodesy and geophysics (e.g. INTERMAGNET, the IAG Services ICGEM and IGETS, GEOFON) and collaborative projects (e.g. TERENO, EnMAP, GRACE, CHAMP). Furthermore, GFZ is an allocating agent for the IGSN, a globally unique persistent identifier for physical samples with discovery functionality for digital sample descriptions via the internet. GFZ Data Services will also contribute to the National Research Data Infrastructure Consortium for Earth System Sciences (NFDI4Earth) in Germany.

GFZ Data Services increases the interoperability of long-tail data by (1) providing comprehensive, domain-specific data descriptions via standardised and machine-readable metadata, complemented with controlled “linked data” domain vocabularies; (2) complementing the metadata with technical data descriptions or reports; and (3) embedding the research data in a wider context by providing cross-references through persistent identifiers (DOI, IGSN, ORCID, Fundref) to related research products and to the people or institutions involved.

A key tool for metadata generation is the GFZ Metadata Editor, which assists scientists in creating metadata in the different metadata schemas that are popular in the Earth sciences (ISO 19115, NASA GCMD DIF, DataCite). Emphasis is placed on removing barriers: the editor is publicly available on the internet without registration, a copy of the metadata can be saved to and loaded from the local hard disk, and scientists are not asked to provide information that can be generated automatically. To improve usability, form fields are translated into scientific language and a facility to search structured vocabulary lists is offered. In addition, multiple geospatial references can be entered via an interactive mapping tool, which helps to minimize problems with the different conventions for providing latitudes and longitudes.

Visibility of the data is established through registration of the metadata with DataCite and dissemination of the metadata via standard protocols. The DOI landing pages embed metadata as schema.org markup to facilitate discovery through internet search engines such as Google Dataset Search. In addition, we feed links between data and related research products into Scholix, which allows data publications and scholarly literature to be linked even when the data are published years after the article.
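
The sketch below shows the kind of schema.org Dataset markup that such a landing page can embed; all values are invented placeholders rather than a real GFZ Data Services record.

    import json

    dataset_jsonld = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": "Example long-tail geoscience dataset",
        "identifier": "https://doi.org/10.5880/GFZ.EXAMPLE",
        "creator": [{"@type": "Person", "name": "Jane Doe",
                     "identifier": "https://orcid.org/0000-0000-0000-0000"}],
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "spatialCoverage": {"@type": "Place",
                            "geo": {"@type": "GeoShape", "box": "47.0 5.0 55.0 15.0"}},
    }

    # A landing page embeds this block in a <script type="application/ld+json">
    # element, which dataset search engines harvest.
    print(json.dumps(dataset_jsonld, indent=2))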

How to cite: Ulbricht, D., Elger, K., Radosavljevic, B., and Ott, F.: Long-tail data curation in the times of the FAIR Principles and Enabling FAIR Data – challenges and best practices from GFZ Data Services, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-16466, https://doi.org/10.5194/egusphere-egu2020-16466, 2020.

EGU2020-7534 | Displays | ESSI3.6

The CDGP Repository for Geothermal Data

Mathieu Turlure, Marc Schaming, Alice Fremand, Marc Grunberg, and Jean Schmittbuhl

The Data Center for Deep Geothermal Energy (CDGP – Centre de Données de Géothermie Profonde, https://cdgp.u-strasbg.fr) was launched in 2016 by the LabEx G-EAU-THERMIE PROFONDE (http://labex-geothermie.unistra.fr) to preserve, archive and distribute data acquired at geothermal sites in Alsace. Since the beginning of the project, specific procedures have been followed to meet international requirements for data management. In particular, the FAIR recommendations are applied to distribute Findable, Accessible, Interoperable and Reusable data.

Data currently available on the CDGP mainly consist of seismological and hydraulic data acquired at the Soultz-sous-Forêts pilot geothermal plant. Data on the website are gathered into episodes. Episodes 1994, 1995, 1996 and 2010 from Soultz-sous-Forêts have recently been added to the episodes already available on the CDGP (1988, 1991, 1993, 2000, 2003, 2004 and 2005). All data are described with metadata, and interoperability is promoted through the use of open or community-shared data formats: SEED, CSV, PDF, etc. Episodes have DOIs.

To secure the Intellectual Property Rights (IPR) set by the data providers, which partly come from industry, an Authentication, Authorization and Accounting Infrastructure (AAAI) grants data access according to the distribution rules and the user's affiliation (academic, industrial, …).

The CDGP is also a local node of the European Plate Observing System (EPOS) Anthropogenic Hazards platform (https://tcs.ah-epos.eu). The platform provides an environment and facilities (data, services, software) for research into anthropogenic hazards, especially those related to the exploration and exploitation of geo-resources. Some episodes from Soultz-sous-Forêts are already available, and the missing ones will soon be added to the platform.

The next step for the CDGP is first to complete the data from Soultz-sous-Forêts: some data are still missing and must be recovered from the industrial partners. Then, data from the other geothermal sites in Alsace (Rittershoffen, Illkirch, Vendenheim) need to be collected in order to be distributed. Finally, together with other French data centers, we are on track to apply for CoreTrustSeal certification (ANR Cedre).

The preservation of data can be very challenging and time-consuming. We had to deal with obsolete tapes and formats, and even incomplete data. Old data are frequently not well documented, and identifying the owner is sometimes difficult. However, the hard work of retrieving and collecting old geothermal data and making them FAIR is necessary for new analyses and for the valorization of these patrimonial data. The reuse of the data (e.g. Cauchie et al., 2020) demonstrates the importance of the CDGP.

How to cite: Turlure, M., Schaming, M., Fremand, A., Grunberg, M., and Schmittbuhl, J.: The CDGP Repository for Geothermal Data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7534, https://doi.org/10.5194/egusphere-egu2020-7534, 2020.

EGU2020-11998 | Displays | ESSI3.6

The evolution of data and practices within a single mission Science Data Center.

Kristopher Larsen, Kim Kokkonen, Adrian Gehr, Julie Barnum, James Craft, and Chris Pankratz

Now entering its fifth year of on-orbit operations, the Magnetospheric MultiScale (MMS) mission has produced over eleven million data files, totaling nearly 180 terabytes (as of early 2020), that are available to the science team and the heliophysics community. MMS is a constellation of four identical satellites, each with twenty-five instruments across five distinct instrument teams, examining the interaction of the solar wind with Earth's magnetic field. Each instrument team developed its data products in compliance with standards set by the mission's long-term data repository, NASA's Space Physics Data Facility (SPDF). The Science Data Center at the Laboratory for Atmospheric and Space Physics at the University of Colorado is responsible for producing and distributing these data products to both the project's science team and the global scientific community.

This paper will highlight the challenges the MMS SDC has encountered in maintaining a data repository during an extended mission, from overall data volumes that preclude providing access to every version of each data product (currently nearing one petabyte for MMS) to adjusting to changing standards and publication requirements. We will also discuss the critical need for cooperation between a mission’s science team, instrument teams, data production, and repositories in order to ensure that the data meet the needs of the science community both today and in the future, particularly after the end of a given mission.

How to cite: Larsen, K., Kokkonen, K., Gehr, A., Barnum, J., Craft, J., and Pankratz, C.: The evolution of data and practices within a single mission Science Data Center., EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11998, https://doi.org/10.5194/egusphere-egu2020-11998, 2020.

EGU2020-12088 | Displays | ESSI3.6

The Magnetics Information Consortium (MagIC) Data Repository: Successes and Continuing Challenges

Nicholas Jarboe, Rupert Minnett, Catherine Constable, Anthony Koppers, and Lisa Tauxe

MagIC (earthref.org/MagIC) is an organization dedicated to improving research capacity in the Earth and ocean sciences by maintaining an open community digital data archive for rock magnetic and paleomagnetic data, with portals that allow users to archive, search, visualize, download, and combine these versioned datasets. We are a signatory of the Coalition for Publishing Data in the Earth and Space Sciences (COPDESS) Enabling FAIR Data Commitment Statement and an approved repository for the Nature family of journals. We have been collaborating with EarthCube's GeoCodes data search portal, adding schema.org/JSON-LD headers to our dataset landing pages and suggesting extensions to schema.org where needed. Collaboration with the European Plate Observing System (EPOS) Thematic Core Service Multi-scale laboratories (TCS MSL) is ongoing, with MagIC sending its contributions' metadata to TCS MSL via DataCite records.
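
A minimal sketch of the kind of schema.org/JSON-LD Dataset description embedded in a landing page for harvesters such as GeoCodes; all field values below are hypothetical placeholders, not an actual MagIC contribution.

import json

# Hypothetical placeholders throughout; real landing pages carry the actual
# contribution metadata.
dataset_jsonld = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example paleomagnetic contribution",
    "identifier": "https://doi.org/10.xxxx/example",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["paleomagnetism", "rock magnetism"],
    "includedInDataCatalog": {"@type": "DataCatalog", "name": "MagIC"},
}

# The serialized string is what would sit inside a <script type="application/ld+json"> tag.
print(json.dumps(dataset_jsonld, indent=2))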

Improving and updating our data repository to meet the demands of the quickly changing landscape of data archival, retrieval, and interoperability is a challenging proposition. Most journals now require data to be archived in a "FAIR" repository, but the exact specification of what counts as FAIR is still solidifying. Some journals vet repositories and maintain their own lists of accepted ones, while others rely on external organizations to investigate and certify repositories. As part of the COPDESS group at the Earth Science Information Partners (ESIP), we have been, and will continue to be, part of the discussion on the features needed and desired of acceptable data repositories.

We are actively developing our software and systems to meet the needs of our scientific community. Some current issues we are confronting are: developing workflows with journals so that a journal article and its data in MagIC can be published simultaneously; sustaining data repository funding, especially in light of the greater demands placed on repositories by data policy changes at journals; and how best to share and expose metadata about our data holdings to organizations such as EPOS, EarthCube, and Google.

How to cite: Jarboe, N., Minnett, R., Constable, C., Koppers, A., and Tauxe, L.: The Magnetics Information Consortium (MagIC) Data Repository: Successes and Continuing Challenges, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12088, https://doi.org/10.5194/egusphere-egu2020-12088, 2020.

EGU2020-13663 | Displays | ESSI3.6

The mobile Drilling Information System (mDIS) for core repositories

Knut Behrends, Katja Heeschen, Cindy Kunkel, and Ronald Conze

The Drilling Information System (DIS) is a data entry system for field data, laboratory data and sampling data. The International Continental Scientific Drilling Program (ICDP) provides the system to facilitate data management of drilling projects during field work and afterwards. The legacy DIS client-server application was originally developed in 1998 and has been refined over the years; the most recent version was released in 2010. However, the legacy DIS was locked in to specific versions of the Windows and Office platforms, which are non-free and, more importantly, no longer supported by Microsoft.


Therefore, we have developed a new version of the DIS called the mobile DIS, or mDIS. It is entirely based on open-source components and is platform-independent. We introduced a basic (beta) version of mDIS, designed for fieldwork, at EGU 2019. At EGU 2020 we present an extended version designed for core repositories.


The basic, or expedition, mDIS manages the basic datasets acquired during the field work of a drilling project. These datasets comprise initial measurements of the recovered rock samples, such as core logs, special on-site sample requests, and drilling engineering data. It supports label printing, including QR codes, and the automatic assignment of unique International Geo Sample Numbers (IGSNs). The data are available online to all project scientists, both on site and off site.
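
As an illustrative sketch of the label-printing idea (not the mDIS code itself), a sample identifier such as an IGSN can be encoded into a QR code with the open-source qrcode package; the identifier and resolver URL below are assumptions for illustration.

import qrcode

# Hypothetical IGSN; in mDIS the identifier is assigned automatically.
igsn = "XX0000001"
# Encode a resolvable link for the sample (resolver URL assumed for illustration).
img = qrcode.make(f"https://igsn.org/{igsn}")
img.save("sample_label_qr.png")  # image to be placed on the printed label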


The curation mDIS, in contrast, satisfies the additional requirements of core repositories, which store drill cores for the long term. Additional challenges that arise during long-term sample curation include: (a) importing large datasets from the expedition mDIS; (b) complex inventory management for the physical storage locations used by the repositories, such as shelves, racks, or even buildings; (c) mass printing of custom labels and custom reports; (d) managing researchers' sample requests, sample curation and sample distribution; and (e) providing access to science data according to the FAIR principles.
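
A minimal sketch, under assumed names, of how the hierarchical storage locations of requirement (b) might be modelled; this is illustrative only, not the actual mDIS data model.

from dataclasses import dataclass
from typing import Optional

@dataclass
class StorageLocation:
    # One node in the storage hierarchy (building, rack, shelf, ...).
    name: str
    parent: Optional["StorageLocation"] = None

    def full_path(self) -> str:
        # Walk up the hierarchy to build a human-readable location string.
        if self.parent is None:
            return self.name
        return f"{self.parent.full_path()} / {self.name}"

@dataclass
class CoreSection:
    igsn: str  # hypothetical identifier
    location: StorageLocation

shelf = StorageLocation("Shelf 12",
                        parent=StorageLocation("Rack 3",
                                               parent=StorageLocation("Building A")))
core = CoreSection(igsn="XX0000002", location=shelf)
print(core.igsn, "->", core.location.full_path())  # Building A / Rack 3 / Shelf 12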

How to cite: Behrends, K., Heeschen, K., Kunkel, C., and Conze, R.: The mobile Drilling Information System (mDIS) for core repositories, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13663, https://doi.org/10.5194/egusphere-egu2020-13663, 2020.

EGU2020-18498 | Displays | ESSI3.6

The Antarctic Seismic Data Library System (SDLS): fostering collaborative research through Open Data and FAIR principles

Chiara Sauli, Paolo Diviacco, Alessandro Busato, Alan Cooper, Frank O. Nitsche, Mihai Burca, and Nikolas Potleca

Antarctica is one of the most studied areas on the planet because of its profound effects on the Earth’s climate and ocean systems. Antarctic geology keeps records of events that took place in remote times but that can shed light on climate phenomena taking place today. It is therefore of paramount importance to make all data from the area available to the widest scientific community. The remoteness, extreme weather conditions, and environmental sensitivity of Antarctica make new data acquisition complicated and existing seismic data very valuable. It is therefore critical that existing data are findable, accessible and reusable.

The Antarctic Seismic Data Library System (SDLS) was created in 1991 under the mandates of the Antarctic Treaty System (ATS) and the auspices of the Scientific Committee on Antarctic Research (SCAR) to provide open access to Antarctic multichannel seismic-reflection (MCS) data for use in cooperative research projects. The legal framework of the ATS dictates that all institutions that collect MCS data in Antarctica must submit them to the SDLS within 4 years of collection; the data then remain in the library under SDLS guidelines until 8 years after collection. Thereafter, the data switch to unrestricted use in order to trigger and foster collaborative research within the Antarctic research community as much as possible. In this perspective, the SDLS developed a web portal (http://sdls.ogs.trieste.it) that implements tools allowing all data to be discovered, browsed, accessed and downloaded directly from the web, while honoring both the ATS legal framework and the intellectual property of the data owners. The SDLS web portal is based on the SNAP geophysical web-based data access framework developed by Istituto Nazionale di Oceanografia e di Geofisica Sperimentale - OGS, and offers standard OGC-compliant metadata models and OGC-compliant data access services. It is possible to georeference, preview and even perform some processing on the actual data on the fly. Datasets are assigned DOIs so that they can be referenced from within research papers or other publications. We will present the SDLS web-based system in detail in the light of Open Data and FAIR principles, together with its planned future developments.
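
A hedged sketch of how OGC-compliant catalogue services of this kind can be queried programmatically with the open-source OWSLib library; the endpoint URL below is a placeholder, not necessarily the actual SDLS service address.

from owslib.csw import CatalogueServiceWeb

# Connect to an OGC Catalogue Service for the Web (CSW); placeholder endpoint.
csw = CatalogueServiceWeb("https://example.org/csw")
# Retrieve the first few metadata records and print their identifiers and titles.
csw.getrecords2(maxrecords=10)
for record_id, record in csw.records.items():
    print(record_id, record.title)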

How to cite: Sauli, C., Diviacco, P., Busato, A., Cooper, A., Nitsche, F. O., Burca, M., and Potleca, N.: The Antarctic Seismic Data Library System (SDLS): fostering collaborative research through Open Data and FAIR principles, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18498, https://doi.org/10.5194/egusphere-egu2020-18498, 2020.

EGU2020-9811 | Displays | ESSI3.6

Facilitating global access to a high-volume flagship climate model dataset: the MPI-M Grand Ensemble experience

Karsten Peters, Michael Botzet, Veronika Gayler, Estefania Montoya Duque, Nicola Maher, Sebastian Milinski, Katharina Berger, Fabian Wachsmann, Laura Suarez-Gutierrez, Dirk Olonscheck, and Hannes Thiemann

In a collaborative effort, data management specialists at the German Climate Computing Centre (Deutsches Klimarechenzentrum, DKRZ) and researchers at the Max Planck Institute for Meteorology (MPI-M) are joining forces to achieve long-term and effective global availability of a high-volume flagship climate model dataset: the MPI-M Grand Ensemble (MPI-GE, Maher et al. 2019 [1]), the largest ensemble of a single state-of-the-art comprehensive climate model (MPI-ESM1.1-LR) currently available. The MPI-GE has formed the basis for a number of scientific publications over the past four years [2]. However, the wealth of data available from the MPI-GE simulations was essentially invisible to potential data users outside of DKRZ and MPI-M.

In this contribution, we showcase the adopted strategy, the experience gained, and the current status of the FAIR long-term preservation of the MPI-GE dataset in the World Data Center for Climate (WDCC), hosted at DKRZ. The importance of synergistic cooperation between domain-expert data providers and knowledgeable repository staff will be highlighted.

Recognising the demand for MPI-GE data access outside of its native environment, the development of a strategy to make MPI-GE data globally available began in mid-2018. A two-stage dissemination and preservation process was decided upon.

In a first step, the MPI-GE data would be published and made globally available via the Earth System Grid Federation (ESGF) infrastructure. Second, the ESGF-published data would be transferred to DKRZ’s long-term and FAIR archiving service, the WDCC. Datasets preserved in the WDCC can still be made accessible via the ESGF, so global access via the established system remains ensured.

To date, the first stage of the above process is complete and the data are available via the ESGF [3]. Data published in the ESGF have to comply with strict data standards in order to ensure efficient data retrieval and interoperability of the dataset. Standardization of the MPI-GE data required the selection of an applicable data standard (CMIP5 in this case) and of an appropriate variable subset, the adaptation and application of fit-for-purpose DKRZ-supplied post-processing software, and of course the post-processing of the data itself. All steps required dedicated communication and collaboration between DKRZ and MPI-M staff as well as significant time resources. Currently, some 87 TB of standardized MPI-GE data, comprising more than 55,000 records, are available for search and download from the ESGF. About three to four thousand records, with an accumulated volume of several hundred GB, are downloaded by ESGF users each month.
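
A hedged sketch of how such ESGF-published records can be discovered through the generic ESGF search REST interface; the facet value "project=MPI-GE" is an assumption based on the project page [3], and the actual facet names may differ.

import requests

params = {
    "project": "MPI-GE",                 # assumed facet value, see [3]
    "format": "application/solr+json",   # standard ESGF JSON response format
    "limit": 5,
}
response = requests.get("https://esgf-data.dkrz.de/esg-search/search", params=params)
response.raise_for_status()
# The response follows the Solr layout: matching dataset records sit under "docs".
for doc in response.json()["response"]["docs"]:
    print(doc.get("id"))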

The long-term archival of the standardized MPI-GE data using DKRZ’s WDCC service is planned to begin within the first half of 2020. All the preparatory work done so far, especially the data standardization, significantly reduces the effort and resources required to achieve FAIR MPI-GE data preservation in the WDCC.

[1] Maher, N. et al. (2019). J. Adv. Model. Earth Syst., 11, 2050–2069. https://doi.org/10.1029/2019MS001639

[2] http://www.mpimet.mpg.de/en/grand-ensemble/publications/

[3] https://esgf-data.dkrz.de/projects/mpi-ge/



How to cite: Peters, K., Botzet, M., Gayler, V., Montoya Duque, E., Maher, N., Milinski, S., Berger, K., Wachsmann, F., Suarez-Gutierrez, L., Olonscheck, D., and Thiemann, H.: Facilitating global access to a high-volume flagship climate model dataset: the MPI-M Grand Ensemble experience, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9811, https://doi.org/10.5194/egusphere-egu2020-9811, 2020.

EGU2020-12006 | Displays | ESSI3.6

Community Built Infrastructure: The Dataverse Project

Danny Brooke

For more than a decade, the Dataverse Project (dataverse.org) has provided an open-source platform used to build data repositories around the world. Core to its success is its hybrid development approach, which pairs a core team based at the Institute for Quantitative Social Science at Harvard University with an empowered, worldwide community contributing code, documentation, and other efforts towards open science. In addition to an overview of the platform and of how to join the community, we will discuss recent and future efforts towards large-data support, geospatial data integrations, sensitive-data support, integrations with reproducibility tools, access to computational resources, and many other features useful to researchers, journals, and institutions.

How to cite: Brooke, D.: Community Built Infrastructure: The Dataverse Project, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12006, https://doi.org/10.5194/egusphere-egu2020-12006, 2020.

EGU2020-20826 | Displays | ESSI3.6

Re-envisioning data repositories for the 21st century

Stephen Diggs and Danie Kinkade

Finding and integrating geoscience data that are fit for use can alter the scope and even the type of science exploration undertaken. Most of the difficulties in data discovery and use are due to technical incompatibilities between the various data repositories that comprise the data system for a particular scientific problem. We believe these obstacles to be unnecessary attributes of individual data centers that were created more than 20 years ago. This aspirational presentation charts a new way forward for data curators and users alike and, by employing technical advances in adjacent disciplines, promises a new era of scientific discovery enabled by re-envisioned 21st century data repositories.

How to cite: Diggs, S. and Kinkade, D.: Re-envisioning data repositories for the 21st century, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20826, https://doi.org/10.5194/egusphere-egu2020-20826, 2020.

EGU2020-22478 | Displays | ESSI3.6

Towards World-class Earth and Environmental Science Research in 2030: Will Today’s Practices in Data Repositories Get Us There?

Lesley Wyborn

Internationally, Earth and environmental science datasets have the potential to contribute significantly to resolving major societal challenges such as those outlined in the United Nations 2030 Sustainable Development Goals (SDGs). By 2030, we know that leading-edge computational infrastructures (repositories, supercomputers, cloud, etc.) will be exascale and that these will facilitate realistic resolution of research challenges at scales and resolutions that cannot be undertaken today. Hence, by 2030, the capability of Earth and environmental science researchers to make valued contributions will depend on developing a global capacity to integrate data online from multiple distributed, heterogeneous repositories. Are we on the right path to achieve this?

Today, online data repositories are a growing part of the research infrastructure ecosystem: their number and diversity have been slowly increasing over recent years to meet demands that traditional institutional or other generic repositories can no longer satisfy. Although more specialised repositories are available (e.g., those for petascale-volume datasets and for domain-specific, long-tail, complex datasets), funding for these specialised repositories is rarely long term.

Through initiatives such as the Commitment Statement of the Coalition for Publishing Data in the Earth and Space Sciences, publishers are now requiring that the datasets supporting a publication be curated and stored in a ‘trustworthy’ repository that can provide a DOI and a landing page for each dataset and, if possible, can also provide some domain quality assurance to ensure that datasets are not only Findable and Accessible, but also Interoperable and Reusable. But the demand for the domain expertise needed to provide the “I” and the “R” far exceeds what is available. As a last resort, frustrated researchers are simply depositing the datasets that support their publications into generic repositories such as Figshare and Zenodo, which simply store the data files: domain-specific QA/QC procedures are rarely applied to the data.

These generic repositories do ensure that data are not sitting on inaccessible personal C: drives and USB drives, but the content is rarely interoperable. Interoperability can only be achieved by repositories that have the domain expertise to curate the data properly and to ensure that the data meet minimum community standards and specifications, enabling online aggregation into global reference sets. In addition, most researchers deposit only the files that support a particular publication, and as these files can be highly processed and generalised, they are difficult to reuse outside the context of the specific research publication.

To achieve the ambition of Earth and environmental science datasets being reusable and interoperable and making a major contribution to the SDGs by 2030, today we need:

- More effort and coordination in the development of international community standards to enable the technical, semantic and legal interoperability of datasets;
- To ensure that publicly funded research data are also available without further manipulation or conversion, to facilitate their broader reuse in scientific research, particularly as by 2030 we will also have greater computational capacity to analyse data at scales and resolutions currently not achievable.


How to cite: Wyborn, L.: Towards World-class Earth and Environmental Science Research in 2030: Will Today’s Practices in Data Repositories Get Us There?, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22478, https://doi.org/10.5194/egusphere-egu2020-22478, 2020.

CC BY 4.0