
Mapping STI ecosystems via Open Data: overcoming the limitations of conflicting taxonomies. A case study for Climate Change Research in Denmark

A paper on the role of ecosystem "maps" in informing policy-makers in the field of Science, Technology and Innovation

In 2022, SIRIS Academic presented a poster at the International Conference on Theory and Practice of Digital Libraries (TPDL).

The paper as it appears in the conference proceedings is available in pre-print format at this link, while a longer version, which delves more into the methodological details, is available in pre-print format at this link.

What follows is a presentation of the objectives of the paper.

To inform their decisions, policy-makers in the Science, Technology and Innovation (STI) sector typically need "maps" that show the relevant research domains and key actors within their territorial or institutional boundaries of interest. To enable effective policy actions, those maps should be comprehensive enough to cover i. the whole STI value chain (from basic research up to industrial innovation), ii. the different scientific domains of relevance and iii. all pertinent actors. As such, these maps should rely on different data sources that together offer the broadest possible view of STI inputs and outputs.

Some major challenges at the policy level arise because many of those data sources are not openly available (which undermines possible participatory processes), are not interoperable in terms of data classification schemes and institutional identification (which limits transversal analyses), and are hard for non-expert users to manage.

In this paper, we present a proof of concept of a hypothetical analytical study to support STI policy-making that uses only open data to overcome the above challenges. To do so, we merge different open datasets and analyse them with a common classification scheme.

After gathering the records from their respective data sources, we use open knowledge-bases and text mining to:

  • Identify STI documents linked with Sustainable Development Goal (SDG) 13 - Climate Action;
  • Categorise documents within the 25 panels of the European Research Council (ERC);
  • Automatically identify thematic clusters by topic modelling.
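
To make the first of these steps more concrete, here is a minimal sketch in Python of how records from two differently structured sources might be mapped onto one shared schema and then flagged as SDG 13-related. It is an illustration under stated assumptions, not the paper's pipeline: the field names, the `Record` class, the helper functions and the short `SDG13_TERMS` vocabulary are all hypothetical, and a simple keyword match stands in for the open knowledge-bases and text mining mentioned above.

```python
import re
from dataclasses import dataclass, field

# Hypothetical SDG 13 vocabulary (assumption); a real one would be derived
# from an open knowledge base rather than hard-coded.
SDG13_TERMS = ("climate change", "global warming", "greenhouse gas", "carbon emission")


@dataclass
class Record:
    """Shared schema for documents coming from different open sources."""
    source: str
    title: str
    abstract: str
    tags: set = field(default_factory=set)


def normalise(raw: dict, source: str) -> Record:
    """Map a source-specific record onto the shared schema.

    The alternative field names below are assumed for illustration.
    """
    title = raw.get("title") or raw.get("project_title") or ""
    abstract = raw.get("abstract") or raw.get("summary") or ""
    return Record(source=source, title=title.strip(), abstract=abstract.strip())


def tag_sdg13(record: Record) -> Record:
    """Add the 'SDG13' tag when any vocabulary term occurs in the text."""
    text = f"{record.title} {record.abstract}".lower()
    if any(re.search(r"\b" + re.escape(term), text) for term in SDG13_TERMS):
        record.tags.add("SDG13")
    return record


# Toy inputs (invented for this example), one per hypothetical source.
publications = [{"title": "Greenhouse gas fluxes in Danish wetlands",
                 "abstract": "Field measurements across three seasons."}]
projects = [{"project_title": "Medieval trade routes",
             "summary": "A study of Baltic commerce."}]

records = [tag_sdg13(normalise(r, "publications")) for r in publications]
records += [tag_sdg13(normalise(r, "projects")) for r in projects]

for rec in records:
    print(rec.source, "|", rec.title, "|", sorted(rec.tags))
```

Keeping normalisation and tagging as separate functions means a new data source only needs its own field mapping, while the tagging logic is applied identically to every record.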

In this way, we aim to showcase how research in emerging fields (such as the SDGs) can be gathered from open data sources and identified by means of modern, openly available AI models. Finally, we demonstrate how gaps in taxonomic classifications across datasets may be filled by means of Deep Learning textual classifiers, using the ERC panels as a paradigmatic example.
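
The idea of filling taxonomy gaps can be sketched as follows: a classifier is fitted on records that already carry a panel label and is then applied to records from a dataset that lacks one. The sketch below deliberately swaps the Deep Learning classifier for a much simpler nearest-centroid bag-of-words model, so that it runs with the Python standard library alone. The training texts are invented toy examples, the panel codes serve only as labels, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lower-cased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def fit_centroids(labelled):
    """Sum the word counts of all training texts sharing a panel label."""
    centroids = defaultdict(Counter)
    for text, panel in labelled:
        centroids[panel].update(bag_of_words(text))
    return dict(centroids)


def predict_panel(text, centroids):
    """Return the panel whose centroid is most similar to the text."""
    vec = bag_of_words(text)
    return max(centroids, key=lambda panel: cosine(vec, centroids[panel]))


# Toy labelled records (invented), standing in for a dataset that
# already carries ERC panel codes.
labelled = [
    ("ocean temperature and atmospheric carbon cycle modelling", "PE10"),
    ("sea ice melt and atmospheric circulation", "PE10"),
    ("climate policy governance and legal institutions", "SH2"),
    ("public institutions and environmental regulation law", "SH2"),
]
centroids = fit_centroids(labelled)

# A record from a dataset without panel labels.
unlabelled = "atmospheric carbon and ocean circulation in the arctic"
predicted = predict_panel(unlabelled, centroids)
print(predicted)
```

A neural text classifier would replace `fit_centroids` and `predict_panel`, but the surrounding workflow (train where labels exist, predict where they are missing) stays the same.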

Link on Zenodo