Method

The development of artificial intelligence (AI) has considerably broadened the applications of automatic methods to a wide range of natural language processing tasks. The SDG Prospector demonstrates that when used efficiently, AI can be a great ally to achieve the 2030 Agenda.
The SDG Prospector relies on a new generation of artificial intelligence models. In contrast with approaches using keyword counting, language models interpret words depending on the context from which they originate, and this allow to better understand the complexity of a text.

How does it work?

The SDG Prospector builds upon DistilRoBERTa, a light version of a model first developed by Meta.
  • Step 1

    DistilRoBERTa was pre-trained on millions of texts: it integrates a very diverse vocabulary, and understands the peculiarities of the English language, its conjugation and grammar rules. At this stage, the model was not yet specialized in the recognition of the SDGs.

  • Step 2

    To be able to detect the SDGs in a text, the SDG Prospector needed further training.
    We constructed an expert learning base composed of 9000+ paragraphs related
    to the SDGs.

  • Step 3

    Thanks to its deep neural network, its excellent command of English and the expert learning base, the SDG Prospector automatically learnt to detect whenever a text refers to the 2030 Agenda, and to distinguish the SDGs from one another.

  • Step 4

    After a series of robustness tests and fine-tuning, the SDG Prospector was born.

Importance of the learning base

The accuracy of the SDG Prospector derives directly from the quality of the learning base. Exclusively built by selecting relevant texts manually, our learning base consists of over 9,000 paragraphs related to the SDGs. For each SDG, we collected more
than 500 paragraphs, of maximum 200 words. These were mostly extracted
from specialized UN official documents and completed with texts from other sources.

Besides an exclusion universe teaches the algorithm not to confuse concepts that use the same vocabulary as the SDGs but do not refer to the 2030 Agenda. To illustrate, supporting the business climate has nothing to do with the fight against climate change and its impacts.

How does the SDG Prospector analyse documents?

The SDG Prospector builds upon DistilRoBERTa, a light version ofa model first developed by Meta.
  1. The SDG Prospector parses the text in 100-word paragraphs.
  2. It assesses whether each paragraph is related to one, several or no SDG.
  3. It automatically sums up SDG references.
  4. You can view your results in absolute or in relative values, to take into account the length of your document.