How Ceres2030 used machine learning to create an evidence map for agricultural research

When we knew how interventions were described in agricultural research, we could set about analyzing our sample of articles to find and classify specific interventions.

We found synonyms by looking at hypernyms and hyponyms, which are a type of semantic relationship (for example, a lemon is a hyponym of fruit), and classified them into four broad categories (technical, socioeconomic, ecosystem, unclassified) and then, more specifically, into 995 narrow intervention concepts.
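As a minimal sketch of what a hypernym/hyponym lookup can look like, here is how NLTK's WordNet interface exposes those relationships. The Ceres2030 pipeline itself is not reproduced here; the word senses chosen are illustrative assumptions.

```python
# Illustrative hypernym/hyponym lookup with NLTK's WordNet interface.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # fetch WordNet data on first run

# The citrus-fruit sense of "lemon".
lemon = wn.synset("lemon.n.01")

# Hypernyms: more general concepts ("lemon" is a kind of citrus).
print([s.name() for s in lemon.hypernyms()])

# Hyponyms of "fruit": more specific concepts, i.e. candidate terms to
# search for when gathering the vocabulary of a broader concept.
fruit = wn.synset("fruit.n.01")
print([s.name() for s in fruit.hyponyms()][:10])
```

Walking these relationships in both directions is one way to expand a seed term into the cluster of synonyms and near-synonyms used across a literature.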

This is a much more targeted approach to uncovering important research. It gives us a wholly new way to classify and organize science so that it is accessible to an audience interested in policy-relevant research.

TOPIC MODELING AND GAPS IN EVIDENCE

Natural language processing enabled us to unify and explore the data, even though it came from many different places. We used topic modeling, a way of exploring text to see what it has in common with other text in the same corpus, to establish a baseline from which we could map the evolution of research from 2008 to 2018.
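To make the technique concrete, here is a minimal topic-modeling sketch using scikit-learn's latent Dirichlet allocation. The article does not name the library used, and the toy corpus and parameter values are assumptions for illustration only.

```python
# Toy topic model: group short "abstracts" into latent topics.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

abstracts = [
    "irrigation water management smallholder maize yield",
    "soil fertility nitrogen fertilizer crop rotation",
    "farmer income credit access market extension services",
    "rural household income microfinance cooperative market",
]

# Bag-of-words counts over the corpus.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(abstracts)

# Fit a two-topic model; each paper becomes a distribution over topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

# The top words per topic show what the papers grouped under that
# topic have in common.
terms = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [terms[j] for j in weights.argsort()[::-1][:5]]
    print(f"topic {i}: {top}")
```

Running the same model over each year of a corpus is one way to build the kind of baseline described above, since the document-topic distributions can be aggregated by publication year.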

We can see the topics where there was a high level of research (the darker the blue in the image below, the greater the density of research papers) and where evidence and research were limited or missing. We can also create comprehensive research baselines to see the volume of research by topic, by funder, and by country, as well as the potential relevance of the research.
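A density view like the one described can be produced from simple counts of papers per topic per year. The sketch below uses matplotlib with entirely made-up data; the topic names, years, and counts are placeholders, not Ceres2030 results.

```python
# Hypothetical evidence-density heatmap: darker blue = more papers.
import matplotlib.pyplot as plt
import numpy as np

topics = ["irrigation", "soil health", "market access"]
years = np.arange(2008, 2019)

# Fabricated paper counts, one row per topic and one column per year.
counts = np.random.default_rng(0).poisson(lam=20, size=(len(topics), len(years)))

fig, ax = plt.subplots()
im = ax.imshow(counts, cmap="Blues", aspect="auto")
ax.set_xticks(range(len(years)), labels=years)
ax.set_yticks(range(len(topics)), labels=topics)
fig.colorbar(im, ax=ax, label="papers")
plt.show()
```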

Having created a way of finding and classifying interventions in agriculture, we could automate the ingestion of new research, taking us closer to the possibility of real-time analysis of research for policy relevance. We used an open-source tool (Elastic Stack) to visualize queries and results. This makes the information accessible, easy to visualize, and shareable, and it is also easy to add new sources of information.
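As a rough sketch of that ingestion-and-query loop, here is how new papers might be indexed and retrieved with the official Python client for Elasticsearch (8.x API). The index name, field names, document id, and local cluster URL are all assumptions, not details from the project.

```python
# Index a newly ingested paper and query it back from Elasticsearch.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Rerunning with the same id simply updates the stored document,
# which keeps repeated ingestion of the same source idempotent.
es.index(index="papers", id="doi-10.1234-example", document={
    "title": "Irrigation interventions for smallholder farmers",
    "intervention": "technical",
    "year": 2018,
})

# Query by intervention category; a Kibana dashboard pointed at the
# same index gives the shareable visual layer.
hits = es.search(index="papers", query={"term": {"intervention": "technical"}})
for hit in hits["hits"]["hits"]:
    print(hit["_source"]["title"])
```

Because each new source only needs to be mapped into the same document shape before indexing, adding a feed of fresh publications is mostly a matter of writing one more ingestion script.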