Machine Learning Unfolds New Insights Into Leaf Traits and Climate

A novel machine learning-based study using digitized herbarium collections provides insights into the relationship between leaf size and climate across different plant species, and demonstrates the benefit of human involvement in training the AI model.

A pioneering study using artificial intelligence (AI) has demonstrated a novel approach to investigating the relationship between leaf size and climate in different plant species. The research, conducted by Wilde and colleagues and published in the American Journal of Botany, uses machine learning to analyze vast amounts of data from digitized herbarium collections. This innovative method could offer unprecedented insights into how plants adapt to their environments.

Herbarium collections are libraries of preserved plant specimens, often dating back centuries. They are a treasure trove of biological information, but their scale and complexity have traditionally made them challenging to analyze. Machine learning, a branch of AI, could automate the measurement process, dramatically increasing the data available for study.

The research team used a type of AI known as Convolutional Neural Networks (CNNs), which are particularly adept at analyzing images. They trained these CNNs to identify and measure leaves in images of plant specimens from two genera: Syzygium, a group of flowering plants in the myrtle family, and Ficus, more commonly known as figs.

Syzygium leaves. Image: Canva

“In this study, we wanted to find and count the pixels in leaves, as a measure of area, width and length. We started with simple leaves, with smooth margins, as a proof-of-concept, but would very much like to extend this to more complex leaf shapes,” the authors said in an email.
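
To make the pixel-counting idea concrete, here is a minimal Python sketch. It is not the authors' pipeline: it assumes a model has already produced a binary mask for a single leaf, and the function name `measure_leaf`, the `pixels_per_mm` scale and the toy mask are all hypothetical. Length and width are read from an axis-aligned bounding box, which is only a rough stand-in for however the study measured them.

```python
from typing import Dict, List

Mask = List[List[int]]  # 1 = leaf pixel, 0 = background (assumed mask format)


def measure_leaf(mask: Mask, pixels_per_mm: float = 1.0) -> Dict[str, float]:
    """Estimate area, length and width of one leaf from a binary mask."""
    if not mask or not any(any(row) for row in mask):
        return {"area_mm2": 0.0, "length_mm": 0.0, "width_mm": 0.0}

    # Rows and columns that contain at least one leaf pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]

    pixel_count = sum(sum(row) for row in mask)   # area in pixels
    height_px = rows[-1] - rows[0] + 1            # bounding-box height
    width_px = cols[-1] - cols[0] + 1             # bounding-box width

    # Treat the longer side of the bounding box as the leaf length.
    long_px, short_px = max(height_px, width_px), min(height_px, width_px)

    return {
        "area_mm2": pixel_count / pixels_per_mm ** 2,
        "length_mm": long_px / pixels_per_mm,
        "width_mm": short_px / pixels_per_mm,
    }


# Toy mask of a small oval "leaf": 11 leaf pixels, 5 rows tall, 3 columns wide.
toy_mask = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

result = measure_leaf(toy_mask, pixels_per_mm=2.0)
print(result)  # {'area_mm2': 2.75, 'length_mm': 2.5, 'width_mm': 1.5}
```

The scale factor matters because pixel counts only become comparable areas when images share a known resolution, which is one reason standardized photography helps.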

The team trained two versions of the CNN model, one with a random selection of images and another with user-selected images. The performance of these models was then tested using a set of validation images. Subsequently, the validated models were applied to over 3,800 digitized specimens from the National Herbarium of New South Wales, Australia.

“These specimens were all photographed recently in a huge digitization initiative. This meant the pictures were taken under uniform conditions, with similar equipment, configured in a constant way, so they were nicely standardized. Similar digitization initiatives are under way at Herbaria around the world, so soon, it will be possible to apply similar methods to a vast number of pictures (with adjustments for resolution, etc),” said the authors.

The results were promising. The user-selected training approach was more effective, finding more leaves and a wider range of leaf sizes than the randomly trained model. This indicates that a degree of expert human involvement can improve the efficiency of AI in such complex tasks. 

“The human-in-the-loop model was particularly useful here where we wanted to minimize the amount of training data required to generate a robust model. If in the future we remain limited by training data, there might long be a role for these approaches. If, on the other hand, global efforts to train models result in large libraries of training data, perhaps volume will substitute for the benefits of human selectivity,” said the authors.

When it came to the link between leaf size and climate, the CNN models confirmed that, across different species, larger leaves were associated with warmer, wetter climates, consistent with previous studies. However, the relationship was not as clear-cut within individual species, suggesting that other factors, such as genetic variation and population history, could influence leaf size.

“We see that if we move from a point in southeastern Australia to a point in warmer northeastern Australia, on average, the size of a Syzygium leaf gets bigger. This seems to be mainly because there are species with bigger leaves in the north. If we look at leaves from a single, widely distributed, species at the same two locations, on average, we find the leaves to be around the same size,” said the authors in their email.

“The explanation is probably related to evolutionary history or processes, within species. For example, within a species, populations that are far apart are still likely connected by gene flow. This can be indirect, for example, via relatively short movements of pollen between nearby populations, over generations. This means that genetic alleles that influence leaf size are likely being shared between populations, homogenizing leaf sizes among populations.

“If selection favored larger leaves in the north, the associated benefit of larger leaves would have to be quite strong in order to overcome the effect of gene flow. It is also possible that some widely distributed species have undergone recent population expansion, which would also lead to populations having similar trait values across large geographic areas. Therefore both evolutionary processes (e.g. pollen flow) and histories (e.g. population expansion) could lead to different observed trait-climate links within species.”
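
The difference between an among-species trend and a within-species trend can be illustrated with a small Python sketch. The numbers below are invented purely for illustration and are not data from the study; the species labels, temperatures and leaf areas are all hypothetical. The sketch fits one least-squares slope through species means and a separate slope inside each species.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Invented records: (species, mean annual temperature in C, leaf area in cm2).
records: List[Tuple[str, float, float]] = [
    ("species_a", 12.0, 10.0),
    ("species_a", 14.0, 10.0),
    ("species_a", 16.0, 10.0),
    ("species_b", 20.0, 30.0),
    ("species_b", 22.0, 30.0),
    ("species_b", 24.0, 30.0),
]

# Group the records by species.
by_species: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
for species, temperature, area in records:
    by_species[species].append((temperature, area))

# Within-species slopes: one fit per species.
within = {
    species: ols_slope([t for t, _ in points], [a for _, a in points])
    for species, points in by_species.items()
}

# Among-species slope: one fit through the species means.
mean_temps = [sum(t for t, _ in pts) / len(pts) for pts in by_species.values()]
mean_areas = [sum(a for _, a in pts) / len(pts) for pts in by_species.values()]
among = ols_slope(mean_temps, mean_areas)

print("among species:", among)    # 2.5 cm2 per degree in this toy data
print("within species:", within)  # 0.0 for both toy species
```

In this toy case the positive trend comes entirely from the warmer site hosting a larger-leaved species, which is the pattern the authors describe for Syzygium.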

If you’re following the news regarding AI, then the results might be interesting, but the method could be opaque. It’s essential not only that you know things, but also that you know how you know things. This is a problem that Wilde and colleagues address in their paper.

“Complex machine learning models (neural networks, etc) are less easily understood than the statistical models that have been pervasive in plant and biological sciences up until now. However, we are arriving at a point where there is a solid foundation for using them, and benefiting from their enormous inferential power, in ways that are reliable and robust,” the authors said in their email.

“This foundation for the sound use of ML rests on the processes of model validation and testing. This involves taking data (images) that are not used to train a model, and adding labels to those data of the features that a well-trained model should find. We can then ask, quantitatively, how well a model performs when shown the data. Does it find all of the leaves that were there to be found? Does it manage to avoid calling things leaves that were not leaves? Where it does correctly find a leaf, does it reach an accurate estimate of its size, in terms of pixels? If a machine learning model can perform well in relation to these questions, for a sufficiently large and representative set of training data, we become confident in the model.”
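
The three questions in that quote correspond to standard evaluation measures: recall (were all the leaves found?), precision (were non-leaves avoided?) and size error on correctly found leaves. The Python sketch below shows one way to compute them. It is an illustration under assumptions, not the paper's evaluation code: the function names and the example counts are hypothetical, and it assumes predictions have already been matched to hand-labelled leaves.

```python
from typing import List, Tuple


def detection_scores(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float]:
    """Return (precision, recall) from counts of matched and unmatched leaves."""
    found = true_pos + false_pos      # everything the model called a leaf
    present = true_pos + false_neg    # every leaf a human labelled
    precision = true_pos / found if found else 0.0
    recall = true_pos / present if present else 0.0
    return precision, recall


def mean_relative_area_error(pairs: List[Tuple[float, float]]) -> float:
    """Mean of |predicted - labelled| / labelled over correctly found leaves."""
    errors = [abs(predicted - labelled) / labelled for predicted, labelled in pairs]
    return sum(errors) / len(errors)


# Made-up counts: 8 leaves correctly found, 2 false alarms, 2 leaves missed.
precision, recall = detection_scores(true_pos=8, false_pos=2, false_neg=2)

# Made-up (predicted, labelled) pixel areas for two correctly found leaves.
area_error = mean_relative_area_error([(110.0, 100.0), (90.0, 100.0)])

print(f"precision={precision:.2f} recall={recall:.2f} area_error={area_error:.2f}")
# precision=0.80 recall=0.80 area_error=0.10
```

The key point is that every number here is computed on labelled images the model never saw during training.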

Once you have a robust method for measuring, you can also expand what you measure. “We have also used machine learning models to find a range of other structures in herbarium specimens, including buds, fruit, and flowers – so there is definitely much scope to extend to other structures,” say the authors in their email. There is also hope of expanding the kinds of measurements you can make, they say. “We have thought about using machine learning to make field measurements of other leaf traits that are poorly represented by preserved specimens, but this is work still under development!”

READ THE ARTICLE

Wilde, B.C., Bragg, J.G. and Cornwell, W. (2023) “Analyzing trait-climate relationships within and among taxa using machine learning and herbarium specimens,” American Journal of Botany, p. e16167. Available at: https://doi.org/10.1002/ajb2.16167.

Alun Salt

Alun (he/him) is the Producer for Botany One. It's his job to keep the server running. He's not a botanist, but started running into them on a regular basis while working on writing modules for an Interdisciplinary Science course and, later, helping teach mathematics to Biologists. His degrees are in archaeology and ancient history.
