Plant models need high-quality data for calibration and validation. Machine learning techniques are expected to take a prominent role in providing high-quality image-based phenotyping data in the future. Yet machine learning typically requires large and diverse datasets to learn generalizable models, whereas available datasets are often small and new data are costly to generate. Ubbens and coauthors address this problem using data from synthetic plants.

The authors demonstrate that the training data for machine learning models can be augmented with rendered images of synthetic plants. Combining real and synthetic plant images as training data reduced the mean absolute count error compared with using only images of real plants. Moreover, models trained only on synthetic rosettes were successfully applied to count leaves in real rosettes.
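The mean absolute count error is the average, over all test images, of the absolute difference between the predicted and the true number of leaves. The following is a minimal sketch of that metric, not the authors' evaluation code; the function name and the leaf counts are hypothetical and chosen only for illustration.

```python
def mean_absolute_count_error(predicted, actual):
    """Average of |predicted - actual| leaf counts over all images."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one image is required")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical leaf counts for four rosette images (illustrative values only,
# not data from the study).
true_counts = [6, 8, 7, 10]
predicted_counts = [6, 9, 5, 10]

error = mean_absolute_count_error(predicted_counts, true_counts)
print(f"Mean absolute count error: {error} leaves")
```

A lower value means the predicted counts are, on average, closer to the annotated counts.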
Rendered images of Arabidopsis rosettes were computer-generated from a descriptive model using L-systems that reproduced early developmental stages of the plant shoot based on direct observations and measurements.
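An L-system generates a structure by repeatedly rewriting every symbol of a string in parallel according to production rules. The toy sketch below illustrates only that rewriting principle. It is not the descriptive Arabidopsis model used in the study: the symbols ("A" for a shoot apex, "L" for a leaf) and the single rule are assumptions made for this example.

```python
def rewrite(axiom, rules, steps):
    """Apply the production rules to every symbol in parallel, `steps` times."""
    state = axiom
    for _ in range(steps):
        # Symbols without a rule are copied unchanged.
        state = "".join(rules.get(symbol, symbol) for symbol in state)
    return state


# Hypothetical toy rule: at each step the apex "A" produces one leaf "L"
# and persists, so the rosette gains one leaf per step.
RULES = {"A": "LA"}

rosette = rewrite("A", RULES, 5)
leaf_count = rosette.count("L")
print(rosette, leaf_count)
```

In this toy case the leaf count follows directly from the generated string, which shows why a synthetic plant can carry its label without manual annotation.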
Leaves were counted with Deep Plant Phenomics, a platform for image-based plant phenotyping that implements deep convolutional neural networks (Ubbens and Stavness, 2017).
With the advancements made in this study, the next application could be modeling of entire plots of crops. “A simulated plot of plants could potentially make it possible to train algorithms for detecting biologically meaningful traits such as flowering time or response to stress with a reduced number of real (annotated) crop images.”