Processed soybeans are the world’s largest source of animal protein feed and the second-largest source of vegetable oil.
Identifying the genes that control important traits provides the basis for genetic improvement, that is, for developing crops that yield more to supply a growing population and that resist biotic stresses (e.g., insect pests) and abiotic stresses (e.g., climate change). A transcriptome represents the small percentage of the genetic code that is transcribed into RNA molecules. By studying transcriptomes, researchers hope to determine when and where genes are turned on or off in various types of cells and tissues exposed to different treatments. In the past decade, more than 3000 samples of soybean transcriptomic data have accumulated in public repositories.

A new Review article by Dr. Thiago Venancio and coauthors at Universidade Estadual do Norte Fluminense in Brazil explores the state of the art in soybean transcriptomics resources and gene coexpression networks.
The article first introduces hybridization-based (i.e., microarray) and sequence-based (i.e., RNA-Seq) technologies and discusses the benefits of each. Most importantly, microarrays rely on species- or transcript-specific probes (i.e., short stretches of DNA or RNA), so they can report the relative expression only of transcripts whose sequences are already known. RNA-Seq, on the other hand, can detect novel transcripts because it determines the nucleic acid sequence of a given DNA or RNA molecule, which is then identified. RNA-Seq can also detect a higher percentage of differentially expressed genes, especially genes with low expression. For these reasons, RNA-Seq has begun replacing traditional microarray platforms for transcriptional profiling. The article highlights major studies that have used both technologies to investigate soybean transcriptional programs in different tissues and conditions.
The authors then propose approaches for integrating the huge amount of data in public repositories using gene coexpression networks (GCNs). GCNs are used to explore, interpret, and visualize the relationships among genes that work together to contribute to the expression of a particular trait (e.g., yield). “Nature loves pattern and order. In biological systems, molecular components (e.g., genes, proteins) are hierarchically organized in dense clusters commonly referred to as modules. GCNs are a powerful tool to identify modules of coexpressed genes that are likely participating in the same biological process. As genes in important crops have had their functions experimentally identified, GCNs can be used to infer functions of unknown genes based on the function of their coexpression partners. In an evolutionary perspective, these coexpression modules can be explored to identify genes that acquired new functions after duplication, and they can be compared across species to investigate conservation and divergence of orthogroups,” explains Venancio.
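To make the idea of coexpression modules concrete, here is a minimal sketch in Python. It is not the method used in the Review. It assumes a simple scheme in which two genes are linked when the Pearson correlation of their expression profiles reaches a fixed threshold, modules are the connected groups of linked genes, and an unannotated gene inherits the functions of annotated genes in its module. The gene names, expression values, threshold, and function label are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a gene with constant expression carries no signal
    return cov / sqrt(var_x * var_y)


def build_network(expression, threshold=0.9):
    """Link two genes when their correlation is at least `threshold`."""
    network = {gene: set() for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if pearson(expression[g1], expression[g2]) >= threshold:
            network[g1].add(g2)
            network[g2].add(g1)
    return network


def find_modules(network):
    """Return modules as connected components of the network."""
    seen, modules = set(), []
    for gene in network:
        if gene in seen:
            continue
        module, stack = set(), [gene]
        while stack:
            current = stack.pop()
            if current in module:
                continue
            module.add(current)
            stack.extend(network[current] - module)
        seen |= module
        modules.append(module)
    return modules


def infer_functions(modules, known_functions):
    """Assign each unannotated gene the functions known in its module."""
    predictions = {}
    for module in modules:
        labels = {known_functions[g] for g in module if g in known_functions}
        for gene in module:
            if gene not in known_functions and labels:
                predictions[gene] = labels
    return predictions


# Hypothetical expression values across five samples (illustration only).
expression = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10.5],
    "geneC": [5, 1, 4, 2, 3],
    "geneD": [10, 2, 8, 4, 6.2],
    "geneE": [3, 3, 3, 3, 3],
}
known_functions = {"geneA": "seed oil biosynthesis"}  # hypothetical annotation

network = build_network(expression)
modules = find_modules(network)
predictions = infer_functions(modules, known_functions)
print(modules)
print(predictions)
```

In this toy data set, geneA and geneB rise and fall together across samples, so they land in the same module and geneB inherits the annotation of geneA. Analyses of real data work with thousands of genes and samples and generally rely on more careful normalization and module detection than a single hard threshold.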
Finally, the article identifies soybean transcriptomic resources and expression data, including the National Center for Biotechnology Information’s Sequence Read Archive (SRA), the largest publicly available repository of high-throughput sequencing data, and the Soybean Expression Atlas, a high-resolution gene expression database.