Machine learning reveals genes for nitrogen use efficiency in corn
URBANA, Ill. – Machine learning can pinpoint genes of importance that help crops grow with less fertilizer, according to a new study published in Nature Communications.
“Now that we can more accurately predict which corn hybrids are better at using nitrogen fertilizer in the field, we can rapidly improve this trait. Increasing nitrogen use efficiency in corn and other crops offers three key benefits by lowering farmer costs, reducing environmental pollution, and mitigating greenhouse gas emissions from agriculture,” said study author Stephen Moose, Alexander Professor of Crop Sciences at the University of Illinois at Urbana-Champaign.
Using genomic data to predict outcomes in agriculture is both a promise and a challenge for biologists. Researchers are working to determine how to use vast amounts of genomic data to predict how organisms respond to changes in nutrition, toxins, and pathogen exposure, which in turn would inform crop improvement. But the implications go beyond crops, providing insights into disease prognosis, epidemiology, and public health.
However, accurately predicting complex outcomes in agriculture and medicine from genome-scale information remains a significant challenge.
As a proof of concept, the researchers demonstrated that machine learning models could predict genes of importance for nitrogen use efficiency in corn. A key first step was finding genes that respond to nitrogen in leaves of both field-grown corn plants and Arabidopsis, a small flowering plant widely used as a model organism in plant biology.
Nitrogen is a crucial nutrient for plants and the main component of fertilizer; crops that use nitrogen more efficiently grow better and require less fertilizer, which has economic and environmental benefits.
“We show that focusing on genes whose expression patterns are evolutionarily conserved across species enhances our ability to learn and predict ‘genes of importance’ to growth performance for staple crops, as well as disease outcomes in animals,” explained Gloria Coruzzi, Carroll & Milton Petrie Professor in NYU’s Department of Biology and Center for Genomics and Systems Biology and the paper’s senior author.
The researchers conducted experiments that tested whether eight “master switch” genes predicted from the machine learning model actually contribute to nitrogen use efficiency. They showed that altered expression of these switch genes in Arabidopsis or corn could increase plant growth in low-nitrogen soils, in tests run both in the lab at NYU and in cornfields at the University of Illinois.
“Our approach exploits the natural variation of genome-wide expression and related phenotypes within or across species,” added Chia-Yi Cheng of NYU’s Center for Genomics and Systems Biology and National Taiwan University, the lead author of this study. “We show that paring down our genomic input to genes whose expression patterns are conserved within and across species is a biologically principled way to reduce dimensionality of the genomic data, which significantly improves the ability of our machine learning models to identify which genes are important to a trait.”
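The idea Cheng describes, narrowing the input to conserved genes before asking which genes matter for a trait, can be illustrated with a toy sketch. This is not the study's pipeline: the gene names, ortholog map, expression values, and trait values below are all made up for illustration, and a simple correlation ranking stands in for the machine learning models the researchers actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(var_x * var_y)


def conserved_genes(responsive_a, responsive_b, orthologs):
    """Keep species-A genes that respond to nitrogen AND whose
    ortholog in species B also responds (the 'conserved' set)."""
    return sorted(g for g in responsive_a if orthologs.get(g) in responsive_b)


def rank_genes(expression, trait, genes):
    """Rank genes by |correlation| between expression and the trait.
    Assumption: a stand-in for a real model's feature importance."""
    scores = {g: abs(pearson(expression[g], trait)) for g in genes}
    return sorted(scores, key=lambda g: scores[g], reverse=True)


# Hypothetical toy data: expression of four corn genes across four hybrids.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 1.0, 4.0, 3.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 1.0, 2.0, 1.0],
}
trait = [0.9, 2.1, 2.9, 4.2]  # made-up nitrogen use efficiency scores

corn_responsive = {"geneA", "geneB", "geneD"}       # hypothetical
arabidopsis_responsive = {"atA", "atB", "atC"}      # hypothetical
orthologs = {"geneA": "atA", "geneB": "atB", "geneC": "atC", "geneD": "atD"}

# Step 1: reduce dimensionality to genes with a conserved response.
conserved = conserved_genes(corn_responsive, arabidopsis_responsive, orthologs)

# Step 2: rank only the conserved genes against the trait.
ranked = rank_genes(expression, trait, conserved)
print(conserved, ranked)
```

In this toy case, the conservation filter drops two of the four genes before any ranking happens, which is the dimensionality reduction step; the ranking then works on a smaller, biologically motivated candidate set.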
Moreover, the researchers demonstrated that this evolutionarily informed machine learning approach can be applied to other traits and species by predicting additional traits in plants, including biomass and yield in both Arabidopsis and corn. They also showed that this approach can predict genes of importance to drought resistance in another staple crop, rice, as well as disease outcomes in animals, based on studies of mouse models.
“Because we showed that our evolutionarily informed pipeline can also be applied in animals, this underlines its potential to uncover genes of importance for any physiological or clinical traits of interest across biology, agriculture, or medicine,” said Coruzzi.
In addition to Moose, Coruzzi, and Cheng, other researchers involved in this study include co-PI Ying Li and Kranthi Varala, faculty in the Department of Horticulture and Landscape Architecture at Purdue University, as well as members of their research teams at NYU, the University of Illinois, and Purdue. The research was supported by the National Science Foundation’s Plant Genome Research Program (IOS-1339362), the U.S. Department of Agriculture National Institute of Food and Agriculture Hatch project (1013620), the USDA-NIFA predoctoral fellowship (2016-67011025167), and an NSF CompGen fellowship.