Elucidating the genomic basis and early stages of de novo gene emergence in yeasts
How new genes originate is a fundamental question in biology. Genetic novelty underlies molecular, phenotypic and organismal novelty and might even be linked to major evolutionary transitions such as the rise of eusociality. Understanding how, when and why novel genes arise is therefore essential to understanding evolution at every level of biological organization. For a long time, new genes and protein functions were believed to arise exclusively through tinkering and recombination, using pre-existing genes and gene parts as raw material. Consequently, processes such as duplication and divergence, gene fusion and fission, exon shuffling, and horizontal gene transfer have been studied extensively and their importance is well established. Nonetheless, a radically different route to genetic novelty exists: a novel gene can evolve from entirely non-coding sequence in a process known as de novo gene emergence. This process was long considered so improbable as to be effectively impossible, yet de novo genes have now been found in most eukaryotic lineages and can have central, even essential, cellular functions. Much about de novo gene emergence nevertheless remains unknown. Using population and evolutionary genomics approaches, we study how genomic “randomness” is forged by evolution into a sequence that encodes a fully functional protein with a defined structure and biological role. Our main model is the budding yeast Saccharomyces cerevisiae, together with other Saccharomycotina yeasts.
Human microproteins: regulation, evolution, and role in disease
In recent years it has become evident that functional short proteins can be translated from small open reading frames (sORFs) located outside known protein-coding genes. Such “microproteins” can have regulatory, structural or signaling roles, and can have considerable phenotypic consequences. In humans, high-throughput studies have identified thousands of consistently expressed microproteins, many of which are unambiguously functional, while the biological relevance of others is still uncertain. From an evolutionary standpoint, it has now been shown that a significant percentage of human microproteins lack sequence conservation. Some are even human-specific, with unequivocal evidence that they emerged entirely de novo from previously non-coding genomic regions. Furthermore, evidence is accumulating that microproteins are implicated in various diseases, including cancer. In our lab, we use computational tools to study human microproteins from an evolutionary and biomedical perspective. We want to understand why and in which contexts entirely novel human microproteins become pathogenic, and how their expression is regulated. We also ask how microprotein properties change during evolution, and why some microproteins ultimately evolve into longer, full-blown proteins while others do not.