D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov
GBE
TheContributionofPurifyingSelection,Linkage,andMutation BiastotheNegativeCorrelationbetweenGeneExpressionan dPolymorphismDensityinYeastPopulations
AgnieszkaMarekandKatarzynaTomala*
InstituteofEnvironmentalSciences,JagiellonianUniversity,Krakow,Poland
*Correspondingauthor:E-mail:katarzyna.tomala@uj.edu.pl.
Accepted:October10,2018
Abstract
Thenegativecorrelationbetweentherateofproteinevolutionandexpressionlevelofagenehasbeenrecognizedasauniversallawofthe evolutionarybiology(Koonin2011).Inourstudy,weapplyapopulation-
basedapproachtosystematicallyinvestigatetherelativeimportanceofunequalmutationrate,linkage,andselectionintheoriginoftheex pression-
polymorphismanticorrelation.WeanalyzedtheDNAsequenceofproteincodinggenesof24Saccharomycescerevisiaeand58Schizos accharomycespombestrains.Wefoundthathighlyexpressedgeneshadasubstantiallydecreasednumberofpolymorphicsiteswhenc omparedwithgenestranscribedlessextensively.Thisexpression-
dependentreductionwasespeciallystronginthenonsynonymoussites,althoughitwasalsopresentinthesynonymoussitesanduntransl atedregions,bothupanddownofagene.Mostimportantly,nosuchtrendwasfoundinintrons.Weusedtheseobservations,aswellasanal ysesofsitefrequencyspectraanddatafrommutationaccumu-
lationexperiments,toshowthatthepurifyingselectionactingonnonsynonymoussiteswasthemain,butnotexclusive,factorimpedingm olecularevolutionwithinthecodingsequencesofhighlyexpressedgenes.Linkagecouldnotfullyexplaintheobservedpatternofpolymor phismwithintheuntranslatedregionsandsynonymoussites,althoughthecontributionofselectionactingdirectlyonsynonymousvariant swasextremelysmall.Finally,wefoundthattheimpactofmutationalbiaswasrathernegligible.
Keywords:p roteinevolution,polymorphism,mutationbias,transcription,Saccharomycescerevisiae,Schizosaccharomycespombe.
Introduction
Thesequencesofdifferentproteinsfromthesamespeciesevolv eatdifferentrates.Studiescomparinghomologoussequencesb etweenspeciesdemonstratedthatthebestknownpredictorofth eproteindivergencerateisitsexpres-
sionlevel(P,aletal.2001;RochaandDanchin2004;Drummonda ndWilke2008).Thisresultcounterssomefor-
merexpectations,especiallythoseassumingthatthefunc- tionalimportanceofa proteinisa chiefdeterminantoftherateofe volution(KimuraandOhta1974).Severaldistinctexplanationsh avebeenproposedtoclarifywhyabundantproteinsundergoslow erevolution(foranextensivereviewseeZhangandYang2015).
Theseincludeselectionagainsterroneoustranslationandprotei ninstability(Drummondetal.2005;Yangetal.2010),selectionon proteinsynthesiseffi-
ciencyandspeed(Akashi2001;PlotkinandKudla2011),se- lectionontranscriptstability(Parketal.2013)andpreservationof properphysicalinteractionsofproteins
(Vavourietal.2009;Yangetal.2012).Thus,mostofthecurrentlyt oprankedhypothesesassumestrongerpurifyingselectionopera tingonfinalproteinproductsofthehighlyexpressedgenes.Thee xplanationslinkedtotranslationalro-
bustnessandmRNAfoldingenergyputadditionalconstraintsont ranscripts.
Thepossibleimpactofanunequalmutationrateonexpression -divergenceanticorrelationisusuallynotconsid-
ered.Thiscanbejustifiedbythefindingsofgenome-
wildstudiesofbacteria,yeast,andhuman(Parketal.2012;Chen andZhang2013,2014).Allofthesestudiesreportedelevatedmut ationratesforintensivelyexpressedgenes.Thisinturn,impliesth atmutationbiasaloneshouldgeneratepos-
itive,notnegative,correlationbetweengeneexpressionandevol utionrate.Itshouldbenotedthough,thattwooftheabovestudiesu seddataobtainedformutantswithdefectiveDNArepairsystem(
Parketal.2012;ChenandZhang2013).Relationsfoundforsuch mutantsmightnotaccuratelyreflect
©TheAuthor(s)2018.PublishedbyOxfordUniversityPressonbehalfoftheSocietyforMolecularBiologyandEvolution.
ThisisanOpenAccessarticledistributedunderthetermsoftheCreativeCommonsAttributionNon-CommercialLicense(http://creativecommons.org/licenses/by- nc/4.0/),whichpermitsnon-commercialre-use,distribution,andreproductioninanymedium,providedtheoriginalworkisproperlycited.Forcommercialre- use,pleasecontactjournals.permissions@oup.com
D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov Causesof the expression-polymorphismanticorrelationinyeastpopulations
GBE
2987
GenomeBiol.Evol.10(11):2986–
2996d o i : 1 0 . 1 0 9 3 / g b e / e v y 2 2 5 A d v a n c e AccesspublicationOctober13,2018 thoseexistinginthewild-typeorganisms.Moreover,theneg-
ativecorrelationbetweentranscriptionandmutationratewasfou ndinwildE.colipopulations(Martincorenaetal.2012).Allthingsc onsidered,althoughmutationbiasisunlikelytobethemainevoluti onaryforceleadingtoexpression-
divergenceanticorrelation,itsimpactmaydifferbetweenspecies ,andthusitscontributionisworthfurtherinvestigation.
Here,weanalyzedthedistributionandfrequenciesofsin- glenucleotidevariantsinpreviouslypublishedwholegenomese quencingdatasetsfor24Saccharomycescerevisiaeand58Schi zosaccharomycespombestrainsisolatedfromtheirworld- wide/extantpopulations.Wereasonedthatfocusingonthegenet icvariationwithinpopulationscouldbea novelandpromisingwa ytoestimatetherelativecontributionofpurify-
ingselection,linkage,andmutationalpressureincreatingtheobs ervedpatternofexpression-
divergenceanticorrelation.InthecaseofS.cerevisiae,weassign edatotalof57,000singlenucleotidevariantstothecodingsequen ces,introns,30-UTRand50-
UTRofrv2,400genes(forwhichtheboundaryofatleastoneUTR regionisknown).Analogously,weinvestigatedrv150,000SNPsi n5,060S.pombegenes.Weappliedtwotypesoftestsforallvaria nts.First,weanalyzedtherelationbetweenpolymorphismdensit yandgeneexpressionlevel.Similartests,regardingnucleotideo raminoacidsubstitutions,hadbeenpreviouslyusedincomparis onsbetweencloselyanddistantlyrelatedspecies(Walletal.200 5;Zhouetal.2010).Nevertheless,theirresultsaresensitivetoba ckgroundselec-
tionandtheirreliabilitycriticallydependsonequalmutationratesf orcomparedgenesandgenes’regions.Therefore,wedecidedto extendouranalysestoencompasstheavailabledataonthedistri butionofmutationsaccumulatingunderthelaboratorypropagati onscheme,inwhichtheoperationofthenaturalselectionisminim al.Thisallowedustoestimatethestrengthofthetranscriptiondriv enmutationalbiasesinbothspecies.Forthesecondtest,wecalc ulatedthefractionsofsingletonswithinvariantsassignedtodiffer entgenesandtheirregions,andcheckedwhetherthesefractions dependongeneexpressionlevelandtypeofthemutatedsite.Alth oughthistesthaslowerpowertodetectnegativeselec-
tion,asitlosesinformationaboutthemostsevereandthusnotpre sentinpopulationsvariants,itsresultsareunaffectedbybackgro undselectionandmutationbiases.Therefore,thistestseemstob eespeciallyrelevanttotheanalysisofpolymor-
phismwithinuntranslatedregionsandsynonymoussites.
Wefoundthatinthenaturalpopulationsthelevelofpoly- morphismwasnegativelycorrelatedwithgeneexpressiononlyi nthefunctionalregionsofproteincodinggenes.Itissignificantton otethatsuchpatternwasnotfoundforthelaboratorymutationacc umulationlines.Notably,fractionofsingletonswithinvariantsassi gnedtofunctionalregionsofgenesincreasedwithgeneexpressi on,althoughthistrendwasratherweak.Thus,weshowthatinpop ulationsofbothyeastspeciesthedeclineinthegeneticvariationw ithinextensivelyexpressedgeneswasmostlikelyproducedbyth e
purifyingselectionoperatingongeneproteinproductsandtransc ripts.
MaterialsandMethods
SaccharomycescerevisiaePolymorphism
Weanalyzedthesequencesof23Saccharomycescerevisiaestr ainsfromSkellyetal.(2013)(http://www.yeastrc.org/g2p- data/raw-data/genome-
sequences).Thesequencedstrainswerehaploidsderivedfromn aturaldiploidalisolates.Rawreadswerealignedtothereference sequence(S288C,ge-nomeassembly:R64-1-
1)usingbowtie2.BAMfileswerema-
nipulatedwithsamtoolsprograms(version0.1.19).Variantswer ecalledseparatelyforeachstrainwithsamtoolsmpileupandf ilte redwithgrepandcustompythonscript.First,weremovedallindel s.Forfurtheranalysis,weusedonlySNPswithminimalQUALval ueof30,withcoverage>5andwithatleast80%ofreadssupportin gthealterednucleotide.Themappingandfilteringscripts(mappi ng.sh;readVCF.py)canbefoundinthesupplementaryfileS1,Su pplementaryMaterialonline.
Thefilteredvariantswereassignedtothecodingsequen- ces,introns,50-UTRand30-UTRregionsof2,432proteincod- ingSaccharomycescerevisiaegenes.Inordertobeanalyzedfur ther,geneswererequiredtofulfilltwocriteria:
1. KnownlocationofatleastoneUTRregion(retrievedfromtheS GDYeastMinepagehttps://yeastmine.yeastgenome.org/y eastmine):TheUTRslocationswerecompiledfromformerst udies(Zhangetal.2005;Miuraetal.2006;Nagalakshmietal.2 008;Xuetal.2009;Yassouretal.2009).Incaseswherebound ariesoftheUTRregionsdif-
feredbetweenstudies,wedecidedtouseonesinvolvingthelo ngeststretchesofDNA.
2. Codingsequencenonoverlappingwithanyothergene First,weassignedvariantstothecodingsequencesandintron softheanalyzedgenes.TheremainingSNPswerethenassig nedtotheUTRregions.Theveryshort(<9bp)andnonpolymor phicUTRregionswereexcludedfromfurtheranalysis.Varian tsinthecodingsequenceweresplitintosyn-
onymousandnonsynonymousgroupsbasedontheresultsfro mtheVariantEffectPredictortool(http://www.ensembl.org/i ndex.html).Insum,weanalyzed56,939SNPs.Next,wecalcu latedthenumberofpolymorphicsitesforeachgeneandgener egion(dividedbythelengthofrespectiveregions).Inaddition, thefractionofsingletonswasdetermined.Theexpectednum berofpotentialsynonymousandnonsynony-
moussiteswascalculatedforthecodingsequencesoftherefe rencestrain.WeusedtheNei–
Gojoborimethodundertheassumptionofequalmutationprob abilitiesforalltypesofsubstitutions(NeiandGojobori1986).
Obtainedvalueswerethenusedasthelengthsofsynonymou sandnonsynonymousregions.
D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov Marek andTomala
GBE
DataonthemRNAlevel(mRNAmoleculesperhaploidcell)we readoptedfromCs,ardietal.(2015).Dataongeneindis-
pensabilitywasretrievedfromSGDYeastMinepagehttps://yea stmine.yeastgenome.org/yeastmine).Thecompileddataisava ilableinthesupplementaryfileS2,SupplementaryMaterialonlin e.
Thefractionofsingletonsexpectedinneutralityforthefoldedsi tefrequencyspectrumwasbasedononemillionsim-
ulatedsamplesofsize24.Forthecoalescentsimulations,weuse dprogramms(Hudson2002),withthecommand:ms241000000 -
s1.Then,wecountedcaseswherethederivedallelewaspresenti nonlyoneorin23copiespersample.
SchizosaccharomycespombePolymorphism
DataonS.pombevariantsweredownloadedfromthewebpage:
https://figshare.com/articles/SNP_calls_for_all_161_strains_
/3978303.ThesedatawereoriginallypublishedbyJeffaresetal.
(2015)andrepresentsinglenucleotidepolymor-
phismfoundin161naturalisolatesofSchizosaccharomycespo mbe.Thedownloadedvcffilewasalreadyannotated.Forthepur poseoffurtheranalyses,weextractedfromtheorig-
inalfileinformationonvariantspresentin57strains,de- scribedbyauthorsoftheoriginalresearchpaperas“nonredunda nt.”WefilteredthesevariantsusingthesamecriteriaasforS.cere visiae.SNPsthatpassedthefilterswerethenassignedtothediffe rentgeneregions(synonymouscod-
ing,nonsynonymouscoding,intron,30-UTR,50-
UTR)onthebasisofannotationsprovidedintheoriginalvcffile.In sum,weassigned153,407SNPsto5,069proteincodinggenes;
outofthose2,496containedintrons.Dataongeneexpres- sionwastakenfromMargueratetal.(2012)
(numberofmRNAmoleculesperhaploidcell).Alistofessentialg eneswasretrievedfromPomBase(https://www.pombase.org/
).Numberofpotentialsynonymousandnonsynonymoussitesw asobtainedasdescribedforS.cerevisiae.Thecompileddataseti savailableinthesupplementaryfileS2,SupplementaryMaterial online.
Theexpectedfractionofsingletonswasdeterminedanal- ogouslyasdescribedforS.cerevisiae,butthesizeofthesim- ulatedsampleswassetto58.
AnalysisoftheS.cerevisiaeMutationAccumulationEx periment
WeusedresultsofthestudybyZhuetal.
(2014).Thestudydescribes873singlenucleotidemutationsthat occurredin145yeastlinesduringthemutationaccumulationexp eriment.Outofthese,636wereassignedto577Saccharomyces cer-evisiaegeneswithmRNAlevelgiveninCs,ardietal.
(2015).Inthisprocedure,weusedtheVariantEffectPredictortoo l.Next,wecalculatedEMAstatistic:EMA¼Rni-
ln(mRNAi)/Rni,wherenistandsforthenumberofmutationsthatw ereassignedtotheigeneandmRNAiistheigenemRNAlevel.Thu s,theEMAstatisticistheweightedaverageofthenatural
logarithmsofthemRNAlevel,calculatedforthegenesthatweref oundmutatedinthemutationaccumulationexperi-
ment.CompileddatacanbefoundinthesupplementaryfileS2,S upplementaryMaterialonline.
SimulationofMutagenesisinS.cerevisiae
WeusedthegenomicdataonthemRNAexpression(Cs,ardietal .2015)andgeneGCcontent(Ensembl)tocreatetwolists.TheLAT listwascomposedoftheln(mRNAi)valuesfor5,850Saccharomy cescerevisiaegenes.Individualln(mRNAi)valueswerereprese ntedinnumbersequaltothenumbersofATpairsinrespectivege nes(insum5,349,347entries).TheLGClistwascomposedaccordi ngtothesamepatternforGCpairs(3,484,611entriesintotal).Th en,werandomlychose636valuesfromtheabovelists.Totakeac countforthehighermutationprobabilityoftheGCnucleotides,68
%theln(mRNAi)valuesweretakenfromtheLGClistand32%fromt heLATlist(assuggestedbyZhuetal.2014).Thearith-
meticmeanofeachrandomsample,ESwasthencalculated.The procedurewasrepeated10,000times.Weconsideredtheabove proceduretosimulatethemutationalprocessinwhichtheprobab ilityofmutationwithinanygenesequenceisproportionaltogenel engthanddependsongeneGCcon-
tent(simulated_data1).Inanothersimulation,weproceededint hesameway,withtheexceptionthattheln(mRNAi)valueswered rawnfromthemergedLATandLGClists.Thus,thesecondsimulation representedthesituationwheretheprob-
abilityofanygenemutatingdependsonlyonitslengthandnotonit sindividualGCcontent(simulated_data2).Thescriptusedtogen eratedataisinthesupplementaryfileS1,SupplementaryMateri alonline.
AnalysisoftheS.pombeMutationAccumulationExp eriments
Wemergedresultsfromtwomutationaccumulationexperi- ments:BehringerandHall(2015)
(422singlenucleotidemutations)andFarlow et al. (2015) (326singlenucleotidemutations).Weannotated615ofthesemut ationsto533differentS.pombegeneswithmRNAlevelgiveninM argueratetal.(2012)
(http://fungi.ensembl.org/Schizosaccharomyces_pombe/Too ls/VEP).TheEMAstatisticwascalculatedasdescribedforS.cerev isiae.Thecorrespond-
ingdatasetisinthesupplementaryfileS2,SupplementaryMateri alonline.
SimulationofMutagenesisinSchizosaccharomycespombe Simulateddatasetswerecreatedanalogouslyasdescribedfor S.cerevisiae.WeusedtheexpressiondatafromMargueratetal.
(2012)
(5,059proteincodinggenes)anddataongeneGCcontentfromt heEnsemblfungiwebpage.TheS.pombeLATandLGClistscontain ed6,907,927and4,071,402entriesaccordingly.Toobtainthesi mulateddatadependentonboth
Causesof the expression-polymorphismanticorrelationinyeastpopulations
GBE
2989
D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov
GenomeBiol.Evol.10(11):2986–
2996d o i : 1 0 . 1 0 9 3 / g b e / e v y 2 2 5 A d v a n c e AccesspublicationOctober13,2018 genelengthanditsGCcontent,73.6%ofvalueswastakenfromth
eLGClistbecause457ofthe748mutationsfoundinbothexperime ntsrelatedtotheGCnucleotides.ThewholeS.pombegenomeco ntains4,538,978GCand8,052,273ATbasepairs.Thus,theprob abilityofbeingmutatedis2.786timeshigherfortheGCnucleotide sthanfortheATnucleo-
tides.Thisgivestherelativemutationfrequenciesof0.736and0.
264fortheGCandATnucleotidesaccordingly.Ateveryof10,000 iterations,anaverageof615randomlyse-
lectedln(mRNAi)valueswascalculated.Thescriptusedtogener atedataisinthesupplementaryfileS1,SupplementaryMaterial online.
AnalysisofCodonUsage
Wecountedcodonspresentwithinthecodingsequencesofallan alyzedgenes(forthereferencestrainsofS.cerevisiaeandS.po mbe).Thesecountswerethencorrelated(Spearman’srankcorr elation)withthenumbersofthematch-
ingtRNAgenespresentinthegenomesofbothspecies(mul- tipliedbytheirwobbleparameters).Correlationcoefficientsobtai nedforeachgeneconstitutedtherhotaistatistics.IncaseofS.cere visiae,tRNAgenecopynumbersandwobbleparam-
eterswereadoptedfromWeinbergetal.
(2016).ThegenecopynumbersoftRNAgenespresentinS.pom begenomeweretakenfromGtRNAdb(http://gtrnadb.ucsc.edu/
).ThewobbleparametersweresetasdescribedinCurranandYar us(1989)andLimandCurran(2001).Correlationcoeffi- cientsobtainedforeachgenealongwithtRNAcopynumbersand wobbleparametersusedaregiveninthesupplementaryfileS2,S upplementaryMaterialonline.
Dataanalysis
DatawasanalyzedinR(RCoreTeam2015).Weusedthefollowi ngfunctions:cor.test,glm,pcor.testfromtheppcorpackage(Kim 2015).Correlationcoefficientswerecomparedwithconcor(http:
//comparingcorrelations.org/)(DiedenhofenandMusch2015).
Result s
Westudiedsinglenucleotidepolymorphismwithindozensofyea ststrainsisolatedfromextantpopulationsofSaccharomycescer evisiaeandSchizosaccharomycespombe.Wefoundthatthede nsityofpolymorphicsiteswithingenescorrelatesnegativelywitht heexpressionlevelmeasuredasthenumberofmRNAmolecule sperhaploidcell.Inbothyeastspecies,therelationshipisespecia llyclearforthenonsynon-
ymousvariants(fig.1).A possibleinterpretationisthatthepurifyin gselectionagainstchangesinproteinsequenceoper-
atesandismoreintenseinthehighlyexpressedgenes.Indeed,th eratioofnonsynonymoustosynonymouspolymor-
phicsitesdensities(pN/pS)alsocorrelateswiththegeneex- pression(S.cerevisiae:Spearman’srankcorrelation
rho¼-0.35,P<2.2x10-
16,n¼2,373;S.pombe:Spearman’srankcorrelationrho¼- 0.28,P <2.2x10-16,n¼4,882).
Inthenextstep,weanalyzedminorsitefrequencyspectraforb othpopulationstotestwhetherhighlyexpressedgeneshaveele vatedfractionofsingletons.Wereasonedthatdele-
teriousmutationscouldbestillpresentinpopulationsiftheywere recent.Suchmutationswouldbechieflyfoundamongsingleton s,astheyareremovedfromthepopulationbeforereachinghigh erfrequencies.Thisshouldleadtotheelevatedfractiono f s ingle tonsw ithina llv ariants.I ndeed,w ef oundhigherthanexpected (forneutralpolymorphism)fractionsofnonsynonymouss inglet onsi nt hea nalyzedd atas et:0 .40versusexpected0.29forS.ce revisiaeand0.40versus0.23forS . p ombe.M oreover,t hesef r actionsw erea lsoh igherthanfractionsfoundforintronicvarian ts(0.29and0.26for
S.cerevisiaeandS.pombeaccordingly),whatindicatesthatthe highpercentageofnonsynonymoussingletonscannotbeentire lyexplainedneitherbythedemographicprocess(e.g.,populatio ngrowth)norbyinsufficientvariantfilteringproce-
dure(fig.2).M orei mportantly,t hef ractiono fn onsynony- moussingletonscorrelatespositivelywithgeneexpressionlev elforbothspecies(S.cerevisiae:rho¼0.06,P¼0.004,n¼2,13 3;S.pombe:rho¼0.11,P¼1.75x10-
13,n¼4,593).Thus,thenonsynonymoussitefrequencyspectra showclearsignsofthenegativeselectioninS.cerevisiaeand S.p ombep opulations.A bovea ll,h igherf ractiono fn ewlyarriv ingaminoacidalteringmutationsispresentinsequencescoding forabundantproteinsindicatingstrongerselection.
Wealsoaskedwhethertheobservedlowpolymorphisminhig hlyexpressedgenescouldbeascribedtosomespecificforces,s uchasselectionactingontranscriptsorcodoncom-
position.Inbothyeastspecies,thenegativecorrelationbe- tweengeneexpressionandvariantdensityisvisiblenotonlyforn onsynonymousbutalsoforsynonymoussitesandtheUTRregio ns.Inthetwolatterregions,however,thecorrela-
tionismuchweaker(fig.1).Suchpatternmayindicatethatabund anttranscriptsarealsomoreconstrained.Analterna-
tive,butnotmutuallyexclusiveexplanation,isthattheregionsofhi ghlyexpressedgenesaredepletedofpolymorphismbe- causetheyarelinkedtomoreconstrainednonsynonymoussites.
Weaddressedthisissueusingthreedistinctapproachesdescrib edbelow.
First,weperformedpartialcorrelations(mRNAvsvariantden sity)whilecontrollingforthenonsynonymousvariantden- sities.Thepurifyingselectionactingonnonsynonymousvar- iantcausesareductionintheeffectivepopulationsizesoftheadja centregions.Therefore,ifthedropinvariantdensitiesinUTRregi onsandsynonymoussitesresultsexclusivelyfromlinkage,thent hepartialcorrelationsshouldturninsignificant.Thisisnotthecase ,asmostofthecorrelationsremainvalid(table1).Interestingly,th epartialcorrelationbetweensynon-
ymouspolymorphismdensityandgeneexpression(control- lingfornonsynonymouspolymorphism)remainssignificant
= =
Marek andTomala
GBE
D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov S.
c er e vi
S.
p o m
S.
c er e vi si
S.
p o m fractionofsingletons polymorphicsitesdensity
3UTR 5UTR introns nonsynonymous_sites synonymous_sites
rho=−0.117 p=8.29e−09 n=2425
rho=−0.061 p=0.0027n
=2398
rho=0.19 p=0.015 n=162
rho=−0.442 p<2.2e−16 n=2432
rho=−0.194 p<2.2e−16 n=2432 0.10
0.05
0.00 0.20
0.15
rho=−0.175 p<2.2e−16 n4 7 1 8
rho=−0.179 p<2.2e−16 n=4664
rho=−0.033 p=0.105n=
2496
rho=−0.338 p<2.2e−16 n=5069
rho=−0.054 p=1.26e−04 n5 0 6 9 0.10
0.05
0.00
−4−20 2 4 6−4−20 2 4 6−4−20 2 4 6−4−20 2 4 6−4−20 2 4 6
ln(mRNA)
Fig.1.—Relationshipbetweengeneexpression(mRNAmoleculespercell)andpolymorphism(numberofvariantsperregionlength).
1.00 0.75
p=5.7e−0
8 p=0.1 0
2 na
p=2.e − 1 2
p=0.4
6 Table1
ResultsoftheSpearman’sRankPartialCorrelationsbetweenmRNALevelan dNumberofPolymorphicSites(perregionlength),Controllingfor
0.50 4270 3305 799 8939 20695 NonsynonymousSNPsDensity
0.25
2608 1665 334 5976 8223
0.00
Region Rho Pvalue n
Saccharomycescerevisiae
1.00 p=4.7e−09 p=1.7e−06 na p<2.2e−16 p=0.98 Synonymoussites -0.0795 8.76e-05 2,432
0.75 50-UTR
30-UTR
-0.0299 -0.0634
0.14 0.0018
2,398 2,425
0.50 23550 17478 6394 20845 38283 Schizosaccharomycespombe
0.25 Synonymoussites 0.0085 0.55 5,069
0.00 9780 7083 2263 14197 13534 50-UTR -0.1275 2.39e-18 4,664
3UTR 5UTR introns nsyn syn 30-UTR -0.1251 6.26e-18 4,718
Fig.2.—
Fractionofsingletonswithinallanalyzedvariants(minorallelefrequencyspec tra).Partsofthebarscorrespondingtosingletonsarefilledwiththeopaqueco lors.Numbersinsidebarsindicatenumberofsiteswithineachcategory(singl etons/morefrequentSNPs).Greenlineshowsfractionofsingletonsfoundini ntrons—
usedasanapproximationoftheneutralvalue(S.cerevisiae:0.29;S.pombe:0 .26).YellowlineindicatesfractionofsingletonsexpectedforWright–
Fisherpopulation(S.cerevisiae:0.29;S.pombe:0.23).Significanceofv2t est scomparingfrac-
tionofsingletonswithinSNPsinagivenregionwithfractionrecordedforintron saregivenabovethebars.
forS.cerevisiaebutdisappearsforS.pombe.Thismaysuggestt hattheselectiononcodonusageispresentonlyintheformerspec ies.However,eveninS.cerevisiaetheobservedcorrela- tionisrestrictedtoasmallsubsetofgenesexpressedataveryhigh level.Thepartialcorrelationisnolongersignificant,whenremovi ng118geneswiththehighestexpression(i.e.,
>40mRNAmoleculespercell)(Spearman’srankcorrelationrho¼- 0.035,P ¼0.09,n ¼2,314).Itshouldbealsonoted
Marek andTomala
GBE
2990 GenomeBiol.Evol.10(11):2986–
2996d o i : 1 0 . 1 0 9 3 / g b e / e v y 2 2 5 A d v a n c e AccesspublicationOctober13,2018 here,thatpartialcorrelationsperformedonnoisydatamayprod
ucespuriouslysignificantresults(Drummondetal.2006).
Oursecondtestusestheobservationthatbackgroundse- lectionshouldremovepolymorphismnotonlyfromfunctionalre gionsofgenesbutalsofromintrons.Therefore,ifcorrela- tionsbetweenSNPsdensityandmRNAlevel,visibleforUTRsa ndsynonymoussites,wereonlybyproductsofselectionact- ingonnonsynonymousvariants,thensimilarlystrongcorre- lationsshouldbeseeninintrons,whentakingintoaccountsmall ersamplesize(i.e.,smallernumberofgeneswithintrons).Tothi send,werandomlydrewsampleof162genesforS.cerevisiaea nd2,496genesforS.pombefromtheorig-
inallyanalyzeddatasets.Thesenumbersequalnumbersofge neswithintronsusedinouranalyses.Next,weperformedSpea rman’srankcorrelationsbetweenmRNAlevelandvari- antdensitiesfortherestricteddata.Thisprocedurewas
Causesof the expression-polymorphismanticorrelationinyeastpopulations
GBE
p=0.43
3915 355
6291 587
p=0.0304
2963 342
4421 549
p=9.11e− 04
8697 242
14453 462
p=2.43e− 03
19537 1165
27220 1705
D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov
S.
c er ev isi a
S.
p o m
rho(SNPdensityvsmRNA)fractionofsingletons
repeated500times.Thedistributionsofobtainedcorrelationcoef ficientsforallfunctionalregionsareshowninfigure3Aandinsuppl ementaryfileS2,SupplementaryMaterialonline.Notably,these distributionsareshiftedtowardnegativevaluesand,withtheexc eptionofsynonymousvariantsinS.pombe,their95confidencein tervalsliebelowthevalueobtainedforintrons(supplementaryfile S2,Supplementary Materialon-
line).Thus,resultsofthesetestsseemtoindicatethatpurify- ingselectionactsdirectlyonvariantsthatdonotchangeaminoaci dsequenceandthatsuchselectionismoreimpor-
tantforhighlyexpressedgenes.Itisimportanttonote,how- ever,thatthistestmaynotbestringentenoughincaseofthesynon ymouspolymorphism,asonanaverage,theselectednonsynony mousvariantslieincloserproximitytothesynon-
ymoussitesthantointrons.
Forthethirdanalysis,wedividedallconsideredgenesinto twogroupsaccordingtotheirexpression.The“high”groupcontai nedr v 10%ofgeneswiththehighestmRNAlevel(S.cerevisiae:
244geneshaving>22mRNAmoleculesper
A
0.25
0.00
−0.25
−0.50
−0.75
B1 . 2 5 1.00 0.75 0.50 0.25 0.00 1.25
S.cerevisiae S.pombe
3UTR5 U T R n s y n s y n 3UTR5 U T R n s y n s y n
3UTR 5UTR nsyn syn
cell;S.pombe:515geneswith> 12mRNAmoleculespercell).Th eremaininggenesformedthe“ lower”groups.Weusedv2teststo comparethefrequencyofsingletonswithinvariantsfoundinthes egroupsofgenes.Theresultsofthesecomparisonsaresummari zedinfigure3B.Clearly,SNPslocal-
izedwithinsequencesofgenesfromthe“ high”groupare
1.00 0.75 0.50 0.25 0.00
p=7.88e−04
222521 2 9 8
314001 9 3 0
p=8.51e−06
164651 0 6 8
229551 6 0 6
p=1.91e−05
198301 0 1 5
331871 8 5 5
p=0.292
350823 2 0 1
474444 3 7 3
morelikelytobesingletons.Thistrendholdsforallfunctionalregio nsofgenes,however,thedifferencesarenotalwayssignificanta ndareextremelysmallincaseofthesynonymoussites.Moreover ,thefrequenciesofsingletonscalculatedforthesynonymousvari antspresentinallanalyzedgenesdonotdifferfromthefrequenci esobtainedforintrons(fig.2).Tosummarize,theresultsofthethre edifferenttestspointtothehypothesisthatbothuntranslatedregi onsofgenesareunderweakpurifyingandexpression-
dependentselection,distinctfromtheselectiononproteinproduc ts.Onthecontrary,wefoundlittle(forS.cerevisiae)orno(incaseo fS.pombe)evidenceforthepurifyingselectionactingdirectlyont hesyn-
onymoussites.Evenifsuchselectiondoescontributetothedropo fpolymorphisminhighlyexpressedgenes,itseffectisratherwea kandrestrictedtoveryabundanttranscripts.
Finally,weaskedwhetherthenegativecorrelationbe- tweenexpressionlevelandpolymorphismdensitycanbe,atleast partially,causedbyunequalmutationrates.Westartedwithintro ns,aswereasonedthatvariantsinintronsshouldnotbeinfluence dbythepurifyingselection.Wefoundthatintronsofhighlyexpress edgenesinSaccharomycescerevisiaemayhaveanelevatednu mberofvariantsperonenucleotide(rho¼0.19,P ¼0.015,n ¼16 2)
(fig.1).However,onlya smallportionofgenesinthisspecieshasi ntrons(162inourdataset).Moreover,thesegenesaregenerally highlyexpressedandmaynotappropriatelyrepresentallgenesp re-
sentinthisspecies(Skellyetal.2009).Ananalogousanalysisinv olvingalmost2,500S.pombegenesprovidedcontradic- toryresults.Inthisspecies,theexpressionlevelofthegenedid
lowerhigh lowerhigh lowerhigh lowerhigh expression( m R N A )
Fig.3.—(A)DistributionoftheSpearman’srankcorrelationcoeffi- cients(mRNAlevelvsvariantdensity)obtainedinthesubsamplinganalyses(
mean6 95ci).Greenlineshowsvalueofthecorrelationcoefficientobtainedf orallintrons;
(B)Fractionsofsingletonswithinvariantspresentinhighlyandlessextensivel yexpressedgenes(seeResultsforthemoredetaileddescriptionoftheusede xpressioncategories).Singletons—opa-quecolors,morefrequentvariants
—
transientcolors.Numbersofvariantswithineachgrouparegiveninsidebars.
Greenlineshowsfractionofsingletonsfoundinintrons,yellowlineindicatesfr actionofsingletonsexpectedforidealWright–
Fisherpopulation.Significanceofv2te stscom-
paringfractionsofsingletonswithinvariantsfoundingenesbelongingtothet woexpressioncategoriesareshownabovethebars.
notinfluencethenumberofvariantspresentintheintronicseq uences(rho¼-0.033,P¼0.097,n¼2,496)(fig.1).
Astheaboveresultsmightappearinsufficientlyconclusive,w eturnedtostudiesonmutationaccumulationexperiments.Mutat ionsfoundinthesekindofstudiesarelikelytorepresentthemutati onalspectrumlargelyunaffectedbyselection.Weusedsimulati onstotestavailableempiricaldataforthepres-
enceofmutationalbias.Simulationsrepresentedthesitua- tionsinwhichthechanceofoccurringofnewmutationwithinanyg enedependedonboththegenelengthanditsGCcontent(fig.4,u pperpanel:simulated_data1)oronthegenelengthonly(fig.4,lo werpanel:simulated_data2).Then,wecomparedthesimulated distributionofpossibleresultswiththeEMAstatisticcalculatedforth
Causesof the expression-polymorphismanticorrelationinyeastpopulations
GBE
2991
GenomeBiol.Evol.10(11):2986–
2996d o i : 1 0 . 1 0 9 3 / g b e / e v y 2 2 5 A d v a n c e AccesspublicationOctober13,2018 eactualdata(inbrief:EMArepresentstheaverageexpressionof
genesthatwere
Marek andTomala
GBE
D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov si
m ul at ed_d at
si m ul ated _d at
count
750 500 250 0
750 500 250 0
S.cerevisiae S.pombe
0.7 0.8 0.9 1.0 1.1 1.2 0.7 0.8 0.9 1.0 1.1 1.2 ln(mRNA)
Fig.4.—
Comparisonbetweentheexpressionofgenesmutatedinmutationaccumulationexperiments(EMAstatistics,redarrows)andvaluesobtainedforsimulateddata;s imulated_data1:probabilityofmutationwithina geneproportionaltoitslengthandhigherforG andC nucleotides,simulated_-
data2:probabilityofmutationproportionaltogenelengthandequalforallnucleotides(SeeMaterialsandMethodsfordetails).
foundmutatedinthemutationaccumulationexperiment;seeMa terialsandMethodsforfulldescription).Wefoundthatin
S.cerevisiaemutationsarenotlessbutmorelikelytooccurin highlytranscribedgenes(EMA¼1.1916,P¼0.0006;fig.4).Inthiss pecies,expressionislikelytobemutagenicitselfastheobserved effectdidnotdependonthegene’sGCcontent(fig.4).However,i nS.pombe,theactualmutationspectrumappearedshiftedtowar dgenesoflowexpression(EMA¼0.801,P ¼0.036),butnotwhen neglectingtheGCcontentofthegene(P¼0.094,fig.4).Thus,tran scriptiondoesnotinfluencethemutationrateinthisyeastspecies ,althoughthehighlyexpressedgenesmayhaveslightlylowermut ationratesduetotheirnucleotidecomposition.Astheresultsofth emutationaccumulation,dataanalysesdifferbe-
tweenthetwoconsideredyeastspecies,weconfirmedourconcl usionsbybuildingseveralPoissonregressionmodels(supplem entaryfileS2,Supplementary Materialonline).Takentogether, boththeanalysisofpolymorphisminintronsofextantpopulations andthetestofdistributionofmutationsinaccumulationlinesprovi deresultsconsistentwitheachother.Theyindicatethatmutation alprocessitselfisunlikelytoproduce(S.pombe)orevenactsagai nst(S.cerevisiae)theobserveddeclineinthepolymorphismleve lassociatedwithgeneexpression.
Discussio n
Studiesinvolvingcomparisonsbetweencloseanddistanttaxas howedthattherateofmolecularevolutionofa proteindependso ntheintensityofitsexpressionwiththeabundant
proteinsbeingmoreconserved.Weaskedwhetherthepre- dictednegativerelationshipbetweenthelevelofgeneexpres- sionanditsgeneticpolymorphismcanbeobservedalsoincurrent populationsofasinglespecies.WechoseS.cerevisiaeandS.po mbeduetothefactthatpopulationsofthesetwoyeastspeciesarel argeenough,Nerv107(Skellyetal.2009;Farlowetal.2015),toen ableaccumulationofa sufficientnumberofmutationssothatthei rdistributionwithina ge-
nomecouldbeusedtodetecttheworkofanevenrelativelyweaks election.Indeed,weobservedlowervariantdensitiesinregionsc odingforhighlyexpressedgenes.Relevantcorrela-
tioncoefficientsturnedouttoroughlymatchthosereportedforthe betweenspeciescomparisons(ZhangandYang2015).
Webelievethattheobservedpatternofmutationaccumu- lationcouldnotresultfromthedifferentrateofmutationatthecom paredsites.Firstly,wefocusedonisolatesfromextantpopulation sandfoundnocorrelation(S.pombe),orpossiblyapositivecorrel ation(S.cerevisiae),betweentheexpressionlevelandvariantde nsitywithinintrons.Thisresultsuggeststhatthefrequentlytransc ribedgeneshavea mutationratethatisequalorevenhigherthant hosetranscribedrarely.Wethenanalyzedthedistributionofnew mutationsaccumulatedinexperimentalpopulationsandfoundfu rthersupportforourclaim.Theprobabilityofbeingmutatedwashi gherforgeneswithhighmRNAlevelinS.cerevisiae.InS.pombe, wedetectedaweaktrendintheoppositedirectionwhichprob- ablyresultedfromabiasintheGCcontent.Weareawarethatour analysesofmutationaccumulationexperimentsareindirectinth esensethatweusedsimulateddatatoformthenulldistribution.Int hissimulateddataset,ageneunderwent
Causesof the expression-polymorphismanticorrelationinyeastpopulations
GBE
GenomeBiol.Evol.10(11):2986–
2996d o i : 1 0 . 1 0 9 3 / g b e / e v y 2 2 5 A d v a n c e AccesspublicationOctober13,2018 2993
D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov amutationattherateproportionaltoitslengthandGCcon-
tent,neglectinga possibleroleofgenelocation,chromatinstatus ,orotherfactors.Whatisimportant,weadditionallyconfirmedthe seconclusionsusinggenerallinearmodelling.Onecouldalsoarg uethatalthoughintronsaregenerallynotexposedtoselection,th eymay,nevertheless,mutateataratedifferentthantheydoinexo nsduetothefactthatdifferentrepairsystemsarethoughttooperat ealongthecodingandnoncodingsequences(Frigolaetal.2017).
Regardlessoftheabove,wearereassuredbythefactthatouranal ysesinvolv-
ingtwocompletelydifferentapproachesgaveconsistentresults.
Anoverallpattern,a roughlyequalorevenhigherpolymorphism withinintronsofstronglyexpressedgenesappearsrobustandth ereforethereductionofpolymorphismknownforhighlyexpresse dexonsisunlikelytoresultfromthedifferencesinmutationrate.
Somepreviousresultsappeartoaccordwithourfindingthat polymorphismisnotlowerbuthigherinintronsofhighlyexpres sedgenesinSaccharomycescerevisiae.Onestudyusedmutati on-
accumulationlineswithinaDNArepairdefective(ung1D)stra inandfoundanelevatedmutationrateininten-
sivelytranscribedgenes(Parketal.2012).Datafromthemu- tationaccumulationexperimentscarriedoutwithawild-type S.cerevisiaestrainandreanalyzedhere(Zhuetal.2014)hadbee nalreadyexaminedforthedependencebetweenexpres- sionandmutationprobability(ChenandZhang2014;Zhuetal.20 14).Whileauthorsoftheoriginalstudyfoundnosuchrelation,con clusionssimilartoourswerereachedbyChenandZhang(2014).
Thus,ouranalysesprovidedadditionalev-
idencethatintensivetranscriptionisassociatedwithelevatedmu tationinS.cerevisiae.Whatisimportant,weapplieddif-
ferentanalyticalmethodstothoseusedintheabovestud- ies.A n a n a logousa n a lysisb a sedo n b o t h m u t ationacc umulationd a t a s e t s p u blishedf orS . p o mbe,f a i ledt o dete ctanyrelationbetweenmutationrateandexpressionlevel.T h e s a m e c onclusionsw e rer e a chedi n o n e o f t h e original m utationa c c u m ulations t udiesc a r r i edo u t f o r thiss p eci es(BehringerandH a l l 2 0 15).T h us,w h i lei ti stemptingtoco ncludethatinS.cerevisiaethepurifyingse-
lectioninexonsisstrongenoughtorevertthemutationalbia s,inS.pombet ran scriptiond oesno tleadt o elevatedmutat ionp r obability.T r a nscriptiona f f ectsm u t ationr a t e s thro ught h e p r ocesseso ft r a n scriptiona s sociatedm u t a- genesisa n d t r anscription-
coupledD NAr e p air(HanawaltandSpivak2008;KimandJinks -Robertson2012).Itispos-
siblethatthefinaloutcomeoftheseantagonisticprocessesisn o t t hes a m e f o r a l l s p ecies.H o w ever,m o r e i n t ensiveworko n t h e e xperimentale s t imationo f m u t ationr a t e i n differe ntgenomeregionsisneededbeforeitsresultscanbeused inquantitativetestsofevolutionaryhypotheses.
Apartfromthecleardifferenceinthedistributionofpoly- morphismbetweenintronsandexons,thereareother,moredirec t,argumentsthatselectionoperatesmostefficientlyinhighlyexpr essedcodingregions.First,thenegativecorrelation
betweengenemRNAlevelandthedensityofpolymorphisminex onswasstrongerforthenonsynonymouspolymorphism.Itisrem arkablethatthisresultwasstatisticallysignificantgiventhatthede nsityofvariantsrecordedfornonsynony-
moussitesinbothspecieswasgenerallylow.Second,thefraction ofrarenonsynonymousvariantstendedtoincreasewithgeneex pression.Whichiswhy,newmutationsaremostlikelytobedeleter iousiftheychangethefinalproteinprod-
uctsofanintensivelytranscribedgene.Itisalsoworthnotingthatin bothspeciestheexpression–
divergencecorrelationsforessentialandnonessentialgenesha vesimilarstrength(S.cerevisiae:Fisher’sz¼-
1.42,P ¼0.16;S.pombe:Fisher’sz¼-0.71,P ¼0.48)
(supplementaryfileS2,SupplementaryMaterialonline).Thus,t hefunctionalimportanceofproteinsisunlikelytobeanimportantf actorunderlyingtheconsideredrelation.
Thenegativecorrelationbetweengeneexpressionleveland polymorphismdensityappearstobeconsiderablystron- gerforthesurveyedpopulationsofS.cerevisiae.Thisfindingmig htsuggestthatthepurifyingselectionismoreeffectiveinthisspeci es.However,geneexpression,oneoftheanalyzedfactors,may havebeenestimatedwithdifferentaccuracyinthetwospecies.T hedatasetforS.cerevisiaeispossiblymorereliableasbasedona vastnumberofindependentRNAex-
pressionstudies(Cs,ardietal.2015).Moreover,itisimagin- ablethatinthisspeciestheexpressiondatagatheredunderlabor atoryconditionsaremoreindicativeof“ natural”geneexpression .Regardingthepolymorphismdata,theeffectofselectioncanbe muchmoredifficulttodetectinthecaseofS.pombebecauseofth estrongerlinkagedisequilibriumreportedforthisspecies(Jeffare setal.2015).Therefore,al-
thoughthecorrelationcoefficientsrelatingthefractionofnonsyn onymoussitesandexpressionleveldiffersignificantlybetweenS .cerevisiaeandS.pombe(Fisher’sz ¼5.0179,P<10-
6)thisresultdoesnotnecessarilyreflectdifferencesintheselectio npressure.
Withregardstothesynonymouspolymorphism,itwasalsone gativelycorrelatedwiththemRNAlevelinbothyeastspe- cies.Thesecorrelations,however,weremuchweakerthatthose foundforthenonsynonymoussites.Moreover,resultsofthreedis tinctandindependenttests(seeResults)indicatedthatthesecor relationsweregeneratedmainlybytheback-
groundselection.Thisinterpretationisstraightforwardinthecas eofS.pombe,asallthreetestsfailedtodetectsignaturesofnegati veselectionactingdirectlyonthesynonymousvar-
iantsinthisspecies.TheresultsobtainedforSaccharomycescer evisiaearemorechallengingtoexplain.Bothtestsutilizinginform ationonvariantdensities(i.e.,subsamplinganalysisandpartialc orrelationcontrollingfornonsynonymouspoly-
morphismdensity)gaveresultsthatsupportedtheroleofthedirec tpurifyingselection.Ontheotherhand,thefractionofsingletonsw ithinthesynonymoussiteswasnotanyhigherthanfractionexpec tedforneutrallyevolvingsites.Thisfrac-
tionwaselevatedforthe10%ofgeneshavingthehighest
Marek andTomala
GBE
D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov expression,however,theeffectwasnotsubstantial.Onepos-
sibleexplanationisthatsynonymousvariantsareeffectivelyneut ralalsoinS.cerevisiaepopulation(withtheexceptionofsiteswithi nthesmallnumberofveryintensivelytranscribedgenes)andthat twooutofthreeappliedtestsweretooper-
missive.Theotherexplanationisthatsynonymousvariantsmay beeitherverydeleteriousandrapidlyselectedagainstorneu- tral.Thepaucityofsynonymousvariantswithlow/moderateselec tioncoefficientsmayexplaintheneutral-
likeshapeofthefrequencyspectrum.Interestinglyenough,such dichoto-
mousf itnesseffectsofthesynonymousmutationswerede- scribedforDrosophilamelanogaster(Lawrieetal.2013).
Inlightoftheabovefindings,itisinterestingtonote,that inbothyeastspeciesstrongcodonbiasexists,wherethesequen cesofhighlyexpressedgenesareenrichedinthemajorcodons(K urland1991;Hiraokaetal.2009).Suchcodonpat-
ternisusuallyexplainedasaresultoftheselectionfortrans- lationefficiencyandaccuracy,whichshouldbemoresignificantf ormoreabundantmRNAsandproteins(Rocha2004;Plotkinand Kudla2011).Suchselectionshouldleadtoadjustmentsbetween abundanceoftRNAmoleculesandtheusageofappropriatecod ons(Tulleretal.2010).Toesti-
matethisadjustment,wecalculatedSpearman’srankcorre- lationscoefficientsrelatingthesetwoquantitiesforthecodingseq uencesofallanalyzedgenes—
rhotaistatistics(seeMaterialsandMethods).Asexpected,obtain edrhotaicoeffi-
cientsarepositivelycorrelatedwithgene’sexpression(S.cer- evisiae:r¼0.696,P<2.2-10-16,n¼2,432;S.pombe:
r¼0.653,P<2.2-10-16,n¼5,069;supplementaryfig.S1, Supplementary Materialonline).Later,wecheckediftheycorre latewiththedensityofthesynonymousvariants.Whilewefound negativecorrelationfortheS.cerevisiaedata(rho¼-
0.11,P¼6.30-10-
8,n¼2,432),whichturnedinsignificantaftercontrollingforthemR NAlevel(rho¼0.022,P¼0.27,n ¼2,432),norelationwasvisiblef orS.pombe(rho¼0.04,P¼0.06,n¼5,069).Thus,alsothisanaly sisfailstodetectclearsignalsofselectionrelatedtotRNAabunda ncewithinthesynonymoussites.Itshouldbenotedthat,muchstro ngercorrelationswerevisibleforthenonsynonymousSNPsforb othspecies(S.cerevisiae:rho¼-0.38,P<2.2-10-
16,n ¼2,432;S.pombe:rho¼-0.30,
P<2.2-10-16,n¼5,069).Partialcorrelationscontrollingfor geneexpressionwerealsohighlysignificantinthiscase(S.cerevi siae:rho¼-0.13,P¼4.38-10-11,n¼2,432;S.pombe:rho¼- 0.12,P ¼2.91-10-
18,n¼5,069).Thismayimplythatthenonsynonymouschangesh avegreaterimpactonourmeasureoftranslationoptimization.In deed,manyoftheco-
donpairsthatareseparatedbyonesynonymouschangearedec odedbythesametRNAspecies.Moreover,thecorrela-
tionsbetweengene’scodoncompositionandtRNAgenecopynu mbersmaybealsoaffectedbydifferencesinaminoacidusage.T akentogether,ouranalysesshowthatthecon-
tributionofthenegativeselectionactingdirectlyonsynony- moussitestotheinvestigatedexpression–divergence
Marek andTomala
GBE
2994 GenomeBiol.Evol.10(11):2986–
2996d o i : 1 0 . 1 0 9 3 / g b e / e v y 2 2 5 A d v a n c e AccesspublicationOctober13,2018 anticorrelationmaybeextremelysmallorevennonexistent.Ou
rresultsalsoimplythatinbothanalyzedpopulationssuchselect ionmaybeveryweak,withtheselectioncoefficientsaround1/N
eorless.Notably,evensuchaveryweakselectionmightbesuffic ienttogeneratethecodonusagebias(McVeanandCharleswor th1999;Charlesworth2009).
Weakcorrelationsbetweenpolymorphismdensityandex- pressionwerealsofoundforthe50-UTRand30-
UTRregions.Moreover,analysesperformedtodeciphertheim pactofthebackgroundanddirectselectionconfirmedtheattrib utionofthelatter.Therefore,itispossiblethatsomeadditionalse lec-
tionpressures,actingdirectlyontranscripts,contributetothede clineinpolymorphismobservedforthehighlyexpressedgenes .OnepossiblefactorcouldbethestabilityofmRNA(Parketal.20 13),whileanothertherateoftranslationinitia-
tionandelongation(Kudlaetal.2009;Shahetal.2013;Yangeta l.2014).Thus,inpopulationsofbothyeastspecies,theuntransla tedregionsofthemostabundanttranscriptswouldbeundermor eintensepurifyingselection.
OuranalysesofthegenomicdataofS.cerevisiaeandS.pom beshowthatthedifferencesinthepurifyingselectionstrengthu nderliethenegativecorrelationbetweengeneex-
pressionanditsevolutionrate.Notably,thesedifferencesarebi genoughtobeeasilynoticeableeveninpopulationsofasingles pecies.Inbothexaminedpopulations,theexpression- dependentnegativeselectionaffectsmainlyaminoacidalter- ingvariants.Althoughtomuchweakerextent,italsoactsdirectly onpolymorphismwithinuntranslatedregionsofhighlyexpress edgenes.Thisresultmayimplythattheevolu-
tionaryconstraintisrelatednotonlytobiophysical/functionalpro pertiesofproteinsbutalsotosomefeaturesoftranscriptsortran slationinitiation.Interestingly,wefoundthatback-
groundselectioncanalmostentirelyexplaintheobservedpat- ternofsynonymouspolymorphism.Thestrengthofthepurifying selectionactingdirectlyonsilentsitesseemstobejustontheedg eofwhatcanbedetectedinbothanalyzedpopulations.Finally,t herelationbetweenexpressionandmu-
tationratediffersbetweenS.pombeandS.cerevisiae,asstrong mutageniceffectoftranscriptionisvisibleonlyinthelatterspecie s.Nevertheless,transcriptionassociatedmutationbiasdoesn otsignificantlycontributetotheconsideredcorre-
lationinanyoftheanalyzedyeasts.
SupplementaryMateri al
Supplementary dataareavailableatGenomeBiologyandEvo lutiononline.
Acknowledgment s
TheauthorsthankRyszardKoronaforreadingandcomment- ingourmanuscript.ThisworkwassupportedbythePolishNatio nalScienceCentergrant2014/13/B/NZ8/04668toK.T.
(2014/13/B/NZ8/04668toK.T.).
Causesof the expression-polymorphismanticorrelationinyeastpopulations
GBE
D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov
LiteratureCited
AkashiH.2001.Geneexpressionandmolecularevolution.CurrOpinGenet Dev.11(6):660–666.
BehringerMG,HallDW.2015.Genome-
wideestimatesofmutationratesandspectruminSchizosaccharomyce spombeindicateCpGsitesarehighlymutagenicdespitetheabsenceofD NAmethylation.G3(Bethesda)6:149–160.
CharlesworthB.2009.Fundamentalconceptsingenetics:effectivepop- ulationsizeandpatternsofmolecularevolutionandvariation.NatRevGe net.10(3):195–205.
ChenX,ZhangJ.2013.Nogene-specificoptimizationofmutationratein Escherichiacoli.MolBiolEvol.30(7):1559–1562.
ChenX,ZhangJ.2014.Yeastmutationaccumulationexperimentsupportsel evatedmutationratesathighlytranscribedsites.ProcNatlAcadSciUSA .111(39):E4062.
Cs,ardiG,FranksA,ChoiDS,AiroldiEM,DrummondDA.2015.
AccountingforexperimentalnoiserevealsthatmRNAlevels,amplifiedb ypost-transcriptionalprocesses,largelydeterminesteady-statepro- teinlevelsinyeast.PLoSGenet.11(5):e1005206.
CurranJF,YarusM.1989.Ratesofaminoacyl-
tRNAselectionat29sensecodonsinvivo.JMolBiol.209(1):65–77.
DiedenhofenB,MuschJ.2015.cocor:acomprehensivesolutionforthestatist icalcomparisonofcorrelations.PLoSOne10(4):e0121945.
DrummondDA,BloomJD,AdamiC,WilkeCO,ArnoldFH.2005.Why highlyexpressedproteinsevolveslowly.ProcNatlAcadSciUSA.102(40 ):14338–14343.
DrummondDA,RavalA,WilkeCO.2006.Asingledeterminantdominatesth erateofyeastproteinevolution.MolBiolEvol.23(2):327–
337.DrummondDA,WilkeCO.2008.Mistranslation- inducedproteinmisfold-
ingasa dominantconstraintoncoding- sequenceevolution.Cell134(2):341–352.
FarlowA,etal.2015.Thespontaneousmutationrateinthefissionyeast Schizosaccharomycespombe.Genetics201(2):737–744.
FrigolaJ,etal.2017.Reducedmutationrateinexonsduetodifferentialmisma tchrepair.NatGenet.49(12):1684–1692.
HanawaltPC,SpivakG.2008.Transcription-
coupledDNArepair:twodecadesofprogressandsurprises.NatRevMol CellBiol.9(12):958–970.
HiraokaY,KawamataK,HaraguchiT,ChikashigeY.2009.Codonusagebiasisc orrelatedwithgeneexpressionlevelsinthefissionyeastSchizosaccharomy cespombe.GenesCellsDevotedMolCell14(4):499–509.
HudsonRR.2002.Generatingsamplesundera Wright–
Fisherneutralmodelofgeneticvariation.Bioinformatics18(2):337–338.
JeffaresDC,etal.2015.Thegenomicandphenotypicdiversityof Schizosaccharomycespombe.NatGenet.47(3):235–241.
KimN,Jinks-
RobertsonS.2012.Transcriptionasasourceofgenomeinstability.NatRevG enet.13(3):204–214.
KimS.2015.ppcor:anR packageforafastcalculationtosemi-partial correlationcoefficients.CommunStatApplMethods22(6):665–
674.KimuraM,OhtaT.1974.Onsomeprinciplesgoverningmolecularevolu- tion.ProcNatlAcadSciU S A.71(7):2848–2852.
KooninEV.2011.Aretherelawsofgenomeevolution?PLoSComputBiol.
7(8):e1002173.
KudlaG,MurrayAW,TollerveyD,PlotkinJB.2009.Coding-
sequencedeterminantsofgeneexpressioninEscherichiacoli.Science 324(5924):255–258.
KurlandCG.1991.Codonbiasandgeneexpression.FEBSLett.
285(2):165–169.
LawrieDS,MesserPW,HershbergR,PetrovDA.2013.Strongpurifyingselec tionatsynonymoussitesinD.melanogaster.PLoSGenet.9(5):e100352 7.
LimVI,CurranJF.2001.Analysisofcodon:anticodoninteractionswithintheri bosomeprovidesnewinsightsintocodonreadingandthege- neticcodestructure.RNA7(7):942–957.
MargueratS,etal.2012.Quantitativeanalysisoffissionyeasttranscrip- tomesandproteomesinproliferatingandquiescentcells.Cell151(3):671 –683.
MartincorenaI,SeshasayeeASN,LuscombeNM.2012.Evidenceofnon- randommutationratessuggestsanevolutionaryriskmanagementstrat egy.Nature485(7396):95–98.
McVeanGAT,CharlesworthB.1999.Apopulationgeneticmodelfortheevolu tionofsynonymouscodonusage:patternsandpredictions.GenetRes.7 4(2):145–158.
MiuraF,etal.2006.Alarge-scalefull-
lengthcDNAanalysistoexplorethebuddingyeasttranscriptome.ProcN atlAcadSciUS A.103(47):17846–17851.
NagalakshmiU,etal.2008.Thetranscriptionallandscapeoftheyeastgeno medefinedbyRNAsequencing.Science320(5881):1344–1349.
NeiM,GojoboriT.1986.Simplemethodsforestimatingthenumbersofsynon ymousandnonsynonymousnucleotidesubstitutions.MolBiolEvol.3:41 8–426.
P,alC,PappB,HurstLD.2001.Highlyexpressedgenesinyeastevolve slowly.Genetics158(2):927–931.
ParkC,ChenX,YangJ-
R,ZhangJ.2013.DifferentialrequirementsformRNAfoldingpartiallyex plainwhyhighlyexpressedproteinsevolveslowly.ProcNatlAcadSciUS A.110(8):E678–E686.
ParkC,QianW,ZhangJ.2012.Genomicevidenceforelevatedmutationrate sinhighlyexpressedgenes.EMBORep.13(12):1123–1129.
PlotkinJB,KudlaG.2011.Synonymousbutnotthesame:thecausesandcon sequencesofcodonbias.NatRevGenet.12(1):32–42.
RCoreTeam.2015.R:alanguageandenvironmentforstatisticalcom- puting.Vienna(Austria):RFoundationforStatisticalComputing.Availabl efrom:http://www.R-project.org/.
RochaEPC.2004.CodonusagebiasfromtRNA’spointofview:redun- dancy,specialization,andefficientdecodingfortranslationoptimiza- tion.GenomeRes.14(11):2279–2286.
RochaEPC,DanchinA.2004.Ananalysisofdeterminantsofaminoacidssub stitutionratesinbacterialproteins.MolBiolEvol.21(1):108–116.
ShahP,DingY,NiemczykM,KudlaG,PlotkinJB.2013.Rate- limitingstepsinyeastproteintranslation.Cell153(7):1589–1601.
SkellyDA,etal.2013.Integrativephenomicsrevealsinsightintothestruc- tureofphenotypicdiversityinbuddingyeast.GenomeRes.23(9):1496–
1504.
SkellyDA,RonaldJ,ConnellyCF,AkeyJM.2009.Populationgenomicsofintr onsplicingin38Saccharomycescerevisiaegenomesequences.Geno meBiolEvol.1:466–478.
TullerT,WaldmanYY,KupiecM,RuppinE.2010.Translationefficiencyisdet erminedbybothcodonbiasandfoldingenergy.ProcNatlAcadSciUSA.
107(8):3645–3650.
VavouriT,SempleJI,Garcia-
VerdugoR,LehnerB.2009.Intrinsicproteindisorderandinteractionpro miscuityarewidelyassociatedwithdosagesensitivity.Cell138(1):198–
208.
WallDP,etal.2005.Functionalgenomicanalysisoftheratesofproteinevo lution.ProcNatlAcadSciU SA.102(15):5483–5488.
WeinbergDE,etal.2016.Improvedribosome-footprintandmRNAmeas- urementsprovideinsightsintodynamicsandregulationofyeasttrans -lation.CellRep.14(7):1787–1799.
XuZ,etal.2009.Bidirectionalpromotersgeneratepervasivetranscriptioniny east.Nature457(7232):1033–1037.
YangJ-R,ChenX,ZhangJ.2014.Codon-by-codonmodulationoftrans- lationalspeedandaccuracyviamRNAfolding.PLoSBiol.12(7):e100191 0.
Marek andTomala
GBE
2996 GenomeBiol.Evol.10(11):2986–
2996d o i : 1 0 . 1 0 9 3 / g b e / e v y 2 2 5 A d v a n c e AccesspublicationOctober13,2018
D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov YangJ-R,LiaoB-Y,ZhuangS-
M,ZhangJ.2012.Proteinmisinteractionavoidancecauseshighlyexpre ssedproteinstoevolveslowly.ProcNatlAcadSciUSA.109(14):E831–
E840.
YangJ-R,ZhuangS-M,ZhangJ.2010.Impactoftranslationalerror- inducedanderror-
freemisfoldingontherateofproteinevolution.MolSystBiol.6:421.
YassourM,etal.2009.Abinitioconstructionofaeukaryotictranscrip- tomebymassivelyparallelmRNAsequencing.ProcNatlAcadSciUSA.1 06(9):3264–3269.
ZhangJ,YangJ-
R.2015.Determinantsoftherateofproteinsequenceevolution.NatRev Genet.16(7):409–420.
ZhangL,SchroederS,FongN,BentleyDL.2005.Alterednucleosomeoccup ancyandhistoneH3K4methylationinresponseto‘transcrip-
tionalstress’.EMBOJ.24(13):2379–2390.
ZhouT,GuW,WilkeCO.2010.Detectingpositiveandpurifyingselectionatsy nonymoussitesinyeastandworm.MolBiolEvol.27(8):1912–1922.
ZhuYO,SiegalML,HallDW,PetrovDA.2014.Preciseestimatesofmu- tationrateandspectruminyeast.ProcNatlAcadSciUSA.111(22):E231 0–E2318.
Associateeditor:LaurenceHurst