• Nie Znaleziono Wyników

The contribution of purifying selection, linkage and mutation bias to the negative correlation between gene expression and polymorphism density in yeast populations

N/A
N/A
Protected

Academic year: 2022

Share "The contribution of purifying selection, linkage and mutation bias to the negative correlation between gene expression and polymorphism density in yeast populations"

Copied!
17
0
0

Pełen tekst

(1)

D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov

GBE

TheContributionofPurifyingSelection,Linkage,andMutation BiastotheNegativeCorrelationbetweenGeneExpressionan dPolymorphismDensityinYeastPopulations

AgnieszkaMarekandKatarzynaTomala*

InstituteofEnvironmentalSciences,JagiellonianUniversity,Krakow,Poland

*Correspondingauthor:E-mail:katarzyna.tomala@uj.edu.pl.

Accepted:October10,2018

Abstract

Thenegativecorrelationbetweentherateofproteinevolutionandexpressionlevelofagenehasbeenrecognizedasauniversallawofthe evolutionarybiology(Koonin2011).Inourstudy,weapplyapopulation-

basedapproachtosystematicallyinvestigatetherelativeimportanceofunequalmutationrate,linkage,andselectionintheoriginoftheex pression-

polymorphismanticorrelation.WeanalyzedtheDNAsequenceofproteincodinggenesof24Saccharomycescerevisiaeand58Schizos accharomycespombestrains.Wefoundthathighlyexpressedgeneshadasubstantiallydecreasednumberofpolymorphicsiteswhenc omparedwithgenestranscribedlessextensively.Thisexpression-

dependentreductionwasespeciallystronginthenonsynonymoussites,althoughitwasalsopresentinthesynonymoussitesanduntransl atedregions,bothupanddownofagene.Mostimportantly,nosuchtrendwasfoundinintrons.Weusedtheseobservations,aswellasanal ysesofsitefrequencyspectraanddatafrommutationaccumu-

lationexperiments,toshowthatthepurifyingselectionactingonnonsynonymoussiteswasthemain,butnotexclusive,factorimpedingm olecularevolutionwithinthecodingsequencesofhighlyexpressedgenes.Linkagecouldnotfullyexplaintheobservedpatternofpolymor phismwithintheuntranslatedregionsandsynonymoussites,althoughthecontributionofselectionactingdirectlyonsynonymousvariant swasextremelysmall.Finally,wefoundthattheimpactofmutationalbiaswasrathernegligible.

Keywords:p roteinevolution,polymorphism,mutationbias,transcription,Saccharomycescerevisiae,Schizosaccharomycespombe.

Introduction

Thesequencesofdifferentproteinsfromthesamespeciesevolv eatdifferentrates.Studiescomparinghomologoussequencesb etweenspeciesdemonstratedthatthebestknownpredictorofth eproteindivergencerateisitsexpres-

sionlevel(P,aletal.2001;RochaandDanchin2004;Drummonda ndWilke2008).Thisresultcounterssomefor-

merexpectations,especiallythoseassumingthatthefunc- tionalimportanceofa proteinisa chiefdeterminantoftherateofe volution(KimuraandOhta1974).Severaldistinctexplanationsh avebeenproposedtoclarifywhyabundantproteinsundergoslow erevolution(foranextensivereviewseeZhangandYang2015).

Theseincludeselectionagainsterroneoustranslationandprotei ninstability(Drummondetal.2005;Yangetal.2010),selectionon proteinsynthesiseffi-

ciencyandspeed(Akashi2001;PlotkinandKudla2011),se- lectionontranscriptstability(Parketal.2013)andpreservationof properphysicalinteractionsofproteins

(Vavourietal.2009;Yangetal.2012).Thus,mostofthecurrentlyt oprankedhypothesesassumestrongerpurifyingselectionopera tingonfinalproteinproductsofthehighlyexpressedgenes.Thee xplanationslinkedtotranslationalro-

bustnessandmRNAfoldingenergyputadditionalconstraintsont ranscripts.

Thepossibleimpactofanunequalmutationrateonexpression -divergenceanticorrelationisusuallynotconsid-

ered.Thiscanbejustifiedbythefindingsofgenome-

wildstudiesofbacteria,yeast,andhuman(Parketal.2012;Chen andZhang2013,2014).Allofthesestudiesreportedelevatedmut ationratesforintensivelyexpressedgenes.Thisinturn,impliesth atmutationbiasaloneshouldgeneratepos-

itive,notnegative,correlationbetweengeneexpressionandevol utionrate.Itshouldbenotedthough,thattwooftheabovestudiesu seddataobtainedformutantswithdefectiveDNArepairsystem(

Parketal.2012;ChenandZhang2013).Relationsfoundforsuch mutantsmightnotaccuratelyreflect

©TheAuthor(s)2018.PublishedbyOxfordUniversityPressonbehalfoftheSocietyforMolecularBiologyandEvolution.

ThisisanOpenAccessarticledistributedunderthetermsoftheCreativeCommonsAttributionNon-CommercialLicense(http://creativecommons.org/licenses/by- nc/4.0/),whichpermitsnon-commercialre-use,distribution,andreproductioninanymedium,providedtheoriginalworkisproperlycited.Forcommercialre- use,pleasecontactjournals.permissions@oup.com

(2)

D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov Causesof the expression-polymorphismanticorrelationinyeastpopulations

GBE

2987

GenomeBiol.Evol.10(11):2986–

2996d o i : 1 0 . 1 0 9 3 / g b e / e v y 2 2 5 A d v a n c e AccesspublicationOctober13,2018 thoseexistinginthewild-typeorganisms.Moreover,theneg-

ativecorrelationbetweentranscriptionandmutationratewasfou ndinwildE.colipopulations(Martincorenaetal.2012).Allthingsc onsidered,althoughmutationbiasisunlikelytobethemainevoluti onaryforceleadingtoexpression-

divergenceanticorrelation,itsimpactmaydifferbetweenspecies ,andthusitscontributionisworthfurtherinvestigation.

Here,weanalyzedthedistributionandfrequenciesofsin- glenucleotidevariantsinpreviouslypublishedwholegenomese quencingdatasetsfor24Saccharomycescerevisiaeand58Schi zosaccharomycespombestrainsisolatedfromtheirworld- wide/extantpopulations.Wereasonedthatfocusingonthegenet icvariationwithinpopulationscouldbea novelandpromisingwa ytoestimatetherelativecontributionofpurify-

ingselection,linkage,andmutationalpressureincreatingtheobs ervedpatternofexpression-

divergenceanticorrelation.InthecaseofS.cerevisiae,weassign edatotalof57,000singlenucleotidevariantstothecodingsequen ces,introns,30-UTRand50-

UTRofrv2,400genes(forwhichtheboundaryofatleastoneUTR regionisknown).Analogously,weinvestigatedrv150,000SNPsi n5,060S.pombegenes.Weappliedtwotypesoftestsforallvaria nts.First,weanalyzedtherelationbetweenpolymorphismdensit yandgeneexpressionlevel.Similartests,regardingnucleotideo raminoacidsubstitutions,hadbeenpreviouslyusedincomparis onsbetweencloselyanddistantlyrelatedspecies(Walletal.200 5;Zhouetal.2010).Nevertheless,theirresultsaresensitivetoba ckgroundselec-

tionandtheirreliabilitycriticallydependsonequalmutationratesf orcomparedgenesandgenes’regions.Therefore,wedecidedto extendouranalysestoencompasstheavailabledataonthedistri butionofmutationsaccumulatingunderthelaboratorypropagati onscheme,inwhichtheoperationofthenaturalselectionisminim al.Thisallowedustoestimatethestrengthofthetranscriptiondriv enmutationalbiasesinbothspecies.Forthesecondtest,wecalc ulatedthefractionsofsingletonswithinvariantsassignedtodiffer entgenesandtheirregions,andcheckedwhetherthesefractions dependongeneexpressionlevelandtypeofthemutatedsite.Alth oughthistesthaslowerpowertodetectnegativeselec-

tion,asitlosesinformationaboutthemostsevereandthusnotpre sentinpopulationsvariants,itsresultsareunaffectedbybackgro undselectionandmutationbiases.Therefore,thistestseemstob eespeciallyrelevanttotheanalysisofpolymor-

phismwithinuntranslatedregionsandsynonymoussites.

Wefoundthatinthenaturalpopulationsthelevelofpoly- morphismwasnegativelycorrelatedwithgeneexpressiononlyi nthefunctionalregionsofproteincodinggenes.Itissignificantton otethatsuchpatternwasnotfoundforthelaboratorymutationacc umulationlines.Notably,fractionofsingletonswithinvariantsassi gnedtofunctionalregionsofgenesincreasedwithgeneexpressi on,althoughthistrendwasratherweak.Thus,weshowthatinpop ulationsofbothyeastspeciesthedeclineinthegeneticvariationw ithinextensivelyexpressedgeneswasmostlikelyproducedbyth e

purifyingselectionoperatingongeneproteinproductsandtransc ripts.

MaterialsandMethods

SaccharomycescerevisiaePolymorphism

Weanalyzedthesequencesof23Saccharomycescerevisiaestr ainsfromSkellyetal.(2013)(http://www.yeastrc.org/g2p- data/raw-data/genome-

sequences).Thesequencedstrainswerehaploidsderivedfromn aturaldiploidalisolates.Rawreadswerealignedtothereference sequence(S288C,ge-nomeassembly:R64-1-

1)usingbowtie2.BAMfileswerema-

nipulatedwithsamtoolsprograms(version0.1.19).Variantswer ecalledseparatelyforeachstrainwithsamtoolsmpileupandf ilte redwithgrepandcustompythonscript.First,weremovedallindel s.Forfurtheranalysis,weusedonlySNPswithminimalQUALval ueof30,withcoverage>5andwithatleast80%ofreadssupportin gthealterednucleotide.Themappingandfilteringscripts(mappi ng.sh;readVCF.py)canbefoundinthesupplementaryfileS1,Su pplementaryMaterialonline.

Thefilteredvariantswereassignedtothecodingsequen- ces,introns,50-UTRand30-UTRregionsof2,432proteincod- ingSaccharomycescerevisiaegenes.Inordertobeanalyzedfur ther,geneswererequiredtofulfilltwocriteria:

1. KnownlocationofatleastoneUTRregion(retrievedfromtheS GDYeastMinepagehttps://yeastmine.yeastgenome.org/y eastmine):TheUTRslocationswerecompiledfromformerst udies(Zhangetal.2005;Miuraetal.2006;Nagalakshmietal.2 008;Xuetal.2009;Yassouretal.2009).Incaseswherebound ariesoftheUTRregionsdif-

feredbetweenstudies,wedecidedtouseonesinvolvingthelo ngeststretchesofDNA.

2. Codingsequencenonoverlappingwithanyothergene First,weassignedvariantstothecodingsequencesandintron softheanalyzedgenes.TheremainingSNPswerethenassig nedtotheUTRregions.Theveryshort(<9bp)andnonpolymor phicUTRregionswereexcludedfromfurtheranalysis.Varian tsinthecodingsequenceweresplitintosyn-

onymousandnonsynonymousgroupsbasedontheresultsfro mtheVariantEffectPredictortool(http://www.ensembl.org/i ndex.html).Insum,weanalyzed56,939SNPs.Next,wecalcu latedthenumberofpolymorphicsitesforeachgeneandgener egion(dividedbythelengthofrespectiveregions).Inaddition, thefractionofsingletonswasdetermined.Theexpectednum berofpotentialsynonymousandnonsynony-

moussiteswascalculatedforthecodingsequencesoftherefe rencestrain.WeusedtheNei–

Gojoborimethodundertheassumptionofequalmutationprob abilitiesforalltypesofsubstitutions(NeiandGojobori1986).

Obtainedvalueswerethenusedasthelengthsofsynonymou sandnonsynonymousregions.

(3)

D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov Marek andTomala

GBE

DataonthemRNAlevel(mRNAmoleculesperhaploidcell)we readoptedfromCs,ardietal.(2015).Dataongeneindis-

pensabilitywasretrievedfromSGDYeastMinepagehttps://yea stmine.yeastgenome.org/yeastmine).Thecompileddataisava ilableinthesupplementaryfileS2,SupplementaryMaterialonlin e.

Thefractionofsingletonsexpectedinneutralityforthefoldedsi tefrequencyspectrumwasbasedononemillionsim-

ulatedsamplesofsize24.Forthecoalescentsimulations,weuse dprogramms(Hudson2002),withthecommand:ms241000000 -

s1.Then,wecountedcaseswherethederivedallelewaspresenti nonlyoneorin23copiespersample.

SchizosaccharomycespombePolymorphism

DataonS.pombevariantsweredownloadedfromthewebpage:

https://figshare.com/articles/SNP_calls_for_all_161_strains_

/3978303.ThesedatawereoriginallypublishedbyJeffaresetal.

(2015)andrepresentsinglenucleotidepolymor-

phismfoundin161naturalisolatesofSchizosaccharomycespo mbe.Thedownloadedvcffilewasalreadyannotated.Forthepur poseoffurtheranalyses,weextractedfromtheorig-

inalfileinformationonvariantspresentin57strains,de- scribedbyauthorsoftheoriginalresearchpaperas“nonredunda nt.”WefilteredthesevariantsusingthesamecriteriaasforS.cere visiae.SNPsthatpassedthefilterswerethenassignedtothediffe rentgeneregions(synonymouscod-

ing,nonsynonymouscoding,intron,30-UTR,50-

UTR)onthebasisofannotationsprovidedintheoriginalvcffile.In sum,weassigned153,407SNPsto5,069proteincodinggenes;

outofthose2,496containedintrons.Dataongeneexpres- sionwastakenfromMargueratetal.(2012)

(numberofmRNAmoleculesperhaploidcell).Alistofessentialg eneswasretrievedfromPomBase(https://www.pombase.org/

).Numberofpotentialsynonymousandnonsynonymoussitesw asobtainedasdescribedforS.cerevisiae.Thecompileddataseti savailableinthesupplementaryfileS2,SupplementaryMaterial online.

Theexpectedfractionofsingletonswasdeterminedanal- ogouslyasdescribedforS.cerevisiae,butthesizeofthesim- ulatedsampleswassetto58.

AnalysisoftheS.cerevisiaeMutationAccumulationEx periment

WeusedresultsofthestudybyZhuetal.

(2014).Thestudydescribes873singlenucleotidemutationsthat occurredin145yeastlinesduringthemutationaccumulationexp eriment.Outofthese,636wereassignedto577Saccharomyces cer-evisiaegeneswithmRNAlevelgiveninCs,ardietal.

(2015).Inthisprocedure,weusedtheVariantEffectPredictortoo l.Next,wecalculatedEMAstatistic:EMA¼Rni-

ln(mRNAi)/Rni,wherenistandsforthenumberofmutationsthatw ereassignedtotheigeneandmRNAiistheigenemRNAlevel.Thu s,theEMAstatisticistheweightedaverageofthenatural

logarithmsofthemRNAlevel,calculatedforthegenesthatweref oundmutatedinthemutationaccumulationexperi-

ment.CompileddatacanbefoundinthesupplementaryfileS2,S upplementaryMaterialonline.

SimulationofMutagenesisinS.cerevisiae

WeusedthegenomicdataonthemRNAexpression(Cs,ardietal .2015)andgeneGCcontent(Ensembl)tocreatetwolists.TheLAT listwascomposedoftheln(mRNAi)valuesfor5,850Saccharomy cescerevisiaegenes.Individualln(mRNAi)valueswerereprese ntedinnumbersequaltothenumbersofATpairsinrespectivege nes(insum5,349,347entries).TheLGClistwascomposedaccordi ngtothesamepatternforGCpairs(3,484,611entriesintotal).Th en,werandomlychose636valuesfromtheabovelists.Totakeac countforthehighermutationprobabilityoftheGCnucleotides,68

%theln(mRNAi)valuesweretakenfromtheLGClistand32%fromt heLATlist(assuggestedbyZhuetal.2014).Thearith-

meticmeanofeachrandomsample,ESwasthencalculated.The procedurewasrepeated10,000times.Weconsideredtheabove proceduretosimulatethemutationalprocessinwhichtheprobab ilityofmutationwithinanygenesequenceisproportionaltogenel engthanddependsongeneGCcon-

tent(simulated_data1).Inanothersimulation,weproceededint hesameway,withtheexceptionthattheln(mRNAi)valueswered rawnfromthemergedLATandLGClists.Thus,thesecondsimulation representedthesituationwheretheprob-

abilityofanygenemutatingdependsonlyonitslengthandnotonit sindividualGCcontent(simulated_data2).Thescriptusedtogen eratedataisinthesupplementaryfileS1,SupplementaryMateri alonline.

AnalysisoftheS.pombeMutationAccumulationExp eriments

Wemergedresultsfromtwomutationaccumulationexperi- ments:BehringerandHall(2015)

(422singlenucleotidemutations)andFarlow et al. (2015) (326singlenucleotidemutations).Weannotated615ofthesemut ationsto533differentS.pombegeneswithmRNAlevelgiveninM argueratetal.(2012)

(http://fungi.ensembl.org/Schizosaccharomyces_pombe/Too ls/VEP).TheEMAstatisticwascalculatedasdescribedforS.cerev isiae.Thecorrespond-

ingdatasetisinthesupplementaryfileS2,SupplementaryMateri alonline.

SimulationofMutagenesisinSchizosaccharomycespombe Simulateddatasetswerecreatedanalogouslyasdescribedfor S.cerevisiae.WeusedtheexpressiondatafromMargueratetal.

(2012)

(5,059proteincodinggenes)anddataongeneGCcontentfromt heEnsemblfungiwebpage.TheS.pombeLATandLGClistscontain ed6,907,927and4,071,402entriesaccordingly.Toobtainthesi mulateddatadependentonboth

(4)

Causesof the expression-polymorphismanticorrelationinyeastpopulations

GBE

2989

D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov

GenomeBiol.Evol.10(11):2986–

2996d o i : 1 0 . 1 0 9 3 / g b e / e v y 2 2 5 A d v a n c e AccesspublicationOctober13,2018 genelengthanditsGCcontent,73.6%ofvalueswastakenfromth

eLGClistbecause457ofthe748mutationsfoundinbothexperime ntsrelatedtotheGCnucleotides.ThewholeS.pombegenomeco ntains4,538,978GCand8,052,273ATbasepairs.Thus,theprob abilityofbeingmutatedis2.786timeshigherfortheGCnucleotide sthanfortheATnucleo-

tides.Thisgivestherelativemutationfrequenciesof0.736and0.

264fortheGCandATnucleotidesaccordingly.Ateveryof10,000 iterations,anaverageof615randomlyse-

lectedln(mRNAi)valueswascalculated.Thescriptusedtogener atedataisinthesupplementaryfileS1,SupplementaryMaterial online.

AnalysisofCodonUsage

Wecountedcodonspresentwithinthecodingsequencesofallan alyzedgenes(forthereferencestrainsofS.cerevisiaeandS.po mbe).Thesecountswerethencorrelated(Spearman’srankcorr elation)withthenumbersofthematch-

ingtRNAgenespresentinthegenomesofbothspecies(mul- tipliedbytheirwobbleparameters).Correlationcoefficientsobtai nedforeachgeneconstitutedtherhotaistatistics.IncaseofS.cere visiae,tRNAgenecopynumbersandwobbleparam-

eterswereadoptedfromWeinbergetal.

(2016).ThegenecopynumbersoftRNAgenespresentinS.pom begenomeweretakenfromGtRNAdb(http://gtrnadb.ucsc.edu/

).ThewobbleparametersweresetasdescribedinCurranandYar us(1989)andLimandCurran(2001).Correlationcoeffi- cientsobtainedforeachgenealongwithtRNAcopynumbersand wobbleparametersusedaregiveninthesupplementaryfileS2,S upplementaryMaterialonline.

Dataanalysis

DatawasanalyzedinR(RCoreTeam2015).Weusedthefollowi ngfunctions:cor.test,glm,pcor.testfromtheppcorpackage(Kim 2015).Correlationcoefficientswerecomparedwithconcor(http:

//comparingcorrelations.org/)(DiedenhofenandMusch2015).

Result s

Westudiedsinglenucleotidepolymorphismwithindozensofyea ststrainsisolatedfromextantpopulationsofSaccharomycescer evisiaeandSchizosaccharomycespombe.Wefoundthatthede nsityofpolymorphicsiteswithingenescorrelatesnegativelywitht heexpressionlevelmeasuredasthenumberofmRNAmolecule sperhaploidcell.Inbothyeastspecies,therelationshipisespecia llyclearforthenonsynon-

ymousvariants(fig.1).A possibleinterpretationisthatthepurifyin gselectionagainstchangesinproteinsequenceoper-

atesandismoreintenseinthehighlyexpressedgenes.Indeed,th eratioofnonsynonymoustosynonymouspolymor-

phicsitesdensities(pN/pS)alsocorrelateswiththegeneex- pression(S.cerevisiae:Spearman’srankcorrelation

rho¼-0.35,P<2.2x10-

16,n¼2,373;S.pombe:Spearman’srankcorrelationrho¼- 0.28,P <2.2x10-16,n¼4,882).

Inthenextstep,weanalyzedminorsitefrequencyspectraforb othpopulationstotestwhetherhighlyexpressedgeneshaveele vatedfractionofsingletons.Wereasonedthatdele-

teriousmutationscouldbestillpresentinpopulationsiftheywere recent.Suchmutationswouldbechieflyfoundamongsingleton s,astheyareremovedfromthepopulationbeforereachinghigh erfrequencies.Thisshouldleadtotheelevatedfractiono f s ingle tonsw ithina llv ariants.I ndeed,w ef oundhigherthanexpected (forneutralpolymorphism)fractionsofnonsynonymouss inglet onsi nt hea nalyzedd atas et:0 .40versusexpected0.29forS.ce revisiaeand0.40versus0.23forS . p ombe.M oreover,t hesef r actionsw erea lsoh igherthanfractionsfoundforintronicvarian ts(0.29and0.26for

S.cerevisiaeandS.pombeaccordingly),whatindicatesthatthe highpercentageofnonsynonymoussingletonscannotbeentire lyexplainedneitherbythedemographicprocess(e.g.,populatio ngrowth)norbyinsufficientvariantfilteringproce-

dure(fig.2).M orei mportantly,t hef ractiono fn onsynony- moussingletonscorrelatespositivelywithgeneexpressionlev elforbothspecies(S.cerevisiae:rho¼0.06,P¼0.004,n¼2,13 3;S.pombe:rho¼0.11,P¼1.75x10-

13,n¼4,593).Thus,thenonsynonymoussitefrequencyspectra showclearsignsofthenegativeselectioninS.cerevisiaeand S.p ombep opulations.A bovea ll,h igherf ractiono fn ewlyarriv ingaminoacidalteringmutationsispresentinsequencescoding forabundantproteinsindicatingstrongerselection.

Wealsoaskedwhethertheobservedlowpolymorphisminhig hlyexpressedgenescouldbeascribedtosomespecificforces,s uchasselectionactingontranscriptsorcodoncom-

position.Inbothyeastspecies,thenegativecorrelationbe- tweengeneexpressionandvariantdensityisvisiblenotonlyforn onsynonymousbutalsoforsynonymoussitesandtheUTRregio ns.Inthetwolatterregions,however,thecorrela-

tionismuchweaker(fig.1).Suchpatternmayindicatethatabund anttranscriptsarealsomoreconstrained.Analterna-

tive,butnotmutuallyexclusiveexplanation,isthattheregionsofhi ghlyexpressedgenesaredepletedofpolymorphismbe- causetheyarelinkedtomoreconstrainednonsynonymoussites.

Weaddressedthisissueusingthreedistinctapproachesdescrib edbelow.

First,weperformedpartialcorrelations(mRNAvsvariantden sity)whilecontrollingforthenonsynonymousvariantden- sities.Thepurifyingselectionactingonnonsynonymousvar- iantcausesareductionintheeffectivepopulationsizesoftheadja centregions.Therefore,ifthedropinvariantdensitiesinUTRregi onsandsynonymoussitesresultsexclusivelyfromlinkage,thent hepartialcorrelationsshouldturninsignificant.Thisisnotthecase ,asmostofthecorrelationsremainvalid(table1).Interestingly,th epartialcorrelationbetweensynon-

ymouspolymorphismdensityandgeneexpression(control- lingfornonsynonymouspolymorphism)remainssignificant

(5)

= =

Marek andTomala

GBE

D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov S.

c er e vi

S.

p o m

S.

c er e vi si

S.

p o m fractionofsingletons polymorphicsitesdensity

3UTR 5UTR introns nonsynonymous_sites synonymous_sites

rho=−0.117 p=8.29e−09 n=2425

rho=−0.061 p=0.0027n

=2398

rho=0.19 p=0.015 n=162

rho=−0.442 p<2.2e−16 n=2432

rho=−0.194 p<2.2e−16 n=2432 0.10

0.05

0.00 0.20

0.15

rho=−0.175 p<2.2e−16 n4 7 1 8

rho=−0.179 p<2.2e−16 n=4664

rho=−0.033 p=0.105n=

2496

rho=−0.338 p<2.2e−16 n=5069

rho=−0.054 p=1.26e−04 n5 0 6 9 0.10

0.05

0.00

−4−20 2 4 6−4−20 2 4 6−4−20 2 4 6−4−20 2 4 6−4−20 2 4 6

ln(mRNA)

Fig.1.—Relationshipbetweengeneexpression(mRNAmoleculespercell)andpolymorphism(numberofvariantsperregionlength).

1.00 0.75

p=5.7e−0

8 p=0.1 0

2 na

p=2.e − 1 2

p=0.4

6 Table1

ResultsoftheSpearman’sRankPartialCorrelationsbetweenmRNALevelan dNumberofPolymorphicSites(perregionlength),Controllingfor

0.50 4270 3305 799 8939 20695 NonsynonymousSNPsDensity

0.25

2608 1665 334 5976 8223

0.00

Region Rho Pvalue n

Saccharomycescerevisiae

1.00 p=4.7e−09 p=1.7e−06 na p<2.2e−16 p=0.98 Synonymoussites -0.0795 8.76e-05 2,432

0.75 50-UTR

30-UTR

-0.0299 -0.0634

0.14 0.0018

2,398 2,425

0.50 23550 17478 6394 20845 38283 Schizosaccharomycespombe

0.25 Synonymoussites 0.0085 0.55 5,069

0.00 9780 7083 2263 14197 13534 50-UTR -0.1275 2.39e-18 4,664

3UTR 5UTR introns nsyn syn 30-UTR -0.1251 6.26e-18 4,718

Fig.2.—

Fractionofsingletonswithinallanalyzedvariants(minorallelefrequencyspec tra).Partsofthebarscorrespondingtosingletonsarefilledwiththeopaqueco lors.Numbersinsidebarsindicatenumberofsiteswithineachcategory(singl etons/morefrequentSNPs).Greenlineshowsfractionofsingletonsfoundini ntrons—

usedasanapproximationoftheneutralvalue(S.cerevisiae:0.29;S.pombe:0 .26).YellowlineindicatesfractionofsingletonsexpectedforWright–

Fisherpopulation(S.cerevisiae:0.29;S.pombe:0.23).Significanceofv2t est scomparingfrac-

tionofsingletonswithinSNPsinagivenregionwithfractionrecordedforintron saregivenabovethebars.

forS.cerevisiaebutdisappearsforS.pombe.Thismaysuggestt hattheselectiononcodonusageispresentonlyintheformerspec ies.However,eveninS.cerevisiaetheobservedcorrela- tionisrestrictedtoasmallsubsetofgenesexpressedataveryhigh level.Thepartialcorrelationisnolongersignificant,whenremovi ng118geneswiththehighestexpression(i.e.,

>40mRNAmoleculespercell)(Spearman’srankcorrelationrho¼- 0.035,P ¼0.09,n ¼2,314).Itshouldbealsonoted

(6)

Marek andTomala

GBE

2990 GenomeBiol.Evol.10(11):2986–

2996d o i : 1 0 . 1 0 9 3 / g b e / e v y 2 2 5 A d v a n c e AccesspublicationOctober13,2018 here,thatpartialcorrelationsperformedonnoisydatamayprod

ucespuriouslysignificantresults(Drummondetal.2006).

Oursecondtestusestheobservationthatbackgroundse- lectionshouldremovepolymorphismnotonlyfromfunctionalre gionsofgenesbutalsofromintrons.Therefore,ifcorrela- tionsbetweenSNPsdensityandmRNAlevel,visibleforUTRsa ndsynonymoussites,wereonlybyproductsofselectionact- ingonnonsynonymousvariants,thensimilarlystrongcorre- lationsshouldbeseeninintrons,whentakingintoaccountsmall ersamplesize(i.e.,smallernumberofgeneswithintrons).Tothi send,werandomlydrewsampleof162genesforS.cerevisiaea nd2,496genesforS.pombefromtheorig-

inallyanalyzeddatasets.Thesenumbersequalnumbersofge neswithintronsusedinouranalyses.Next,weperformedSpea rman’srankcorrelationsbetweenmRNAlevelandvari- antdensitiesfortherestricteddata.Thisprocedurewas

(7)

Causesof the expression-polymorphismanticorrelationinyeastpopulations

GBE

p=0.43

3915 355

6291 587

p=0.0304

2963 342

4421 549

p=9.11e− 04

8697 242

14453 462

p=2.43e− 03

19537 1165

27220 1705

D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov

S.

c er ev isi a

S.

p o m

rho(SNPdensityvsmRNA)fractionofsingletons

repeated500times.Thedistributionsofobtainedcorrelationcoef ficientsforallfunctionalregionsareshowninfigure3Aandinsuppl ementaryfileS2,SupplementaryMaterialonline.Notably,these distributionsareshiftedtowardnegativevaluesand,withtheexc eptionofsynonymousvariantsinS.pombe,their95confidencein tervalsliebelowthevalueobtainedforintrons(supplementaryfile S2,Supplementary Materialon-

line).Thus,resultsofthesetestsseemtoindicatethatpurify- ingselectionactsdirectlyonvariantsthatdonotchangeaminoaci dsequenceandthatsuchselectionismoreimpor-

tantforhighlyexpressedgenes.Itisimportanttonote,how- ever,thatthistestmaynotbestringentenoughincaseofthesynon ymouspolymorphism,asonanaverage,theselectednonsynony mousvariantslieincloserproximitytothesynon-

ymoussitesthantointrons.

Forthethirdanalysis,wedividedallconsideredgenesinto twogroupsaccordingtotheirexpression.The“high”groupcontai nedr v 10%ofgeneswiththehighestmRNAlevel(S.cerevisiae:

244geneshaving>22mRNAmoleculesper

A

0.25

0.00

−0.25

−0.50

−0.75

B1 . 2 5 1.00 0.75 0.50 0.25 0.00 1.25

S.cerevisiae S.pombe

3UTR5 U T R n s y n s y n 3UTR5 U T R n s y n s y n

3UTR 5UTR nsyn syn

cell;S.pombe:515geneswith> 12mRNAmoleculespercell).Th eremaininggenesformedthe“ lower”groups.Weusedv2teststo comparethefrequencyofsingletonswithinvariantsfoundinthes egroupsofgenes.Theresultsofthesecomparisonsaresummari zedinfigure3B.Clearly,SNPslocal-

izedwithinsequencesofgenesfromthe“ high”groupare

1.00 0.75 0.50 0.25 0.00

p=7.88e−04

222521 2 9 8

314001 9 3 0

p=8.51e−06

164651 0 6 8

229551 6 0 6

p=1.91e−05

198301 0 1 5

331871 8 5 5

p=0.292

350823 2 0 1

474444 3 7 3

morelikelytobesingletons.Thistrendholdsforallfunctionalregio nsofgenes,however,thedifferencesarenotalwayssignificanta ndareextremelysmallincaseofthesynonymoussites.Moreover ,thefrequenciesofsingletonscalculatedforthesynonymousvari antspresentinallanalyzedgenesdonotdifferfromthefrequenci esobtainedforintrons(fig.2).Tosummarize,theresultsofthethre edifferenttestspointtothehypothesisthatbothuntranslatedregi onsofgenesareunderweakpurifyingandexpression-

dependentselection,distinctfromtheselectiononproteinproduc ts.Onthecontrary,wefoundlittle(forS.cerevisiae)orno(incaseo fS.pombe)evidenceforthepurifyingselectionactingdirectlyont hesyn-

onymoussites.Evenifsuchselectiondoescontributetothedropo fpolymorphisminhighlyexpressedgenes,itseffectisratherwea kandrestrictedtoveryabundanttranscripts.

Finally,weaskedwhetherthenegativecorrelationbe- tweenexpressionlevelandpolymorphismdensitycanbe,atleast partially,causedbyunequalmutationrates.Westartedwithintro ns,aswereasonedthatvariantsinintronsshouldnotbeinfluence dbythepurifyingselection.Wefoundthatintronsofhighlyexpress edgenesinSaccharomycescerevisiaemayhaveanelevatednu mberofvariantsperonenucleotide(rho¼0.19,P ¼0.015,n ¼16 2)

(fig.1).However,onlya smallportionofgenesinthisspecieshasi ntrons(162inourdataset).Moreover,thesegenesaregenerally highlyexpressedandmaynotappropriatelyrepresentallgenesp re-

sentinthisspecies(Skellyetal.2009).Ananalogousanalysisinv olvingalmost2,500S.pombegenesprovidedcontradic- toryresults.Inthisspecies,theexpressionlevelofthegenedid

lowerhigh lowerhigh lowerhigh lowerhigh expression( m R N A )

Fig.3.—(A)DistributionoftheSpearman’srankcorrelationcoeffi- cients(mRNAlevelvsvariantdensity)obtainedinthesubsamplinganalyses(

mean6 95ci).Greenlineshowsvalueofthecorrelationcoefficientobtainedf orallintrons;

(B)Fractionsofsingletonswithinvariantspresentinhighlyandlessextensivel yexpressedgenes(seeResultsforthemoredetaileddescriptionoftheusede xpressioncategories).Singletons—opa-quecolors,morefrequentvariants

transientcolors.Numbersofvariantswithineachgrouparegiveninsidebars.

Greenlineshowsfractionofsingletonsfoundinintrons,yellowlineindicatesfr actionofsingletonsexpectedforidealWright–

Fisherpopulation.Significanceofv2te stscom-

paringfractionsofsingletonswithinvariantsfoundingenesbelongingtothet woexpressioncategoriesareshownabovethebars.

notinfluencethenumberofvariantspresentintheintronicseq uences(rho¼-0.033,P¼0.097,n¼2,496)(fig.1).

Astheaboveresultsmightappearinsufficientlyconclusive,w eturnedtostudiesonmutationaccumulationexperiments.Mutat ionsfoundinthesekindofstudiesarelikelytorepresentthemutati onalspectrumlargelyunaffectedbyselection.Weusedsimulati onstotestavailableempiricaldataforthepres-

enceofmutationalbias.Simulationsrepresentedthesitua- tionsinwhichthechanceofoccurringofnewmutationwithinanyg enedependedonboththegenelengthanditsGCcontent(fig.4,u pperpanel:simulated_data1)oronthegenelengthonly(fig.4,lo werpanel:simulated_data2).Then,wecomparedthesimulated distributionofpossibleresultswiththeEMAstatisticcalculatedforth

(8)

Causesof the expression-polymorphismanticorrelationinyeastpopulations

GBE

2991

GenomeBiol.Evol.10(11):2986–

2996d o i : 1 0 . 1 0 9 3 / g b e / e v y 2 2 5 A d v a n c e AccesspublicationOctober13,2018 eactualdata(inbrief:EMArepresentstheaverageexpressionof

genesthatwere

(9)

Marek andTomala

GBE

D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov si

m ul at ed_d at

si m ul ated _d at

count

750 500 250 0

750 500 250 0

S.cerevisiae S.pombe

0.7 0.8 0.9 1.0 1.1 1.2 0.7 0.8 0.9 1.0 1.1 1.2 ln(mRNA)

Fig.4.—

Comparisonbetweentheexpressionofgenesmutatedinmutationaccumulationexperiments(EMAstatistics,redarrows)andvaluesobtainedforsimulateddata;s imulated_data1:probabilityofmutationwithina geneproportionaltoitslengthandhigherforG andC nucleotides,simulated_-

data2:probabilityofmutationproportionaltogenelengthandequalforallnucleotides(SeeMaterialsandMethodsfordetails).

foundmutatedinthemutationaccumulationexperiment;seeMa terialsandMethodsforfulldescription).Wefoundthatin

S.cerevisiaemutationsarenotlessbutmorelikelytooccurin highlytranscribedgenes(EMA¼1.1916,P¼0.0006;fig.4).Inthiss pecies,expressionislikelytobemutagenicitselfastheobserved effectdidnotdependonthegene’sGCcontent(fig.4).However,i nS.pombe,theactualmutationspectrumappearedshiftedtowar dgenesoflowexpression(EMA¼0.801,P ¼0.036),butnotwhen neglectingtheGCcontentofthegene(P¼0.094,fig.4).Thus,tran scriptiondoesnotinfluencethemutationrateinthisyeastspecies ,althoughthehighlyexpressedgenesmayhaveslightlylowermut ationratesduetotheirnucleotidecomposition.Astheresultsofth emutationaccumulation,dataanalysesdifferbe-

tweenthetwoconsideredyeastspecies,weconfirmedourconcl usionsbybuildingseveralPoissonregressionmodels(supplem entaryfileS2,Supplementary Materialonline).Takentogether, boththeanalysisofpolymorphisminintronsofextantpopulations andthetestofdistributionofmutationsinaccumulationlinesprovi deresultsconsistentwitheachother.Theyindicatethatmutation alprocessitselfisunlikelytoproduce(S.pombe)orevenactsagai nst(S.cerevisiae)theobserveddeclineinthepolymorphismleve lassociatedwithgeneexpression.

Discussio n

Studiesinvolvingcomparisonsbetweencloseanddistanttaxas howedthattherateofmolecularevolutionofa proteindependso ntheintensityofitsexpressionwiththeabundant

proteinsbeingmoreconserved.Weaskedwhetherthepre- dictednegativerelationshipbetweenthelevelofgeneexpres- sionanditsgeneticpolymorphismcanbeobservedalsoincurrent populationsofasinglespecies.WechoseS.cerevisiaeandS.po mbeduetothefactthatpopulationsofthesetwoyeastspeciesarel argeenough,Nerv107(Skellyetal.2009;Farlowetal.2015),toen ableaccumulationofa sufficientnumberofmutationssothatthei rdistributionwithina ge-

nomecouldbeusedtodetecttheworkofanevenrelativelyweaks election.Indeed,weobservedlowervariantdensitiesinregionsc odingforhighlyexpressedgenes.Relevantcorrela-

tioncoefficientsturnedouttoroughlymatchthosereportedforthe betweenspeciescomparisons(ZhangandYang2015).

Webelievethattheobservedpatternofmutationaccumu- lationcouldnotresultfromthedifferentrateofmutationatthecom paredsites.Firstly,wefocusedonisolatesfromextantpopulation sandfoundnocorrelation(S.pombe),orpossiblyapositivecorrel ation(S.cerevisiae),betweentheexpressionlevelandvariantde nsitywithinintrons.Thisresultsuggeststhatthefrequentlytransc ribedgeneshavea mutationratethatisequalorevenhigherthant hosetranscribedrarely.Wethenanalyzedthedistributionofnew mutationsaccumulatedinexperimentalpopulationsandfoundfu rthersupportforourclaim.Theprobabilityofbeingmutatedwashi gherforgeneswithhighmRNAlevelinS.cerevisiae.InS.pombe, wedetectedaweaktrendintheoppositedirectionwhichprob- ablyresultedfromabiasintheGCcontent.Weareawarethatour analysesofmutationaccumulationexperimentsareindirectinth esensethatweusedsimulateddatatoformthenulldistribution.Int hissimulateddataset,ageneunderwent

(10)

Causesof the expression-polymorphismanticorrelationinyeastpopulations

GBE

GenomeBiol.Evol.10(11):2986–

2996d o i : 1 0 . 1 0 9 3 / g b e / e v y 2 2 5 A d v a n c e AccesspublicationOctober13,2018 2993

D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov amutationattherateproportionaltoitslengthandGCcon-

tent,neglectinga possibleroleofgenelocation,chromatinstatus ,orotherfactors.Whatisimportant,weadditionallyconfirmedthe seconclusionsusinggenerallinearmodelling.Onecouldalsoarg uethatalthoughintronsaregenerallynotexposedtoselection,th eymay,nevertheless,mutateataratedifferentthantheydoinexo nsduetothefactthatdifferentrepairsystemsarethoughttooperat ealongthecodingandnoncodingsequences(Frigolaetal.2017).

Regardlessoftheabove,wearereassuredbythefactthatouranal ysesinvolv-

ingtwocompletelydifferentapproachesgaveconsistentresults.

Anoverallpattern,a roughlyequalorevenhigherpolymorphism withinintronsofstronglyexpressedgenesappearsrobustandth ereforethereductionofpolymorphismknownforhighlyexpresse dexonsisunlikelytoresultfromthedifferencesinmutationrate.

Somepreviousresultsappeartoaccordwithourfindingthat polymorphismisnotlowerbuthigherinintronsofhighlyexpres sedgenesinSaccharomycescerevisiae.Onestudyusedmutati on-

accumulationlineswithinaDNArepairdefective(ung1D)stra inandfoundanelevatedmutationrateininten-

sivelytranscribedgenes(Parketal.2012).Datafromthemu- tationaccumulationexperimentscarriedoutwithawild-type S.cerevisiaestrainandreanalyzedhere(Zhuetal.2014)hadbee nalreadyexaminedforthedependencebetweenexpres- sionandmutationprobability(ChenandZhang2014;Zhuetal.20 14).Whileauthorsoftheoriginalstudyfoundnosuchrelation,con clusionssimilartoourswerereachedbyChenandZhang(2014).

Thus,ouranalysesprovidedadditionalev-

idencethatintensivetranscriptionisassociatedwithelevatedmu tationinS.cerevisiae.Whatisimportant,weapplieddif-

ferentanalyticalmethodstothoseusedintheabovestud- ies.A n a n a logousa n a lysisb a sedo n b o t h m u t ationacc umulationd a t a s e t s p u blishedf orS . p o mbe,f a i ledt o dete ctanyrelationbetweenmutationrateandexpressionlevel.T h e s a m e c onclusionsw e rer e a chedi n o n e o f t h e original m utationa c c u m ulations t udiesc a r r i edo u t f o r thiss p eci es(BehringerandH a l l 2 0 15).T h us,w h i lei ti stemptingtoco ncludethatinS.cerevisiaethepurifyingse-

lectioninexonsisstrongenoughtorevertthemutationalbia s,inS.pombet ran scriptiond oesno tleadt o elevatedmutat ionp r obability.T r a nscriptiona f f ectsm u t ationr a t e s thro ught h e p r ocesseso ft r a n scriptiona s sociatedm u t a- genesisa n d t r anscription-

coupledD NAr e p air(HanawaltandSpivak2008;KimandJinks -Robertson2012).Itispos-

siblethatthefinaloutcomeoftheseantagonisticprocessesisn o t t hes a m e f o r a l l s p ecies.H o w ever,m o r e i n t ensiveworko n t h e e xperimentale s t imationo f m u t ationr a t e i n differe ntgenomeregionsisneededbeforeitsresultscanbeused inquantitativetestsofevolutionaryhypotheses.

Apartfromthecleardifferenceinthedistributionofpoly- morphismbetweenintronsandexons,thereareother,moredirec t,argumentsthatselectionoperatesmostefficientlyinhighlyexpr essedcodingregions.First,thenegativecorrelation

betweengenemRNAlevelandthedensityofpolymorphisminex onswasstrongerforthenonsynonymouspolymorphism.Itisrem arkablethatthisresultwasstatisticallysignificantgiventhatthede nsityofvariantsrecordedfornonsynony-

moussitesinbothspecieswasgenerallylow.Second,thefraction ofrarenonsynonymousvariantstendedtoincreasewithgeneex pression.Whichiswhy,newmutationsaremostlikelytobedeleter iousiftheychangethefinalproteinprod-

uctsofanintensivelytranscribedgene.Itisalsoworthnotingthatin bothspeciestheexpression–

divergencecorrelationsforessentialandnonessentialgenesha vesimilarstrength(S.cerevisiae:Fisher’sz¼-

1.42,P ¼0.16;S.pombe:Fisher’sz¼-0.71,P ¼0.48)

(supplementaryfileS2,SupplementaryMaterialonline).Thus,t hefunctionalimportanceofproteinsisunlikelytobeanimportantf actorunderlyingtheconsideredrelation.

Thenegativecorrelationbetweengeneexpressionleveland polymorphismdensityappearstobeconsiderablystron- gerforthesurveyedpopulationsofS.cerevisiae.Thisfindingmig htsuggestthatthepurifyingselectionismoreeffectiveinthisspeci es.However,geneexpression,oneoftheanalyzedfactors,may havebeenestimatedwithdifferentaccuracyinthetwospecies.T hedatasetforS.cerevisiaeispossiblymorereliableasbasedona vastnumberofindependentRNAex-

pressionstudies(Cs,ardietal.2015).Moreover,itisimagin- ablethatinthisspeciestheexpressiondatagatheredunderlabor atoryconditionsaremoreindicativeof“ natural”geneexpression .Regardingthepolymorphismdata,theeffectofselectioncanbe muchmoredifficulttodetectinthecaseofS.pombebecauseofth estrongerlinkagedisequilibriumreportedforthisspecies(Jeffare setal.2015).Therefore,al-

thoughthecorrelationcoefficientsrelatingthefractionofnonsyn onymoussitesandexpressionleveldiffersignificantlybetweenS .cerevisiaeandS.pombe(Fisher’sz ¼5.0179,P<10-

6)thisresultdoesnotnecessarilyreflectdifferencesintheselectio npressure.

Withregardstothesynonymouspolymorphism,itwasalsone gativelycorrelatedwiththemRNAlevelinbothyeastspe- cies.Thesecorrelations,however,weremuchweakerthatthose foundforthenonsynonymoussites.Moreover,resultsofthreedis tinctandindependenttests(seeResults)indicatedthatthesecor relationsweregeneratedmainlybytheback-

groundselection.Thisinterpretationisstraightforwardinthecas eofS.pombe,asallthreetestsfailedtodetectsignaturesofnegati veselectionactingdirectlyonthesynonymousvar-

iantsinthisspecies.TheresultsobtainedforSaccharomycescer evisiaearemorechallengingtoexplain.Bothtestsutilizinginform ationonvariantdensities(i.e.,subsamplinganalysisandpartialc orrelationcontrollingfornonsynonymouspoly-

morphismdensity)gaveresultsthatsupportedtheroleofthedirec tpurifyingselection.Ontheotherhand,thefractionofsingletonsw ithinthesynonymoussiteswasnotanyhigherthanfractionexpec tedforneutrallyevolvingsites.Thisfrac-

tionwaselevatedforthe10%ofgeneshavingthehighest

(11)

Marek andTomala

GBE

D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov expression,however,theeffectwasnotsubstantial.Onepos-

sibleexplanationisthatsynonymousvariantsareeffectivelyneut ralalsoinS.cerevisiaepopulation(withtheexceptionofsiteswithi nthesmallnumberofveryintensivelytranscribedgenes)andthat twooutofthreeappliedtestsweretooper-

missive.Theotherexplanationisthatsynonymousvariantsmay beeitherverydeleteriousandrapidlyselectedagainstorneu- tral.Thepaucityofsynonymousvariantswithlow/moderateselec tioncoefficientsmayexplaintheneutral-

likeshapeofthefrequencyspectrum.Interestinglyenough,such dichoto-

mousf itnesseffectsofthesynonymousmutationswerede- scribedforDrosophilamelanogaster(Lawrieetal.2013).

Inlightoftheabovefindings,itisinterestingtonote,that inbothyeastspeciesstrongcodonbiasexists,wherethesequen cesofhighlyexpressedgenesareenrichedinthemajorcodons(K urland1991;Hiraokaetal.2009).Suchcodonpat-

ternisusuallyexplainedasaresultoftheselectionfortrans- lationefficiencyandaccuracy,whichshouldbemoresignificantf ormoreabundantmRNAsandproteins(Rocha2004;Plotkinand Kudla2011).Suchselectionshouldleadtoadjustmentsbetween abundanceoftRNAmoleculesandtheusageofappropriatecod ons(Tulleretal.2010).Toesti-

matethisadjustment,wecalculatedSpearman’srankcorre- lationscoefficientsrelatingthesetwoquantitiesforthecodingseq uencesofallanalyzedgenes—

rhotaistatistics(seeMaterialsandMethods).Asexpected,obtain edrhotaicoeffi-

cientsarepositivelycorrelatedwithgene’sexpression(S.cer- evisiae:r¼0.696,P<2.2-10-16,n¼2,432;S.pombe:

r¼0.653,P<2.2-10-16,n¼5,069;supplementaryfig.S1, Supplementary Materialonline).Later,wecheckediftheycorre latewiththedensityofthesynonymousvariants.Whilewefound negativecorrelationfortheS.cerevisiaedata(rho¼-

0.11,P¼6.30-10-

8,n¼2,432),whichturnedinsignificantaftercontrollingforthemR NAlevel(rho¼0.022,P¼0.27,n ¼2,432),norelationwasvisiblef orS.pombe(rho¼0.04,P¼0.06,n¼5,069).Thus,alsothisanaly sisfailstodetectclearsignalsofselectionrelatedtotRNAabunda ncewithinthesynonymoussites.Itshouldbenotedthat,muchstro ngercorrelationswerevisibleforthenonsynonymousSNPsforb othspecies(S.cerevisiae:rho¼-0.38,P<2.2-10-

16,n ¼2,432;S.pombe:rho¼-0.30,

P<2.2-10-16,n¼5,069).Partialcorrelationscontrollingfor geneexpressionwerealsohighlysignificantinthiscase(S.cerevi siae:rho¼-0.13,P¼4.38-10-11,n¼2,432;S.pombe:rho¼- 0.12,P ¼2.91-10-

18,n¼5,069).Thismayimplythatthenonsynonymouschangesh avegreaterimpactonourmeasureoftranslationoptimization.In deed,manyoftheco-

donpairsthatareseparatedbyonesynonymouschangearedec odedbythesametRNAspecies.Moreover,thecorrela-

tionsbetweengene’scodoncompositionandtRNAgenecopynu mbersmaybealsoaffectedbydifferencesinaminoacidusage.T akentogether,ouranalysesshowthatthecon-

tributionofthenegativeselectionactingdirectlyonsynony- moussitestotheinvestigatedexpression–divergence

(12)

Marek andTomala

GBE

2994 GenomeBiol.Evol.10(11):2986–

2996d o i : 1 0 . 1 0 9 3 / g b e / e v y 2 2 5 A d v a n c e AccesspublicationOctober13,2018 anticorrelationmaybeextremelysmallorevennonexistent.Ou

rresultsalsoimplythatinbothanalyzedpopulationssuchselect ionmaybeveryweak,withtheselectioncoefficientsaround1/N

eorless.Notably,evensuchaveryweakselectionmightbesuffic ienttogeneratethecodonusagebias(McVeanandCharleswor th1999;Charlesworth2009).

Weakcorrelationsbetweenpolymorphismdensityandex- pressionwerealsofoundforthe50-UTRand30-

UTRregions.Moreover,analysesperformedtodeciphertheim pactofthebackgroundanddirectselectionconfirmedtheattrib utionofthelatter.Therefore,itispossiblethatsomeadditionalse lec-

tionpressures,actingdirectlyontranscripts,contributetothede clineinpolymorphismobservedforthehighlyexpressedgenes .OnepossiblefactorcouldbethestabilityofmRNA(Parketal.20 13),whileanothertherateoftranslationinitia-

tionandelongation(Kudlaetal.2009;Shahetal.2013;Yangeta l.2014).Thus,inpopulationsofbothyeastspecies,theuntransla tedregionsofthemostabundanttranscriptswouldbeundermor eintensepurifyingselection.

OuranalysesofthegenomicdataofS.cerevisiaeandS.pom beshowthatthedifferencesinthepurifyingselectionstrengthu nderliethenegativecorrelationbetweengeneex-

pressionanditsevolutionrate.Notably,thesedifferencesarebi genoughtobeeasilynoticeableeveninpopulationsofasingles pecies.Inbothexaminedpopulations,theexpression- dependentnegativeselectionaffectsmainlyaminoacidalter- ingvariants.Althoughtomuchweakerextent,italsoactsdirectly onpolymorphismwithinuntranslatedregionsofhighlyexpress edgenes.Thisresultmayimplythattheevolu-

tionaryconstraintisrelatednotonlytobiophysical/functionalpro pertiesofproteinsbutalsotosomefeaturesoftranscriptsortran slationinitiation.Interestingly,wefoundthatback-

groundselectioncanalmostentirelyexplaintheobservedpat- ternofsynonymouspolymorphism.Thestrengthofthepurifying selectionactingdirectlyonsilentsitesseemstobejustontheedg eofwhatcanbedetectedinbothanalyzedpopulations.Finally,t herelationbetweenexpressionandmu-

tationratediffersbetweenS.pombeandS.cerevisiae,asstrong mutageniceffectoftranscriptionisvisibleonlyinthelatterspecie s.Nevertheless,transcriptionassociatedmutationbiasdoesn otsignificantlycontributetotheconsideredcorre-

lationinanyoftheanalyzedyeasts.

SupplementaryMateri al

Supplementary dataareavailableatGenomeBiologyandEvo lutiononline.

Acknowledgment s

TheauthorsthankRyszardKoronaforreadingandcomment- ingourmanuscript.ThisworkwassupportedbythePolishNatio nalScienceCentergrant2014/13/B/NZ8/04668toK.T.

(2014/13/B/NZ8/04668toK.T.).

(13)

Causesof the expression-polymorphismanticorrelationinyeastpopulations

GBE

D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov

LiteratureCited

AkashiH.2001.Geneexpressionandmolecularevolution.CurrOpinGenet Dev.11(6):660–666.

BehringerMG,HallDW.2015.Genome-

wideestimatesofmutationratesandspectruminSchizosaccharomyce spombeindicateCpGsitesarehighlymutagenicdespitetheabsenceofD NAmethylation.G3(Bethesda)6:149–160.

CharlesworthB.2009.Fundamentalconceptsingenetics:effectivepop- ulationsizeandpatternsofmolecularevolutionandvariation.NatRevGe net.10(3):195–205.

ChenX,ZhangJ.2013.Nogene-specificoptimizationofmutationratein Escherichiacoli.MolBiolEvol.30(7):1559–1562.

ChenX,ZhangJ.2014.Yeastmutationaccumulationexperimentsupportsel evatedmutationratesathighlytranscribedsites.ProcNatlAcadSciUSA .111(39):E4062.

Cs,ardiG,FranksA,ChoiDS,AiroldiEM,DrummondDA.2015.

AccountingforexperimentalnoiserevealsthatmRNAlevels,amplifiedb ypost-transcriptionalprocesses,largelydeterminesteady-statepro- teinlevelsinyeast.PLoSGenet.11(5):e1005206.

CurranJF,YarusM.1989.Ratesofaminoacyl-

tRNAselectionat29sensecodonsinvivo.JMolBiol.209(1):65–77.

DiedenhofenB,MuschJ.2015.cocor:acomprehensivesolutionforthestatist icalcomparisonofcorrelations.PLoSOne10(4):e0121945.

DrummondDA,BloomJD,AdamiC,WilkeCO,ArnoldFH.2005.Why highlyexpressedproteinsevolveslowly.ProcNatlAcadSciUSA.102(40 ):14338–14343.

DrummondDA,RavalA,WilkeCO.2006.Asingledeterminantdominatesth erateofyeastproteinevolution.MolBiolEvol.23(2):327–

337.DrummondDA,WilkeCO.2008.Mistranslation- inducedproteinmisfold-

ingasa dominantconstraintoncoding- sequenceevolution.Cell134(2):341–352.

FarlowA,etal.2015.Thespontaneousmutationrateinthefissionyeast Schizosaccharomycespombe.Genetics201(2):737–744.

FrigolaJ,etal.2017.Reducedmutationrateinexonsduetodifferentialmisma tchrepair.NatGenet.49(12):1684–1692.

HanawaltPC,SpivakG.2008.Transcription-

coupledDNArepair:twodecadesofprogressandsurprises.NatRevMol CellBiol.9(12):958–970.

HiraokaY,KawamataK,HaraguchiT,ChikashigeY.2009.Codonusagebiasisc orrelatedwithgeneexpressionlevelsinthefissionyeastSchizosaccharomy cespombe.GenesCellsDevotedMolCell14(4):499–509.

HudsonRR.2002.Generatingsamplesundera Wright–

Fisherneutralmodelofgeneticvariation.Bioinformatics18(2):337–338.

JeffaresDC,etal.2015.Thegenomicandphenotypicdiversityof Schizosaccharomycespombe.NatGenet.47(3):235–241.

KimN,Jinks-

RobertsonS.2012.Transcriptionasasourceofgenomeinstability.NatRevG enet.13(3):204–214.

KimS.2015.ppcor:anR packageforafastcalculationtosemi-partial correlationcoefficients.CommunStatApplMethods22(6):665–

674.KimuraM,OhtaT.1974.Onsomeprinciplesgoverningmolecularevolu- tion.ProcNatlAcadSciU S A.71(7):2848–2852.

KooninEV.2011.Aretherelawsofgenomeevolution?PLoSComputBiol.

7(8):e1002173.

KudlaG,MurrayAW,TollerveyD,PlotkinJB.2009.Coding-

sequencedeterminantsofgeneexpressioninEscherichiacoli.Science 324(5924):255–258.

KurlandCG.1991.Codonbiasandgeneexpression.FEBSLett.

285(2):165–169.

LawrieDS,MesserPW,HershbergR,PetrovDA.2013.Strongpurifyingselec tionatsynonymoussitesinD.melanogaster.PLoSGenet.9(5):e100352 7.

LimVI,CurranJF.2001.Analysisofcodon:anticodoninteractionswithintheri bosomeprovidesnewinsightsintocodonreadingandthege- neticcodestructure.RNA7(7):942–957.

MargueratS,etal.2012.Quantitativeanalysisoffissionyeasttranscrip- tomesandproteomesinproliferatingandquiescentcells.Cell151(3):671 –683.

MartincorenaI,SeshasayeeASN,LuscombeNM.2012.Evidenceofnon- randommutationratessuggestsanevolutionaryriskmanagementstrat egy.Nature485(7396):95–98.

McVeanGAT,CharlesworthB.1999.Apopulationgeneticmodelfortheevolu tionofsynonymouscodonusage:patternsandpredictions.GenetRes.7 4(2):145–158.

MiuraF,etal.2006.Alarge-scalefull-

lengthcDNAanalysistoexplorethebuddingyeasttranscriptome.ProcN atlAcadSciUS A.103(47):17846–17851.

NagalakshmiU,etal.2008.Thetranscriptionallandscapeoftheyeastgeno medefinedbyRNAsequencing.Science320(5881):1344–1349.

NeiM,GojoboriT.1986.Simplemethodsforestimatingthenumbersofsynon ymousandnonsynonymousnucleotidesubstitutions.MolBiolEvol.3:41 8–426.

P,alC,PappB,HurstLD.2001.Highlyexpressedgenesinyeastevolve slowly.Genetics158(2):927–931.

ParkC,ChenX,YangJ-

R,ZhangJ.2013.DifferentialrequirementsformRNAfoldingpartiallyex plainwhyhighlyexpressedproteinsevolveslowly.ProcNatlAcadSciUS A.110(8):E678–E686.

ParkC,QianW,ZhangJ.2012.Genomicevidenceforelevatedmutationrate sinhighlyexpressedgenes.EMBORep.13(12):1123–1129.

PlotkinJB,KudlaG.2011.Synonymousbutnotthesame:thecausesandcon sequencesofcodonbias.NatRevGenet.12(1):32–42.

RCoreTeam.2015.R:alanguageandenvironmentforstatisticalcom- puting.Vienna(Austria):RFoundationforStatisticalComputing.Availabl efrom:http://www.R-project.org/.

RochaEPC.2004.CodonusagebiasfromtRNA’spointofview:redun- dancy,specialization,andefficientdecodingfortranslationoptimiza- tion.GenomeRes.14(11):2279–2286.

RochaEPC,DanchinA.2004.Ananalysisofdeterminantsofaminoacidssub stitutionratesinbacterialproteins.MolBiolEvol.21(1):108–116.

ShahP,DingY,NiemczykM,KudlaG,PlotkinJB.2013.Rate- limitingstepsinyeastproteintranslation.Cell153(7):1589–1601.

SkellyDA,etal.2013.Integrativephenomicsrevealsinsightintothestruc- tureofphenotypicdiversityinbuddingyeast.GenomeRes.23(9):1496–

1504.

SkellyDA,RonaldJ,ConnellyCF,AkeyJM.2009.Populationgenomicsofintr onsplicingin38Saccharomycescerevisiaegenomesequences.Geno meBiolEvol.1:466–478.

TullerT,WaldmanYY,KupiecM,RuppinE.2010.Translationefficiencyisdet erminedbybothcodonbiasandfoldingenergy.ProcNatlAcadSciUSA.

107(8):3645–3650.

VavouriT,SempleJI,Garcia-

VerdugoR,LehnerB.2009.Intrinsicproteindisorderandinteractionpro miscuityarewidelyassociatedwithdosagesensitivity.Cell138(1):198–

208.

WallDP,etal.2005.Functionalgenomicanalysisoftheratesofproteinevo lution.ProcNatlAcadSciU SA.102(15):5483–5488.

WeinbergDE,etal.2016.Improvedribosome-footprintandmRNAmeas- urementsprovideinsightsintodynamicsandregulationofyeasttrans -lation.CellRep.14(7):1787–1799.

XuZ,etal.2009.Bidirectionalpromotersgeneratepervasivetranscriptioniny east.Nature457(7232):1033–1037.

YangJ-R,ChenX,ZhangJ.2014.Codon-by-codonmodulationoftrans- lationalspeedandaccuracyviamRNAfolding.PLoSBiol.12(7):e100191 0.

(14)

Marek andTomala

GBE

2996 GenomeBiol.Evol.10(11):2986–

2996d o i : 1 0 . 1 0 9 3 / g b e / e v y 2 2 5 A d v a n c e AccesspublicationOctober13,2018

D o w nl oa de df ro m htt ps :// ac ad e mi c. ou p. co m/ gb e/ ar tic le- ab str ac t/1 0/ 11 /2 98 6/ 51 29 08 5b yg ue st on 22 N ov YangJ-R,LiaoB-Y,ZhuangS-

M,ZhangJ.2012.Proteinmisinteractionavoidancecauseshighlyexpre ssedproteinstoevolveslowly.ProcNatlAcadSciUSA.109(14):E831–

E840.

YangJ-R,ZhuangS-M,ZhangJ.2010.Impactoftranslationalerror- inducedanderror-

freemisfoldingontherateofproteinevolution.MolSystBiol.6:421.

YassourM,etal.2009.Abinitioconstructionofaeukaryotictranscrip- tomebymassivelyparallelmRNAsequencing.ProcNatlAcadSciUSA.1 06(9):3264–3269.

ZhangJ,YangJ-

R.2015.Determinantsoftherateofproteinsequenceevolution.NatRev Genet.16(7):409–420.

ZhangL,SchroederS,FongN,BentleyDL.2005.Alterednucleosomeoccup ancyandhistoneH3K4methylationinresponseto‘transcrip-

tionalstress’.EMBOJ.24(13):2379–2390.

ZhouT,GuW,WilkeCO.2010.Detectingpositiveandpurifyingselectionatsy nonymoussitesinyeastandworm.MolBiolEvol.27(8):1912–1922.

ZhuYO,SiegalML,HallDW,PetrovDA.2014.Preciseestimatesofmu- tationrateandspectruminyeast.ProcNatlAcadSciUSA.111(22):E231 0–E2318.

Associateeditor:LaurenceHurst

Cytaty

Powiązane dokumenty

Herein, we report a case of functioning OZ-PGL without the HLRCC features that carry the FH gene mutation.. A 33-year-old woman presented in 2009 with severe hypertension

Distribution of genotype and allele frequency of SERT-P polymorphism in the constipation patients and controls.. the disease and the L/S heterozygous polymorphism similarly

Parameters No. mRNA relative expression level and inhibition ratio. The U251 cell line was transfected with siRNAto target KLK6 or with control siRNA. The result showed

The aim of this study is to estimate the association of –137 G/C IL-18 gene polymorphism, –1112 C/T IL-13 gene polymorphism, FLG R501X and 2282del4 gene muta- tions with AD risk

However, when the jump tests of soccer player and control groups were assessed according to ACTN3 R577X gene polymorphism distribution, it was found that RR-gen- otyped

We only observed a  change in the composition of gallstones between the analyzed groups which concerned the particularly low level of total cholesterol and increase in

Celem tego badania była ocena częstości występo- wania genotypów i alleli innych znanych polimorfizmów genu MDR1 – T129C i G2677T w grupie pacjentek popu- lacji kaukaskiej leczonych

The aim of the study was to assess the association between 331A/T polymorphism in the SHH gene and 385G/A polymorphism in the SMO gene and the development of basal cell