DIAGNOSING MENTAL ILLNESS USING MACHINE LEARNING

Permanent Link: http://ncf.sobek.ufl.edu/NCFE004851/00001

Material Information

Title: DIAGNOSING MENTAL ILLNESS USING MACHINE LEARNING
Physical Description: Book
Language: English
Creator: Rogers, Jack
Publisher: New College of Florida
Place of Publication: Sarasota, Fla.
Creation Date: 2013
Publication Date: 2013

Subjects

Subjects / Keywords: Machine Learning
Neuroinformatics
Python
Genre: bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: Machine learning algorithms are often used on large biological data sets to identify biomarkers associated with disease that can serve as diagnostic or prognostic tools. In this thesis, we evaluate the classification accuracy of eight machine learning algorithms on multiple schizophrenia and bipolar disorder data sets. These algorithms include support vector machines, naive Bayes classifiers, and other clustering and regression techniques. All software used in the classification is open source, to demonstrate the potential of accessible and robust data mining software.
Statement of Responsibility: by Jack Rogers
Thesis: Thesis (B.A.) -- New College of Florida, 2013
Supplements: Accompanying materials: CD
Electronic Access: RESTRICTED TO NCF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE
Bibliography: Includes bibliographical references.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The New College of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Local: Faculty Sponsor: McDonald, Patrick

Record Information

Source Institution: New College of Florida
Holding Location: New College of Florida
Rights Management: Applicable rights reserved.
Classification: local - S.T. 2013 R7
System ID: NCFE004851:00001


Full Text

DIAGNOSING MENTAL ILLNESS USING MACHINE LEARNING

BY
JACK DENISON ROGERS

A Thesis
Submitted to the Division of Natural Sciences
New College of Florida
In partial fulfillment of the requirements for the degree
Bachelor of Arts
Under the sponsorship of Dr. Patrick McDonald

Sarasota, Florida
Spring 2013

DIAGNOSING MENTAL ILLNESS USING MACHINE LEARNING

Jack Denison Rogers
New College of Florida, 2013

Abstract

Machine learning algorithms are often used on large biological data sets to identify biomarkers associated with disease that can serve as diagnostic or prognostic tools. In this thesis, we evaluate the classification efficacy of eight machine learning algorithms on multiple schizophrenia and bipolar disorder data sets. These algorithms include support vector machines, naive Bayes classifiers, and other clustering and regression techniques. All software used in the classification is open source, to demonstrate the potential of accessible and robust data mining software.

Dr. Patrick McDonald
Division of Natural Sciences

Acknowledgments

Thanks to David Wyde for accelerating my Linux learning process. To Tanner Robart for taking the same classes with me and supporting me through my New College career. To Christopher Hart for introducing me to data science and biology. And special thanks to Patrick McDonald, for taking me on as a thesis student when both my previous advisers had departed, for supporting me through a tumultuous third and fourth year, and for making me appreciate and love mathematics and algorithms in a way I had never dreamed possible.

Contents

1 Introduction
2 Background
3 Methods
4 Results
5 Discussion and Future Direction
6 Appendix

1 Introduction

Data, a set of facts and statistics recorded for reference or analysis, is a commodity which is being generated in increasing amounts. While Gregor Mendel's data was small enough to be recorded in a notebook, modern scientific data is stored electronically in huge matrices and spreadsheets, as well as a variety of other database structures. A consequence of the size and complexity of the data being generated is that analysis by hand is no longer possible, requiring the assistance of computer-based computational and statistical tools.

Visual projection of data and easy access to properly computed statistics are excellent resources for scientists to test hypotheses and draw novel conclusions from the data, but this requires a high-level or 'educated' observer for interpretation, deciding which information is meaningful and relevant to the dependent variable being studied. Machine learning is a term to describe the software implementations of statistical learning algorithms for the purpose of analyzing a data set. Originally defined by Arthur Samuel in 1959, who called it a "Field of study that gives computers the ability to learn without being explicitly programmed" [17], machine learning is used to tackle big data problems such as those found in physics, chemistry, and biology. We are interested in machine learning in the context of medical imaging.

In practice, machine learning is carried out via the construction of software objects which are in turn implementations of machine learning algorithms. There are a variety of machine learning algorithms, but all such algorithms can be thought of as belonging to two classes: those that learn by supervision and those that learn unsupervised. Our interest will be restricted to implementations of supervised machine learning algorithms.

There are an ever increasing number of supervised machine learning algorithms, but all share a common defining feature. Before the algorithm can be used to classify data, it must be trained on data which has already been classified. The classifier, the software implementation of a classifying machine learning algorithm, 'learns' or 'fits' to the data, meaning it creates an approximating function or decision structure which models the relation between independent and dependent variables in the data. Once 'fit' or 'trained' on a set of observations, the classifier can be 'tested' on a separate set of data, where the classifier tries to predict the dependent variable of a previously unseen observation using its independent variables. This process of 'training' and 'testing' is often done by separating a single set of data into subsets designated for each purpose. This means the accuracy of the classifier's predictions on the testing subset is viewed in light of the schema or function it produced after fitting to the training subset.

We perform computational experiments in which we test the predictive power of machine learning classifiers in classifying subjects' mental disorders using MRI scans of their brains in various imaging modalities. The scans come from schizophrenia, bipolar disorder, and healthy control subjects, diagnosed using the DSM-IV [13] criteria by the Mind Research Network [15]. We train the classifiers on both raw features from the data as well as smaller sets of data transformed by various dimensional scaling algorithms on the original feature space. Open source software is employed for the experiments and its effectiveness is demonstrated. The experiment compares eight distinct machine learning classifiers and three dimensional reduction techniques on neurological data in a variety of modalities and combinations of modalities. The eight classifiers make their predictions on the raw data, but we also test their accuracy on data that has been preprocessed using

three data transformation algorithms which reduce dimensionality.

The structure of this thesis is as follows. Background information regarding medical imaging and relevant modalities will be discussed. The methods will then present information about the data sets analyzed followed by an overview of the process of conducting our experiment. After a summary of the results, there will be a discussion of possible improvements and directions for future work followed by an appendix of information on the composition of the data and the complete classification results for each experiment. All programming code used in this project will be included in a digital form attached to this thesis.

2 Background

Medical Imaging

Medical imaging is the practice of using various tools to examine the composition of living human tissue. A common example of this is the use of an X-ray machine to determine the presence of a fracture in a subject's bone. For neurological scans, magnetic resonance imaging (MRI) is the most common method used today and provides high resolution three dimensional images of the tissue being scanned. Additionally, many modern MRI machines employ various imaging modalities that allow them to focus on specific tissue types using the unique magnetic properties of each. This allows for things such as blood flow, water diffusion, and white and grey matter segmentation to be identified and focused on using the specific modality best suited to each.

Functional MRI (fMRI) [14] is a scanning method which measures brain activity by detecting changes in blood oxygenation and de-oxygenation. It takes advantage of the changes in the magnetic properties of blood cells when de-oxygenated. De-oxygenation of blood implies metabolism by neurons and other nearby brain cells, usually associated with the firing of neurons and thus brain activity. The fMRI data in this experiment was collected during an auditory experiment, monitoring the subjects' brains' reaction to and processing of the stimuli. Fractional anisotropy (FA) [15] is an imaging modality which uses the diffusion of water in the brain to identify areas in which the flow of molecules is free or narrowed, using the protons of the water molecules to monitor directionality of flow. This allows axons, elongated tubes of cell tissue that connect neurons in the brain, to be identified and mapped. Assuming the Brownian motion of water molecules is constricted inside of the axon, each data point

is, up to a constant factor, the square root of the ratio of the sum of squared differences between the diffusivities to the sum of squared diffusivities in that particular voxel of the brain. [19] Amplitude of low frequency fluctuations (ALFF) [17] is a modality which uses the same imaging as fMRI, but is done in a resting state where no stimuli or tasks are required and the subject tries to relax. A structural MRI of grey matter density (GM) [20] was also provided. In summary, we have two functional modalities, fMRI and ALFF, collected over a time series but averaged into a contrast map of changes in an area over the course of an experiment such that they have the same dimensionality as a structural scan, and two structural modalities, FA and GM.

Dimensionality Reduction

While some of the classifiers we will be using are known to work well with high-dimensional data, we would like to test the abilities of the algorithms on not only the raw high dimensional data, but also on lower dimensional representations of the original data which may make the class differences more apparent and thus more easily classifiable. Dimensional scaling algorithms can work in a variety of ways, supervised or unsupervised, but for our purposes we can simply think of them as algorithms that take the raw data as input and return a data set with fewer features for each subject.

Principal component analysis (PCA) is a procedure that converts the features of a data set into a set of linearly uncorrelated values referred to as principal components. In the language of linear algebra, a principal component decomposition is equivalent to a singular value decomposition. [9] PCA returns a first component that always accounts for the largest possible variance in the data, and each additional component has the highest variance possible while remaining orthogonal to the previous components. We used PCA to reduce dimensionality by associating to each subject the first ten principal components.

Independent component analysis (ICA) is a procedure similar to PCA. Instead of accounting for variance, ICA converts the features of a data set into a set of components which are maximally statistically independent, where maximal statistical independence is defined by minimization of mutual information. In practice, ICA attempts to create components which identify sources that may span across many features with overlap. [?] For example, if one had an audio file which had two people talking at the same time, one might use ICA to identify and separate the two speakers as separate components.

Linear discriminant analysis (LDA) is a method which creates components of linear combinations of features to separate between categorical variables in a set of observations. While otherwise similar to PCA and ICA, which do not take into account the dependent variables, LDA is a supervised algorithm which takes class distinctions into account when transforming a data set. In addition to being used for dimensional scaling, LDA can also be used as a supervised linear classifier, but LDA will not be used as such in this experiment. [11]

Reduced versions of each data set were constructed using PCA, ICA, and LDA. PCA and ICA are unsupervised and return a user-specified number of components for each subject without referring to the target classes the subjects come from. LDA on the other hand is supervised and constructs its reduced set of components in light of the target classes. This means there is now an original version and an LDA, PCA, and ICA copy of each original data set, each of which is then split by class, modality combination, and preprocessing method so all arrays representing different data are kept separate from one another.

Random sampling functions were created to take multiple disease

groups in any modality or combination of modalities and construct a new sample composed of a random subset of each group. The function takes a parameter that sets the size of the sample by specifying what percentage of each group to draw from, e.g. take a random 20% of the schizophrenia subjects' fMRI data and a random 20% of control subjects' fMRI data to construct a training set. This is to prevent a random sample from being entirely composed of subjects from a single class. The remainder of each group's data is then used to construct a testing set. The random sampling functions were tedious to code because they involved taking subjects from each class group's array, vertically concatenating them, and creating a new associated target array. Selecting subjects into new arrays and concatenating them for each random sample had a significant impact on runtime.

Classifiers

As mentioned above, we train eight machine learning classifiers over the course of our study. In this section, we give a brief introduction to each machine learning classifier with which we work.

Support vector machines (SVM) are a supervised machine learning method that attempts to find the maximal separating hyperplane between categorical dependent variables in a feature space. This feature space is subject to the curse of dimensionality, where runtimes and system resources are consumed in proportion to the size and dimensions of the data, but support vector machines excel in dealing with this common machine learning problem. The SVM creates a transformed feature space defined in terms of a mapping from the original feature space using a kernel function. The result is a space that is much higher dimensional, but computations are in principle simpler as good linear

separating hyperplanes are easier to find. [12]

Figure 1: Example of how support vector machines create a maximum separating hyperplane by projecting the original feature space into a higher dimensional space. The points on the dotted lines are called support vectors.

An added benefit of support vector machines is that the kernel function is somewhat modular in that it can be swapped out with a variety of functions to tailor the classifier to the given problem. Two kernel functions were selected for testing: a linear kernel and a radial basis function kernel.

K-nearest neighbors (kNN) is a supervised classification algorithm that compares examples based on their distance from one another in the feature space. This is usually done using the Euclidean distance, but other metrics can be used. Using the distances between observations of independent variables in the feature space, each example can rank all other training examples based on their similarity through distance in the feature space. In actually predicting the class of a previously unseen example, the distances to all the training set examples are computed, and the k most similar examples have their class labels considered. From the k nearest neighbors, the class or group label that is most common among them is chosen as the predicted class of the testing sample in question.

Logistic regression is a regression method often used for prediction of

categorical dependent variables. While logistic regression is used for binary categorical variables, a generalization, multinomial logistic regression, can be used for problems involving more than two classes. In addition to predicting the class of an example, logistic regression also calculates a probability of the example belonging to that particular class group. This is notable because most classifiers output only the predicted class of a sample, but logistic regression also gives us a confidence statistic for the prediction.

Figure 2: A simple example of linear regression separating between two classes of data.

Recall that regression works by fitting a set of data points or examples belonging to a given class to a function. For example, linear regression fits a line to a set of data. The regression line can then be used in classification to determine the class of an example based on whether it is on one side of the line or the other. Logistic regression works similarly, but fits the data to a logistic function instead of the linear function used in linear regression.

Naive Bayes classifiers are supervised probabilistic classifiers based on

Bayes' theorem. This algorithm computes the probability of an example having a specific categorical dependent variable. Naive Bayes determines this probability by examining the relation between dependent and independent variables in the training set upon which the classifier was trained. Once the probability of an example having each possible value of the dependent variable is determined, the most probable is selected as the algorithm's prediction of the example's class. To determine the 'most probable' or 'maximum likelihood' class requires an assumption on the distribution of features. For this reason, three naive Bayes distribution models were used: Gaussian, Bernoulli, and multinomial. These were chosen because they were the only available naive Bayes classifiers in the machine learning library used.

Figure 3: Bayes' theorem.

Figure 4: An example of a random forest's tree structure.

Random forests are a type of ensemble learning classifier that creates multiple decision trees. It is an ensemble method because it blends multiple models together. For example, different predictors can be used in different leaves, random noise can be injected, different nodes can use different decision trees, such as linear or quadratic decision structures, etc. As a classifier, the

decision tree structure is created when the classifier fits to the training data set, and when presented with a testing example, a path through the tree is chosen based on the features of the testing example and a final leaf is reached which dictates the predicted class of the example.
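The prediction step just described can be illustrated with a minimal sketch. It assumes trees that have already been built; the nested-dictionary layout, the feature indices and thresholds, and the names `predict_tree` and `predict_forest` are hypothetical and are not taken from the thesis code or from sklearn. Combining the trees by simple majority vote is likewise an assumption of this sketch, chosen as one common way of blending their predictions.

```python
from collections import Counter

# Hypothetical hand-built trees. An internal node tests one feature against a
# threshold; a leaf stores the predicted class label.
TREE_A = {"feature": 0, "threshold": 0.5,
          "left": {"leaf": 0},
          "right": {"feature": 1, "threshold": 2.0,
                    "left": {"leaf": 0},
                    "right": {"leaf": 1}}}
TREE_B = {"feature": 1, "threshold": 1.0,
          "left": {"leaf": 0},
          "right": {"leaf": 1}}
TREE_C = {"feature": 0, "threshold": 0.8,
          "left": {"leaf": 0},
          "right": {"leaf": 1}}


def predict_tree(node, example):
    """Follow a path from the root to a leaf and return the leaf's class."""
    while "leaf" not in node:
        if example[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["leaf"]


def predict_forest(trees, example):
    """Return the class predicted by the largest number of trees."""
    votes = Counter(predict_tree(tree, example) for tree in trees)
    return votes.most_common(1)[0][0]


forest = [TREE_A, TREE_B, TREE_C]
print(predict_forest(forest, [0.9, 3.0]))  # all three trees vote for class 1
print(predict_forest(forest, [0.2, 0.5]))  # all three trees vote for class 0
```

For the example `[0.9, 1.5]` the trees disagree (the first reaches a class 0 leaf, the other two reach class 1 leaves), and the vote settles on class 1.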

3 Methods

The Mind Research Network [15] in Albuquerque, New Mexico made available two neurological medical imaging data sets with scans from schizophrenia and bipolar subjects as well as a control group for comparison. Each independent variable for each subject is a voxel, a volumetric pixel of the medical image of that patient, representing a cube (sometimes stretched) of various size, depending on the parameters of the machine, at a particular location in the subject's brain. Multiple scans were received for each subject, one for each of the different imaging modalities that were used when they were scanned. Different imaging modalities are used one at a time during the scan to collect different aspects of the brain and its functioning, MRI machines not being able to use multiple modalities simultaneously.

Data Set A involved image data from a cohort of 164 individuals of which 54 were DSM-IV classified as schizophrenic, 48 were DSM-IV classified as bipolar, and 62 were free of any DSM-IV Axis I disorder or psychotropic medication. Image data for each individual consisted of an initial data stream of 52000 voxels derived from a functional MRI (fMRI) image and a secondary data stream of 40901 voxels derived from a fractional anisotropy (FA) image.

Data Set B was similarly structured and involved image data from a cohort of 63 individuals of which 35 were DSM-IV classified as schizophrenic and 28 were free of any DSM-IV Axis I disorder or psychotropic medication. Image data for each individual consisted of an initial data stream of 370 features derived from an FA image, a secondary data stream of 2244 features derived from an ALFF resting state image, and a third data stream of 1435 features derived from a grey matter structural image.

All analysis and coding was done on an Ubuntu Linux [20] system running

freely available version 12.04 LTS (Precise Pangolin). Python 2.7 [11] was chosen as the language due to the availability of open source machine learning libraries and its readability. Geany [10] was used as the development environment due to its simplicity and a richness of text editing features that most other development environments lack. IPython [7] was used extensively to test various functions due to its bash integration and useful tools for benchmarking and object exploration, conveniently displaying the attributes and methods of whatever object one is working with using bash-like autocomplete. The machine learning module from SciPy [6], often referred to as sklearn (SciKits Machine Learning) [5, ?], was chosen for implementing various machine learning classifiers due to its excellent documentation and the number of tools it contains.

The data was received in a convenient format for analyzing with the classifiers, consisting of a two-dimensional array with rows representing subjects and columns representing voxels or features. Matlab matrix files were converted to Python dictionaries using SciPy's io module. Each dictionary then had a key whose value was the two-dimensional matrix of the data. This data was then extracted from the dictionary into a NumPy array.

Data was then split into separate arrays for each pairing of disease state and modality, allowing each group to be accessed and drawn from independently and preventing the mislabeling of any subjects. While retaining the separate arrays for each modality and disease state, additional arrays were constructed by horizontally concatenating arrays of multiple modalities for each disease group, allowing combinations of modalities to be analyzed together to demonstrate which modalities were most effective for the accuracy of the classifier's predictions. In data set A, there was an array for controls, schizophrenia subjects, and bipolar subjects in each of fMRI, FA, and a concatenated fMRI+FA array.
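The horizontal concatenation described above can be sketched as follows. The tiny arrays and the helper name `concatenate_modalities` are hypothetical stand-ins; the real arrays have one row per subject and up to tens of thousands of columns, and plain Python lists are used here only to keep the sketch free of dependencies.

```python
def concatenate_modalities(*modalities):
    """Join the feature rows of several modalities subject by subject.

    Every modality must list the same subjects in the same order.
    """
    n_subjects = len(modalities[0])
    if any(len(m) != n_subjects for m in modalities):
        raise ValueError("all modalities must contain the same subjects")
    return [sum((list(m[i]) for m in modalities), []) for i in range(n_subjects)]


# Hypothetical control group: 2 subjects, 3 fMRI features and 2 FA features.
control_fmri = [[0.1, 0.2, 0.3],
                [0.4, 0.5, 0.6]]
control_fa = [[1.0, 2.0],
              [3.0, 4.0]]

control_fmri_fa = concatenate_modalities(control_fmri, control_fa)
print(control_fmri_fa)
# [[0.1, 0.2, 0.3, 1.0, 2.0], [0.4, 0.5, 0.6, 3.0, 4.0]]
```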

Thus data set A was represented by 9 array variables. In data set B, there was an array for both controls and schizophrenia subjects, for each of FA, GM, ALFF, FA+GM, FA+ALFF, GM+ALFF, and FA+GM+ALFF. Thus data set B was represented by 14 arrays.

All additional data preparation involved generating target arrays. Target arrays are N by 1 arrays that store the class of each subject in an associated data array. Each element is an integer label identifying the subject's class group. These had to be constructed using information received with the data, namely that the first x subjects were from group 0 and the next y subjects were from group 1. Specific functions were created to do this for each data set received. Target arrays were expected by all the supervised algorithms used in addition to the data, but as a separate array.

Functions were then constructed for each of the eight algorithms. Each function is four lines long, takes training and testing data and targets as parameters, initializes a classifier, fits it to the training data based on the training targets, then returns a real number between 0 and 1 representing the fraction of the testing data that was correctly classified.

A function was then constructed to run all the classifiers on a given training and testing set and record the classification accuracy of each algorithm. This function was then used in a higher level function that ran all eight algorithms on a given data set N times, creating a random training set of P percent of each class group and a testing set using the remainder, and recording the accuracy of each algorithm on each iteration of random sampling and classification. This function was in turn used in an even higher level function where it was called on each modality and combination of modalities (fMRI, FA+fMRI, etc.) with each preprocessing method (LDA, etc.).
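The layering of functions described in this section can be sketched in a few dozen lines. This is not the thesis code: `NearestCentroid` is a toy stand-in written for this sketch (the thesis uses sklearn classifiers), the data are made up, and the names `make_targets`, `stratified_split`, `accuracy`, and `run_trials` are hypothetical.

```python
import random


def make_targets(group_sizes):
    """Build a target list: the first x subjects get label 0, the next y get 1, ..."""
    targets = []
    for label, size in enumerate(group_sizes):
        targets.extend([label] * size)
    return targets


def stratified_split(data, targets, percent, rng):
    """Draw `percent` of every class for training; the remainder is for testing."""
    train_idx, test_idx = [], []
    for label in sorted(set(targets)):
        members = [i for i, t in enumerate(targets) if t == label]
        rng.shuffle(members)
        n_train = max(1, int(round(len(members) * percent / 100.0)))
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    pick = lambda idx: ([data[i] for i in idx], [targets[i] for i in idx])
    return pick(train_idx) + pick(test_idx)


class NearestCentroid(object):
    """Toy classifier: predict the class whose mean feature vector is closest."""

    def fit(self, data, targets):
        self.centroids = {}
        for label in set(targets):
            rows = [row for row, t in zip(data, targets) if t == label]
            self.centroids[label] = [sum(col) / float(len(rows)) for col in zip(*rows)]
        return self

    def predict(self, data):
        def dist(row, centre):
            return sum((a - b) ** 2 for a, b in zip(row, centre))
        return [min(self.centroids, key=lambda c: dist(row, self.centroids[c]))
                for row in data]


def accuracy(train_x, train_y, test_x, test_y):
    """Fit on the training set; return the fraction of test examples classified correctly."""
    predictions = NearestCentroid().fit(train_x, train_y).predict(test_x)
    return sum(p == t for p, t in zip(predictions, test_y)) / float(len(test_y))


def run_trials(data, targets, n_trials, percent, seed=0):
    """Repeat random stratified sampling n_trials times and record each accuracy."""
    rng = random.Random(seed)
    return [accuracy(*stratified_split(data, targets, percent, rng))
            for _ in range(n_trials)]


# Hypothetical, well separated data: 10 subjects in group 0, then 10 in group 1.
data = ([[0.0 + 0.01 * i, 1.0] for i in range(10)] +
        [[5.0 + 0.01 * i, 1.0] for i in range(10)])
targets = make_targets([10, 10])
scores = run_trials(data, targets, n_trials=5, percent=50)
print(scores)  # every trial scores 1.0 on this separable toy data
```

Sampling each class separately is what keeps a training set from being drawn entirely from one group. Note that a `percent` of 100 would leave the testing set empty, which this sketch does not guard against.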

4 Results

The result of the main function, sent to either stdout or a given text file, displays a few things. First the initial parameters of the classification experiment are shown: N, the number of random samplings to do for each algorithm; C, the number of components to be used for PCA and ICA; and P, the percent of each class to be used in constructing a training set.

Then we see the mean/max/min/std accuracy of each algorithm, on each data set, with each preprocessing method, after N iterations of random sampling P percent of each group. This information was generated on all combinations of data sets for a range of different values of P to demonstrate the number of samples required for training to achieve a threshold classification accuracy, as well as numerous values of C to estimate by eye the optimal number of components to use. ICA and PCA seem to perform best at around 10 components.

Results were generated using the following parameters: 1000 trials of random sampling for each algorithm on each data subset; 20%, 30%, 50%, 75%, and 90% training set sizes; and 10 components for ICA and PCA. The results for the 90% training set sizes are shown below.

Data set A is comprised of 164 subjects from three class groups: control, schizophrenia, and bipolar; and three modalities: fMRI, FA, and fMRI+FA.

Group Comparison                              Accuracy  Modality  Feature Extraction Method     Classifier
Control vs Schizophrenia vs Bipolar Disorder  .874      fMRI+FA   Linear Discriminant Analysis  Logistic Regression
Control vs Schizophrenia                      .965      fMRI+FA   Linear Discriminant Analysis  K-Nearest Neighbors
Control vs Bipolar Disorder                   .916      fMRI+FA   Linear Discriminant Analysis  Logistic Regression
Schizophrenia vs Bipolar Disorder             .902      fMRI      Linear Discriminant Analysis  Random Forests

Table 1: Best Results from Data Set A. This table shows the mean accuracy of the given classifier after 1000 trials of random sampling with training size set to 90%. The best results at that training size for each class comparison are shown.
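The mean/max/min/std summary reported for each algorithm can be computed from a list of per-trial accuracies as in the sketch below. The accuracy values and the name `summarize` are hypothetical, and the population form of the standard deviation is an assumption, since the text does not say which form was reported.

```python
import math


def summarize(accuracies):
    """Return the mean, max, min and standard deviation of per-trial accuracies."""
    n = float(len(accuracies))
    mean = sum(accuracies) / n
    # Population standard deviation (divide by n); an assumption of this sketch.
    std = math.sqrt(sum((a - mean) ** 2 for a in accuracies) / n)
    return {"mean": mean, "max": max(accuracies), "min": min(accuracies), "std": std}


# Hypothetical accuracies from four rounds of random sampling.
trial_accuracies = [0.75, 1.0, 0.5, 0.75]
summary = summarize(trial_accuracies)
print(summary)
```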

Doing a three group comparison including all classes in training and testing, using just the fMRI data after LDA, logistic regression (LR) achieved 81.5% accuracy using only 20% of the data to train on, but did not do better than 82% even at 90% training set size. Logistic regression also did well using the fMRI+FA data, reaching 85.7% at 20% training size and 87.8% with 90% training size.

Neither ICA nor PCA did much better than 70% for any classifier, often achieving accuracies of 40% or lower in analyzing many modalities. While they performed better in some modalities than others, the relationship between the modalities and the accuracies appears in line with the variance in the LDA results. Using LDA on data set A, the combination of functional MRI data with fractional anisotropy from a diffusion tensor imaging scan performed better than either alone, though independently fMRI led to higher accuracies.

When comparing just controls and schizophrenia subjects, attempting to diagnose schizophrenia in the testing data subjects, K-nearest neighbors (KNN) achieved 96.5% accuracy on the LDA transformed data with 90% training set size. When comparing just controls and bipolar subjects, attempting to diagnose bipolar disorder in the testing data subjects, logistic regression achieved 91.6% accuracy with the LDA transformed data and training on 90% of the data. When trying to differentiate between bipolar and schizophrenia subjects, best results were achieved using the fMRI data alone, where after LDA, random forests (RF) reached 90.2% classification accuracy after 1000 trials of random sampling.

Data set B is composed of 63 subjects from two class groups: control and schizophrenia; and 7 modalities: FA, GM, ALFF, FA+GM, FA+ALFF, GM+ALFF, GM+ALFF+FA.

Group Comparison          Accuracy  Modality     Feature Extraction Method     Classifier
Control vs Schizophrenia  .967      ALFF+GM+FA   Linear Discriminant Analysis  Random Forests

Table 2: Best Results from Data Set B. This table shows the mean accuracy after 1000 trials of random sampling with training size set to 90%. The best results at that training size for each class comparison are shown.

Best results in diagnosing schizophrenia correctly were achieved using random forests (RF) at 90% training size using only the merged data after LDA, scoring 96.7%. At 20% training size, the best algorithm was Bernoulli naive Bayes (BNB), scoring 79.3%. Both achieved their accuracy using the LDA transformed data.

From our results it is clear that preprocessing plays an important role in machine learning for this project. Prior to using LDA to transform the data before training and testing with the classifiers, no algorithm achieved results better than around 60%. In data set A, LDA improved the classification accuracy of all algorithms to above 80% at 90% training size. At 96% and 98% training sizes, LDA allowed classification accuracies of over 97% for diagnosing schizophrenia subjects versus control in data set A. While these accuracies are very high, they are artificially high due to the classifier overfitting to the data set and may not properly indicate how the classifiers would generalize in practice.

Preprocessing using LDA far outperformed preprocessing using PCA or ICA in classification after data transformation. The reason for this is likely that LDA operates with knowledge of the class groupings when determining which features are most important, while PCA and ICA are blind to the class of a given subject when comparing features.

Training size has a clear effect on classification accuracy, larger training sets meaning the classifier has more information on which to base its predictions. Note that this trend of higher accuracy with larger training sets is not constant in these results as the accuracies are the mean of 100 iterations of random

sampling. Some permutations of randomly selected training set subjects and testing set subjects may lead to higher or lower accuracies than others. Because a finite number of random trials are done, there is the possibility that classifiers will 'get lucky' and do better than others with larger training sets, but with a large number of trials, training size is expected to be more or less linearly related to classification accuracy in practice.

It is clear in the three tables above that PCA and ICA do quite poorly in comparison to LDA, barely exceeding 50% accuracy even at larger training set sizes, while classifiers using LDA reduced data of the same modalities were able to achieve accuracies ranging from 60% to 89%. It is clear given the above table that certain modalities or combinations of modalities allow the classifiers to perform better than when using other modalities. In the above table, fMRI data appears to be more indicative of bipolar disorder than FA data, but when both are examined together, the classifier outperforms classifiers looking at either modality individually.
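The earlier remark about classifiers 'getting lucky' can be illustrated with a small simulation. It assumes, purely for illustration, a classifier whose true accuracy is fixed at 0.8 and a testing set of 7 subjects; neither number comes from the results above, and `mean_accuracy` and `spread` are hypothetical helpers. Averaging over more trials makes the reported mean depend far less on the luck of individual draws.

```python
import random


def mean_accuracy(true_accuracy, test_size, n_trials, rng):
    """Average the accuracy observed over n_trials simulated testing sets."""
    total = 0.0
    for _ in range(n_trials):
        correct = sum(rng.random() < true_accuracy for _ in range(test_size))
        total += correct / float(test_size)
    return total / n_trials


def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


rng = random.Random(0)
# Repeat the whole experiment 20 times with few trials and with many trials.
few = [mean_accuracy(0.8, 7, 10, rng) for _ in range(20)]
many = [mean_accuracy(0.8, 7, 1000, rng) for _ in range(20)]
# The spread across repetitions is expected to be much smaller with 1000 trials.
print(spread(few))
print(spread(many))
```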

5 Discussion and Future Direction

In this thesis, multiple neurological data sets were analyzed by eight open source machine learning classifiers after the data had been transformed by three open source preprocessing algorithms. Significant classification accuracies were achieved in diagnosing mental illness and differentiating between different mental illnesses using freely available software. This work supports the idea that analysis can be done cheaply and well should scientific data be made available for analysis by the public.

While the machine learning algorithms do most of the hard work for us, many problems can and will appear during the construction of software for data analysis purposes. Because the data sets we use machine learning algorithms to analyze are too large and complex to extract useful information from by hand, it can be easy to confuse or misrepresent some aspect of the data in a way that can produce erroneous results that may not immediately appear suspect, or if the researcher is lucky, will produce results that are obviously erroneous. Common examples of data manipulation mistakes include testing classification accuracy on samples the classifier has already seen and trained on, leading to near perfect scores, or losing the proper group/class labeling, causing the classifier to perform no better than a random guess in class prediction. If programming related errors such as those just mentioned occur, they can lead the researchers to false conclusions about the integrity of the data or the machine learning algorithms being used.

Classifiers have varied uses and applications but they are driven by two main functions. Supervised classifiers 'fit' to a training data set with respect to the class of each sample in the training data. This does not create data, but prepares the classifier for prediction. Predicting the class of the testing

PAGE 24

datasetgivestheuserthecorrectclassandthepredictedclassofeachexampletheclassiertriedtopredicttheclassof,allowingtheconstructionof a'classicationaccuracy'forthattestingset,bestviewedusingaconfusion matrix.Properlyorganizingthepipelineoftheresultscanbeacomplextask whencomparingdierentfusionsofrelateddatasetsorrunningsomethingat arangeofdierentparametersseekingoptimization.Havinganabstracted interfacethatallowspowerfulquerieswithouttheoverheadofsettingupa databasefacilitatesthis. Combiningclassierstogethertoforma'meta-classier'isanintriguing optionforincreasingclassicationaccuracy.Eachclassierweusepredictsthe classofanexampleanditsaccuracyiscomputedbycomparingtheprediction againsttheknownclassofthetestingexample.Theideaistobuildaclassier whoseinputsaretheoutputsofmultipleclassiers,wherethe'meta-classier' couldgainincreasedaccuracyinlightofincorrectclassicationsbythelower levelclassiers. Thisthesiscouldbeexpandeduponbyaddingtothelistofclassiersand dimensionalityreductiontechniqueswechosetoemploy.Thesklearnmachine learninglibraryforpythonthatwasusedhastensofotherclassiersthatwere notselectedbutmaywellperformbetterforthisproblemspaceorothers.The conceptofhighthroughputmachinelearningclassication,usingallavailable toolstoidentifywhatworksbestwithouttheoverheadofthinking,isan intriguingidea.Mostpublishedpapersinthemachinelearningcommunity appeartoselectonlyafewdierentalgorithms.Unlessadatasetisofsuch asizethatcomputingresourcesbecomelimited,ashotgun-likeapproachto determiningthebesttoolisimportant. Oneofthemoreinterestingaspectsofthisthesisisthemonetarycost involvedincarryingouttheanalysis.Whilethedatacollectionrequiredthe 20

PAGE 25

concertedeortofmanyresearcherswhocollectivelyspentthousandsofhours scanningpatientsaswellasthecostinvolvedinbuyingandmaintaininga multimilliondollarliquidnitrogencooled3-teslamagneticresonanceimaging machine,allanalysisdescribedinthisthesisusedfreelyavailablesoftware,the entireprocesscostingnomorethanthepowerusedbythepersonalcomputer andthestimulantsusedtokeeptheunpaidundergraduateresearcheralert. Additionally,HIPAAsafemedicaldata,bothgeneticandimaging,isincreasinglybeingmadefreelyandpubliclyavailableontheinternetforanalysis byanyoneinterested.Thisrepresentsaparadigmshiftinthedemographicsof researcherswhohaveaccesstodatathatstillcontainsnoveldiscoverieswaiting tobefoundandresourcesforanalysisthathavelongbeenlimitedtoemployees ofcorporateandgovernmentinstitutions,achangethatcouldfundamentally shifttheorganizationandstructureofthescienticcommunity. 21
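The discussion describes scoring predictions with a confusion matrix and combining several classifiers into a 'meta-classifier'. The following is a minimal sketch of both ideas, not the code used in the thesis: it uses only the Python standard library instead of sklearn, the function names are hypothetical, the label lists are made-up toy values, and a simple majority vote stands in for the learned meta-classifier the text proposes.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return matrix[i][j]: samples of class labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the known class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_vote(predictions_per_classifier):
    """Combine the lower-level predictions sample by sample.

    Assumption: a plain vote, with ties going to the label seen first.
    """
    combined = []
    for votes in zip(*predictions_per_classifier):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical toy labels for illustration only (not results from the thesis).
labels = ["HC", "SZ", "BP"]
y_true = ["SZ", "SZ", "HC", "HC", "BP", "BP"]
clf_a = ["SZ", "HC", "HC", "HC", "BP", "SZ"]
clf_b = ["SZ", "SZ", "HC", "BP", "BP", "BP"]
clf_c = ["HC", "SZ", "SZ", "HC", "BP", "BP"]

meta = majority_vote([clf_a, clf_b, clf_c])

for name, pred in [("A", clf_a), ("B", clf_b), ("C", clf_c), ("meta", meta)]:
    print(name, round(accuracy(y_true, pred), 3), confusion_matrix(y_true, pred, labels))
```

In this toy case the vote corrects every individual mistake because no two classifiers err on the same sample. With real classifiers whose errors are correlated, the gain would likely be smaller or absent.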

6 Appendix

Dataset A

62 healthy controls (HC, age 38 ± 17, 30 females), 54 patients with schizophrenia (SZ, age 37 ± 12, 22 females) and 48 patients with bipolar disorder (BP, age 37 ± 14, 26 females) were recruited at the Olin Neuropsychiatric Research Center and were scanned by both fMRI and DTI. All subjects gave written, informed, Hartford Hospital IRB-approved consent. Schizophrenia or bipolar disorder was diagnosed according to DSM-IV-TR criteria on the basis of a structured clinical interview (First et al., 1995) administered by a research nurse and review of the medical file. All patients were stabilized on medication prior to the scan session in this study. Healthy participants were screened to ensure they were free from DSM-IV Axis I or Axis II psychopathology (assessed using the SCID; Spitzer et al., 1996) and also interviewed to determine that there was no history of psychosis in any first-degree relatives. All subjects were urine-screened to eliminate those who were positive for abused substances. Patients and controls were age and gender matched, with no significant differences among the 3 groups (age: p = 0.93, F = 0.07, DF = 2; sex: p = 0.99, χ² = 0.017, DF = 2). All participants had normal hearing, and were able to perform the oddball task successfully during practice prior to the scanning session. [14]

The auditory oddball task involved subjects encountering three frequencies of sounds: target (Hz, with probability p = 0.09), novel (computer-generated complex tones, p = 0.09), and standard (Hz, p = 0.82), presented through a computer system via sound-insulated, MR-compatible earphones. Stimuli were presented sequentially in pseudorandom order for 200 ms each, with the inter-stimulus interval (ISI) varying randomly from 500 to 2050 ms. Subjects were asked to make a quick button-press response with their right index finger upon each presentation of each target stimulus; no response was required for the other two stimuli. Two runs of 244 stimuli were presented (Kiehl et al., 2001). [14]

Scans were acquired at the Institute of Living, Hartford, CT on a 3 T dedicated head scanner (Siemens Allegra) equipped with 40 mT/m gradients and a standard quadrature head coil. The functional scans were acquired using gradient-echo echo planar imaging (EPI) with the following parameters: repeat time (TR) = 1.5 s, echo time (TE) = 27 ms, field of view = 24 cm, acquisition matrix = 64 × 64, flip angle = 70°, voxel size = 3.75 × 3.75 × 4 mm³, slice thickness = 4 mm, gap = 1 mm, and number of slices = 29; ascending acquisition. Six dummy scans were carried out at the beginning to allow for longitudinal equilibrium, after which the paradigm was automatically triggered to start by the scanner. DTI images were acquired via single-shot spin-echo echo planar imaging (EPI) with a twice-refocused balance echo sequence to reduce eddy current distortions, TR/TE = 5900/83 ms, FOV = 20 cm, acquisition matrix = 128 × 96, reconstruction matrix 128 × 128, 8 averages, b = 0 and 1000 s/mm² along 12 non-collinear directions, 45 contiguous axial slices with 3 mm slice thickness.

fMRI preprocessing. fMRI data were preprocessed using the software package SPM5 (http://www.fil.ion.ucl.ac.uk/spm/software/spm5/) (Friston et al., 2005). Images were realigned using INRIalign, a motion correction algorithm unbiased by local signal changes (Freire et al., 2002). Data were spatially normalized into the standard MNI space (Friston et al., 1995) and smoothed with a 12 mm³ full width at half-maximum Gaussian kernel. The data, originally 3.75 × 3.75 × 4 mm, were slightly subsampled to 3 × 3 × 3 mm, resulting in 53 × 63 × 46 voxels. [14]
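The acquisition parameters above can be cross-checked with a little arithmetic: the in-plane voxel edge is the field of view divided by the matrix size. The sketch below is only an illustration, with a hypothetical function name. It assumes a square field of view, and the DTI value is derived from the reconstruction matrix rather than stated in the text.

```python
def in_plane_voxel_mm(fov_cm, matrix_size):
    """In-plane voxel edge in mm: field of view (cm, converted to mm) / matrix size."""
    return fov_cm * 10.0 / matrix_size


# fMRI: 24 cm field of view on a 64 x 64 acquisition matrix.
fmri_voxel = in_plane_voxel_mm(24, 64)

# DTI: 20 cm field of view on the 128 x 128 reconstruction matrix
# (derived here as an assumption; the source does not state this value).
dti_voxel = in_plane_voxel_mm(20, 128)

# Volume of one acquired fMRI voxel versus one resampled 3 x 3 x 3 mm voxel.
acquired_volume = fmri_voxel * fmri_voxel * 4.0
resampled_volume = 3.0 ** 3

print(fmri_voxel, dti_voxel, acquired_volume, resampled_volume)
```

The fMRI result agrees with the 3.75 mm in-plane voxel size quoted in the scan parameters.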

Dataset B

DTI, resting state fMRI and sMRI data were collected from 63 subjects recruited as part of a multimodal schizophrenia center for biomedical research excellence (COBRE) study at the Mind Research Network (http://COBRE.mrn.org). Informed consent was obtained from all subjects according to institutional guidelines at the University of New Mexico (UNM). All subjects were screened and excluded if they had a history of neurological disorder, history of mental retardation, history of severe head trauma with more than 5 minutes loss of consciousness, or history of substance abuse or dependence within the last 12 months (except for nicotine). Healthy controls were free from any Axis I disorder, as assessed with the SCID-NP (Structured Clinical Interview for DSM-IV-TR, Non-patient version). Patients met criteria for schizophrenia defined by the DSM-IV-TR based on the SCID-P interview (First, 1995). All patients were on stable medication prior to the fMRI scan session. The two groups did not differ with regard to age, gender and ethnicity.

Imaging Parameters. All the data were collected on a 3-Tesla Siemens Trio scanner with a 12-channel radio frequency coil at the Mind Research Network. [15]

Results

Below are more detailed results of another classification experiment with different parameters. These were computed using 100 iterations of random sampling at 10%, 25%, 50%, 75%, and 90% training set sizes, with 10 components used for ICA and PCA. These depict the effect of training size and modality on the accuracy of the classifiers. The accuracy and name of the most accurate classifier are displayed for each pairing of modality and training size.
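The sampling protocol just described (100 random train/test splits at each training set size) can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: it uses only the Python standard library, a one-dimensional nearest-centroid rule stands in for the sklearn classifiers, the data are synthetic, and all function names are hypothetical.

```python
import random
from statistics import mean


def nearest_centroid_fit(samples, labels):
    """Return {class: mean feature value} for 1-D samples."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_class.items()}


def nearest_centroid_predict(centroids, samples):
    """Assign each sample to the class whose centroid is closest."""
    return [min(centroids, key=lambda y: abs(x - centroids[y])) for x in samples]


def subsample_accuracy(samples, labels, train_fraction, n_trials, rng):
    """Mean test accuracy over n_trials random train/test splits."""
    n = len(samples)
    # Keep at least one sample on each side of the split.
    n_train = min(n - 1, max(1, round(train_fraction * n)))
    scores = []
    for _ in range(n_trials):
        order = list(range(n))
        rng.shuffle(order)
        train, test = order[:n_train], order[n_train:]
        centroids = nearest_centroid_fit([samples[i] for i in train],
                                         [labels[i] for i in train])
        predicted = nearest_centroid_predict(centroids, [samples[i] for i in test])
        hits = sum(p == labels[i] for p, i in zip(predicted, test))
        scores.append(hits / len(test))
    return mean(scores)


# Synthetic two-class data for illustration only (not the thesis data sets).
rng = random.Random(0)
samples = ([rng.gauss(0.0, 1.0) for _ in range(50)]
           + [rng.gauss(1.5, 1.0) for _ in range(50)])
labels = ["HC"] * 50 + ["SZ"] * 50

results = {
    fraction: subsample_accuracy(samples, labels, fraction, n_trials=100, rng=rng)
    for fraction in (0.10, 0.25, 0.50, 0.75, 0.90)
}
for fraction, score in results.items():
    print(f"{fraction:.0%} training: mean accuracy {score:.3f}")
```

Because each test set is drawn only from samples left out of training, the score is never computed on data the classifier has already seen.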

Percent Training Size  fMRI       FA         fMRI+FA
10%                    LR .798    LR .629    LR .834
25%                    LR .816    LR .658    LR .861
50%                    LR .820    LSVM .667  LR .872
75%                    LSVM .817  LR .665    LR .876
90%                    LR .822    RF .680    LR .889
Table 3: Dataset A - Linear Discriminant Analysis - Schizophrenia vs Bipolar vs Control

Percent Training Size  fMRI       FA         fMRI+FA
10%                    BNB .405   KNN .404   KNN .417
25%                    GNB .417   RF .420    KNN .440
50%                    GNB .437   RF .448    RF .470
75%                    GNB .426   RF .462    RF .472
90%                    RF .456    RF .469    RF .526
Table 4: Dataset A - Independent Component Analysis - Schizophrenia vs Bipolar vs Control

Percent Training Size  fMRI       FA         fMRI+FA
10%                    LSVM .381  KNN .437   RF .411
25%                    GNB .418   KNN .457   GNB .461
50%                    GNB .460   LR .470    RF .487
75%                    GNB .467   LSVM .503  LR .502
90%                    LR .478    LSVM .520  BNB .517
Table 5: Dataset A - Principal Component Analysis - Schizophrenia vs Bipolar vs Control

Percent Training Size  fMRI       FA         fMRI+FA
10%                    KNN .890   LR .801    LR .933
25%                    LR .893    LR .817    GNB .949
50%                    LR .890    LR .827    GNB .957
75%                    LSVM .884  LR .820    KNN .962
90%                    GNB .883   LR .837    KNN .967
Table 6: Dataset A - Linear Discriminant Analysis - Schizophrenia vs Control

Percent Training Size  fMRI       FA         fMRI+FA
10%                    BNB .567   KNN .580   RF .580
25%                    GNB .579   RF .624    KNN .644
50%                    GNB .596   RF .650    RF .677
75%                    RF .620    RF .656    RF .693
90%                    RF .648    RF .692    RF .737
Table 7: Dataset A - Independent Component Analysis - Schizophrenia vs Control

Percent Training Size  fMRI       FA         fMRI+FA
10%                    LR .558    KNN .665   LR .595
25%                    GNB .580   KNN .687   RF .648
50%                    GNB .619   KNN .713   RF .692
75%                    GNB .636   LSVM .711  RF .719
90%                    GNB .567   KNN .745   RF .722
Table 8: Dataset A - Principal Component Analysis - Schizophrenia vs Control

Percent Training Size  fMRI       FA         fMRI+FA
10%                    LR .846    LR .733    LR .876
25%                    LR .871    LR .787    LR .907
50%                    LR .879    LSVM .798  LR .911
75%                    LR .885    LR .803    LR .918
90%                    LR .876    KNN .806   LR .918
Table 9: Dataset A - Linear Discriminant Analysis - Bipolar vs Control

Percent Training Size  fMRI       FA         fMRI+FA
10%                    KNN .573   KNN .596   RF .581
25%                    GNB .612   RF .617    GNB .628
50%                    GNB .626   GNB .640   GNB .678
75%                    GNB .599   GNB .656   GNB .688
90%                    GNB .636   RF .679    GNB .710
Table 10: Dataset A - Independent Component Analysis - Bipolar vs Control

Percent Training Size  fMRI       FA         fMRI+FA
10%                    LSVM .571  LR .598    RF .583
25%                    GNB .595   KNN .621   GNB .623
50%                    GNB .611   LSVM .664  KNN .651
75%                    GNB .630   LSVM .690  LR .671
90%                    RF .638    LSVM .698  RF .688
Table 11: Dataset A - Principal Component Analysis - Bipolar vs Control

Percent Training Size  fMRI       FA         fMRI+FA
10%                    LR .879    LR .718    LR .874
25%                    LR .890    LR .745    LR .884
50%                    LR .892    RF .768    LR .897
75%                    RF .900    RF .759    LR .897
90%                    RF .915    KNN .802   LR .903
Table 12: Dataset A - Linear Discriminant Analysis - Schizophrenia vs Bipolar

Percent Training Size  fMRI       FA         fMRI+FA
10%                    BNB .543   LSVM .527  BNB .547
25%                    LSVM .532  BNB .552   BNB .537
50%                    BNB .563   BNB .553   BNB .559
75%                    LSVM .538  BNB .574   LR .550
90%                    RF .551    LR .575    LR .564
Table 13: Dataset A - Independent Component Analysis - Schizophrenia vs Bipolar

Percent Training Size  fMRI       FA         fMRI+FA
10%                    BNB .540   KNN .539   BNB .538
25%                    BNB .570   RBF .532   BNB .560
50%                    BNB .582   RBF .529   BNB .581
75%                    BNB .608   BNB .545   BNB .606
90%                    BNB .605   RBF .545   BNB .605
Table 14: Dataset A - Principal Component Analysis - Schizophrenia vs Bipolar

Percent Training Size  FA         ALFF       GM         FA+ALFF    ALFF+GM    FA+GM      FA+ALFF+GM
10%                    LR .740    LR .845    LR .922    LR .867    LR .945    LR .932    LR .950
25%                    LR .768    LSVM .856  LR .923    KNN .881   LR .960    LR .932    LR .962
50%                    LR .766    RBF .862   LR .926    GNB .883   LR .965    RBF .941   RBF .964
75%                    LR .766    RBF .874   RBF .948   GNB .883   LR .962    RBF .950   RBF .969
90%                    LR .807    RBF .877   RBF .949   LSVM .894  LR .961    RBF .953   RBF .966
Table 15: Dataset B - Linear Discriminant Analysis - Schizophrenia vs Control

Percent Training Size  FA         ALFF       GM         FA+ALFF    ALFF+GM    FA+GM      FA+ALFF+GM
10%                    RF .562    LSVM .552  LSVM .552  LR .552    RF .554    LSVM .552  RF .565
25%                    LR .573    LR .563    BNB .576   LR .598    RF .578    BNB .581   LR .584
50%                    BNB .613   RF .587    LR .567    LR .615    LR .591    BNB .618   LR .612
75%                    KNN .618   RF .623    KNN .608   LR .634    RF .616    BNB .604   LSVM .632
90%                    KNN .686   RF .647    KNN .671   LR .670    RF .649    BNB .657   RF .627
Table 16: Dataset B - Independent Component Analysis - Schizophrenia vs Control

Percent Training Size  FA         ALFF       GM         FA+ALFF    ALFF+GM    FA+GM      FA+ALFF+GM
10%                    LR .574    RBF .552   LSVM .567  RBF .552   RBF .552   RBF .553   RF .558
25%                    LSVM .583  RBF .563   LR .589    GNB .572   RBF .562   LR .602    RBF .562
50%                    LR .613    RF .574    RBF .621   RF .596    LR .584    LR .615    RF .603
75%                    LSVM .648  RF .622    RBF .644   RF .610    RF .609    RBF .615   RF .608
90%                    LSVM .643  LSVM .639  RBF .664   RF .660    RF .636    RBF .627   RF .634
Table 17: Dataset B - Principal Component Analysis - Schizophrenia vs Control

References

[1] The Mind Research Network, a subsidiary of the Lovelace Respiratory Research Institute, www.mrn.org
[2] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12 (November 2011), 2825-2830.
[3] Comon, Pierre: "Independent Component Analysis: a new concept?", Signal Processing, 36:287. The original paper describing the concept of ICA.
[4] Ubuntu Linux, http://www.ubuntu.com/
[5] http://www.holehouse.org/mlclass/01_02_Introduction_regression_analysis_and_gr.html
[6] Scikit-learn: machine learning in Python, http://scikit-learn.org/stable/
[7] SciPy, http://www.scipy.org/
[8] IPython Interactive Python + Bash Shell, http://ipython.org/
[9] Python 2.7, http://www.python.org/download/releases/2.7/
[10] Sparse Principal Component Analysis. Hui Zou, Trevor Hastie, Robert Tibshirani. Journal of Computational and Graphical Statistics, Vol. 15, Iss. 2, 2006.
[11] Geany development environment, http://www.geany.org/
[12] LDA as a classifier, http://scikit-learn.org/stable/modules/generated/sklearn.lda.LDA.html#sklearn.lda.LDA
[13] Numerical Recipes 3rd Edition, http://apps.nrbook.com/empanel/index.html#pg=883
[14] American Psychiatric Association. Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC.
[15] Ogawa, S.; Sung, Y., "Functional Magnetic Resonance Imaging", Scholarpedia 2:3105, doi:10.4249/scholarpedia.3105
[16] http://en.wikipedia.org/wiki/Fractional_anisotropy
[17] Amplitude of low frequency fluctuation of BOLD signal and resting-state functional connectivity analysis of brains of Parkinson's disease. Q. Chen, S. Lui, S-S. Zhang, H-H. Tang, J-G. Wang, X-Q. Huang, Q-Y. Gong, D. Zhou, and H-F. Shang; Department of Neurology, West China Hospital of Sichuan University, Chengdu, Sichuan, China.
[18] Li, M., Cui, L., Deng, W., Ma, X., Huang, C., Jiang, L., & ... Li, T. "Voxel-based morphometric analysis on the volume of gray matter in bipolar I disorder". Psychiatry Research: Neuroimaging.
[19] http://en.wikipedia.org/wiki/Diffusion_tensor_imaging#Diffusion_tensor_imaging
[20] Discriminating schizophrenia and bipolar disorder by fusing fMRI and DTI in a multimodal CCA+ joint ICA model. Jing Sui, Godfrey Pearlson, Arvind Caprihan, Tülay Adali, Kent A. Kiehl, Jingyu Liu, Jeremy Yamamoto, Vince D. Calhoun.

