ADAPTIVE ON-DEVICE DEEP LEARNING SYSTEMS

By

Biyi Fang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Electrical Engineering - Doctor of Philosophy

2019

ABSTRACT

ADAPTIVE ON-DEVICE DEEP LEARNING SYSTEMS

By

Biyi Fang

Mobile systems such as smartphones, drones, and augmented-reality headsets are revolutionizing our lives. On-device deep learning is regarded as the key enabling technology for realizing their full potential. This is because communication with the cloud adds additional latency or cost, and the applications must operate even with intermittent internet connectivity.

The key to achieving the full promise of these mobile vision systems is effectively analyzing the streaming video frames. However, processing streaming video frames taken in mobile settings is challenging in two ways. First, the processing usually involves multiple computer vision tasks. This multi-tenant characteristic requires mobile vision systems to concurrently run multiple applications that target different vision tasks. Second, the context in mobile settings can change frequently. This requires mobile vision systems to be able to switch applications to execute new vision tasks encountered in the new context.

In this article, we fill this critical gap by proposing NestDNN, a framework that enables resource-aware multi-tenant on-device deep learning for continuous mobile vision. NestDNN enables each deep learning model to offer flexible resource-accuracy trade-offs. At runtime, it dynamically selects the optimal resource-accuracy trade-off for each deep learning model to fit the model's resource demand to the system's available runtime resources. In doing so, NestDNN efficiently utilizes the limited resources in mobile vision systems to jointly maximize the performance of all the concurrently running applications.

Although NestDNN is able to efficiently utilize the resources by being resource-aware, it essentially treats the content of each input image equally and hence does not realize the full potential of such pipelines. To realize its full potential, we further propose FlexDNN, a novel content-adaptive framework that enables computation-efficient DNN-based on-device video stream analytics based on an early exit mechanism. Compared to state-of-the-art early exit-based solutions, FlexDNN addresses their key limitations and pushes the state-of-the-art forward through its innovative fine-grained design and automatic approach for generating the optimal network architecture.

Copyright by
BIYI FANG
2019

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter 1  Introduction
Chapter 2  Related Work
Chapter 3  Resource-Aware Multi-Tenant On-Device Deep Learning
  3.1  Introduction of NestDNN
  3.2  Challenges and Our Solutions
  3.3  NestDNN Overview
  3.4  Design of NestDNN
       3.4.1  Filter-based Model Pruning
              3.4.1.1  Background on CNN Architecture
              3.4.1.2  Benefits of Filter Pruning
              3.4.1.3  Filter Importance Ranking
              3.4.1.4  Performance of Filter Importance Ranking
              3.4.1.5  Filter Pruning Roadmap
       3.4.2  Freeze-&-Grow based Model Recovery
              3.4.2.1  Motivation and Key Idea
              3.4.2.2  Model Freezing and Filter Growing
              3.4.2.3  Superiority of Multi-Capacity Model
       3.4.3  Resource-Aware Scheduler
              3.4.3.1  Motivation and Key Idea
              3.4.3.2  Cost Function
              3.4.3.3  Scheduling Schemes
              3.4.3.4  Cached Greedy Heuristic Approximation
  3.5  Evaluation
       3.5.1  Datasets, DNNs and Applications
              3.5.1.1  Datasets
              3.5.1.2  DNN Models
              3.5.1.3  Mobile Vision Applications
       3.5.2  Performance of Multi-Capacity Model
              3.5.2.1  Experimental Setup
              3.5.2.2  Optimized Resource-Accuracy Trade-offs
              3.5.2.3  Reduction on Memory Footprint
              3.5.2.4  Reduction on Model Switching Overhead
       3.5.3  Performance of Resource-Aware Scheduler
              3.5.3.1  Experimental Setup
              3.5.3.2  Improvement on Accuracy and Frame Rate
              3.5.3.3  Reduction on Energy Consumption
  3.6  Conclusion of NestDNN

Chapter 4  Content-Adaptive On-Device Deep Learning
  4.1  Introduction of FlexDNN
  4.2  Background & Motivation
       4.2.1  Dynamics of Mobile Video Contents & Benefit of Leveraging the Dynamics
       4.2.2  Drawbacks of Existing Solutions
  4.3  Design of FlexDNN
       4.3.1  Filter Ranking based on Collective Importance
              4.3.1.1  Motivation
              4.3.1.2  Collective Importance-based Filter Ranking
              4.3.1.3  Superiority of Collective Importance-based Filter Ranking
       4.3.2  Searching for the Optimal Early Exit Architecture
              4.3.2.1  Motivation
              4.3.2.2  Early Exit Architecture Search Scheme
              4.3.2.3  Early Exit Rate
       4.3.3  Early Exit Insertion Plan
              4.3.3.1  Motivation
              4.3.3.2  Problem Formulation
              4.3.3.3  Efficiency of Early Exit
       4.3.4  Runtime Adaptation to Workload and System Resource Dynamics
              4.3.4.1  Motivation
              4.3.4.2  Accuracy-Resource Profile
              4.3.4.3  Runtime Adaptation
  4.4  Implementation
  4.5  Evaluation
       4.5.1  Methodology
       4.5.2  Model Performance
              4.5.2.1  High Early Exit Rate
              4.5.2.2  Computation-Efficient Early Exits
              4.5.2.3  High Computational Consumption Reduction
              4.5.2.4  Compact Memory Footprint
       4.5.3  Runtime Performance
              4.5.3.1  Top-1 Accuracy vs. Frame Processing Time
              4.5.3.2  Reduction on Energy Consumption
              4.5.3.3  Performance of Runtime Adaptation
       4.5.4  Performance on ImageNet
  4.6  Conclusion of FlexDNN

Chapter 5  Conclusion

BIBLIOGRAPHY
LIST OF TABLES

Table 3.1: Defined terminologies and their explanations.
Table 3.2: Summary of datasets, DNN models, and mobile vision applications used in this work.
Table 3.3: Benefit of multi-capacity model on memory footprint reduction.
Table 3.4: Benefit of multi-capacity model on model switching (model upgrade) in terms of memory usage.
Table 3.5: Benefit of multi-capacity model on model switching (model downgrade) in terms of memory usage.
Table 4.1: Summary of three applications, two base models, and six generated FlexDNN models.

LIST OF FIGURES

Figure 3.1: NestDNN architecture.
Figure 3.2: Illustration of filter pruning [1]. By pruning filters, both model size and computational cost are reduced.
Figure 3.3: Filter importance profiling performance of (a) TRR and (b) L1-norm on VGG-16 trained on CIFAR-10.
Figure 3.4: Illustration of model freezing and filter growing.
Figure 3.5: Illustration of model switching (model upgrade vs. model downgrade) of multi-capacity model.
Figure 3.6: Top-1 test accuracy vs. model size comparison between descendant models and baseline models.
Figure 3.7: Computational cost comparison between descendant models and vanilla models.
Figure 3.8: Model switching energy consumption comparison between multi-capacity models and independent models. The energy consumption is measured on a Samsung Galaxy S8 smartphone.
Figure 3.9: Profile of all accumulated simulations generated by our designed benchmark.
Figure 3.10: Runtime performance comparison between baseline and NestDNN under scheduling scheme: (a) MinTotalCost; (b) MinMaxCost.
Figure 3.11: Energy consumption comparison between baseline and NestDNN under scheduling scheme: (a) MinTotalCost; (b) MinMaxCost.
Figure 4.1: A conceptual overview of FlexDNN. FlexDNN is built on top of a base model with the augmentation of one or more early exits inserted throughout the base model. For an easy input, it exits at the early exit inserted at an earlier location since the extracted features are good enough to classify the content in the easy input. As such, the easy input avoids further computational cost incurred onward. For a hard input, it proceeds deeper until the extracted features are good enough to exit the hard input.
Figure 4.2: Illustration of four frames of a video clip of biking captured using a mobile camera in the UCF-101 dataset: (a) and (d) are frames with contents that are easy to recognize; (b) and (c) are frames with contents that are hard to recognize.
Figure 4.3: Blue solid curve: minimum computational consumption to correctly recognize the content in each frame (optimal model). Red dotted curve: computational consumption of the one-size-fits-all model.
Figure 4.4: Benefit brought by the adaptation vs. model switching overhead of the dynamic configuration approach.
Figure 4.5: FlexDNN architecture.
Figure 4.6: Comparison of the minimum number of filters needed to achieve the same accuracy as using all filters within each layer, between the collective importance-based ranking scheme and the scheme based on independent ranking.
Figure 4.7: Illustration of derivation of quality of an early exit.
Figure 4.8: Accuracy-latency profile under different values.
Figure 4.9: (a) UCF-15: example video frame of biking (left) and skiing (right).
(b) Place-8: illustration of data collection using an ORDRO EP5 head-mounted camera (left); example video frame of parking lot (right). (c) TDrone: illustration of data collection using a DJI Mavic Pro drone (left); example video frame of traffic surveillance in the residential area (right).
Figure 4.10: Comparison between FlexDNN and baseline approaches in the accuracy-frame processing time space.
Figure 4.11: (a) Comparison between the accumulated computational cost of all the early exits and the computational cost of the base model. (b) Comparison between saving (the average computation saved from early exiting per frame) and overhead (the average computation consumed by the early exits each frame goes through but fails to exit).
Figure 4.12: Cumulative exit rate at each early exit without loss of accuracy. Early exits (marked as E1, E2, ...) are ordered based on their distances to the input layer (i.e., E1 is the earliest exit). FE denotes the regular exit of the base model.
Figure 4.13: Memory footprint comparison between the FlexDNN model and its bag-of-model counterpart.
Figure 4.14: Comparison between FlexDNN and baseline approaches in the accuracy-energy consumption space.
Figure 4.15: Performance of runtime adaptation to workload and system resource dynamics.
Figure 4.16: Top-1 accuracy comparison between MSDNet-mobile ...

Chapter 1

Introduction

Mobile vision systems such as smartphones, drones, and augmented reality headsets are ubiquitous today. As AI chipsets emerge, these systems begin to support on-device live video stream processing, which is the enabler of a wide range of continuous mobile vision applications. This trend is fueled by the recent advancement in deep learning (i.e., Deep Neural Networks (DNNs)) [2] due to its success in achieving impressively high accuracies in many important vision tasks.

Continuous mobile vision applications require continuously processing the streaming video input and returning the processing results with low latency. Unfortunately, DNNs are known to be memory and computationally intensive [3], and high computational demand translates into high processing latency and high energy consumption. Thus, considering that mobile systems are constrained by limited resources in terms of computation, memory, and battery capacities, reducing the resource demands of DNNs is crucial to realizing the full potential of continuous mobile vision applications.

The key to achieving the full promise of these mobile vision systems is effectively analyzing the streaming video frames. However, processing streaming video frames taken in mobile settings is challenging in two ways. First, the processing usually involves multiple computer vision tasks. This multi-tenant characteristic requires mobile vision systems to concurrently run multiple applications that target different vision tasks. Second, the context in mobile settings can change frequently. This requires mobile vision systems to be able to switch applications to execute new vision tasks encountered in the new context.

In Chapter 3, we fill this critical gap by proposing NestDNN, a framework that enables resource-aware multi-tenant on-device deep learning for continuous mobile vision. It takes the dynamics of runtime resources in a mobile vision system into consideration, and dynamically selects the optimal resource-accuracy trade-off and resource allocation for each of the concurrently running deep learning models to jointly maximize their performance. We evaluate NestDNN using six mobile vision applications that target some of the most important vision tasks for mobile vision systems. Results show that it outperforms the resource-agnostic counterpart significantly. From NestDNN, we discover yet another problem that it does not solve.
Although NestDNN is able to achieve resource-aware multi-tenant on-device deep learning, it essentially treats the content of each input image equally. In Chapter 4, to realize the full potential of the DNN-based processing pipeline, we further propose FlexDNN, a novel content-adaptive framework that enables computation-efficient DNN-based on-device video stream analytics based on an early exit mechanism. Compared to state-of-the-art early exit-based solutions, FlexDNN addresses their key limitations and pushes the state-of-the-art forward through its innovative fine-grained design and automatic approach for generating the optimal network architecture. We use FlexDNN to build three computation-efficient continuous mobile vision applications on top of MobileNets. Our evaluation results show that FlexDNN significantly outperforms both computation-efficient content-agnostic and state-of-the-art content-adaptive approaches, reducing computational consumption by a large margin.

Lastly, we conclude this report in Chapter 5.

Chapter 2

Related Work

Our related work contains four parts: 1) Mobile Sensing Systems; 2) Deep Neural Network Model Compression; 3) Continuous Mobile Vision; and 4) Content-Adaptive Video Stream Analytics.

Mobile Sensing Systems. Our work is also broadly related to research in mobile sensing systems. Prior mobile sensing systems have explored a variety of sensing modalities that have enabled a wide range of innovative applications. Among them, accelerometer, microphone, and physiological sensors are some of the most explored sensing modalities. For example, Mokaya et al. developed an accelerometer-based system to sense skeletal muscle vibrations for quantifying skeletal muscle fatigue in an exercise setting [4]. Nirjon et al. developed MusicalHeart [5], which integrated a microphone into an earphone to extract heartbeat information from audio signals. Nguyen et al. designed an in-ear sensing system in the form of earplugs that is able to capture EEG, EOG, and EMG signals for sleep monitoring [6]. Recently, researchers have started exploring wireless radio signals as a contactless sensing mechanism. For example, Wang et al. developed WiFall [7], which used wireless radio signals to detect accidental falls. Fang et al. used radio as a single sensing modality for integrated activities-of-daily-living and vital sign monitoring [8]. In this work, we explore infrared light as a new sensing modality in the context of ASL translation. It complements existing mobile sensing systems by providing a non-intrusive and high-resolution sensing scheme. We regard this work as an excellent example to demonstrate the usefulness of infrared sensing for mobile systems. With the incoming era of virtual/augmented reality, we envision that infrared sensing will be integrated into many future mobile systems such as smartphones and smart glasses.

Deep Neural Network Model Compression. Model compression for deep neural networks has attracted a lot of attention in recent years due to the imperative demand for running deep learning models on mobile systems. One of the most prevalent methods for compressing deep neural networks is pruning. Han et al. [9] proposed a parameter pruning method that removes node connections with small weights. Although this method is effective at reducing model sizes, it does not effectively reduce computational costs. To overcome this problem, Li et al. [1] proposed a filter pruning method that has achieved up to 38% reduction in computational cost. Our work also focuses on compressing deep neural networks via filter pruning. Our proposed filter pruning approach outperforms the state-of-the-art. Moreover, unlike existing model compression methods which produce pruned models with fixed resource-accuracy trade-offs, our proposed multi-capacity model is able to provide dynamic resource-accuracy trade-offs. This is similar to the concept of dynamic neural networks in the deep learning literature.

Continuous Mobile Vision. The concept of continuous mobile vision was first advanced by Bahl et al.
[13]. The last few years have witnessed many efforts towards realizing the vision of continuous mobile vision. In particular, in [14], LiKamWa et al. proposed a framework named Starfish, which enables efficiently running concurrent vision applications on mobile devices by sharing common computation and memory objects across applications. Our work is inspired by Starfish in terms of sharing. By sharing parameters among descendant models, our proposed multi-capacity model has a compact memory footprint and incurs little model switching overhead. Our work is also inspired by [15]. In [15], Han et al. proposed a framework named MCDNN, which applies various model compression techniques to generate a catalog of model variants to provide different resource-accuracy trade-offs. However, in MCDNN, the generated model variants are independent of each other, and it relies on cloud connectivity to retrieve the desired model variant. In contrast, our work focuses on developing an on-device deep learning framework which does not rely on cloud connectivity. Moreover, MCDNN focuses on model sharing across concurrently running applications. In contrast, NestDNN treats each of the concurrently running applications independently, and focuses on model sharing across different model variants within each application.

Content-Adaptive Video Stream Analytics. FlexDNN is closely related to content-adaptive video stream analytics. In [18], the authors proposed a content-adaptive video stream analytics system named Chameleon that dynamically changes the DNN models to adapt to the difficulty levels of the video frames. However, it requires the system to carry all the model variants with various capacities and thus does not fit resource-constrained mobile platforms. To address this limitation, BranchyNet [19] and MSDNet [20] use a single model with augmented early exits to realize the adaptation to the difficulty levels of video frames. Similar to BranchyNet and MSDNet, FlexDNN also uses a single model with augmented early exits for content adaptation. However, FlexDNN differs from these pioneering works in that it adopts a fine-grained approach to make early predictions at the granularity of filters, and it automatically generates the optimal early exit architecture and the optimal early exit insertion plan with the objective of maximizing the benefit brought by early exits.

Chapter 3

Resource-Aware Multi-Tenant On-Device Deep Learning

3.1 Introduction of NestDNN

Mobile systems with onboard video cameras such as smartphones, drones, wearable cameras, and augmented-reality headsets are revolutionizing the way we live, work, and interact with the world. By processing the streaming video inputs, these mobile systems are able to retrieve visual information from the world and are promised to open up a wide range of new applications and services. For example, a drone that can detect vehicles, identify road signs, and track traffic flows will enable mobile traffic surveillance with aerial views that traditional traffic surveillance cameras positioned at fixed locations cannot provide [21]. A wearable camera that can recognize everyday objects, identify people, and understand the surrounding environments can be a life-changer for blind and visually impaired individuals [22].

The key to achieving the full promise of these mobile vision systems is effectively analyzing the streaming video frames. However, processing streaming video frames taken in mobile settings is challenging in two ways. First, the processing usually involves multiple computer vision tasks. This multi-tenant characteristic requires mobile vision systems to concurrently run multiple applications that target different vision tasks [14]. Second, the context in mobile settings can change frequently. This requires mobile vision systems to be able to switch applications to execute new vision tasks encountered in the new context [15].
In the past few years, deep learning (e.g., Deep Neural Networks (DNNs)) [2] has become the dominant approach in computer vision due to its capability of achieving impressively high accuracies on a variety of important vision tasks. As deep learning chipsets emerge, there is a significant interest in leveraging the on-device computing resources to execute deep learning models on mobile systems without cloud support. Compared to the cloud, mobile systems are constrained by limited resources. Unfortunately, deep learning is known to be resource-demanding [3]. To enable on-device deep learning, one of the common techniques used by application developers is compressing the deep learning model to reduce its resource demand, at a modest loss in accuracy as the trade-off [15, 29]. Although this technique has gained considerable popularity and has been applied to developing state-of-the-art mobile deep learning systems [17], it has a key drawback: since application developers develop their applications independently, the resource-accuracy trade-off of the compressed model is predetermined based on a static resource budget at the application development stage and is fixed after the application is deployed. However, the available resources in mobile vision systems at runtime are always dynamic because the concurrently running applications change depending on the context. As a consequence, when the resources available at runtime do not meet the resource demands of the compressed deep learning models, resource contention among concurrently running applications occurs, forcing the streaming video to be processed at a much lower frame rate. On the other hand, when extra resources become available at runtime, the compressed deep learning models cannot utilize them to regain their sacrificed accuracies.

In this work, we present NestDNN, a framework that takes the dynamics of runtime resources into consideration to enable resource-aware multi-tenant on-device deep learning for mobile vision systems. It replaces fixed resource-accuracy trade-offs with flexible resource-accuracy trade-offs, and dynamically selects the optimal resource-accuracy trade-off of the deep learning model at runtime to fit the model's resource demand to the system's available runtime resources.

At its core, NestDNN employs a pruning and recovery scheme which transforms an off-the-shelf deep learning model into a single compact multi-capacity model. The multi-capacity model has two key features. First, the multi-capacity model is comprised of a set of sub-models. Each sub-model has a unique capacity to provide an optimized resource-accuracy trade-off. Second, unlike traditional model variants that are independent of each other, the sub-model with smaller capacity shares its model architecture and its model parameters with the sub-model with larger capacity, making itself nested inside the sub-model with larger capacity without taking extra memory space. By doing so, the multi-capacity model is able to provide various resource-accuracy trade-offs with a compact memory footprint.

The creation of the multi-capacity model enables NestDNN to jointly maximize the performance of concurrent vision applications running on mobile vision systems. This possibility comes from two key insights. First, while a certain amount of runtime resources can be traded for an accuracy gain in one vision application, the same amount of runtime resources can be traded for a much larger accuracy gain in another vision application.
Second, some vision applications do not require real-time response and thus can tolerate a relatively large inference latency (e.g., scene understanding). This presents an opportunity to reallocate some runtime resources from the latency-tolerant vision applications to vision applications that need more runtime resources to meet their real-time requirements (e.g., road sign identification). NestDNN exploits these insights by incorporating the accuracy and inference latency into a cost function for each vision application. Given the cost functions of all the concurrently running vision applications, NestDNN employs a runtime scheduler that selects the most suitable sub-model for each application and determines the optimal amount of runtime resources to allocate to each selected sub-model, so as to jointly maximize the accuracy and minimize the inference latency of the concurrent vision applications.

We have conducted a rich set of experiments to evaluate the performance of NestDNN. To examine the performance of the multi-capacity model, we evaluated it on six mobile vision applications that target some of the most important computer vision tasks for mobile vision systems. These applications are developed based on two widely used deep learning models, VGGNet and ResNet, and six commonly used datasets from the computer vision community. To examine the performance of runtime scheduling, we implemented NestDNN and the six mobile vision applications on three smartphones. The detailed contributions and important results are summarized as follows:

Multi-Capacity Model. We have designed a pruning and recovery scheme to transform an off-the-shelf deep learning model into a single compact multi-capacity model that is able to provide optimized resource-accuracy trade-offs. The pruning and recovery scheme consists of a model pruning phase and a model recovery phase. For model pruning, we have devised a state-of-the-art filter pruning approach named Triplet Response Residual (TRR) that significantly reduces not only the size of a deep learning model but also its computational cost. For model recovery, we have devised an innovative model freezing and filter growing (i.e., freeze-&-grow) approach that generates the multi-capacity model in an iterative manner. Our experimental results show that the NestDNN multi-capacity model is able to provide optimized resource-accuracy trade-offs, and significantly reduces model memory footprint as well as model switching overhead.

Resource-Aware Scheduler. We have designed a runtime scheduler that jointly maximizes the accuracy and minimizes the inference latency of concurrent vision applications running on mobile vision systems. The available runtime resources in mobile vision systems are dynamic and can change due to events such as starting new applications, closing existing applications, and application priority changes. The scheduler is able to select the most appropriate accuracy-resource trade-off for each of the concurrent vision applications and determine the amount of runtime resources to allocate to each application to optimize the scheduling objective. Our experimental results show that the NestDNN runtime scheduler outperforms the non-resource-aware counterpart on two state-of-the-art scheduling schemes, achieving as much as a 4.2% increase in accuracy, a 2.0x increase in frame rate, and a 1.7x reduction in energy consumption when running concurrent mobile vision applications.

To the best of our knowledge, NestDNN represents the first framework that enables resource-aware multi-tenant on-device deep learning for continuous mobile vision. It contributes novel techniques that address the unique challenges in continuous mobile vision. We believe that our work represents a significant step toward turning the envisioned continuous mobile vision into reality [13, 14, 34].

3.2 Challenges and Our Solutions

The design of NestDNN presents a number of challenges. In this section, we describe these challenges, followed by explaining how they can be effectively addressed by NestDNN.
Computational Intensity of Deep Learning Models. The foundation of NestDNN is the generation of a multi-capacity model from an off-the-shelf deep learning model. The multi-capacity model needs to be compact and computationally lightweight such that it is able to run efficiently on resource-limited mobile systems. However, deep learning models are known to be memory and computationally intensive [35, 36]. Running one deep learning model is sometimes challenging for mobile systems, let alone running multiple of them concurrently. To address this challenge, we propose a state-of-the-art deep learning model pruning approach named Triplet Response Residual (TRR). Most widely used pruning approaches focus on pruning model parameters [9, 37]. Although pruning parameters can significantly reduce model size, it does not necessarily reduce computational cost in terms of floating point operations (i.e., FLOPs) [1, 38, 39], making it less useful for mobile systems which need to provide real-time services. In contrast, our approach focuses on pruning filters, which effectively reduces both the size of a deep learning model and its computational cost.

Figure 3.1: NestDNN architecture.

Large Amount of Model Variants. One key feature of NestDNN is the provision of flexible trade-offs between accuracy and resource use. To achieve this resource-aware flexibility, one naive approach is to have all the possible model variants with various resource-accuracy trade-offs installed in the mobile system. However, since these model variants are independent of each other, this approach is not scalable and becomes infeasible when the mobile system concurrently runs multiple deep learning models, with each of them having multiple model variants. To address this challenge, we propose an innovative freeze-&-grow approach that uses the filter pruning roadmap generated during the model pruning phase as the guideline to produce a single multi-capacity model with all model variants within itself. More importantly, unlike traditional model variants that are independent of each other, these model variants share model architecture and parameters with each other, giving the multi-capacity model a compact memory footprint.

Lack of Resource-Aware Scheduler. The available runtime resources in mobile vision systems are dynamic and change due to events such as starting new applications, closing existing applications, and application priority changes. As such, using a model with a fixed resource-accuracy trade-off for a vision application will make the application underperform when extra resources become available. To address this problem, we propose a runtime scheduler that is resource-aware. Once runtime resources become available, the scheduler determines which resource-accuracy trade-off to adopt for each application and how many runtime resources to allocate to each application. As such, runtime resources are best utilized to maximize the performance of all the vision applications concurrently running on a mobile vision system.

3.3 NestDNN Overview

Figure 3.1 illustrates the architecture of NestDNN, which is split into an offline stage and an online stage. The offline stage consists of three phases: model pruning, model recovery, and model profiling.

In the model pruning phase, by following the filter importance ranking provided by our TRR approach, filters in a given trained off-the-shelf deep learning model (i.e., the vanilla model) are iteratively pruned. During each iteration, less important filters are pruned, and the pruned model is retrained to compensate for the accuracy degradation (if there is any) caused by filter pruning. The iteration ends when the pruned model cannot meet the minimum accuracy requirement set by the user. This smallest pruned model is called the seed model. As a result, a filter pruning roadmap is created where each footprint on the roadmap is a pruned model with its filter pruning record. This filter pruning roadmap, including the seed model, is passed to the model recovery phase.
In the model recovery phase, by following the filter pruning roadmap and the freeze-&-grow approach, the multi-capacity model of the vanilla model is iteratively generated. Model recovery uses the seed model as the starting point. During each iteration, the architecture and parameters of the model are first frozen. By following the filter pruning roadmap in the reverse order and adding the pruned filters back, a descendant model with a larger capacity is generated. Accuracy is regained by retraining the descendant model. By repeating the iteration, a new descendant model is grown upon the previous descendant model. As a result, the final descendant model has the capacities of all the previous descendant models and is thus named the multi-capacity model.

In the model profiling phase, given the system specs of a mobile vision system, a model profile is generated for the multi-capacity model, including the accuracy, memory footprint, and processing latency of each of its capacities.

Finally, in the online stage, the NestDNN scheduler continuously monitors events including changes in available runtime resources, starting new applications, closing existing applications, and changes in application priority, and is triggered once such an event occurs. Once triggered, the scheduler examines the model profiles of all running vision applications, selects the appropriate descendant model for each application, and determines the amount of runtime resources to allocate to each selected descendant model to jointly maximize the accuracy and minimize the inference latency of those applications.

For clarification purposes, Table 3.1 summarizes the terminologies defined in this work and their brief explanations. In the next section, we describe NestDNN in detail.

Table 3.1: Defined terminologies and their explanations.

  Vanilla Model:        Off-the-shelf deep learning model (e.g., ResNet) trained on a given dataset (e.g., ImageNet).
  Pruned Model:         Intermediate result obtained during model pruning.
  Seed Model:           Smallest pruned model generated during model pruning that meets the minimum accuracy requirement set by the user. It is also the starting point of model recovery.
  Descendant Model:     A sub-model grown upon the Seed Model during model recovery. It has a unique resource-accuracy capacity.
  Multi-Capacity Model: The lastly generated Descendant Model, which has the capacities of all the previously generated descendant models nested in a single model.

3.4 Design of NestDNN

3.4.1 Filter-based Model Pruning

3.4.1.1 Background on CNN Architecture

Before delving deep into filter pruning, it is important to understand the architecture of a convolutional neural network (CNN). In general, a CNN consists of four types of layers: convolutional layers, activation layers, pooling layers, and fully-connected layers. Due to the computational intensity of convolution operations, convolutional layers are the most computationally intensive among the four types of layers. Specifically, each convolutional layer is composed of a set of 3D filters, which play the role of "feature extractors". By convolving an image with these 3D filters, the layer generates a set of features organized in the form of feature maps, which are further sent to the following convolutional layers for further feature extraction.

3.4.1.2 Benefits of Filter Pruning

Figure 3.2 illustrates the details of filter pruning. Let $\mathcal{F}_{j-1} \in \mathbb{R}^{w_{j-1} \times h_{j-1} \times m_{j-1}}$ denote the input feature maps of the $j$-th convolutional layer $conv_j$ of a CNN, where $w_{j-1}$ and $h_{j-1}$ are the width and height of each of the input feature maps, and $m_{j-1}$ is the total number of the input feature maps. The convolutional layer $conv_j$ consists of $m_j$ 3D filters of size $k \times k \times m_{j-1}$ ($k \times k$ is the 2D kernel). It applies these filters onto the input feature maps $\mathcal{F}_{j-1}$ to generate the output feature maps $\mathcal{F}_j \in \mathbb{R}^{w_j \times h_j \times m_j}$, where one 3D filter generates one output feature map. This process involves a total of $m_j k^2 m_{j-1} w_j h_j$ floating point operations (i.e., FLOPs).

Since one 3D filter generates one output feature map, pruning one 3D filter in $conv_j$ (marked in green in $conv_j$) results in removing one output feature map in $\mathcal{F}_j$ (marked in green in $\mathcal{F}_j$), which leads to a reduction of $k^2 m_{j-1}$ parameters and $k^2 m_{j-1} w_j h_j$ FLOPs. Subsequently, the $m_{j+1}$ 2D kernels applied onto that removed output feature map in the convolutional layer $conv_{j+1}$ (marked in green in $conv_{j+1}$) are also removed. This leads to an additional reduction of $k^2 m_{j+1}$ parameters and $k^2 m_{j+1} w_{j+1} h_{j+1}$ FLOPs. Therefore, by pruning filters, both model size (i.e., model parameters) and computational cost (i.e., FLOPs) are reduced [1].

Figure 3.2: Illustration of filter pruning [1]. By pruning filters, both model size and computational cost are reduced.
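To make this arithmetic concrete, the following minimal Python sketch (our own illustration, not part of NestDNN; the function name and the example dimensions are hypothetical) computes the parameter and FLOP savings from pruning a single 3D filter in $conv_j$:

```python
def pruning_savings(k, m_prev, m_next, w, h, w_next, h_next):
    """Parameter and FLOP savings from pruning one 3D filter in conv_j.

    k:              2D kernel side length (k x k)
    m_prev:         number of input feature maps of conv_j (m_{j-1})
    m_next:         number of filters in conv_{j+1} (m_{j+1})
    w, h:           spatial size of conv_j's output feature maps
    w_next, h_next: spatial size of conv_{j+1}'s output feature maps
    """
    # Dropping the filter itself removes k^2 * m_{j-1} parameters and the
    # k^2 * m_{j-1} * w_j * h_j FLOPs spent on its output feature map.
    params = k * k * m_prev
    flops = k * k * m_prev * w * h
    # The m_{j+1} 2D kernels in conv_{j+1} that consumed the removed
    # feature map are also dropped, saving k^2 * m_{j+1} parameters and
    # k^2 * m_{j+1} * w_{j+1} * h_{j+1} FLOPs on top.
    params += k * k * m_next
    flops += k * k * m_next * w_next * h_next
    return params, flops

# Hypothetical example: 3x3 kernels, 64 input maps, 128 downstream filters,
# and 32x32 feature maps in both layers.
print(pruning_savings(k=3, m_prev=64, m_next=128, w=32, h=32,
                      w_next=32, h_next=32))
```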
3.4.1.3 Filter Importance Ranking

The key to filter pruning is identifying less important filters. By pruning those filters, the size and computational cost of a CNN model can be effectively reduced.

To this end, we propose a filter importance ranking approach named Triplet Response Residual (TRR) to measure the importance of filters and rank them based on their relative importance. Our TRR approach is inspired by one key intuition: since a filter plays the role of a "feature extractor", a filter is important if it is able to extract feature maps that are useful for differentiating images belonging to different classes. In other words, a filter is important if the feature maps it extracts from images belonging to the same class are more similar than the ones extracted from images belonging to different classes.

Let {anc, pos, neg} denote a triplet that consists of an anchor image (anc), a positive image (pos), and a negative image (neg), where the anchor image and the positive image are from the same class, while the negative image is from a different class. Following the key intuition, the TRR of filter $i$ is defined as:

$$\mathrm{TRR}_i = \sum \left( \left\| F_i(anc) - F_i(neg) \right\|_2^2 - \left\| F_i(anc) - F_i(pos) \right\|_2^2 \right) \tag{3.1}$$

where $F_i(\cdot)$ denotes the feature map generated by filter $i$. Essentially, TRR calculates the $L_2$ distances of feature maps between (anc, neg) and between (anc, pos), and measures the residual between the two distances. By summing up the residuals of all the triplets from the training dataset, the TRR value of a particular filter reflects its capability of differentiating images belonging to different classes, acting as a measure of the importance of the filter within the CNN model.
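As a minimal illustration of Equation 3.1 (our own sketch, not NestDNN's implementation), the following NumPy code scores one filter given the feature maps it produced for a batch of triplets, and ranks a set of filters by their TRR scores:

```python
import numpy as np

def trr_score(feat_anc, feat_pos, feat_neg):
    """Triplet Response Residual (Eq. 3.1) of one filter.

    Each argument has shape (num_triplets, w, h): the feature maps this
    filter produced for the anchor, positive, and negative image of each
    triplet drawn from the training set.
    """
    # Squared L2 distance between (anchor, negative) and (anchor, positive)
    # feature maps, summed over all triplets. A large residual means the
    # filter separates classes well, i.e., it is important.
    d_neg = np.sum((feat_anc - feat_neg) ** 2, axis=(1, 2))
    d_pos = np.sum((feat_anc - feat_pos) ** 2, axis=(1, 2))
    return float(np.sum(d_neg - d_pos))

def rank_filters(per_filter_feats):
    """Rank filters from least to most important by TRR score.

    per_filter_feats: list of (feat_anc, feat_pos, feat_neg) tuples,
    one tuple per filter.
    """
    scores = [trr_score(*feats) for feats in per_filter_feats]
    return np.argsort(scores)  # least important filters come first
```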
3.4.1.4 Performance of Filter Importance Ranking

Figure 3.3(a) illustrates the filter importance profiling performance of our TRR approach on VGG-16 [36] trained on the CIFAR-10 dataset [40]. The vanilla VGG-16 model contains 13 convolutional layers. Each of the 13 curves in the figure depicts the top-1 test accuracies when filters of one particular convolutional layer are pruned while the other convolutional layers remain unmodified. Each marker on a curve corresponds to the top-1 test accuracy when a particular percentage of filters is pruned. As an example, the topmost curve (blue dotted line with blue triangle markers) shows that the accuracies are 89.75%, 89.72%, and 87.40% when 0% (i.e., the vanilla model), 50%, and 90% of the filters in the 13th convolutional layer $conv_{13}$ are pruned, respectively.

Figure 3.3: Filter importance profiling performance of (a) TRR and (b) L1-norm on VGG-16 trained on CIFAR-10.

We have two key observations from the filter importance profiling result. First, we observe that our TRR approach is able to effectively identify redundant filters within each convolutional layer. In particular, the accuracy remains the same when 59.96% of the filters in $conv_{13}$ are pruned. This indicates that these pruned filters, identified by TRR, are redundant. By pruning these redundant filters, the vanilla VGG-16 model can be effectively compressed without any accuracy degradation. Second, we observe that our TRR approach is able to effectively identify convolutional layers that are more sensitive to filter pruning. This is reflected by the differences in accuracy drops when the same percentage of filters is pruned at different convolutional layers. This sensitivity difference across convolutional layers is taken into account in the iterative filter pruning process.

To demonstrate the superiority of our TRR approach, we have compared it with the state-of-the-art filter pruning approach, which uses the $L_1$-norm of a filter to measure its importance [1]. Figure 3.3(b) illustrates the filter importance profiling performance of the $L_1$-norm on the same vanilla VGG-16 model trained on the CIFAR-10 dataset. By comparing Figure 3.3(a) to Figure 3.3(b), we observe that TRR achieves better accuracy than the $L_1$-norm at almost every pruned filter percentage across all 13 curves. As a concrete example, TRR achieves an accuracy of 89.72% and 87.40% when 50% and 90% of the filters at $conv_{13}$ are pruned respectively, while the $L_1$-norm only achieves an accuracy of 75.45% and 42.65% correspondingly. This result indicates that the filters pruned by TRR have much less impact on accuracy than the ones pruned by the $L_1$-norm, demonstrating that TRR outperforms the $L_1$-norm at identifying less important filters.

3.4.1.5 Filter Pruning Roadmap

By following the filter importance ranking provided by TRR, we iteratively prune the filters in a CNN model. During each iteration, less important filters across convolutional layers are pruned, and the pruned model is retrained to compensate for the accuracy degradation (if there is any) caused by filter pruning. The iteration ends when the pruned model cannot meet the minimum accuracy goal set by the user. As a result, a filter pruning roadmap is created where each footprint on the roadmap is a pruned model with its filter pruning record. The smallest pruned model on the roadmap is called the seed model. This filter pruning roadmap is used to guide the model recovery process described below.
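The iterative procedure can be summarized by the following high-level sketch (ours; rank_filters, prune, retrain, and evaluate are placeholders for the model-specific routines such as TRR ranking and fine-tuning):

```python
def build_pruning_roadmap(model, min_accuracy, prune_ratio,
                          rank_filters, prune, retrain, evaluate):
    """Iteratively prune a model and record the filter pruning roadmap.

    Returns the seed model (the smallest pruned model that still meets
    min_accuracy) together with the roadmap, whose entries are
    (pruned_model, pruning_record) footprints.
    """
    roadmap = []
    while True:
        ranking = rank_filters(model)             # e.g., TRR-based ranking
        candidate, record = prune(model, ranking, prune_ratio)
        candidate = retrain(candidate)            # compensate accuracy loss
        if evaluate(candidate) < min_accuracy:
            break                                 # stop: goal no longer met
        roadmap.append((candidate, record))       # one footprint per step
        model = candidate                         # continue from pruned model
    return model, roadmap                         # model is the seed model
```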
3.4.2 Freeze-&-Grow based Model Recovery

3.4.2.1 Motivation and Key Idea

The filter pruning process generates a series of pruned models, each of which acts as a model variant of the vanilla model with a unique resource-accuracy trade-off. However, due to the retraining step within each pruning iteration, these pruned models have different model parameters, and thus are independent of each other. Therefore, although these pruned models provide different resource-accuracy trade-offs, keeping all of them locally in resource-limited mobile systems is practically infeasible.

To address this problem, we propose to generate a single multi-capacity model that acts equivalently to the series of pruned models in providing the various resource-accuracy trade-offs, but has a model size that is much smaller than the accumulated model size of all the pruned models. This is achieved by an innovative model freezing and filter growing (i.e., freeze-&-grow) approach. In the remainder of this section, we describe the details of the freeze-&-grow approach and how the multi-capacity model is iteratively generated.

3.4.2.2 Model Freezing and Filter Growing

The generation of the multi-capacity model starts from the seed model derived from the filter pruning process. By following the filter pruning roadmap and the freeze-&-grow approach, the multi-capacity model is iteratively created.

Figure 3.4: Illustration of model freezing and filter growing.

Figure 3.4 illustrates the details of model freezing and filter growing during the first iteration. For illustration purposes, only one convolutional layer is depicted. As shown, given the seed model, we first apply model freezing to freeze the parameters of all its filters (marked as blue squares). Next, since each footprint on the roadmap has its filter pruning record, we follow the filter pruning roadmap in the reverse order and apply filter growing to add the pruned filters back (marked as green striped squares). With the added filters, the capacity of this descendant model is increased. Lastly, we retrain this descendant model to regain accuracy. It is important to note that during retraining, since the seed model is frozen, its parameters are not changed; only the parameters of the added filters are changed (marked as green solid squares to indicate the parameters are changed). As such, we have generated a single model that not only has the capacity of the seed model but also has the capacity of the descendant model. Moreover, the seed model shares all its model parameters with the descendant model, making itself nested inside the descendant model without taking extra memory space.

By repeating the iteration, a new descendant model is grown upon the previous one. As such, the final descendant model has the capacities of all the previous descendant models and is thus named the multi-capacity model.

3.4.2.3 Superiority of Multi-Capacity Model

The generated multi-capacity model has the following three key advantages.

One Compact Model with Multiple Capabilities. The generated multi-capacity model is able to provide multiple capacities nested in a single model. This eliminates the need to install a potentially large number of independent model variants with different capacities. Moreover, by sharing parameters among descendant models, the multi-capacity model saves a large amount of memory space and thus significantly reduces its memory footprint.

Optimized Resource-Accuracy Trade-offs. Each capacity provided by the multi-capacity model has a unique optimized resource-accuracy trade-off. Our TRR approach provides state-of-the-art performance at identifying and pruning less important filters. As a result, the multi-capacity model delivers state-of-the-art inference accuracy under a given resource budget.

Efficient Model Switching. Because of parameter sharing, the multi-capacity model is able to switch between its models with little overhead. Switching between independent deep learning models causes significant overhead because it requires paging in and paging out the entire models. The multi-capacity model alleviates this problem in an elegant manner by only requiring a very small portion of the model to be paged in or out.

Figure 3.5: Illustration of model switching (model upgrade vs. model downgrade) of multi-capacity model.

Figure 3.5 illustrates the details of model switching with the multi-capacity model. For illustration purposes, only one convolutional layer is depicted. As shown, since each descendant model is grown upon its previous descendant models, when the multi-capacity model is switching to a descendant model with larger capability (i.e., model upgrade), it incurs zero page-out overhead, and only needs to page in the extra filters included in the larger descendant model (marked as green squares). When the multi-capacity model is switching to a descendant model with smaller capability (i.e., model downgrade), it incurs zero page-in overhead, and only needs to page out the filters that the smaller descendant model does not have (marked as gray squares). As a result, the multi-capacity model significantly reduces the overhead of model page-in and page-out, making model switching extremely efficient.
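To illustrate how nesting makes switching cheap, the sketch below (ours; the class and the paging stubs are hypothetical, not NestDNN's implementation) stores one convolutional layer of a multi-capacity model as a single filter tensor in which each descendant model is a prefix of the filters, so that an upgrade pages in only the delta and a downgrade pages out only the delta:

```python
import numpy as np

def page_in(filters):
    """Placeholder: load the given filters into runtime memory."""

def page_out(filters):
    """Placeholder: release the given filters from runtime memory."""

class MultiCapacityConvLayer:
    """One convolutional layer of a multi-capacity model (illustrative).

    `weights` holds the filters of the largest descendant model, ordered
    so that the first capacities[i] filters form descendant model i.
    """
    def __init__(self, weights, capacities):
        self.weights = weights              # shape: (m_j, k, k, m_{j-1})
        self.capacities = sorted(capacities)
        self.active = min(self.capacities)  # filters currently paged in

    def switch_to(self, capacity):
        assert capacity in self.capacities
        if capacity > self.active:
            # Model upgrade: zero page-out, page in only the extra filters.
            page_in(self.weights[self.active:capacity])
        elif capacity < self.active:
            # Model downgrade: zero page-in, page out only the extra filters.
            page_out(self.weights[capacity:self.active])
        self.active = capacity
        return self.weights[:capacity]      # filters of the active sub-model

# Hypothetical usage: five nested capacities of a 3x3 layer with 64 inputs.
layer = MultiCapacityConvLayer(np.zeros((256, 3, 3, 64)),
                               capacities=[64, 96, 128, 192, 256])
layer.switch_to(128)  # upgrade: pages in filters 64..127 only
layer.switch_to(96)   # downgrade: pages out filters 96..127 only
```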
3.4.3 Resource-Aware Scheduler

3.4.3.1 Motivation and Key Idea

The creation of the multi-capacity model enables NestDNN to jointly maximize the performance of vision applications that are concurrently running on a mobile vision system. This possibility comes from two key insights. First, while a certain amount of runtime resources can be traded for an accuracy gain in one application, the same amount of runtime resources may be traded for a larger accuracy gain in another application. Second, for applications that do not need real-time response and thus can tolerate a relatively large processing latency, we can reallocate some runtime resources from those latency-tolerant applications to other applications that need more runtime resources to meet their real-time goals. NestDNN exploits these two key insights by encoding the inference accuracy and processing latency into a cost function for each vision application, which serves as the foundation for resource-aware scheduling.

3.4.3.2 Cost Function

Let $V$ denote the set of vision applications that are concurrently running on a mobile vision system, and let $A_{min}(v)$ and $L_{max}(v)$ denote the minimum inference accuracy and the maximum processing latency goals set by the user for application $v \in V$. Additionally, let $M_v$ denote the multi-capacity model generated for application $v$, and let $m_v$ denote a descendant model $m_v \in M_v$. The cost function of the descendant model $m_v$ for application $v$ is defined as follows:

$$C(m_v, u_v, v) = \max\big(0,\, A_{min}(v) - A(m_v)\big) + \alpha \cdot \max\!\left(0,\, \frac{L(m_v)}{u_v} - L_{max}(v)\right) \tag{3.2}$$

where $A(m_v)$ is the inference accuracy of $m_v$, $u_v \in (0, 1]$ is the computing resource percentage allocated to $v$, and $L(m_v)$ is the processing latency of $m_v$ when 100% of the computing resources are allocated to $v$. The values of $A(m_v)$ and $L(m_v)$ are obtained via profiling at the multi-capacity model development stage.

Essentially, the first term in the cost function represents the penalty for selecting a descendant model $m_v$ that has an inference accuracy lower than the minimum accuracy goal. The second term represents the penalty for selecting a descendant model $m_v$ that has a processing latency higher than the maximum processing latency goal. $\alpha \in [0, 1]$ is a knob set by the user to determine the latency-accuracy trade-off preference. A large $\alpha$ weights the latency penalty more heavily, while a small $\alpha$ favors higher accuracy.
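A direct transcription of Equation 3.2 into Python (our sketch; the guard for u = 0 is our own addition to handle the unallocated case):

```python
def cost(accuracy, latency, u, acc_min, lat_max, alpha):
    """Cost C(m_v, u_v, v) of Equation 3.2 for one descendant model.

    accuracy: A(m_v); latency: L(m_v), profiled at 100% compute;
    u: fraction of compute allocated to v; alpha: latency-accuracy knob.
    """
    if u <= 0:
        return float("inf")  # no compute allocated: latency is unbounded
    accuracy_penalty = max(0.0, acc_min - accuracy)    # accuracy shortfall
    latency_penalty = max(0.0, latency / u - lat_max)  # latency overshoot
    return accuracy_penalty + alpha * latency_penalty
```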
3.4.3.3 Scheduling Schemes

Given the cost function of each descendant model of each concurrently running application, the resource-aware scheduler incorporates two widely used scheduling schemes to jointly maximize the performance of concurrent vision applications under two different optimization objectives.

MinTotalCost. The MinTotalCost (i.e., minimize the total cost) scheduling scheme aims to minimize the total cost of all concurrent applications. This optimization problem can be formulated as follows:

$$\min_{u_v,\, m_v \in M_v} \sum_{v \in V} C(m_v, u_v, v) \tag{3.3}$$
$$s.t. \quad \sum_{v \in V} S(m_v) \le S_{max}, \qquad \sum_{v \in V} u_v \le 1$$

where $S(m_v)$ denotes the runtime memory footprint of the descendant model $m_v$. The total memory footprint of all the concurrent applications cannot exceed the maximum memory space of the mobile vision system, denoted as $S_{max}$.

Under the MinTotalCost scheduling scheme, the resource-aware scheduler favors applications with lower costs and thus is optimized to allocate more runtime resources to them.

MinMaxCost. The MinMaxCost (i.e., minimize the maximum cost) scheduling scheme aims to minimize the cost of the application that has the highest cost. This optimization problem can be formulated as follows:

$$\min_{u_v,\, m_v \in M_v} k \tag{3.4}$$
$$s.t. \quad \forall v: C(m_v, u_v, v) \le k, \qquad \sum_{v \in V} S(m_v) \le S_{max}, \qquad \sum_{v \in V} u_v \le 1$$

where the cost of any of the concurrently running applications must not exceed $k$, and $k$ is minimized.

Under the MinMaxCost scheduling scheme, the resource-aware scheduler is optimized to fairly allocate runtime resources to all the concurrent applications to balance their performance.

3.4.3.4 Cached Greedy Heuristic Approximation

Solving the nonlinear optimization problems involved in the MinTotalCost and MinMaxCost scheduling schemes is computationally hard. To enable real-time online scheduling in mobile systems, we utilize a greedy heuristic inspired by [29] to obtain approximate solutions.

Specifically, we define a minimum indivisible runtime resource unit $\Delta u$ (e.g., 1% of the total computing resources in a mobile vision system) and start allocating the computing resources from scratch. For MinTotalCost, we allocate $\Delta u$ to the descendant model $m_v$ of application $v$ such that $C(m_v, \Delta u, v)$ has the smallest cost increase among all concurrent applications. For MinMaxCost, we select the application $v$ with the highest cost $C(m_v, u_v, v)$, allocate $\Delta u$ to $v$, and choose the optimal descendant model $m_v^* = \arg\min_{m_v} C(m_v, u_v, v)$ for $v$. For both MinTotalCost and MinMaxCost, the runtime resources are iteratively allocated until exhausted.

The runtime of executing the greedy heuristic can be further shortened via a caching technique. This is particularly attractive for mobile systems with very limited resources. Specifically, when allocating the computing resources, instead of starting from scratch, we start from a point where a certain amount of computing resources has already been allocated. For example, we can cache the unfinished allocation scheme where 70% of the computing resources have been allocated during optimization. In the next optimization iteration, we directly start from the unfinished allocation scheme and allocate the remaining 30% of the computing resources, thus saving 70% of the optimization time. To prevent falling into a local minimum over time, a complete execution of the greedy heuristic is performed periodically to keep the cached solution close to the optimal one.
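The following sketch shows the greedy allocation for MinTotalCost (ours, simplified; cost(m, u, v) wraps Equation 3.2 with the profiled accuracy and latency of model m, and mem(m) returns S(m)). To keep Equation 3.2 finite at the start, this sketch seeds every application with one unit, a small deviation from the from-scratch description above; the returned allocation can be cached and resumed to implement the caching technique:

```python
def greedy_min_total_cost(apps, models, delta_u, s_max, cost, mem,
                          cached=None):
    """Greedy approximation of MinTotalCost (illustrative sketch).

    apps: application ids; models[v]: list of descendant models of v;
    delta_u: minimum indivisible resource unit (e.g., 0.01 = 1%);
    cached: optional (pick, u) from a previous partial run, letting the
    next scheduling round resume instead of starting from scratch.
    """
    if cached:
        pick, u = dict(cached[0]), dict(cached[1])
    else:
        pick = {v: min(models[v], key=mem) for v in apps}  # smallest models
        u = {v: delta_u for v in apps}  # seed each app with one unit
    while sum(u.values()) + delta_u <= 1.0:
        best, best_total = None, None
        # Try granting one more unit to every (application, model) pair and
        # keep the choice with the lowest total cost that fits in memory.
        for v in apps:
            for m in models[v]:
                if sum(mem(pick[w]) for w in apps if w != v) + mem(m) > s_max:
                    continue
                total = sum(cost(pick[w], u[w], w) for w in apps if w != v)
                total += cost(m, u[v] + delta_u, v)
                if best_total is None or total < best_total:
                    best, best_total = (v, m), total
        if best is None:
            break                       # no feasible grant: stop allocating
        v, m = best
        u[v] += delta_u
        pick[v] = m
    return pick, u                      # cacheable partial/complete scheme
```

The MinMaxCost variant would differ only in the selection rule: grant the next unit to the application whose current cost is highest and re-pick its best descendant model.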
3.5 Evaluation

3.5.1 Datasets, DNNs and Applications

3.5.1.1 Datasets

To evaluate the generalization capability of NestDNN on different vision tasks, we select two types of tasks that are among the most important tasks for mobile vision systems.

Generic-Category Object Recognition. This type of vision task aims to recognize the generic category of an object (e.g., a road sign, a person, or an indoor place). Without loss of generality, we select three commonly used computer vision datasets, containing a small, a medium, and a large number of object categories respectively, representing an easy, a moderate, and a difficult vision task correspondingly.

- CIFAR-10 [40]. This dataset contains 50K training images and 10K testing images belonging to 10 generic categories of objects.
- ImageNet-50 [41]. This dataset is a subset of the ILSVRC ImageNet. It contains 63K training images and 2K testing images belonging to the top 50 most popular object categories based on the popularity ranking provided by the official ImageNet website.
- ImageNet-100 [41]. Similar to ImageNet-50, this dataset is a subset of the ILSVRC ImageNet. It contains 121K training images and 5K testing images belonging to the top 100 most popular object categories based on the popularity ranking provided by the official ImageNet website.

Class-Specific Object Recognition. This type of vision task aims to recognize the specific class of an object within a generic category (e.g., a stop sign, a female person, or a kitchen). Without loss of generality, we select three object categories: 1) road signs, 2) people, and 3) places, which are commonly seen in mobile settings.

- GTSRB [42]. This dataset contains over 50K images belonging to 43 classes of road signs such as speed limit signs and stop signs.
- Adience-Gender [43]. This dataset contains over 14K images of human faces of two genders.
- Places-32 [44]. This dataset is a subset of the Places365-Standard dataset. Places365-Standard contains 1.8 million images of 365 scene classes belonging to 16 higher-level categories. We select two representative scene classes (e.g., parking lot and kitchen) from each of the 16 higher-level categories and obtain a 32-class dataset that includes over 158K images.

3.5.1.2 DNN Models

To evaluate the generalization capability of NestDNN on different DNN models, we select two representative DNN models: 1) VGG-16 and 2) ResNet-50. VGG-16 [36] is considered one of the most straightforward DNN models to implement, and thus gains considerable popularity in both academia and industry. ResNet-50 [35] is considered one of the top-performing DNN models in computer vision due to its superior recognition accuracy.

3.5.1.3 Mobile Vision Applications

Without loss of generality, we randomly assign CIFAR-10, GTSRB, and Adience-Gender to VGG-16, and assign ImageNet-50, ImageNet-100, and Places-32 to ResNet-50 to create six mobile vision applications labeled as VC (i.e., VGG-16 trained on the CIFAR-10 dataset), RI-50, RI-100, VS, VG, and RP, respectively. We train and test all the vanilla DNN models and all the descendant models generated by NestDNN by strictly following the protocol provided by each of the six datasets described above.

The datasets, DNN models, and mobile vision applications are summarized in Table 3.2.

Table 3.2: Summary of datasets, DNN models, and mobile vision applications used in this work.

  Type              Dataset         DNN Model  Mobile Vision Application
  Generic           CIFAR-10        VGG-16     VC
  Category          ImageNet-50     ResNet-50  RI-50
                    ImageNet-100    ResNet-50  RI-100
  Class             GTSRB           VGG-16     VS
  Specific          Adience-Gender  VGG-16     VG
                    Places-32       ResNet-50  RP

3.5.2 Performance of Multi-Capacity Model

In this section, we evaluate the performance of the generated multi-capacity model with the goal of demonstrating its superiority listed in Section 3.4.2.3.

3.5.2.1 Experimental Setup

Selection of Descendant Models. Without loss of generality, for each mobile vision application, we generate a multi-capacity model that contains five descendant models. These descendant models are designed to have diverse resource-accuracy trade-offs. We select these descendant models with the purpose of demonstrating that our multi-capacity model enables applications to run even when the available resources are very limited. It should be noted that a multi-capacity model is neither limited to one particular set of resource-accuracy trade-offs nor limited to one particular number of descendant models. NestDNN provides the flexibility to design a multi-capacity model based on users' preferences.

Baseline. To make a fair comparison, we use the same architecture of descendant models for baseline models such that their model sizes and computational costs are identical. In addition, we pre-trained the baseline models on the ImageNet dataset and then fine-tuned them on each of the six datasets. Pre-training is an effective way to boost accuracy, and thus is adopted as a routine in the machine learning community [45, 46]. We also trained baseline models without pre-training, and observed that baseline models with pre-training consistently outperform those without pre-training. Therefore, we only report the accuracies of baseline models with pre-training.

3.5.2.2 Optimized Resource-Accuracy Trade-offs

Figure 3.6: Top-1 test accuracy vs. model size comparison between descendant models and baseline models.

Figure 3.6 illustrates the comparison between descendant models and baseline models across the six mobile vision applications. For each application, we show the top-1 test accuracies of both descendant models and baseline models as a function of model size. For better illustration, the horizontal axis is plotted using the logarithmic scale.
We have two key observations from the result. First, we observe that descendant models consistently achieve higher accuracies than baseline models at every model size across all six applications. On average, descendant models achieve 4.98% higher accuracy than baseline models. This indicates that our descendant model at each capacity is able to deliver state-of-the-art inference accuracy under a given memory budget. Second, we observe that smaller descendant models outperform baseline models by a larger margin than larger descendant models do. On average, the two smallest descendant models achieve 6.68% higher accuracy while the two largest descendant models achieve 3.72% higher accuracy compared to their corresponding baseline models. This is because our TRR approach is able to preserve important filters while pruning less important ones. Despite having a small capacity, a small descendant model benefits from these important filters while the corresponding baseline model does not.

Figure 3.7: Computational cost comparison between descendant models and vanilla models.

Figure 3.7 shows the computational costs of the five descendant models and the corresponding vanilla models of the six applications in GFLOPs (i.e., GigaFLOPs). As shown, all descendant models have fewer GFLOPs than the corresponding vanilla models. This result indicates that our filter pruning approach is able to effectively reduce the computational costs across the six applications, demonstrating the generalization of our filter pruning approach on different deep learning models trained on different datasets.

3.5.2.3 Reduction on Memory Footprint

Another key feature of the multi-capacity model is sharing parameters among its descendant models. To quantify the benefit of parameter sharing on reducing memory footprint, we compare the model size of the multi-capacity model with the accumulated model size of the five descendant models as if they were independent. This mimics the traditional model variants used in existing mobile deep learning systems.

Table 3.3: Benefit of multi-capacity model on memory footprint reduction.

  Application   Multi-Capacity    Accumulated       Reduced Memory
                Model Size (MB)   Model Size (MB)   Footprint (MB)
  VC            196.0             437.5             241.5
  VS            12.9              19.8              6.9
  VG            123.8             256.0             132.2
  RI-50         42.4              58.1              15.7
  RI-100        87.1              243.5             156.4
  RP            62.4              97.1              34.7
  All Included  524.6             1112.0            587.4

Table 3.3 lists the comparison results across the six mobile vision applications. Obviously, the model size of the multi-capacity model is smaller than the corresponding accumulated model size for each application. Moreover, deep learning models with larger model sizes benefit more from parameter sharing. For example, VC has the largest model size across the six applications. With parameter sharing, it achieves a reduced memory footprint of 241.5 MB. Finally, if we consider running all six applications concurrently, the multi-capacity model achieves a reduced memory footprint of 587.4 MB, demonstrating the enormous benefit of the multi-capacity model on memory footprint reduction.

3.5.2.4 Reduction on Model Switching Overhead

Another benefit of parameter sharing is reducing the overhead of model switching when the set of concurrent applications changes. To quantify this benefit, we consider all the possible model switching cases among all five descendant models of each multi-capacity model, and calculate the average page-in and page-out overhead for model upgrade and model downgrade, respectively. We compare it with the case where descendant models are treated as if they were independent, which again mimics traditional model variants.
Table 3.4 lists the comparison results of all six mobile vision applications for model upgrade in terms of memory usage. As expected, the average page-in and page-out memory usage of independent models during model switching is larger than that of multi-capacity models for every application. This is because during model switching, independent models need to page in and page out the entire models while multi-capacity models only need to page in a very small portion of the models. It should be noted that the page-out overhead of the multi-capacity model during model upgrade is zero. This is because the descendant model with smaller capability is part of the descendant model with larger capability, and thus it does not need to be paged out.

Table 3.5 lists the comparison results of all six applications for model downgrade. Similar results are observed. The only difference is that during model downgrade, the page-in overhead of the multi-capacity model is zero.

Besides memory usage, we also quantify the benefit of reducing the overhead of model switching in terms of energy consumption. Specifically, we measured the energy consumed by randomly switching models 250, 500, 750, and 1,000 times using descendant models and independent models, respectively.

Figure 3.8: Model switching energy consumption comparison between multi-capacity models and independent models. The energy consumption is measured on a Samsung Galaxy S8 smartphone.

Figure 3.8 shows the comparison results across all six mobile vision applications. As expected, the energy consumed by switching multi-capacity models is lower than that consumed by switching independent models for every application. This benefit becomes more prominent when the model size is large. For example, the size of the largest descendant model of VC and VS is 196.0 MB and 12.9 MB, respectively. The corresponding energy consumption reduction for every 1,000 model switches is 602.1 J and 4.0 J, respectively.

Taken together, the generated multi-capacity model is able to significantly reduce the overhead of model switching in terms of both memory usage and energy consumption. The benefit becomes more prominent when the model switching frequency increases. This is particularly important for memory- and battery-constrained mobile systems.

Application    Multi-Capacity Model         Independent Models
               Upgrade Overhead (MB)        Upgrade Overhead (MB)
               Page-In      Page-Out        Page-In      Page-Out
VC             81.4         0               128.2        46.8
VS             1.3          0               1.7          0.3
VG             50.0         0               76.2         26.2
RI-50          19.2         0               21.2         2.0
RI-100         38.3         0               67.9         29.5
RP             26.4         0               34.9         4.6

Table 3.4: Benefit of multi-capacity model on model switching (model upgrade) in terms of memory usage.

Application    Multi-Capacity Model         Independent Models
               Downgrade Overhead (MB)      Downgrade Overhead (MB)
               Page-In      Page-Out        Page-In      Page-Out
VC             0            81.4            46.8         128.2
VS             0            1.3             0.3          1.7
VG             0            50.0            26.2         76.2
RI-50          0            19.2            2.0          21.2
RI-100         0            38.3            29.5         67.9
RP             0            26.4            4.6          34.9

Table 3.5: Benefit of multi-capacity model on model switching (model downgrade) in terms of memory usage.

3.5.3 Performance of Resource-Aware Scheduler

3.5.3.1 Experimental Setup

Deployment Platforms. We implemented NestDNN and the six mobile vision applications on three smartphones: Samsung Galaxy S8, Samsung Galaxy S7, and LG Nexus 5, all running Android OS 7.0. We used the Monsoon power monitor [47] to measure the power consumption. We have achieved consistent experimental results across all three smartphones. We only report the best results, which were obtained from the Galaxy S8.

Baseline. We use the model located at the knee of every yellow curve in Figure 3.6 as our baseline for each mobile vision application. This is the model that offers the best resource-accuracy trade-off among all the model variants.
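The dissertation identifies these knee points from the curves directly. For readers who want to automate the choice, the following is a minimal sketch of one common knee heuristic (maximum distance from the chord joining the curve's endpoints); this heuristic is an illustrative assumption, not the method used in the evaluation.

```python
import numpy as np

def find_knee(sizes_mb, accuracies):
    """Return the index of the knee of an accuracy-vs-size curve.

    Heuristic: the knee is the point farthest from the straight line
    joining the first and last points. Sizes are compared on a log
    scale, matching how the curves are plotted in Figure 3.6.
    """
    x = np.log(np.asarray(sizes_mb, dtype=float))
    y = np.asarray(accuracies, dtype=float)
    # Normalize both axes so distances are comparable.
    x = (x - x.min()) / (x.max() - x.min())
    y = (y - y.min()) / (y.max() - y.min())
    p0, p1 = np.array([x[0], y[0]]), np.array([x[-1], y[-1]])
    chord = (p1 - p0) / np.linalg.norm(p1 - p0)
    pts = np.stack([x, y], axis=1) - p0
    # Perpendicular distance of each point from the chord.
    dist = np.abs(pts[:, 0] * chord[1] - pts[:, 1] * chord[0])
    return int(np.argmax(dist))

# Hypothetical model variants: (size MB, top-1 accuracy).
sizes = [5.0, 15.0, 40.0, 90.0, 196.0]
accs = [0.78, 0.86, 0.91, 0.92, 0.925]
print("knee at variant", find_knee(sizes, accs))  # -> 2 for this curve
```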
Benchmark Design. We have designed a benchmark that simulates runtime application queries in different scenarios. Specifically, our benchmark creates a new application or kills a running application with certain probabilities at every second. The number of concurrently running applications ranges from 2 to 6. At the same time, the maximum available memory to run concurrent applications is set to 400 MB. Each simulation generated by our benchmark lasts for 60 seconds. We repeat the simulation 100 times and report the average runtime performance.

Figure 3.9 illustrates the profile of all accumulated simulations generated by our benchmark. Figure 3.9(a) shows the time distribution of different numbers of concurrent applications. As shown, the percentage of time when two, three, four, five, and six applications run concurrently is 9.8%, 12.7%, 18.6%, 24.5%, and 34.3%, respectively. This shows that our benchmark has covered all the numbers of concurrent applications. Figure 3.9(b) shows the running time distribution of each individual application. As shown, the running time of each application is evenly distributed, indicating that our benchmark ensures a fair time share among all applications.

Figure 3.9: Profile of all accumulated simulations generated by our designed benchmark.

Evaluation Metrics. We use the following two metrics to evaluate the scheduling performance.

Accuracy Gain. Since the absolute top-1 accuracies achieved by the six vision applications are not in the same range, we use the accuracy gain over the baseline within each application as a more meaningful metric.

Frame Rate. We use frame rate as the metric to measure the real-time performance of a vision application. It is a standard metric for mobile vision systems. Frame rate is inversely proportional to inference latency: the lower the inference latency, the higher the frame rate.

Figure 3.10: Runtime performance comparison between baseline and NestDNN under scheduling scheme: (a) MinTotalCost; (b) MinMaxCost.

3.5.3.2 Improvement on Accuracy and Frame Rate

Figure 3.10(a) shows the comparison between the baseline and NestDNN under the MinTotalCost scheduling scheme. Each blue diamond marker represents the runtime performance obtained by scheduling with a particular value of α in the cost function defined in Section 3.4.3.2. The yellow circle represents the runtime performance of the baseline.

We have two key observations from our comparison result. First, by adjusting the value of α, NestDNN is able to provide a variety of trade-offs between accuracy and frame rate. In contrast, the baseline is fixed and offers no trade-off between accuracy and frame rate. Second, the vertical dotted line and the horizontal dotted line together partition the figure into four quadrants. The upper right quadrant represents the region that has both higher accuracy gain and higher frame rate compared to the baseline. As shown, many blue diamond markers are located in this upper right quadrant. At each of those blue diamond markers, NestDNN is able to achieve both higher accuracy gain and higher frame rate compared to the baseline. In particular, we select three of those blue diamond markers to demonstrate the runtime performance improvement achieved by NestDNN. Specifically, when NestDNN has the same accuracy gain as the baseline, NestDNN achieves 2.0× the frame rate (in frames per second, FPS) of the baseline. When NestDNN has the same frame rate as the baseline, NestDNN achieves a 4.1% accuracy gain over the baseline. Finally, we select the knee of the blue diamond curve, which offers the best accuracy-frame rate trade-off among all values of α. At the knee, NestDNN achieves 1.5× the frame rate and a 2.6% accuracy gain compared to the baseline.
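To illustrate how an α-parameterized scheduler trades accuracy for latency under a memory budget, below is a minimal MinTotalCost-style sketch. The cost form (1 − accuracy) + α·latency and all variant numbers are illustrative assumptions; the actual cost function is defined in Section 3.4.3.2, and NestDNN solves the selection with the cached greedy heuristic of Section 3.4.3.4 rather than by exhaustive search.

```python
from itertools import product

# Hypothetical descendant variants per app: (memory MB, accuracy, latency ms).
APPS = {
    "VC": [(30, 0.80, 20), (80, 0.86, 35), (196, 0.90, 60)],
    "RP": [(10, 0.82, 10), (30, 0.88, 18), (62, 0.91, 30)],
}
BUDGET_MB = 400  # maximum memory available to concurrent applications

def cost(acc, latency_ms, alpha):
    # Illustrative cost: accuracy loss traded against latency via alpha.
    return (1.0 - acc) + alpha * (latency_ms / 1000.0)

def min_total_cost(apps, alpha, budget_mb):
    """Exhaustive MinTotalCost selection (fine for a handful of apps)."""
    best, best_choice = float("inf"), None
    names = list(apps)
    for combo in product(*(apps[n] for n in names)):
        if sum(v[0] for v in combo) > budget_mb:
            continue  # violates the memory budget
        total = sum(cost(v[1], v[2], alpha) for v in combo)
        if total < best:
            best, best_choice = total, dict(zip(names, combo))
    return best_choice

# A larger alpha favors faster (lower-latency) descendants.
for alpha in (0.1, 5.0):
    print(alpha, min_total_cost(APPS, alpha, BUDGET_MB))
```

Sweeping α in this way traces out exactly the kind of accuracy/frame-rate frontier shown by the blue diamond markers in Figure 3.10.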
Figure 3.10(b) shows the comparison between the baseline and NestDNN under the MinMaxCost scheduling scheme. When NestDNN has the same accuracy gain as the baseline, NestDNN achieves 1.9× the frame rate of the baseline. When NestDNN has the same frame rate as the baseline, NestDNN achieves a 4.2% accuracy gain over the baseline. At the knee, NestDNN achieves 1.5× the frame rate and a 2.1% accuracy gain compared to the baseline.

Figure 3.11: Energy consumption comparison between baseline and NestDNN under scheduling scheme: (a) MinTotalCost; (b) MinMaxCost.

3.5.3.3 Reduction on Energy Consumption

Finally, we evaluate the inference energy consumption during scheduling. In this experiment, we evaluate the inference energy consumption of NestDNN at the knee.

Figure 3.11(a) shows the comparison between the baseline and NestDNN under the MinTotalCost scheduling scheme. Across different numbers of inferences, NestDNN achieves an average 1.7× energy consumption reduction compared to the baseline. Similarly, Figure 3.11(b) shows the comparison between the baseline and NestDNN under the MinMaxCost scheduling scheme. NestDNN achieves an average 1.5× energy consumption reduction compared to the baseline.

3.6 Conclusion of NestDNN

In this chapter, we presented the design, implementation and evaluation of NestDNN, a framework that enables resource-aware multi-tenant on-device deep learning for mobile vision systems. It contributes novel techniques that address the unique challenges of mobile vision systems. We evaluated NestDNN using six mobile vision applications that target some of the most important vision tasks for mobile vision systems. Our results show that compared to the non-resource-aware counterpart, NestDNN achieves as much as a 4.2% increase in accuracy, a 2.0× increase in frame rate, and a 1.7× reduction in energy consumption when running multiple mobile vision applications concurrently.

Chapter 4

Content-Adaptive On-Device Deep Learning

4.1 Introduction of FlexDNN

Mobile vision systems such as mobile phones, drones, and augmented reality headsets are ubiquitous today. Driven by recent breakthroughs in Deep Neural Networks (DNNs) [2] and the emergence of AI chipsets, state-of-the-art mobile vision systems have started to use DNN-based processing pipelines for on-device video stream analytics, which acts as the enabler of a wide range of continuous mobile vision applications.

On-device video stream analytics requires processing streaming video frames at high throughput and returning the processing results with low latency. Unfortunately, DNNs are known to be computationally expensive [3], and high computational consumption directly translates to high processing latency and high energy consumption. Given that mobile systems are constrained by limited compute resources and battery capacities, reducing the computational consumption of DNN-based pipelines is crucial to high-throughput, low-latency, and low-energy on-device video stream analytics.

To reduce computational consumption, most existing work pursues model compression techniques [9, 37, 48]. However, model compression yields a one-size-fits-all network that requires the same set of feature maps to be extracted for all video frames, agnostic to the content of each frame.

In fact, the computation consumed by a DNN-based processing pipeline is heavily dependent on the content of the video frames [20]. For video frames with contents that are easy to recognize, a small low-capacity DNN model is sufficient while a large high-capacity DNN model that consumes more computation is overkill; on the other hand, for video frames with contents that are hard to recognize, it is necessary to employ large high-capacity DNN models in the processing pipeline to ensure the contents are correctly recognized. This is very similar to how the human vision system works, where a glimpse is sufficient to recognize simple scenes and objects in ordinary poses, whereas more attention and effort are needed to understand complex scenes and objects that are complicated or partially occluded [49].
Based on this key insight, content-adaptive video stream analytics systems such as Chameleon [18] have recently emerged. Leveraging the dynamics of video contents, these systems effectively reduce the computational consumption of DNN-based processing pipelines by dynamically changing the DNN models in the pipeline to adapt to the difficulty levels of the video frames. However, this dynamic configuration approach is a mismatch for resource-constrained mobile systems. This is because it requires all the model variants with various capacities to be installed in the mobile system, which results in a large memory footprint. More importantly, if a large number of model variants is incorporated and the content dynamics is substantial, the overhead of searching for the optimal model variant and switching models at runtime can be prohibitively expensive, which considerably dwarfs the benefit brought by adaptation.

The limitation of Chameleon is rooted in the constraint that it requires multiple model variants with various capacities to adapt to the various difficulty levels of video frames. To address this limitation, state-of-the-art approaches such as BranchyNet [19] and MSDNet [20] introduce the idea of constructing a single model by adding early exits at intermediate layers of a regular DNN model to make early predictions. With such an early prediction mechanism, easy frames do not need to go through all the layers, and their computational consumption is thus reduced. While the concept of early prediction is promising, these pioneer works are constrained by the following limitations:

These pioneer works are coarse-grained approaches where early exits are inserted at the outputs of convolutional layers of a DNN model. Each convolutional layer is composed of a large number of convolutional filters, and these filters dominate the computational consumption of the DNN model. However, not all the filters within each convolutional layer are needed to early-exit easy frames. As a consequence, computation consumed by those unnecessary filters is wasted by the coarse-grained approach due to its constraint of making early predictions at the granularity of layers.

The early exits themselves also consume computation. Computation consumed by frames that fail to exit at the early exits is wasted and becomes the overhead incurred by this approach. Unfortunately, the early exit architecture of prior works is designed based on heuristics without focusing on the trade-off between early exit rates and the incurred overheads. Without carefully accounting for this trade-off, the incurred overheads could diminish the benefit of early exits.

Lastly, the number and locations of the inserted early exits in prior works are also determined based on heuristics. While effective in comparison to models without early exits, considering the exponential number of combinations of early exit numbers and locations, it would be extremely challenging, even for developers with extensive domain expertise, to derive an early exit insertion plan that can fully leverage the computational consumption reduction benefit brought by early exits without considerable trial-and-error effort. Moreover, since early exits incur overheads, the number and locations of the inserted early exits play a critical role in determining the amount of computation that can be saved, making the derivation of the early exit insertion plan even more challenging.

Figure 4.1: A conceptual overview of FlexDNN. FlexDNN is built on top of a base model with the augmentation of one or more early exits inserted throughout the base model. For an easy input, it exits at the early exit inserted at an earlier location since the extracted features are good enough to classify the content in the easy input. As such, the easy input avoids further computational cost incurred onward. For a hard input, it proceeds deeper until the extracted features are good enough to exit the hard input.
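To make the early exit mechanism of Figure 4.1 concrete, the following is a minimal inference-loop sketch, assuming the base model is split into sequential stages with lightweight exit heads attached after some of them. All names are illustrative; for simplicity the sketch uses the top class probability as the confidence measure, whereas FlexDNN's actual entropy-based score is defined later in Eq. (4.1).

```python
def early_exit_inference(blocks, exit_heads, thresholds, final_head, frame):
    """Run one frame through the base model's stages, stopping early
    once an exit head is confident enough.

    blocks      -- base model split into sequential stages (callables)
    exit_heads  -- {stage index: exit head returning class probabilities}
    thresholds  -- {stage index: confidence threshold for that exit}
    final_head  -- classifier applied if no early exit fires
    """
    x = frame
    for i, block in enumerate(blocks):
        x = block(x)                       # computation paid for this stage
        if i in exit_heads:
            probs = exit_heads[i](x)       # lightweight early-exit head
            # Stand-in confidence: the top class probability.
            if max(probs) >= thresholds[i]:
                return probs, i            # easy frame exits here
    return final_head(x), len(blocks)      # hard frame runs the full model
```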
In this paper, we present FlexDNN, a content-adaptive framework for computation-efficient DNN-based mobile video stream analytics. Figure 4.1 provides a conceptual overview of FlexDNN. FlexDNN effectively addresses these limitations with three novel contributions:

Fine-Grained Early Prediction. Instead of making early predictions at the granularity of layers, FlexDNN breaks the natural boundaries at layers and adopts a fine-grained approach to make early predictions at the granularity of filters. For this, we have designed a novel filter importance ranking scheme based on the concept of collective importance to identify redundant filters within each convolutional layer. With such a fine-grained approach, FlexDNN is able to provide much more flexibility in making early predictions. Moreover, it cuts off unnecessary computation wasted by redundant filters within each layer, which coarse-grained approaches could not achieve.

Optimal Architecture. FlexDNN is able to generate the optimal architecture for each early exit and the optimal early exit insertion plan. For this, we have designed an architecture search-based scheme that is able to find the optimal architecture that balances the trade-off between early exit rate and computational overhead for each early exit. We have also formulated the determination of the number and locations of early exits as an optimization problem to derive the optimal early exit insertion plan that maximizes the benefit brought by content adaptation. As such, FlexDNN allows developers with limited domain expertise to build DNN-based computation-efficient continuous mobile vision applications with minimal effort.

Runtime Adaptation. At runtime, mobile vision systems may experience workload dynamics under different contexts as well as system resource dynamics due to multi-tenancy. As its final contribution, the design of FlexDNN provides a natural mechanism to adapt to both workload and system resource dynamics at runtime.

We implement FlexDNN in TensorFlow [50] and use it to build three representative continuous mobile vision applications: Activity Recognition, Scene Understanding, and Traffic Surveillance, based on VGGNet and MobileNets. We use real-world datasets with videos taken by mobile devices to profile these applications and evaluate their runtime performance on both mobile CPU and mobile GPU platforms.

While MobileNets (content-agnostic models) are known for their computation-efficient designs, our evaluation shows that FlexDNN significantly outperforms MobileNets due to its content-adaptive capability, achieving up to 31.0% reduction in computation consumption and up to 29.6% reduction in energy consumption while having the same accuracy. We also compare FlexDNN to BranchyNet, a recent state-of-the-art content-adaptive model that is based on a coarse-grained early exit design. Our results show that besides the benefit of automatically generating the optimal early exit architecture and the optimal early exit insertion plan, FlexDNN significantly outperforms BranchyNet due to its fine-grained design and architectural superiority, achieving up to 41.2% reduction in computation consumption and up to 38.4% reduction in energy consumption while having the same accuracy. Finally, our evaluation shows that under workload and system resource dynamics at runtime, FlexDNN is able to adapt to the dynamics and achieve much higher accuracies than BranchyNet.
4.2 Background & Motivation

4.2.1 Dynamics of Mobile Video Contents & Benefit of Leveraging the Dynamics

Due to the mobility of cameras, videos taken in real-world mobile settings exhibit substantial content dynamics in terms of difficulty level across frames over time. To illustrate this, Figure 4.2 shows four frames of a video clip of biking captured using a mobile camera in the human activity video dataset UCF-101 [51]. Among them, since the entirety of both the biker and her bike is captured, frames (a) and (d) are relatively easier to recognize as biking activity. In contrast, frames (b) and (c) capture the biker with only part of the bike, and are thus relatively harder to recognize. In such a case, a smaller model is sufficient for frames (a) and (d), but a more complex model is necessary for frames (b) and (c).

Figure 4.2: Illustration of four frames of a video clip of biking captured using a mobile camera in the UCF-101 dataset: (a) and (d) are frames with contents that are easy to recognize; (b) and (c) are frames with contents that are hard to recognize.

Figure 4.3: Blue solid curve: minimum computational consumption to correctly recognize the content in each frame (optimal model). Red dotted curve: computational consumption of the one-size-fits-all model.

The intrinsic dynamics of video contents create an opportunity to reduce computational consumption by matching the capacity of the DNN model to the difficulty level of each video frame. To quantify how much computational consumption can be reduced, we first profile the minimum computational consumption, in terms of the number of floating point operations (FLOPs), that is needed to correctly recognize the content in each frame of a 400-frame video clip. Specifically, we derive ten model variants with different capacities from MobileNetV1 by varying its numbers of layers and filters. For each frame, we select the model variant with the lowest FLOPs that is able to correctly recognize the content in that particular frame (optimal model). We then compare it, frame by frame, to the model variant with the lowest FLOPs that is able to correctly recognize the contents in all 400 frames (one-size-fits-all model).

Figure 4.3 shows our profiling result. As shown in the blue solid curve, the minimum computation consumed to correctly recognize the content in each frame changes frequently across frames. This observation strongly reflects the intrinsic dynamics of video contents illustrated in Figure 4.2. In addition, the difference between the areas under the two curves reflects the benefit brought by the optimal model. The large difference indicates that considerable computational consumption can be reduced by matching the capacity of the model to the difficulty level of each video frame.

4.2.2 Drawbacks of Existing Solutions

The benefit of computation reduction motivates dynamically changing the model capacity to adapt to the contents of video frames. To achieve content adaptation, existing solutions such as Chameleon [18] use dynamic configuration, in which multiple model variants with different capacities are installed and the system dynamically switches among them at runtime. While effective as a solution for resourceful systems, dynamic configuration is a mismatch for resource-constrained mobile platforms for the following two reasons.

First, dynamic configuration requires all the model variants with various capacities to be installed in the mobile system. This is, unfortunately, not a scalable solution and could lead to a large memory footprint. For the example used in Figure 4.3, a single MobileNetV1 model is 14 MB but the total memory footprint of ten model variants is 67 MB; the memory footprint would only increase if the number of model variants increases.
Second, dynamic configuration incurs large overheads in searching for the optimal model variant and switching models at runtime. Take model switching as an example. Model switching involves two steps: model initialization (i.e., allocating memory space for the model to switch to) and parameter loading (i.e., loading the model parameters into the allocated memory space). As shown in Figure 4.3, model switching could occur very frequently (110 times within 400 frames) because the contents of videos captured by mobile cameras can change quite drastically in a short period of time. To quantify the model switching overhead, we profile the average processing time of both model initialization and parameter loading over all the model switches that occurred in the same 400-frame video clip on the Samsung Galaxy S8 smartphone CPU. To make the result more meaningful, we also profile the average inference time per frame of the optimal model and the one-size-fits-all model.

Figure 4.4: Benefit brought by the adaptation vs. model switching overhead of the dynamic configuration approach.

Figure 4.4 shows our result. The average inference time per frame of the one-size-fits-all model and the optimal model is 79.5 ms and 44.1 ms, respectively. The difference between them measures the benefit brought by the adaptation: by using the optimal model instead of the one-size-fits-all model on each frame, we are able to reduce 44.5% of the computation. However, the average processing time of model initialization and parameter loading is 18.9 ms and 9.6 ms, respectively. This model switching overhead drops the actual computation reduction to only 8.7%, which cuts the benefit brought by the adaptation significantly.

Figure 4.5: FlexDNN architecture.

4.3 Design of FlexDNN

In this work, we present FlexDNN, a content-adaptive framework for computation-efficient DNN-based mobile video stream analytics that effectively addresses the drawbacks of existing solutions. Figure 4.5 illustrates the high-level architecture of FlexDNN. As an overview, to address the limitation caused by coarse-grained design, FlexDNN adopts a fine-grained design to make early predictions at the granularity of filters. It incorporates a collective importance-based filter ranking scheme to rank the importance of filters within each convolutional layer. Such filter ranking lays the foundation of FlexDNN and allows us to systematically generate the optimal architecture for each early exit through an architecture search scheme and to derive the optimal early exit insertion plan through an optimization formulation. Lastly, the flexible architecture of the content-adaptive computation-efficient model generated by FlexDNN provides a natural mechanism to offer a trade-off between accuracy and resource demand (cf. Section 3.4.2). Such a mechanism allows FlexDNN to adapt to both workload and system resource dynamics at runtime.

4.3.1 Filter Ranking based on Collective Importance

4.3.1.1 Motivation

The foundation of FlexDNN is to rank filters based on their importance and identify less important filters within each layer. To rank the importance of filters of a particular layer, existing work adopts an approach that ranks each filter independently based on a pre-defined importance indicator. For instance, in [1], the L2-norm is used as the indicator, based on the heuristic that important filters tend to have larger L2-norm values. Unfortunately, this approach has a key drawback: it ignores the dependence between filters in each layer. As a result, the information contained in the top-ranked filters can be highly overlapped, which is the root cause of redundancy.
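For reference, the independent ranking approach critiqued above can be sketched in a few lines; the array shape is an illustrative assumption.

```python
import numpy as np

def l2_norm_filter_ranking(conv_weights):
    """Rank filters of one convolutional layer independently by L2-norm.

    conv_weights: array of shape (num_filters, in_channels, k, k).
    Returns filter indices from most to least important. Note that two
    filters with large norms but near-identical weights both rank high,
    even though together they contribute largely redundant information.
    """
    norms = np.sqrt((conv_weights ** 2).sum(axis=(1, 2, 3)))
    return np.argsort(-norms)

# Example: 64 filters of a 3x3 conv over 32 input channels.
w = np.random.randn(64, 32, 3, 3)
print(l2_norm_filter_ranking(w)[:5])  # indices of the 5 "most important"
```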
4.3.1.2 Collective Importance-based Filter Ranking

To address the drawback of existing filter importance ranking approaches, we propose a filter importance ranking scheme that takes filter dependence into account to rank the collective importance, instead of the individual importance, of filters within each layer.

Our collective importance-based filter ranking scheme is inspired by the sequential backward selection (SBS) approach in the feature selection literature [52]. At a high level, our scheme starts with all filters, and iteratively removes from the filter set the filter whose removal least reduces the inference accuracy.

Specifically, our scheme maintains two lists, ranked and unranked, with ranked initialized as empty and unranked initialized with the full set of filters included in a convolutional layer. For each filter f_i in unranked, we temporarily drop it from unranked, add an early exit with the feature maps generated by the filters inside unranked as its input, and fine-tune the parameters of the early exit. Next, we obtain the validation accuracy of this early exit and store it in a table acc as a key-value pair, with the key being f_i and the value being the validation accuracy. The dropped filter f_i is then added back to unranked. This procedure iterates until all the filters have been gone through. Finally, the filter corresponding to the highest accuracy in acc is identified as the least important filter in unranked. Hence, we permanently drop this filter from unranked and insert it at the top of ranked. This process terminates when the number of filters in unranked falls below 1/5 of the filters of this layer, because empirically the feature maps extracted by fewer than 1/5 of the filters are not enough for an early exit to make early predictions with high confidence. We treat each of the remaining filters in unranked as equally important, and add all of these remaining filters inside unranked to the top of ranked. Based on the ranked filters inside ranked, FlexDNN is able to find the minimum number of filters that an early exit needs to achieve the same accuracy as using all the filters within each layer. This is achieved by iteratively dropping the lowest-ranked filters until the accuracy of the early exit starts to drop. (A code sketch of this procedure is given at the end of Section 4.3.1.)

4.3.1.3 Superiority of Collective Importance-based Filter Ranking

We compare our scheme to the independent filter importance ranking scheme based on the L2-norm. To do this, we use UCF-15 as the video dataset and MobileNetV1 as our base model to rank the importance of filters of each layer. We find the minimum number of filters needed to achieve the same accuracy as using all filters within each layer for both our scheme (Fine-Grained-SBS) and the independent filter importance ranking scheme based on the L2-norm (Fine-Grained-L2).

Figure 4.6: Comparison of the minimum number of filters needed to achieve the same accuracy as using all filters within each layer, between the collective importance-based ranking scheme and the scheme based on independent ranking.

Figure 4.6 shows the result. Due to space limitations, we do not show the results of L11 to L13 since they are similar to those of L10. As shown, the minimum number of filters obtained by our collective importance-based filter ranking scheme is lower than that obtained with the L2-norm across all the layers. In particular, our scheme is able to identify up to 28.1% more redundant filters compared to the L2-norm. This is because our scheme accounts for the dependence between filters within each layer and thus only selects filters that contain complementary information.
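The following sketch restates the ranking procedure above in code. The helpers attach_exit_and_finetune and validation_accuracy stand in for the early-exit fine-tuning and evaluation steps described in the text and are assumptions for illustration.

```python
def rank_filters_collective(filter_ids, attach_exit_and_finetune,
                            validation_accuracy):
    """Sequential backward selection over one conv layer's filters.

    attach_exit_and_finetune(subset) -> early exit head fine-tuned on
    the feature maps of `subset`; validation_accuracy(head) -> accuracy.
    Returns filter ids ordered from most to least important.
    """
    ranked, unranked = [], list(filter_ids)
    while len(unranked) > len(filter_ids) / 5:  # stop near 1/5 remaining
        acc = {}
        for f in unranked:
            trial = [g for g in unranked if g != f]   # temporarily drop f
            head = attach_exit_and_finetune(trial)
            acc[f] = validation_accuracy(head)
        least = max(acc, key=acc.get)    # dropping it hurts accuracy least
        unranked.remove(least)           # permanently drop it ...
        ranked.insert(0, least)          # ... and push it to the top of ranked
    # The survivors are treated as equally (and most) important.
    return unranked + ranked
```

Each outer iteration removes exactly one filter, so the loop performs O(N^2) early-exit fine-tunings for a layer of N filters; this is a one-time offline cost paid when the model is generated.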
4.3.2 Searching for the Optimal Early Exit Architecture

4.3.2.1 Motivation

An early exit is essentially a self-contained neural network that can make early predictions. However, designing the architecture of early exits is not trivial: early exits consume computation, and computation consumed by frames that fail to exit is wasted and becomes overhead. As such, it is necessary to minimize such overheads by using the smallest number of layers and filters to build each early exit. However, early exits with such extremely lightweight architectures could exit far fewer frames, diminishing the benefit of early exits. Therefore, there exists a trade-off between early exit rates and computational overheads in the design space of early exit architectures. Moreover, the locations at which early exits are inserted also affect the early exit rates. This requires us to design the architecture of each early exit based on its insertion location, which makes the task of early exit architecture design even more challenging.

Existing work such as BranchyNet designs the architecture of early exits based on heuristics and considerable trial-and-error effort, but does not provide detailed design guidelines due to the complexity of this design task. As a result, this approach may not generalize to other models. Moreover, it forces developers to have decent knowledge of DNN architectures and to spend considerable effort on trial and error, prohibiting its wide adoption in practice.

4.3.2.2 Early Exit Architecture Search Scheme

To address this issue, we adopt a fundamentally different approach. Instead of manually designing the architecture based on heuristics and trial and error, we propose a scheme based on architecture search to find, for each early exit, the optimal architecture that optimizes the trade-off between early exit rate and computational overhead.

In general, network architecture search techniques can be grouped into two categories: 1) the bottom-up approach that searches for an optimal cell structure based on reinforcement learning (RL) or genetic evolution and stacks cells together to form a network; and 2) the top-down approach that prunes an over-parameterized network until the optimal network architecture is found [1, 56]. Although both approaches work in our scenario, in this work we select the top-down approach due to its more efficient search process.

At a high level, our early exit architecture search scheme starts by inserting an over-parameterized early exit at every possible location in a DNN model. It then maximizes the early exit rate by training with emphasis on important filters, followed by minimizing the computational overheads of early exits via iterative layer and filter pruning. In doing so, the best trade-off between early exit rate and computational overhead is achieved for each individual early exit. We explain the details of each stage below.

Stage-1: Insert Over-Parameterized Early Exits. For each layer, we insert M early exits, where the k-th early exit uses the first ⌈kN/M⌉ most important filters in ranked as input, and N denotes the total number of filters of this layer. We initialize each early exit with four layers, with each layer having twice as many filters as its corresponding previous layer. We increase the number of filters by a factor of two so as to properly encode the increasingly richer representations as we go deeper. A similar setup can be found in popular DNNs such as MobileNets and InceptionV3.
Stage-2: Maximize Early Exit Rates by Training with Emphasis on Important Filters. Given how we insert early exits in the previous step, the higher a filter is ranked in ranked, the more frequently the filter is used by early exits. To force these frequently used filters to learn more salient features, we train those filters with emphasis by assigning them a lower dropout rate compared to filters that are ranked lower. In doing so, this process essentially maximizes the exit rate of each early exit. We detail the definition of early exit rate in Section 4.3.2.3.

Stage-3: Minimize Overheads by Pruning Layers and Filters. Although the exit rate of each early exit is maximized, the overhead of each over-parameterized early exit is still significant due to over-parameterization. FlexDNN minimizes this overhead by iteratively pruning layers and filters of each early exit until its exit rate starts to drop. Specifically, for each early exit, we start with layer-wise pruning by iteratively pruning its layers until its exit rate drops. We then apply filter-wise pruning by iteratively pruning lower-ranked filters within each remaining layer until its exit rate drops. As a result, the architecture of each inserted early exit achieves the optimized trade-off between the exit rate and computational overhead.

4.3.2.3 Early Exit Rate

The early exit rate of a given early exit reflects the probability of a frame being exited to avoid further computation. A frame is exited if its confidence score is higher than a pre-defined threshold. Formally, our confidence score is defined as

Conf(y) = 1 + \frac{1}{\log C} \sum_{c \in C} y_c \log y_c,    (4.1)

where y = [y_1, y_2, \ldots, y_c, \ldots, y_C] is the softmax classification probability vector generated by an early exit, C is the total number of classes, and \sum_{c \in C} y_c \log y_c is the negative entropy of the softmax classification probability distribution over all the classes. The threshold for each early exit such that frames are exited without loss of accuracy can be obtained using cross-validation [19]. We denote those thresholds as

T_{lossless} = (T_{EE_1}, \ldots, T_{EE_i}, \ldots, T_{EE_K}),    (4.2)

where EE_i denotes the i-th early exit of \widetilde{Net}, and K is the total number of early exits inserted in \widetilde{Net}.
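As a quick illustration of Eq. (4.1), the sketch below computes the confidence score for a softmax output and applies a per-exit threshold; the probability vectors and threshold value are illustrative.

```python
import math

def conf(y):
    """Confidence score of Eq. (4.1): 1 + (1/log C) * sum_c y_c * log y_c.

    Equals 1 for a one-hot (fully confident) distribution and 0 for a
    uniform (fully uncertain) one over C classes.
    """
    C = len(y)
    neg_entropy = sum(p * math.log(p) for p in y if p > 0.0)
    return 1.0 + neg_entropy / math.log(C)

# A peaked distribution exits; a flat one continues to deeper layers.
easy = [0.96, 0.02, 0.01, 0.01]
hard = [0.30, 0.28, 0.22, 0.20]
T_EE = 0.6  # illustrative per-exit threshold from cross-validation
for y in (easy, hard):
    print(f"Conf={conf(y):.2f} exit={conf(y) >= T_EE}")
```

For the easy vector the score is about 0.85 and the frame exits; for the hard vector it is about 0.01 and the frame proceeds deeper into the network.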
4.3.3 Early Exit Insertion Plan

4.3.3.1 Motivation

Although the optimized trade-off between the exit rate and computational overhead of each individual early exit is achieved by our architecture search scheme explained above, the optimized trade-off for the entire network is not. This is because early exits have been inserted at every possible location throughout the network and hence accumulate immense overhead altogether.

To obtain the globally optimized trade-off, in contrast to BranchyNet and MSDNet, which manually determine the early exit insertion locations by trial and error, FlexDNN adopts a systematic approach to derive an optimal early exit insertion plan. At a high level, we formulate the derivation of the early exit insertion plan as an optimization problem and solve it using an efficient greedy heuristic that prunes inefficient early exits in an iterative manner.

4.3.3.2 Problem Formulation

FlexDNN formulates the derivation of the early exit insertion plan as the following optimization problem:

Net = \arg\min_{\widetilde{Net}} Res(\widetilde{Net}),    (4.3)

where Net is the model with the optimal early exit insertion plan, \widetilde{Net} ranges over the set of candidates with all the possible insertion combinations, and Res(\cdot) evaluates the computational consumption of a specific insertion plan.

Solving Eq. (4.3) by searching all the possible insertion plans is computationally expensive because there are 2^K combinations, where K is the total number of insertion locations. To reduce the complexity of this process, FlexDNN utilizes a greedy heuristic to obtain an approximate solution based on the following key observation.

When a model is densely inserted with early exits, pruning any of the early exits leads to a reduction of computational consumption. Among all early exits, pruning an early exit with the smallest early exit rate and a large computational overhead (i.e., an inefficient early exit) leads to the largest reduction of computational consumption. Based on this observation, FlexDNN greedily prunes these inefficient early exits in an iterative manner. FlexDNN terminates this iterative process when the computational consumption of the model starts to increase. This is because at this stage, all the remaining early exits are contributing to the computational efficiency and hence are beneficial to the model. As such, the remaining early exits represent the optimal early exit insertion plan.

4.3.3.3 Efficiency of Early Exit

To identify inefficient early exits, we define a metric Q that quantifies the quality of the trade-off between the early exit rate and the computational overhead of a particular early exit. Specifically, for early exit j, we define its quality Q_j as the ratio between the gain G_j it brings and the cost C_j it incurs:

Q_j = G_j / C_j,    (4.4)

where C_j is the computation consumed by this early exit, and G_j is the computation avoided due to the existence of the early exit. Both C_j and G_j are measured by the number of floating point operations.

Figure 4.7: Illustration of the derivation of the quality of an early exit.

Figure 4.7 illustrates how G_j and C_j are calculated. In particular, the figure shows three consecutive early exits inserted at the i-th, j-th, and k-th early exit positions of the base model (i