A relation-based contextual approach for efficient multimedia analysis

E. Spyrou, G. Tolias, Ph. Mylonas
Image, Video and Multimedia Laboratory
National Technical University of Athens
Zographou Campus, PC 15780, Athens, Greece
{espyrou, gtolias, fmylonas}@image.ntua.gr

Abstract

In this paper we present our research work on the identification of high-level concepts within multimedia documents through the introduction and utilization of contextual relations. A conceptual ontology is introduced as the means of exploiting the visual context of images, in terms of high-level concepts and the region types they consist of. A meaningful combination of these features results in a computationally efficient handling of visual context and extraction of mid-level characteristics towards the ultimate goal of semantic multimedia analysis. Evaluation results are presented on a medium-size dataset, consisting of 1435 images, 25 region types and 6 high-level concepts derived from the beach domain of interest.

1 Introduction

Most current content-based image analysis and retrieval systems are limited by the existing state of the art in image understanding, in the sense that they usually fall short of higher-level interpretation and exploitation of contextual knowledge. Combining the latter with traditional image or scene classification techniques, in order to achieve better semantic results during the content analysis phase, forms a challenging and broad research problem. It was only recently that multimedia analysis systems started using semantic knowledge technologies, as defined by notions like ontologies [17] or folksonomies [6].

Among the most interesting tasks in multimedia content analysis is the detection of high-level concepts within multimedia documents. Acknowledging the need for such an analysis, many research efforts focus on low-level feature extraction as a way to efficiently describe the various audiovisual characteristics of a multimedia document. However, the widely discussed "semantic gap" [13] characterizes the differences between descriptions of a multimedia object by different representations and the linking from low- to high-level features. An important step towards narrowing this gap is to automate the process of semantic feature extraction from multimedia content objects, by enhancing image and video classification with semantic characteristics and knowledge.

Plenty of indicative works exist towards the solution of this problem. In [5] a multimedia analysis and retrieval system is presented, using multi-modal machine learning techniques in order to model semantic concepts in videos. Moreover, in [14] a region-based approach to content retrieval that uses Latent Semantic Indexing (LSI) techniques is proposed. In [11] the features are extracted from segmented regions of an image. Also, in [19] a region-based approach is presented that uses knowledge encoded in the form of an ontology. In [1] a hybrid thesaurus approach is presented. Finally, a lexicon used in an approach for an interactive video retrieval system is presented in [4].

Furthermore, the aspect of contextual knowledge in multimedia is introduced in [18] and [8], as an extra source of information for both object detection and scene classification. Recent research efforts have investigated certain types of context, such as spatial context [20] (i.e. topological relationships between regions in the same scene), temporal context [3] (within video sequences or between images belonging to a particular image collection), or imaging context [2], in conjunction with image content features in the form of either low-level features (e.g. color, texture, shape) or semantic concepts (e.g.
sky, vegetation, sand, and sea). In the rest of this paper we shall refer to the term visual context, interpreting it as all information related to the visual scene content of a still image or video sequence that may be useful during its analysis phase.

The structure of this paper is as follows: Section 2 deals with the proposed mid-level conceptualization. In Section 3 the overall fuzzy context knowledge formalization is described, together with the proposed contextual adaptation in terms of the visual context algorithm and its optimization steps. Section 4 lists our experimental results derived from the beach domain and Section 5 briefly concludes our work.

2 Concepts' Detection using a Region Thesaurus

In this section we propose to tackle the problem of high-level concept detection in an innovative way based on mid-level information and features [15]. This research effort has been initially discussed in [9] and [16] and is further expanded and strengthened herein. In general, the visual features that may be extracted from a still image or video document can be divided into two major categories: low-level visual features, which may provide a qualitative or quantitative description of the visual properties, and high-level features, which describe the visual content of an image in terms of its semantics. One fundamental difference between those categories is that low-level features may be calculated directly from an image or video, while high-level features cannot be directly extracted but are often determined by exploiting the low-level features. In the following we shall briefly describe the extraction of low-level features and the construction of a corresponding region thesaurus, whereas Figure 1 presents the overall methodology.

Figure 1. High-level concept detection algorithm.

After a trivial step of basic MPEG-7 color and texture feature extraction, the next important step aims to bridge the low-level features to the high-level concepts. To achieve this, we construct a visual dictionary and with its aid we form a mid-level image description. This description contains all the necessary information to connect an image with every visual word of the dictionary. This way, we keep a fixed-size image description and face the problem that the number of segmented regions is not fixed a priori. Moreover, this mid-level description will prove useful when contextual relations are exploited.

Initially we select the appropriate region types. We start from an arbitrarily large number of segmented regions and apply an efficient hierarchical clustering algorithm [10] on them. After this process, each cluster may or may not contain a high-level feature, and each high-level feature may be contained in one or more clusters; i.e. the concept sand may be represented by many instances differing e.g. in color or texture. Moreover, in a cluster that contains instances of a semantic entity (e.g. sea), these instances could be mixed up with parts of another visually similar concept (e.g. sky). Finally, we select as the region type representing each cluster the region closest to its centroid.

We move forward by formally describing the constructed visual dictionary (thesaurus) T as a set of visual words w_i by equation (1).
T = \{ w_i : i = 1 \dots N_T \}, \quad w_i \subseteq R    (1)

\bigcup_i w_i = R    (2)

w_i \cap w_j = \emptyset, \quad i \neq j    (3)

where N_T is the number of region types of the thesaurus (and, obviously, the number of clusters) and w_i is the i-th cluster, which is a set of regions that belong to R, as presented in equation (1). The region types are the centroids of the clusters (4), and the remaining feature vectors of a cluster are their synonyms. By using a significantly large training set of images/keyframes, the entire thesaurus is constructed. Its purpose is to formalize a conceptualization between the low- and the high-level features and facilitate their association.

z(w_i) = \frac{1}{|w_i|} \sum_{r \in w_i} f(r)    (4)

f(w_i) = f\left( \arg\min_{r \in w_i} d\left( f(r), z(w_i) \right) \right)    (5)

Each region type is represented by its feature vector, which contains all the extracted low-level information for it (5). As is rather obvious, a low-level descriptor does not carry any semantic information; it only constitutes a formal representation of the extracted visual features of the region. On the other hand, a high-level concept carries only semantic information. Thus, a region type lies in between those features. It contains the necessary information to formally describe the color and texture features, but can also be described with a lower description than the high-level concepts; i.e., one can describe a region type as "a green region with a coarse texture".

Having calculated the distance of each region of the image to all the words of the constructed thesaurus, the model vector that semantically describes the visual content of the image is formed by keeping the smallest distance for each mid-level concept (region type). In particular, the model vector describing image k_i will be:

m_i = \left[ m_i(1), \dots, m_i(j), \dots, m_i(N_T) \right], \quad i = 1 \dots N_K    (6)

where:

m_i(j) = \min_{r \in R(k_i)} d\left( f(w_j), f(r) \right), \quad i = 1 \dots N_K, \; j = 1 \dots N_T    (7)

Each model vector is denoted by m_i \in M, i = 1 \dots N_K, where M is the set of all model vectors and m_i is the model vector of image/keyframe k_i. More specifically, the j-th element of a model vector contains the minimum distance amongst all distances between the j-th region type and all of the image's regions (7).

After extracting model vectors from all images of the (annotated) training set, a neural network-based detector is trained separately for each high-level concept. The input of the detectors is a model vector m_i describing an image/keyframe in terms of the region thesaurus. The output of the network is the confidence that the image/keyframe contains the specific concept. It is important to clarify that the detectors are trained based on annotation per image and not per region. The same holds for their output; thus they provide the confidence that the specific concept exists somewhere within the image/keyframe in question.
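As a rough illustration of the above pipeline, the following Python sketch builds a region thesaurus and computes a model vector for one image. It is only a minimal sketch under assumptions: scikit-learn's AgglomerativeClustering stands in for the hierarchical clustering of [10], plain Euclidean distance stands in for the metric d, and all function and variable names are illustrative rather than taken from the original system.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering  # stand-in for the clustering of [10]

    def build_region_thesaurus(region_features, n_types):
        # Cluster low-level region descriptors; for each cluster keep the feature
        # vector of the region closest to the cluster centroid (eqs. (4)-(5)).
        labels = AgglomerativeClustering(n_clusters=n_types).fit_predict(region_features)
        region_types = []
        for i in range(n_types):
            members = region_features[labels == i]
            centroid = members.mean(axis=0)                                    # z(w_i), eq. (4)
            closest = members[np.argmin(np.linalg.norm(members - centroid, axis=1))]
            region_types.append(closest)                                       # f(w_i), eq. (5)
        return np.vstack(region_types)

    def model_vector(image_regions, region_types):
        # Element j is the minimum distance between region type w_j and any region
        # of the image (eqs. (6)-(7)); the result has fixed size N_T.
        d = np.linalg.norm(image_regions[:, None, :] - region_types[None, :, :], axis=2)
        return d.min(axis=0)

Each such model vector would then be fed to one binary neural network detector per high-level concept, trained on image-level annotations as described above.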
3 Mid-level Visual Context

In order to fully exploit the notion of visual context and combine it with the aforementioned mid-level region types, we further refine the initial high-level concept detection results by exploiting solely contextual (as opposed to visual) relations between high-level concepts. This approach differentiates itself from most related research works because it deals with a global interpretation of the image and the concepts present in it. In other words, high-level concepts either exist or do not exist within the entire image under consideration, and not within a specific region of interest (e.g., the image might contain the concept water, but there is no information regarding its spatial allocation). The same approach is adopted by the well-known TRECVID experiment series [12].

In order to further adapt the results of low-level, descriptor-based multimedia analysis utilizing the notion of mid-level region types, we introduce a concept-based method founded on an enhanced high-level contextual ontology; the latter is described as a set of concepts and semantic relations between concepts within a given universe. In general, we may decompose such an ontology O_C into two parts, i.e.

1. the set C of all semantic concepts c_i \in C, i = 1 \dots n, and
2. the set R_{c_i,c_j} of all semantic relations amongst any two given concepts c_i, c_j, j = 1 \dots n.

More formally:

O_C = \{ C, R_{c_i,c_j} \}, \quad R_{c_i,c_j} : C \times C \to \{0, 1\}    (8)

The utilized relations need to be meaningfully combined, so as to provide a view of the knowledge that suffices for context definition and estimation. Since the modelling of real-life information is usually governed by uncertainty and ambiguity, it is our belief that these relations must incorporate fuzziness in their definition. The constructed ontology may be described by the "fuzzified" version of the concept ontology (eq. 9), where C again represents the set of all possible concepts, F(R_{c_i,c_j}) = r_{c_i,c_j} : C \times C \to [0, 1] denotes a fuzzy ontological relation amongst two concepts c_i, c_j and R_{c_i,c_j} denotes the non-fuzzy semantic relation amongst the two concepts. The final combination of the MPEG-7 originating relations forms an RDF graph and constitutes the abstract contextual knowledge model to be used (Fig. 2).

O_C^F = \{ C, r_{c_i,c_j} \}, \quad i, j = 1 \dots n, \quad r_{c_i,c_j} : C \times C \to [0, 1]    (9)

Herein, r \in \mathcal{R} denotes a fuzzy ontological relation and

\mathcal{R} = \{ Sp, P, Ex, Ins, Loc, Pat, Pr \}    (10)

denotes the set of all available relations. A meaningful combination of relations is described by:

C_{ij} = \bigcup_{r \in \mathcal{R}} r_{c_i,c_j}^{p_{r,ij}}, \quad p_{r,ij} \in \{-1, 0, 1\}, \quad i = 1 \dots n    (11)

The value of p_{r,ij} is determined by the semantics of each relation R_{c_i,c_j} used in the construction of C_{ij}. We remind that:

- p_{r,ij} = 1, if the semantics of r_{c_i,c_j} imply it should be considered as is;
- p_{r,ij} = -1, if the semantics of r_{c_i,c_j} imply its inverse should be considered;
- p_{r,ij} = 0, if the semantics of r_{c_i,c_j} do not allow its participation in the construction of the combined relation C_{ij}.
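The combination in equation (11) can be pictured with a short Python sketch. This is only an illustration under assumptions: each relation is held as an n x n matrix of membership degrees, the inverse of a relation is taken as its transpose, and the fuzzy union is taken as element-wise max (the paper itself only refers to default fuzzy-algebra operators); the names below are not from the original work.

    import numpy as np

    def combine_relations(relations, p):
        # relations: dict mapping a relation name in {Sp, P, Ex, Ins, Loc, Pat, Pr}
        #            to an (n x n) matrix r with r[i, j] in [0, 1].
        # p:         dict mapping the same names to an exponent in {-1, 0, 1}, eq. (11).
        n = next(iter(relations.values())).shape[0]
        combined = np.zeros((n, n))
        for name, r in relations.items():
            if p[name] == 0:
                continue                            # relation excluded from the combination
            r_used = r if p[name] == 1 else r.T     # p = -1: consider the inverse relation
            combined = np.maximum(combined, r_used) # fuzzy union taken here as max
        return combined

In the beach domain of Figure 2, for instance, such matrices would encode that sea relates strongly to sand and sky, and the combined relation C_ij carries those degrees into the contextualization step of Section 3.1.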
Table 1. Fuzzy semantic relations between concepts.

  Name            Inverse          Symbol      Meaning
  Specialization  Generalization   Sp(a,b)     b is a specialization in the meaning of a
  Part            PartOf           P(a,b)      b is a part of a
  Example         ExampleOf        Ex(a,b)     b is an example of a
  Instrument      InstrumentOf     Ins(a,b)    b is an instrument of or is employed by a
  Location        LocationOf       Loc(a,b)    b is the location of a
  Patient         PatientOf        Pat(a,b)    b is affected by or undergoes the action of a
  Property        PropertyOf       Pr(a,b)     b is a property of a

The final ontology that results from the combination of the aforementioned relations is denoted by:

O_C^C = \{ C, C_{ij} \}, \quad i, j = 1, \dots, n, \; i \neq j    (12)

The graph of the proposed model contains nodes (i.e. domain concepts) and edges (i.e. an appropriate combination^1 of contextual fuzzy relations between concepts). The degree of confidence of each edge represents fuzziness in the model. Non-existing edges imply non-existing relations (i.e. relations with zero confidence values are omitted). An existing edge between a given pair of concepts is produced based on the set of contextual fuzzy relations that are meaningful for the particular pair. Each concept has a different probability of appearing in the scene, thus a flat context model would not have been sufficient in this case; on the contrary, concepts are related to each other, implying that the graph relations used are in fact transitive. The degree of confidence is implemented using the RDF reification technique [21].

Figure 2. A fragment of the beach domain ontology. Concept beach is the "root" element.

^1 The combination of different contextual fuzzy relations towards the generation of a practically exploitable knowledge view is conducted by utilizing fuzzy algebra's operations in general and the default t-norm in particular.

3.1 Contextualization effect

Once the contextual knowledge structure is finalized and the corresponding representation is implemented, a variation of the context-based confidence value readjustment algorithm [7] is applied on the output of the neural network-based classifier. The proposed contextualization approach adds a post-processing step on top of the initial set of extracted mid-level region types. It provides an optimized re-estimation of the initial concepts' degrees of confidence for each region type and updates each model vector. In the process, it utilizes the high-level contextual knowledge from the constructed contextual ontology.

An estimation of each concept's degree of membership is derived from direct and indirect relationships of the concept with other concepts, using a meaningful compatibility indicator or distance metric. Again, depending on the nature of the domains provided in the domain ontology, the best indicator could be selected using the max or the min operator, respectively. The general structure of the degree of membership re-evaluation algorithm is as follows:

1. The considered domain imposes the use of a domain (dis-)similarity measure \lambda \in [0, 1].
2. For each region type t consider a fuzzy set L_t with a degree of membership \mu_t(c), containing the possible concepts' degrees of confidence.
3. For each concept c_i in the fuzzy set L_t with a degree of membership \mu_t(c_i), obtain the particular contextual information in the form of its relations to the set of any other concepts: \{ r_{c_i,c_j} : c_i, c_j \in C, i \neq j \}.
4. Calculate the new degree of membership \mu_t(c), taking into account each domain's similarity measure. In the case of multiple concept relations in the ontology, when relating concept c to more than the root concept (Fig. 2), an intermediate aggregation step should be applied for the estimation of \mu_t(c) by considering the context relevance notion, cr_c = \max \{ r_{c,c_1}, \dots, r_{c,c_k} \}, c_1 \dots c_k \in C. We express the calculation of \mu_t(c) with the recursive formula:

\mu_t^n(c) = \mu_t^{n-1}(c) - \lambda \left( \mu_t^{n-1}(c) - cr_c \right)    (13)

where n denotes the iteration used. Equivalently, for an arbitrary iteration n:

\mu_t^n(c) = (1 - \lambda)^n \cdot \mu_t^0(c) + \left( 1 - (1 - \lambda)^n \right) \cdot cr_c    (14)

where \mu_t^0(c) represents the initial degree of membership for concept c.
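Under the same assumptions as the previous sketches (the combined relation held as an n x n matrix, names purely illustrative), the re-estimation of equations (13)-(14) can be written compactly in Python; the default values of lam and n_iter below are those reported for the beach domain in Section 4.

    import numpy as np

    def contextualize(mu0, combined, lam=0.125, n_iter=3):
        # mu0:      initial degrees of confidence, one per concept (e.g. detector outputs).
        # combined: combined contextual relation C_ij of eq. (11), an (n x n) matrix.
        mu = np.asarray(mu0, dtype=float)
        n = len(mu)
        # Context relevance cr_c: the strongest relation of concept c to any other concept.
        cr = np.array([max(combined[c, j] for j in range(n) if j != c) for c in range(n)])
        for _ in range(n_iter):
            mu = mu - lam * (mu - cr)      # eq. (13); closed form given by eq. (14)
        return mu

Concepts that are strongly supported by their contextual neighbours are pulled towards higher confidence, while weakly supported ones are suppressed, which is the favouring/disfavouring behaviour discussed in Section 4.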
4 Experimental Results

In this section we provide experimental results facilitating the conceptual approach, and we shall try to demonstrate the usefulness of the visual context algorithm when applied to real-life multimedia content problems and data. We carried out a set of experiments utilizing a set of 1435 images, 25 region types and 6 high-level concepts {sea, wave, sky, sand, rock, vegetation} derived from the beach domain. All images were acquired from our personal image collections and the Internet. We further utilized a clustering training set of 300 images (i.e. merely 21% of the dataset) and selected \lambda = 0.125 as the best normalization parameter for the considered domain of interest. Typically, n = 3 iterations were used.

Evaluation results for the 6 high-level beach concepts are presented in Table 2. Each concept's row displays the precision and recall values before and after the use of context. Observing the results, it is rather obvious that the proposed contextualization algorithm exploits semantic relations in order to favor or disfavor the degrees of confidence for the detection of a concept that exists within an image. Thus, it strengthens the concepts' differences, while at the same time treating smoothly the confidence values of almost certain concepts (e.g. sea). Finally, exploiting the constructed ontological knowledge, the algorithm is able to disambiguate cases of similar concepts, or even concepts that are difficult to detect solely on the basis of traditional low-level analysis steps.

5 Conclusions

Our research effort clearly indicates that high-level concepts can be efficiently detected when an image is represented by a mid-level model vector, with the aid of a visual thesaurus and additional contextual knowledge. Amongst the core contributions of this work has been the implementation of a novel mid-level visual context interpretation utilizing a fuzzy, ontology-based representation of knowledge. Experimental results were presented, indicating a significant high-level concept detection improvement (i.e. precision improvement per concept varies from 6.98% to 25.86%) over the entire dataset utilized.

6 Acknowledgements

This work was partially supported by the European Commission under contracts FP6-027026 K-Space, FP6-027685 MESH and FP7-215453 WeKnowIt. Evaggelos Spyrou is partially funded by PENED 2003 Project Ontomedia 03ED475.

References

[1] N. Boujemaa, F. Fleuret, V. G., H. Sahbi: Visual content extraction for automatic semantic annotation of video news. In IS&T/SPIE Conf. on Storage and Retrieval Methods and Applications for Multimedia, part of Electronic Imaging Symposium, 2004.
[2] M. Boutell and J. Luo, "Bayesian fusion of camera metadata cues in semantic scene classification", in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Washington, DC, 2004, vol. 2, pp. 623-630.

[3] M. Boutell, J. Luo, C. M. Brown, "A generalized temporal context model for classifying image collections", ACM Multimedia Syst. J., 11(1), pp. 82-92, Nov. 2005.

[4] C. G. M. Snoek, M. Worring, D. C. K., A. W. Smeulders: Learned lexicon-driven interactive video retrieval, 2006.

[5] IBM: Marvel: multimedia analysis and retrieval system.

[6] A. Mathes, "Folksonomies - Cooperative Classification and Communication Through Shared Metadata", Computer Mediated Communication - LIS590CMC, Graduate School of Library and Information Science, University of Illinois Urbana-Champaign, 2004.

[7] Ph. Mylonas, Th. Athanasiadis and Y. Avrithis, "Improving image analysis using a contextual approach", in Proc. of 7th International Workshop on