Linear mixed modeling for data from a double mixed factorial design with covariates. A case study on semantic categorization response times.

Jorge González B.
Department of Statistics, Pontificia Universidad Católica de Chile, Santiago, Chile.

Paul De Boeck
Ohio State University, US, and KU Leuven, Belgium.

Francis Tuerlinckx
KU Leuven, Belgium.

Summary. Linear mixed modeling is a useful approach for double mixed factorial designs with covariates. It is explained how these designs are appropriate for the study of human behavior as a function of characteristics of persons and situations and stimuli in the situations. The behavior of subjects nested in types of persons responding to stimuli nested in types of stimuli defines a mixed factorial design. The inclusion of additional covariates of the observational units can help to further explain the behavior under study. A linear mixed modeling approach for such designs allows a combined focus on fixed effects (general effects) and individual and stimulus differences in these effects. This combination has the potential to advance the integration of two different sub-disciplines of psychology: general psychology and differential psychology, so that they can borrow strength from each other. An application is presented with semantic categorization response time data from a factorial design with age groups by word types and with age of acquisition as an additional covariate of the words. The results throw light on the processes underlying the effect of age of acquisition and on individual differences and word differences.

1. Introduction

Human behavior is the topic of study in psychology and other social sciences. Behavior is always behavior of a person in a context. This has been summarized symbolically in a famous expression by Lewin (1943): B = f(P, E), where B, P, and E stand for behavior, person, and environment or context, respectively.
Often the total context is too broad and more specific elements or aspects from the context are focused on. A generic term for the rather specific elements or aspects from the context is stimulus. Examples of stimuli are other persons (social behavior), commercial products (consumer behavior), words (linguistic behavior), etc. The study of behavior consists of relating behavior to characteristics of the persons and of the context and stimuli therein. Roughly speaking, the characteristics can be categorical or continuous. In the application, the response time of semantic categorizations of nouns will be studied as a function of age group (young vs. old adults) and type of object denoted by nouns (natural object vs. artifact object). The nouns are the stimuli. In other applications the stimuli can be persons, cognitive problems, emotional cues, etc. Given that categorical characteristics of persons and stimuli are focused on, it is evident that multiple instantiations of each category should be included.

This article has been published. Please cite it as: González, J., De Boeck, P., and Tuerlinckx, F. (2014). Linear mixed modeling for data from a double mixed factorial design with covariates. A case study on semantic categorization response times. Journal of the Royal Statistical Society: Series C (Applied Statistics), 63(2), 289-302.

The principles of random sampling and random assignment, as a basis for generalization and for causal inference, respectively, apply in a double sense when a double mixed factorial design is used. Random sampling for the double mixed factorial design refers to both sides of the design. For the application, these are persons and nouns. Random assignment means that the elements are randomly assigned to the levels of the factors.
For the application, this would mean that the persons are randomly assigned to age groups and that the nouns are randomly assigned to the natural object category and the artifact category. This is of course not possible. Age and word type are not manipulable. Hence, the application is an observational but not an experimental study. Although under certain conditions causal inference is also possible from observational studies (McGue et al., 2010), we do not plan to draw causal conclusions. When random assignment is not possible, the random sampling principle refers to sampling from the levels of the factors, in our case from the two age groups (two subpopulations of persons) and from the two word types (two subpopulations of nouns). As will be described in subsection 2.2, random sampling was not fully realized in the application.

Independent of the observational or experimental nature of the study, the designs that are described are double mixed factorial designs. The categories and the experimental conditions define the levels of fixed factors and the sampled units, the subjects and the stimuli, define the random factors. Because the mixed nature of the design (fixed and random) applies to the person side and to the stimulus side of the design, we use the terms double and mixed. The term factorial is added because it is commonly used in the social sciences and in an ANOVA context for categorical explanatory variables. This format of the study design is nicely in line with the widely accepted understanding that behavior is a function of the person and the environment as stated in the above mentioned expression B = f(P, E) (Lewin, 1943).

Not only nominal category variables are of interest in the study of human behavior but also continuous aspects of persons and stimuli. In the application, the age at which a word is acquired (on average) will be investigated as a potential factor in a semantic categorization task.
In line with the terminology for analysis of covariance, the term covariate will be used for such continuous explanatory variables. When combined with the study design as described thus far, the resulting design will be called a double mixed factorial design with covariates. From the modeling perspective we will describe next, the (categorical) factors and the covariates are all covariates, or factors, or whatever term one would prefer. Independent of their categorical or continuous nature and whether balanced or not, they are the explanatory variables (independent variables) for a given response variable (dependent variable).

Whether the study design is observational or experimental, the statistical models that can be applied are the same. If the response variable is continuous, linear mixed modeling (LMM) would be appropriate, with fixed effects of the categorical factors (e.g., age group and word type) and the covariates (e.g., age of acquisition) and with random effects due to the random factors (subjects, stimuli) nested within the levels of the fixed factors. This is an instance of LMM for cross-classified data. This conceptually rather simple study design leads to a large number of possible model specifications, especially regarding the random effects. For example, for the random person effects, the structure can be as simple as a random intercept with equal variance in the two age groups, but it can also be as complex as including a random slope and a random intercept with a different covariance structure depending on the age group. The full complexity of random effect structures is described in Section 3.

The random effect structure is of importance for mainly two reasons. First, the statistical test results for the fixed effects may depend on the assumed structure of the random effects. Practices largely differ in this regard, as one may derive from an overview by Barr et al. (2013).
The latter recommend a maximal structure, but the design they considered is simpler than in the present application. Second, while the random effects are often considered as a background one needs to take into account for a proper test of the fixed effects, we believe they are often also a valuable source of information. In psychology, a division can be noticed between the study of general phenomena, as in cognitive and social psychology, where one is interested in the first place in fixed effects, and the study of individual differences, as in the domain of intelligence and personality, where one is interested in latent traits. See Cronbach (1957) for a discussion. Mixed modeling offers an opportunity to integrate these two approaches to some extent, by focusing on general (fixed) effects and individual differences (random effects) at the same time. With a mixed model for cross-classified data one can even go one step further and also investigate stimulus differences and not just individual differences in how they relate to people's behavior.

The paper is organized as follows. We first give a description of the research questions and the data set in Section 2. In Section 3 we describe the modeling approach and how aspects of the models are linked with the research questions. In Section 4 the data are analyzed based on the proposed modeling approach. The paper concludes with a discussion and final comments in Section 5.

2. Research questions and data

2.1. Research questions

The data are from a study by De Deyne and Storms (2007) focusing on how the age of acquisition of words is related to the response time in a semantic categorization task. The log of the response time is the response variable. The design is a two-by-two design: Age Group (A = young or old) by Word Type (T = artifact or natural), with subjects (s) nested in the two levels of A, and with words (w) nested in the two levels of T. All subjects are crossed with all words.
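The crossing and nesting just described can be made concrete with a small sketch. This is only an illustration, not the authors' code: the function name `build_design` and the field names are hypothetical, and the default counts (21 subjects per age group, 80 words per word type) are taken from the description in subsection 2.2, before any words are removed from the analysis.

```python
from itertools import product


def build_design(n_subjects_per_group=21, n_words_per_type=80):
    """Return one row per subject-by-word cell of the crossed design.

    Names and default counts are illustrative assumptions.
    """
    # Subjects are nested in Age Group: each subject has exactly one level of A.
    subjects = [
        {"subject": f"{group}_{i}", "age_group": group}
        for group in ("young", "old")
        for i in range(n_subjects_per_group)
    ]
    # Words are nested in Word Type: each word has exactly one level of T.
    words = [
        {"word": f"{word_type}_{j}", "word_type": word_type}
        for word_type in ("artifact", "natural")
        for j in range(n_words_per_type)
    ]
    # All subjects are crossed with all words.
    return [{**s, **w} for s, w in product(subjects, words)]


rows = build_design()
```

Each row corresponds to one potential observation (one subject responding to one word), which is the long format that mixed-model software for cross-classified data typically expects.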
Also three continuous covariates are available for the words: word frequency (freq), word familiarity (fam), and age of acquisition (aoa). On the basis of a preliminary analysis to be reported in Section 4, only the age of acquisition is used for a further analysis of the data. There are no subject covariates available in this data set and we have no information on the precise age of each subject. After the preliminary step in the analysis, it turns out that Age Group, Word Type, and the age of acquisition are the only explanatory variables that will be used for further analysis.

Age of acquisition is considered an important characteristic of words in psycholinguistics. It is related to various types of linguistic behavior (e.g., Bonin et al., 2004). We will look at the data from two possible theoretical perspectives related to age of acquisition. The first perspective is that the age of acquisition determines the cumulative frequency of encounters with the word in question and that a higher cumulative encounter frequency leads to faster response times in semantic tasks. Based on this cumulative-frequency hypothesis we expect that the effect of differences between words in age of acquisition is smaller for older subjects than for younger subjects. The expectation is based on the assumption that the effect of an increasing number of encounters diminishes with the number of encounters and thus with age (as in the law of diminishing marginal returns). The second perspective is that age of acquisition is associated with the structure of the lexicon at the time of acquisition, and that a younger age of acquisition is associated with a more central position in the lexicon and therefore leads to faster response times in semantic tasks.
Based on this structure-of-lexicon hypothesis, a stronger effect of age of acquisition may be expected for natural objects (e.g., animals, plants) than for artifact objects (e.g., containers, drugs, vehicles, buildings, gadgets, etc.), because centrality in the lexicon is less relevant when the semantic structure is loose and variable, such as for artifacts. The semantic structure of artifacts is more scattered, more complex, and less stable than the semantic structure of natural objects. Just to give one example, technology provides us with ever new artifacts and new categories of artifacts, for example, sensors of all various kinds. From a structure-of-lexicon point of view, the advantage of an earlier acquisition is therefore less clear for artifact words.

In sum, on the basis of the cumulative-frequency hypothesis, we may expect an interaction of age of acquisition with Age Group. On the basis of the structure-of-lexicon hypothesis, we may expect an interaction of age of acquisition with Word Type. When both principles apply, the corresponding two interactions would show. When both principles play a role, it is also possible that one moderates the other, resulting in a second order interaction of age of acquisition with Age Group and Word Type.

The authors of the original study (De Deyne and Storms, 2007) were primarily interested in the fixed effects of age of acquisition, and they used linear regression per age group and per subject. We are also interested in the fixed effects of the design factors and in individual differences and word differences, and thus in random subject and random word effects, because they are relevant for the kind of underlying processes.
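As a rough orientation, the fixed effects and the simplest random effects named so far can be collected in one equation. This is a sketch under assumptions rather than the model used in the paper: the symbols $y_{sw}$, $\beta_k$, $u_s$, $v_w$, and $\varepsilon_{sw}$ are introduced here for illustration, $A_s$ and $T_w$ are assumed to be dummy-coded, and only random intercepts are shown. Richer random effect structures are the topic of Section 3.

```latex
% y_{sw}: log response time of subject s to word w (illustrative notation)
\begin{aligned}
y_{sw} ={}& \beta_0 + \beta_1 A_s + \beta_2 T_w + \beta_3 A_s T_w \\
          & + \beta_4\,\mathrm{aoa}_w + \beta_5 A_s\,\mathrm{aoa}_w
            + \beta_6 T_w\,\mathrm{aoa}_w + \beta_7 A_s T_w\,\mathrm{aoa}_w \\
          & + u_s + v_w + \varepsilon_{sw},
\end{aligned}
\qquad
u_s \sim N(0, \sigma_u^2),\quad
v_w \sim N(0, \sigma_v^2),\quad
\varepsilon_{sw} \sim N(0, \sigma^2).
```

In this notation, $\beta_5$ corresponds to the interaction expected under the cumulative-frequency hypothesis, $\beta_6$ to the interaction expected under the structure-of-lexicon hypothesis, and $\beta_7$ to the second order interaction.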
When the individual differences with respect to natural objects are of a different kind than individual differences with respect to artifacts, then this is an indication for differences in the cognitive processes for the two word types, and it would be in line with our conjecture about the difference between the two lexicons. When the word differences are of a different kind depending on the age of the subjects, then this is an indication for age differences in how the words are processed.

The specific research questions are grouped into two categories. The first category concerns the general effects (fixed effects). The second category concerns the varying effects (random effects).

First, from the two perspectives, the cumulative-frequency perspective and the structure-of-lexicon perspective, we are interested in the interaction of age of acquisition with Age Group and Word Type, and therefore also in the lower-order effects:

(a) Do young and old people differ in the speed of the semantic categorization task?
(b) Does the speed of the semantic categorization task depend on the type of word?
(c) Are word type differences the same in both age groups?
(d) What is the effect of age of acquisition and does it depend on age group, word type, and their combination?

Second, because we want to combine the two approaches, for general phenomena (fixed effects) and for individual differences and word differences (random effects), and because these differences are relevant for the nature of the underlying processes, we also have research questions regarding the random effects:

(e) Are possible individual differences in response time general or do they depend on the word type, and, when they depend on the word type, how are the individual differences correlated?
(f) Are possible word differences in response time general or do they depend on the age group, and, when they depend on the age group, how are the word differences correlated?

2.2. Data

Subjects.
The participants were 21 young adults from age 18 to 20, and also 21 older adults from age 52 to 56. They were the only participants in the study. The young adults were students who participated for course credits. The older adults were volunteers knowing they would receive a movie theatre film ticket for their participation. They all had a higher education degree so that their educational level was roughly equivalent with the university students. All participants were native Dutch speakers. It is rather common in studies in psychology that students participate for course credit and that other participants are volunteers who receive a small reward. Because the sampling mechanism is not random, it is not clear whether possible age differences can be generalized.

Words. The subjects were presented with 160 nouns that had to be categorized as referring to a natural object or an artifact object and the response times were registered. The nouns were selected from available norm lists obtained from studies described in De Deyne and Storms (2007). The norm lists concern three aspects of the nouns: familiarity, age of acquisition, and frequency. Subjects from these earlier studies have rated words on their familiarity on a five-point scale: “The word has been encountered or used 1-never, 2-almost never, 3-sometimes, 4-often, 5-very often”. The subjects from these earlier studies were also requested to estimate the age of acquisition, the age at which they estimated they first had learned the words. The procedure is fully described and also validated by Ruts et al. (2004). Frequency norms for these words are available from the international CELEX data base (Baayen et al., 2003). The 160 words were selected such that the number of letters was in the range from 3 to 10, such that it was clear to which of the two categories a word belongs (“artifact” or “natural”), and such that there is an equal number of both word categories.
From the word selection as described, it is not clear whether possible differences between the two word categories can be generalized.

Actual data for the analysis. The participants in the study were presented with all 160 words and had to classify each word in one of two categories: “natural” or “artifact”. The order of presentation was randomized within subjects. Response times in milliseconds were registered. The data to be analyzed are the natural log transformed response times of correct responses. This makes the data in fact slightly unbalanced. De Deyne and Storms (2007) report that after removing 14 words with an error rate larger than 40%, the error rates drop to 3.7% for the younger group and 3.5% for the older group. Among the remaining words there are 70 denoting natural objects and 76 denoting artifact objects.

3. Modeling

3.1. Linear mixed models

As noted by Baayen et al. (2008), Freeman et al. (2010), Jaeger (2008), Quené and Van den Bergh (2008) in the field of psycholinguistics, models known in different contexts as hierarchical linear models, multilevel models, or linear mixed models (e.g., Raudenbush and Bryk, 2001; Verbeke and Molenberghs, 2000; Pinheiro and Bates, 2000; Snijders and Bosker,