University of California, Los Angeles
Advances in language testing in the past decade have occurred in three areas: (a) the development of a theoretical view that considers language ability to be multicomponential and recognizes the influence of the test method and test taker characteristics on test performance, (b) applications of more sophisticated measurement and statistical tools, and (c) the development of “communicative” language tests that incorporate principles of “communicative” language teaching. After reviewing these advances, this paper describes an interactional model of language test performance that includes two components, language ability and test method. Language ability consists of language knowledge and metacognitive strategies, whereas test method includes characteristics of the environment, rubric, input, expected response, and relationship between input and expected response. Two aspects of authenticity are derived from this model. The situational authenticity of a given test task depends on the relationship between its test method characteristics and the features of a specific language use situation, while its interactional authenticity pertains to the degree to which it invokes the test taker’s language ability. The application of this definition of authenticity to test development is discussed.
Since 1989, four papers reviewing the state of the art in the field of language testing have appeared (Alderson, 1991; Bachman, 1990a; Skehan, 1988, 1989, 1991). All four have argued that language testing has come of age as a discipline in its own right within applied linguistics and have presented substantial evidence, I believe, in support of this assertion. A common theme in all these articles is that the field of language testing has much to offer in terms of theoretical, methodological, and practical accomplishments to its sister disciplines in applied linguistics. Since these papers provide excellent critical surveys and discussions of the field of language testing, I will simply summarize some of the common themes in these reviews in Part 1 of this paper in order to whet the appetite of readers who may be interested in knowing which issues and problems are of current interest to language testers. These articles are nontechnical and accessible to those who are not themselves language testing specialists. Furthermore, Skehan (1991) and Alderson (1991) appear in collections of papers from recent conferences that focus on current issues in language testing. These collections include a wide variety of topics of current interest within language testing, discussed from many perspectives, and thus constitute major contributions to the literature on language testing.
The purpose of this paper is to address a question that is, I believe, implicit in all of the review articles mentioned above: What does language testing have to offer to researchers and practitioners in other areas of applied linguistics, particularly in language learning and language teaching? These reviews discuss several specific areas in which valuable contributions can be expected (e.g., program evaluation, second language acquisition, classroom learning, research methodology). Part 2 of this paper focuses on two recent developments in language testing, discussing their potential contributions to language learning and language teaching. I argue first that a theoretical model of second language ability that has emerged on the basis of research in language testing can be useful for both researchers and practitioners in language learning and language teaching. Specifically, I believe it provides a basis both for conceptualizing second language abilities whose acquisition is the object of considerable research and instructional effort, and for designing language tests for use both in instructional settings and in research on language learning and language teaching. Second, I will describe an approach to characterizing the authenticity of a language task, which I believe can help us to better understand the nature of the tasks we set, either for students in instructional programs or for subjects in language learning research, and which can thus aid in the design and development of tasks that are more useful for these purposes.
PART 1: LANGUAGE TESTING IN THE 1990s
In echoing Alderson’s (1991) title, I acknowledge the commonalities among the review articles mentioned above in the themes they discuss and the issues they raise. While each review emphasizes specific areas, all approach the task with essentially the same rhetorical organization: a review of the achievements in language testing, or lack thereof, over the past decade; a discussion of areas of likely continued development; and suggestions of areas in need of increased emphasis to assure developments in the future. Both Alderson and Skehan argue that while language testing has made progress in some areas, on the whole “there has been relatively little progress in language testing until recently” (Skehan, 1991, p. 3). Skehan discusses the contextual factors (theory, practical considerations, and human considerations) that have influenced language testing in terms of whether these factors act as “forces for conservatism” or “forces for change” (p. 3). The former, he argues, “all have the consequence of retarding change, reducing openness, and generally justifying inaction in testing” (p. 3), while the latter are “pressures which are likely to bring about more beneficial outcomes” (p. 7). All of the reviews present essentially optimistic views of where language testing is going and what it has to offer other areas of applied linguistics. I will group the common themes of these reviews into the general areas of (a) theoretical issues and their implications for practical application, (b) methodological advances, and (c) language test development.
THEORETICAL ISSUES
One of the major preoccupations of language testers in the past decade has been investigating the nature of language proficiency. In 1980 the “unitary competence hypothesis” (Oller, 1979), which claimed that language proficiency consists of a single, global ability, was widely accepted. By 1983 this view of language proficiency had been challenged by several empirical studies and abandoned by its chief proponent (Oller, 1983). The unitary trait view has been replaced, through both empirical research and theorizing, by the view that language proficiency is multicomponential, consisting of a number of interrelated specific abilities as well as a general ability or set of general strategies or procedures. Skehan and Alderson both suggest that the model of language test performance proposed by Bachman (1990b) represents progress in this area, since it includes both components of language ability and characteristics of test methods, thereby making it possible “to make statements about actual performance as well as underlying abilities” (Skehan, 1991, p. 9). At the same time, Skehan correctly points out that as research progresses, this model will be modified and eventually superseded. Both Alderson and Skehan indicate that an area where further progress is needed is in the application of theoretical models of language proficiency to the design and development of language tests. Alderson, for example, states that “we need to be concerned not only with . . . the nature of language proficiency, but also with language learning and the design and researching of achievement tests; not only with testers, and the problems of our professionalism, but also with testees, with students, and their interests, perspectives and insights” (Alderson, 1991, p. 5).
A second area of research and progress is in our understanding of the effects of the method of testing on test performance. A number of empirical studies conducted in the 1980s clearly demonstrated that the kind of test tasks used can affect test performance as much as the abilities we want to measure (e.g., Bachman & Palmer, 1981, 1982, 1988; Clifford, 1981; Shohamy, 1983, 1984). Other studies demonstrated that the topical content of test tasks can affect performance (e.g., Alderson & Urquhart, 1985; Erickson & Molloy, 1983). Results of these studies have stimulated a renewed interest in the investigation of test content. And here the results have been mixed. Alderson and colleagues (Alderson, 1986, 1990; Alderson & Lukmani, 1986; Alderson, Henning, & Lukmani, 1987) have been investigating (a) the extent to which “experts” agree in their judgments about what specific skills EFL reading test items measure, and at what levels, and (b) whether these expert judgments about ability levels are related to the difficulty of items. Their results indicate, first, that these experts, who included test designers assessing the content of their own tests, do not agree and, second, that there is virtually no relationship between judgments of the levels of ability tested and empirical item difficulty. Bachman and colleagues, on the other hand (Bachman, Davidson, Lynch, & Ryan, 1989; Bachman, Davidson, & Milanovic, 1991; Bachman, Davidson, Ryan, & Choi, in press), have found that by using a content-rating instrument based on a taxonomy of test method characteristics (Bachman, 1990b) and by training raters, a high degree of agreement among raters can be obtained, and such content ratings are related to item difficulty and item discrimination. In my view, these results are not inconsistent. The research of Alderson and colleagues presents, I believe, a sobering picture of actual practice in the design and development of language tests.
