In second language acquisition (SLA) research, various tests are used for theory building. But which tests should be administered to participants in a research setting? To address this question, this study examined the reliability and validity of the tests themselves and how results differ across test types, focusing on the cross-sectional acquisition of restrictive relative clauses. From the variety of available relative clause tests, four test types that appear frequently in recent SLA publications were selected: Translation, Cloze Procedure, Grammaticality Judgment, and Sentence Combining. Results from 120 Japanese students indicate three findings. First, Sentence Combining showed high internal-consistency reliability and the highest validity of the four test types. Second, the research findings varied across test types. Third, 'test-type (task)-related interlanguage variability' should be explained by a combination of test-quality issues, such as measurement error, and the cognitive demands each test type places on participants. Practical implications for language test construction in SLA research and for educational evaluation are discussed, along with directions for further research.


