Empirical Investigation of the Stability of IRT Item-Parameters Estimation (Pages: 291-301)

Year-Number: 2013-Volume 5, Issue 2

Abstract

This study examined the effect of sample size (200, 500, 1,000, 5,000, 10,000, and 20,000) and test length (15, 30, and 60 items) on the accuracy of item response theory (IRT) item-parameter estimation using real test data. Item parameters were estimated by fitting the three-parameter logistic model. The main findings confirmed those of previous studies based on simulated data: longer tests yielded more accurate estimates of all item parameters across sample sizes and ability levels, especially at ability levels below zero. The item difficulty parameter appeared to be the most sensitive to changes in sample size and test length, whereas the item guessing parameter appeared to be the least sensitive. In contrast, different samples yielded comparable accuracy in estimating the three item parameters. Finally, the minimum requirements for accurate parameter estimation tended to be a sample size of 500 and a test length of 30. However, sample sizes as small as 200 can still yield acceptable estimates when combined with test lengths longer than 15.
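
For readers unfamiliar with the three-parameter logistic model named in the abstract, it is conventionally written as P(theta) = c + (1 - c) / (1 + exp(-D * a * (theta - b))), where a is item discrimination, b is item difficulty, and c is the guessing parameter. The following is a minimal sketch of that standard form, not the study's own code: the function name `p_3pl` and the scaling constant D = 1.7 are assumptions made for illustration.

```python
import math


def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: item discrimination, b: item difficulty, c: guessing (lower asymptote)
    D: scaling constant (1.7 is a common convention; assumed here)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# When ability equals item difficulty, the logistic term is 0.5,
# so the probability sits halfway between c and 1.
print(p_3pl(theta=0.0, a=1.0, b=0.0, c=0.2))
```

The guessing parameter sets the lower asymptote of the curve, so it is identified mainly by responses from low-ability examinees, which helps explain why estimation accuracy at ability levels below zero is singled out above.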




                                                                                                                                                                                                        