Öğrenci başarılarının belirlenmesi sınavından klasik test kuramı, tek ve çok boyutlu madde tepki kuramı modelleri ile kestirilen başarı puanlarının karşılaştırılması

A comparison of estimated achievement scores obtained from student achievement assessment test utilizing classical test theory, unidimensional and multidimensional IRT

Journal Name: International Journal of Human Sciences

Publication Year: 2014

Author Name: Özer Özkan, Y.
Abstract (2. Language): 
This study aims to determine how accurately achievement measures in a test battery are estimated, and to compare empirically the results of applying classical test theory (CTT) and unidimensional and multidimensional item response theory (IRT) models to the Turkish and mathematics subtests of the Student Achievement Assessment Test (ÖBBS-2008). In making these comparisons, it also seeks to identify the model that estimates student achievement with the least error. For the Turkish test, the ability parameters estimated from the whole test under multidimensional IRT had somewhat lower error, and therefore gave more precise measurement, than the ability parameters estimated per subdimension under unidimensional IRT and the test scores obtained under CTT. Similar results were obtained for the mathematics test. Overall, the parameters estimated under multidimensional IRT had somewhat lower error.
Abstract (Original Language, translated from Turkish): 
This study aimed to determine the accuracy with which achievement measures in a test battery are estimated, and to compare empirically the achievement measures obtained by applying Classical Test Theory (CTT) and unidimensional and multidimensional Item Response Theory (IRT) models to the Turkish and mathematics subtest data of the Student Achievement Assessment Test (Öğrenci Başarılarının Belirlenmesi Sınavı, ÖBBS-2008). In making these comparisons, the study sought to identify the model that estimates achievement measures with the least error. Analysis of the Turkish test data showed that the ability parameters estimated from the whole test with multidimensional IRT had somewhat lower standard errors than the ability parameters estimated per subdimension with unidimensional IRT and the test scores obtained under CTT. Analysis of the mathematics test data showed that, in estimating ability parameters, the lowest error was obtained with multidimensional IRT, while the highest error came from the scores determined with unidimensional IRT on the subdimensions of the mathematics test and with CTT on the whole test.
Pages: 20-44

Özer Özkan, Y. (2014). A comparison of estimated achivement scores obtained from student achievement assessment
test utilizing classical test theory, unidimensional and multidimensional IRT. International Journal of Human
Sciences, 11(1), 20-44. doi: 10.14687/ijhs.v11i1.2739
