Gender differential item functioning on a national field-specific test: The case of PhD entrance exam of TEFL in Iran

Abstract:
Differential item functioning (DIF) exists when examinees of equal ability from different groups have different probabilities of answering a given item correctly. This study examined gender DIF on the PhD Entrance Exam of TEFL (PEET) in Iran, using both logistic regression (LR) and one-parameter item response theory (1-p IRT) models. The PEET is a national, centralized written examination designed to provide information on the eligibility of TEFL applicants to enter PhD programs. The 2013 administration of the test provided score data for a sample of 999 Iranian PhD applicants: 397 males and 602 females. First, the data were subjected to DIF analysis with the LR model. Then, to triangulate the findings, a 1-p IRT procedure was applied. The results indicated (1) more items flagged for DIF by LR than by 1-p IRT; (2) DIF cancellation (the number of DIF items was equal for males and females), as revealed through LR; (3) equal numbers of uniform and non-uniform DIF items, as tracked via LR; and (4) female superiority in test performance, as revealed via the IRT analysis. Overall, the findings indicated that the PEET suffers from DIF. Test developers and policymakers (such as NOET and MSRT) are therefore advised to take these findings into serious consideration and to promote fair testing practice by devoting effort to less biased test development and decision making.
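To make the definition of DIF concrete, the following minimal sketch uses the standard one-parameter (Rasch) response function, P(correct) = 1 / (1 + exp(-(ability - difficulty))). It is an illustration only, not the authors' analysis: the function name and all numeric values are hypothetical and are not estimates from the PEET data. It shows two examinees of equal ability whose probability of success differs because the item's difficulty differs between groups.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the 1-parameter (Rasch) model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical values for illustration only; not taken from the study.
ability = 0.0              # the same ability level for both examinees
difficulty_male = 0.2      # assumed item difficulty in the male group (logits)
difficulty_female = -0.3   # assumed item difficulty in the female group (logits)

p_male = rasch_probability(ability, difficulty_male)
p_female = rasch_probability(ability, difficulty_female)

# Under a 1-p model, the gap between group difficulties is the DIF contrast.
dif_contrast = difficulty_male - difficulty_female

print(f"P(correct | male)     = {p_male:.3f}")
print(f"P(correct | female)   = {p_female:.3f}")
print(f"DIF contrast (logits) = {dif_contrast:.2f}")
```

In this made-up case the item is easier for the female group, so a female examinee has a higher probability of success than a male examinee of identical ability. An item with no DIF would have the same difficulty in both groups and a contrast of zero.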
78 A. Ahmadi & A. Darabi/Gender differential item …
Iranian Journal of Language Teaching Research 4(1), (Jan., 2016) 63-82 79
