The Effect Of Missing Data On The Accuracy Of CDM-Classification

Ömür Kaya KALKAN; Tahsin Oğuz BAŞOKÇU

The Academic Events Group, 6th INTERNATIONAL CONFERENCE ON EDUCATION

Last modified: 2017-07-28

Abstract

The problem of missing data is frequently encountered in research in the educational and behavioral sciences, where tests or questionnaires are used to obtain data from individuals. Analyses conducted with missing data can cause serious problems such as biased parameter estimation, increased standard errors, weakened generalizability of findings, and information loss (Dong & Peng, 2013; Finch, 2010; Izquierdo & Pedrero, 2014; Rubin, 2003). Missing data have also been reported to cause biased parameter and latent class estimates in cognitive diagnosis models (CDMs), which have recently attracted considerable interest from researchers (Başokçu, Kalkan & Öğretmen, 2016). The purpose of this study is to examine how the interaction between missing data and the number of attributes associated with each item in the Q-matrix affects classification accuracy. To this end, two conditions were specified for the number of attributes associated with the items: items associated with a single attribute (simple format) and items associated with two or more attributes (complex format).
A total of 100 data sets of 3000 persons each were generated, with 4 attributes in the Q-matrix, 24 items, and guessing (g) and slip (s) parameter values ranging from 0.1 to 0.3. Ten percent of the data on 8 items in the simple and complex formats were removed at random, and multiple imputation (MI) was used to estimate the missing data. The results showed that, under the DINA model, complex items containing missing and imputed data yielded higher rates of accurate classification than simple items.
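The simulation design described above can be illustrated with a minimal sketch of one replication. This is not the authors' code. The Q-matrix layout (12 single-attribute items followed by 12 two-attribute items), the uniform attribute profiles, the choice of which 8 items lose data, and the seed are all assumptions made for illustration. The multiple imputation and classification steps are omitted.

```python
import numpy as np

N_PERSONS, N_ITEMS, N_ATTRS = 3000, 24, 4
rng = np.random.default_rng(0)  # arbitrary seed (assumption)


def make_q_matrix(n_simple=12):
    """Hypothetical Q-matrix: the first n_simple items require one
    attribute (simple format), the remaining items require two (complex)."""
    q = np.zeros((N_ITEMS, N_ATTRS), dtype=int)
    for j in range(N_ITEMS):
        q[j, j % N_ATTRS] = 1
        if j >= n_simple:
            q[j, (j + 1) % N_ATTRS] = 1
    return q


def simulate_dina(q, rng):
    """Generate dichotomous responses under the DINA model."""
    # Attribute mastery profiles; independent 50/50 mastery is an assumption.
    alpha = rng.integers(0, 2, size=(N_PERSONS, N_ATTRS))
    g = rng.uniform(0.1, 0.3, size=N_ITEMS)  # guessing parameters
    s = rng.uniform(0.1, 0.3, size=N_ITEMS)  # slip parameters
    # eta[i, j] is True when person i masters every attribute item j requires.
    eta = (alpha @ q.T) == q.sum(axis=1)
    p_correct = np.where(eta, 1.0 - s, g)
    x = (rng.random((N_PERSONS, N_ITEMS)) < p_correct).astype(float)
    return alpha, x


def delete_at_random(x, items, rate, rng):
    """Set exactly rate * n_persons responses to NaN on each listed item."""
    x = x.copy()
    n_drop = int(round(rate * x.shape[0]))
    for j in items:
        rows = rng.choice(x.shape[0], size=n_drop, replace=False)
        x[rows, j] = np.nan
    return x


q = make_q_matrix()
alpha, x = simulate_dina(q, rng)
# Complex-format condition: remove 10% of responses on 8 two-attribute items.
x_missing = delete_at_random(x, items=range(12, 20), rate=0.10, rng=rng)
```

Passing `items=range(0, 8)` instead would give the corresponding simple-format condition under the same assumed layout.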