Educational data mining: a sample of review and study case

Journal Name: World Journal on Educational Technology

Publication Year: 2009
Abstract:
The aim of this work is to encourage research in a novel merged field: educational data mining (EDM). Two subjects are addressed. The first is a review of data mining (DM) methods and EDM applications. The second is an EDM case study. As a result of applying DM to Web-based Education Systems (WBES), stratified groups of students were found during a trial. These groups reveal key attributes of the volunteers who deserted or remained during the WBES experiment. This kind of discovered knowledge inspires the formulation of correlational hypotheses that relate attributes to behavioral patterns of WBES users. We conclude that when EDM findings are taken into account in designing and managing a WBES, the attainment of learning objectives improves.
Pages: 118-139
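
To make the kind of EDM analysis summarized in the abstract concrete, here is a minimal Python sketch, not the paper's code: it clusters hypothetical WBES volunteer records by two invented attributes and then uses a chi-square test to check whether the resulting groups are associated with desertion. All data, attribute names, group sizes, and rates are assumptions made purely for illustration.

# Minimal illustrative sketch (not the paper's code): all student data,
# attribute names, group sizes, and rates below are invented assumptions.
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical WBES volunteer attributes: [sessions per week, avg. quiz score].
low_engagement = rng.normal([2.0, 55.0], [1.0, 10.0], size=(100, 2))
high_engagement = rng.normal([6.0, 80.0], [1.0, 8.0], size=(100, 2))
attributes = np.vstack([low_engagement, high_engagement])

# Hypothetical outcome: 1 = deserted the trial, 0 = remained.
deserted = np.concatenate([rng.binomial(1, 0.6, 100), rng.binomial(1, 0.2, 100)])

# Stratify the volunteers into groups from their attributes (k-means clustering).
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(attributes)

# Correlational hypothesis: group membership is related to desertion.
table = np.zeros((2, 2), dtype=int)
for g, d in zip(groups, deserted):
    table[g, d] += 1
chi2, p_value, _, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")

A small p-value in such a test would support a hypothesis of the kind the abstract mentions, namely that an attribute-based stratification of volunteers is related to whether they desert or remain; the attributes, groups, and tests actually used are described in the paper itself.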

