Abstract— Educational data mining (EDM), an emerging area of research, generally focuses on areas influencing learning systems. A wide variety of mining techniques have been developed for the educational context. The major objective is to translate new data into meaningful information, leading to clear conclusions and thereby better decision making. One key area of research focuses on improving students' academic performance. This seminar presents a detailed study of EDM, covering the goals of educational data mining, its phases, EDM techniques, and challenges. Learning analytics, a sister field of EDM, is also discussed.
A comparative analysis of various algorithms used in EDM is also included.

Keywords—EDM, Phases, Prediction, Clustering

I. INTRODUCTION

Data mining is a domain of computer science and the analysis step of the "Knowledge Discovery in Databases" (KDD) process. It is the process of recognizing patterns in large data sets.
Data mining has been applied in a great number of fields, including retail sales, bioinformatics, and counterterrorism. In recent years, there has been increasing interest in using data mining to investigate scientific questions within educational research, an area of inquiry termed Educational Data Mining (EDM). Educational data mining is defined as the area of scientific inquiry centered on the development of methods for making discoveries within the unique kinds of data that come from educational settings, and on using those methods to better understand students and the settings in which they learn. On one hand, the growth of both instrumented educational software and state databases of student information has created large repositories of data reflecting how students learn. On the other hand, the use of the internet in education has created a new context, known as e-learning or web-based education, in which large amounts of information about teaching-learning interactions are continuously generated and ubiquitously available. EDM seeks to use these data repositories to better understand learners and learning, and to develop computational approaches that combine data and theory to transform practice for the benefit of learners.

Learning analytics and EDM are sister fields. Learning analytics deals with the measurement, collection, analysis, and reporting of data about learners and their contexts, for the purposes of understanding and optimizing learning and the environments in which it occurs.
EDM refers to the techniques, tools, and research designed for automatically extracting meaning from large repositories of data generated by or related to people's learning activities in educational settings.

II. PHASES OF EDM

The first phase of educational data mining is to discover relationships within data from the educational environment using data mining techniques such as classification, clustering, and regression. The second phase is to validate the discovered relationships so that uncertainty is reduced. The third phase is to make predictions about the future on the basis of the validated relationships in the learning environment. The fourth phase is to support the decision-making process with the help of these predictions.
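The four phases can be illustrated end to end on a toy data set. This is a minimal sketch, not a method taken from the sources: it assumes a single hypothetical predictor (weekly study hours), a least-squares line as the discovered relationship, a fixed train/validation split, mean absolute error as the validation measure, and a made-up pass threshold of 50 for the decision step.

```python
from statistics import mean


def fit_line(xs, ys):
    """Phase 1: discover a relationship (ordinary least-squares line)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def mean_abs_error(model, xs, ys):
    """Phase 2: validate the relationship on data not used to fit it."""
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in zip(xs, ys))


def predict(model, x):
    """Phase 3: predict an outcome for a new student."""
    slope, intercept = model
    return slope * x + intercept


def needs_support(model, x, threshold=50.0):
    """Phase 4: turn a prediction into a decision (hypothetical pass mark)."""
    return predict(model, x) < threshold


# Hypothetical (weekly study hours, final score) records.
train = [(2, 40), (4, 52), (6, 61), (8, 70), (10, 82)]
valid = [(3, 47), (7, 66)]

model = fit_line(*zip(*train))
error = mean_abs_error(model, *zip(*valid))
print(f"validation MAE: {error:.1f}")  # prints "validation MAE: 0.7"
print(needs_support(model, 1), needs_support(model, 9))  # prints "True False"
```

Keeping the validation records out of `fit_line` is what makes phase 2 meaningful: the error is measured on students the model has not seen.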
III. METHODS OF EDM

There is a wide variety of methods currently popular within educational data mining, and there are multiple taxonomies of the area, one by Romero and Ventura [2] and one by Baker [3]. These methods fall into the following categories:

A. Prediction

In prediction, the goal is to develop a model that can infer a single aspect of the data (the predicted variable) from some combination of other aspects of the data (the predictor variables). Prediction requires labels for the output variable on a limited data set, where a label represents trusted "ground truth" information about the output variable's value in specific cases.
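As a concrete illustration of inferring a predicted variable from labeled predictor variables, the sketch below uses k-nearest-neighbour classification. This is only one possible prediction method, chosen here for brevity; the two features (quiz average and fraction of hints requested) and the pass/fail labels are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(labeled, query, k=3):
    """Infer the label of `query` by majority vote among its k closest labeled rows."""
    nearest = sorted(labeled, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# (predictor variables, ground-truth label) pairs for a small labeled data set.
# Features are hypothetical: (quiz average, fraction of hints requested).
labeled = [
    ((0.90, 0.10), "pass"),
    ((0.80, 0.20), "pass"),
    ((0.75, 0.15), "pass"),
    ((0.40, 0.70), "fail"),
    ((0.35, 0.60), "fail"),
    ((0.50, 0.80), "fail"),
]

print(knn_predict(labeled, (0.85, 0.15)))  # prints "pass"
print(knn_predict(labeled, (0.45, 0.65)))  # prints "fail"
```

The labeled rows play the role of the trusted ground truth; if those labels are noisy, the prediction inherits the noise.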
In some cases, however, it is important to consider the degree to which these labels may in fact be approximate or not entirely reliable. Prediction has two key uses within educational data mining. In the first, prediction methods are used to study which features of a model are important for prediction, giving information about the underlying construct. This is a common approach in research programs that attempt to predict student educational outcomes without first predicting intermediate or mediating factors. In the second, prediction methods are used to predict what the output value would be in contexts where it is not desirable to obtain a label for that construct directly. Educational data mining methods have enabled the construction of student models for a wide range of constructs. Classification methods have been used to develop detectors of student affect, including frustration, boredom, anxiety, engaged concentration, joy, and distress [2].
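Because labels may be approximate, a detector of this kind is usually checked against human-coded labels before it is trusted. The sketch below assumes Cohen's kappa as the agreement measure (the text does not name one); it reports agreement beyond what chance alone would produce. The affect labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(human, detector):
    """Agreement between two label sequences, corrected for chance."""
    n = len(human)
    # Observed agreement: fraction of cases where both labels match.
    observed = sum(h == d for h, d in zip(human, detector)) / n
    # Chance agreement: both raters pick the same label independently,
    # using each rater's own label frequencies (kappa is undefined if this is 1).
    h_counts, d_counts = Counter(human), Counter(detector)
    expected = sum(h_counts[c] * d_counts[c] for c in h_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical human-coded labels vs. a detector's output for 8 observations.
human = ["bored", "bored", "engaged", "engaged", "bored", "engaged", "engaged", "engaged"]
detector = ["bored", "engaged", "engaged", "engaged", "bored", "engaged", "bored", "engaged"]
print(round(cohens_kappa(human, detector), 3))  # prints 0.467
```

Raw agreement here is 75%, but kappa is much lower because the two label sets would already agree about 53% of the time by chance.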
Detectors of affect and emotion have been used to drive automated adaptation to differences in student affect, significantly reducing students' frustration and anxiety and increasing the incidence of positive emotion. Classification methods have also been used to develop detectors of off-task behavior, which predict differences in student learning.

B. Clustering

In clustering, the goal is to find data points that naturally group together, splitting the full data set into a set of clusters. Clustering is particularly useful in cases where the most common categories within the data set are not known in advance.
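A minimal k-means sketch shows how clusters can be found without predefined categories. k-means is one common clustering algorithm, used here as an assumption; the source does not prescribe it. The student features, the choice of k = 2, and the seeding with the first k points are all hypothetical simplifications.

```python
from math import dist


def kmeans(points, k, iterations=100):
    """Group points into k clusters; returns (centroids, clusters)."""
    centroids = list(points[:k])  # simple deterministic seeding: first k points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical students: (minutes online per day, forum posts per week).
students = [(10, 1), (12, 2), (11, 1), (60, 9), (65, 8), (58, 10)]
centroids, clusters = kmeans(students, k=2)
for c, members in zip(centroids, clusters):
    print(c, members)
```

In practice the seeding matters; production libraries use randomized or k-means++ initialization and several restarts.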
If a set of clusters is optimal, each data point will in general be more similar to the other data points in its own cluster than to data points in other clusters. Using clustering in domains where a considerable amount is already known carries some risk of rediscovering phenomena that are already known. As work in other areas of EDM goes forward, an increasing amount is known about student behavior across learning environments.
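The property that points should be closer to their own cluster than to other clusters can be measured directly. The sketch below assumes the silhouette coefficient as that measure (the text does not name one): for each point it compares the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b). Both groupings of the data are hypothetical.

```python
from math import dist


def silhouette(clusters):
    """Mean silhouette coefficient; expects at least two non-empty clusters."""
    scores = []
    for i, own in enumerate(clusters):
        for idx, p in enumerate(own):
            if len(own) == 1:
                scores.append(0.0)  # common convention for a singleton cluster
                continue
            # a: mean distance from p to the other members of its own cluster.
            a = sum(dist(p, q) for j, q in enumerate(own) if j != idx) / (len(own) - 1)
            # b: mean distance from p to the members of the nearest other cluster.
            b = min(
                sum(dist(p, q) for q in other) / len(other)
                for k, other in enumerate(clusters)
                if k != i and other
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


well_separated = [[(10, 1), (12, 2), (11, 1)], [(60, 9), (65, 8), (58, 10)]]
mixed_up = [[(10, 1), (60, 9), (12, 2)], [(11, 1), (65, 8), (58, 10)]]
print(round(silhouette(well_separated), 2), round(silhouette(mixed_up), 2))
```

Scores near 1 indicate that points sit well inside their clusters; scores near or below 0 suggest the grouping does not reflect natural structure.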
As more becomes known about student behavior, one potential future use of clustering would be as a second stage in the process of modeling student behavior in a learning system. First, existing detectors could be used to classify known categories of behavior. Then, data points not classified as belonging to any of those known categories could be clustered in order to search for unknown behaviors.

C. Relationship mining

In relationship mining, the goal is to discover relationships between variables in a data set with a large number of variables.
This may take the form of finding out which variables are most strongly associated with a single variable of particular interest, or of discovering which relationships between any two variables are strongest. Common forms include:

a) Association rule mining: the goal is to find if-then rules of the form "if some set of variable values is found, another variable will generally have a specific value."
b) Correlation mining: the goal is to find (positive or negative) linear correlations between variables.
c) Sequential pattern mining: the goal is to find temporal associations between events.
d) Causal data mining: the goal is to find whether one event was the cause of another event.

D. Discovery with models

In discovery with a model, a model of a phenomenon is developed via prediction, clustering, or in some cases knowledge engineering (in which the model is developed using human reasoning rather than automated methods). This model is then used as a component in another analysis, such as prediction or relationship mining. In the prediction case, the created model's predictions are used as predictor variables in predicting a new variable. For instance, analyses of complex constructs such as gaming the system within online learning have generally depended on assessments of the probability that the student knows the current knowledge component being learned [3].
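One widely used way to produce such a probability is Bayesian Knowledge Tracing (BKT). The text does not name a specific model, so treat this as an assumed example: the sketch updates P(known) for one knowledge component after each response, and the parameter values (learning, guess, and slip rates, and the prior) are hypothetical. Its output could then serve as a predictor variable in a further model, which is the discovery-with-models pattern described above.

```python
def bkt_update(p_know, correct, p_learn=0.1, p_guess=0.2, p_slip=0.1):
    """Return P(skill known) after one observed response and one practice opportunity."""
    if correct:
        # A correct answer comes from knowing without slipping, or from guessing.
        known = p_know * (1 - p_slip)
        posterior = known / (known + (1 - p_know) * p_guess)
    else:
        # An incorrect answer comes from slipping, or from not knowing and not guessing.
        known = p_know * p_slip
        posterior = known / (known + (1 - p_know) * (1 - p_guess))
    # The student may also learn the skill at this opportunity.
    return posterior + (1 - posterior) * p_learn


# One student's hypothetical responses on exercises mapped to a single knowledge component.
p_known = 0.3  # hypothetical prior probability of already knowing the skill
for answer in [True, False, True, True]:
    p_known = bkt_update(p_known, answer)
    print(answer, round(p_known, 3))
```

A separate estimate of this kind would be kept for each knowledge component a student practices.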
These assessments of student knowledge have in turn depended on models of the knowledge components in a domain, generally expressed as a mapping between exercises within the learning software and knowledge components.

IV. APPLICATIONS OF EDM

There are many applications or tasks in educational environments that have been addressed through data mining.

A. Analysis and Visualization of data

Analysis and visualization of data are used to highlight meaningful information and support decision making. In the educational sector, for example, they can help course administrators and educators analyze usage information and students' activities during a course to get a brief idea of each student's learning.
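Usage statistics of this kind come down to simple aggregations over an activity log. The sketch below assumes a hypothetical log format of (student ID, page, ISO date, seconds on page); the records are invented. It computes the most visited pages, total browsing time per student, and a weekly activity trend.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical usage log: (student_id, page, ISO date, seconds spent on the page).
log = [
    ("s1", "lecture1.pdf", "2024-03-04", 300),
    ("s2", "lecture1.pdf", "2024-03-05", 120),
    ("s1", "quiz1", "2024-03-06", 600),
    ("s3", "forum", "2024-03-12", 240),
    ("s2", "quiz1", "2024-03-13", 450),
    ("s3", "lecture1.pdf", "2024-03-14", 90),
]

# Most frequently visited pages.
page_visits = Counter(page for _, page, _, _ in log)

# Total browsing time per student.
time_per_student = defaultdict(int)
for student, _, _, seconds in log:
    time_per_student[student] += seconds

# Weekly activity trend: ISO week number -> number of logged events.
weekly = Counter(date.fromisoformat(day).isocalendar()[1] for _, _, day, _ in log)

print(page_visits.most_common(2))  # prints [('lecture1.pdf', 3), ('quiz1', 2)]
print(dict(time_per_student))  # prints {'s1': 900, 's2': 570, 's3': 330}
print(sorted(weekly.items()))  # prints [(10, 3), (11, 3)]
```

The same grouped counts are the usual input to the visualizations discussed in this subsection, such as weekly or hourly activity charts.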
Visualization and statistics are the two main methods that have been used for this task. Statistical analysis of educational data can give information such as where students enter and exit, the most important pages students browse, the number of downloads of e-learning resources, the number of different types of pages browsed, and the total time spent browsing these pages. It also provides reports on monthly and weekly user trends, usage summaries, how much material students study and the order in which they study topics, patterns of study activity, and the timing and sequencing of activities. Visualization uses graphical methods to help people understand and analyze data. There are a number of studies on the visualization of different educational data, such as hourly, daily, seasonal, and annual patterns of user behavior on online forums.

B. Predicting student performance

In student performance prediction, the goal is to predict the unknown value of a variable that describes the student. In the educational sector, the most commonly predicted values are students' performance, marks, knowledge, or scores [5]. Classification techniques are used to group individual items based on quantitative traits or on a training set of previously labeled items. Predicting student performance is a very popular application of DM in the education sector. Different techniques and models are applied to predict student performance, such as decision trees, neural networks, rule-based systems, and Bayesian networks. Such analysis helps educators anticipate students' performance.

C. Enrolment management

Enrolment management is a term frequently used in higher education to describe well-planned strategies and tactics for shaping a college's enrolment to meet planned goals. It is an organizational concept and a systematic set of activities designed to allow educational institutions to exert more influence over their student enrolments. Such practices often include retention programs, marketing, financial aid awarding, and admission policies [6].

D. Grouping Students

In this case, groups of students are created according to their customized features, personal characteristics, etc.
These clusters or groups of students can be used by the instructor or developer to build a personalized learning system that promotes effective group learning. The DM techniques used in this task are classification and clustering.

E. Predicting Student Profiling

Data mining can help management identify the demographic, geographic, and psychographic characteristics of students based on the information students provide at the time of admission. Neural network techniques can be used to identify different types of students [8].

F. Planning and Scheduling

Planning and scheduling are used to enhance the traditional educational process by planning future courses, scheduling courses, planning resource allocation (which helps in the admission and counseling processes), developing curricula, etc.
Different DM techniques used for this task are classification, categorization, estimation, and visualization.

G. User Modeling

User modeling encompasses what a learner knows, what the user experience is like, what a learner's behavior and motivation are, and how satisfied users are with online learning. EDM can be applied to modeling user knowledge, user behavior, and user experience.

H. Organization of Syllabus

Exploration of subjects and their relationships can directly assist in better organization of syllabi and provide insights into the existing curricula of educational programs. One application of data mining is to identify related subjects in the syllabi of educational programs in a large educational institute.

I. Detecting Cheating in Online examination

Data mining techniques can produce models that help organizations detect and prevent cheating in online assessments. The models generated use data comprising different students' personalities, the stress situations generated by online assessments, and common practices students use to cheat in order to obtain a better grade on these exams.

V. CONCLUSION

EDM has been introduced as an emerging research area related to several well-established areas of research, including e-learning, adaptive hypermedia, intelligent tutoring systems, web mining, and data mining.
The contributions to student modeling from classification methods, regression methods, clustering methods, and methods for the distillation of data for human judgment have been discussed. Classification, regression, and clustering methods have supported the development of validated models of a variety of complex constructs, which have been embedded into increasingly sophisticated student models. Clustering methods have supported the discovery of how students choose to respond to new types of educational human-computer interaction, enriching student models. Classification and regression methods have yielded accurate and validated models of a broader range of student behavior. Distillation of data for human judgment has itself facilitated model development by speeding up the labeling of data with respect to differences in student behavior, which in turn speeds up the creation of classification and regression models. These discoveries have increased the sophistication and richness of student models, covering a broader range of behavior.
This paper has studied EDM in detail and reviewed the relevant work in this area. It could be said that EDM is now approaching its adolescence: it is no longer in its early days, but it is not yet a mature area. Considerable research is still being done on various algorithms.

REFERENCES

[1] C. Romero and S. Ventura, "Educational Data Mining: A Review of the State of the Art," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 40, no. 6, November 2010.
[2] R. Baker, "Data Mining for Education," in International Encyclopedia of Education (3rd edition). Oxford, UK: Elsevier, 2010.
[3] R. Baker and K. Yacef, "The State of Educational Data Mining in 2009: A Review and Future Visions," Journal of Educational Data Mining, vol. 1, no. 1.
[4] N. Delavari and S. Phon-Amnuaisuk, "Data Mining Application in Higher Learning Institutions," Informatics in Education, vol. 7, no. 1, 2008.
[5] U. K. Pandey and S. Pal, "A Data Mining View on Classroom Teaching Language," IJCSI International Journal of Computer Science Issues, vol. 8, issue 2, pp. 277-282, ISSN: 1694-0814, 2011.
[6] M. Goyal and R. Vohra, "Applications of Data Mining in Higher Education," IJCSI International Journal of Computer Science Issues, vol. 9, issue 2, no. 1, March 2012.
[7] M. M. Ali, "Role of Data Mining in Education Sector," International Journal of Computer Science and Mobile Computing, vol. 2, issue 4, April 2013.