Educational data mining (EDM), an emerging area of research, focuses on
factors that influence learning systems. A wide variety of mining techniques
has been developed for educational contexts. The major objective is to
translate new data into meaningful information that supports clear conclusions
and thereby better decision making. One key area of research focuses on
improving student academic performance. This seminar presents a detailed study
of EDM, covering its goals, phases, techniques, and challenges. Learning
analytics, a sister field of EDM, is also discussed, and a comparative analysis
of various algorithms used in EDM is included.
Keywords: EDM, Phases, Prediction, Clustering
Data mining is a domain of computer science and the analysis step of the
“Knowledge Discovery in Databases” (KDD) process. It is the process of
recognizing patterns in huge data sets. Data mining has been applied in a great
number of fields, including retail sales, bioinformatics, and counterterrorism.
In recent years, there has been increasing interest in the use of data mining
to investigate scientific questions within educational research, an area of
inquiry termed Educational Data Mining (EDM).
Educational data mining is defined as the area of scientific inquiry centered
on the development of methods for making discoveries within the unique kinds of
data that come from educational settings, and on using those methods to better
understand students and the settings in which they learn. On one hand, the
increase in both instrumented educational software and state databases of
student information has created large repositories of data reflecting how
students learn. On the other hand, the use of the internet in education has
created a new context, known as e-learning or web-based education, in which
large amounts of information about teaching-learning interactions are endlessly
generated and ubiquitously available. EDM seeks to use these data repositories
to better understand learners and learning, and to develop computational
approaches that combine data and theory to transform practice to benefit
learners. Learning analytics and EDM are sister fields. Learning analytics
deals with the measurement, collection, analysis, and reporting of data about
learners and their contexts, for purposes of understanding and optimizing
learning and the environments in which it occurs. EDM refers to techniques,
tools, and research designed for automatically extracting meaning from large
repositories of data generated by or related to people’s learning activities in
educational settings.
PHASES OF EDM
The first phase of educational data mining is to find relationships among the
data of an educational environment using data mining techniques such as
classification, clustering, and regression. The second phase is to validate the
discovered relationships so that uncertainty can be reduced. The third phase is
to make predictions about the future on the basis of the validated
relationships in the learning environment. The fourth phase is to support the
decision-making process with the help of these predictions.
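The four phases can be illustrated with a minimal sketch. The data (weekly study hours against exam score), the choice of a least-squares line as the discovered relationship, and the pass mark of 55 are all assumptions made for illustration; they are not taken from any study cited here.

```python
from statistics import mean

# Hypothetical records: (weekly study hours, final exam score).
train = [(2, 50), (4, 60), (6, 70), (8, 80)]
holdout = [(3, 56), (7, 74)]  # kept aside for validation


def fit_line(points):
    """Phase 1: discover a linear relationship y = a + b * x by least squares."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a, b


def mean_abs_error(model, points):
    """Phase 2: validate the relationship on data that was not used to fit it."""
    a, b = model
    return mean(abs((a + b * x) - y) for x, y in points)


def predict(model, hours):
    """Phase 3: predict the outcome for a new student."""
    a, b = model
    return a + b * hours


def needs_support(model, hours, pass_mark=55):
    """Phase 4: turn the prediction into a decision (flag for extra support)."""
    return predict(model, hours) < pass_mark


model = fit_line(train)
validation_error = mean_abs_error(model, holdout)
```

In practice the validation step would decide whether the relationship is trusted at all; here it only reports the mean absolute error on the held-out records.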
METHODS OF EDM
A wide variety of methods is currently popular within educational data mining.
There are multiple taxonomies of the areas of educational data mining, one by
Romero and Ventura 2 and one by Baker 3. These methods fall into the following
categories.
Prediction
In prediction, the goal is to develop a model which can infer a
single aspect of the data (predicted variable) from some combination of other
aspects of the data (predictor variables). Prediction requires having labels
for the output variable for a limited data set, where a label represents some
trusted “ground truth” information about the output variable’s value in
specific cases. In some cases, however, it is important to consider the degree
to which these labels may in fact be approximate, or incompletely reliable.
Prediction has two key uses within educational data mining. In some cases,
prediction methods can be used to study what features of a model are important
for prediction, giving information about the underlying construct. This is a
common approach in programs of research that attempt to predict student
educational outcomes without predicting
intermediate or mediating factors first. In a second type of usage, prediction
methods are used in order to predict what the output value would be in contexts
where it is not desirable to directly obtain a label for that construct.
Educational data mining methods have enabled the construction of student models
of a wide range of constructs. Classification methods have been used to develop
detectors of student affect, including frustration, boredom, anxiety, engaged
concentration, joy, and distress 2. Detectors of affect and emotion have been
used to drive automated adaptation to differences in student affect,
significantly reducing students’ frustration and anxiety and increasing the
incidence of positive emotion. Classification methods have also been used to
develop detectors of off-task behavior, predicting differences in student
learning.
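As a minimal sketch of prediction from labeled data, the following k-nearest-neighbour classifier infers an on-task or off-task label from two predictor variables. The features (idle seconds, actions per minute), the labels, and the choice of k-nearest neighbours are assumptions for illustration; the detectors described in the literature are not necessarily built this way.

```python
from collections import Counter
from math import dist

# Hypothetical "ground truth": (idle seconds, actions per minute) -> label,
# as might be recorded by a human observer for a limited set of cases.
labeled = [
    ((5.0, 12.0), "on-task"),
    ((8.0, 10.0), "on-task"),
    ((6.0, 11.0), "on-task"),
    ((40.0, 2.0), "off-task"),
    ((55.0, 1.0), "off-task"),
    ((47.0, 3.0), "off-task"),
]


def knn_predict(train, features, k=3):
    """Predict the label of `features` by majority vote of the k nearest cases."""
    nearest = sorted(train, key=lambda item: dist(item[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

The features are used unscaled here, so the variable with the larger numeric range dominates the distance; real use would normally standardize the predictors first. The quality of the predictions also depends on how reliable the labels are, as noted above.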
Clustering
In clustering, the goal is to find data points that naturally
group together, splitting the full data set into a set of clusters. Clustering
is particularly useful in cases where the most common categories within the
data set are not known in advance. If a set of clusters is optimal, each data
point will in general be more similar to the other data points in its cluster
than to data points in other clusters. The use of clustering
in domains where a considerable amount is already known brings some risk of
discovering phenomena that are already known. As work in other areas of EDM
goes forward, an increasing amount is known about student behavior across
learning environments. One potential future use of clustering, in this
situation, would be to use clustering as a second stage in the process of
modeling student behavior in a learning system. First, existing detectors could
be used to classify known categories of behavior. Then, data points not
classified as belonging to any of those known behavior categories could be
clustered, in order to search for unknown behaviors.
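The two-stage idea can be sketched as follows: sessions already labeled by existing detectors are set aside, and only the remaining sessions are clustered. The session features (hint requests, minutes on task), the detector labels, and the use of k-means with hand-picked starting centroids are assumptions for illustration only.

```python
from math import dist


def kmeans(points, centroids, iterations=20):
    """Plain k-means (Lloyd's algorithm) starting from the given centroids.

    Returns (centroids, assignments), where assignments[i] is the index of
    the cluster that points[i] belongs to.
    """
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignments


# Hypothetical sessions: ((hint requests, minutes on task), detector label).
# A label of None means no existing detector recognized the behavior.
sessions = [
    ((3, 20), "off-task"),
    ((1, 30), None), ((2, 28), None), ((1, 32), None),
    ((9, 5), None), ((10, 7), None), ((8, 6), None),
    ((4, 18), "gaming"),
]

# Stage 1: keep only the sessions no detector could classify.
unclassified = [features for features, label in sessions if label is None]

# Stage 2: cluster them to look for unknown behavior categories.
centroids, assignments = kmeans(unclassified, [unclassified[0], unclassified[3]])
```

The number of clusters and the starting centroids are chosen by hand here; results from k-means can change with both choices.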
Relationship mining
In relationship mining, the goal is to discover relationships
between variables in a data set with a large number of variables. This may
take the form of attempting to find out which variables are most strongly
associated with a single variable of particular interest, or of attempting to
discover which relationships between any two variables are strongest.
Relationship mining takes four main forms:
a) Association rule mining: the goal is to find if-then rules of the form that
if some set of variable values is found, another variable will generally have a
specific value.
b) Correlation mining: the goal is to find (positive or
negative) linear correlations between variables.
c) Sequential pattern mining: the goal is to find temporal
associations between events.
d) Causal data mining: the goal is to find whether one event was
the cause of another event.
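Two of these forms can be illustrated with short calculations. The sketch below computes the support and confidence of one if-then rule (association rule mining) and the Pearson coefficient between two variables (correlation mining). The student records and item names are hypothetical.

```python
from math import sqrt

# Hypothetical per-student records, each a set of observed attribute values.
records = [
    {"low_forum_activity", "missed_deadline", "failed_quiz"},
    {"low_forum_activity", "failed_quiz"},
    {"low_forum_activity", "passed_quiz"},
    {"missed_deadline", "failed_quiz"},
    {"passed_quiz"},
]


def rule_metrics(rows, antecedent, consequent):
    """Support and confidence of the rule: if antecedent then consequent.

    support    = share of all rows containing antecedent and consequent
    confidence = share of rows containing the antecedent that also contain
                 the consequent
    """
    n_antecedent = sum(1 for row in rows if antecedent <= row)
    n_both = sum(1 for row in rows if (antecedent | consequent) <= row)
    support = n_both / len(rows)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def pearson(xs, ys):
    """Pearson linear correlation between two equally long numeric lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


support, confidence = rule_metrics(
    records, {"low_forum_activity"}, {"failed_quiz"}
)
```

A full association rule miner would also enumerate candidate rules and filter them by minimum support and confidence; only the scoring of a single rule is shown here.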
Discovery with models
In discovery with a model, a model of a
phenomenon is developed via prediction, clustering, or in some cases knowledge
engineering (within knowledge engineering, the model is developed using human
reasoning rather than automated methods). This model is then used as a
component in another analysis, such as prediction or relationship mining.
In the prediction case, the created model’s predictions are used
as predictor variables in predicting a new variable. For instance, analyses of
complex constructs such as gaming the system within online learning have
generally depended on assessments of the probability that the student knows the
current knowledge component being learned 3. These assessments of student
knowledge have in turn depended on models of the knowledge components in a
domain, generally expressed as a mapping between exercises within the learning
software and knowledge components.
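The chain described here (a mapping from exercises to knowledge components, then a knowledge estimate, then a second analysis that uses that estimate as a predictor) can be sketched in a few lines. Everything in the sketch is a stand-in chosen for illustration: the mapping, the smoothed proportion-correct estimate, the thresholds, and the gaming rule are hypothetical and are not the models used in the cited work.

```python
# Hypothetical mapping between exercises and knowledge components (KCs).
EXERCISE_TO_KC = {"ex1": "fractions", "ex2": "fractions", "ex3": "decimals"}


def outcomes_by_kc(attempts):
    """Group (exercise, correct) attempts by knowledge component."""
    grouped = {}
    for exercise, correct in attempts:
        grouped.setdefault(EXERCISE_TO_KC[exercise], []).append(correct)
    return grouped


def knowledge_estimate(outcomes):
    """Stage 1 (stand-in model): smoothed share of correct answers on one KC.

    Adding 1 to the correct count and 2 to the total keeps the estimate
    away from 0 and 1 when there are few attempts.
    """
    return (sum(outcomes) + 1) / (len(outcomes) + 2)


def likely_gaming(response_seconds, outcomes, fast=3.0, low_knowledge=0.4):
    """Stage 2: use the stage-1 estimate as a predictor variable.

    Flags a response as possible gaming when it is very fast AND the
    student probably does not know the KC yet (both thresholds assumed).
    """
    return response_seconds < fast and knowledge_estimate(outcomes) < low_knowledge


attempts = [("ex1", 0), ("ex2", 0), ("ex3", 1), ("ex1", 0)]
by_kc = outcomes_by_kc(attempts)
```

The point of the sketch is the structure, not the rule itself: the second stage never sees raw attempts, only the output of the first model.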
APPLICATIONS OF EDM
Many tasks in educational environments have been addressed through data mining
(DM).
Analysis and Visualization of data
Analysis and visualization are used to highlight meaningful information and
support decision making. In the educational sector, for example, they can help
course administrators and educators analyze usage information and students’
activities during a course to get a brief idea of a student’s learning.
Statistics and information visualization are the two main methods that have
been used for this task. Statistical analysis of educational data can tell us
where students enter and exit, which pages students browse most, how many
e-learning resources are downloaded, how many pages of each type are browsed,
and the total time spent browsing these pages. It also provides reports on
monthly and weekly user trends, usage summaries, how much material students
study and the order in which they study topics, patterns of studying activity,
and the timing and sequencing of activities. Visualization uses graphical
methods to help people understand and analyze data. There are a number of
studies on the visualization of different educational data, such as patterns of
hourly, daily, seasonal, and annual user behavior on online forums.
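Several of the statistics listed above can be computed directly from a page-view log. The sketch below assumes a hypothetical log format of (student, page, seconds on page) rows that are already in time order; real e-learning platforms record usage in their own formats.

```python
from collections import Counter, defaultdict

# Hypothetical log rows in time order: (student_id, page, seconds_on_page).
log = [
    ("s1", "home", 10), ("s1", "lecture1", 300), ("s1", "quiz1", 120),
    ("s2", "home", 5), ("s2", "quiz1", 200),
    ("s3", "lecture1", 250), ("s3", "quiz1", 90),
]


def usage_summary(rows):
    """Summarize page visits, browsing time, and entry and exit pages."""
    visits = Counter(page for _, page, _ in rows)
    seconds = defaultdict(int)
    first_page, last_page = {}, {}
    for student, page, secs in rows:
        seconds[page] += secs
        first_page.setdefault(student, page)  # first page seen = entry page
        last_page[student] = page             # overwritten until the last row
    return {
        "visits": visits,
        "seconds": dict(seconds),
        "entry_pages": Counter(first_page.values()),
        "exit_pages": Counter(last_page.values()),
    }


summary = usage_summary(log)
```

The resulting counts are the kind of input a chart of hourly or weekly activity would be drawn from.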
Predicting student performance
In student performance prediction, we predict the unknown values of a variable
that describes the student. In the educational sector, the values most often
predicted are a student’s performance, marks, knowledge, or score 5.
A classification technique groups individual items based on quantitative traits
or on a training set of previously labeled items. Student performance
prediction is a very popular application of DM in the education sector.
Different techniques and models are applied to it, such as decision trees,
neural networks, rule-based systems, and Bayesian networks. This analysis helps
educators predict a student’s performance.
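The simplest member of the decision tree family is a one-level tree, or decision stump, which splits on a single threshold. The sketch below fits such a stump to predict pass or fail from attendance. The data, the single predictor, and the assumption that higher attendance favors passing are all illustrative.

```python
# Hypothetical training set: (attendance percentage, outcome).
students = [
    (40, "fail"), (55, "fail"), (60, "fail"),
    (65, "pass"), (70, "pass"), (85, "pass"), (90, "pass"),
]


def stump_predict(threshold, value):
    """Predict 'pass' when the value reaches the threshold, else 'fail'."""
    return "pass" if value >= threshold else "fail"


def fit_stump(rows):
    """Choose the threshold with the fewest training errors.

    Candidate thresholds are the observed values; ties keep the lowest one.
    Returns (threshold, number_of_training_errors).
    """
    best_threshold, best_errors = None, len(rows) + 1
    for threshold in sorted({value for value, _ in rows}):
        errors = sum(
            1 for value, label in rows if stump_predict(threshold, value) != label
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


threshold, training_errors = fit_stump(students)
```

A full decision tree repeats this kind of split recursively over several predictors, and its accuracy should be judged on held-out students rather than on the training set used here.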
Enrolment Management
Enrolment management is a term frequently used in higher education to describe
well-planned strategies and ways to shape the enrolment of a college to meet
planned goals. It is an organizational concept and also a systematic set of
activities designed to allow educational institutions to exert more influence
over student enrolments. Such practices often include retention programs,
marketing, financial aid awarding, and admission policies 6.
Grouping Students
In this case, groups of students are created according to their customized
features, personal characteristics, etc. These clusters of students can be used
by the instructor or developer to build a personalized learning system that
promotes effective group learning. The DM techniques used in this task are
classification and clustering.
Predicting Student Profiling
Data mining can help management to identify the demographic, geographic, and
psychographic characteristics of students based on information provided by the
students at the time of admission. Neural network techniques can be used to
identify different types of students 8.
Planning and Scheduling
Planning and scheduling are used to enhance the traditional educational process
by planning future courses, scheduling courses, and planning resource
allocation, which helps in the admission and counseling processes, curriculum
development, etc. DM techniques used for this task include classification,
categorization, and estimation.
User Modeling
User modeling encompasses what a learner knows, what the user experience is
like, what a learner’s behavior and motivation are, and how satisfied users are
with online learning. EDM can be applied in modeling user knowledge, user
behavior, and user experience.
Organization of Syllabus
of subjects and their relationships can directly assist in better organization
of syllabi and provide insights into the existing curricula of educational
programs. One application of data mining is to identify related subjects in the
syllabi of educational programs in a large educational institute.
Detecting Cheating in Online examination
Data mining techniques can propose models that help organizations detect and
prevent cheating in online assessments. The models use data comprising
students’ personalities, stress situations generated by online assessments, and
common practices students use to cheat to obtain a better grade on these exams.
CONCLUSION
EDM has been introduced as an upcoming research area related to several
well-established areas of research, including e-learning, adaptive hypermedia,
intelligent tutoring systems, web mining, and data mining. The contributions to
student modeling from classification methods, regression methods, clustering
methods, and methods for the distillation of data for human judgment have been
discussed. Classification, regression, and clustering methods have supported
the development of validated models of a variety of complex constructs that
have been embedded into increasingly sophisticated student models. Clustering
methods have supported the discovery of how students choose to respond to new
types of educational human-computer interactions, enriching student models.
Classification and regression models have afforded accurate and validated
models of a broader range of student behavior. Distillation of data for human
judgment has itself facilitated the development of models by speeding the
process of labeling data with reference to differences in student behavior, in
turn speeding the process of creating classification and regression models.
These discoveries have increased the sophistication and richness of student
models, covering a broader range of behavior. This paper is intended as a
detailed study of EDM and reviews the relevant work in this area. EDM can be
said to be approaching its adolescence: it is no longer in its early days but
is not yet a mature area. Considerable research is still being done on various
algorithms.
REFERENCES
1 C. Romero and S. Ventura, “Educational Data Mining: A Review of the State of
the Art”, IEEE Transactions on Systems, Man, and Cybernetics, Part C:
Applications and Reviews, vol. 40, no. 6, November 2010.
2 R. Baker, “Data Mining for Education”, in International Encyclopedia of
Education (3rd edition). Oxford, UK: Elsevier, 2010.
3 R. Baker and K. Yacef, “The State of Educational Data Mining in 2009: A
Review and Future Visions”, Journal of Educational Data Mining, vol. 1, no. 1.
4 N. Delavari and S. Phon-Amnuaisuk, “Data Mining Application in Higher
Learning Institutions”, Informatics in Education, vol. 7, no. 1, 2008.
5 U. K. Pandey and S. Pal, “A Data Mining View on Class Room Teaching
Language”, IJCSI International Journal of Computer Science Issues, vol. 8,
issue 2, pp. 277-282, ISSN: 1694-0814, 2011.
6 M. Goyal and R. Vohra, “Applications of Data Mining in Higher Education”,
IJCSI International Journal of Computer Science Issues, vol. 9, issue 2, no. 1,
March 2012.
7 M. M. Ali, “Role of Data Mining in Education Sector”, International Journal
of Computer Science and Mobile Computing, vol. 2, issue 4, April 2013.