Abstract: Data mining extracts knowledge or information from large amounts of
data stored in multiple heterogeneous databases. Knowledge or information
conveys a message either directly or indirectly. This paper provides a survey
of various data mining techniques. These techniques include association,
correlation, clustering and neural networks. This research paper also conducts
a formal review of the applications of data mining in areas such as the
education sector, marketing, fraud detection, manufacturing and
telecommunication. This paper discusses the topic on the basis of past
research papers and also studies the data
Keywords: Association, Clustering, Data mining, data mining application,
knowledge discovery database.
I. Introduction
In the real world, huge amounts of data are available in education, medicine,
industry and many other areas. Such data may provide knowledge and information
for decision making. For example, one can identify drop-out students in a
university or analyze sales data in a shopping database. Data can be analyzed,
summarized and understood in order to meet challenges [1]. Data mining is a
powerful concept for data analysis and is the process of discovering
interesting patterns from huge amounts of data stored in various repositories
such as data warehouses, the World Wide Web and external sources. An
interesting pattern is one that is easy to understand, previously unknown,
valid and potentially useful. Data mining is a type of sorting technique which
is used to extract hidden patterns from large databases. The goals of data
mining are fast retrieval of data or information, knowledge discovery from
databases, identification of hidden patterns and of patterns not previously
explored, reduction of complexity, time saving, etc. [2]. Data mining refers to
extracting or mining knowledge from large amounts of data. Data mining is
sometimes treated as knowledge discovery in databases (KDD) [3]. KDD is an
iterative process consisting of the following steps, shown in Figure 1 [4].
Selection: select data from the various resources on which the operation is to
be performed.
Preprocessing: also known as data cleaning, in which unwanted data is removed.
Transformation: transform or consolidate the data into a new format for
processing.
Data mining: identify the desired result.
Interpretation/evaluation: interpret the result or query to give meaningful
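The KDD steps listed above can be illustrated with a small sketch. It is not
taken from the cited works: the student records, the pass mark of 50 and all
function names are hypothetical, and the "data mining" step is reduced to a
simple frequency count. The sketch shows a single pass, whereas KDD in practice
is iterative.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
source_a = [{"student": "s1", "grade": 72}, {"student": "s2", "grade": None}]
source_b = [{"student": "s3", "grade": 41}, {"student": "s4", "grade": 88}]

def select(*sources):
    """Selection: gather records from the chosen sources."""
    return [row for src in sources for row in src]

def preprocess(rows):
    """Preprocessing (data cleaning): drop records with missing grades."""
    return [r for r in rows if r["grade"] is not None]

def transform(rows):
    """Transformation: consolidate each record into a (student, label) pair."""
    return [(r["student"], "pass" if r["grade"] >= 50 else "fail") for r in rows]

def mine(pairs):
    """Data mining: here only a frequency count of each label."""
    return Counter(label for _, label in pairs)

def interpret(counts):
    """Interpretation/evaluation: express the counts as proportions."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def kdd(*sources):
    """Run the five steps once, in order."""
    return interpret(mine(transform(preprocess(select(*sources)))))

result = kdd(source_a, source_b)
```

Each step is a separate function so that a later iteration can replace one step
(for example, a different cleaning rule) without touching the others.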
Algorithms and techniques such as Classification, Clustering, Regression,
Artificial Intelligence, Neural Networks, Association Rules, Decision Trees,
Genetic Algorithms and the Nearest Neighbor method are meant for knowledge
discovery from databases [5]. The main objective of this paper is to learn about the
II. Data Mining Techniques
Data mining means collecting relevant information from unstructured data, so it
can help achieve specific objectives. The purpose of a data mining effort
is normally either to create a descriptive model or a predictive model. A
descriptive model presents, in concise form, the main characteristics of the
data set. The purpose of a predictive model is to allow the data miner to
predict an unknown (often future) value of a specific variable, the target
variable [7]. The goals of predictive and descriptive models can be achieved
using a variety of data mining techniques, as shown in Figure 2 [8].
Figure 2: Data Mining Models
1.1 Classification: Classification is based on categorical (i.e. discrete,
unordered) values. This technique is based on supervised learning (i.e. the
desired output for a given input is known). It classifies data based on a
training set and its values (class labels). These goals are achieved using
decision trees, neural networks and classification rules (IF-THEN). For
example, we can apply a classification rule to the past records of students
who left the university and evaluate them. Using these techniques we can
easily identify the performance of the students.
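A minimal sketch of learning one IF-THEN classification rule from a labeled
training set is given below. It assumes a single hypothetical attribute
(attendance percentage) and two hypothetical class labels ("dropout" and
"stay"); real classifiers such as decision trees combine many such tests.

```python
def train_threshold_rule(training_set):
    """Learn t for the rule: IF value < t THEN "dropout" ELSE "stay".

    training_set is a list of (value, class_label) pairs. Candidate
    thresholds are midpoints between neighbouring distinct values; the one
    with the fewest training errors is returned.
    """
    values = sorted({value for value, _ in training_set})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        raise ValueError("need at least two distinct attribute values")
    best_t, best_errors = None, len(training_set) + 1
    for t in candidates:
        errors = sum(classify(value, t) != label for value, label in training_set)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

def classify(value, threshold):
    """Apply the learned IF-THEN rule to one attribute value."""
    return "dropout" if value < threshold else "stay"

# Hypothetical past records: (attendance %, known class label).
training_set = [
    (35, "dropout"), (42, "dropout"), (55, "dropout"),
    (68, "stay"), (74, "stay"), (90, "stay"),
]
threshold = train_threshold_rule(training_set)
prediction_low = classify(50, threshold)
prediction_high = classify(80, threshold)
```

Because the class labels of the training records are known in advance, this is
an instance of supervised learning.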
1.2 Regression: Regression is used to map a data item to a real-valued
prediction variable [8]. In other words, regression can be adapted for
prediction. In regression techniques the target values are known. For example,
one can predict a child's behavior based on family history.
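As an illustration of mapping a data item to a real-valued variable, the sketch
below fits a straight line by ordinary least squares. This is only one possible
regression method, and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Predict the target value for a new input x."""
    return slope * x + intercept

# Hypothetical training data with known target values.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
predicted = predict(5.0, slope, intercept)
```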
1.3 Time Series Analysis:
Time series analysis is the process of using statistical techniques to model
and explain a time-dependent series of data points. Time series forecasting is
a method of using a model to generate predictions (forecasts) for future events
based on known past events [9]. An example is the stock market.
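A very simple forecasting model is the moving average, which predicts the next
value as the mean of the most recent observations. The sketch below assumes a
hypothetical series of closing prices and a window of three; it is meant as an
illustration, not as a recommended model for stock data.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window

# Hypothetical past closing prices, oldest first.
closing_prices = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
forecast = moving_average_forecast(closing_prices, window=3)
```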
1.4 Prediction: Prediction is one of the data mining techniques that discovers
the relationship between independent variables and the relationship between
dependent and independent variables [4]. Prediction models are based on
continuous or ordered values.
1.5 Clustering: A cluster is a collection of similar data objects; dissimilar
objects belong to other clusters. Clustering is a way of finding similarities
between data according to their characteristics. This technique is based on
unsupervised learning (i.e. the desired output for a given input is not
known). Examples include image processing and pattern recognition.
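The grouping of similar objects can be sketched with k-means on
one-dimensional data. The points and the initial centroids below are
hypothetical, and the initial centroids are fixed by hand so that the result is
reproducible; practical implementations usually choose them randomly.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on numbers: assign to nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled observations.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 12.0])
```

No class labels are supplied to the function, which is what makes the
technique unsupervised.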
1.6 Summarization: Summarization is the abstraction of data. It is a set of
relevant tasks that gives an overview of the data. For example, a
long-distance race can be summarized in total minutes and seconds.
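The race example can be made concrete with a short sketch that reduces a list
of lap times to a few summary figures. The lap times and the field names of the
summary are hypothetical.

```python
from statistics import mean

def summarize_race(lap_seconds):
    """Reduce per-lap times (in seconds) to a compact overview."""
    total = sum(lap_seconds)
    minutes, seconds = divmod(total, 60)
    return {
        "laps": len(lap_seconds),
        "total_minutes": minutes,
        "total_seconds": seconds,
        "mean_lap_seconds": mean(lap_seconds),
        "fastest_lap_seconds": min(lap_seconds),
        "slowest_lap_seconds": max(lap_seconds),
    }

# Hypothetical lap times for one runner.
lap_seconds = [95, 102, 98, 105]
summary = summarize_race(lap_seconds)
```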
Association Rule: Association is one of the most popular data mining
techniques and finds the most frequent item sets. Association strives to
discover patterns in data which are based upon relationships between items in
the same transaction. Because of its nature, association is sometimes referred
to as the “relation technique”. This method of data mining is utilized within
market basket analysis in order to identify a set, or sets, of products that
consumers often purchase at the same time [6].
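Two standard measures for association rules are support (how often an item set
occurs) and confidence (how often the consequent occurs when the antecedent
does). The sketch below computes both by brute force over a handful of
hypothetical transactions; algorithms such as Apriori avoid enumerating every
candidate but are not shown here.

```python
from itertools import combinations

# Hypothetical shopping transactions; each set is one basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def frequent_pairs(transactions, min_support):
    """All 2-item sets whose support reaches min_support (brute force)."""
    items = sorted(set().union(*transactions))
    return [
        pair for pair in combinations(items, 2)
        if support(pair, transactions) >= min_support
    ]

pairs = frequent_pairs(transactions, min_support=0.5)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

With this data, bread appears in three of four baskets and bread with milk in
two, so the rule "bread implies milk" has a confidence of two thirds.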
1.7 Sequence Discovery: Sequence discovery uncovers relationships among data
[8]. It works on a set of objects, each associated with its own timeline of
events. Examples include scientific experiments, natural disasters and the
analysis of DNA sequences.
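One simple way to look for sequential relationships is to measure how many
timelines contain a given pattern of events in order, not necessarily
consecutively. The sketch below assumes hypothetical event names and timelines;
it is a support count for one pattern, not a full sequence mining algorithm.

```python
def contains_in_order(sequence, pattern):
    """True if the events of pattern occur in sequence in the same order."""
    events = iter(sequence)
    # `step in events` consumes the iterator up to the match, so each
    # later step is searched for only after the previous one.
    return all(step in events for step in pattern)

def sequence_support(sequences, pattern):
    """Fraction of timelines that contain the pattern in order."""
    return sum(contains_in_order(s, pattern) for s in sequences) / len(sequences)

# Hypothetical timelines, one list of events per object.
timelines = [
    ["login", "browse", "buy"],
    ["login", "buy"],
    ["browse", "login"],
    ["login", "browse", "logout"],
]
support_login_buy = sequence_support(timelines, ["login", "buy"])
support_browse_login = sequence_support(timelines, ["browse", "login"])
```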
III. Data Mining Application
Various fields have adopted data mining technologies because they provide fast
access to data and valuable information from large amounts of data. Data
mining application areas include marketing, telecommunication, fraud
detection, finance, the education sector, medicine and so on. Some of the main
applications are listed below:
1.8 Data Mining in Education Sector: Applying data mining in the education
sector has given rise to a new emerging field called “Education Data Mining”.
It can be used to study and enhance student performance, drop-out students,
student behavior and the subjects selected in a course. Data mining in higher
education is a recent research field, and this area of research is gaining
popularity because of its potential for educational institutes. Student data
is used to analyze learning behavior and to predict results [10].
1.9 Data Mining in Banking and Finance: Data mining has been used extensively
in the banking and financial markets [11]. In the banking field, data mining is
used to predict credit card fraud, to estimate risk and to analyze trends and
profitability. In the financial markets, data mining techniques such as neural
networks are used in stock forecasting, price prediction and so on.
1.10 Data Mining in Market Basket Analysis: This methodology is based on a
shopping database. The ultimate goal of market basket analysis is finding the
products that customers frequently purchase together. Stores can use this
information by putting these products in close proximity to each other and
making them more visible and accessible to customers at the time of shopping
[12].
1.11 Data Mining in Earthquake Prediction: Data mining can be used to predict
earthquakes from satellite maps. An earthquake is the sudden movement of the
Earth’s crust caused by the abrupt release of stress accumulated along a
geologic fault in the interior. There are two basic categories of earthquake
predictions: forecasts (months to years in advance) and short-term predictions
(hours or days in advance) [13].
1.12 Data Mining in Bioinformatics: Bioinformatics generates a large amount of
biological data. The importance of this new field of inquiry will grow as we
continue to generate and integrate large quantities of genomic, proteomic and
other data [4].
1.13 Data Mining in Telecommunication: The telecommunications field implements
data mining technology because the telecommunication industry has large
amounts of data, a very large customer base and a rapidly changing and highly
competitive environment. Telecommunication companies use data mining
techniques to improve their marketing efforts, detect fraud and better manage
telecommunication networks [4].
1.14 Data Mining in Agriculture: Data mining is emerging in the agriculture
field for crop yield analysis with respect to four parameters, namely year,
rainfall, production and area of sowing. Yield prediction is a very important
agricultural problem that remains to be solved based on the available data.
The yield prediction problem can be solved by employing data mining techniques
such as K-Means, K-Nearest Neighbor (KNN), Artificial Neural Networks and
Support Vector Machines (SVM) [14].
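As an illustration of how KNN might be applied to this problem, the sketch
below predicts production from rainfall and area of sowing by averaging the
production of the nearest past seasons. The records, the choice of features
and the value of k are hypothetical, and the features are left unscaled for
brevity, so rainfall dominates the distance; it is not the method of the cited
study.

```python
import math

def knn_predict(training, query, k):
    """Average the targets of the k training rows closest to query.

    training is a list of (features, target) pairs; distance is Euclidean.
    """
    if k <= 0 or k > len(training):
        raise ValueError("k must be between 1 and len(training)")
    nearest = sorted(training, key=lambda row: math.dist(row[0], query))[:k]
    return sum(target for _, target in nearest) / k

# Hypothetical past seasons: ((rainfall in mm, area sown in ha), production in t).
seasons = [
    ((800.0, 10.0), 20.0),
    ((850.0, 12.0), 26.0),
    ((400.0, 10.0), 10.0),
    ((420.0, 11.0), 12.0),
]
predicted_production = knn_predict(seasons, query=(820.0, 11.0), k=2)
```

In practice the features would normally be scaled to comparable ranges before
distances are computed.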
1.15 Data Mining in Cloud Computing: Data mining techniques are used in cloud
computing. The implementation of data mining techniques through cloud
computing allows users to retrieve meaningful information from a virtually
integrated data warehouse, which reduces the costs of infrastructure and
storage [15]. Cloud computing uses Internet services that rely on clouds of
servers to handle tasks. Data mining techniques in cloud computing help
providers deliver efficient, reliable and secure services to their users.
IV. Conclusion
This paper provides a general idea of data mining, data mining techniques and
data mining applications in various fields. The main objective of data mining
techniques is to discover knowledge from active data. These applications use
classification, prediction, clustering, association techniques and so on. In
future work we hope to review various classification and clustering algorithms
and their significance.