Page 1 of 6
Journal for Studies in Management and Planning
Available at http://edupediapublications.org/journals/index.php/JSMaP/
ISSN: 2395-0463
Volume 04 Issue 04
April 2018
Available online: http://edupediapublications.org/journals/index.php/JSMaP/ P a g e | 114
The KDD Process for Extracting Useful
Knowledge from Volumes of Data
Mining
Pardeep Nehra
Department of Computer Science
E-Mail:- par.nehra82@yahoo.com
Abstract: The aim of this research work is to discover exceptions using the rough set
approach and to represent them in the form of rule pairs, a knowledge
structure that consists of a commonsense rule and an exception rule. Knowledge structures are
compact representations of rules and increase comprehensibility. Data mining refers to
extracting or mining knowledge from large amounts of data. The overall process of extracting
useful information is referred to as Knowledge Discovery in Databases (KDD). Data mining is a particular
step in this process: the application of specific algorithms for extracting patterns (models) from data.
Mining exceptions is attracting the attention of researchers because exceptions
challenge existing knowledge, lead to the growth of knowledge in new
directions and help decision makers make the right decisions even in rare circumstances.
Keywords: KDD, Data Mining, NN, Rough Set, Fuzzy Set.
Introduction:
The amount of data available from various
sources continues to grow fast. The large
amount of data stored in databases contains
valuable hidden knowledge that could be
used to improve the decision-making
process of an organization. For instance,
data about previous sales might contain
interesting relationships between products
and customers. The discovery of such
relationships can be very useful to increase
the sales of a company. So, there is a clear
need for semiautomatic methods for
extracting knowledge from data. This need
has led to the emergence of a field called
data mining and knowledge discovery. The
goal of KDD (Knowledge Discovery in
Databases) is to identify valid, novel,
potentially useful and ultimately
understandable patterns in data. Data
Mining is a stage in the entire process of
KDD which applies an algorithm to extract
interesting patterns. Rough set theory is one
of the popular theories in the field of data
mining. It provides a formal framework
for the transformation of data into
knowledge. Rough set theory is relatively
simple and comes in handy for dealing with
the vagueness and uncertainty that are
inherent in decision-making situations. Data
mining extracts patterns or rules. Exceptions
are deviations from the commonsense rules.
Exceptions are interesting because they exhibit
unexpectedness and contradict prior
knowledge about the domain.
Knowledge Discovery in Databases
Process:
Knowledge Discovery in Databases (KDD)
is the process of finding useful information
and patterns in data. KDD has been defined as
“the nontrivial process of identifying valid,
novel, potentially useful, and ultimately
understandable patterns in data”. The KDD
process consists of the following steps:
1) Data cleaning and integration:
The data to be used by the process may
contain incorrect or missing values, that is, the data
may be noisy or inconsistent. In the data cleaning
step of KDD, erroneous
data may be corrected or removed, and
tuples with missing values may be deleted, or the
missing values may be estimated from the average
of the other values. When there are multiple
sources of data, the data from the different
sources are combined in the data
integration step.
2) Selection and Transformation:
Data relevant to the analysis task are
retrieved from the databases in the data
selection step. Data from different
sources must then be transformed or consolidated
into forms appropriate for mining by
performing summary or aggregation
operations. Data reduction may be used to
reduce the number of possible data values
being considered.
3) Data mining:
This step consists of the use of
algorithms to extract interesting and useful
information and patterns from large
databases for decision making.
4) Pattern evaluation:
Not all of the patterns that are generated are
of interest; only some of them are actually
interesting. In this step the truly interesting
patterns are identified on the basis of various
interestingness measures.
5) Knowledge presentation:
This step describes how the data mining
results are presented to the users. This is an
extremely important step because the
usefulness of the results is dependent on it.
This step includes an important
activity known as post-processing. Post-processing
makes the results obtained from data
mining easy for the user to understand. Various
visualization and knowledge representation
techniques are used at this step.
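The five steps above can be sketched end to end on a toy example. The following is only a minimal illustration under stated assumptions: the two sales sources (`store_a`, `store_b`), the rule that a negative amount is erroneous, the use of simple pairwise association rules as the mining algorithm, and the support and confidence thresholds are all hypothetical choices made for this sketch, not part of the process definition.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical toy data from two sources (illustrative only).
store_a = [
    {"customer": "c1", "items": {"bread", "milk"}, "amount": 12.0},
    {"customer": "c2", "items": {"bread", "butter"}, "amount": None},  # missing value
    {"customer": "c3", "items": {"bread", "milk", "butter"}, "amount": 20.0},
]
store_b = [
    {"customer": "c4", "items": {"milk"}, "amount": 4.0},
    {"customer": "c5", "items": {"bread", "milk"}, "amount": -1.0},  # erroneous value
]

def clean_and_integrate(*sources):
    """Step 1: combine the sources, drop erroneous tuples, fill gaps with the mean."""
    records = [r for source in sources for r in source]
    # Assumption for this sketch: a negative amount is an error.
    records = [r for r in records if r["amount"] is None or r["amount"] >= 0]
    fill = mean(r["amount"] for r in records if r["amount"] is not None)
    return [{**r, "amount": fill if r["amount"] is None else r["amount"]}
            for r in records]

def select_and_transform(records):
    """Step 2: keep only the attribute relevant to the task, as item sets."""
    return [frozenset(r["items"]) for r in records]

def mine_rules(transactions):
    """Step 3: generate candidate rules lhs -> rhs with support and confidence."""
    n = len(transactions)
    item_count = Counter(i for t in transactions for i in t)
    pair_count = Counter(p for t in transactions
                         for p in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_count.items():
        for lhs, rhs in ((a, b), (b, a)):
            rules.append((lhs, rhs, count / n, count / item_count[lhs]))
    return rules

def evaluate(rules, min_support=0.5, min_confidence=0.8):
    """Step 4: keep only rules that pass the interestingness thresholds."""
    return [r for r in rules if r[2] >= min_support and r[3] >= min_confidence]

def present(rules):
    """Step 5: render the surviving rules in a readable form."""
    return [f"IF {lhs} THEN {rhs} (support={s:.2f}, confidence={c:.2f})"
            for lhs, rhs, s, c in rules]

cleaned = clean_and_integrate(store_a, store_b)
transactions = select_and_transform(cleaned)
interesting = evaluate(mine_rules(transactions))
for line in present(interesting):
    print(line)
```

Support and confidence are used here only as two common interestingness measures; other measures could be substituted in `evaluate` without changing the rest of the sketch.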
Classification Models:
Classification is the process of assigning
the data items of a database to
classes. Various types of classification
models are used for this purpose.
Classification models can be divided into
two categories: evolutionary and non-evolutionary approaches. Evolutionary
approach based classification models
consist of genetic algorithms, and non-evolutionary approach based classification
models consist of decision trees, neural networks, rough sets, fuzzy sets and statistical
techniques.
Decision Trees:
A decision tree is a flowchart-like tree structure in which each internal node denotes a test on an
attribute and each branch represents an outcome of that test. Leaf nodes represent classes. The figure
shows how a decision tree is used to classify an organization's employees according to their
heights.
Figure: Decision Tree
In order to classify an unknown sample, the attribute values of the sample are tested against the
decision tree. A path is traced from the root to a leaf node that holds the class prediction of that
sample. A decision tree can easily be converted to classification rules. Important decision tree
algorithms are C5.0, CHAID and QUEST.
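As a minimal sketch of the two ideas above (tracing a sample from the root to a leaf, and converting the tree to classification rules), consider the following hand-built tree. The tuple encoding of nodes, the attribute name `height`, the thresholds 1.6 and 1.8 (assumed to be metres) and the class labels are hypothetical values chosen for illustration; the tree is not learned by C5.0, CHAID or QUEST.

```python
# An internal node is (attribute, threshold, subtree_if_below, subtree_otherwise);
# a leaf is simply a class label string. All values are hypothetical.
tree = ("height", 1.6, "short", ("height", 1.8, "medium", "tall"))

def classify(node, sample):
    """Trace a path from the root to a leaf and return the class at that leaf."""
    while not isinstance(node, str):
        attribute, threshold, below, at_or_above = node
        node = below if sample[attribute] < threshold else at_or_above
    return node

def to_rules(node, conditions=()):
    """Convert each root-to-leaf path into one IF-THEN classification rule."""
    if isinstance(node, str):
        lhs = " AND ".join(conditions) or "TRUE"
        return [f"IF {lhs} THEN class = {node}"]
    attribute, threshold, below, at_or_above = node
    return (to_rules(below, conditions + (f"{attribute} < {threshold}",))
            + to_rules(at_or_above, conditions + (f"{attribute} >= {threshold}",)))

employee = {"name": "employee_1", "height": 1.7}
print(classify(tree, employee))
for rule in to_rules(tree):
    print(rule)
```

Each leaf yields exactly one rule, so a tree with three leaves gives three rules whose conditions are the tests met along the path.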
Neural Network (NN):
A NN is an information processing system that consists of a graph representing the processing
system as well as various algorithms that access that graph. A NN is also a predictive model. A
neural network is a directed graph whose nodes are processing elements connected by arcs.
Nodes in a neural network are organized into input, hidden and output layers. To perform a data
mining task, a sample tuple is fed in through the input nodes, and the output nodes determine the
prediction. The hidden layer takes part in the learning mechanism. Each link is assigned a weight, and a
learning process such as back propagation adjusts these weights so that the prediction becomes
more accurate. Consider the working of a simple NN. Suppose a tuple contains two attributes, age and income.
These two attributes become inputs to the processing elements and, after processing, the NN in the
diagram predicts the output, that is, whether a customer is a defaulter or not. Back propagation is an
important neural network algorithm.
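The age and income example can be sketched as a very small network trained with back propagation. This is an illustrative sketch only: the class `TinyNetwork`, the choice of two hidden nodes, the sigmoid activation, the squared-error loss, the learning rate and the six training tuples (with age and income assumed to be rescaled to the range 0 to 1, and label 1 meaning defaulter) are all assumptions made for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyNetwork:
    """Two inputs -> one hidden layer -> one output; every link has a weight."""

    def __init__(self, n_in=2, n_hidden=2, seed=0):
        rng = random.Random(seed)
        # The last weight in each row is the bias of that node.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        inputs = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(ws, inputs)))
                  for ws in self.w_hidden]
        output = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, output

    def train_step(self, x, target, lr=0.5):
        """One back propagation update for the loss 0.5 * (output - target)^2."""
        hidden, output = self.forward(x)
        delta_out = (output - target) * output * (1.0 - output)
        # Propagate the output error back through the (old) output weights.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, v in enumerate(hidden + [1.0]):
            self.w_out[j] -= lr * delta_out * v
        inputs = list(x) + [1.0]
        for j, ws in enumerate(self.w_hidden):
            for i, v in enumerate(inputs):
                ws[i] -= lr * delta_hidden[j] * v

def mean_error(net, data):
    return sum(0.5 * (net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)

# Hypothetical tuples: ([age, income] rescaled to 0..1, 1 = defaulter).
data = [([0.20, 0.10], 1), ([0.30, 0.20], 1), ([0.25, 0.15], 1),
        ([0.60, 0.80], 0), ([0.70, 0.90], 0), ([0.50, 0.70], 0)]

net = TinyNetwork()
error_before = mean_error(net, data)
for _ in range(2000):
    for x, t in data:
        net.train_step(x, t)
error_after = mean_error(net, data)
print(error_before, error_after)
```

After training, `net.forward([age, income])[1]` can be read as the network's score for the defaulter class; a threshold such as 0.5 would turn it into a yes or no prediction.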
Rough Set:
Rough set theory can be used for classification to discover structural relationships within
imprecise or noisy data. It applies to discrete-valued attributes. Continuous-valued attributes
must therefore be discretized prior to its use. Rough set theory is based on the establishment of
equivalence classes within the given training data. All of the data samples forming an
equivalence class are indiscernible, that is, the samples are identical with respect to the attributes
describing the data. Given real-world data, it is common that some classes cannot be
distinguished in terms of the available attributes. Rough sets can be used to approximately or
