International Journal of Approximate Reasoning 53 (2012) 24–37
Contents lists available at SciVerse ScienceDirect
International Journal of Approximate Reasoning
journal homepage: www.elsevier.com/locate/ijar
An interval set model for learning rules from incomplete information table
Huaxiong Li
a,c,
∗
, Minhong Wang
b
, Xianzhong Zhou
a
, Jiabao Zhao
a
a
School of Management and Engineering, Nanjing University, Nanjing 210093, Jiangsu, PR China
b
Faculty of Education, The University of Hong Kong, Hong Kong
c
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, Jiangsu, PR China
A R T I C L E I N F O A B S T R A C T
Article history:
Received 2 August 2010Revised 1 September 2011Accepted 7 September 2011Available online 22 September 2011
Keywords:
Interval setInterval extensionIncomplete information tableRule induction
A novel interval set approach is proposed in this paper to induce classiﬁcation rules fromincomplete information table, in which an intervalsetbased model to represent the uncertain concepts is presented. The extensions of the concepts in incomplete informationtable are represented by interval sets, which regulate the upper and lower bounds of theuncertain concepts. Interval set operations are discussed, and the connectives of conceptsarerepresentedbytheoperationsonintervalsets.Certaininclusion,possibleinclusion,andweak inclusion relations between interval sets are presented, which are introduced to inducestrongrulesandweakrulesfromincompleteinformationtable.Therelatedpropertiesof the inclusion relations are proved. It is concluded that the strong rules are always truewhateverthemissingvaluesmaybe,whiletheweakrulesmaybetruewhenmissingvaluesare replaced by some certain known values. Moreover, a conﬁdence function is deﬁned toevaluatetheweakrule.Theproposedapproachpresentsanewviewonruleinductionfromincomplete data based on interval set.© 2011 Elsevier Inc. All rights reserved.
1. Introduction
Many data mining and machine learning methods have been well developed based on the assumption that all recordsin the data set are known, i.e., the data set is complete. However, realworld data sets often contain missing values eitherbecause some values are lost or because the cost of acquiring them are very high [1,9–12,21]. For example, valuable data
maybelostduringthedatatransferprocess.Anothersituationisthatthereisaveryhighcostinacquiringdata.Whilesomedata are easily acquired with low cost, other data may be difﬁcult to acquire and may require high human effort with highcost. Consider a case of medical diagnosis, common tests such as temperature, blood pressure and rhythm of the heart areeasy to obtain, but it costs much money to make gastroscopy test, Xrays tests, magnetic resonance imaging test, or otherhightechtests.Duetothehighcostofacquiringdata,somedataareoftenmissinginthedatabaseofmedicaldiagnosis.Itistherefore necessary to develop effective methods for learning rules from incomplete data, where some values are missing.Such problems have been widely faced in literature [10–15,17–19,21–23,28,31,42]. Let us review some commonly used
methods for learning from incomplete data.Abasicmethodforlearningrulesfromincompletedataistodeletetherecordsthathavemissingvaluesonsomeattributesso that the remaining records are complete data, and all those methods for learning rules from complete data can be used.Although deleting records with missing values presents a solution for rule induction from incomplete data, the deletionwill decrease the available information in the data set, which results in a poor performance of the induced rules. A similarmethod is to ignore the records with missing values in some certain phase of learning. For example in the famous decisiontree algorithm C4.5 [31], the cases with missing values are ignored while computing the information content, and the
informationgainforanattributeisthenmultipliedbythefractionofcasesforwhichthevalueoftheattributeisknown[31].
∗
Corresponding author at: School of Management and Engineering, Nanjing University, Nanjing 210093, Jiangsu, PR China.
Email addresses:
huaxiongli@nju.edu.cn (H. Li), magwang@hku.hk (M. Wang), zhouxz@nju.edu.cn (X. Zhou), jbzhao@nju.edu.cn (J. Zhao).
0888613X/$  see front matter © 2011 Elsevier Inc. All rights reserved.doi:10.1016/j.ijar.2011.09.002
H. Li et al. / International Journal of Approximate Reasoning 53 (2012) 24–37
25
Another widely used strategy for learning from incomplete data attempts to ﬁll in missing values so that the incompletedata can be transformed to complete data. Such methods are called missing values imputation, which can be done eitherbeforeorduringtheprocessofruleinduction.Thearemanyproposedapproachesformissingvaluesimputation.Thesimplestmethodistoﬁllinmissingvalueswith“mostcommonattributevalue.”Forexample,algorithmCN2[4]adoptsthisstrategy.
An improved version of this method uses “concept most common attribute value” to ﬁll in missing values, i.e., the attributevalues selected for ﬁlling in missing values are restricted to the same concept [13]. GrzymalaBusse proposes “assigning all
possible values of the attribute restricted to the given concept” [12], which is effective when the number of missing values
is not so large. However, it may cause high cost of computation when there exist many missing values. Ghahramani and Jordan present a framework based on maximum likelihood density estimation for learning from incomplete data [10]. They
use mixture models for the density estimates and employ the Expectation Maximization principle in deriving a learningalgorithm, which is abbreviated as EM algorithm.There is another category of methods to deal with incomplete data take neither deletion nor ﬁlling in missing valuesstrategy. These methods attempt to derive rules from only known attribute values, instead of deleting objects or ﬁlling inmissingvalues.Manycoveringbasedlearningalgorithms,suchasPRISMproposedbyCendrowska[3],canbeeasilyadopted
to learn rules by considering only known attribute values.Severalobservationscanbemaderegardingcurrentresearchondataminingfromincompletedata,whichmotivatesthepresent study. For the strategy to delete the records with missing values, it will cause the loss of available information, andthe rules induced from the modulated data may have poor performance on the srcinal data. Second, the use of the ﬁlledinvalues should be carefully studied. As for missing values imputation, any methods of ﬁlling in missing values are based onsome certain assumption about the data, which may not be valid. For example, many methods suppose data subject to anormal distribution, but it is not always true in practice. Therefore, ﬁlledin values may not be exactly the srcinal values. Awrong imputation of missing values will induce rules overﬁtting with the wrong data, which have low performance on thereal data. Although these rules may have good statistical characteristics, they are in fact not reliable.Inaddition,manyextendedroughsetmodelsarefrequentlyfacedinknowledgeacquisitionfromincompleteinformationsystem, in which classic equivalence relation are extended to the looser binary relation [11,17,18,29,30,32,33]. For these
methods, the missing values are assumed to be equal to any known values, then the looser relations such as tolerancerelation, similarity relation, dominance relation, and characteristic relation are derived, and extended rough set models arepresentedaccordingtorelatedbinaryrelation.However,inthepractice,themissingvaluescannotberegardedasanyvaluessince they are just lost and may take some certain values that unknown presently. Therefore, it is necessary to develop areasonable method to consider all possible values which may be the srcinal values of the missing values. We may ﬁnd thatinterval set is an appropriate mathematical tool to fulﬁll this task. That is the main motivation of this paper.Generallyspeaking,afundamentalproblemforruleinductionfromincompletedataishowtorepresenttheuncertaintyintheincompletedata.Onemayﬁndthatmanymethodsformodelingtheuncertainty,suchasroughset,belieffunction,adoptthe intervalbased approaches or interval structures, which present upper bound and low bound to model the uncertainconcept [34,39]. In the case that there are missing values on some attributes, an object may be either in the extension
or not in the extension of a concept. Because of the lack of information, one can only express the state of extension andnonextension for part of objects, instead of all objects. Therefore, one may get a partially known concept which is deﬁnedby a lower bound and an upper bound of its extension. A new model for representing the partially known concept will beestablished when interval sets are introduced to modeling the uncertainty of the incomplete data.The main objective of this paper is to present a new representation of the partially known concepts in an incompleteinformation table by using interval sets, and to propose a new method to induce rules based on the inclusion relationshipsbetweentheintervalsetsaboutpartiallyknownconcepts.Therestofthepaperisorganizedasfollows.Section2presentstherelatedworkonintervalanalysisandpointsoutthedifferencebetweentheexistingstudiesandourwork.Section3reviewsthe basic notations of the interval sets and proves some properties of interval sets. Sections 4 presents an interval set modelto represent the concepts in an incomplete information table. Section 5 discusses the interval inclusion relations betweentwo interval sets, which lead to a method for rule induction from incomplete data. Finally, the conclusion is presented inSection 6.
2. Related work
There are numerous researches related to interval analysis for uncertain reasoning which are frequently mentioned inliteratures. We will brieﬂy review the related work and point out the difference between the existing studies and our work.In general, researches related to interval analysis for uncertain reasoning can be broadly classiﬁed into three categories.The ﬁrst category is the study on intervalvalued fuzzy set theory (IVF in short), which is srcinal proposed by Zadehin 1970s [43,44]. The IVF theory emerged from the observation that no objective procedure is available to select the crisp
membership degrees of elements in a fuzzy set, and then it is suggested to associate an interval set to which the actualmembership degree is assumed to belong. There are many papers which study the theoretical foundation of IVF and itsapplications in uncertain reasoning. For example, Cornelis et al. construct a representation theorem for Łukasiewicz implicatorsonthelatticewhichservesastheunderlyingalgebraicstructureforbothintuitionisticfuzzyandintervalvaluedfuzzysets [5]. Bustince discusses the axiom that verify the inclusion grade indicators for intervalvalued fuzzy sets, and presents
26
H. Li et al. / International Journal of Approximate Reasoning 53 (2012) 24–37
expression of the inclusion grade indicators and the expression for the similarity measure between intervalvalued fuzzysets [2]. Dubois and Prade suggest that the extension of fuzzy set to IVF can be justiﬁed in the scope of some information
representation paradigm, and they systematically discuss the connection between intervalvalued fuzzy sets, clouds, andpossibility theory [8]. Zhang et al. introduce (
I
,
I
)IVF rough approximation operators with reference to an IVF approximation space, upon which they propose a general study of (
I
,
I
)intervalvalued fuzzy rough sets on two universes of discourse integrating the rough set theory with the intervalvalued fuzzy set theory [41]. In addition, Zeng and Li discuss
the relationship between similarity measure and entropy of IVF [45].
Thesecondcategoryisthestudyontheintervalprobabilitytheory(IPT),whichfocusongeneralizingtheclassicalsinglevalued probability to interval probability so that the uncertain data or uncertain knowledge can be described and inferredbased on interval probability analysis [6]. In [16], Hall et al. propose a general logical inference approach for uncertain rea
soningbasedonintervalprobailitytheory,whichprovidesdecisionmakerswithinformationinasimpleintervalprobabilityformat,anditcanalsoreﬂectthecomplexityoftheinferenceproblemandtherichnessoftheavailableevidence[16].In[35],
Yager and Kreinovich discuss the decision making problems under interval probabilities, in which the exactly probabilitiesof each situation in the decision process are unknown and they are described by interval probabilities instead of singleprobability [35].
The third category is the study on intervalvalued information system, which is converted from a realvalued decisiontablebymeansofstatisticalmethod.Forexamplein[20],Leungetal.proposedaroughsetapproachtodiscoverclassiﬁcation
rules for the continuous valued information systems. In [20], the continuous valued information systems were transformed
intosomeintervalvaluedinformationsystemsbyastatisticalmethod,inwhichtheconceptofamisclassiﬁcationrateswasusedtocomparedifferentclasseswithagiventhresholdvalue
α
.ByutilizingBooleanreasoningtechniques,theycalculatedthe
α
classiﬁcation reduction and
α
classiﬁcation core, and thus derived the classiﬁcation rules accordingly [20].
Recently,Denoeuxetal.presentedaformalismbasedonintervalsetstrategytorepresenttheuncertaintyonasetvaluedvariable
X
which is deﬁned on a ﬁnite domain
Ω
in the belief function framework [7]. They extend the classical Dempster–
Shafer theory so that the description of the uncertainty regarding a setvalued variable can be generally presented. In theirproposedapproach,thekeynotionisthedeﬁnitionofaclosuresystem
C
(
Ω
)
of
=
2
Ω
.Eachelementof
C
(
Ω
)
isindexedbyanintervalsetstructure
[
A
,
B
]
,whichisdeﬁnedasthesetofsubsetsof
Ω
containing
A
andnotintersecting
B
.ThisformalismhasbeenshowntobemoregeneralthanpreviousattemptstoapplytheDempster–Shaferframeworktodescribetheuncertainty,andmakeitpossibletoexpressrichknowledgeaboutasetvaluedvariablewithonlylimitedadditionalcomplexity[7].
When compared to the researches on interval analysis mentioned above, the interval set method proposed in this paperhas signiﬁcant differences which are presented as follows. Firstly, unlike intervalvalued fuzzy set theory [2,5,8,41,43–45],
we do not concern any fuzzy information included in the data sets, and the interval set model in this paper is not adoptedto evaluate the membership function but to discover the inherent information hidden behind the incomplete data sets,therefore,itisunnecessarytoconsiderhowtodeterminethefuzzymembershipfunctionintheproposedmethod.Secondly,weestablishtheintervalsetmodelforlearningrulesbasedontheintervalinclusionrelationsinsteadofclassicalsetinclusionrelations.Theintervalinclusiondegreeisadoptedasanevaluationofinclusionrelationshipbetweentwointervalsets,whichreﬂects the relations between the condition attributes and decision attributes in the incomplete data sets. Thirdly, unlikeinterval probability theory [6,16,35], we do not concern any interval probability for the induced rules. The probability of a
possibleruleisevaluatedbyasingleintervalinclusiondegreeinsteadofanintervalprobability.Fourthly,unlikethemethodproposed in [20], we concern on categorical data sets instead of continuous valued decision information systems, and the
interval sets in our proposed method are used to represent the range of an extension in the incomplete data sets, instead of transforming the continuous valued decision information systems into interval valued information systems.In addition, unlike the research in [7], we focus on discussing the interval sets in the incomplete data sets instead of
complete data sets, and the interval sets are used to represent the upper and lower bound of partially known concepts inthe incomplete data, which may induce two types of decision rules: certain rules (or strong rules) and possible rules (orweak rules). The certain rules and possible rules present two kinds of view angle on the relations between the conditionattributesandthedecisionattributes:thecertainrulesarealwaystruewhateverthemissingvaluesmaybe,andthepossiblerules may be true when missing values are replaced by some certain known values. In other words, the certain rules arenot conﬂict with any missing values imputation, and the possible rules may be true when we replace the missing values bysome certain values. In the case that we have not any priori knowledge about the missing values, this two kinds of decisionrules truthfully reﬂect the intrinsic knowledge hidden behind the incomplete data sets.
3. Interval sets and intervalset algebras
Thissectionreviewsthebasicconceptsofintervalsetsandintervalsetalgebras[38,39]whichpertinenttoourdiscussion.
Interval computations and interval analysis are srcinally proposed by Moore [26] and used in solving problems for which
theinitialinformationisrepresentednotbynumericalvaluesofquantitiesbutintervalsorsetsofageneralform,thereforeitis can be regarded as interval number theory. Interval number theory has been widely applied in the ﬁeld of computationalmathematics in recent decades. In the 1990s, interval number theory has been extended to interval set theory, which isan appropriate mathematics tool to deal with vague and uncertain information system. An interval set is a family of setsrestricted by a upper bound and a lower bound, which can present a setvalued description on partially known concepts.
H. Li et al. / International Journal of Approximate Reasoning 53 (2012) 24–37
27
Deﬁnition 1.
Let
U
be a ﬁnite set, called the universe or the reference set, and 2
U
be its power set. A subset of 2
U
of theform
A
= [
A
l
,
A
u
] = {
A
∈
2
U

A
l
⊆
A
⊆
A
u
}
iscalledaclosedintervalset,whereitisassumed
A
l
⊆
A
u
.Thesetofallclosedinterval sets is denoted by
I
(
2
U
)
= {[
A
l
,
A
u
]
A
l
,
A
u
⊆
U
,
A
l
⊆
A
u
}
.According to Deﬁnition 1, an interval set is a family of sets intermediate between upper bound set and lower bound set,which is a subset of the power set 2
U
. For an uncertain concept, a crisp set cannot be used to describe the extension sincesome objects may actually be either an extension or not an extension of an uncertain concept. In this case, an interval setmay be an appropriate tool to represent the extension of the uncertain concept, which describes the uncertain concept bya lower bound and upper bound. In an extreme case, when
A
l
=
A
u
=
A
, a degenerate interval set of the form
[
A
,
A
]
isequivalent to ordinary set
A
. Therefore, the ordinary set may be considered as a special case of the interval set, and intervalset is an extension of the ordinary sets [38,39].
Deﬁnition 2.
Let
∩
,
∪
and
−
be the usual set intersection, union and difference deﬁned on 2
U
, respectively. A paralleldeﬁnition of binary operations
,
,
\
on two interval sets
A
∈
I
(
2
U
),
B
∈
I
(
2
U
)
and a unitary operation
¬
on a singleinterval set
A
are deﬁned as follows:
A
B
={
A
∩
B

A
∈
A
,
B
∈
B
}
,
A
B
={
A
∪
B

A
∈
A
,
B
∈
B
}
,
A
\
B
={
A
−
B

A
∈
A
,
B
∈
B
}
,
¬
A
=¬ [
A
l
,
A
u
] = [
U
,
U
] \ [
A
l
,
A
u
]
,
where
A
= [
A
l
,
A
u
]
.Theoperations
,
,
\
,and
¬
onintervalsetscanberespectivelyregardedasextensionsofintersection,union,difference,andcomplementonordinarysets.Wecall
,
,
\
,and
¬
respectivelyintervalintersection,intervalunion,intervaldifferenceand interval complement. For ordinary sets, the intersection, union, and difference of two sets are all ordinary sets, andthe complement set is also an ordinary set. That is, ordinary sets are closed under above operations. A similar fundamentalproblem is that whether or not the interval sets are closed under the operations deﬁned in Deﬁnition 2. The followingtheorem states that interval sets are closure under the operations deﬁned in Deﬁnition 2.
Theorem 1.
Interval sets are closed under the operations of interval intersection, interval union, interval difference and intervalcomplement, that is:
A
∈
I
(
2
U
)
and
B
∈
I
(
2
U
)
⇒
A
B
∈
I
(
2
U
),
A
∈
I
(
2
U
)
and
B
∈
I
(
2
U
)
⇒
A
B
∈
I
(
2
U
),
A
∈
I
(
2
U
)
and
B
∈
I
(
2
U
)
⇒
A
\
B
∈
I
(
2
U
),
A
∈
I
(
2
U
)
⇒ ¬
A
∈
I
(
2
U
),
and the low bound and upper bound of the interval sets
A
B
,
A
B
,
A
\
B
and
¬
A
are respectively determined by following formula:
A
B
=[
A
l
∩
B
l
,
A
u
∩
B
u
]
,
A
B
=[
A
l
∪
B
l
,
A
u
∪
B
u
]
,
A
\
B
=[
A
l
−
B
u
,
A
u
−
B
l
]
,
¬
A
=[
U
−
A
u
,
U
−
A
l
]
,
where
A
= [
A
l
,
A
u
]
,
B
= [
B
l
,
B
u
]
.
Proof.
We only prove
A
B
= [
A
l
∩
B
l
,
A
u
∩
B
u
]
. Other equations can be similarly proved. Suppose
X
∈
A
B
, accordingto Deﬁnition 2, there must exist
X
1
∈
A
,
X
2
∈
B
satisfying
X
=
X
1
∩
X
2
. Thus we have
A
l
⊆
X
1
⊆
A
u
, and
B
l
⊆
X
2
⊆
B
u
,hence
A
l
∩
B
l
⊆
X
1
∩
X
2
⊆
A
u
∩
B
u
,i.e.,
A
l
∩
B
l
⊆
X
⊆
A
u
∩
B
u
.Therefore,
X
∈ [
A
l
∩
B
l
,
A
u
∩
B
u
]
.Ontheotherhand,suppose
X
∈ [
A
l
∩
B
l
,
A
u
∩
B
u
]
, then we have
A
l
∩
B
l
⊆
X
⊆
A
u
∩
B
u
. Let
X
1
=
X
∪
A
l
,
X
2
=
X
∪
B
l
, it holds that
A
l
⊆
X
1
⊆
A
u
,
B
l
⊆
X
2
⊆
B
u
, that is,
X
1
∈
A
,
X
2
∈
B
, and we have
X
=
X
1
∩
X
2
since
A
l
∩
B
l
⊆
X
. According to Deﬁnition 2, it holds
X
∈
A
B
. Therefore,
A
B
= [
A
l
∩
B
l
,
A
u
∩
B
u
]
.
Example 1.
Consider a universe
U
= {
1
,
2
,
3
,
4
,
5
}
which includes ﬁve objects, then
A
= [{
1
,
2
}
,
{
1
,
2
,
3
,
4
}]
and
B
= [{
1
,
3
}
,
{
1
,
2
,
3
,
5
}]
are interval sets, where
A
l
= {
1
,
2
}
,
A
u
= {
1
,
2
,
3
,
4
}
,
B
l
= {
1
,
3
}
,
B
u
= {
1
,
2
,
3
,
5
}
. Based
28
H. Li et al. / International Journal of Approximate Reasoning 53 (2012) 24–37
on Deﬁnition 1, we have
A
= [{
1
,
2
}
,
{
1
,
2
,
3
}
,
{
1
,
2
,
4
}
,
{
1
,
2
,
3
,
4
}]
, and
B
= [{
1
,
3
}
,
{
1
,
2
,
3
}
,
{
1
,
3
,
5
}
,
{
1
,
2
,
3
,
5
}]
.We can calculate the interval intersection, interval union, interval difference, and interval complement on
A
and
B
basedon Deﬁnition 2, which are listed as follows:
A
B
={
A
∩
B

A
∈
A
,
B
∈
B
} = {{
1
}
,
{
1
,
2
}
,
{
1
,
3
}
,
{
1
,
2
,
3
}}=[{
1
}
,
{
1
,
2
,
3
}] = [{
1
,
2
}∩{
1
,
3
}
,
{
1
,
2
,
3
,
4
}∩{
1
,
2
,
3
,
5
}]=[
A
l
∩
B
l
,
A
u
∩
B
u
]
,
A
B
={
A
∪
B

A
∈
A
,
B
∈
B
}={{
1
,
2
,
3
}
,
{
1
,
2
,
3
,
4
}
,
{
1
,
2
,
3
,
5
}
,
{
1
,
2
,
3
,
4
,
5
}}=[{
1
,
2
,
3
}
,
{
1
,
2
,
3
,
4
,
5
}] = [{
1
,
2
}∪{
1
,
3
}
,
{
1
,
2
,
3
,
4
}∪{
1
,
2
,
3
,
5
}]=[
A
l
∪
B
l
,
A
u
∪
B
u
]
,
A
\
B
={
A
−
B

A
∈
A
,
B
∈
B
} = {∅
,
{
2
}
,
{
4
}
,
{
2
,
4
}}=[∅
,
{
2
,
4
}] = [{
1
,
2
}−{
1
,
2
,
3
,
5
}
,
{
1
,
2
,
3
,
4
}−{
1
,
3
}]=[
A
l
−
B
u
,
A
u
−
B
l
]
,
¬
A
=[
U
,
U
] \ [
A
l
,
A
u
] = {{
5
}
,
{
3
,
5
}
,
{
4
,
5
}
,
{
3
,
4
,
5
}} = [{
5
}
,
{
3
,
4
,
5
}]=[
U
−
A
u
,
U
−
A
l
]
.
4. Interval sets on incomplete information tables
This section ﬁrst reviews the basic concept of information tables and then presents an interval set model on incompleteinformation tables.
4.1. Information tables
Information tables have been studied by many authors [15,17,18,21,32,37], which can be formally deﬁned as follows.
Deﬁnition 3.
An information table is the following tuple:
S
=
(
U
,
At
,
{
V
a

a
∈
At
}
,
{
I
a

a
∈
At
}
),
where
U
isaﬁnitenonemptysetofobjects,
At
isaﬁnitenonemptysetofattributes,
V
a
isanonemptysetofvaluesfor
a
∈
At
,and
I
a
:
U
→
V
a
is an information function.For a complete information table, where the attribute values of each object are known, it is assumed that the mapping
I
a
is singlevalued. An example of complete information table may have the form as presented in Table 1.While in practice, there may be missing values on some attributes, i.e., there exist objects whose attribute values areunknown.Inthiscase,wecalltheinformationtableanincompleteinformationtable.In[15],itisassumedthattherearetwo
reasons for information tables to be incomplete. The ﬁrst reason is that the attribute values are lost, which means srcinallythe attribute value are known, however, due to many reasons, the values are not recorded in current information table. Thiskind of missing values are called “lost”. The second possibility is that the attribute values are not relevant: the records aredecided to be a member of some concepts, i.e., are classiﬁed, or diagnosed, in spite of the fact that some attribute values arenot known. In this situation, the missing value is called “do not care” condition [15].
However,afundamentalproblemforthisdescriptionisthathowcanweknowwhetherornotamissingvalueisrelatedtotheclassiﬁcation?Foragivenincompleteinformationtable,amaingoalofdataminingistodiscovertheimplicitrelationshipsbetweentheattributevalues.Howcanweknowmissingvalues“donotcare”withthelearningresultsbeforeruleinduction?Inotherwords,wemayhavenoideawhetherornotthemissingvaluesare“donotcare”,andweonlyknowthatthemissingvalues belong to
V
a
. Therefore, in this paper, we treat all missing values as “lost”, and they may take any values in
V
a
. In this
Table 1
A complete information table.Case
a
1
a
2
d1 2 1 12 1 2 23 1 2 24 2 1 25 2 1 1