1. Title: Iris Plants Database
   Updated Sept 21 by C.Blake - Added discrepancy information

2. Sources:
   (a) Creator: R.A. Fisher
   (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
   (c) Date: July, 1988

   Data Source:
     UCI Machine Learning Repository [http://archive.ics.uci.edu/ml].
     Irvine, CA: University of California, School of Information and
     Computer Science.
     http://archive.ics.uci.edu/ml/datasets/Iris

3. Past Usage:
   - Publications: too many to mention!!!  Here are a few.
   1. Fisher,R.A. "The use of multiple measurements in taxonomic problems"
      Annals of Eugenics, 7, Part II, 179-188 (1936); also in "Contributions
      to Mathematical Statistics" (John Wiley, NY, 1950).
   2. Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
      (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
   3. Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
      Structure and Classification Rule for Recognition in Partially Exposed
      Environments".  IEEE Transactions on Pattern Analysis and Machine
      Intelligence, Vol. PAMI-2, No. 1, 67-71.
      -- Results:
         -- very low misclassification rates (0% for the setosa class)
   4. Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE
      Transactions on Information Theory, May 1972, 431-433.
      -- Results:
         -- very low misclassification rates again
   5. See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al's AUTOCLASS II
      conceptual clustering system finds 3 classes in the data.

4. Relevant Information:
   --- This is perhaps the best known database to be found in the pattern
       recognition literature.  Fisher's paper is a classic in the field
       and is referenced frequently to this day.  (See Duda & Hart, for
       example.)  The data set contains 3 classes of 50 instances each,
       where each class refers to a type of iris plant.  One class is
       linearly separable from the other 2; the latter are NOT linearly
       separable from each other.
   --- Predicted attribute: class of iris plant.
   --- This is an exceedingly simple domain.
   --- This data differs from the data presented in Fisher's article
       (identified by Steve Chadwick, spchadwick@espeedaz.net )
       The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa"
       where the error is in the fourth feature.
       The 38th sample: 4.9,3.6,1.4,0.1,"Iris-setosa"
       where the errors are in the second and third features.

5. Number of Instances: 150 (50 in each of three classes)

6. Number of Attributes: 4 numeric, predictive attributes and the class

7. Attribute Information:
   1. sepal length in cm
   2. sepal width in cm
   3. petal length in cm
   4. petal width in cm
   5. class:
      -- Iris Setosa
      -- Iris Versicolour
      -- Iris Virginica

8. Missing Attribute Values: None

Summary Statistics:
                  Min  Max   Mean    SD   Class Correlation
   sepal length:  4.3  7.9   5.84  0.83    0.7826
    sepal width:  2.0  4.4   3.05  0.43   -0.4194
   petal length:  1.0  6.9   3.76  1.76    0.9490  (high!)
    petal width:  0.1  2.5   1.20  0.76    0.9565  (high!)

9. Class Distribution: 33.3% for each of 3 classes.
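
The corrections noted in section 4 and the per-attribute figures in the
summary statistics can be sketched in Python as below. This is only an
illustration, not part of the original database: the names
FISHER_CORRECTIONS, apply_corrections and summarize are hypothetical, each
row is assumed to be a tuple of four floats followed by the class label, and
the SD is assumed to be the sample standard deviation (this file does not
say which form was used).

```python
import statistics

# Corrected rows quoted in section 4, keyed by 1-based sample number.
FISHER_CORRECTIONS = {
    35: (4.9, 3.1, 1.5, 0.2, "Iris-setosa"),
    38: (4.9, 3.6, 1.4, 0.1, "Iris-setosa"),
}

# Attribute order from section 7 (the class label is the fifth field).
ATTRIBUTES = ("sepal length", "sepal width", "petal length", "petal width")


def apply_corrections(rows):
    """Return a copy of rows with the 35th and 38th samples replaced.

    Sample numbers beyond the end of rows are skipped, so the function
    can also be called on a truncated data set.
    """
    fixed = list(rows)
    for sample_number, corrected in FISHER_CORRECTIONS.items():
        index = sample_number - 1  # convert to 0-based list index
        if index < len(fixed):
            fixed[index] = corrected
    return fixed


def summarize(rows):
    """Compute min, max, mean and sample SD for each numeric attribute."""
    summary = {}
    for column, name in enumerate(ATTRIBUTES):
        values = [row[column] for row in rows]
        summary[name] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            # stdev needs at least two values; report 0.0 otherwise.
            "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary
```

Calling summarize(apply_corrections(rows)) on the full 150 rows would give
figures comparable to the table above, though small differences are possible
depending on whether the corrections are applied and which SD form is used.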