github.com/vlifesystems/rulehunter@v0.0.0-20180501090014-673078aa4a83/examples/csv/iris.txt (about)

     1  1. Title: Iris Plants Database
     2  	Updated Sept 21 by C.Blake - Added discrepancy information
     3  
     4  2. Sources:
     5       (a) Creator: R.A. Fisher
     6       (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
     7       (c) Date: July, 1988
     8  
     9  
    10  Data Source:
    11    UCI Machine Learning Repository [http://archive.ics.uci.edu/ml].
    12    Irvine, CA: University of California, School of Information and
    13    Computer Science.
    14  	http://archive.ics.uci.edu/ml/datasets/Iris
    15  
    16  3. Past Usage:
    17     - Publications: too many to mention!!!  Here are a few.
    18     1. Fisher,R.A. "The use of multiple measurements in taxonomic problems"
    19        Annals of Eugenics, 7, Part II, 179-188 (1936); also in "Contributions
    20        to Mathematical Statistics" (John Wiley, NY, 1950).
    21     2. Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
    22        (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
    23     3. Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
    24        Structure and Classification Rule for Recognition in Partially Exposed
    25        Environments".  IEEE Transactions on Pattern Analysis and Machine
    26        Intelligence, Vol. PAMI-2, No. 1, 67-71.
    27        -- Results:
    28           -- very low misclassification rates (0% for the setosa class)
    29     4. Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE 
    30        Transactions on Information Theory, May 1972, 431-433.
    31        -- Results:
    32           -- very low misclassification rates again
    33     5. See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al.'s AUTOCLASS II
    34        conceptual clustering system finds 3 classes in the data.
    35  
    36  4. Relevant Information:
    37     --- This is perhaps the best known database to be found in the pattern
    38         recognition literature.  Fisher's paper is a classic in the field
    39         and is referenced frequently to this day.  (See Duda & Hart, for
    40         example.)  The data set contains 3 classes of 50 instances each,
    41         where each class refers to a type of iris plant.  One class is
    42         linearly separable from the other 2; the latter are NOT linearly
    43         separable from each other.
    44     --- Predicted attribute: class of iris plant.
    45     --- This is an exceedingly simple domain.
    46     --- This data differs from the data presented in Fisher's article
    47  	(identified by Steve Chadwick,  spchadwick@espeedaz.net )
    48  	The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa"
    49  	where the error is in the fourth feature.
    50  	The 38th sample should be: 4.9,3.6,1.4,0.1,"Iris-setosa"
    51  	where the errors are in the second and third features.
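
The two corrections above could be applied to an in-memory copy of the data
roughly as follows. This is only a sketch: it assumes the rows are held in
the original file order with no header row, so that sample 35 sits at list
index 34. The names CORRECTIONS and apply_corrections are hypothetical and
are not part of this repository.

```python
# Corrected rows as given in the note above, keyed by 1-based sample number.
CORRECTIONS = {
    35: (4.9, 3.1, 1.5, 0.2, "Iris-setosa"),
    38: (4.9, 3.6, 1.4, 0.1, "Iris-setosa"),
}


def apply_corrections(rows):
    """Return a copy of rows with the corrected samples substituted.

    Assumes rows are in original file order with no header row.
    """
    fixed = list(rows)
    for sample_number, corrected in CORRECTIONS.items():
        index = sample_number - 1  # convert to a 0-based list index
        if index < len(fixed):
            fixed[index] = corrected
    return fixed


# Placeholder rows stand in for the real data here.
placeholder_rows = [(0.0, 0.0, 0.0, 0.0, "placeholder")] * 40
fixed_rows = apply_corrections(placeholder_rows)
print(fixed_rows[34])
print(fixed_rows[37])
```

The input list is left unmodified, so the uncorrected data stays available
for comparison with results published against it.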
    52  
    53  5. Number of Instances: 150 (50 in each of three classes)
    54  
    55  6. Number of Attributes: 4 numeric, predictive attributes and the class
    56  
    57  7. Attribute Information:
    58     1. sepal length in cm
    59     2. sepal width in cm
    60     3. petal length in cm
    61     4. petal width in cm
    62     5. class: 
    63        -- Iris Setosa
    64        -- Iris Versicolour
    65        -- Iris Virginica
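
As an illustration of the attribute layout above, one row of the data could
be read into a typed record like this. The row format (four comma-separated
numbers followed by a quoted class label) is taken from the corrected
samples quoted in section 4. The IrisSample class and parse_samples function
are hypothetical names used only for this sketch.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class IrisSample:
    sepal_length: float  # cm
    sepal_width: float   # cm
    petal_length: float  # cm
    petal_width: float   # cm
    iris_class: str


def parse_samples(text):
    """Parse rows of four numbers followed by a class label."""
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        *measurements, label = row
        samples.append(IrisSample(*map(float, measurements), label))
    return samples


samples = parse_samples('4.9,3.1,1.5,0.2,"Iris-setosa"\n')
print(samples[0])
```

The csv module removes the double quotes around the class label, so the
label is stored as plain text.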
    66  
    67  8. Missing Attribute Values: None
    68  
    69  Summary Statistics:
    70                    Min  Max   Mean    SD   Class Correlation
    71     sepal length: 4.3  7.9   5.84  0.83    0.7826
    72      sepal width: 2.0  4.4   3.05  0.43   -0.4194
    73     petal length: 1.0  6.9   3.76  1.76    0.9490  (high!)
    74      petal width: 0.1  2.5   1.20  0.76    0.9565  (high!)
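
The Min, Max, Mean and SD columns above could be recomputed for one
attribute with a few lines of Python. This sketch uses the sample standard
deviation (statistics.stdev); the file does not say whether the sample or
the population formula was used, so that choice is an assumption. The Class
Correlation column is left out because the file does not state how the
classes were encoded as numbers. The summarize function is a hypothetical
name.

```python
import statistics


def summarize(values):
    """Return (min, max, mean, sd) for one numeric attribute."""
    return (
        min(values),
        max(values),
        statistics.mean(values),
        statistics.stdev(values),  # sample SD; an assumption, see above
    )


# Illustrative values only, not taken from the data set.
low, high, mean, sd = summarize([1.0, 2.0, 3.0, 4.0])
print(f"{low:.1f}  {high:.1f}  {mean:.2f}  {sd:.2f}")
```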
    75  
    76  9. Class Distribution: 33.3% for each of 3 classes.
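
The class distribution can be checked by counting labels. In this sketch the
label "Iris-setosa" is the spelling used in the corrected samples in
section 4; the other two label strings are assumed to follow the same
pattern and may differ in the actual data file. The class_distribution
function is a hypothetical name.

```python
from collections import Counter


def class_distribution(labels):
    """Map each class label to its percentage share of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# 50 instances per class, as stated in section 5.
# The second and third label spellings are assumptions.
labels = (
    ["Iris-setosa"] * 50
    + ["Iris-versicolor"] * 50
    + ["Iris-virginica"] * 50
)
for label, percent in class_distribution(labels).items():
    print(f"{label}: {percent:.1f}%")
```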