GeneData
Artificial Intelligence lab BIEN
Agnieszka Nowak-Brzezińska
• http://archive.ics.uci.edu/ml/datasets/Molecular+Biology+%28Promoter+Gene+Sequences%29
• 1. Title of Database: E. coli promoter gene sequences (DNA) with associated imperfect domain theory
• 2. Sources:
• (a) Creators: - promoter instances: University of Wisconsin Biochemistry Department
• (b) Date received: 6/30/90
• 3. Number of Instances: 106
• 4. Number of Attributes: 59: the class (positive or negative), the instance name, and 57 sequential nucleotide ("base-pair") positions
• 5. Frequencies:
    Base   Promoters   Non-Promoters
    A      27.7%       24.4%
    G      20.0%       25.4%
    T      30.2%       26.5%
    C      22.1%       23.7%
• Attribute information (by attribute #):
• 1: One of {+/-}, indicating the class ("+" = promoter).
• 2: The instance name (non-promoters are named by their position in the 1500-long nucleotide sequence provided by T. Record).
• 3-59: The remaining 57 fields are the sequence, starting at position -50 (p-50) and ending at position +7 (p7). Each of these fields is filled by one of {a, g, t, c}.
• 6. Missing Attribute Values: none
• 7. Class Distribution: 50% (53 positive instances, 53 negative instances)
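To make the attribute layout concrete, here is a minimal Python sketch that parses one record and looks up a base by its position label. It assumes each data line is comma-separated as class, instance name, and then the whole 57-base sequence as one field (our assumption about the UCI `promoters.data` file). It also assumes the usual promoter numbering with no position 0, so p-50 to p-1 plus p1 to p7 give exactly 57 fields. `PromoterRecord`, `parse_record` and `POSITIONS` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Assumed numbering: p-50 .. p-1, then p1 .. p7 (no p0), giving 50 + 7 = 57 labels.
POSITIONS = [f"p{i}" for i in range(-50, 0)] + [f"p{i}" for i in range(1, 8)]
VALID_BASES = set("agtc")


@dataclass
class PromoterRecord:
    is_promoter: bool  # attribute 1: "+" means promoter
    name: str          # attribute 2: instance name
    sequence: str      # attributes 3-59: one base per position, lower case

    def base_at(self, position: str) -> str:
        """Return the base at a label such as 'p-35' or 'p7'."""
        return self.sequence[POSITIONS.index(position)]


def parse_record(line: str) -> PromoterRecord:
    """Parse a line assumed to look like '+,NAME,<57 bases>'."""
    label, name, seq = (field.strip() for field in line.split(",", 2))
    seq = seq.lower()
    if label not in ("+", "-"):
        raise ValueError(f"unknown class label {label!r}")
    if len(seq) != len(POSITIONS) or not set(seq) <= VALID_BASES:
        raise ValueError(f"expected 57 bases from {{a, g, t, c}}, got {seq!r}")
    return PromoterRecord(label == "+", name, seq)


# Synthetic line for illustration only, not taken from the dataset.
rec = parse_record("+,EXAMPLE,\t" + "a" * 50 + "gattaca")
print(rec.base_at("p-1"), rec.base_at("p1"))  # a g
```

Keeping the sequence as one string and mapping labels to indices through `POSITIONS` avoids storing 57 separate fields, while still letting a position such as p-35 be addressed by the name used in the attribute list.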
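The frequency table and the class distribution are both simple counts. The following sketch computes them from `(class, sequence)` pairs, however those pairs were loaded. It assumes each column of the frequency table is the share of each base over all positions of all sequences in that class, which is consistent with each column summing to 100%. `base_frequencies` and `class_summary` are hypothetical helper names.

```python
from collections import Counter

BASES = "agtc"


def base_frequencies(sequences):
    """Share (in %) of each base over every position of every sequence given."""
    counts = Counter("".join(sequences).lower())
    total = sum(counts[b] for b in BASES)
    if total == 0:
        return {b: 0.0 for b in BASES}
    return {b: round(100.0 * counts[b] / total, 1) for b in BASES}


def class_summary(records):
    """records: iterable of (label, sequence) pairs with label '+' or '-'."""
    by_class = {"+": [], "-": []}
    for label, seq in records:
        by_class[label].append(seq)
    return {
        label: {"count": len(seqs), "freq": base_frequencies(seqs)}
        for label, seqs in by_class.items()
    }


# Tiny synthetic input, not taken from the dataset.
summary = class_summary([("+", "aagt"), ("+", "tcaa"), ("-", "gggc")])
print(summary["+"])  # {'count': 2, 'freq': {'a': 50.0, 'g': 12.5, 't': 25.0, 'c': 12.5}}
```

If all 106 records match this description, one would expect a count of 53 for each class and percentages close to the frequency table, up to rounding.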