Clever Algorithms: Nature-Inspired Programming Recipes

A book by Jason Brownlee

Negative Selection Algorithm

Negative Selection Algorithm, NSA.

Taxonomy

The Negative Selection Algorithm belongs to the field of Artificial Immune Systems. The algorithm is related to other Artificial Immune Systems such as the Clonal Selection Algorithm, and the Immune Network Algorithm.

Inspiration

The Negative Selection algorithm is inspired by the self-nonself discrimination behavior observed in the mammalian acquired immune system. The clonal selection theory of acquired immunity accounts for the adaptive behavior of the immune system including the ongoing selection and proliferation of cells that select-for potentially harmful (and typically foreign) material in the body. An interesting aspect of this process is that it is responsible for managing a population of immune cells that do not select-for the tissues of the body, specifically it does not create self-reactive immune cells known as auto-immunity. This problem is known as 'self-nonself discrimination' and it involves the preparation and on going maintenance of a repertoire of immune cells such that none are auto-immune. This is achieved by a negative selection process that selects-for and removes those cells that are self-reactive during cell creation and cell proliferation. This process has been observed in the preparation of T-lymphocytes, naive versions of which are matured using both a positive and negative selection process in the thymus.

Metaphor

The self-nonself discrimination principle suggests that the anticipatory guesses made in clonal selection are filtered by regions of infeasibility (protein conformations that bind to self-tissues). Further, the self-nonself immunological paradigm proposes the modeling of the unknown domain (encountered pathogen) by modeling the complement of what is known. This is unintuitive as the natural inclination is to categorize unknown information by what is different from that which is known, rather than guessing at the unknown information and filtering those guesses by what is known.

Strategy

The information processing principles of the self-nonself discrimination process via negative selection are that of a anomaly and change detection systems that model the anticipation of variation from what is known. The principle is achieved by building a model of changes, anomalies, or unknown (non-normal or non-self) data by generating patterns that do not match an existing corpus of available (self or normal) patterns. The prepared non-normal model is then used to either monitor the existing normal data or streams of new data by seeking matches to the non-normal patterns.

Procedure

Algorithm (below) provides a pseudocode listing of the detector generation procedure for the Negative Selection Algorithm. Algorithm (below) provides a pseudocode listing of the detector application procedure for the Negative Selection Algorithm.

Input: SelfData
Output: Repertoire
Repertoire $\leftarrow \emptyset$
While ($\neg$StopCondition())
    Detectors $\leftarrow$ GenerateRandomDetectors()
    For ($Detector_{i}$ $\in$ Repertoire)
        If ($\neg$Matches{$Detector_{i}$, SelfData})
            Repertoire $\leftarrow$ $Detector_{i}$
        End
    End
End
Return (Repertoire)

Pseudocode for detector generation.

Input: InputSamples, Repertoire
For ($Input_{i}$ $\in$ InputSamples)
    $Inputi_{class}$ $\leftarrow$ "non-self"
    For ($Detector_{i}$ $\in$ Repertoire)
        If (Matches{$Input_{i}$, $Detector_{i}$})
            $Inputi_{class}$ $\leftarrow$ "self"
            Break
        End
    End
End

Pseudocode for detector application.

Heuristics

The Negative Selection Algorithm was designed for change detection, novelty detection, intrusion detection and similar pattern recognition and two-class classification problem domains.
Traditional negative selection algorithms used binary representations and binary matching rules such as Hamming distance, and $r$-contiguous bits.
A data representation should be selected that is most suitable for a given problem domain, and a matching rule is in turn selected or tailored to the data representation.
Detectors can be prepared with no prior knowledge of the problem domain other than the known (normal or self) dataset.
The algorithm can be configured to balance between detector convergence (quality of the matches) and the space complexity (number of detectors).
The lack of dependence between detectors means that detector preparation and application is inherently parallel and suited for a distributed and parallel implementation, respectively.

Code Listing

Listing (below) provides an example of the Negative Selection Algorithm implemented in the Ruby Programming Language. The demonstration problem is a two-class classification problem where samples are drawn from a two-dimensional domain, where $x_i \in [0,1]$. Those samples in $1.0>x_i>0.5$ are classified as self and the rest of the space belongs to the non-self class. Samples are drawn from the self class and presented to the algorithm for the preparation of pattern detectors for classifying unobserved samples from the non-self class. The algorithm creates a set of detectors that do not match the self data, and are then applied to a set of randomly generated samples from the domain. The algorithm uses a real-valued representation. The Euclidean distance function is used during matching and a minimum distance value is specified as a user parameter for approximate matches between patterns. The algorithm includes the additional computationally expensive check for duplicates in the preparation of the self dataset and the detector set.

def random_vector(minmax)
  return Array.new(minmax.length) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def euclidean_distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

def contains?(vector, space)
  vector.each_with_index do |v,i|
    return false if v<space[i][0] or v>space[i][1]
  end
  return true
end

def matches?(vector, dataset, min_dist)
  dataset.each do |pattern|
    dist = euclidean_distance(vector, pattern[:vector])
    return true if dist <= min_dist
  end
  return false
end

def generate_detectors(max_detectors, search_space, self_dataset, min_dist)
  detectors = []
  begin
    detector = {:vector=>random_vector(search_space)}
    if !matches?(detector[:vector], self_dataset, min_dist)
      detectors << detector if !matches?(detector[:vector], detectors, 0.0)
    end
  end while detectors.size < max_detectors
  return detectors
end

def generate_self_dataset(num_records, self_space, search_space)
  self_dataset = []
  begin
    pattern = {}
    pattern[:vector] = random_vector(search_space)
    next if matches?(pattern[:vector], self_dataset, 0.0)
    if contains?(pattern[:vector], self_space)
      self_dataset << pattern
    end
  end while self_dataset.length < num_records
  return self_dataset
end

def apply_detectors(detectors, bounds, self_dataset, min_dist, trials=50)
  correct = 0
  trials.times do |i|
    input = {:vector=>random_vector(bounds)}
    actual = matches?(input[:vector], detectors, min_dist) ? "N" : "S"
    expected = matches?(input[:vector], self_dataset, min_dist) ? "S" : "N"
    correct += 1 if actual==expected
    puts "#{i+1}/#{trials}: predicted=#{actual}, expected=#{expected}"
  end
  puts "Done. Result: #{correct}/#{trials}"
  return correct
end

def execute(bounds, self_space, max_detect, max_self, min_dist)
  self_dataset = generate_self_dataset(max_self, self_space, bounds)
  puts "Done: prepared #{self_dataset.size} self patterns."
  detectors = generate_detectors(max_detect, bounds, self_dataset, min_dist)
  puts "Done: prepared #{detectors.size} detectors."
  apply_detectors(detectors, bounds, self_dataset, min_dist)
  return detectors
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {[0.0, 1.0]}
  self_space = Array.new(problem_size) {[0.5, 1.0]}
  max_self = 150
  # algorithm configuration
  max_detectors = 300
  min_dist = 0.05
  # execute the algorithm
  execute(search_space, self_space, max_detectors, max_self, min_dist)
end

Negative Selection Algorithm in Ruby

Download: negative_selection_algorithm.rb.

References

Primary Sources

The seminal negative selection algorithm was proposed by Forrest, et al. [Forrest1994] in which a population of detectors are prepared in the presence of known information, where those randomly generated detectors that match against known data are discarded. The population of pattern guesses in the unknown space then monitors the corpus of known information for changes. The algorithm was applied to the monitoring of files for changes (corruptions and infections by computer viruses), and later formalized as a change detection algorithm [Dhaeseleer1996a] [Dhaeseleer1996].

Learn More

The Negative Selection algorithm has been applied to the monitoring of changes in the execution behavior of Unix processes [Forrest1996] [Hofmeyr1998], and to monitor changes in remote connections of a network computer (intrusion detection) [Hofmeyr1999] [Hofmeyr1999a]. The application of the algorithm has been predominantly to virus host intrusion detection and their abstracted problems of classification (two-class) and anomaly detection. Esponda provides some interesting work showing some compression and privacy benefits provided by maintaining a negative model (non-self) [Darlington2005] Ji and Dasgupta provide a contemporary and detailed review of Negative Selection Algorithms covering topics such as data representations, matching rules, detector generation procedures, computational complexity, hybridization, and theoretical frameworks [Ji2007]. Recently, the validity of the application of negative selection algorithms in high-dimensional spaces has been questioned, specifically given the scalability of the approach in the face of the exponential increase in volume within the problem space [Stibor2006].

Bibliography

[Darlington2005]	C. F. Esponda Darlington, "Negative Representations of Information", [PhD Thesis] The University of New Mexico, 2005.
[Dhaeseleer1996]	P. D'haeseleer, "An immunological approach to change detection: theoretical results", in Proceedings of the 9th IEEE Computer Security Foundations Workshop, 1996.
[Dhaeseleer1996a]	P. D'haeseleer and S. Forrest and P. Helman, "An immunological approach to change detection: algorithms, analysis\n\tand implications", in Proceedings of the IEEE Symposium on Security and Privacy, 1996.
[Forrest1994]	S. Forrest and A. S. Perelson and L. Allen and R. Cherukuri, "Self-Nonself Discrimination in a Computer", in Proceedings of the 1992 IEEE Symposium on Security and Privacy, 1994.
[Forrest1996]	S. Forrest and S. A. Hofmeyr and A. Somayaji and T.\n\tA. Longstaff, "A Sense of Self for Unix Processes", in Proceedings of the 1996 IEEE Symposium on Security and Privacy, 1996.
[Hofmeyr1998]	S. A. Hofmeyr and S. Forrest and A. Somayaji, "Intrusion Detection using Sequences of System Calls", Journal of Computer Security, 1998.
[Hofmeyr1999]	S. Hofmeyr and S. Forrest, "Immunity by Design: An Artificial Immune System", in Proceedings of the Genetic and Evolutionary Computation Conference\n\n\t(GECCO), 1999.
[Hofmeyr1999a]	S. A. Hofmeyr, "An Immunological Model of Distributed Detection and its Application\n\tto Computer Security", [PhD Thesis] Department of Computer Sciences, University of New Mexico, 1999.
[Ji2007]	Z. Ji and D. Dasgupta, "Revisiting Negative Selection Algorithms", Evolutionary Computation, 2007.
[Stibor2006]	T. Stibor, "On the Appropriateness of Negative Selection for Anomaly Detection\n\tand Network Intrusion Detection", [PhD Thesis] Darmstadt University of Technology, 2006.

Please Note: This content was automatically generated from the book content and may contain minor differences.