Clever Algorithms: Nature-Inspired Programming Recipes

A book by Jason Brownlee

Learning Classifier System

Learning Classifier System, LCS.

Taxonomy

The Learning Classifier System algorithm is both an instance of an Evolutionary Algorithm from the field of Evolutionary Computation and an instance of a Reinforcement Learning algorithm from Machine Learning. Internally, Learning Classifier Systems make use of a Genetic Algorithm. The Learning Classifier System is a theoretical system with a number of implementations. The two main approaches to implementing and investigating the system empirically are the Pittsburgh-style that seeks to optimize the whole classifier, and the Michigan-style that optimize responsive rulesets. The Michigan-style Learning Classifier is the most common and is comprised of two versions: the ZCS (zeroth-level classifier system) and the XCS (accuracy-based classifier system).

Strategy

The objective of the Learning Classifier System algorithm is to optimize payoff based on exposure to stimuli from a problem-specific environment. This is achieved by managing credit assignment for those rules that prove useful and searching for new rules and new variations on existing rules using an evolutionary process.

Procedure

The actors of the system include detectors, messages, effectors, feedback, and classifiers. Detectors are used by the system to perceive the state of the environment. Messages are the discrete information packets passed from the detectors into the system. The system performs information processing on messages, and messages may directly result in actions in the environment. Effectors control the actions of the system on and within the environment. In addition to the system actively perceiving via its detections, it may also receive directed feedback from the environment (payoff). Classifiers are condition-action rules that provide a filter for messages. If a message satisfies the conditional part of the classifier, the action of the classifier triggers. Rules act as message processors. Message a fixed length bitstring. A classifier is defined as a ternary string with an alphabet $\in {1, 0, #}$, where the $#$ represents do not care (matching either 1 or 0).

The processing loop for the Learning Classifier system is as follows:

Messages from the environment are placed on the message list.
The conditions of each classifier are checked to see if they are satisfied by at least one message in the message list.
All classifiers that are satisfied participate in a competition, those that win post their action to the message list.
All messages directed to the effectors are executed (causing actions in the environment).
All messages on the message list from the previous cycle are deleted (messages persist for a single cycle).

The algorithm may be described in terms of the main processing loop and two sub-algorithms: a reinforcement learning algorithm such as the bucket brigade algorithm or Q-learning, and a genetic algorithm for optimization of the system. Algorithm (below) provides a pseudocode listing of the high-level processing loop of the Learning Classifier System, specifically the XCS as described by Butz and Wilson [Butz2002a].

Input: EnvironmentDetails
Output: Population
env $\leftarrow$ InitializeEnvironment{EnvironmentDetails}
Population $\leftarrow$ InitializePopulation()
$ActionSet_{t-1}$ $\leftarrow$ $\emptyset$
$Input_{t-1}$ $\leftarrow$ $\emptyset$
$Reward_{t-1}$ $\leftarrow$ $\emptyset$
While ($\neg$StopCondition())
    $Input_{t}$ $\leftarrow$ env
    Matchset $\leftarrow$ GenerateMatchSet{Population, $Input_{t}$}
    Prediction $\leftarrow$ GeneratePrediction{Matchset}
    Action $\leftarrow$ SelectionAction{Prediction}
    $ActionSet_{t}$ $\leftarrow$ GenerateActionSet{Action, Matchset}
    $Reward_{t}$ $\leftarrow$ ExecuteAction{Action, env}
    If ($ActionSet_{t-1}$ $\neq$ $\emptyset$)
        $Payoff_{t}$ $\leftarrow$ CalculatePayoff{$Reward_{t-1}$, Prediction}
        PerformLearning{$ActionSet_{t-1}$, $Payoff_{t}$, Population}
        RunGeneticAlgorithm{$ActionSet_{t-1}$, $Input_{t-1}$, Population}
    End
    If (LastStepOfTask{env, Action})
        $Payoff_{t}$ $\leftarrow$ $Reward_{t}$
        PerformLearning{$ActionSet_{t}$, $Payoff_{t}$, Population}
        RunGeneticAlgorithm{$ActionSet_{t}$, $Input_{t}$, Population}
        $ActionSet_{t-1}$ $\leftarrow$ $\emptyset$
    Else
        $ActionSet_{t-1}$ $\leftarrow$ $ActionSet_{t}$
        $Input_{t-1}$ $\leftarrow$ $Input_{t}$
        $Reward_{t-1}$ $\leftarrow$ $Reward_{t}$
    End
End

Pseudocode for the LCS.

Heuristics

The majority of the heuristics in this section are specific to the XCS Learning Classifier System as described by Butz and Wilson [Butz2002a].

Learning Classifier Systems are suited for problems with the following characteristics: perpetually novel events with significant noise, continual real-time requirements for action, implicitly or inexactly defined goals, and sparse payoff or reinforcement obtainable only through long sequences of tasks.
The learning rate $\beta$ for a classifier's expected payoff, error, and fitness are typically in the range $[0.1,0.2]$.
The frequency of running the genetic algorithm $\theta_{GA}$ should be in the range $[25,50]$.
The discount factor used in multi-step programs $\gamma$ are typically in the around $0.71$.
The minimum error whereby classifiers are considered to have equal accuracy $\epsilon_{0}$ is typically 10% of the maximum reward.
The probability of crossover in the genetic algorithm $\chi$ is typically in the range $[0.5,1.0]$.
The probability of mutating a single position in a classifier in the genetic algorithm $\mu$ is typically in the range $[0.01,0.05]$.
The experience threshold during classifier deletion $\theta_{del}$ is typically about 20.
The experience threshold for a classifier during subsumption $\theta_{sub}$ is typically around 20.
The initial values for a classifier's expected payoff $p_1$, error $\epsilon_1$, and fitness $f_1$ are typically small and close to zero.
The probability of selecting a random action for the purposes of exploration $p_{exp}$ is typically close to 0.5.
The minimum number of different actions that must be specified in a match set $\theta_{mna}$ is usually the total number of possible actions in the environment for the input.
Subsumption should be used on problem domains that are known contain well defined rules for mapping inputs to outputs.

Code Listing

Listing (below) provides an example of the Learning Classifier System algorithm implemented in the Ruby Programming Language. The problem is an instance of a Boolean multiplexer called the 6-multiplexer. It can be described as a classification problem, where each of the $2^6$ patterns of bits is associated with a boolean class $\in {1,0}$. For this problem instance, the first two bits may be decoded as an address into the remaining four bits that specify the class (for example in 100011, '10' decode to the index of '2' in the remaining 4 bits making the class '1'). In propositional logic this problem instance may be described as $F=(\neg x_0) (\neg x_1) x_2 + (\neg x_0) x_1 x_3 + x_0 (\neg x_1) x_4 + x_0 x_1 x_5$. The algorithm is an instance of XCS based on the description provided by Butz and Wilson [Butz2002a] with the parameters based on the application of XCS to Boolean multiplexer problems by Wilson [Wilson1995] [Wilson1998]. The population is grown as needed, and subsumption which would be appropriate for the Boolean multiplexer problem was not used for brevity. The multiplexer problem is a single step problem, so the complexities of delayed payoff are not required. A number of parameters were hard coded to recommended values, specifically: $\alpha=0.1$, $v=-0.5$, $\delta=0.1$ and $P_{#}=\frac{1}{3}$.

def neg(bit)
  return (bit==1) ? 0 : 1
end

def target_function(s)
  ints = Array.new(6){|i| s[i].chr.to_i}
  x0,x1,x2,x3,x4,x5 = ints
  return neg(x0)*neg(x1)*x2 + neg(x0)*x1*x3 + x0*neg(x1)*x4 + x0*x1*x5
end

def new_classifier(condition, action, gen, p1=10.0, e1=0.0, f1=10.0)
  other = {}
  other[:condition],other[:action],other[:lasttime] = condition, action, gen
  other[:pred], other[:error], other[:fitness] = p1, e1, f1
  other[:exp], other[:setsize], other[:num] = 0.0, 1.0, 1.0
  return other
end

def copy_classifier(parent)
  copy = {}
  parent.keys.each do |k|
    copy[k] = (parent[k].kind_of? String) ? ""+parent[k] : parent[k]
  end
  copy[:num],copy[:exp] = 1.0, 0.0
  return copy
end

def random_bitstring(size=6)
  return (0...size).inject(""){|s,i| s+((rand<0.5) ? "1" : "0")}
end

def calculate_deletion_vote(classifier, pop, del_thresh, f_thresh=0.1)
  vote = classifier[:setsize] * classifier[:num]
  total = pop.inject(0.0){|s,c| s+c[:num]}
  avg_fitness = pop.inject(0.0){|s,c| s + (c[:fitness]/total)}
  derated = classifier[:fitness] / classifier[:num].to_f
  if classifier[:exp]>del_thresh and derated<(f_thresh*avg_fitness)
    return vote * (avg_fitness / derated)
  end
  return vote
end

def delete_from_pop(pop, pop_size, del_thresh=20.0)
  total = pop.inject(0) {|s,c| s+c[:num]}
  return if total <= pop_size
  pop.each {|c| c[:dvote] = calculate_deletion_vote(c, pop, del_thresh)}
  vote_sum = pop.inject(0.0) {|s,c| s+c[:dvote]}
  point = rand() * vote_sum
  vote_sum, index = 0.0, 0
  pop.each_with_index do |c,i|
    vote_sum += c[:dvote]
    if vote_sum >= point
      index = i
      break
    end
  end
  if pop[index][:num] > 1
    pop[index][:num] -= 1
  else
    pop.delete_at(index)
  end
end

def generate_random_classifier(input, actions, gen, rate=1.0/3.0)
  condition = ""
  input.size.times {|i| condition << ((rand<rate) ? '#' : input[i].chr)}
  action = actions[rand(actions.size)]
  return new_classifier(condition, action, gen)
end

def does_match?(input, condition)
  input.size.times do |i|
    return false if condition[i].chr!='#' and input[i].chr!=condition[i].chr
  end
  return true
end

def get_actions(pop)
  actions = []
  pop.each do |c|
    actions << c[:action] if !actions.include?(c[:action])
  end
  return actions
end

def generate_match_set(input, pop, all_actions, gen, pop_size)
  match_set = pop.select{|c| does_match?(input, c[:condition])}
  actions = get_actions(match_set)
  while actions.size < all_actions.size do
    remaining = all_actions - actions
    classifier = generate_random_classifier(input, remaining, gen)
    pop << classifier
    match_set << classifier
    delete_from_pop(pop, pop_size)
    actions << classifier[:action]
  end
  return match_set
end

def generate_prediction(match_set)
  pred = {}
  match_set.each do |classifier|
    key = classifier[:action]
    pred[key] = {:sum=>0.0,:count=>0.0,:weight=>0.0} if pred[key].nil?
    pred[key][:sum] += classifier[:pred]*classifier[:fitness]
    pred[key][:count] += classifier[:fitness]
  end
  pred.keys.each do |key|
    pred[key][:weight] = 0.0
    if pred[key][:count] > 0
      pred[key][:weight] = pred[key][:sum]/pred[key][:count]
    end
  end
  return pred
end

def select_action(predictions, p_explore=false)
  keys = Array.new(predictions.keys)
  return keys[rand(keys.size)] if p_explore
  keys.sort!{|x,y| predictions[y][:weight]<=>predictions[x][:weight]}
  return keys.first
end

def update_set(action_set, reward, beta=0.2)
  sum = action_set.inject(0.0) {|s,other| s+other[:num]}
  action_set.each do |c|
    c[:exp] += 1.0
    if c[:exp] < 1.0/beta
	      c[:error] = (c[:error]*(c[:exp]-1.0)+(reward-c[:pred]).abs)/c[:exp]
	      c[:pred] = (c[:pred] * (c[:exp]-1.0) + reward) / c[:exp]
	      c[:setsize] = (c[:setsize]*(c[:exp]-1.0)+sum) / c[:exp]
	  else
	      c[:error] += beta * ((reward-c[:pred]).abs - c[:error])
	      c[:pred] += beta * (reward-c[:pred])
	      c[:setsize] += beta * (sum - c[:setsize])
	  end
  end
end

def update_fitness(action_set, min_error=10, l_rate=0.2, alpha=0.1, v=-5.0)
  sum = 0.0
  acc = Array.new(action_set.size)
  action_set.each_with_index do |c,i|
    acc[i] = (c[:error]<min_error) ? 1.0 : alpha*(c[:error]/min_error)**v
    sum += acc[i] * c[:num].to_f
  end
  action_set.each_with_index do |c,i|
    c[:fitness] += l_rate * ((acc[i] * c[:num].to_f) / sum - c[:fitness])
  end
end

def can_run_genetic_algorithm(action_set, gen, ga_freq)
  return false if action_set.size <= 2
  total = action_set.inject(0.0) {|s,c| s+c[:lasttime]*c[:num]}
  sum = action_set.inject(0.0) {|s,c| s+c[:num]}
  return true if gen - (total/sum) > ga_freq
  return false
end

def binary_tournament(pop)
  i, j = rand(pop.size), rand(pop.size)
  j = rand(pop.size) while j==i
  return (pop[i][:fitness] > pop[j][:fitness]) ? pop[i] : pop[j]
end

def mutation(cl, action_set, input, rate=0.04)
  cl[:condition].size.times do |i|
    if rand() < rate
      cl[:condition][i] = (cl[:condition][i].chr=='#') ? input[i] : '#'
    end
  end
  if rand() < rate
    subset = action_set - [cl[:action]]
    cl[:action] = subset[rand(subset.size)]
  end
end

def uniform_crossover(parent1, parent2)
  child = ""
  parent1.size.times do |i|
    child << ((rand()<0.5) ? parent1[i].chr : parent2[i].chr)
  end
  return child
end

def insert_in_pop(cla, pop)
  pop.each do |c|
    if cla[:condition]==c[:condition] and cla[:action]==c[:action]
      c[:num] += 1
      return
    end
  end
  pop << cla
end

def crossover(c1, c2, p1, p2)
  c1[:condition] = uniform_crossover(p1[:condition], p2[:condition])
  c2[:condition] = uniform_crossover(p1[:condition], p2[:condition])
  c2[:pred] = c1[:pred] = (p1[:pred]+p2[:pred])/2.0
  c2[:error] = c1[:error] = 0.25*(p1[:error]+p2[:error])/2.0
  c2[:fitness] = c1[:fitness] = 0.1*(p1[:fitness]+p2[:fitness])/2.0
end

def run_ga(actions, pop, action_set, input, gen, pop_size, crate=0.8)
  p1, p2 = binary_tournament(action_set), binary_tournament(action_set)
  c1, c2 = copy_classifier(p1), copy_classifier(p2)
  crossover(c1, c2, p1, p2) if rand() < crate
  [c1,c2].each do |c|
    mutation(c, actions, input)
    insert_in_pop(c, pop)
  end
  while pop.inject(0) {|s,c| s+c[:num]} > pop_size
    delete_from_pop(pop, pop_size)
  end
end

def train_model(pop_size, max_gens, actions, ga_freq)
  pop, perf = [], []
  max_gens.times do |gen|
    explore = gen.modulo(2)==0
    input = random_bitstring()
    match_set = generate_match_set(input, pop, actions, gen, pop_size)
    pred_array = generate_prediction(match_set)
    action = select_action(pred_array, explore)
    reward = (target_function(input)==action.to_i) ? 1000.0 : 0.0
    if explore
      action_set = match_set.select{|c| c[:action]==action}
      update_set(action_set, reward)
      update_fitness(action_set)
      if can_run_genetic_algorithm(action_set, gen, ga_freq)
        action_set.each {|c| c[:lasttime] = gen}
        run_ga(actions, pop, action_set, input, gen, pop_size)
      end
    else
      e,a = (pred_array[action][:weight]-reward).abs, ((reward==1000.0)?1:0)
      perf << {:error=>e,:correct=>a}
      if perf.size >= 50
        err = (perf.inject(0){|s,x|s+x[:error]}/perf.size).round
        acc = perf.inject(0.0){|s,x|s+x[:correct]}/perf.size
        puts " >iter=#{gen+1} size=#{pop.size}, error=#{err}, acc=#{acc}"
        perf = []
      end
    end
  end
  return pop
end

def test_model(system, num_trials=50)
  correct = 0
  num_trials.times do
    input = random_bitstring()
    match_set = system.select{|c| does_match?(input, c[:condition])}
    pred_array = generate_prediction(match_set)
    action = select_action(pred_array, false)
    correct += 1 if target_function(input) == action.to_i
  end
  puts "Done! classified correctly=#{correct}/#{num_trials}"
  return correct
end

def execute(pop_size, max_gens, actions, ga_freq)
  system = train_model(pop_size, max_gens, actions, ga_freq)
  test_model(system)
  return system
end

if __FILE__ == $0
  # problem configuration
  all_actions = ['0', '1']
  # algorithm configuration
  max_gens, pop_size = 5000, 200
  ga_freq = 25
  # execute the algorithm
  execute(pop_size, max_gens, all_actions, ga_freq)
end

Learning Classifier System in Ruby

Download: learning_classifier_system.rb.

References

Primary Sources

Early ideas on the theory of Learning Classifier Systems were proposed by Holland [Holland1976] [Holland1977], culminating in a standardized presentation a few years later [Holland1980]. A number of implementations of the theoretical system were investigated, although a taxonomy of the two main streams was proposed by De Jong [Jong1988]: 1) Pittsburgh-style proposed by Smith [Smith1980] [Smith1983] and 2) Holland-style or Michigan-style Learning classifiers that are further comprised of the Zeroth-level classifier (ZCS) [Wilson1994] and the accuracy-based classifier (XCS) [Wilson1995].

Learn More

Booker, Goldberg, and Holland provide a classical introduction to Learning Classifier Systems including an overview of the state of the field and the algorithm in detail [Booker1989]. Wilson and Goldberg also provide an introduction and review of the approach, taking a more critical stance [Wilson1989]. Holmes et al. provide a contemporary review of the field focusing both on a description of the method and application areas to which the approach has been demonstrated successfully [Holmes2002]. Lanzi, Stolzmann, and Wilson provide a seminal book in the field as a collection of papers covering the basics, advanced topics, and demonstration applications; a particular highlight from this book is the first section that provides a concise description of Learning Classifier Systems by many leaders and major contributors to the field [Holland2000], providing rare insight. Another paper from Lanzi and Riolo's book provides a detailed review of the development of the approach as it matured throughout the 1990s [Lanzi2000a]. Bull and Kovacs provide a second book introductory book to the field focusing on the theory of the approach and its practical application [Bull2005].

Bibliography

[Booker1989]	L. B. Booker and D. E. Goldberg and J. H. Holland, "Classifier systems and genetic algorithms", Artificial Intelligence, 1989.
[Bull2005]	L. Bull and T. Kovacs, "Foundations of learning classifier systems", Springer, 2005.
[Butz2002a]	M. V. Butz and S. W. Wilson, "An algorithmic description of XCS", Journal of Soft Computing, 2002.
[Holland1976]	J. H. Holland, "Adaptation", in Progress in Theoretical Biology IV, pages 263–293, Academic Press, 1976.
[Holland1977]	J. H. Holland and J. S. Reitman, "Cognitive systems based on adaptive algorithms", ACM SIGART Bulletin, 1977.
[Holland1980]	J. H. Holland, "Adaptive algorithms for discovering and using general patterns in\n\tgrowing knowledge-bases", International Journal of Policy Analysis and Information Systems, 1980.
[Holland2000]	J. H. Holland and L. B. Booker and M. Colombetti and M. Dorigo and\n\tD. E. Goldberg and S. Forrest and R. L. Riolo and R. E. Smith and\n\tP. L. Lanzi and W. Stolzmann and S. W. Wilson, "What is a learning classifier system?", in Learning classifier systems: from foundations to applications, pages 3–32, Springer, 2000.
[Holmes2002]	J. H. Holmes and P. L. Lanzi and W. Stolzmann and S. W. Wilson, "Learning classifier systems: New models, successful applications", Information Processing Letters, 2002.
[Jong1988]	K. De Jong, "Learning with Genetic Algorithms: An Overview", Machine Learning, 1988.
[Lanzi2000a]	P. L. Lanzi and R. L. Riolo, "A Roadmap to the Last Decade of Learning Classifier System Research", in Learning classifier systems: from foundations to applications, pages 33-62, Springer, 2000.
[Smith1980]	S. F. Smith, "A learning system based on genetic adaptive algorithms", [PhD Thesis] Department of Computer Science, University of Pittsburgh, 1980.
[Smith1983]	S. Smith, "Flexible Learning of Problem Solving Heuristics Through Adaptive\n\tSearch", in Proceedings 8th International Joint Conference on Artificial Intelligence, 1983.
[Wilson1989]	S. W. Wilson and D. E. Goldberg, "A critical review of classifier systems", in Proceedings of the third international conference on Genetic algorithms, 1989.
[Wilson1994]	S. W. Wilson, "ZCS: A Zeroth Level Classifier Systems", Evolutionary Computation, 1994.
[Wilson1995]	S. W. Wilson, "Classifier Fitness Based on Accuracy", Evolutionary Computation, 1995.
[Wilson1998]	S. W. Wilson, "Generalization in the XCS classifier systems", in Genetic Programming 1998: Proceedings of the Third Annual Conference, 1998.

Please Note: This content was automatically generated from the book content and may contain minor differences.