Clever Algorithms: Nature-Inspired Programming Recipes

A book by Jason Brownlee




Bayesian Optimization Algorithm

Bayesian Optimization Algorithm, BOA.

Taxonomy

The Bayesian Optimization Algorithm belongs to the field of Estimation of Distribution Algorithms, also referred to as Population Model-Building Genetic Algorithms (PMBGA), an extension to the field of Evolutionary Computation. More broadly, BOA belongs to the field of Computational Intelligence. The Bayesian Optimization Algorithm is related to other Estimation of Distribution Algorithms such as the Population-Based Incremental Learning Algorithm and the Univariate Marginal Distribution Algorithm. It is also the basis for extensions such as the Hierarchical Bayesian Optimization Algorithm (hBOA) and the Incremental Bayesian Optimization Algorithm (iBOA).

Inspiration

The Bayesian Optimization Algorithm is a technique without an inspiration. It is related to the Genetic Algorithm and other Evolutionary Algorithms that are inspired by the biological theory of evolution by means of natural selection.

Strategy

The information processing objective of the technique is to construct a probabilistic model that describes the relationships between the components of fit solutions in the problem space. This is achieved by repeating the process of creating and sampling from a Bayesian network that contains the conditional dependencies, independencies, and conditional probabilities between the components of a solution. The network is constructed from the relative frequencies of the components within a population of high-fitness candidate solutions. Once the network is constructed, the candidate solutions are discarded and a new population of candidate solutions is generated from the model. The process is repeated until the model converges on a fit prototype solution.
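
As a rough, deliberately simplified sketch of this build-and-sample loop (not the BOA implementation given later in this section): the model below is just the frequency of a 1 at each bit position among the selected solutions, a univariate model as used by the Univariate Marginal Distribution Algorithm mentioned in the Taxonomy. BOA replaces this stand-in with a learned Bayesian network that also captures dependencies between positions; all names and parameter values here are illustrative.

def simple_eda(num_bits, pop_size, select_size, max_iter)
  # random initial population of bitstrings
  pop = Array.new(pop_size) { Array.new(num_bits) { (rand()<0.5) ? 1 : 0 } }
  max_iter.times do
    # keep the fittest solutions (OneMax: fitness is the number of 1 bits)
    selected = pop.sort_by {|s| -s.inject(0) {|a,b| a+b} }.first(select_size)
    # "model": the probability of a 1 at each position among the selected
    model = Array.new(num_bits) do |i|
      selected.inject(0.0) {|sum,s| sum + s[i]} / select_size.to_f
    end
    # discard the old population and sample a new one from the model
    pop = Array.new(pop_size) { Array.new(num_bits) {|i| (rand()<model[i]) ? 1 : 0} }
    break if pop.any? {|s| s.inject(0) {|a,b| a+b} == num_bits}
  end
  return pop.max_by {|s| s.inject(0) {|a,b| a+b} }
end

puts simple_eda(20, 50, 15, 100).inspect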

Procedure

Algorithm (below) provides a pseudocode listing of the Bayesian Optimization Algorithm for minimizing a cost function. The Bayesian network is constructed each iteration using a greedy algorithm. The network is assessed based on how well it fits the information in the population of candidate solutions, using either a Bayesian Dirichlet Metric (BD) [Pelikan1999a] or a Bayesian Information Criterion (BIC). Refer to Chapter 3 of Pelikan's book for a more detailed presentation of the pseudocode for BOA [Pelikan2005].

Input: $Bits_{num}$, $Population_{size}$, $Selection_{size}$
Output: $S_{best}$
Population $\leftarrow$ InitializePopulation{$Bits_{num}$, $Population_{size}$}
EvaluatePopulation{Population}
$S_{best}$ $\leftarrow$ GetBestSolution{Population}
While ($\neg$StopCondition())
    Selected $\leftarrow$ SelectFitSolutions{Population, $Selection_{size}$}
    Model $\leftarrow$ ConstructBayesianNetwork{Selected}
    Offspring $\leftarrow$ $\emptyset$
    For ($i$ To $Population_{size}$)
        Offspring $\leftarrow$ Offspring $\cup$ ProbabilisticallyConstructSolution{Model}
    End
    EvaluatePopulation{Offspring}
    $S_{best}$ $\leftarrow$ GetBestSolution{Offspring}
    Population $\leftarrow$ Combine{Population, Offspring}
End
Return ($S_{best}$)
Pseudocode for BOA.

Heuristics

Code Listing

Listing (below) provides an example of the Bayesian Optimization Algorithm implemented in the Ruby Programming Language. The demonstration problem is a maximizing binary optimization problem called OneMax that seeks a binary string of unity (all '1' bits). The objective function provides only an indication of the number of correct bits in a candidate string, not the positions of the correct bits.

The Bayesian Optimization Algorithm can be tricky to implement given the use of a Bayesian Network at the core of the technique. The implementation of BOA provided is based on the C++ implementation provided by Pelikan, version 1.0 [Pelikan1999b]. Specifically, the implementation uses the K2 metric to construct a Bayesian network from a population of candidate solutions [Cooper1992]. Essentially, this metric drives a greedy algorithm that starts with an empty graph and adds the arc with the most gain each iteration until a maximum number of edges have been added or no further edges can be added. The result is a directed acyclic graph. The process that constructs the graph imposes limits, such as the maximum number of edges and the maximum number of in-bound connections per node.
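
As a small, self-contained illustration of the metric (the counts are invented, and the snippet is a sketch of the K2 equation for binary variables rather than a trace of the listing below): for a node with a single parent, the score is the product over the parent's two settings of a1! * a2! / (a1 + a2 + 1)!, where a1 and a2 count how often the node took the value 1 and 0 under that setting.

# Hypothetical counts for a node with one parent: when the parent is 0 the node
# was 1 three times and 0 once; when the parent is 1 it was 1 once and 0 four times.
counts = [[3, 1], [1, 4]]

def factorial(v)
  return (1..v).inject(1) {|f,x| f*x}
end

# K2 score: product over parent settings of a1! * a2! / (a1 + a2 + 1)!
score = counts.inject(1.0) do |product, (a1, a2)|
  product * factorial(a1) * factorial(a2) / factorial(a1 + a2 + 1).to_f
end
puts score  # (6*1/120.0) * (1*24/720.0) = 0.05 * 0.0333... ~= 0.00167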

New solutions are sampled from the graph by first topologically ordering the graph (so that bits can be generated based on their dependencies), then probabilistically sampling the bits based on the conditional probabilities encoded in the graph. The algorithm used for sampling the bits from the conditional probabilities in the network is Probabilistic Logic Sampling [Henrion1988]. The algorithm stops either when the best solution for the problem is found or when the population converges to a single bit pattern.
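
As a small numeric illustration of the sampling step (the counts are invented): once the parents of a bit have been generated, the conditional probability of that bit being 1 is read from the frequencies in the selected population, and a biased coin is flipped.

# Hypothetical counts from the selected population for a node with one parent
# whose bit has already been sampled as 1: the node was 1 six times and 0 twice
# whenever the parent was 1.
a1, a2 = 6.0, 2.0
prob = a1 / (a1 + a2)          # P(node=1 | parent=1) = 0.75
bit = (rand() < prob) ? 1 : 0  # flip a biased coin for this bit
puts "p=#{prob} bit=#{bit}"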

Given that the implementation was written for clarity, it is slow to execute and provides a great opportunity for improvements and efficiencies.
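
One such improvement (a sketch, not part of the listing below): the factorials used by the K2 metric grow very quickly, so the score can instead be accumulated in log space with the log-gamma function, which avoids very large intermediate values while preserving the ordering of the gains.

# log of one K2 term, log(a1! * a2! / (a1+a2+1)!), where Math.lgamma(n+1)[0] is log(n!)
def log_k2_term(a1, a2)
  return Math.lgamma(a1+1)[0] + Math.lgamma(a2+1)[0] - Math.lgamma(a1+a2+2)[0]
end

puts Math.exp(log_k2_term(3, 1))  # ~= 0.05, the same as 3!*1!/5!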

def onemax(vector)
  return vector.inject(0){|sum, value| sum + value}
end

def random_bitstring(size)
  return Array.new(size){ ((rand()<0.5) ? 1 : 0) }
end

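# Returns true if a directed path from node i to node j already exists in the graph (depth-first search over out-bound edges).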
def path_exists?(i, j, graph)
  visited, stack = [], [i]
  while !stack.empty?
    return true if stack.include?(j)
    k = stack.shift
    next if visited.include?(k)
    visited << k
    graph[k][:out].each {|m| stack.unshift(m) if !visited.include?(m)}
  end
  return false
end

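# An edge i->j may only be added if it does not already exist and would not introduce a cycle (no path from j back to i).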
def can_add_edge?(i, j, graph)
  return !graph[i][:out].include?(j) && !path_exists?(j, i, graph)
end

def get_viable_parents(node, graph)
  viable = []
  graph.size.times do |i|
    if node!=i and can_add_edge?(node, i, graph)
      viable << i
    end
  end
  return viable
end

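# Counts how often each joint configuration of the bits at the given indexes occurs in the population; the first index contributes the most significant bit of the configuration index.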
def compute_count_for_edges(pop, indexes)
  counts = Array.new(2**(indexes.size)){0}
  pop.each do |p|
    index = 0
    indexes.reverse.each_with_index do |v,i|
      index += ((p[:bitstring][v] == 1) ? 1 : 0) * (2**i)
    end
    counts[index] += 1
  end
  return counts
end

def fact(v)
  return v <= 1 ? 1 : v*fact(v-1)
end

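# K2 metric for the node and its candidate parents, computed as a product of factorial terms over the configuration counts gathered from the population.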
def k2equation(node, candidates, pop)
  counts = compute_count_for_edges(pop, [node]+candidates)
  total = nil
  (counts.size/2).times do |i|
    a1, a2 = counts[i*2], counts[(i*2)+1]
    rs = (1.0/fact((a1+a2)+1).to_f) * fact(a1).to_f * fact(a2).to_f
    total = (total.nil? ? rs : total*rs)
  end
  return total
end

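# Computes the K2 gain of connecting each viable node to the given node; candidates that are not viable or already have 'max' in-bound edges keep a gain of -1.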
def compute_gains(node, graph, pop, max=2)
  viable = get_viable_parents(node[:num], graph)
  gains = Array.new(graph.size) {-1.0}
  gains.each_index do |i|
    if graph[i][:in].size < max and viable.include?(i)
      gains[i] = k2equation(node[:num], node[:in]+[i], pop)
    end
  end
  return gains
end

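# Greedy network construction: each pass adds the single edge with the largest K2 gain, stopping when no positive gain remains or the edge limit is reached.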
def construct_network(pop, prob_size, max_edges=3*pop.size)
  graph = Array.new(prob_size) {|i| {:out=>[], :in=>[], :num=>i} }
  gains = Array.new(prob_size)
  max_edges.times do
    max, from, to = -1, nil, nil
    graph.each_with_index do |node, i|
      gains[i] = compute_gains(node, graph, pop)
      gains[i].each_with_index {|v,j| from,to,max = i,j,v if v>max}
    end
    break if max <= 0.0
    graph[from][:out] << to
    graph[to][:in] << from
  end
  return graph
end

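# Topological sort (Kahn's algorithm): orders nodes so that each appears after all of its in-bound connections, allowing bits to be sampled after their dependencies.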
def topological_ordering(graph)
  graph.each {|n| n[:count] = n[:in].size}
  ordered,stack = [], graph.select {|n| n[:count]==0}
  while ordered.size < graph.size
    current = stack.shift
    current[:out].each do |edge|
      node = graph.find {|n| n[:num]==edge}
      node[:count] -= 1
      stack << node if node[:count] <= 0
    end
    ordered << current
  end
  return ordered
end

def marginal_probability(i, pop)
  return pop.inject(0.0){|s,x| s + x[:bitstring][i]} / pop.size.to_f
end

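# Probability that the node's bit is 1: the marginal frequency when the node has no in-bound connections, otherwise the conditional frequency given the bits already generated for those connections.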
def calculate_probability(node, bitstring, graph, pop)
  return marginal_probability(node[:num], pop) if node[:in].empty?
  counts = compute_count_for_edges(pop, [node[:num]]+node[:in])
  index = 0
  node[:in].reverse.each_with_index do |v,i|
    index += ((bitstring[v] == 1) ? 1 : 0) * (2**i)
  end
  i1 = index + (1*2**(node[:in].size))
  i2 = index + (0*2**(node[:in].size))
  a1, a2 = counts[i1].to_f, counts[i2].to_f
  return a1/(a1+a2)
end

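# Generates a new candidate solution by sampling each bit in turn from its (conditional) probability; expects the graph nodes in topological order.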
def probabilistic_logic_sample(graph, pop)
  bitstring = Array.new(graph.size)
  graph.each do |node|
    prob = calculate_probability(node, bitstring, graph, pop)
    bitstring[node[:num]] = ((rand() < prob) ? 1 : 0)
  end
  return {:bitstring=>bitstring}
end

def sample_from_network(pop, graph, num_samples)
  ordered = topological_ordering(graph)
  samples = Array.new(num_samples) do
    probabilistic_logic_sample(ordered, pop)
  end
  return samples
end

def search(num_bits, max_iter, pop_size, select_size, num_children)
  pop = Array.new(pop_size) { {:bitstring=>random_bitstring(num_bits)} }
  pop.each{|c| c[:cost] = onemax(c[:bitstring])}
  best = pop.sort!{|x,y| y[:cost] <=> x[:cost]}.first
  max_iter.times do |it|
    selected = pop.first(select_size)
    network = construct_network(selected, num_bits)
    arcs = network.inject(0){|s,x| s+x[:out].size}
    children = sample_from_network(selected, network, num_children)
    children.each{|c| c[:cost] = onemax(c[:bitstring])}
    children.each {|c| puts " >>sample, f=#{c[:cost]} #{c[:bitstring]}"}
    pop = pop[0...(pop_size-select_size)] + children
    pop.sort! {|x,y| y[:cost] <=> x[:cost]}
    best = pop.first if pop.first[:cost] >= best[:cost]
    puts " >it=#{it}, arcs=#{arcs}, f=#{best[:cost]}, [#{best[:bitstring]}]"
    converged = pop.select {|x| x[:bitstring]!=pop.first[:bitstring]}.empty?
    break if converged or best[:cost]==num_bits
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  num_bits = 20
  # algorithm configuration
  max_iter = 100
  pop_size = 50
  select_size = 15
  num_children = 25
  # execute the algorithm
  best = search(num_bits, max_iter, pop_size, select_size, num_children)
  puts "done! Solution: f=#{best[:cost]}/#{num_bits}, s=#{best[:bitstring]}"
end
Bayesian Optimization Algorithm in Ruby
Download: boa.rb.

References

Primary Sources

The Bayesian Optimization Algorithm was proposed by Pelikan, Goldberg, and Cantú-Paz in a technical report [Pelikan1998a] that was later published [Pelikan2002]. The technique was proposed as an extension to the state of the art in Estimation of Distribution Algorithms (such as the Univariate Marginal Distribution Algorithm and the Bivariate Marginal Distribution Algorithm), using a Bayesian Network to model the relationships and conditional probabilities between the components expressed in a population of fit candidate solutions. Pelikan, Goldberg, and Cantú-Paz also described the approach applied to deceptive binary optimization problems (trap functions) in a paper that was published before the seminal journal article [Pelikan1999a].

Learn More

Pelikan and Goldberg described an extension to the approach called the Hierarchical Bayesian Optimization Algorithm (hBOA) [Pelikan2000] [Pelikan2001b]. The differences in the hBOA algorithm are that it replaces the decision tables (used to store the probabilities) with decision graphs, and that it uses a niching method called Restricted Tournament Replacement to maintain diversity in the selected set of candidate solutions used to construct the network models. Pelikan's work on BOA culminated in his PhD thesis, which provides a detailed treatment of the approach, its configuration, and application [Pelikan2002a]. Pelikan, Sastry, and Goldberg proposed the Incremental Bayesian Optimization Algorithm (iBOA), an extension of the approach that removes the population and adds incremental updates to the Bayesian network [Pelikan2008].

Pelikan published a book that focuses on the technique, walking through the development of probabilistic algorithms inspired by evolutionary computation, giving a detailed look at the Bayesian Optimization Algorithm (Chapter 3), the hierarchical extension (hBOA), and demonstration studies of the approach on test problems [Pelikan2005].

Bibliography

[Cooper1992] G. F. Cooper and E. Herskovits, "A Bayesian Method for the Induction of Probabilistic Networks from Data", Machine Learning, 1992.
[Henrion1988] M. Henrion, "Propagation of Uncertainty by Probabilistic Logic Sampling in Bayes Networks", in Uncertainty in Artificial Intelligence 2, pages 149–163, Elsevier Science Publishing Company, Inc., 1988.
[Pelikan1998a] M. Pelikan and D. E. Goldberg and E. Cantú-Paz, "Linkage Problem, Distribution Estimation, and Bayesian Networks", Technical Report IlliGAL Report No. 98013, Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, 1998.
[Pelikan1999a] M. Pelikan and D. E. Goldberg and E. Cantú-Paz, "BOA: The Bayesian Optimization Algorithm", in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-99), 1999.
[Pelikan1999b] M. Pelikan, "A Simple Implementation of the Bayesian Optimization Algorithm (BOA) in C++ (version 1.0)", Technical Report IlliGAL Report No. 99011, Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL, 1999.
[Pelikan2000] M. Pelikan and D. E. Goldberg, "Hierarchical Problem Solving and the Bayesian Optimization Algorithm", in Genetic and Evolutionary Computation Conference 2000 (GECCO-2000), 2000.
[Pelikan2001b] M. Pelikan and D. E. Goldberg, "Escaping Hierarchical Traps with Competent Genetic Algorithms", in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), 2001.
[Pelikan2002] M. Pelikan and D. E. Goldberg and E. Cantú-Paz, "Linkage Problem, Distribution Estimation, and Bayesian Networks", Evolutionary Computation, 2002.
[Pelikan2002a] M. Pelikan, "Bayesian Optimization Algorithm: From Single Level to Hierarchy", [PhD Thesis] University of Illinois at Urbana-Champaign, 2002.
[Pelikan2005] M. Pelikan, "Hierarchical Bayesian Optimization Algorithm: Toward a New Generation of Evolutionary Algorithms", Springer, 2005.
[Pelikan2008] M. Pelikan and K. Sastry and D. E. Goldberg, "iBOA: The Incremental Bayesian Optimization Algorithm", in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2008), 2008.


