Self-Organizing Map

        A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps are different from other artificial neural networks in the sense that they use a neighborhood function to preserve the topological properties of the input space.
        This makes SOMs useful for visualizing low-dimensional views of high-dimensional data, akin to multidimensional scaling. The artificial neural network, introduced by the Finnish professor Teuvo Kohonen in the 1980s, is sometimes called a Kohonen map or network.[1][2] The Kohonen net is a computationally convenient abstraction building on work on biological neural models from the 1970s[3] and morphogenesis models dating back to Alan Turing in the 1950s.[4]
        Like most artificial neural networks, SOMs operate in two modes: training and mapping. "Training" builds the map using input examples (a competitive process, also called vector quantization), while "mapping" automatically classifies a new input vector.
        A self-organizing map consists of components called nodes or neurons. Associated with each node are a weight vector of the same dimension as the input data vectors, and a position in the map space. The usual arrangement of nodes is a two-dimensional regular spacing in a hexagonal or rectangular grid. The self-organizing map describes a mapping from a higher-dimensional input space to a lower-dimensional map space. The procedure for placing a vector from data space onto the map is to find the node with the closest (smallest distance metric) weight vector to the data space vector.
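The placement procedure described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `find_bmu`, the nested-list layout of the map, and the 2×2 example weights are assumptions made for this example, and Euclidean distance is assumed as the distance metric.

```python
import math

def find_bmu(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x.

    `weights` is a grid (list of rows) of weight vectors; Euclidean distance
    is assumed as the distance metric.
    """
    best_pos, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = math.dist(w, x)  # Euclidean distance between two points
            if d < best_dist:
                best_pos, best_dist = (r, c), d
    return best_pos

# A hypothetical 2x2 map with three-dimensional weight vectors.
weights = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
]
print(find_bmu(weights, [0.9, 0.1, 0.0]))  # (0, 1)
```

The returned grid position is the map-space image of the data-space vector.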
        While it is typical to consider this type of network structure as related to feedforward networks where the nodes are visualized as being attached, this type of architecture is fundamentally different in arrangement and motivation.
        Useful extensions include using toroidal grids where opposite edges are connected and using large numbers of nodes.
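On a toroidal grid, the lattice distance between two nodes has to account for the wrap-around at the edges. The sketch below shows one way to compute it on a rectangular grid; the function name `toroidal_distance` and the choice of a Euclidean lattice distance are assumptions made for illustration.

```python
import math

def toroidal_distance(a, b, rows, cols):
    """Euclidean lattice distance between grid positions a and b on a torus.

    Opposite edges are connected, so along each axis the shorter of the
    direct path and the wrap-around path is used.
    """
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    dr = min(dr, rows - dr)  # wrap vertically if that is shorter
    dc = min(dc, cols - dc)  # wrap horizontally if that is shorter
    return math.hypot(dr, dc)

# On a 10x10 torus, columns 0 and 9 are direct neighbors.
print(toroidal_distance((0, 0), (0, 9), 10, 10))  # 1.0
```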
        It has been shown that while self-organizing maps with a small number of nodes behave in a way that is similar to K-means, larger self-organizing maps rearrange data in a way that is fundamentally topological in character.[5]
        It is also common to use the U-Matrix.[6] The U-Matrix value of a particular node is the average distance between the node's weight vector and that of its closest neighbors.[7] In a square grid, for instance, we might consider the closest 4 or 8 nodes (the Von Neumann and Moore neighborhoods, respectively), or six nodes in a hexagonal grid.
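As a rough illustration of the U-Matrix definition, the following sketch computes it for a rectangular grid using the four-node Von Neumann neighborhood. The function name `u_matrix` and the nested-list map layout are assumptions of this example; nodes on the border simply average over the neighbors they have.

```python
import math

def u_matrix(weights):
    """Average Euclidean distance from each node's weight vector to the
    weight vectors of its Von Neumann (up, down, left, right) neighbors."""
    rows, cols = len(weights), len(weights[0])
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            dists = [
                math.dist(weights[r][c], weights[r + dr][c + dc])
                for dr, dc in offsets
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            # A single-node map has no neighbors; report 0.0 in that case.
            out_row.append(sum(dists) / len(dists) if dists else 0.0)
        result.append(out_row)
    return result

# A hypothetical 2x2 map with one-dimensional weights.
print(u_matrix([[[0.0], [1.0]], [[0.0], [3.0]]]))  # [[0.5, 1.5], [1.5, 2.5]]
```

High values mark nodes whose weights differ strongly from their neighbors, which is why the U-Matrix is often read as a map of cluster boundaries.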
        Large SOMs display emergent properties. In maps consisting of thousands of nodes, it is possible to perform cluster operations on the map itself.[8]
Learning algorithm
        The goal of learning in the self-organizing map is to cause different parts of the network to respond similarly to certain input patterns. This is partly motivated by how visual, auditory or other sensory information is handled in separate parts of the cerebral cortex in the human brain.[9]
        The weights of the neurons are initialized either to small random values or sampled evenly from the subspace spanned by the two largest principal component eigenvectors. With the latter alternative, learning is much faster because the initial weights already give a good approximation of SOM weights.[10]
        The network must be fed a large number of example vectors that represent, as closely as possible, the kinds of vectors expected during mapping. The examples are usually presented several times over successive iterations.
The training utilizes competitive learning. When a training example is fed to the network, its Euclidean distance to all weight vectors is computed. The neuron whose weight vector is most similar to the input is called the best matching unit (BMU). The weights of the BMU and neurons close to it in the SOM lattice are adjusted towards the input vector. The magnitude of the change decreases with time and with distance (within the lattice) from the BMU. The update formula for a neuron v with weight vector Wv(s) is
Wv(s + 1) = Wv(s) + Θ(u, v, s) α(s)(D(t) - Wv(s)),
where s is the step index, t is an index into the training sample, u is the index of the BMU for D(t), α(s) is a monotonically decreasing learning coefficient, and D(t) is the input vector; Θ(u, v, s) is the neighborhood function, which scales the update according to the lattice distance between neuron u and neuron v at step s.[11] Depending on the implementation, t can scan the training data set systematically (t is 0, 1, 2, ..., T-1, then repeats, T being the size of the training sample), be randomly drawn from the data set (bootstrap sampling), or follow some other sampling method (such as jackknifing).
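For a single neuron, the update formula amounts to moving the weight vector a fraction Θ·α of the way towards the input. A minimal sketch follows; the function name `update_weight` is hypothetical, and `theta` and `alpha` stand for the already-evaluated values Θ(u, v, s) and α(s).

```python
def update_weight(w, d, theta, alpha):
    """Return Wv(s + 1) = Wv(s) + theta * alpha * (D(t) - Wv(s)).

    w is the current weight vector Wv(s), d is the input vector D(t).
    """
    return [wi + theta * alpha * (di - wi) for wi, di in zip(w, d)]

# The BMU itself (theta = 1) moves 10% of the way towards a red input.
print(update_weight([0.0, 0.0, 0.0], [255.0, 0.0, 0.0], theta=1.0, alpha=0.1))

# A neuron outside the neighborhood (theta = 0) is left unchanged.
print(update_weight([10.0, 20.0, 30.0], [255.0, 0.0, 0.0], theta=0.0, alpha=0.1))
```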
        The neighborhood function Θ(u, v, s) depends on the lattice distance between the BMU (neuron u) and neuron v. In the simplest form it is 1 for all neurons close enough to the BMU and 0 for others, but a Gaussian function is a common choice, too. Regardless of the functional form, the neighborhood function shrinks with time.[9] At the beginning, when the neighborhood is broad, the self-organizing takes place on the global scale. When the neighborhood has shrunk to just a couple of neurons, the weights converge to local estimates. In some implementations the learning coefficient α and the neighborhood function Θ decrease steadily with increasing s; in others (in particular those where t scans the training data set) they decrease in step-wise fashion, once every T steps.
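The two functional forms mentioned above, and a shrinking radius, might look as follows. This is only a sketch: the names `bubble`, `gaussian`, and `decayed` are hypothetical, the radius is assumed to be positive, and the exponential schedule is one possible choice among many, since the text does not prescribe a particular decay law.

```python
import math

def bubble(lattice_dist, radius):
    """Simplest neighborhood: 1 inside the radius, 0 outside."""
    return 1.0 if lattice_dist <= radius else 0.0

def gaussian(lattice_dist, radius):
    """Gaussian neighborhood: 1 at the BMU, falling off smoothly with distance."""
    return math.exp(-(lattice_dist ** 2) / (2.0 * radius ** 2))

def decayed(initial, final, s, total_steps):
    """Assumed schedule: exponential decay from `initial` at s = 0
    to `final` at s = total_steps."""
    return initial * (final / initial) ** (s / total_steps)

# The neighborhood radius shrinks as the step index s grows.
for s in (0, 50, 100):
    radius = decayed(10.0, 1.0, s, 100)
    print(s, round(radius, 3), round(gaussian(3.0, radius), 3))
```

The same `decayed` helper could be applied to the learning coefficient α; a step-wise variant would simply evaluate it once every T steps instead of at every step.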
        This process is repeated for each input vector for a (usually large) number of cycles λ. The network winds up associating output nodes with groups or patterns in the input data set. If these patterns can be named, the names can be attached to the associated nodes in the trained net.
        During mapping, there will be one single winning neuron: the neuron whose weight vector lies closest to the input vector. This can be simply determined by calculating the Euclidean distance between input vector and weight vector.
        While representing input data as vectors has been emphasized in this article, it should be noted that any kind of object which can be represented digitally, which has an appropriate distance measure associated with it, and in which the necessary operations for training are possible can be used to construct a self-organizing map. This includes matrices, continuous functions or even other self-organizing maps.
Preliminary definitions
          Consider an n×m array of nodes, each of which contains a weight vector and is aware of its location in the array. Each weight vector is of the same dimension as the node's input vector. The weights may initially be set to random values.
        Now we need input to feed the map. The generated map and the given input exist in separate subspaces. We will create three vectors to represent colors. Colors can be represented by their red, green, and blue components. Consequently, our input vectors will have three components, each corresponding to one channel of the RGB color space. The input vectors will be:
R = <255, 0, 0>
G = <0, 255, 0>
B = <0, 0, 255>
The color training vector data sets used in SOM:
threeColors = [255, 0, 0], [0, 255, 0], [0, 0, 255]
eightColors = [0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 0], [0, 255, 255], [255, 0, 255], [255, 255, 255]
The data vectors should preferably be normalized (vector length is equal to one) before training the SOM.
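Normalizing to unit length can be sketched as below. The function name `normalize` is hypothetical, and the handling of the zero vector is an assumption of this example: black, [0, 0, 0], has no direction and cannot be scaled to length one, so it is returned unchanged here.

```python
import math

def normalize(v):
    """Scale v to Euclidean length one; the zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [float(x) for x in v]
    return [x / norm for x in v]

eight_colors = [
    [0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255],
    [255, 255, 0], [0, 255, 255], [255, 0, 255], [255, 255, 255],
]
normalized = [normalize(c) for c in eight_colors]
print(normalized[1])  # [1.0, 0.0, 0.0]
```

Note that unit-length normalization discards brightness: [255, 0, 0] and [128, 0, 0] would map to the same vector.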
        Neurons (40×40 square grid) are trained for 250 iterations with a learning rate of 0.1 using the normalized Iris flower data set which has four-dimensional data vectors. Shown are: a color image formed by the first three dimensions of the four-dimensional SOM weight vectors (top left), a pseudo-color image of the magnitude of the SOM weight vectors (top right), a U-Matrix (Euclidean distance between weight vectors of neighboring cells) of the SOM (bottom left), and an overlay of data points (red: I. setosa, green: I. versicolor and blue: I. virginica) on the U-Matrix based on the minimum Euclidean distance between data vectors and SOM weight vectors (bottom right).
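These are the variables needed:
  s is the current iteration (step index),
  λ is the iteration limit (the number of training cycles),
  t is the index of the target input vector in the input data set,
  D(t) is the target input vector,
  v is the index of a node in the map,
  Wv is the current weight vector of node v,
  u is the index of the best matching unit (BMU) in the map,
  Θ(u, v, s) is the neighborhood function, a restraint due to lattice distance from the BMU, and
  α(s) is a learning restraint due to iteration progress.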
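1. Randomize the map's nodes' weight vectors
2. Grab an input vector D(t)
3. Traverse each node in the map
   1. Use the Euclidean distance formula to find the similarity between the input vector and the map's node's weight vector
   2. Track the node that produces the smallest distance (this node is the best matching unit, BMU)
4. Update the nodes in the neighborhood of the BMU (including the BMU itself) by pulling them closer to the input vector
   1. Wv(s + 1) = Wv(s) + Θ(u, v, s) α(s)(D(t) - Wv(s))
5. Increase s and repeat from step 2 while s < λ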
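The five steps above can be put together in a compact sketch. Everything specific in it is an assumption made for illustration rather than part of the algorithm as stated: the function name `train_som`, the random draw of input vectors, the Gaussian neighborhood, the linear decay of the learning coefficient and radius, and the default parameter values. The parameter `steps` plays the role of λ.

```python
import math
import random

def train_som(data, rows, cols, steps, alpha0=0.1, seed=0):
    """Train a rows x cols map on `data` (a list of equal-length vectors)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 1: randomize the map's nodes' weight vectors.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0  # assumed initial neighborhood radius
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    for s in range(steps):
        frac = s / steps
        alpha = alpha0 * (1.0 - frac)            # decreasing learning coefficient
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
        # Step 2: grab an input vector (here: drawn at random).
        d = rng.choice(data)
        # Step 3: traverse the nodes and track the one with the smallest distance.
        bmu = min(positions, key=lambda p: math.dist(weights[p[0]][p[1]], d))
        # Step 4: pull every node towards the input, scaled by the neighborhood.
        for r, c in positions:
            lattice_sq = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
            theta = math.exp(-lattice_sq / (2.0 * sigma ** 2))
            w = weights[r][c]
            for i in range(dim):
                w[i] += theta * alpha * (d[i] - w[i])
        # Step 5: the for loop increases s and repeats while s < steps.
    return weights

# The three-color data set, rescaled to the range [0, 1].
three_colors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
som = train_som(three_colors, rows=8, cols=8, steps=1000)
print(len(som), len(som[0]), len(som[0][0]))  # 8 8 3
```

Because every update is a convex combination of the old weight and the input, the weights stay within the range spanned by the initial values and the data.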
A variant algorithm:
1. Randomize the map's nodes' weight vectors
2. Traverse each input vector in the input data set
   1. Traverse each node in the map
      1. Use the Euclidean distance formula to find the similarity between the input vector and the map's node's weight vector
      2. Track the node that produces the smallest distance (this node is the best matching unit, BMU)
   2. Update the nodes in the neighborhood of the BMU (including the BMU itself) by pulling them closer to the input vector
      1. Wv(s + 1) = Wv(s) + Θ(u, v, s) α(s)(D(t) - Wv(s))
3. Increase s and repeat from step 2 while s < λ
  1. Kohonen, Teuvo; Honkela, Timo (2007). "Kohonen Network". Scholarpedia.
  2. Kohonen, Teuvo (1982). "Self-Organized Formation of Topologically Correct Feature Maps". Biological Cybernetics 43 (1): 59–69. doi:10.1007/bf00337288
  3. Von der Malsburg, C (1973). "Self-organization of orientation sensitive cells in the striate cortex". Kybernetik 14: 85–100. doi:10.1007/bf00288907
  4. Turing, Alan (1952). "The chemical basis of morphogenesis". Phil. Trans. of the Royal Society 237: 5–72.
  5. "Self-organizing map"
  6. Ultsch, Alfred; Siemon, H. Peter (1990). "Kohonen's Self Organizing Feature Maps for Exploratory Data Analysis". In Widrow, Bernard; Angeniol, Bernard. Proceedings of the International Neural Network Conference (INNC-90), Paris, France, July 9–13, 1990 1. Dordrecht, Netherlands: Kluwer. pp. 305–308. ISBN 978-0-7923-0831-7.
  7. Ultsch, Alfred (2003). "U*-Matrix: A tool to visualize clusters in high dimensional data". Department of Computer Science, University of Marburg, Technical Report Nr. 36: 1–12.
  8. Ultsch, Alfred (2007). "Emergence in Self-Organizing Feature Maps". In Ritter, H.; Haschke, R. Proceedings of the 6th International Workshop on Self-Organizing Maps (WSOM '07). Bielefeld, Germany: Neuroinformatics Group. ISBN 978-3-00-022473-7.
  9. Haykin, Simon (1999). "9. Self-organizing maps". Neural networks - A comprehensive foundation (2nd ed.). Prentice-Hall. ISBN 0-13-908385-5.
  10. Kohonen, Teuvo (2005). "Intro to SOM". SOM Toolbox. Retrieved 2006-06-18.
  11. Kohonen, Teuvo; Honkela, Timo (2011). "Kohonen network". Scholarpedia. Retrieved 2012-09-24.