Hyperdimensional computing: wide embedding meets Hopfield


This post is from a series of quick notes written primarily for personal use while reading random ML/SWE/CS papers. As such, they might be incomprehensible and/or flat-out wrong.

  • 2004.11204, Symbolic Representation and Learning With Hyperdimensional Computing
  • Main primitive - hypervector: vectors of ~10,000+ bits (variants with integer/float vectors exist)
    • Information about each object/concept is represented in a distributed manner across the whole vector (all bits equally significant)
    • Recombination of these vectors preserves information -> entities can be composed using well-defined vector operations
    • Essentially a super-wide bit embedding + composition operators, with explicit similarity-based classification / Hopfield-like associativity
  • Benefits:
    • Wide representation brings robustness against noise
    • Only three operators: addition (normal), multiplication (XOR), permutation (an actual permutation of coordinates) - see the sketch after this list
    • Very easy to accelerate in hardware
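
A minimal sketch of the three operators on binary hypervectors (numpy; all function names are mine, not from the paper):

```python
import numpy as np

D = 10_000  # hypervector dimensionality

def random_hv(rng):
    """Random binary hypervector; i.i.d. bits make any two draws quasi-orthogonal."""
    return rng.integers(0, 2, size=D, dtype=np.uint8)

def bind(a, b):
    """Multiplication / binding: elementwise XOR; result is dissimilar to both inputs."""
    return a ^ b

def bundle(hvs):
    """Addition / bundling: bitwise majority vote (ties toward 1); result stays similar to every input."""
    return (2 * np.sum(hvs, axis=0) >= len(hvs)).astype(np.uint8)

def permute(a, shift=1):
    """Permutation: a cyclic rotation here, typically used to encode position/order."""
    return np.roll(a, shift)

def similarity(a, b):
    """Normalized Hamming similarity in [0, 1]; ~0.5 for unrelated vectors."""
    return 1.0 - np.count_nonzero(a != b) / D

rng = np.random.default_rng(0)
x, y = random_hv(rng), random_hv(rng)
print(similarity(x, y))                    # ~0.5: random HVs are quasi-orthogonal
print(similarity(bundle([x, y]), x))       # ~0.75: bundling preserves similarity to members
print(similarity(bind(x, y), x))           # ~0.5: binding hides its operands
print(similarity(bind(bind(x, y), y), x))  # 1.0: XOR binding is its own inverse
```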
  • Initial step: encoding to hypervectors - essentially an embedding problem - very important for final accuracy
    • One option: pre-trained DNNs (e.g. initial layers of CNNs)
    • Random gram-based encoding:
      • Random level hypervectors are generated (e.g. per letter)
      • Rotationally permuted according to spatial location (e.g. position in the n-gram)
      • Result for a whole n-gram is the binding (multiplication) of all of these together; result for a text is the summation of its n-gram hypervectors
    • Random record-based encoding: e.g. for encoding speech per time bucket:
      • Position hypervectors: generated orthogonal to each other
      • Quantized level hypervectors: correlated based on distance (each next higher level has only a portion of random bits flipped)
      • Result for the whole speech clip is the summation of all (position, level) pairs bound together (multiplication); both encodings are sketched below
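
Both encodings, sketched on top of the primitives above (function names and quantization details are my guesses, not taken from the paper):

```python
def encode_text(text, n, item_memory, rng):
    """Random gram-based encoding: per n-gram, bind the position-permuted
    letter hypervectors together; bundle all n-gram HVs into one text HV."""
    ngram_hvs = []
    for i in range(len(text) - n + 1):
        hv = np.zeros(D, dtype=np.uint8)                  # XOR identity
        for pos, ch in enumerate(text[i:i + n]):
            if ch not in item_memory:                     # one fixed random HV per letter
                item_memory[ch] = random_hv(rng)
            hv = bind(hv, permute(item_memory[ch], pos))  # rotate by position, then bind
        ngram_hvs.append(hv)
    return bundle(ngram_hvs)

def level_hvs(num_levels, rng):
    """Correlated level hypervectors (assumes num_levels >= 2): successive
    levels differ by a disjoint block of flipped bits, so neighbors stay
    similar while the extreme levels differ in ~D/2 bits (quasi-orthogonal)."""
    order = rng.permutation(D)
    flip = D // (2 * (num_levels - 1))
    hvs = [random_hv(rng)]
    for k in range(num_levels - 1):
        hv = hvs[-1].copy()
        hv[order[k * flip:(k + 1) * flip]] ^= 1
        hvs.append(hv)
    return hvs

def encode_record(samples, num_levels, rng):
    """Random record-based encoding: a fresh (hence quasi-orthogonal) position
    HV per time bucket, bound to its quantized level HV; bundle over the clip."""
    lv = level_hvs(num_levels, rng)
    samples = np.asarray(samples, dtype=float)
    span = float(samples.max() - samples.min()) or 1.0    # avoid /0 for flat signals
    q = ((samples - samples.min()) / span * (num_levels - 1)).round().astype(int)
    return bundle([bind(random_hv(rng), lv[qi]) for qi in q])
```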
  • Allows quantization, compression (cut & replace with a sum of short orthogonal vectors), adaptive training (~iterative update on misclassification; sketched below)
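
One way to read the adaptive-training bullet (a common HDC retraining scheme; I have not verified the paper does exactly this): keep integer per-class accumulators and, on each misclassified example, pull the true class toward it and push the predicted class away, re-binarizing afterwards.

```python
def adaptive_update(acc, x_hv, true_cls, pred_cls, lr=1):
    """Perceptron-style update on integer class accumulators (per-class int32
    arrays); threshold acc afterwards to recover binary class hypervectors."""
    if pred_cls != true_cls:
        signed = x_hv.astype(np.int32) * 2 - 1  # map {0,1} bits to {-1,+1}
        acc[true_cls] += lr * signed
        acc[pred_cls] -= lr * signed
```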
  • Procedure (e.g. image classification; end-to-end sketch after this list):
    • Training
      • Training images hashed (using a pre-trained DNN) to a binary representation; transformations applied
      • Aggregate per class to form hypervectors (e.g. consensus sum: most frequent bit at every pos. wins)
      • Class hypervectors are stored in associative memory (e.g. Hopfield-like, explicit list)
    • Inference:
      • Hash input image, same transformations as during training
      • Query the associative memory (~KNN-like) and output the class with the closest hypervector (per the chosen distance metric)
      • Optionally an explicit small NN transforms the outputs of the similarity query (per-class similarities) instead of a plain argmax
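
End-to-end sketch under the same assumptions; `hash_image` stands in for the pre-trained-DNN hashing step and is purely hypothetical:

```python
def train(hashed_images, labels):
    """Consensus sum per class (most frequent bit at each position wins);
    the associative memory is literally an explicit dict of class HVs."""
    return {c: bundle([hv for hv, y in zip(hashed_images, labels) if y == c])
            for c in set(labels)}

def classify(x_hv, am):
    """Nearest-class query: argmax of normalized Hamming similarity (~1-NN over class hypervectors)."""
    return max(am, key=lambda c: similarity(x_hv, am[c]))

# Usage: am = train([hash_image(img) for img in imgs], labels)
#        pred = classify(hash_image(test_img), am)
```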