Hyperdimensional computing: wide embedding meets Hopfield
This post is from a series of quick notes written primarily for personal use while reading random ML/SWE/CS papers. As such, they might be incomprehensible and/or flat-out wrong.
- 2004.11204, Symbolic Representation and Learning With Hyperdimensional Computing
- Main primitive - hypervector: ~10,000+ bit vectors (variants with integer/float vectors exist)
- Information related to each object/concept is represented in a distributed manner across the whole vector (all bits equally significant)
- Recombination of these vectors preserves information -> allows composing entities using well-defined vector operations
- Super-wide bit embedding + composition operators, with explicit similarity-based classification / Hopfield-like associativity
- Benefits:
- Wide representation brings robustness against noise
- Only three operators: addition (normal), multiplication (xor), permutation (actual permutation)
- Very amenable to HW acceleration
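The three operators can be sketched on dense binary vectors (a minimal NumPy illustration, not the paper's code; `bundle` and `hamming_sim` are my own names):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality

a, b, c = (rng.integers(0, 2, D, dtype=np.uint8) for _ in range(3))

def bundle(vectors):
    # Addition: elementwise majority vote; the result stays similar to each input.
    return (np.sum(vectors, axis=0) > len(vectors) / 2).astype(np.uint8)

def hamming_sim(x, y):
    # 1.0 = identical, ~0.5 = unrelated random vectors.
    return 1.0 - np.mean(x != y)

bound = np.bitwise_xor(a, b)  # multiplication: binding; ~orthogonal to both inputs
shifted = np.roll(a, 1)       # permutation: cyclic shift; encodes sequence position
s = bundle([a, b, c])
```

At D = 10,000 these statistical properties are sharp: the bundle sits at ~0.75 similarity to each of its three components, while binding and permutation both land near the 0.5 "unrelated" baseline.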
- Initial step: encoding to hypervectors - essentially embedding problem - very important for final accuracy
- One option: pre-trained DNNs (e.g. initial layers of CNNs)
- Random gram-based encoding:
- Random level hypervectors are generated (e.g. per letter)
- Rotationally permuted by their spatial location (e.g. position in n-gram)
- Result for the whole n-gram is the binding (multiplication) of all of these together; result for the text is the summation of its n-grams
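A minimal sketch of this encoding for text (names and the n-gram size are my own; majority-thresholding the final sum back to binary is one common variant):

```python
import numpy as np

rng = np.random.default_rng(1)
D, N = 10_000, 3  # dimensionality, n-gram size

# One random level hypervector per letter.
letter_hv = {ch: rng.integers(0, 2, D, dtype=np.uint8)
             for ch in "abcdefghijklmnopqrstuvwxyz "}

def encode_ngram(ngram):
    # Rotate each letter's vector by its position, then bind all of them via XOR.
    hv = np.zeros(D, dtype=np.uint8)
    for i, ch in enumerate(ngram):
        hv ^= np.roll(letter_hv[ch], i)
    return hv

def encode_text(text):
    # Sum all n-gram hypervectors; majority-threshold back to binary.
    grams = [encode_ngram(text[i:i + N]) for i in range(len(text) - N + 1)]
    return (np.sum(grams, axis=0) > len(grams) / 2).astype(np.uint8)

hv = encode_text("hyperdimensional computing")
```

The positional rotation is what makes the encoding order-sensitive: "abc" and "cba" bind the same letter vectors but at different rotations, so their n-gram hypervectors come out quasi-orthogonal.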
- Random record-based encoding; e.g. for encoding speech per time bucket:
- Position hypervectors: generated orthogonal to each other
- Quantized level hypervectors: correlated based on distance (each next-higher level has only a portion of random bits flipped)
- Result for whole speech clip is summation of all (position, level) pairs bound together (multiplication)
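A sketch under my own assumptions (T time buckets, L quantization levels, names are mine). Flipping successive chunks of one fixed random index order makes adjacent levels highly similar while levels 0 and L-1 differ in ~D/2 bits, i.e. end up quasi-orthogonal:

```python
import numpy as np

rng = np.random.default_rng(2)
D, T, L = 10_000, 100, 16  # dimensionality, time buckets, quantization levels

# Position hypervectors: independent random -> quasi-orthogonal to each other.
pos = rng.integers(0, 2, (T, D), dtype=np.uint8)

# Level hypervectors: each level flips the next chunk of a fixed random
# ordering of bit positions, ~D/2 flips in total across all levels.
order = rng.permutation(D)
chunk = D // (2 * (L - 1))
lvl = np.empty((L, D), dtype=np.uint8)
lvl[0] = rng.integers(0, 2, D, dtype=np.uint8)
for i in range(1, L):
    lvl[i] = lvl[i - 1]
    lvl[i, order[(i - 1) * chunk:i * chunk]] ^= 1

def encode_clip(levels):
    """levels: length-T int array of quantized values in [0, L)."""
    bound = pos ^ lvl[levels]  # bind each (position, level) pair via XOR
    return (bound.sum(axis=0) > T / 2).astype(np.uint8)  # bundle by majority
```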
- Allows quantization, compression (cut & replace w. sum of short orth. vectors), adaptive train. (~iterative update on misclass.)
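The adaptive-training idea can be sketched as a perceptron-style update on integer class accumulators (a paraphrase with my own names; samples assumed in bipolar {-1, +1} form):

```python
import numpy as np

def adapt_step(class_acc, x_bipolar, true_c, pred_c):
    """On a misclassification, pull the correct class's accumulator toward
    the sample and push the wrongly predicted class's accumulator away."""
    if pred_c != true_c:
        class_acc[true_c] += x_bipolar
        class_acc[pred_c] -= x_bipolar
    return class_acc
```

Iterating this over misclassified training samples sharpens class hypervectors beyond the one-shot consensus sum.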
- Procedure (e.g. image classification):
- Training
- Training images are hashed (using a pre-trained DNN) to a binary representation, then transformations are applied
- Aggregated per class to form class hypervectors (e.g. consensus sum: the most frequent bit at every position wins)
- Class hypervectors are stored in associative memory (e.g. Hopfield-like, explicit list)
- Inference:
- Hash the input image, applying the same transformations as during training
- Query for the closest class hypervector (~KNN-like) and output the class whose hypervector is nearest (by distance metric)
- Possibly an explicit small NN to transform the outputs of the similarity query (per-class similarities) instead of argmax
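The whole procedure as a toy end-to-end sketch; the pre-trained-DNN hashing step is replaced by a fixed random sign projection here, purely for illustration (all names are mine):

```python
import numpy as np

rng = np.random.default_rng(3)
D, F, C = 10_000, 64, 3  # hyperdim, feature dim, number of classes
proj = rng.standard_normal((F, D))

def hash_to_hv(x):
    # Stand-in for the DNN hashing step: random sign projection to binary.
    return (x @ proj > 0).astype(np.uint8)

def train(samples, labels):
    # Consensus sum per class: the most frequent bit at every position wins.
    class_hvs = []
    for c in range(C):
        hvs = np.array([hash_to_hv(x) for x, y in zip(samples, labels) if y == c])
        class_hvs.append((hvs.sum(axis=0) > len(hvs) / 2).astype(np.uint8))
    return np.array(class_hvs)  # associative memory as an explicit list

def classify(class_hvs, x):
    # Output the class whose hypervector is closest in Hamming distance.
    dists = np.count_nonzero(class_hvs != hash_to_hv(x), axis=1)
    return int(np.argmin(dists))
```

On well-separated synthetic feature clusters this nearest-class-hypervector rule classifies nearly perfectly, since within-class hashes stay highly correlated while cross-class hashes sit near the 0.5 baseline.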
Written by Petr Houška