Hi,
Could it be that "o1" refers to "Optimizer 1"?
And what could this include?
- Quantization: compressing weights or activations
into fewer bits can significantly reduce computation,
especially in hardware, mimicking O(1)-like
efficiency for certain operations (see the first
sketch after this list).
- Pruning: removing redundant connections in the
neural network leads to fewer computations. Sparse
matrix operations can replace dense ones, making
specific inference tasks faster (see the second
sketch).
- Distillation: large models are distilled into
smaller ones with similar capabilities, reducing
computational costs during inference. If the
optimized paths are cleverly structured, their
complexity might be closer to O(1) for
lookup-style tasks (see the third sketch).
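
To make these less abstract: nobody outside OpenAI
knows what "o1" actually does, but here is a minimal
sketch of post-training int8 quantization in PyTorch
(tensor sizes and names are invented for illustration):

import torch

def quantize_int8(w):
    # Symmetric per-tensor quantization: map floats
    # to int8 via a single scale factor.
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.to(torch.float32) * scale

w = torch.randn(256, 256)
q, scale = quantize_int8(w)
print((w - dequantize(q, scale)).abs().max())  # small reconstruction error

The int8 tensor takes a quarter of the memory of
float32, and integer kernels are much cheaper in
hardware.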
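
Likewise, a minimal sketch of magnitude pruning,
again just an assumed PyTorch illustration:

import torch

def magnitude_prune(w, sparsity=0.9):
    # Zero out the smallest-magnitude weights, keeping
    # only the top (1 - sparsity) fraction.
    k = int(w.numel() * sparsity)
    threshold = w.abs().flatten().kthvalue(k).values
    return torch.where(w.abs() > threshold, w, torch.zeros_like(w))

w = torch.randn(512, 512)
w_sparse = magnitude_prune(w).to_sparse()  # COO sparse storage

x = torch.randn(512, 1)
y = torch.sparse.mm(w_sparse, x)  # sparse product, skips the zeros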
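
And a sketch of the classic distillation loss
(Hinton-style soft targets); the logits here are
random placeholders, not real model outputs:

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions and push the student
    # toward the teacher's output distribution.
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # The t^2 factor keeps gradient magnitudes comparable
    # across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * t * t

teacher_logits = torch.randn(8, 100)
student_logits = torch.randn(8, 100, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()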
So maybe Ilya Sutskever wants to tell us in his
recent talk, when he refers to the 700g brain line:
look, we did the same as biological evolution,
we found a way to construct more compact brains.
Bye
Post by Mild Shock
Hi,
Ilya Sutskever: The Next Oppenheimer
http://youtu.be/jryDWOKikys
Ilya Sutskever: Sequence to Sequence Learning
http://youtu.be/WQQdd6qGxNs
Bye