Pretrain
Performance vs. Data & Size
For a given compute budget, the best performances are not achieved by the largest models, but by smaller models trained on more data. – from the LLaMA paper
In other words, training on more data lets a somewhat smaller model achieve better results.
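To make the trade-off concrete: a common back-of-the-envelope approximation from scaling-law work (Kaplan et al.) puts training compute at roughly C ≈ 6·N·D FLOPs for N parameters and D training tokens, so at a fixed budget, model size and data trade off directly. A minimal sketch under that assumption, using LLaMA's model sizes and its 1.4T-token run as the reference budget:

```python
# Rule of thumb (an approximation, not LLaMA's actual accounting):
# training FLOPs C ~= 6 * N * D, with N = params and D = tokens.
C = 6 * 65e9 * 1.4e12  # reference budget: 65B params on 1.4T tokens

for N in (65e9, 33e9, 13e9, 7e9):  # LLaMA model sizes
    D = C / (6 * N)                # tokens affordable at the same budget
    print(f"N = {N/1e9:>4.0f}B params -> D = {D/1e12:.1f}T tokens")
```

At a fixed C, halving the parameter count doubles the tokens you can afford, which is why a smaller model trained longer can come out ahead.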
LLaMA
Encoding: BPE (byte-pair encoding, via the SentencePiece implementation)
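As a reminder of how BPE works: starting from characters, it repeatedly merges the most frequent adjacent pair into a new token. A toy sketch of the merge loop (illustrative only, not LLaMA's actual tokenizer):

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most frequent one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def bpe_merges(text, num_merges):
    """Greedily merge the most frequent adjacent pair, num_merges times."""
    tokens = list(text)  # start from individual characters
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        merges.append(pair)
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                merged.append(tokens[i] + tokens[i + 1])  # apply the merge
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = bpe_merges("low lower lowest", num_merges=4)
print(tokens)   # token sequence after 4 merges
print(merges)   # learned merge rules, most frequent first
```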
Training Data: 1.4T tokens; the Wikipedia and Books domains were trained for about two epochs.
Epoch meaning:
In machine learning, an epoch is one complete pass through the training data. Deep neural networks are typically trained for multiple epochs, meaning the same data is used repeatedly to update the model's parameters.
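A minimal sketch of what multiple epochs look like in a training loop, using a toy one-parameter regression (all names here are illustrative):

```python
import numpy as np

# Toy dataset: y = 3x + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=100)

w = 0.0
lr = 0.1
num_epochs = 5  # the same 100 examples are reused 5 times

for epoch in range(num_epochs):
    # One epoch = one full pass over the training data
    for xi, yi in zip(X[:, 0], y):
        grad = 2 * (w * xi - yi) * xi  # d/dw of (w*x - y)^2
        w -= lr * grad
    print(f"epoch {epoch}: w = {w:.3f}")
```

LLaMA's "two epochs on Wikipedia and Books" means exactly this: those documents passed through the model about twice, while sampling weights kept most of the corpus closer to a single pass.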