【AI】LLM Learning

 

Pretrain

Performance vs. Data & Size

For a given compute budget, the best performances are not achieved by the largest models, but by smaller models trained on more data. – from LLaMA

In other words: trained on more data, even a somewhat smaller model can achieve better results.
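
A back-of-the-envelope sketch of that trade-off, using the common approximation that training compute is roughly C ≈ 6·N·D FLOPs (N = parameters, D = training tokens); the budget and model sizes below are illustrative assumptions, not figures from the LLaMA paper:

```python
# Sketch: at a fixed compute budget, shrinking the model frees budget
# for more training tokens (C ≈ 6*N*D is a rough approximation).
def tokens_for_budget(compute_flops: float, n_params: float) -> float:
    """Tokens D a fixed budget C can buy for a model of size N."""
    return compute_flops / (6 * n_params)

C = 1e23  # assumed training budget in FLOPs, for illustration only
for n_params in (70e9, 13e9, 7e9):
    d = tokens_for_budget(C, n_params)
    print(f"{n_params / 1e9:.0f}B params -> {d / 1e12:.2f}T tokens")
# 70B params -> 0.24T tokens
# 13B params -> 1.28T tokens
# 7B params -> 2.38T tokens
```

At the same budget, the 7B model sees roughly ten times as many tokens as the 70B model, which is how a smaller model trained longer can end up stronger.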

LLaMA

Encoding: BPE

Training Data: 1.4T tokens; the Wikipedia and Books domains were trained for two epochs.
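
A minimal sketch of how the BPE encoding above learns its vocabulary, as a toy pure-Python version (LLaMA actually uses SentencePiece's BPE implementation; the sample corpus below is an illustrative assumption):

```python
from collections import Counter

def train_bpe(words: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn merge rules from a {word: frequency} corpus.
    Each word starts as a sequence of single characters."""
    vocab = {tuple(w): f for w, f in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair gets merged
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

# Toy corpus (assumed): word frequencies in place of a real dataset.
print(train_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3))
# [('e', 's'), ('es', 't'), ('l', 'o')]
```

Each learned merge becomes a new vocabulary token, so frequent character sequences end up encoded as single tokens.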

Epoch meaning:

In the context of machine learning, an epoch is one complete pass through the training data. It is typical to train a deep neural network for multiple epochs, meaning that the same data is used repeatedly to update the model's parameters.
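
A minimal sketch of what multiple epochs look like in a PyTorch-style training loop; the toy model, data, and hyperparameters are assumptions for illustration:

```python
import torch

model = torch.nn.Linear(10, 1)  # toy model standing in for an LLM
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()
# Toy dataset: 8 batches of (input, target) pairs.
data = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(8)]

num_epochs = 2  # e.g. LLaMA's two epochs over Wikipedia and Books
for epoch in range(num_epochs):      # one epoch = one full pass over the data
    for x, y in data:                # the same batches are revisited each epoch
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss = {loss.item():.4f}")
```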