Configurations? #49
-
|
what hyperparameters are good in case of gpu cluster ? |
Beta Was this translation helpful? Give feedback.
Answered by
codeaddict-119
May 23, 2026
Replies: 1 comment
-
|
batch_size=16, block_size=8192, max_iters=50000, eval_interval=1, learning_rate=3e-4, n_embd=6144, n_head=48, n_layer=48, dropout=0.0 . The tokenizer must use GPT4 i think- BPE o200k_base |
Beta Was this translation helpful? Give feedback.
0 replies
Answer selected by
Eamon2009
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
batch_size=16, block_size=8192, max_iters=50000, eval_interval=1, learning_rate=3e-4, n_embd=6144, n_head=48, n_layer=48, dropout=0.0 . The tokenizer must use GPT4 i think- BPE o200k_base