1. Core Training Parameters

per_device_train_batch_size

gradient_accumulation_steps

num_train_epochs

learning_rate

max_grad_norm
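
How the five parameters above interact can be sketched in a few lines of plain Python. This is a minimal illustration, not any library's implementation: the names match fields commonly seen in Hugging Face `TrainingArguments`, but the source does not name a library, so that link is an assumption. The `CoreTrainingParams` class, the helper functions, `num_devices`, `num_samples`, and all default values are hypothetical and chosen only for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class CoreTrainingParams:
    # Hypothetical container; the default values are illustrative, not recommendations.
    per_device_train_batch_size: int = 8
    gradient_accumulation_steps: int = 4
    num_train_epochs: int = 3
    learning_rate: float = 5e-5
    max_grad_norm: float = 1.0


def effective_batch_size(p: CoreTrainingParams, num_devices: int = 1) -> int:
    # Samples contributing to one optimizer update:
    # per-device batch x accumulation steps x number of devices.
    return p.per_device_train_batch_size * p.gradient_accumulation_steps * num_devices


def total_optimizer_steps(p: CoreTrainingParams, num_samples: int, num_devices: int = 1) -> int:
    # Simplified estimate; real trainers may round partial batches differently.
    steps_per_epoch = math.ceil(num_samples / effective_batch_size(p, num_devices))
    return steps_per_epoch * p.num_train_epochs


def clip_by_global_norm(grads: list[float], max_norm: float) -> list[float]:
    # Rescale the gradient vector so its L2 norm does not exceed max_norm.
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


if __name__ == "__main__":
    params = CoreTrainingParams()
    print("effective batch size:", effective_batch_size(params, num_devices=2))
    print("optimizer steps:", total_optimizer_steps(params, num_samples=1000, num_devices=2))
    print("clipped gradients:", clip_by_global_norm([3.0, 4.0], params.max_grad_norm))
```

With these illustrative defaults and two devices, each optimizer update sees 8 x 4 x 2 = 64 samples. Raising gradient_accumulation_steps increases that number without increasing per-device memory, which is why the two batch-related settings are usually tuned together with learning_rate.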