SFT Llama-3 8b on Colab
Thanks to Unsloth and the TRL SFTTrainer, fine-tuning Llama-3 is much easier.
# .......................
# model/tokenizer loading and prompt setup (elided; see the sketch below)
# .......................
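The elided block loads the base model, attaches LoRA adapters, and defines the alpaca_prompt template and formatting_prompts_func used by the dataset code below. Here is a minimal sketch of that setup, assuming the standard Unsloth Alpaca-style recipe; the checkpoint name and LoRA hyperparameters are my assumptions, chosen so that the trainable-parameter count matches the 41,943,040 reported in the training log further down.

from unsloth import FastLanguageModel

max_seq_length = 2048  # assumption; pick any length you need

# Load a 4-bit quantized Llama-3 8B base model (checkpoint name is an assumption).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None,        # auto-detect: bfloat16 on Ampere+, float16 otherwise
    load_in_4bit = True,
)

# Attach LoRA adapters. r = 16 on all attention/MLP projections of the 32 layers
# gives 41,943,040 trainable parameters, matching the log below.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)

# Alpaca-style prompt template (matches the generations shown later).
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # must be appended, or generation never stops

def formatting_prompts_func(examples):
    texts = [alpaca_prompt.format(ins, inp, out) + EOS_TOKEN
             for ins, inp, out in zip(examples["instruction"],
                                      examples["input"],
                                      examples["output"])]
    return {"text": texts}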
from datasets import load_dataset
dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)
import torch
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
Now start training:

trainer_stats = trainer.train()
Unsloth - 2x faster free finetuning | Num GPUs = 1
Num examples = 51,760 | Num Epochs = 1
Batch size per device = 2 | Gradient Accumulation steps = 4
Total batch size = 8 | Total steps = 60
Number of trainable parameters = 41,943,040
[60/60 07:14, Epoch 0/1]

Step | Training Loss |
---|---|
1 | 1.823800 |
2 | 2.305700 |
3 | 1.692200 |
4 | 1.947700 |
5 | 1.644100 |
6 | 1.606800 |
7 | 1.193300 |
8 | 1.259400 |
9 | 1.109100 |
10 | 1.165600 |
11 | 0.965300 |
12 | 1.003100 |
13 | 0.935100 |
14 | 1.060900 |
15 | 0.909700 |
16 | 0.911000 |
17 | 1.023800 |
18 | 1.285900 |
... | ... |
49 | 0.915200 |
50 | 1.049000 |
51 | 1.025600 |
52 | 0.927200 |
53 | 1.008400 |
54 | 1.169600 |
55 | 0.799000 |
56 | 1.029200 |
57 | 0.887200 |
58 | 0.830100 |
59 | 0.862400 |
60 | 0.905500 |
Training statistics:

- 453.9762 seconds (7.57 minutes) used for training.
- Peak reserved memory = 7.529 GB (51.051 % of max memory).
- Peak reserved memory for training = 1.935 GB (13.12 % of max memory).
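These numbers come from trainer_stats and the CUDA memory counters. A minimal sketch of how they are typically computed (variable names are illustrative, and it assumes a single CUDA GPU; the "for training" share would subtract the same counter read before trainer.train()):

import torch

# Peak reserved memory after training, in GB, and the device's total memory.
used_memory = round(torch.cuda.max_memory_reserved() / 1024**3, 3)
max_memory = round(torch.cuda.get_device_properties(0).total_memory / 1024**3, 3)

print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime'] / 60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory % of max memory = {round(used_memory / max_memory * 100, 3)} %.")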
Test the fine-tuned model:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Continue the fibonnaci sequence.", # instruction
            "1, 1, 2, 3, 5, 8", # input
            "", # output - leave this blank for generation!
        )
    ], return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)
Results:
Setting `pad_token_id` to `eos_token_id`: 128001 for open-end generation.
['<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nContinue the fibonnaci sequence.\n\n### Input:\n 1, 1, 2, 3, 5, 8\n\n### Response:\n 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025']
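Note that batch_decode returns the full prompt plus the completion. If you only want the generated answer, a small illustrative post-processing step like the following is usually enough, assuming the Alpaca template above:

# Illustrative helper: keep only the text generated after "### Response:"
decoded = tokenizer.batch_decode(outputs, skip_special_tokens = True)[0]
response = decoded.split("### Response:")[-1].strip()
print(response)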
Then, you can save the model!

- Local Saving: Use model.save_pretrained("lora_model") to save locally.
- Online Saving: Use model.push_to_hub("your_name/lora_model", token = "...") to save online to Hugging Face's model hub.
- Saving to Float16: Use model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",) for saving in float16 format.
- Saving to 4 bit: Use model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",) for saving in 4 bit format.
To save to GGUF / llama.cpp, you can also use save_pretrained_gguf for local saving and push_to_hub_gguf for uploading to HF.
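A minimal sketch of the GGUF export, assuming Unsloth's save_pretrained_gguf / push_to_hub_gguf methods; the repo name, quantization method (q4_k_m), and token here are placeholders for your own values:

# Save a GGUF file locally (quantization_method is passed through to llama.cpp).
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")

# Or upload the GGUF file to the Hugging Face Hub.
model.push_to_hub_gguf("your_name/lora_model_gguf", tokenizer,
                       quantization_method = "q4_k_m", token = "...")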
And one more question.
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Answer the thought-provoking question.", # instruction
            "Is China the West 'North Korea'?", # input
            "", # output - leave this blank for generation!
        )
    ], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)
Setting `pad_token_id` to `eos_token_id`: 128001 for open-end generation.

<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Answer the thought-provoking question.

### Input:
Is China the West 'North Korea'?

### Response:
China is not the West's "North Korea." While China and North Korea share some similarities, such as being communist countries, they are two distinct nations with different histories, cultures, and political systems. China is a much larger and more powerful country than North Korea, with a population of over 1.4 billion people and a GDP of over $14 trillion. North Korea, on the other hand, has a population of around 25 million people and a GDP of around $40 billion. China is also a member of the United Nations and has a seat on the UN Security Council, while North Korea is not a member of the UN and has been subject to international sanctions for its nuclear weapons program. In conclusion, while China and North Korea may share some similarities, they are two separate countries with different political systems, economies, and international standing.<|end_of_text|>
Why so serious?