Configure Parameters and Fine-Tune the Model

  • (Optional) Modify model_name_or_path and template in settings.jsonc to use another locally downloaded model.

  • Adjust per_device_train_batch_size and gradient_accumulation_steps to control VRAM usage.

  • Depending on the quantity and quality of your dataset, you can adjust the following values in train_sft_args to improve results:

    • num_train_epochs

    • lora_rank

    • lora_dropout
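Taken together, a minimal sketch of how these knobs might sit in settings.jsonc (the values are illustrative starting points, not recommendations, and the exact nesting should be checked against your own file):

```jsonc
{
  "train_sft_args": {
    "per_device_train_batch_size": 2,  // lower to reduce VRAM usage
    "gradient_accumulation_steps": 8,  // raise to keep the effective batch size when the above is lowered
    "num_train_epochs": 3,             // more epochs can help small, clean datasets; watch for overfitting
    "lora_rank": 8,                    // higher rank adds capacity at the cost of VRAM
    "lora_dropout": 0.1                // regularization against memorizing the dataset
  }
}
```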

Single-GPU Training

Run the following command to start fine-tuning with a single GPU:

weclone-cli train-sft

If you're in a multi-GPU environment but want to use only one GPU, run this command first:

export CUDA_VISIBLE_DEVICES=0

Multi-GPU Training

  1. Uncomment the deepspeed line in settings.jsonc.

  2. Install DeepSpeed:

uv pip install deepspeed

  3. Start multi-GPU training (replace number_of_gpus with the number of GPUs you want to use):
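The launch command for this step isn't shown above. A plausible invocation, assuming the SFT entry point lives at weclone/train/train_sft.py (a guess — check your checkout), uses the standard DeepSpeed launcher:

```shell
# Hypothetical entry-point path; verify it against your WeClone checkout.
deepspeed --num_gpus=number_of_gpus weclone/train/train_sft.py
```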

Run Web Demo for Inference
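The exact command is not shown in this export. Assuming the demo follows the same CLI pattern as the training step (the subcommand name is an assumption), it would look like:

```shell
# Subcommand name assumed -- confirm with `weclone-cli --help`.
weclone-cli webchat-demo
```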

You can use this step to test appropriate temperature and top_p values, and then update the infer_args in settings.jsonc for future inference.
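Once you've settled on values you like in the demo, they can be written back into settings.jsonc. A sketch, assuming infer_args takes standard sampling fields (verify the field names against your own file):

```jsonc
{
  "infer_args": {
    "temperature": 0.7,  // higher values give more varied replies
    "top_p": 0.9         // nucleus-sampling cutoff
  }
}
```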

Run API Server for Inference
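Servers in this kind of project typically expose an OpenAI-compatible chat endpoint. The sketch below only builds and prints such a request; the host, port, and model name are placeholders, not values from this guide — adjust them to match your settings.jsonc:

```python
import json
from urllib import request

# Placeholder endpoint: adjust host/port to match your settings.jsonc.
API_URL = "http://127.0.0.1:8005/v1/chat/completions"

# OpenAI-style chat payload; the model name is a placeholder the server
# may ignore or map to the fine-tuned weights.
payload = {
    "model": "weclone",
    "messages": [{"role": "user", "content": "Hey, are you free this weekend?"}],
    "temperature": 0.7,  # reuse the values you settled on in the web demo
    "top_p": 0.9,
}

req = request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the API server is actually running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(json.dumps(payload, indent=2))
```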

Test with Common Chat Scenarios

These test cases avoid personal-information questions and focus on everyday conversation. Test results are saved to test_result-my.txt.
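The command that runs these test cases isn't shown here; if it follows the CLI pattern used elsewhere in this guide, it likely resembles (subcommand name unverified):

```shell
# Subcommand name assumed -- confirm with `weclone-cli --help`.
weclone-cli test-model
```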
