# Low-rank Adaptation for Fast Text-to-Image Diffusion Fine-tuning
<p align="center">
<img src="contents/alpha_scale.gif">
</p>
> Using LoRA to fine-tune on an illustration dataset: $W = W_0 + \alpha \Delta W$, where $\alpha$ is the merging ratio. The GIF above scales $\alpha$ from 0 to 1: setting $\alpha$ to 0 is the same as using the original model, and setting $\alpha$ to 1 is the same as using the fully fine-tuned model.
<p align="center">
<img src="contents/lora_pti_example.jpg">
</p>
> SD 1.5 PTI on Kiriko, the game character, with various prompts.
<p align="center">
<img src="contents/disney_lora.jpg">
</p>
> `"baby lion in style of <s1><s2>"`, with disney-style LoRA model.
<p align="center">
<img src="contents/pop_art.jpg">
</p>
> `"superman, style of <s1><s2>"`, with pop-art style LoRA model.
## Main Features
- Fine-tune Stable Diffusion models twice as fast as the Dreambooth method, using Low-rank Adaptation
- Get an insanely small end result (1 MB ~ 6 MB), easy to share and download.
- Compatible with `diffusers`
- Support for inpainting
- Sometimes _even better performance_ than full fine-tuning (though extensive comparisons are left as future work)
- Merge checkpoints + Build recipes by merging LoRAs together
- Pipeline to fine-tune CLIP + UNet + token embeddings to gain better results.
- Out-of-the box multi-vector pivotal tuning inversion
# Lengthy Introduction
Thanks to the generous work of Stability AI and Hugging Face, many people have enjoyed fine-tuning Stable Diffusion models to fit their needs and generate higher-fidelity images. **However, the fine-tuning process is very slow, and it is not easy to find a good balance between the number of steps and the quality of the results.**
Also, the final result (a fully fine-tuned model) is very large. Some people work with textual inversion as an alternative, but this is clearly suboptimal: textual inversion only creates a small word embedding, and the final images are not as good as those of a fully fine-tuned model.
Well, what's the alternative? In the domain of LLMs, researchers have developed efficient fine-tuning methods. LoRA, in particular, tackles the very problem the community currently has: end users with the open-sourced Stable Diffusion model want to try the various fine-tuned models created by the community, but those models are too large to download and use. LoRA instead fine-tunes the "residual" of the model rather than the entire model: i.e., it trains $\Delta W$ instead of $W$.
$$
W' = W + \Delta W
$$
where we can further decompose $\Delta W$ into low-rank matrices: $\Delta W = A B^T$, where $A \in \mathbb{R}^{n \times d}$, $B \in \mathbb{R}^{m \times d}$, and $d \ll n$.
This is the key idea of LoRA: we fine-tune $A$ and $B$ instead of $W$. In the end, you get an insanely small model, as $A$ and $B$ are much smaller than $W$.
Also, not all of the parameters need tuning: the authors found that tuning only $Q, K, V, O$ (i.e., the attention layers) of the transformer model is often enough. (This is also why the end result is so small.) This repo follows the same idea.
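To make the decomposition concrete, here is a minimal sketch of the idea in PyTorch. This is illustrative only, not this repo's actual implementation; `LoRALinear`, `rank`, and `alpha` are names chosen for this example, with `rank` playing the role of $d$ above.
```python
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank residual (sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        self.base.requires_grad_(False)  # freeze W
        # down plays the role of B^T, up the role of A
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.normal_(self.down.weight, std=1.0 / rank)
        nn.init.zeros_(self.up.weight)  # training starts at Delta W = 0
        self.alpha = alpha

    def forward(self, x):
        # W x + alpha * A (B^T x), without ever materializing Delta W
        return self.base(x) + self.alpha * self.up(self.down(x))
```
Because only `down` and `up` are trainable (and saved), the end result stays in the 1 MB ~ 6 MB range.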
Now, how would we actually use this to update a diffusion model? First, we will use Stable Diffusion from [stability-ai](https://stability.ai/). Their model is nicely ported through the Hugging Face API, so this repo has built various fine-tuning methods around it. In detail, there are three subtle but important distinctions among the methods that make this work.
1. [Dreambooth](https://arxiv.org/abs/2208.12242)
First, there is LoRA applied to Dreambooth. The idea is to use prior-preservation class images to regularize the training process, and to use rare tokens. This keeps the model's generalization capability while maintaining high fidelity. If you turn off prior preservation and train the text encoder embedding as well, it becomes naive fine-tuning.
2. [Textual Inversion](https://arxiv.org/abs/2208.01618)
Second, there is textual inversion. There is no room to apply LoRA here, but it is worth mentioning. The idea is to instantiate a new token and learn its embedding via gradient descent. This is a very powerful method, and it is worth trying out if your use case is focused not on fidelity but on inverting conceptual ideas.
3. [Pivotal Tuning](https://arxiv.org/abs/2106.05744)
The last method (although originally proposed for GANs) takes the best of both worlds.
Simply put, you first apply textual inversion to get a matching token embedding. Then, you use that token embedding together with prior-preservation class images to fine-tune the model. This two-stage nature makes it a strict generalization of both methods.
Enough of the lengthy introduction, let's get to the code.
# Installation
```bash
pip install git+https://github.com/cloneofsimo/lora.git
```
# Getting Started
## 1. Fine-tuning Stable Diffusion with LoRA CLI
If you have over 12 GB of GPU memory, it is recommended to use the Pivotal Tuning Inversion CLI provided with the LoRA implementation. It has the best performance and will be updated many times in the future as well. These are the parameters that worked for various datasets. _ALL OF THE EXAMPLES ABOVE WERE TRAINED WITH THE PARAMETERS BELOW._
```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="./data/data_disney"
export OUTPUT_DIR="./exps/output_dsn"
lora_pti \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --train_text_encoder \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --scale_lr \
  --learning_rate_unet=1e-4 \
  --learning_rate_text=1e-5 \
  --learning_rate_ti=5e-4 \
  --color_jitter \
  --lr_scheduler="linear" \
  --lr_warmup_steps=0 \
  --placeholder_tokens="<s1>|<s2>" \
  --use_template="style" \
  --save_steps=100 \
  --max_train_steps_ti=1000 \
  --max_train_steps_tuning=1000 \
  --perform_inversion=True \
  --clip_ti_decay \
  --weight_decay_ti=0.000 \
  --weight_decay_lora=0.001 \
  --continue_inversion \
  --continue_inversion_lr=1e-4 \
  --device="cuda:0" \
  --lora_rank=1
  # --use_face_segmentation_condition
```
[Check here to see what these parameters mean](https://github.com/cloneofsimo/lora/discussions/121).
## 2. Other Options
Basic usage is as follows: inject sets of $A, B$ matrices into a UNet model, and fine-tune them.
```python
import itertools

import torch.optim as optim
from diffusers import UNet2DConditionModel

from lora_diffusion import inject_trainable_lora, extract_lora_ups_down

...

unet = UNet2DConditionModel.from_pretrained(
    pretrained_model_name_or_path,
    subfolder="unet",
)
unet.requires_grad_(False)  # freeze the whole unet first

# Inject trainable LoRA parameters; only the injected A, B matrices
# will require gradients.
unet_lora_params, train_names = inject_trainable_lora(unet)

optimizer = optim.Adam(
    itertools.chain(*unet_lora_params, text_encoder.parameters()), lr=1e-4
)
```
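Once trained, only the injected $A, B$ parameters need to be saved. The repo provides a `save_lora_weight` helper for this; a short sketch (check the package for the exact signature):
```python
from lora_diffusion import save_lora_weight

# Persist only the trained LoRA matrices -- this is what keeps the
# resulting file in the low-megabyte range.
save_lora_weight(unet, "./lora_weight.pt")
```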
Another example of this, applied to [Dreambooth](https://arxiv.org/abs/2208.12242), can be found in `training_scripts/train_lora_dreambooth.py`. Run this example with:
```bash
training_scripts/run_lora_db.sh
```
Another Dreambooth example, with text encoder training enabled, can be run with:
```bash
training_scripts/run_lora_db_w_text.sh
```
## Loading, merging, and interpolating trained LoRAs with CLIs
We've seen that people have been merging different checkpoints with different ratios, and this seems to be very useful to the community. LoRA is extremely easy to merge.
By the nature of LoRA, one can interpolate between different fine-tuned models by adding different $A, B$ matrices.
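As a minimal sketch of what such merging does per layer, assuming each LoRA is stored as an `(up, down)` pair of tensors (illustrative arithmetic only, not the CLI's internal code):
```python
import torch


def merge_lora_into_weight(
    W: torch.Tensor, up: torch.Tensor, down: torch.Tensor, alpha: float = 1.0
) -> torch.Tensor:
    # W' = W + alpha * Delta W, with Delta W = up @ down
    return W + alpha * (up @ down)


def interpolate_loras(
    up_a: torch.Tensor, down_a: torch.Tensor,
    up_b: torch.Tensor, down_b: torch.Tensor, t: float,
) -> torch.Tensor:
    # Linearly interpolate two residuals for the same layer:
    # Delta W = (1 - t) * up_a @ down_a + t * up_b @ down_b
    return (1 - t) * (up_a @ down_a) + t * (up_b @ down_b)
```
The `lora_add` CLI operates on whole checkpoints, but the per-layer arithmetic is this simple.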
Currently, the LoRA CLI has three options: merge a full model with a LoRA, merge a LoRA with another LoRA, or merge a full model with a LoRA and convert it to `ckpt` format (the original format).
```
SYNOPSIS
lora_add PATH_1 PATH_2 OUTPUT_PATH <flags>
POSITIONAL ARGUMENTS