# Low-rank Adaptation for Fast Text-to-Image Diffusion Fine-tuning
<!-- #region -->
<p align="center">
<img src="contents/alpha_scale.gif">
</p>
<!-- #endregion -->
> Using LoRA to fine-tune on an illustration dataset: $W = W_0 + \alpha \Delta W$, where $\alpha$ is the merging ratio. The gif above scales $\alpha$ from 0 to 1: setting $\alpha$ to 0 is the same as using the original model, and setting $\alpha$ to 1 is the same as using the fully fine-tuned model.
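The merging ratio can be applied directly to the weights. Below is a minimal sketch of that interpolation on a single weight matrix (tensor names and shapes are made up for illustration; the repo ships its own utilities for this):

```python
# Minimal sketch of alpha-merging: W = W_0 + alpha * delta_W.
# alpha = 0 reproduces the original model, alpha = 1 the fully fine-tuned one.
import torch

def merge_weight(w0: torch.Tensor, delta_w: torch.Tensor, alpha: float) -> torch.Tensor:
    return w0 + alpha * delta_w

w0 = torch.randn(320, 320)                            # original weight
delta_w = torch.randn(320, 4) @ torch.randn(4, 320)   # low-rank LoRA residual
for alpha in (0.0, 0.5, 1.0):
    merged = merge_weight(w0, delta_w, alpha)
```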
<!-- #region -->
<p align="center">
<img src="contents/lora_pti_example.jpg">
</p>
<!-- #endregion -->
> SD 1.5 PTI on Kiriko, the game character, with various prompts.
<!-- #region -->
<p align="center">
<img src="contents/disney_lora.jpg">
</p>
<!-- #endregion -->
> `"baby lion in style of <s1><s2>"`, with disney-style LoRA model.
<!-- #region -->
<p align="center">
<img src="contents/pop_art.jpg">
</p>
<!-- #endregion -->
> `"superman, style of <s1><s2>"`, with pop-art style LoRA model.
## Main Features
- Fine-tune Stable Diffusion models twice as fast as the Dreambooth method, using Low-rank Adaptation
- Get an insanely small end result (1 MB ~ 6 MB), easy to share and download.
- Compatible with `diffusers`
- Support for inpainting
- Sometimes _even better performance_ than full fine-tuning (extensive comparisons are left as future work)
- Merge checkpoints + build recipes by merging LoRAs together
- Pipeline to fine-tune CLIP + Unet + token to gain better results.
- Out-of-the-box multi-vector pivotal tuning inversion
# Web Demo
- Integrated into [Huggingface Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). Try out the [Web Demo](https://huggingface.co/spaces/lora-library/LoRA-DreamBooth-Training-UI)
- Easy [colab running example](https://colab.research.google.com/drive/1iSFDpRBKEWr2HLlz243rbym3J2X95kcy?usp=sharing) of Dreambooth by @pedrogengo
# UPDATES & Notes
### 2023/02/06
- Support for training inpainting on LoRA PTI. Use the flag `--train-inpainting` with an inpainting Stable Diffusion base model (see `inpainting_example.sh`).
### 2023/02/01
- LoRA Joining is now available with the `--mode=ljl` flag. Only three parameters are required: `path_to_lora1`, `path_to_lora2`, and `path_to_save`.
### 2023/01/29
- Dataset pipelines
- LoRA applied to ResNet layers as well; use `--use_extended_lora` to enable it.
- SVD distillation now supports ResNet LoRA as well.
- The CompVis-format conversion script now works with safetensors, and for PTI it will also return a Textual Inversion format embedding, so you can use it in the embeddings folder.
- LoRA is now officially integrated into the amazing Huggingface 🤗 `diffusers` library! Check out the [Blog](https://huggingface.co/blog/lora) and [examples](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image#training-with-lora)! (NOTE: it currently uses a DIFFERENT FILE FORMAT.)
### 2023/01/09
- Pivotal Tuning Inversion with extended latent
- Better textual inversion with Norm prior
- Mask conditioned score estimation loss
- safetensor support, xformers support (thanks to @[hafriedlander](https://github.com/hafriedlander))
- Distill a fully trained model to LoRA with the SVD distillation CLI (a rough sketch of the idea follows below)
- Flexible dataset support
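As a rough illustration of the SVD distillation idea (not the CLI's actual implementation; shapes and names are made up), the fine-tuning residual of a single weight matrix can be compressed to a rank-$d$ factorization with a truncated SVD:

```python
# Approximate delta_W = W_finetuned - W_base with a truncated SVD of rank `rank`.
import torch

def svd_distill(w_base: torch.Tensor, w_finetuned: torch.Tensor, rank: int = 4):
    """Return (up, down) such that up @ down approximates w_finetuned - w_base."""
    delta = (w_finetuned - w_base).float()
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    up = u[:, :rank] * s[:rank]   # (n, rank), singular values folded into the columns
    down = vh[:rank, :]           # (rank, m)
    return up, down

# Stand-in weights for a single attention projection:
w_base = torch.randn(320, 320)
w_finetuned = w_base + 0.01 * (torch.randn(320, 4) @ torch.randn(4, 320))
up, down = svd_distill(w_base, w_finetuned, rank=4)
print((w_base + up @ down - w_finetuned).abs().max())  # near-zero reconstruction error
```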
### 2022/12/22
- Pivotal Tuning now available with `run_lorpt.sh`
- More Utilities added, such as datasets, `patch_pipe` function to patch CLIP, Unet, Token all at once.
- Adjustable Ranks, Fine-tuning Feed-forward layers.
- More example notebooks added.
### 2022/12/10
- **You can now fine-tune text_encoder as well! Enabled with simple `--train_text_encoder`**
- **Converting to CKPT format for A1111's repo consumption!** (Thanks to [jachiam](https://github.com/jachiam)'s conversion script)
- Img2Img Examples added.
- Please use a large learning rate! Around 1e-4 worked well for me, but certainly not around 1e-6, which will not be able to learn anything.
# Lengthy Introduction
Thanks to the generous work of Stability AI and Huggingface, so many people have enjoyed fine-tuning stable diffusion models to fit their needs and generate higher fidelity images. **However, the fine-tuning process is very slow, and it is not easy to find a good balance between the number of steps and the quality of the results.**
Also, the final result (a fully fine-tuned model) is very large. Some people instead work with textual inversion as an alternative. But clearly this is suboptimal: textual inversion only creates a small word embedding, and the final image is not as good as that of a fully fine-tuned model.
Well, what's the alternative? In the domain of LLMs, researchers have developed efficient fine-tuning methods. LoRA, especially, tackles the very problem the community currently has: end users with the open-sourced Stable Diffusion model want to try the various fine-tuned models created by the community, but those models are too large to download and use. LoRA instead fine-tunes the "residual" of the model rather than the entire model: i.e., it trains $\Delta W$ instead of $W$.
$$
W' = W + \Delta W
$$
where we can further decompose $\Delta W$ into low-rank matrices: $\Delta W = A B^T$, where $A \in \mathbb{R}^{n \times d}$, $B \in \mathbb{R}^{m \times d}$, and $d \ll n$.
This is the key idea of LoRA. We can then fine-tune $A$ and $B$ instead of $W$. In the end, you get an insanely small model as $A$ and $B$ are much smaller than $W$.
Also, not all of the parameters need tuning: they found that often, tuning only $Q, K, V, O$ (i.e., the attention layers) of the transformer model is enough. (This is also the reason why the end result is so small.) This repo follows the same idea.
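As a rough sketch of the idea (not this repo's actual implementation, and with arbitrary dimensions), a LoRA-wrapped linear layer keeps the original weight frozen and trains only the low-rank factors:

```python
# Minimal LoRA linear layer sketch: the frozen base weight W plus a trainable
# low-rank residual, scaled by alpha. Names and shapes are illustrative only.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the original W (and bias)
            p.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)   # A
        self.up = nn.Linear(rank, base.out_features, bias=False)    # B
        nn.init.normal_(self.down.weight, std=1 / rank)
        nn.init.zeros_(self.up.weight)      # start with delta_W = 0
        self.alpha = alpha

    def forward(self, x):
        return self.base(x) + self.alpha * self.up(self.down(x))

# Example: wrap a 320-dim attention projection; only ~2.5k parameters are trainable.
layer = LoRALinear(nn.Linear(320, 320), rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2560 = 320*4 + 4*320
```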
Now, how would we actually use this to update a diffusion model? First, we will use Stable Diffusion from [stability-ai](https://stability.ai/). Their model is nicely ported through the Huggingface API, so this repo has built various fine-tuning methods around it. In detail, there are three subtle but important distinctions in the methods that make this work out.
1. [Dreambooth](https://arxiv.org/abs/2208.12242)
First, there is LoRA applied to Dreambooth. The idea is to use prior-preservation class images to regularize the training process, and to use rarely occurring tokens. This keeps the model's generalization capability while maintaining high fidelity. If you turn off prior preservation and train the text encoder embedding as well, it becomes naive fine-tuning.
2. [Textual Inversion](https://arxiv.org/abs/2208.01618)
Second, there is Textual Inversion. There is no room to apply LoRA here, but it is worth mentioning. The idea is to instantiate a new token and learn its embedding via gradient descent. This is a very powerful method, and it is worth trying out if your use case is not focused on fidelity but rather on inverting conceptual ideas.
3. [Pivotal Tuning](https://arxiv.org/abs/2106.05744)
The last method (although originally proposed for GANs) takes the best of both worlds. When combined, it can be implemented as a strict generalization of both methods.
Simply put, you first apply textual inversion to get a matching token embedding. Then, you use the token embedding plus prior-preserving class images to fine-tune the model. This two-fold nature makes it a strict generalization of both methods; a rough sketch of the two phases follows below.
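As a sketch of which parameters each phase touches (using `diffusers` objects; the training loops are omitted, and this is illustrative rather than this repo's CLI):

```python
# Sketch of the two phases of pivotal tuning on top of diffusers
# (illustrative only; the actual training loops live in this repo's CLI).
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Phase 1 -- textual inversion: add placeholder tokens and train only their embeddings.
pipe.tokenizer.add_tokens(["<s1>", "<s2>"])
pipe.text_encoder.resize_token_embeddings(len(pipe.tokenizer))
for p in pipe.text_encoder.parameters():
    p.requires_grad_(False)
pipe.text_encoder.get_input_embeddings().weight.requires_grad_(True)
# (in practice, gradients are masked so only the new embedding rows get updated)

# Phase 2 -- LoRA fine-tuning: freeze the base UNet and train only low-rank adapters
# injected into its attention projections (see the LoRALinear sketch above),
# with training prompts that include the learned <s1><s2> tokens.
for p in pipe.unet.parameters():
    p.requires_grad_(False)
```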
Enough of the lengthy introduction, let's get to the code.
# Installation
```bash
pip install git+https://github.com/cloneofsimo/lora.git
```
# Getting Started
## 1. Fine-tuning Stable diffusion with LoRA CLI
If you have over 12 GB of memory, it is recommended to use the Pivotal Tuning Inversion CLI provided with this LoRA implementation. It has the best performance and will be updated many times in the future as well. These are the parameters that worked for various datasets. _ALL OF THE EXAMPLES ABOVE WERE TRAINED WITH THE PARAMETERS BELOW._
```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="./data/data_disn
