Training realtime video LoRAs for fun and profit



I recently found this really cool LoRA that somebody had trained on top of the Qwen image model to give a nice pixel art aesthetic to the images it generated and wanted to see if I could apply the same effect to videos.

What are LoRAs?

If we think of a generative image / video model as a giant set of weights then a LoRA is just a minor adjustment of those weights that nudges the model in whichever direction we want.

Instead of retraining a massive, multi-billion parameter video model from scratch just to teach it a new aesthetic, a LoRA acts like a lightweight, modular patch. It freezes the original base model and trains a tiny side-network of new weights that piggyback on top.

For video generation workflows, this means you can snap a small, megabyte-scale LoRA file onto a massive base model to steer its output. It lets you reliably render custom characters, specific cinematic lighting, or even distinct camera movements, all without needing a dedicated server farm to retrain anything.
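To make the "lightweight patch" idea concrete, here is a minimal numpy sketch of the standard low-rank trick: a frozen weight matrix W gets a trainable adjustment (alpha / r) * B @ A, where A and B are tiny low-rank matrices. The shapes and scaling here are illustrative, not the exact parameterisation of any particular video model.

```python
import numpy as np

d_out, d_in, r, alpha = 512, 512, 16, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen base weight (never trained)
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero init

def lora_forward(x, strength=1.0):
    # Base output plus the low-rank "nudge", scaled by a runtime strength knob
    return x @ W.T + strength * (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((1, d_in))
# With B initialised to zero, the LoRA starts out as an exact no-op:
assert np.allclose(lora_forward(x), x @ W.T)
```

Note the parameter count: A and B together hold r * (d_in + d_out) values versus d_in * d_out for W itself, which is why a LoRA file is megabytes while the base model is gigabytes. The `strength` knob is also what lets you dial the effect up and down at inference time.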

Data, data, data

Ryan recently released a great framework for generating Wan 2.1 Video LoRAs from synthetic data. What this means is that rather than going through the pain of searching for, curating, and polishing a large dataset, we can just:

1. Find an image model from somewhere like Civit.ai that gives us an interesting effect, which in this case was my pixel art model

Cosy

2. Generate a set of prompts related to the effect


3. Generate a set of images based on those prompts

4. Generate a set of short videos based on those images

5. Train your LoRA from a combination of the generated images and videos

6. Profit!
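The six steps above can be sketched as a simple pipeline. The function bodies below are placeholders: the real framework drives an image model, an image-to-video model, and a LoRA trainer, and none of these function names come from it.

```python
# Structural sketch of the synthetic-data pipeline (all bodies are stubs).

def generate_prompts(effect, n):
    # Step 2: e.g. ask an LLM for n scene descriptions in the target style
    return [f"{effect}, scene {i}" for i in range(n)]

def generate_images(prompts):
    # Step 3: run each prompt through the styled image model
    return [f"image<{p}>" for p in prompts]

def generate_videos(images, frames):
    # Step 4: animate a subset of the images into short clips
    return [f"video<{img}, {frames} frames>" for img in images]

def train_lora(images, videos):
    # Step 5: fit the LoRA on the combined image + video set
    return {"train_samples": len(images) + len(videos)}

prompts = generate_prompts("pixel art", n=30)
images = generate_images(prompts)
videos = generate_videos(images[:20], frames=33)
lora = train_lora(images, videos)
print(lora)  # {'train_samples': 50}
```

The point of the structure is that every stage is generated, so iterating on the effect only means changing the seed image model and the prompt set.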

What surprised me most was that you need very, very little data to get a good effect. Ryan's suggested configuration is:

  • 30 images at 768x768
  • 20 videos at 640x640, 33 frames
  • 64 total training epochs (two runs of 32)
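Rendered as a config fragment, the suggested setup looks something like this (the key names are hypothetical, not the framework's actual options):

```python
# Hypothetical rendering of the suggested dataset/training configuration.
config = {
    "images": {"count": 30, "resolution": (768, 768)},
    "videos": {"count": 20, "resolution": (640, 640), "frames": 33},
    # 64 epochs total, split across two runs of 32
    "training": {"runs": 2, "epochs_per_run": 32},
}

total_epochs = config["training"]["runs"] * config["training"]["epochs_per_run"]
print(total_epochs)  # 64
```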

Did it work?

Surprisingly, yes! It came out exactly as I wanted, really nailing that LucasArts-era pixel art vibe.

The legend of Monkey Thomland

I ran the training on a single NVIDIA RTX 5090 GPU, and the whole process (including all of the synthetic data generation) took just a few hours.

The plugin is ready to try in Scope and can be downloaded from Civit.ai here.

Check out this demo of me trying it out with the LongLive video model, changing the strength of the LoRA in realtime. I used Scope's remote inference capability since I don't have a local GPU.