I found a fairly abstract depth map video on Civitai (from https://civitai.com/user/Synthesense), generated a number of prompts with ChatGPT, and built this real-time visuals generator. The output feels like a good starting point for live music visuals.
This project uses:
* LongLive (can run on an RTX 4090)
* VACE (for depth map input control)
Future improvements:
* An easy improvement is to use reference images, which can provide better control than text prompts.
* It would be cool to make this audio-reactive, scheduling the prompt updates based on beats.
* It would be nice to chain this with an upscaling model like FlashVSR.
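
As a rough sketch of the audio-reactive idea above, here is one way the prompt updates could be scheduled on beats. Everything in it is an assumption for illustration: `beats_from_bpm` and `schedule_prompts` are hypothetical helpers, the beat times come from a fixed BPM rather than a real beat tracker, and how each prompt is actually sent to the video pipeline is left out.

```python
from itertools import cycle


def beats_from_bpm(bpm, duration_s):
    """Return evenly spaced beat times in seconds for a fixed tempo.

    Hypothetical stand-in for a real beat tracker, which would
    produce these timestamps from the audio itself.
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    interval = 60.0 / bpm
    count = int(duration_s / interval)
    return [i * interval for i in range(count)]


def schedule_prompts(beat_times, prompts, beats_per_prompt=4):
    """Pair every Nth beat with the next prompt, looping over the prompt list.

    Returns a list of (time_seconds, prompt) tuples.
    """
    if not prompts:
        raise ValueError("prompts must not be empty")
    if beats_per_prompt < 1:
        raise ValueError("beats_per_prompt must be at least 1")
    prompt_cycle = cycle(prompts)
    schedule = []
    for index, beat_time in enumerate(beat_times):
        # Switch prompts only on every Nth beat (e.g. once per bar in 4/4).
        if index % beats_per_prompt == 0:
            schedule.append((beat_time, next(prompt_cycle)))
    return schedule


if __name__ == "__main__":
    beats = beats_from_bpm(bpm=120, duration_s=8.0)
    for time_s, prompt in schedule_prompts(beats, ["neon fog", "liquid chrome"]):
        print(f"{time_s:5.2f}s -> {prompt}")
```

Switching once per bar rather than on every beat is just a guess at what would look good; `beats_per_prompt` is there so that can be tuned by ear.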