VACE + DepthMap experiment. I found a fairly abstract depth map video on Civitai (from https://civitai.com/user/Synthesense) and generated a number of prompts with ChatGPT. The output feels like a good starting point for music visuals.
Future improvements:
* It would be cool to make this audio-reactive, scheduling the prompt updates based on beats.
* It would be nice to chain this with an upscaling model like FlashVSR.
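
A rough sketch of how the beat-based prompt scheduling could work, assuming the track has a fixed, known tempo (no actual beat detection). It only maps beats to frame indices and cycles through a prompt list. The function names, the 16 fps frame rate, and the 81-frame clip length are placeholders I made up for illustration, not part of VACE or any existing workflow.

```python
from typing import List, Tuple


def beat_frames(bpm: float, fps: float, num_frames: int,
                beats_per_change: int = 4) -> List[int]:
    """Frame indices where the prompt should switch (hypothetical helper).

    Assumes a constant tempo; a real audio-reactive setup would take
    beat timestamps from a beat tracker instead.
    """
    if bpm <= 0 or fps <= 0 or beats_per_change < 1:
        raise ValueError("bpm and fps must be positive, beats_per_change >= 1")

    seconds_per_change = 60.0 / bpm * beats_per_change
    frames: List[int] = []
    k = 0
    while True:
        frame = round(k * seconds_per_change * fps)
        if frame >= num_frames:
            break
        # Skip duplicates that rounding can produce at very high tempos.
        if not frames or frame != frames[-1]:
            frames.append(frame)
        k += 1
    return frames


def schedule_prompts(prompts: List[str], bpm: float, fps: float,
                     num_frames: int,
                     beats_per_change: int = 4) -> List[Tuple[int, str]]:
    """Pair each switch frame with a prompt, cycling through the list."""
    if not prompts:
        raise ValueError("need at least one prompt")
    switch_frames = beat_frames(bpm, fps, num_frames, beats_per_change)
    return [(frame, prompts[i % len(prompts)])
            for i, frame in enumerate(switch_frames)]


if __name__ == "__main__":
    # Placeholder values: 120 BPM track, 16 fps video, 81 frames.
    for frame, prompt in schedule_prompts(
            ["neon tunnel", "liquid chrome"], bpm=120, fps=16, num_frames=81):
        print(frame, prompt)
```

The resulting (frame, prompt) pairs would still need to be fed into whatever prompt-scheduling mechanism the generation workflow supports; that part is not covered here.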