Move + Sound + Vision: Live Human-Machine Interplay [WIP]

Project presentation and overview is here - https://youtu.be/lK6hdnlTZLg

Concept & Background

I am currently developing an evolution of the "Sound + Vision" project, originally conceived for a live concert setting - https://app.daydream.live/creators/oceanradiostation/soundvision-experimental-sound-performance.

My goal is to refine this into a modular instrument for the modern performer (musician, DJ, dancer) that allows for total control over audio and image simultaneously. This project moves beyond standard audio-reactive visualizations. It empowers the artist to be the conductor of a synesthetic experience, proving that in the age of AI, the human element - our movement and musicality - remains the central driver of creativity.

The core idea is to create a bi-directional feedback loop where the performer's body and musical choices drive the generative reality in real-time.

Current Workflow & Mechanics

I am building a system of interdependent parameters between TouchDesigner and Ableton Live. Here is the current setup I am iterating on:

  • Curation as the human core: The human element isn't just movement; it is the intentional curation of the sonic palette and the specific prompt sequences. I am designing the sample banks (Ableton) and the prompt architecture (StreamDiffusion) to ensure the AI operates within a specific, cohesive aesthetic rather than generating random chaos.
  • Dual-purpose MIDI control: I am setting up the MIDI controller to perform two tasks simultaneously: triggering specific audio clips in Ableton and changing the prompt context within StreamDiffusion. This ensures the visual vibe shifts instantly with the musical arrangement.
  • MediaPipe integration: I am implementing MediaPipe as a virtual MIDI controller. The goal is for specific hand gestures to modulate audio effects (like filters or reverb) while simultaneously manipulating the visual input parameters fed into the diffusion engine.
  • Camera pre-processing: I am optimizing the camera feed so that StreamDiffusion interprets the figure in the frame accurately, maintaining a recognizable link between the performer and the generation.
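The dual-purpose MIDI idea above can be sketched as a single dispatch step: one incoming note looks up a "scene" and fires both an audio action and a prompt change. This is only an illustration under assumptions, not the project's actual network. The note numbers, clip slots, prompt texts, and the two callbacks (`launch_clip`, `set_prompt`) are all hypothetical placeholders for whatever Ableton and StreamDiffusion hooks are really used.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Scene:
    clip_slot: int  # hypothetical index of a clip to launch in Ableton
    prompt: str     # hypothetical prompt text for the diffusion engine


# Hypothetical mapping from MIDI note numbers to scenes.
SCENES: Dict[int, Scene] = {
    36: Scene(clip_slot=0, prompt="ink wash ocean, monochrome"),
    37: Scene(clip_slot=1, prompt="neon coral reef, high contrast"),
}


def dispatch_note(
    note: int,
    velocity: int,
    launch_clip: Callable[[int], None],
    set_prompt: Callable[[str], None],
) -> bool:
    """Fire the audio and visual actions for one MIDI note-on.

    Returns True if the note was mapped and both actions were called.
    """
    if velocity == 0:
        # A note-on with velocity 0 is conventionally treated as note-off.
        return False
    scene = SCENES.get(note)
    if scene is None:
        return False
    launch_clip(scene.clip_slot)  # audio side
    set_prompt(scene.prompt)      # visual side, called in the same step
    return True
```

Keeping both actions behind one lookup table means the sonic palette and the prompt sequence are curated in one place, which matches the "curation as the human core" point.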

Current Challenges & Next Steps

My main focus right now is on optimization and mapping.

  • System Load: I am configuring the dependency grid to ensure the data flow is efficient and doesn't overload the CPU/GPU, allowing for real-time fluidity.
  • Gestural Mapping: I am experimenting to find the most natural gestures and map them to the right parameters. I want the connection between a hand movement, the resulting sound effect, and the visual distortion to feel intuitive and seamless.
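Both challenges above meet in the step that turns a hand position into a MIDI value. A minimal sketch, assuming a normalized hand coordinate in the range 0..1 as input: the value is smoothed so the control feels less jittery, and a message is only emitted when the 7-bit CC value actually changes, which keeps the message rate down. The class name and the `smoothing` and `deadband` parameters are hypothetical and would need tuning by feel.

```python
from typing import Optional


class GestureToCC:
    """Map a normalized coordinate (0..1) to a MIDI CC value (0..127)."""

    def __init__(self, smoothing: float = 0.3, deadband: int = 1) -> None:
        self.smoothing = smoothing  # 1.0 = no smoothing, smaller = slower
        self.deadband = deadband    # minimum CC change worth sending
        self._filtered: Optional[float] = None
        self._last_sent: Optional[int] = None

    def update(self, x: float) -> Optional[int]:
        """Return a new CC value, or None if nothing needs to be sent."""
        x = min(1.0, max(0.0, x))  # clamp tracking glitches into range
        if self._filtered is None:
            self._filtered = x
        else:
            # Exponential moving average toward the new reading.
            self._filtered += self.smoothing * (x - self._filtered)
        value = round(self._filtered * 127)
        if self._last_sent is not None and abs(value - self._last_sent) < self.deadband:
            return None  # change too small, skip the message
        self._last_sent = value
        return value
```

Called once per tracked frame, `update` returns either a value to send as a control change or `None`. The same value could drive an audio effect and a visual parameter so that both respond to one gesture.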

Credits / Resources

Torin Blankensmith - essential tutorials on MediaPipe integration:

loopMIDI software:

Matthew Ragan - workflow for MIDI signals and controller mapping:

Daydream (Andrew & Lyall) - insights on preparing input for StreamDiffusion:

Midpoint update:

  • added feedback elements to get the effect of painting with the hand
  • added pinch gesture for feedback reset
  • changed the gesture horizontal mapping for a more intuitive approach - now the hand gives the effect of washing the texture over
  • added a second camera processing unit based on POPs and a switcher to choose between the two (to give the output video a different character)
  • tweaked noise parameters in the first camera processing unit
  • cleaned up the network - every processing unit is now packed into containers with previews of what is going on inside for a quicker understanding
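A pinch gesture like the feedback reset above can be detected from the distance between the thumb tip and index fingertip. The sketch below is one possible approach, not necessarily the one used in the project. It assumes normalized landmark coordinates in MediaPipe's hand landmark order (index 4 is the thumb tip, index 8 is the index fingertip). The two thresholds are hypothetical values; using a separate release threshold (hysteresis) avoids repeated resets when the fingers hover near the boundary.

```python
import math
from typing import Sequence, Tuple

THUMB_TIP = 4  # MediaPipe hand landmark index for the thumb tip
INDEX_TIP = 8  # MediaPipe hand landmark index for the index fingertip


class PinchDetector:
    """Report the moment a pinch begins, with hysteresis on release."""

    def __init__(self, on_threshold: float = 0.05, off_threshold: float = 0.08) -> None:
        self.on_threshold = on_threshold    # assumed: pinch starts below this
        self.off_threshold = off_threshold  # assumed: pinch ends above this
        self.pinched = False

    def update(self, landmarks: Sequence[Tuple[float, ...]]) -> bool:
        """Return True only on the frame where a new pinch starts."""
        tx, ty = landmarks[THUMB_TIP][:2]
        ix, iy = landmarks[INDEX_TIP][:2]
        distance = math.hypot(tx - ix, ty - iy)
        if not self.pinched and distance < self.on_threshold:
            self.pinched = True
            return True  # rising edge: trigger the reset once
        if self.pinched and distance > self.off_threshold:
            self.pinched = False  # fingers clearly apart again
        return False
```

Because `update` returns `True` for a single frame per pinch, it can be wired directly to a one-shot reset of the feedback loop.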

Plans for the next week:

  • tighter integration with Ableton Live (choosing samples, choosing the gestures for an intuitive approach to sound design) and tweaking the audio-reactivity
  • smooth prompt transitions
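One way to approach smooth prompt transitions is to crossfade the weights of the outgoing and incoming prompts over a fixed time instead of switching instantly. The sketch below only computes the weights. It assumes the diffusion front end accepts a list of weighted prompts, which should be checked against the actual StreamDiffusion setup. The function names and the smoothstep easing are illustrative choices.

```python
from typing import List, Tuple


def crossfade_weights(t: float, duration: float) -> Tuple[float, float]:
    """Return (weight_a, weight_b) at time t of a crossfade lasting `duration`."""
    if duration <= 0:
        return (0.0, 1.0)  # no fade time: jump straight to the new prompt
    p = min(1.0, max(0.0, t / duration))  # progress clamped to 0..1
    eased = p * p * (3.0 - 2.0 * p)       # smoothstep: gentle start and end
    return (1.0 - eased, eased)


def blended_prompts(
    prompt_a: str, prompt_b: str, t: float, duration: float
) -> List[Tuple[str, float]]:
    """Pair each prompt with its current weight; the weights sum to 1."""
    weight_a, weight_b = crossfade_weights(t, duration)
    return [(prompt_a, weight_a), (prompt_b, weight_b)]
```

Calling `blended_prompts` once per frame with the time elapsed since the scene change gives a gradual handover between the two prompts.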

Final update:

Demo1
Demo2
  • additional MIDI-controlling gestures added (left and right side of the screen logic)
  • integration with Ableton
  • audio-reactivity tweaked
  • general adjustments of the network and annotations inside of the project
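The left and right side of the screen logic could take a form like the sketch below, where the half of the frame a hand is in selects which MIDI control it drives and the hand height sets the value. This is an assumption about how such logic might work, not a description of the project's network. The CC numbers, the split point, and the choice of height as the controlling axis are hypothetical.

```python
from typing import Tuple

LEFT_CC = 20   # hypothetical CC number for the left half of the frame
RIGHT_CC = 21  # hypothetical CC number for the right half of the frame


def route_hand(x: float, y: float, split: float = 0.5) -> Tuple[int, int]:
    """Map a normalized hand position to a (cc_number, cc_value) pair.

    x and y are expected in 0..1 with y growing downward, as in image
    coordinates, so a raised hand gives a higher value.
    """
    x = min(1.0, max(0.0, x))
    y = min(1.0, max(0.0, y))
    cc_number = LEFT_CC if x < split else RIGHT_CC
    cc_value = round((1.0 - y) * 127)
    return cc_number, cc_value
```

If the camera image is mirrored for the performer, the two CC numbers may need to be swapped so that "left" matches the performer's own left.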

