Depth Sensor Diffusion



01/07/2026 - Project file is live!

01/08/2026 - v0_02_42: some fixes, notes, and QoL features

Full instruction guide and project insights: https://www.youtube.com/watch?v=5M5d21H43LM

Hello :) My name is Gordon, but if you recognize me it's probably as basement vibes. I'm participating in the TouchDesigner cohort of this first DayDream hackathon, underway now through January 8th, 2026.

This project explores how StreamDiffusion could be the color texture in an explorable 3d scene. Using a Kinect v2 for Windows, TouchDesigner, and DayDream's integration of StreamDiffusionTD, I pair the RGB camera with a Depth Cloud to make a 3d interpolation of what the camera can view. The magic happens when I squish the RGB camera into a 1024x1024 square and transform it with StreamDiffusion-XL. When the RGB cam, depth, and generative image stream are all uniformly scaled, it becomes easy to swap between the real and the reimagined in a typical 3d instancing setup, which does not have to conform to the square aspect ratio of a full SD-XL image. This would not be possible without some type of external processing, and I don't think I would have thought about this concept without an invitation to this program. It's already been exciting and rewarding, and I am very happy to be sharing this workflow with you all.

I only have a Kinect v2 for Windows, but this proof of concept should be just as repeatable with newer depth sensors, as long as they are compatible with TD and include a normal color sensor. I'm curious: what other devices might work with this project?

Conclusion:

Time has been flying by, but this has been coming together just as fast. Here's what's new and improved since I last checked in:

- Xbox controller input for 3d camera/scene exploration. This can be blended with an audio reactive sequence of camera positions for additional movement and variance. This was way trickier than I expected, but very worth it for this and future projects.

Left Stick = strafe L/R + Forward/Back

Right Stick = Pan/Tilt (Look)

L/R Bumpers = Ascend/Descend

D-Pad L/R = Movement Speed/Look Sensitivity

D-Pad Up/Down = Zoom In/Out (FOV adj)

X / Button 3 = Fine Tune (0.1 input multiplier)

B / Button 2 = Reset All
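As a rough sketch of how the mapping above could come together in TouchDesigner-style Python: the channel names (lstick_x, etc.) are hypothetical placeholders for whatever a Joystick CHOP actually reports, and the speed/sensitivity values are assumed defaults.

```python
# Hypothetical sketch of the controller-to-camera mapping described above.
# In the real network the stick values would come from a Joystick CHOP.

def camera_deltas(sticks, fine_tune_held, move_speed=1.0, look_sens=1.0):
    """Turn raw stick values (-1..1) into per-frame camera deltas."""
    # X / Button 3 scales every input by 0.1 for fine adjustments
    mult = 0.1 if fine_tune_held else 1.0
    return {
        'strafe':  sticks['lstick_x'] * move_speed * mult,   # left stick L/R
        'forward': sticks['lstick_y'] * move_speed * mult,   # left stick F/B
        'pan':     sticks['rstick_x'] * look_sens * mult,    # right stick look
        'tilt':    sticks['rstick_y'] * look_sens * mult,
        'ascend': (sticks['bumper_r'] - sticks['bumper_l']) * move_speed * mult,
    }

deltas = camera_deltas(
    {'lstick_x': 0.5, 'lstick_y': 1.0, 'rstick_x': 0.0,
     'rstick_y': -0.2, 'bumper_l': 0.0, 'bumper_r': 1.0},
    fine_tune_held=True)
```

Keeping all inputs behind a single multiplier is what makes the fine-tune button a one-line feature rather than per-axis special cases.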

- Demo Mode: Explore several examples of image generation in 3D without needing a Kinect. You can generate new images over the frozen Kinect demo, but you'll have to bypass the 'SD_demos' component from StreamDiffusionTD -> switch1 TOP/instance_texture.

- I added an audio-reactive point cloud displacement to emphasize the 3d perspective and collect cool points. It's a very simple randomization of 2d shape size, orientation, and color. Shapes displace via a subtractive composite with the depth cloud. There's a strength control in the .tox menu, and a button on the UI to toggle it off and on.
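The subtractive composite can be sketched in numpy terms. This is only an illustration of the math: in the actual network it happens on the GPU in TOPs, and the function name, normalized 0-1 depth range, and audio-level input are all assumptions.

```python
import numpy as np

# Illustrative sketch: a random 2D "shape" texture is subtracted from the
# depth cloud, scaled by an audio-driven strength value.

def displace_depth(depth, shape_tex, audio_level, strength=0.5):
    """Subtract an audio-scaled displacement texture from depth values."""
    displaced = depth - shape_tex * audio_level * strength
    return np.clip(displaced, 0.0, 1.0)   # keep depth in a normalized range

depth = np.full((4, 4), 0.8)              # flat stand-in for a depth frame
shapes = np.random.rand(4, 4)             # random 2d shape composite
out = displace_depth(depth, shapes, audio_level=0.6)
```

Because the composite only subtracts, points can only push in one direction, which reads as the cloud "denting" in time with the audio.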

- I put the point cloud in a 3d box. Currently it's demonstrated with my Analog_TV .tox effect, and I like how it's cohesive, yet another distorted dimension with its own aesthetic texture. Since I've been selling that for $6, I just swapped it out for something I've been giving out for a few years. It's a great place to inject your own 2d work, or simply use the StreamDiffusionTD output.

- I added audio-reactive prompt/step/seed progression logic. Prompts are pulled from a table row, while the instructions evolve to audio triggers. The logic is 1) New Prompt -> 2) Random Initial Step -> 3) Random Seed -> 4) Initial Step x2 -> 5) Random Seed -> 6) Initial Step x3 -> 7) Random Seed ... or something along those lines. There's also the option to inject the RGB sensor into the 3d instancing during prompt transitions. I include an option to skip a set number of triggers between new prompt/step/seed instructions, which can give more time to rest on each new instruction, even with faster-firing audio triggers.
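The seven-stage cycle plus the trigger-skip option amounts to a small state machine. Here's a hedged Python sketch of that logic, with the class name, step ranges, and prompt-list input all invented for illustration (the real version reads prompts from a Table DAT row):

```python
import random

# Sketch of the audio-driven progression described above. 'skip' ignores
# that many triggers between instructions so fast audio triggers don't
# advance the cycle too quickly.

class PromptProgression:
    def __init__(self, prompts, skip=0):
        self.prompts = prompts
        self.skip = skip
        self._skipped = 0
        self.stage = 0          # position in the 7-stage cycle
        self.prompt_index = -1
        self.step = 10
        self.seed = 0

    def on_trigger(self):
        """Advance one stage of the cycle on an audio trigger."""
        if self._skipped < self.skip:
            self._skipped += 1
            return None
        self._skipped = 0
        s = self.stage % 7
        if s == 0:                          # 1) new prompt
            self.prompt_index = (self.prompt_index + 1) % len(self.prompts)
            out = ('prompt', self.prompts[self.prompt_index])
        elif s == 1:                        # 2) random initial step
            self.step = random.randint(4, 12)
            out = ('steps', self.step)
        elif s in (2, 4, 6):                # 3/5/7) random seed
            self.seed = random.randint(0, 2**31 - 1)
            out = ('seed', self.seed)
        else:                               # 4/6) initial step x2, x3
            out = ('steps', self.step * (2 if s == 3 else 3))
        self.stage += 1
        return out

prog = PromptProgression(['neon jungle', 'liquid chrome'], skip=1)
first_two = [prog.on_trigger() for _ in range(2)]   # skipped, then new prompt
```

The skip counter sits in front of the whole cycle, so it throttles every instruction type the same way.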

- .tox UI for the most immediate top-level controls, and menu controls for parameters that are important but would be visual clutter on the UI.

- Speaking of UI, I finally made a sync mode. The Kinect's Depth Cloud data is real-time, but the DayDream API returns frames later, with variance based on internet connection and StreamDiffusion settings. I didn't want users to have to guess the right Cache TOP settings, and when the SD image is more abstract it's hard to align even when I know the ins and outs of the variables at play. It was very important to have easy control and troubleshooting ability to realign the 3d instancing and image generation with a single top-level control point. I think this works really well.
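Conceptually, sync mode is just a delay line on the real-time depth stream so it lands alongside the late-arriving generated frames. A minimal sketch, with class and parameter names invented here (in TD this maps onto a Cache TOP's output index driven by one top-level parameter):

```python
from collections import deque

# Toy delay line: hold the real-time depth frames back by an adjustable
# number of frames so they pair up with the delayed generated frames.

class DepthDelay:
    def __init__(self, delay_frames):
        self.delay = delay_frames
        self.buffer = deque(maxlen=delay_frames + 1)

    def push(self, depth_frame):
        """Store the newest frame; return the one from `delay` frames ago."""
        self.buffer.append(depth_frame)
        # until the buffer fills, fall back to the oldest frame we have
        return self.buffer[0]

sync = DepthDelay(delay_frames=3)
aligned = [sync.push(f) for f in range(6)]   # frame numbers stand in for frames
```

Exposing `delay_frames` as the single tweakable value is the "one top-level control point" idea: when latency drifts, the user nudges one number instead of rewiring cache settings.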

Sync Mode demonstration - Using the default count.mov to align StreamDiff output.

- Discovered the Kinect TOP has an option to remap the RGB sensor. This could have saved me a day's worth of manually stretching and offsetting the source image to line up with the depth information. Now I can just move the camera around and only have to worry about adjusting 3d cameras and Box SOP orientation.

- I've included many instructions, design descriptions, and some lingering thoughts throughout the TouchDesigner network.

I could keep thinking of features and tweaks to implement, but at this point I want to focus on how this initial release lands with the community. Please give it a try!

< < < Mid-Point Update > > >

Now I have a working framework and a clear direction, but a week with many more distractions than the previous one. I did visuals for a bunch of bands in Minneapolis on New Year's Eve. Rather than stress about VJing on top of this project, I took my laptop and Kinect to this local showcase to get more experience with the capabilities and limitations of working with one of the older consumer depth sensors. It gave me the chance to work closely with the 3d side of this system. I got to stress test perspectives, figure out how to place the 3d instancing inside a 3d Box for a more dynamic shift of perspectives, and use an array of audio sources to inject new audio control points. It was too scrappy a situation to use StreamDiffusion, and I wanted to just hang out some of the time, so it was a little off-track yet a good side quest in the end.

The cheap projector screen fell down during the first band 🙂🙃🙂🙃🙂

My project is mostly in place, but needs a lot of attention to detail, cleaning up and custom controls. A rough glimpse published on my YouTube: https://youtu.be/Ot9hXy2iATQ

Much of this time was spent figuring out how to make my generic Xbox controller explore 3d space like a drone or an intuitive spectator view. I didn't realize how much I'd taken for granted that pressing forward in Z space gets messed up the second Y rotations are applied.
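The fix for that Z-forward problem is to rotate the stick input by the camera's yaw before applying it, so "forward" always points where the camera looks. A small sketch of that math (function name and sign conventions are my assumptions; TD conventions may differ):

```python
import math

# Raw "forward" input moves along world -Z, which is wrong once the
# camera has rotated around Y. Rotating the stick input by the camera's
# yaw converts it into world-space movement.

def move_relative(forward, strafe, yaw_degrees):
    """Rotate (strafe, forward) stick input into world-space XZ movement."""
    yaw = math.radians(yaw_degrees)
    dx = strafe * math.cos(yaw) - forward * math.sin(yaw)
    dz = -forward * math.cos(yaw) - strafe * math.sin(yaw)
    return dx, dz

# facing down -Z (yaw 0): pushing forward moves along -Z
dx, dz = move_relative(forward=1.0, strafe=0.0, yaw_degrees=0.0)
```

With yaw at 90 degrees the same forward push comes out as movement along -X, which is exactly the drone-style behavior the controller work was after.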

< < < Initial Project Post > > >

Week 1:

Dear Dev Diary,

I am exploring Kinect v2 depth sensor data, and injecting DayDream's cloud StreamDiffusion processing as a reinterpretation of the RGB sensor in this TouchDesigner instancing render pipeline. I have to say, this is kinda cool for only a few hours of work, but it's soooo not what I was trying to do these last two weeks. Anyone have success turning MediaPipe pose info into individual 3d segments that orient and scale to the pose channels? I thought I was so close.

I’m sorry in advance, but the only thing I can write about right now is me vs me. I'll get into the guts of designing this Kinect system as it gets more interesting, but I'm coming in hot off of some fresh frustration and I just need to vent. If I'm here to introduce myself in any way, it's important to understand why I would be interested in DayDream after not using StreamDiffusion for over a year, and how it could still play into my design philosophy and ethos.

I really thought I could piece together this ambitious experience that was not just generative in rendered output, but something closer to a game that we could play together in a live stream. I know it’s possible, so that obsession won’t die easily, but yesterday I had to admit defeat. TouchDesigner has ruined walking away from anything I think is possible, because I’ve had too many ideas that eventually existed. The thrill of pounding my head against a brand new wall has been more personally rewarding than any of my past solo creative pursuits, whether I break through to the other side or not.

There’s not much point in describing what I originally set out to do if I didn’t get far enough to bother pressing record. Trying, failing, rebuilding twice, yet I'm still thinking about solutions. New tech opens next level workflows, but this hackathon puts me at a crossroads between what’s possible and what is realistic with time constraints. It’s not like I ran out of approaches to get me into the realm of “hey this is possible.” It’s that the best case scenario has me problem solving rotations, translations, and scale for at least another week without even faking the “here’s this tech you never imagined possible.”

This was the last project I used StreamDiffusion for. There was this moment that I realized that I was more excited about the Ui and system design than the output, and have been waiting for the right reason to use this tech ever since.

The thing with generative imagery is that I can’t help but feel that a single image output as the final delivery is never enough. That's just me, but it's also the difference between having it be part of my next ambition or not. I've played with this stuff enough to know it can be amazing, but with a novelty that wears off or is off-putting to my local community. The rest of my creative pursuits don't stir up such things in myself or others so viscerally. Despite the broader conversations at play, this sort of tech can still find a place in my creative journey. I make complex systems that I get to experience uniquely with others in the same time and space. Clearly that's something that excites you about generative images, but I only want to know what they can uniquely do as dynamic parts of a broader system. I really thought I knew how to check that box, making the best of all of that and growing into some new skills along the way. Ha! Grrr. Hmmmm. Next time. Moving on!

Thankfully, I was very inspired by Andrew’s off-the-cuff workshop demo yesterday. Manipulating point clouds and transforming them with DayDream was simple and effective. The exact opposite of where my head has been. All I can present now is this one blast of crisis-turned-hyper focus, and I have to say that the immediacy of input being transformed to an unexpected output is a relief. Have simple idea, do it, see results. I highly recommend!

With a framework in place, it's time to explore how this project can be more interactive, interesting, and aligned with what inspired me to join the hackathon in the first place. Will that involve a live stream that reinterprets chat as evolving sets of prompts? How else may this be part of a larger system or a shared moment?

I love to share knowledge and get very excited about process, so my next post will have an actual project file that demonstrates this technique. That said, you'll need a little knowledge about the quirks and pitfalls, as well as some kind of depth sensor to truly get similar output. We'll get into all of that very soon!

Attachments
v15