burn me while i'm hot


 π“†©β€οΈβ€πŸ”₯π“†ͺ Burn me while i'm hot 𓆩❀️‍πŸ”₯π“†ͺ

Why do you want to get burned?

live video synth. live video samizdat.

encrypt data, or encrypt perception.

video is usually the opposite of private. it is one of the most exposed mediums we have, the easiest to duplicate, the easiest to circulate, the hardest to contain. anyone can record you, repost you, archive you, watch you without context, without consent, without you ever knowing.

the question is not just how to keep something yours anymore. the question is what kind of machine lets you exist in public without being completely legible.

maybe you don't need to disappear. but you do need to become harder to read.

not privacy through absence, but privacy through interpretation, where cryptography can live inside a medium that already moves through culture on its own. instead of pushing everything down into invisible layers of math and code, the surface itself starts to carry the work. what you see becomes part of how meaning is shaped.

layered, shifting, slightly off, suspicious in that CYOA escaping through the mirrors of victorian halls at midnight kind of way.

and now it's not just an mp4.

it becomes an mp4p.

an mp4-privy.

burn me while i’m hot does not pull meaning away from view. it lets it sit right in front of everyone, but in a form that refuses to settle into something readable on its own. the content moves freely. the meaning does not. the synth is no longer just decoration. it burns itself into the video. so the question shifts. not how do you regain the keys to agentic privacy. but how do you burn yourself into the surface in a way that stays yours.

why live video?

live video is computationally demanding and inherently nondeterministic. real-time synthesis introduces variation at the level of frame timing, internal state, and render order. those differences are small, but they are stable enough to matter cryptographically and unstable enough to resist replay.

here, the frames of the live synth are inputs to the encryption process. each burn frame is part of the key schedule for the visual payloads. the exact bytes of the frame are hashed and folded into per-frame key material. that means the encryption context is bound not only to abstract settings like prompt or seed, but to the concrete result of a specific render that happened at a specific moment.

live video provides a time-bound, non-replayable visual source that the encryption pipeline can depend on, making decryption inseparable from the exact circumstances in which the burn was created.
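
the idea of treating exact frame bytes as non-replayable key input can be sketched like this (a minimal illustration, not the project's code; `frame_entropy` and its byte layout are assumptions):

```python
import hashlib

def frame_entropy(frame_bytes: bytes, frame_index: int) -> bytes:
    """Hash the exact rendered bytes of one live frame.

    A re-render that differs by even a single byte (timing jitter,
    internal synth state, render order) yields an unrelated digest,
    which is what makes the frame usable as non-replayable key input.
    """
    h = hashlib.sha256()
    h.update(frame_index.to_bytes(8, "big"))
    h.update(frame_bytes)
    return h.digest()

# two frames that differ by one byte produce unrelated digests
a = frame_entropy(b"\x00" * 64, 0)
b = frame_entropy(b"\x00" * 63 + b"\x01", 0)
assert a != b
```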

how it works?

  1. upload a video where you are clearly visible in frame
  2. sam3 runs on the footage and builds a frame by frame mask of you
  3. pick a y2k style and prompt that works with a custom 1.3b y2k lora, set to visually transform with loud colours, patterns, and overlays
  4. hit burn to create the synth output and bind the masked pixels through per frame visual cipher keying
  5. export the mp4p container together with the separate key file used for decryption later
  6. load the mp4p into the player to view the burned version of the video
  7. import the key file to decrypt the visual cipher and reconstruct the masked areas

how encryption and decryption happen

encryption starts with a secret input, keyMaterial, and a fixed set of public visual-cipher metadata: prompt, stable params, seed, and mask configuration. from these, the system derives a base_key by hashing keyMaterial, the prompt, the params, and the seed with sha256.
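
a sketch of that derivation, assuming a particular serialisation of the public metadata (field order and encoding are guesses; any stable serialisation works as long as encrypt and decrypt agree on it):

```python
import hashlib
import json

def derive_base_key(key_material: bytes, prompt: str,
                    params: dict, seed: int) -> bytes:
    # fold the secret input and the public visual-cipher metadata
    # into a single sha256 digest used as the base key
    h = hashlib.sha256()
    h.update(key_material)
    h.update(prompt.encode("utf-8"))
    h.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    h.update(seed.to_bytes(8, "big"))
    return h.digest()
```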

the burn video is rendered and its frames become cryptographic input. for each frame, the exact frame bytes are hashed to produce a frame_hash. this hash and the frame index are combined with the base key using hmac-sha256 to derive a unique frame_key.

from each frame_key, a keystream is generated by running hmac-sha256 with incrementing counters and truncating the output to the full frame rgb length (height Γ— width Γ— 3). this keystream is xor-ed with the original video’s rgb pixels inside the mask. pixels outside the mask are zeroed. the result is encrypted visual noise tied to that specific burn frame.
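
the per-frame keying, keystream expansion, and masked xor described above can be sketched as (helper names are mine, not the project's):

```python
import hashlib
import hmac

def frame_key(base_key: bytes, frame_bytes: bytes, frame_index: int) -> bytes:
    # bind the key to the exact bytes of this specific burn frame
    frame_hash = hashlib.sha256(frame_bytes).digest()
    msg = frame_index.to_bytes(8, "big") + frame_hash
    return hmac.new(base_key, msg, hashlib.sha256).digest()

def keystream(fkey: bytes, length: int) -> bytes:
    # hmac-sha256 with incrementing counters, truncated to
    # the full frame rgb length (height * width * 3)
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(fkey, counter.to_bytes(8, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def xor_masked(rgb: bytes, ks: bytes, mask: bytes) -> bytes:
    # mask holds one byte per pixel; xor the three rgb channels of
    # masked pixels with the keystream, zero everything outside
    out = bytearray(len(rgb))
    for i in range(len(rgb)):
        if mask[i // 3]:
            out[i] = rgb[i] ^ ks[i]
    return bytes(out)
```

because xor is its own inverse, running `xor_masked` a second time with the same keystream recovers the masked pixels exactly.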

each encrypted frame is stored as a png with rgba channels, where rgb contains the encrypted pixels and alpha contains the mask. these frames are base64-encoded and saved in the mp4p file as encryptedMaskFrames, with a maskFrameIndexMap linking each payload to its burn frame.
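
packing one payload can be sketched like this; the real format wraps the rgba data in a png, while this stdlib-only illustration interleaves raw rgba bytes before base64-encoding (the function and field names here are assumptions):

```python
import base64

def pack_mask_frame(enc_rgb: bytes, mask: bytes,
                    burn_frame_index: int) -> dict:
    # interleave encrypted rgb with the mask as an alpha channel,
    # one alpha byte per pixel, then base64-encode the result
    rgba = bytearray()
    for p in range(len(mask)):
        rgba += enc_rgb[p * 3:p * 3 + 3]
        rgba.append(mask[p])
    return {
        "payload": base64.b64encode(bytes(rgba)).decode("ascii"),
        "burnFrame": burn_frame_index,
    }
```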

in parallel, the original and synthed videos are encrypted with aes-gcm using a key derived from keyMaterial via pbkdf2, with salt, iv, and authtag stored in mp4p metadata.
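
the pbkdf2 step can be sketched with the stdlib alone (the iteration count and output length here are assumptions; the aes-gcm encryption itself would use this key together with the stored iv, and the auth tag is checked on decrypt):

```python
import hashlib
import os

def derive_media_key(key_material: bytes, salt: bytes = None):
    # salt is generated once at encryption time and stored in the
    # mp4p metadata alongside the iv and auth tag
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", key_material, salt,
                              200_000, dklen=32)
    return key, salt
```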

decryption requires the mp4p file, the key file, and the exact burn frames. the system re-derives the same base_key, recomputes each frame_key, regenerates the keystream, and xor-s it with the stored payloads to recover the masked pixels. these pixels are composited back onto the burn frames using the stored mask and mask mode.

if the burn frames and key material match, the masked regions resolve. if either differs, the keystream diverges and the output stays noise.
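
that divergence is easy to demonstrate in miniature (a self-contained sketch, not the project's code):

```python
import hashlib
import hmac

def stream(base_key: bytes, frame_bytes: bytes, n: int) -> bytes:
    # frame-bound keystream: hmac of the frame hash, expanded by counter
    fkey = hmac.new(base_key, hashlib.sha256(frame_bytes).digest(),
                    hashlib.sha256).digest()
    out = bytearray()
    c = 0
    while len(out) < n:
        out += hmac.new(fkey, c.to_bytes(8, "big"), hashlib.sha256).digest()
        c += 1
    return bytes(out[:n])

plain = b"masked pixels"
base = hashlib.sha256(b"key material").digest()
burn = b"exact burn frame bytes"

cipher = bytes(p ^ k for p, k in zip(plain, stream(base, burn, len(plain))))

# matching burn frame: the masked region resolves
back = bytes(c ^ k for c, k in zip(cipher, stream(base, burn, len(plain))))
assert back == plain

# any other frame: the keystream diverges, the output stays noise
wrong = bytes(c ^ k for c, k in zip(cipher,
                                    stream(base, b"replayed frame",
                                           len(plain))))
assert wrong != plain
```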

some mp4p specifics

the format is a cryptographic container that holds encrypted media streams, visual-cipher payloads, and the metadata needed to recompute them. the original video exists inside it as aes-gcm encrypted bytes. it is not extractable without key material, and it also exists as per-frame visual payloads that are mathematically bound to a specific burn video and a separate key file.

without that exact burn context and the matching key material, there is nothing meaningful to decode, no partial frames to recover, no underlying stream to dump. mp4p stores encrypted video bytes plus the conditions under which a video can be reconstructed, and outside of those conditions the format resolves only to cipher text and synchronised noise.
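
a hypothetical sketch of the container layout; beyond encryptedMaskFrames and maskFrameIndexMap, which the text names, the field names and example values here are guesses:

```python
import json

mp4p = {
    "visualCipher": {
        # public metadata, needed to re-derive the base key
        "prompt": "<prompt text>",
        "seed": 0,
        "params": {},
        "maskMode": "inside",
    },
    "media": {
        # aes-gcm ciphertext of both streams, with the pbkdf2 salt,
        # iv, and auth tag stored alongside
        "original": {"cipherText": "<base64>", "salt": "<base64>",
                     "iv": "<base64>", "authTag": "<base64>"},
        "synth": {"cipherText": "<base64>", "salt": "<base64>",
                  "iv": "<base64>", "authTag": "<base64>"},
    },
    # per-frame rgba pngs, base64-encoded
    "encryptedMaskFrames": ["<base64 png frame 0>", "<base64 png frame 1>"],
    # payload index -> burn frame index
    "maskFrameIndexMap": {"0": 0, "1": 1},
}

# the whole container is plain structured data, so it serialises cleanly
blob = json.dumps(mp4p)
```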

improvements

scale + speed. scale + speeeeeedd.

  1. handle longer videos
  2. better sam3 masking, running in real time instead of as a pre-processing step
  3. the current 1.3b custom y2k lora works, but it loses a lot of fine detail compared to its 14b trained counterpart

all of this ultimately comes down to memory, ram, gpus, machines, machines, machines etc....

why y2k?

because as everything gets enshittified, y2k is the style that still remembers web1, when things felt possible and not drowned in polarising bot slop (nothing against bots, everything against their overlords)

 

----

privacy is a right. most people don't care to get that.

why wait. why ask. why depend.

reshape how you come into focus. edit your mediums. 

github repo, all burn me while i'm hot code is cc0: https://github.com/emmajane1313/burnme