Interactive audiovisual insanity, directed in real-time by a motion-captured performer
I have always loved the chaos of public-access television. Amateur camera work, rough greenscreen abuse, unscreened phone calls, a general sense of overwhelm and confusion.
Let’s Paint TV is as close to perfect as the medium gets:
Platforms like Twitch and YouTube are clearly the spiritual successors to public access television, but their content doesn’t scratch the same itch for me. It’s all a little too polished. Public access television was distinctly wack.
What if I used modern techniques to recreate the chaotic energy of live public access television?
Motion captured performance. Wacky visual effects. Synchronized to music. Controlled in real-time by the performer. Streamed live on the internet.
Initial version:
Upgraded version:
I already had a basic livestreaming motion capture system up and running, which I could use as a starting point. My homebuilt motion capture suit animates a 3D character in real-time, the character is composited over a pre-rendered background, and the output can be recorded or streamed onto the internet.
I needed to add the following functionality:
My existing setup offered only a single camera with a stationary front view of the character. The looping background scene is pre-rendered using the same camera as the character to keep everything spatially consistent.
If I wanted to create dynamic content I would need multiple camera positions to choose from. The following camera positions should be sufficient:
I anticipated wanting sweeping/panning camera moves between these positions, in which case only a single camera would be needed in Blender and I could update its position and rotation to transition between these “standard” viewing angles. For now I decided to keep things simple and switch between different static cameras:
I pre-rendered a looping background scene from each of these new camera positions:
By matching the Blender viewport camera to the appropriate background render I can dynamically change the scene in a convincing way:
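Under the hood, a view change is just activating a different Blender camera and telling the compositing side which pre-rendered background loop to use. A minimal bpy sketch (camera and clip names here are illustrative, not the ones from my scene):

```python
import bpy

# Illustrative: each camera has a matching pre-rendered background loop
BACKGROUNDS = {
    "Cam.Front": "backgrounds/front_loop.mp4",
    "Cam.Left":  "backgrounds/left_loop.mp4",
    "Cam.Right": "backgrounds/right_loop.mp4",
}

def switch_view(camera_name: str) -> str:
    """Make the named camera active and return the matching background clip."""
    bpy.context.scene.camera = bpy.data.objects[camera_name]
    return BACKGROUNDS[camera_name]
```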
This was where things got fun. I’m a huge fan of datamoshing, analog video artifacts, video feedback, digital compression artifacts, and psychedelic art. I wanted to build a system which would allow me to play with these types of effects.
I had three video streams: a 3D character, a background, and a composite of the two. Each of these layers is suited to a different type of effect: the character could experience glitch effects (à la Max Headroom), the background could cycle through colors in a psychedelic fashion, and the final composite might have effects applied to it to simulate various media (for example, a green monochrome CRT display, or scratchy 16mm film).
What I needed was a system which could work with these various video streams independently, offering the flexibility to toggle and combine various effects on-demand in a performant way. This is the problem that all VJs face, so I started by surveying the software they use.
Although each of these showed potential, none of these options were general enough to meet my needs. I knew I wanted the performer to control the effects (and the camera) in real-time using a controller of some kind. Presumably this would require a system to handle these commands, routing them where appropriate, all while keeping track of the current state of the scene (e.g. which effects are toggled, the order of the layers, which camera is currently active, etc.)
There are software solutions which operate at this higher level, allowing the scripting/programming of various control systems (e.g. Chataigne, Node-RED). These programs would be able to receive real-time inputs from the performer and then route the appropriate actions to one of the above effects programs, but I preferred an all-in-one (routing + effects) solution.
Enter one of the greatest pieces of software ever created: TouchDesigner.
TouchDesigner is difficult to describe. The wiki calls it a “node-based visual programming language for real-time interactive multimedia content”. The simplest way to describe it is as a visual scripting environment that lets you take almost anything as an input (e.g. USB device, network message, video stream, audio stream, etc.) and drive almost anything as an output (e.g. a microcontroller, USB device, video stream, audio stream, laser controller, etc.). What you do in the middle is completely up to you (e.g. generate a texture, modify a sound, perform logical or mathematical operations, etc.).
My first exposure to TouchDesigner was a Deadmau5 livestream showing how he uses TouchDesigner to run his cube. Enough said, TouchDesigner wins.
As a visual scripting environment, TouchDesigner made it easy to see the logical flow of the various video streams. I set up a pipeline to apply effects on each video layer independently:
And then ran the composited image through its own effects pipeline:
As for how I implemented the effects themselves, the most powerful technique available was shaders.
Shaders can be thought of as small programs which modify pixels before they are displayed. The shader is aware of the relative position of each pixel and can use this information while making the modifications, enabling a wide variety of effects and transformations.
Shaders in TouchDesigner are written using GLSL. The language is easy to understand and there is an enormous amount of reference material to learn from. After much experimentation, I implemented a wide variety of effects.
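Conceptually, each effect is just a function that maps a pixel's position (and the color already there) to a new color. The real effects are GLSL TOPs, but the idea translates directly; here is a rough NumPy sketch of a scanline-style effect that darkens every other row and tints the image based on the vertical coordinate (purely illustrative, not one of my actual shaders):

```python
import numpy as np

def scanline_tint(frame: np.ndarray) -> np.ndarray:
    """frame: (H, W, 3) float RGB in [0, 1]. Returns a modified copy."""
    h, w, _ = frame.shape
    out = frame.copy()
    # Darken every other row to fake CRT scanlines
    out[::2] *= 0.6
    # Tint each row based on its normalized vertical position
    v = np.linspace(0.0, 1.0, h)[:, None]    # (H, 1)
    out[..., 0] *= (0.8 + 0.2 * v)           # more red toward the bottom
    out[..., 2] *= (1.0 - 0.2 * v)           # less blue toward the bottom
    return np.clip(out, 0.0, 1.0)
```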
Some effects are applied only to the character:
Other effects are applied to the background:
But the most fun are the ones applied to the composited image. By stacking numerous shaders to recreate various flaws in old display technologies, I can recreate the look of a variety of mediums:
With shader effects implemented, I moved on to a different kind of effect - one that considers the element of time.
We all remember the first time we pointed a recording device at itself.
TouchDesigner has a built-in feedback effect, but building my own version offered significantly more flexibility. By maintaining a cache of previous frames (for example the last 100 frames) of a video stream, I could create interesting effects by combining the previous frames with the current frame in various ways.
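A rough sketch of the idea (standalone Python for clarity; in the project this lives inside TouchDesigner):

```python
from collections import deque
import numpy as np

CACHE_SIZE = 100
frame_cache: deque = deque(maxlen=CACHE_SIZE)   # most recent frame at index -1

def push_frame(frame: np.ndarray) -> None:
    frame_cache.append(frame)

def trail(decay: float = 0.85) -> np.ndarray:
    """Blend the cached frames oldest-first, so the most recent frames dominate."""
    out = np.zeros_like(frame_cache[0], dtype=np.float32)
    for f in frame_cache:
        out = out * decay + f.astype(np.float32) * (1.0 - decay)
    return out
```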
For example, I can create psychedelic effects by retaining previous frames and applying different colors to them:
Or I can progressively blur previous frames for a ghostly trail:
But most importantly, with this cache I can implement one of the most iconic video effects of all time - the Max Headroom glitch:
Jumping back-and-forth between a recent character frame and a random previous frame (at varying speeds) recreates the classic effect (using an alternate scene previously created):
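In terms of the frame cache above, the selection logic is roughly the following (a sketch, not the exact implementation; the hold length and randomness are tunable):

```python
import random

def glitch_frame_index(tick: int, cache_len: int, hold: int = 4) -> int:
    """Alternate between the most recent frame and a random older frame,
    holding each choice for `hold` ticks so the jumps read as stutters."""
    phase = tick // hold
    if phase % 2 == 0:
        return cache_len - 1                            # near-live frame
    return random.Random(phase).randrange(cache_len)    # stable random pick per phase
```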
Now that I had a collection of effects, I could experiment with combinations that led to the most visually interesting results.
Although most of the effects I created work fine on their own, I found the results much more interesting when they were combined in various ways. Some combinations were obvious (e.g. silhouette and distortion), but others weren’t. To facilitate this discovery process, I implemented a way of dumping the current list of active effects, and used it to compile a long list of pleasing combinations. After some time, patterns emerged - for example, many of the pleasing combinations relied on cycling through colors. I organized these combinations into higher-level “presets” (for example “RGB” for cycling colors). This simplified the selection of pleasing effects in real-time - instead of manually combining effects I could pick from existing presets (randomly, or deliberately).
For flexibility, I retained the functionality for manual experimentation by combining individual effects on-the-fly. This mode can be toggled on when desired.
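Under the hood, a preset is just a named set of effect toggles per context; a simplified sketch (effect names other than “RGB” are illustrative):

```python
# Illustrative preset definitions: preset name -> effects to enable, per context
PRESETS = {
    "RGB":      {"character": ["edge_glow"],      "background": ["color_cycle"], "composite": ["crt"]},
    "GHOST":    {"character": ["feedback_trail"], "background": [],              "composite": ["vhs"]},
    "HEADROOM": {"character": ["glitch_jump"],    "background": ["stripes"],     "composite": ["scanlines"]},
}
```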
I ended up with tables of presets for each context:
I then turned my attention to controlling everything I had developed so far.
My motion capture suit is “wireless” (i.e. not tethered to a PC), so the controller system needs to be wireless as well. I wanted the performer to have quick access to a number of hotkeys without having to look down at their controller (as this glance would be obvious in the motion-captured performance). Voice commands were my first thought, but in the long run I wanted the performer to be able to speak with the audience, so voice commands were out. What I needed was a physical controller with multiple buttons that could fit in a single hand.
A convenient solution was a wireless numpad:
The labels on the keys were irrelevant for this use case, so to increase one-handed grip I covered the buttons and the rear with grip tape:
The numpad provides the “director” controls for the performer. It controls camera position, scene selection, microphone on/off, etc.
The numpad has a limited number of keys, but its layout lends itself well to camera control (as there are 8 camera positions to choose from, and their relative placement can be logically mapped to the numeric keys).
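Sketched as a lookup table, the idea is to treat the keypad as a top-down view of the scene with the character in the middle (the key assignments and camera names here are illustrative):

```python
# Illustrative mapping: numpad key -> camera position, laid out like a top-down view
NUMPAD_CAMERAS = {
    "Numpad7": "BackLeft",   "Numpad8": "Back",   "Numpad9": "BackRight",
    "Numpad4": "Left",                            "Numpad6": "Right",
    "Numpad1": "FrontLeft",  "Numpad2": "Front",  "Numpad3": "FrontRight",
}
```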
The following hotkey layout served as a good starting point for camera control:
I planned to use some of the remaining keys for scene control, microphone control, and so on - but it was already clear that there were not enough keys available on the numpad to control the effects system. I needed a way to easily trigger numerous effects across 3 different contexts (character, background, composited image). Ideally the performer would have a full keyboard in a small form factor, operated blindly in one hand. Is this possible?
The answer is yes, thanks to chording.
By treating unique combinations of multiple keys (known as chords) as individual hotkeys, the number of key-mappings expands considerably. With a small number of buttons you can achieve the functionality of a full keyboard.
The Twiddler does exactly that:
The Twiddler is ergonomic, lightweight, and extremely flexible. It takes some time to develop the muscle memory for each button’s location, but once that is established it is possible to enter complex combinations without glancing. By combining the buttons underneath the user’s thumb with the buttons available to their other fingers, it is possible (in theory) to easily toggle effects in 3 different contexts.
The Twiddler uses a nomenclature that makes the chords easier to remember: each of the 4 rows is written as L, M, or R (left, middle, right), or O when no key in that row is pressed.
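A quick back-of-the-envelope count shows how far this goes (the thumb-modifier count below is an assumption for illustration):

```python
# Each of the 4 rows is one of O (open), L, M, R -> 4^4 combinations, minus the all-open case
finger_chords = 4 ** 4 - 1                     # 255 chords with the fingers alone
thumb_modifiers = 3                            # e.g. ALT plus two others, one at a time (illustrative)
print(finger_chords * (thumb_modifiers + 1))   # 1020 mappings - far more than a full keyboard
```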
The final layout looks like this:
With two input devices and layouts for triggering actions, I moved on to receiving these input commands on the PC side.
Both the numpad and the Twiddler are recognized by a PC as USB input devices, and they transmit typical keyboard (and mouse, in the case of the Twiddler) button presses. The simplest approach would have been to use the built-in hotkey features in Blender, TouchDesigner, and OBS - I could distribute the hotkeys so there are no collisions, and have each program deal with them individually. But this would be cumbersome to set up and manage, especially as the number of hotkeys increases (which is expected to happen as more effects are added in the future). There is also a risk of overlapping with an existing Windows hotkey (e.g. Ctrl-S or Ctrl-X). A more centralized and easily-configurable option was needed.
AutoHotKey is the go-to for solving this problem: it is a free, open-source scripting language for macros, hotkeys, and other automations. My controller system posed a problem: I had multiple input devices acting as keyboard and mouse, and I didn’t want keypresses to be accepted by the operating system. To solve this I used AutoHotInterception to “intercept” the commands from specific USB devices (i.e. only the wireless numpad and the Twiddler), preventing them from continuing to the operating system and other programs. But how to route the intercepted commands to the appropriate software?
I was already using the OSC protocol for transmitting facial blendshapes from an iPhone to Blender, so it was the natural choice. Although it was originally intended for networking synthesizers with computers, the format is so open-ended that it can be used by any device to send a message to any other device or software.
An OSC message is very simple, consisting of an address (which works like a path, e.g. /Blender) and zero or more arguments (e.g. a string).
So a message being sent to Blender to activate a particular camera could look like:
/Blender s("Front")
What I did in this case was intercept all inputs from the two controllers, convert them into OSC messages, and have TouchDesigner receive the OSC messages and take the appropriate action(s). The following OSC message format makes it clear which button combination on which controller has been pressed:
/twid s("(ALT)LLOO")
/nump s("NumpadHome")
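The AutoHotKey side does the actual conversion, but the idea is easy to show with python-osc: take the identifier of whichever intercepted key (or chord) fired and forward it as a single string argument (host, port, and names here are placeholders):

```python
from pythonosc.udp_client import SimpleUDPClient

# TouchDesigner listens for OSC on this port (address/port are placeholders)
client = SimpleUDPClient("127.0.0.1", 10000)

def forward_keypress(device: str, key: str) -> None:
    """device: 'nump' or 'twid'; key: e.g. 'NumpadHome' or '(ALT)LLOO'."""
    client.send_message(f"/{device}", key)

forward_keypress("nump", "NumpadHome")   # -> /nump s("NumpadHome")
forward_keypress("twid", "(ALT)LLOO")    # -> /twid s("(ALT)LLOO")
```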
In TouchDesigner, I parse the OSC messages to split off the address and the argument, and have logic set up to handle each request. TouchDesigner primarily uses node-based visual scripting to implement logic, but for more complex situations it offers the ability to program in Python. In this case I call Python functions based upon the OSC message that is received.
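One way to receive OSC in TouchDesigner is an OSC In DAT; its callback then dispatches to Python, roughly like this (the handler names and routing are illustrative):

```python
# Callbacks DAT attached to an OSC In DAT in TouchDesigner
def onReceiveOSC(dat, rowIndex, message, bytes, timeStamp, address, args, peer):
    command = args[0]                       # e.g. "NumpadHome" or "(ALT)LLOO"
    if address == "/nump":
        handle_director_command(command)    # cameras, scenes, microphone, ...
    elif address == "/twid":
        handle_effect_chord(command)        # toggle effects / presets
    return
```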
For maximum flexibility, I managed which commands call which functions in a table:
At this point I had real-time motion-captured performance, triggerable visual effects, and a flexible controller system in place. The final piece was streaming the result live to the internet.
In the earlier version of this system I used OBS as the recording/streaming solution (as every other livestreamer does). This remained the best choice, but some changes were required now that I was using TouchDesigner as the video source - I needed a way to capture TouchDesigner’s output in a performant way. Previously I used screen capturing in OBS to achieve a similar goal (capturing the Blender viewport), but this was only a quick-and-dirty solution with many limitations (low resolution, locking the position of the application window, etc.). There was a much better solution available: Spout.
This VJ system runs on a single computer. The visuals created in TouchDesigner have already been rendered by the GPU, so shouldn’t it be possible for other applications to use that existing render for their own purposes? This is what Spout achieves - by sharing textures on the GPU across multiple programs, each program gets zero-latency access to the same visual imagery. It was exactly what I was looking for, and it was easily implemented in both TouchDesigner and OBS.
Sidenote: if the visual imagery were being generated on one computer and the streaming handled by a second, I could use NDI to achieve a similar result by sending the video over the network (with minimal latency of ~50ms).
I had built a powerful system for VJing, but had not yet addressed the most important part: the music. OBS has built-in support for playlists, and an OSC plugin lets my existing controller system control playback (next, previous, pause, etc.).
It would be ideal if the currently-playing song was displayed onscreen. This was achieved with another plugin.
Although effects are intended to be manually triggered, I added functionality which allows effects to react to the music directly (e.g. beat detection). This was achieved by routing the audio output from OBS into TouchDesigner. Unfortunately this introduced some latency - without compensating for it, the reactive effects would be responding slightly after the music had already played. I addressed this by adding a small delay to the audio stream sent to the final output (but not the audio stream sent to TouchDesigner), ensuring the final output from OBS has the audio in sync with the visuals.
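Conceptually, a beat trigger can be as simple as comparing the current audio block’s energy to its recent average; a rough standalone sketch of the idea (TouchDesigner’s audio-analysis operators do the real work, so this is only illustrative):

```python
import numpy as np

def is_beat(block: np.ndarray, history: list, sensitivity: float = 1.4) -> bool:
    """block: one short chunk of mono audio samples. Returns True when the chunk's
    energy clearly exceeds the recent average energy (a crude onset detector)."""
    energy = float(np.mean(block.astype(np.float64) ** 2))
    average = sum(history) / len(history) if history else energy
    history.append(energy)
    if len(history) > 43:          # keep roughly one second of history
        history.pop(0)
    return energy > sensitivity * average
```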
Until this point I had been using a generic low-poly character, but I wanted something more distinct. My ultimate goal was to create a modern version of public-access television with its characteristic aesthetic. I had referenced Max Headroom as an influence, and many of the effects I implemented are refined versions of those found in 80s and 90s television - so I figured why not continue this trend and put Max Headroom in a tuxedo?
As for his name: I set out from the beginning to create a wack aesthetic, so why beat around the bush? “wackbar”.
There was plenty of room on both sides of the main character for additional animated characters. Thinking of the additional characters as backup dancers offered an excuse for various formations alongside and behind the main character. For the choice of backup dancer model, I continued the trend of ripping off iconic characters from the 80s and used one of Hajime Sorayama’s female robots:
The VJ system operates under the assumption of a single performer who “does-it-all”, so how could multiple characters be animated at once?
One option was to use pre-recorded animations for the backup dancers, but I would need to synchronize these pre-recorded movements with the music that is currently playing (a challenging task). An easier approach came to mind - why not re-use the motion capture I was using for the main character? I could mirror the skeleton in the backup dancers to provide some visual distinction, and could introduce a slight delay (perhaps the length of a beat) so the duplication is less obvious.
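The delay-and-mirror idea is simple to express; a conceptual sketch, assuming each mocap frame arrives as a dict of bone rotations (the left/right mirroring is heavily simplified here):

```python
from collections import deque

DELAY_FRAMES = 30                       # e.g. one beat at 120 BPM, captured at 60 fps
pose_buffer: deque = deque(maxlen=DELAY_FRAMES)

def mirror(pose: dict) -> dict:
    """Swap left/right bone names (a real rig would also need axis flips)."""
    def flip(name: str) -> str:
        if "Left" in name:
            return name.replace("Left", "Right")
        return name.replace("Right", "Left")
    return {flip(bone): rotation for bone, rotation in pose.items()}

def backup_dancer_pose(main_pose: dict) -> dict | None:
    """Feed in the main character's pose each frame; get back a delayed,
    mirrored pose for the backup dancers (None until the buffer fills)."""
    pose_buffer.append(dict(main_pose))
    if len(pose_buffer) < DELAY_FRAMES:
        return None
    return mirror(pose_buffer[0])
```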
This approach worked quite well:
After experimenting with the system in its current state, a few limitations stood out:
I addressed each of these with a few upgrades.
As much fun as it was to build my motion capture suit, the inability to walk around an environment limited my system to a stationary dancing character. One option to remedy this was to enhance my motion capture suit by attaching a Vive tracker to the waist, using the Vive’s outside-in tracking system to track the performer’s position in the room. There were no technical reasons why this should not work, but I decided to go with an easier solution: I bought a professional inertial motion capture suit.
The explosion in popularity of VTubing and VRChat over the last few years has resulted in a healthy secondary market for motion capture suits. I picked up an older Perception Neuron suit on eBay for a reasonable price:
One reason this model of suit was desirable is that no subscription is required for its interfacing software (subscription licensing is unfortunately common for high-end motion capture suits) - an older version of the Neuron software is still available and will keep working indefinitely:
My approach of using Blender for animation of the characters was initially motivated by the choice of motion capture suit (Chordata only provides a Blender plugin for capturing motion data). But with the switch to the Perception Neuron suit I could return to my initial preferred environment: Unreal Engine. This switch brought with it a number of improvements:
I started by creating a new environment:
Then I redesigned the character model using MetaHuman:
I programmed all the functionality needed using the Blueprints visual scripting system:
I replaced the previous facial tracking app with Live Link Face (since I’d already be using Live Link to receive motion capture data in UE5 from the Perception Neuron suit):
And used Spout to share the rendered output with TouchDesigner in 3 layers (character, background, both):
This change in video source had an impact on the effects pipeline (for example, I now had an alpha channel and no longer needed to do chroma key masking). This resulted in a number of other changes and improvements in the rapidly-expanding TouchDesigner project. The end result:
The visual improvements from the migration to UE5 were substantial:
As a final upgrade, I added some variety to the scenes.
A Matrix-inspired grid of screens:
A Korean BJ-style three-pane view:
And kaleidoscopic patterns:
A compilation of the original Blender version and upgraded UE5 version: