Using AI Scene Detection to Edit Stream VODs for YouTube

I’ve been working on a desktop app that cuts down stream VODs based on how high your heart rate is. During that research, I came across a Python library called scenedetect that can automatically separate the different scenes inside a long video. This sounded especially useful for editing stream VODs.

Recently a Super Smash Bros. Melee streamer I watch did a 24-hour stream. He has a special scene in OBS for talking to his viewers, and these segments are especially useful for the YouTube content he’s going to make.

The Question: Does scenedetect make it easy to find the most relevant clips for editing down streams into YouTube content?

The Experiment

We start by downloading the scenedetect package.

pip install scenedetect
Bash

I then tried this command from the scenedetect documentation.

scenedetect --input 24hr-stream.mp4 detect-adaptive list-scenes save-images
Bash

My initial hypothesis was that the results would be practically worthless. Unlike cutting apart movies, where all scene changes are relevant, streamers are playing video games or watching random YouTube videos, so the change detection is not as predictable.

And unfortunately, I was correct. The default settings are way too liberal with scene detections, finding over 4,000 cuts in the 24-hour stream. In the attached screenshot, the highlighted pictures are from a single game of Melee. At 3 pictures per scene, the program split this one game of Melee into 10 different scenes, which is not correct.

All is not lost, though. Scenedetect’s documentation describes many ways of determining cuts. The most flexible one looks to be “detect-content”. It uses changes in hue, saturation, luminosity, and edges to create a metric for each frame. Whenever that metric exceeds a specified threshold, we can mark a scene change.
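To build intuition for how this works, here’s a rough numpy sketch of the kind of score detect-content computes. This is an illustration of the idea, not PySceneDetect’s actual implementation; the weights and threshold are placeholder values, and the edge term is left out for brevity.

```python
import numpy as np

# Hypothetical weights in the same order detect-content uses:
# hue, saturation, luminosity (the edge term is omitted in this sketch)
WEIGHTS = np.array([1.0, 1.0, 0.8])
THRESHOLD = 40.0

def frame_score(prev_hsv, curr_hsv):
    """Weighted average of the mean absolute per-channel change
    between two HSV frames (arrays of shape (H, W, 3), channels 0-255)."""
    deltas = np.abs(curr_hsv.astype(float) - prev_hsv.astype(float))
    per_channel = deltas.reshape(-1, 3).mean(axis=0)  # mean |delta| per channel
    return float((per_channel * WEIGHTS).sum() / WEIGHTS.sum())

def is_cut(prev_hsv, curr_hsv, threshold=THRESHOLD):
    """Mark a scene change when the weighted delta exceeds the threshold."""
    return frame_score(prev_hsv, curr_hsv) > threshold
```

The intuition: a hard cut changes most pixels at once, so the score spikes well past the threshold, while ordinary gameplay motion only nudges it.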

When tinkering with these settings, it’s best to create a .cfg file and pass it into scenedetect, because the CLI options can get pretty messy. Here is the .cfg I ended up with.

[global]
default-detector = detect-content
min-scene-len = 0.8s

[detect-content]
threshold = 40
weights = 1.0 1.0 0.8 0.15

[split-video]
preset = slow
rate-factor = 17
# Don't need to use quotes even if filename contains spaces
filename = $VIDEO_NAME-Clip-$SCENE_NUMBER

[save-images]
format = jpeg
quality = 80
num-images = 3
INI

The values under [detect-content] are the ones you want to tinker with. The 4 numbers in weights specify how heavily changes in hue, saturation, luminosity, and edges (respectively) count toward the metric. The threshold determines how large that weighted change needs to be before a cut is marked.

The scenedetect documentation recommends plotting the weighted metric from the stats file using pyplot. This is useful for determining the threshold, but less so for determining the weights. For me, it was quicker to guess and check the scene cuts over an hour of footage: I had the video open on one screen and the list of cuts on the other, scrubbed through the cuts I thought were important, and ended up happy with the values in the config above.
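If you do want to work from the stats file, a small pandas helper makes the guess-and-check loop faster. This sketch assumes a stats CSV generated with scenedetect’s global -s/--stats option, which contains a content_val column for detect-content; the filename below is hypothetical.

```python
import pandas as pd

def frames_over_threshold(stats: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Rows of the stats file whose content_val score exceeds the
    threshold -- i.e. the frames that would be flagged as cuts."""
    return stats[stats["content_val"] > threshold]

# Typical use, assuming a stats file generated with:
#   scenedetect -i 24hr-stream.mp4 -s 24hr-stream.stats.csv detect-content
#
#   stats = pd.read_csv("24hr-stream.stats.csv")
#   stats.plot(x="Frame Number", y="content_val")  # eyeball where the spikes sit
#   print(len(frames_over_threshold(stats, 40)))   # candidate cuts at threshold 40
```

Re-running the count at a few different thresholds tells you quickly whether a value like 40 gives hundreds of cuts or thousands, before you ever open the video.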

Using this configuration, we can run the following command to create a CSV with all the scene timestamps.

scenedetect -i 24hr-stream.mp4 -c ./scenedetect.cfg list-scenes
Bash

The Final Product

The scenedetect documentation recommended chaptertool as a way of converting its output to a format you can use in Premiere. Unfortunately, the documentation was out of date, and even when I got it working, the output was not useful.

Instead, I wrote the Python script below to convert the scenes CSV into an EDL file that Premiere can read. Set the constants at the top to your video’s filename and the editor you’re using, then run the script. You should get an .edl file in that folder, which will cut the footage at the scene changes.

Be sure that you have the edl and timecode libraries installed using pip.

# %%
import timecode
import pandas as pd
from edl import Event

VIDEO_NAME = "24hr-stream.mp4"
CSV_FILE = "24hr-stream-Scenes.csv"
OUTPUT_EDL_FILE = "24hr-stream.edl"
VIDEO_EDITOR = "premiere" # "resolve" or "premiere"

def create_edl_events(data):
    result = []
    framerate = round(data['Length (frames)'][0] / data['Length (seconds)'][0], 2)
    playhead = timecode.Timecode(framerate=framerate)

    # in Resolve, the record timecode starts at 01:00:00:00, so offset
    # by one hour's worth of frames (Timecode addition with an int adds frames)
    offset = int(round(3600 * framerate)) if VIDEO_EDITOR == "resolve" else 0
    for index, row in data.iterrows():
        comments = [f"* FROM CLIP NAME: {VIDEO_NAME}"]
        if VIDEO_EDITOR == "premiere":
            # these comments are required for each premiere event
            comments.append("* AUDIO LEVEL AT 00:00:00:00 IS -0.00 DB  (REEL AX A1)")
            comments.append("* AUDIO LEVEL AT 00:00:00:00 IS -0.00 DB  (REEL AX A2)")

        length = timecode.Timecode(framerate=framerate, frames=row['Length (frames)'])
        preprocess = lambda x: str(x).replace(";", ":") # convert ; to : for use in premiere
        event = Event({
            "clip_name": VIDEO_NAME,
            "num": f"{index + 1:03d}",
            "comments": comments,
            "rec_start_tc": preprocess(playhead + offset),
            "rec_end_tc": preprocess(playhead + length + offset),
            "src_start_tc": preprocess(playhead + offset),
            "src_end_tc": preprocess(playhead + length + offset),
            "tr_code": "C",
            "track": "V" if VIDEO_EDITOR == "resolve" else "AA/V",
            "reel": "AX",
        })
        result.append(event)
        playhead += length + timecode.Timecode(framerate=framerate, frames=1) # add one frame so the cut is recognized by premiere
    return result

def save_events_as_edl(events, filepath):
    with open(filepath, "w") as edl_file:
        edl_file.write("TITLE: " + VIDEO_NAME + "\n")
        edl_file.write("FCM: NON-DROP FRAME\n\n")

        for event in events:
            edl_file.write(event.to_string() + "\n")

if __name__ == "__main__":
    # We set the header to 1 because the first row is just a list of timecodes
    data = pd.read_csv(CSV_FILE, header=1)
    events = create_edl_events(data)
    save_events_as_edl(events, OUTPUT_EDL_FILE)
    print(f"EDL file saved as {OUTPUT_EDL_FILE}")
Python
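For reference, each event in the generated EDL looks roughly like the following (the timecodes here are made up; the reel, track, and audio-level comments match what the script writes for Premiere).

```
001  AX       AA/V  C        00:00:00:00 00:12:34:05 00:00:00:00 00:12:34:05
* FROM CLIP NAME: 24hr-stream.mp4
* AUDIO LEVEL AT 00:00:00:00 IS -0.00 DB  (REEL AX A1)
* AUDIO LEVEL AT 00:00:00:00 IS -0.00 DB  (REEL AX A2)
```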

Importing these Cuts to Premiere

Importing these cuts to Adobe Premiere is simple, but clunky.

To import the cuts into Premiere, go to File > Import... and select the EDL file created by the Python script. Select NTSC, then set up your sequence settings like you normally would. This will create a folder called “output” in your project. Under that folder are your sequence and a ton of source clips. But if you open the sequence, it will say all of your media is offline. It should look something like this.

The bottom green layer is the sequence; the rest are the source clips. To fix this, right-click the sequence and select Link Media.... You should see a screen like this.

From here, press Locate and a file browser will open. Point it to your stream VOD. This will take a while depending on your system and how many clips you have. Once it finishes, you should see a properly cut stream inside your Premiere sequence!

In the example below, it’s pretty easy to see which cuts represent gameplay and menu navigation (at least to a Melee player). Other parts of the timeline are cut where the streamer talks directly to the viewers.

Once the media is relinked, you can use Edit > Consolidate Duplicates to clean up your project browser and merge all of those separate files into one.

Conclusion and Final Thoughts

If you have a mid-powered PC, this solution is really useful for cutting out fluff and quickly finding the sections you want using the clip thumbnails.

Mid-sized streams of 4-6 hours are plenty manageable with this solution. Unfortunately, the bigger the video, the worse Adobe Premiere handles it. With the 24-hour stream I used as an example, there are over 800 clips in an 8 GB file, and relinking them all maxed out my memory before Premiere could finish.

Overall, I’ll be including this solution in my eventual desktop app.
