Cropping videos with ffmpeg, but visually?

There are many ways to skin a cat. The same thing can be said about editing videos.

This post is specifically about cropping.

By the end of it, you will have learned about some challenges of dealing with multimedia on the Web, how to solve or avoid them, what ffmpeg and SVG can do, and how we could build a graphical video editor using these technologies.

Normal way

Just use a graphical video editor like Shotcut.

Manual approach

I often use a command-line program called ffmpeg.

Conveniently, ffmpeg has a filter called crop that is used to crop stuff:
ffmpeg -i in.mp4 -filter:v "crop=out_w:out_h:x:y" out.mp4
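
To make the parameters concrete: out_w and out_h are the width and height of the region to keep, and x and y are the position of its top-left corner in the input frame. Below is a minimal sketch of assembling the command from a rectangle. The helper name buildCropCommand and the shape of the rectangle object are hypothetical, chosen only for illustration, and the file names are placeholders.

```javascript
// Hypothetical helper: turns a rectangle into an ffmpeg crop command line.
// Only the filter syntax (crop=out_w:out_h:x:y) comes from ffmpeg itself.
function buildCropCommand(rect, input = "in.mp4", output = "out.mp4") {
  // Crop values are pixel counts, so round whatever the drawing tool reports.
  const [w, h, x, y] = [rect.w, rect.h, rect.x, rect.y].map(Math.round);
  return `ffmpeg -i ${input} -filter:v "crop=${w}:${h}:${x}:${y}" ${output}`;
}

// Example: keep a 640x360 region whose top-left corner is at (100, 50).
console.log(buildCropCommand({ x: 100, y: 50, w: 640, h: 360 }));
// prints: ffmpeg -i in.mp4 -filter:v "crop=640:360:100:50" out.mp4
```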

My usual workflow:

  1. Import the snapshot to a new document.
  2. Draw a rectangle (hotkey: R).
  3. Copy the rect's values: x, y, w, and h.

Inkscape - Drawing a rectangle

Why not use a proper (GUI) video editor? Laptop too shit.

Also, this is not a huge issue since I hardly ever need to perform this. Still, I thought I might as well try to automate it... or make it "semi-automatic" at least.

Semi-automatic approach

Let's write a simple web app that helps us crop stuff using ffmpeg: ffmpeg crop helper.

The idea is simple:

Dimensions problem

Let's consider an example with specific dimensions to illustrate a possible challenge.

Original image dimensions:

Transformation:

New image dimensions:

Now, let's assume we want to define a crop area on this scaled image using traditional coordinate systems without considering scaling.

Traditional coordinates:

Challenge:

Meaning:

Also, keep in mind the size of the element (image in our example) may change dynamically due to the environment: screen rotation, styles added, etc.


Solution:
We need to convert values between coordinate systems. Geometry problem. Math. I can't be bothered with that.
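
For reference, the conversion being avoided looks roughly like this. It is a sketch under simple assumptions: the image is scaled without cropping or letterboxing, and the rectangle is measured from the top-left corner of the displayed image. The name toIntrinsicRect, the object shapes, and the numbers are made up for illustration.

```javascript
// Hypothetical helper: maps a rectangle drawn on the displayed (scaled) image
// back to the pixel grid of the original media.
function toIntrinsicRect(rect, displayed, intrinsic) {
  // Horizontal and vertical scale factors between the two coordinate systems.
  const sx = intrinsic.width / displayed.width;
  const sy = intrinsic.height / displayed.height;
  return {
    x: Math.round(rect.x * sx),
    y: Math.round(rect.y * sy),
    w: Math.round(rect.w * sx),
    h: Math.round(rect.h * sy),
  };
}

// A 1920x1080 video shown at 960x540: every on-screen value doubles.
const onScreen = { x: 50, y: 25, w: 320, h: 180 };
const displayedSize = { width: 960, height: 540 };
const intrinsicSizeOfVideo = { width: 1920, height: 1080 };
console.log(toIntrinsicRect(onScreen, displayedSize, intrinsicSizeOfVideo));
// → { x: 100, y: 50, w: 640, h: 360 }
```

The displayed size would have to be re-read every time the layout changes, which is exactly the bookkeeping worth sidestepping.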

Actual solution:
Instead, let's keep everything (the image and crop mask) in a shared coordinate system so we don't have to convert anything. It should be able to scale freely. That description kinda feels like SVG and its viewBox attribute. SVG is our canvas.

As for the original media, we can use the foreignObject SVG tag to embed an HTML video or img element. This is the original media.

A rectangle can be used to define the crop area.

Using SVG satisfies the need for a consistent coordinate system. With viewBox, we can define the coordinate system based on the original image dimensions, even when the image is scaled or transformed, ensuring accurate and consistent positioning of elements within the SVG canvas.


UX

There is no built-in way to make an element (HTML or SVG) easily resizable and movable. And, no, the CSS resize property does not help much.

To build a decent UI, we need to create resize handles (at the corners, for example) and manage the various mouse/pointer events.
I've done something similar in Java (Swing); it's bothersome and hard to get right; I don't wanna do it again, so a crude solution will have to do... for now.

Let's just display various inputs and let the user change them.

Pseudocode

<form>
    Original file:
    <input type="file" accept="image/*,video/*" />
 
    Crop area:
    Width <input value="$out_w" type="number" min="0" max="$orig_width" />
    Height <input value="$out_h" type="number" min="0" max="$orig_height" />
    X <input value="$out_x" type="number" min="0" max="$orig_width" />
    Y <input value="$out_y" type="number" min="0" max="$orig_height" />
</form>
 
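<svg viewBox="0 0 $orig_width $orig_height">
    <!-- without an explicit size, foreignObject may render nothing -->
    <foreignObject class="crop-object" width="$orig_width" height="$orig_height">
        <!-- either <img /> or <video />, depending on the selected file -->
        <img />
    </foreignObject>
 
    <rect class="crop-mask" x="$out_x" y="$out_y" width="$out_w" height="$out_h" />
</svg>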
 
<output>
    <code>
    ffmpeg
        -i in.mp4
        <b>-filter:v "crop=$out_w:$out_h:$out_x:$out_y"</b>
        out.mp4
    </code>
</output>
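
The max values on the inputs above are static upper bounds, but the real limits depend on each other: X can go no further than the original width minus the crop width, and likewise for Y. One possible way to enforce that in script is sketched below. The helper name clampCrop is hypothetical, and the 1-pixel minimum size is an assumption of this sketch.

```javascript
// Hypothetical helper: keeps the crop rectangle inside the original media.
function clampCrop(crop, orig) {
  const clamp = (value, min, max) => Math.min(Math.max(value, min), max);
  // Size first: at least 1 pixel (an assumption), at most the whole frame.
  const w = clamp(crop.w, 1, orig.width);
  const h = clamp(crop.h, 1, orig.height);
  // Then position: the rectangle must not cross the right or bottom edge.
  const x = clamp(crop.x, 0, orig.width - w);
  const y = clamp(crop.y, 0, orig.height - h);
  return { w, h, x, y };
}

// A 640x360 crop pushed too far right on a 1280x720 frame snaps back to x = 640.
console.log(clampCrop({ w: 640, h: 360, x: 900, y: 100 }, { width: 1280, height: 720 }));
// → { w: 640, h: 360, x: 640, y: 100 }
```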

Some implementation details

Some implementation details (or problems) worth mentioning:

When:

Intrinsic dimensions:

Why couldn't both img and video properties be called intrinsicWidth and intrinsicHeight? Web things, I guess.
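
For reference, img elements expose naturalWidth and naturalHeight, while video elements expose videoWidth and videoHeight. Here is a small sketch that hides the difference. The helper name intrinsicSize is hypothetical, and since it only reads those four standard properties, plain objects stand in for real elements below.

```javascript
// Hypothetical helper: returns the intrinsic size of an <img> or <video>.
// The values are 0 until the browser has loaded the image or video metadata.
function intrinsicSize(media) {
  if ("videoWidth" in media) {
    return { width: media.videoWidth, height: media.videoHeight };
  }
  return { width: media.naturalWidth, height: media.naturalHeight };
}

// Plain objects stand in for real elements here.
console.log(intrinsicSize({ naturalWidth: 800, naturalHeight: 600 }));
// → { width: 800, height: 600 }
console.log(intrinsicSize({ videoWidth: 1920, videoHeight: 1080 }));
// → { width: 1920, height: 1080 }
```

In a browser, these values would typically be read after the load event for images and the loadedmetadata event for videos.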

Rough implementation

I built a rough implementation in Vue.

It works as described above and supports both images and videos.

Screenshots: selected nothing; selected an image; selected a video.

Possible improvements:

It's unlikely that I implement them anytime soon. After all, this was just a proof-of-concept.

What concept, you ask? I'll answer that in a bit.

Existing solutions

Many existing projects already address the "possible improvements" mentioned above.

Other than dedicated video editors, some open-source projects serve as graphical user interfaces for ffmpeg:

Conclusion

Takeaway: There are many ways to solve a problem. Alice to Cheshire Cat: And if not, there may be more than one way to skin a cat, if you'll pardon the expression.

Also, FFmpeg and SVG are cool. JavaScript, not so much.

The goal of this project was to test the concept of using SVG to avoid coordinate transformation challenges while creating a video editor. The proof-of-concept works as expected.

If you are curious, you can check it: