
Reframing video with AI: vertical, square, widescreen without recropping

One shoot, every aspect ratio. That's the promise of AI reframing, and the gap between the promise and reality is where most creators waste an afternoon.

Reframing is the process of taking a video shot in one aspect ratio (usually 16:9) and producing versions in others (9:16, 1:1, 4:5) without manually re-editing each one. Done well, it tracks the subject, anticipates motion, and respects safe zones for captions and platform UI. Done badly, it crops the speaker's forehead off and centers a wall.
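To make the core operation concrete, here is a minimal sketch of the crop math, assuming nothing more than a fixed-ratio window centered on a tracked subject and clamped to the frame. The function name and values are illustrative, not any particular tool's API.

```python
# Illustrative sketch: place a 9:16 crop window inside a 16:9 frame,
# centered on a tracked subject and clamped so it never leaves the frame.

def crop_window(src_w, src_h, target_ratio, subject_x):
    """Return (x, y, w, h) of a crop with width:height = target_ratio,
    horizontally centered on subject_x where possible."""
    crop_h = src_h                          # vertical crops keep the full source height
    crop_w = round(crop_h * target_ratio)   # 2160 * 9/16 = 1215 for a 4K source
    x = subject_x - crop_w // 2             # center the window on the subject
    x = max(0, min(x, src_w - crop_w))      # clamp inside the frame
    return x, 0, crop_w, crop_h

# 4K 16:9 source, subject tracked at x = 2600 px
print(crop_window(3840, 2160, 9 / 16, 2600))   # (1993, 0, 1215, 2160)
```

Everything interesting in real reframing lives in how that subject position is chosen and how it moves over time; the rest of this piece is mostly about getting that part wrong or right.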

The difference between good and bad reframing is the difference between one shoot covering every distribution channel and three separate edits with three separate timelines.

Why reframing matters now

Distribution fragmented. The same brand video needs to live on TikTok (9:16), Instagram Feed (1:1 or 4:5), Instagram Reels (9:16), YouTube Shorts (9:16), YouTube long-form (16:9), LinkedIn (1:1 or 16:9), and X (16:9 or 1:1). That's seven placements across five platforms and four distinct aspect ratios, minimum.

Two ways to handle this. Shoot each format separately. Or shoot once at the highest quality and reframe.

The first approach is how teams used to work. It's expensive, slow, and produces inconsistent edits across formats. The second approach is the default now, and AI reframing is what makes it viable. It's one of the most common workflows we see across Poolday use cases, particularly for mobile gaming UA and podcast cutdowns.

What good reframing actually does

  • Tracks the active subject across the frame, not just the center. If a presenter walks from left to right, the frame follows them.
  • Switches focus when the subject changes. If two people talk, the frame should follow whoever is speaking. This requires speaker detection, not just face detection.
  • Respects platform-specific UI overlays. TikTok's right-side icons eat 12% of the screen. Instagram Reels eats less. YouTube Shorts has its own safe zone. A reframe that ignores all of them produces unusable output.
  • Holds wide enough on multi-person shots. Pulling tight on a single face during a group conversation looks worse than a static center crop. Sometimes the right answer is to not move.
  • Smooths motion paths. The crop window should glide, not snap. Sudden jumps read as broken even when the tracking is correct. A minimal smoothing sketch follows after this list.
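The last item on that list is the easiest one to show in code. Below is a minimal sketch of crop-path smoothing, assuming per-frame subject centers from whatever tracker you use; the exponential glide and deadband values are illustrative, not any product's actual tuning.

```python
# Illustrative sketch: smooth a per-frame crop path so the window glides
# instead of snapping. The tracker and its output are assumed, not shown.

def smooth_path(subject_centers, alpha=0.15, deadband=12):
    """Exponentially ease the crop center toward the subject,
    ignoring drift smaller than `deadband` pixels."""
    smoothed = []
    current = subject_centers[0]
    for target in subject_centers:
        if abs(target - current) > deadband:       # only chase real movement
            current += alpha * (target - current)  # glide toward the new center
        smoothed.append(current)
    return smoothed

# Raw tracker output jitters by a few pixels, then the subject actually moves.
raw = [960, 963, 958, 961, 1180, 1185, 1179, 1182]
print([round(x) for x in smooth_path(raw)])   # [960, 960, 960, 960, 993, 1022, 1045, 1066]
```

The deadband is what keeps a slightly off-center but steady crop from vibrating: the window only moves when the subject genuinely does.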

Common reframing failures

Center-crop fallback. The system gives up on tracking and just centers the frame. Acceptable for static shots. Awful for anything with motion. This is the most common failure mode in cheaper tools.

Jitter. The crop window shakes as the model second-guesses itself frame to frame. Smooth motion paths matter more than perfect subject centering. A slightly off-center crop that holds steady beats a perfectly centered crop that vibrates.

Text and graphics get clipped. Lower-thirds positioned for 16:9 vanish in 9:16. The reframe should detect on-screen text and either preserve it or remove it cleanly. Most tools just crop through it.

Action getting cut. A pour shot reframed for vertical that loses the receiving glass is a broken edit. The system has to understand what the shot is about, not just where the subject's face is.

Boundary failures. The first and last 2 seconds of any clip are where reframe systems are weakest. Tracking takes a moment to lock on, and clips often end on a frame the model has lost confidence in.
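One common mitigation, offered here as a sketch rather than anyone's shipping behavior, is to hold a static center crop until tracking confidence has been solid for a few consecutive frames, and to do the same in reverse at the tail of the clip.

```python
# Illustrative sketch, assuming a per-frame tracker confidence score in [0, 1].

def lock_frame_index(confidences, threshold=0.8, needed=5):
    """Return the first frame where tracking has been confident for
    `needed` consecutive frames, or None if it never locks on."""
    run = 0
    for i, c in enumerate(confidences):
        run = run + 1 if c >= threshold else 0
        if run >= needed:
            return i - needed + 1
    return None

conf = [0.2, 0.4, 0.85, 0.9, 0.6, 0.88, 0.91, 0.9, 0.93, 0.95]
print(lock_frame_index(conf))   # 5 -- hold a center crop for frames 0-4
```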

Caption collisions. Burned-in captions positioned for 16:9 land on top of the speaker's mouth in 9:16. The fix is to regenerate captions per aspect ratio, not crop the original.

How Poolday handles it

Poolday treats reframing as a multi-step decision, not a single crop pass. The system identifies subjects, predicts motion, picks the active speaker, checks for on-screen text and graphics, then produces a smoothed crop path tuned for the target platform's safe zones. You review the result on a single screen with all aspect ratios side by side, and you can override any decision the system made. This is the multi-agent approach applied to one specific task: many specialized models, one coordinated output.
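One piece of that pipeline is easy to make concrete: reserving space for platform UI before placing the subject or captions. A minimal sketch follows, with placeholder margins; these are assumptions for illustration, not official platform specs or Poolday's internals.

```python
# Hypothetical safe-zone margins: fractions of the frame reserved for platform UI.
SAFE_MARGINS = {          # (right, bottom)
    "tiktok": (0.12, 0.10),
    "reels":  (0.08, 0.12),
    "shorts": (0.10, 0.08),
}

def usable_region(frame_w, frame_h, platform):
    """Return (x, y, w, h) of the area left once platform overlays are reserved."""
    right_frac, bottom_frac = SAFE_MARGINS[platform]
    return (0, 0,
            int(frame_w * (1 - right_frac)),
            int(frame_h * (1 - bottom_frac)))

print(usable_region(1080, 1920, "tiktok"))   # (0, 0, 950, 1728)
```

A crop path scored against a region like this is how a system avoids parking the speaker's face under the like button.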

That last part matters more than it sounds. Reframing, like every other editing task, can be fully automated by an agent with the right skills. A well-built agentic system hits 100% autonomy on the mechanical work: subject tracking, motion smoothing, safe-zone respect, text handling. What still belongs to the human is taste, and taste is a separate axis from autonomy. A system can be 100% autonomous and still hand you the controls when you want to make a different call. The right question isn't "how much can the agent do." It's "how easily can you change its mind." We covered the framework behind this in the autonomy ratio piece.

A practical checklist

  • Always preview every aspect ratio before exporting. Don't trust thumbnails.
  • Watch with sound on. Lip sync issues sometimes only surface when the frame moves and the speaker drifts off-center.
  • Check the first and last 2 seconds of every clip. Reframe systems are weakest at boundaries.
  • For text-heavy videos, regenerate captions per aspect ratio rather than reusing. Burned-in captions designed for 16:9 will fight every other format.
  • For multi-speaker scenes, verify the speaker switches landed. Single-face tracking systems often miss these.
  • Export at the platform's native resolution, not a raw downscale of the 4K master: 1080x1920 for vertical, 1080x1080 for square. Reframing at the wrong resolution costs sharpness (see the arithmetic sketch after this list).
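The resolution point in that last item is just arithmetic. A quick sketch, assuming the standard crop: a vertical 9:16 window keeps the full source height and roughly 9/16 of it as width, which is why a 1080p source can't yield a sharp 1080x1920 vertical.

```python
# Illustrative sketch of the crop arithmetic behind the 4K recommendation.

def vertical_crop_size(src_w, src_h):
    """Size of a 9:16 crop that keeps the full source height."""
    crop_w = round(src_h * 9 / 16)
    return min(crop_w, src_w), src_h

print(vertical_crop_size(3840, 2160))   # (1215, 2160) -- downscales cleanly to 1080x1920
print(vertical_crop_size(1920, 1080))   # (608, 1080)  -- must be upscaled, so it goes soft
```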

What this unlocks

Get this right and one shoot covers TikTok, Reels, Shorts, YouTube, LinkedIn, and a podcast clip without anyone touching a timeline. The difference between a team that reframes well and a team that doesn't is roughly 4x in output volume from the same shoot day.

Multiply that across a year of content and reframing stops being a feature. It becomes the production model. Pair it with agent-driven sound design and multilingual localization and one shoot day produces a quarter's worth of distribution.

FAQ

What is AI video reframing? The automated process of taking a video shot in one aspect ratio (usually 16:9) and producing versions in others (9:16, 1:1, 4:5) by intelligently tracking subjects, smoothing motion, and respecting platform safe zones. Good reframing removes the need to re-edit each format manually.

Which aspect ratios should I export to? For most brands: 9:16 (TikTok, Reels, Shorts), 1:1 (Instagram Feed, LinkedIn), 4:5 (Instagram Feed alternative), and 16:9 (YouTube long-form, LinkedIn, X). Those four exports cover most distribution.

Does AI reframing work on multi-person shots? Yes, if the system supports active-speaker detection. Single-face tracking systems handle one person well and fail on group conversations. Agentic systems that route between detection models do meaningfully better.

Should I shoot 4K to reframe later? Yes for vertical extraction from horizontal source. Cropping a 1080p horizontal video to vertical leaves you with sub-1080p output. 4K source produces clean 1080x1920 verticals.

Why does my reframed video look jittery? The system is second-guessing itself frame to frame. Smooth motion paths require temporal consistency, which cheaper tools skip. Look for systems that prioritize crop-path smoothing over perfect subject centering.

Can AI reframing handle on-screen text and graphics? Better systems detect text and either preserve it within the new frame or remove it cleanly so you can regenerate captions per aspect ratio. Cheaper tools crop through text without detecting it.

How long does reframing a 60-second video take? With a good agentic system, under a minute per aspect ratio. Manual reframing in a traditional editor takes 20 to 40 minutes per format. Six formats manually is a half-day of work.


One shoot, every format. See how the agent handles reframing on real projects.

Ready to automate your video editing?

Poolday's AI agents handle the full workflow. Request access and see results on your own assets within 24 hours.