ElevenLabs Scribe v2 hits 5% English WER – 90+ language STT
Executive Summary
ElevenLabs’ Scribe v2 targets post and localization teams with a speech‑to‑text model claiming ~5% word error rate in English and sub‑10% in Hindi across 90+ languages; it handles single files up to 10 hours and now underpins subtitles and transcripts inside ElevenLabs Studio plus a public API. Keyterm Prompting lets users bias up to 100 custom terms toward correct spellings for brand and character names; Entity Detection flags 56 PII/health/payment categories with timestamps; Smart Speaker Diarization, word‑level timing, and sound‑event tags collapse what was previously a multi‑tool captioning stack. ElevenLabs leans on SOC 2, ISO27001, PCI DSS L1, HIPAA, and GDPR compliance plus EU/India residency and zero‑retention modes, while conceding low‑resource languages still lag.
• Motion, cameras, infra: Kling 2.6 Motion Control evolves with JSON‑style prompts, dance‑cloning recipes, and HeyGlif’s Lego‑stairs and Room Renovator agents; Qwen‑Image Multiple Angles expands into a 96‑pose Gaussian‑splat‑trained camera rig plus fal’s Qwen→Kling fashion‑video pipeline; MongoDB + Voyage collapse RAG to two calls, citing ~140 ms latency and 83–99% storage cuts, while MiniMax’s $619M HKEX IPO and ComfyUI’s 2025 production reel signal both lean labs and open node‑graphs maturing into studio‑grade infrastructure.
Together these moves tighten the creative stack—from speech capture to camera planning to retrieval and training ops—but many claims (WER rankings, RAG cost reductions, routing/RL benefits, rumored DeepSeek V4 coding gains) still lack broad independent benchmarks.
Feature Spotlight
Scribe v2 sets the new subtitle/transcription bar
ElevenLabs Scribe v2 delivers state‑of‑the‑art accuracy (~5% WER EN), 90+ languages, Keyterm Prompting, Entity Detection, diarization, and enterprise compliance—turning transcription/subtitling into a reliable, scalable building block for creators.
🗣️ Scribe v2 sets the new subtitle/transcription bar
Major STT release for post teams: ElevenLabs Scribe v2 lands with benchmark‑low WER, multi‑language, diarization, and enterprise controls. This thread dominated today; expect immediate impact on captions, dubbing prep, and archival workflows.
ElevenLabs Scribe v2 pushes STT accuracy and control for post workflows
Scribe v2 (ElevenLabs): ElevenLabs releases Scribe v2, describing it as its most accurate transcription model with around 5% word error rate in English and under 10% in Hindi across 90+ languages, aimed at large‑scale captioning, subtitling, and archival use according to the recap in feature summary and the launch details in launch thread; the model supports 10‑hour files, is already powering subtitles and transcripts inside ElevenLabs Studio, and is exposed via an API documented in the stt docs. Creators highlighting the launch note strong quality gains but still call out weaknesses on niche languages, framing it as a big step rather than a solved problem yet feature summary.
• Accuracy and language coverage: The company and external recap both cite ~5% WER for English and sub‑10% for languages like Hindi across more than 90 languages, positioning Scribe v2 near the top of current STT options for global media catalogs and localization tasks feature summary, launch thread .
• Creative‑friendly controls: Keyterm Prompting lets users specify up to 100 custom terms (brand names, character names, in‑world jargon) that Scribe v2 will prefer when contextually appropriate, while Entity Detection tags up to 56 categories of PII, health, and payment data with timestamps to drive redaction, compliance edits, or alternate cuts feature summary, launch thread .
• Timing, diarization, and events: Smart Speaker Diarization assigns lines to individual voices, word‑level timestamps align tightly with picture for subtitles, and dynamic audio tagging adds labels for sound events like laughter or footsteps, giving post teams more structured metadata in a single pass rather than juggling multiple tools launch thread.
• Enterprise posture and limits: ElevenLabs emphasizes SOC 2, ISO27001, PCI DSS L1, HIPAA, and GDPR compliance, plus EU and India data residency and a zero‑retention mode for sensitive work, while also acknowledging that low‑resource and niche languages remain noticeably weaker than the main set despite overall gains feature summary, launch thread .
For AI‑driven film, creator, and music pipelines, the combination of relatively low error rates, glossary‑like term control, rich diarization and tagging, and an enterprise‑ready posture means Scribe v2 can slot directly into captioning, dubbing prep, podcast clipping, and compliance review flows that previously depended on a patchwork of less accurate or less controlled tools.
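To make the workflow concrete, here is a minimal sketch of sending a file to the ElevenLabs speech‑to‑text endpoint with diarization, audio‑event tagging, and keyterm biasing enabled. The "scribe_v2" model id, the keyterms field, and the response field names are assumptions based on the launch description rather than confirmed parameters; the stt docs remain the source of truth.

```python
# Hedged sketch: uploading a file to the ElevenLabs speech-to-text API.
# "scribe_v2", "keyterms", and the response fields below are assumptions drawn
# from the launch description -- check the official STT docs for exact names.
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"  # placeholder

with open("episode_042.wav", "rb") as audio:
    resp = requests.post(
        "https://api.elevenlabs.io/v1/speech-to-text",
        headers={"xi-api-key": API_KEY},
        data={
            "model_id": "scribe_v2",        # assumed id for the new model
            "diarize": "true",              # Smart Speaker Diarization
            "tag_audio_events": "true",     # sound-event tags (laughter, footsteps)
            # Hypothetical field for Keyterm Prompting (up to 100 custom terms):
            "keyterms": "Higgsfield,Nano Banana,Kling",
        },
        files={"file": audio},
    )

resp.raise_for_status()
transcript = resp.json()
# Word-level timestamps and speaker labels are what subtitle and dubbing-prep
# tools would consume downstream (field names may differ from this sketch).
for word in transcript.get("words", [])[:10]:
    print(word.get("start"), word.get("speaker_id"), word.get("text"))
```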
🎥 Animate and swap performances: Wan2.2 and studio BTS
Practical video tools and pipelines for filmmakers: Runware’s Wan 2.2 Animate pricing and links, plus Autodesk Flow Studio BTS. Also notes on Google Vids avatar upgrades. Excludes Scribe v2 (feature).
Runware launches Wan 2.2 Animate for cheap character animation and swaps
Wan 2.2 Animate (Runware): Runware added Wan 2.2 Animate to its lineup as a character animation and swap service that turns still images into moving characters or replaces actors in existing footage, with pricing starting at $0.010 per second according to the launch note Wan launch tweet. The direct model page shows it supports motion driven by a reference video while keeping expressions and body movement consistent, with output resolutions up to 720p and several preset durations, as laid out in the official listing Wan model page.

Why it matters for filmmakers and designers: This gives small teams an API-style way to bring illustrated or photoreal characters to life or to swap them into a live-action plate without running their own video models, and the per-second pricing makes it practical for short social clips, UGC-style ads, or animatics rather than only big-budget work Wan launch tweet.
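As a rough budgeting aid, the quoted starting price works out as follows; actual cost will depend on the resolution and duration presets listed on the model page.

```python
# Back-of-envelope cost check for Wan 2.2 Animate on Runware, using the quoted
# starting price of $0.010 per output second. Real pricing may vary by preset.
PRICE_PER_SECOND = 0.010  # USD, starting price from the launch note

clips = {
    "15s social cut": 15,
    "30s UGC-style ad": 30,
    "90s animatic pass": 90,
}

for name, seconds in clips.items():
    print(f"{name}: ~${seconds * PRICE_PER_SECOND:.2f}")
# 15s ~= $0.15, 30s ~= $0.30, 90s ~= $0.90 at the starting rate
```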
Google Vids upgrades AI avatars with Veo 3.1 for more natural training videos
AI avatars in Google Vids (Google): Commentators report that Google Vids now runs its AI talking-head avatars on Veo 3.1, bringing more realistic faces, smoother lip-sync, and less robotic expressions, with the new behavior aimed squarely at corporate training, onboarding, and internal update videos Google Vids avatar note. The framing is that teams can create professional explainer clips in minutes with no camera or studio setup, as the avatars deliver scripts with improved emotional timing and mouth movements that better match speech Google Vids avatar note.

For content teams and educators, this means Google’s slide-and-script tool is edging closer to broadcast-style presenters powered entirely by video models, which directly competes with dedicated avatar platforms while living inside the broader Workspace stack.
Autodesk Flow Studio shows AI mocap plus Golaem crowds in new fight short BTS
Flow Studio fight pipelines (Autodesk): Autodesk shared another behind-the-scenes look at Flow Studio powering action-heavy shorts, this time for The Circle Forms Us, where AI motion capture from Flow Studio is refined in Maya and paired with Golaem for dense chanting crowds, following up on earlier anime-style fight previs in Both of Me anime fight BTS that used Flow Studio with 3ds Max Circle Forms Us tease.

• Hybrid workflow: The new clip highlights a typical game-pipeline-style stack where Flow Studio captures a solo fighter’s performance, Maya handles motion cleanup and stylization, and Golaem instantiates and simulates a full arena crowd reacting in sync with the action Circle Forms Us tease.
• Signal for AI mocap: Compared with the prior Both of Me breakdown, which focused on one-on-one combat, this shows Flow Studio being used not only for hero motion but as the driver for large-scale crowd shots that would traditionally need separate extras or hand-keyed loops Circle Forms Us tease.
For AI-first storytellers, the pattern is that Flow Studio is moving from experimental tests into full sequences that mirror real production layouts, especially for fight scenes and stadium-scale environments.
🧩 Motion‑control directing: Kling 2.6 recipes and agents
Continues the Motion Control wave with hands‑on directing tips: cloning performance from a source video, staircase grounding tests, and one‑click real‑estate makeover agents. Excludes Scribe v2 (feature).
HeyGlif’s Kling Motion Control agent stress‑tests grounded steps on Lego stairs
Kling Motion Control agent stairs test (HeyGlif): HeyGlif extends its Kling 2.6 Motion Control agent—following up on directing tips that formalized reusable prompt templates—with a new demo where a toy soldier walks down a spiral staircase made of Lego bricks, emphasizing heavy, grounded footsteps and strict contact on each step as shown in stairs demo.

• High-level prompt as direction: The agent’s internal prompt, shared in stairs prompt, spells out side-view framing, spiral Lego stairs, “heavy, grounded footsteps,” and “no floating,” illustrating how directors can encode physics expectations and camera notes into a single Motion Control instruction.
• Reusable agent surface: HeyGlif exposes this as a ready-made "Kling Motion Control" agent so users can swap in different characters or scenes while keeping the same grounding logic, with configuration and run details available on the agent page in agent page.
Techhalla shares hands-on Kling 2.6 Motion Control dance‑cloning recipe
Kling 2.6 Motion Control dance cloning (Techhalla/Kling): Techhalla walks through a full workflow for cloning a Macarena-style dance onto a different character in Kling 2.6 Motion Control, using a source performance video, a single reference image, and a detailed prompt to hit both body and facial performance, as explained in the step-by-step thread Macarena tutorial and summarized in the follow-up recap workflow recap.

• Practical directing pattern: The recipe shown in Macarena tutorial has creators pick model 2.6, switch to Motion Control mode, upload a dance clip plus a still of the target performer, then drive the result with a descriptive motion prompt—mirroring studio-style retargeting without a mocap stage.
• Validation from Kling: Kling’s own account boosts the tutorial in the quoted thread workflow recap, reinforcing this ref-video-plus-photo setup as a recommended way for filmmakers and performers to get precise, character-specific choreography out of Kling 2.6.
JSON-structured prompts emerge as control hack for Kling 2.6 Motion Control
JSON prompt patterns for Kling 2.6 (CharaspowerAI/Kling): Kling_ai amplifies CharaspowerAI’s claim that JSON-formatted prompts give “incredibly powerful” control over Kling 2.6 Motion Control, letting creators specify motion beats, style, and scene constraints in structured fields instead of loose prose, according to the shared observation in json prompt claim.

• More deterministic motion notes: The JSON approach described in json prompt claim frames movement, timing, and sometimes camera notes as key–value pairs, which practitioners say reduces ambiguity and helps the model honor detailed direction.
• Expressive output examples: Official demos like the "cute kitty dance" loop generated from a character image plus motion template kitty dance demo show how tightly scoped prompts can still produce playful, on-model animation, and JSON structures are being pitched as a way to make that behavior more repeatable for complex shots.
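For readers who want to try the pattern, below is a purely illustrative Python sketch of a JSON‑structured Motion Control prompt. Kling publishes no JSON schema, so every field name here is hypothetical; only the key–value structuring idea comes from the shared observation.

```python
# Illustrative only: there is no documented JSON schema for Kling 2.6 prompts.
# The field names are hypothetical; the point is structuring motion beats,
# style, and camera notes as key-value pairs instead of loose prose.
import json

motion_prompt = {
    "subject": "toy soldier",
    "scene": "spiral staircase built from Lego bricks",
    "camera": {"framing": "side view", "movement": "slow follow"},
    "motion": {
        "beats": [
            "step down one stair at a time",
            "pause briefly on each landing",
        ],
        "constraints": [
            "heavy, grounded footsteps",
            "no floating",
            "strict foot contact on every step",
        ],
    },
    "style": "miniature diorama, warm practical lighting",
}

# Paste the serialized object into the Motion Control prompt field.
print(json.dumps(motion_prompt, indent=2))
```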
HeyGlif’s Room Renovator agent auto-directs before/after real-estate makeover videos
Room Renovator agent for real-estate clips (HeyGlif): HeyGlif showcases a "Room Renovator" agent that turns real before-and-after room photos into complete renovation-style videos, automatically handling image generation, video sequences, music selection, and final edits from a single high-level request, as demonstrated in the workflow clip room workflow.

• Single-agent production loop: The agent orchestrates still-image creation, transformation into pans and transitions, soundtrack, and pacing in one graph, so the user only provides the renovation idea and source photos while the system outputs a ready-to-share walkthrough, with more details and an interactive interface shown on the Glif page in agent page.
🖼️ One‑image storyboards + reusable styles and prompts
Freepik’s Variations turns a single frame into boards/angles/animations; plus new Midjourney srefs and prompt packs (fog lighting, premium product macro). Strong day for look dev templates.
Freepik’s Variations turns one image into storyboards and variants
Variations (Freepik): Freepik rolled out Variations inside its AI Suite, letting creatives start from a single frame and generate multi‑panel storyboards, alternate camera angles, expression and age changes, and even animated sequences from that one base visual, as shown in the launch demo in the Freepik announcement and the feature rundown by Eugenio Fierro in the Eugenio overview; the feature is positioned as a way to expand one strong keyframe into a whole narrative without constant re‑prompting.

• Storyboard and character work: Variations is pitched for turning a key cinematic frame into full sequences with controlled angle shifts and character tweaks, rather than starting each shot from scratch, according to the Freepik announcement and the workflow notes in the Eugenio overview.
Midjourney sref 4221689279 delivers narrative ink and watercolor sketchbook look
Midjourney sref 4221689279 (Artedeingenio): Another Midjourney style reference, --sref 4221689279, focuses on narrative ink‑and‑watercolor illustration with visible cross‑hatching, paper texture, and cinematic framing, pitched for graphic novel covers, illustrated books, indie posters, and rotoscope‑style shorts in the description and examples in the style thread.
• Visual traits: The samples show tight character portraits with sketchbook borders, rain‑streaked alley scenes, and umbrella close‑ups that mix traditional inking with soft washes, giving a hand‑drawn look that still reads clearly in sequential panels, as seen in the style thread.
Midjourney sref 5321982507 captures dark 90s seinen anime look
Midjourney sref 5321982507 (Artedeingenio): Oscar Artedeingenio introduced Midjourney style reference --sref 5321982507 as a dark seinen anime language with a 90s OVA feel and modern cinematic treatment, stressing high and low camera angles, simulated depth of field, volumetric lighting, and sharp warm‑vs‑cool contrast in the sample gunman, vampire, swordsman, and femme fatale portraits shown in the style announcement.
• Use cases: The creator frames this sref as suited for mature anime storytelling, with expressive violence and dramatic framing that can carry thrillers, action shorts, or moody character studies, according to the notes in the style announcement.
Nano Banana Pro gets reusable premium food macro prompt
Nano Banana Pro prompt (Nano Banana): Prompt designer Azed shared a detailed "premium food advertising" recipe for Nano Banana Pro, specifying a white seamless background, high‑key studio lighting, floating stacked composition, scattered ingredient bits, 100mm macro look at f/8, and 8K output so food products render as crisp levitating stacks with subtle texture, as spelled out in the full text prompt in the prompt breakdown.
Volumetric Fog Lighting prompt offers reusable cinematic mood template
Volumetric Fog Lighting prompt (azed_ai): Azed released a reusable prompt for "Volumetric Fog Lighting" that can wrap any subject—brides, forest spirits, knights, swordsmen—in layered fog, warm backlight rays, and floating particles, creating atmospheric depth with strong directional light and silhouettes, as demonstrated across four cinematic samples in the example set.
• Atmospheric building block: The template focuses on mood ingredients (volumetric rays, dense fog layers, backlit particles) rather than style keywords, so artists can drop in their own subjects and settings while keeping a consistent lighting language, as seen in the example set.
Magnific AI Skin Enhancer workflow shared for refining MJ and Grok portraits
Skin Enhancer workflow (Magnific AI): AI ArtworkGen walked through using Magnific AI’s Skin Enhancer spell as a reusable finishing step, taking a single portrait from Midjourney or Grok and re‑rendering it with more detailed skin, refined grain, and filmic polish: pick the Skin Enhancer spell, choose v1 creative or v1 faithful, and tune the sharpness and grain sliders (0% and 12% in the example), as outlined in the side‑by‑side comparisons in the workflow overview and the additional Grok and Nano Banana tests in the extra results.
• Template settings: The thread suggests v1 creative for looser beautifying changes and v1 faithful for closer adherence to the input, framing both as plug‑and‑play recipes that can be applied across different generators for consistent portrait upgrades, based on the notes in the summary examples.
Midjourney sref 5548750956 offers bold anime fashion and portrait style
Midjourney sref 5548750956 (azed_ai): Azed debuted Midjourney style reference --sref 5548750956, which renders subjects—from boxers in the ring to Joker‑style clowns, caped walkers, and crowded city commuters—in a cohesive anime/graphic novel look with strong spotlights, deep shadows, and saturated reds and teals, as showcased in the multi‑image gallery in the style gallery.
• Fashion and character fit: The creator nudges people to apply this sref to fashion art, since it holds up across character close‑ups and full‑body silhouettes while keeping backgrounds abstracted and on‑brand in the follow‑up prompt.
Sci‑fi 3×3 grid prompt turns films into cohesive icon sets
Sci‑fi 3×3 icon grid prompt (fofrAI): Fofr highlighted a single prompt that creates 3×3 grids of colorful, tactile 3D icons representing famous sci‑fi movies on a clean white background, with each tile echoing a film like Blade Runner, 2001, The Matrix, or Her while remaining part of one coherent visual system, as demonstrated in the sample grid in the icon grid example.
• Prompt pattern: The shared text asks for "a collection of icons representing interesting scifi movies" in a unified, text‑free, 3D style, which can be adapted by swapping genre or theme to generate consistent icon packs from a single instruction, according to the description in the icon grid example.
Shopify Editions-inspired Midjourney style blends painterly portraits with modern brands
Shopify Editions-inspired style (gcwalther_x): GC Walther surfaced a Midjourney look modeled on Shopify Editions artwork, combining classical painting techniques with modern products—Coke bottles, laptops, sneakers, and fashion—so outputs feel like oil portraits that also act as brand visuals, as illustrated in the four‑image collage in the style collage.
• Brand storytelling angle: The examples show royal‑style drink ads, Renaissance‑meets‑MacBook portraits, flying figures with tiny skateboards, and Nike‑branded headwear, hinting at a reusable template for campaigns that want "museum painting" energy around digital products, per the visuals in the style collage.
🎵 Song generation lands on fal (voice stays separate)
Music creation news for storytellers: ElevenLabs Music becomes available on fal with multilingual vocals and narrative tone sync, plus section‑level editing. Excludes Scribe v2 transcription (covered as the feature).
ElevenLabs Music arrives on fal for text-to-song creation with fine control
ElevenLabs Music (fal): fal has integrated ElevenLabs Music so creators can generate full songs from simple text prompts, with multilingual vocals and narrative tone-sync features exposed directly in fal’s workflow UI, as shown in the launch clip. The integration also highlights section‑by‑section song editing, giving filmmakers and storytellers precise control over structure (verses, choruses, bridges) rather than one monolithic generation, which targets use cases like trailers, UGC spots, and background scores where pacing and emotional beats matter.
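A hedged sketch of what calling the integration could look like from fal’s Python client is below; fal_client.subscribe is a real call, but the endpoint id and argument names shown are assumptions, so check fal’s model page for the exact values.

```python
# Hedged sketch using fal's Python client. fal_client.subscribe exists, but the
# endpoint id and argument names below are assumptions for ElevenLabs Music.
import fal_client

result = fal_client.subscribe(
    "fal-ai/elevenlabs/music",          # hypothetical endpoint id
    arguments={
        "prompt": (
            "uplifting indie-pop trailer cue, female vocals in Spanish, "
            "builds from a quiet verse into a wide, anthemic chorus"
        ),
        # Section-level structure control (verses, choruses, bridges) is part of
        # the launch pitch; its exact parameter shape is not documented here.
    },
)
print(result)  # typically includes a URL or payload for the generated audio
```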

📐 True camera angles: multi‑angle LoRA and 3D consistency
For consistent characters across shots: Qwen Image Edit Multiple‑Angles LoRA pipelines and a 96‑pose camera rig trained on Gaussian splats for stronger 3D coherence.
Qwen multi-angle LoRA grows into 96‑pose camera rig and fashion video pipeline
Qwen multi-angle rigs (fal/Qwen): The multi-angle LoRA for Qwen-Image-Edit-2511 is now driving both a practical fashion-video pipeline and a dedicated 96‑pose camera controller, following up on multi-angle lora which first covered the open-sourced Multiple Angles LoRA; fal shows a workflow where Qwen Image 2512 generates base shots, the Multiple Angles 2511 LoRA produces varied viewpoints, and Kling Video compiles them into smooth multi-angle clips built in minutes via the fal API wired into Cursor according to the fal multi-angle demo. Eugenio Fierro highlights a separate Hugging Face Space that adds "real 3D camera control" on top of Alibaba Qwen Image Edit 2511, offering 96 discrete camera poses (4 elevations × 8 azimuths × 3 distances) trained on 3,000+ Gaussian Splatting renders for stronger 3D consistency and good low-angle (−30°) support, as explained in the camera control thread and detailed on the hf camera space.

• Fashion video pipeline: fal’s example sequences Qwen Image 2512 → Multiple Angles 2511 → Kling Video to generate multiple shots with smooth camera rotations from a single fashion prompt, then picks the best samples into a final reel, all orchestrated through a small script that calls fal’s APIs inside Cursor, as shown in the fal multi-angle demo.
• 96-pose camera rig: The new controller exposes a grid of predefined camera positions around a subject—four height tiers, eight around-the-object angles, and three distances—built from thousands of Gaussian-splat renders to keep characters and props consistent frame to frame, with examples emphasizing stable silhouettes even at steep low angles in the camera control thread and on the hf camera space.
• Storyboard and shot design use: Together these tools push the Qwen multi-angle ecosystem from a research LoRA into something storyboard artists and fashion or character directors can use for repeatable shot lists, multi-angle turnarounds, and previsualization that preserves identity across complex camera moves.
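To make the 96‑pose grid tangible, the sketch below enumerates a 4 × 8 × 3 shot list. Only the grid shape and the cited −30° low angle come from the thread; the specific elevation, azimuth, and distance values are placeholders.

```python
# The 4 x 8 x 3 pose grid described above, enumerated as a hypothetical shot list.
# Only the shape (4 elevations x 8 azimuths x 3 distances = 96 poses) and the
# -30 degree low angle come from the thread; other values are placeholders.
from itertools import product

elevations = [-30, 0, 30, 60]              # degrees; -30 is the cited low angle
azimuths = [i * 45 for i in range(8)]      # 0..315 degrees around the subject
distances = ["close", "medium", "far"]     # three distance tiers

poses = list(product(elevations, azimuths, distances))
assert len(poses) == 96

for elev, azim, dist in poses[:5]:
    print(f"elevation {elev:+d}°, azimuth {azim}°, distance {dist}")
```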
🏆 Creator calls: challenges, hackathons, awards, summits
Revenue and discovery ops for creatives: $20K Higgsfield AI‑Cinema contest, PixVerse×GMI hack day, Bionic Awards deadline, and OpenArt’s AI Influencer Summit lineup.
Higgsfield launches $20K AI-Cinema Challenge with Jan 24 deadline
Higgsfield Cinema Challenge (Higgsfield): Higgsfield opened a $20,000 AI‑cinema contest aimed squarely at generative filmmakers, with a $10k top prize for “Most Cinematic” and submissions due January 24, Sunday, EOD PST according to the launch details in the challenge launch and follow‑up reminder in the rules thread. The brief centers on making short films with Higgsfield’s tools, tagging the account, and including a watermark, while a parallel engagement push offers 215 free credits for users who like, retweet, reply, and follow.

• Prize structure and exposure: Awards span $10k/$5k/$3k for the top three entries plus ten $200 "Higgsfield Choice" picks, and every qualifying post must add “@higgsfield.ai #HiggsfieldCinema” in the caption, include the official watermark, and tag @higgsfield_ai, which turns the contest into both a cash opportunity and a discovery surface for AI filmmakers, as laid out in the challenge launch and rules thread.
OpenArt sets Jan 30 AI Influencer Summit in San Francisco
AI Influencer Summit (OpenArt): OpenArt announced what it bills as the world’s first AI Influencer Summit, scheduled for January 30, 2026 in San Francisco, aimed at creators building virtual influencers and brands exploring AI‑native talent, according to the event teaser in the summit announcement. The linked site describes a cross‑industry lineup from Hollywood, Madison Avenue, tech, and leading AI avatar projects like digital supermodels and AI musicians, framed around how AI characters are turning into real media IP and brand partners as outlined in the summit site.

• Discovery and deal flow: The summit’s focus on AI influencers with “massive followers” and “real brand partnerships” positions it as both a networking and education venue where creators of AI characters can learn directly from teams behind prominent avatars, while brands and agencies gauge how fast this space is professionalizing, as promoted in the summit announcement and expanded on the summit site.
PixVerse x GMI SF hackathon offers ~$1k in credits on Jan 17
PixVerse x GMI Hackathon (PixVerse): PixVerse and GMI Studio announced an in‑person hackathon on January 17 at The Hibernia in San Francisco, focused on building AI video projects with PixVerse models running on GMI’s infra as described in the hackathon details. The organizers highlight that no prior video generation experience is required and put up roughly $1,000 in value through 40,000 PixVerse credits (about $500), matching GMI compute credits, and post‑event social and podcast features for standout teams per the hackathon details and prize recap.

• Workflow focus: Participants are asked to 1) use PixVerse’s video generation models, 2) run them on GMI Studio, and 3) create finished videos, effectively turning the event into a day‑long lab for testing end‑to‑end AI video pipelines in a production‑style environment as framed in the hackathon details.
Bionic Awards set Jan 25 deadline for AI creativity submissions
Bionic Awards (Bionic Awards): The Bionic Awards, a program “celebrating creativity in the age of AI,” are in their final call phase, with entries closing on January 25, 2026, as highlighted by judge Uncanny_Harry in the deadline reminder. The event is positioned as a showcase for AI‑driven creative work and is run in partnership with Adobe, with the promo artwork emphasizing themes like “Creators Unleashed” and inviting artists to submit work that blends human and machine creativity.
• Signal for AI creatives: Beyond the deadline nudge, the callout that a practicing AI creative is on the judging panel and that winners will “take the stage” and “make your mark” suggests the awards are not only a prestige play but also a discovery channel for artists working with generative tools, as indicated in the deadline reminder.
🎬 Micro‑shorts and mood studies (Grok + stylized tests)
A steady stream of AI‑native shorts: Grok Imagine action POVs, surreal vignettes, and classic‑cinema homages; plus dark‑fantasy mood tests. Good pulse check on style/motion limits today.
Grok Imagine stress-tested across action, horror, and storm-eye loops
Grok Imagine micro‑shorts (xAI): Creator cfryant keeps pushing Grok Imagine with a run of new surreal 6‑second clips—an image‑plus‑video corridor run‑and‑gun sequence made entirely in Grok according to the description in corridor run, a green tentacled creature writhing on black in tentacle test, and a set of eye‑themed loops that move from a hyper‑detailed iris close‑up in eye closeup to a swirling storm vortex branded GROK in storm vortex; these follow painterly walk short, where earlier tests focused on softer tall‑grass motion rather than fast action or abstract horror.

The clips highlight how Grok Imagine currently favors short, punchy compositions at 416–480p‑class resolutions with strong lighting and texture but limited fine detail, while still handling rapid camera moves, tentacle overlap, and macro cloud motion without obvious subject drift across frames as shown in the horror and eye loops in tentacle test and surreal eye loop.
Dark fantasy knight and Seven Year Itch homage showcase AI micro‑short mood
Stylized mood shorts (Artedeingenio): Artedeingenio posts two non‑Grok AI micro‑shorts that show how far current tools can go on tone and reference work—a dark fantasy teaser titled "The Age of Darkness" where a knight in heavy armor raises a glowing sword in a shadowy hall in dark fantasy clip, and a beat‑for‑beat homage to The Seven Year Itch that recreates the subway‑grate gust lifting a woman’s white dress before cutting to a retro title card in Seven Year Itch homage.

Both clips run roughly 6 seconds and focus more on atmosphere and framing than plot, with the fantasy piece emphasizing volumetric light and silhouette while the Marilyn‑style homage tests how reliably models can emulate specific camera angles, wardrobe, and mid‑century film typography from a single iconic scene as seen in Seven Year Itch homage.

Grok Imagine leans into cartoon wink and skateboarding‑cat gags
Cartoon shorts with Grok Imagine (xAI): Artist Artedeingenio leans into Grok Imagine’s character strengths, sharing a classic cartoon‑style face that winks at the camera in cartoon wink and a text‑to‑video bit where typing “cat on skateboard” cuts straight to a photoreal cat riding a board before the GROK IMAGINE tag appears in skateboard cat; this builds on poetic anime microshorts, where the same creator used Grok for more kinetic anime experiments.

The two clips stay within Grok’s short‑form comfort zone (around 6 seconds per shot) but show it handling both exaggerated 2D cartoon motion and more grounded pet physics, giving storytellers a feel for how far they can push character expressiveness and realism inside the current 480–540px video constraint described in cartoon wink and skateboard cat.

📈 Studios hit the market; open tools hit the stage
Big business pulse: MiniMax’s HKEX debut with a sharp day‑one spike, industry congrats, and ComfyUI’s production reel signaling open‑source in real campaigns. Entertainment panels add context.
MiniMax debuts on HKEX after $619M IPO and 54% day‑one jump
MiniMax HKEX listing (MiniMax): Chinese AI lab MiniMax, which builds the M2.1 text model, Hailuo video generator, audio/music models and an agent stack, listed on the Hong Kong Stock Exchange after a US$619M IPO, with shares jumping as much as 54% in first‑day trading according to the ipo recap. The same breakdown notes MiniMax runs this full multimodal stack with a 389‑person team at around 1% of OpenAI’s spend, positioning it as a lean public benchmark for AI-native studios that still ship state-of-the-art models across text, video, audio and agents ipo recap.
Ecosystem signal (Runware): Infrastructure provider Runware publicly congratulates MiniMax on the HKEX listing and calls it a "big day" that shows AI companies are starting to reach public markets, framing the debut as an industry milestone rather than a one‑off event runware congrats. For AI filmmakers and creative teams already experimenting with Hailuo video or MiniMax speech, this moves a key vendor into the scrutiny and (potential) stability of public-market oversight, with its broader product lineup outlined on the official site in the company homepage.
ComfyUI’s 2025 reel showcases real film, VFX, and brand work
Made with ComfyUI 2025 reel (ComfyUI): The ComfyUI team released a "Made with ComfyUI 2025" montage and blog that reposition the open-source node graph UI as a production-grade creative engine, showing it in use across film, VFX, live visuals, product renders, billboards, and brand work, as highlighted in the reel announcement and the accompanying comfy blog. The thread and post emphasize that three years in, ComfyUI has moved from an experimental playground into real pipelines for studios like Salesforce, Puma x Heliot Emil, Coca-Cola campaigns, and Corridor Crew projects, while inviting the community to share standout work from the past year reel announcement.

Open tool goes pro (hiring): A separate careers note describes ComfyUI as an "operating system" for generative AI and details new roles in frontend, infra, design, and growth, signaling an intent to harden the open stack for large-scale creative production as outlined on the careers page. For AI artists, designers, and filmmakers, this marks one of the clearest cases of a community‑driven, locally runnable tool crossing over into high-end commercial campaigns rather than remaining a hobbyist sandbox.
Promise highlights GenAI-native studios at 1 Billion Followers Summit
GenAI-native studio panel (Promise): Promise, a GenAI-native entertainment studio, took a visible spot at the 1 Billion Followers Summit in Dubai, where cofounder and CEO George Strompolos joined journalist Kara Yurieff on stage to talk about how studios built around generative AI are shaping the future of entertainment, as shown in the panel recap. Co-founder Diesol amplified the appearance as the company "on the big stage in Dubai," underlining that AI-first content studios are now part of mainstream creator-economy conversations rather than fringe tech demos cofounder shoutout.
For AI storytellers and filmmakers, the panel situates GenAI studios like Promise alongside top social and media players at a major global influencer event, signaling that AI-generated characters and shows are being discussed as a core part of tomorrow’s entertainment pipeline, not a separate experimental track.
🛠️ Infra for creative apps: faster RAG + better training ops
Under‑the‑hood gains for app builders: a 2‑call MongoDB+Voyage embedding stack promising lower latency/cost, and Trackio logs now visible on Hugging Face model pages.
MongoDB + Voyage AI collapse RAG stacks to two calls and big cost cuts
MongoDB + Voyage AI stack: A detailed architecture thread shows how pairing Voyage embeddings with MongoDB Atlas vectorSearch turns the classic three-hop RAG stack (LLM → vector DB → OLTP DB) into a two-call flow, cutting typical query latency from ~450 ms to about 140 ms while consolidating vendors and data storage according to the RAG walkthrough in architecture thread. The same setup uses Voyage’s quantization-aware training and binary embeddings to claim 83–99% infrastructure cost reduction versus float32 storage, while keeping vectors and operational data in one MongoDB collection as described in cost breakdown.
• Latency and architecture: The old pattern sends text to OpenAI/Gemini for embeddings, then Pinecone for vector search, then MongoDB/Postgres for metadata; the proposed flow is a Voyage API call for embeddings followed by a single Atlas $vectorSearch query that returns full documents with scores, eliminating extra network hops and ETL bugs where “vector says X but database says Y” per stack comparison.
• Storage and model options: Voyage-3-large’s 32k-token context window, Matryoshka Representation Learning (allowing truncated vectors), and domain-specific models like voyage-law-2, voyage-finance-2, and voyage-code-3 are highlighted as a fit for specialized search across creative assets, scripts, or codebases in cost breakdown; binary quantization is positioned as the path to the quoted 83–99% storage savings on Atlas.
• Operational refresh: The thread notes that Atlas Triggers or Change Streams can call Voyage automatically whenever documents change so embeddings stay fresh without a separate sync job, tightening the loop between content updates and semantic search quality in refresh details.
For teams building creative search (scripts, briefs, shot lists, design assets), the pitch is a simpler RAG pipeline with one database, faster responses, and room to specialize embedding models without adding another vendor, as further unpacked in the linked tutorial in tutorial guide.
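A minimal sketch of the two‑call flow looks like the following, using the public voyageai client and a single Atlas $vectorSearch aggregation; the index name, collection, and field paths are placeholders rather than anything from the thread.

```python
# Minimal sketch of the two-call flow: one Voyage embedding call, then a single
# Atlas $vectorSearch aggregation that returns full documents with scores.
# Index name, database/collection names, and field paths are placeholders.
import voyageai
from pymongo import MongoClient

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment
coll = MongoClient("mongodb+srv://USER:PASS@cluster.example.mongodb.net")["studio"]["assets"]

query = "night exterior shot list for the rooftop chase"
query_vec = vo.embed([query], model="voyage-3-large", input_type="query").embeddings[0]

results = coll.aggregate([
    {
        "$vectorSearch": {
            "index": "assets_vector_index",   # placeholder index name
            "path": "embedding",              # field holding the Voyage vector
            "queryVector": query_vec,
            "numCandidates": 200,
            "limit": 5,
        }
    },
    {"$project": {"title": 1, "body": 1, "score": {"$meta": "vectorSearchScore"}}},
])

for doc in results:
    print(doc["score"], doc["title"])
```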
Trackio logs now surface directly on Hugging Face Hub model pages
Trackio × Hugging Face Hub: Experiment-tracking tool Trackio now pipes its training logs straight into the Training Metrics section on Hugging Face Hub model pages, so anyone browsing a model can see its tracked loss and metrics without leaving the Hub UI, as shown in the integration clip in integration demo. The example uses an ltx-2-TURBO video model card where Trackio metrics appear inline under the standard Training Metrics tab, making training runs more transparent for downstream users.
• Workflow change: Model authors can associate Trackio runs with a Hub repo and have charts show up directly on the model page, rather than sending teammates to a separate dashboard, according to integration demo; the tweet also links to Trackio’s "get started" docs for wiring this up on new or existing projects.
• Why creatives care: For teams fine‑tuning image/video models or style LoRAs, this makes it easier to inspect how a public checkpoint was trained—epochs, overfitting signals, metric plateaus—before adopting it into a production pipeline, since the metrics live where they already manage models on the Hub.
The move brings experiment tracking closer to model distribution, which can help creative app builders choose and debug checkpoints without juggling extra URLs or guessing at unseen training history.
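For model authors, the logging side is a few lines of wandb‑style calls. The sketch below assumes Trackio’s init/log/finish API and uses a fake loss curve; it does not show how a run is attached to a specific Hub model repo, which the integration demo and Trackio’s docs cover.

```python
# Sketch assuming Trackio's wandb-style init/log/finish API, with a synthetic
# loss curve standing in for a real training loop. Attaching the run to a Hub
# model repo is handled per Trackio's docs and is not shown here.
import math
import trackio

run = trackio.init(project="ltx-finetune")   # project name is a placeholder

for step in range(100):
    fake_loss = 2.0 * math.exp(-step / 30)   # stand-in for a real training loss
    trackio.log({"train/loss": fake_loss, "step": step})

trackio.finish()
```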
📚 Papers to bookmark: routing, multi‑reward RL, training scale
Mostly optimization/routing methods relevant to toolchains, not end‑user apps: GDPO stabilizes multi‑reward RL, FusionRoute coordinates token‑level experts, and Learnable Multipliers improve matrix scaling.
FusionRoute coordinates token‑level experts with complementary logits
FusionRoute (token‑level routing): FusionRoute introduces a lightweight router that chooses an expert model at each decoding step and then adds complementary logits from other models to refine the next‑token distribution, aiming to balance efficiency and quality as described in the summary thread and the ArXiv paper. Experiments with Llama‑3 and Gemma backbones show gains on math reasoning, code generation, and instruction following compared with prior token‑level collaboration schemes that rely on fixed expert outputs.
• Routing plus fusion: At every token, the router selects a primary expert based on context, then incorporates adjusted logits from the remaining experts instead of treating them as static black boxes, which the authors argue leads to better decoding policies in the ArXiv paper.
• Theoretical critique: The paper points out that earlier token‑level collaboration methods require strong global coverage assumptions to be optimal, whereas FusionRoute’s adaptive fusion is designed to relax those constraints and improve robustness, according to the summary thread.
• Stack relevance: For creative toolchains that already mix general models with code‑ or math‑specialists, FusionRoute sketches a way to route and blend them at the token level rather than hard‑switching per request, a behavior emphasized in the ArXiv paper.
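A toy sketch of the routing‑plus‑fusion step, paraphrased from the summary rather than taken from the paper’s code, is shown below: a router picks a primary expert per token, then complementary logits from the remaining experts are added with small weights before sampling.

```python
# Paraphrased toy sketch of routing plus fusion (not the paper's implementation):
# pick a primary expert per step, then add weighted complementary logits from
# the other experts to refine the next-token distribution.
import torch

def fused_next_token_logits(hidden, experts, router, fusion_weights):
    """hidden: (d,) context features; experts: callables returning (vocab,) logits."""
    route_scores = router(hidden)              # (num_experts,)
    primary = int(torch.argmax(route_scores))
    logits = experts[primary](hidden)          # primary expert's logits
    for j, expert in enumerate(experts):
        if j != primary:
            logits = logits + fusion_weights[j] * expert(hidden)  # complementary logits
    return logits

# Toy usage with random linear "experts" standing in for real language models
d, vocab, n = 16, 50, 3
experts = [torch.nn.Linear(d, vocab) for _ in range(n)]
router = torch.nn.Linear(d, n)
fusion_weights = torch.full((n,), 0.1)
print(fused_next_token_logits(torch.randn(d), experts, router, fusion_weights).shape)
```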
GDPO stabilizes multi‑reward RL beyond GRPO
GDPO (multi‑reward RL): GDPO proposes group reward‑decoupled normalization to fix the reward‑normalization collapse seen in GRPO when optimizing multiple rewards, aiming for steadier training in complex reasoning tasks according to the paper note and the ArXiv paper. It reports more stable learning dynamics and stronger results than GRPO on a range of reasoning benchmarks by normalizing each reward group independently rather than forcing a single shared baseline.
• Multi‑reward focus: The authors analyze how group relative policy optimization breaks down when several objectives compete, then separate normalization per reward group so no single signal dominates, as detailed in the ArXiv paper.
• RLHF alignment angle: The method explicitly targets multi‑reward RL‑from‑human‑feedback setups that must juggle style, safety, and preference rewards, suggesting GDPO can drop into existing GRPO‑style pipelines with limited code changes per the paper note.
• Relevance for creatives: More reliable multi‑signal RL can help tune assistants that balance creativity, brand tone, and guardrail scores without one reward destabilizing the others, which is the use case emphasized in the ArXiv paper.
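The toy example below contrasts a single shared baseline with per‑reward‑group normalization to illustrate the decoupling idea; it is a paraphrase of the summary, not the paper’s implementation.

```python
# Toy contrast between a shared baseline (GRPO-style) and per-reward-group
# normalization (the GDPO idea as summarized); not the paper's code.
import numpy as np

# rewards[k] holds one reward group's scores for a batch of 4 sampled completions
rewards = {
    "correctness": np.array([1.0, 0.0, 1.0, 1.0]),
    "style":       np.array([0.2, 0.9, 0.4, 0.3]),
    "safety":      np.array([1.0, 1.0, 0.0, 1.0]),
}

# GRPO-style: sum rewards, then normalize against one shared baseline
total = sum(rewards.values())
shared_adv = (total - total.mean()) / (total.std() + 1e-8)

# GDPO-style: normalize each reward group on its own, then combine, so no
# single signal's scale dominates the advantage estimate
per_group = [(r - r.mean()) / (r.std() + 1e-8) for r in rewards.values()]
decoupled_adv = np.mean(per_group, axis=0)

print("shared   :", np.round(shared_adv, 2))
print("decoupled:", np.round(decoupled_adv, 2))
```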
Learnable Multipliers decouple matrix scale from weight decay
Learnable Multipliers (training scale): This work argues that standard weight decay traps language model matrices in a suboptimal “WD‑noise equilibrium” and attaches learnable scalar multipliers to each weight matrix so training can discover better global scaling, as summarized in the tweet recap and the ArXiv paper. The approach extends these multipliers to per‑row and per‑column factors, giving finer control over scale and reportedly improving downstream performance without altering the core architecture.
• Weight decay problem: The authors describe how stochastic gradient noise pushes matrix norms outward while weight decay pulls them back, creating a fixed equilibrium that may not match the best representational scale for a layer, according to the ArXiv paper.
• Scalar and structured multipliers: By adding global, row‑wise, or column‑wise learnable multipliers, the method lets optimization tune effective scale separately from raw weights, keeping regularization benefits while freeing layers to adopt more suitable norms per the tweet recap.
• Toolchain impact: For teams training or fine‑tuning creative models, learnable multipliers offer a way to experiment with scale control at the optimizer level rather than via manual LR or WD schedules, which is the practical angle highlighted in the ArXiv paper.
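A hedged sketch of the idea as summarized, not the paper’s code: a linear layer whose raw weights are modulated by learnable global, per‑row, and per‑column multipliers, so effective scale can be trained separately from the weights that weight decay regularizes.

```python
# Hedged sketch of the summarized idea (not the paper's code): learnable global,
# per-row, and per-column multipliers modulate a weight matrix so its effective
# scale is trained separately from the raw weights that weight decay acts on.
import torch
import torch.nn as nn

class ScaledLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * d_in ** -0.5)
        self.g = nn.Parameter(torch.ones(1))            # global multiplier
        self.row = nn.Parameter(torch.ones(d_out, 1))   # per-output-row factors
        self.col = nn.Parameter(torch.ones(1, d_in))    # per-input-column factors

    def forward(self, x):
        # Effective weight = structured multipliers * raw weight; weight decay
        # would typically target self.weight but not the multipliers.
        w_eff = self.g * self.row * self.col * self.weight
        return x @ w_eff.t()

layer = ScaledLinear(64, 32)
print(layer(torch.randn(8, 64)).shape)   # torch.Size([8, 32])
```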
💬 Tool‑preference memes and rumor mill
Discourse itself was newsy: ‘everyone uses Claude’ meme, creators praising Gemini Flash, and rumors of DeepSeek V4 coding gains; plus a creator clapback to ‘AI slop’.
“Everyone uses Claude” meme reinforces cross-lab reliance narrative
Claude usage meme (Anthropic): A viral post claims that employees at Google, xAI, OpenAI and Meta "use Claude" and concludes that "everyone uses Claude," pulling 1,500+ likes and turning tool preference into a running joke about cross-lab dependence on Anthropic’s assistant, according to the Claude adoption quip. For AI creatives, this meme functions as soft social proof that Claude has become a default tab for many practitioners, even inside labs that are supposed to champion their own models, and it keeps Anthropic top of mind in the informal "what do you actually use" conversation.
DeepSeek V4 rumor teases stronger coding than current GPT and Claude
DeepSeek V4 (DeepSeek): A rumor thread claims DeepSeek V4 is "coming" and that people close to the launch say it shows stronger coding performance than current GPT and Claude models, without yet sharing evals or demos, as stated in the Deepseek rumor post. The same post frames this as a coming shift in the coding-model pecking order and uses a reaction image to underline the surprise factor at a smaller player potentially beating frontier labs on code generation and reasoning.
If the claim holds up once benchmarks and real projects appear, this would add another contender to the short list of models that AI developers and technical filmmakers rely on for complex scripting, tooling, and pipeline automation.
Creator sentiment swings toward Gemini 3.0 Flash for day-to-day use
Gemini 3.0 Flash (Google): A creator with a large AI-focused audience says they "fell in love" with Gemini 3.0 Flash today, signaling that Google’s speed-focused model is starting to displace incumbents in some personal workflows, as expressed in the Gemini flash praise. The post does not include benchmarks or side-by-side comparisons, but it adds to a pattern of builders publicly rotating between Claude, GPT, and now Gemini Flash as their primary assistant, which matters for creatives choosing which model to lean on for scripting, prompt-writing, and quick iteration.
Techhalla clip pushes back on “AI slop” criticism and centers makers
AI slop discourse (Techhalla): Creator Techhalla posts a short video telling critics they can "keep calling it 'AI slop' or face reality," with bold on-screen text contrasting "THEY AREN'T DOING IT" against "WE ARE," making the case that active makers—not spectators—are shaping what AI media becomes, as shown in the Makers vs critics caption. The clip reinforces an emerging divide in the community narrative—between people experimenting daily with tools like Kling, Grok Imagine, and Nano Banana workflows, and those dismissing generative work wholesale—which matters for creatives who worry about stigma around AI-assisted art and film.