Jason Peterson

Making AI Music Scores with Gemini, Kling, and Claude Code

Google just shipped music generation in Gemini via Lyria 3. You can prompt it with text, images, or video and it composes a 30-second track. I wanted to test how well it scores moody, cinematic scenes.


The Videos

I generated the videos with Kling 3.0 (Kuaishou) via fal.ai. Three scenes: a slow push through an empty 70s house, a Severance-style corporate corridor with a lone figure, and a woman crying in blue TV light. Kling is pricey, about $3.36 for 10 seconds with audio, but the cinematic quality is genuinely impressive. It also generates its own ambient audio.


The Music

I fed each scene to Gemini on the web to generate scores. Key finding: video input didn’t seem to influence the output much. What worked was image + text prompt: extracting a center frame from each video and pairing it with a detailed mood/style prompt. Thinking mode produced noticeably better results.
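For the curious, pulling a center frame is a two-step ffmpeg job. This is a rough sketch of how it could be scripted, not the exact commands Claude Code ran: the function names and file names are made up for illustration, and it assumes `ffmpeg` and `ffprobe` are on your PATH.

```python
import subprocess


def probe_duration_cmd(video_path):
    """Build an ffprobe command that prints the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        video_path,
    ]


def center_frame_cmd(video_path, duration_s, out_path):
    """Build an ffmpeg command that grabs one frame at the clip's midpoint."""
    midpoint = duration_s / 2
    return [
        "ffmpeg", "-y",
        "-ss", f"{midpoint:.3f}",  # seek before -i so ffmpeg jumps straight there
        "-i", video_path,
        "-frames:v", "1",          # write a single video frame
        out_path,
    ]


def extract_center_frame(video_path, out_path):
    """Probe the duration, then save the middle frame as an image."""
    result = subprocess.run(
        probe_duration_cmd(video_path),
        capture_output=True, text=True, check=True,
    )
    duration_s = float(result.stdout.strip())
    subprocess.run(center_frame_cmd(video_path, duration_s, out_path), check=True)


# Hypothetical usage:
# extract_center_frame("house.mp4", "house_center.png")
```

The resulting image is what gets uploaded to Gemini alongside the text prompt.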

Getting Gemini to go sparse took iteration — it defaults to overcomposing. Prompts that explicitly say “no melody, no rhythm, no percussion” and describe silence and space worked best.

The 70s house is my favorite. The score landed on exactly what I had in my head: eerie, spare, perfectly matched to the slow drift through that wood-paneled hallway. First try.


The Mix

Claude Code handled all the ffmpeg work — extracting frames, mixing Kling’s ambient audio (ducked to 25%) with Gemini’s score, dialing in offsets to find the right section of each 30-second track, and adding fade in/out. The whole mixing workflow was conversational: “offset by 14s,” “too busy, try again,” “2s offset for this one.”
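Here is roughly what one of those mixes might look like as an ffmpeg filter graph, wrapped in Python. Treat it as a sketch under assumptions rather than the actual commands from the session: the 25% ambient level and the 14-second offset come from above, but the 10-second clip length, the 1-second fades, the helper names, and the file names are placeholders I picked for illustration. It also uses `amix` with `normalize=0`, which needs a reasonably recent ffmpeg build.

```python
import subprocess


def build_mix_filter(offset_s, clip_s, ambient_gain=0.25, fade_s=1.0):
    """Return a filter_complex string: duck ambient, trim score, mix, fade."""
    fade_out_start = clip_s - fade_s
    return (
        # Input 0 is the Kling video; duck its ambient audio.
        f"[0:a]volume={ambient_gain:g}[amb];"
        # Input 1 is the score; keep clip_s seconds starting at offset_s.
        f"[1:a]atrim=start={offset_s:g}:duration={clip_s:g},"
        f"asetpts=PTS-STARTPTS[score];"
        # Sum both tracks without amix rescaling their levels.
        f"[amb][score]amix=inputs=2:duration=first:normalize=0[mix];"
        # Fade in at the start and out at the end (fade length is an assumption).
        f"[mix]afade=t=in:st=0:d={fade_s:g},"
        f"afade=t=out:st={fade_out_start:g}:d={fade_s:g}[out]"
    )


def build_mix_cmd(video_path, score_path, out_path, offset_s, clip_s):
    """Build the full ffmpeg command; video is copied, audio is re-encoded."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-i", score_path,
        "-filter_complex", build_mix_filter(offset_s, clip_s),
        "-map", "0:v", "-map", "[out]",
        "-c:v", "copy", "-c:a", "aac",
        "-shortest",
        out_path,
    ]


# Hypothetical usage: "offset by 14s" on a 10-second clip.
# subprocess.run(
#     build_mix_cmd("house.mp4", "house_score.mp3", "house_final.mp4", 14, 10),
#     check=True,
# )
```

With the command built from parameters like this, "2s offset for this one" is just a change to `offset_s` and a rerun, which is what made the conversational loop so quick.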


The Result

Three short AI films, zero manual audio editing, made with a video model, a music model, and an AI coding assistant as the mixing board.


