Fix & update: complete the flow to build the translation

2025-10-24 10:11:20 -07:00 · 2025-10-24 10:11:20 -07:00 · 293007db64
commit 293007db64
parent 85691f13dc
28 changed files with 713 additions and 199 deletions
--- a/EXAMPLES.md
+++ b/EXAMPLES.md
@@ -0,0 +1,48 @@
+EXAMPLES - Whisper + Kokoro TTS pipeline
+
+Usage examples (run from the repo root, using the .venv virtualenv):
+
+1) Dry run (prints the commands that would be executed):
+
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 \
+  --kokoro-endpoint "https://kokoro.bfzqqk.easypanel.host/api/v1/audio/speech" \
+  --kokoro-key "$KOKORO_TOKEN" --voice em_alex \
+  --whisper-model base --dry-run
+
+2) Full run (replaces the audio):
+
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 \
+  --kokoro-endpoint "https://kokoro.bfzqqk.easypanel.host/api/v1/audio/speech" \
+  --kokoro-key "$KOKORO_TOKEN" --voice em_alex \
+  --whisper-model base
+
+3) Use an existing SRT (skips transcription):
+
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 --srt subs_en.srt \
+  --kokoro-endpoint "https://kokoro..." --kokoro-key "$KOKORO_TOKEN" --voice em_alex
+
+4) Translate with Gemini (if you have a key) or use local translation instead:
+
+# Use Gemini (requires --gemini-key or the GEMINI_API_KEY environment variable)
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 --translate-method gemini --gemini-key "$GEMINI_KEY" \
+  --kokoro-endpoint "https://kokoro..." --kokoro-key "$KOKORO_TOKEN" --voice em_alex
+
+# Force local translation (MarianMT):
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 --translate-method local \
+  --kokoro-endpoint "https://kokoro..." --kokoro-key "$KOKORO_TOKEN" --voice em_alex
+
+5) Mix the dub with the original audio instead of replacing it:
+
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 --mix --mix-background-volume 0.3 \
+  --kokoro-endpoint "https://kokoro..." --kokoro-key "$KOKORO_TOKEN" --voice em_alex
+
+Notes:
+- If Gemini fails, switch to local translation with --translate-method local.
+- Use --keep-temp and/or --keep-chunks to inspect the intermediate WAV files.
+- Set --whisper-model to "base", "small", or "medium" depending on available resources.
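As a rough illustration of what `--mix-background-volume` controls in example 5, mixing can be thought of as scaling the original track and adding the dub on top. This sketch is not taken from `srt_to_kokoro.py` (which is not part of this diff and may well delegate mixing to ffmpeg); it assumes both tracks are mono 16-bit PCM at the same sample rate and length, and `mix_tracks` is a hypothetical helper.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767


def mix_tracks(dub: array, original: array, background_volume: float = 0.2) -> array:
    """Return dub + background_volume * original for two int16 sample arrays.

    Illustrative only: assumes mono 16-bit PCM, same sample rate, same length.
    Each mixed sample is clipped to the int16 range so loud passages saturate
    instead of wrapping around.
    """
    if not 0.0 <= background_volume <= 1.0:
        raise ValueError("background_volume must be within 0.0-1.0")
    if len(dub) != len(original):
        raise ValueError("tracks must have the same number of samples")
    mixed = array("h")
    for d, o in zip(dub, original):
        value = round(d + background_volume * o)
        mixed.append(max(INT16_MIN, min(INT16_MAX, value)))
    return mixed


if __name__ == "__main__":
    dub = array("h", [1000, -2000, 32000])
    orig = array("h", [5000, 5000, 10000])
    print(list(mix_tracks(dub, orig, 0.3)))
```

Clipping is what keeps loud passages from distorting by wrap-around; the closer the background volume gets to 1.0, the more often it kicks in, which is one reason to keep the original track low.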
--- a/README.md
+++ b/README.md
@@ -0,0 +1,75 @@
+# Whisper dubbing pipeline
+
+Utilities for transcribing, translating, and dubbing videos segment by segment with Whisper + TTS (Kokoro). Designed to run inside a Python virtual environment with `ffmpeg` available on PATH.
+
+Main contents
+- `whisper_project/transcribe.py` - transcribes audio to SRT (the faster-whisper backend is recommended).
+- `whisper_project/translate_srt_local.py` - translates an SRT locally with MarianMT (Helsinki-NLP/opus-mt-en-es).
+- `whisper_project/srt_to_kokoro.py` - synthesizes each SRT segment through a compatible (Kokoro) TTS endpoint, aligns and concatenates the clips, and optionally mixes them into or replaces the video's audio.
+- `whisper_project/run_full_pipeline.py` - all-in-one orchestrator: extracts the audio, transcribes it (if needed), translates, synthesizes, and burns in the subtitles.
+
+Requirements
+- Python 3.10+ (using the project's `.venv` is recommended)
+- ffmpeg and ffprobe on PATH
+- Python packages (install them in the venv):
+  - requests, srt, faster-whisper (transcription), transformers, sentencepiece, and torch (required by MarianMT), among others.
+
+Recommended usage (examples)
+
+1) Dry run, to see the commands that would be executed:
+
+```bash
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 \
+  --kokoro-endpoint "https://kokoro.example/api/v1/audio/speech" \
+  --kokoro-key "$KOKORO_TOKEN" \
+  --voice em_alex \
+  --whisper-model base \
+  --dry-run
+```
+
+2) Run the real pipeline (local translation, replacing the audio track):
+
+```bash
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 \
+  --kokoro-endpoint "https://kokoro.example/api/v1/audio/speech" \
+  --kokoro-key "$KOKORO_TOKEN" \
+  --voice em_alex \
+  --whisper-model base
+```
+
+Key orchestrator flags (`run_full_pipeline.py`)
+- `--translate-method` : `local` | `gemini` | `none`. Defaults to `local` (MarianMT); `none` uses the SRT as-is. `gemini` requires `--gemini-key` or the `GEMINI_API_KEY` environment variable.
+- `--gemini-key` : API key for Gemini (when using `--translate-method=gemini`).
+- `--mix` : mixes the synthesized audio with the original track instead of replacing it. Set the background level with `--mix-background-volume`.
+- `--mix-background-volume` : volume of the original track when mixing (0.0 - 1.0; default 0.2).
+- `--keep-chunks` : keeps the per-segment WAV files (useful for debugging).
+- `--keep-temp` : does not delete the temporary working directory (keeps `dub_final.wav`, plus the chunks if `--keep-chunks` is set).
+- `--dry-run` : only prints the commands that would be executed.
+
+Using `srt_to_kokoro.py` directly (if you already have a translated SRT)
+
+```bash
+.venv/bin/python whisper_project/srt_to_kokoro.py \
+  --srt translated.srt \
+  --endpoint "https://kokoro.example/api/v1/audio/speech" \
+  --payload-template '{"model":"model","voice":"em_alex","input":"{text}","response_format":"wav"}' \
+  --api-key "$KOKORO_TOKEN" \
+  --out out.wav \
+  --video input.mp4 --align --replace-original
+```
+
+Notes and troubleshooting
+- A `400 Bad Request` from the TTS endpoint is usually caused by quoting or formatting in `--payload-template`. `run_full_pipeline.py` already handles the quoting for the common case.
+- If `ffmpeg` prints messages about "Too many bits" or "clamping" while encoding AAC, that is a bitrate warning; the MP4 is usually still produced correctly.
+- If remote synthesis fails with an authentication error, check the key (`--kokoro-key`), or point `srt_to_kokoro.py` at an alternative TTS provider with `--endpoint` and `--payload-template`.
+
+Suggested next steps
+- Validate that `--mix` and `--replace-original` are not used together, e.g. with an explicit mutually exclusive option.
+- Add support for more backends: other TTS engines (e.g. a local TTS engine) and, if desired, local Argos translation.
+
+License and security
+- This repository contains example scripts. Keep your API keys safe and never commit them to public repositories.
+
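The troubleshooting note about `400 Bad Request` responses usually comes down to JSON escaping: if subtitle text containing quotes or newlines is pasted verbatim into the `{text}` slot, the request body stops being valid JSON. The sketch below shows one way to build and fill such a template safely. It assumes the template format used in this README (a JSON string with a literal `{text}` placeholder); how `srt_to_kokoro.py` actually performs the substitution is not shown in this diff, and `build_template` / `fill_payload` are hypothetical helpers.

```python
import json


def build_template(model: str, voice: str) -> str:
    # json.dumps escapes model/voice safely; "{text}" stays a literal placeholder.
    return json.dumps(
        {"model": model, "voice": voice, "input": "{text}", "response_format": "wav"},
        separators=(",", ":"),
    )


def fill_payload(template: str, text: str) -> dict:
    # JSON-escape the text and drop the surrounding quotes before substituting,
    # so quotes or newlines in a subtitle cannot break the JSON document.
    escaped = json.dumps(text, ensure_ascii=False)[1:-1]
    body = template.replace("{text}", escaped)
    return json.loads(body)  # raises ValueError if the result is not valid JSON


if __name__ == "__main__":
    tpl = build_template("model", "em_alex")
    print(tpl)
    print(fill_payload(tpl, 'He said "hola"\nand left'))
```

Validating with `json.loads` before sending turns a vague server-side 400 into a local error that points at the offending subtitle.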
--- a/dailyrutines.dubbed.es.aligned.mp4
+++ b/dailyrutines.dubbed.es.aligned.mp4
--- a/dailyrutines.dubbed.es.mixed.mp4
+++ b/dailyrutines.dubbed.es.mixed.mp4
--- a/dailyrutines.dubbed.es.mp4
+++ b/dailyrutines.dubbed.es.mp4
--- a/dailyrutines.dubbed.es.subs.mp4
+++ b/dailyrutines.dubbed.es.subs.mp4
--- a/dailyrutines.dubbed.gemini.mp4
+++ b/dailyrutines.dubbed.gemini.mp4
--- a/dailyrutines.dubbed.mp4
+++ b/dailyrutines.dubbed.mp4
--- a/dailyrutines.replaced_audio.mp4
+++ b/dailyrutines.replaced_audio.mp4
--- a/dailyrutines.replaced_audio.subs.mp4
+++ b/dailyrutines.replaced_audio.subs.mp4
--- a/output/dailyrutines.replaced_audio.subs.mp4
+++ b/output/dailyrutines.replaced_audio.subs.mp4
--- a/whisper_project/coqui_test.wav
+++ b/whisper_project/coqui_test.wav
--- a/whisper_project/dailyrutines.audio.srt
+++ b/whisper_project/dailyrutines.audio.srt
@@ -1,56 +0,0 @@
-1
-00:00:00,000 --> 00:00:10,000
-
-2
-00:00:10,000 --> 00:00:20,000
-
-3
-00:00:20,000 --> 00:00:30,000
-
-4
-00:00:30,000 --> 00:00:40,000
-
-5
-00:00:40,000 --> 00:00:50,000
-
-6
-00:00:50,000 --> 00:01:00,000
-
-7
-00:01:00,000 --> 00:01:10,000
-
-8
-00:01:10,000 --> 00:01:20,000
-
-9
-00:01:20,000 --> 00:01:30,000
-
-10
-00:01:30,000 --> 00:01:40,000
-
-11
-00:01:40,000 --> 00:01:50,000
-
-12
-00:01:50,000 --> 00:02:00,000
-
-13
-00:02:00,000 --> 00:02:10,000
-
-14
-00:02:10,000 --> 00:02:20,000
-
-15
-00:02:20,000 --> 00:02:30,000
-
-16
-00:02:30,000 --> 00:02:40,000
-
-17
-00:02:40,000 --> 00:02:50,000
-
-18
-00:02:50,000 --> 00:03:00,000
-
-19
-00:03:00,000 --> 00:03:09,009
--- a/whisper_project/dailyrutines.audio.wav
+++ b/whisper_project/dailyrutines.audio.wav
--- a/whisper_project/dailyrutines.kokoro.api.wav
+++ b/whisper_project/dailyrutines.kokoro.api.wav
--- a/whisper_project/dailyrutines.kokoro.dub.es.aligned.wav
+++ b/whisper_project/dailyrutines.kokoro.dub.es.aligned.wav
--- a/whisper_project/dailyrutines.kokoro.dub.es.srt
+++ b/whisper_project/dailyrutines.kokoro.dub.es.srt
@@ -1,72 +0,0 @@
-1
-00:00:00,000 --> 00:00:06,960
-Rutinas diarias
-
-2
-00:00:06,960 --> 00:00:14,480
-Hola mamá, estoy disfrutando la vida en Nueva Zelanda.
-
-3
-00:00:14,480 --> 00:00:19,240
-El campo es tan hermoso.
-
-4
-00:00:19,240 --> 00:00:23,199
-Mi rutina es diferente ahora.
-
-5
-00:00:23,199 --> 00:00:29,960
-Me despierto a las 6 en punto cada mañana y salgo a correr.
-
-6
-00:00:29,960 --> 00:00:36,640
-A las 7 en punto desayuno.
-
-7
-00:00:36,640 --> 00:00:42,120
-El café en Nueva Zelanda es tan bueno.
-
-8
-00:00:42,120 --> 00:00:46,240
-A las 8 voy a trabajar.
-
-9
-00:00:46,240 --> 00:00:52,679
-Normalmente tomo el autobús, pero a veces camino.
-
-10
-00:00:52,679 --> 00:00:57,439
-Empiezo a trabajar a las 9.
-
-11
-00:00:57,439 --> 00:01:02,399
-Trabajo en mi oficina hasta la hora del almuerzo.
-
-12
-00:01:02,399 --> 00:01:08,920
-A las 12 almuerzo con mis colegas en el parque.
-
-13
-00:01:08,920 --> 00:01:15,239
-Es agradable disfrutar del aire fresco y charlar juntos.
-
-14
-00:01:15,239 --> 00:01:23,759
-A las 5 salgo del trabajo y voy al gimnasio.
-
-15
-00:01:23,760 --> 00:01:32,920
-Hago ejercicio hasta las seis y luego voy a casa.
-
-16
-00:01:32,920 --> 00:01:39,520
-A las 8 ceno, luego me relajo.
-
-17
-00:01:39,520 --> 00:01:44,800
-I normally go to bed at 11 o'clock.
-
-18
-00:01:44,799 --> 00:01:51,799
-Hasta pronto, Stephen.
-
--- a/whisper_project/dailyrutines.kokoro.dub.es.wav
+++ b/whisper_project/dailyrutines.kokoro.dub.es.wav
--- a/whisper_project/dailyrutines.kokoro.dub.srt
+++ b/whisper_project/dailyrutines.kokoro.dub.srt
@@ -1,71 +0,0 @@
-1
-00:00:00,000 --> 00:00:06,960
-Dayly routines
-
-2
-00:00:06,960 --> 00:00:14,480
-Hi mom, I'm enjoying life in New Zealand.
-
-3
-00:00:14,480 --> 00:00:19,240
-The countryside is so beautiful.
-
-4
-00:00:19,240 --> 00:00:23,199
-My routine is different now.
-
-5
-00:00:23,199 --> 00:00:29,960
-I wake at 6 o'clock every morning and go for a run.
-
-6
-00:00:29,960 --> 00:00:36,640
-At 7 o'clock I have breakfast.
-
-7
-00:00:36,640 --> 00:00:42,120
-The coffee in New Zealand is so good.
-
-8
-00:00:42,120 --> 00:00:46,240
-At 8 o'clock I go to work.
-
-9
-00:00:46,240 --> 00:00:52,679
-I usually take the bus, but sometimes I walk.
-
-10
-00:00:52,679 --> 00:00:57,439
-I start work at 9 o'clock.
-
-11
-00:00:57,439 --> 00:01:02,399
-I work in my office until lunchtime.
-
-12
-00:01:02,399 --> 00:01:08,920
-At 12 o'clock I have lunch with my colleagues in the park.
-
-13
-00:01:08,920 --> 00:01:15,239
-It's nice to enjoy the fresh air and chat together.
-
-14
-00:01:15,239 --> 00:01:23,759
-At 5 o'clock I leave work and go to the gym.
-
-15
-00:01:23,760 --> 00:01:32,920
-I exercise until 6 o'clock and then go home.
-
-16
-00:01:32,920 --> 00:01:39,520
-At 8 o'clock I eat dinner, then relax.
-
-17
-00:01:39,520 --> 00:01:44,800
-I normally go to bed at 11 o'clock.
-
-18
-00:01:44,799 --> 00:01:51,799
-See you soon, Stephen.
--- a/whisper_project/dailyrutines.kokoro.dub.wav
+++ b/whisper_project/dailyrutines.kokoro.dub.wav
--- a/whisper_project/dub_female_clone_es.wav
+++ b/whisper_project/dub_female_clone_es.wav
--- a/whisper_project/dub_male_clone_ptbr.wav
+++ b/whisper_project/dub_male_clone_ptbr.wav
--- a/whisper_project/dub_male_style.wav
+++ b/whisper_project/dub_male_style.wav
--- a/whisper_project/dub_male_style_out.wav
+++ b/whisper_project/dub_male_style_out.wav
--- a/whisper_project/ref_female_es.wav
+++ b/whisper_project/ref_female_es.wav
--- a/whisper_project/run_full_pipeline.py
+++ b/whisper_project/run_full_pipeline.py
@@ -0,0 +1,449 @@
+#!/usr/bin/env python3
+# Orchestrates: transcription -> translation -> per-segment synthesis -> audio replace/mix -> subtitle burn-in
+
+import argparse
+import json
+import os
+import shlex
+import shutil
+import subprocess
+import sys
+import tempfile
+
+
+def run(cmd, dry_run=False, env=None):
+    # Run a command. Accepts a str (executed through the shell) or a list
+    # (executed without a shell). Prints the command in a shell-quoted form
+    # that can be copied and pasted. With dry_run=True nothing is executed.
+    if isinstance(cmd, (list, tuple)):
+        printable = " ".join(shlex.quote(str(x)) for x in cmd)
+    else:
+        printable = cmd
+    print("+", printable)
+    if dry_run:
+        return 0
+    if isinstance(cmd, (list, tuple)):
+        return subprocess.run(cmd, shell=False, check=True, env=env)
+    return subprocess.run(cmd, shell=True, check=True, env=env)
+
+
+def json_payload_template(model, voice):
+    # JSON payload with a literal {text} placeholder that srt_to_kokoro.py fills in.
+    # json.dumps escapes quotes in model/voice; for plain values the output is
+    # identical to the previous hand-concatenated string.
+    return json.dumps(
+        {"model": model, "voice": voice, "input": "{text}", "response_format": "wav"},
+        separators=(",", ":"),
+    )
+
+
+def main():
+    p = argparse.ArgumentParser()
+    p.add_argument("--video", required=True, help="Input video")
+    p.add_argument(
+        "--srt",
+        help="Input SRT (if it already exists). Otherwise the audio is transcribed",
+    )
+    p.add_argument("--kokoro-endpoint", required=True, help="TTS endpoint URL")
+    p.add_argument("--kokoro-key", required=True, help="API key for Kokoro")
+    p.add_argument("--voice", default="em_alex", help="Voice name (e.g. em_alex)")
+    p.add_argument("--kokoro-model", default="model", help="Kokoro model ID")
+    p.add_argument("--whisper-model", default="base", help="Whisper model used for transcription")
+    p.add_argument("--out", default=None, help="Final output video (optional)")
+    p.add_argument(
+        "--translate-method",
+        choices=["local", "gemini", "none"],
+        default="local",
+        help=(
+            "How to translate the SRT: 'local' (MarianMT), 'gemini' (API)"
+            " or 'none' (use the SRT as-is)"
+        ),
+    )
+    p.add_argument("--gemini-key", default=None, help="API key for Gemini (or set GEMINI_API_KEY)")
+    p.add_argument(
+        "--mix",
+        action="store_true",
+        help="Mix the synthesized audio with the original track instead of replacing it",
+    )
+    p.add_argument(
+        "--mix-background-volume",
+        type=float,
+        default=0.2,
+        help="Volume of the original track when mixing (0.0-1.0)",
+    )
+    p.add_argument(
+        "--keep-chunks",
+        action="store_true",
+        help="Keep the per-segment chunk files produced during synthesis (debugging)",
+    )
+    p.add_argument(
+        "--keep-temp",
+        action="store_true",
+        help="Do not delete the temporary working directory when finished",
+    )
+    p.add_argument("--dry-run", action="store_true", help="Only print the commands without running them")
+    args = p.parse_args()
+    if not 0.0 <= args.mix_background_volume <= 1.0:
+        p.error("--mix-background-volume must be between 0.0 and 1.0")
+
+    video = os.path.abspath(args.video)
+    if not os.path.exists(video):
+        print("Video not found:", video, file=sys.stderr)
+        sys.exit(2)
+
+    workdir = tempfile.mkdtemp(prefix="full_pipeline_")
+    try:
+        # 1) Get the SRT: if none was given, extract the audio and transcribe it
+        if args.srt:
+            srt_in = os.path.abspath(args.srt)
+            print("Usando SRT proporcionado:", srt_in)
+        else:
+            audio_tmp = os.path.join(workdir, "extracted_audio.wav")
+            cmd_extract = [
+                "ffmpeg",
+                "-y",
+                "-i",
+                video,
+                "-vn",
+                "-acodec",
+                "pcm_s16le",
+                "-ar",
+                "16000",
+                "-ac",
+                "1",
+                audio_tmp,
+            ]
+            run(cmd_extract, dry_run=args.dry_run)
+
+            # Run transcribe.py to generate the SRT
+            srt_in = os.path.join(workdir, "transcribed.srt")
+            cmd_trans = [
+                sys.executable,
+                "whisper_project/transcribe.py",
+                "--file",
+                audio_tmp,
+                "--backend",
+                "faster-whisper",
+                "--model",
+                args.whisper_model,
+                "--srt",
+                "--srt-file",
+                srt_in,
+            ]
+            run(cmd_trans, dry_run=args.dry_run)
+
+        # 2) Translate the SRT with the chosen method
+        srt_translated = os.path.join(workdir, "translated.srt")
+        if args.translate_method == "local":
+            cmd_translate = [
+                sys.executable,
+                "whisper_project/translate_srt_local.py",
+                "--in",
+                srt_in,
+                "--out",
+                srt_translated,
+            ]
+            run(cmd_translate, dry_run=args.dry_run)
+        elif args.translate_method == "gemini":
+            gem_key = args.gemini_key or os.environ.get("GEMINI_API_KEY")
+            if not gem_key:
+                print(
+                    "--translate-method=gemini requires --gemini-key or the GEMINI_API_KEY environment variable",
+                    file=sys.stderr,
+                )
+                sys.exit(4)
+            cmd_translate = [
+                sys.executable,
+                "whisper_project/translate_srt_with_gemini.py",
+                "--in",
+                srt_in,
+                "--out",
+                srt_translated,
+                "--gemini-api-key",
+                gem_key,
+            ]
+            run(cmd_translate, dry_run=args.dry_run)
+        else:
+            # none: use the SRT as-is
+            srt_translated = srt_in
+
+        # 3) Synthesize each segment with Kokoro, align, concatenate, and
+        #    replace or mix the audio in the video
+        dub_wav = os.path.join(workdir, "dub_final.wav")
+        payload = json_payload_template(args.kokoro_model, args.voice)
+        synth_cmd = [
+            sys.executable,
+            "whisper_project/srt_to_kokoro.py",
+            "--srt",
+            srt_translated,
+            "--endpoint",
+            args.kokoro_endpoint,
+            "--payload-template",
+            payload,
+            "--api-key",
+            args.kokoro_key,
+            "--out",
+            dub_wav,
+            "--video",
+            video,
+            "--align",
+        ]
+        if args.keep_chunks:
+            synth_cmd.append("--keep-chunks")
+        if args.mix:
+            synth_cmd += ["--mix-with-original", "--mix-background-volume", str(args.mix_background_volume)]
+        else:
+            synth_cmd.append("--replace-original")
+
+        run(synth_cmd, dry_run=args.dry_run)
+
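+        # 4) Burn the SRT into the resulting video
+        out_video = args.out if args.out else os.path.splitext(video)[0] + ".replaced_audio.subs.mp4"
+        # Assumes srt_to_kokoro.py wrote its output next to the input video
+        # as <video>.replaced_audio.mp4.
+        replaced_src = os.path.splitext(video)[0] + ".replaced_audio.mp4"
+        # Build the filter string. The SRT path is not escaped for ffmpeg's
+        # filter syntax, so paths containing ':' or quotes would break it.
+        vf = f"subtitles={srt_translated}:force_style='FontName=Arial,FontSize=24'"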
+        cmd_burn = [
+            "ffmpeg",
+            "-y",
+            "-i",
+            replaced_src,
+            "-vf",
+            vf,
+            "-c:a",
+            "copy",
+            out_video,
+        ]
+        run(cmd_burn, dry_run=args.dry_run)
+
+        print("Pipeline finished. Final video:", out_video)
+
+    finally:
+        if args.dry_run:
+            print("(dry-run) leaving workdir:", workdir)
+        elif args.keep_temp:
+            print("Keeping temporary directory:", workdir)
+        else:
+            shutil.rmtree(workdir, ignore_errors=True)
+
+
+if __name__ == '__main__':
+    main()
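Step 3 of `run_full_pipeline.py` passes `--align` to `srt_to_kokoro.py`, which is not included in this diff. One plausible way to align per-segment TTS clips to SRT timings is to insert silence before each clip so that it starts at its subtitle's start time, and to flag clips that run past the next subtitle. The sketch below computes such a schedule from durations in milliseconds; `plan_alignment` and `Placement` are hypothetical names, and the real script may instead time-stretch or truncate clips.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Placement:
    index: int       # 1-based subtitle index
    silence_ms: int  # silence inserted before this clip
    start_ms: int    # where the clip actually starts in the dub track
    overrun_ms: int  # how far the clip runs past its deadline (0 if it fits)


def plan_alignment(segments: List[Tuple[int, int]], clip_ms: List[int]) -> List[Placement]:
    """Place synthesized clips on the SRT timeline.

    segments: (start_ms, end_ms) for each subtitle, in order.
    clip_ms:  duration of the synthesized clip for each subtitle.
    A clip never starts before its subtitle; if the previous clip ran long,
    it starts late instead, with no silence before it.
    """
    if len(segments) != len(clip_ms):
        raise ValueError("need exactly one clip per subtitle segment")
    plan: List[Placement] = []
    cursor = 0  # end of the audio laid down so far, in ms
    for i, ((start, end), length) in enumerate(zip(segments, clip_ms), start=1):
        silence = max(0, start - cursor)
        actual_start = cursor + silence
        cursor = actual_start + length
        # Deadline: the next subtitle's start, or this subtitle's end for the last one.
        deadline = segments[i][0] if i < len(segments) else end
        plan.append(Placement(i, silence, actual_start, max(0, cursor - deadline)))
    return plan


if __name__ == "__main__":
    # Timings of the first three cues in the removed dailyrutines.kokoro.dub.srt;
    # the clip lengths are made-up values for illustration.
    cues = [(0, 6960), (6960, 14480), (14480, 19240)]
    for placement in plan_alignment(cues, [2000, 9000, 3000]):
        print(placement)
```

Reporting overruns rather than silently shifting later clips makes it easy to spot translations that are too long for their slot, which is common when Spanish text replaces shorter English lines.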
--- a/whisper_project/translate_srt_argos.py
+++ b/whisper_project/translate_srt_argos.py
@@ -0,0 +1,84 @@
+#!/usr/bin/env python3
+"""translate_srt_argos.py
+Translates an .srt file locally using Argos Translate (an alternative to the
+transformers-based translate_srt_local.py).
+Downloads and installs the en->es Argos language package if it is missing.
+
+Usage:
+  source .venv/bin/activate
+  python3 whisper_project/translate_srt_argos.py --in in.srt --out out.srt
+
+Requirements: argostranslate and srt must already be installed; installing the
+en->es language package needs network access the first time.
+"""
+import argparse
+
+import srt
+from argostranslate import package, translate
+
+
+def ensure_en_es_package():
+    # Already installed?
+    for p in package.get_installed_packages():
+        if p.from_code == 'en' and p.to_code == 'es':
+            return True
+    # Otherwise refresh the remote package index, then download and install en->es.
+    try:
+        package.update_package_index()
+        for p in package.get_available_packages():
+            if p.from_code == 'en' and p.to_code == 'es':
+                print('Downloading Argos en->es package...')
+                package.install_from_path(p.download())
+                return True
+    except Exception as e:
+        print(f"Error downloading/installing Argos package: {e}")
+    return False
+
+
+def translate_srt(in_path: str, out_path: str):
+    with open(in_path, 'r', encoding='utf-8') as fh:
+        subs = list(srt.parse(fh.read()))
+
+    # Make sure the en->es package is available
+    ok = ensure_en_es_package()
+    if not ok:
+        raise SystemExit('Argos en->es package not found and could not be downloaded')
+
+    for i, sub in enumerate(subs, start=1):
+        text = sub.content.strip()
+        if not text:
+            continue
+        tr = translate.translate(text, 'en', 'es')
+        sub.content = tr
+        print(f'Translated {i}/{len(subs)}')
+
+    with open(out_path, 'w', encoding='utf-8') as fh:
+        fh.write(srt.compose(subs))
+    print(f'Wrote translated SRT to: {out_path}')
+
+
+if __name__ == '__main__':
+    p = argparse.ArgumentParser()
+    p.add_argument('--in', dest='in_srt', required=True)
+    p.add_argument('--out', dest='out_srt', required=True)
+    args = p.parse_args()
+    translate_srt(args.in_srt, args.out_srt)
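Both translation scripts in this commit follow the same pattern: `srt.parse` the file, replace each subtitle's `content`, and `srt.compose` the result, so the timestamps pass through untouched. To show what that round trip preserves without the `srt` dependency, here is a minimal stdlib-only sketch; `parse_srt`, `compose_srt`, and `Cue` are hypothetical names, and the sketch ignores edge cases (blank lines inside a cue, malformed blocks) that the `srt` library handles.

```python
import re
from dataclasses import dataclass
from typing import List

# SRT timestamps look like 00:01:23,760 (hours:minutes:seconds,milliseconds).
TIME_RE = re.compile(r"\d+:\d{2}:\d{2},\d{3}")


@dataclass
class Cue:
    index: int
    start: str    # original "HH:MM:SS,mmm" text, kept verbatim
    end: str
    content: str  # may be empty, e.g. a segment with no transcribed speech


def parse_srt(text: str) -> List[Cue]:
    cues: List[Cue] = []
    # Cues are separated by blank lines; normalize Windows line endings first.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit() or "-->" not in lines[1]:
            continue  # this simplified sketch just skips malformed blocks
        start, _, end = (part.strip() for part in lines[1].partition("-->"))
        if not (TIME_RE.fullmatch(start) and TIME_RE.fullmatch(end)):
            continue
        cues.append(Cue(int(lines[0]), start, end, "\n".join(lines[2:])))
    return cues


def compose_srt(cues: List[Cue]) -> str:
    # Renumber cues sequentially and keep the timestamps exactly as parsed.
    return "\n".join(
        f"{n}\n{c.start} --> {c.end}\n{c.content}\n" for n, c in enumerate(cues, start=1)
    )


if __name__ == "__main__":
    sample = "1\n00:00:00,000 --> 00:00:06,960\nDaily routines\n\n2\n00:00:06,960 --> 00:00:14,480\n\n"
    cues = parse_srt(sample)
    cues[0].content = "Rutinas diarias"  # what a translator would write back
    print(compose_srt(cues))
```

Only `content` changes during translation, which is why the translated SRT keeps the original timing and can drive the per-segment synthesis afterwards.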
--- a/whisper_project/translate_srt_local.py
+++ b/whisper_project/translate_srt_local.py
@@ -0,0 +1,57 @@
+#!/usr/bin/env python3
+"""translate_srt_local.py
+Translates an .srt file locally using MarianMT (Helsinki-NLP/opus-mt-en-es).
+
+Usage:
+  source .venv/bin/activate
+  python3 whisper_project/translate_srt_local.py --in path/to/in.srt --out path/to/out.srt
+
+Requirements: transformers, sentencepiece, torch, srt
+"""
+import argparse
+import srt
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+
+
+def translate_srt(in_path: str, out_path: str, model_name: str = "Helsinki-NLP/opus-mt-en-es", batch_size: int = 8):
+    with open(in_path, "r", encoding="utf-8") as f:
+        subs = list(srt.parse(f.read()))
+
+    # Load the model and tokenizer
+    tok = AutoTokenizer.from_pretrained(model_name)
+    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
+
+    texts = [sub.content.strip() for sub in subs]
+    translated = []
+
+    for i in range(0, len(texts), batch_size):
+        batch = texts[i:i+batch_size]
+        # Tokenize (pad to the longest text in the batch, truncate overlong input)
+        enc = tok(batch, return_tensors="pt", padding=True, truncation=True)
+        outs = model.generate(**enc, max_length=512)
+        outs_decoded = tok.batch_decode(outs, skip_special_tokens=True)
+        translated.extend(outs_decoded)
+
+    # Write the translations back into the subtitles
+    for sub, t in zip(subs, translated):
+        sub.content = t.strip()
+
+    with open(out_path, "w", encoding="utf-8") as f:
+        f.write(srt.compose(subs))
+
+    print(f"Translated SRT written to: {out_path}")
+
+
+def main():
+    p = argparse.ArgumentParser()
+    p.add_argument("--in", dest="in_srt", required=True)
+    p.add_argument("--out", dest="out_srt", required=True)
+    p.add_argument("--model", default="Helsinki-NLP/opus-mt-en-es")
+    p.add_argument("--batch-size", dest="batch_size", type=int, default=8)
+    args = p.parse_args()
+
+    translate_srt(args.in_srt, args.out_srt, model_name=args.model, batch_size=args.batch_size)
+
+
+if __name__ == '__main__':
+    main()
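`translate_srt_local.py` sends every subtitle, including empty ones, through the model in batches of `--batch-size`. Empty cues may produce spurious output (the removed `dailyrutines.audio.srt` consisted entirely of empty cues), so one option is to translate only non-empty texts and write the results back by position. The sketch below shows that bookkeeping with a stand-in `translate_batch` callable in place of the real tokenizer, `model.generate`, and `batch_decode` steps, so it runs without `transformers`; `translate_nonempty` is a hypothetical helper, not part of the script.

```python
from typing import Callable, List, Sequence


def translate_nonempty(
    texts: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Translate non-empty texts in batches; empty or whitespace-only texts stay empty.

    translate_batch stands in for tokenizer + model.generate + batch_decode
    (hypothetical signature: list of strings in, same-length list out).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    result = ["" for _ in texts]
    # Remember the original position of every text that actually needs translating.
    pending = [(i, t.strip()) for i, t in enumerate(texts) if t.strip()]
    for start in range(0, len(pending), batch_size):
        chunk = pending[start:start + batch_size]
        outputs = translate_batch([t for _, t in chunk])
        if len(outputs) != len(chunk):
            raise RuntimeError("translator returned a different number of outputs")
        for (i, _), out in zip(chunk, outputs):
            result[i] = out.strip()
    return result


if __name__ == "__main__":
    calls = []

    def fake(batch):  # stand-in translator: records batch sizes, upper-cases text
        calls.append(len(batch))
        return [s.upper() for s in batch]

    print(translate_nonempty(["hi", "", "  ", "bye", "ok"], fake, batch_size=2), calls)
```

Because empty cues never reach the model, batches are also denser, and the length check catches any mismatch before translations are written to the wrong subtitle.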