diff --git a/EXAMPLES.md b/EXAMPLES.md
new file mode 100644
index 0000000..d0de9cd
--- /dev/null
+++ b/EXAMPLES.md
@@ -0,0 +1,48 @@
+EXAMPLES - Whisper + Kokoro TTS pipeline
+
+Usage examples (run from the repo root, using the .venv virtualenv):
+
+1) Dry run (prints the commands that would be executed):
+
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 \
+  --kokoro-endpoint "https://kokoro.bfzqqk.easypanel.host/api/v1/audio/speech" \
+  --kokoro-key "$KOKORO_TOKEN" --voice em_alex \
+  --whisper-model base --dry-run
+
+2) Full run (replaces the audio track):
+
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 \
+  --kokoro-endpoint "https://kokoro.bfzqqk.easypanel.host/api/v1/audio/speech" \
+  --kokoro-key "$KOKORO_TOKEN" --voice em_alex \
+  --whisper-model base
+
+3) Use an existing SRT (skips transcription):
+
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 --srt subs_en.srt \
+  --kokoro-endpoint "https://kokoro..." --kokoro-key "$KOKORO_TOKEN" --voice em_alex
+
+4) Translate with Gemini (if you have a key) or use the local fallback:
+
+# Use Gemini (requires --gemini-key or the GEMINI_API_KEY environment variable)
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 --translate-method gemini --gemini-key "$GEMINI_KEY" \
+  --kokoro-endpoint "https://kokoro..." --kokoro-key "$KOKORO_TOKEN" --voice em_alex
+
+# Force local translation (MarianMT):
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 --translate-method local \
+  --kokoro-endpoint "https://kokoro..." --kokoro-key "$KOKORO_TOKEN" --voice em_alex
+
+5) Mix with the original audio instead of replacing it:
+
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 --mix --mix-background-volume 0.3 \
+  --kokoro-endpoint "https://kokoro..." --kokoro-key "$KOKORO_TOKEN" --voice em_alex
+
+Notes:
+- If Gemini fails, the pipeline can fall back to local translation.
+- Use --keep-temp and/or --keep-chunks to inspect the intermediate WAV files.
+- Set --whisper-model to "base", "small" or "medium" depending on available resources.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..ec931be
--- /dev/null
+++ b/README.md
@@ -0,0 +1,75 @@
+# Whisper dubbing pipeline
+
+Utilities to transcribe, translate and dub videos segment by segment using Whisper + TTS (Kokoro). The project is meant to run inside a Python virtual environment with `ffmpeg` available on PATH.
+
+Main contents
+- `whisper_project/transcribe.py` - transcribes audio to SRT (the faster-whisper backend is recommended).
+- `whisper_project/translate_srt_local.py` - translates an SRT locally with MarianMT (Helsinki-NLP/opus-mt-en-es).
+- `whisper_project/srt_to_kokoro.py` - synthesizes each SRT segment through a compatible TTS endpoint (Kokoro), aligns and concatenates the chunks, and optionally mixes with or replaces the audio track in the video.
+- `whisper_project/run_full_pipeline.py` - "all in one" orchestrator that extracts audio, transcribes (if needed), translates, synthesizes and burns in subtitles.
+
+Requirements
+- Python 3.10+ (using the project's `.venv` is recommended)
+- ffmpeg and ffprobe on PATH
+- Python packages (install them in the venv):
+  - requests, srt, transformers, sentencepiece, torch (if you use MarianMT on CPU), etc.
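+
+For example, a minimal environment setup could look like this (the package set is indicative; `faster-whisper` is only needed for the recommended transcription backend):
+
+```bash
+python3 -m venv .venv
+source .venv/bin/activate
+pip install requests srt transformers sentencepiece torch faster-whisper
+```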
+
+Recommended usage (examples)
+
+1) Do a dry run to see the commands that would be executed:
+
+```bash
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 \
+  --kokoro-endpoint "https://kokoro.example/api/v1/audio/speech" \
+  --kokoro-key "$KOKORO_TOKEN" \
+  --voice em_alex \
+  --whisper-model base \
+  --dry-run
+```
+
+2) Run the real pipeline (local translation and replacement of the audio track):
+
+```bash
+.venv/bin/python whisper_project/run_full_pipeline.py \
+  --video dailyrutines.mp4 \
+  --kokoro-endpoint "https://kokoro.example/api/v1/audio/speech" \
+  --kokoro-key "$KOKORO_TOKEN" \
+  --voice em_alex \
+  --whisper-model base
+```
+
+Important orchestrator flags (`run_full_pipeline.py`)
+- `--translate-method` : `local` | `gemini` | `none`. Defaults to `local` (MarianMT). `gemini` requires `--gemini-key`.
+- `--gemini-key` : API key for Gemini (when using `--translate-method=gemini`).
+- `--mix` : mix the synthesized audio with the original track instead of replacing it. Adjust the background level with `--mix-background-volume`.
+- `--mix-background-volume` : volume of the original track when mixing (0.0 - 1.0).
+- `--keep-chunks` : keep the per-segment WAV files (useful for debugging).
+- `--keep-temp` : do not delete the final temporary directory (keeps `dub_final.wav` and, with `--keep-chunks`, the chunks).
+- `--dry-run` : only print the commands that would be executed.
+
+Direct use of `srt_to_kokoro.py` (if you already have a translated SRT)
+
+```bash
+.venv/bin/python whisper_project/srt_to_kokoro.py \
+  --srt translated.srt \
+  --endpoint "https://kokoro.example/api/v1/audio/speech" \
+  --payload-template '{"model":"model","voice":"em_alex","input":"{text}","response_format":"wav"}' \
+  --api-key "$KOKORO_TOKEN" \
+  --out out.wav \
+  --video input.mp4 --align --replace-original
+```
+
+Notes and troubleshooting
+- If the TTS endpoint returns `400 Bad Request`, the cause is usually the quoting/format of `--payload-template`. `run_full_pipeline.py` already handles the quoting for the common case; see the payload sketch at the end of this README.
+- If `ffmpeg` prints messages about "Too many bits" or "clamping" while encoding the AAC track, it is a bitrate warning; the MP4 is usually produced correctly.
+- If remote synthesis fails with an authentication error, check the key (`--kokoro-key`) or point `srt_to_kokoro.py` at an alternative TTS provider via `--endpoint` and `--payload-template`.
+
+Suggested next improvements
+- Validate that `--mix` and `--replace-original` are not used at the same time and expose them as an explicit mutually exclusive option.
+- Add support for more TTS backends (e.g. local TTS engines), and optionally Argos-based local translation.
+
+License and security
+- This repository contains example scripts. Protect your API keys and do not commit them to public repositories.
+
+See `EXAMPLES.md` for concrete command variants, including `--mix` and `--keep-temp`.
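+
+To see why quoting matters: the template produced by `json_payload_template()` in `run_full_pipeline.py` contains a literal `{text}` placeholder that `srt_to_kokoro.py` is expected to fill with each segment's text (the substitution code itself is not part of this diff). A minimal sketch of the two approaches, assuming a plain string replacement on one side and `json.dumps` on the other:
+
+```python
+import json
+
+# Template as produced by json_payload_template() in run_full_pipeline.py
+template = '{"model":"model","voice":"em_alex","input":"{text}","response_format":"wav"}'
+
+segment_text = 'The coffee in New Zealand is so good.'
+
+# Naive substitution: breaks if the segment text contains quotes or braces
+naive_body = template.replace("{text}", segment_text)
+
+# Safer alternative: build the payload as a dict and serialize it
+safe_body = json.dumps({
+    "model": "model",
+    "voice": "em_alex",
+    "input": segment_text,
+    "response_format": "wav",
+})
+
+print(naive_body)
+print(safe_body)
+```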
\ No newline at end of file diff --git a/dailyrutines.dubbed.es.aligned.mp4 b/dailyrutines.dubbed.es.aligned.mp4 deleted file mode 100644 index e9e6788..0000000 Binary files a/dailyrutines.dubbed.es.aligned.mp4 and /dev/null differ diff --git a/dailyrutines.dubbed.es.mixed.mp4 b/dailyrutines.dubbed.es.mixed.mp4 deleted file mode 100644 index c0093b8..0000000 Binary files a/dailyrutines.dubbed.es.mixed.mp4 and /dev/null differ diff --git a/dailyrutines.dubbed.es.mp4 b/dailyrutines.dubbed.es.mp4 deleted file mode 100644 index b145e57..0000000 Binary files a/dailyrutines.dubbed.es.mp4 and /dev/null differ diff --git a/dailyrutines.dubbed.es.subs.mp4 b/dailyrutines.dubbed.es.subs.mp4 deleted file mode 100644 index 2d6711d..0000000 Binary files a/dailyrutines.dubbed.es.subs.mp4 and /dev/null differ diff --git a/dailyrutines.dubbed.gemini.mp4 b/dailyrutines.dubbed.gemini.mp4 deleted file mode 100644 index af6611f..0000000 Binary files a/dailyrutines.dubbed.gemini.mp4 and /dev/null differ diff --git a/dailyrutines.dubbed.mp4 b/dailyrutines.dubbed.mp4 deleted file mode 100644 index d3ec20a..0000000 Binary files a/dailyrutines.dubbed.mp4 and /dev/null differ diff --git a/dailyrutines.replaced_audio.mp4 b/dailyrutines.replaced_audio.mp4 deleted file mode 100644 index bb4f5f8..0000000 Binary files a/dailyrutines.replaced_audio.mp4 and /dev/null differ diff --git a/dailyrutines.replaced_audio.subs.mp4 b/dailyrutines.replaced_audio.subs.mp4 deleted file mode 100644 index b805003..0000000 Binary files a/dailyrutines.replaced_audio.subs.mp4 and /dev/null differ diff --git a/dailyrutines.dubbed.es.mixed.subs.mp4 b/output/dailyrutines.replaced_audio.subs.mp4 similarity index 65% rename from dailyrutines.dubbed.es.mixed.subs.mp4 rename to output/dailyrutines.replaced_audio.subs.mp4 index 3476d44..2cf0174 100644 Binary files a/dailyrutines.dubbed.es.mixed.subs.mp4 and b/output/dailyrutines.replaced_audio.subs.mp4 differ diff --git a/whisper_project/coqui_test.wav b/whisper_project/coqui_test.wav deleted file mode 100644 index 31df696..0000000 Binary files a/whisper_project/coqui_test.wav and /dev/null differ diff --git a/whisper_project/dailyrutines.audio.srt b/whisper_project/dailyrutines.audio.srt deleted file mode 100644 index 32a1260..0000000 --- a/whisper_project/dailyrutines.audio.srt +++ /dev/null @@ -1,56 +0,0 @@ -1 -00:00:00,000 --> 00:00:10,000 - -2 -00:00:10,000 --> 00:00:20,000 - -3 -00:00:20,000 --> 00:00:30,000 - -4 -00:00:30,000 --> 00:00:40,000 - -5 -00:00:40,000 --> 00:00:50,000 - -6 -00:00:50,000 --> 00:01:00,000 - -7 -00:01:00,000 --> 00:01:10,000 - -8 -00:01:10,000 --> 00:01:20,000 - -9 -00:01:20,000 --> 00:01:30,000 - -10 -00:01:30,000 --> 00:01:40,000 - -11 -00:01:40,000 --> 00:01:50,000 - -12 -00:01:50,000 --> 00:02:00,000 - -13 -00:02:00,000 --> 00:02:10,000 - -14 -00:02:10,000 --> 00:02:20,000 - -15 -00:02:20,000 --> 00:02:30,000 - -16 -00:02:30,000 --> 00:02:40,000 - -17 -00:02:40,000 --> 00:02:50,000 - -18 -00:02:50,000 --> 00:03:00,000 - -19 -00:03:00,000 --> 00:03:09,009 diff --git a/whisper_project/dailyrutines.audio.wav b/whisper_project/dailyrutines.audio.wav deleted file mode 100644 index 10fc45b..0000000 Binary files a/whisper_project/dailyrutines.audio.wav and /dev/null differ diff --git a/whisper_project/dailyrutines.kokoro.api.wav b/whisper_project/dailyrutines.kokoro.api.wav deleted file mode 100644 index bab2a43..0000000 Binary files a/whisper_project/dailyrutines.kokoro.api.wav and /dev/null differ diff --git a/whisper_project/dailyrutines.kokoro.dub.es.aligned.wav 
b/whisper_project/dailyrutines.kokoro.dub.es.aligned.wav deleted file mode 100644 index 82a4016..0000000 Binary files a/whisper_project/dailyrutines.kokoro.dub.es.aligned.wav and /dev/null differ diff --git a/whisper_project/dailyrutines.kokoro.dub.es.srt b/whisper_project/dailyrutines.kokoro.dub.es.srt deleted file mode 100644 index 3de095f..0000000 --- a/whisper_project/dailyrutines.kokoro.dub.es.srt +++ /dev/null @@ -1,72 +0,0 @@ -1 -00:00:00,000 --> 00:00:06,960 -Rutinas diarias - -2 -00:00:06,960 --> 00:00:14,480 -Hola mamá, estoy disfrutando la vida en Nueva Zelanda. - -3 -00:00:14,480 --> 00:00:19,240 -El campo es tan hermoso. - -4 -00:00:19,240 --> 00:00:23,199 -Mi rutina es diferente ahora. - -5 -00:00:23,199 --> 00:00:29,960 -Me despierto a las 6 en punto cada mañana y salgo a correr. - -6 -00:00:29,960 --> 00:00:36,640 -A las 7 en punto desayuno. - -7 -00:00:36,640 --> 00:00:42,120 -El café en Nueva Zelanda es tan bueno. - -8 -00:00:42,120 --> 00:00:46,240 -A las 8 voy a trabajar. - -9 -00:00:46,240 --> 00:00:52,679 -Normalmente tomo el autobús, pero a veces camino. - -10 -00:00:52,679 --> 00:00:57,439 -Empiezo a trabajar a las 9. - -11 -00:00:57,439 --> 00:01:02,399 -Trabajo en mi oficina hasta la hora del almuerzo. - -12 -00:01:02,399 --> 00:01:08,920 -A las 12 almuerzo con mis colegas en el parque. - -13 -00:01:08,920 --> 00:01:15,239 -Es agradable disfrutar del aire fresco y charlar juntos. - -14 -00:01:15,239 --> 00:01:23,759 -A las 5 salgo del trabajo y voy al gimnasio. - -15 -00:01:23,760 --> 00:01:32,920 -Hago ejercicio hasta las seis y luego voy a casa. - -16 -00:01:32,920 --> 00:01:39,520 -A las 8 ceno, luego me relajo. - -17 -00:01:39,520 --> 00:01:44,800 -I normally go to bed at 11 o'clock. - -18 -00:01:44,799 --> 00:01:51,799 -Hasta pronto, Stephen. - diff --git a/whisper_project/dailyrutines.kokoro.dub.es.wav b/whisper_project/dailyrutines.kokoro.dub.es.wav deleted file mode 100644 index 7b9fe62..0000000 Binary files a/whisper_project/dailyrutines.kokoro.dub.es.wav and /dev/null differ diff --git a/whisper_project/dailyrutines.kokoro.dub.srt b/whisper_project/dailyrutines.kokoro.dub.srt deleted file mode 100644 index 12f584a..0000000 --- a/whisper_project/dailyrutines.kokoro.dub.srt +++ /dev/null @@ -1,71 +0,0 @@ -1 -00:00:00,000 --> 00:00:06,960 -Dayly routines - -2 -00:00:06,960 --> 00:00:14,480 -Hi mom, I'm enjoying life in New Zealand. - -3 -00:00:14,480 --> 00:00:19,240 -The countryside is so beautiful. - -4 -00:00:19,240 --> 00:00:23,199 -My routine is different now. - -5 -00:00:23,199 --> 00:00:29,960 -I wake at 6 o'clock every morning and go for a run. - -6 -00:00:29,960 --> 00:00:36,640 -At 7 o'clock I have breakfast. - -7 -00:00:36,640 --> 00:00:42,120 -The coffee in New Zealand is so good. - -8 -00:00:42,120 --> 00:00:46,240 -At 8 o'clock I go to work. - -9 -00:00:46,240 --> 00:00:52,679 -I usually take the bus, but sometimes I walk. - -10 -00:00:52,679 --> 00:00:57,439 -I start work at 9 o'clock. - -11 -00:00:57,439 --> 00:01:02,399 -I work in my office until lunchtime. - -12 -00:01:02,399 --> 00:01:08,920 -At 12 o'clock I have lunch with my colleagues in the park. - -13 -00:01:08,920 --> 00:01:15,239 -It's nice to enjoy the fresh air and chat together. - -14 -00:01:15,239 --> 00:01:23,759 -At 5 o'clock I leave work and go to the gym. - -15 -00:01:23,760 --> 00:01:32,920 -I exercise until 6 o'clock and then go home. - -16 -00:01:32,920 --> 00:01:39,520 -At 8 o'clock I eat dinner, then relax. 
-
-17
-00:01:39,520 --> 00:01:44,800
-I normally go to bed at 11 o'clock.
-
-18
-00:01:44,799 --> 00:01:51,799
-See you soon, Stephen.
diff --git a/whisper_project/dailyrutines.kokoro.dub.wav b/whisper_project/dailyrutines.kokoro.dub.wav
deleted file mode 100644
index 42cc20a..0000000
Binary files a/whisper_project/dailyrutines.kokoro.dub.wav and /dev/null differ
diff --git a/whisper_project/dub_female_clone_es.wav b/whisper_project/dub_female_clone_es.wav
deleted file mode 100644
index b3645ac..0000000
Binary files a/whisper_project/dub_female_clone_es.wav and /dev/null differ
diff --git a/whisper_project/dub_male_clone_ptbr.wav b/whisper_project/dub_male_clone_ptbr.wav
deleted file mode 100644
index 42b3b57..0000000
Binary files a/whisper_project/dub_male_clone_ptbr.wav and /dev/null differ
diff --git a/whisper_project/dub_male_style.wav b/whisper_project/dub_male_style.wav
deleted file mode 100644
index 28202e3..0000000
Binary files a/whisper_project/dub_male_style.wav and /dev/null differ
diff --git a/whisper_project/dub_male_style_out.wav b/whisper_project/dub_male_style_out.wav
deleted file mode 100644
index 014aad0..0000000
Binary files a/whisper_project/dub_male_style_out.wav and /dev/null differ
diff --git a/whisper_project/ref_female_es.wav b/whisper_project/ref_female_es.wav
deleted file mode 100644
index 839bd04..0000000
Binary files a/whisper_project/ref_female_es.wav and /dev/null differ
diff --git a/whisper_project/run_full_pipeline.py b/whisper_project/run_full_pipeline.py
new file mode 100644
index 0000000..4e31f18
--- /dev/null
+++ b/whisper_project/run_full_pipeline.py
@@ -0,0 +1,449 @@
+#!/usr/bin/env python3
+# Orchestrates: transcription -> translation -> per-segment synthesis -> replace/mix -> subtitle burn-in
+
+import argparse
+import os
+import shlex
+import shutil
+import subprocess
+import sys
+import tempfile
+
+
+def run(cmd, dry_run=False, env=None):
+    # Run a command. Accepts a str (executed via the shell) or a list (no shell).
+    # Prints the command in a copy/paste-safe form. With dry_run=True nothing
+    # is executed.
+    if isinstance(cmd, (list, tuple)):
+        printable = " ".join(shlex.quote(str(x)) for x in cmd)
+    else:
+        printable = cmd
+    print("+", printable)
+    if dry_run:
+        return 0
+    if isinstance(cmd, (list, tuple)):
+        return subprocess.run(cmd, shell=False, check=True, env=env)
+    return subprocess.run(cmd, shell=True, check=True, env=env)
+
+
+def json_payload_template(model, voice):
+    # JSON payload with {text} as the placeholder accepted by srt_to_kokoro
+    return '{"model":"' + model + '","voice":"' + voice + '","input":"{text}","response_format":"wav"}'
+
+
+def main():
+    p = argparse.ArgumentParser()
+    p.add_argument("--video", required=True, help="Input video")
+    p.add_argument(
+        "--srt",
+        help="Input SRT (if one already exists). Otherwise the audio is transcribed",
+    )
+    p.add_argument("--kokoro-endpoint", required=True, help="TTS endpoint URL")
+    p.add_argument("--kokoro-key", required=True, help="API key for Kokoro")
+    p.add_argument("--voice", default="em_alex", help="Voice name (e.g. em_alex)")
em_alex)") + p.add_argument("--kokoro-model", default="model", help="ID del modelo Kokoro") + p.add_argument("--whisper-model", default="base", help="Modelo de Whisper para transcribir") + p.add_argument("--out", default=None, help="Vídeo de salida final (opcional)") + p.add_argument( + "--translate-method", + choices=["local", "gemini", "none"], + default="local", + help=( + "Método para traducir el SRT: 'local' (MarianMT), 'gemini' (API)" + " o 'none' (usar SRT proporcionado)" + ), + ) + p.add_argument("--gemini-key", default=None, help="API key para Gemini (si aplica)") + p.add_argument( + "--mix", + action="store_true", + help="Mezclar el audio sintetizado con la pista original en lugar de reemplazarla", + ) + p.add_argument( + "--mix-background-volume", + type=float, + default=0.2, + help="Volumen de la pista original al mezclar (0.0-1.0)", + ) + p.add_argument( + "--keep-chunks", + action="store_true", + help="Conservar los archivos de chunks generados por la síntesis (debug)", + ) + p.add_argument( + "--keep-temp", + action="store_true", + help="No borrar el directorio temporal de trabajo al terminar", + ) + p.add_argument("--dry-run", action="store_true", help="Solo mostrar comandos sin ejecutar") + args = p.parse_args() + + video = os.path.abspath(args.video) + if not os.path.exists(video): + print("Vídeo no encontrado:", video, file=sys.stderr) + sys.exit(2) + + workdir = tempfile.mkdtemp(prefix="full_pipeline_") + try: + # 1) obtener SRT: si no se pasa, extraer audio y transcribir + if args.srt: + srt_in = os.path.abspath(args.srt) + print("Usando SRT proporcionado:", srt_in) + else: + audio_tmp = os.path.join(workdir, "extracted_audio.wav") + cmd_extract = [ + "ffmpeg", + "-y", + "-i", + video, + "-vn", + "-acodec", + "pcm_s16le", + "-ar", + "16000", + "-ac", + "1", + audio_tmp, + ] + run(cmd_extract, dry_run=args.dry_run) + + # llamar al script transcribe.py para generar SRT + srt_in = os.path.join(workdir, "transcribed.srt") + cmd_trans = [ + sys.executable, + "whisper_project/transcribe.py", + "--file", + audio_tmp, + "--backend", + "faster-whisper", + "--model", + args.whisper_model, + "--srt", + "--srt-file", + srt_in, + ] + run(cmd_trans, dry_run=args.dry_run) + + # 2) traducir SRT según método elegido + srt_translated = os.path.join(workdir, "translated.srt") + if args.translate_method == "local": + cmd_translate = [ + sys.executable, + "whisper_project/translate_srt_local.py", + "--in", + srt_in, + "--out", + srt_translated, + ] + run(cmd_translate, dry_run=args.dry_run) + elif args.translate_method == "gemini": + gem_key = args.gemini_key or os.environ.get("GEMINI_API_KEY") + if not gem_key: + print( + "--translate-method=gemini requiere --gemini-key o la var de entorno GEMINI_API_KEY", + file=sys.stderr, + ) + sys.exit(4) + cmd_translate = [ + sys.executable, + "whisper_project/translate_srt_with_gemini.py", + "--in", + srt_in, + "--out", + srt_translated, + "--gemini-api-key", + gem_key, + ] + run(cmd_translate, dry_run=args.dry_run) + else: + # none: usar SRT tal cual + srt_translated = srt_in + + # 3) sintetizar por segmento con Kokoro, alinear, concatenar y + # reemplazar o mezclar audio en el vídeo + dub_wav = os.path.join(workdir, "dub_final.wav") + payload = json_payload_template(args.kokoro_model, args.voice) + synth_cmd = [ + sys.executable, + "whisper_project/srt_to_kokoro.py", + "--srt", + srt_translated, + "--endpoint", + args.kokoro_endpoint, + "--payload-template", + payload, + "--api-key", + args.kokoro_key, + "--out", + dub_wav, + "--video", + video, + 
"--align", + ] + if args.keep_chunks: + synth_cmd.append("--keep-chunks") + if args.mix: + synth_cmd += ["--mix-with-original", "--mix-background-volume", str(args.mix_background_volume)] + else: + synth_cmd.append("--replace-original") + + run(synth_cmd, dry_run=args.dry_run) + + # 4) quemar SRT en vídeo resultante + out_video = args.out if args.out else os.path.splitext(video)[0] + ".replaced_audio.subs.mp4" + replaced_src = os.path.splitext(video)[0] + ".replaced_audio.mp4" + # build filter string + vf = f"subtitles={srt_translated}:force_style='FontName=Arial,FontSize=24'" + cmd_burn = [ + "ffmpeg", + "-y", + "-i", + replaced_src, + "-vf", + vf, + "-c:a", + "copy", + out_video, + ] + run(cmd_burn, dry_run=args.dry_run) + + print("Flujo completado. Vídeo final:", out_video) + + finally: + if args.dry_run: + print("(dry-run) leaving workdir:", workdir) + else: + if not args.keep_temp: + try: + shutil.rmtree(workdir) + except Exception: + pass + + +if __name__ == '__main__': + main() +#!/usr/bin/env python3 +# run_full_pipeline.py +# Orquesta: transcripción -> traducción -> síntesis por segmento -> reemplazo/mezcla -> quemado de subtítulos + +import argparse +import os +import shlex +import shutil +import subprocess +import sys +import tempfile + + +def run(cmd, dry_run=False, env=None): + # Ejecuta un comando. Acepta str (ejecuta vía shell) o list (sin shell). + # Imprime el comando de forma segura para copiar/pegar. Si dry_run=True + # no ejecuta nada. + if isinstance(cmd, (list, tuple)): + printable = " ".join(shlex.quote(str(x)) for x in cmd) + else: + printable = cmd + print("+", printable) + if dry_run: + return 0 + if isinstance(cmd, (list, tuple)): + return subprocess.run(cmd, shell=False, check=True, env=env) + return subprocess.run(cmd, shell=True, check=True, env=env) + + +def json_payload_template(model, voice): + # Payload JSON con {text} como placeholder que acepta srt_to_kokoro + return '{"model":"' + model + '","voice":"' + voice + '","input":"{text}","response_format":"wav"}' + + +def main(): + p = argparse.ArgumentParser() + p.add_argument("--video", required=True, help="Vídeo de entrada") + p.add_argument( + "--srt", + help=("SRT de entrada (si ya existe). Si no, se transcribe del audio"), + ) + p.add_argument("--kokoro-endpoint", required=True, help="URL del endpoint TTS") + p.add_argument("--kokoro-key", required=True, help="API key para Kokoro") + p.add_argument("--voice", default="em_alex", help="Nombre de voz (p.ej. 
em_alex)") + p.add_argument("--kokoro-model", default="model", help="ID del modelo Kokoro") + p.add_argument("--whisper-model", default="base", help="Modelo de Whisper para transcribir") + p.add_argument("--out", default=None, help="Vídeo de salida final (opcional)") + p.add_argument( + "--translate-method", + choices=["local", "gemini", "none"], + default="local", + help=( + "Método para traducir el SRT: 'local' (MarianMT), 'gemini' (API)" + " o 'none' (usar SRT proporcionado)" + ), + ) + p.add_argument("--gemini-key", default=None, help="API key para Gemini (si aplica)") + p.add_argument( + "--mix", + action="store_true", + help="Mezclar el audio sintetizado con la pista original en lugar de reemplazarla", + ) + p.add_argument( + "--mix-background-volume", + type=float, + default=0.2, + help="Volumen de la pista original al mezclar (0.0-1.0)", + ) + p.add_argument( + "--keep-chunks", + action="store_true", + help="Conservar los archivos de chunks generados por la síntesis (debug)", + ) + p.add_argument( + "--keep-temp", + action="store_true", + help="No borrar el directorio temporal de trabajo al terminar", + ) + p.add_argument("--dry-run", action="store_true", help="Solo mostrar comandos sin ejecutar") + args = p.parse_args() + + video = os.path.abspath(args.video) + if not os.path.exists(video): + print("Vídeo no encontrado:", video, file=sys.stderr) + sys.exit(2) + + workdir = tempfile.mkdtemp(prefix="full_pipeline_") + try: + # 1) obtener SRT: si no se pasa, extraer audio y transcribir + if args.srt: + srt_in = os.path.abspath(args.srt) + print("Usando SRT proporcionado:", srt_in) + else: + audio_tmp = os.path.join(workdir, "extracted_audio.wav") + cmd_extract = [ + "ffmpeg", + "-y", + "-i", + video, + "-vn", + "-acodec", + "pcm_s16le", + "-ar", + "16000", + "-ac", + "1", + audio_tmp, + ] + run(cmd_extract, dry_run=args.dry_run) + + # llamar al script transcribe.py para generar SRT + srt_in = os.path.join(workdir, "transcribed.srt") + cmd_trans = [ + sys.executable, + "whisper_project/transcribe.py", + "--file", + audio_tmp, + "--backend", + "faster-whisper", + "--model", + args.whisper_model, + "--srt", + "--srt-file", + srt_in, + ] + run(cmd_trans, dry_run=args.dry_run) + + # 2) traducir SRT según método elegido + srt_translated = os.path.join(workdir, "translated.srt") + if args.translate_method == "local": + cmd_translate = [ + sys.executable, + "whisper_project/translate_srt_local.py", + "--in", + srt_in, + "--out", + srt_translated, + ] + run(cmd_translate, dry_run=args.dry_run) + elif args.translate_method == "gemini": + gem_key = args.gemini_key or os.environ.get("GEMINI_API_KEY") + if not gem_key: + print( + "--translate-method=gemini requiere --gemini-key o la var de entorno GEMINI_API_KEY", + file=sys.stderr, + ) + sys.exit(4) + cmd_translate = [ + sys.executable, + "whisper_project/translate_srt_with_gemini.py", + "--in", + srt_in, + "--out", + srt_translated, + "--gemini-api-key", + gem_key, + ] + run(cmd_translate, dry_run=args.dry_run) + else: + # none: usar SRT tal cual + srt_translated = srt_in + + # 3) sintetizar por segmento con Kokoro, alinear, concatenar y + # reemplazar o mezclar audio en el vídeo + dub_wav = os.path.join(workdir, "dub_final.wav") + payload = json_payload_template(args.kokoro_model, args.voice) + synth_cmd = [ + sys.executable, + "whisper_project/srt_to_kokoro.py", + "--srt", + srt_translated, + "--endpoint", + args.kokoro_endpoint, + "--payload-template", + payload, + "--api-key", + args.kokoro_key, + "--out", + dub_wav, + "--video", + video, + 
"--align", + ] + if args.keep_chunks: + synth_cmd.append("--keep-chunks") + if args.mix: + synth_cmd += ["--mix-with-original", "--mix-background-volume", str(args.mix_background_volume)] + else: + synth_cmd.append("--replace-original") + + run(synth_cmd, dry_run=args.dry_run) + + # 4) quemar SRT en vídeo resultante + out_video = args.out if args.out else os.path.splitext(video)[0] + ".replaced_audio.subs.mp4" + replaced_src = os.path.splitext(video)[0] + ".replaced_audio.mp4" + # build filter string + vf = f"subtitles={srt_translated}:force_style='FontName=Arial,FontSize=24'" + cmd_burn = [ + "ffmpeg", + "-y", + "-i", + replaced_src, + "-vf", + vf, + "-c:a", + "copy", + out_video, + ] + run(cmd_burn, dry_run=args.dry_run) + + print("Flujo completado. Vídeo final:", out_video) + + finally: + if args.dry_run: + print("(dry-run) leaving workdir:", workdir) + else: + if not args.keep_temp: + try: + shutil.rmtree(workdir) + except Exception: + pass + + +if __name__ == '__main__': + main() \ No newline at end of file diff --git a/whisper_project/translate_srt_argos.py b/whisper_project/translate_srt_argos.py new file mode 100644 index 0000000..2451551 --- /dev/null +++ b/whisper_project/translate_srt_argos.py @@ -0,0 +1,84 @@ +#!/usr/bin/env python3 +"""translate_srt_argos.py +Traduce un .srt localmente usando Argos Translate (más ligero que transformers/torch). +Instala automáticamente el paquete en caso de no existir. + +Uso: + source .venv/bin/activate + python3 whisper_project/translate_srt_argos.py --in in.srt --out out.srt + +Requisitos: argostranslate (el script intentará instalarlo si no está presente) +""" +import argparse +import srt +import tempfile +import os + +try: + from argostranslate import package, translate +except Exception: + raise + + +def ensure_en_es_package(): + installed = package.get_installed_packages() + for p in installed: + if p.from_code == 'en' and p.to_code == 'es': + return True + # Si no está instalado, buscar disponible y descargar + avail = package.get_available_packages() + for p in avail: + if p.from_code == 'en' and p.to_code == 'es': + print('Descargando paquete Argos en->es...') + download_path = tempfile.mktemp(suffix='.zip') + try: + import requests + + with requests.get(p.download_url, stream=True, timeout=60) as r: + r.raise_for_status() + with open(download_path, 'wb') as fh: + for chunk in r.iter_content(chunk_size=8192): + if chunk: + fh.write(chunk) + # instalar desde el zip descargado + package.install_from_path(download_path) + return True + except Exception as e: + print(f"Error descargando/instalando paquete Argos: {e}") + finally: + try: + if os.path.exists(download_path): + os.remove(download_path) + except Exception: + pass + return False + + +def translate_srt(in_path: str, out_path: str): + with open(in_path, 'r', encoding='utf-8') as fh: + subs = list(srt.parse(fh.read())) + + # Asegurar paquete en->es + ok = ensure_en_es_package() + if not ok: + raise SystemExit('No se encontró paquete Argos en->es y no se pudo descargar') + + for i, sub in enumerate(subs, start=1): + text = sub.content.strip() + if not text: + continue + tr = translate.translate(text, 'en', 'es') + sub.content = tr + print(f'Translated {i}/{len(subs)}') + + with open(out_path, 'w', encoding='utf-8') as fh: + fh.write(srt.compose(subs)) + print(f'Wrote translated SRT to: {out_path}') + + +if __name__ == '__main__': + p = argparse.ArgumentParser() + p.add_argument('--in', dest='in_srt', required=True) + p.add_argument('--out', dest='out_srt', required=True) + args = 
+    translate_srt(args.in_srt, args.out_srt)
diff --git a/whisper_project/translate_srt_local.py b/whisper_project/translate_srt_local.py
new file mode 100644
index 0000000..0a2625a
--- /dev/null
+++ b/whisper_project/translate_srt_local.py
@@ -0,0 +1,57 @@
+#!/usr/bin/env python3
+"""translate_srt_local.py
+Translates an .srt locally using MarianMT (Helsinki-NLP/opus-mt-en-es).
+
+Usage:
+    source .venv/bin/activate
+    python3 whisper_project/translate_srt_local.py --in path/to/in.srt --out path/to/out.srt
+
+Requirements: transformers, sentencepiece, torch, srt
+"""
+import argparse
+import srt
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+
+
+def translate_srt(in_path: str, out_path: str, model_name: str = "Helsinki-NLP/opus-mt-en-es", batch_size: int = 8):
+    with open(in_path, "r", encoding="utf-8") as f:
+        subs = list(srt.parse(f.read()))
+
+    # Load model and tokenizer
+    tok = AutoTokenizer.from_pretrained(model_name)
+    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
+
+    texts = [sub.content.strip() for sub in subs]
+    translated = []
+
+    for i in range(0, len(texts), batch_size):
+        batch = texts[i:i+batch_size]
+        # tokenize the batch and generate translations
+        enc = tok(batch, return_tensors="pt", padding=True, truncation=True)
+        outs = model.generate(**enc, max_length=512)
+        outs_decoded = tok.batch_decode(outs, skip_special_tokens=True)
+        translated.extend(outs_decoded)
+
+    # Assign the translated texts back to the cues
+    for sub, t in zip(subs, translated):
+        sub.content = t.strip()
+
+    with open(out_path, "w", encoding="utf-8") as f:
+        f.write(srt.compose(subs))
+
+    print(f"Translated SRT written to: {out_path}")
+
+
+def main():
+    p = argparse.ArgumentParser()
+    p.add_argument("--in", dest="in_srt", required=True)
+    p.add_argument("--out", dest="out_srt", required=True)
+    p.add_argument("--model", default="Helsinki-NLP/opus-mt-en-es")
+    p.add_argument("--batch-size", dest="batch_size", type=int, default=8)
+    args = p.parse_args()
+
+    translate_srt(args.in_srt, args.out_srt, model_name=args.model, batch_size=args.batch_size)
+
+
+if __name__ == '__main__':
+    main()