Fix & update the complete flow to build the translation

This commit is contained in:
Cesar Mendivil 2025-10-24 10:11:20 -07:00
parent 85691f13dc
commit 293007db64
28 changed files with 713 additions and 199 deletions

EXAMPLES.md Normal file

@@ -0,0 +1,48 @@
# EXAMPLES - Whisper + Kokoro TTS pipeline

Usage examples (run from the repo root, using the `.venv` virtualenv):

1) Dry run (prints the commands that would be executed):

```bash
.venv/bin/python whisper_project/run_full_pipeline.py \
  --video dailyrutines.mp4 \
  --kokoro-endpoint "https://kokoro.bfzqqk.easypanel.host/api/v1/audio/speech" \
  --kokoro-key "$KOKORO_TOKEN" --voice em_alex \
  --whisper-model base --dry-run
```

2) Full run (replaces the audio track):

```bash
.venv/bin/python whisper_project/run_full_pipeline.py \
  --video dailyrutines.mp4 \
  --kokoro-endpoint "https://kokoro.bfzqqk.easypanel.host/api/v1/audio/speech" \
  --kokoro-key "$KOKORO_TOKEN" --voice em_alex \
  --whisper-model base
```

3) Use an already generated SRT (skips transcription):

```bash
.venv/bin/python whisper_project/run_full_pipeline.py \
  --video dailyrutines.mp4 --srt subs_en.srt \
  --kokoro-endpoint "https://kokoro..." --kokoro-key "$KOKORO_TOKEN" --voice em_alex
```

4) Translate with Gemini (if you have a key) or use the local fallback:

```bash
# Use Gemini (requires --gemini-key or the GEMINI_API_KEY environment variable)
.venv/bin/python whisper_project/run_full_pipeline.py \
  --video dailyrutines.mp4 --translate-method gemini --gemini-key "$GEMINI_KEY" \
  --kokoro-endpoint "https://kokoro..." --kokoro-key "$KOKORO_TOKEN" --voice em_alex

# Force local translation (MarianMT):
.venv/bin/python whisper_project/run_full_pipeline.py \
  --video dailyrutines.mp4 --translate-method local \
  --kokoro-endpoint "https://kokoro..." --kokoro-key "$KOKORO_TOKEN" --voice em_alex
```

5) Mix instead of replacing:

```bash
.venv/bin/python whisper_project/run_full_pipeline.py \
  --video dailyrutines.mp4 --mix --mix-background-volume 0.3 \
  --kokoro-endpoint "https://kokoro..." --kokoro-key "$KOKORO_TOKEN" --voice em_alex
```

Notes:
- If Gemini fails, the pipeline supports falling back to local translation.
- Use --keep-temp and/or --keep-chunks to inspect the intermediate WAV files.
- Set --whisper-model to "base", "small" or "medium" depending on available resources.
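The Gemini-to-local fallback mentioned in the notes can also be driven programmatically; a minimal sketch (the flag names match `run_full_pipeline.py`; the endpoint, token and retry logic are illustrative, and `--dry-run` keeps it safe to experiment with):

```python
import subprocess
import sys

def pipeline_cmd(translate_method: str) -> list:
    # Hypothetical helper: build the orchestrator invocation for a given
    # translation method ("gemini" or "local").
    return [
        sys.executable, "whisper_project/run_full_pipeline.py",
        "--video", "dailyrutines.mp4",
        "--translate-method", translate_method,
        "--kokoro-endpoint", "https://kokoro.example/api/v1/audio/speech",
        "--kokoro-key", "dummy-token",
        "--voice", "em_alex",
        "--dry-run",
    ]

# Try Gemini first; if that run fails (non-zero exit), retry with MarianMT.
if subprocess.call(pipeline_cmd("gemini")) != 0:
    subprocess.call(pipeline_cmd("local"))
```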

README.md Normal file

@@ -0,0 +1,75 @@
# Whisper dubbing pipeline
Project with utilities to transcribe, translate and dub videos segment by segment using Whisper + TTS (Kokoro). It is meant to run inside a Python virtual environment with `ffmpeg` available on PATH.

## Main contents

- `whisper_project/transcribe.py` - transcribes audio to SRT (faster-whisper backend recommended).
- `whisper_project/translate_srt_local.py` - translates an SRT locally with MarianMT (Helsinki-NLP/opus-mt-en-es).
- `whisper_project/srt_to_kokoro.py` - synthesizes each SRT segment through a compatible TTS endpoint (Kokoro), then aligns, concatenates and optionally mixes/replaces the audio in the video.
- `whisper_project/run_full_pipeline.py` - "all in one" orchestrator that extracts, transcribes (if needed), translates, synthesizes and burns in subtitles.

## Requirements

- Python 3.10+ (using the project's `.venv` is recommended)
- ffmpeg and ffprobe on PATH
- Python packages (install into the venv):
  - requests, srt, transformers, sentencepiece, torch (if you run MarianMT on CPU), etc.

## Recommended usage (examples)

1) Do a dry run to see the commands that will be executed:

```bash
.venv/bin/python whisper_project/run_full_pipeline.py \
  --video dailyrutines.mp4 \
  --kokoro-endpoint "https://kokoro.example/api/v1/audio/speech" \
  --kokoro-key "$KOKORO_TOKEN" \
  --voice em_alex \
  --whisper-model base \
  --dry-run
```

2) Run the real pipeline (local translation, replacing the audio track):

```bash
.venv/bin/python whisper_project/run_full_pipeline.py \
  --video dailyrutines.mp4 \
  --kokoro-endpoint "https://kokoro.example/api/v1/audio/speech" \
  --kokoro-key "$KOKORO_TOKEN" \
  --voice em_alex \
  --whisper-model base
```

## Important orchestrator flags (`run_full_pipeline.py`)

- `--translate-method`: `local` | `gemini` | `none`. Defaults to `local` (MarianMT). Choosing `gemini` requires `--gemini-key`.
- `--gemini-key`: API key for Gemini (when using `--translate-method=gemini`).
- `--mix`: instead of replacing, mixes the synthesized audio with the original track. Adjust the background volume with `--mix-background-volume`.
- `--mix-background-volume`: volume of the original track when mixing (0.0 - 1.0).
- `--keep-chunks`: keeps the per-segment WAV files (useful for debugging).
- `--keep-temp`: does not delete the final temporary directory (keeps `dub_final.wav`, plus the chunks if `--keep-chunks`).
- `--dry-run`: only prints the commands that would be executed.
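Mixing itself is delegated to `srt_to_kokoro.py`. Assuming it builds on ffmpeg's `volume` and `amix` filters (the actual filter graph may differ), a sketch of the equivalent invocation, with `0.3` playing the role of `--mix-background-volume` and all file names illustrative:

```python
# Sketch: duck the original audio, then overlay the synthesized dub.
background_volume = 0.3
filter_complex = (
    f"[0:a]volume={background_volume}[bg];"          # lower the original track
    "[bg][1:a]amix=inputs=2:duration=first[mixed]"   # mix dub over it
)
cmd = [
    "ffmpeg", "-y",
    "-i", "input.mp4",       # original video (stream 0)
    "-i", "dub_final.wav",   # synthesized dub (stream 1)
    "-filter_complex", filter_complex,
    "-map", "0:v", "-map", "[mixed]",
    "-c:v", "copy", "-c:a", "aac",
    "output.mixed.mp4",
]
print(" ".join(cmd))
```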
## Direct use of `srt_to_kokoro.py` (if you already have a translated SRT)

```bash
.venv/bin/python whisper_project/srt_to_kokoro.py \
  --srt translated.srt \
  --endpoint "https://kokoro.example/api/v1/audio/speech" \
  --payload-template '{"model":"model","voice":"em_alex","input":"{text}","response_format":"wav"}' \
  --api-key "$KOKORO_TOKEN" \
  --out out.wav \
  --video input.mp4 --align --replace-original
```

## Notes and troubleshooting

- If the TTS endpoint returns `400 Bad Request`, the cause is usually quoting/formatting of `--payload-template`. `run_full_pipeline.py` already handles the quoting for the common case.
- If `ffmpeg` prints messages about "Too many bits" or "clamping" while encoding AAC, that is a bitrate warning; the MP4 is usually produced correctly.
- If remote synthesis fails with an authentication error, check the key (`--kokoro-key`), or use `--translate-method local` and try an alternative TTS provider in `srt_to_kokoro.py`.
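The `400 Bad Request` quoting problem typically comes from segment text that breaks the JSON when substituted into `{text}`. Assuming the template is filled by plain string substitution (a sketch of the safe approach, not necessarily what `srt_to_kokoro.py` does internally):

```python
import json

template = '{"model":"model","voice":"em_alex","input":"{text}","response_format":"wav"}'

def fill_template(template: str, text: str) -> str:
    # json.dumps escapes quotes, backslashes and newlines; strip the outer
    # quotes it adds so the result drops into the template's own quotes.
    escaped = json.dumps(text)[1:-1]
    return template.replace("{text}", escaped)

# Quotes and newlines in the segment no longer break the payload:
payload = fill_template(template, 'She said "hi"\nthen left.')
print(json.loads(payload)["input"])
```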
## Suggested next improvements

- Validate that `--mix` and `--replace-original` are not used at the same time, and make them an explicitly mutually exclusive option pair.
- Add support for more TTS backends (local TTS, Whisper TTS engines, or local Argos if desired).

## License and security

- This repository contains example scripts. Protect your API keys and never commit them to public repositories.

9 binary files not shown.

@@ -1,56 +0,0 @@
1
00:00:00,000 --> 00:00:10,000
2
00:00:10,000 --> 00:00:20,000
3
00:00:20,000 --> 00:00:30,000
4
00:00:30,000 --> 00:00:40,000
5
00:00:40,000 --> 00:00:50,000
6
00:00:50,000 --> 00:01:00,000
7
00:01:00,000 --> 00:01:10,000
8
00:01:10,000 --> 00:01:20,000
9
00:01:20,000 --> 00:01:30,000
10
00:01:30,000 --> 00:01:40,000
11
00:01:40,000 --> 00:01:50,000
12
00:01:50,000 --> 00:02:00,000
13
00:02:00,000 --> 00:02:10,000
14
00:02:10,000 --> 00:02:20,000
15
00:02:20,000 --> 00:02:30,000
16
00:02:30,000 --> 00:02:40,000
17
00:02:40,000 --> 00:02:50,000
18
00:02:50,000 --> 00:03:00,000
19
00:03:00,000 --> 00:03:09,009

@@ -1,72 +0,0 @@
1
00:00:00,000 --> 00:00:06,960
Rutinas diarias
2
00:00:06,960 --> 00:00:14,480
Hola mamá, estoy disfrutando la vida en Nueva Zelanda.
3
00:00:14,480 --> 00:00:19,240
El campo es tan hermoso.
4
00:00:19,240 --> 00:00:23,199
Mi rutina es diferente ahora.
5
00:00:23,199 --> 00:00:29,960
Me despierto a las 6 en punto cada mañana y salgo a correr.
6
00:00:29,960 --> 00:00:36,640
A las 7 en punto desayuno.
7
00:00:36,640 --> 00:00:42,120
El café en Nueva Zelanda es tan bueno.
8
00:00:42,120 --> 00:00:46,240
A las 8 voy a trabajar.
9
00:00:46,240 --> 00:00:52,679
Normalmente tomo el autobús, pero a veces camino.
10
00:00:52,679 --> 00:00:57,439
Empiezo a trabajar a las 9.
11
00:00:57,439 --> 00:01:02,399
Trabajo en mi oficina hasta la hora del almuerzo.
12
00:01:02,399 --> 00:01:08,920
A las 12 almuerzo con mis colegas en el parque.
13
00:01:08,920 --> 00:01:15,239
Es agradable disfrutar del aire fresco y charlar juntos.
14
00:01:15,239 --> 00:01:23,759
A las 5 salgo del trabajo y voy al gimnasio.
15
00:01:23,760 --> 00:01:32,920
Hago ejercicio hasta las seis y luego voy a casa.
16
00:01:32,920 --> 00:01:39,520
A las 8 ceno, luego me relajo.
17
00:01:39,520 --> 00:01:44,800
I normally go to bed at 11 o'clock.
18
00:01:44,799 --> 00:01:51,799
Hasta pronto, Stephen.

@@ -1,71 +0,0 @@
1
00:00:00,000 --> 00:00:06,960
Dayly routines
2
00:00:06,960 --> 00:00:14,480
Hi mom, I'm enjoying life in New Zealand.
3
00:00:14,480 --> 00:00:19,240
The countryside is so beautiful.
4
00:00:19,240 --> 00:00:23,199
My routine is different now.
5
00:00:23,199 --> 00:00:29,960
I wake at 6 o'clock every morning and go for a run.
6
00:00:29,960 --> 00:00:36,640
At 7 o'clock I have breakfast.
7
00:00:36,640 --> 00:00:42,120
The coffee in New Zealand is so good.
8
00:00:42,120 --> 00:00:46,240
At 8 o'clock I go to work.
9
00:00:46,240 --> 00:00:52,679
I usually take the bus, but sometimes I walk.
10
00:00:52,679 --> 00:00:57,439
I start work at 9 o'clock.
11
00:00:57,439 --> 00:01:02,399
I work in my office until lunchtime.
12
00:01:02,399 --> 00:01:08,920
At 12 o'clock I have lunch with my colleagues in the park.
13
00:01:08,920 --> 00:01:15,239
It's nice to enjoy the fresh air and chat together.
14
00:01:15,239 --> 00:01:23,759
At 5 o'clock I leave work and go to the gym.
15
00:01:23,760 --> 00:01:32,920
I exercise until 6 o'clock and then go home.
16
00:01:32,920 --> 00:01:39,520
At 8 o'clock I eat dinner, then relax.
17
00:01:39,520 --> 00:01:44,800
I normally go to bed at 11 o'clock.
18
00:01:44,799 --> 00:01:51,799
See you soon, Stephen.

2 binary files not shown.

whisper_project/run_full_pipeline.py Normal file

@@ -0,0 +1,449 @@
#!/usr/bin/env python3
# Orchestrates: transcription -> translation -> per-segment synthesis -> replace/mix -> subtitle burn-in
import argparse
import os
import shlex
import shutil
import subprocess
import sys
import tempfile


def run(cmd, dry_run=False, env=None):
    # Run a command. Accepts a str (executed via the shell) or a list (no shell).
    # Prints the command in a safe copy/paste form. If dry_run=True,
    # nothing is executed.
    if isinstance(cmd, (list, tuple)):
        printable = " ".join(shlex.quote(str(x)) for x in cmd)
    else:
        printable = cmd
    print("+", printable)
    if dry_run:
        return 0
    if isinstance(cmd, (list, tuple)):
        return subprocess.run(cmd, shell=False, check=True, env=env)
    return subprocess.run(cmd, shell=True, check=True, env=env)


def json_payload_template(model, voice):
    # JSON payload with {text} as the placeholder accepted by srt_to_kokoro
    return '{"model":"' + model + '","voice":"' + voice + '","input":"{text}","response_format":"wav"}'


def main():
    p = argparse.ArgumentParser()
    p.add_argument("--video", required=True, help="Input video")
    p.add_argument(
        "--srt",
        help="Input SRT (if one already exists). Otherwise the audio is transcribed",
    )
    p.add_argument("--kokoro-endpoint", required=True, help="TTS endpoint URL")
    p.add_argument("--kokoro-key", required=True, help="API key for Kokoro")
    p.add_argument("--voice", default="em_alex", help="Voice name (e.g. em_alex)")
    p.add_argument("--kokoro-model", default="model", help="Kokoro model ID")
    p.add_argument("--whisper-model", default="base", help="Whisper model used for transcription")
    p.add_argument("--out", default=None, help="Final output video (optional)")
    p.add_argument(
        "--translate-method",
        choices=["local", "gemini", "none"],
        default="local",
        help=(
            "SRT translation method: 'local' (MarianMT), 'gemini' (API)"
            " or 'none' (use the provided SRT as-is)"
        ),
    )
    p.add_argument("--gemini-key", default=None, help="API key for Gemini (if applicable)")
    p.add_argument(
        "--mix",
        action="store_true",
        help="Mix the synthesized audio with the original track instead of replacing it",
    )
    p.add_argument(
        "--mix-background-volume",
        type=float,
        default=0.2,
        help="Volume of the original track when mixing (0.0-1.0)",
    )
    p.add_argument(
        "--keep-chunks",
        action="store_true",
        help="Keep the chunk files produced by synthesis (debug)",
    )
    p.add_argument(
        "--keep-temp",
        action="store_true",
        help="Do not delete the temporary working directory when done",
    )
    p.add_argument("--dry-run", action="store_true", help="Only print commands without executing them")
    args = p.parse_args()

    video = os.path.abspath(args.video)
    if not os.path.exists(video):
        print("Video not found:", video, file=sys.stderr)
        sys.exit(2)

    workdir = tempfile.mkdtemp(prefix="full_pipeline_")
    try:
        # 1) obtain the SRT: if none was given, extract the audio and transcribe
        if args.srt:
            srt_in = os.path.abspath(args.srt)
            print("Using provided SRT:", srt_in)
        else:
            audio_tmp = os.path.join(workdir, "extracted_audio.wav")
            cmd_extract = [
                "ffmpeg",
                "-y",
                "-i",
                video,
                "-vn",
                "-acodec",
                "pcm_s16le",
                "-ar",
                "16000",
                "-ac",
                "1",
                audio_tmp,
            ]
            run(cmd_extract, dry_run=args.dry_run)
            # call transcribe.py to generate the SRT
            srt_in = os.path.join(workdir, "transcribed.srt")
            cmd_trans = [
                sys.executable,
                "whisper_project/transcribe.py",
                "--file",
                audio_tmp,
                "--backend",
                "faster-whisper",
                "--model",
                args.whisper_model,
                "--srt",
                "--srt-file",
                srt_in,
            ]
            run(cmd_trans, dry_run=args.dry_run)

        # 2) translate the SRT with the chosen method
        srt_translated = os.path.join(workdir, "translated.srt")
        if args.translate_method == "local":
            cmd_translate = [
                sys.executable,
                "whisper_project/translate_srt_local.py",
                "--in",
                srt_in,
                "--out",
                srt_translated,
            ]
            run(cmd_translate, dry_run=args.dry_run)
        elif args.translate_method == "gemini":
            gem_key = args.gemini_key or os.environ.get("GEMINI_API_KEY")
            if not gem_key:
                print(
                    "--translate-method=gemini requires --gemini-key or the GEMINI_API_KEY environment variable",
                    file=sys.stderr,
                )
                sys.exit(4)
            cmd_translate = [
                sys.executable,
                "whisper_project/translate_srt_with_gemini.py",
                "--in",
                srt_in,
                "--out",
                srt_translated,
                "--gemini-api-key",
                gem_key,
            ]
            run(cmd_translate, dry_run=args.dry_run)
        else:
            # none: use the SRT as-is
            srt_translated = srt_in

        # 3) synthesize each segment with Kokoro, align, concatenate and
        # replace or mix the audio in the video
        dub_wav = os.path.join(workdir, "dub_final.wav")
        payload = json_payload_template(args.kokoro_model, args.voice)
        synth_cmd = [
            sys.executable,
            "whisper_project/srt_to_kokoro.py",
            "--srt",
            srt_translated,
            "--endpoint",
            args.kokoro_endpoint,
            "--payload-template",
            payload,
            "--api-key",
            args.kokoro_key,
            "--out",
            dub_wav,
            "--video",
            video,
            "--align",
        ]
        if args.keep_chunks:
            synth_cmd.append("--keep-chunks")
        if args.mix:
            synth_cmd += ["--mix-with-original", "--mix-background-volume", str(args.mix_background_volume)]
        else:
            synth_cmd.append("--replace-original")
        run(synth_cmd, dry_run=args.dry_run)

        # 4) burn the SRT into the resulting video
        out_video = args.out if args.out else os.path.splitext(video)[0] + ".replaced_audio.subs.mp4"
        replaced_src = os.path.splitext(video)[0] + ".replaced_audio.mp4"
        # build the filter string
        vf = f"subtitles={srt_translated}:force_style='FontName=Arial,FontSize=24'"
        cmd_burn = [
            "ffmpeg",
            "-y",
            "-i",
            replaced_src,
            "-vf",
            vf,
            "-c:a",
            "copy",
            out_video,
        ]
        run(cmd_burn, dry_run=args.dry_run)
        print("Pipeline finished. Final video:", out_video)
    finally:
        if args.dry_run:
            print("(dry-run) leaving workdir:", workdir)
        else:
            if not args.keep_temp:
                try:
                    shutil.rmtree(workdir)
                except Exception:
                    pass


if __name__ == '__main__':
    main()
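The `run()` helper in `run_full_pipeline.py` prints every command in a copy/paste-safe form before executing it. A minimal standalone sketch of that quoting behavior (the `printable` name is illustrative):

```python
import shlex

def printable(cmd):
    # Mirrors run()'s printing: list commands are quoted argument by
    # argument; string commands are shown verbatim (they run via the shell).
    if isinstance(cmd, (list, tuple)):
        return " ".join(shlex.quote(str(x)) for x in cmd)
    return cmd

# Arguments with spaces or quotes come out safely re-runnable:
print(printable(["ffmpeg", "-i", "my video.mp4", "out.wav"]))
```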

whisper_project/translate_srt_argos.py Normal file

@@ -0,0 +1,84 @@
#!/usr/bin/env python3
"""translate_srt_argos.py

Translates an .srt locally using Argos Translate (lighter than transformers/torch).
Downloads the en->es language package automatically if it is not installed.

Usage:
    source .venv/bin/activate
    python3 whisper_project/translate_srt_argos.py --in in.srt --out out.srt

Requirements: argostranslate, srt, requests
"""
import argparse
import os
import tempfile

import srt

try:
    from argostranslate import package, translate
except ImportError:
    raise SystemExit("argostranslate is not installed; run: pip install argostranslate")


def ensure_en_es_package():
    installed = package.get_installed_packages()
    for p in installed:
        if p.from_code == 'en' and p.to_code == 'es':
            return True
    # Not installed: look for an available package and download it
    avail = package.get_available_packages()
    for p in avail:
        if p.from_code == 'en' and p.to_code == 'es':
            print('Downloading Argos en->es package...')
            fd, download_path = tempfile.mkstemp(suffix='.zip')
            os.close(fd)
            try:
                import requests
                with requests.get(p.download_url, stream=True, timeout=60) as r:
                    r.raise_for_status()
                    with open(download_path, 'wb') as fh:
                        for chunk in r.iter_content(chunk_size=8192):
                            if chunk:
                                fh.write(chunk)
                # install from the downloaded zip
                package.install_from_path(download_path)
                return True
            except Exception as e:
                print(f"Error downloading/installing Argos package: {e}")
            finally:
                try:
                    if os.path.exists(download_path):
                        os.remove(download_path)
                except Exception:
                    pass
    return False


def translate_srt(in_path: str, out_path: str):
    with open(in_path, 'r', encoding='utf-8') as fh:
        subs = list(srt.parse(fh.read()))
    # Make sure the en->es package is available
    ok = ensure_en_es_package()
    if not ok:
        raise SystemExit('Argos en->es package not found and could not be downloaded')
    for i, sub in enumerate(subs, start=1):
        text = sub.content.strip()
        if not text:
            continue
        sub.content = translate.translate(text, 'en', 'es')
        print(f'Translated {i}/{len(subs)}')
    with open(out_path, 'w', encoding='utf-8') as fh:
        fh.write(srt.compose(subs))
    print(f'Wrote translated SRT to: {out_path}')


if __name__ == '__main__':
    p = argparse.ArgumentParser()
    p.add_argument('--in', dest='in_srt', required=True)
    p.add_argument('--out', dest='out_srt', required=True)
    args = p.parse_args()
    translate_srt(args.in_srt, args.out_srt)

whisper_project/translate_srt_local.py Normal file

@@ -0,0 +1,57 @@
#!/usr/bin/env python3
"""translate_srt_local.py

Translates an .srt locally using MarianMT (Helsinki-NLP/opus-mt-en-es).

Usage:
    source .venv/bin/activate
    python3 whisper_project/translate_srt_local.py --in path/to/in.srt --out path/to/out.srt

Requirements: transformers, sentencepiece, srt
"""
import argparse

import srt
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer


def translate_srt(in_path: str, out_path: str, model_name: str = "Helsinki-NLP/opus-mt-en-es", batch_size: int = 8):
    with open(in_path, "r", encoding="utf-8") as f:
        subs = list(srt.parse(f.read()))
    # Load model and tokenizer
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    texts = [sub.content.strip() for sub in subs]
    translated = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        # tokenize
        enc = tok(batch, return_tensors="pt", padding=True, truncation=True)
        outs = model.generate(**enc, max_length=512)
        translated.extend(tok.batch_decode(outs, skip_special_tokens=True))
    # Assign the translated lines back
    for sub, t in zip(subs, translated):
        sub.content = t.strip()
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(srt.compose(subs))
    print(f"Translated SRT saved to: {out_path}")


def main():
    p = argparse.ArgumentParser()
    p.add_argument("--in", dest="in_srt", required=True)
    p.add_argument("--out", dest="out_srt", required=True)
    p.add_argument("--model", default="Helsinki-NLP/opus-mt-en-es")
    p.add_argument("--batch-size", dest="batch_size", type=int, default=8)
    args = p.parse_args()
    translate_srt(args.in_srt, args.out_srt, model_name=args.model, batch_size=args.batch_size)


if __name__ == '__main__':
    main()
main()