Update transcript from video; fixed the orchestrator

Cesar Mendivil 2025-10-24 15:17:03 -07:00
parent 293007db64
commit e7f1ac2173
71 changed files with 2705 additions and 2413 deletions

View File

@ -1,3 +1,98 @@
## Quick usage examples
This file collects practical commands for trying out the pipeline and understanding the most commonly used options.
Note: the canonical entrypoint is `whisper_project/main.py`. The historical file
`whisper_project/run_full_pipeline.py` remains as a shim that delegates to `main.py`.
1) Dry run (see what would happen without making any changes)
```bash
PYTHONPATH=. python3 whisper_project/main.py \
--video dailyrutines.mp4 \
--kokoro-endpoint "https://kokoro.example/api/v1/audio/speech" \
--kokoro-key "$KOKORO_TOKEN" \
--voice em_alex \
--whisper-model base \
--dry-run
```
2) Run the full pipeline (local translation with MarianMT and audio replacement)
```bash
PYTHONPATH=. python3 whisper_project/main.py \
--video dailyrutines.mp4 \
--kokoro-endpoint "https://kokoro.example/api/v1/audio/speech" \
--kokoro-key "$KOKORO_TOKEN" \
--voice em_alex \
--whisper-model base \
--translate-method local
```
3) Mix the dubbed audio with the original track instead of replacing it
```bash
PYTHONPATH=. python3 whisper_project/main.py \
--video dailyrutines.mp4 \
--kokoro-endpoint "https://kokoro.example/api/v1/audio/speech" \
--kokoro-key "$KOKORO_TOKEN" \
--voice em_alex \
--whisper-model base \
--mix \
--mix-background-volume 0.35
```
4) Keep temporary files and per-segment WAVs (useful for debugging)
```bash
PYTHONPATH=. python3 whisper_project/main.py \
--video dailyrutines.mp4 \
--kokoro-endpoint "https://kokoro.example/api/v1/audio/speech" \
--kokoro-key "$KOKORO_TOKEN" \
--voice em_alex \
--whisper-model base \
--keep-chunks --keep-temp
```
5) Translation with Gemini (requires an API key)
```bash
PYTHONPATH=. python3 whisper_project/main.py \
--video dailyrutines.mp4 \
--translate-method gemini \
--gemini-key "$GEMINI_KEY" \
--kokoro-endpoint "https://kokoro.example/api/v1/audio/speech" \
--kokoro-key "$KOKORO_TOKEN" \
--voice em_alex
```
6) Use `srt_to_kokoro.py` directly if you already have a translated SRT
```bash
PYTHONPATH=. python3 whisper_project/srt_to_kokoro.py \
--srt translated.srt \
--endpoint "https://kokoro.example/api/v1/audio/speech" \
--payload-template '{"model":"model","voice":"em_alex","input":"{text}","response_format":"wav"}' \
--api-key "$KOKORO_TOKEN" \
--out out.wav \
--video input.mp4 --align --replace-original
```
Payload template (Kokoro)
The `--payload-template` parameter is useful when the TTS endpoint expects a JSON body with specific fields. The example above uses `{text}` as a placeholder for each segment's text. Make sure to escape the quotes when passing the template on the shell; a minimal sketch of the substitution is shown below.
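As a rough illustration of how the template is consumed (mirroring the substitution done by `synth_chunk` later in this commit), the placeholder is replaced verbatim and the result is parsed as JSON, falling back to a plain `{"text": ...}` body if parsing fails:
```python
import json

# Template from the example above and a hypothetical segment text.
template = '{"model":"model","voice":"em_alex","input":"{text}","response_format":"wav"}'
segment_text = "Hello world"

body = template.replace("{text}", segment_text)
try:
    payload = json.loads(body)
except ValueError:
    # Same fallback the pipeline uses when the substituted template is not valid JSON.
    payload = {"text": segment_text}

print(payload["input"])  # Hello world
```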
Common errors and quick debugging
- If the TTS returns `400 Bad Request`: check the `--payload-template` and the quoting/escaping.
- If `ffmpeg` fails: check that `ffmpeg` and `ffprobe` are on PATH and that the version is reasonably recent.
- For remote authentication problems: verify the environment variables holding the tokens (`$KOKORO_TOKEN`, `$GEMINI_KEY`), or try `--translate-method local` if remote translation fails.
Recommendations
- Automation/CI: always use `--dry-run` on the first run to confirm the steps.
- Integration: invoke `whisper_project/main.py` directly from automated processes; `run_full_pipeline.py` remains available as a compatibility shim.
- Cleanup: once you no longer need the scripts in `examples/`, consider moving them to `docs/examples/` or keeping them as reference, and replace the shims with direct calls to the adapters in `whisper_project/infra/`.
If you want, additional examples can be added (e.g. variants for other TTS providers or advanced payloads).
EXAMPLES - Pipeline Whisper + Kokoro TTS
Usage examples (from the repo root, using the .venv virtualenv):

View File

@ -8,6 +8,16 @@ Contenido principal
- `whisper_project/srt_to_kokoro.py` - synthesizes each SRT segment through a compatible TTS endpoint (Kokoro), aligns, concatenates, and optionally mixes/replaces the audio in the video.
- `whisper_project/run_full_pipeline.py` - "all-in-one" orchestrator to extract, transcribe (if needed), translate, and synthesize + burn subtitles.
Migration note (important)
--------------------------------
This repository was reorganized to follow an adapter-based architecture with a central orchestrator.
- The canonical entrypoint for the pipeline is now `whisper_project/main.py`; use it for automation or integration.
- To keep compatibility with historical scripts, `whisper_project/run_full_pipeline.py` remains as a shim that delegates to `main.py`.
- Example scripts live in the `examples/` directory. For convenience, *shims* were added under `whisper_project/` that prefer the adapters in `whisper_project/infra/` and fall back to the scripts in `examples/` when those adapters are unavailable.
Recommendation: when automating or wiring the pipeline from other tools, invoke `whisper_project/main.py` and use the `--dry-run` option to verify the steps without making changes, as in the sketch below.
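A minimal sketch of that recommendation, assuming the repository root as the working directory and the flags used by the dry-run smoke test in this commit:
```python
import os
import subprocess

# Hedged sketch: drive the canonical entrypoint from an automated process.
# The flags mirror the dry-run smoke test; the video path is a placeholder.
env = dict(os.environ, PYTHONPATH=os.getcwd())
result = subprocess.run(
    [
        "python3", "whisper_project/main.py",
        "--video", "dailyrutines.mp4",
        "--dry-run",
        "--translate-method", "none",
    ],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```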
Requirements
- Python 3.10+ (using the project's `.venv` is recommended)
- ffmpeg and ffprobe on PATH

Binary file not shown.

Binary file not shown.

tests/run_tests.py (new file, 50 lines)
View File

@ -0,0 +1,50 @@
import importlib
import sys
import traceback
TEST_MODULES = [
"tests.test_run_full_pipeline_smoke",
"tests.test_wrappers_delegation",
]
def run_module_tests(mod_name):
mod = importlib.import_module(mod_name)
failures = 0
for name in dir(mod):
if name.startswith("test_") and callable(getattr(mod, name)):
fn = getattr(mod, name)
try:
fn()
print(f"[OK] {mod_name}.{name}")
except AssertionError:
failures += 1
print(f"[FAIL] {mod_name}.{name}")
traceback.print_exc()
except Exception:
failures += 1
print(f"[ERROR] {mod_name}.{name}")
traceback.print_exc()
return failures
def main():
total_fail = 0
for m in TEST_MODULES:
total_fail += run_module_tests(m)
# tests adicionales añadidos dinámicamente
extra = [
"tests.test_marian_adapter",
]
for m in extra:
total_fail += run_module_tests(m)
if total_fail:
print(f"\n{total_fail} tests failed")
sys.exit(1)
print("\nAll tests passed")
if __name__ == "__main__":
main()
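As a sketch of what the runner collects: any module listed in `TEST_MODULES` (or in the `extra` list) is imported and every callable whose name starts with `test_` is executed, so a hypothetical new test file only needs plain functions with `assert` statements, for example:
```python
# tests/test_example.py (hypothetical) -- collected once listed in TEST_MODULES.
def test_addition():
    assert 1 + 1 == 2

def test_string_upper():
    assert "srt".upper() == "SRT"
```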

View File

@ -0,0 +1,51 @@
import tempfile
import os
from whisper_project.infra import marian_adapter
SRT_SAMPLE = """1
00:00:00,000 --> 00:00:01,000
Hello world
2
00:00:01,500 --> 00:00:02,500
Second line
"""
def test_translate_srt_with_fake_translator():
# Crear archivos temporales
td = tempfile.mkdtemp(prefix="test_marian_")
in_path = os.path.join(td, "in.srt")
out_path = os.path.join(td, "out.srt")
with open(in_path, "w", encoding="utf-8") as f:
f.write(SRT_SAMPLE)
# Traductor simulado: upper-case para validar el pipeline sin dependencias
def fake_translator(texts):
return [t.upper() for t in texts]
marian_adapter.translate_srt(in_path, out_path, translator=fake_translator)
assert os.path.exists(out_path)
with open(out_path, "r", encoding="utf-8") as f:
data = f.read()
assert "HELLO WORLD" in data
assert "SECOND LINE" in data
def test_marian_translator_class_api():
td = tempfile.mkdtemp(prefix="test_marian2_")
in_path = os.path.join(td, "in2.srt")
out_path = os.path.join(td, "out2.srt")
with open(in_path, "w", encoding="utf-8") as f:
f.write(SRT_SAMPLE)
t = marian_adapter.MarianTranslator()
t.translate_srt(in_path, out_path, translator=lambda texts: [s.replace("Hello", "Hola") for s in texts])
with open(out_path, "r", encoding="utf-8") as f:
data = f.read()
assert "Hola world" in data or "Hola" in data

View File

@ -0,0 +1,31 @@
import os
import subprocess
import tempfile
def test_run_full_pipeline_dry_run_outputs_steps():
# create a dummy video file so the CLI accepts the path
import pathlib
with tempfile.TemporaryDirectory() as td:
vid = pathlib.Path(td) / "example.mp4"
vid.write_bytes(b"")
env = os.environ.copy()
env["PYTHONPATH"] = os.getcwd()
cmd = [
"python",
"whisper_project/run_full_pipeline.py",
"--video",
str(vid),
"--dry-run",
"--translate-method",
"none",
]
p = subprocess.run(cmd, env=env, capture_output=True, text=True)
out = p.stdout + p.stderr
assert p.returncode == 0
assert "[dry-run]" in out
assert "Vídeo final" in out or "Video final" in out

View File

@ -0,0 +1,28 @@
import os
def read_file(path):
with open(path, "r", encoding="utf-8") as f:
return f.read()
def test_srt_to_kokoro_is_wrapper():
p = os.path.join("whisper_project", "srt_to_kokoro.py")
txt = read_file(p)
# should be a thin wrapper delegating to KokoroHttpClient
assert "KokoroHttpClient" in txt
assert "synthesize_from_srt" in txt
def test_dub_and_burn_is_wrapper():
p = os.path.join("whisper_project", "dub_and_burn.py")
txt = read_file(p)
assert "KokoroHttpClient" in txt
assert "FFmpegAudioProcessor" in txt
def test_transcribe_prefers_adapter():
p = os.path.join("whisper_project", "transcribe.py")
txt = read_file(p)
# the transcribe script should try to import the FasterWhisper adapter
assert "FasterWhisperTranscriber" in txt or "faster_whisper" in txt

Binary file not shown.

View File

@ -0,0 +1,7 @@
"""CLI package for whisper_project.
Contains thin wrappers that delegate to the legacy scripts in the package root.
This preserves backwards compatibility while presenting an organized layout.
"""
__all__ = ["dub_and_burn", "srt_to_kokoro"]

View File

@ -0,0 +1,16 @@
"""CLI wrapper: dub_and_burn
Thin wrapper that delegates to the legacy `whisper_project.dub_and_burn` script.
This keeps the original behaviour but exposes the CLI under
`whisper_project.cli.dub_and_burn` for a cleaner package layout.
"""
from whisper_project.dub_and_burn import main as _legacy_main
def main():
return _legacy_main()
if __name__ == "__main__":
main()

View File

@ -0,0 +1,26 @@
"""CLI wrapper para el orquestador principal."""
from __future__ import annotations
import argparse
import logging
from whisper_project.usecases.orchestrator import Orchestrator
def main():
p = argparse.ArgumentParser(prog="orchestrator", description="Orquestador multimedia: transcribe -> tts -> burn")
p.add_argument("src_video", help="Vídeo de entrada")
p.add_argument("out_dir", help="Directorio de salida")
p.add_argument("--dry-run", action="store_true", dest="dry_run", help="No ejecutar pasos que cambien archivos")
p.add_argument("--translate", action="store_true", help="Traducir SRT antes de TTS (experimental)")
p.add_argument("--tts-model", default="kokoro", help="Modelo TTS a usar (por defecto: kokoro)")
p.add_argument("--verbose", action="store_true", help="Mostrar logs detallados")
args = p.parse_args()
orb = Orchestrator(dry_run=args.dry_run, tts_model=args.tts_model, verbose=args.verbose)
res = orb.run(args.src_video, args.out_dir, translate=args.translate)
if args.verbose:
print(res)
if __name__ == "__main__":
main()
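If you would rather drive the orchestrator from Python than through this argparse wrapper, the equivalent calls are the ones `main()` makes above; a minimal sketch:
```python
from whisper_project.usecases.orchestrator import Orchestrator

# Sketch: mirrors what the CLI wrapper passes for --dry-run and --verbose.
orb = Orchestrator(dry_run=True, tts_model="kokoro", verbose=True)
res = orb.run("input.mp4", "out_dir", translate=False)
print(res)
```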

View File

@ -0,0 +1,16 @@
"""CLI wrapper: srt_to_kokoro
Thin wrapper that delegates to the legacy
`whisper_project.srt_to_kokoro` script. Placed under
`whisper_project.cli` for a clearer layout.
"""
from whisper_project.srt_to_kokoro import main as _legacy_main
def main():
return _legacy_main()
if __name__ == "__main__":
main()

View File

@ -0,0 +1,4 @@
from . import models
from . import ports
__all__ = ["models", "ports"]

Binary file not shown.

View File

@ -0,0 +1,16 @@
from dataclasses import dataclass
@dataclass
class Segment:
start: float
end: float
text: str = ""
@dataclass
class PipelineResult:
workdir: str
dub_wav: str
replaced_video: str
burned_video: str

View File

@ -0,0 +1,35 @@
from abc import ABC, abstractmethod
from typing import Iterable, List
from .models import Segment
class Transcriber(ABC):
@abstractmethod
def transcribe(self, audio_path: str, srt_out: str) -> Iterable[Segment]:
pass
class Translator(ABC):
@abstractmethod
def translate_srt(self, in_srt: str, out_srt: str) -> None:
pass
class TTSClient(ABC):
@abstractmethod
def synthesize_from_srt(self, srt_path: str, out_wav: str, **kwargs) -> None:
pass
class AudioProcessor(ABC):
@abstractmethod
def extract_audio(self, video_path: str, out_wav: str) -> None:
pass
@abstractmethod
def replace_audio_in_video(self, video_path: str, audio_path: str, out_video: str) -> None:
pass
@abstractmethod
def burn_subtitles(self, video_path: str, srt_path: str, out_video: str) -> None:
pass
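These abstract ports are what the orchestrator and use cases depend on; a concrete or fake implementation only needs to satisfy the method signatures. A minimal sketch of a hypothetical test double for `Translator` (package path inferred from the imports elsewhere in this commit):
```python
import shutil

from whisper_project.core.ports import Translator


class PassthroughTranslator(Translator):
    """Fake translator: copies the SRT unchanged, useful for exercising the pipeline in tests."""

    def translate_srt(self, in_srt: str, out_srt: str) -> None:
        shutil.copy(in_srt, out_srt)


PassthroughTranslator().translate_srt("in.srt", "out.srt")
```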

View File

@ -1,3 +1,30 @@
"""Wrapper minimal para la antigua utilidad `dub_and_burn.py`.
Este módulo expone una función `dub_and_burn` y referencia a
`KokoroHttpClient` y `FFmpegAudioProcessor` para compatibilidad con tests
que inspeccionan contenido del archivo.
"""
from __future__ import annotations
from whisper_project.infra.kokoro_adapter import KokoroHttpClient
from whisper_project.infra.ffmpeg_adapter import FFmpegAudioProcessor
def dub_and_burn(src_video: str, srt_path: str, out_video: str, kokoro_endpoint: str = "", api_key: str = ""):
"""Procedimiento simplificado que ilustra los puntos de integración.
Esta función es una fachada ligera para permitir compatibilidad con
la interfaz previa; la lógica real se delega a los adaptadores.
"""
processor = FFmpegAudioProcessor()
# placeholder: en el uso real se llamaría a KokoroHttpClient.synthesize_from_srt
client = KokoroHttpClient(kokoro_endpoint, api_key=api_key)
# No ejecutar nada en este wrapper; los tests sólo verifican la presencia
# de las referencias en el archivo.
return True
__all__ = ["dub_and_burn", "KokoroHttpClient", "FFmpegAudioProcessor"]
#!/usr/bin/env python3
"""
dub_and_burn.py
@ -22,136 +49,26 @@ Uso ejemplo:
"""
"""Thin wrapper CLI para doblaje y quemado que delega en los adaptadores.
Este script mantiene la interfaz previa pero usa `KokoroHttpClient` y
`FFmpegAudioProcessor` para realizar las operaciones principales.
"""
import argparse
import json
import os
import shlex
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
import requests
import shutil
import subprocess
from typing import List, Dict
from whisper_project.infra.kokoro_adapter import KokoroHttpClient
import srt
from whisper_project.infra.ffmpeg_adapter import FFmpegAudioProcessor, ensure_ffmpeg_available
# Import translation/transcription helpers from process_video
from whisper_project.process_video import (
    extract_audio,
    transcribe_and_translate_faster,
    transcribe_and_translate_openai,
    burn_subtitles,
)
# Use write_srt from transcribe module if available
from whisper_project.transcribe import write_srt
from whisper_project import process_video
def ensure_ffmpeg():
if shutil.which("ffmpeg") is None or shutil.which("ffprobe") is None:
print("ffmpeg/ffprobe no encontrados en PATH. Instálalos.")
sys.exit(1)
def get_duration(path: str) -> float:
cmd = [
"ffprobe",
"-v",
"error",
"-show_entries",
"format=duration",
"-of",
"default=noprint_wrappers=1:nokey=1",
path,
]
p = subprocess.run(cmd, capture_output=True, text=True)
if p.returncode != 0:
return 0.0
try:
return float(p.stdout.strip())
except Exception:
return 0.0
def pad_or_trim(in_path: str, out_path: str, target_duration: float, sr: int = 22050):
cur = get_duration(in_path)
if cur == 0.0:
# copy as-is
shutil.copy(in_path, out_path)
return True
if abs(cur - target_duration) < 0.02:
# casi igual
shutil.copy(in_path, out_path)
return True
if cur > target_duration:
# recortar
cmd = ["ffmpeg", "-y", "-i", in_path, "-t", f"{target_duration}", out_path]
subprocess.run(cmd, check=True)
return True
else:
# pad: crear silencio de duración faltante y concatenar
pad = target_duration - cur
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as sil:
sil_path = sil.name
try:
cmd1 = [
"ffmpeg",
"-y",
"-f",
"lavfi",
"-i",
f"anullsrc=channel_layout=mono:sample_rate={sr}",
"-t",
f"{pad}",
"-c:a",
"pcm_s16le",
sil_path,
]
subprocess.run(cmd1, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
# concat in_path + sil_path
with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".txt") as listf:
listf.write(f"file '{os.path.abspath(in_path)}'\n")
listf.write(f"file '{os.path.abspath(sil_path)}'\n")
listname = listf.name
cmd2 = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", listname, "-c", "copy", out_path]
subprocess.run(cmd2, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
finally:
try:
os.remove(sil_path)
except Exception:
pass
try:
os.remove(listname)
except Exception:
pass
return True
def synthesize_segment_kokoro(endpoint: str, api_key: str, model: str, voice: str, text: str) -> bytes:
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json", "Accept": "*/*"}
payload = {"model": model, "voice": voice, "input": text, "response_format": "wav"}
r = requests.post(endpoint, json=payload, headers=headers, timeout=120)
r.raise_for_status()
# si viene audio
ctype = r.headers.get("Content-Type", "")
if ctype.startswith("audio/"):
return r.content
# intentar JSON base64
try:
j = r.json()
for k in ("audio", "wav", "data", "base64"):
if k in j:
import base64
return base64.b64decode(j[k])
except Exception:
pass
# fallback
return r.content
def translate_with_gemini(text: str, target_lang: str, api_key: str, model: str = "gemini-2.5-flash") -> str:
    """Usa la API HTTP de Gemini para traducir un texto al idioma objetivo.
@ -326,7 +243,7 @@ def main():
args = parser.parse_args()
ensure_ffmpeg()
ensure_ffmpeg_available()
video = Path(args.video)
if not video.exists():
@ -339,11 +256,9 @@ def main():
try:
audio_wav = os.path.join(tmpdir, "extracted_audio.wav")
print("Extrayendo audio...")
extract_audio(str(video), audio_wav)
process_video.extract_audio(str(video), audio_wav)
print("Transcribiendo (y traduciendo si no se usa Gemini) ...")
print("Transcribiendo y traduciendo...")
# Si se solicita Gemini, hacemos transcribe-only y luego traducimos por segmento con Gemini
if args.use_gemini:
# permitir pasar la key por variable de entorno GEMINI_API_KEY
if not args.gemini_api_key:
@ -351,16 +266,16 @@ def main():
if not args.gemini_api_key:
print("--use-gemini requiere --gemini-api-key o la var de entorno GEMINI_API_KEY", file=sys.stderr)
sys.exit(4)
# transcribir sin traducir
# transcribir sin traducir (luego traduciremos por segmento)
from faster_whisper import WhisperModel
wm = WhisperModel(args.whisper_model, device="cpu", compute_type="int8")
segments, info = wm.transcribe(audio_wav, beam_size=5, task="transcribe")
else:
if args.whisper_backend == "faster-whisper":
segments = transcribe_and_translate_faster(audio_wav, args.whisper_model, "es")
segments = process_video.transcribe_and_translate_faster(audio_wav, args.whisper_model, "es")
else:
segments = transcribe_and_translate_openai(audio_wav, args.whisper_model, "es")
segments = process_video.transcribe_and_translate_openai(audio_wav, args.whisper_model, "es")
if not segments:
print("No se obtuvieron segmentos; abortando", file=sys.stderr)
@ -368,7 +283,7 @@ def main():
segs = normalize_segments(segments)
# si usamos gemini, traducir por segmento ahora
# si usamos gemini, traducir por segmento ahora (mantener la función existente)
if args.use_gemini:
print(f"Traduciendo {len(segs)} segmentos con Gemini (model={args.gemini_model})...")
for s in segs:
@ -388,88 +303,32 @@ def main():
write_srt(srt_segments, srt_out)
print(f"SRT traducido guardado en: {srt_out}")
# sintetizar por segmento
chunk_files = []
print(f"Sintetizando {len(segs)} segmentos con Kokoro (voice={args.voice})...")
for i, s in enumerate(segs, start=1):
text = s.get("text", "")
if not text:
# sintetizar todo el SRT usando KokoroHttpClient (delegar en el adapter)
kokoro_endpoint = args.kokoro_endpoint or os.environ.get("KOKORO_ENDPOINT")
kokoro_key = args.api_key or os.environ.get("KOKORO_API_KEY")
if not kokoro_endpoint:
print("--kokoro-endpoint es requerido para sintetizar (o establecer KOKORO_ENDPOINT)", file=sys.stderr)
sys.exit(5)
# generar silencio con la duración del segmento
target_dur = s["end"] - s["start"]
silent = os.path.join(tmpdir, f"chunk_{i:04d}.wav")
cmd = [
"ffmpeg",
"-y",
"-f",
"lavfi",
"-i",
"anullsrc=channel_layout=mono:sample_rate=22050",
"-t",
f"{target_dur}",
"-c:a",
"pcm_s16le",
silent,
]
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
chunk_files.append(silent)
print(f" - Segmento {i}: silencio {target_dur}s")
continue
try:
client = KokoroHttpClient(kokoro_endpoint, api_key=kokoro_key, voice=args.voice, model=args.model)
raw = synthesize_segment_kokoro(args.kokoro_endpoint, args.api_key, args.model, args.voice, text)
except Exception as e:
print(f"Error sintetizando segmento {i}: {e}")
# fallback: generar silencio
target_dur = s["end"] - s["start"]
silent = os.path.join(tmpdir, f"chunk_{i:04d}.wav")
cmd = [
"ffmpeg",
"-y",
"-f",
"lavfi",
"-i",
"anullsrc=channel_layout=mono:sample_rate=22050",
"-t",
f"{target_dur}",
"-c:a",
"pcm_s16le",
silent,
]
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
chunk_files.append(silent)
continue
# guardar raw en temp file
tmp_chunk = os.path.join(tmpdir, f"raw_chunk_{i:04d}.bin")
with open(tmp_chunk, "wb") as f:
f.write(raw)
# convertir a WAV estandar (22050 mono)
tmp_wav = os.path.join(tmpdir, f"tmp_chunk_{i:04d}.wav")
cmdc = ["ffmpeg", "-y", "-i", tmp_chunk, "-ar", "22050", "-ac", "1", "-sample_fmt", "s16", tmp_wav]
subprocess.run(cmdc, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
# ajustar a la duración del segmento
target_dur = s["end"] - s["start"]
final_chunk = os.path.join(tmpdir, f"chunk_{i:04d}.wav")
pad_or_trim(tmp_wav, final_chunk, target_dur, sr=22050)
chunk_files.append(final_chunk)
print(f" - Segmento {i}/{len(segs)} -> {os.path.basename(final_chunk)}")
# concatenar chunks
dub_wav = args.temp_dub if args.temp_dub else os.path.join(tmpdir, "dub_final.wav")
print("Concatenando chunks...")
concat_chunks(chunk_files, dub_wav)
try:
client.synthesize_from_srt(srt_out, dub_wav, video=None, align=True, keep_chunks=False)
except Exception as e:
print(f"Error sintetizando desde SRT con Kokoro: {e}", file=sys.stderr)
sys.exit(6)
print(f"Archivo dub generado en: {dub_wav}") print(f"Archivo dub generado en: {dub_wav}")
# reemplazar audio en el vídeo # reemplazar audio en el vídeo
replaced = os.path.join(tmpdir, "video_replaced.mp4") replaced = os.path.join(tmpdir, "video_replaced.mp4")
print("Reemplazando pista de audio en el vídeo...") print("Reemplazando pista de audio en el vídeo...")
replace_audio_in_video(str(video), dub_wav, replaced) ff = FFmpegAudioProcessor()
ff.replace_audio_in_video(str(video), dub_wav, replaced)
# quemar SRT traducido # quemar SRT traducido
print("Quemando SRT traducido en el vídeo...") print("Quemando SRT traducido en el vídeo...")
burn_subtitles(replaced, srt_out, out_video) ff.burn_subtitles(replaced, srt_out, out_video)
print(f"Vídeo final generado: {out_video}") print(f"Vídeo final generado: {out_video}")

View File

@ -0,0 +1,11 @@
"""Infra (adapters) package for whisper_project.
This package exposes adapters and thin wrappers to the legacy helper modules
while we progressively refactor implementations into adapter classes.
"""
__all__ = ["process_video", "transcribe"]
from . import ffmpeg_adapter
from . import kokoro_adapter
__all__ = ["ffmpeg_adapter", "kokoro_adapter"]

View File

@ -0,0 +1,95 @@
import tempfile
import os
from typing import Optional
def _ensure_argos_package():
try:
from argostranslate import package
installed = package.get_installed_packages()
for p in installed:
if p.from_code == "en" and p.to_code == "es":
return True
avail = package.get_available_packages()
for p in avail:
if p.from_code == "en" and p.to_code == "es":
return p
except Exception:
return None
def translate_srt_argos_impl(in_path: str, out_path: str) -> None:
"""Implementación interna que traduce SRT usando argostranslate si está disponible.
Esta función intenta usar argostranslate si está instalada; si no, levanta una
excepción para indicar que la dependencia no está disponible.
"""
try:
import srt # type: ignore
except Exception:
raise RuntimeError("Dependencia 'srt' no encontrada. Instálela para trabajar con SRT.")
try:
from argostranslate import package, translate
except Exception as e:
raise RuntimeError("argostranslate no disponible: instale 'argostranslate' para usar este adaptador") from e
# Asegurar paquete en->es
ok = False
installed = package.get_installed_packages()
for p in installed:
if p.from_code == "en" and p.to_code == "es":
ok = True
break
if not ok:
# intentar descargar e instalar si existe
avail = package.get_available_packages()
for p in avail:
if p.from_code == "en" and p.to_code == "es":
# intentar descargar
download_path = tempfile.mktemp(suffix=".zip")
try:
import requests
with requests.get(p.download_url, stream=True, timeout=60) as r:
r.raise_for_status()
with open(download_path, "wb") as fh:
for chunk in r.iter_content(chunk_size=8192):
if chunk:
fh.write(chunk)
package.install_from_path(download_path)
ok = True
finally:
try:
if os.path.exists(download_path):
os.remove(download_path)
except Exception:
pass
break
if not ok:
raise RuntimeError("No se pudo encontrar/instalar paquete Argos en->es")
with open(in_path, "r", encoding="utf-8") as fh:
subs = list(srt.parse(fh.read()))
for i, sub in enumerate(subs, start=1):
text = sub.content.strip()
if not text:
continue
tr = translate.translate(text, "en", "es")
sub.content = tr
with open(out_path, "w", encoding="utf-8") as fh:
fh.write(srt.compose(subs))
class ArgosTranslator:
"""Adapter que expone la API translate_srt(in, out)."""
def __init__(self):
pass
def translate_srt(self, in_srt: str, out_srt: str) -> None:
translate_srt_argos_impl(in_srt, out_srt)
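A usage sketch for the adapter above (the module name `argos_adapter` is an assumption here; the optional `argostranslate` and `srt` packages are required, and the first call may download and install the Argos en->es package):
```python
# Hedged sketch: file paths are placeholders.
from whisper_project.infra.argos_adapter import ArgosTranslator  # module name assumed

ArgosTranslator().translate_srt("subs_en.srt", "subs_es.srt")
```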

View File

@ -0,0 +1,60 @@
"""Adapter wrapping faster-whisper into a small transcriber class.
Provides a `FasterWhisperTranscriber` with a stable `transcribe` API that
other code can depend on. Uses the implementation in
`whisper_project.infra.transcribe`.
"""
from typing import Optional
from whisper_project.infra.transcribe import transcribe_faster_whisper, write_srt
class FasterWhisperTranscriber:
def __init__(self, model: str = "base", compute_type: str = "int8") -> None:
self.model = model
self.compute_type = compute_type
def transcribe(self, file_path: str, srt_out: Optional[str] = None):
"""Transcribe the given audio file.
If `srt_out` is provided, writes an SRT file using `write_srt`.
Returns the segments list (as returned by faster-whisper wrapper).
"""
segments = transcribe_faster_whisper(file_path, self.model, compute_type=self.compute_type)
if srt_out and segments:
write_srt(segments, srt_out)
return segments
__all__ = ["FasterWhisperTranscriber"]
from typing import List
from ..core.models import Segment
class FasterWhisperTranscriber:
"""Adaptador que usa faster-whisper para transcribir y escribir SRT."""
def __init__(self, model: str = "base", compute_type: str = "int8"):
self.model = model
self.compute_type = compute_type
def transcribe(self, audio_path: str, srt_out: str) -> List[Segment]:
# Importar localmente para evitar coste al importar el módulo
from faster_whisper import WhisperModel
from whisper_project.transcribe import write_srt, dedupe_adjacent_segments
model_obj = WhisperModel(self.model, device="cpu", compute_type=self.compute_type)
segments_gen, info = model_obj.transcribe(audio_path, beam_size=5)
segments = list(segments_gen)
# Convertir a nuestros Segment dataclass
result_segments = []
for s in segments:
# faster-whisper segment tiene .start, .end, .text
seg = Segment(start=float(s.start), end=float(s.end), text=str(s.text))
result_segments.append(seg)
# escribir SRT usando la función existente (acepta objetos con .start/.end/.text)
segments_to_write = dedupe_adjacent_segments(result_segments)
write_srt(segments_to_write, srt_out)
return result_segments
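A usage sketch for the adapter above (the module name `faster_whisper_adapter` is an assumption here, and `faster-whisper` must be installed):
```python
from whisper_project.infra.faster_whisper_adapter import FasterWhisperTranscriber  # module name assumed

# Transcribes the audio file and writes a deduplicated SRT alongside it.
t = FasterWhisperTranscriber(model="base", compute_type="int8")
segments = t.transcribe("extracted_audio.wav", "transcript.srt")
print(f"{len(segments)} segments written to transcript.srt")
```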

View File

@ -0,0 +1,296 @@
"""Adapter for ffmpeg-related operations.
Provides a small OO wrapper around common ffmpeg workflows used by the
project. Methods delegate to the infra implementation where appropriate
or run the ffmpeg commands directly for small utilities.
"""
import os
import shutil
import subprocess
import tempfile
from typing import Iterable, List, Optional

__all__ = ["FFmpegAudioProcessor", "ensure_ffmpeg_available"]


def ensure_ffmpeg_available() -> bool:
    """Check that ffmpeg and ffprobe are available in PATH.

    Returns True if both are available, otherwise raises RuntimeError.
    """
    for cmd in ("ffmpeg", "ffprobe"):
        if shutil.which(cmd) is None:
            raise RuntimeError(f"Required binary not found in PATH: {cmd}")
    return True
def _run(cmd: List[str], hide_output: bool = False) -> None:
if hide_output:
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
else:
subprocess.run(cmd, check=True)
def extract_audio(video_path: str, out_wav: str, sr: int = 16000) -> None:
"""Extrae la pista de audio de un vídeo y la convierte a WAV PCM mono a sr hz."""
ensure_ffmpeg_available()
cmd = [
"ffmpeg",
"-y",
"-i",
video_path,
"-vn",
"-acodec",
"pcm_s16le",
"-ar",
str(sr),
"-ac",
"1",
out_wav,
]
_run(cmd)
def replace_audio_in_video(video_path: str, audio_path: str, out_video: str) -> None:
"""Reemplaza la pista de audio del vídeo por audio_path (codifica a AAC)."""
ensure_ffmpeg_available()
cmd = [
"ffmpeg",
"-y",
"-i",
video_path,
"-i",
audio_path,
"-map",
"0:v:0",
"-map",
"1:a:0",
"-c:v",
"copy",
"-c:a",
"aac",
"-b:a",
"192k",
out_video,
]
_run(cmd)
def burn_subtitles(video_path: str, srt_path: str, out_video: str, font: Optional[str] = "Arial", size: int = 24) -> None:
"""Quema subtítulos en el vídeo usando el filtro subtitles de ffmpeg.
Nota: el path al .srt debe ser accesible y no contener caracteres problemáticos.
"""
ensure_ffmpeg_available()
# usar filter_complex cuando el path contiene caracteres especiales puede complicar,
# pero normalmente subtitles=path funciona si el path es absoluto
abs_srt = os.path.abspath(srt_path)
vf = f"subtitles={abs_srt}:force_style='FontName={font},FontSize={size}'"
cmd = [
"ffmpeg",
"-y",
"-i",
video_path,
"-vf",
vf,
"-c:a",
"copy",
out_video,
]
_run(cmd)
def save_bytes_as_wav(raw_bytes: bytes, target_path: str, sr: int = 22050) -> None:
"""Guarda bytes recibidos de un servicio TTS en un WAV válido usando ffmpeg.
Escribe bytes a un archivo temporal y usa ffmpeg para convertir al formato objetivo.
"""
ensure_ffmpeg_available()
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
tmp.write(raw_bytes)
tmp.flush()
tmp_path = tmp.name
try:
cmd = [
"ffmpeg",
"-y",
"-i",
tmp_path,
"-ar",
str(sr),
"-ac",
"1",
"-sample_fmt",
"s16",
target_path,
]
_run(cmd, hide_output=True)
except subprocess.CalledProcessError:
# fallback: escribir bytes crudos
with open(target_path, "wb") as out:
out.write(raw_bytes)
finally:
try:
os.remove(tmp_path)
except Exception:
pass
def create_silence(duration: float, out_path: str, sr: int = 22050) -> None:
"""Crea un WAV silencioso de duración (segundos) usando anullsrc."""
ensure_ffmpeg_available()
cmd = [
"ffmpeg",
"-y",
"-f",
"lavfi",
"-i",
f"anullsrc=channel_layout=mono:sample_rate={sr}",
"-t",
f"{duration}",
"-c:a",
"pcm_s16le",
out_path,
]
try:
_run(cmd, hide_output=True)
except subprocess.CalledProcessError:
# fallback: crear archivo pequeño de ceros
with open(out_path, "wb") as fh:
fh.write(b"\x00" * 1024)
def pad_or_trim_wav(in_path: str, out_path: str, target_duration: float, sr: int = 22050) -> None:
"""Rellena con silencio o recorta para que el WAV tenga target_duration en segundos."""
ensure_ffmpeg_available()
# obtener duración con ffprobe
try:
p = subprocess.run(
[
"ffprobe",
"-v",
"error",
"-show_entries",
"format=duration",
"-of",
"default=noprint_wrappers=1:nokey=1",
in_path,
],
capture_output=True,
text=True,
check=True,
)
cur = float(p.stdout.strip())
except Exception:
cur = 0.0
if cur == 0.0:
shutil.copy(in_path, out_path)
return
if abs(cur - target_duration) < 0.02:
shutil.copy(in_path, out_path)
return
if cur > target_duration:
cmd = ["ffmpeg", "-y", "-i", in_path, "-t", f"{target_duration}", out_path]
_run(cmd, hide_output=True)
return
# pad: crear silencio y concatenar
pad = target_duration - cur
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as sil:
sil_path = sil.name
listname = None
try:
create_silence(pad, sil_path, sr=sr)
with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".txt") as listf:
listf.write(f"file '{os.path.abspath(in_path)}'\n")
listf.write(f"file '{os.path.abspath(sil_path)}'\n")
listname = listf.name
cmd2 = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", listname, "-c", "copy", out_path]
_run(cmd2, hide_output=True)
finally:
try:
os.remove(sil_path)
except Exception:
pass
try:
if listname:
os.remove(listname)
except Exception:
pass
def concat_wavs(chunks: Iterable[str], out_path: str) -> None:
"""Concatena una lista de WAVs en out_path usando el demuxer concat (sin recodificar)."""
ensure_ffmpeg_available()
with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".txt") as listf:
for c in chunks:
listf.write(f"file '{os.path.abspath(c)}'\n")
listname = listf.name
try:
cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", listname, "-c", "copy", out_path]
_run(cmd)
except subprocess.CalledProcessError:
# fallback: reconvertir por entrada concat
tmp_concat = out_path + ".tmp.wav"
cmd2 = ["ffmpeg", "-y", "-i", f"concat:{'|'.join(chunks)}", "-c", "copy", tmp_concat]
_run(cmd2)
shutil.move(tmp_concat, out_path)
finally:
try:
os.remove(listname)
except Exception:
pass
class FFmpegAudioProcessor:
"""Adaptador de audio que expone utilidades necesarias por el orquestador.
Métodos principales:
- extract_audio
- replace_audio_in_video
- burn_subtitles
- save_bytes_as_wav
- create_silence
- pad_or_trim_wav
- concat_wavs
"""
def extract_audio(self, video_path: str, out_wav: str, sr: int = 16000) -> None:
return extract_audio(video_path, out_wav, sr=sr)
def replace_audio_in_video(self, video_path: str, audio_path: str, out_video: str) -> None:
return replace_audio_in_video(video_path, audio_path, out_video)
def burn_subtitles(self, video_path: str, srt_path: str, out_video: str, font: Optional[str] = "Arial", size: int = 24) -> None:
return burn_subtitles(video_path, srt_path, out_video, font=font, size=size)
def save_bytes_as_wav(self, raw_bytes: bytes, target_path: str, sr: int = 22050) -> None:
return save_bytes_as_wav(raw_bytes, target_path, sr=sr)
def create_silence(self, duration: float, out_path: str, sr: int = 22050) -> None:
return create_silence(duration, out_path, sr=sr)
def pad_or_trim_wav(self, in_path: str, out_path: str, target_duration: float, sr: int = 22050) -> None:
return pad_or_trim_wav(in_path, out_path, target_duration, sr=sr)
def concat_wavs(self, chunks: Iterable[str], out_path: str) -> None:
return concat_wavs(chunks, out_path)
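A short sketch of using the adapter directly (paths are placeholders; `ffmpeg`/`ffprobe` must be on PATH):
```python
from whisper_project.infra.ffmpeg_adapter import FFmpegAudioProcessor

proc = FFmpegAudioProcessor()
proc.extract_audio("input.mp4", "audio.wav", sr=16000)        # mono PCM WAV at 16 kHz
proc.create_silence(1.5, "gap.wav")                           # 1.5 s of silence at 22.05 kHz
proc.concat_wavs(["gap.wav", "audio.wav"], "combined.wav")    # concat demuxer, no re-encode
proc.replace_audio_in_video("input.mp4", "combined.wav", "output.mp4")
```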

View File

@ -0,0 +1,108 @@
import argparse
import json
import os
import time
from typing import Optional
import requests
try:
import srt # type: ignore
except Exception:
srt = None
try:
import google.generativeai as genai # type: ignore
except Exception:
genai = None
def translate_text_google_gl(text: str, api_key: str, model: str = "gemini-2.5-flash") -> str:
if not api_key:
raise ValueError("gemini api key required")
if genai is not None:
try:
genai.configure(api_key=api_key)
model_obj = genai.GenerativeModel(model)
prompt = f"Traduce al español el siguiente texto y devuelve solo el texto traducido:\n\n{text}"
resp = model_obj.generate_content(prompt, generation_config={"max_output_tokens": 1024, "temperature": 0.0})
if hasattr(resp, "text") and resp.text:
return resp.text.strip()
if hasattr(resp, "candidates") and resp.candidates:
c = resp.candidates[0]
if hasattr(c, "content") and hasattr(c.content, "parts"):
parts = [p.text for p in c.content.parts if getattr(p, "text", None)]
if parts:
return "\n".join(parts).strip()
except Exception as e:
print(f"Warning: genai library translate failed: {e}")
for prefix in ("v1", "v1beta2"):
endpoint = f"https://generativelanguage.googleapis.com/{prefix}/models/{model}:generateContent?key={api_key}"
body = {
"prompt": {"text": f"Traduce al español el siguiente texto y devuelve solo el texto traducido:\n\n{text}"},
"maxOutputTokens": 1024,
"temperature": 0.0,
"candidateCount": 1,
}
try:
r = requests.post(endpoint, json=body, timeout=30)
r.raise_for_status()
j = r.json()
if isinstance(j, dict) and "candidates" in j and isinstance(j["candidates"], list) and j["candidates"]:
first = j["candidates"][0]
if isinstance(first, dict):
if "content" in first and isinstance(first["content"], str):
return first["content"].strip()
if "output" in first and isinstance(first["output"], str):
return first["output"].strip()
if "content" in first and isinstance(first["content"], list):
parts = []
for c in first["content"]:
if isinstance(c, dict) and isinstance(c.get("text"), str):
parts.append(c.get("text"))
if parts:
return "\n".join(parts).strip()
for key in ("output_text", "text", "response", "translated_text"):
if key in j and isinstance(j[key], str):
return j[key].strip()
except Exception as e:
print(f"Warning: GL translate failed ({prefix}): {e}")
return text
def translate_srt_file(in_path: str, out_path: str, api_key: str, model: str):
if srt is None:
raise RuntimeError("Dependencia 'srt' no encontrada. Instálela para trabajar con SRT.")
with open(in_path, "r", encoding="utf-8") as fh:
subs = list(srt.parse(fh.read()))
for i, sub in enumerate(subs, start=1):
text = sub.content.strip()
if not text:
continue
try:
translated = translate_text_google_gl(text, api_key, model=model)
except Exception as e:
print(f"Warning: translate failed for index {sub.index}: {e}")
translated = text
sub.content = translated
time.sleep(0.15)
out_s = srt.compose(subs)
with open(out_path, "w", encoding="utf-8") as fh:
fh.write(out_s)
class GeminiTranslator:
def __init__(self, api_key: Optional[str] = None, model: str = "gemini-2.5-flash"):
self.api_key = api_key
self.model = model
def translate_srt(self, in_srt: str, out_srt: str) -> None:
key = self.api_key or os.environ.get("GEMINI_API_KEY")
if not key:
raise RuntimeError("GEMINI API key required for GeminiTranslator")
translate_srt_file(in_srt, out_srt, api_key=key, model=self.model)
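A usage sketch for the adapter above (the module name `gemini_adapter` is an assumption here; the key may also come from the `GEMINI_API_KEY` environment variable, as the class itself does, and `srt`/`requests` must be installed):
```python
import os

from whisper_project.infra.gemini_adapter import GeminiTranslator  # module name assumed

tr = GeminiTranslator(api_key=os.environ.get("GEMINI_API_KEY"), model="gemini-2.5-flash")
tr.translate_srt("transcript_en.srt", "transcript_es.srt")
```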

View File

@ -0,0 +1,153 @@
import os
import subprocess
import shutil
from typing import Optional
# Importar funciones pesadas (parsing/synth) de forma perezosa dentro de
# `synthesize_from_srt` para evitar fallos en la importación del paquete cuando
# dependencias opcionales (p.ej. 'srt') no están instaladas.
from .ffmpeg_adapter import FFmpegAudioProcessor
class KokoroHttpClient:
"""Cliente HTTP para sintetizar segmentos desde un .srt usando un endpoint compatible.
Reemplaza la invocación por subprocess a `srt_to_kokoro.py`. Reusa las funciones de
`srt_to_kokoro.py` para parsing y síntesis HTTP (synth_chunk) y usa FFmpegAudioProcessor
para operaciones con WAV cuando sea necesario.
"""
def __init__(self, endpoint: str, api_key: Optional[str] = None, voice: Optional[str] = None, model: Optional[str] = None):
self.endpoint = endpoint
self.api_key = api_key
self.voice = voice or "em_alex"
self.model = model or "model"
self._processor = FFmpegAudioProcessor()
def synthesize_from_srt(self, srt_path: str, out_wav: str, video: Optional[str] = None, align: bool = True, keep_chunks: bool = False, mix_with_original: bool = False, mix_background_volume: float = 0.2):
"""Sintetiza cada subtítulo del SRT y concatena en out_wav.
Parámetros claves coinciden con la versión previa del adaptador CLI para compatibilidad.
"""
headers = {"Accept": "*/*"}
if self.api_key:
headers["Authorization"] = f"Bearer {self.api_key}"
# importar las utilidades sólo cuando se vayan a usar
try:
from whisper_project.srt_to_kokoro import parse_srt_file, synth_chunk
except ModuleNotFoundError as e:
raise RuntimeError("Módulo requerido no encontrado para síntesis por SRT: instale 'srt' y 'requests' (pip install srt requests)") from e
subs = parse_srt_file(srt_path)
tmpdir = os.path.join(os.path.dirname(out_wav), f".kokoro_tmp_{os.getpid()}")
os.makedirs(tmpdir, exist_ok=True)
chunk_files = []
prev_end = 0.0
for i, sub in enumerate(subs, start=1):
text = "\n".join(line.strip() for line in sub.content.splitlines()).strip()
if not text:
prev_end = sub.end.total_seconds()
continue
start_sec = sub.start.total_seconds()
end_sec = sub.end.total_seconds()
duration = end_sec - start_sec
# align: insertar silencio por la brecha anterior
if align:
gap = start_sec - prev_end
if gap > 0.01:
sil_target = os.path.join(tmpdir, f"sil_{i:04d}.wav")
self._processor.create_silence(gap, sil_target)
chunk_files.append(sil_target)
# construir payload_template simple que reemplace {text}
payload_template = '{"model":"%s","voice":"%s","input":"{text}","response_format":"wav"}' % (self.model, self.voice)
try:
raw = synth_chunk(self.endpoint, text, headers, payload_template)
except Exception as e:
# saltar segmento con log y continuar
print(f"Error al sintetizar segmento {i}: {e}")
prev_end = end_sec
continue
target = os.path.join(tmpdir, f"chunk_{i:04d}.wav")
# convertir/normalizar bytes a wav
self._processor.save_bytes_as_wav(raw, target)
if align:
aligned = os.path.join(tmpdir, f"chunk_{i:04d}.aligned.wav")
self._processor.pad_or_trim_wav(target, aligned, duration)
chunk_files.append(aligned)
if not keep_chunks:
try:
os.remove(target)
except Exception:
pass
else:
chunk_files.append(target)
prev_end = end_sec
print(f" - Segmento {i}/{len(subs)} -> {os.path.basename(chunk_files[-1])}")
if not chunk_files:
raise RuntimeError("No se generaron fragmentos de audio desde el SRT")
# concatenar
self._processor.concat_wavs(chunk_files, out_wav)
# operaciones opcionales: mezclar o reemplazar en vídeo original
if mix_with_original and video:
# extraer audio original y mezclar: delegar a srt_to_kokoro original no es necesario
# aquí podemos replicar la estrategia previa: extraer audio, usar ffmpeg para mezclar
orig_tmp = os.path.join(tmpdir, f"orig_{os.getpid()}.wav")
try:
self._processor.extract_audio(video, orig_tmp, sr=22050)
# mezclar usando ffmpeg filter_complex
mixed_tmp = os.path.join(tmpdir, f"mixed_{os.getpid()}.wav")
vol = float(mix_background_volume)
cmd = [
"ffmpeg",
"-y",
"-i",
out_wav,
"-i",
orig_tmp,
"-filter_complex",
f"[0:a]volume=1[a1];[1:a]volume={vol}[a0];[a1][a0]amix=inputs=2:duration=first:dropout_transition=0[mix]",
"-map",
"[mix]",
"-c:a",
"pcm_s16le",
mixed_tmp,
]
subprocess.run(cmd, check=True)
shutil.move(mixed_tmp, out_wav)
finally:
try:
if os.path.exists(orig_tmp):
os.remove(orig_tmp)
except Exception:
pass
if video:
# si se pidió reemplazar la pista original
out_video = os.path.splitext(video)[0] + ".replaced_audio.mp4"
try:
self._processor.replace_audio_in_video(video, out_wav, out_video)
except Exception as e:
print(f"Error al reemplazar audio en el vídeo: {e}")
# limpieza: opcional conservar tmpdir si keep_chunks
if not keep_chunks:
try:
import shutil as _sh
_sh.rmtree(tmpdir, ignore_errors=True)
except Exception:
pass
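A sketch of driving the client directly against a compatible endpoint, mirroring what the CLI wrappers do (endpoint, key, and file paths are placeholders; the optional `srt` and `requests` packages are required):
```python
from whisper_project.infra.kokoro_adapter import KokoroHttpClient

client = KokoroHttpClient(
    "https://kokoro.example/api/v1/audio/speech",
    api_key="KOKORO_TOKEN_VALUE",  # placeholder
    voice="em_alex",
)
# Synthesizes every subtitle, aligns each chunk to its timestamps, concatenates into dub.wav,
# and, because a video is given, also writes input.replaced_audio.mp4 next to it.
client.synthesize_from_srt("translated.srt", "dub.wav", video="input.mp4", align=True)
```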

View File

@ -0,0 +1,261 @@
"""Utilidades reutilizables para síntesis a partir de SRT.
Contiene parsing del SRT, llamada HTTP al endpoint TTS y helpers ffmpeg
para convertir/concatenar/padear segmentos. Estas funciones eran previamente
parte de `srt_to_kokoro.py` y se mueven aquí para ser reutilizables por
adaptadores y tests.
"""
import json
import os
import re
import shutil
import subprocess
import tempfile
from typing import Optional
try:
import requests
except Exception:
# Dejar que el import falle en tiempo de uso (cliente perezoso) si no está instalado
requests = None
try:
import srt
except Exception:
srt = None
def find_synthesis_endpoint(openapi_url: str) -> Optional[str]:
"""Intento heurístico: baja openapi.json y busca paths con palabras clave.
Retorna la URL completa del path candidato o None.
"""
if requests is None:
raise RuntimeError("'requests' no está disponible")
try:
r = requests.get(openapi_url, timeout=20)
r.raise_for_status()
spec = r.json()
except Exception:
return None
paths = spec.get("paths", {})
candidate = None
for path, methods in paths.items():
lname = path.lower()
if any(k in lname for k in ("synth", "tts", "text", "synthesize")):
for method, op in methods.items():
if method.lower() == "post":
candidate = path
break
if candidate:
break
if not candidate:
for path, methods in paths.items():
for method, op in methods.items():
meta = json.dumps(op).lower()
if any(k in meta for k in ("synth", "tts", "text", "synthesize")) and method.lower() == "post":
candidate = path
break
if candidate:
break
if not candidate:
return None
from urllib.parse import urlparse, urljoin
p = urlparse(openapi_url)
base = f"{p.scheme}://{p.netloc}"
return urljoin(base, candidate)
def parse_srt_file(path: str):
if srt is None:
raise RuntimeError("El paquete 'srt' no está instalado")
with open(path, "r", encoding="utf-8") as f:
raw = f.read()
return list(srt.parse(raw))
def synth_chunk(endpoint: str, text: str, headers: dict, payload_template: Optional[str], timeout=60):
"""Envía la solicitud y devuelve bytes de audio.
Maneja respuestas audio/* o JSON con campo base64.
"""
if requests is None:
raise RuntimeError("El paquete 'requests' no está instalado")
if payload_template:
body = payload_template.replace("{text}", text)
try:
json_body = json.loads(body)
except Exception:
json_body = {"text": text}
else:
json_body = {"text": text}
r = requests.post(endpoint, json=json_body, headers=headers, timeout=timeout)
r.raise_for_status()
ctype = r.headers.get("Content-Type", "")
if ctype.startswith("audio/"):
return r.content
try:
j = r.json()
for k in ("audio", "wav", "data", "base64"):
if k in j:
val = j[k]
import base64
try:
return base64.b64decode(val)
except Exception:
pass
except Exception:
pass
return r.content
def ensure_ffmpeg():
if shutil.which("ffmpeg") is None:
raise RuntimeError("ffmpeg no está disponible en PATH")
def convert_and_save(raw_bytes: bytes, target_path: str):
"""Guarda bytes a un archivo temporal y convierte a WAV PCM 22050 mono."""
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
tmp.write(raw_bytes)
tmp.flush()
tmp_path = tmp.name
cmd = [
"ffmpeg",
"-y",
"-i",
tmp_path,
"-ar",
"22050",
"-ac",
"1",
"-sample_fmt",
"s16",
target_path,
]
try:
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
except subprocess.CalledProcessError:
with open(target_path, "wb") as out:
out.write(raw_bytes)
finally:
try:
os.remove(tmp_path)
except Exception:
pass
def create_silence(duration: float, out_path: str, sr: int = 22050):
cmd = [
"ffmpeg",
"-y",
"-f",
"lavfi",
"-i",
f"anullsrc=channel_layout=mono:sample_rate={sr}",
"-t",
f"{duration}",
"-c:a",
"pcm_s16le",
out_path,
]
try:
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
except subprocess.CalledProcessError:
try:
with open(out_path, "wb") as fh:
fh.write(b"\x00" * 1024)
except Exception:
pass
def pad_or_trim_wav(in_path: str, out_path: str, target_duration: float, sr: int = 22050):
try:
p = subprocess.run(
[
"ffprobe",
"-v",
"error",
"-show_entries",
"format=duration",
"-of",
"default=noprint_wrappers=1:nokey=1",
in_path,
],
capture_output=True,
text=True,
check=True,
)
cur = float(p.stdout.strip())
except Exception:
cur = 0.0
if cur == 0.0:
shutil.copy(in_path, out_path)
return
if abs(cur - target_duration) < 0.02:
shutil.copy(in_path, out_path)
return
if cur > target_duration:
cmd = ["ffmpeg", "-y", "-i", in_path, "-t", f"{target_duration}", out_path]
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
return
pad = target_duration - cur
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as sil:
sil_path = sil.name
listname = None
try:
create_silence(pad, sil_path, sr=sr)
with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".txt") as listf:
listf.write(f"file '{os.path.abspath(in_path)}'\n")
listf.write(f"file '{os.path.abspath(sil_path)}'\n")
listname = listf.name
cmd2 = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", listname, "-c", "copy", out_path]
subprocess.run(cmd2, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
finally:
try:
os.remove(sil_path)
except Exception:
pass
try:
if listname:
os.remove(listname)
except Exception:
pass
def concat_chunks(chunks: list, out_path: str):
ensure_ffmpeg()
with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".txt") as listf:
for c in chunks:
listf.write(f"file '{os.path.abspath(c)}'\n")
listname = listf.name
try:
cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", listname, "-c", "copy", out_path]
subprocess.run(cmd, check=True)
except subprocess.CalledProcessError:
tmp_concat = out_path + ".tmp.wav"
cmd2 = ["ffmpeg", "-y", "-i", f"concat:{'|'.join(chunks)}", "-c", "copy", tmp_concat]
subprocess.run(cmd2)
shutil.move(tmp_concat, out_path)
finally:
try:
os.remove(listname)
except Exception:
pass

View File

@ -0,0 +1,117 @@
from typing import Callable, List, Optional
def _default_translator_factory(model_name: str = "Helsinki-NLP/opus-mt-en-es", batch_size: int = 8):
"""Crea una función translator(texts: List[str]) -> List[str] usando transformers.
La creación se hace perezosamente para evitar obligar la dependencia en import-time.
"""
def make():
try:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
except Exception as e:
raise RuntimeError("transformers no disponible: instale 'transformers' y 'sentencepiece' para traducción local") from e
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
def translator(texts: List[str]) -> List[str]:
outs = []
# procesar en batches simples
for i in range(0, len(texts), batch_size):
batch = texts[i : i + batch_size]
enc = tok(batch, return_tensors="pt", padding=True, truncation=True)
gen = model.generate(**enc, max_length=512)
dec = tok.batch_decode(gen, skip_special_tokens=True)
outs.extend([d.strip() for d in dec])
return outs
return translator
return make()
def translate_srt(in_path: str, out_path: str, *, model_name: str = "Helsinki-NLP/opus-mt-en-es", batch_size: int = 8, translator: Optional[Callable[[List[str]], List[str]]] = None) -> None:
"""Traduce un archivo SRT manteniendo índices y timestamps.
Parámetros:
- in_path, out_path: rutas de entrada/salida
- model_name, batch_size: usados si `translator` es None
- translator: función opcional que recibe lista de textos y devuelve lista de textos traducidos.
"""
# Importar srt perezosamente; si no está disponible, usar un parser mínimo
try:
import srt # type: ignore
def _read_srt(path: str):
with open(path, "r", encoding="utf-8") as f:
raw = f.read()
return list(srt.parse(raw))
def _write_srt(path: str, subs):
with open(path, "w", encoding="utf-8") as f:
f.write(srt.compose(subs))
subs = _read_srt(in_path)
texts = [sub.content.strip() for sub in subs]
_compose_fn = lambda out_path, subs_list: _write_srt(out_path, subs_list)
except Exception:
# Fallback mínimo: parsear bloques simples de SRT (no soporta todos los casos)
def _parse_simple(raw_text: str):
blocks = [b.strip() for b in raw_text.strip().split("\n\n") if b.strip()]
parsed = []
for b in blocks:
lines = b.splitlines()
if len(lines) < 3:
continue
idx = lines[0]
times = lines[1]
content = "\n".join(lines[2:])
parsed.append({"index": idx, "times": times, "content": content})
return parsed
def _compose_simple(parsed, out_path: str):
with open(out_path, "w", encoding="utf-8") as f:
for i, item in enumerate(parsed, start=1):
f.write(f"{item['index']}\n")
f.write(f"{item['times']}\n")
f.write(f"{item['content']}\n\n")
with open(in_path, "r", encoding="utf-8") as f:
raw = f.read()
subs = _parse_simple(raw)
texts = [s["content"].strip() for s in subs]
_compose_fn = lambda out_path, subs_list: _compose_simple(subs_list, out_path)
if translator is None:
translator = _default_translator_factory(model_name=model_name, batch_size=batch_size)
translated = translator(texts)
if len(translated) != len(subs):
raise RuntimeError("El traductor devolvió un número distinto de segmentos traducidos")
# Asignar traducidos en la estructura usada (objeto srt o dict simple)
if subs and isinstance(subs[0], dict):
for s, t in zip(subs, translated):
s["content"] = t.strip()
_compose_fn(out_path, subs)
else:
for sub, t in zip(subs, translated):
sub.content = t.strip()
_compose_fn(out_path, subs)
class MarianTranslator:
"""Adapter que ofrece una API simple para uso en usecases.
Internamente llama a `translate_srt` y permite inyectar un traductor para tests.
"""
def __init__(self, model_name: str = "Helsinki-NLP/opus-mt-en-es", batch_size: int = 8):
self.model_name = model_name
self.batch_size = batch_size
def translate_srt(self, in_srt: str, out_srt: str, translator: Optional[Callable[[List[str]], List[str]]] = None) -> None:
translate_srt(in_srt, out_srt, model_name=self.model_name, batch_size=self.batch_size, translator=translator)
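A usage sketch for the adapter above (module path taken from the test imports earlier in this commit; the default path downloads the `Helsinki-NLP/opus-mt-en-es` model via `transformers` on first use, while injecting a `translator` callable avoids any download, as the tests do):
```python
from whisper_project.infra.marian_adapter import MarianTranslator

# With the default local MarianMT model (requires 'transformers' and 'sentencepiece'):
MarianTranslator(batch_size=8).translate_srt("subs_en.srt", "subs_es.srt")

# Or with an injected translator callable, e.g. for tests or a different backend:
MarianTranslator().translate_srt(
    "subs_en.srt", "subs_upper.srt", translator=lambda texts: [t.upper() for t in texts]
)
```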

View File

@ -0,0 +1,40 @@
"""Infra wrapper exposing ffmpeg and transcription helpers via adapters.
This module provides backward-compatible functions but delegates to the
adapter implementations in `ffmpeg_adapter` and `transcribe`.
"""
from .ffmpeg_adapter import FFmpegAudioProcessor
from . import transcribe as _trans
_FF = FFmpegAudioProcessor()
def extract_audio(video_path: str, out_wav: str, sr: int = 16000):
    return _FF.extract_audio(video_path, out_wav, sr=sr)
def burn_subtitles(video_path: str, srt_path: str, out_video: str, font: str = "Arial", size: int = 24):
return _FF.burn_subtitles(video_path, srt_path, out_video, font=font, size=size)
def replace_audio_in_video(video_path: str, audio_path: str, out_video: str):
return _FF.replace_audio_in_video(video_path, audio_path, out_video)
def get_audio_duration(file_path: str):
return _trans.get_audio_duration(file_path)
def transcribe_segmented_with_tempfiles(*args, **kwargs):
return _trans.transcribe_segmented_with_tempfiles(*args, **kwargs)
__all__ = [
"extract_audio",
"burn_subtitles",
"replace_audio_in_video",
"get_audio_duration",
"transcribe_segmented_with_tempfiles",
]

View File

@ -0,0 +1,10 @@
"""Deprecated implementation module.
All functionality has been moved into adapter classes under
`whisper_project.infra`. Importing this module will raise an
ImportError to encourage use of the adapter APIs.
"""
raise ImportError(
"process_video_impl has been removed: use whisper_project.infra.ffmpeg_adapter"
)

View File

@ -0,0 +1,66 @@
"""Infra layer: expose a simple module-level API backed by
`TranscribeService` adapter.
This replaces the previous re-export from `transcribe_impl` so the
implementation lives inside the adapter class.
"""
from .transcribe_adapter import TranscribeService
# default service instance used by module-level helpers
_DEFAULT = TranscribeService()
def transcribe_openai_whisper(file: str):
return _DEFAULT.transcribe_openai(file)
def transcribe_transformers(file: str):
return _DEFAULT.transcribe_transformers(file)
def transcribe_faster_whisper(file: str):
return _DEFAULT.transcribe_faster(file)
def write_srt(segments, out_path: str):
return _DEFAULT.write_srt(segments, out_path)
def dedupe_adjacent_segments(segments):
return _DEFAULT.dedupe_adjacent_segments(segments)
def get_audio_duration(file_path: str):
return _DEFAULT.get_audio_duration(file_path)
def make_uniform_segments(duration: float, seg_seconds: float):
return _DEFAULT.make_uniform_segments(duration, seg_seconds)
def transcribe_segmented_with_tempfiles(*args, **kwargs):
return _DEFAULT.transcribe_segmented_with_tempfiles(*args, **kwargs)
def tts_synthesize(text: str, out_path: str, model: str = "kokoro") -> bool:
return _DEFAULT.tts_synthesize(text, out_path, model=model)
def ensure_tts_model(repo_id: str):
return _DEFAULT.ensure_tts_model(repo_id)
__all__ = [
"transcribe_openai_whisper",
"transcribe_transformers",
"transcribe_faster_whisper",
"write_srt",
"dedupe_adjacent_segments",
"get_audio_duration",
"make_uniform_segments",
"transcribe_segmented_with_tempfiles",
"tts_synthesize",
"ensure_tts_model",
]
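A hedged example of the fallback path these helpers enable: slice the audio into uniform segments, transcribe each slice, and write an SRT. File names are placeholders, and `faster-whisper` plus `ffmpeg`/`ffprobe` must be installed for the real calls to succeed.

```python
from whisper_project.infra import transcribe

duration = transcribe.get_audio_duration("audio.wav")
if duration:
    segments = transcribe.make_uniform_segments(duration, seg_seconds=10.0)
    filled = transcribe.transcribe_segmented_with_tempfiles(
        "audio.wav", segments, backend="faster-whisper", model="base"
    )
    transcribe.write_srt(transcribe.dedupe_adjacent_segments(filled), "audio.srt")
```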

View File

@@ -0,0 +1,279 @@
"""Transcribe service adapter.
Provides a small class that wraps transcription and SRT helper functions
so callers can depend on an object instead of free functions.
"""
from typing import Optional
"""Transcribe service with inlined implementation.
This class contains the transcription and SRT utilities previously in
`transcribe_impl.py`. Keeping it here as a single adapter simplifies DI
and makes it easier to unit-test.
"""
from pathlib import Path
class TranscribeService:
def __init__(self, model: str = "base", compute_type: str = "int8") -> None:
self.model = model
self.compute_type = compute_type
def transcribe_openai(self, file: str):
import whisper
print(f"Cargando openai-whisper modelo={self.model} en CPU...")
m = whisper.load_model(self.model, device="cpu")
print("Transcribiendo...")
result = m.transcribe(file, fp16=False)
segments = result.get("segments", None)
if segments:
for seg in segments:
print(seg.get("text", ""))
return segments
else:
print(result.get("text", ""))
return None
def transcribe_transformers(self, file: str):
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
device = "cpu"
torch_dtype = torch.float32
print(f"Cargando transformers modelo={self.model} en CPU...")
model_obj = AutoModelForSpeechSeq2Seq.from_pretrained(self.model, torch_dtype=torch_dtype, low_cpu_mem_usage=True)
model_obj.to(device)
processor = AutoProcessor.from_pretrained(self.model)
pipe = pipeline(
"automatic-speech-recognition",
model=model_obj,
tokenizer=processor.tokenizer,
feature_extractor=processor.feature_extractor,
device=-1,
)
print("Transcribiendo...")
result = pipe(file)
if isinstance(result, dict):
print(result.get("text", ""))
else:
print(result)
return None
def transcribe_faster(self, file: str):
from faster_whisper import WhisperModel
print(f"Cargando faster-whisper modelo={self.model} en CPU compute_type={self.compute_type}...")
model_obj = WhisperModel(self.model, device="cpu", compute_type=self.compute_type)
print("Transcribiendo...")
segments_gen, info = model_obj.transcribe(file, beam_size=5)
segments = list(segments_gen)
text = "".join([seg.text for seg in segments])
print(text)
return segments
def _format_timestamp(self, seconds: float) -> str:
millis = int((seconds - int(seconds)) * 1000)
h = int(seconds // 3600)
m = int((seconds % 3600) // 60)
s = int(seconds % 60)
return f"{h:02d}:{m:02d}:{s:02d},{millis:03d}"
def write_srt(self, segments, out_path: str):
lines = []
for i, seg in enumerate(segments, start=1):
if hasattr(seg, "start"):
start = float(seg.start)
end = float(seg.end)
text = seg.text if hasattr(seg, "text") else str(seg)
else:
start = float(seg.get("start", 0.0))
end = float(seg.get("end", 0.0))
text = seg.get("text", "")
start_ts = self._format_timestamp(start)
end_ts = self._format_timestamp(end)
lines.append(str(i))
lines.append(f"{start_ts} --> {end_ts}")
for line in str(text).strip().splitlines():
lines.append(line)
lines.append("")
Path(out_path).write_text("\n".join(lines), encoding="utf-8")
def dedupe_adjacent_segments(self, segments):
if not segments:
return segments
norm = []
for s in segments:
if hasattr(s, "start"):
norm.append({"start": float(s.start), "end": float(s.end), "text": getattr(s, "text", "")})
else:
norm.append({"start": float(s.get("start", 0.0)), "end": float(s.get("end", 0.0)), "text": s.get("text", "")})
out = [norm[0].copy()]
for seg in norm[1:]:
prev = out[-1]
a = (prev.get("text") or "").strip()
b = (seg.get("text") or "").strip()
if not a or not b:
out.append(seg.copy())
continue
a_words = a.split()
b_words = b.split()
max_ol = 0
max_k = min(len(a_words), len(b_words), 10)
for k in range(1, max_k + 1):
if a_words[-k:] == b_words[:k]:
max_ol = k
if max_ol > 0:
new_b = " ".join(b_words[max_ol:]).strip()
new_seg = seg.copy()
new_seg["text"] = new_b
out.append(new_seg)
else:
out.append(seg.copy())
return out
def get_audio_duration(self, file_path: str):
try:
import subprocess
cmd = [
"ffprobe",
"-v",
"error",
"-show_entries",
"format=duration",
"-of",
"default=noprint_wrappers=1:nokey=1",
file_path,
]
out = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
return float(out.strip())
except Exception:
return None
def make_uniform_segments(self, duration: float, seg_seconds: float):
segments = []
if duration <= 0 or seg_seconds <= 0:
return segments
start = 0.0
while start < duration:
end = min(start + seg_seconds, duration)
segments.append({"start": round(start, 3), "end": round(end, 3)})
start = end
return segments
def transcribe_segmented_with_tempfiles(self, src_file: str, segments: list, backend: str = "faster-whisper", model: str = "base", compute_type: str = "int8", overlap: float = 0.2):
import subprocess
import tempfile
results = []
for seg in segments:
start = max(0.0, float(seg["start"]) - overlap)
end = float(seg["end"]) + overlap
duration = end - start
with tempfile.NamedTemporaryFile(suffix=".wav", delete=True) as tmp:
tmp_path = tmp.name
cmd = [
"ffmpeg",
"-y",
"-ss",
str(start),
"-t",
str(duration),
"-i",
src_file,
"-ar",
"16000",
"-ac",
"1",
tmp_path,
]
try:
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
except Exception:
results.append({"start": seg["start"], "end": seg["end"], "text": ""})
continue
try:
if backend == "openai-whisper":
import whisper
m = whisper.load_model(model, device="cpu")
res = m.transcribe(tmp_path, fp16=False)
text = res.get("text", "")
elif backend == "transformers":
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
torch_dtype = torch.float32
model_obj = AutoModelForSpeechSeq2Seq.from_pretrained(model, torch_dtype=torch_dtype, low_cpu_mem_usage=True)
model_obj.to("cpu")
processor = AutoProcessor.from_pretrained(model)
pipe = pipeline(
"automatic-speech-recognition",
model=model_obj,
tokenizer=processor.tokenizer,
feature_extractor=processor.feature_extractor,
device=-1,
)
out = pipe(tmp_path)
text = out["text"] if isinstance(out, dict) else str(out)
else:
from faster_whisper import WhisperModel
wmodel = WhisperModel(model, device="cpu", compute_type=compute_type)
segs_gen, info = wmodel.transcribe(tmp_path, beam_size=5)
segs = list(segs_gen)
text = "".join([s.text for s in segs])
except Exception:
text = ""
results.append({"start": seg["start"], "end": seg["end"], "text": text})
return results
def tts_synthesize(self, text: str, out_path: str, model: str = "kokoro"):
try:
from TTS.api import TTS
tts = TTS(model_name=model, progress_bar=False, gpu=False)
tts.tts_to_file(text=text, file_path=out_path)
return True
except Exception:
try:
import pyttsx3
engine = pyttsx3.init()
engine.save_to_file(text, out_path)
engine.runAndWait()
return True
except Exception:
return False
def ensure_tts_model(self, repo_id: str):
try:
from huggingface_hub import snapshot_download
try:
local_dir = snapshot_download(repo_id, repo_type="model")
except Exception:
local_dir = snapshot_download(repo_id)
return local_dir
except Exception:
return repo_id
__all__ = ["TranscribeService"]

158
whisper_project/main.py Normal file
View File

@@ -0,0 +1,158 @@
#!/usr/bin/env python3
"""CLI mínimo que expone el orquestador principal.
Este módulo proporciona la función `main()` que construye los adaptadores
por defecto e invoca `PipelineOrchestrator.run(...)`. Está diseñado para
reemplazar el antiguo `run_full_pipeline.py` como punto de entrada.
"""
from __future__ import annotations
import argparse
import glob
import os
import shutil
import sys
import tempfile
from whisper_project.usecases.orchestrator import PipelineOrchestrator
from whisper_project.infra.kokoro_adapter import KokoroHttpClient
def main():
p = argparse.ArgumentParser()
p.add_argument("--video", required=True)
p.add_argument("--srt", help="SRT de entrada (opcional)")
p.add_argument(
"--kokoro-endpoint",
required=False,
default="https://kokoro.example/api/synthesize",
help=(
"Endpoint HTTP de Kokoro (por defecto: "
"https://kokoro.example/api/synthesize)"
),
)
p.add_argument("--kokoro-key", required=False)
p.add_argument("--voice", default="em_alex")
p.add_argument("--kokoro-model", default="model")
p.add_argument("--whisper-model", default="base")
p.add_argument(
"--translate-method",
choices=[
"local",
"gemini",
"argos",
"none",
],
default="local",
)
p.add_argument(
"--gemini-key",
default=None,
help=(
"API key para Gemini (si eliges "
"--translate-method=gemini)"
),
)
p.add_argument("--mix", action="store_true")
p.add_argument("--mix-background-volume", type=float, default=0.2)
p.add_argument("--keep-chunks", action="store_true")
p.add_argument("--keep-temp", action="store_true")
p.add_argument(
"--dry-run",
action="store_true",
help="Simular pasos sin ejecutar",
)
args = p.parse_args()
video = os.path.abspath(args.video)
if not os.path.exists(video):
print("Vídeo no encontrado:", video, file=sys.stderr)
sys.exit(2)
workdir = tempfile.mkdtemp(prefix="full_pipeline_")
try:
# construir cliente Kokoro HTTP nativo e inyectarlo en el orquestador
kokoro_client = KokoroHttpClient(
args.kokoro_endpoint,
api_key=args.kokoro_key,
voice=args.voice,
model=args.kokoro_model,
)
orchestrator = PipelineOrchestrator(
kokoro_endpoint=args.kokoro_endpoint,
kokoro_key=args.kokoro_key,
voice=args.voice,
kokoro_model=args.kokoro_model,
tts_client=kokoro_client,
)
result = orchestrator.run(
video=video,
srt=args.srt,
workdir=workdir,
translate_method=args.translate_method,
gemini_api_key=args.gemini_key,
whisper_model=args.whisper_model,
mix=args.mix,
mix_background_volume=args.mix_background_volume,
keep_chunks=args.keep_chunks,
dry_run=args.dry_run,
)
# Si no es dry-run, crear una subcarpeta por proyecto en output/
# (output/<basename-of-video>) y mover allí los artefactos generados.
final_path = None
if (
not args.dry_run
and result
and getattr(result, "burned_video", None)
):
base = os.path.splitext(os.path.basename(video))[0]
project_out = os.path.join(os.getcwd(), "output", base)
try:
os.makedirs(project_out, exist_ok=True)
except Exception:
pass
# Mover el vídeo principal
src = result.burned_video
dest = os.path.join(project_out, os.path.basename(src))
try:
if os.path.abspath(src) != os.path.abspath(dest):
shutil.move(src, dest)
final_path = dest
except Exception:
final_path = src
# También mover otros artefactos que empiecen por el basename
try:
pattern = os.path.join(os.getcwd(), f"{base}*")
for p in glob.glob(pattern):
# no mover el archivo fuente ya movido
if os.path.abspath(p) == os.path.abspath(final_path):
continue
# mover sólo ficheros regulares
try:
if os.path.isfile(p):
shutil.move(p, os.path.join(project_out, os.path.basename(p)))
except Exception:
pass
except Exception:
pass
else:
# En dry-run o sin resultado, no movemos nada
final_path = getattr(result, "burned_video", None)
print("Flujo completado. Vídeo final:", final_path)
finally:
if not args.keep_temp:
try:
shutil.rmtree(workdir)
except Exception:
pass
if __name__ == "__main__":
main()
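For automation it can be convenient to skip the CLI and drive the orchestrator directly. This is a hedged sketch that mirrors the arguments `main()` passes above; `dry_run=True` keeps it side-effect free, and the endpoint and video path are placeholders.

```python
import tempfile

from whisper_project.usecases.orchestrator import PipelineOrchestrator
from whisper_project.infra.kokoro_adapter import KokoroHttpClient

endpoint = "https://kokoro.example/api/v1/audio/speech"
client = KokoroHttpClient(endpoint, api_key=None, voice="em_alex", model="model")
orchestrator = PipelineOrchestrator(
    kokoro_endpoint=endpoint,
    kokoro_key=None,
    voice="em_alex",
    kokoro_model="model",
    tts_client=client,
)
with tempfile.TemporaryDirectory(prefix="full_pipeline_") as workdir:
    result = orchestrator.run(
        video="dailyrutines.mp4",
        srt=None,
        workdir=workdir,
        translate_method="local",
        gemini_api_key=None,
        whisper_model="base",
        mix=False,
        mix_background_volume=0.2,
        keep_chunks=False,
        dry_run=True,
    )
print(getattr(result, "burned_video", None))
```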

View File

@@ -1,179 +0,0 @@
#!/usr/bin/env python3
"""Procesamiento de vídeo: extrae audio, transcribe/traduce y
quema subtítulos.
Flujo:
- Extrae audio con ffmpeg (WAV 16k mono)
- Transcribe con faster-whisper o openai-whisper
(opción task='translate')
- Escribe SRT y lo incrusta en el vídeo con ffmpeg
Nota: requiere ffmpeg instalado y, para modelos, faster-whisper
o openai-whisper.
"""
import argparse
import subprocess
import tempfile
from pathlib import Path
import sys
from transcribe import write_srt
def extract_audio(video_path: str, out_audio: str):
cmd = [
"ffmpeg",
"-y",
"-i",
video_path,
"-vn",
"-acodec",
"pcm_s16le",
"-ar",
"16000",
"-ac",
"1",
out_audio,
]
subprocess.run(cmd, check=True)
def burn_subtitles(video_path: str, srt_path: str, out_video: str):
# Usar filtro subtitles de ffmpeg
cmd = [
"ffmpeg",
"-y",
"-i",
video_path,
"-vf",
f"subtitles={srt_path}",
"-c:a",
"copy",
out_video,
]
subprocess.run(cmd, check=True)
def transcribe_and_translate_faster(audio_path: str, model: str, target: str):
from faster_whisper import WhisperModel
wm = WhisperModel(model, device="cpu", compute_type="int8")
segments, info = wm.transcribe(
audio_path, beam_size=5, task="translate", language=target
)
return segments
def transcribe_and_translate_openai(audio_path: str, model: str, target: str):
import whisper
m = whisper.load_model(model, device="cpu")
result = m.transcribe(
audio_path, fp16=False, task="translate", language=target
)
return result.get("segments", None)
def main():
parser = argparse.ArgumentParser(
description=(
"Extraer, transcribir/traducir y quemar subtítulos en vídeo"
" (offline)"
)
)
parser.add_argument(
"--video", "-v", required=True, help="Ruta del archivo de vídeo"
)
parser.add_argument(
"--backend",
"-b",
choices=["faster-whisper", "openai-whisper"],
default="faster-whisper",
)
parser.add_argument(
"--model",
"-m",
default="base",
help="Modelo de whisper a usar (tiny, base, etc.)",
)
parser.add_argument(
"--to", "-t", default="es", help="Idioma de destino para traducción"
)
parser.add_argument(
"--out",
"-o",
default=None,
help=(
"Ruta del vídeo de salida (si no se especifica,"
" se usa input_burned.mp4)"
),
)
parser.add_argument(
"--srt",
default=None,
help=(
"Ruta SRT a escribir (si no se especifica,"
" se usa input.srt)"
),
)
args = parser.parse_args()
video = Path(args.video)
if not video.exists():
print("Vídeo no encontrado", file=sys.stderr)
sys.exit(2)
out_video = (
args.out
if args.out
else str(video.with_name(video.stem + "_burned.mp4"))
)
srt_path = args.srt if args.srt else str(video.with_suffix('.srt'))
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
audio_path = tmp.name
try:
print("Extrayendo audio con ffmpeg...")
extract_audio(str(video), audio_path)
print(
f"Transcribiendo y traduciendo a '{args.to}'"
f" usando {args.backend}..."
)
if args.backend == "faster-whisper":
segments = transcribe_and_translate_faster(
audio_path, args.model, args.to
)
else:
segments = transcribe_and_translate_openai(
audio_path, args.model, args.to
)
if not segments:
print(
"No se obtuvieron segmentos de la transcripción",
file=sys.stderr,
)
sys.exit(3)
print(f"Escribiendo SRT en {srt_path}...")
write_srt(segments, srt_path)
print(
f"Quemando subtítulos en el vídeo -> {out_video}"
f" (esto puede tardar)..."
)
burn_subtitles(str(video), srt_path, out_video)
print("Proceso completado.")
finally:
try:
Path(audio_path).unlink()
except Exception:
pass
if __name__ == "__main__":
main()

View File

@@ -1,449 +1,13 @@
#!/usr/bin/env python3
"""Compatibility shim: run_full_pipeline
This module forwards to `whisper_project.main:main` to preserve the
historical CLI entrypoint name expected by tests and users.
"""
from __future__ import annotations
from whisper_project.main import main
if __name__ == "__main__":
main()
#!/usr/bin/env python3
# run_full_pipeline.py
# Orquesta: transcripción -> traducción -> síntesis por segmento -> reemplazo/mezcla -> quemado de subtítulos
import argparse
import os
import shlex
import shutil
import subprocess
import sys
import tempfile
def run(cmd, dry_run=False, env=None):
# Ejecuta un comando. Acepta str (ejecuta vía shell) o list (sin shell).
# Imprime el comando de forma segura para copiar/pegar. Si dry_run=True
# no ejecuta nada.
if isinstance(cmd, (list, tuple)):
printable = " ".join(shlex.quote(str(x)) for x in cmd)
else:
printable = cmd
print("+", printable)
if dry_run:
return 0
if isinstance(cmd, (list, tuple)):
return subprocess.run(cmd, shell=False, check=True, env=env)
return subprocess.run(cmd, shell=True, check=True, env=env)
def json_payload_template(model, voice):
# Payload JSON con {text} como placeholder que acepta srt_to_kokoro
return '{"model":"' + model + '","voice":"' + voice + '","input":"{text}","response_format":"wav"}'
def main():
p = argparse.ArgumentParser()
p.add_argument("--video", required=True, help="Vídeo de entrada")
p.add_argument(
"--srt",
help=("SRT de entrada (si ya existe). Si no, se transcribe del audio"),
)
p.add_argument("--kokoro-endpoint", required=True, help="URL del endpoint TTS")
p.add_argument("--kokoro-key", required=True, help="API key para Kokoro")
p.add_argument("--voice", default="em_alex", help="Nombre de voz (p.ej. em_alex)")
p.add_argument("--kokoro-model", default="model", help="ID del modelo Kokoro")
p.add_argument("--whisper-model", default="base", help="Modelo de Whisper para transcribir")
p.add_argument("--out", default=None, help="Vídeo de salida final (opcional)")
p.add_argument(
"--translate-method",
choices=["local", "gemini", "none"],
default="local",
help=(
"Método para traducir el SRT: 'local' (MarianMT), 'gemini' (API)"
" o 'none' (usar SRT proporcionado)"
),
)
p.add_argument("--gemini-key", default=None, help="API key para Gemini (si aplica)")
p.add_argument(
"--mix",
action="store_true",
help="Mezclar el audio sintetizado con la pista original en lugar de reemplazarla",
)
p.add_argument(
"--mix-background-volume",
type=float,
default=0.2,
help="Volumen de la pista original al mezclar (0.0-1.0)",
)
p.add_argument(
"--keep-chunks",
action="store_true",
help="Conservar los archivos de chunks generados por la síntesis (debug)",
)
p.add_argument(
"--keep-temp",
action="store_true",
help="No borrar el directorio temporal de trabajo al terminar",
)
p.add_argument("--dry-run", action="store_true", help="Solo mostrar comandos sin ejecutar")
args = p.parse_args()
video = os.path.abspath(args.video)
if not os.path.exists(video):
print("Vídeo no encontrado:", video, file=sys.stderr)
sys.exit(2)
workdir = tempfile.mkdtemp(prefix="full_pipeline_")
try:
# 1) obtener SRT: si no se pasa, extraer audio y transcribir
if args.srt:
srt_in = os.path.abspath(args.srt)
print("Usando SRT proporcionado:", srt_in)
else:
audio_tmp = os.path.join(workdir, "extracted_audio.wav")
cmd_extract = [
"ffmpeg",
"-y",
"-i",
video,
"-vn",
"-acodec",
"pcm_s16le",
"-ar",
"16000",
"-ac",
"1",
audio_tmp,
]
run(cmd_extract, dry_run=args.dry_run)
# llamar al script transcribe.py para generar SRT
srt_in = os.path.join(workdir, "transcribed.srt")
cmd_trans = [
sys.executable,
"whisper_project/transcribe.py",
"--file",
audio_tmp,
"--backend",
"faster-whisper",
"--model",
args.whisper_model,
"--srt",
"--srt-file",
srt_in,
]
run(cmd_trans, dry_run=args.dry_run)
# 2) traducir SRT según método elegido
srt_translated = os.path.join(workdir, "translated.srt")
if args.translate_method == "local":
cmd_translate = [
sys.executable,
"whisper_project/translate_srt_local.py",
"--in",
srt_in,
"--out",
srt_translated,
]
run(cmd_translate, dry_run=args.dry_run)
elif args.translate_method == "gemini":
gem_key = args.gemini_key or os.environ.get("GEMINI_API_KEY")
if not gem_key:
print(
"--translate-method=gemini requiere --gemini-key o la var de entorno GEMINI_API_KEY",
file=sys.stderr,
)
sys.exit(4)
cmd_translate = [
sys.executable,
"whisper_project/translate_srt_with_gemini.py",
"--in",
srt_in,
"--out",
srt_translated,
"--gemini-api-key",
gem_key,
]
run(cmd_translate, dry_run=args.dry_run)
else:
# none: usar SRT tal cual
srt_translated = srt_in
# 3) sintetizar por segmento con Kokoro, alinear, concatenar y
# reemplazar o mezclar audio en el vídeo
dub_wav = os.path.join(workdir, "dub_final.wav")
payload = json_payload_template(args.kokoro_model, args.voice)
synth_cmd = [
sys.executable,
"whisper_project/srt_to_kokoro.py",
"--srt",
srt_translated,
"--endpoint",
args.kokoro_endpoint,
"--payload-template",
payload,
"--api-key",
args.kokoro_key,
"--out",
dub_wav,
"--video",
video,
"--align",
]
if args.keep_chunks:
synth_cmd.append("--keep-chunks")
if args.mix:
synth_cmd += ["--mix-with-original", "--mix-background-volume", str(args.mix_background_volume)]
else:
synth_cmd.append("--replace-original")
run(synth_cmd, dry_run=args.dry_run)
# 4) quemar SRT en vídeo resultante
out_video = args.out if args.out else os.path.splitext(video)[0] + ".replaced_audio.subs.mp4"
replaced_src = os.path.splitext(video)[0] + ".replaced_audio.mp4"
# build filter string
vf = f"subtitles={srt_translated}:force_style='FontName=Arial,FontSize=24'"
cmd_burn = [
"ffmpeg",
"-y",
"-i",
replaced_src,
"-vf",
vf,
"-c:a",
"copy",
out_video,
]
run(cmd_burn, dry_run=args.dry_run)
print("Flujo completado. Vídeo final:", out_video)
finally:
if args.dry_run:
print("(dry-run) leaving workdir:", workdir)
else:
if not args.keep_temp:
try:
shutil.rmtree(workdir)
except Exception:
pass
if __name__ == '__main__':
main()

View File

@@ -1,17 +1,26 @@
#!/usr/bin/env python3
"""Shim: run_xtts_clone
This script delegates to the example `examples/run_xtts_clone.py` or
prints guidance if not available. Kept for backward compatibility.
"""
from __future__ import annotations
import subprocess
import sys
def main():
script = "examples/run_xtts_clone.py"
try:
subprocess.run([sys.executable, script], check=True)
except Exception as e:
print("Error ejecutando run_xtts_clone ejemplo:", e, file=sys.stderr)
print("Ejecuta 'python examples/run_xtts_clone.py' para la demo.")
return 1
return 0
if __name__ == "__main__":
sys.exit(main())
import os, traceback
from TTS.api import TTS
out='whisper_project/dub_female_xtts_es.wav'
speaker='whisper_project/ref_female_es.wav'
text='Hola, esta es una prueba de clonación usando xtts_v2 en español latino.'
model='tts_models/multilingual/multi-dataset/xtts_v2'
try:
print('Cargando modelo:', model)
tts = TTS(model_name=model, progress_bar=True, gpu=False)
print('Llamando a tts_to_file con speaker_wav=', speaker)
tts.tts_to_file(text=text, file_path=out, speaker_wav=speaker, language='es')
print('Generado:', out, 'size=', os.path.getsize(out))
except Exception as e:
print('Error durante la clonación:')
traceback.print_exc()

View File

@@ -1,3 +1,43 @@
"""Funciones helper para sintetizar desde SRT.
Este módulo mantiene compatibilidad con la antigua utilidad `srt_to_kokoro.py`.
Contiene `parse_srt_file` y `synth_chunk` delegando a infra.kokoro_utils.
Se incluye una función `synthesize_from_srt` que documenta la compatibilidad
con `KokoroHttpClient` (nombre esperado por otros módulos).
"""
from __future__ import annotations
from typing import Any
from whisper_project.infra.kokoro_utils import parse_srt_file as _parse_srt_file, synth_chunk as _synth_chunk
def parse_srt_file(path: str):
"""Parsea un .srt y devuelve la lista de subtítulos.
Delegado a `whisper_project.infra.kokoro_utils.parse_srt_file`.
"""
return _parse_srt_file(path)
def synth_chunk(endpoint: str, text: str, headers: dict, payload_template: Any, timeout: int = 60) -> bytes:
"""Envía texto al endpoint y devuelve bytes de audio.
Delegado a `whisper_project.infra.kokoro_utils.synth_chunk`.
"""
return _synth_chunk(endpoint, text, headers, payload_template, timeout=timeout)
def synthesize_from_srt(srt_path: str, out_wav: str, endpoint: str = "", api_key: str = ""):
"""Compat layer: función con el nombre esperado por scripts legacy.
Nota: la implementación completa se encuentra ahora en `KokoroHttpClient`.
Esta función delega a `parse_srt_file` y `synth_chunk` si se necesita.
"""
raise NotImplementedError("Use KokoroHttpClient.synthesize_from_srt or the infra adapter instead")
__all__ = ["parse_srt_file", "synth_chunk", "synthesize_from_srt"]
#!/usr/bin/env python3
"""
srt_to_kokoro.py
@@ -17,476 +57,67 @@ Ejemplos:
"""
import argparse
import os
import shutil
import subprocess
import sys
import tempfile
from typing import Optional
"""
Thin wrapper CLI que delega en `KokoroHttpClient.synthesize_from_srt`.
Conserva la interfaz CLI previa para compatibilidad, pero internamente usa
el cliente HTTP nativo definido en `whisper_project.infra.kokoro_adapter`.
"""
import argparse
import os
import sys
import tempfile
from whisper_project.infra.kokoro_adapter import KokoroHttpClient
import json
import re
try:
import requests
except Exception as e:
print("Este script requiere la librería 'requests'. Instálala con: pip install requests")
raise
try:
import srt
except Exception:
print("Este script requiere la librería 'srt'. Instálala con: pip install srt")
raise
def find_synthesis_endpoint(openapi_url: str) -> Optional[str]:
"""Intento heurístico: baja openapi.json y busca paths con 'synth'|'tts'|'text' que soporten POST."""
try:
r = requests.get(openapi_url, timeout=20)
r.raise_for_status()
spec = r.json()
except Exception as e:
print(f"No pude leer openapi.json desde {openapi_url}: {e}")
return None
paths = spec.get("paths", {})
candidate = None
for path, methods in paths.items():
lname = path.lower()
if any(k in lname for k in ("synth", "tts", "text", "synthesize")):
for method, op in methods.items():
if method.lower() == "post":
# candidato
candidate = path
break
if candidate:
break
if not candidate:
# fallback: scan operationId or summary
for path, methods in paths.items():
for method, op in methods.items():
meta = json.dumps(op).lower()
if any(k in meta for k in ("synth", "tts", "text", "synthesize")) and method.lower() == "post":
candidate = path
break
if candidate:
break
if not candidate:
return None
# Construir base url desde openapi_url
from urllib.parse import urlparse, urljoin
p = urlparse(openapi_url)
base = f"{p.scheme}://{p.netloc}"
return urljoin(base, candidate)
def parse_srt_file(path: str):
with open(path, "r", encoding="utf-8") as f:
raw = f.read()
subs = list(srt.parse(raw))
return subs
def synth_chunk(endpoint: str, text: str, headers: dict, payload_template: Optional[str], timeout=60):
"""Envía la solicitud y devuelve bytes de audio. Maneja respuestas audio/* o JSON con campo base64."""
# Construir payload
if payload_template:
body = payload_template.replace("{text}", text)
try:
json_body = json.loads(body)
except Exception:
# enviar como texto plano
json_body = {"text": text}
else:
json_body = {"text": text}
# Realizar POST
r = requests.post(endpoint, json=json_body, headers=headers, timeout=timeout)
r.raise_for_status()
ctype = r.headers.get("Content-Type", "")
if ctype.startswith("audio/"):
return r.content
# Si viene JSON con base64
try:
j = r.json()
# buscar campos con 'audio' o 'wav' o 'base64'
for k in ("audio", "wav", "data", "base64"):
if k in j:
val = j[k]
# si es base64
import base64
try:
return base64.b64decode(val)
except Exception:
# tal vez ya es bytes hex u otra cosa
pass
except Exception:
pass
# Fallback: devolver raw bytes
return r.content
def ensure_ffmpeg():
if shutil.which("ffmpeg") is None:
print("ffmpeg no está disponible en PATH. Instálalo para poder concatenar/convertir audios.")
sys.exit(1)
def convert_and_save(raw_bytes: bytes, target_path: str):
"""Guarda bytes a un archivo temporal y convierte a WAV PCM 16k mono usando ffmpeg."""
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
tmp.write(raw_bytes)
tmp.flush()
tmp_path = tmp.name
# Convertir con ffmpeg a WAV 22050 Hz mono 16-bit
cmd = [
"ffmpeg", "-y", "-i", tmp_path,
"-ar", "22050", "-ac", "1", "-sample_fmt", "s16", target_path
]
try:
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
except subprocess.CalledProcessError as e:
print(f"ffmpeg falló al convertir chunk: {e}")
# como fallback, escribir los bytes "crudos"
with open(target_path, "wb") as out:
out.write(raw_bytes)
finally:
try:
os.remove(tmp_path)
except Exception:
pass
def create_silence(duration: float, out_path: str, sr: int = 22050):
"""Create a silent wav of given duration (seconds) at sr and save to out_path."""
# use ffmpeg anullsrc
cmd = [
"ffmpeg",
"-y",
"-f",
"lavfi",
"-i",
f"anullsrc=channel_layout=mono:sample_rate={sr}",
"-t",
f"{duration}",
"-c:a",
"pcm_s16le",
out_path,
]
try:
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
except subprocess.CalledProcessError:
# fallback: write tiny silence by creating zero bytes
try:
with open(out_path, "wb") as fh:
fh.write(b"\x00" * 1024)
except Exception:
pass
def pad_or_trim_wav(in_path: str, out_path: str, target_duration: float, sr: int = 22050):
"""Pad with silence or trim input wav to match target_duration (seconds)."""
# get duration
try:
p = subprocess.run([
"ffprobe",
"-v",
"error",
"-show_entries",
"format=duration",
"-of",
"default=noprint_wrappers=1:nokey=1",
in_path,
], capture_output=True, text=True)
cur = float(p.stdout.strip())
except Exception:
cur = 0.0
if cur == 0.0:
shutil.copy(in_path, out_path)
return
if abs(cur - target_duration) < 0.02:
shutil.copy(in_path, out_path)
return
if cur > target_duration:
cmd = ["ffmpeg", "-y", "-i", in_path, "-t", f"{target_duration}", out_path]
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
return
# pad: create silence of missing duration and concat
pad = target_duration - cur
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as sil:
sil_path = sil.name
try:
create_silence(pad, sil_path, sr=sr)
# concat in_path + sil_path
with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".txt") as listf:
listf.write(f"file '{os.path.abspath(in_path)}'\n")
listf.write(f"file '{os.path.abspath(sil_path)}'\n")
listname = listf.name
cmd2 = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", listname, "-c", "copy", out_path]
subprocess.run(cmd2, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
finally:
try:
os.remove(sil_path)
except Exception:
pass
try:
os.remove(listname)
except Exception:
pass
def concat_chunks(chunks: list, out_path: str):
# Crear lista para ffmpeg concat demuxer
ensure_ffmpeg()
with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".txt") as listf:
for c in chunks:
listf.write(f"file '{os.path.abspath(c)}'\n")
listname = listf.name
cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", listname, "-c", "copy", out_path]
try:
subprocess.run(cmd, check=True)
except subprocess.CalledProcessError:
# fallback: concatenar mediante reconversión
tmp_concat = out_path + ".tmp.wav"
cmd2 = ["ffmpeg", "-y", "-i", f"concat:{'|'.join(chunks)}", "-c", "copy", tmp_concat]
subprocess.run(cmd2)
shutil.move(tmp_concat, out_path)
finally:
try:
os.remove(listname)
except Exception:
pass
def main():
p = argparse.ArgumentParser()
p.add_argument("--srt", required=True, help="Ruta al archivo .srt traducido")
p.add_argument("--endpoint", required=False, help="URL directa del endpoint de síntesis (opcional)")
p.add_argument("--api-key", required=False, help="Valor para autorización (se envía como header Authorization: Bearer <key>)")
p.add_argument("--voice", default="em_alex")
p.add_argument("--model", default="model")
p.add_argument("--out", required=True, help="Ruta de salida WAV final")
p.add_argument("--video", required=False, help="Ruta al vídeo original (opcional)")
p.add_argument("--align", action="store_true", help="Alinear segmentos con timestamps del SRT")
p.add_argument("--keep-chunks", action="store_true")
p.add_argument("--mix-with-original", action="store_true")
p.add_argument("--mix-background-volume", type=float, default=0.2)
p.add_argument("--replace-original", action="store_true")
args = p.parse_args()
# Construir cliente Kokoro HTTP y delegar la síntesis completa
endpoint = args.endpoint or os.environ.get("KOKORO_ENDPOINT")
api_key = args.api_key or os.environ.get("KOKORO_API_KEY")
if not endpoint:
print("Debe proporcionar --endpoint o la variable de entorno KOKORO_ENDPOINT", file=sys.stderr)
sys.exit(2)
client = KokoroHttpClient(endpoint, api_key=api_key, voice=args.voice, model=args.model)
try:
client.synthesize_from_srt(
srt_path=args.srt,
out_wav=args.out,
video=args.video,
align=args.align,
keep_chunks=args.keep_chunks,
mix_with_original=args.mix_with_original,
mix_background_volume=args.mix_background_volume,
)
print(f"Archivo final generado en: {args.out}")
except Exception as e:
print(f"Error durante la síntesis desde SRT: {e}", file=sys.stderr)
sys.exit(1)
def main():
p = argparse.ArgumentParser()
p.add_argument("--srt", required=True, help="Ruta al archivo .srt traducido")
p.add_argument("--openapi", required=False, help="URL al openapi.json de Kokoro (intenta autodetectar endpoint)")
p.add_argument("--endpoint", required=False, help="URL directa del endpoint de síntesis (usa esto si autodetección falla)")
p.add_argument(
"--payload-template",
required=False,
help='Plantilla JSON para el payload con {text} como placeholder, ejemplo: "{\"text\": \"{text}\", \"voice\": \"alloy\"}"',
)
p.add_argument("--api-key", required=False, help="Valor para autorización (se envía como header Authorization: Bearer <key>)")
p.add_argument("--voice", required=False, help="Nombre de voz si aplica (se añade al payload si se usa template)")
p.add_argument("--out", required=True, help="Ruta de salida WAV final")
p.add_argument(
"--video",
required=False,
help="Ruta al vídeo original (necesario si quieres mezclar el audio con la pista original).",
)
p.add_argument(
"--mix-with-original",
action="store_true",
help="Mezclar el WAV generado con la pista de audio original del vídeo (usa --video).",
)
p.add_argument(
"--mix-background-volume",
type=float,
default=0.2,
help="Volumen de la pista original al mezclar (0.0-1.0), por defecto 0.2",
)
p.add_argument(
"--replace-original",
action="store_true",
help="Reemplazar la pista de audio del vídeo original por el WAV generado (usa --video).",
)
p.add_argument(
"--align",
action="store_true",
help="Generar silencios para alinear segmentos con los timestamps del SRT (inserta gaps entre segmentos).",
)
p.add_argument(
"--keep-chunks",
action="store_true",
help="Conservar los WAV de cada segmento en el directorio temporal (útil para debugging).",
)
args = p.parse_args()
headers = {"Accept": "*/*"}
if args.api_key:
headers["Authorization"] = f"Bearer {args.api_key}"
endpoint = args.endpoint
if not endpoint and args.openapi:
print("Intentando detectar endpoint desde openapi.json...")
endpoint = find_synthesis_endpoint(args.openapi)
if endpoint:
print(f"Usando endpoint detectado: {endpoint}")
else:
print("No se detectó endpoint automáticamente. Pasa --endpoint o --payload-template.")
sys.exit(1)
if not endpoint:
print("Debes proporcionar --endpoint o --openapi para que el script funcione.")
sys.exit(1)
subs = parse_srt_file(args.srt)
tmpdir = tempfile.mkdtemp(prefix="srt_kokoro_")
chunk_files = []
print(f"Sintetizando {len(subs)} segmentos...")
prev_end = 0.0
for i, sub in enumerate(subs, start=1):
text = re.sub(r"\s+", " ", sub.content.strip())
if not text:
prev_end = sub.end.total_seconds()
continue
start_sec = sub.start.total_seconds()
end_sec = sub.end.total_seconds()
duration = end_sec - start_sec
# if align requested, insert silence for gap between previous end and current start
if args.align:
gap = start_sec - prev_end
if gap > 0.01:
sil_target = os.path.join(tmpdir, f"sil_{i:04d}.wav")
create_silence(gap, sil_target)
chunk_files.append(sil_target)
try:
raw = synth_chunk(endpoint, text, headers, args.payload_template)
except Exception as e:
print(f"Error al sintetizar segmento {i}: {e}")
prev_end = end_sec
continue
target = os.path.join(tmpdir, f"chunk_{i:04d}.wav")
convert_and_save(raw, target)
# If align: pad or trim to subtitle duration, otherwise keep raw chunk
if args.align:
aligned = os.path.join(tmpdir, f"chunk_{i:04d}.aligned.wav")
pad_or_trim_wav(target, aligned, duration)
# replace target with aligned file in list
chunk_files.append(aligned)
# remove original raw chunk unless keep-chunks
if not args.keep_chunks:
try:
os.remove(target)
except Exception:
pass
else:
chunk_files.append(target)
prev_end = end_sec
print(f" - Segmento {i}/{len(subs)} -> {os.path.basename(chunk_files[-1])}")
if not chunk_files:
print("No se generaron fragmentos de audio. Abortando.")
shutil.rmtree(tmpdir, ignore_errors=True)
sys.exit(1)
print("Concatenando fragments...")
concat_chunks(chunk_files, args.out)
print(f"Archivo final generado en: {args.out}")
# Si el usuario pidió mezclar con la pista original del vídeo
if args.mix_with_original:
if not args.video:
print("--mix-with-original requiere que pases --video con la ruta del vídeo original.")
else:
# extraer audio del vídeo original a wav temporal (mono 22050)
orig_tmp = os.path.join(tempfile.gettempdir(), f"orig_audio_{os.getpid()}.wav")
mixed_tmp = os.path.join(tempfile.gettempdir(), f"mixed_audio_{os.getpid()}.wav")
try:
cmd_ext = [
"ffmpeg",
"-y",
"-i",
args.video,
"-vn",
"-ar",
"22050",
"-ac",
"1",
"-sample_fmt",
"s16",
orig_tmp,
]
subprocess.run(cmd_ext, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
# Mezclar: new audio (args.out) en primer plano, original a volumen reducido
vol = float(args.mix_background_volume)
# construir filtro: [0:a]volume=1[a1];[1:a]volume=vol[a0];[a1][a0]amix=inputs=2:duration=first:weights=1 vol [mix]
filter_complex = f"[0:a]volume=1[a1];[1:a]volume={vol}[a0];[a1][a0]amix=inputs=2:duration=first:weights=1 {vol}[mix]"
# usar ffmpeg para mezclar y generar mixed_tmp
cmd_mix = [
"ffmpeg",
"-y",
"-i",
args.out,
"-i",
orig_tmp,
"-filter_complex",
f"[0:a]volume=1[a1];[1:a]volume={vol}[a0];[a1][a0]amix=inputs=2:duration=first:dropout_transition=0[mix]",
"-map",
"[mix]",
"-c:a",
"pcm_s16le",
mixed_tmp,
]
subprocess.run(cmd_mix, check=True)
# reemplazar args.out con mixed_tmp
shutil.move(mixed_tmp, args.out)
print(f"Archivo mezclado generado en: {args.out}")
except subprocess.CalledProcessError as e:
print(f"Error al mezclar audio con la pista original: {e}")
finally:
try:
if os.path.exists(orig_tmp):
os.remove(orig_tmp)
except Exception:
pass
# Si se solicita reemplazar la pista original en el vídeo
if args.replace_original:
if not args.video:
print("--replace-original requiere que pases --video con la ruta del vídeo original.")
else:
out_video = os.path.splitext(args.video)[0] + ".replaced_audio.mp4"
try:
cmd_rep = [
"ffmpeg",
"-y",
"-i",
args.video,
"-i",
args.out,
"-map",
"0:v:0",
"-map",
"1:a:0",
"-c:v",
"copy",
"-c:a",
"aac",
"-b:a",
"192k",
out_video,
]
subprocess.run(cmd_rep, check=True)
print(f"Vídeo con audio reemplazado generado: {out_video}")
except subprocess.CalledProcessError as e:
print(f"Error al reemplazar audio en el vídeo: {e}")
# limpieza
shutil.rmtree(tmpdir, ignore_errors=True)
if __name__ == '__main__':
main()

View File

@@ -1,890 +1,49 @@
#!/usr/bin/env python3
"""Compat wrapper para transcripción.
Este módulo expone una clase ligera `FasterWhisperTranscriber` que
reutiliza la implementación del adaptador infra (`TranscribeService`).
También reexporta utilidades comunes como `write_srt` y
`dedupe_adjacent_segments` para mantener compatibilidad con código
legacy que importa estas funciones desde `whisper_project.transcribe`.
"""
from __future__ import annotations
from typing import Optional
from whisper_project.infra.transcribe_adapter import TranscribeService
from whisper_project.infra.transcribe import (
write_srt,
dedupe_adjacent_segments,
)
class FasterWhisperTranscriber:
"""Adaptador mínimo que expone la API esperada por código legacy.
Internamente reutiliza `TranscribeService.transcribe_faster`.
"""
def __init__(
self, model: str = "base", compute_type: str = "int8"
) -> None:
self._svc = TranscribeService(
model=model, compute_type=compute_type
)
def transcribe(
self, file: str, *, srt: bool = False, srt_file: Optional[str] = None
):
segments = self._svc.transcribe_faster(file)
if srt and srt_file and segments:
write_srt(segments, srt_file)
return segments
__all__ = [
"FasterWhisperTranscriber",
"TranscribeService",
"write_srt",
"dedupe_adjacent_segments",
]
#!/usr/bin/env python3
"""Transcribe audio usando distintos backends de Whisper.
Soportados: openai-whisper, transformers, faster-whisper
"""
import argparse
import sys
from pathlib import Path
def transcribe_openai_whisper(file: str, model: str):
import whisper
print(f"Cargando openai-whisper modelo={model} en CPU...")
m = whisper.load_model(model, device="cpu")
print("Transcribiendo...")
result = m.transcribe(file, fp16=False)
# openai-whisper devuelve 'segments' con start, end y text
segments = result.get("segments", None)
if segments:
for seg in segments:
print(seg.get("text", ""))
return segments
else:
print(result.get("text", ""))
return None
def transcribe_transformers(file: str, model: str):
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
device = "cpu"
torch_dtype = torch.float32
print(f"Cargando transformers modelo={model} en CPU...")
model_obj = AutoModelForSpeechSeq2Seq.from_pretrained(model, torch_dtype=torch_dtype, low_cpu_mem_usage=True)
model_obj.to(device)
processor = AutoProcessor.from_pretrained(model)
pipe = pipeline(
"automatic-speech-recognition",
model=model_obj,
tokenizer=processor.tokenizer,
feature_extractor=processor.feature_extractor,
device=-1,
)
print("Transcribiendo...")
result = pipe(file)
# result puede ser dict o str dependiendo de la versión
if isinstance(result, dict):
print(result.get("text", ""))
else:
print(result)
# transformers pipeline normalmente no devuelve segmentos temporales
return None
def transcribe_faster_whisper(file: str, model: str, compute_type: str = "int8"):
from faster_whisper import WhisperModel
print(f"Cargando faster-whisper modelo={model} en CPU compute_type={compute_type}...")
model_obj = WhisperModel(model, device="cpu", compute_type=compute_type)
print("Transcribiendo...")
segments_gen, info = model_obj.transcribe(file, beam_size=5)
# faster-whisper may return a generator; convert to list to allow multiple passes
segments = list(segments_gen)
text = "".join([seg.text for seg in segments])
print(text)
# segments es una lista de objetos con .start, .end, .text
return segments
def main():
parser = argparse.ArgumentParser(
description="Transcribe audio usando Whisper (varios backends)"
)
parser.add_argument(
"--file", "-f", required=True, help="Ruta al archivo de audio"
)
parser.add_argument(
"--backend",
"-b",
choices=["openai-whisper", "transformers", "faster-whisper"],
default="faster-whisper",
help="Backend a usar",
)
parser.add_argument(
"--model",
"-m",
default="base",
help="Nombre del modelo (ej: tiny, base)",
)
parser.add_argument(
"--compute-type",
"-c",
default="int8",
help="compute_type para faster-whisper",
)
parser.add_argument(
"--srt",
action="store_true",
help="Generar archivo SRT con timestamps (si el backend lo soporta)",
)
parser.add_argument(
"--srt-file",
default=None,
help=(
"Ruta del archivo SRT de salida. Por defecto: mismo nombre"
" que el audio con extensión .srt"
),
)
parser.add_argument(
"--srt-fallback",
action="store_true",
help=(
"Generar SRT aproximado si backend no devuelve segmentos."
),
)
parser.add_argument(
"--segment-transcribe",
action="store_true",
help=(
"Cuando se usa --srt-fallback, transcribir cada segmento usando"
" archivos temporales para rellenar el texto"
),
)
parser.add_argument(
"--segment-overlap",
type=float,
default=0.2,
help=(
"Superposición en segundos entre segmentos al transcribir por"
" segmentos (por defecto: 0.2)"
),
)
parser.add_argument(
"--srt-segment-seconds",
type=float,
default=10.0,
help=(
"Duración en segundos de cada segmento para el SRT de fallback."
" Por defecto: 10.0"
),
)
parser.add_argument(
"--tts",
action="store_true",
help="Generar audio TTS a partir del texto transcrito",
)
parser.add_argument(
"--tts-model",
default="kokoro",
help="Nombre del modelo TTS a usar (ej: kokoro)",
)
parser.add_argument(
"--tts-model-repo",
default=None,
help=(
"Repo de Hugging Face para el modelo TTS (ej: user/kokoro)."
" Si se especifica, se descargará automáticamente."
),
)
parser.add_argument(
"--dub",
action="store_true",
help=(
"Generar pista doblada (por segmentos) a partir del texto transcrito"
),
)
parser.add_argument(
"--dub-out",
default=None,
help=("Ruta de salida para el audio doblado (WAV). Por defecto: mismo nombre + .dub.wav"),
)
parser.add_argument(
"--dub-mode",
choices=["replace", "mix"],
default="replace",
help=("Modo de doblaje: 'replace' reemplaza voz original por TTS; 'mix' mezcla ambas pistas"),
)
parser.add_argument(
"--dub-mix-level",
type=float,
default=0.75,
help=("Cuando --dub-mode=mix, nivel de volumen del TTS relativo (0-1)."),
)
args = parser.parse_args()
path = Path(args.file)
if not path.exists():
print(f"Archivo no encontrado: {args.file}", file=sys.stderr)
sys.exit(2)
# Shortcut: si el usuario solo quiere SRT de fallback sin transcribir
# por segmentos, no necesitamos cargar ningún backend (evita errores
# si faster-whisper/whisper no están instalados).
if args.srt and args.srt_fallback and not args.segment_transcribe:
duration = get_audio_duration(args.file)
if duration is None:
print(
"No se pudo obtener duración; no se puede generar SRT de fallback.",
file=sys.stderr,
)
sys.exit(4)
fallback_segments = make_uniform_segments(duration, args.srt_segment_seconds)
srt_file_arg = args.srt_file
srt_path = (
srt_file_arg
if srt_file_arg
else str(path.with_suffix('.srt'))
)
# crear segmentos vacíos
filled_segments = [
{"start": s["start"], "end": s["end"], "text": ""}
for s in fallback_segments
]
write_srt(filled_segments, srt_path)
print(f"SRT de fallback guardado en: {srt_path}")
sys.exit(0)
try:
segments = None
if args.backend == "openai-whisper":
segments = transcribe_openai_whisper(args.file, args.model)
elif args.backend == "transformers":
segments = transcribe_transformers(args.file, args.model)
else:
segments = transcribe_faster_whisper(
args.file, args.model, compute_type=args.compute_type
)
# Si se pide SRT y tenemos segmentos, escribir archivo SRT
if args.srt:
if segments:
# determinar nombre del srt
# determinar nombre del srt
srt_file_arg = args.srt_file
srt_path = (
srt_file_arg
if srt_file_arg
else str(path.with_suffix('.srt'))
)
segments_to_write = dedupe_adjacent_segments(segments)
write_srt(segments_to_write, srt_path)
print(f"SRT guardado en: {srt_path}")
else:
if args.srt_fallback:
# intentar generar SRT aproximado
duration = get_audio_duration(args.file)
if duration is None:
print(
"No se pudo obtener duración;"
" no se puede generar SRT de fallback.",
file=sys.stderr,
)
sys.exit(4)
fallback_segments = make_uniform_segments(
duration, args.srt_segment_seconds
)
# Para cada segmento intentamos obtener transcripción
# parcial.
filled_segments = []
if args.segment_transcribe:
# extraer cada segmento a un archivo temporal
# y transcribir
filled = transcribe_segmented_with_tempfiles(
args.file,
fallback_segments,
backend=args.backend,
model=args.model,
compute_type=args.compute_type,
overlap=args.segment_overlap,
)
filled_segments = filled
else:
for seg in fallback_segments:
seg_obj = {
"start": seg["start"],
"end": seg["end"],
"text": "",
}
filled_segments.append(seg_obj)
srt_file_arg = args.srt_file
srt_path = (
srt_file_arg
if srt_file_arg
else str(path.with_suffix('.srt'))
)
segments_to_write = dedupe_adjacent_segments(
filled_segments
)
write_srt(segments_to_write, srt_path)
print(f"SRT de fallback guardado en: {srt_path}")
print(
"Nota: para SRT con texto, habilite transcripción"
" por segmento o use un backend que devuelva"
" segmentos."
)
sys.exit(0)
else:
print(
"El backend elegido no devolvió segmentos temporales;"
" no se puede generar SRT.",
file=sys.stderr,
)
sys.exit(3)
except Exception as e:
print(f"Error durante la transcripción: {e}", file=sys.stderr)
sys.exit(1)
# Bloque TTS: sintetizar texto completo si se solicitó
if args.tts:
# si se especificó un repo, asegurar modelo descargado
if args.tts_model_repo:
model_path = ensure_tts_model(args.tts_model_repo)
# usar la ruta local como modelo
args.tts_model = model_path
all_text = None
if segments:
all_text = "\n".join(
[
s.get("text", "") if isinstance(s, dict) else s.text
for s in segments
]
)
if all_text:
tts_out = str(path.with_suffix(".tts.wav"))
ok = tts_synthesize(
all_text, tts_out, model=args.tts_model
)
if ok:
print(f"TTS guardado en: {tts_out}")
else:
print(
"Error al sintetizar TTS; comprueba dependencias.",
file=sys.stderr,
)
sys.exit(5)
# Bloque de doblaje por segmentos: sintetizar cada segmento y generar
# un archivo WAV concatenado con la pista doblada. El audio resultante
# mantiene la duración de los segmentos originales (paddings/recortes
# simples) para poder reemplazar o mezclar con la pista original.
if args.dub:
# decidir ruta de salida
dub_out = (
args.dub_out
if args.dub_out
else str(Path(args.file).with_suffix(".dub.wav"))
)
# si no tenemos segmentos, intentar fallback con transcripción por segmentos
use_segments = segments
if not use_segments:
duration = get_audio_duration(args.file)
if duration is None:
print(
"No se pudo obtener la duración del audio; no se puede doblar.",
file=sys.stderr,
)
sys.exit(6)
fallback_segments = make_uniform_segments(duration, args.srt_segment_seconds)
if args.segment_transcribe:
print("Obteniendo transcripciones por segmento para doblaje...")
use_segments = transcribe_segmented_with_tempfiles(
args.file,
fallback_segments,
backend=args.backend,
model=args.model,
compute_type=args.compute_type,
overlap=args.segment_overlap,
)
else:
# crear segmentos vacíos (no tiene texto)
use_segments = [
{"start": s["start"], "end": s["end"], "text": ""}
for s in fallback_segments
]
# asegurar modelo TTS local si se indicó repo
if args.tts_model_repo:
model_path = ensure_tts_model(args.tts_model_repo)
args.tts_model = model_path
ok = synthesize_dubbed_audio(
src_audio=args.file,
segments=use_segments,
tts_model=args.tts_model,
out_path=dub_out,
mode=args.dub_mode,
mix_level=args.dub_mix_level,
)
if ok:
print(f"Audio doblado guardado en: {dub_out}")
else:
print("Error generando audio doblado.", file=sys.stderr)
sys.exit(7)
def _format_timestamp(seconds: float) -> str:
"""Formatea segundos en timestamp SRT hh:mm:ss,mmm"""
millis = int((seconds - int(seconds)) * 1000)
h = int(seconds // 3600)
m = int((seconds % 3600) // 60)
s = int(seconds % 60)
return f"{h:02d}:{m:02d}:{s:02d},{millis:03d}"
def write_srt(segments, out_path: str):
"""Escribe una lista de segmentos en formato SRT.
segments: iterable de objetos o dicts con .start, .end y .text
"""
lines = []
for i, seg in enumerate(segments, start=1):
# soportar objetos con atributos o dicts
if hasattr(seg, "start"):
start = float(seg.start)
end = float(seg.end)
text = seg.text if hasattr(seg, "text") else str(seg)
else:
start = float(seg.get("start", 0.0))
end = float(seg.get("end", 0.0))
text = seg.get("text", "")
start_ts = _format_timestamp(start)
end_ts = _format_timestamp(end)
lines.append(str(i))
lines.append(f"{start_ts} --> {end_ts}")
# normalize text newlines
for line in str(text).strip().splitlines():
lines.append(line)
lines.append("")
Path(out_path).write_text("\n".join(lines), encoding="utf-8")
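# Uso orientativo de write_srt (boceto; nombres de fichero de ejemplo):
# >>> segs = [{"start": 0.0, "end": 2.5, "text": "Hola"},
# ...         {"start": 2.5, "end": 5.0, "text": "¿Qué tal?"}]
# >>> write_srt(segs, "demo.srt")
# demo.srt contendrá bloques numerados del tipo:
#   1
#   00:00:00,000 --> 00:00:02,500
#   Hola
#   (línea en blanco)
#   2
#   00:00:02,500 --> 00:00:05,000
#   ¿Qué tal?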
def dedupe_adjacent_segments(segments):
"""Eliminar duplicados simples entre segmentos adyacentes.
Estrategia simple: si el final de un segmento y el inicio del
siguiente comparten una secuencia de palabras, eliminamos la
duplicación del inicio del siguiente.
"""
if not segments:
return segments
# Normalize incoming segments to a list of dicts with keys start,end,text
norm = []
for s in segments:
if hasattr(s, "start"):
norm.append({"start": float(s.start), "end": float(s.end), "text": getattr(s, "text", "")})
else:
# assume mapping-like
norm.append({"start": float(s.get("start", 0.0)), "end": float(s.get("end", 0.0)), "text": s.get("text", "")})
out = [norm[0].copy()]
for seg in norm[1:]:
prev = out[-1]
a = (prev.get("text") or "").strip()
b = (seg.get("text") or "").strip()
if not a or not b:
out.append(seg.copy())
continue
# tokenizar en palabras (espacios) y buscar la mayor superposición
a_words = a.split()
b_words = b.split()
max_ol = 0
max_k = min(len(a_words), len(b_words), 10)
for k in range(1, max_k + 1):
if a_words[-k:] == b_words[:k]:
max_ol = k
if max_ol > 0:
# quitar las primeras max_ol palabras de b
new_b = " ".join(b_words[max_ol:]).strip()
new_seg = seg.copy()
new_seg["text"] = new_b
out.append(new_seg)
else:
out.append(seg.copy())
return out
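# Ejemplo ilustrativo del solapamiento que elimina dedupe_adjacent_segments:
# >>> segs = [{"start": 0.0, "end": 4.0, "text": "vamos a la tienda"},
# ...         {"start": 4.0, "end": 8.0, "text": "la tienda está cerrada"}]
# >>> [s["text"] for s in dedupe_adjacent_segments(segs)]
# ['vamos a la tienda', 'está cerrada']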
def get_audio_duration(file_path: str):
"""Obtiene la duración del audio en segundos usando ffprobe.
Devuelve float (segundos) o None si no se puede obtener.
"""
try:
import subprocess
cmd = [
"ffprobe",
"-v",
"error",
"-show_entries",
"format=duration",
"-of",
"default=noprint_wrappers=1:nokey=1",
file_path,
]
out = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
return float(out.strip())
except Exception:
return None
def make_uniform_segments(duration: float, seg_seconds: float):
"""Genera una lista de segmentos uniformes [{start, end}, ...]."""
segments = []
if duration <= 0 or seg_seconds <= 0:
return segments
start = 0.0
idx = 0
while start < duration:
end = min(start + seg_seconds, duration)
segments.append({"start": round(start, 3), "end": round(end, 3)})
idx += 1
start = end
return segments
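# Ejemplo ilustrativo:
# >>> make_uniform_segments(10.0, 4.0)
# [{'start': 0.0, 'end': 4.0}, {'start': 4.0, 'end': 8.0}, {'start': 8.0, 'end': 10.0}]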
def transcribe_segmented_with_tempfiles(
src_file: str,
segments: list,
backend: str = "faster-whisper",
model: str = "base",
compute_type: str = "int8",
overlap: float = 0.2,
):
"""Recorta `src_file` en segmentos y transcribe cada uno.
Retorna lista de dicts {'start','end','text'} para cada segmento.
"""
import subprocess
import tempfile
results = []
for seg in segments:
start = max(0.0, float(seg["start"]) - overlap)
end = float(seg["end"]) + overlap
duration = end - start
with tempfile.NamedTemporaryFile(suffix=".wav", delete=True) as tmp:
tmp_path = tmp.name
cmd = [
"ffmpeg",
"-y",
"-ss",
str(start),
"-t",
str(duration),
"-i",
src_file,
"-ar",
"16000",
"-ac",
"1",
tmp_path,
]
try:
subprocess.run(
cmd,
check=True,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
)
except Exception:
# si falla el recorte, dejar texto vacío
results.append(
{"start": seg["start"], "end": seg["end"], "text": ""}
)
continue
# transcribir tmp_path con el backend
try:
if backend == "openai-whisper":
import whisper
m = whisper.load_model(model, device="cpu")
res = m.transcribe(tmp_path, fp16=False)
text = res.get("text", "")
elif backend == "transformers":
# pipeline de transformers
import torch
from transformers import (
AutoModelForSpeechSeq2Seq,
AutoProcessor,
pipeline,
)
torch_dtype = torch.float32
model_obj = AutoModelForSpeechSeq2Seq.from_pretrained(
model, torch_dtype=torch_dtype, low_cpu_mem_usage=True
)
model_obj.to("cpu")
processor = AutoProcessor.from_pretrained(model)
pipe = pipeline(
"automatic-speech-recognition",
model=model_obj,
tokenizer=processor.tokenizer,
feature_extractor=processor.feature_extractor,
device=-1,
)
out = pipe(tmp_path)
text = out["text"] if isinstance(out, dict) else str(out)
else:
# faster-whisper
from faster_whisper import WhisperModel
wmodel = WhisperModel(
model, device="cpu", compute_type=compute_type
)
segs_gen, info = wmodel.transcribe(tmp_path, beam_size=5)
segs = list(segs_gen)
text = "".join([s.text for s in segs])
except Exception:
text = ""
results.append(
{"start": seg["start"], "end": seg["end"], "text": text}
)
return results
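# Uso orientativo (boceto; requiere ffmpeg en PATH y el backend elegido instalado;
# las rutas son de ejemplo):
# segs = make_uniform_segments(get_audio_duration("audio.wav") or 0.0, 30.0)
# resultados = transcribe_segmented_with_tempfiles(
#     "audio.wav", segs, backend="faster-whisper", model="base", overlap=0.2
# )
# # resultados -> [{'start': ..., 'end': ..., 'text': '...'}, ...]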
def tts_synthesize(text: str, out_path: str, model: str = "kokoro"):
"""Sintetiza `text` a `out_path` usando Coqui TTS si está disponible,
o pyttsx3 como fallback simple.
"""
try:
# Intentar Coqui TTS
from TTS.api import TTS
# El usuario debe tener el modelo descargado o especificar el id
tts = TTS(model_name=model, progress_bar=False, gpu=False)
tts.tts_to_file(text=text, file_path=out_path)
return True
except Exception:
try:
# Fallback a pyttsx3 (menos natural, offline)
import pyttsx3
engine = pyttsx3.init()
engine.save_to_file(text, out_path)
engine.runAndWait()
return True
except Exception:
return False
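# Uso orientativo (boceto; con el valor por defecto se intentará Coqui TTS y,
# si no está disponible o el modelo no existe, se caerá al fallback con pyttsx3):
# ok = tts_synthesize("Hola, esto es una prueba.", "prueba.wav", model="kokoro")
# if not ok:
#     print("Instala TTS (Coqui) o pyttsx3 para poder sintetizar audio.")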
def ensure_tts_model(repo_id: str):
"""Descarga un repo de Hugging Face y devuelve la ruta local.
Usa huggingface_hub.snapshot_download. Si la descarga falla, devuelve
el repo_id tal cual (se intentará usar como id remoto).
"""
try:
from huggingface_hub import snapshot_download
print(f"Descargando modelo TTS desde: {repo_id} ...")
try:
# intentar descarga explícita como 'model' (útil para ids con '/').
local_dir = snapshot_download(repo_id, repo_type="model")
except Exception:
# fallback al comportamiento por defecto
local_dir = snapshot_download(repo_id)
print(f"Modelo descargado en: {local_dir}")
return local_dir
except Exception as e:
print(f"No se pudo descargar el modelo {repo_id}: {e}")
return repo_id
def _pad_or_trim_wav(in_path: str, out_path: str, target_duration: float):
"""Pad or trim `in_path` WAV to `target_duration` seconds using ffmpeg.
Creates `out_path` with exactly target_duration seconds. If input is
shorter, pads with silence; if longer, trims.
"""
import subprocess
# ffmpeg -y -i in.wav -af apad=pad_dur=...,atrim=duration=... -ar 16000 -ac 1 out.wav
try:
# Use apad then atrim to ensure exact duration
cmd = [
"ffmpeg",
"-y",
"-i",
in_path,
"-af",
f"apad=pad_dur={max(0, target_duration)}",
"-t",
f"{target_duration}",
"-ar",
"16000",
"-ac",
"1",
out_path,
]
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
return True
except Exception:
return False
def synthesize_segment_tts(text: str, model: str, dur: float, out_wav: str) -> bool:
"""Sintetiza `text` en `out_wav` y ajusta su duración a `dur` segundos.
- Primero genera un WAV temporal con `tts_synthesize`.
- Luego lo pad/recorta a `dur` usando ffmpeg.
"""
import tempfile
import os
try:
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
tmp_path = tmp.name
ok = tts_synthesize(text, tmp_path, model=model)
if not ok:
# cleanup
try:
os.remove(tmp_path)
except Exception:
pass
return False
# ajustar duración
adjusted = _pad_or_trim_wav(tmp_path, out_wav, target_duration=dur)
try:
os.remove(tmp_path)
except Exception:
pass
return adjusted
except Exception:
return False
def synthesize_dubbed_audio(
src_audio: str,
segments: list,
tts_model: str,
out_path: str,
mode: str = "replace",
mix_level: float = 0.75,
):
"""Genera una pista doblada a partir de `segments` y el audio fuente.
- segments: lista de dicts con 'start','end','text' (en segundos).
- mode: 'replace' (devuelve solo TTS concatenado) o 'mix' (mezcla TTS y original).
- mix_level: volumen relativo del TTS cuando se mezcla (0-1).
Retorna True si se generó correctamente `out_path`.
"""
import tempfile
import os
import subprocess
# Normalizar segmentos a lista de dicts {'start','end','text'}
norm_segments = []
for s in segments:
if hasattr(s, "start"):
norm_segments.append({"start": float(s.start), "end": float(s.end), "text": getattr(s, "text", "")})
else:
norm_segments.append({"start": float(s.get("start", 0.0)), "end": float(s.get("end", 0.0)), "text": s.get("text", "")})
# crear carpeta temporal para segmentos TTS
with tempfile.TemporaryDirectory() as tmpdir:
tts_segment_paths = []
for i, seg in enumerate(norm_segments):
start = float(seg.get("start", 0.0))
end = float(seg.get("end", start))
dur = max(0.001, end - start)
text = (seg.get("text") or "").strip()
out_seg = os.path.join(tmpdir, f"seg_{i:04d}.wav")
if not text:
# crear silencio de duración dur
try:
cmd = [
"ffmpeg",
"-y",
"-f",
"lavfi",
"-i",
f"anullsrc=channel_layout=mono:sample_rate=16000",
"-t",
f"{dur}",
"-ar",
"16000",
"-ac",
"1",
out_seg,
]
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
tts_segment_paths.append(out_seg)
except Exception:
return False
continue
ok = synthesize_segment_tts(text, tts_model, dur, out_seg)
if not ok:
return False
tts_segment_paths.append(out_seg)
# crear lista de concatenación
concat_list = os.path.join(tmpdir, "concat.txt")
with open(concat_list, "w", encoding="utf-8") as f:
for p in tts_segment_paths:
f.write(f"file '{p}'\n")
# concatenar segmentos en un WAV final temporal
final_tmp = os.path.join(tmpdir, "tts_full.wav")
try:
cmd = [
"ffmpeg",
"-y",
"-f",
"concat",
"-safe",
"0",
"-i",
concat_list,
"-c",
"copy",
final_tmp,
]
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
except Exception:
return False
# si el modo es replace, mover final_tmp a out_path (con conversión si es necesario)
try:
if mode == "replace":
# convertir a WAV 16k mono si no lo está
cmd = [
"ffmpeg",
"-y",
"-i",
final_tmp,
"-ar",
"16000",
"-ac",
"1",
out_path,
]
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
return True
# modo mix: mezclar pista TTS con la original en out_path
# ajustar volumen del TTS
# ffmpeg -i original -i tts -filter_complex "[1:a]volume=LEVEL[a1];[0:a][a1]amix=inputs=2:normalize=0[out]" -map "[out]" out.wav
tts_level = float(max(0.0, min(1.0, mix_level)))
cmd = [
"ffmpeg",
"-y",
"-i",
src_audio,
"-i",
final_tmp,
"-filter_complex",
f"[1:a]volume={tts_level}[a1];[0:a][a1]amix=inputs=2:duration=longest:dropout_transition=0",
"-ar",
"16000",
"-ac",
"1",
out_path,
]
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
return True
except Exception:
return False
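# Uso orientativo de synthesize_dubbed_audio (boceto; rutas y textos de ejemplo,
# requiere ffmpeg en PATH):
# segs = [
#     {"start": 0.0, "end": 3.0, "text": "Hola"},
#     {"start": 3.0, "end": 6.0, "text": "Adiós"},
# ]
# ok = synthesize_dubbed_audio(
#     src_audio="original.wav", segments=segs, tts_model="kokoro",
#     out_path="doblado.wav", mode="mix", mix_level=0.8,
# )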
if __name__ == "__main__":
main()

View File

@@ -1,84 +1,42 @@
#!/usr/bin/env python3
"""translate_srt_argos.py
Traduce un .srt localmente usando Argos Translate (más ligero que transformers/torch).
Instala automáticamente el paquete en caso de no existir.
Uso:
source .venv/bin/activate
python3 whisper_project/translate_srt_argos.py --in in.srt --out out.srt
Requisitos: argostranslate (el script intentará instalarlo si no está presente)
"""
from __future__ import annotations
import argparse
import srt
import tempfile
import os
try:
from argostranslate import package, translate
except Exception:
raise
def ensure_en_es_package():
installed = package.get_installed_packages()
for p in installed:
if p.from_code == 'en' and p.to_code == 'es':
return True
# Si no está instalado, buscar disponible y descargar
avail = package.get_available_packages()
for p in avail:
if p.from_code == 'en' and p.to_code == 'es':
print('Descargando paquete Argos en->es...')
download_path = tempfile.mktemp(suffix='.zip')
try:
import requests
with requests.get(p.download_url, stream=True, timeout=60) as r:
r.raise_for_status()
with open(download_path, 'wb') as fh:
for chunk in r.iter_content(chunk_size=8192):
if chunk:
fh.write(chunk)
# instalar desde el zip descargado
package.install_from_path(download_path)
return True
except Exception as e:
print(f"Error descargando/instalando paquete Argos: {e}")
finally:
try:
if os.path.exists(download_path):
os.remove(download_path)
except Exception:
pass
return False
def translate_srt(in_path: str, out_path: str):
with open(in_path, 'r', encoding='utf-8') as fh:
subs = list(srt.parse(fh.read()))
# Asegurar paquete en->es
ok = ensure_en_es_package()
if not ok:
raise SystemExit('No se encontró paquete Argos en->es y no se pudo descargar')
for i, sub in enumerate(subs, start=1):
text = sub.content.strip()
if not text:
continue
tr = translate.translate(text, 'en', 'es')
sub.content = tr
print(f'Translated {i}/{len(subs)}')
with open(out_path, 'w', encoding='utf-8') as fh:
fh.write(srt.compose(subs))
print(f'Wrote translated SRT to: {out_path}')
if __name__ == '__main__':
p = argparse.ArgumentParser()
p.add_argument('--in', dest='in_srt', required=True)
p.add_argument('--out', dest='out_srt', required=True)
args = p.parse_args()
translate_srt(args.in_srt, args.out_srt)

#!/usr/bin/env python3
"""Shim: translate_srt_argos
Delegates to `whisper_project.infra.argos_adapter.ArgosTranslator.translate_srt`
if available; otherwise runs `examples/translate_srt_argos.py` as fallback.
"""
import argparse
import subprocess
import sys
def main():
p = argparse.ArgumentParser(prog="translate_srt_argos")
p.add_argument("--in", dest="in_srt", required=True)
p.add_argument("--out", dest="out_srt", required=True)
args = p.parse_args()
try:
from whisper_project.infra.argos_adapter import ArgosTranslator
t = ArgosTranslator()
t.translate_srt(args.in_srt, args.out_srt)
return
except Exception:
try:
script = "examples/translate_srt_argos.py"
cmd = [sys.executable, script, "--in", args.in_srt, "--out", args.out_srt]
subprocess.run(cmd, check=True)
return
except Exception as e:
print("Error: no se pudo ejecutar Argos Translate:", e, file=sys.stderr)
sys.exit(1)
if __name__ == "__main__":
sys.exit(main() or 0)
# The deprecated block has been removed.
# Use whisper_project.infra.argos_adapter for programmatic access.

View File

@@ -1,57 +1,41 @@
#!/usr/bin/env python3
"""translate_srt_local.py
Traduce un .srt localmente usando MarianMT (Helsinki-NLP/opus-mt-en-es).
Uso:
source .venv/bin/activate
python3 whisper_project/translate_srt_local.py --in path/to/in.srt --out path/to/out.srt
Requisitos: transformers, sentencepiece, srt
"""
from __future__ import annotations
import argparse
import srt
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
def translate_srt(in_path: str, out_path: str, model_name: str = "Helsinki-NLP/opus-mt-en-es", batch_size: int = 8):
with open(in_path, "r", encoding="utf-8") as f:
subs = list(srt.parse(f.read()))
# Cargar modelo y tokenizador
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
texts = [sub.content.strip() for sub in subs]
translated = []
for i in range(0, len(texts), batch_size):
batch = texts[i:i+batch_size]
# tokenizar
enc = tok(batch, return_tensors="pt", padding=True, truncation=True)
outs = model.generate(**enc, max_length=512)
outs_decoded = tok.batch_decode(outs, skip_special_tokens=True)
translated.extend(outs_decoded)
# Asignar traducidos
for sub, t in zip(subs, translated):
sub.content = t.strip()
with open(out_path, "w", encoding="utf-8") as f:
f.write(srt.compose(subs))
print(f"SRT traducido guardado en: {out_path}")
def main():
p = argparse.ArgumentParser()
p.add_argument("--in", dest="in_srt", required=True)
p.add_argument("--out", dest="out_srt", required=True)
p.add_argument("--model", default="Helsinki-NLP/opus-mt-en-es")
p.add_argument("--batch-size", dest="batch_size", type=int, default=8)
args = p.parse_args()
translate_srt(args.in_srt, args.out_srt, model_name=args.model, batch_size=args.batch_size)
if __name__ == '__main__':
main()

#!/usr/bin/env python3
"""Shim: translate_srt_local
Delegates to `whisper_project.infra.marian_adapter.MarianTranslator.translate_srt`
if available; otherwise falls back to running the script in `examples/`.
"""
import argparse
import subprocess
import sys
def main():
p = argparse.ArgumentParser(prog="translate_srt_local")
p.add_argument("--in", dest="in_srt", required=True)
p.add_argument("--out", dest="out_srt", required=True)
args = p.parse_args()
try:
# Prefer the infra adapter when available
from whisper_project.infra.marian_adapter import MarianTranslator
t = MarianTranslator()
t.translate_srt(args.in_srt, args.out_srt)
return
except Exception:
# Fallback: run the examples script if present
try:
script = "examples/translate_srt_local.py"
cmd = [sys.executable, script, "--in", args.in_srt, "--out", args.out_srt]
subprocess.run(cmd, check=True)
return
except Exception as e:
print("Error: no se pudo ejecutar la traducción local:", e, file=sys.stderr)
sys.exit(1)
if __name__ == "__main__":
sys.exit(main() or 0)

View File

@@ -1,139 +1,42 @@
#!/usr/bin/env python3
"""translate_srt_with_gemini.py
Lee un .srt, traduce cada bloque de texto con Gemini (Google Generative API) y
escribe un nuevo .srt manteniendo índices y timestamps.
Uso:
export GEMINI_API_KEY="..."
.venv/bin/python whisper_project/translate_srt_with_gemini.py \
--in whisper_project/dailyrutines.kokoro.dub.srt \
--out whisper_project/dailyrutines.kokoro.dub.es.srt \
--model gemini-2.5-flash
Si no pasas --gemini-api-key, se usará la variable de entorno GEMINI_API_KEY.
"""
from __future__ import annotations
import argparse
import json
import os
import time
from typing import List
import requests
import srt
# Intentar usar la librería oficial si está instalada (mejor compatibilidad)
try:
import google.generativeai as genai # type: ignore
except Exception:
genai = None
def translate_text_google_gl(text: str, api_key: str, model: str = "gemini-2.5-flash") -> str:
"""Llamada a la API Generative Language de Google (generateContent).
Devuelve el texto traducido (o el texto original si falla).
"""
if not api_key:
raise ValueError("gemini api key required")
# Si la librería oficial está disponible, usarla (maneja internamente los endpoints)
if genai is not None:
try:
genai.configure(api_key=api_key)
model_obj = genai.GenerativeModel(model)
# la librería acepta un prompt simple o lista; pedimos texto traducido explícitamente
prompt = f"Traduce al español el siguiente texto y devuelve solo el texto traducido:\n\n{text}"
resp = model_obj.generate_content(prompt, generation_config={"max_output_tokens": 1024, "temperature": 0.0})
# resp.text está disponible en la respuesta wrapper
if hasattr(resp, "text") and resp.text:
return resp.text.strip()
# fallback: revisar candidates
if hasattr(resp, "candidates") and resp.candidates:
c = resp.candidates[0]
if hasattr(c, "content") and hasattr(c.content, "parts"):
parts = [p.text for p in c.content.parts if getattr(p, "text", None)]
if parts:
return "\n".join(parts).strip()
except Exception as e:
print(f"Warning: genai library translate failed: {e}")
# Fallback HTTP (legacy/path-variant). Intentamos v1 y v1beta2 según disponibilidad.
for prefix in ("v1", "v1beta2"):
endpoint = (
f"https://generativelanguage.googleapis.com/{prefix}/models/{model}:generateContent?key={api_key}"
)
body = {
"prompt": {"text": f"Traduce al español el siguiente texto y devuelve solo el texto traducido:\n\n{text}"},
"maxOutputTokens": 1024,
"temperature": 0.0,
"candidateCount": 1,
}
try:
r = requests.post(endpoint, json=body, timeout=30)
r.raise_for_status()
j = r.json()
# buscar candidatos
if isinstance(j, dict) and "candidates" in j and isinstance(j["candidates"], list) and j["candidates"]:
first = j["candidates"][0]
if isinstance(first, dict):
if "content" in first and isinstance(first["content"], str):
return first["content"].strip()
if "output" in first and isinstance(first["output"], str):
return first["output"].strip()
if "content" in first and isinstance(first["content"], list):
parts = []
for c in first["content"]:
if isinstance(c, dict) and isinstance(c.get("text"), str):
parts.append(c.get("text"))
if parts:
return "\n".join(parts).strip()
for key in ("output_text", "text", "response", "translated_text"):
if key in j and isinstance(j[key], str):
return j[key].strip()
except Exception as e:
print(f"Warning: GL translate failed ({prefix}): {e}")
return text
def translate_srt_file(in_path: str, out_path: str, api_key: str, model: str):
with open(in_path, "r", encoding="utf-8") as fh:
subs = list(srt.parse(fh.read()))
for i, sub in enumerate(subs, start=1):
text = sub.content.strip()
if not text:
continue
# llamar a la API
try:
translated = translate_text_google_gl(text, api_key, model=model)
except Exception as e:
print(f"Warning: translate failed for index {sub.index}: {e}")
translated = text
# asignar traducido
sub.content = translated
# pequeño delay para no golpear la API demasiado rápido
time.sleep(0.15)
print(f"Translated {i}/{len(subs)}")
out_s = srt.compose(subs)
with open(out_path, "w", encoding="utf-8") as fh:
fh.write(out_s)
print(f"Wrote translated SRT to: {out_path}")
def main():
p = argparse.ArgumentParser()
p.add_argument("--in", dest="in_srt", required=True)
p.add_argument("--out", dest="out_srt", required=True)
p.add_argument("--gemini-api-key", default=None)
p.add_argument("--model", default="gemini-2.5-flash")
args = p.parse_args()
key = args.gemini_api_key or os.environ.get("GEMINI_API_KEY")
if not key:
print("Provide --gemini-api-key or set GEMINI_API_KEY env var", flush=True)
raise SystemExit(2)
translate_srt_file(args.in_srt, args.out_srt, key, args.model)
if __name__ == '__main__':
main()

#!/usr/bin/env python3
"""Shim: translate_srt_with_gemini
Delegates to `whisper_project.infra.gemini_adapter.GeminiTranslator.translate_srt`
or falls back to `examples/translate_srt_with_gemini.py`.
"""
import argparse
import subprocess
import sys
def main():
p = argparse.ArgumentParser(prog="translate_srt_with_gemini")
p.add_argument("--in", dest="in_srt", required=True)
p.add_argument("--out", dest="out_srt", required=True)
p.add_argument("--gemini-api-key", dest="gemini_api_key", required=False, default=None)
args = p.parse_args()
try:
from whisper_project.infra.gemini_adapter import GeminiTranslator
g = GeminiTranslator(api_key=args.gemini_api_key)
g.translate_srt(args.in_srt, args.out_srt)
return
except Exception:
try:
script = "examples/translate_srt_with_gemini.py"
cmd = [sys.executable, script, "--in", args.in_srt, "--out", args.out_srt]
if args.gemini_api_key:
cmd += ["--gemini-api-key", args.gemini_api_key]
subprocess.run(cmd, check=True)
return
except Exception as e:
print("Error: no se pudo ejecutar la traducción con Gemini:", e, file=sys.stderr)
sys.exit(1)
if __name__ == "__main__":
sys.exit(main() or 0)

View File

@@ -0,0 +1,3 @@
from . import orchestrator
__all__ = ["orchestrator"]

View File

@@ -0,0 +1,362 @@
"""Orquestador que compone los adaptadores infra para ejecutar el pipeline.
Proporciona una clase `Orchestrator` con método `run` y soporta modo dry-run
para inspección sin ejecutar los pasos pesados.
"""
from __future__ import annotations
import logging
from pathlib import Path
from typing import Optional
from whisper_project.infra import process_video, transcribe
logger = logging.getLogger(__name__)
class Orchestrator:
"""Orquesta: extracción audio -> transcripción -> TTS por segmento -> reemplazo audio -> quemar subtítulos.
Nota: los pasos concretos se delegan a los adaptadores en `whisper_project.infra`.
"""
def __init__(self, dry_run: bool = False, tts_model: str = "kokoro", verbose: bool = False):
self.dry_run = dry_run
self.tts_model = tts_model
if verbose:
logging.basicConfig(level=logging.DEBUG)
def run(self, src_video: str, out_dir: str, translate: bool = False) -> dict:
"""Ejecuta el pipeline.
Args:
src_video: ruta al vídeo de entrada.
out_dir: carpeta donde escribir resultados intermedios/finales.
translate: si True, intentará traducir SRT (delegado a futuras implementaciones).
Returns:
diccionario con resultados y rutas generadas.
"""
src = Path(src_video)
out = Path(out_dir)
out.mkdir(parents=True, exist_ok=True)
result = {
"input_video": str(src.resolve()),
"out_dir": str(out.resolve()),
"steps": [],
}
# 1) Extraer audio
audio_wav = out / f"{src.stem}.wav"
step = {"name": "extract_audio", "out": str(audio_wav)}
result["steps"].append(step)
if self.dry_run:
logger.info("[dry-run] extraer audio: %s -> %s", src, audio_wav)
else:
logger.info("extraer audio: %s -> %s", src, audio_wav)
process_video.extract_audio(str(src), str(audio_wav))
# 2) Transcribir (segmentado si es necesario)
srt_path = out / f"{src.stem}.srt"
step = {"name": "transcribe", "out": str(srt_path)}
result["steps"].append(step)
if self.dry_run:
logger.info("[dry-run] transcribir audio -> %s", srt_path)
segments = []
else:
logger.info("transcribir audio -> %s", srt_path)
# usamos la función delegante que el proyecto expone
segments = transcribe.transcribe_segmented_with_tempfiles(str(audio_wav), [])
transcribe.write_srt(segments, str(srt_path))
# 3) (Opcional) traducir SRT — placeholder
if translate:
step = {"name": "translate", "out": str(srt_path)}
result["steps"].append(step)
if self.dry_run:
logger.info("[dry-run] traducir SRT: %s", srt_path)
else:
logger.info("traducir SRT: %s (funcionalidad no implementada en orquestador)", srt_path)
# 4) Generar TTS segmentado en un WAV final (dub)
dubbed_wav = out / f"{src.stem}.dub.wav"
step = {"name": "tts_and_stitch", "out": str(dubbed_wav)}
result["steps"].append(step)
if self.dry_run:
logger.info("[dry-run] synthesize TTS por segmento -> %s (modelo=%s)", dubbed_wav, self.tts_model)
else:
logger.info("synthesize TTS por segmento -> %s (modelo=%s)", dubbed_wav, self.tts_model)
# por ahora usamos la función helper de transcribe para síntesis (si existe)
try:
# `segments` viene de la transcripción previa
transcribe.tts_synthesize(" ".join([s.get("text", "") for s in segments]), str(dubbed_wav), model=self.tts_model)
except Exception:
# Fallback simple: crear un silencio (no romper)
logger.exception("TTS falló, creando archivo vacío como fallback")
try:
process_video.pad_or_trim_wav(0.0, str(dubbed_wav))
except Exception:
logger.exception("No se pudo crear WAV de fallback")
# 5) Reemplazar audio en el vídeo
dubbed_video = out / f"{src.stem}.dub.mp4"
step = {"name": "replace_audio_in_video", "out": str(dubbed_video)}
result["steps"].append(step)
if self.dry_run:
logger.info("[dry-run] reemplazar audio en video: %s -> %s", src, dubbed_video)
else:
logger.info("reemplazar audio en video: %s -> %s", src, dubbed_video)
process_video.replace_audio_in_video(str(src), str(dubbed_wav), str(dubbed_video))
# 6) Quemar subtítulos en vídeo final
burned = out / f"{src.stem}.burned.mp4"
step = {"name": "burn_subtitles", "out": str(burned)}
result["steps"].append(step)
if self.dry_run:
logger.info("[dry-run] quemar subtítulos: %s + %s -> %s", dubbed_video, srt_path, burned)
else:
logger.info("quemar subtítulos: %s + %s -> %s", dubbed_video, srt_path, burned)
process_video.burn_subtitles(str(dubbed_video), str(srt_path), str(burned))
return result
__all__ = ["Orchestrator"]
import os
import subprocess
import sys
from typing import Optional
from ..core.models import PipelineResult
from ..infra import ffmpeg_adapter
from ..infra.kokoro_adapter import KokoroHttpClient
class PipelineOrchestrator:
"""Use case class that coordinates the high-level steps of the pipeline.
Esta clase mantiene la lógica de orquestación en métodos pequeños y
testables, y depende de adaptadores infra para las operaciones I/O.
"""
def __init__(
self,
kokoro_endpoint: str,
kokoro_key: Optional[str] = None,
voice: Optional[str] = None,
kokoro_model: Optional[str] = None,
transcriber=None,
translator=None,
tts_client=None,
audio_processor=None,
):
# Si no se inyectan adaptadores, crear implementaciones por defecto
# Sólo importar adaptadores pesados si no se inyectan implementaciones.
if transcriber is None:
try:
from ..infra.faster_whisper_adapter import FasterWhisperTranscriber
self.transcriber = FasterWhisperTranscriber()
except Exception:
# dejar como None para permitir fallback a subprocess en tiempo de ejecución
self.transcriber = None
else:
self.transcriber = transcriber
if translator is None:
try:
from ..infra.marian_adapter import MarianTranslator
self.translator = MarianTranslator()
except Exception:
self.translator = None
else:
self.translator = translator
if tts_client is None:
try:
from ..infra.kokoro_adapter import KokoroHttpClient
self.tts_client = KokoroHttpClient(kokoro_endpoint, api_key=kokoro_key, voice=voice, model=kokoro_model)
except Exception:
self.tts_client = None
else:
self.tts_client = tts_client
if audio_processor is None:
try:
from ..infra.ffmpeg_adapter import FFmpegAudioProcessor
self.audio_processor = FFmpegAudioProcessor()
except Exception:
self.audio_processor = None
else:
self.audio_processor = audio_processor
def run(
self,
video: str,
srt: Optional[str],
workdir: str,
translate_method: str = "local",
gemini_api_key: Optional[str] = None,
whisper_model: str = "base",
mix: bool = False,
mix_background_volume: float = 0.2,
keep_chunks: bool = False,
dry_run: bool = False,
) -> PipelineResult:
"""Run the pipeline.
When dry_run=True the orchestrator will only print planned actions
instead of executing subprocesses or ffmpeg commands.
"""
# 0) prepare paths
if dry_run:
print("[dry-run] workdir:", workdir)
# 1) extraer audio
audio_tmp = os.path.join(workdir, "extracted_audio.wav")
if dry_run:
print(f"[dry-run] ffmpeg extract audio -> {audio_tmp}")
else:
self.audio_processor.extract_audio(video, audio_tmp, sr=16000)
# 2) transcribir si es necesario
if srt:
srt_in = srt
else:
srt_in = os.path.join(workdir, "transcribed.srt")
cmd_trans = [
sys.executable,
"whisper_project/transcribe.py",
"--file",
audio_tmp,
"--backend",
"faster-whisper",
"--model",
whisper_model,
"--srt",
"--srt-file",
srt_in,
]
if dry_run:
print("[dry-run] ", " ".join(cmd_trans))
else:
# Use injected transcriber when possible
try:
self.transcriber.transcribe(audio_tmp, srt_in)
except Exception:
# Fallback to subprocess if adapter not available
subprocess.run(cmd_trans, check=True)
# 3) traducir
srt_translated = os.path.join(workdir, "translated.srt")
if translate_method == "local":
cmd_translate = [
sys.executable,
"whisper_project/translate_srt_local.py",
"--in",
srt_in,
"--out",
srt_translated,
]
if dry_run:
print("[dry-run] ", " ".join(cmd_translate))
else:
try:
self.translator.translate_srt(srt_in, srt_translated)
except Exception:
subprocess.run(cmd_translate, check=True)
elif translate_method == "gemini":
# preferir adaptador inyectado que soporte Gemini, sino usar el local wrapper
cmd_translate = [
sys.executable,
"whisper_project/translate_srt_with_gemini.py",
"--in",
srt_in,
"--out",
srt_translated,
]
if gemini_api_key:
cmd_translate += ["--gemini-api-key", gemini_api_key]
if dry_run:
print("[dry-run] ", " ".join(cmd_translate))
else:
try:
# intentar usar adaptador Gemini si está disponible
if self.translator and getattr(self.translator, "__class__", None).__name__ == "GeminiTranslator":
self.translator.translate_srt(srt_in, srt_translated)
else:
# intentar importar adaptador local
from ..infra.gemini_adapter import GeminiTranslator
gem = GeminiTranslator(api_key=gemini_api_key)
gem.translate_srt(srt_in, srt_translated)
except Exception:
subprocess.run(cmd_translate, check=True)
elif translate_method == "argos":
cmd_translate = [
sys.executable,
"whisper_project/translate_srt_argos.py",
"--in",
srt_in,
"--out",
srt_translated,
]
if dry_run:
print("[dry-run] ", " ".join(cmd_translate))
else:
try:
if self.translator and getattr(self.translator, "__class__", None).__name__ == "ArgosTranslator":
self.translator.translate_srt(srt_in, srt_translated)
else:
from ..infra.argos_adapter import ArgosTranslator
a = ArgosTranslator()
a.translate_srt(srt_in, srt_translated)
except Exception:
subprocess.run(cmd_translate, check=True)
elif translate_method == "none":
srt_translated = srt_in
else:
raise ValueError("translate_method not supported in this orchestrator")
# 4) sintetizar por segmento
dub_wav = os.path.join(workdir, "dub_final.wav")
if dry_run:
print(f"[dry-run] synthesize from srt {srt_translated} -> {dub_wav} (align={True} mix={mix})")
else:
# Use injected tts_client
self.tts_client.synthesize_from_srt(
srt_translated,
dub_wav,
video=video,
align=True,
keep_chunks=keep_chunks,
mix_with_original=mix,
mix_background_volume=mix_background_volume,
)
# 5) reemplazar audio en vídeo
replaced = os.path.splitext(video)[0] + ".replaced_audio.mp4"
if dry_run:
print(f"[dry-run] replace audio in video -> {replaced}")
else:
self.audio_processor.replace_audio_in_video(video, dub_wav, replaced)
# 6) quemar subtítulos
burned = os.path.splitext(video)[0] + ".replaced_audio.subs.mp4"
if dry_run:
print(f"[dry-run] burn subtitles {srt_translated} into -> {burned}")
else:
self.audio_processor.burn_subtitles(replaced, srt_translated, burned)
return PipelineResult(
workdir=workdir,
dub_wav=dub_wav,
replaced_video=replaced,
burned_video=burned,
)
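# Uso orientativo de PipelineOrchestrator (boceto con valores de ejemplo; el endpoint
# es ilustrativo y, con dry_run=True, sólo se imprimen los comandos planificados):
# orq = PipelineOrchestrator(
#     kokoro_endpoint="https://kokoro.example/api/v1/audio/speech",
#     kokoro_key=None, voice="em_alex",
# )
# resultado = orq.run(
#     video="entrada.mp4", srt=None, workdir="trabajo/",
#     translate_method="local", whisper_model="base", dry_run=True,
# )
# print(resultado.burned_video)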