You can run OpenAI's **Whisper** model for audio-to-text transcription on a **CPU** using **PyTorch**, typically by either using the original `openai-whisper` library or the Hugging Face `transformers` implementation.
### Using the `openai-whisper` library
1. **Installation:** Ensure you have Python, PyTorch (CPU version), and **FFmpeg** installed.

```bash
# Install the Whisper package
pip install -U openai-whisper

# On Linux, install FFmpeg (example for Debian/Ubuntu)
sudo apt update && sudo apt install ffmpeg
```
2. **Specify CPU in Python:** In your Python script, load the model directly onto the CPU by passing `device='cpu'` to `whisper.load_model()`, or load it first and then move it with `.to("cpu")`.

```python
import whisper

# Load the model and specify 'cpu' as the device
model = whisper.load_model("base", device="cpu")

# Or, if loading and then moving:
# model = whisper.load_model("base").to("cpu")

# Transcribe the audio file
result = model.transcribe("path/to/your/audio.mp3", fp16=False)  # fp16=False is recommended on CPU

print(result["text"])
```
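Model choice is the main speed lever on a CPU (see the note below). To check which checkpoint names `load_model()` accepts, the `openai-whisper` package exposes `whisper.available_models()`; a quick sketch:

```python
import whisper

# List the checkpoint names that whisper.load_model() can download,
# e.g. "tiny", "base", "small", "medium", "large" and their ".en" variants
print(whisper.available_models())
```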
*Note: Using a smaller model like `"tiny"` or `"base"` will be significantly faster on a CPU.*

-----
### Using the Hugging Face `transformers` library
The Hugging Face `transformers` library also provides a Whisper implementation, with extras such as chunked long-form transcription and batching:
1. **Installation:** Install the necessary libraries, making sure you get the CPU-only build of PyTorch if you don't have a GPU (a CPU-only install command is sketched after the code block).

```bash
pip install transformers datasets accelerate torch
```
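If you want to be sure `pip` pulls the CPU-only PyTorch build (and skips the much larger CUDA wheels), PyTorch publishes a dedicated wheel index for CPU builds; a minimal sketch of that install:

```bash
# Install the CPU-only PyTorch wheel from the official CPU index
pip install torch --index-url https://download.pytorch.org/whl/cpu
```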
2. **Setup and Pipeline:** Use `AutoModelForSpeechSeq2Seq`, `AutoProcessor`, and the `pipeline` helper, explicitly setting the device to `"cpu"`:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Set device to CPU
device = "cpu"
torch_dtype = torch.float32  # float32 is the standard precision on CPU

# Choose a model size
model_id = "openai/whisper-base"  # Example model

# Load model and processor
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
).to(device)

processor = AutoProcessor.from_pretrained(model_id)

# Create the ASR pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe
result = pipe("path/to/your/audio.mp3")
print(result["text"])
```
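Whisper models work on 30-second windows internally, so for longer recordings the pipeline can chunk the audio and batch the chunks. A minimal sketch reusing the `pipe` object from above (the `chunk_length_s` and `batch_size` values are illustrative, not tuned):

```python
# Split long audio into 30-second chunks and decode several chunks per forward pass
result = pipe(
    "path/to/your/long_audio.mp3",
    chunk_length_s=30,
    batch_size=4,
    return_timestamps=True,
)

print(result["text"])
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```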
-----
### Optimization: `faster-whisper`
For much better performance on a CPU (up to 4 times faster), consider using the **`faster-whisper`** library, which uses the CTranslate2 inference engine:
1. **Installation:**

```bash
pip install faster-whisper
```
2. **Usage:**

```python
from faster_whisper import WhisperModel

model_size = "base"  # Choose a model size

# Run on CPU with INT8 precision for speed
model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("path/to/your/audio.mp3", beam_size=5)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
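The `info` object returned by `transcribe()` also reports the language that was detected, which the loop above doesn't use; a small addition:

```python
# Report the detected language and faster-whisper's confidence in it
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
```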
The [Whisper: Install Guide](https://www.youtube.com/watch?v=XX-ET_-onYU) video walks through the initial installation steps for Whisper, which are a prerequisite for running it with PyTorch on any device.