
You can run OpenAI's Whisper model for audio-to-text transcription on a CPU with PyTorch, typically using either the original openai-whisper library or the Hugging Face transformers implementation.

Using the openai-whisper library

  1. Installation: Ensure you have Python, PyTorch (CPU version), and FFmpeg installed.

    # Install the Whisper package
    pip install -U openai-whisper
    # On Linux, install FFmpeg (example for Debian/Ubuntu)
    sudo apt update && sudo apt install ffmpeg
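    # If PyTorch isn't installed yet, the CPU-only wheel keeps the download small
    # (this uses PyTorch's official CPU package index)
    pip install torch --index-url https://download.pytorch.org/whl/cpu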
    
  2. Specify CPU in Python: In your script, load the model onto the CPU, either by passing device='cpu' directly to whisper.load_model() or by calling .to("cpu") after loading.

    import whisper
    
    # Load the model and specify 'cpu' as the device
    model = whisper.load_model("base", device='cpu') 
    
    # Or, if loading and then moving:
    # model = whisper.load_model("base").to("cpu") 
    
    # Transcribe the audio file
    result = model.transcribe("path/to/your/audio.mp3", fp16=False) # fp16=False avoids the "FP16 is not supported on CPU" warning
    
    print(result["text"])
    

    Note: Using a smaller model like "tiny" or "base" will be significantly faster on a CPU.
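
    To see which checkpoint sizes are available when trading accuracy for speed, the library exposes a helper (a minimal sketch using openai-whisper's available_models() function):

    import whisper
    
    # Print the downloadable checkpoint names, e.g. tiny, base, small, medium, large
    print(whisper.available_models())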


Using the Hugging Face transformers library

The Hugging Face transformers library also provides a way to run Whisper, and its pipeline API adds conveniences such as chunked long-form transcription:

  1. Installation: Install the necessary libraries, ensuring you have the CPU-only version of PyTorch if you don't have a GPU.

    pip install transformers datasets accelerate torch
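    # FFmpeg is needed here as well: the transformers ASR pipeline uses it
    # to decode audio files passed in by path
    sudo apt update && sudo apt install ffmpeg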
    
  2. Setup and Pipeline: Use the PyTorch AutoModelForSpeechSeq2Seq, AutoProcessor, and pipeline, explicitly setting the device to "cpu":

    import torch
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
    
    # Set device to CPU
    device = "cpu"
    torch_dtype = torch.float32 # float16 is poorly supported on CPUs, so use float32
    
    # Choose a model size
    model_id = "openai/whisper-base" # Example model
    
    # Load model and processor
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, 
        torch_dtype=torch_dtype, 
        low_cpu_mem_usage=True, 
        use_safetensors=True
    ).to(device)
    
    processor = AutoProcessor.from_pretrained(model_id)
    
    # Create the ASR pipeline
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        torch_dtype=torch_dtype,
        device=device,
    )
    
    # Transcribe
    result = pipe("path/to/your/audio.mp3")
    print(result["text"])
    

Optimization: faster-whisper

For much better performance on a CPU (the project reports up to 4x faster than openai-whisper at similar accuracy), consider the faster-whisper library, which uses the CTranslate2 inference engine:

  1. Installation:

    pip install faster-whisper
    
  2. Usage:

    from faster_whisper import WhisperModel
    
    model_size = "base" # Choose a model size
    
    # Run on CPU with INT8 precision for speed
    model = WhisperModel(model_size, device="cpu", compute_type="int8") 
    
    segments, info = model.transcribe("path/to/your/audio.mp3", beam_size=5)
    
    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    
