You can run OpenAI's **Whisper** model for audio-to-text transcription on a **CPU** using **PyTorch**, typically by either using the original `openai-whisper` library or the Hugging Face `transformers` implementation.
### Using the `openai-whisper` library
1. **Installation:** Ensure you have Python, PyTorch (CPU version), and **FFmpeg** installed.

```bash
# Install the Whisper package
pip install -U openai-whisper

# On Linux, install FFmpeg (example for Debian/Ubuntu)
sudo apt update && sudo apt install ffmpeg
```
2. **Specify CPU in Python:** In your Python script, load the model directly onto the CPU by passing `device='cpu'` to `whisper.load_model()`, or load it first and then move it with `.to("cpu")`.

```python
import whisper

# Load the model and specify 'cpu' as the device
model = whisper.load_model("base", device="cpu")

# Or, if loading and then moving:
# model = whisper.load_model("base").to("cpu")

# Transcribe the audio file
result = model.transcribe("path/to/your/audio.mp3", fp16=False)  # fp16=False is recommended on CPU

print(result["text"])
```
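Model choice is the main speed lever on a CPU (see the note below). To check which checkpoint names `load_model()` accepts, the `openai-whisper` package exposes `whisper.available_models()`; a quick sketch:

```python
import whisper

# List the checkpoint names that whisper.load_model() can download,
# e.g. "tiny", "base", "small", "medium", "large" and their ".en" variants
print(whisper.available_models())
```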
*Note: Using a smaller model like `"tiny"` or `"base"` will be significantly faster on a CPU.*

-----
### Using the Hugging Face `transformers` library
The Hugging Face `transformers` library also provides a Whisper implementation, with extras such as chunked long-form transcription and batching:
1. **Installation:** Install the necessary libraries, making sure you get the CPU-only build of PyTorch if you don't have a GPU (a CPU-only install command is sketched after the code block).

```bash
pip install transformers datasets accelerate torch
```
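If you want to be sure `pip` pulls the CPU-only PyTorch build (and skips the much larger CUDA wheels), PyTorch publishes a dedicated wheel index for CPU builds; a minimal sketch of that install:

```bash
# Install the CPU-only PyTorch wheel from the official CPU index
pip install torch --index-url https://download.pytorch.org/whl/cpu
```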
2. **Setup and Pipeline:** Use `AutoModelForSpeechSeq2Seq`, `AutoProcessor`, and the `pipeline` helper, explicitly setting the device to `"cpu"`:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Set device to CPU
device = "cpu"
torch_dtype = torch.float32  # float32 is the standard precision on CPU

# Choose a model size
model_id = "openai/whisper-base"  # Example model

# Load model and processor
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
).to(device)

processor = AutoProcessor.from_pretrained(model_id)

# Create the ASR pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe
result = pipe("path/to/your/audio.mp3")
print(result["text"])
```
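Whisper models work on 30-second windows internally, so for longer recordings the pipeline can chunk the audio and batch the chunks. A minimal sketch reusing the `pipe` object from above (the `chunk_length_s` and `batch_size` values are illustrative, not tuned):

```python
# Split long audio into 30-second chunks and decode several chunks per forward pass
result = pipe(
    "path/to/your/long_audio.mp3",
    chunk_length_s=30,
    batch_size=4,
    return_timestamps=True,
)

print(result["text"])
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```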
-----
### Optimization: `faster-whisper`
For much better performance on a CPU (up to 4 times faster), consider using the **`faster-whisper`** library, which uses the CTranslate2 inference engine:
1. **Installation:**

```bash
pip install faster-whisper
```
2. **Usage:**

```python
from faster_whisper import WhisperModel

model_size = "base"  # Choose a model size

# Run on CPU with INT8 precision for speed
model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("path/to/your/audio.mp3", beam_size=5)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
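The `info` object returned by `transcribe()` also reports the language that was detected, which the loop above doesn't use; a small addition:

```python
# Report the detected language and faster-whisper's confidence in it
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
```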
The [Whisper: Install Guide](https://www.youtube.com/watch?v=XX-ET_-onYU) video walks through the initial installation steps for Whisper, which are a prerequisite for running it with PyTorch on any device.