You can run OpenAI's Whisper model for audio-to-text transcription on a CPU using PyTorch, typically by either using the original openai-whisper library or the Hugging Face transformers implementation.
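Before choosing either route, it can help to confirm that the installed PyTorch build really is CPU-only. A minimal check (assuming PyTorch is already installed) looks like this:

```python
import torch

# On a CPU-only install this prints False, and new tensors default to the CPU
print(torch.cuda.is_available())
print(torch.zeros(1).device)  # expected: cpu
```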
Using the openai-whisper library
- Installation: Ensure you have Python, PyTorch (CPU version), and FFmpeg installed.

  ```bash
  # Install the Whisper package
  pip install -U openai-whisper

  # On Linux, install FFmpeg (example for Debian/Ubuntu)
  sudo apt update && sudo apt install ffmpeg
  ```

- Specify CPU in Python: In your Python script, explicitly load the model and move it to the CPU device. You can also pass the `device='cpu'` argument directly to `whisper.load_model()`.

  ```python
  import whisper

  # Load the model and specify 'cpu' as the device
  model = whisper.load_model("base", device='cpu')
  # Or, if loading and then moving:
  # model = whisper.load_model("base").to("cpu")

  # Transcribe the audio file (fp16=False is recommended on CPU)
  result = model.transcribe("path/to/your/audio.mp3", fp16=False)
  print(result["text"])
  ```

  Note: Using a smaller model like "tiny" or "base" will be significantly faster on a CPU.
Using the Hugging Face transformers library
The Hugging Face transformers library also provides a Whisper implementation and adds optimizations such as chunked long-form decoding:
- Installation: Install the necessary libraries, ensuring you have the CPU-only version of PyTorch if you don't have a GPU.

  ```bash
  pip install transformers datasets accelerate torch
  ```

- Setup and Pipeline: Use the PyTorch `AutoModelForSpeechSeq2Seq`, `AutoProcessor`, and `pipeline`, explicitly setting the device to `"cpu"`:

  ```python
  import torch
  from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

  # Set device to CPU
  device = "cpu"
  torch_dtype = torch.float32  # Use float32 on CPU for standard performance

  # Choose a model size
  model_id = "openai/whisper-base"  # Example model

  # Load model and processor
  model = AutoModelForSpeechSeq2Seq.from_pretrained(
      model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
  ).to(device)
  processor = AutoProcessor.from_pretrained(model_id)

  # Create the ASR pipeline
  pipe = pipeline(
      "automatic-speech-recognition",
      model=model,
      tokenizer=processor.tokenizer,
      feature_extractor=processor.feature_extractor,
      torch_dtype=torch_dtype,
      device=device,
  )

  # Transcribe
  result = pipe("path/to/your/audio.mp3")
  print(result["text"])
  ```
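For long recordings, the same pipeline can split the audio into chunks and return timestamps. The sketch below reuses the `pipe` object defined above; the chunk and batch values are illustrative, not tuned.

```python
# Chunked long-form transcription on CPU (values are illustrative)
result = pipe(
    "path/to/your/audio.mp3",
    chunk_length_s=30,       # split long audio into 30-second chunks
    batch_size=4,            # process several chunks per forward pass
    return_timestamps=True,  # include segment-level timestamps
)
print(result["text"])
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```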
Optimization: faster-whisper
For much better performance on a CPU (up to 4 times faster), consider using the faster-whisper library, which uses the CTranslate2 inference engine:
- Installation:

  ```bash
  pip install faster-whisper
  ```

- Usage:

  ```python
  from faster_whisper import WhisperModel

  model_size = "base"  # Choose a model size

  # Run on CPU with INT8 precision for speed
  model = WhisperModel(model_size, device="cpu", compute_type="int8")

  segments, info = model.transcribe("path/to/your/audio.mp3", beam_size=5)
  for segment in segments:
      print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
  ```
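faster-whisper also exposes a built-in voice-activity-detection filter and word-level timestamps, which can further speed up and enrich CPU transcription. The following sketch reuses the `model` object from above and should be treated as illustrative:

```python
# Skip long silences and get word-level timestamps (reusing the model above)
segments, info = model.transcribe(
    "path/to/your/audio.mp3",
    vad_filter=True,        # drop non-speech sections before decoding
    word_timestamps=True,   # emit per-word timings within each segment
)
print("Detected language:", info.language)
for segment in segments:
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
```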