You can run OpenAI's **Whisper** model for audio-to-text transcription on a **CPU** using **PyTorch**, typically through either the original `openai-whisper` library or the Hugging Face `transformers` implementation. For the best CPU speed, the CTranslate2-based `faster-whisper` library (covered last) is worth considering.

### Using the `openai-whisper` library

1. **Installation:** Ensure you have Python, the CPU build of PyTorch, and **FFmpeg** installed.

    ```bash
    # Install the Whisper package
    pip install -U openai-whisper

    # On Linux, install FFmpeg (example for Debian/Ubuntu)
    sudo apt update && sudo apt install ffmpeg
    ```

2. **Specify CPU in Python:** Pass `device="cpu"` directly to `whisper.load_model()`, or load the model and then move it to the CPU device.

    ```python
    import whisper

    # Load the model on the CPU
    model = whisper.load_model("base", device="cpu")
    # Or, load and then move:
    # model = whisper.load_model("base").to("cpu")

    # Transcribe the audio file
    # fp16=False avoids the "FP16 is not supported on CPU" warning
    result = model.transcribe("path/to/your/audio.mp3", fp16=False)
    print(result["text"])
    ```

*Note: A smaller model such as `"tiny"` or `"base"` will be significantly faster on a CPU.*

-----

### Using the Hugging Face `transformers` library

The `transformers` library also runs Whisper and adds features such as chunked long-form transcription and batching:

1. **Installation:** Install the required libraries, making sure you get the CPU-only build of PyTorch if you don't have a GPU. FFmpeg is still needed to decode audio files passed by path.

    ```bash
    pip install transformers datasets accelerate torch
    ```

2. **Setup and pipeline:** Use `AutoModelForSpeechSeq2Seq`, `AutoProcessor`, and `pipeline`, explicitly setting the device to `"cpu"`:

    ```python
    import torch
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

    # Set device to CPU and use float32 (the standard CPU dtype)
    device = "cpu"
    torch_dtype = torch.float32

    # Choose a model size
    model_id = "openai/whisper-base"  # Example model

    # Load model and processor
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
    ).to(device)

    processor = AutoProcessor.from_pretrained(model_id)

    # Create the ASR pipeline
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        torch_dtype=torch_dtype,
        device=device,
    )

    # Transcribe
    result = pipe("path/to/your/audio.mp3")
    print(result["text"])
    ```

-----

### Optimization: `faster-whisper`

For noticeably better CPU performance (up to about four times faster than `openai-whisper` at the same accuracy, with lower memory use), consider the **`faster-whisper`** library, which uses the CTranslate2 inference engine instead of PyTorch:

1. **Installation:**

    ```bash
    pip install faster-whisper
    ```

2. **Usage:**

    ```python
    from faster_whisper import WhisperModel

    model_size = "base"  # Choose a model size

    # Run on CPU with INT8 quantization for speed
    model = WhisperModel(model_size, device="cpu", compute_type="int8")

    segments, info = model.transcribe("path/to/your/audio.mp3", beam_size=5)

    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    ```

The [Whisper: Install Guide](https://www.youtube.com/watch?v=XX-ET_-onYU) video walks through the initial installation steps for Whisper, which are a prerequisite for running it with PyTorch on any device.
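-----

### Transcribing long audio with the `transformers` pipeline

Whisper natively processes audio in 30-second windows, so longer files need to be chunked. The snippet below is a minimal sketch of how to do this with the `pipe` object created in the `transformers` example above; the `chunk_length_s` value and the `return_timestamps` flag shown here are illustrative choices, not required settings.

```python
# Reuses the `pipe` object from the transformers example above.
# chunk_length_s splits long audio into windows that are transcribed
# and stitched back together; return_timestamps=True attaches
# (start, end) times to each chunk of text.
result = pipe(
    "path/to/your/audio.mp3",
    chunk_length_s=30,
    return_timestamps=True,
)

print(result["text"])             # full transcript
for chunk in result["chunks"]:    # per-chunk text with (start, end) timestamps
    print(chunk["timestamp"], chunk["text"])
```

Chunked decoding is an approximation of Whisper's original sequential long-form algorithm: it may lose a little accuracy at chunk boundaries, but it lets the pipeline process each window independently.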
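-----

### Controlling CPU threads

Throughput on a CPU also depends on how many threads the backend is allowed to use. The snippet below is a minimal sketch using standard PyTorch calls (`torch.get_num_threads()` / `torch.set_num_threads()`), which affect the `openai-whisper` and `transformers` paths; the thread count shown is only an example value to tune for your machine.

```python
import torch

# Inspect the current intra-op thread count
# (PyTorch usually defaults to the number of physical cores).
print("Current threads:", torch.get_num_threads())

# Example value only: set this to the number of cores you want to dedicate.
torch.set_num_threads(4)

# Any Whisper inference run through PyTorch after this point
# (openai-whisper or the transformers pipeline) uses at most
# this many intra-op threads.
```

For `faster-whisper`, the equivalent knob is the `cpu_threads` argument to `WhisperModel` (for example `WhisperModel("base", device="cpu", compute_type="int8", cpu_threads=4)`); check the documentation of the version you have installed for the exact behavior.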