# whisper.cpp

![whisper.cpp](https://user-images.githubusercontent.com/1991296/235238348-05d0f6a4-da44-4900-a1de-d0707e75b763.jpeg)

[![Actions Status](https://github.com/ggerganov/whisper.cpp/workflows/CI/badge.svg)](https://github.com/ggerganov/whisper.cpp/actions)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![npm](https://img.shields.io/npm/v/whisper.cpp.svg)](https://www.npmjs.com/package/whisper.cpp/)

Stable: [v1.5.2](https://github.com/ggerganov/whisper.cpp/releases/tag/v1.5.2) / [Roadmap | F.A.Q.](https://github.com/ggerganov/whisper.cpp/discussions/126)

High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisper) automatic speech recognition (ASR) model:

- Plain C/C++ implementation without dependencies
- Apple Silicon first-class citizen - optimized via ARM NEON, Accelerate framework, Metal and [Core ML](https://github.com/ggerganov/whisper.cpp#core-ml-support)
- AVX intrinsics support for x86 architectures
- VSX intrinsics support for POWER architectures
- Mixed F16 / F32 precision
- [4-bit and 5-bit integer quantization support](https://github.com/ggerganov/whisper.cpp#quantization)
- Zero memory allocations at runtime
- Support for CPU-only inference
- [Efficient GPU support for NVIDIA](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
- [Partial OpenCL GPU support via CLBlast](https://github.com/ggerganov/whisper.cpp#opencl-gpu-support-via-clblast)
- [OpenVINO Support](https://github.com/ggerganov/whisper.cpp#openvino-support)
- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)

Supported platforms:

- [x] Mac OS (Intel and Arm)
- [x] [iOS](examples/whisper.objc)
- [x] [Android](examples/whisper.android)
- [x] [Java](bindings/java/README.md)
- [x] Linux / [FreeBSD](https://github.com/ggerganov/whisper.cpp/issues/56#issuecomment-1350920264)
- [x] [WebAssembly](examples/whisper.wasm)
- [x] Windows ([MSVC](https://github.com/ggerganov/whisper.cpp/blob/master/.github/workflows/build.yml#L117-L144) and [MinGW](https://github.com/ggerganov/whisper.cpp/issues/168))
- [x] [Raspberry Pi](https://github.com/ggerganov/whisper.cpp/discussions/166)
- [x] [docker](https://github.com/ggerganov/whisper.cpp/pkgs/container/whisper.cpp)

The entire high-level implementation of the model is contained in [whisper.h](whisper.h) and [whisper.cpp](whisper.cpp).
The rest of the code is part of the [ggml](https://github.com/ggerganov/ggml) machine learning library.

Such a lightweight implementation makes it easy to integrate the model into different platforms and applications.

As an example, here is a video of running the model on an iPhone 13 device - fully offline, on-device: [whisper.objc](examples/whisper.objc)

https://user-images.githubusercontent.com/1991296/197385372-962a6dea-bca1-4d50-bf96-1d8c27b98c81.mp4

You can also easily make your own offline voice assistant application: [command](examples/command)

https://user-images.githubusercontent.com/1991296/204038393-2f846eae-c255-4099-a76d-5735c25c49da.mp4

On Apple Silicon, the inference runs fully on the GPU via Metal:

https://github.com/ggerganov/whisper.cpp/assets/1991296/c82e8f86-60dc-49f2-b048-d2fdbd6b5225

Or you can even run it straight in the browser: [talk.wasm](examples/talk.wasm)

## Implementation details

- The core tensor operations are implemented in C ([ggml.h](ggml.h) / [ggml.c](ggml.c))
- The transformer model and the high-level C-style API are implemented in C++ ([whisper.h](whisper.h) / [whisper.cpp](whisper.cpp))
- Sample usage is demonstrated in [main.cpp](examples/main)
- Sample real-time audio transcription from the microphone is demonstrated in [stream.cpp](examples/stream)
- Various other examples are available in the [examples](examples) folder

The tensor operators are optimized heavily for Apple silicon CPUs. Depending on the computation size, Arm Neon SIMD
intrinsics or CBLAS Accelerate framework routines are used. The latter are especially effective for bigger sizes since
the Accelerate framework utilizes the special-purpose AMX coprocessor available in modern Apple products.

## Quick start

First clone the repository.

Then, download one of the Whisper models converted in [ggml format](models). For example:
```bash
bash ./models/download-ggml-model.sh base.en
```
If you wish to convert the Whisper models to ggml format yourself, instructions are in [models/README.md](models/README.md).

Now build the [main](examples/main) example and transcribe an audio file like this:
```bash
# build the main example
make
# transcribe an audio file
./main -f samples/jfk.wav
```
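The steps above can be strung together in a small script. This is a minimal sketch, not part of the repository: the `model_path` and `transcribe` helper names are made up for illustration, and it assumes that the download script stores each model as `models/ggml-<name>.bin` (consistent with the default `models/ggml-base.en.bin` model path of `./main`) and that the script runs from the repository root.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration only (not part of whisper.cpp).
# Assumption: run from the repository root.

# Map a model name such as "base.en" to the path the examples expect.
# Assumption: models are stored as models/ggml-<name>.bin.
model_path() {
    printf 'models/ggml-%s.bin\n' "$1"
}

# Download the model if it is missing, build ./main if needed, then transcribe.
transcribe() {
    local model="$1" wav="$2"
    local path
    path="$(model_path "$model")"
    [ -f "$path" ] || bash ./models/download-ggml-model.sh "$model" || return 1
    [ -x ./main ] || make || return 1
    ./main -m "$path" -f "$wav"
}

# Example call (left commented out so that sourcing this file has no side effects):
# transcribe base.en samples/jfk.wav
```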
---
For a quick demo, simply run `make base.en`:
```java
$ make base.en
cc -I. -O3 -std=c11 -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -std=c++11 -pthread -c whisper.cpp -o whisper.o
c++ -I. -I./examples -O3 -std=c++11 -pthread examples/main/main.cpp whisper.o ggml.o -o main -framework Accelerate
./main -h
usage: ./main [options] file0.wav file1.wav ...
options:
  -h,        --help              [default] show this help message and exit
  -t N,      --threads N         [4      ] number of threads to use during computation
  -p N,      --processors N      [1      ] number of processors to use during computation
  -ot N,     --offset-t N        [0      ] time offset in milliseconds
  -on N,     --offset-n N        [0      ] segment index offset
  -d N,      --duration N        [0      ] duration of audio to process in milliseconds
  -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
  -ml N,     --max-len N         [0      ] maximum segment length in characters
  -sow,      --split-on-word     [false  ] split on word rather than on token
  -bo N,     --best-of N         [5      ] number of best candidates to keep
  -bs N,     --beam-size N       [5      ] beam size for beam search
  -wt N,     --word-thold N      [0.01   ] word timestamp probability threshold
  -et N,     --entropy-thold N   [2.40   ] entropy threshold for decoder fail
  -lpt N,    --logprob-thold N   [-1.00  ] log probability threshold for decoder fail
  -debug,    --debug-mode        [false  ] enable debug mode (eg. dump log_mel)
  -tr,       --translate         [false  ] translate from source language to english
  -di,       --diarize           [false  ] stereo audio diarization
  -tdrz,     --tinydiarize       [false  ] enable tinydiarize (requires a tdrz model)
  -nf,       --no-fallback       [false  ] do not use temperature fallback while decoding
  -otxt,     --output-txt        [false  ] output result in a text file
  -ovtt,     --output-vtt        [false  ] output result in a vtt file
  -osrt,     --output-srt        [false  ] output result in a srt file
  -olrc,     --output-lrc        [false  ] output result in a lrc file
  -owts,     --output-words      [false  ] output script for generating karaoke video
  -fp,       --font-path         [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
  -ocsv,     --output-csv        [false  ] output result in a CSV file
  -oj,       --output-json       [false  ] output result in a JSON file
  -ojf,      --output-json-full  [false  ] include more information in the JSON file
  -of FNAME, --output-file FNAME [       ] output file path (without file extension)
  -ps,       --print-special     [false  ] print special tokens
  -pc,       --print-colors      [false  ] print colors
  -pp,       --print-progress    [false  ] print progress
  -nt,       --no-timestamps     [false  ] do not print timestamps
  -l LANG,   --language LANG     [en     ] spoken language ('auto' for auto-detect)
  -dl,       --detect-language   [false  ] exit after automatically detecting language
             --prompt PROMPT     [       ] initial prompt
  -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
  -f FNAME,
```