OpenAI's Whisper model: ggml-medium.bin resource - CSDN Library

429 files in total

java: 41

h: 36

txt: 33

Speech recognition

arm

macos

109 views · uploaded 2023-12-26 17:33:35 · 4.6MB ZIP

OpenAI's Whisper model (429 files)

gradlew.bat 3KB

gradlew.bat 2KB

speak.bat 70B

for-tests-ggml-medium.en.bin 573KB

for-tests-ggml-small.en.bin 573KB

for-tests-ggml-base.en.bin 573KB

for-tests-ggml-tiny.en.bin 573KB

for-tests-ggml-large.bin 562KB

for-tests-ggml-base.bin 562KB

for-tests-ggml-tiny.bin 562KB

for-tests-ggml-small.bin 562KB

for-tests-ggml-medium.bin 562KB

ggml.c 635KB

ggml-quants.c 284KB

ggml-quants.c 282KB

ggml-backend.c 51KB

ggml-backend.c 35KB

ggml-alloc.c 28KB

jni.c 9KB

jni.c 8KB

BuildTypes.cmake 2KB

GitVars.cmake 717B

DefaultTargetOptions.cmake 437B

download-ggml-model.cmd 1KB

llama.cpp 392KB

whisper.cpp 226KB

ggml-opencl.cpp 69KB

main.cpp 45KB

server.cpp 34KB

command.cpp 29KB

talk-llama.cpp 29KB

common.cpp 28KB

gpt-2.cpp 27KB

Chessboard.cpp 26KB

lsp.cpp 20KB

stream.cpp 18KB

grammar-parser.cpp 17KB

ruby_whisper.cpp 15KB

talk.cpp 15KB

emscripten.cpp 12KB

addon.cpp 12KB

wchess.cmd.cpp 10KB

emscripten.cpp 10KB

common-ggml.cpp 8KB

quantize.cpp 8KB

common-sdl.cpp 7KB

WChess.cpp 6KB

emscripten.cpp 6KB

bench.cpp 6KB

test-chessboard.cpp 4KB

wchess.wasm.cpp 4KB

whisper-openvino-encoder.cpp 4KB

emscripten.cpp 4KB

emscripten.cpp 3KB

chessboard-1.0.0.css 978B

chessboard-1.0.0.min.css 718B

ggml-cuda.cu 365KB

main-cuda.Dockerfile 965B

cublas.Dockerfile 589B

main.Dockerfile 402B

WhisperCppDemo.entitlements 369B

assistant.gbnf 2KB

chess.gbnf 941B

colors.gbnf 339B

.gitignore 803B

.gitignore 225B

.gitignore 95B

.gitignore 47B

.gitignore 25B

.gitignore 22B

.gitignore 13B

.gitignore 12B

.gitignore 10B

.gitignore 6B

.gitignore 2B

.gitignore 0B

.gitmodules 96B

whisper.go 16KB

context.go 8KB

main.go 5KB

flags.go 5KB

params.go 4KB

process.go 3KB

interface.go 3KB

whisper_test.go 3KB

model.go 2KB

context_test.go 1KB

main.go 878B

# whisper.cpp

![whisper.cpp](https://user-images.githubusercontent.com/1991296/235238348-05d0f6a4-da44-4900-a1de-d0707e75b763.jpeg)

[![Actions Status](https://github.com/ggerganov/whisper.cpp/workflows/CI/badge.svg)](https://github.com/ggerganov/whisper.cpp/actions)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![npm](https://img.shields.io/npm/v/whisper.cpp.svg)](https://www.npmjs.com/package/whisper.cpp/)

Stable: [v1.5.2](https://github.com/ggerganov/whisper.cpp/releases/tag/v1.5.2) / [Roadmap | F.A.Q.](https://github.com/ggerganov/whisper.cpp/discussions/126)

High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisper) automatic speech recognition (ASR) model:

- Plain C/C++ implementation without dependencies
- Apple Silicon first-class citizen - optimized via ARM NEON, Accelerate framework, Metal and [Core ML](https://github.com/ggerganov/whisper.cpp#core-ml-support)
- AVX intrinsics support for x86 architectures
- VSX intrinsics support for POWER architectures
- Mixed F16 / F32 precision
- [4-bit and 5-bit integer quantization support](https://github.com/ggerganov/whisper.cpp#quantization)
- Zero memory allocations at runtime
- Support for CPU-only inference
- [Efficient GPU support for NVIDIA](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
- [Partial OpenCL GPU support via CLBlast](https://github.com/ggerganov/whisper.cpp#opencl-gpu-support-via-clblast)
- [OpenVINO Support](https://github.com/ggerganov/whisper.cpp#openvino-support)
- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)

Supported platforms:

- [x] Mac OS (Intel and Arm)
- [x] [iOS](examples/whisper.objc)
- [x] [Android](examples/whisper.android)
- [x] [Java](bindings/java/README.md)
- [x] Linux / [FreeBSD](https://github.com/ggerganov/whisper.cpp/issues/56#issuecomment-1350920264)
- [x] [WebAssembly](examples/whisper.wasm)
- [x] Windows ([MSVC](https://github.com/ggerganov/whisper.cpp/blob/master/.github/workflows/build.yml#L117-L144) and [MinGW](https://github.com/ggerganov/whisper.cpp/issues/168))
- [x] [Raspberry Pi](https://github.com/ggerganov/whisper.cpp/discussions/166)
- [x] [docker](https://github.com/ggerganov/whisper.cpp/pkgs/container/whisper.cpp)

The entire high-level implementation of the model is contained in [whisper.h](whisper.h) and [whisper.cpp](whisper.cpp).
The rest of the code is part of the [ggml](https://github.com/ggerganov/ggml) machine learning library.

Such a lightweight implementation makes it easy to integrate the model into different platforms and applications.
As an example, here is a video of running the model on an iPhone 13 device - fully offline, on-device: [whisper.objc](examples/whisper.objc)

https://user-images.githubusercontent.com/1991296/197385372-962a6dea-bca1-4d50-bf96-1d8c27b98c81.mp4

You can also easily make your own offline voice assistant application: [command](examples/command)

https://user-images.githubusercontent.com/1991296/204038393-2f846eae-c255-4099-a76d-5735c25c49da.mp4

On Apple Silicon, the inference runs fully on the GPU via Metal:

https://github.com/ggerganov/whisper.cpp/assets/1991296/c82e8f86-60dc-49f2-b048-d2fdbd6b5225

Or you can even run it straight in the browser: [talk.wasm](examples/talk.wasm)

## Implementation details

- The core tensor operations are implemented in C ([ggml.h](ggml.h) / [ggml.c](ggml.c))
- The transformer model and the high-level C-style API are implemented in C++ ([whisper.h](whisper.h) / [whisper.cpp](whisper.cpp))
- Sample usage is demonstrated in [main.cpp](examples/main)
- Sample real-time audio transcription from the microphone is demonstrated in [stream.cpp](examples/stream)
- Various other examples are available in the [examples](examples) folder

The tensor operators are optimized heavily for Apple silicon CPUs.
Depending on the computation size, Arm Neon SIMD intrinsics or CBLAS Accelerate framework routines are used.
The latter are especially effective for bigger sizes since the Accelerate framework utilizes the special-purpose AMX coprocessor available in modern Apple products.

## Quick start

First clone the repository.

Then, download one of the Whisper models converted in [ggml format](models). For example:

```bash
bash ./models/download-ggml-model.sh base.en
```

If you wish to convert the Whisper models to ggml format yourself, instructions are in [models/README.md](models/README.md).

Now build the [main](examples/main) example and transcribe an audio file like this:

```bash
# build the main example
make

# transcribe an audio file
./main -f samples/jfk.wav
```

---

For a quick demo, simply run `make base.en`:

```text
$ make base.en

cc -I. -O3 -std=c11 -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -std=c++11 -pthread -c whisper.cpp -o whisper.o
c++ -I. -I./examples -O3 -std=c++11 -pthread examples/main/main.cpp whisper.o ggml.o -o main -framework Accelerate
./main -h

usage: ./main [options] file0.wav file1.wav ...

options:
  -h,        --help              [default] show this help message and exit
  -t N,      --threads N         [4      ] number of threads to use during computation
  -p N,      --processors N      [1      ] number of processors to use during computation
  -ot N,     --offset-t N        [0      ] time offset in milliseconds
  -on N,     --offset-n N        [0      ] segment index offset
  -d N,      --duration N        [0      ] duration of audio to process in milliseconds
  -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
  -ml N,     --max-len N         [0      ] maximum segment length in characters
  -sow,      --split-on-word     [false  ] split on word rather than on token
  -bo N,     --best-of N         [5      ] number of best candidates to keep
  -bs N,     --beam-size N       [5      ] beam size for beam search
  -wt N,     --word-thold N      [0.01   ] word timestamp probability threshold
  -et N,     --entropy-thold N   [2.40   ] entropy threshold for decoder fail
  -lpt N,    --logprob-thold N   [-1.00  ] log probability threshold for decoder fail
  -debug,    --debug-mode        [false  ] enable debug mode (eg. dump log_mel)
  -tr,       --translate         [false  ] translate from source language to english
  -di,       --diarize           [false  ] stereo audio diarization
  -tdrz,     --tinydiarize       [false  ] enable tinydiarize (requires a tdrz model)
  -nf,       --no-fallback       [false  ] do not use temperature fallback while decoding
  -otxt,     --output-txt        [false  ] output result in a text file
  -ovtt,     --output-vtt        [false  ] output result in a vtt file
  -osrt,     --output-srt        [false  ] output result in a srt file
  -olrc,     --output-lrc        [false  ] output result in a lrc file
  -owts,     --output-words      [false  ] output script for generating karaoke video
  -fp,       --font-path         [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
  -ocsv,     --output-csv        [false  ] output result in a CSV file
  -oj,       --output-json       [false  ] output result in a JSON file
  -ojf,      --output-json-full  [false  ] include more information in the JSON file
  -of FNAME, --output-file FNAME [       ] output file path (without file extension)
  -ps,       --print-special     [false  ] print special tokens
  -pc,       --print-colors      [false  ] print colors
  -pp,       --print-progress    [false  ] print progress
  -nt,       --no-timestamps     [false  ] do not print timestamps
  -l LANG,   --language LANG     [en     ] spoken language ('auto' for auto-detect)
  -dl,       --detect-language   [false  ] exit after automatically detecting language
             --prompt PROMPT     [       ] initial prompt
  -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
  -f FNAME,
```
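
The option list above includes `-osrt` for writing the result as an SRT file, and `-ot` / `-d` take their values in milliseconds. To make the shape of that output concrete, here is a minimal sketch of how a millisecond offset could be turned into an SRT timestamp and a single cue. The helpers `srt_timestamp` and `srt_cue` are hypothetical names made up for this illustration and are not part of whisper.cpp; the sketch also assumes segment times are already available in milliseconds.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not whisper.cpp code): format a millisecond offset
// as an SRT timestamp of the form "HH:MM:SS,mmm".
std::string srt_timestamp(int64_t ms) {
    if (ms < 0) ms = 0;  // SRT has no negative timestamps, so clamp to zero
    const long long hours   = ms / 3600000;
    const long long minutes = (ms / 60000) % 60;
    const long long seconds = (ms / 1000) % 60;
    const long long millis  = ms % 1000;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02lld:%02lld:%02lld,%03lld",
                  hours, minutes, seconds, millis);
    return std::string(buf);
}

// Hypothetical helper: build one SRT cue, which consists of an index line,
// a "start --> end" line, the text, and a terminating blank line.
std::string srt_cue(int index, int64_t start_ms, int64_t end_ms,
                    const std::string & text) {
    return std::to_string(index) + "\n" +
           srt_timestamp(start_ms) + " --> " + srt_timestamp(end_ms) + "\n" +
           text + "\n\n";
}
```

Calling `srt_cue(1, 0, 11000, "Hello")` yields a cue whose time line reads `00:00:00,000 --> 00:00:11,000`. The comma before the milliseconds is what distinguishes SRT from WebVTT, which uses a period in the same position.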
