Pull requests: ggml-org/llama.cpp
Pull requests list
common, ggml : fix non-ASCII file path handling on Windows
ggml
changes relating to the ggml tensor library for machine learning
#21838
opened Apr 13, 2026 by
Anai-Guo
Expose build_info in router mode
examples
python
python script changes
server
#21835
opened Apr 13, 2026 by
gaspardpetit
Fix unbounded VRAM usage creep on HIP/ROCm backend when quantizing kv cache
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
ggml-rpc: fix 32-bit ARM (ILP32) serialization bugs
ggml
changes relating to the ggml tensor library for machine learning
#21828
opened Apr 12, 2026 by
rovmo
mtmd: use causal attn for gemma 4 audio (+ small breaking change to mtmd)
examples
#21824
opened Apr 12, 2026 by
ngxson
Contributor
2
cli : Use acquire/release semantics for stopping logic
examples
#21822
opened Apr 12, 2026 by
matthiasstraka
llama : add --hugepages for HugeTLB-backed weight loading (Linux)
#21821
opened Apr 12, 2026 by
doctorjei
Metal: TurboQuant GPU dequant kernels + host buffer type
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
examples
ggml
changes relating to the ggml tensor library for machine learning
server
testing
Everything test related
server : reinit speculative ngram state after context shift to fix GGML_ABORT
examples
server
testing
Everything test related
#21815
opened Apr 12, 2026 by
jonpojonpo
server: allow cancel loading model
examples
server
#21814
opened Apr 12, 2026 by
ngxson
Contributor
common : add download cancellation and temp file cleanup
#21813
opened Apr 12, 2026 by
angt
Member
docs/android.md: Add dependency libandroid-spawn for building on termux
documentation
Improvements or additions to documentation
#21812
opened Apr 12, 2026 by
aafsmarak
2
TP: fix 0-sized tensor slices, AllReduce fallback
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#21808
opened Apr 12, 2026 by
JohannesGaessler
Contributor
webui: add setting for first-line chat titles
examples
server/webui
server
#21797
opened Apr 12, 2026 by
crodjer
llama-bench: fix accumulated load_time in perf timings
examples
#21794
opened Apr 12, 2026 by
abhinavuser
server: (anthropic API) fix prefix caching
examples
server
#21793
opened Apr 12, 2026 by
kvc0
kv: Add optional mmap kv cache
examples
testing
Everything test related
#21792
opened Apr 12, 2026 by
skiz
vulkan: fix output corruption on GCN 2.0/3.0 (Vulkan 1.2)
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#21787
opened Apr 12, 2026 by
rafikb
chat: dedicated DeepSeek v3.2 parser + "official" template
testing
Everything test related
#21785
opened Apr 12, 2026 by
pwilkin
Member
ggml-metal: add Metal kernel for ggml_roll
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
ggml
changes relating to the ggml tensor library for machine learning
#21782
opened Apr 11, 2026 by
stephencox-ict
Contributor
vendor : update cpp-httplib to 0.42.0
python
python script changes
script
Script related
#21781
opened Apr 11, 2026 by
cabelo
Contributor
ggml: add graph_reused
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#21764
opened Apr 11, 2026 by
am17an
Contributor
CUDA: only init NCCL for setups with multi GPU
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#21761
opened Apr 11, 2026 by
EldarBorge