293 Commits (62b3e81aaeafb282934de8b21de13b0104f12f8c)
 

Author SHA1 Message Date
Georgi Gerganov 9e1707218a
Add "--instruct" argument for usage with Alpaca (#240)
Also start adding prompts in "./prompts"
1 year ago
Georgi Gerganov 22213a17b5
Change RMSNorm eps to 1e-6 (#173)
I think this is what is used in the Python code
1 year ago
Ronsor d7def1a752
Warn user if a context size greater than 2048 tokens is specified (#274)
LLaMA doesn't support more than 2048 token context sizes, and going above that produces terrible results.
1 year ago
Pavol Rusnak 6f61c18ec9
Fix typo in readme 1 year ago
Pavol Rusnak 1e5a6d088d
Add note about Python 3.11 to readme 1 year ago
Pavol Rusnak 554b541521
Add memory/disk requirements to readme 1 year ago
Alex Nguyen d3f202d57b
Remove unused code since n_vocab is model.hparams.n_vocab (#262) 1 year ago
Justin Suess e03e359730
fixed warning with std::ignore about unused function result (#151)
fixed warning with std::ignore about unused function result
1 year ago
Gary Linscott a81d0c2a17
Fix n^2 loop in tokenization (#254)
This causes long prompts to parse very slowly.
1 year ago
anzz1 b2de7f18df
CI Improvements (#230)
* CI Improvements

Manual build feature, autoreleases for Windows

* better CI naming convention

use branch name in releases and tags
1 year ago
Niklas Korz a292747893
Nix flake (#40)
* Nix flake

* Nix: only add Accelerate framework on macOS

* Nix: development shell, direnv and compatibility

* Nix: use python packages supplied by withPackages

* Nix: remove channel compatibility

* Nix: fix ARM neon dotproduct on macOS

---------

Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
1 year ago
thement c9f670a177
Implement non-greedy tokenizer that tries to maximize token lengths (#242)
* Implement non-greedy tokenizer that tries to maximize token lengths

* Insert single space in front of the prompt

- this is to match original llama tokenizer behavior

---------

Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>
1 year ago
Georgi Gerganov 4f54609110
Default to 4 threads (#243) 1 year ago
Georgi Gerganov e81b9c81c1
Update Contributing section 1 year ago
Stephan Walter 367946c668
Don't tell users to use a bad number of threads (#243)
The readme tells people to use the command line option "-t 8", causing 8
threads to be started. On systems with fewer than 8 cores, this causes a
significant slowdown. Remove the option from the example command lines
and use /proc/cpuinfo on Linux to determine a sensible default.
1 year ago
mmyjona 6b0df5ccf3
add pthread link to fix cmake build under linux (#114)
* add pthread link to fix cmake build under linux

* add cmake to linux and macos platform

* separate make and cmake workflow

---------

Co-authored-by: Sebastián A <sebastian.aedo29@gmail.com>
1 year ago
Bernat Vadell 2af23d3043
🚀 Dockerize llamacpp (#132)
* feat: dockerize llamacpp

* feat: split build & runtime stages

* split dockerfile into main & tools

* add quantize into tool docker image

* Update .devops/tools.sh

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* add docker action pipeline

* change CI to publish at github docker registry

* fix name runs-on macOS-latest is macos-latest (lowercase)

* include docker versioned images

* fix github action docker

* fix docker.yml

* feat: include all-in-one command tool & update readme.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Matvey Soloviev 904d2a8d6a
Q4_1 quantization (#193)
* Add AVX2 version of ggml_vec_dot_q4_1

* Small optimisations to q4_1 dot product (@Const-me)

* Rearrange Q4_1 quantization to work for multipart models. (Fix #152)

* Fix ggml_vec_mad_q4_1 too

* Fix non-vectorised q4_1 vec mul
1 year ago
Georgi Gerganov 721311070e
Update README.md 1 year ago
Georgi Gerganov ac15de7895
Expand "Contributing" section 1 year ago
Georgi Gerganov 273abc47ff
Update hot topics - RMSnorm 1 year ago
Nebula 9b4a15b17d
Fix RMS norm in GGML (#191) 1 year ago
hoangmit 6eac39ba95
Add RMS norm and use it (#187)
* add ggml_rms_norm

* update op num
1 year ago
moritzbrantner 27944c4206
fixed typo (#178) 1 year ago
Rickey Bowers Jr 2d15d6c9a9
add SIGINT support for _WIN32 environments (#120)
* add SIGINT support for _WIN32 environments

* perhaps more consistent
1 year ago
Justin Suess 2d64715ad4
added ctx_size parameter (#148)
* added ctx_size parameter

* added it in more places

* Apply suggestions from code review

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Justin Suess 16b2c61a22
fixed color reset on exit (#149)
* fixed color reset on exit

* added sigint handler for ansi_color_reset

* Update main.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Musab Gultekin 977295c700
Fix potential licensing issue (#126)
* Update README.md

* Update README.md

remove facebook
1 year ago
Ronsor 956dfda8ad
Use `tokenizer.vocab_size()` instead of hardcoding 32000 in convert-pth-to-ggml.py (#142)
There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.
1 year ago
hoangmit 113e685d18
inline -> static inline for "bytesFromNibbles" (#161)
Without "static" prefix, it fails to compile in clang
1 year ago
Ronsor 47857e564c
Don't use vdotq_s32 if it's not available (#139)
* Don't use vdotq_s32 if it's not available

`dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available.

Reintroduces the code removed in 84d9015 if `__ARM_FEATURE_DOTPROD` isn't defined.

* Update ggml.c

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Radoslav Gerganov 60f819a2b1
Add section to README on how to run the project on Android (#130) 1 year ago
Georgi Gerganov 97ab2b2578
Add Misc section + update hot topics + minor fixes 1 year ago
Sebastián A 2f700a2738
Add windows to the CI (#98) 1 year ago
Georgi Gerganov c09a9cfb06
CMake build in Release by default (#75) 1 year ago
Georgi Gerganov 7ec903d3c1
Update contribution section, hot topics, limitations, etc. 1 year ago
Georgi Gerganov 4497ad819c
Print system information 1 year ago
Sebastián A ed6849cc07
Initial support for CMake (#75) 1 year ago
Thomas Klausner 41be0a3b3d
Add NetBSD support. (#90) 1 year ago
Pavol Rusnak 671d5cac15
Use fprintf for diagnostic output (#48)
keep printf only for printing model output

one can now use ./main ... 2>/dev/null to suppress any diagnostic output
1 year ago
Georgi Gerganov 84d9015c4a
Use vdotq_s32 to improve performance (#67)
* 10% performance boost on ARM

* Back to original change
1 year ago
uint256_t 63fd76fbb0
Reduce model loading time (#43)
* Use buffering

* Use vector

* Minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Val Kharitonov 2a20f48efa
Fix UTF-8 handling (including colors) (#79) 1 year ago
Pavol Rusnak d1f224712d
Add quantize script for batch quantization (#92)
* Add quantize script for batch quantization

* Indentation

* README for new quantize.sh

* Fix script name

* Fix file list on Mac OS

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Georgi Gerganov 1808ee0500
Add initial contribution guidelines 1 year ago
Matvey Soloviev a169bb889c
Gate signal support on being on a unixoid system. (#74) 1 year ago
Matvey Soloviev 460c482540
Fix token count accounting 1 year ago
Georgi Gerganov c80e2a8f2a
Revert "10% performance boost on ARM"
This reverts commit 113a9e83eb.

There are some reports of illegal instruction errors.
Moved this stuff to the vdotq_s32 branch until resolved
1 year ago
Georgi Gerganov 54a0e66ea0
Check for vdotq_s32 availability 1 year ago
Georgi Gerganov 543c57e991
Amend to previous commit - forgot to update non-QRDMX branch 1 year ago