llama.cpp

Commit Graph

Author	SHA1	Message	Date
Georgi Gerganov	9e1707218a	Add "--instruct" argument for usage with Alpaca (#240 ) Also start adding prompts in "./prompts"	1 year ago
Georgi Gerganov	22213a17b5	Change RMSNorm eps to 1e-6 (#173 ) I think this is what is used in the Python code	1 year ago
Ronsor	d7def1a752	Warn user if a context size greater than 2048 tokens is specified (#274 ) LLaMA doesn't support more than 2048 token context sizes, and going above that produces terrible results.	1 year ago
Pavol Rusnak	6f61c18ec9	Fix typo in readme	1 year ago
Pavol Rusnak	1e5a6d088d	Add note about Python 3.11 to readme	1 year ago
Pavol Rusnak	554b541521	Add memory/disk requirements to readme	1 year ago
Alex Nguyen	d3f202d57b	Remove unused code since n_vocab is model.hparams.n_vocab (#262 )	1 year ago
Justin Suess	e03e359730	fixed warning with std::ignore about unused function result (#151 ) fixed warning with std::ignore about unused function result	1 year ago
Gary Linscott	a81d0c2a17	Fix n^2 loop in tokenization (#254 ) This causes long prompts to parse very slowly.	1 year ago
anzz1	b2de7f18df	CI Improvements (#230 ) * CI Improvements Manual build feature, autoreleases for Windows * better CI naming convention use branch name in releases and tags	1 year ago
Niklas Korz	a292747893	Nix flake (#40 ) * Nix flake * Nix: only add Accelerate framework on macOS * Nix: development shel, direnv and compatibility * Nix: use python packages supplied by withPackages * Nix: remove channel compatibility * Nix: fix ARM neon dotproduct on macOS --------- Co-authored-by: Pavol Rusnak <pavol@rusnak.io>	1 year ago
thement	c9f670a177	Implement non-greedy tokenizer that tries to maximize token lengths (#242 ) * Implement non-greedy tokenizer that tries to maximize token lengths * Insert single space in front of the prompt - this is to match original llama tokenizer behavior --------- Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>	1 year ago
Georgi Gerganov	4f54609110	Default to 4 threads (#243 )	1 year ago
Georgi Gerganov	e81b9c81c1	Update Contributing section	1 year ago
Stephan Walter	367946c668	Don't tell users to use a bad number of threads (#243 ) The readme tells people to use the command line option "-t 8", causing 8 threads to be started. On systems with fewer than 8 cores, this causes a significant slowdown. Remove the option from the example command lines and use /proc/cpuinfo on Linux to determine a sensible default.	1 year ago
mmyjona	6b0df5ccf3	add ptread link to fix cmake build under linux (#114 ) * add ptread link to fix cmake build under linux * add cmake to linux and macos platform * separate make and cmake workflow --------- Co-authored-by: Sebastián A <sebastian.aedo29@gmail.com>	1 year ago
Bernat Vadell	2af23d3043	🚀 Dockerize llamacpp (#132 ) * feat: dockerize llamacpp * feat: split build & runtime stages * split dockerfile into main & tools * add quantize into tool docker image * Update .devops/tools.sh Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add docker action pipeline * change CI to publish at github docker registry * fix name runs-on macOS-latest is macos-latest (lowercase) * include docker versioned images * fix github action docker * fix docker.yml * feat: include all-in-one command tool & update readme.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	1 year ago
Matvey Soloviev	904d2a8d6a	Q4_1 quantization (#193 ) * Add AVX2 version of ggml_vec_dot_q4_1 * Small optimisations to q4_1 dot product (@Const-me) * Rearrange Q4_1 quantization to work for multipart models. (Fix #152) * Fix ggml_vec_mad_q4_1 too * Fix non-vectorised q4_1 vec mul	1 year ago
Georgi Gerganov	721311070e	Update README.md	1 year ago
Georgi Gerganov	ac15de7895	Expand "Contributing" section	1 year ago
Georgi Gerganov	273abc47ff	Update hot topics - RMSnorm	1 year ago
Nebula	9b4a15b17d	Fix RMS norm in GGML (#191 )	1 year ago
hoangmit	6eac39ba95	Add RMS norm and use it (#187 ) * add ggml_rms_norm * update op num	1 year ago
moritzbrantner	27944c4206	fixed typo (#178 )	1 year ago
Rickey Bowers Jr	2d15d6c9a9	add SIGINT support for _WIN32 environments (#120 ) * add SIGINT support for _WIN32 environments * perhaps more consistent	1 year ago
Justin Suess	2d64715ad4	added ctx_size parameter (#148 ) * added ctx_size parameter * added it in more places * Apply suggestions from code review --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	1 year ago
Justin Suess	16b2c61a22	fixed color reset on exit (#149 ) * fixed color reset on exit * added sigint handler for ansi_color_reset * Update main.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	1 year ago
Musab Gultekin	977295c700	Fix potential licensing issue (#126 ) * Update README.md * Update README.md remove facebook	1 year ago
Ronsor	956dfda8ad	Use `tokenizer.vocab_size()` instead of hardcoding 32000 in convert-pth-to-ggml.py (#142 ) There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.	1 year ago
hoangmit	113e685d18	inline -> static inline for "bytesFromNibbles" (#161 ) Without "static" prefix, it fails to compile in clang	1 year ago
Ronsor	47857e564c	Don't use vdotq_s32 if it's not available (#139 ) * Don't use vdotq_s32 if it's not available `dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available. Reintroduces the code removed in `84d9015` if `__ARM_FEATURE_DOTPROD` isn't defined. * Update ggml.c --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	1 year ago
Radoslav Gerganov	60f819a2b1	Add section to README on how to run the project on Android (#130 )	1 year ago
Georgi Gerganov	97ab2b2578	Add Misc section + update hot topics + minor fixes	1 year ago
Sebastián A	2f700a2738	Add windows to the CI (#98 )	1 year ago
Georgi Gerganov	c09a9cfb06	CMake build in Release by default (#75 )	1 year ago
Georgi Gerganov	7ec903d3c1	Update contribution section, hot topics, limitations, etc.	1 year ago
Georgi Gerganov	4497ad819c	Print system information	1 year ago
Sebastián A	ed6849cc07	Initial support for CMake (#75 )	1 year ago
Thomas Klausner	41be0a3b3d	Add NetBSD support. (#90 )	1 year ago
Pavol Rusnak	671d5cac15	Use fprintf for diagnostic output (#48 ) keep printf only for printing model output one can now use ./main ... 2>dev/null to suppress any diagnostic output	1 year ago
Georgi Gerganov	84d9015c4a	Use vdotq_s32 to improve performance (#67 ) * 10% performance boost on ARM * Back to original change	1 year ago
uint256_t	63fd76fbb0	Reduce model loading time (#43 ) * Use buffering * Use vector * Minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	1 year ago
Val Kharitonov	2a20f48efa	Fix UTF-8 handling (including colors) (#79 )	1 year ago
Pavol Rusnak	d1f224712d	Add quantize script for batch quantization (#92 ) * Add quantize script for batch quantization * Indentation * README for new quantize.sh * Fix script name * Fix file list on Mac OS --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	1 year ago
Georgi Gerganov	1808ee0500	Add initial contribution guidelines	1 year ago
Matvey Soloviev	a169bb889c	Gate signal support on being on a unixoid system. (#74 )	1 year ago
Matvey Soloviev	460c482540	Fix token count accounting	1 year ago
Georgi Gerganov	c80e2a8f2a	Revert "10% performance boost on ARM" This reverts commit `113a9e83eb`. There are some reports for illegal instruction. Moved this stuff to vdotq_s32 branch until resolve	1 year ago
Georgi Gerganov	54a0e66ea0	Check for vdotq_s32 availability	1 year ago
Georgi Gerganov	543c57e991	Ammend to previous commit - forgot to update non-QRDMX branch	1 year ago

293 Commits (62b3e81aaeafb282934de8b21de13b0104f12f8c)