3 Commits (f4d277ae17247ee51129ef1a9ff74d377cc90b1b)
Author | SHA1 | Message | Date |
---|---|---|---|
comex | f963b63afa | Rewrite loading code to try to satisfy everyone:<br>- Support all three formats (ggml, ggmf, ggjt). (However, I didn't include the hack needed to support GPT4All files without conversion. Those can still be used after converting them with convert.py from my other PR.)<br>- Support both mmap and read (mmap is used by default, but can be disabled with `--no-mmap`, and is automatically disabled for pre-ggjt files or on platforms where mmap is not supported).<br>- Support multi-file models like before, but automatically determine the number of parts rather than requiring `--n_parts`.<br>- Improve validation and error checking.<br>- Stop using the per-file type field (f16) entirely in favor of just relying on the per-tensor type/size fields. This has no immediate benefit, but makes it easier to experiment with different formats, and should make it easier to support the new GPTQ-for-LLaMa models in the future (I have some work in progress on that front).<br>- Support VirtualLock on Windows (using the same `--mlock` option as on Unix).<br>- Indicate loading progress when using mmap + mlock. (Which led me to the interesting observation that on my Linux machine, with a warm file cache, mlock actually takes some time, whereas mmap without mlock starts almost instantly...)<br>- To help implement this, move mlock support from ggml to the loading code.<br>- madvise/PrefetchVirtualMemory support (based on #740)<br>- Switch from ifstream to the `fopen` family of functions to avoid unnecessary copying and, when mmap is enabled, allow reusing the same file descriptor for both metadata reads and mmap (whereas the existing implementation opens the file a second time to mmap).<br>- Quantization now produces a single-file output even with multi-file inputs (not really a feature as much as 'it was easier this way').<br><br>Implementation notes: I tried to factor the code into more discrete pieces than before.<br><br>Regarding code style: I tried to follow the code style, but I'm naughty and used a few advanced C++ features repeatedly:<br>- Destructors to make it easier to ensure everything gets cleaned up.<br>- Exceptions. I don't even usually use exceptions when writing C++, and I can remove them if desired... but here they make the loading code much more succinct while still properly handling a variety of errors, ranging from API calls failing to integer overflow and allocation failure. The exceptions are converted to error codes at the API boundary.<br><br>Co-authored-by: Pavol Rusnak <pavol@rusnak.io> (for the bit I copied from #740) | 2 years ago |
Georgi Gerganov | 03f7e33560 | Cleanup STL headers + fix embedding examples + minor stuff | 2 years ago |
Georgi Gerganov | a316a425d0 | Overhaul the examples structure<br>- main -> examples<br>- utils -> examples (renamed to "common")<br>- quantize -> examples<br>- separate tools for "perplexity" and "embedding"<br><br>Hope I didn't break something ! | 2 years ago |
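
The message for f963b63afa mentions two C++ techniques: destructors that guarantee cleanup, and exceptions that are converted to error codes at the API boundary. The following is a minimal sketch of that general pattern, not the code from the commit. The names `file_handle`, `read_u32`, and `load_magic` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper around the fopen family: the destructor closes
// the file on every exit path, including when an exception propagates.
struct file_handle {
    std::FILE * fp;

    explicit file_handle(const char * path) : fp(std::fopen(path, "rb")) {
        if (fp == nullptr) {
            throw std::runtime_error(std::string("failed to open ") + path);
        }
    }

    ~file_handle() { std::fclose(fp); }

    // Non-copyable, so the file cannot be closed twice.
    file_handle(const file_handle &) = delete;
    file_handle & operator=(const file_handle &) = delete;

    // Reads one 32-bit value in native byte order; throws on a short read.
    std::uint32_t read_u32() {
        assert(fp != nullptr); // invariant established by the constructor
        std::uint32_t value = 0;
        if (std::fread(&value, sizeof(value), 1, fp) != 1) {
            throw std::runtime_error("unexpected end of file");
        }
        return value;
    }
};

// Hypothetical C-style entry point (the "API boundary"): internal code may
// throw, but callers only ever see an error code. Returns 0 on success and
// 1 on any failure.
int load_magic(const char * path, std::uint32_t * out_magic) {
    try {
        file_handle file(path);
        *out_magic = file.read_u32();
        return 0;
    } catch (const std::exception & err) {
        std::fprintf(stderr, "load error: %s\n", err.what());
        return 1;
    }
}
```

A caller would pass a path and a pointer to receive the first four bytes of the file, then check the return value. Failures such as a missing file or a truncated header surface as a nonzero code instead of an exception crossing the boundary.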