llama.cpp

Commit Graph

Author	SHA1	Message	Date
Georgi Gerganov	7a32fcb3b2	ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179)	1 year ago
Stephan Walter	c50b628810	Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122)	1 year ago
unbounded	5f939498d5	ggml : unit test for quantization functions (#953)	1 year ago

Georgi Gerganov

7a32fcb3b2

ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179)

* ggml : add Q8_0 quantization format (rename the old one to Q8_1)

* tests : fix test-quantize-fns

* ggml : finalize Q8_0 implementation

* ggml : use q4_0_q8_0 and q4_2_q8_0

* ggml : fix Q8_0 dot product bug (ARM)

* ggml : Q8_0 unroll x2

* ggml : fix bug - using wrong block type

* ggml : extend quantize_fns_t with "vec_dot_type"

* ggml : fix Q8_0 to use 255 values out of 256

* ggml : fix assert using wrong QK4_2 instead of QK4_3
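
One plausible reading of "use 255 values out of 256" is a symmetric 8-bit range of [-127, 127], so that -128 is never produced. The sketch below illustrates that idea only and is not ggml's code: the block size `QK` and the names `quantize_block_q8` and `dequantize_block_q8` are assumptions made for this example.

```python
import math

QK = 32  # assumed block size for this sketch


def quantize_block_q8(xs):
    """Quantize one block of floats to (scale, ints) in the symmetric range [-127, 127]."""
    assert len(xs) == QK
    amax = max(math.fabs(x) for x in xs)
    d = amax / 127.0                      # dividing by 127 (not 128) leaves -128 unused
    inv = 1.0 / d if d != 0.0 else 0.0    # an all-zero block quantizes to all zeros
    qs = [max(-127, min(127, int(round(x * inv)))) for x in xs]
    return d, qs


def dequantize_block_q8(d, qs):
    """Reconstruct approximate floats from a scale and quantized integers."""
    return [d * q for q in qs]
```

With this scheme the largest-magnitude input maps to exactly +127 or -127, and every reconstructed value lies within half a quantization step (`d / 2`) of its input.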

Stephan Walter

c50b628810

Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122)

unbounded

5f939498d5

ggml : unit test for quantization functions (#953)

* Unit test for quantization functions

Use the ggml_internal_get_quantize_fn function to loop through all
quantization formats and run a sanity check on the result.

Also add a microbenchmark that times these functions directly without
running the rest of the GGML graph.

* test-quantize-fns: CI fixes

Fix issues uncovered in CI
 - need to use sizes divisible by 32*8 for loop unrolling
 - use intrinsic header that should work on Mac

* test-quantize: remove

Per PR comment, subsumed by test-quantize-fns

* test-quantize: fix for q8_0 intermediates
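
The commit above describes looping over every quantization format and running a sanity check on the result, with input sizes divisible by 32*8. A minimal sketch of that pattern follows. It does not use the real ggml API: the `FORMATS` registry, the two symmetric quantizers, and the error thresholds are all hypothetical stand-ins chosen for illustration.

```python
import math
import random


def make_symmetric_quantizer(bits, block=32):
    """Build a quantize-then-dequantize function for a hypothetical symmetric block format."""
    qmax = (1 << (bits - 1)) - 1

    def roundtrip(xs):
        out = []
        for i in range(0, len(xs), block):
            blk = xs[i:i + block]
            amax = max(abs(x) for x in blk)
            d = amax / qmax
            inv = 1.0 / d if d else 0.0
            out.extend(d * max(-qmax, min(qmax, round(x * inv))) for x in blk)
        return out

    return roundtrip


# Hypothetical registry: name -> (round-trip function, maximum allowed RMSE).
FORMATS = {
    "sym4": (make_symmetric_quantizer(4), 0.1),
    "sym8": (make_symmetric_quantizer(8), 0.01),
}


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def run_sanity_checks(n=32 * 8 * 4, seed=0):
    """Round-trip random data through every registered format and compare against its threshold."""
    assert n % (32 * 8) == 0  # mirrors the size constraint mentioned in the commit message
    rng = random.Random(seed)
    data = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    results = {}
    for name, (roundtrip, max_err) in FORMATS.items():
        err = rmse(data, roundtrip(data))
        results[name] = (err, err <= max_err)
    return results
```

Calling `run_sanity_checks()` returns a dictionary mapping each format name to its measured error and a pass/fail flag, so a newly registered format is checked without any change to the harness.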

3 Commits (f0d70f147d969e41fa410b8af2965a27aa901eb9)