- replaced manual unrolling with loop structures and constant arrays
- instruction count reduced from 12445 to 4016
- roughly 1 to 2% performance loss on some benchmarks, but take this
number with a grain of salt.
* optimize `test_ct_intlog2` while still covering all 128 bit positions
* refactor whirlpool to reduce code bloat
Replaced the fully unrolled round loop with a runtime loop, reducing the
instruction count by 80k in `process_block` and yielding an approx. 30%
performance boost due to improved cache locality.
* use compile-time arrays for `test_ct_intlog2`
* blake3: initial unit test passing!
* typo
* add key derivation macros; vanilla unit tests; working to test 18 atm
* mark it here - all tests passing o_o
* finish first-round unit tests - will add more if necessary
* add crypto shootout bench entertainment
* tests: add XOF unit w/ seek; assert NO finalization
* add another to finalizations unit test
* add all BLAKE3 scaffolding for later SIMD optimizations
* irksome
* tabs
* tabs2
* extra documentation / contracts
* extra detail
* try to make things a bit more arch-neutral
* release notes
* Formatting
---------
Co-authored-by: Christoffer Lerno <christoffer@aegik.com>
* ChaCha20 implementation, first pass
* fix bug with clone_slice when length is 0
* final ChaCha20 crypto tidying
* final adjustments; add benchmark
* add guards everywhere else or w/e
* stdlib 'i++' conformity
* release notes & security warning updates
* update tests; cleanup; default counter should be 0 not 1
* remove prints in test file
* add extra unit tests for unaligned buffers
Co-authored-by: Manu Linares <mbarriolinares@gmail.com>
* one final alignment test
* nice contraction of tests w/ some paranoia sprinkled in
* nearly double the efficiency of chacha20's transform
Co-authored-by: Manu Linares <mbarriolinares@gmail.com>
* fix memory leak in test case
* improve one of the unit tests to cover more cases
* greatly simplify chacha20 'transform'
Co-authored-by: Manu Linares <mbarriolinares@gmail.com>
---------
Co-authored-by: Manu Linares <mbarriolinares@gmail.com>
* Add LinkedList Operators and Update Tests
* add linkedlist printing and `@new` macros (single-line init and pool-capable)
* add linkedlist node and reg iterator; comparisons w/ ==
* Fix benchmarks. Drop random access to the linked list using []. Only return a direct array view.
---------
Co-authored-by: Christoffer Lerno <christoffer@aegik.com>
* Add form-feed and vertical tab to `trim` defaults
* add some initial string-based benchmarking
* update to non-const string
* do not account for mem times in bench
* misc bench fixes to repair reporting times; improve trim tests
* ok last one for real... remove (void) casts
* finally, swap to more efficient default whitespace order in `trim`
* simplify and add much faster hash functions in key locations
* add benchmark runtime @start and @end macros for better control
* update benchmark reporting and hashmap tests
---------
Co-authored-by: Christoffer Lerno <christoffer@aegik.com>
* sort: extract partition from quicksort
Extract the partition logic from quicksort into a macro. This allows
reusing the partition logic for, e.g., the quickselect algorithm.
* sort: implement quickselect
Implement Hoare's selection algorithm (quickselect) on the basis of the
already implemented quicksort. Quickselect finds the kth smallest
element in an unordered list with an average time complexity of
O(N) (worst case: O(N^2)).
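For illustration only, here is a minimal sketch in plain C (not C3, and not the stdlib code) of how a shared partition step can drive quickselect. The names `swap_int`, `partition`, and `quickselect`, the `int` element type, the middle-element pivot, and the Lomuto-style partition are all assumptions made for this sketch; the actual macro may use a different partition scheme and signature.

```c
#include <stddef.h>

/* Swap two ints in place. */
static void swap_int(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/*
 * Hypothetical partition helper (Lomuto scheme, assumed for this sketch).
 * Partitions the inclusive range [lo, hi] around the middle element and
 * returns the pivot's final index: everything left of it is smaller,
 * everything right of it is greater than or equal.
 */
static size_t partition(int *v, size_t lo, size_t hi)
{
    size_t mid = lo + (hi - lo) / 2;
    swap_int(&v[mid], &v[hi]);          /* park the pivot at the end */
    int pivot = v[hi];
    size_t store = lo;
    for (size_t i = lo; i < hi; i++) {
        if (v[i] < pivot) {
            swap_int(&v[i], &v[store]);
            store++;
        }
    }
    swap_int(&v[store], &v[hi]);        /* move the pivot into place */
    return store;
}

/*
 * Returns the kth smallest element (0-based) of v[0..n-1].
 * Requires n > 0 and k < n. Reorders v as a side effect.
 * Unlike quicksort, only the side containing index k is processed further.
 */
static int quickselect(int *v, size_t n, size_t k)
{
    size_t lo = 0;
    size_t hi = n - 1;
    while (lo < hi) {
        size_t p = partition(v, lo, hi);
        if (k == p) {
            return v[k];
        }
        if (k < p) {
            hi = p - 1;                 /* p > k >= 0, so no underflow */
        } else {
            lo = p + 1;
        }
    }
    return v[k];                        /* range shrank to the single index k */
}
```

Because each iteration discards the side that cannot contain index k, the expected work is linear, while consistently bad pivots lead to the quadratic worst case noted above.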
* add quicksort benchmark
Create a top-level benchmarks folder. Add the benchmark implementation
for the quicksort algorithm.
Benchmarks can then be run in the same way as unit tests from the
root folder with:
c3c compile-benchmarks benchmarks/stdlib/sort