2018.07.17 20:40:24 (1019290739791089664) from Daniel J. Bernstein, replying to "Vlad Krasnov πͺπΊπ¦ (@thecomp1ler)" (1018304423020228608):
Thanks for the information. https://sorting.cr.yp.to/refs.html now has a link. Do I correctly understand that the paper was in 2016 and the code release was in 2017? I have a PIC port of your code doing n=1024 in about 20000 Haswell cycles; does this match the speed that you expect?
2018.07.14 19:11:00 (1018181077448515585) from Daniel J. Bernstein:
"Sorting integer arrays: security, speed, and verification." Slides for first djbsort talk now available: https://cr.yp.to/talks.html#2018.07.11
2018.07.14 23:01:06 (1018238983287918592) from "Vlad Krasnov πͺπΊπ¦ (@thecomp1ler)":
To be fair AVX quick sort and radix sort are not very well optimized for small inputs. However http://www.vldb.org/pvldb/1/1454171.pdf does SIMD merge networks.
2018.07.15 01:46:49 (1018280689224085504) from Daniel J. Bernstein, replying to "Vlad Krasnov πͺπΊπ¦ (@thecomp1ler)" (1018238983287918592):
Thanks for the reference. There are vectorized sorting networks in the literature as far back as 1987; see https://sorting.cr.yp.to/refs.html. People use these on GPUs. But the conventional wisdom has been that other sorting algorithms are faster on typical CPUs.
2018.07.15 03:21:08 (1018304423020228608) from "Vlad Krasnov πͺπΊπ¦ (@thecomp1ler)":
Why my paper is in harder to verify? The code is on github: https://github.com/vkrasnov/avx_qsort