2022.08.30 03:56:14 (1564431769255849985) from Daniel J. Bernstein, replying to "covorigin (@covorigin)" (1564304353934344192):
There's a tunable parameter: how many pages the process is allowed to read without having the pages checked first. For those pages, I agree that it might be good for errors caught in subsequent scrubbing to terminate the process, but I'm not sure people will be ok with this.
2022.08.30 04:00:22 (1564432812551286785) from Daniel J. Bernstein:
What's attractive about zero access, with a page fault (in the OS sense) checking the page for faults (in the hardware sense) and only then allowing a read, is that there's a pure reduction in the error rate; nobody saying "you terminated my movie player because a pixel flipped".
2022.08.29 13:25:45 (1564212707745665027) from Daniel J. Bernstein:
Given the current reality of desktops/laptops/smartphones almost never having ECC RAM, I'd love to see more operating-system support for periodically sweeping through pages to detect and correct errors, storing (say) 14 bytes of error-correction data for each 4096-byte page.
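The thread does not say which code the 14-byte figure assumes. One reading that happens to fit, offered here only as an assumption, is a binary BCH code: correcting t bit errors in a block of at most 2^m - 1 bits needs at most m*t check bits. The helper name `bch_fit` below is hypothetical; the sketch just checks the arithmetic for a 4096-byte page.

```python
PAGE_BYTES = 4096    # page size from the thread
BUDGET_BYTES = 14    # error-correction bytes per page from the thread


def bch_fit(data_bits, budget_bits):
    """Smallest field degree m, and the t it affords, within the budget.

    Uses the standard bound that a t-error-correcting binary BCH code of
    length at most 2**m - 1 needs at most m * t check bits.
    Returns (m, t), or None if nothing fits.
    """
    for m in range(1, budget_bits + 1):
        t = budget_bits // m
        # Data bits plus check bits must fit in one code block.
        if data_bits + m * t <= 2**m - 1:
            return m, t
    return None


m, t = bch_fit(PAGE_BYTES * 8, BUDGET_BYTES * 8)
overhead = BUDGET_BYTES / PAGE_BYTES
print(f"m={m}, t={t}, check bits={m * t}, overhead={overhead:.2%}")
# prints: m=16, t=7, check bits=112, overhead=0.34%
```

Under that assumption, 14 bytes per page would be enough to correct up to 7 flipped bits per page, at a memory cost of about a third of a percent.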
2022.08.29 13:44:13 (1564217352517873665) from Daniel J. Bernstein:
It's hard for the OS to do anything useful to correct errors in pages being actively written, but that's not most pages at most times. The OS can try marking a page as read-only (or, more robust, zero access); compute the error-correction bits; periodically check for errors.
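A minimal user-space sketch of the compute-then-periodically-check loop, under loud assumptions: pages are plain `bytearray` objects rather than real OS pages, the stored check data is itself assumed intact, and the code is a toy single-error-correcting, double-error-detecting scheme (XOR of set-bit positions plus one parity bit), much weaker than what 14 bytes per page could buy. The names `check_data`, `scrub`, and `sweep` are hypothetical.

```python
PAGE_BYTES = 4096


def check_data(page):
    """Return (XOR of 1-based indices of set bits, parity of set-bit count)."""
    pos_xor = 0
    parity = 0
    for byte_index, byte in enumerate(page):
        for bit in range(8):
            if (byte >> bit) & 1:
                pos_xor ^= byte_index * 8 + bit + 1
                parity ^= 1
    return pos_xor, parity


def scrub(page, stored):
    """Compare a page with its stored check data; fix one flipped bit in place.

    Returns "clean", "corrected", or "uncorrectable". Three or more flips
    can be miscorrected; this toy code does not try to handle them.
    """
    pos_xor, parity = check_data(page)
    diff = pos_xor ^ stored[0]
    parity_changed = parity != stored[1]
    if diff == 0 and not parity_changed:
        return "clean"
    if parity_changed and 1 <= diff <= len(page) * 8:
        # A single flip at 1-based position p changes the XOR by exactly p.
        index = diff - 1
        page[index // 8] ^= 1 << (index % 8)
        return "corrected"
    return "uncorrectable"


def sweep(pages, checks):
    """Scrub every page that currently has check data.

    In a real OS these would be the pages marked read-only or zero access;
    a page being actively written would have its check data dropped.
    """
    return {number: scrub(pages[number], checks[number]) for number in checks}
```

A caller would compute `check_data` when a page is write-protected, drop the entry when a write fault makes the page writable again, and run `sweep` on a timer.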
2022.08.29 19:29:56 (1564304353934344192) from "covorigin (@covorigin)":
It could also use the dirty bit to see whether there was a write (if so, it must recalculate without checking against the previous value). It could use the accessed bit to decide whether the application should be terminated when corruption is detected: that bit shows whether the app was exposed to the error.
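The dirty-bit and accessed-bit logic can be sketched as a small decision function. This is only an illustration: `Action` and `scrub_decision` are hypothetical names, and the sketch assumes both bits were cleared at the moment the check data was last computed. Whether an exposed process should actually be terminated is a policy choice left to the caller.

```python
from enum import Enum


class Action(Enum):
    RECOMPUTE = "page was written; recompute check data without comparing"
    OK = "check data matches; nothing to do"
    REPAIR = "error found, page not read since last check; repair quietly"
    REPAIR_EXPOSED = "error found, page was read; process may have seen it"


def scrub_decision(dirty, accessed, mismatch):
    """Pick an action for one page from its dirty/accessed bits.

    dirty:    the page was written since check data was computed
    accessed: the page was read or written since check data was computed
    mismatch: the page no longer matches its stored check data
    """
    if dirty:
        # Old check data describes old contents, so a comparison is meaningless.
        return Action.RECOMPUTE
    if not mismatch:
        return Action.OK
    if accessed:
        return Action.REPAIR_EXPOSED
    return Action.REPAIR
```

For example, `scrub_decision(dirty=False, accessed=True, mismatch=True)` yields `Action.REPAIR_EXPOSED`, the only case in which the application could have consumed corrupted data.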