An updated version of these implementation notes is published at http://dmitryvk.github.com/sbcl-win32-threads/implementation-notes.html
Suspend
Thread suspension is implemented with safepoints. A safepoint is implemented as a read from a memory location (the 'GC poll address', which is located in the 'GC poll page').
In the first phase, the 'master' thread unmaps the GC poll page. After this, the other threads will eventually get page faults. There are several issues that must be dealt with:
1) The reaction to unmapping is not immediate - a thread must first reach a safepoint
2) Some threads will not reach a safepoint soon (if the thread is executing foreign code or a blocking system call)
3) Even if a thread has reached a safepoint, it does not mean that GC can start. The thread may be inside a WITHOUT-GCING section, for example. In this case, the thread must not be resumed while the GC poll page is still unmapped.
We can draw some conclusions:
1) Every thread that can reach a safepoint (i.e. it is not in foreign code or in a blocking syscall) must reach it before GC can proceed.
1.a) Every thread that cannot reach a safepoint must not interfere with GC if it suddenly returns to Lisp code.
2) After all threads have reached a safepoint, we must wait for all of them to be ready for GC.
This implies two-phase suspend.
Phase 1:
1) GC poll page is remapped as unreadable
2) The master thread checks each thread: if it is running Lisp code, the master waits until it reaches a safepoint. A thread is considered to have reached a safepoint when its state is STATE_SUSPENDED_BRIEFLY.
Phase 2:
1) GC poll page is mapped again so that threads can run until they are ready for GC
2) The master thread waits for every thread to be ready for GC. This is achieved by waiting for the state of every thread to become STATE_SUSPENDED (except for threads that are already ready for GC)
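The two phases above can be sketched as a small single-process simulation. This is only an illustration under stated assumptions, not the runtime's actual code: the `Thread` and `World` classes, the `RUNNING` state, the `stop_requested` and `in_foreign_code` flags, and the `without_gcing_steps` counter are hypothetical, and a thread in foreign code is assumed to be ready for GC already.

```python
from dataclasses import dataclass

RUNNING = "RUNNING"  # hypothetical name for "not suspended"
STATE_SUSPENDED_BRIEFLY = "STATE_SUSPENDED_BRIEFLY"
STATE_SUSPENDED = "STATE_SUSPENDED"


@dataclass
class Thread:
    name: str
    state: str = RUNNING
    in_foreign_code: bool = False   # hypothetical: such a thread never polls
    without_gcing_steps: int = 0    # hypothetical: steps left inside WITHOUT-GCING


class World:
    def __init__(self, threads):
        self.threads = threads
        self.poll_page_readable = True
        self.stop_requested = False

    def run_step(self, t):
        """Let one thread that runs Lisp code make one unit of progress."""
        if t.in_foreign_code or t.state != RUNNING:
            return
        if not self.poll_page_readable:
            # The read of the GC poll address faults: safepoint reached.
            t.state = STATE_SUSPENDED_BRIEFLY
        elif self.stop_requested:
            if t.without_gcing_steps > 0:
                t.without_gcing_steps -= 1   # still not ready for GC
            else:
                t.state = STATE_SUSPENDED    # ready for GC


def suspend_for_gc(world):
    """Simulated master thread: run phase 1, then phase 2."""
    lisp_threads = [t for t in world.threads if not t.in_foreign_code]
    world.stop_requested = True

    # Phase 1: make the poll page unreadable and wait for every thread
    # that runs Lisp code to reach a safepoint.
    world.poll_page_readable = False
    while any(t.state != STATE_SUSPENDED_BRIEFLY for t in lisp_threads):
        for t in lisp_threads:
            world.run_step(t)

    # Phase 2: map the page again, release the threads, and wait until
    # each of them has become ready for GC.
    world.poll_page_readable = True
    for t in lisp_threads:
        t.state = RUNNING
    while any(t.state != STATE_SUSPENDED for t in lisp_threads):
        for t in lisp_threads:
            world.run_step(t)


threads = [
    Thread("a"),
    Thread("b", without_gcing_steps=2),
    Thread("c", in_foreign_code=True),
]
world = World(threads)
suspend_for_gc(world)
print([(t.name, t.state) for t in threads])
```

In this model thread "b" is released in phase 2 and keeps running for two more steps before it suspends, which is the reason the poll page has to be readable again during that phase.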
A thread is ready for GC if either:
1) thread_state(thread) == STATE_SUSPENDED, or
2) the thread is running foreign code, is not inside WITHOUT-GCING or WITHOUT-INTERRUPTS, and has blockable signals unblocked
For this, the thread-local variable *GC-SAFE* is introduced - it tracks a thread's current readiness for GC. It is guaranteed that when *GC-SAFE* changes from NIL to T, the thread checks whether GC is in progress and, if so, enters the suspended state.
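The readiness test and the *GC-SAFE* transition can be written down as a short predicate. This is a minimal sketch: the `Thread` fields, the `STATE_RUNNING` name, and the `set_gc_safe` helper are hypothetical stand-ins, with `gc_safe` playing the role of *GC-SAFE*.

```python
from dataclasses import dataclass

STATE_RUNNING = "STATE_RUNNING"      # hypothetical name
STATE_SUSPENDED = "STATE_SUSPENDED"


@dataclass
class Thread:
    state: str = STATE_RUNNING
    in_foreign_code: bool = False
    in_without_gcing: bool = False
    in_without_interrupts: bool = False
    blockable_signals_blocked: bool = False
    gc_safe: bool = False            # stands in for *GC-SAFE*


def ready_for_gc(t):
    """True if either readiness criterion holds for thread t."""
    if t.state == STATE_SUSPENDED:
        return True
    return (t.in_foreign_code
            and not t.in_without_gcing
            and not t.in_without_interrupts
            and not t.blockable_signals_blocked)


def set_gc_safe(t, value, gc_in_progress):
    """Update the flag; on a NIL -> T change, check for a GC in progress."""
    became_safe = value and not t.gc_safe
    t.gc_safe = value
    if became_safe and gc_in_progress:
        t.state = STATE_SUSPENDED


t = Thread(in_foreign_code=True, in_without_gcing=True)
print(ready_for_gc(t))               # not ready while inside WITHOUT-GCING
t.in_without_gcing = False
set_gc_safe(t, True, gc_in_progress=True)
print(t.state, ready_for_gc(t))
```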
Thread interruption is similar, but we don't need to wait for all threads to reach a safepoint - only the interrupted thread has to reach one.
Phase 1:
1) GC poll page is remapped as unreadable
2) The master thread checks the interrupted thread: if it is running Lisp code, the master waits until it reaches a safepoint. A thread is considered to have reached a safepoint when its state is STATE_SUSPENDED_BRIEFLY.
Phase 2:
1) GC poll page is mapped again
2) All threads that have reached a safepoint are released
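The interruption variant can be simulated in the same style. The sketch below is an illustration only: the `Thread` class, the `steps_until_poll` counter, and the `should_interrupt` and `in_foreign_code` flags are hypothetical. It shows that the master waits only for the target, while any other thread that happened to fault in the meantime is released together with it.

```python
from dataclasses import dataclass

RUNNING = "RUNNING"  # hypothetical name for "not suspended"
STATE_SUSPENDED_BRIEFLY = "STATE_SUSPENDED_BRIEFLY"


@dataclass
class Thread:
    name: str
    state: str = RUNNING
    steps_until_poll: int = 1        # hypothetical: steps before the next poll
    in_foreign_code: bool = False    # hypothetical: such a thread never polls
    should_interrupt: bool = False   # hypothetical: set by the master


def run_step(t, poll_page_readable):
    """Advance one thread; it stops if it polls while the page is unreadable."""
    if t.in_foreign_code or t.state != RUNNING:
        return
    t.steps_until_poll -= 1
    if t.steps_until_poll <= 0 and not poll_page_readable:
        t.state = STATE_SUSPENDED_BRIEFLY


def interrupt_thread(threads, target):
    """Simulated master thread; returns the names of the released threads."""
    # Phase 1: make the poll page unreadable and wait for the target only.
    poll_page_readable = False
    target.should_interrupt = True
    if not target.in_foreign_code:
        while target.state != STATE_SUSPENDED_BRIEFLY:
            for t in threads:
                run_step(t, poll_page_readable)

    # Phase 2: map the page again and release every thread that stopped.
    poll_page_readable = True
    released = [t.name for t in threads if t.state == STATE_SUSPENDED_BRIEFLY]
    for t in threads:
        if t.state == STATE_SUSPENDED_BRIEFLY:
            t.state = RUNNING
    return released


a = Thread("a", steps_until_poll=1)
b = Thread("b", steps_until_poll=2)
c = Thread("c", steps_until_poll=5)
print(interrupt_thread([a, b, c], target=b))
```

Thread "c" never reaches its poll before the target does, so it is not stopped at all in this run.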
Safepoint code
Safepoint code is called to check whether a thread has something to do that is related to SBCL's internal workings. It is called:
1) When a thread reaches a safepoint and the GC poll page is unmapped
2) When leaving and entering foreign code
3) On other occasions.
Safepoint code has several responsibilities:
1) If there is a GC or thread interruption in progress, the thread has to notify the master thread that it has reached a safepoint. The safepoint code does this by changing the state to STATE_SUSPENDED_BRIEFLY and waiting for the state to be changed by the master thread. When it resumes, the thread checks whether it should suspend or interrupt.
2) If the thread should suspend, it checks whether it can suspend. If the thread is suspendable, it changes its state to STATE_SUSPENDED; otherwise, it sets STOP_FOR_GC_PENDING (and sets pseudo_atomic_interrupted).
3) If the thread should interrupt, it either sets INTERRUPT_PENDING and pseudo_atomic_interrupted or executes the interruption.
4) If GC is pending and the thread can do GC, it runs the GC.
5) If an interrupt is pending and the thread can execute it, it executes it.
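Responsibilities 2 to 5 can be sketched as one function that runs after the master has released the thread from STATE_SUSPENDED_BRIEFLY. This is a simplified model, not the runtime's code: the `Thread` class, the `gc_inhibited` and `interrupts_inhibited` flags (assumed here to decide both "can suspend" / "can do GC" and "can execute an interruption"), the `gc_pending` field, and the `actions` log are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    state: str = "RUNNING"
    gc_inhibited: bool = False          # hypothetical: e.g. inside WITHOUT-GCING
    interrupts_inhibited: bool = False  # hypothetical: e.g. inside WITHOUT-INTERRUPTS
    gc_pending: bool = False            # hypothetical stand-in for "GC is pending"
    stop_for_gc_pending: bool = False
    interrupt_pending: bool = False
    pseudo_atomic_interrupted: bool = False
    actions: list = field(default_factory=list)  # log of simulated work


def safepoint_after_release(t, should_suspend=False, should_interrupt=False):
    """Model of responsibilities 2-5 for a thread that has just resumed."""
    if should_suspend:                                      # responsibility 2
        if not t.gc_inhibited:
            # A real thread would block here until the master resumes it.
            t.state = "STATE_SUSPENDED"
        else:
            t.stop_for_gc_pending = True
            t.pseudo_atomic_interrupted = True
    if should_interrupt:                                    # responsibility 3
        if not t.interrupts_inhibited:
            t.actions.append("interruption")
        else:
            t.interrupt_pending = True
            t.pseudo_atomic_interrupted = True
    if t.gc_pending and not t.gc_inhibited:                 # responsibility 4
        t.gc_pending = False
        t.actions.append("gc")
    if t.interrupt_pending and not t.interrupts_inhibited:  # responsibility 5
        t.interrupt_pending = False
        t.actions.append("pending interruption")
    return t


deferred = safepoint_after_release(Thread(gc_inhibited=True), should_suspend=True)
print(deferred.state, deferred.stop_for_gc_pending)
```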
On some occasions, the runtime is in a very fragile state and cannot do anything that a safepoint must do (e.g., change thread state, execute GC, execute an interruption). This is the case, e.g., while using Lisp thread synchronization primitives. To control this, the *DISABLE-SAFEPOINTS* variable is used.
GC code is run inside a safepoint, and safepoint code is not reentrant. GC code itself has safepoints (since SUB-GC is a normal Lisp function, it calls Lisp synchronization routines and switches to and from foreign code several times). To prevent re-entering the safepoint code, the *IN-SAFEPOINT* variable is used.
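The effect of the two guard variables can be illustrated with a small sketch. It is an assumption-laden model: `SafepointState`, `safepoint`, and the `entered` counter are hypothetical, and the attributes `disable_safepoints` and `in_safepoint` merely stand in for *DISABLE-SAFEPOINTS* and *IN-SAFEPOINT*.

```python
class SafepointState:
    """Per-thread guard flags (plain attributes in this model)."""

    def __init__(self):
        self.disable_safepoints = False  # stands in for *DISABLE-SAFEPOINTS*
        self.in_safepoint = False        # stands in for *IN-SAFEPOINT*
        self.entered = 0                 # how many times the body actually ran


def safepoint(st, work):
    """Run `work` as the safepoint body unless a guard forbids it."""
    if st.disable_safepoints or st.in_safepoint:
        return False                     # fragile state or nested call: do nothing
    st.in_safepoint = True
    try:
        st.entered += 1
        work()
    finally:
        st.in_safepoint = False          # always clear the guard on the way out
    return True


st = SafepointState()
nested_results = []


def gc_body():
    # GC code hits a safepoint of its own; the guard turns it into a no-op.
    nested_results.append(safepoint(st, lambda: None))


print(safepoint(st, gc_body), nested_results, st.entered)
```

Resetting the flag in a `finally` clause is a choice made for this sketch, so that the guard is cleared even if the body raises.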