Unlike other threaded code representations, subroutine threaded code is directly executable by the CPU itself. Instead of a vector of addresses (as with DirectThreadedCode or IndirectThreadedCode), you have a vector of JSR or CALL instructions (for you RISC types, BAL instructions. :D). Most microprocessors pair each JSR or CALL instruction with a matching JMP or branch instruction, and therefore even the simplest of compilers can perform tail-recursion elimination: a call that is immediately followed by a return is emitted as a plain jump. This allowed ChuckMoore's MachineForth compilers (and ColorForth for the Pentium, which uses the same model) to simplify the traditional Forth language by eliminating a number of control flow constructs commonly found in ANSI Forth, since a loop can be written as a word that ends by calling itself. (Footnote: in some respects, MachineForth is to ANSI Forth as Scheme is to CommonLisp.)
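The call-to-jump rewrite can be sketched on a toy machine. This is only an illustration under stated assumptions, not how MachineForth or ColorForth is implemented: the opcode names (CALL, JMP, RET, RET0), the `?ret` conditional-return token, and the helpers `compile_word`, `run`, and `build` are all hypothetical, and real subroutine threaded code consists of native machine instructions rather than Python tuples.

```python
def compile_word(names, tail_calls=True):
    """Compile a word body: one CALL per name, closed by RET.

    "?ret" is a hypothetical conditional return (return if top of stack is 0).
    With tail_calls=True, a final CALL followed by RET is emitted as one JMP.
    """
    code = [("RET0", None) if n == "?ret" else ("CALL", n) for n in names]
    if tail_calls and code and code[-1][0] == "CALL":
        code[-1] = ("JMP", code[-1][1])
    else:
        code.append(("RET", None))
    return code


def run(words, entry, stack):
    """Run `entry` on the toy machine; return the deepest return-stack depth."""
    rstack, deepest = [], 0
    word, pc = entry, 0
    while True:
        body = words[word]
        if callable(body):            # primitive: do the work, then return
            body(stack)
            op, arg = "RET", None
        else:
            op, arg = body[pc]
            pc += 1
        if op == "RET0" and stack[-1] != 0:
            continue                  # condition false: fall through
        if op == "CALL":
            rstack.append((word, pc))
            deepest = max(deepest, len(rstack))
            word, pc = arg, 0
        elif op == "JMP":
            word, pc = arg, 0
        elif not rstack:              # RET (or taken RET0) out of the entry word
            return deepest
        else:
            word, pc = rstack.pop()


def dec(stack):
    """Primitive: decrement the top of the data stack."""
    stack[-1] -= 1


def build(tail_calls):
    # countdown: return when the top of stack is 0, else decrement and repeat
    return {"dec": dec,
            "countdown": compile_word(["?ret", "dec", "countdown"], tail_calls)}


plain = run(build(False), "countdown", [50])
tail = run(build(True), "countdown", [50])
print(plain, tail)
```

Without the rewrite, the return stack grows with every iteration of `countdown`; with it, the word loops using a single return-stack entry (the one needed for the call to `dec`), which is why no separate looping construct is needed here.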
Contrast with DirectThreadedCode, IndirectThreadedCode, and TokenThreadedCode.