Well, you can imagine how surprised I was to see someone reinventing Forth.
In an indirect threaded interpreter, each op — let's call them words, for the hell of it — is represented by an index into some table holding an address to the code. (Or, in Forth, an address of a word definition, which contains an address of the code to run.)
In a direct-threaded interpreter, each words is represented by the address of the code itself, so the interpreter — which we'll call NEXT — is basically a repeated indirect jump.
At least one Forth saw the obvious next step in performance: replace each reference to a compiled word with a CALL instruction to its address, instead of the address. (This isn't a huge leap for Forth. Direct-threaded Forths usually have a CALL instruction in the word definition's code slot anyway.) On the machine in question (I don't remember which), they used a within-segment jump instruction, and were able to squeeze it the same space that would have been required for the direct-threaded address.
Evidently, this approach has been rediscovered, under the name "context threading."
Context threading does bring a few significant advancements to the table: it generates the calls at runtime from the original bytecode, and inlines basic flow control among all those CALL instructions. This isn't possible in Forth without a lot of trickery, for the same reason it's difficult in Smalltalk: there are no built-in flow control mechanisms. However, high-performance subroutine-threaded Forths will inline short, common words — and you don't get much shorter or more common than IF/ELSE/THEN.
Speaking of which, context threading seems to rely on the original bytecode to resolve flow control operations, where Forth would simply modify the return address.
(Incidentally: yes, this is one of the two tricks I used in the Mongoose interpreter to improve indirect branch prediction. Forth has a lot of good ideas to steal.)