"Thread Local Storage" is global memory that is local to a thread. That is, there is one copy per thread, instead of one copy per process. (Well yes, it's an OxyMoron, but please read on... ;-)
If you want each thread to have its own patch of "global memory" to work on, you ask the system to allocate you a thread local storage index. Each thread can put its own pointer in the thread local storage array, at that index. Whenever a thread asks for the pointer at the given index, it will receive the appropriate pointer for the current thread. Thus, each thread can allocate a chunk of memory, put a pointer to it in the TLS array at the allocated index, and treat it as "global memory" that is local to each thread.
You'll need to downcast this pointer and later deallocate it explicitly (unless it just points back into a stack object at the bottom of the thread's stack, which is a very convenient way to handle it). You'll also need to know that no other module the thread enters happens to require and prepare TLS - e.g. TLS might be used for tracking transactions in a C++ STM system.
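As a rough sketch of the idea in Java: a ThreadLocal object plays the role of the allocated index, get() hands back the calling thread's own object, generics take the place of the downcast, and remove() stands in for the explicit deallocation. The class TlsSketch, the field SCRATCH, and the helper methods are names made up for this illustration, not part of any library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TlsSketch {
    // Hypothetical slot: each thread that touches it lazily gets its own StringBuilder.
    private static final ThreadLocal<StringBuilder> SCRATCH =
            ThreadLocal.withInitial(StringBuilder::new);

    // Appends to the calling thread's private buffer and returns its contents so far.
    static String append(String text) {
        StringBuilder buffer = SCRATCH.get(); // this thread's copy, never another thread's
        buffer.append(text);
        return buffer.toString();
    }

    // Runs the same code on two threads and collects what each one saw in "its" buffer.
    static List<String> runTwoThreads() {
        List<String> results = new ArrayList<>();
        Runnable work = () -> {
            append(Thread.currentThread().getName());
            String seen = append("!");
            synchronized (results) {
                results.add(seen);
            }
            SCRATCH.remove(); // explicit cleanup of this thread's copy
        };
        Thread a = new Thread(work, "A");
        Thread b = new Thread(work, "B");
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        Collections.sort(results); // thread finish order is not deterministic
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads()); // prints [A!, B!]
    }
}
```

Both threads run exactly the same code against the same static field, yet neither sees the other's text, which is the "global memory that is local to a thread" effect.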
See the Microsoft Windows Win32 functions TlsAlloc, TlsSetValue, TlsGetValue, and TlsFree.
See the Java ThreadLocal and InheritableThreadLocal classes at http://java.sun.com/j2se/1.4.2/docs/api/java/lang/ThreadLocal.html
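A small sketch of how the two Java classes differ, assuming nothing beyond the standard library: a plain ThreadLocal starts out empty in a newly created thread, while an InheritableThreadLocal hands the child a copy of the parent's value at thread creation. The class InheritSketch and its members are hypothetical names for this illustration.

```java
class InheritSketch {
    static final ThreadLocal<String> PLAIN = new ThreadLocal<>();
    static final InheritableThreadLocal<String> INHERITED = new InheritableThreadLocal<>();

    // Sets both variables in the calling thread, then reports what a child thread sees.
    static String[] valuesSeenByChild() {
        PLAIN.set("parent");
        INHERITED.set("parent");
        String[] seen = new String[2];
        Thread child = new Thread(() -> {
            seen[0] = PLAIN.get();     // child has no value of its own here
            seen[1] = INHERITED.get(); // child received the parent's value when it was created
        });
        child.start();
        try {
            child.join(); // join makes the child's writes to 'seen' visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        PLAIN.remove();
        INHERITED.remove();
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = valuesSeenByChild();
        System.out.println(seen[0] + " " + seen[1]); // prints "null parent"
    }
}
```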
See also: ThreadLocalVariable, ExplicitManagementOfImplicitContext