When creating threads, we don’t usually think about their stack size. In the native world, the CreateThread function accepts a stack size (second argument), which we usually pass as 0. In the managed world, the Thread class exposes a pair of constructors that accept a stack size argument (something a comment reminded me of).
Why is this important? Creating threads has its costs. This is not only the added work the Windows scheduler must undertake or the data structures that must be allocated in the kernel to manage that thread (KTHREAD, ETHREAD, etc.). Even if the threads are mostly waiting, memory for their stacks is wasted.
When a thread is created, two stacks are actually created: one in user space (lower addresses) and one in kernel (system) space. The latter is very limited in size (12KB in x86, 24KB in x64) and almost always resides in RAM (the reason has to do with interrupt service routines and other high IRQL code, such as DPCs, that are beyond the scope of this post). This stack size cannot be changed in any documented way and, in any case, is only relevant to device driver programmers. We’ll concern ourselves with the user mode stack.
The Native World
When the stack size is specified as 0 in CreateThread, the default value embedded in the PE header is used; unless changed at link time, that value is 1MB. However, that 1MB is not committed in its entirety. Only a single page (4KB) is committed, and the next page in memory is marked with the PAGE_GUARD protection attribute, which causes an exception to be raised when the stack tries to expand beyond that first page. Windows’ memory manager responds by automatically committing the next page and moving the guard page to the following page (technically all downwards, as Intel stacks grow down in addresses, not up). So, what’s the meaning of that 1MB? It is the reserved size, that is, the maximum contiguous memory the thread’s stack can have. Trying to grow beyond that causes a stack overflow exception. (In this brief explanation, I’ve omitted some minor details for clarity.)
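The commit-on-demand behaviour described above can be pictured with a toy model. This is a minimal sketch of the bookkeeping only, not Windows code: the class name StackModel and its members are made up for illustration, and the model deliberately ignores the minor details mentioned above (for example, the space the guard page itself occupies).

```cpp
#include <cstddef>
#include <stdexcept>

// Page size assumed by the model (4KB, as in the text above).
constexpr std::size_t kPageSize = 4 * 1024;

// Toy model of a user-mode stack: tracks reserved vs. committed bytes.
// Hypothetical illustration only; no real memory is reserved or committed.
class StackModel {
public:
    // A new stack reserves `reservedBytes` but commits only one page.
    explicit StackModel(std::size_t reservedBytes)
        : reserved_(reservedBytes), committed_(kPageSize) {}

    // Simulate the thread using `depthBytes` of stack. Every time usage
    // runs past the committed part (i.e. hits the guard page), one more
    // page is committed and the guard moves on.
    void Touch(std::size_t depthBytes) {
        if (depthBytes > reserved_)
            throw std::overflow_error("stack overflow: reservation exhausted");
        while (committed_ < depthBytes)
            committed_ += kPageSize;
    }

    std::size_t Committed() const { return committed_; }
    std::size_t Reserved() const { return reserved_; }

private:
    std::size_t reserved_;
    std::size_t committed_;
};
```

For example, a StackModel created with a 1MB reservation starts with 4KB committed; after Touch(10000) it has three pages committed, while the reserved size never changes.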
Reserving memory is considered an inexpensive operation: no memory is committed, no RAM is wasted, not even page file space. The only thing that happens is the addition of a Virtual Address Descriptor (VAD) that describes the new address region and marks it “reserved”. But (and this is a big but) address space is wasted. That means other allocations (of any kind) have less address space to work with. This is mostly a problem for 32-bit processes, as they are limited to 2GB or 3GB (and sometimes 4GB on a 64-bit system; see my post for more details). 64-bit processes are mostly unaffected, as their address space is vast (~8TB).
Here’s a simple experiment we can try: how many threads can we create in a 32 bit process with 2GB user address space?
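#include <windows.h>
#include <tchar.h>
#include <stdio.h>

// Thread routine: block forever so the thread (and its stack) stays alive.
DWORD WINAPI DoSleep(PVOID) {
    Sleep(INFINITE);
    return 0;
}

int _tmain(int argc, _TCHAR* argv[]) {
    int count = 0;
    DWORD id;
    // Create threads with the default stack size (0) until CreateThread fails.
    do {
        HANDLE h = CreateThread(0, 0, DoSleep, 0, 0, &id);
        if(h == NULL)
            break;
        count++;
    } while(true);
    printf("Total threads: %d\n", count);
    getchar();
    return 0;
}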
Running this on my system yields 1456 threads.
When opening Task Manager and looking at the process memory we find:
The red rectangle indicates the committed size (process-wide). This is definitely less than 1456 * 1MB! But the address space is pretty full: almost 1.5GB just for thread stacks!
Most applications do not require such a large stack (1MB), so we can change that. One way is to change the size globally using a linker option. This will set a different stack size for all threads that specify 0 for the second argument to CreateThread. Here’s the dialog in Visual Studio 2008:
The “Stack Reserve Size” is the relevant option. Let’s change this to 65536 (64KB):
And run it again. This time the result is:
Better than our previous 1456.
The issue is (of course) not the number of threads, as both numbers are ridiculously large. But the saving of address space allows more allocations of a “conventional” nature (malloc, new, VirtualAlloc, HeapAlloc, etc.).
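To put rough numbers on the address space argument, here is a small sketch of the arithmetic. The helper names are my own, and the thread counts are upper bounds only: they assume stacks are the sole consumer of a 2GB address space, which is never the case (images, heaps and per-thread structures live there too), so measured numbers such as the 1456 above will come out lower.

```cpp
#include <cstdint>

constexpr std::uint64_t kKB = 1024;
constexpr std::uint64_t kMB = 1024 * kKB;
constexpr std::uint64_t kGB = 1024 * kMB;

// Address space consumed by the stack reservations alone.
constexpr std::uint64_t ReservedForStacks(std::uint64_t threads,
                                          std::uint64_t reservePerThread) {
    return threads * reservePerThread;
}

// Upper bound on the thread count if stacks were the only thing
// occupying the address space (an idealization, see the text above).
constexpr std::uint64_t MaxThreadsUpperBound(std::uint64_t addressSpace,
                                             std::uint64_t reservePerThread) {
    return addressSpace / reservePerThread;
}

// 1456 threads with 1MB stacks reserve 1456MB, roughly 1.42GB.
static_assert(ReservedForStacks(1456, 1 * kMB) == 1456 * kMB, "reserved total");
// Theoretical ceilings in a 2GB user address space.
static_assert(MaxThreadsUpperBound(2 * kGB, 1 * kMB) == 2048, "1MB stacks");
static_assert(MaxThreadsUpperBound(2 * kGB, 64 * kKB) == 32768, "64KB stacks");
```

The interesting figure is not the ceiling itself but the difference: at the same thread count, a 64KB reservation uses one sixteenth of the address space that a 1MB reservation does.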
The second argument to CreateThread allows changing either the committed or the reserved size of that particular thread’s stack, overriding the default set by the linker option. By default, the value is treated as the initial committed size. To change the reserved size instead, one must specify the STACK_SIZE_PARAM_IS_A_RESERVATION constant (which I find to be a ridiculous and inconsistent name) in the flags argument (the one before last).
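As an illustration of the per-thread override, here is a minimal sketch that asks for a 64KB stack reservation for a single thread. The names Worker, CreateSmallStackThread and kSmallStackReserve are hypothetical, and the #ifdef _WIN32 guard is there only so the file still compiles on other platforms.

```cpp
#include <cstddef>

// Stack reservation requested for the thread (64KB); hypothetical constant.
constexpr std::size_t kSmallStackReserve = 64 * 1024;

#ifdef _WIN32
#include <windows.h>

// Trivial thread routine used for the demonstration.
static DWORD WINAPI Worker(PVOID) {
    return 0;
}

// Creates a thread whose stack RESERVATION is reserveBytes.
// Without STACK_SIZE_PARAM_IS_A_RESERVATION, the same value would be
// interpreted as the initial COMMITTED size instead.
// Returns NULL on failure; the caller owns the returned handle.
HANDLE CreateSmallStackThread(SIZE_T reserveBytes) {
    return CreateThread(
        nullptr,                            // default security attributes
        reserveBytes,                       // stack size (reservation here)
        Worker,                             // thread routine
        nullptr,                            // no argument for the routine
        STACK_SIZE_PARAM_IS_A_RESERVATION,  // flags: size is a reservation
        nullptr);                           // thread id not needed
}
#endif
```

A caller would use it as HANDLE h = CreateSmallStackThread(kSmallStackReserve); and, as with any thread handle, call CloseHandle(h) once the handle is no longer needed.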
What if you have a prebuilt EXE with no source code, and you suspect it’s creating too many threads with large stacks? You can use the editbin.exe tool (installed with Visual Studio) to manipulate PE header values, including this one. To do the same as above (reduce the stack reservation to 64KB, for example), one could execute from a command prompt:
editbin /stack:65536 ThreadStack.Exe
The Managed World
In .NET, the default stack size is not exposed through any project setting, and it’s 1MB by default (a specific thread can still get a different size through the Thread constructors that accept a maxStackSize argument). The additional problem with .NET is that the 1MB is committed immediately, not just reserved! This means it consumes memory right away, even if the stacks never need to grow to 1MB.
Here’s a test to verify this:
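using System;
using System.Threading;

class Program {
    static void Main(string[] args) {
        // count starts at 1, so the reported total includes the main thread
        int count = 1;
        do {
            // Each thread sleeps forever, keeping its stack alive
            Thread t = new Thread(() => Thread.Sleep(Timeout.Infinite));
            try {
                t.IsBackground = true;
                t.Start();
            }
            catch(OutOfMemoryException) {
                break;
            }
            catch(ThreadStartException) {
                break;
            }
            count++;
        } while(true);
        Console.WriteLine("Threads: {0}", count);
        Console.ReadLine();
    }
}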
The result of running this simple app is:
Close to its native counterpart.
Looking at task manager reveals the big difference:
Note the almost 1.5GB of committed memory!
Can we change this behaviour? Not in any way I could find.
Can we at least change the default 1MB? The VS property pages do not expose this option. However, using editbin.exe works on .NET executables as well as native ones, because both are PE files with the same basic header. This works even with signed assemblies, because this value lies in the part that is not hashed by the signing process.
Using this command line:
editbin /stack:131072 ManagedThreadStack.Exe
yields:
Definitely an improvement!
Conclusion
Thread stack sizes need to be taken into account, especially in heavily multithreaded applications. The CLR imposes limits on what we can do, but hopefully more control will be available in future versions of the CLR.