Tabard: Multiple GNU Prolog Engines in a Distributed Environment

October 7, 2006

User-space threads vs Kernel-threads

Filed under: Question — tabard @ 6:53 pm

User-space threads (M x 1, based on POSIX draft 4 threads) are created, terminated, synchronized, scheduled, and so forth using interfaces provided by a threads library. Creation, termination, and synchronization operations can be performed extremely fast using user-space threads.

Because user-space threads are not directly visible to the kernel (which is aware only of the enclosing process that contains them), user-space threads (M x 1) require no kernel support.

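The drawback is that if one user-space thread blocks in a system call, the entire process blocks, because the kernel sees only a single schedulable entity. When this happens, the benefit of thread parallelism is lost. Wrappers around various system calls can reduce some of the blocking, but at a cost to performance.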
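To make the blocking problem concrete, here is a minimal sketch of an M x 1 arrangement in Python. It is an illustration only, not how any real threads library is built: generators stand in for user-space threads, and the names `run_round_robin`, `worker`, and `blocking_worker` are made up for this example. The scheduler lives entirely in one kernel thread, so a blocking call inside one task stalls every other task.

```python
import time
from collections import deque


def run_round_robin(tasks):
    """Run generator-based tasks inside a single kernel thread.

    Returns a list of (label, seconds_since_start) pairs, one per step.
    """
    start = time.monotonic()
    log = []
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            label = next(task)  # run the task until its next yield
        except StopIteration:
            continue  # task finished, do not requeue it
        log.append((label, time.monotonic() - start))
        ready.append(task)  # task is still alive, put it back in the queue
    return log


def worker(name, steps):
    """A cooperative task that yields control after every step."""
    for i in range(steps):
        yield f"{name}{i}"


def blocking_worker(name, delay):
    """A task that blocks before yielding; the scheduler cannot preempt it."""
    time.sleep(delay)  # stands in for a blocking system call
    yield f"{name} done"
```

Running `run_round_robin([blocking_worker("io", 0.2), worker("cpu", 1)])` logs the `cpu0` step only after the 0.2 second sleep has finished, even though that task never needed to wait for anything.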

With kernel threads (one kernel thread per user thread, or 1 x 1), each user thread has a corresponding kernel thread. Thus, there is full kernel support for threads.

Each thread is independently schedulable by the kernel, so if one thread blocks, others can still run.
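A small sketch of this behaviour, assuming CPython on a mainstream platform, where each `threading.Thread` is backed by a kernel thread. The function name `run_blocked_and_runnable` is hypothetical. One thread blocks waiting on an event while the other runs to completion and then releases it. Note that CPython's global interpreter lock limits CPU parallelism, but blocking waits like this one release it.

```python
import threading


def run_blocked_and_runnable():
    """Start one thread that blocks and one that does not; return the event order."""
    log = []
    log_lock = threading.Lock()
    release = threading.Event()

    def blocked():
        # Blocks only this thread; the kernel keeps scheduling the other one.
        release.wait(timeout=5)
        with log_lock:
            log.append("blocked thread resumed")

    def runnable():
        with log_lock:
            log.append("runnable thread finished")
        release.set()  # wake the blocked thread once our work is logged

    t1 = threading.Thread(target=blocked)
    t2 = threading.Thread(target=runnable)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return log
```

The runnable thread always logs first, because the blocked thread cannot proceed until the event is set. In the user-space model, the equivalent blocking wait would have frozen both tasks.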

Creation, termination, and synchronization can be slower with kernel threads than with user threads, since the kernel must be involved in all thread management. Overhead may be greater, but more concurrency is possible using kernel threads, even on a uniprocessor system, because one thread can run while another is blocked. As a result, total application performance with kernel-space threads often surpasses that of user-space threads, particularly for programs that block frequently.

Note, however, that developers must be more careful when creating large numbers of threads, as each thread adds more weight to the process and more overhead to the system.

October 4, 2006

Multi-threading slower than single-threading?

Filed under: Question, Uncategorized — tabard @ 12:04 pm

I found this old mail in the SWI-Prolog mailing list archive about threads. It poses an interesting question: is multi-threading slower than single-threading?

Of course, since this mail is from 2002, it is utterly out of date when it comes to limitations of the current implementation and anyone referring to it to document shortcomings will be a moron.

Adrian Holzwarth said: “I’d guess that a swi-prolog with multi-threading running on a single-cpu-machine cannot run faster than a single threaded version. If you want to split the work to be done you need a second (or more :)) cpu to bother. Worse, otherwise you are creating overhead with dividing the work into pieces and managing the mess. And the lonesome single CPU has to do *all* the computing anyway, successive.”

Sebastian Sardina replied: “Indeed I do not expect the multi-threading version to run faster, but just equivalent if I do not use multi-threads at all. I understand that, but say you don’t do anything fancy: you just run a regluar Prolog program like the 9-queens example. So the problem is: why the multi-threading version of Prolog running a single thread is slower than the single-thread version of Prolog running the same single thread program?”

Jan Wielemaker also replied: “Right now the difference is neglectable on single-CPU Windows machines. The only remaining problematic area is dynamic code on Linux SMP machines that can be upto about 50% slower. Synchronization cannot be avoided here as the current implementation says dynamic code is fully shared between threads.”

He continues to say something that certainly has caught my attention: “It is one of these most frustrating aspects of performance tuning: sometimes you *know* the program is smaller and has to perform fewer steps, so it *must* be faster but measuring turns out it is in fact slower.”
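Wielemaker's point about synchronization on shared dynamic code can be illustrated with a rough sketch. This is not SWI-Prolog's implementation; it only shows the general idea that a thread-safe build pays for a lock on every access to shared data, even when a single thread is running and the lock is never contended. The function names are made up for this example, and the measured difference will vary by machine and interpreter.

```python
import threading
import time


def count_unsynchronized(n):
    """Update a counter n times with no locking (single-threaded build)."""
    total = 0
    for _ in range(n):
        total += 1
    return total


def count_synchronized(n, lock):
    """Same work, but take a lock around every update (thread-safe build)."""
    total = 0
    for _ in range(n):
        with lock:  # uncontended here, but still costs an acquire and a release
            total += 1
    return total


def time_call(fn, *args):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Calling `time_call(count_unsynchronized, 1_000_000)` and `time_call(count_synchronized, 1_000_000, threading.Lock())` gives the same result from both, so any gap in elapsed time is the cost of the locking alone. As the quote warns, it is worth measuring rather than assuming.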

Good questions for the thesis: do you think it is useful, how does it fit with real examples, where do you think the API is inelegant or incomplete, what do you (eventually) expect, are there relevant de-facto standards that should be considered, etc.
