I’ve created a project on SourceForge for Tabard. Here’s the Tabard project page; there’s also a mailing list for development and another one with automated mail-outs of CVS commits. I would love to have more people testing the code and giving feedback, and I hope this is a starting point for that.
October 27, 2006
October 26, 2006
Tests
I’m running some tests right now, driven by a little Perl script that writes its results to a CSV file that I later import into OpenCalc. I’m using time(1) with the TIMEFORMAT env var set to “%E”, which means only the elapsed time goes into the measurements.
poll: protocol failure in circuit setup
If you get the error:
poll: protocol failure in circuit setup
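then you may need to edit /etc/inetd.conf and append .1000 to the ‘nowait’ keyword on the relevant service line, so that it reads nowait.1000. The suffix raises the number of times inetd will start that service per minute.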
October 10, 2006
Simple
:- include('lib').
trick(a(gnu), b(software)).
trick(_, c(is)).
% Master
init :-
    pm2_is_master, !,
    pl_thread_send_msg(vid(3,0,_,_), a(gnu)),
    pl_thread_send_msg(vid(2,0,_,_), b(software)),
    pl_thread_send_msg(vid(1,0,_,_), c(is)),
    read_results,
    finish_listeners.   % never remove this unless you know what
                        % you're doing

% Workers
init :-
    mutex_lock,
    pl_thread_get_msg(Termo),
    !, trick(Termo, X),
    write('Worker -> '), write(Termo), write(','), write(X), nl,
    pl_thread_send_msg(vid(0,0,_,_), X).

read_results :-
    pl_thread_get_msg(Result1),
    write('Master -> '), write(Result1), nl,
    read_results.
And the output:
$ pm2load tabard
Worker -> b(software),c(is)
Master -> c(is)
Worker -> a(gnu),b(software)
Master -> b(software)
Master -> c(is)
Worker -> c(is),c(is)
Howto: Transform a string into a Prolog term
You can use read_term_from_codes/3 but remember to pass the option end_of_term(eof) if the term doesn’t end with a dot. This is because the default end-of-term delimiter is dot.
Example:
read_term_from_codes("a(a)", X, [end_of_term(eof)]).
X = a(a)
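For comparison, SWI-Prolog (which I use for testing in other posts) has term_to_atom/2 and term_string/2 for the same job, and neither needs the trailing dot. A minimal sketch; the wrapper names atom_term/2 and text_term/2 are made up here just for illustration:

```prolog
% Sketch for SWI-Prolog (assumption: version 7 or later for term_string/2).
% atom_term/2 and text_term/2 are hypothetical helper names.

% atom_term(+Atom, -Term): parse the text of an atom into a term.
atom_term(Atom, Term) :-
    term_to_atom(Term, Atom).

% text_term(+String, -Term): same idea for SWI-Prolog strings.
% Note the argument order of term_string/2: the string comes first.
text_term(String, Term) :-
    term_string(String, Term).

% ?- atom_term('a(a)', X).
% X = a(a).
```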
October 7, 2006
User-space threads vs Kernel-threads
User-space threads (M x 1, based on POSIX draft 4 threads) are created, terminated, synchronized, scheduled, and so forth using interfaces provided by a threads library. Creation, termination, and synchronization operations can be performed extremely fast using user-space threads.
Because user-space threads are not directly visible to the kernel (which is aware only of the enclosing process that contains them), user-space threads (M x 1) require no kernel support.
The drawback is that if one user-space thread blocks in a system call, the entire process blocks. When this happens, the benefit of thread parallelism is lost. Wrappers around various system calls can reduce some of the blocking, but at a cost to performance.
With kernel threads (1 thread to one process, or 1 x 1), each user thread has a corresponding kernel thread. Thus, there is full kernel support for threads.
Each thread is independently schedulable by the kernel, so if one thread blocks, others can still run.
Creation, termination, and synchronization can be slower with kernel threads than user threads, since the kernel must be involved in all thread management. Overhead may be greater, but more concurrency is possible using kernel threads, even on a uniprocessor system. As a result, total application performance with kernel threads can surpass that of user-space threads, particularly when threads block often or when more than one CPU is available.
Note, however, that developers must be more careful when creating large numbers of threads, as each thread adds more weight to the process and more overhead to the system.
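The "if one thread blocks, others can still run" point is easy to see in SWI-Prolog, whose threads are native OS threads. A minimal sketch, assuming SWI-Prolog; blocking_demo/1 is a made-up name:

```prolog
% blocking_demo(-Order): one thread blocks in sleep/1 while another
% keeps running. Order lists the messages in arrival order.
blocking_demo([First, Second]) :-
    message_queue_create(Q),
    % This thread blocks for half a second before reporting.
    thread_create(( sleep(0.5),
                    thread_send_message(Q, slow) ), T1, []),
    % This thread is not held up by the sleeping one.
    thread_create(thread_send_message(Q, fast), T2, []),
    thread_get_message(Q, First),
    thread_get_message(Q, Second),
    thread_join(T1, _),
    thread_join(T2, _),
    message_queue_destroy(Q).
```

With a pure M x 1 library and a blocking sleep, the second thread would be stuck behind the first. Here I would expect fast to arrive before slow, although the order is down to the scheduler and timing rather than guaranteed.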
October 5, 2006
From single-thread to multi-thread
Turning a single-threaded program into a multi-threaded one means getting into a different mindset.
In terms of Prolog (Tabard and most Prolog systems that support multi-threading), that means replacing:
- 1) predicate calls with thread_send_message() calls
- 2) unbound (output) variables with thread_get_message() calls
Example: Consider the Prolog goal in single-thread mode:
% pred(+A, +B, -C)
pred([1,2], [[2,2], [3,2]], X).
And now in multi-threading mode, on the master thread side:
thread_send_message(worker_queue, pred([1,2], [[2,2], [3,2]])),
thread_get_message(master_queue, X).
And on the worker thread side:
thread_get_message(worker_queue, pred(A, B)),
pred(A, B, Result),    % do processing
thread_send_message(master_queue, Result).
And now for a real example (tested on SWI-Prolog):
%
% multiple threads wait on a single queue and pick up the first
% goal to execute. This example provides no means to tell when all
% work is done. This must be realised using additional
% synchronisation.
%
trick(a(gnu), b(software)).   % answer if the message is a(gnu)
trick(_, c(is)).              % answer otherwise
% create_workers(?WorkerQueue, ?MasterQueue, +N)
%
% Create the two queues and a pool of N workers.
% All workers wait on the single WorkerQueue.
%
create_workers(WorkerQueue, MasterQueue, N) :-
    message_queue_create(WorkerQueue),
    message_queue_create(MasterQueue),
    forall(between(1, N, _),
           thread_create(do_work(WorkerQueue, MasterQueue), _, [])).
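%
% Workers
%
do_work(WorkerQueue, MasterQueue) :-
    repeat,
        thread_get_message(WorkerQueue, Goal),
        % do processing; once/1 commits to the first answer without
        % cutting away the repeat/0 loop, so the worker keeps serving
        once(trick(Goal, X)),
        thread_send_message(MasterQueue, X),
    fail.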
%
% Master
%
% work(+WorkerQueue, +MasterQueue, +Goal)
%
% Post work to be done by the pool, then wait for one result
work(WorkerQueue, MasterQueue, Goal) :-
    thread_send_message(WorkerQueue, Goal),
    thread_get_message(MasterQueue, Result),
    write(Result).
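The header comment says the example gives no way to tell when all work is done. One possible way to add that synchronisation is a sentinel message per worker plus thread_join/2. This is only a sketch for SWI-Prolog, not what Tabard does; run_pool/3, worker_loop/2, start_worker/3, square/2 and the stop sentinel are all names I made up for it:

```prolog
% Stand-in for the real processing step.
square(N, Sq) :- Sq is N * N.

% worker_loop(+WorkQ, +ResultQ): serve requests until 'stop' arrives.
worker_loop(WorkQ, ResultQ) :-
    thread_get_message(WorkQ, Msg),
    (   Msg == stop
    ->  true                               % sentinel: leave the loop
    ;   square(Msg, Sq),
        thread_send_message(ResultQ, Msg-Sq),
        worker_loop(WorkQ, ResultQ)
    ).

start_worker(WorkQ, ResultQ, Thread) :-
    thread_create(worker_loop(WorkQ, ResultQ), Thread, []).

% run_pool(+NWorkers, +Inputs, -Results)
% Results is a sorted list of Input-Output pairs.
run_pool(NWorkers, Inputs, Results) :-
    message_queue_create(WorkQ),
    message_queue_create(ResultQ),
    length(Threads, NWorkers),
    maplist(start_worker(WorkQ, ResultQ), Threads),
    forall(member(I, Inputs), thread_send_message(WorkQ, I)),
    % One sentinel per worker, queued after all the real work.
    forall(between(1, NWorkers, _), thread_send_message(WorkQ, stop)),
    % Joining every worker is the "all work is done" signal.
    forall(member(T, Threads), thread_join(T, _)),
    length(Inputs, Len),
    length(Pairs, Len),
    maplist(thread_get_message(ResultQ), Pairs),
    msort(Pairs, Results),
    message_queue_destroy(WorkQ),
    message_queue_destroy(ResultQ).

% ?- run_pool(3, [1,2,3,4], R).
% R = [1-1, 2-4, 3-9, 4-16].
```

Because the queue is first-in first-out and the sentinels go in last, no worker can pick up stop while real work is still waiting.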
October 4, 2006
For testing
game of life
matrix arithmetic
nreverse (naive reverse) on a 100-element list, 2000 times for accuracy
swi-prolog benchmarks
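A rough sketch of what the naive reverse test could look like, assuming SWI-Prolog for statistics/2 and numlist/3. The names app/3, nrev/2 and bench_nrev/3 are mine, and this is not the actual script I use for the measurements:

```prolog
% Naive reverse: quadratic on purpose, since it stresses the basic engine.
app([], L, L).
app([H|T], L, [H|R]) :-
    app(T, L, R).

nrev([], []).
nrev([H|T], R) :-
    nrev(T, RT),
    app(RT, [H], R).

% bench_nrev(+Len, +Reps, -Seconds)
% Reverse a Len-element list Reps times; Seconds is the CPU time used.
bench_nrev(Len, Reps, Seconds) :-
    numlist(1, Len, List),
    statistics(cputime, T0),
    forall(between(1, Reps, _), nrev(List, _)),
    statistics(cputime, T1),
    Seconds is T1 - T0.

% ?- bench_nrev(100, 2000, S).
```

Repeating the run 2000 times is what makes the figure usable: a single reversal of 100 elements finishes too quickly to time reliably.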
Multi-threading slower than single-threading?
I found this old mail in the SWI-Prolog mailing list archive about threads. It poses an interesting question: is multi-threading slower than single-threading?
Of course, since this mail is from 2002, it is utterly out of date when it comes to limitations of the current implementation and anyone referring to it to document shortcomings will be a moron.
Adrian Holzwarth said: “I’d guess that a swi-prolog with multi-threading running on a single-cpu-machine cannot run faster than a single threaded version. If you want to split the work to be done you need a second (or more) cpu to bother. Worse, otherwise you are creating overhead with dividing the work into pieces and managing the mess. And the lonesome single CPU has to do *all* the computing anyway, successive.”
Sebastian Sardina replied: “Indeed I do not expect the multi-threading version to run faster, but just equivalent if I do not use multi-threads at all. I understand that, but say you don’t do anything fancy: you just run a regluar Prolog program like the 9-queens example. So the problem is: why the multi-threading version of Prolog running a single thread is slower than the single-thread version of Prolog running the same single thread program?”
Jan Wielemaker also replied: “Right now the difference is neglectable on single-CPU Windows machines. The only remaining problematic area is dynamic code on Linux SMP machines that can be upto about 50% slower. Synchronization cannot be avoided here as the current implementation says dynamic code is fully shared between threads.”
He goes on to say something that certainly caught my attention: “It is one of these most frustrating aspects of performance tuning: sometimes you *know* the program is smaller and has to perform fewer steps, so it *must* be faster but measuring turns out it is in fact slower.”
Good for thesis: do you think it is useful, how does it fit with real examples, where do you think the API is not elegant/incomplete, what do you (eventually) expect, are there relevant de-facto standards that should be considered, etc.
September 29, 2006
Testing: Examples of/for Performance Evaluation
Update: I’m now considering using these other programs for testing too.
Inductive Logic Programming system Aleph (link)
- a branch of machine learning that synthesises logic programs using other logic programs as input.
- Used by Jan Wielemaker (SWI-Prolog) to show speedup with threads on SMP systems.
Benchmark suite by Fernando Pereira (link)
- Used by Jan Wielemaker (SWI-Prolog) for comparing the single threaded to the multi-threaded version.
- Its purpose is to try to identify strengths and weaknesses in the basic engine of a Prolog system.
- “Also, I must say that I have relatively little faith on small benchmark programs. I find that performance (both time and space) on substantial programs, reliability, adherence to de facto standards and ease of use are far more important in practice. I’ve tried several Prolog systems that performed very well on small benchmarks (including mine), but that failed badly on one or more of these criteria.”
Dining Philosophers (link)
- Classic multi-process synchronization problem,
- It’s a toy program,
- More to do with concurrency than distributed/parallel computing.
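To give an idea of the size of such a toy, here is a sketch of Dining Philosophers using SWI-Prolog mutexes. It is not taken from the linked program; dine/3 and philosopher/5 are hypothetical names, and deadlock is avoided by always taking the lower-numbered fork first, which is just one of several known solutions:

```prolog
:- use_module(library(lists)).

% dine(+N, +Meals, -Events): N philosophers (N >= 2) each eat Meals
% times. Events collects one ate(Philosopher, Meal) term per meal.
dine(N, Meals, Events) :-
    length(Forks, N),
    maplist(mutex_create, Forks),          % one mutex per fork
    message_queue_create(Log),
    findall(Id,
            ( between(1, N, I),
              thread_create(philosopher(I, N, Forks, Meals, Log), Id, [])
            ),
            Ids),
    forall(member(T, Ids), thread_join(T, _)),
    Total is N * Meals,
    length(Events, Total),
    maplist(thread_get_message(Log), Events),
    message_queue_destroy(Log),
    maplist(mutex_destroy, Forks).

% Philosopher I uses fork I and the neighbour's fork J.
philosopher(I, N, Forks, Meals, Log) :-
    J is I mod N + 1,
    Lo is min(I, J),
    Hi is max(I, J),
    nth1(Lo, Forks, First),                % lower-numbered fork first
    nth1(Hi, Forks, Second),
    forall(between(1, Meals, M),
           with_mutex(First,
               with_mutex(Second,
                   thread_send_message(Log, ate(I, M))))).

% ?- dine(5, 3, Events), length(Events, Len).
```

The order of the events depends on scheduling, so only their number (N times Meals) is predictable.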
Where there will be a speedup for sure:
1) a serial program that now becomes distributed.
2) a multi-threaded program that executes faster when distributed.