Internals
A recent publication [1] has described a way to do exactly that using OpenMP, which is available on many platforms and is easy to use, especially if you want to parallelize the processing of a for-loop.
To explain the implemented approach, the BSIM3 version 3.3.0 model, located in the BSIM3 directory, was chosen as the first example. The BSIM3load() function in b3ld.c contains two nested for-loops over linked lists (models and instances, i.e. individual transistors). Unfortunately, OpenMP requires a loop with an integer index. So in file b3set.c an array is defined, filled with pointers to all instances of BSIM3, and stored in model->BSIM3InstanceArray.
BSIM3load() is now a wrapper function containing the for-loop, which calls BSIM3LoadOMP() once per instance. Inside BSIM3LoadOMP() the model equations are calculated.
Typically, the activities of the threads would have to be synchronized, because storing the results into the matrix has to be guarded. The trick offered by the authors is to move the storage out of the BSIM3LoadOMP() function. Inside BSIM3LoadOMP(), the updated data are stored in extra locations, locally per instance, defined in bsim3def.h. Only after the complete for-loop has been executed is the matrix updated, by an extra function BSIM3LoadRhsMat() called in the main thread after the parallel loop. No extra synchronization is required.
With this arrangement, the only thread programming needed is a single line,
#pragma omp parallel for
placed in front of the for-loop over the device instances.
This, of course, is made possible only thanks to the OpenMP developers and the clever synchronization-free trick introduced by the authors cited above.
The time-measuring function getrusage(), used with Linux or Cygwin to determine the CPU time usage (with the rusage option enabled), counts ticks from every core, adds them up, and thus reports a CPU time value enlarged by a factor of 8 if 8 threads have been chosen. Therefore ngspice is forced to use ftime() for time measurement if OpenMP is selected.