Singularity, Multithreading, Multiprocessing and TPL?

Nov 16, 2008 at 3:06 AM
From your design goals I understand you are researching highly parallel processing across possibly heterogeneous processors.  Is there any chance that Joins or the Task Parallel Library portion of Parallel FX will run under Singularity 2.0?  I assume you guys are thinking about Larrabee and similar platforms for the future, but without TPL, utilizing the available processors would be much more difficult.  Even the hardware threads in current single-core processors make it difficult to reach the processor's full potential without multithreading.

I'm especially interested in this sort of usage since most of my own projects currently require GPGPU for speed, but development for that platform (whether via DirectX, OpenGL or even CUDA) is overly complex for many problem types, and adding features can cause nearly complete rewrites.  The Cell Broadband Engine has similar difficulties.  Something like Larrabee, with its x86 NUMA-like architecture and high core count, would simplify things considerably even if its first target usage is as a GPU, assuming the architecture is open enough to get Singularity bootstrapped on it.  If I'm not mistaken, Singularity should only require minimal changes (boot loader, drivers, compiler extensions for its vector units) to utilize something like the Larrabee processor.

Nov 17, 2008 at 10:13 PM
Hi jclary, thanks for writing. I consulted with my coworkers Galen Hunt and Ed Nightingale about this question, and our response is as follows.

We are internally prototyping a concept we call satellite kernels on the Singularity 2.0 RDK.  The idea is to employ multiple Singularity kernels spread across *all* of the available processing cores in a machine.  The satellite kernels cooperate to create a single OS from the user and application perspective.  The advantage of this approach is that it allows full access to system resources and services from any core.  We have prototype satellite kernels running on a diverse set of CPUs and I/O devices.  While there are many efforts in Microsoft looking at runtime and framework support for parallel computation, we haven’t yet incorporated any of them into Singularity.  It seems like a great area for researchers outside Microsoft as well.

I hope this answers your question, and please let me know if there's anything else you'd like to know.
Derrick Coetzee
Microsoft Research Operating Systems Group developer

Nov 17, 2008 at 11:10 PM
I'm definitely interested in the concept of using Singularity for distributed-compute-style workloads, but it would seem a shame to have to duplicate all the effort that has gone into Joins and TPL for local threading.  I don't know enough about their implementation details to know if they use any native code, though.  If they do, it might be unavoidable.

I suppose there might be better or different ways to do things in Singularity, but really I can't see not using something rather similar.  Even on current-generation quad-core Intel machines it's sometimes difficult to properly utilize the available resources, not only because of the multiple cores but also because Hyper-Threading means you can't use more than roughly half a core's power in a single thread.  Building your own solution with regular System.Threading features can be exceptionally error-prone and difficult to optimize, which I think is something you are trying to avoid in Singularity.

For most applications, distributing a problem across separate processes would add a lot of unnecessary overhead, although it might be slightly less so with Singularity's architecture.  Still, it's not natural from a development standpoint to do that, and you'd have to find some way to hide the complexities.  TPL seems like a very natural way to do things and must be popular with others besides me, since it's supposed to be going into mscorlib for 4.0.

Obviously PLINQ (the other half of Parallel FX) would require bringing in all the LINQ stuff, which would be very difficult, but TPL looks independent enough that it could be yanked out and possibly run on SSCLI 2.0, assuming there's no native code in the library.  Or, since you've already got Sing#, it could be done as a language extension like it was in Comega.  I expect most parallelization could be done efficiently with just Parallel.For and Parallel.Invoke.
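
For anyone not familiar with those two calls, here is a rough sketch of the fork-join pattern they provide: run a loop body for every index in a range, or run a handful of independent actions, and return only once everything has finished.  It is written in Python rather than C#, and it is an analogy only.  The names parallel_for and parallel_invoke are hypothetical helpers invented for this sketch, not the TPL API, and nothing here models how TPL partitions or schedules work.  In CPython, threads also will not speed up CPU-bound code, so this shows the programming model, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_for(start, stop, body, max_workers=None):
    """Hypothetical helper: call body(i) for each i in range(start, stop)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Draining the iterator waits for every call and re-raises
        # any exception thrown by body.
        list(pool.map(body, range(start, stop)))


def parallel_invoke(*actions, max_workers=None):
    """Hypothetical helper: run zero-argument callables and wait for all."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(action) for action in actions]
        for future in futures:
            future.result()  # re-raises a failure from that action


# Loop-style parallelism: each iteration writes only to its own slot.
squares = [0] * 8


def square_into_slot(i):
    squares[i] = i * i


parallel_for(0, len(squares), square_into_slot)

# Task-style parallelism: independent actions, completion order not guaranteed.
log = []
parallel_invoke(lambda: log.append("load"), lambda: log.append("parse"))
```

The point is that the caller never creates or joins a thread directly; it just describes the independent pieces of work and blocks until they are done.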

LINQ in general would be nice for the future, as it gives you an out-of-the-box way to do reasonably efficient local databases without a database engine, but that really has nothing to do with this discussion.