Thoughts on shared memory parallelism

This is more to remind myself of what I am thinking about right now. I should be thinking about Advanced Topics in High Performance Computing, as the exam is tomorrow, but I find myself wandering onto admittedly more advanced, non-examinable material.

So my current thoughts are about the future, say the next 3 to 5 years, which would be a nice time frame for a PhD. From my current work on OpenCL benchmarking, I figure it's all very nice that you can manually schedule tasks to run on hardware co-processors (GPGPUs), but that's like pthreads, and what we really want is OpenMP! That is, to think about the problem and not the mechanics.

So why is this important? Well, current GPGPU programming has a problem, a normal HPC type problem: the interconnect. PCI Express is a point-to-point communications channel running at a snail's pace compared with the path between the CPU and memory, so squeezing data across it is basically a bad idea for HPC, though fine for games and video. Why would anyone change this? Well, power consumption would be a good reason: the latest Intel Core i5s have GPUs in the same physical chip package as the CPU, to save power and drive low-cost notebooks. It is currently separate silicon, but it could in theory be interconnected with a fast bus: QPI on Intel or, much better, HyperTransport! The only problem is that, judging from Intel's documentation, the onboard GMAs are connected by PCIe. D'oh! Never mind, they were GMAs, which are _not_ the best anyway.

I did go off on a mild tangent thinking about doing an open source hardware ASIC and supercomputer blade that could scale in a way not dissimilar to a Cray SeaStar+ design. There are open source HyperTransport cores, OpenRISC or SPARC processor cores and suitable 1 or 10 Gigabit Ethernet IP, though of course we are looking to Light Peak as an interconnect. It would be easy enough to get a design done and built, then get a factory in China to pump them out on demand. The rest (the cabinets, cooling and data centre stuff), er, someone else can deal with that. Just build MPI right in at the core.

So back to AMD and Fusion, that being AMD's next-gen processors. There is no fixed public plan that I currently know of (at least not one that can't be changed), but basically a new x86-64 core, 'Bulldozer', will replace the K10 core currently used in Opteron and will scale to 8/12/16/24 cores. AMD will also bring on-board GPUs based on Radeon GPU technology, either in the same package or on the same die. Moreover, they had better be connected by HyperTransport, sooner rather than later.

What about NVIDIA? Well, I've just got my Fermi card (or at least I'm awaiting the courier; it's due today). Hot, hot, but the right idea. Fermi on HyperTransport? Or would it be cheaper to just buy out AMD?

Anyway, back to the beginning: we want a way to not think about the layer of abstraction, but simply to be able to choose to run code on various accelerators without the plumbing. I have found two papers of particular note from last year's International Workshop on OpenMP: "A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures" and "Can OpenMP be extended to deal with Hardware Accelerator?". The next steps proposed in the first paper would be a great place to start: build an OpenCL driver version. I wonder if I could do a PhD in that?

Anyway, back to study :(


About Me

Stuart Fraser is a 20-year veteran of the 'Data Systems Department', gone back to university. Now working again.

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2012