
Re: [OCLUG-Tech] How good is Linux at NUMA ?

  • Subject: Re: [OCLUG-Tech] How good is Linux at NUMA ?
  • From: Martin Hicks <mort [ at ] bork [ dot ] org>
  • Date: Fri, 31 Mar 2006 08:58:21 -0500

On Fri, Mar 31, 2006 at 08:42:14AM -0500, Peter Sjoberg wrote:
> On Fri, 2006-03-31 at 01:19 -0500, Adrian Irving-Beer wrote:
> > On Thu, Mar 30, 2006 at 07:25:14PM -0500, Peter Sjoberg wrote:
> > 
> > > Just look at http://www.jroller.com/page/jaimec?entry=sparc_linux
> > 
> > I guess you stopped reading before the comments, where people debunk
> > the entry, and then the original poster basically says, "oops, guess I
> > was totally wrong" and closes the topic. :)
> Yes, I confess that I stopped too early (read it after my post).
> 
> I still wonder how well NUMA is implemented but feel I need to read the
> source and join the kernel mailing list to get more answers, and in the
> end it's just because I'm interested, not because I need to know for
> work or anything.
> 
> Does Linux Opteron implementation of NUMA
> Try to allocate process memory on same cpu: YES!
> 
> Have some NumaAPI so a process can manage thread creation to some
> degree: Yes
> 
> If no more memory is available on current cpu, does malloc grab from any
> other cpu (2 level NUMA) or make a difference between far (1 jump) and
> farther (2 jump) CPUs (multi level NUMA): ?no?
> If Yes, limited to 3 levels or ?

Many levels.  Go look at how the zonelists are built from the ACPI SLIT
table.  Look at the ia64/sn2 implementation, where they have a bunch of
different levels (based on how many NUMAlink routers you have to travel
through to get to the node with memory).  SGI also has a couple of
degenerate-case nodes:  memory-only nodes, and TIO nodes, which are I/O
nodes with no CPU or memory.  All of the memory management stuff is
smart enough to reclaim on memory-only nodes, etc.
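
As a rough illustration of the "many levels" point, here is a minimal
Python sketch of ordering fallback nodes by distance.  It is not the
kernel's zonelist code (which also balances load between equidistant
nodes); the 4-node distance table and the function names are made up
for the example, with 10 meaning "local" as in SLIT.

```python
# Hypothetical SLIT-style distance table for 4 nodes (assumption, not
# taken from real hardware).  Row i holds the distance from node i.
SLIT = [
    [10, 20, 30, 40],
    [20, 10, 20, 30],
    [30, 20, 10, 20],
    [40, 30, 20, 10],
]

def fallback_order(distances, node):
    """Return all node ids, nearest first, for allocations from `node`."""
    row = distances[node]
    # Sort by distance; break ties by node id so the result is stable.
    return sorted(range(len(row)), key=lambda n: (row[n], n))

def build_fallback_lists(distances):
    """Build one nearest-first fallback list per node."""
    return {n: fallback_order(distances, n) for n in range(len(distances))}

if __name__ == "__main__":
    for node, order in build_fallback_lists(SLIT).items():
        print(node, order)
```

Each distinct distance value acts as another "level": an allocation
that cannot be satisfied locally falls through to the next-nearest
node, not to an arbitrary one.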

> 
> If current CPU core is overloaded, does it recognize that the other core
> is just as good from a memory standpoint and migrate the process?: ?

I think it would be expensive to figure out how "good" a node is from a
memory standpoint.  The thread will get migrated by the scheduler, but
the memory won't move.  Memory migration in Linux is currently only
"manual", i.e., you stop the task, move it, and start it again.  In the
future they may implement automatic migration, so that if the scheduler
bumps a process to a different node the memory will follow.
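
The stop/move/start sequence could be scripted roughly like this.  It
is only a sketch: it assumes the `migratepages` tool from the numactl
package is installed (usage: migratepages pid from-nodes to-nodes) and
that the kernel supports page migration.  The function names are made
up for the example, and the `kill` and `run` parameters exist only so
the sequence can be tried without touching a real process.

```python
import os
import signal
import subprocess

def build_migrate_command(pid, from_nodes, to_nodes):
    """Return argv for: migratepages pid from-nodes to-nodes."""
    return [
        "migratepages",
        str(pid),
        ",".join(str(n) for n in from_nodes),
        ",".join(str(n) for n in to_nodes),
    ]

def migrate_stopped(pid, from_nodes, to_nodes,
                    kill=os.kill, run=subprocess.run):
    """Stop the task, move its pages, then let it run again."""
    kill(pid, signal.SIGSTOP)
    try:
        run(build_migrate_command(pid, from_nodes, to_nodes), check=True)
    finally:
        # Always resume the task, even if the migration failed.
        kill(pid, signal.SIGCONT)
```

For example, migrate_stopped(1234, [0], [1]) would try to move the
pages of pid 1234 from node 0 to node 1.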

> 
> Migrate a process to another CPU when memory usage grows and current
> node is full while another node has enough to have to hold the whole
> process: ?no?

In my mind, if you run into this situation and you actually care, then
you should be using a batch scheduler and planning better.

> Have some kind of NUMA awareness when managing I/O (network, disk) for a
> process: ?? (or is it _always_ the user space process that creates every
> buffer involved in io?)
> 

I think the scalex86.org people were pushing patches for better NUMA
affinity for I/O and networking.  I was never involved with this work.

mh

-- 
Martin Hicks || mort [ at ] bork [ dot ] org || PGP/GnuPG: 0x4C7F2BEE

