SOFTWARE TOOLS FOR PARALLEL AND DISTRIBUTED RENDERING
-----------------------------------------------------

Emilio Camahort
Center for Computational Visualization
The University of Texas at Austin
http://www.cs.utexas.edu/users/ecamahor/

OVERVIEW

* parallel rendering
  o general software tools
    - asynchronous I/O: aio
    - POSIX threads: pthreads
    - OpenGL Optimizer MP support
  o SGI specific software packages
    - ImageVision Library
    - IRIS Performer
* distributed rendering
  o low-level software tools
    - socket programming
  o high-level general software tools
    - PVM: parallel virtual machine
    - MPI: message passing interface

ASYNCHRONOUS I/O

* availability:
    on SGI IRIX:    aio_*, lio_*  use slave threads that may or may not be
                                  isolated, i.e., restricted to a single
                                  processor
    on SUN Solaris: aio*          thread unsafe asynchronous I/O
                    aio_*         thread safe asynchronous I/O
    on Linux:       not supported
* most library routines have 32-bit and 64-bit offset versions
* very simple idea:
  o issue an I/O request and exit the request routine without waiting
    for completion
  o completion is notified in different ways (OS dependent!!)
    - sending a signal to the requesting process
    - executing (possibly in parallel) an associated callback (IRIX)
    - spawning a new thread and executing a callback function (IRIX)
    - no notification at all
  o requests may be prioritized (Solaris only)

SGI'S ASYNCHRONOUS I/O

aio_cancel (3)   - cancel an asynchronous I/O request
aio_error (3)    - return error status of an asynchronous I/O operation
aio_fsync (3)    - asynchronously synchronize a file's in-memory state
                   with that on the physical medium
aio_hold (3)     - Defer or resume reception of asynchronous I/O callbacks
aio_init (3)     - asynchronous I/O initialization
aio_read (3)     - asynchronous I/O read
aio_return (3)   - return error status of an asynchronous I/O operation
aio_sgi_init (3) - asynchronous I/O initialization
aio_suspend (3)  - wait for an asynchronous I/O request
aio_write (3)    - asynchronous I/O write
lio_listio (3)   - linked asynchronous I/O operations

SUN'S ASYNCHRONOUS I/O

aio_cancel (3r)  - cancel asynchronous I/O request
aio_error (3r)   - retrieve return or error status of asynchronous I/O
                   operation
aio_fsync (3r)   - asynchronous file synchronization
aio_read (3r)    - asynchronous read and write operations
aio_req (9s)     - asynchronous I/O request structure
aio_return (3r)  - retrieve return or error status of asynchronous I/O
                   operation
aio_suspend (3r) - wait for asynchronous I/O request
aio_write (3r)   - asynchronous read and write operations

POSIX THREADS

* threads are subprocesses sharing a single address space, but having
  different program counters and stack spaces
* POSIX threads have support for:
  - creating, synchronizing and exiting threads
  - mutual exclusion, locks
  - condition variables
  - semaphores (part of the POSIX standard)
  - barrier synchronization, etc.
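
As a rough illustration of the facilities listed above (thread creation, a mutex, a condition variable and joining), here is a minimal sketch in C. The names render_job, worker and render_in_parallel, the worker count and the "rendering" computation are all hypothetical placeholders, not part of any library mentioned in these notes.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS   4    /* hypothetical worker count */
#define NUM_SCANLINES 64   /* hypothetical image height */

/* Shared state: next scanline to hand out and a count of finished ones. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  all_done;
    int next_line;
    int lines_done;
    int rows[NUM_SCANLINES];   /* stand-in for rendered pixel data */
} render_job;

static void *worker(void *arg)
{
    render_job *job = arg;

    for (;;) {
        int line;

        /* Claim the next unrendered scanline under the mutex. */
        pthread_mutex_lock(&job->lock);
        line = job->next_line < NUM_SCANLINES ? job->next_line++ : -1;
        pthread_mutex_unlock(&job->lock);
        if (line < 0)
            break;

        job->rows[line] = line * 2;   /* placeholder for real rendering */

        /* Report completion; wake the waiter once every line is done. */
        pthread_mutex_lock(&job->lock);
        if (++job->lines_done == NUM_SCANLINES)
            pthread_cond_signal(&job->all_done);
        pthread_mutex_unlock(&job->lock);
    }
    return NULL;
}

/* Returns the number of scanlines rendered, or -1 if no thread started. */
int render_in_parallel(render_job *job)
{
    pthread_t tid[NUM_WORKERS];
    int i, started = 0, done;

    pthread_mutex_init(&job->lock, NULL);
    pthread_cond_init(&job->all_done, NULL);
    job->next_line = 0;
    job->lines_done = 0;

    for (i = 0; i < NUM_WORKERS; i++)
        if (pthread_create(&tid[started], NULL, worker, job) == 0)
            started++;

    if (started > 0) {
        /* Sleep on the condition variable until the last line is done. */
        pthread_mutex_lock(&job->lock);
        while (job->lines_done < NUM_SCANLINES)
            pthread_cond_wait(&job->all_done, &job->lock);
        pthread_mutex_unlock(&job->lock);
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    done = job->lines_done;
    pthread_cond_destroy(&job->all_done);
    pthread_mutex_destroy(&job->lock);
    return started > 0 ? done : -1;
}
```

A caller would declare a render_job, pass its address to render_in_parallel() and read job.rows afterwards. The condition variable is not strictly needed here, since pthread_join alone would suffice; it is included only to show the wait/signal pattern.
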

SGI'S PTHREADS

pthread_atfork (3P) - register fork() handlers
pthread_attr_init, pthread_attr_destroy,
pthread_attr_setstacksize, pthread_attr_getstacksize,
pthread_attr_setstackaddr, pthread_attr_getstackaddr,
pthread_attr_setdetachstate, pthread_attr_getdetachstate
    (3P) - initialize thread attributes
pthread_attr_setinheritsched, pthread_attr_getinheritsched
    (3P) - thread scheduling inheritance attributes
pthread_attr_setschedparam, pthread_attr_getschedparam
    (3P) - manage thread scheduling priority attributes
pthread_attr_setschedpolicy, pthread_attr_getschedpolicy
    (3P) - manage scheduling policy attributes
pthread_attr_setscope, pthread_attr_getscope
    (3P) - thread scheduling scope attributes
pthread_cancel (3P) - request cancellation of a thread
pthread_cleanup_push, pthread_cleanup_pop
    (3P) - manage thread cleanup handlers
pthread_condattr_init, pthread_condattr_destroy
    (3P) - initialize/destroy a condition variable attribute object
pthread_cond_init, pthread_cond_signal, pthread_cond_broadcast,
pthread_cond_wait, pthread_cond_timedwait, pthread_cond_destroy
    (3P) - condition variables
pthread_create (3P) - create and start a thread
pthread_detach (3P) - detach a thread
pthread_equal (3P) - compare thread identifiers
pthread_exit (3P) - terminate the calling thread
pthread_join (3P) - wait for thread termination
pthread_key_create (3P) - thread-specific data key creation
pthread_key_delete (3P) - thread-specific data key deletion
pthread_kill (3P) - deliver a signal to a thread
pthread_mutexattr_init, pthread_mutexattr_destroy
    (3P) - initialize/destroy a mutex attribute object
pthread_mutexattr_setprotocol, pthread_mutexattr_getprotocol,
pthread_mutexattr_setprioceiling, pthread_mutexattr_getprioceiling
    (3P) - set/get a mutex attribute object's priority and protocol
pthread_mutex_init, pthread_mutex_lock, pthread_mutex_trylock,
pthread_mutex_unlock, pthread_mutex_destroy
    (3P) - mutual exclusion locks
pthread_mutex_setprioceiling, pthread_mutex_getprioceiling
    (3P) - set/get a mutex's priority ceiling
pthread_once (3P) - thread-safe initialization
pthread_self (3P) - identify a thread
pthread_setcancelstate, pthread_setcanceltype, pthread_testcancel
    (3P) - manage cancelability of a thread
pthread_setschedparam, pthread_getschedparam
    (3P) - change thread scheduling
pthread_setspecific, pthread_getspecific
    (3P) - thread-specific data management
pthread_sigmask (3P) - examine and change blocked signals

SUN'S PTHREADS

See ~/doc/pthreads.sun

SUN'S SOLARIS THREADS

libthread (3t)       - thread libraries: libpthread and libthread
thr_create (3t)      - thread creation
thr_exit (3t)        - thread termination
thr_getprio (3t)     - dynamic access to thread scheduling
thr_getspecific (3t) - thread-specific-data functions
thr_join (3t)        - wait for thread termination
thr_keycreate (3t)   - thread-specific-data functions
thr_kill (3t)        - send a signal to a thread
thr_self (3t)        - get calling thread's ID
thr_setprio (3t)     - dynamic access to thread scheduling
thr_setspecific (3t) - thread-specific-data functions
thr_sigsetmask (3t)  - change and/or examine calling thread's signal mask
threads (3t)         - thread libraries: libpthread and libthread

MULTITHREADING USING OpenGL Optimizer

* thread manager: man opThreadMgr
  - set of threads, each with one priority queue of tasks
  - a task can be added to one or more thread queues
  - queues managed in either round-robin or pre-emptive fashion
  - IMPORTANT: no FIFO behavior
  - routines: for thread managing and task scheduling
* task definitions:
  - opActionInfo: gives info about a thread invoking an opAction
  - opAction: the base class for all actions (tasks) like:
      opFunctionAction: one task to run on one thread
      opMPFunAction: one task to run on many threads
      opMPFunListAction: many tasks to run on many threads
* acronyms:
    qd = queue discipline
    TID, tid = thread id
    SPFun = task to be queued at a single thread
    MPFun = task to be queued at multiple threads

MULTIPLE CHANGES TO A Cosmo3D SCENE GRAPH

* when using multithreading, multiple
  changes to the scene graph may occur
* transaction manager serializes them: man opTransactionMgr
* the model:
  - one process controls changes to the scene graph
  - that process owns opTransactionMgr (usually the rendering one)
  - other processes can read the scene graph, but writes need to be
    submitted to opTransactionMgr
  - writes (opTransactions) are performed when the owner process decides
* opTransactions:
  - can be sent with or w/o blocking the sender
  - put in a pending queue in opTransactionMgr
  - mimic Cosmo3D scene graph operations:
      csObject operations: unref & delete object, modify fields, etc
      csGroup operations: add, insert, remove, replace children
      csShape operations: set geometry, set appearance
      csMaterial operations: set diffuse color

OTHER Optimizer OPERATIONS

* do not use libc threading operations: fork(), sproc()
* use Optimizer's low-level multiprocess tools instead:
  - opLock: a simple locking mechanism: lock, unlock a process
  - opMutex: a mutual exclusion mechanism, see man opMutex
  - opSemaphore: implements semaphores
  - opTaskBlock: to make processes wait on a task
  - opBlockingCounter: implements a condition variable
* NOTE: the last 3 are related, opBlockingCounter is an abstraction of
  opSemaphore, opTaskBlock is a simpler version of opBlockingCounter

THE ImageVision LIBRARY

* a library for image processing and manipulation
* supports different image formats through the IFL: Image File Library
* run: imgformats on SGI for list of formats
* has support for:
  - basic image composition algorithms: add, sub, xor, etc.
  - image processing: histogram operators, morphological operators, etc
  - Fourier analysis
  - other misc functions
* includes a suite of tools for image manipulation:
  - imgview: single image viewing and basic manipulation
  - imgworks: tool for enhancing images
* image processing is done in place, possibly combining more than one
  operation at a time
* C++ class based, has a functional language flavor

IMAGE CACHING USING THE IL

* IL uses an image cache with configurable size
* caching is transparent, but can be controlled by the user
* image caching:
  o images are cached-in on demand: for example when an image
    - needs to be displayed on the screen
    - needs to be sent to texture memory
    - is going to be accessed by the user's program
  o images stay in memory until expired
  o images can be locked in memory
* images evicted from memory according to:
  o LRU policy (by default)
  o number of hits, i.e., times the image has been referenced
  o priority scheme (user defined)

MULTITHREADING USING THE IL

* multithreading/multiprocessing is (was!) transparent to developer
* also: hardware support for most of its operations:
  - IL sets hardware paths between memory and, say, texture memory
  - processing is done by the image processing hardware (last stage of
    graphics pipeline)
* allows choosing number of threads for I/O operation (default: 1)
* allows choosing number of threads for computation (default: number of
  hardware processors)

MULTIPROCESSING USING IRIS Performer

* allows assigning different pipeline tasks to different processors
* the pipeline tasks are four:
  - intersections for collision-detection and picking
  - application related tasks
  - culling the database before rendering
  - rendering (drawing) the culled display list
* all of them can be assigned to a different process(or)
* last three need to be synchronized, incurring a latency of 1 or 2 frames
* it is also possible to do part of the application in a separate process
* the culling part itself can be multiprocessed (or multithreaded?)
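
To make the frame latency mentioned above concrete, the following is a small serial model in C of a three-stage application/cull/draw pipeline. It does not use the Performer API and spawns no processes; the pipeline struct and the function names are hypothetical, and the model assumes each stage takes exactly one frame time and hands its result downstream at the frame boundary.

```c
#define EMPTY (-1)   /* marks a stage that has not received a frame yet */

/* One slot per stage, holding the number of the frame it is working on. */
typedef struct {
    int app;
    int cull;
    int draw;
} pipeline;

void pipeline_init(pipeline *p)
{
    p->app = p->cull = p->draw = EMPTY;
}

/* Advance one frame time: each stage passes its frame downstream and
 * the application stage starts working on new_frame. */
void pipeline_step(pipeline *p, int new_frame)
{
    p->draw = p->cull;
    p->cull = p->app;
    p->app  = new_frame;
}

/* Run the pipeline for `frames` frame times (frames numbered from 0) and
 * return the frame being drawn during the last one, or EMPTY if the
 * pipeline has not filled up yet. */
int frame_on_screen(int frames)
{
    pipeline p;
    int f;

    pipeline_init(&p);
    for (f = 0; f < frames; f++)
        pipeline_step(&p, f);
    return p.draw;
}
```

In this model, while the application stage works on frame n the draw stage is still showing frame n-2, which is the two-frame case. If two of the stages were merged into one process, the same reasoning would give a one-frame lag.
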

OTHER FEATURES OF IRIS Performer

* it supports multiple pipes:
  - requires different processes for application, culling and rendering
* it supports asynchronous I/O
  - (yet another) asynchronous process is used for high-latency I/O
  - it also allows dynamic changes (adds, removes) to the scene graph
  - changes can be done safely
* it implements shared memory among processes
  - has a semaphore area for synchronization
  - allows locks and named shared memory blocks
* it implements a synchronous buffer system for passing data between
  successive stages: application, culling and rendering
* PROBLEM: multiprocessing is implemented using heavy-weight processes,
  i.e., could be much slower than threads

LOW-LEVEL COMMUNICATION: SOCKETS

* file (or named?) sockets
* Internet sockets

HIGH-LEVEL MESSAGE PASSING LIBRARIES

* MPI: message passing interface
  - a communication API for multiprocessor machines
  - runs also on heterogeneous networks, but it's less efficient
  - generally relies on vendor's message passing library
  - guarantees portability
  - supports implementation of parallel libraries
  - supports hierarchies of groups of communicating processes
  - it allows different types of communication modes: buffered,
    non-buffered, blocking, non-blocking, synchronous, ready
  - doesn't support fault tolerance
* PVM: parallel virtual machine
  - originally intended for message passing in heterogeneous networks
  - processes are started and terminated dynamically
  - supports dynamic resource management and fault tolerance
  - best suited for a master/slave model of computation
  - allows setting communication groups for broadcasting
  - has blocking and non-blocking receive
  - supports synchronous and asynchronous broadcast

MISCELLANEOUS STUFF

* SGI:
    mpadmin (1) - control and report processor status
    runon (1)   - run a command on a particular cpu
    sysmp (2)   - multiprocessing control: to assign processes to
                  processors
* Xiaoyu's slides: doc/render.ppt
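
Returning to the low-level socket tools listed under LOW-LEVEL COMMUNICATION: the "file" sockets there are presumably UNIX-domain sockets. Assuming so, here is a minimal sketch in C that pushes a short message through a connected pair of them. The function name send_through_socketpair and the message are hypothetical; a real renderer would keep each end in a different process (for example after fork()) rather than reading back in the same one.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends msg through a UNIX-domain stream socket pair and reads it back
 * into out (NUL-terminated). Intended for short messages that fit in the
 * socket buffer. Returns the number of bytes received, or -1 on error. */
ssize_t send_through_socketpair(const char *msg, char *out, size_t outlen)
{
    int sv[2];                  /* the two connected endpoints */
    size_t len = strlen(msg);
    ssize_t got = -1;

    /* An empty message would leave the reader blocked forever. */
    if (len == 0 || outlen < 2)
        return -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Write on one end, read on the other. */
    if (write(sv[0], msg, len) == (ssize_t)len) {
        got = read(sv[1], out, outlen - 1);
        if (got >= 0)
            out[got] = '\0';
    }

    close(sv[0]);
    close(sv[1]);
    return got;
}
```

Internet sockets follow the same read/write pattern once connected; only the setup differs (socket(), bind(), listen() and accept() on one side, socket() and connect() on the other).
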