Author: eilemann@gmail.com
State: Implemented in 1.1.7
Overview
Tile compounds decompose the view into regular-sized 2D tiles. Each contributing resources renders tiles, until all tiles have been processed. This provides a natural way of balancing the load. Similarly to tile compounds, chunk compounds decompose a database range into DB chunks. For the most part in this document we will consider chunk compounds as a special case for tile compounds.
Requirements
- Tile Affinity: A given tile should preferably be processed by the same resource. This is even more important for chunks where the resource has to map the data into its address space and upload it to a GPU.
- Configurability: The tile size and chunk granularity has to be configurable in the config file and programmatically on a frame-by-frame basis.
Specification
There are two approaches of distributing the tasks to all resources: a central queue or a per-resource queue with task stealing.
The central queue requires:
- Client-side prefetching for performance
- Server-side pre-distribution for affinity
- Inter-frame knowledge for affinity
- Per-frame queue, keeping #latency queues
The per-resource queue requires:
- Pre-distribution of tasks
- Client-to-client communication for work stealing
- Client-to-server feedback for pre-distribution of next frame
The second implementation is more complex without gaining any feature or performance improvement. We will focus on the first implementation.
Implementation
Collage Task Queues ------------------- Similar to co::Barrier Master instance does push() and pop() items Slave instances do pop() items class QueueMaster // can only be registered { public: void push( const QueuePacket& packet ) Command* command = _cache.alloc( packet.size( )); copy packet -> command; command->command = CMD_QUEUE_ITEM; command->objectID = getID(); _queue.push_back( command ); Command& pop(); // release()d by callee protected: getChangeType() { return TYPE_STATIC; } private: typedef std::deque< Command* > PacketQueue; PacketQueue _queue; CommandCache _cache; _cmdGetItem( Command& command ) while( !_queue.empty() && --itemsRequested ) packet->instanceID = slaveInstanceID; send( command.getNode(), _queue.front( )); _queue.pop_front(); if( _queue.empty( )) send( command.getNode(), QueueEmptyPacket ); }; class QueueSlave // can only be mapped { public: void attach() { registerCommand( CMD_QUEUE_ITEM, 0, _queue ); registerCommand( CMD_QUEUE_EMPTY, 0, _queue ); } Command* pop(); // release()d by callee if( _queue.getSize() < _prefetchLow ) GetItemPacket packet; packet.num = _prefetchSize; send( master, packet ); Command* command = _queue.pop(); if( command->command == CMD_QUEUE_ITEM ) return command; command->release(); return 0; protected: getChangeType() { return TYPE_STATIC; } private: CommandQueue _queue; uint32_t _prefetchLow; uint32_t _prefetchSize; }; Fabric Task Queues Specializations Server-Side Implementation -------------------------- - Implement loader and API in Compound::update: - in CompoundUpdateOutputVisitor: o get co::QueueMaster [member of outputtiles/chunks] - TBD: one per eye and latency, cf. Frame::cycleData o generate all tasks packets[1] and push them to queue - fill in all non-source dependent data, e.g., PVP but not drawbuffer - cf. ChannelUpdateVisitor for task generation code o for chunks: only draw - in CompoundUpdateInputVisitor: o set QueueMaster id on inputtiles/chunks - register queues (cf. Barriers) - TBD output frame links for tile queues o Issue: one per tile, but only one frame per leaf in config o start with chunk queue impl which only has one output frame per leaf in ChannelUpdateVisitor: - if compound has input queue do generate tasks: o chunks: clear, chunks, readback o tiles: tiles [+output frame IDs] o contains queueID [cf. input frames] o all others, including transmit, do not change o pay attention that frameReadback does not set ready the frame - CLEANUP! - in CommandQueue::_cmdGetItem o TBD add compound-specific data [1] TileTaskPacket : public QueuePacket { TileTaskPacket() { size = ...; } uint32_t tasks; PixelViewport pvp; Viewport vp; Frustum, ... } ChunkTaskPacket : public QueuePacket { ChunkTaskPacket() { size = ...; } uint32_t tasks; Range range; } [...] Client-Side Implementation -------------------------- Channel::_cmdFrameTiles() map QueueSlave to packet->queueID; [see Channel::getView() ] while( packet = queue.pop( )) set tile data in current context call frameClear, frameDraw, frameReadback [if task is set in tasks] Channel::_cmdFrameTasks() map QueueSlave to packet->queueID; [see Channel::getView() ] getNextRange() call frameDraw Channel::getNextRange() get packet = queue.pop(); if !packet return false set range in current context release packet return true
API
2D tiles: void eq::View::setTileSize( const Vector2i& tile ); DB chunks: void eq::View::setChunkSize( const float range ); const Range& eq::Channel::getNextRange() const; - fetches and returns the next range to be rendered - returns Range::INVALID (to be created) when all chunks are done - initially the first range is set - getRange always returns the last fetched range
File Format
compound { channel "destination" outputtiles { name "queue" size [ int int ] } compound { channel "source1..n" inputtiles { name "queue" } outputframe {} } inputframe { name "frame.source1..n" } } compound { channel "destination" outputchunks { name "queue" divisor int } compound { channel "source1..n" inputchunks { name "queue" } outputframe {} } inputframe { name "frame.source1..n" } } compound { channel "destination" outputtiles { name "tiles" size [ int int ] } buffer [ COLOR DEPTH ] compound { inputtiles { name "tiles" } outputchunks { name "chunks1" granularity float } compound { channel "source1..n" inputchunks { name "chunks1" } outputframe {} } } compound { buffer [ COLOR DEPTH ] inputtiles { name "tiles" } outputchunks { name "chunks2" granularity float } compound { channel "source1..m" inputchunks { name "chunks2" } outputframe {} } } inputframe { name "frame.source1..n" } inputframe { name "frame.source1..m" } }
Examples
Collage queue usage: Producer: QueueMaster queue; localNode->registerObject( queue ); for each task queue.push( task ); send( queue->getID( )); Consumer: const uint128_t queueID = recv(); QueueSlave queue; localNode->mapObject( queue, queueID ); while( task = queue.pop( )) execute task task->release() Channel::frameDraw() setup Range range = getRange(); while( range != INVALID ) draw( range ); range = getNextRange();
Issues
Q1: Do we have one draw call per tile, or does one draw call render multiple tiles?
Resolved: Have one draw call per tile. This is transparent to the
application. The static setup can be amortized by caching the relevant state
information on the channel itself, and resetting it in
each frameStart
Having one Channel::frameClear, frameDraw, frameReadback
per tile
has the advantage of exposing this feature completely transparent to the
application. Having one call for all tiles has the advantage that the
application may amortize certain static setup calls, e.g., loading a shader or
texture, over all tiles.
Q2: Do we have one draw call per chunk, or does one draw call render multiple chunks?
Resolved: For DB chunks, the application has to get all chunks to render in a
single Channel::frameDraw
invokation.
The same logic applies as for tiles, with the exception that we want to have one clear and readback for all chunks. This does not exclude using multiple draw calls, but reusing the same framebuffer does. Often applications adapt the near and far based on the current data, thus changing the meaning of the existing, reused depth buffer.