Equalizer logo
Collage logo
GPU-SD logo

Tile Compounds

Author: eilemann@gmail.com
State: Implemented in 1.1.7

Overview

Tile compounds decompose the view into regular-sized 2D tiles. Each contributing resources renders tiles, until all tiles have been processed. This provides a natural way of balancing the load. Similarly to tile compounds, chunk compounds decompose a database range into DB chunks. For the most part in this document we will consider chunk compounds as a special case for tile compounds.

Requirements

Specification

There are two approaches of distributing the tasks to all resources: a central queue or a per-resource queue with task stealing.

The central queue requires:

The per-resource queue requires:

The second implementation is more complex without gaining any feature or performance improvement. We will focus on the first implementation.

Implementation

                              Collage Task Queues
                              -------------------

  Similar to co::Barrier
  Master instance does push() and pop() items
  Slave instances do pop() items

class QueueMaster // can only be registered
{
public:
    void push( const QueuePacket& packet )
        Command* command = _cache.alloc( packet.size( ));
        copy packet -> command;
        command->command = CMD_QUEUE_ITEM;
        command->objectID = getID();
        _queue.push_back( command );

    Command& pop(); // release()d by callee

protected:
    getChangeType() { return TYPE_STATIC; }

private:
    typedef std::deque< Command* > PacketQueue;
    PacketQueue _queue;
    CommandCache _cache;

    _cmdGetItem( Command& command )
        while( !_queue.empty() && --itemsRequested )
            packet->instanceID = slaveInstanceID;
            send( command.getNode(), _queue.front( ));
            _queue.pop_front();
        if( _queue.empty( ))
            send( command.getNode(), QueueEmptyPacket );
};

class QueueSlave // can only be mapped
{
public:
    void attach()
    {
        registerCommand( CMD_QUEUE_ITEM, 0, _queue );
        registerCommand( CMD_QUEUE_EMPTY, 0, _queue );
    }

    Command* pop(); // release()d by callee
       if( _queue.getSize() < _prefetchLow )
           GetItemPacket packet;
           packet.num = _prefetchSize;
           send( master, packet );
       Command* command = _queue.pop();
       if( command->command == CMD_QUEUE_ITEM )
           return command;
       command->release();
       return 0;

protected:
    getChangeType() { return TYPE_STATIC; }

private:
    CommandQueue _queue;

    uint32_t _prefetchLow;
    uint32_t _prefetchSize;

};

Fabric Task Queues Specializations

                           Server-Side Implementation
                           --------------------------

  - Implement loader and API
  in Compound::update:
  - in CompoundUpdateOutputVisitor:
    o get co::QueueMaster [member of outputtiles/chunks]
      - TBD: one per eye and latency, cf. Frame::cycleData
    o generate all tasks packets[1] and push them to queue
      - fill in all non-source dependent data, e.g., PVP but not drawbuffer
      - cf. ChannelUpdateVisitor for task generation code
        o for chunks: only draw

  - in CompoundUpdateInputVisitor:
    o set QueueMaster id on inputtiles/chunks
  - register queues (cf. Barriers)
  - TBD output frame links for tile queues
    o Issue: one per tile, but only one frame per leaf in config
    o start with chunk queue impl which only has one output frame per leaf

  in ChannelUpdateVisitor:
  - if compound has input queue
    do generate tasks:
      o chunks: clear, chunks, readback
      o tiles: tiles [+output frame IDs]
      o contains queueID [cf. input frames]
      o all others, including transmit, do not change
      o pay attention that frameReadback does not set ready the frame

  - CLEANUP!

  - in CommandQueue::_cmdGetItem
    o TBD add compound-specific data

[1]
TileTaskPacket : public QueuePacket
{
    TileTaskPacket()
    {
        size = ...;
    }

    uint32_t tasks;
    PixelViewport pvp;
    Viewport vp;
    Frustum, ...
}

ChunkTaskPacket : public QueuePacket
{
    ChunkTaskPacket()
    {
        size = ...;
    }

    uint32_t tasks;
    Range range;
}
[...]

                           Client-Side Implementation
                           --------------------------
 Channel::_cmdFrameTiles()
    map QueueSlave to packet->queueID; [see Channel::getView() ]
    while( packet = queue.pop( ))
       set tile data in current context
       call frameClear, frameDraw, frameReadback [if task is set in tasks]

 Channel::_cmdFrameTasks()
    map QueueSlave to packet->queueID; [see Channel::getView() ]
    getNextRange()
    call frameDraw

 Channel::getNextRange()
    get packet = queue.pop();
    if !packet
      return false

    set range in current context
    release packet
    return true


API

  2D tiles:
    void eq::View::setTileSize( const Vector2i& tile );

  DB chunks:
    void eq::View::setChunkSize( const float range );
    const Range& eq::Channel::getNextRange() const;
    - fetches and returns the next range to be rendered
    - returns Range::INVALID (to be created) when all chunks are done
    - initially the first range is set
    - getRange always returns the last fetched range

File Format

  compound
  {
      channel "destination"
      outputtiles { name "queue" size [ int int ] }

      compound
      {
          channel "source1..n"
          inputtiles { name "queue" }
          outputframe {}
      }
      inputframe { name "frame.source1..n" }
  }

  compound
  {
      channel "destination"
      outputchunks { name "queue" divisor int }

      compound
      {
          channel "source1..n"
          inputchunks { name "queue" }
          outputframe {}
      }
      inputframe { name "frame.source1..n" }
  }

  compound
  {
      channel "destination"
      outputtiles { name "tiles" size [ int int ] }
      buffer [ COLOR DEPTH ]

      compound
      {
          inputtiles { name "tiles" }
          outputchunks { name "chunks1" granularity float }
          compound
          {
              channel "source1..n"
              inputchunks { name "chunks1" }
              outputframe {}
          }
      }
      compound
      {
          buffer [ COLOR DEPTH ]
          inputtiles { name "tiles" }
          outputchunks { name "chunks2" granularity float }
          compound
          {
              channel "source1..m"
              inputchunks { name "chunks2" }
              outputframe {}
          }
      }
      inputframe { name "frame.source1..n" }
      inputframe { name "frame.source1..m" }
  }

Examples

Collage queue usage:
  Producer:
    QueueMaster queue;
    localNode->registerObject( queue );
    for each task
        queue.push( task );
    send( queue->getID( ));

  Consumer:
    const uint128_t queueID = recv();
    QueueSlave queue;
    localNode->mapObject( queue, queueID );
    while( task = queue.pop( ))
        execute task
        task->release()

  Channel::frameDraw()
     setup
     Range range = getRange();
     while( range != INVALID )
         draw( range );
         range = getNextRange();

Issues

Q1: Do we have one draw call per tile, or does one draw call render multiple tiles?

Resolved: Have one draw call per tile. This is transparent to the application. The static setup can be amortized by caching the relevant state information on the channel itself, and resetting it in each frameStart

Having one Channel::frameClear, frameDraw, frameReadback per tile has the advantage of exposing this feature completely transparent to the application. Having one call for all tiles has the advantage that the application may amortize certain static setup calls, e.g., loading a shader or texture, over all tiles.

Q2: Do we have one draw call per chunk, or does one draw call render multiple chunks?

Resolved: For DB chunks, the application has to get all chunks to render in a single Channel::frameDraw invokation.

The same logic applies as for tiles, with the exception that we want to have one clear and readback for all chunks. This does not exclude using multiple draw calls, but reusing the same framebuffer does. Often applications adapt the near and far based on the current data, thus changing the meaning of the existing, reused depth buffer.