Equalizer: Documentation: Developer: Compound Load Balancing

Implemented in 0.6
view_equalizer in 0.9 (described here)
Boundaries: implemented in 0.9.1

Overview

Compound load balancing tries to optimally use all child resources assigned to a compound. Higher-level load balancing will dynamically change the resources assigned to a compound, based on the overall system load.

Compound load balancing adapts either the viewport or the range of the children so that all resources are busy all the time, which allows optimal scalability.

The implementation is fully transparent to the application and uses the timing values from the last finished frame. Later versions may introduce an API to get load information for the current frame from the application, to allow a more accurate approach.

Implementation

Load Balancing Grid

The statistics interface provides the timing values from the last finished frame. These timing values, together with the 2D viewport or DB range, give a load distribution. The ROI readback interface will be used to refine the load distribution.

The goal of the load balancer is that all channels use the same time for rendering. The basic assumption is that the new frame will take the same time as the last finished frame and that it has roughly the same load distribution.

Using the statistics from the last finished frame, a 2D load grid is generated. Each grid cell contains the area and load for this cell. The load is time per normalized viewport, i.e., assuming an even work distribution within the cell. The Figure on the right shows a load grid for a three-way 2D decomposition.
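The definition of a grid cell can be illustrated with a minimal sketch. The types Viewport and LoadCell and the functions makeCell and timeIn are hypothetical names chosen for this example, not the Equalizer data structures; the timing values are made up.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration, not the Equalizer types.
struct Viewport { float x, y, w, h; };  // normalized [0,1] coordinates

struct LoadCell
{
    Viewport vp;   // area rendered by one source channel in the last frame
    float    load; // render time (ms) per unit of normalized viewport area
};

// Build a grid cell from a channel's last viewport and its render time.
LoadCell makeCell( const Viewport& vp, const float timeMs )
{
    const float area = vp.w * vp.h;
    assert( area > 0.f ); // zero-sized viewports carry no load information
    return LoadCell{ vp, timeMs / area };
}

// Time attributed to a w x h part of the cell, assuming the work is
// evenly distributed within the cell.
float timeIn( const LoadCell& cell, const float w, const float h )
{
    return cell.load * w * h;
}
```

For example, a channel that rendered the viewport [ 0 0 1 0.5 ] in 30 ms yields a load of 60 ms per unit area, so half of that cell accounts for 15 ms.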

Load Balancing Split Tree

The viewport distribution is calculated using a binary split tree. The leaf nodes of the tree represent the child compounds performing the rendering. Intermediate nodes define the split direction at this level. The Figure on the left shows the split tree for the example load grid. Unless a view equalizer is used, each child counts as a full resource.

The load equalizer computes the split position using a top-down traversal on each level of the split tree, based on the resources and total rendering time of the given node. In the given example, the target time for the root node is 180 ms with a 60/120 ms split, and a 60/60 ms split on the right sub-node.
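The distribution of the target time over the split tree might look roughly like the sketch below. It assumes that a node's share of the time is proportional to the resources below it; the Node structure and the functions sumResources and assignTargetTimes are illustrative and not taken from the Equalizer source.

```cpp
#include <cassert>
#include <memory>

// Hypothetical split tree node for illustration.
struct Node
{
    std::unique_ptr< Node > left, right; // both null for a leaf
    float resources = 1.f; // leaf: one full resource; inner: sum of children
    float targetMs  = 0.f; // filled in by assignTargetTimes
};

// Bottom-up pass: accumulate the resources below each inner node.
float sumResources( Node& node )
{
    if( !node.left )
        return node.resources; // leaf
    assert( node.right );
    node.resources = sumResources( *node.left ) + sumResources( *node.right );
    return node.resources;
}

// Top-down pass: split the time of a node between its children
// in proportion to their resources.
void assignTargetTimes( Node& node, const float timeMs )
{
    node.targetMs = timeMs;
    if( !node.left )
        return;
    assert( node.right );
    const float leftShare = node.left->resources / node.resources;
    assignTargetTimes( *node.left,  timeMs * leftShare );
    assignTargetTimes( *node.right, timeMs * ( 1.f - leftShare ));
}
```

With three leaves of one resource each and a total of 180 ms, this yields the 60/120 ms split on the root and the 60/60 ms split on the right sub-node described above.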

New Split Computation

To find the y split position on the root node, the load equalizer iterates over the 2D grid until 60 ms have been accumulated. In the given example, this is reached at split position 0.4, giving the first source channel a viewport of [ 0 0 1 0.4 ].

On the intermediate node, the 60/60 ms split leads to a split at X position 0.62. Note that grid cells cut by the new split position contribute only partially to the new split, in this case about 92 percent (0.6/0.65), i.e., about 55 ms for the yellow grid cell.

Note that the new split reduced the size of the red source channel, which had the longest render time.
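A simplified sketch of the y split search is given below. It sweeps the grid in horizontal strips and stops inside the strip where the target time is reached, which is where cells contribute only partially. The Cell structure, the function findSplitY and the numbers in the example are assumptions made for illustration; they do not reproduce the grid from the Figure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical grid cell: normalized rectangle plus load (ms per unit area).
struct Cell { float x, y, w, h, load; };

// Return the y position at which targetMs of render time is accumulated.
float findSplitY( const std::vector< Cell >& grid, const float targetMs )
{
    assert( targetMs >= 0.f );

    // The cell edges partition the y axis into strips of constant load.
    std::vector< float > edges;
    for( const Cell& c : grid )
    {
        edges.push_back( c.y );
        edges.push_back( c.y + c.h );
    }
    std::sort( edges.begin(), edges.end( ));

    float accumulated = 0.f;
    for( std::size_t i = 1; i < edges.size(); ++i )
    {
        const float y0 = edges[ i - 1 ];
        const float y1 = edges[ i ];
        if( y1 <= y0 )
            continue; // duplicate edge, empty strip

        float rate = 0.f; // ms per unit of y within this strip
        for( const Cell& c : grid )
            if( c.y <= y0 && c.y + c.h >= y1 )
                rate += c.load * c.w;

        const float time = rate * ( y1 - y0 );
        if( rate > 0.f && accumulated + time >= targetMs )
            return y0 + ( targetMs - accumulated ) / rate; // partial strip
        accumulated += time;
    }
    return edges.empty() ? 0.f : edges.back(); // target exceeds total time
}
```

For a grid with a 40 ms cell covering the lower half and two cells of 60 ms and 80 ms sharing the upper half, a 60 ms target is reached at y = 0.5 + 20/280, i.e., about 0.57.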

The calculation for database decomposition uses the same algorithm, except that it is simplified to a one-dimensional integration across the range.
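For the database case, the one-dimensional integration can be sketched as follows. Segment and findSplit are hypothetical names, and the segments are assumed to be sorted and non-overlapping; this is an illustration of the idea, not the Equalizer implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical range segment: [begin, end) of the DB range rendered by one
// source channel, with its load in ms per unit of range.
struct Segment { float begin, end, load; };

// Integrate the load across the sorted segments until targetMs is reached.
float findSplit( const std::vector< Segment >& segments, const float targetMs )
{
    assert( targetMs >= 0.f );
    float accumulated = 0.f;
    for( const Segment& s : segments )
    {
        const float time = ( s.end - s.begin ) * s.load;
        if( s.load > 0.f && accumulated + time >= targetMs )
            // Target reached inside this segment: use its partial contribution
            return s.begin + ( targetMs - accumulated ) / s.load;
        accumulated += time;
    }
    return segments.empty() ? 0.f : segments.back().end;
}
```

With two segments [ 0 0.5 ] at 50 ms and [ 0.5 1 ] at 100 ms, an even two-way split targets 75 ms and moves the split position from 0.5 to 0.625, shrinking the slower range.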

Beyond the simplified algorithm explained above, the load equalizer implements the following features:

Dampen the per-frame adaptation by interpolating between the old and new value using a configurable damping factor
Restrict a source viewport to be no bigger than the underlying source channel
Restrict a source viewport to fall on a pre-defined pixel boundary, e.g., to balance for tiles whose size is a multiple of 8
Remove the time needed to assemble the results from the destination channel's budget
Do not create zero-sized tiles
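Two of these refinements, damping and boundary snapping, can be sketched in a few lines. The functions dampen and snapToBoundary are hypothetical, and the convention that a damping factor of 0 applies the new value fully while 1 keeps the old value is an assumption of this sketch.

```cpp
#include <cassert>
#include <cmath>

// Interpolate between the old and the newly computed split position.
// Assumed convention: damping 0 uses the new value, damping 1 keeps the old.
float dampen( const float oldPos, const float newPos, const float damping )
{
    assert( damping >= 0.f && damping <= 1.f );
    return oldPos * damping + newPos * ( 1.f - damping );
}

// Snap a normalized split position to the nearest multiple of 'boundary'
// pixels on a channel that is 'pixels' wide or high.
float snapToBoundary( const float pos, const int pixels, const int boundary )
{
    assert( pixels > 0 && boundary > 0 );
    const float steps = std::round( pos * pixels / boundary );
    return steps * boundary / pixels;
}
```

On a 1024 pixel channel with a boundary of 8, a split position of 0.4 (pixel 409.6) snaps to pixel 408, i.e., 0.3984375. A complete implementation would additionally clamp the result so that no tile becomes zero-sized.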

RFE: Tile and Range Boundaries

Implementation

  Implement file format as below
  Add:
    void LoadEqualizer::setBoundary( const Vector2i& boundary );
    void LoadEqualizer::setBoundary( const float boundary );
    const Vector2i& LoadEqualizer::getBoundary2i() const;
    float LoadEqualizer::getBoundaryf() const;
    Vector2i _boundary2i;  // default: 1 1
    float _boundaryf;      // default: numeric_limits<float>::epsilon()
  Add 'Vector2i boundary' to LoadEqualizer::Node
  In _assignTargetTimes
    set boundary on leaf nodes
    compute boundaries on non-leaf on up traversal
      Note: add values for split axis, max values for non-split axis
  In _computeSplit
    replace current epsilon/MIN_PIXELS code with boundary computation
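The boundary computation on the up traversal might look roughly like this. Vec2i stands in for Vector2i, and Split and combineBoundaries are hypothetical names; the sketch assumes that a vertical split places the children side by side along x and a horizontal split stacks them along y.

```cpp
#include <algorithm>
#include <cassert>

struct Vec2i { int x, y; }; // stand-in for Vector2i in this sketch

// Assumption: Vertical means the children sit side by side along x,
// Horizontal means they are stacked along y.
enum class Split { Vertical, Horizontal };

// Boundary of a non-leaf node from the boundaries of its two children:
// add the values on the split axis, take the maximum on the other axis.
Vec2i combineBoundaries( const Vec2i& a, const Vec2i& b, const Split split )
{
    assert( a.x > 0 && a.y > 0 && b.x > 0 && b.y > 0 );
    if( split == Split::Vertical )
        return Vec2i{ a.x + b.x, std::max( a.y, b.y ) };
    return Vec2i{ std::max( a.x, b.x ), a.y + b.y };
}
```

Two children with boundaries 8 8 and 8 16 combine to 16 16 under a vertical split and to 8 24 under a horizontal split.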

File Format

  compound
  {
      load_equalizer
      {
          mode     [ 2D | VERTICAL | HORIZONTAL | DB ]
          boundary [ x y ] | float    # x,y: tile boundary for 2D, float: range
      }
  }

API

Open Issues

Channels within the same thread.