US9721376B2 - Elimination of minimal use threads via quad merging - Google Patents

Elimination of minimal use threads via quad merging

Info

Publication number
US9721376B2
Authority
US
United States
Prior art keywords
quad
merging
draw call
merge
primitives
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US14/671,467
Other versions
US20150379764A1 (en
Inventor
Derek J. Lentz
Sang Oak Woo
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd
Priority to US14/671,467 (US9721376B2)
Assigned to SAMSUNG ELECTRONICS CO., LTD. Assignment of assignors' interest (see document for details). Assignors: LENTZ, DEREK J.; WOO, SANG OAK
Priority to KR1020150085149A (KR102392060B1)
Publication of US20150379764A1
Priority to US15/633,702 (US9972124B2)
Application granted
Publication of US9721376B2

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/40 Filling a planar surface by adding surface attributes, e.g. colour or texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/10 Geometric effects
    • G06T15/40 Hidden part removal

Definitions

  • the present invention is generally related to pixel shading in a graphics processing system. More particularly, the present invention is directed to performing quad fragment merging to reduce a pixel shading overhead.
  • GPU graphics processing unit
  • Techniques have been proposed to perform shading using quad-fragment merging.
  • conventional approaches have many drawbacks, including various quality problems and artifacts, as well as other problems.
  • Fragment merging in a graphics system is performed on a draw call basis. Primitives of the same draw call have many common attributes, such as the same graphics state, which facilitates merging of fragments in blocks of pixels, where the block has a size of at least 2×2 pixels. Partially covered fragments of the same draw call are considered for possible merging and at least one merge test performed.
  • the merge test may include error tests such as a level of detail error test and an interpolation error test.
  • a method of performing fragment merging in a shading stage of a graphics system on a draw call basis includes performing a draw call and rasterizing primitives.
  • merging is performed of partially covered fragments of a block of pixels to form a merged block of pixels.
  • Shading is performed of the merged block of pixels.
  • the block of pixels is a quad of four pixels
  • the merge test is a quad merge test
  • shading is performed of a merged quad.
  • a method of performing quad merging in a graphics system includes accumulating data of rasterized primitives for a draw call. A depth of each sample is computed within each primitive of the draw call using depth plane equations or other interpolation techniques such as barycentric computations. Early Z testing is performed for the primitives. Quad merging is performed for the rasterized primitives of the draw call for at least one quad location of partially covered quad fragments satisfying a set of merge tests that excludes overlapping primitives and excludes primitives having different faces.
  • a graphics processing unit includes a fragment merge unit to perform merging of fragments on a draw call basis.
  • the merge unit performs quad merging on a draw call basis.
  • support is provided to accumulate data for groups of primitives or to provide a moving window of primitives and covered blocks (e.g., quads) for accumulating rendered and merged data.
  • FIG. 1 is a high level block diagram of a merging unit to merge quads for shading in accordance with an embodiment of the present invention.
  • FIG. 2 is a diagram illustrating a quad merging unit in accordance with an embodiment of the present invention.
  • FIG. 3 illustrates a merge testing flow chart in accordance with an embodiment of the present invention.
  • FIG. 4 illustrates aspects of quad merging in accordance with an embodiment of the present invention.
  • FIG. 5 illustrates aspects of approximation of variable values and interpolation errors in accordance with an embodiment of the present invention.
  • FIG. 6 illustrates aspects associated with specular lighting in accordance with an embodiment of the present invention.
  • FIG. 1 is a high level block diagram of a graphics processing unit (GPU) 100 including a merge unit 102 in accordance with an embodiment of the present invention.
  • the merge unit merges partially covered fragments in pixel blocks at least as large as a quad (at least a 2×2 block of pixels).
  • An individual block e.g., a quad
  • a rasterization setup stage receives vertex attributes.
  • the output of the pre-rasterization setup stage goes into a rasterizer 104 .
  • the merge unit 102 receives an output of the rasterizer 104 , a draw call enable signal 106 , and a graphics state 108 associated with a draw call.
  • the output 190 of the quad merge unit is provided to a shader 195 and includes information on merged blocks.
  • An optional attribute setup output may be included.
  • the merging unit 102 may be implemented in graphics hardware, firmware, or software.
  • a draw call is a block of rendering primitives in a graphics system that has the same graphics state. Images are generally generated from a set of state changes and draw calls. Within a draw call, the draw call data is accumulated and merge decisions are made. In one embodiment support may be provided to accumulate data for groups of primitives or to provide a moving window of primitives and covered blocks (e.g., quads) for accumulating rendered and merged data.
  • the graphics state for a draw call will use the same textures and set of attributes, such as the varying variables (hereinafter abbreviated as “Vv”) of the OpenGL® graphics language.
  • the merge unit 102 performs merging only within draw calls. In one embodiment the merging may be enabled or disabled on a draw call basis via draw call enable 106 based on application requirements or user requirements.
  • Software control may be provided to enable or disable merging via the draw call enable 106 .
  • the merging may be enabled in a power saving mode or upon the detection of whether or not particular pixel lighting computations could beneficially utilize the merging.
  • specular lighting is a case in which merging may create image artifacts such that it may be desirable to turn merging off for specular lighting.
  • Merging may also be selected based on the primitive or object type to be shaded. As an example, merging should be disabled for sprites and lines.
  • the merge unit 102 performs one or more merge tests. To support the depth-related merge tests, the merge unit 102 may perform or otherwise receive inputs of an early Z compare element 132. In one embodiment attribute setup is performed outside of the merge process and may be delayed until after merging is completed.
  • the merge tests may include testing whether primitives overlap and have a common face 122; a depth slope test 124, which may be indicative of a level of detail (LOD) error 126, and which tests object depth slopes (e.g., in X and Y) and disables merging when the slope is too great or too different, preventing an LOD problem; and an interpolation error test 128 to prevent merging between primitives that are not adjacent.
  • LOD level of detail
  • FIG. 2 shows an example of a quad merge unit 202 in accordance with an embodiment of the present invention.
  • the quad merge unit 202 includes an input from an optional shared edge detection unit 203 , an attribute setup unit 205 , an early Z unit 207 , a merge testing unit 209 , a quad accumulation unit 211 , a merge mapping unit 213 , and a flush to shaders output 215 .
  • the shared edge detection unit 203 detects when edges between primitives are shared and only enables filtering on edges that are exactly shared. For example, the vertex indices within the vertex data array or arrays may be used to identify shared edges between primitives.
  • the quad accumulation unit 211 and merge mapping unit storage 213 are empty.
  • the primitives associated with a draw call are rasterized into quads, with live coverage by the rasterizer hardware (not shown in FIG. 2 ).
  • the attribute setup unit 205 performs interpolation setup (e.g., plane equation computations) to compute the required depth plane equation for each primitive, which in turn permits the depth of each sample within each primitive to be computed. It will be understood, however, that the attribute setup unit could alternatively implement a barycentric interpolation.
  • Quads that have coverage within primitives are generated by rasterization and passed through to early Z testing in the early Z unit 207 .
  • Z/depth values are computed for each sample that the early Z unit tests.
  • Quads with surviving coverage after early Z testing are sent to the merge testing unit 209, which performs the required merge tests. Those quads that pass the merge tests with remaining coverage are stored in the quad accumulation unit storage 211.
  • mapping information is stored in the merge mapping element storage 213 . This mapping information is also used by the merge testing unit 209 .
  • merge testing is applied to each incoming quad with partial coverage, where a quad is partially covered when the quad is not fully covered, but includes at least 1 live sample.
  • the results are used to modify the quad accumulation unit storage 211 and the merge mapping unit storage 213 .
  • the results of the quad accumulation are flushed to the shaders.
  • the flushing may be performed before processing a new draw call, such as at the end of a current draw call or at any time when storage has become too full.
  • primitives e.g., plane equations and primitive face information
  • quads are flushed to the shader.
  • plane equations are flushed in a synchronous manner with quad data so that an interpolator has access to the plane equations when running a pixel shader for the primitives. Data regulation may be provided so that the buffers in the shader and interpolator are not over-filled.
  • merging is never performed between different draw calls.
  • the primitives of a particular draw call have a common graphics state, which may include common textures, common shader programs, and attribute variables.
  • the edges within a draw call are edges that are shared between primitives that are internal to a rendered object within the image (non-silhouette) edges.
  • Quads enclosing internal edges are conventionally rendered twice in pixel shaders. However, on average, 50% of the pixels in these types of quads enclosing internal edges are “helper pixels” that are typically only required so that texture hardware can compute a level of detail. Additionally, overlap between primitives within a draw call is often very rare.
  • Performing quad merging for a draw call permits helper pixels to be removed when pixels from the adjacent primitives are packed together in shared quads. Additionally, texture accesses and variables (such as Vv) are shared between adjacent primitives within a draw call. Merging coverage from one quad into another permits shading one quad instead of two quads. Thus if the coverage of non-overlapping primitives at a quad position can be merged, 50% of these pixel shader threads (on edges) are saved (i.e., 1 of 2 quads).
  • FIG. 3 is a flowchart illustrating additional aspects of merging.
  • input quads or larger blocks are filtered as follows.
  • a merge enabled test is performed in decision block 305 . If merging is disabled, then the process writes data into the quad buffer. No modification is performed of the partial coverage map when merging is disabled.
  • merging 338 is not performed and the data is written into the quad circular buffer 330 .
  • the partial coverage map 320 is not modified; in another option the partial coverage map points to the new quad. If there is a different face, merging 338 is not performed and data is written into the quad circular buffer 330 without modifying the partial coverage map 320 . In one embodiment other merge tests may also be performed.
  • a step is performed to overwrite a merged (stored or input) quad buffer entry with combined coverage: (input_coverage
  • Options may be provided to merge up to a maximum number of primitives (e.g., 2).
  • the buffer data is flushed to the pixel shader(s) along with plane equation data.
  • plane equations or equivalent interpolation data
  • the plane equations are flushed in a synchronous manner with quad data so an interpolator has access to the plane equations when running the pixel shader(s) for the primitives.
  • FIG. 4 shows a set of quads and two primitives (triangles 1 and 2 ) having a shared interior edge.
  • Each quad has four pixel centers.
  • the quads in triangle 1 may have a blue color and the quads in triangle 2 a yellow color.
  • quad (1, 1) (row 1, column 1)
  • the left side pixels from triangle 1 can be merged with the pixels from triangle 2 as in quad (2, 1).
  • quads (3, 1) and (4, 1) the quads are merged into triangle 1 .
  • FIG. 5 illustrates approximation of Vv values for textures.
  • the most common usage for Vv values is to access textures.
  • the Vv values could, in theory, be calculated exactly.
  • the Vv values can be approximated.
  • FIG. 6 illustrates issues associated with specular lighting.
  • the primitives each have surface normals. If normal values are significantly different from the correct values at particular pixels, specular highlights can be very different at those pixels because of the power function used in specular lighting. This is because specular lighting uses primitive or pixel level normals. The degree to which artifacts occur for specular lighting depends on various factors. If Phong shading is used, which has per primitive normals, this could result in artifacts from merging. If interpolated normals (per pixel normals) are used, visible artifacts are probably minimal. Similarly, if normal maps (per pixel normals) are used, then there will probably be minimal visible artifacts.
  • LOD is normally not critical because it is computed using a log function. However, the LOD can change quickly and artifacts can occur when the depth (Z) slope of the primitive is very high. Additionally Vv slopes may change rapidly and approximation errors (Vv approximation) will be larger.
  • Exemplary formulas for disabling merging when the depth slope is high may be based on analyzing the derivatives of depth (Z) with respect to x and y, such as by having the sum of derivatives in each x and y being greater than a threshold or each individual derivative in x and y being greater than a threshold: $(dz/dx + dz/dy) > \text{threshold}$ or $(dz/dx > \text{threshold}) \lor (dz/dy > \text{threshold})$
  • the slopes of Vv and/or 1/W are used to estimate which quad will have a lower interpolation error.
  • a threshold in the depth slope may be used to define an interpolation error merge test. If the depth (Z) slopes of 2 primitives differ by a lot then the Vv slopes may differ a lot across the edge between them.
  • merging is disabled when the difference between slopes of z in x and y for two primitives (having z values z1 and z2) is greater than a threshold: $\left((dz_1/dx + dz_1/dy) - (dz_2/dx + dz_2/dy)\right) > \text{threshold}$
  • the present invention may also be tangibly embodied as a set of computer instructions stored on a computer readable medium, such as a memory device.
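The depth-slope merge tests listed above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names and the `per_axis` flag are hypothetical, the second exemplary form is read as a logical OR of the per-axis comparisons, and the two-primitive test is read as a difference of slope sums. The comparisons are signed, exactly as the formulas are written; a practical implementation might compare absolute values instead, which is an assumption the text does not confirm.

```python
def slope_too_steep(dz_dx, dz_dy, threshold, per_axis=False):
    """Return True when merging should be disabled for one primitive.

    per_axis=False: (dz/dx + dz/dy) > threshold
    per_axis=True:  (dz/dx > threshold) or (dz/dy > threshold)
    """
    if per_axis:
        return dz_dx > threshold or dz_dy > threshold
    return (dz_dx + dz_dy) > threshold


def slopes_too_different(dz1_dx, dz1_dy, dz2_dx, dz2_dy, threshold):
    """Return True when two primitives' depth slopes differ by more than threshold."""
    return ((dz1_dx + dz1_dy) - (dz2_dx + dz2_dy)) > threshold


# Hypothetical slope values and threshold, chosen only for illustration.
print(slope_too_steep(0.3, 0.4, threshold=0.5))
print(slopes_too_different(0.3, 0.4, 0.25, 0.35, threshold=0.5))
```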

Abstract

Fragment merging is performed on a draw call basis. One application is for quad merging. Primitives of the same draw call have many common attributes, such as a graphics state, which facilitates merging of quad fragments. Partially covered quad fragments of the same draw call are considered for possible merging and at least one merge test performed. The merge test may include error tests such as a level of detail error test, interpolated depth, and an interpolation error test.

Description

CROSS REFERENCE TO RELATED APPLICATIONS
The present application claims the benefit of Provisional Application No. 62/018,040 filed Jun. 27, 2014, the contents of which are hereby incorporated by reference.
FIELD OF THE INVENTION
The present invention is generally related to pixel shading in a graphics processing system. More particularly, the present invention is directed to performing quad fragment merging to reduce a pixel shading overhead.
BACKGROUND OF THE INVENTION
One aspect of many modern graphics systems that include a graphics processing unit (GPU) is that there are many pixel shader threads that require processing. This consumes power and limits performance. Techniques have been proposed to perform shading using quad-fragment merging. However, conventional approaches have many drawbacks, including various quality problems and artifacts, as well as other problems.
SUMMARY OF THE INVENTION
Fragment merging in a graphics system is performed on a draw call basis. Primitives of the same draw call have many common attributes, such as the same graphics state, which facilitates merging of fragments in blocks of pixels, where the block has a size of at least 2×2 pixels. Partially covered fragments of the same draw call are considered for possible merging and at least one merge test performed. The merge test may include error tests such as a level of detail error test and an interpolation error test.
In one embodiment a method of performing fragment merging in a shading stage of a graphics system on a draw call basis includes performing a draw call and rasterizing primitives. In response to at least one merge test being satisfied, merging is performed of partially covered fragments of a block of pixels to form a merged block of pixels. Shading is performed of the merged block of pixels. In one embodiment the block of pixels is a quad of four pixels, the merge test is a quad merge test, and shading is performed of a merged quad.
In one embodiment a method of performing quad merging in a graphics system includes accumulating data of rasterized primitives for a draw call. A depth of each sample is computed within each primitive of the draw call using depth plane equations or other interpolation techniques such as barycentric computations. Early Z testing is performed for the primitives. Quad merging is performed for the rasterized primitives of the draw call for at least one quad location of partially covered quad fragments satisfying a set of merge tests that excludes overlapping primitives and excludes primitives having different faces.
In one embodiment a graphics processing unit (GPU) includes a fragment merge unit to perform merging of fragments on a draw call basis. In one embodiment the merge unit performs quad merging on a draw call basis. In one embodiment support is provided to accumulate data for groups of primitives or to provide a moving window of primitives and covered blocks (e.g., quads) for accumulating rendered and merged data.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a high level block diagram of a merging unit to merge quads for shading in accordance with an embodiment of the present invention.
FIG. 2 is a diagram illustrating a quad merging unit in accordance with an embodiment of the present invention.
FIG. 3 illustrates a merge testing flow chart in accordance with an embodiment of the present invention.
FIG. 4 illustrates aspects of quad merging in accordance with an embodiment of the present invention.
FIG. 5 illustrates aspects of approximation of variable values and interpolation errors in accordance with an embodiment of the present invention.
FIG. 6 illustrates aspects associated with specular lighting in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION
FIG. 1 is a high level block diagram of a graphics processing unit (GPU) 100 including a merge unit 102 in accordance with an embodiment of the present invention. In one embodiment the merge unit merges partially covered fragments in pixel blocks at least as large as a quad (at least a 2×2 block of pixels). An individual block (e.g., a quad) may be partially covered by two or more different primitives (typically triangles). Instead of shading the same block multiple times, there is a benefit to merging partially covered blocks that satisfy one or more merge tests to reduce the number of shading threads.
A rasterization setup stage receives vertex attributes. The output of the pre-rasterization setup stage goes into a rasterizer 104. The merge unit 102 receives an output of the rasterizer 104, a draw call enable signal 106, and a graphics state 108 associated with a draw call. The output 190 of the quad merge unit is provided to a shader 195 and includes information on merged blocks. An optional attribute setup output may be included. The merging unit 102 may be implemented in graphics hardware, firmware, or software.
A draw call is a block of rendering primitives in a graphics system that has the same graphics state. Images are generally generated from a set of state changes and draw calls. Within a draw call, the draw call data is accumulated and merge decisions are made. In one embodiment support may be provided to accumulate data for groups of primitives or to provide a moving window of primitives and covered blocks (e.g., quads) for accumulating rendered and merged data.
The graphics state for a draw call will use the same textures and set of attributes, such as the varying variables (hereinafter abbreviated as “Vv”) of the OpenGL® graphics language. The merge unit 102 performs merging only within draw calls. In one embodiment the merging may be enabled or disabled on a draw call basis via draw call enable 106 based on application requirements or user requirements.
Software control may be provided to enable or disable merging via the draw call enable 106. In one embodiment the merging may be enabled in a power saving mode or upon the detection of whether or not particular pixel lighting computations could beneficially utilize the merging. For example, specular lighting is a case in which merging may create image artifacts such that it may be desirable to turn merging off for specular lighting. Merging may also be selected based on the primitive or object type to be shaded. As an example, merging should be disabled for sprites and lines.
The merge unit 102 performs one or more merge tests. To support the depth-related merge tests, the merge unit 102 may perform or otherwise receive inputs of an early Z compare element 132. In one embodiment attribute setup is performed outside of the merge process and may be delayed until after merging is completed. The merge tests may include testing whether primitives overlap and have a common face 122; a depth slope test 124, which may be indicative of a level of detail (LOD) error 126, and which tests object depth slopes (e.g., in X and Y) and disables merging when the slope is too great or too different, preventing an LOD problem; and an interpolation error test 128 to prevent merging between primitives that are not adjacent.
FIG. 2 shows an example of a quad merge unit 202 in accordance with an embodiment of the present invention. The quad merge unit 202 includes an input from an optional shared edge detection unit 203, an attribute setup unit 205, an early Z unit 207, a merge testing unit 209, a quad accumulation unit 211, a merge mapping unit 213, and a flush to shaders output 215.
In one embodiment the shared edge detection unit 203 detects when edges between primitives are shared and only enables filtering on edges that are exactly shared. For example, the vertex indices within the vertex data array or arrays may be used to identify shared edges between primitives.
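As an illustration of the index-based approach, the following Python sketch treats each triangle as a tuple of three vertex indices and reports the edges that two triangles share exactly. The function names and the triangle representation are assumptions made for this example; the patent does not specify how the shared edge detection unit 203 is implemented.

```python
def triangle_edges(tri):
    """Return the three edges of a triangle as order-independent index pairs."""
    a, b, c = tri
    return {frozenset((a, b)), frozenset((b, c)), frozenset((c, a))}


def shared_edges(tri_a, tri_b):
    """Edges whose two vertex indices appear in both triangles (exactly shared)."""
    return triangle_edges(tri_a) & triangle_edges(tri_b)


# Two triangles of an indexed mesh that share the edge between vertices 1 and 2.
print(shared_edges((0, 1, 2), (2, 1, 3)))
```

Comparing indices rather than vertex positions means that two edges count as shared only when they reference the same entries of the vertex array, which matches the "exactly shared" condition described above.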
In one embodiment, at the start of a draw call, the quad accumulation unit 211 and merge mapping unit storage 213 are empty. The primitives associated with a draw call are rasterized into quads, with live coverage by the rasterizer hardware (not shown in FIG. 2). The attribute setup unit 205 performs interpolation setup (e.g., plane equation computations) to compute the required depth plane equation for each primitive, which in turn permits the depth of each sample within each primitive to be computed. It will be understood, however, that the attribute setup unit could alternatively implement a barycentric interpolation.
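The plane-equation setup can be illustrated with a short Python sketch. It fits z = a·x + b·y + c through three screen-space vertices and then evaluates the depth at a sample position. The function names and the vertex format `(x, y, z)` are assumptions for this example, and the sketch does not model fixed-point precision or any hardware detail of the attribute setup unit 205.

```python
def depth_plane(v0, v1, v2):
    """Return (a, b, c) such that z = a*x + b*y + c passes through the three vertices."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = v0, v1, v2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate (zero-area) triangle")
    a = ((z1 - z0) * (y2 - y0) - (z2 - z0) * (y1 - y0)) / det  # dz/dx
    b = ((x1 - x0) * (z2 - z0) - (x2 - x0) * (z1 - z0)) / det  # dz/dy
    c = z0 - a * x0 - b * y0
    return a, b, c


def depth_at(plane, x, y):
    """Evaluate the depth plane at sample position (x, y)."""
    a, b, c = plane
    return a * x + b * y + c


plane = depth_plane((0, 0, 1.0), (4, 0, 3.0), (0, 4, 5.0))
print(plane, depth_at(plane, 1, 1))
```

The coefficients a and b are the depth slopes in x and y, so the same setup also supplies the inputs for a depth slope test.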
Quads that have coverage within primitives are generated by rasterization and passed through to early Z testing in the early Z unit 207. Z/depth values are computed for each sample that the early Z unit tests. Quads with surviving coverage after early Z testing are sent to the merge testing unit 209, which performs the required merge tests. Those quads that pass the merge tests with remaining coverage are stored in the quad accumulation unit storage 211.
In one embodiment mapping information is stored in the merge mapping element storage 213. This mapping information is also used by the merge testing unit 209.
In one embodiment merge testing is applied to each incoming quad with partial coverage, where a quad is partially covered when the quad is not fully covered, but includes at least 1 live sample. When quads are merged, the results are used to modify the quad accumulation unit storage 211 and the merge mapping unit storage 213.
The results of the quad accumulation are flushed to the shaders. As quad merging is performed on a draw call basis, the flushing may be performed before processing a new draw call, such as at the end of a current draw call or at any time when storage has become too full. In one embodiment, when a draw call is complete or if any structure fills up, primitives (e.g., plane equations and primitive face information) and quads are flushed to the shader. In one embodiment, plane equations are flushed in a synchronous manner with quad data so that an interpolator has access to the plane equations when running a pixel shader for the primitives. Data regulation may be provided so that the buffers in the shader and interpolator are not over-filled.
In one embodiment merging is never performed between different draw calls. The primitives of a particular draw call have a common graphics state, which may include common textures, common shader programs, and attribute variables. In many cases the edges within a draw call are edges that are shared between primitives that are internal to a rendered object within the image (non-silhouette) edges. Quads enclosing internal edges are conventionally rendered twice in pixel shaders. However, on average, 50% of the pixels in these types of quads enclosing internal edges are “helper pixels” that are typically only required so that texture hardware can compute a level of detail. Additionally, overlap between primitives within a draw call is often very rare. Performing quad merging for a draw call permits helper pixels to be removed when pixels from the adjacent primitives are packed together in shared quads. Additionally, texture accesses and variables (such as Vv) are shared between adjacent primitives within a draw call. Merging coverage from one quad into another permits shading one quad instead of two quads. Thus if the coverage of non-overlapping primitives at a quad position can be merged, 50% of these pixel shader threads (on edges) are saved (i.e., 1 of 2 quads).
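The thread savings can be checked with a small worked example in Python. The sketch assumes one pixel shader thread per pixel of a 2×2 quad and represents coverage as a 4-bit mask; the names and the mask layout are illustrative assumptions, not part of the patent.

```python
THREADS_PER_QUAD = 4  # assumed: one pixel shader thread per pixel of a 2x2 quad


def live_pixels(mask):
    """Number of covered pixels in a 4-bit coverage mask."""
    return bin(mask).count("1")


def thread_counts(masks, merged):
    """Return (threads launched, helper-only threads) for one quad position.

    masks holds the non-overlapping partial coverage masks of the primitives
    touching this quad position.
    """
    quads = 1 if merged else len(masks)
    threads = quads * THREADS_PER_QUAD
    helpers = threads - sum(live_pixels(m) for m in masks)
    return threads, helpers


# Two triangles each cover half of the same quad along a shared edge.
masks = (0b0011, 0b1100)
print(thread_counts(masks, merged=False))
print(thread_counts(masks, merged=True))
```

Without merging, the two quads launch eight threads of which four are helper pixels; with merging, one quad launches four threads and none are helpers, which is the 1-of-2 saving described above.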
FIG. 3 is a flowchart illustrating additional aspects of merging. In one embodiment input quads (or larger blocks) are filtered as follows.
A merge enabled test is performed in decision block 305. If merging is disabled, then the process writes data into the quad buffer. No modification is performed of the partial coverage map when merging is disabled.
If merging is enabled, then a test is made in decision block 310 whether the quad is fully covered. If the quad is fully covered, that data is written into the quad buffer. No modification is made of the partial coverage map and no merge is performed.
If merging is enabled and the block is not fully covered, then the process moves to the position lookup block 315. The partial coverage map 320 is read to test if an existing quad is partially covered at this quad position in decision block 325. If an existing quad is not partially covered at this quad position, a write is performed into the quad circular buffer 330 and a write is performed to the partial coverage map 320. If an existing quad is partially covered at the quad position, then an overlap test is performed in block 335. In one implementation the test for overlap is (input_coverage & stored_coverage) != 0. A further merge qualification test 336 may also be included after the overlap test 335. If there is an overlap and any further merge qualification test passes, then merging 338 is not performed and the data is written into the quad circular buffer 330. In one option the partial coverage map 320 is not modified; in another option the partial coverage map points to the new quad. If there is a different face, merging 338 is not performed and data is written into the quad circular buffer 330 without modifying the partial coverage map 320. In one embodiment other merge tests may also be performed.
If the merging 338 is successful, then a step is performed to overwrite a merged (stored or input) quad buffer entry with combined coverage: (input_coverage|stored_coverage). If the combined coverage is full coverage, then the partial coverage map entry is erased. When coverage is merged, either the stored coverage or the input coverage becomes zero because coverage is migrated from one quad to another.
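The filtering and merging steps of FIG. 3 can be sketched as a toy Python model. The class and method names, the list-based buffer (standing in for the quad circular buffer 330) and the dictionary (standing in for the partial coverage map 320) are all assumptions for illustration. The sketch merges incoming coverage into the stored entry, leaves the map unmodified on a failed merge (one of the two options described), and omits the further merge qualification test 336 and the per-pixel tracking of which primitive supplied each sample.

```python
FULL_COVERAGE = 0b1111  # one bit per pixel of a 2x2 quad


class QuadMerger:
    """Toy model of the quad buffer and partial coverage map (names are illustrative)."""

    def __init__(self):
        self.quad_buffer = []    # entries: [position, coverage, face]
        self.partial_map = {}    # quad position -> index of a stored partial quad

    def _append(self, position, coverage, face):
        self.quad_buffer.append([position, coverage, face])
        return len(self.quad_buffer) - 1

    def submit(self, position, coverage, face, merge_enabled=True):
        """Process one incoming quad; return True if it was merged into a stored quad."""
        # Merging disabled or quad fully covered: write to the buffer, map untouched.
        if not merge_enabled or coverage == FULL_COVERAGE:
            self._append(position, coverage, face)
            return False
        index = self.partial_map.get(position)
        if index is None:
            # No stored partial quad at this position: store it and record it in the map.
            self.partial_map[position] = self._append(position, coverage, face)
            return False
        stored = self.quad_buffer[index]
        # Overlap test from the text, plus the different-face test.
        if (coverage & stored[1]) != 0 or face != stored[2]:
            self._append(position, coverage, face)
            return False
        # Merge: combined coverage is (input_coverage | stored_coverage).
        stored[1] |= coverage
        if stored[1] == FULL_COVERAGE:
            del self.partial_map[position]  # erase the entry once fully covered
        return True

    def flush(self):
        """Hand all buffered quads to the shader stage and reset the storage."""
        quads, self.quad_buffer, self.partial_map = self.quad_buffer, [], {}
        return quads


merger = QuadMerger()
merger.submit((1, 1), 0b0011, "front")
merger.submit((1, 1), 0b1100, "front")
print(merger.flush())
```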
Options may be provided to merge up to a maximum number of primitives (e.g., 2).
When merging 338 is completed, the buffer data is flushed to the pixel shader(s) along with plane equation data. In one embodiment when a draw call is complete or if any structure fills up, primitives (plane equations) and quads are flushed to the shaders. The plane equations (or equivalent interpolation data) are flushed in a synchronous manner with quad data so an interpolator has access to the plane equations when running the pixel shader(s) for the primitives.
FIG. 4 shows a set of quads and two primitives (triangles 1 and 2) having a shared interior edge. Each quad has four pixel centers. As an illustrative example, the quads in triangle 1 may have a blue color and the quads in triangle 2 a yellow color. In quad (1, 1) (row 1, column 1), the left side pixels from triangle 1 can be merged with the pixels from triangle 2 as in quad (2, 1). In quads (3, 1) and (4, 1) the quads are merged into triangle 1.
FIG. 5 illustrates approximation of Vv values for textures. The most common usage for Vv values is to access textures. The Vv values could, in theory, be calculated exactly. However, in practice there are benefits to approximating the values by interpolation. The approximation error is a function of the Vv slope differences between the 2 primitives and the distance from the edge. If only 1 or 2 pixels are moved to the adjacent triangle, the distance from the edge is normally small (<=sqrt(2)). For this situation, texel differences should be indistinguishable. In a typical application it would generally be the case that only 1 or 2 texels along the edge will be affected. The effects should thus be small and basically invisible to an ordinary user. However, there may be significant visible errors if one or both slopes are very large. This normally occurs when the depth slope is large. Large depth slope is inexpensive to test for and may be used to disable merging for a particular primitive. However, if the slopes are close enough then 1 or 2 pixels will have small Vv error and, therefore, the same texels are accessed. An optional implementation detail is to compare the slopes of the Vv directly, but this requires a more expensive and complex implementation.
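The dependence of the approximation error on slope difference and edge distance can be made concrete with a first-order estimate. The sketch below assumes, purely for illustration, that each primitive interpolates Vv linearly in screen space and that the two primitives agree on Vv along the shared edge; the function name `vv_error_bound` and its parameters are hypothetical and do not come from the text.

```python
import math


def vv_error_bound(grad_a, grad_b, distance):
    """Upper bound on the Vv error when a pixel is shaded with the neighbor's
    plane equation instead of its own.

    grad_a, grad_b: (dVv/dx, dVv/dy) slopes of the two primitives (assumed linear).
    distance: distance in pixels from the shared edge.
    """
    # The two planes agree on the edge, so the error grows with the
    # difference of the gradients times the distance travelled from the edge.
    diff_x = grad_a[0] - grad_b[0]
    diff_y = grad_a[1] - grad_b[1]
    return math.hypot(diff_x, diff_y) * distance


# A pixel at most sqrt(2) from the edge, with nearly equal slopes.
small_error = vv_error_bound((0.010, 0.002), (0.011, 0.002), math.sqrt(2))
```

Under these assumptions, equal slopes give zero error at any distance, and a large slope difference gives a large error even one pixel from the edge.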
FIG. 6 illustrates issues associated with specular lighting. The primitives each have surface normals. If normal values are significantly different from the correct values at particular pixels, specular highlights can be very different at those pixels because of the power function used in specular lighting. This is because specular lighting uses primitive or pixel level normals. The degree to which artifacts occur for specular lighting depends on various factors. If Phong shading is used, which has per primitive normals, this could result in artifacts from merging. If interpolated normals (per pixel normals) are used, visible artifacts are probably minimal. Similarly, if normal maps (per pixel normals) are used, then there will probably be minimal visible artifacts.
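The sensitivity introduced by the power function can be illustrated numerically. The sketch below uses a generic specular term of the form cos(angle) raised to a shininess exponent; the function name `specular_term`, the exponent of 64, and the chosen angles are assumptions for the example rather than values from the text.

```python
import math


def specular_term(angle_deg, shininess):
    """Generic specular factor: cosine of the angle between the surface normal
    and the highlight direction, raised to the shininess exponent."""
    cos_angle = max(0.0, math.cos(math.radians(angle_deg)))
    return cos_angle ** shininess


# The same pixel shaded with a normal that is 5 degrees off the highlight
# direction, versus a neighboring primitive's normal that is 15 degrees off.
correct = specular_term(5.0, 64)
borrowed = specular_term(15.0, 64)
```

With a high exponent, a difference of a few degrees in the normal changes the highlight intensity by a large factor, which is why per primitive normals are more likely to show merging artifacts than per pixel normals.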
While an exemplary quad fragment merging process has been described, the basic approach works for larger blocks in a primitive as well, as long as the blocks are fully covered. For example, if rasterization creates aligned 4×4 or 8×8 blocks of pixels and they are fully covered, these can be stored more efficiently as larger blocks and this also improves handling.
An exemplary algorithm for disabling merging when the depth slope is too large, resulting in LOD artifacts, is now described. The LOD is normally not critical because it is computed using a log function. However, the LOD can change quickly and artifacts can occur when the depth (Z) slope of the primitive is very high. Additionally, Vv slopes may change rapidly and approximation errors (Vv approximation) will be larger. Exemplary formulas for disabling merging when the depth slope is high may be based on analyzing the derivatives of depth (Z) with respect to x and y, such as by having the sum of derivatives in each x and y being greater than a threshold or each individual derivative in x and y being greater than a threshold:
(dz/dx + dz/dy) > threshold, or
(dz/dx > threshold) || (dz/dy > threshold)
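The two depth slope formulas above can be sketched as a single predicate. The function name `depth_slope_too_large` and the `per_axis` switch are hypothetical. Taking absolute values of the derivatives is also an assumption of this sketch: the formulas as written use the derivatives directly.

```python
def depth_slope_too_large(dz_dx, dz_dy, threshold, per_axis=False):
    """Return True if merging should be disabled for a primitive.

    per_axis=False: compare the sum of the slopes against the threshold.
    per_axis=True:  compare each slope against the threshold individually.
    """
    # Assumption: slope magnitudes are compared, so steep negative slopes count.
    slope_x = abs(dz_dx)
    slope_y = abs(dz_dy)
    if per_axis:
        return slope_x > threshold or slope_y > threshold
    return slope_x + slope_y > threshold
```

The summed form is stricter for primitives that are moderately steep in both directions, while the per-axis form only reacts when a single direction exceeds the threshold.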
An exemplary interpolation error merge test is now described. In one embodiment the slopes of Vv and/or 1/W are used to estimate which quad will have a lower interpolation error. For example, a threshold in the depth slope may be used to define an interpolation error merge test. If the depth (Z) slopes of 2 primitives differ by a lot then the Vv slopes may differ a lot across the edge between them. In one embodiment merging is disabled when the difference between slopes of z in x, and y for two primitives (having z values z1 and z2) is greater than a threshold:
(((dz1/dx+dz1/dy)−(dz2/dx+dz2/dy))>threshold)
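The slope difference test above can be sketched in the same way. The function name `slopes_differ_too_much` and the tuple layout of its arguments are hypothetical. The sketch takes the absolute value of the difference so that the result does not depend on the order of the two primitives, which is an assumption beyond the formula as written.

```python
def slopes_differ_too_much(slope1, slope2, threshold):
    """Return True if merging should be disabled for a pair of primitives.

    slope1, slope2: (dz/dx, dz/dy) for each of the two primitives.
    """
    sum1 = slope1[0] + slope1[1]
    sum2 = slope2[0] + slope2[1]
    # Assumption: compare the magnitude of the difference, making the test symmetric.
    return abs(sum1 - sum2) > threshold
```

Two primitives lying in nearly the same plane pass this test and remain candidates for merging, while a sharp crease between them disables it.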
While the invention has been described in conjunction with specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. The present invention may be practiced without some or all of these specific details. In addition, well known features may not have been described in detail to avoid unnecessarily obscuring the invention. In accordance with the present invention, the components, process steps, and/or data structures may be implemented using various types of operating systems, programming languages, computing platforms, computer programs, and/or general purpose machines. In addition, those of ordinary skill in the art will recognize that devices of a less general purpose nature, such as hardwired devices, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or the like, may also be used without departing from the scope and spirit of the inventive concepts disclosed herein. The present invention may also be tangibly embodied as a set of computer instructions stored on a computer readable medium, such as a memory device.

Claims (26)

What is claimed is:
1. A method of performing coverage merging in a shading stage of a graphics system, comprising:
performing a draw call on primitives and rasterizing the primitives into quad blocks of four pixels;
selecting the draw call for merge testing of individual quad blocks;
in response to at least one merge test being satisfied, merging partially covered fragments of the same draw call of a block of pixels to form a merged block of pixels; and
performing shading of the merged block of pixels on a draw call basis;
wherein merging is disabled for an individual draw call based on detecting a specular lighting condition.
2. The method of claim 1, wherein, the merge test is a quad merge test, and shading is performed on a merged quad.
3. The method of claim 1, wherein a sequence of draw calls is performed with merging being performed only for blocks of pixels of the same draw call.
4. The method of claim 1, wherein a graphics state of the draw call includes a shared texture access and a set of variables.
5. The method of claim 4, further comprising enabling quad merging based on the graphics state.
6. The method of claim 1, wherein the at least one merge test comprises a depth slope test to disable merging for a Z slope condition indicative of a level of detail error.
7. The method of claim 1 wherein the at least one merge test for partially covered fragments comprises:
performing a test to exclude overlapping fragments; and
performing a test to exclude fragments from different faces.
8. The method of claim 7, further comprising performing a depth slope or depth error test to disable merging for a Z slope condition indicative of a level of detail error.
9. The method of claim 1, further comprising performing interpolation of variables of equations describing primitives and the at least one merge test includes a threshold test of estimated Z interpolation errors.
10. The method of claim 1, wherein the specular lighting condition comprises Phong shading.
11. The method of claim 1, wherein the specular lighting condition comprises a per primitive surface normal.
12. A method of performing quad merging in a graphics system, comprising:
accumulating, for a draw call, data of rasterized primitives;
selecting the draw call for merge testing of individual quad blocks wherein merging is disabled for an individual draw call based on detecting a specular lighting condition;
computing a depth of each sample within each primitive of the draw call using interpolation;
performing early Z testing for the primitives;
performing, for the rasterized primitives of the draw call, quad merging for at least one quad location of partially covered quad coverage satisfying a set of merge tests that excludes overlapping primitives and excludes primitives having different faces; and
performing shading of at least one merged quad of the draw call.
13. The method of claim 12, further comprising performing a depth slope test and disabling merging for a Z slope indicative of a level of detail error.
14. The method of claim 12 further comprising interpolating variables of equations describing primitives and performing a merge test based on estimate of interpolation errors.
15. The method of claim 12, wherein the specular lighting condition comprises Phong shading.
16. The method of claim 12, wherein the specular lighting condition comprises a per primitive surface normal.
17. A graphics system, comprising:
a graphics processing unit (GPU) including a quad merge unit to perform merging on quads that have coverage within primitives,
wherein the quad merge unit merges coverage from one quad into another, and performs merge testing for quads on a selectable draw call basis wherein merging is disabled for an individual draw call based on detecting a specular lighting condition.
18. The graphics system of claim 17, wherein the merge testing includes an overlap test and a different face test.
19. The graphics system of claim 17, wherein the merge testing comprises a depth slope or depth error test to disable merging for a Z slope condition indicative of a level of detail error.
20. The graphics system of claim 17, wherein the merge testing includes an interpolation error test.
21. The graphics system of claim 17, wherein the draw call includes a graphics state.
22. The system of claim 17, wherein the specular lighting condition comprises Phong shading.
23. The system of claim 17, wherein the specular lighting condition comprises a per primitive surface normal.
24. A method of performing coverage merging in a shading stage of a graphics system, comprising:
performing a draw call on primitives and rasterizing the primitives into quad blocks of four pixels;
selecting the draw call for merge testing of individual quad blocks;
in response to at least one merge test being satisfied, merging partially covered fragments of the same draw call of a block of pixels to form a merged block of pixels; and
performing shading of the merged block of pixels on a draw call basis;
wherein merging is disabled for an individual draw call for a sprite or a line.
25. A method of performing quad merging in a graphics system, comprising:
accumulating, for a draw call, data of rasterized primitives;
selecting the draw call for merge testing of individual quad blocks wherein merging is disabled for an individual draw call corresponding to a sprite or a line;
computing a depth of each sample within each primitive of the draw call using interpolation;
performing early Z testing for the primitives;
performing, for the rasterized primitives of the draw call, quad merging for at least one quad location of partially covered quad coverage satisfying a set of merge tests that excludes overlapping primitives and excludes primitives having different faces; and
performing shading of at least one merged quad of the draw call.
26. A graphics system, comprising:
a graphics processing unit (GPU) including a quad merge unit to perform merging on quads that have coverage within primitives,
wherein the quad merge unit merges coverage from one quad into another, and performs merge testing for quads on a selectable draw call basis wherein merging is disabled for an individual draw call for a sprite or a line.
US14/671,467 2014-06-27 2015-03-27 Elimination of minimal use threads via quad merging Active US9721376B2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US14/671,467 US9721376B2 (en) 2014-06-27 2015-03-27 Elimination of minimal use threads via quad merging
KR1020150085149A KR102392060B1 (en) 2014-06-27 2015-06-16 Shading method and system via quad merging
US15/633,702 US9972124B2 (en) 2014-06-27 2017-06-26 Elimination of minimal use threads via quad merging

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201462018040P 2014-06-27 2014-06-27
US14/671,467 US9721376B2 (en) 2014-06-27 2015-03-27 Elimination of minimal use threads via quad merging

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/633,702 Continuation-In-Part US9972124B2 (en) 2014-06-27 2017-06-26 Elimination of minimal use threads via quad merging

Publications (2)

Publication Number Publication Date
US20150379764A1 (en) 2015-12-31
US9721376B2 (en) 2017-08-01

Family

ID=55165452

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/671,467 Active US9721376B2 (en) 2014-06-27 2015-03-27 Elimination of minimal use threads via quad merging

Country Status (2)

Country Link
US (1) US9721376B2 (en)
KR (1) KR102392060B1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11748933B2 (en) 2020-08-03 2023-09-05 Samsung Electronics Co., Ltd. Method for performing shader occupancy for small primitives
US11798218B2 (en) 2020-08-03 2023-10-24 Samsung Electronics Co., Ltd. Methods and apparatus for pixel packing

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10223809B2 (en) * 2016-05-27 2019-03-05 Intel Corporation Bandwidth-efficient lossy fragment color compression of multi-sample pixels
US10152819B2 (en) 2016-08-15 2018-12-11 Microsoft Technology Licensing, Llc Variable rate shading
CN106373199B (en) * 2016-08-31 2019-05-14 中测新图(北京)遥感技术有限责任公司 A kind of oblique photograph building model rapid extracting method
US10147227B2 (en) 2017-02-17 2018-12-04 Microsoft Technology Licensing, Llc Variable rate shading
US11043028B2 (en) * 2018-11-02 2021-06-22 Nvidia Corporation Reducing level of detail of a polygon mesh to decrease a complexity of rendered geometry within a scene
US10657699B1 (en) * 2018-12-08 2020-05-19 Arm Limited Performing texturing operations for sets of plural execution threads in graphics processing systems

Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5886701A (en) 1995-08-04 1999-03-23 Microsoft Corporation Graphics rendering device and method for operating same
US6518965B2 (en) 1998-04-27 2003-02-11 Interactive Silicon, Inc. Graphics system and method for rendering independent 2D and 3D objects using pointer based display list video refresh operations
US20030076331A1 (en) * 2001-10-23 2003-04-24 Deering Michael F. Relative coordinates for triangle rendering
US6633299B1 (en) 2000-01-10 2003-10-14 Intel Corporation Method and apparatus for implementing smart allocation policies for a small frame buffer cache serving 3D and 2D streams
US6687396B1 (en) 1998-07-29 2004-02-03 Pentax Corporation Optical member inspection apparatus, image-processing apparatus, image-processing method, and computer readable medium
US6697063B1 (en) * 1997-01-03 2004-02-24 Nvidia U.S. Investment Company Rendering pipeline
US6762765B2 (en) 2001-12-31 2004-07-13 Intel Corporation Bandwidth reduction for zone rendering via split vertex buffers
US6831658B2 (en) 2002-07-22 2004-12-14 Sun Microsystems, Inc. Anti-aliasing interlaced video formats for large kernel convolution
US20050046628A1 (en) * 2003-06-26 2005-03-03 Intel Corporation Methods, systems, and data structures for generating a rasterizer
US6943791B2 (en) 2002-03-11 2005-09-13 Sun Microsystems, Inc. Z-slope test to optimize sample throughput
US20050212806A1 (en) 2002-05-10 2005-09-29 Metod Koselj Graphics engine converting individual commands to spatial image information, and electrical device and memory incorporating the graphics engine
US7064771B1 (en) * 1999-04-28 2006-06-20 Compaq Information Technologies Group, L.P. Method and apparatus for compositing colors of images using pixel fragments with Z and Z gradient parameters
US20070139440A1 (en) 2005-12-19 2007-06-21 Crow Franklin C Method and system for rendering polygons having abutting edges
US20080150950A1 (en) * 2006-12-04 2008-06-26 Arm Norway As Method of and apparatus for processing graphics
EP1988508A1 (en) 2007-05-01 2008-11-05 Qualcomm Incorporated Universal rasterization of graphic primitives
US8237738B1 (en) 2006-11-02 2012-08-07 Nvidia Corporation Smooth rasterization of polygonal graphics primitives
US20120281004A1 (en) 2011-05-02 2012-11-08 Shebanow Michael C Coverage caching
US8508544B1 (en) 2006-11-02 2013-08-13 Nvidia Corporation Small primitive detection to optimize compression and decompression in a graphics processor
US8547395B1 (en) 2006-12-20 2013-10-01 Nvidia Corporation Writing coverage information to a framebuffer in a computer graphics system
US20140063012A1 (en) 2012-08-30 2014-03-06 Qualcomm Incorporated Computation reduced tessellation
US20150170410A1 (en) * 2013-12-17 2015-06-18 Rahul P. Sathe Reducing Shading by Merging Fragments from the Adjacent Primitives

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6771264B1 (en) * 1998-08-20 2004-08-03 Apple Computer, Inc. Method and apparatus for performing tangent space lighting and bump mapping in a deferred shading graphics processor


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fatahalian et al., "Reducing Shading on GPUs using Quad-Fragment Merging," ACM Transactions on Graphics, vol. 29, No. 4, Article 67, Jul. 2010.

Also Published As

Publication number Publication date
KR102392060B1 (en) 2022-04-28
US20150379764A1 (en) 2015-12-31
KR20160001641A (en) 2016-01-06

Similar Documents

Publication Publication Date Title
US9721376B2 (en) Elimination of minimal use threads via quad merging
US20230351678A1 (en) Hidden culling in tile-based computer generated images
EP3032499B1 (en) Apparatus and method for rendering
US9558585B2 (en) Hidden surface removal in graphics processing systems
KR101140460B1 (en) Tile based graphics rendering
KR100866573B1 (en) A point-based rendering method using visibility map
US9965876B2 (en) Method and apparatus for graphics processing of a graphics fragment
US11393165B2 (en) Method and system for multisample antialiasing
US10043306B2 (en) Using depth data in a graphics processing system
US8044956B1 (en) Coverage adaptive multisampling
US20090195552A1 (en) Methods of and apparatus for processing computer graphics
US9519982B2 (en) Rasterisation in graphics processing systems
US7502035B1 (en) Apparatus, system, and method for multi-sample pixel coalescing
US7027047B2 (en) 3D graphics rendering engine for processing an invisible fragment and a method therefor
GB2525666A (en) Graphics processing systems
US9972124B2 (en) Elimination of minimal use threads via quad merging
US20160093088A1 (en) Graphics processing systems
US11049216B1 (en) Graphics processing systems
US9916675B2 (en) Graphics processing systems
KR20180037838A (en) Method and apparatus for processing texture
US7920148B2 (en) Post-rendering anti-aliasing with a smoothing filter

Legal Events

Date Code Title Description
AS Assignment

Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LENTZ, DEREK J.;WOO, SANG OAK;SIGNING DATES FROM 20150316 TO 20150323;REEL/FRAME:035277/0704

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4