Introduction to 3D Game Programming with DirectX 12 (Computer Science) (2016)

Part 2


Chapter 7


This chapter introduces a number of drawing patterns that we will use throughout the rest of this book. The chapter begins by introducing a drawing optimization, which we refer to as “frame resources.” With frame resources, we modify our render loop so that we do not have to flush the command queue every frame; this improves CPU and GPU utilization. Next we introduce the concept of a render item and explain how we divide up our constant data based on update frequency. In addition, we examine root signatures in more detail and learn about the other root parameter types: root descriptors and root constants. Finally, we show how to draw some more complicated objects; by the end of this chapter, you will be able to draw a surface that resembles hills and valleys, cylinders, spheres, and an animated wave simulation.


1.    To understand a modification to our rendering process that does not require us to flush the command queue every frame, thereby improving performance.

2.    To learn about the other two root parameter types: root descriptors and root constants.

3.    To discover how to procedurally generate and draw common geometric shapes like grids, cylinders, and spheres.

4.    To find out how we can animate vertices on the CPU and upload the new vertex positions to the GPU using dynamic vertex buffers.


Recall from §4.2 that the CPU and GPU work in parallel. The CPU builds and submits command lists (in addition to other CPU work) and the GPU processes commands in the command queue. The goal is to keep both CPU and GPU busy to take full advantage of the hardware resources available on the system. So far in our demos, we have been synchronizing the CPU and GPU once per frame. Two examples of why this is necessary are:

1.    The command allocator cannot be reset until the GPU is finished executing the commands. Suppose we did not synchronize so that the CPU could continue on to the next frame n+1 before the GPU has finished processing the current frame n: If the CPU resets the command allocator in frame n+1, but the GPU is still processing commands from frame n, then we would be clearing the commands the GPU is still working on.

2.    A constant buffer cannot be updated by the CPU until the GPU has finished executing the drawing commands that reference it. This example corresponds to the situation described in §4.2.2 and Figure 4.7. Suppose we did not synchronize so that the CPU could continue on to the next frame n+1 before the GPU has finished processing the current frame n: If the CPU overwrites the constant buffer data in frame n+1, but the GPU has not yet executed the draw call that references the constant buffer in frame n, then the constant buffer contains the wrong data when the GPU executes the draw call for frame n.

Thus we have been calling D3DApp::FlushCommandQueue at the end of every frame to ensure the GPU has finished executing all the commands for the frame. This solution works but is inefficient for the following reasons:

1.    At the beginning of a frame, the GPU will not have any commands to process since we waited to empty the command queue. It will have to wait until the CPU builds and submits some commands for execution.

2.    At the end of a frame, the CPU is waiting for the GPU to finish processing commands.

So every frame, the CPU and GPU are idling at some point.

One solution to this problem is to create a circular array of the resources the CPU needs to modify each frame. We call such resources frame resources, and we usually use a circular array of three frame resource elements. The idea is that for frame n, the CPU will cycle through the frame resource array to get the next available (i.e., not in use by GPU) frame resource. The CPU will then do any resource updates, and build and submit command lists for frame n while the GPU works on previous frames. The CPU will then continue on to frame n+1 and repeat. If the frame resource array has three elements, this lets the CPU get up to two frames ahead of the GPU, ensuring that the GPU is kept busy. Below is an example of the frame resource class we use for the “Shapes” demo in this chapter. Because the CPU only needs to modify constant buffers in this demo, the frame resource class only contains constant buffers.

// Stores the resources needed for the CPU to build the command lists
// for a frame. The contents here will vary from app to app based on
// the needed resources.
struct FrameResource
{
public:

  FrameResource(ID3D12Device* device, UINT passCount, UINT objectCount);
  FrameResource(const FrameResource& rhs) = delete;
  FrameResource& operator=(const FrameResource& rhs) = delete;
  ~FrameResource();

  // We cannot reset the allocator until the GPU is done processing the
  // commands. So each frame needs their own allocator.
  Microsoft::WRL::ComPtr<ID3D12CommandAllocator> CmdListAlloc;

  // We cannot update a cbuffer until the GPU is done processing the
  // commands that reference it. So each frame needs their own cbuffers.
  std::unique_ptr<UploadBuffer<PassConstants>> PassCB = nullptr;
  std::unique_ptr<UploadBuffer<ObjectConstants>> ObjectCB = nullptr;

  // Fence value to mark commands up to this fence point. This lets us
  // check if these frame resources are still in use by the GPU.
  UINT64 Fence = 0;
};

FrameResource::FrameResource(ID3D12Device* device, UINT passCount, UINT objectCount)
{
  ThrowIfFailed(device->CreateCommandAllocator(
    D3D12_COMMAND_LIST_TYPE_DIRECT,
    IID_PPV_ARGS(CmdListAlloc.GetAddressOf())));

  PassCB = std::make_unique<UploadBuffer<PassConstants>>(device, passCount, true);
  ObjectCB = std::make_unique<UploadBuffer<ObjectConstants>>(device, objectCount, true);
}

FrameResource::~FrameResource() { }

Our application class will then instantiate a vector of three frame resources, and keep member variables to track the current frame resource:

const int gNumFrameResources = 3;
std::vector<std::unique_ptr<FrameResource>> mFrameResources;
FrameResource* mCurrFrameResource = nullptr;
int mCurrFrameResourceIndex = 0;

void ShapesApp::BuildFrameResources()
{
  for(int i = 0; i < gNumFrameResources; ++i)
  {
    mFrameResources.push_back(std::make_unique<FrameResource>(
      md3dDevice.Get(), 1, (UINT)mAllRitems.size()));
  }
}

Now, for CPU frame n, the algorithm works like so:

void ShapesApp::Update(const GameTimer& gt)
{
  // Cycle through the circular frame resource array.
  mCurrFrameResourceIndex = (mCurrFrameResourceIndex + 1) % gNumFrameResources;
  mCurrFrameResource = mFrameResources[mCurrFrameResourceIndex].get();

  // Has the GPU finished processing the commands of the current frame
  // resource? If not, wait until the GPU has completed commands up to
  // this fence point.
  if(mCurrFrameResource->Fence != 0 &&
    mFence->GetCompletedValue() < mCurrFrameResource->Fence)
  {
    HANDLE eventHandle = CreateEventEx(nullptr, false, false, EVENT_ALL_ACCESS);
    ThrowIfFailed(mFence->SetEventOnCompletion(
      mCurrFrameResource->Fence, eventHandle));
    WaitForSingleObject(eventHandle, INFINITE);
    CloseHandle(eventHandle);
  }

  // […] Update resources in mCurrFrameResource (like cbuffers).
}


void ShapesApp::Draw(const GameTimer& gt)
{
  // […] Build and submit command lists for this frame.

  // Advance the fence value to mark commands up to this fence point.
  mCurrFrameResource->Fence = ++mCurrentFence;

  // Add an instruction to the command queue to set a new fence point.
  // Because we are on the GPU timeline, the new fence point won’t be
  // set until the GPU finishes processing all the commands prior to
  // this Signal().
  mCommandQueue->Signal(mFence.Get(), mCurrentFence);

  // Note that GPU could still be working on commands from previous
  // frames, but that is okay, because we are not touching any frame
  // resources associated with those frames.
}

Note that this solution does not prevent waiting. If one processor is processing frames much faster than the other, one processor will eventually have to wait for the other to catch up, as we cannot let one get too far ahead of the other. If the GPU is processing commands faster than the CPU can submit work, then the GPU will idle. In general, if we are trying to push the graphical limit, we want to avoid this situation, as we are not taking full advantage of the GPU. On the other hand, if the CPU is always processing frames faster than the GPU, then the CPU will have to wait at some point. This is the desired situation, as the GPU is being fully utilized; the extra CPU cycles can always be used for other parts of the game such as AI, physics, and game play logic.

So if multiple frame resources do not prevent any waiting, how does it help us? It helps us keep the GPU fed. While the GPU is processing commands from frame n, it allows the CPU to continue on to build and submit commands for frames n+1 and n+2. This helps keep the command queue nonempty so that the GPU always has work to do.


Drawing an object requires setting multiple parameters such as binding vertex and index buffers, binding object constants, setting a primitive type, and specifying the DrawIndexedInstanced parameters. As we begin to draw more objects in our scenes, it is helpful to create a lightweight structure that stores the data needed to draw an object; this data will vary from app to app as we add new features which will require different drawing data. We call the set of data needed to submit a full draw call to the rendering pipeline a render item. For this demo, our RenderItem structure looks like this:

// Lightweight structure stores parameters to draw a shape. This will
// vary from app-to-app.
struct RenderItem
{
  RenderItem() = default;

  // World matrix of the shape that describes the object’s local space
  // relative to the world space, which defines the position,
  // orientation, and scale of the object in the world.
  XMFLOAT4X4 World = MathHelper::Identity4x4();

  // Dirty flag indicating the object data has changed and we need
  // to update the constant buffer. Because we have an object
  // cbuffer for each FrameResource, we have to apply the
  // update to each FrameResource. Thus, when we modify object data we
  // should set
  // NumFramesDirty = gNumFrameResources so that each frame resource
  // gets the update.
  int NumFramesDirty = gNumFrameResources;

  // Index into GPU constant buffer corresponding to the ObjectCB
  // for this render item.
  UINT ObjCBIndex = -1;

  // Geometry associated with this render-item. Note that multiple
  // render-items can share the same geometry.
  MeshGeometry* Geo = nullptr;

  // Primitive topology.
  D3D12_PRIMITIVE_TOPOLOGY PrimitiveType = D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST;

  // DrawIndexedInstanced parameters.
  UINT IndexCount = 0;
  UINT StartIndexLocation = 0;
  int BaseVertexLocation = 0;
};

Our application will maintain lists of render items based on how they need to be drawn; that is, render items that need different PSOs will be kept in different lists.

// List of all the render items.

std::vector<std::unique_ptr<RenderItem>> mAllRitems;

// Render items divided by PSO.

std::vector<RenderItem*> mOpaqueRitems;

std::vector<RenderItem*> mTransparentRitems;


Observe from the previous section that we introduced a new constant buffer in our FrameResource class:

std::unique_ptr<UploadBuffer<PassConstants>> PassCB = nullptr;

In the demos going forward, this buffer stores constant data that is fixed over a given rendering pass such as the eye position, the view and projection matrices, and information about the screen (render target) dimensions; it also includes game timing information, which is useful data to have access to in shader programs. Note that our demos will not necessarily use all of this constant data, but it is convenient to have available, and there is little cost to providing the extra data. For example, while we do not need the render target sizes now, we will need that information when we implement post-processing effects.

cbuffer cbPass : register(b1)
{
  float4x4 gView;
  float4x4 gInvView;
  float4x4 gProj;
  float4x4 gInvProj;
  float4x4 gViewProj;
  float4x4 gInvViewProj;
  float3 gEyePosW;
  float cbPerObjectPad1;
  float2 gRenderTargetSize;
  float2 gInvRenderTargetSize;
  float gNearZ;
  float gFarZ;
  float gTotalTime;
  float gDeltaTime;
};

We have also modified our per object constant buffer to only store constants that are associated with an object. So far, the only constant data we associate with an object for drawing is its world matrix:

cbuffer cbPerObject : register(b0)
{
  float4x4 gWorld;
};

The idea of these changes is to group constants based on update frequency. The per pass constants only need to be updated once per rendering pass, and the object constants only need to change when an object’s world matrix changes. If we had a static object in the scene, like a tree, we only need to set its world matrix once to a constant buffer and then never update the constant buffer again. In our demos, we implement the following methods to handle updating the per pass and per object constant buffers. These methods are called once per frame in the Update method.

void ShapesApp::UpdateObjectCBs(const GameTimer& gt)
{
  auto currObjectCB = mCurrFrameResource->ObjectCB.get();
  for(auto& e : mAllRitems)
  {
    // Only update the cbuffer data if the constants have changed.
    // This needs to be tracked per frame resource.
    if(e->NumFramesDirty > 0)
    {
      XMMATRIX world = XMLoadFloat4x4(&e->World);

      ObjectConstants objConstants;
      XMStoreFloat4x4(&objConstants.World, XMMatrixTranspose(world));

      currObjectCB->CopyData(e->ObjCBIndex, objConstants);

      // Next FrameResource needs to be updated too.
      e->NumFramesDirty--;
    }
  }
}

void ShapesApp::UpdateMainPassCB(const GameTimer& gt)
{
  XMMATRIX view = XMLoadFloat4x4(&mView);
  XMMATRIX proj = XMLoadFloat4x4(&mProj);

  XMMATRIX viewProj = XMMatrixMultiply(view, proj);
  XMMATRIX invView = XMMatrixInverse(&XMMatrixDeterminant(view), view);
  XMMATRIX invProj = XMMatrixInverse(&XMMatrixDeterminant(proj), proj);
  XMMATRIX invViewProj = XMMatrixInverse(&XMMatrixDeterminant(viewProj), viewProj);

  XMStoreFloat4x4(&mMainPassCB.View, XMMatrixTranspose(view));
  XMStoreFloat4x4(&mMainPassCB.InvView, XMMatrixTranspose(invView));
  XMStoreFloat4x4(&mMainPassCB.Proj, XMMatrixTranspose(proj));
  XMStoreFloat4x4(&mMainPassCB.InvProj, XMMatrixTranspose(invProj));
  XMStoreFloat4x4(&mMainPassCB.ViewProj, XMMatrixTranspose(viewProj));
  XMStoreFloat4x4(&mMainPassCB.InvViewProj, XMMatrixTranspose(invViewProj));

  mMainPassCB.EyePosW = mEyePos;
  mMainPassCB.RenderTargetSize = XMFLOAT2((float)mClientWidth, (float)mClientHeight);
  mMainPassCB.InvRenderTargetSize = XMFLOAT2(1.0f / mClientWidth, 1.0f / mClientHeight);
  mMainPassCB.NearZ = 1.0f;
  mMainPassCB.FarZ = 1000.0f;
  mMainPassCB.TotalTime = gt.TotalTime();
  mMainPassCB.DeltaTime = gt.DeltaTime();

  auto currPassCB = mCurrFrameResource->PassCB.get();
  currPassCB->CopyData(0, mMainPassCB);
}

We update our vertex shader accordingly to support these constant buffer changes:

VertexOut VS(VertexIn vin)
{
  VertexOut vout;

  // Transform to homogeneous clip space.
  float4 posW = mul(float4(vin.PosL, 1.0f), gWorld);
  vout.PosH = mul(posW, gViewProj);

  // Just pass vertex color into the pixel shader.
  vout.Color = vin.Color;

  return vout;
}

The extra per-vertex vector-matrix multiplication this adjustment introduces is negligible on modern GPUs, which have plenty of computational power.

The resources that our shaders expect have changed; therefore, we need to update the root signature accordingly to take two descriptor tables (we need two tables because the CBVs will be set at different frequencies—the per pass CBV only needs to be set once per rendering pass while the per object CBV needs to be set per render item):


CD3DX12_DESCRIPTOR_RANGE cbvTable0;
cbvTable0.Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, 1, 0);

CD3DX12_DESCRIPTOR_RANGE cbvTable1;
cbvTable1.Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, 1, 1);

// Root parameter can be a table, root descriptor or root constants.
CD3DX12_ROOT_PARAMETER slotRootParameter[2];

// Create root CBVs.
slotRootParameter[0].InitAsDescriptorTable(1, &cbvTable0);
slotRootParameter[1].InitAsDescriptorTable(1, &cbvTable1);

// A root signature is an array of root parameters.
CD3DX12_ROOT_SIGNATURE_DESC rootSigDesc(2, slotRootParameter, 0, nullptr,
  D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);




Do not go overboard with the number of constant buffers in your shaders. [Thibieroz13] recommends you keep them under five for performance.


In this section, we show how to create the geometry for ellipsoids, spheres, cylinders, and cones. These shapes are useful for drawing sky domes, debugging, visualizing collision detection, and deferred rendering. For example, you might want to render all of your game characters as spheres for a debug test.

We put our procedural geometry generation code in the GeometryGenerator class (GeometryGenerator.h/.cpp). GeometryGenerator is a utility class for generating simple geometric shapes like grids, spheres, cylinders, and boxes, which we use throughout this book for our demo programs. This class generates the data in system memory, and we must then copy the data we want to our vertex and index buffers. GeometryGenerator creates some vertex data that will be used in later chapters. We do not need this data in our current demos, and so we do not copy it into our vertex buffers. The MeshData structure is a simple structure nested inside GeometryGenerator that stores a vertex and index list:

class GeometryGenerator
{
public:

  using uint16 = std::uint16_t;
  using uint32 = std::uint32_t;

  struct Vertex
  {
    Vertex(){}
    Vertex(
      const DirectX::XMFLOAT3& p,
      const DirectX::XMFLOAT3& n,
      const DirectX::XMFLOAT3& t,
      const DirectX::XMFLOAT2& uv) :
      Position(p),
      Normal(n),
      TangentU(t),
      TexC(uv){}
    Vertex(
      float px, float py, float pz,
      float nx, float ny, float nz,
      float tx, float ty, float tz,
      float u, float v) :
      Position(px, py, pz),
      Normal(nx, ny, nz),
      TangentU(tx, ty, tz),
      TexC(u, v){}

    DirectX::XMFLOAT3 Position;
    DirectX::XMFLOAT3 Normal;
    DirectX::XMFLOAT3 TangentU;
    DirectX::XMFLOAT2 TexC;
  };

  struct MeshData
  {
    std::vector<Vertex> Vertices;
    std::vector<uint32> Indices32;

    std::vector<uint16>& GetIndices16()
    {
      if(mIndices16.empty())
      {
        mIndices16.resize(Indices32.size());
        for(size_t i = 0; i < Indices32.size(); ++i)
          mIndices16[i] = static_cast<uint16>(Indices32[i]);
      }

      return mIndices16;
    }

  private:
    std::vector<uint16> mIndices16;
  };

  // ... creation methods such as CreateCylinder, CreateSphere,
  // CreateGeosphere, CreateBox, and CreateGrid follow.
};



7.4.1 Generating a Cylinder Mesh

We define a cylinder by specifying its bottom and top radii, its height, and the slice and stack count, as shown in Figure 7.1. We break the cylinder into three parts: 1) the side geometry, 2) the top cap geometry, and 3) the bottom cap geometry.


Figure 7.1.  In this illustration, the cylinder on the left has eight slices and four stacks, and the cylinder on the right has sixteen slices and eight stacks. The slices and stacks control the triangle density. Note that the top and bottom radii can differ so that we can create cone shaped objects, not just “pure” cylinders.

Cylinder Side Geometry

We generate the cylinder centered at the origin, parallel to the y-axis. From Figure 7.1, all the vertices lie on the “rings” of the cylinder, where there are stackCount + 1 rings, and each ring has sliceCount unique vertices. The difference in radius between consecutive rings is Δr = (topRadius – bottomRadius)/stackCount. If we start at the bottom ring with index 0, then the radius of the ith ring is ri = bottomRadius + iΔr and the height of the ith ring is hi = −h/2 + iΔh,

where Δh is the stack height and h is the cylinder height. So the basic idea is to iterate over each ring, and generate the vertices that lie on that ring. This gives the following implementation:



GeometryGenerator::MeshData GeometryGenerator::CreateCylinder(
  float bottomRadius, float topRadius,
  float height, uint32 sliceCount, uint32 stackCount)
{
  MeshData meshData;


  // Build Stacks.


  float stackHeight = height / stackCount;

  // Amount to increment radius as we move up each stack level from

  // bottom to top.

  float radiusStep = (topRadius - bottomRadius) / stackCount;

  uint32 ringCount = stackCount+1;

  // Compute vertices for each stack ring starting at the bottom and 

  // moving up.

  for(uint32 i = 0; i < ringCount; ++i)
  {
    float y = -0.5f*height + i*stackHeight;
    float r = bottomRadius + i*radiusStep;

    // vertices of ring
    float dTheta = 2.0f*XM_PI/sliceCount;
    for(uint32 j = 0; j <= sliceCount; ++j)
    {
      Vertex vertex;

      float c = cosf(j*dTheta);
      float s = sinf(j*dTheta);

      vertex.Position = XMFLOAT3(r*c, y, r*s);

      vertex.TexC.x = (float)j/sliceCount;
      vertex.TexC.y = 1.0f - (float)i/stackCount;

      // Cylinder can be parameterized as follows, where we introduce v
      // parameter that goes in the same direction as the v tex-coord
      // so that the bitangent goes in the same direction as the
      // v tex-coord.
      //  Let r0 be the bottom radius and let r1 be the top radius.
      //  y(v) = h - hv for v in [0,1].
      //  r(v) = r1 + (r0-r1)v
      //
      //  x(t, v) = r(v)*cos(t)
      //  y(t, v) = h - hv
      //  z(t, v) = r(v)*sin(t)
      //
      // dx/dt = -r(v)*sin(t)
      // dy/dt = 0
      // dz/dt = +r(v)*cos(t)
      //
      // dx/dv = (r0-r1)*cos(t)
      // dy/dv = -h
      // dz/dv = (r0-r1)*sin(t)

      // This is unit length.
      vertex.TangentU = XMFLOAT3(-s, 0.0f, c);

      float dr = bottomRadius-topRadius;
      XMFLOAT3 bitangent(dr*c, -height, dr*s);

      XMVECTOR T = XMLoadFloat3(&vertex.TangentU);
      XMVECTOR B = XMLoadFloat3(&bitangent);
      XMVECTOR N = XMVector3Normalize(XMVector3Cross(T, B));
      XMStoreFloat3(&vertex.Normal, N);

      meshData.Vertices.push_back(vertex);
    }
  }


Observe that the first and last vertex of each ring is duplicated in position, but the texture coordinates are not duplicated. We have to do this so that we can apply textures to cylinders correctly.



The actual method GeometryGenerator::CreateCylinder creates additional vertex data such as normal vectors and texture coordinates that will be useful for future demos. Do not worry about these quantities for now.

Observe from Figure 7.2 that there is a quad (two triangles) for each slice in every stack. Figure 7.2 shows that the indices for the ith stack and jth slice are given by:

ΔABC = (i·n + j, (i + 1)·n + j, (i + 1)·n + j + 1)
ΔACD = (i·n + j, (i + 1)·n + j + 1, i·n + j + 1)

where n is the number of vertices per ring. So the key idea is to loop over every slice in every stack, and apply the above formulas.

  // Add one because we duplicate the first and last vertex per ring

  // since the texture coordinates are different.

  uint32 ringVertexCount = sliceCount+1;

  // Compute indices for each stack.

  for(uint32 i = 0; i < stackCount; ++i)
  {
    for(uint32 j = 0; j < sliceCount; ++j)
    {
      meshData.Indices32.push_back(i*ringVertexCount + j);
      meshData.Indices32.push_back((i+1)*ringVertexCount + j);
      meshData.Indices32.push_back((i+1)*ringVertexCount + j+1);

      meshData.Indices32.push_back(i*ringVertexCount + j);
      meshData.Indices32.push_back((i+1)*ringVertexCount + j+1);
      meshData.Indices32.push_back(i*ringVertexCount + j+1);
    }
  }
  BuildCylinderTopCap(bottomRadius, topRadius, height,
    sliceCount, stackCount, meshData);
  BuildCylinderBottomCap(bottomRadius, topRadius, height,
    sliceCount, stackCount, meshData);

  return meshData;
}

Figure 7.2.  The vertices ABCD contained in the ith and i + 1th ring, and jth slice.

Cap Geometry

Generating the cap geometry amounts to generating the slice triangles of the top and bottom rings to approximate a circle:

void GeometryGenerator::BuildCylinderTopCap(
  float bottomRadius, float topRadius, float height,
  uint32 sliceCount, uint32 stackCount, MeshData& meshData)
{
  uint32 baseIndex = (uint32)meshData.Vertices.size();

  float y = 0.5f*height;
  float dTheta = 2.0f*XM_PI/sliceCount;

  // Duplicate cap ring vertices because the texture coordinates and
  // normals differ.
  for(uint32 i = 0; i <= sliceCount; ++i)
  {
    float x = topRadius*cosf(i*dTheta);
    float z = topRadius*sinf(i*dTheta);

    // Scale down by the height to try and make top cap texture coord
    // area proportional to base.
    float u = x/height + 0.5f;
    float v = z/height + 0.5f;

    meshData.Vertices.push_back(
      Vertex(x, y, z, 0.0f, 1.0f, 0.0f, 1.0f, 0.0f, 0.0f, u, v) );
  }

  // Cap center vertex.
  meshData.Vertices.push_back(
    Vertex(0.0f, y, 0.0f, 0.0f, 1.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.5f, 0.5f) );

  // Index of center vertex.
  uint32 centerIndex = (uint32)meshData.Vertices.size()-1;

  for(uint32 i = 0; i < sliceCount; ++i)
  {
    meshData.Indices32.push_back(centerIndex);
    meshData.Indices32.push_back(baseIndex + i+1);
    meshData.Indices32.push_back(baseIndex + i);
  }
}

The bottom cap code is analogous.

7.4.2 Generating a Sphere Mesh

We define a sphere by specifying its radius, and the slice and stack count, as shown in Figure 7.3. The algorithm for generating the sphere is very similar to that of the cylinder, except that the radius per ring changes in a nonlinear way based on trigonometric functions. We will leave it to the reader to study the GeometryGenerator::CreateSphere code. Note that we can apply a non-uniform scaling world transformation to transform a sphere into an ellipsoid.


Figure 7.3.  The idea of slices and stacks also applies to a sphere to control the level of tessellation.

7.4.3 Generating a Geosphere Mesh

Observe from Figure 7.3 that the triangles of the sphere do not have equal areas. This can be undesirable for some situations. A geosphere approximates a sphere using triangles with almost equal areas as well as equal side lengths (see Figure 7.4).


Figure 7.4.  Approximating a geosphere by repeated subdivision and reprojection onto the sphere.

To generate a geosphere, we start with an icosahedron, subdivide the triangles, and then project the new vertices onto the sphere with the given radius. We can repeat this process to improve the tessellation.

Figure 7.5 shows how a triangle can be subdivided into four equal sized triangles. The new vertices are found just by taking the midpoints along the edges of the original triangle. The new vertices can then be projected onto a sphere of radius r by first projecting them onto the unit sphere (normalizing) and then scalar multiplying by the radius: v′ = r·(v/‖v‖).


Figure 7.5.  Subdividing a triangle into four triangles of equal area.

The code is given below:


GeometryGenerator::MeshData
GeometryGenerator::CreateGeosphere(float radius, uint32 numSubdivisions)
{


  MeshData meshData;

  // Put a cap on the number of subdivisions.

  numSubdivisions = std::min<uint32>(numSubdivisions, 6u);

  // Approximate a sphere by tessellating an icosahedron.

  const float X = 0.525731f; 

  const float Z = 0.850651f;

  XMFLOAT3 pos[12] =
  {
    XMFLOAT3(-X, 0.0f, Z), XMFLOAT3(X, 0.0f, Z),
    XMFLOAT3(-X, 0.0f, -Z), XMFLOAT3(X, 0.0f, -Z),
    XMFLOAT3(0.0f, Z, X),  XMFLOAT3(0.0f, Z, -X),
    XMFLOAT3(0.0f, -Z, X), XMFLOAT3(0.0f, -Z, -X),
    XMFLOAT3(Z, X, 0.0f),  XMFLOAT3(-Z, X, 0.0f),
    XMFLOAT3(Z, -X, 0.0f), XMFLOAT3(-Z, -X, 0.0f)
  };

  uint32 k[60] =
  {
    1,4,0, 4,9,0, 4,5,9, 8,5,4, 1,8,4,
    1,10,8, 10,3,8, 8,3,5, 3,2,5, 3,7,2,
    3,10,7, 10,6,7, 6,11,7, 6,0,11, 6,1,0,
    10,1,6, 11,0,9, 2,11,9, 5,2,9, 11,2,7
  };

  meshData.Vertices.resize(12);
  meshData.Indices32.assign(&k[0], &k[60]);

  for(uint32 i = 0; i < 12; ++i)
    meshData.Vertices[i].Position = pos[i];

  for(uint32 i = 0; i < numSubdivisions; ++i)
    Subdivide(meshData);

  // Project vertices onto sphere and scale.

  for(uint32 i = 0; i < meshData.Vertices.size(); ++i)
  {
    // Project onto unit sphere.
    XMVECTOR n = XMVector3Normalize(XMLoadFloat3(&meshData.Vertices[i].Position));

    // Project onto sphere.
    XMVECTOR p = radius*n;

    XMStoreFloat3(&meshData.Vertices[i].Position, p);
    XMStoreFloat3(&meshData.Vertices[i].Normal, n);

    // Derive texture coordinates from spherical coordinates.
    float theta = atan2f(meshData.Vertices[i].Position.z,
      meshData.Vertices[i].Position.x);

    // Put in [0, 2pi].
    if(theta < 0.0f)
      theta += XM_2PI;

    float phi = acosf(meshData.Vertices[i].Position.y / radius);

    meshData.Vertices[i].TexC.x = theta/XM_2PI;
    meshData.Vertices[i].TexC.y = phi/XM_PI;

    // Partial derivative of P with respect to theta
    meshData.Vertices[i].TangentU.x = -radius*sinf(phi)*sinf(theta);
    meshData.Vertices[i].TangentU.y = 0.0f;
    meshData.Vertices[i].TangentU.z = +radius*sinf(phi)*cosf(theta);

    XMVECTOR T = XMLoadFloat3(&meshData.Vertices[i].TangentU);
    XMStoreFloat3(&meshData.Vertices[i].TangentU, XMVector3Normalize(T));
  }

  return meshData;
}

To demonstrate our sphere and cylinder generation code, we implement the “Shapes” demo shown in Figure 7.6. You will also gain experience positioning and drawing multiple objects in a scene (i.e., creating multiple world transformation matrices). Furthermore, we place all of the scene geometry in one big vertex and index buffer. Then we will use the DrawIndexedInstanced method to draw one object at a time (as the world matrix needs to be changed between objects); so you will see an example of using the StartIndexLocation and BaseVertexLocation parameters of DrawIndexedInstanced.


Figure 7.6.  Screenshot of the “Shapes” demo.

7.5.1 Vertex and Index Buffers

As Figure 7.6 shows, in this demo we draw a box, grid, cylinders, and sphere. Even though we draw multiple spheres and cylinders in this demo, we only need one copy of the sphere and cylinder geometry. We simply redraw the same sphere and cylinder mesh multiple times, but with different world matrices; this is an example of instancing geometry, which saves memory.

We pack all the mesh vertices and indices into one vertex and index buffer. This is done by concatenating the vertex and index arrays. This means that when we draw an object, we are only drawing a subset of the vertex and index buffers. There are three quantities we need to know in order to draw only a subset of the geometry using ID3D12GraphicsCommandList::DrawIndexedInstanced (recall Figure 6.3 and the discussion about it from Chapter 6). We need to know the starting index to the object in the concatenated index buffer, its index count, and we need to know the base vertex location—the index of the object’s first vertex relative to the concatenated vertex buffer. Recall that the base vertex location is an integer value added to the indices in a draw call before the vertices are fetched, so that the indices reference the proper subset in the concatenated vertex buffer. (See also Exercise 2 in Chapter 5.)

The code below shows how the geometry buffers are created, how the necessary drawing quantities are cached, and how the objects are drawn.

void ShapesApp::BuildShapeGeometry()
{


  GeometryGenerator geoGen;

  GeometryGenerator::MeshData box = geoGen.CreateBox(1.5f, 0.5f, 1.5f, 3);

  GeometryGenerator::MeshData grid = geoGen.CreateGrid(20.0f, 30.0f, 60, 40);

  GeometryGenerator::MeshData sphere = geoGen.CreateSphere(0.5f, 20, 20);

  GeometryGenerator::MeshData cylinder = geoGen.CreateCylinder(0.5f, 0.3f,  3.0f, 20, 20);


  // We are concatenating all the geometry into one big vertex/index

  // buffer. So define the regions in the buffer each submesh covers.


  // Cache the vertex offsets to each object in the concatenated vertex 

  // buffer.

  UINT boxVertexOffset = 0;

  UINT gridVertexOffset = (UINT)box.Vertices.size();

  UINT sphereVertexOffset = gridVertexOffset + (UINT)grid.Vertices.size();

  UINT cylinderVertexOffset = sphereVertexOffset + (UINT)sphere.Vertices.size();

  // Cache the starting index for each object in the concatenated index 

  // buffer.

  UINT boxIndexOffset = 0;

  UINT gridIndexOffset = (UINT)box.Indices32.size();

  UINT sphereIndexOffset = gridIndexOffset + (UINT)grid.Indices32.size();

  UINT cylinderIndexOffset = sphereIndexOffset + (UINT)sphere.Indices32.size();

  // Define the SubmeshGeometry structures that cover different

  // regions of the vertex/index buffers.

  SubmeshGeometry boxSubmesh;

  boxSubmesh.IndexCount = (UINT)box.Indices32.size();

  boxSubmesh.StartIndexLocation = boxIndexOffset;

  boxSubmesh.BaseVertexLocation = boxVertexOffset;

  SubmeshGeometry gridSubmesh;

  gridSubmesh.IndexCount = (UINT)grid.Indices32.size();

  gridSubmesh.StartIndexLocation = gridIndexOffset;

  gridSubmesh.BaseVertexLocation = gridVertexOffset;

  SubmeshGeometry sphereSubmesh;

  sphereSubmesh.IndexCount = (UINT)sphere.Indices32.size();

  sphereSubmesh.StartIndexLocation = sphereIndexOffset;

  sphereSubmesh.BaseVertexLocation = sphereVertexOffset;

  SubmeshGeometry cylinderSubmesh;

  cylinderSubmesh.IndexCount = (UINT)cylinder.Indices32.size();

  cylinderSubmesh.StartIndexLocation = cylinderIndexOffset;

  cylinderSubmesh.BaseVertexLocation = cylinderVertexOffset;


  // Extract the vertex elements we are interested in and pack the

  // vertices of all the meshes into one vertex buffer.


  auto totalVertexCount =

    box.Vertices.size() +

    grid.Vertices.size() +

    sphere.Vertices.size() +


  std::vector<Vertex> vertices(totalVertexCount);

  UINT k = 0;

  for(size_t i = 0; i < box.Vertices.size(); ++i, ++k)


    vertices[k].Pos = box.Vertices[i].Position;

    vertices[k].Color = XMFLOAT4(DirectX::Colors::DarkGreen);


  for(size_t i = 0; i < grid.Vertices.size(); ++i, ++k)


    vertices[k].Pos = grid.Vertices[i].Position;

    vertices[k].Color = XMFLOAT4(DirectX::Colors::ForestGreen);


  for(size_t i = 0; i < sphere.Vertices.size(); ++i, ++k)


    vertices[k].Pos = sphere.Vertices[i].Position;

    vertices[k].Color = XMFLOAT4(DirectX::Colors::Crimson);


  for(size_t i = 0; i < cylinder.Vertices.size(); ++i, ++k)


    vertices[k].Pos = cylinder.Vertices[i].Position;

    vertices[k].Color = XMFLOAT4(DirectX::Colors::SteelBlue);


  std::vector<std::uint16_t> indices;













  const UINT vbByteSize = (UINT)vertices.size() * sizeof(Vertex);

  const UINT ibByteSize = (UINT)indices.size() * sizeof(std::uint16_t);

  auto geo = std::make_unique<MeshGeometry>();

  geo->Name = "shapeGeo";

  ThrowIfFailed(D3DCreateBlob(vbByteSize, &geo->VertexBufferCPU));



  ThrowIfFailed(D3DCreateBlob(ibByteSize, &geo->IndexBufferCPU));



  geo->VertexBufferGPU = d3dUtil::CreateDefaultBuffer(md3dDevice.Get(),

    mCommandList.Get(),, vbByteSize, geo->VertexBufferUploader);

  geo->IndexBufferGPU = d3dUtil::CreateDefaultBuffer(md3dDevice.Get(),

    mCommandList.Get(),, ibByteSize, geo->IndexBufferUploader);

  geo->VertexByteStride = sizeof(Vertex);

  geo->VertexBufferByteSize = vbByteSize;

  geo->IndexFormat = DXGI_FORMAT_R16_UINT;

  geo->IndexBufferByteSize = ibByteSize;

  geo->DrawArgs["box"] = boxSubmesh;

  geo->DrawArgs["grid"] = gridSubmesh;

  geo->DrawArgs["sphere"] = sphereSubmesh;

  geo->DrawArgs["cylinder"] = cylinderSubmesh;

  mGeometries[geo->Name] = std::move(geo);


The mGeometries variable used in the last line of the above method is defined like so:

std::unordered_map<std::string, std::unique_ptr<MeshGeometry>> mGeometries;

This is a common pattern we employ for the rest of this book. It is cumbersome to create a new variable name for each geometry, PSO, texture, shader, etc., so we use unordered maps for constant time lookup and reference our objects by name. Here are some more examples:

std::unordered_map<std::string, std::unique_ptr<MeshGeometry>> mGeometries;

std::unordered_map<std::string, ComPtr<ID3DBlob>> mShaders;

std::unordered_map<std::string, ComPtr<ID3D12PipelineState>> mPSOs;

7.5.2 Render Items

We now define our scene render items. Observe how all the render items share the same MeshGeometry, and we use the DrawArgs to get the DrawIndexedInstanced parameters to draw a subregion of the vertex/index buffers.

// ShapesApp member variable.

std::vector<std::unique_ptr<RenderItem>> mAllRitems;

std::vector<RenderItem*> mOpaqueRitems;

void ShapesApp::BuildRenderItems()
{
  auto boxRitem = std::make_unique<RenderItem>();
  XMStoreFloat4x4(&boxRitem->World,
    XMMatrixScaling(2.0f, 2.0f, 2.0f)*XMMatrixTranslation(0.0f, 0.5f, 0.0f));
  boxRitem->ObjCBIndex = 0;
  boxRitem->Geo = mGeometries["shapeGeo"].get();
  boxRitem->PrimitiveType = D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST;
  boxRitem->IndexCount = boxRitem->Geo->DrawArgs["box"].IndexCount;
  boxRitem->StartIndexLocation = boxRitem->Geo->DrawArgs["box"].StartIndexLocation;
  boxRitem->BaseVertexLocation = boxRitem->Geo->DrawArgs["box"].BaseVertexLocation;
  mAllRitems.push_back(std::move(boxRitem));

  auto gridRitem = std::make_unique<RenderItem>();
  gridRitem->World = MathHelper::Identity4x4();
  gridRitem->ObjCBIndex = 1;
  gridRitem->Geo = mGeometries["shapeGeo"].get();
  gridRitem->PrimitiveType = D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST;
  gridRitem->IndexCount = gridRitem->Geo->DrawArgs["grid"].IndexCount;
  gridRitem->StartIndexLocation = gridRitem->Geo->DrawArgs["grid"].StartIndexLocation;
  gridRitem->BaseVertexLocation = gridRitem->Geo->DrawArgs["grid"].BaseVertexLocation;
  mAllRitems.push_back(std::move(gridRitem));

  // Build the columns and spheres in rows as in Figure 7.6.
  UINT objCBIndex = 2;
  for(int i = 0; i < 5; ++i)
  {
    auto leftCylRitem = std::make_unique<RenderItem>();
    auto rightCylRitem = std::make_unique<RenderItem>();
    auto leftSphereRitem = std::make_unique<RenderItem>();
    auto rightSphereRitem = std::make_unique<RenderItem>();

    XMMATRIX leftCylWorld = XMMatrixTranslation(-5.0f, 1.5f, -10.0f + i*5.0f);
    XMMATRIX rightCylWorld = XMMatrixTranslation(+5.0f, 1.5f, -10.0f + i*5.0f);
    XMMATRIX leftSphereWorld = XMMatrixTranslation(-5.0f, 3.5f, -10.0f + i*5.0f);
    XMMATRIX rightSphereWorld = XMMatrixTranslation(+5.0f, 3.5f, -10.0f + i*5.0f);

    XMStoreFloat4x4(&leftCylRitem->World, leftCylWorld);
    leftCylRitem->ObjCBIndex = objCBIndex++;
    leftCylRitem->Geo = mGeometries["shapeGeo"].get();
    leftCylRitem->PrimitiveType = D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST;
    leftCylRitem->IndexCount = leftCylRitem->Geo->DrawArgs["cylinder"].IndexCount;
    leftCylRitem->StartIndexLocation =
      leftCylRitem->Geo->DrawArgs["cylinder"].StartIndexLocation;
    leftCylRitem->BaseVertexLocation =
      leftCylRitem->Geo->DrawArgs["cylinder"].BaseVertexLocation;

    XMStoreFloat4x4(&rightCylRitem->World, rightCylWorld);
    rightCylRitem->ObjCBIndex = objCBIndex++;
    rightCylRitem->Geo = mGeometries["shapeGeo"].get();
    rightCylRitem->PrimitiveType = D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST;
    rightCylRitem->IndexCount = rightCylRitem->Geo->DrawArgs["cylinder"].IndexCount;
    rightCylRitem->StartIndexLocation =
      rightCylRitem->Geo->DrawArgs["cylinder"].StartIndexLocation;
    rightCylRitem->BaseVertexLocation =
      rightCylRitem->Geo->DrawArgs["cylinder"].BaseVertexLocation;

    XMStoreFloat4x4(&leftSphereRitem->World, leftSphereWorld);
    leftSphereRitem->ObjCBIndex = objCBIndex++;
    leftSphereRitem->Geo = mGeometries["shapeGeo"].get();
    leftSphereRitem->PrimitiveType = D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST;
    leftSphereRitem->IndexCount = leftSphereRitem->Geo->DrawArgs["sphere"].IndexCount;
    leftSphereRitem->StartIndexLocation =
      leftSphereRitem->Geo->DrawArgs["sphere"].StartIndexLocation;
    leftSphereRitem->BaseVertexLocation =
      leftSphereRitem->Geo->DrawArgs["sphere"].BaseVertexLocation;

    XMStoreFloat4x4(&rightSphereRitem->World, rightSphereWorld);
    rightSphereRitem->ObjCBIndex = objCBIndex++;
    rightSphereRitem->Geo = mGeometries["shapeGeo"].get();
    rightSphereRitem->PrimitiveType = D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST;
    rightSphereRitem->IndexCount = rightSphereRitem->Geo->DrawArgs["sphere"].IndexCount;
    rightSphereRitem->StartIndexLocation =
      rightSphereRitem->Geo->DrawArgs["sphere"].StartIndexLocation;
    rightSphereRitem->BaseVertexLocation =
      rightSphereRitem->Geo->DrawArgs["sphere"].BaseVertexLocation;

    mAllRitems.push_back(std::move(leftCylRitem));
    mAllRitems.push_back(std::move(rightCylRitem));
    mAllRitems.push_back(std::move(leftSphereRitem));
    mAllRitems.push_back(std::move(rightSphereRitem));
  }

  // All the render items are opaque in this demo.
  for(auto& e : mAllRitems)
    mOpaqueRitems.push_back(e.get());
}

7.5.3 Frame Resources and Constant Buffer Views

Recall that we have a vector of FrameResources, and each FrameResource has an upload buffer for storing the pass constants and constant buffers for every render item in the scene.

std::unique_ptr<UploadBuffer<PassConstants>> PassCB = nullptr;

std::unique_ptr<UploadBuffer<ObjectConstants>> ObjectCB = nullptr; 

If we have 3 frame resources and n render items, then we have 3n object constant buffers and 3 pass constant buffers. Hence we need 3(n+1) constant buffer views (CBVs). Thus we will need to modify our CBV heap to include the additional descriptors:

void ShapesApp::BuildDescriptorHeaps()
{
  UINT objCount = (UINT)mOpaqueRitems.size();

  // Need a CBV descriptor for each object for each frame resource,
  // +1 for the perPass CBV for each frame resource.
  UINT numDescriptors = (objCount+1) * gNumFrameResources;

  // Save an offset to the start of the pass CBVs. These are the last 3 descriptors.
  mPassCbvOffset = objCount * gNumFrameResources;

  D3D12_DESCRIPTOR_HEAP_DESC cbvHeapDesc;
  cbvHeapDesc.NumDescriptors = numDescriptors;
  cbvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
  cbvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
  cbvHeapDesc.NodeMask = 0;
  ThrowIfFailed(md3dDevice->CreateDescriptorHeap(&cbvHeapDesc,
    IID_PPV_ARGS(&mCbvHeap)));
}

Now, we can populate the CBV heap with the following code, where descriptors 0 to n−1 contain the object CBVs for the 0th frame resource, descriptors n to 2n−1 contain the object CBVs for the 1st frame resource, descriptors 2n to 3n−1 contain the object CBVs for the 2nd frame resource, and descriptors 3n, 3n+1, and 3n+2 contain the pass CBVs for the 0th, 1st, and 2nd frame resource, respectively:

void ShapesApp::BuildConstantBufferViews()
{
  UINT objCBByteSize = d3dUtil::CalcConstantBufferByteSize(sizeof(ObjectConstants));

  UINT objCount = (UINT)mOpaqueRitems.size();

  // Need a CBV descriptor for each object for each frame resource.
  for(int frameIndex = 0; frameIndex < gNumFrameResources; ++frameIndex)
  {
    auto objectCB = mFrameResources[frameIndex]->ObjectCB->Resource();
    for(UINT i = 0; i < objCount; ++i)
    {
      D3D12_GPU_VIRTUAL_ADDRESS cbAddress = objectCB->GetGPUVirtualAddress();

      // Offset to the ith object constant buffer in the current buffer.
      cbAddress += i*objCBByteSize;

      // Offset to the object CBV in the descriptor heap.
      int heapIndex = frameIndex*objCount + i;
      auto handle = CD3DX12_CPU_DESCRIPTOR_HANDLE(
        mCbvHeap->GetCPUDescriptorHandleForHeapStart());
      handle.Offset(heapIndex, mCbvSrvUavDescriptorSize);

      D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc;
      cbvDesc.BufferLocation = cbAddress;
      cbvDesc.SizeInBytes = objCBByteSize;

      md3dDevice->CreateConstantBufferView(&cbvDesc, handle);
    }
  }

  UINT passCBByteSize = d3dUtil::CalcConstantBufferByteSize(sizeof(PassConstants));

  // Last three descriptors are the pass CBVs for each frame resource.
  for(int frameIndex = 0; frameIndex < gNumFrameResources; ++frameIndex)
  {
    auto passCB = mFrameResources[frameIndex]->PassCB->Resource();

    // Pass buffer only stores one cbuffer per frame resource.
    D3D12_GPU_VIRTUAL_ADDRESS cbAddress = passCB->GetGPUVirtualAddress();

    // Offset to the pass cbv in the descriptor heap.
    int heapIndex = mPassCbvOffset + frameIndex;
    auto handle = CD3DX12_CPU_DESCRIPTOR_HANDLE(
      mCbvHeap->GetCPUDescriptorHandleForHeapStart());
    handle.Offset(heapIndex, mCbvSrvUavDescriptorSize);

    D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc;
    cbvDesc.BufferLocation = cbAddress;
    cbvDesc.SizeInBytes = passCBByteSize;

    md3dDevice->CreateConstantBufferView(&cbvDesc, handle);
  }
}
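The heap-index arithmetic used to populate the CBV heap can be checked on its own. A minimal sketch with plain integers (the function names are ours, not Direct3D's):

```cpp
#include <cassert>

// With numFrames frame resources and objCount objects, the object CBVs
// for frame f occupy heap slots [f*objCount, (f+1)*objCount), and the
// pass CBVs occupy the last numFrames slots.
int ObjectCbvHeapIndex(int frameIndex, int objCount, int objIndex)
{
    return frameIndex * objCount + objIndex;
}

int PassCbvHeapIndex(int frameIndex, int objCount, int numFrames)
{
    int passCbvOffset = objCount * numFrames;  // mPassCbvOffset
    return passCbvOffset + frameIndex;
}
```

For example, with 4 objects and 3 frame resources the heap holds (4+1)·3 = 15 descriptors: object CBVs in slots 0–11 and pass CBVs in slots 12–14.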

Recall that we can get a handle to the first descriptor in a heap with the ID3D12DescriptorHeap::GetCPUDescriptorHandleForHeapStart method. However, now that our heap has more than one descriptor, this method is no longer sufficient. We need to be able to offset to other descriptors in the heap. To do this, we need to know the size to increment in the heap to get to the next descriptor. This is hardware specific, so we have to query this information from the device, and it depends on the heap type. Recall that our D3DApp class caches this information:

mRtvDescriptorSize = md3dDevice->GetDescriptorHandleIncrementSize(
  D3D12_DESCRIPTOR_HEAP_TYPE_RTV);

mDsvDescriptorSize = md3dDevice->GetDescriptorHandleIncrementSize(
  D3D12_DESCRIPTOR_HEAP_TYPE_DSV);

mCbvSrvUavDescriptorSize = md3dDevice->GetDescriptorHandleIncrementSize(
  D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

Once we know the descriptor increment size, we can use one of the two CD3DX12_CPU_DESCRIPTOR_HANDLE::Offset methods to offset the handle by n descriptors:

// Specify the number of descriptors to offset times the descriptor
// increment size:
CD3DX12_CPU_DESCRIPTOR_HANDLE handle = mCbvHeap->GetCPUDescriptorHandleForHeapStart();
handle.Offset(n * mCbvSrvUavDescriptorSize);

// Or equivalently, specify the number of descriptors to offset,
// followed by the descriptor increment size:
CD3DX12_CPU_DESCRIPTOR_HANDLE handle = mCbvHeap->GetCPUDescriptorHandleForHeapStart();
handle.Offset(n, mCbvSrvUavDescriptorSize);



CD3DX12_GPU_DESCRIPTOR_HANDLE has the same Offset methods.

7.5.4 Drawing the Scene

At last we can draw our render items. Perhaps the only tricky part is offsetting to the correct CBV in the heap for the object we want to draw. Notice how a render item stores an index to the constant buffer that is associated with the render item.

void ShapesApp::DrawRenderItems(
  ID3D12GraphicsCommandList* cmdList,
  const std::vector<RenderItem*>& ritems)
{
  UINT objCBByteSize = d3dUtil::CalcConstantBufferByteSize(sizeof(ObjectConstants));

  auto objectCB = mCurrFrameResource->ObjectCB->Resource();

  // For each render item…
  for(size_t i = 0; i < ritems.size(); ++i)
  {
    auto ri = ritems[i];

    cmdList->IASetVertexBuffers(0, 1, &ri->Geo->VertexBufferView());
    cmdList->IASetIndexBuffer(&ri->Geo->IndexBufferView());
    cmdList->IASetPrimitiveTopology(ri->PrimitiveType);

    // Offset to the CBV in the descriptor heap for this object and
    // for this frame resource.
    UINT cbvIndex = mCurrFrameResourceIndex*(UINT)mOpaqueRitems.size()
      + ri->ObjCBIndex;
    auto cbvHandle = CD3DX12_GPU_DESCRIPTOR_HANDLE(
      mCbvHeap->GetGPUDescriptorHandleForHeapStart());
    cbvHandle.Offset(cbvIndex, mCbvSrvUavDescriptorSize);

    cmdList->SetGraphicsRootDescriptorTable(0, cbvHandle);

    cmdList->DrawIndexedInstanced(ri->IndexCount, 1,
      ri->StartIndexLocation, ri->BaseVertexLocation, 0);
  }
}

The DrawRenderItems method is invoked in the main Draw call:

void ShapesApp::Draw(const GameTimer& gt)
{
  auto cmdListAlloc = mCurrFrameResource->CmdListAlloc;

  // Reuse the memory associated with command recording.
  // We can only reset when the associated command lists have
  // finished execution on the GPU.
  ThrowIfFailed(cmdListAlloc->Reset());

  // A command list can be reset after it has been added to the
  // command queue via ExecuteCommandList.
  // Reusing the command list reuses memory.
  if(mIsWireframe)
  {
    ThrowIfFailed(mCommandList->Reset(
      cmdListAlloc.Get(), mPSOs["opaque_wireframe"].Get()));
  }
  else
  {
    ThrowIfFailed(mCommandList->Reset(cmdListAlloc.Get(), mPSOs["opaque"].Get()));
  }

  mCommandList->RSSetViewports(1, &mScreenViewport);
  mCommandList->RSSetScissorRects(1, &mScissorRect);

  // Indicate a state transition on the resource usage.
  mCommandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(
    CurrentBackBuffer(),
    D3D12_RESOURCE_STATE_PRESENT, D3D12_RESOURCE_STATE_RENDER_TARGET));

  // Clear the back buffer and depth buffer.
  mCommandList->ClearRenderTargetView(CurrentBackBufferView(),
    Colors::LightSteelBlue, 0, nullptr);
  mCommandList->ClearDepthStencilView(DepthStencilView(),
    D3D12_CLEAR_FLAG_DEPTH | D3D12_CLEAR_FLAG_STENCIL,
    1.0f, 0, 0, nullptr);

  // Specify the buffers we are going to render to.
  mCommandList->OMSetRenderTargets(1, &CurrentBackBufferView(),
    true, &DepthStencilView());

  ID3D12DescriptorHeap* descriptorHeaps[] = { mCbvHeap.Get() };
  mCommandList->SetDescriptorHeaps(_countof(descriptorHeaps), descriptorHeaps);

  mCommandList->SetGraphicsRootSignature(mRootSignature.Get());

  int passCbvIndex = mPassCbvOffset + mCurrFrameResourceIndex;
  auto passCbvHandle = CD3DX12_GPU_DESCRIPTOR_HANDLE(
    mCbvHeap->GetGPUDescriptorHandleForHeapStart());
  passCbvHandle.Offset(passCbvIndex, mCbvSrvUavDescriptorSize);
  mCommandList->SetGraphicsRootDescriptorTable(1, passCbvHandle);

  DrawRenderItems(mCommandList.Get(), mOpaqueRitems);

  // Indicate a state transition on the resource usage.
  mCommandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(
    CurrentBackBuffer(),
    D3D12_RESOURCE_STATE_RENDER_TARGET, D3D12_RESOURCE_STATE_PRESENT));

  // Done recording commands.
  ThrowIfFailed(mCommandList->Close());

  // Add the command list to the queue for execution.
  ID3D12CommandList* cmdsLists[] = { mCommandList.Get() };
  mCommandQueue->ExecuteCommandLists(_countof(cmdsLists), cmdsLists);

  // Swap the back and front buffers
  ThrowIfFailed(mSwapChain->Present(0, 0));
  mCurrBackBuffer = (mCurrBackBuffer + 1) % SwapChainBufferCount;

  // Advance the fence value to mark commands up to this fence point.
  mCurrFrameResource->Fence = ++mCurrentFence;

  // Add an instruction to the command queue to set a new fence point.
  // Because we are on the GPU timeline, the new fence point won’t be
  // set until the GPU finishes processing all the commands prior to this Signal().
  mCommandQueue->Signal(mFence.Get(), mCurrentFence);
}
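The fence bookkeeping at the end of Draw is what lets the CPU run ahead by up to the number of frame resources. The following toy simulation (our own names; no Direct3D involved) captures the check performed before a frame resource may be reused:

```cpp
#include <cassert>
#include <cstdint>

// Each frame resource remembers the fence value marking the last set of
// commands submitted with it; 0 means it has never been used.
struct FrameResourceSim { std::uint64_t Fence = 0; };

// The CPU must wait only if the GPU has not yet reached the fence point
// recorded for the frame resource it is about to reuse.
bool CpuMustWait(const FrameResourceSim& fr, std::uint64_t completedFenceValue)
{
    return fr.Fence != 0 && completedFenceValue < fr.Fence;
}
```

With three frame resources, the frame resource being recycled was submitted three frames ago, so in practice the GPU has usually already passed its fence point and this test rarely forces a wait.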

7.6 Root Signatures

We introduced root signatures in §6.6.5 of the previous chapter. A root signature defines what resources need to be bound to the pipeline before issuing a draw call, and how those resources get mapped to shader input registers. Which resources need to be bound depends on what resources the current shader programs expect. When the PSO is created, the root signature and shader program combination is validated.

7.6.1 Root Parameters

Recall that a root signature is defined by an array of root parameters. Thus far we have only created a root parameter that stores a descriptor table. However, a root parameter can actually be one of three types:

1.    Descriptor Table: Expects a descriptor table referencing a contiguous range of descriptors in a heap that identifies the resources to be bound.

2.    Root descriptor (inline descriptor): Expects a descriptor to be set directly that identifies the resource to be bound; the descriptor does not need to be in a heap. Only CBVs to constant buffers and SRVs/UAVs to buffers can be bound as root descriptors. In particular, this means SRVs to textures cannot be bound as root descriptors.

3.    Root constant: Expects a list of 32-bit constant values to be bound directly.

For performance, there is a limit of 64 DWORDs that can be put in a root signature. The three types of root parameters have the following costs:

1.    Descriptor Table: 1 DWORD

2.    Root Descriptor: 2 DWORDs

3.    Root Constant: 1 DWORD per 32-bit constant

We can create an arbitrary root signature, provided we do not go over the sixty-four DWORD limit. Root constants are very convenient, but their cost adds up quickly. For example, if the only constant data we needed was a world-view-projection matrix, we could store it in sixteen root constants, which would free us from needing a constant buffer and CBV heap. However, that eats up a quarter of our root signature budget. A root descriptor would cost only two DWORDs, and a descriptor table only one. As our applications become more complex, our constant buffer data will grow larger, and it is unlikely we will be able to get away with using only root constants. In a real-world application, you will probably use a combination of all three types of root parameters.
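These costs can be tallied mechanically. A small sketch of the cost rules stated above (the enum and function are ours, for illustration only):

```cpp
#include <cassert>

enum class RootParamKind { DescriptorTable, RootDescriptor, RootConstants };

// Cost of one root parameter in DWORDs, per the rules above:
// a table costs 1, a root descriptor 2, and root constants
// 1 DWORD per 32-bit value.
int RootParamCost(RootParamKind kind, int num32BitConstants = 0)
{
    switch(kind)
    {
    case RootParamKind::DescriptorTable: return 1;
    case RootParamKind::RootDescriptor:  return 2;
    case RootParamKind::RootConstants:   return num32BitConstants;
    }
    return 0;
}

const int kRootSignatureDwordLimit = 64;
```

A 4×4 matrix stored as root constants costs 16 DWORDs — exactly a quarter of the 64-DWORD budget — while the same data behind a root descriptor costs 2 and behind a descriptor table costs 1.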

In code, a root parameter is described by filling out a CD3DX12_ROOT_PARAMETER structure. As we have seen with the other CD3DX12 helpers, CD3DX12_ROOT_PARAMETER extends D3D12_ROOT_PARAMETER and adds some helper initialization functions.

typedef struct D3D12_ROOT_PARAMETER
{
  D3D12_ROOT_PARAMETER_TYPE ParameterType;
  union
  {
    D3D12_ROOT_DESCRIPTOR_TABLE DescriptorTable;
    D3D12_ROOT_CONSTANTS Constants;
    D3D12_ROOT_DESCRIPTOR Descriptor;
  };
  D3D12_SHADER_VISIBILITY ShaderVisibility;
} D3D12_ROOT_PARAMETER;

1.    ParameterType: A member of the following enumerated type indicating the root parameter type (descriptor table, root constant, CBV root descriptor, SRV root descriptor, UAV root descriptor).

typedef enum D3D12_ROOT_PARAMETER_TYPE
{
  D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE = 0,
  D3D12_ROOT_PARAMETER_TYPE_32BIT_CONSTANTS,
  D3D12_ROOT_PARAMETER_TYPE_CBV,
  D3D12_ROOT_PARAMETER_TYPE_SRV,
  D3D12_ROOT_PARAMETER_TYPE_UAV
} D3D12_ROOT_PARAMETER_TYPE;
2.    DescriptorTable/Constants/Descriptor: A structure describing the root parameter. The member of the union you fill out depends on the root parameter type. §7.6.2, §7.6.3, and §7.6.4 discuss these structures.

3.    ShaderVisibility: A member of the following enumeration that specifies which shader programs this root parameter is visible to. Usually in this book we specify D3D12_SHADER_VISIBILITY_ALL. However, if we know a resource is only going to be used in a pixel shader, for example, then we can specify D3D12_SHADER_VISIBILITY_PIXEL. Limiting the visibility of a root parameter can potentially lead to some optimizations.

typedef enum D3D12_SHADER_VISIBILITY
{
  D3D12_SHADER_VISIBILITY_ALL = 0,
  D3D12_SHADER_VISIBILITY_VERTEX = 1,
  D3D12_SHADER_VISIBILITY_HULL = 2,
  D3D12_SHADER_VISIBILITY_DOMAIN = 3,
  D3D12_SHADER_VISIBILITY_GEOMETRY = 4,
  D3D12_SHADER_VISIBILITY_PIXEL = 5
} D3D12_SHADER_VISIBILITY;
7.6.2 Descriptor Tables

A descriptor table root parameter is further defined by filling out the DescriptorTable member of D3D12_ROOT_PARAMETER.

typedef struct D3D12_ROOT_DESCRIPTOR_TABLE
{
  UINT NumDescriptorRanges;
  const D3D12_DESCRIPTOR_RANGE *pDescriptorRanges;
} D3D12_ROOT_DESCRIPTOR_TABLE;
This simply specifies an array of D3D12_DESCRIPTOR_RANGEs and the number of ranges in the array.

The D3D12_DESCRIPTOR_RANGE structure is defined like so:

typedef struct D3D12_DESCRIPTOR_RANGE
{
  D3D12_DESCRIPTOR_RANGE_TYPE RangeType;
  UINT NumDescriptors;
  UINT BaseShaderRegister;
  UINT RegisterSpace;
  UINT OffsetInDescriptorsFromTableStart;
} D3D12_DESCRIPTOR_RANGE;

1.    RangeType: A member of the following enumerated type indicating the type of descriptors in this range:

typedef enum D3D12_DESCRIPTOR_RANGE_TYPE
{
  D3D12_DESCRIPTOR_RANGE_TYPE_SRV = 0,
  D3D12_DESCRIPTOR_RANGE_TYPE_UAV,
  D3D12_DESCRIPTOR_RANGE_TYPE_CBV,
  D3D12_DESCRIPTOR_RANGE_TYPE_SAMPLER
} D3D12_DESCRIPTOR_RANGE_TYPE;
Sampler descriptors are discussed in the chapter on texturing.

2.    NumDescriptors: The number of descriptors in the range.

3.    BaseShaderRegister: The base shader register in the range that arguments are bound to. For example, if you set NumDescriptors to 3 and BaseShaderRegister to 1, and the range type is CBV (for constant buffers), then you will be binding to the HLSL registers:

  cbuffer cbA : register(b1) {…};

  cbuffer cbB : register(b2) {…};

  cbuffer cbC : register(b3) {…};

4.    RegisterSpace: This property gives you another dimension to specify shader registers. For example, the following two registers seem to overlap register slot t0, but they are different registers because they live in different spaces:

  Texture2D gDiffuseMap : register(t0, space0);

  Texture2D gNormalMap : register(t0, space1);

If no space register is explicitly specified in the shader, it automatically defaults to space0. Usually we use space0, but for arrays of resources it is useful to use multiple spaces, and necessary if the arrays are of an unknown size.

5.    OffsetInDescriptorsFromTableStart: The offset of this range of descriptors from the start of the table. See the example below.

A slot parameter initialized as a descriptor table takes an array of D3D12_DESCRIPTOR_RANGE instances because we can mix various types of descriptors in one table. Suppose we defined a table of six descriptors by the following three ranges in order: two CBVs, three SRVs and one UAV. This table would be defined like so:

// Create a table with 2 CBVs, 3 SRVs and 1 UAV.
CD3DX12_DESCRIPTOR_RANGE descRange[3];
descRange[0].Init(
  D3D12_DESCRIPTOR_RANGE_TYPE_CBV, // descriptor type
  2, // descriptor count
  0, // base shader register arguments are bound to for this root
     // parameter
  0, // register space
  0);// offset from start of table

descRange[1].Init(
  D3D12_DESCRIPTOR_RANGE_TYPE_SRV, // descriptor type
  3, // descriptor count
  0, // base shader register arguments are bound to for this root
     // parameter
  0, // register space
  2);// offset from start of table

descRange[2].Init(
  D3D12_DESCRIPTOR_RANGE_TYPE_UAV, // descriptor type
  1, // descriptor count
  0, // base shader register arguments are bound to for this root
     // parameter
  0, // register space
  5);// offset from start of table

slotRootParameter[0].InitAsDescriptorTable(
  3, descRange, D3D12_SHADER_VISIBILITY_ALL);

As usual, there is a CD3DX12_DESCRIPTOR_RANGE variation that inherits from D3D12_DESCRIPTOR_RANGE, and we use the following initialization function:

void CD3DX12_DESCRIPTOR_RANGE::Init(
    D3D12_DESCRIPTOR_RANGE_TYPE rangeType,
    UINT numDescriptors,
    UINT baseShaderRegister,
    UINT registerSpace = 0,
    UINT offsetInDescriptorsFromTableStart =
      D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND);
This table covers six descriptors, and the application is expected to bind a contiguous range of descriptors in a descriptor heap that include two CBVs followed by three SRVs followed by one UAV. We see that all the range types start at register 0 but there is no “overlap” conflict because CBVs, SRVs, and UAVs all get bound to different register types, each starting at register 0.

We can have Direct3D compute the OffsetInDescriptorsFromTableStart value for us by specifying D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND; this instructs Direct3D to use the previous range descriptor counts in the table to compute the offset. Note that the CD3DX12_DESCRIPTOR_RANGE::Init method defaults the register space to 0, and OffsetInDescriptorsFromTableStart to D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND.
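The append rule amounts to a running sum of the preceding ranges' descriptor counts. A sketch (our own helper, not part of the SDK) of the offsets D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND would produce:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given the NumDescriptors of each range in table order, compute the
// OffsetInDescriptorsFromTableStart of each range under the append
// rule: a range starts where the previous ranges end.
std::vector<unsigned> AppendOffsets(const std::vector<unsigned>& counts)
{
    std::vector<unsigned> offsets(counts.size());
    unsigned running = 0;
    for(std::size_t i = 0; i < counts.size(); ++i)
    {
        offsets[i] = running;   // sum of the preceding ranges' counts
        running += counts[i];
    }
    return offsets;
}
```

For the table above — two CBVs, three SRVs, one UAV — the counts {2, 3, 1} yield offsets {0, 2, 5}, matching the explicit values passed to Init in the example.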

7.6.3 Root Descriptors

A root descriptor root parameter is further defined by filling out the Descriptor member of D3D12_ROOT_PARAMETER.

typedef struct D3D12_ROOT_DESCRIPTOR
{
  UINT ShaderRegister;
  UINT RegisterSpace;
} D3D12_ROOT_DESCRIPTOR;
1.    ShaderRegister: The shader register the descriptor will be bound to. For example, if you specify 2 and this root parameter is a CBV then the parameter gets mapped to the constant buffer in register(b2):

  cbuffer cbPass : register(b2) {…};

2.    RegisterSpace: See D3D12_DESCRIPTOR_RANGE::RegisterSpace.

Unlike descriptor tables, which require us to set a descriptor handle in a descriptor heap, setting a root descriptor amounts to binding the virtual address of the resource directly:

UINT objCBByteSize = d3dUtil::CalcConstantBufferByteSize(sizeof(ObjectConstants));

D3D12_GPU_VIRTUAL_ADDRESS objCBAddress = objectCB->GetGPUVirtualAddress();

// Offset to the constants for this object in the buffer.
objCBAddress += ri->ObjCBIndex*objCBByteSize;

cmdList->SetGraphicsRootConstantBufferView(
  0, // root parameter index
  objCBAddress);
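Two pieces of arithmetic make this address offsetting work: each object's constants are padded up to a 256-byte multiple (this is what d3dUtil::CalcConstantBufferByteSize does), and the ith object's address is the buffer base plus i times the padded element size. A standalone sketch of both steps:

```cpp
#include <cassert>
#include <cstdint>

// Round byteSize up to the nearest multiple of 256, the hardware
// alignment requirement for constant buffer data.
unsigned CalcConstantBufferByteSize(unsigned byteSize)
{
    return (byteSize + 255) & ~255u;
}

// GPU virtual address of the ith object's constants within one big
// upload buffer that stores all object constants back to back.
std::uint64_t ObjectCbAddress(std::uint64_t bufferBaseAddress,
                              unsigned objIndex, unsigned elementByteSize)
{
    return bufferBaseAddress + std::uint64_t(objIndex) * elementByteSize;
}
```

The rounding adds 255 and then masks off the low 8 bits, so 1 byte rounds up to 256, 256 stays 256, and 257 rounds up to 512.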
7.6.4 Root Constants

A root constant root parameter is further defined by filling out the Constants member of D3D12_ROOT_PARAMETER.

typedef struct D3D12_ROOT_CONSTANTS
{
  UINT ShaderRegister;
  UINT RegisterSpace;
  UINT Num32BitValues;
} D3D12_ROOT_CONSTANTS;
1.    ShaderRegister: See D3D12_ROOT_DESCRIPTOR::ShaderRegister.

2.    RegisterSpace: See D3D12_DESCRIPTOR_RANGE::RegisterSpace.

3.    Num32BitValues: The number of 32-bit constants this root parameter expects.

Setting root constants still maps the data to a constant buffer from the shader’s perspective. The following example illustrates:

// Application code: Root signature definition.
CD3DX12_ROOT_PARAMETER slotRootParameter[1];
slotRootParameter[0].InitAsConstants(12, 0);

// A root signature is an array of root parameters.
CD3DX12_ROOT_SIGNATURE_DESC rootSigDesc(1, slotRootParameter,
  0, nullptr,
  D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);

// Application code: to set the constants to register b0.
auto weights = CalcGaussWeights(2.5f);
int blurRadius = (int)weights.size() / 2;

cmdList->SetGraphicsRoot32BitConstants(0, 1, &blurRadius, 0);
cmdList->SetGraphicsRoot32BitConstants(0, (UINT)weights.size(), weights.data(), 1);

// HLSL code.

cbuffer cbSettings : register(b0)
{
   // We cannot have an array entry in a constant buffer that gets
   // mapped onto root constants, so list each element.

   int gBlurRadius;

   // Support up to 11 blur weights.
   float w0;
   float w1;
   float w2;
   float w3;
   float w4;
   float w5;
   float w6;
   float w7;
   float w8;
   float w9;
   float w10;
};
The ID3D12GraphicsCommandList::SetGraphicsRoot32BitConstants method has the following prototype:

void ID3D12GraphicsCommandList::SetGraphicsRoot32BitConstants( 

  UINT RootParameterIndex,

  UINT Num32BitValuesToSet,

  const void *pSrcData,

  UINT DestOffsetIn32BitValues);

1.    RootParameterIndex: Index of the root parameter we are setting.

2.    Num32BitValuesToSet: The number of 32-bit values to set.

3.    pSrcData: Pointer to an array of 32-bit values to set.

4.    DestOffsetIn32BitValues: The offset, in 32-bit values, from the start of the root constants at which to begin writing.

As with root descriptors, setting root constants bypasses the need for a descriptor heap.
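Since root constants are raw 32-bit slots, an int and a list of floats can share one layout, as cbSettings above shows. The following sketch mimics in ordinary memory (with our own helper; this is not the real command-list API) how the two SetGraphicsRoot32BitConstants calls lay out slot 0 for gBlurRadius and slots 1 through 11 for the weights:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// slots models the 12 raw 32-bit root constant slots of cbSettings.
// The copy mirrors SetGraphicsRoot32BitConstants(rootParam, count,
// srcData, destOffset): count 32-bit values written at destOffset.
void SetRoot32BitConstants(std::vector<std::uint32_t>& slots,
                           unsigned num32BitValues,
                           const void* srcData,
                           unsigned destOffsetIn32BitValues)
{
    std::memcpy(slots.data() + destOffsetIn32BitValues, srcData,
                num32BitValues * sizeof(std::uint32_t));
}
```

Writing an int into slot 0 and eleven floats starting at slot 1 reproduces the member order of cbSettings; the shader reinterprets each slot according to the declared member type.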

7.6.5 A More Complicated Root Signature Example

Consider a shader that expects the following resources:

Texture2D gDiffuseMap : register(t0);

cbuffer cbPerObject : register(b0)
{
  float4x4 gWorld;
  float4x4 gTexTransform;
};

cbuffer cbPass : register(b1)
{
  float4x4 gView;
  float4x4 gInvView;
  float4x4 gProj;
  float4x4 gInvProj;
  float4x4 gViewProj;
  float4x4 gInvViewProj;
  float3 gEyePosW;
  float cbPerObjectPad1;
  float2 gRenderTargetSize;
  float2 gInvRenderTargetSize;
  float gNearZ;
  float gFarZ;
  float gTotalTime;
  float gDeltaTime;
  float4 gAmbientLight;
  Light gLights[MaxLights];
};

cbuffer cbMaterial : register(b2)
{
  float4   gDiffuseAlbedo;
  float3   gFresnelR0;
  float    gRoughness;
  float4x4 gMatTransform;
};
The root signature for this shader would be described as follows:

CD3DX12_DESCRIPTOR_RANGE texTable;
texTable.Init(
  D3D12_DESCRIPTOR_RANGE_TYPE_SRV,
  1,  // number of descriptors
  0); // register t0

// Root parameter can be a table, root descriptor or root constants.
CD3DX12_ROOT_PARAMETER slotRootParameter[4];

// Performance TIP: Order from most frequent to least frequent.
slotRootParameter[0].InitAsDescriptorTable(1,
  &texTable, D3D12_SHADER_VISIBILITY_PIXEL);
slotRootParameter[1].InitAsConstantBufferView(0); // register b0
slotRootParameter[2].InitAsConstantBufferView(1); // register b1
slotRootParameter[3].InitAsConstantBufferView(2); // register b2

// A root signature is an array of root parameters.
CD3DX12_ROOT_SIGNATURE_DESC rootSigDesc(4, slotRootParameter,
  0, nullptr,
  D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);

7.6.6 Root Parameter Versioning

Root arguments refer to the actual values we pass to root parameters. Consider the following code where we change the root arguments (only descriptor table in this case) between draw calls:

for(size_t i = 0; i < mRitems.size(); ++i)
{
  const auto& ri = mRitems[i];

  // Offset to the CBV for this frame and this render item.
  CD3DX12_GPU_DESCRIPTOR_HANDLE cbvHandle(
    mCbvHeap->GetGPUDescriptorHandleForHeapStart());
  int cbvOffset = mCurrFrameResourceIndex*(int)mRitems.size();
  cbvOffset += ri.CbIndex;
  cbvHandle.Offset(cbvOffset, mCbvSrvUavDescriptorSize);

  // Identify descriptors to use for this draw call.
  cmdList->SetGraphicsRootDescriptorTable(0, cbvHandle);

  cmdList->DrawIndexedInstanced(
    ri.IndexCount, 1,
    ri.StartIndexLocation,
    ri.BaseVertexLocation, 0);
}
Each draw call will be executed with the currently set state of the root arguments at the time of the draw call. This works because the hardware automatically saves a snapshot of the current state of the root arguments for each draw call. In other words, the root arguments are automatically versioned for each draw call.

Note that a root signature can provide more fields than a shader uses. For example, if the root signature specifies a root CBV in root parameter 2, but the shader does not use that constant buffer, then this combination is valid as long as the root signature does specify all the resources the shader does use.

For performance, we should aim to keep the root signature small. One reason is the automatic versioning of the root arguments per draw call: the larger the root signature, the larger these snapshots of the root arguments will be. Additionally, the SDK documentation advises that root parameters be ordered in the root signature from most frequently changed to least frequently changed. The Direct3D 12 documentation also advises avoiding switching the root signature when possible, so it is a good idea to share the same root signature across many PSOs you create. In particular, it may be beneficial to have a “super” root signature that works with several shader programs, even though not all the shaders use all the parameters the root signature defines. On the other hand, it also depends on how big this “super” root signature has to be for this to work. If it is too big, it could cancel out the gains of not switching the root signature.


7.7 Land and Waves Demo

In this section, we show how to build the “Land and Waves” demo shown in Figure 7.7. This demo constructs a triangle grid mesh procedurally and offsets the vertex heights to create a terrain. In addition, it uses another triangle grid to represent water, and animates the vertex heights to create waves. This demo also switches to using root descriptors for constant buffers, which allows us to do away with the descriptor heap for CBVs.


Figure 7.7.  Screenshot of the “Land and Waves” demo. Because we do not have lighting yet, it is difficult to see the shape of the waves. Hold the ‘1’ key down to view the scene in wireframe mode to see the waves better.

The graph of a “nice” real-valued function y = f(x, z) is a surface. We can approximate the surface by constructing a grid in the xz-plane, where every quad is built from two triangles, and then applying the function to each grid point; see Figure 7.8.


Figure 7.8.  (Top) Lay down a grid in the xz-plane. (Bottom) For each grid point, apply the function f(x, z) to obtain the y-coordinate. The plot of the points (x, f(x, z), z) gives the graph of the surface.

7.7.1 Generating the Grid Vertices

So the main task is to build the grid in the xz-plane. A grid of m × n vertices induces (m − 1) × (n − 1) quads (or cells), as shown in Figure 7.9. Each cell is covered by two triangles, so there are a total of 2(m − 1) × (n − 1) triangles. If the grid has width w and depth d, the cell spacing along the x-axis is dx = w/(n − 1) and the cell spacing along the z-axis is dz = d/(m − 1). To generate the vertices, we start at the upper-left corner and incrementally compute the vertex coordinates row-by-row. The coordinates of the ijth grid vertex in the xz-plane are given by:

  v_ij = (−0.5w + j·dx, 0.0, 0.5d − i·dz), for 0 ≤ i < m and 0 ≤ j < n.

Figure 7.9.  Grid construction.

The following code generates the grid vertices:


GeometryGenerator::MeshData
GeometryGenerator::CreateGrid(float width, float depth, uint32 m, uint32 n)
{
  MeshData meshData;

  uint32 vertexCount = m*n;
  uint32 faceCount = (m-1)*(n-1)*2;

  //
  // Create the vertices.
  //

  float halfWidth = 0.5f*width;
  float halfDepth = 0.5f*depth;

  float dx = width / (n-1);
  float dz = depth / (m-1);

  float du = 1.0f / (n-1);
  float dv = 1.0f / (m-1);

  meshData.Vertices.resize(vertexCount);
  for(uint32 i = 0; i < m; ++i)
  {
    float z = halfDepth - i*dz;
    for(uint32 j = 0; j < n; ++j)
    {
      float x = -halfWidth + j*dx;

      meshData.Vertices[i*n+j].Position = XMFLOAT3(x, 0.0f, z);
      meshData.Vertices[i*n+j].Normal   = XMFLOAT3(0.0f, 1.0f, 0.0f);
      meshData.Vertices[i*n+j].TangentU = XMFLOAT3(1.0f, 0.0f, 0.0f);

      // Stretch texture over grid.
      meshData.Vertices[i*n+j].TexC.x = j*du;
      meshData.Vertices[i*n+j].TexC.y = i*dv;
    }
  }

7.7.2 Generating the Grid Indices

After we have computed the vertices, we need to define the grid triangles by specifying the indices. To do this, we iterate over each quad, again row-by-row starting at the top-left, and compute the indices to define the two triangles of the quad; referring to Figure 7.10, for an m × n vertex grid, the linear array indices of the two triangles of the ijth quad are computed as follows:

ΔABC = (i·n + j,  i·n + j + 1,  (i + 1)·n + j)
ΔCBD = ((i + 1)·n + j,  i·n + j + 1,  (i + 1)·n + j + 1)



Figure 7.10.  The indices of the ijth quad’s vertices.

The corresponding code:

  //
  // Create the indices.
  //

  meshData.Indices32.resize(faceCount*3); // 3 indices per face

  // Iterate over each quad and compute indices.
  uint32 k = 0;
  for(uint32 i = 0; i < m-1; ++i)
  {
    for(uint32 j = 0; j < n-1; ++j)
    {
      meshData.Indices32[k]   = i*n+j;
      meshData.Indices32[k+1] = i*n+j+1;
      meshData.Indices32[k+2] = (i+1)*n+j;

      meshData.Indices32[k+3] = (i+1)*n+j;
      meshData.Indices32[k+4] = i*n+j+1;
      meshData.Indices32[k+5] = (i+1)*n+j+1;

      k += 6; // next quad
    }
  }

  return meshData;
}


7.7.3 Applying the Height Function

After we have created the grid, we can extract the vertex elements we want from the MeshData grid, turn the flat grid into a surface representing hills, and generate a color for each vertex based on the vertex altitude (y-coordinate).

// Not to be confused with GeometryGenerator::Vertex.
struct Vertex
{
  XMFLOAT3 Pos;
  XMFLOAT4 Color;
};

void LandAndWavesApp::BuildLandGeometry()
{
  GeometryGenerator geoGen;
  GeometryGenerator::MeshData grid = geoGen.CreateGrid(160.0f, 160.0f, 50, 50);

  //
  // Extract the vertex elements we are interested in and apply the height
  // function to each vertex. In addition, color the vertices based on
  // their height so we have sandy looking beaches, grassy low hills,
  // and snow mountain peaks.
  //

  std::vector<Vertex> vertices(grid.Vertices.size());
  for(size_t i = 0; i < grid.Vertices.size(); ++i)
  {
    auto& p = grid.Vertices[i].Position;
    vertices[i].Pos = p;
    vertices[i].Pos.y = GetHillsHeight(p.x, p.z);

    // Color the vertex based on its height.
    if(vertices[i].Pos.y < -10.0f)
    {
      // Sandy beach color.
      vertices[i].Color = XMFLOAT4(1.0f, 0.96f, 0.62f, 1.0f);
    }
    else if(vertices[i].Pos.y < 5.0f)
    {
      // Light yellow-green.
      vertices[i].Color = XMFLOAT4(0.48f, 0.77f, 0.46f, 1.0f);
    }
    else if(vertices[i].Pos.y < 12.0f)
    {
      // Dark yellow-green.
      vertices[i].Color = XMFLOAT4(0.1f, 0.48f, 0.19f, 1.0f);
    }
    else if(vertices[i].Pos.y < 20.0f)
    {
      // Dark brown.
      vertices[i].Color = XMFLOAT4(0.45f, 0.39f, 0.34f, 1.0f);
    }
    else
    {
      // White snow.
      vertices[i].Color = XMFLOAT4(1.0f, 1.0f, 1.0f, 1.0f);
    }
  }

  const UINT vbByteSize = (UINT)vertices.size() * sizeof(Vertex);

  std::vector<std::uint16_t> indices = grid.GetIndices16();
  const UINT ibByteSize = (UINT)indices.size() * sizeof(std::uint16_t);

  auto geo = std::make_unique<MeshGeometry>();
  geo->Name = "landGeo";

  ThrowIfFailed(D3DCreateBlob(vbByteSize, &geo->VertexBufferCPU));
  CopyMemory(geo->VertexBufferCPU->GetBufferPointer(), vertices.data(), vbByteSize);

  ThrowIfFailed(D3DCreateBlob(ibByteSize, &geo->IndexBufferCPU));
  CopyMemory(geo->IndexBufferCPU->GetBufferPointer(), indices.data(), ibByteSize);

  geo->VertexBufferGPU = d3dUtil::CreateDefaultBuffer(md3dDevice.Get(),
    mCommandList.Get(), vertices.data(), vbByteSize, geo->VertexBufferUploader);

  geo->IndexBufferGPU = d3dUtil::CreateDefaultBuffer(md3dDevice.Get(),
    mCommandList.Get(), indices.data(), ibByteSize, geo->IndexBufferUploader);

  geo->VertexByteStride = sizeof(Vertex);
  geo->VertexBufferByteSize = vbByteSize;
  geo->IndexFormat = DXGI_FORMAT_R16_UINT;
  geo->IndexBufferByteSize = ibByteSize;

  SubmeshGeometry submesh;
  submesh.IndexCount = (UINT)indices.size();
  submesh.StartIndexLocation = 0;
  submesh.BaseVertexLocation = 0;

  geo->DrawArgs["grid"] = submesh;

  mGeometries["landGeo"] = std::move(geo);
}

The function f(x, z) we have used in this demo is given by:

float LandAndWavesApp::GetHillsHeight(float x, float z)const
{
  return 0.3f*(z*sinf(0.1f*x) + x*cosf(0.1f*z));
}

Its graph looks somewhat like a terrain with hills and valleys (see Figure 7.7).

7.7.4 Root CBVs

Another change we make to the “Land and Waves” demo relative to the previous “Shapes” demo is that we use root descriptors so that we can bind CBVs directly without having to use a descriptor heap. Here are the changes that need to be made to do this:

1.    The root signature needs to be changed to take two root CBVs instead of two descriptor tables.

2.    A CBV heap is no longer needed, and no CBV descriptors need to be created or populated.

3.    There is new syntax for binding a root descriptor.

The new root signature is defined like so:

// Root parameter can be a table, root descriptor or root constants.
CD3DX12_ROOT_PARAMETER slotRootParameter[2];

// Create root CBVs.
slotRootParameter[0].InitAsConstantBufferView(0); // per-object CBV
slotRootParameter[1].InitAsConstantBufferView(1); // per-pass CBV

// A root signature is an array of root parameters.
CD3DX12_ROOT_SIGNATURE_DESC rootSigDesc(2, slotRootParameter, 0, nullptr,
  D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);

Observe that we use the InitAsConstantBufferView helper method to create a root CBV; the parameter specifies the shader register this parameter is bound to (in the above code, shader constant buffer registers “b0” and “b1”).

Now, we bind a CBV as an argument to a root descriptor using the following method:

void ID3D12GraphicsCommandList::SetGraphicsRootConstantBufferView(
  UINT RootParameterIndex,
  D3D12_GPU_VIRTUAL_ADDRESS BufferLocation);

1.    RootParameterIndex: The index of the root parameter we are binding a CBV to.

2.    BufferLocation: The virtual address to the resource that contains the constant buffer data.

With this change, our drawing code now looks like this:

void LandAndWavesApp::Draw(const GameTimer& gt)
{
  ...

  // Bind per-pass constant buffer. We only need to do this once per-pass.
  auto passCB = mCurrFrameResource->PassCB->Resource();
  mCommandList->SetGraphicsRootConstantBufferView(1, passCB->GetGPUVirtualAddress());

  DrawRenderItems(mCommandList.Get(), mRitemLayer[(int)RenderLayer::Opaque]);

  ...
}


void LandAndWavesApp::DrawRenderItems(
  ID3D12GraphicsCommandList* cmdList,
  const std::vector<RenderItem*>& ritems)
{
  UINT objCBByteSize = d3dUtil::CalcConstantBufferByteSize(sizeof(ObjectConstants));

  auto objectCB = mCurrFrameResource->ObjectCB->Resource();

  // For each render item…
  for(size_t i = 0; i < ritems.size(); ++i)
  {
    auto ri = ritems[i];

    cmdList->IASetVertexBuffers(0, 1, &ri->Geo->VertexBufferView());
    cmdList->IASetIndexBuffer(&ri->Geo->IndexBufferView());
    cmdList->IASetPrimitiveTopology(ri->PrimitiveType);

    D3D12_GPU_VIRTUAL_ADDRESS objCBAddress = objectCB->GetGPUVirtualAddress();
    objCBAddress += ri->ObjCBIndex*objCBByteSize;

    cmdList->SetGraphicsRootConstantBufferView(0, objCBAddress);

    cmdList->DrawIndexedInstanced(ri->IndexCount, 1,
      ri->StartIndexLocation, ri->BaseVertexLocation, 0);
  }
}

7.7.5 Dynamic Vertex Buffers

So far we have stored our vertices in a default buffer resource. We use this kind of resource when we want to store static geometry. That is, geometry that we do not change—we set the data, and the GPU reads and draws the data. A dynamic vertex buffer is where we change the vertex data frequently, say per-frame. For example, suppose we are doing a wave simulation, and we solve the wave equation for the solution function f(x, z, t). This function represents the wave height at each point in the xz-plane at time t. If we were to use this function to draw the waves, we would use a triangle grid mesh like we did with the hills and valleys, and apply f(x, z, t) to each grid point in order to obtain the wave heights at the grid points. Because this function also depends on time t (i.e., the wave surface changes with time), we would need to reapply this function to the grid points a short time later (say every 1/30th of a second) to get a smooth animation. Thus, we need a dynamic vertex buffer in order to update the heights of the triangle grid mesh vertices as time passes. Another situation that leads to dynamic vertex buffers is particle systems with complex physics and collision detection. Each frame we will do the physics and collision detection on the CPU to find the new position of the particles. Because the particle positions are changing each frame, we need a dynamic vertex buffer in order to update the particle positions for drawing each frame.

We have already seen an example of uploading data from the CPU to the GPU per-frame when we used upload buffers to update our constant buffer data. We can apply the same technique and use our UploadBuffer class, but instead of storing an array of constant buffers, we store an array of vertices:

  std::unique_ptr<UploadBuffer<Vertex>> WavesVB = nullptr;

  WavesVB = std::make_unique<UploadBuffer<Vertex>>(

    device, waveVertCount, false);

Because we need to upload the new contents from the CPU to the wave’s dynamic vertex buffer every frame, the dynamic vertex buffer needs to be a frame resource. Otherwise we could overwrite the memory before the GPU has finished processing the last frame.

Every frame, we run the wave simulation and update the vertex buffer like so:

void LandAndWavesApp::UpdateWaves(const GameTimer& gt)
{
  // Every quarter second, generate a random wave.
  static float t_base = 0.0f;
  if((mTimer.TotalTime() - t_base) >= 0.25f)
  {
    t_base += 0.25f;

    int i = MathHelper::Rand(4, mWaves->RowCount() - 5);
    int j = MathHelper::Rand(4, mWaves->ColumnCount() - 5);

    float r = MathHelper::RandF(0.2f, 0.5f);

    mWaves->Disturb(i, j, r);
  }

  // Update the wave simulation.
  mWaves->Update(gt.DeltaTime());

  // Update the wave vertex buffer with the new solution.
  auto currWavesVB = mCurrFrameResource->WavesVB.get();
  for(int i = 0; i < mWaves->VertexCount(); ++i)
  {
    Vertex v;

    v.Pos = mWaves->Position(i);
    v.Color = XMFLOAT4(DirectX::Colors::Blue);

    currWavesVB->CopyData(i, v);
  }

  // Set the dynamic VB of the wave render item to the current frame VB.
  mWavesRitem->Geo->VertexBufferGPU = currWavesVB->Resource();
}

We save a reference to the wave render item (mWavesRitem) so that we can set its vertex buffer on the fly. We need to do this because its vertex buffer is a dynamic buffer and changes every frame.

There is some overhead when using dynamic buffers, as the new data must be transferred from CPU memory back up to GPU memory. Therefore, static buffers should be preferred to dynamic buffers, provided static buffers will work. Recent versions of Direct3D have introduced new features to lessen the need for dynamic buffers. For instance:

1.    Simple animations may be done in a vertex shader.

2.    It is possible, through render to texture or compute shaders and vertex texture fetch functionality, to implement a wave simulation like the one described above that runs completely on the GPU.

3.    The geometry shader provides the ability for the GPU to create or destroy primitives, a task that would normally need to be done on the CPU without a geometry shader.

4.    The tessellation stages can tessellate geometry on the GPU, a task that would normally need to be done on the CPU without hardware tessellation.

Index buffers can be dynamic, too. However, in the “Land and Waves” demo, the triangle topology remains constant and only the vertex heights change; therefore, only the vertex buffer needs to be dynamic.

The “Waves” demo for this chapter uses a dynamic vertex buffer to implement a simple wave simulation like the one described at the beginning of this section. For this book, we are not concerned with the actual algorithm details for the wave simulation (see [Lengyel02] for that), but more with the process, so as to illustrate dynamic buffers: update the simulation on the CPU and then update the vertex data using an upload buffer.



We mention again that this demo could be implemented on the GPU using more advanced methods such as render to texture functionality or the compute shader, and vertex texture fetch. Because we have not covered these topics yet, we do the wave simulation on the CPU and update the new vertices using dynamic vertex buffers.


1.    Waiting for the GPU to finish executing all the commands in the queue every frame is inefficient because it causes both the CPU and GPU to idle at some point. A more efficient technique is to create frame resources—a circular array of the resources the CPU needs to modify each frame. This way, the CPU does not need to wait for the GPU to finish before moving on to the next frame; the CPU will just work with the next available (i.e., not in use by GPU) frame resource. If the CPU is always processing frames faster than the GPU, then eventually the CPU will have to wait at some point for the GPU to catch up, but this is the desired situation, as the GPU is being fully utilized; the extra CPU cycles can always be used for other parts of the game such as AI, physics, and game play logic.

2.    We can get a handle to the first descriptor in a heap with the ID3D12DescriptorHeap::GetCPUDescriptorHandleForHeapStart method. We can get the descriptor size (which depends on the hardware and descriptor type) with the ID3D12Device::GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE type) method. Once we know the descriptor increment size, we can use one of the two CD3DX12_CPU_DESCRIPTOR_HANDLE::Offset methods to offset the handle by n descriptors:

// Specify the number of descriptors to offset times the descriptor
// increment size:
CD3DX12_CPU_DESCRIPTOR_HANDLE handle = mCbvHeap->GetCPUDescriptorHandleForHeapStart();
handle.Offset(n * mCbvSrvDescriptorSize);

// Or equivalently, specify the number of descriptors to offset,
// followed by the descriptor increment size:
CD3DX12_CPU_DESCRIPTOR_HANDLE handle = mCbvHeap->GetCPUDescriptorHandleForHeapStart();
handle.Offset(n, mCbvSrvDescriptorSize);

The CD3DX12_GPU_DESCRIPTOR_HANDLE type has the same Offset methods.

3.    A root signature defines what resources need to be bound to the pipeline before issuing a draw call and how those resources get mapped to shader input registers. What resources need to be bound depends on what resources the bound shader programs expect. When the PSO is created, the root signature and shader program combination is validated. A root signature is specified as an array of root parameters. A root parameter can be a descriptor table, root descriptor, or root constant. A descriptor table specifies a contiguous range of descriptors in a heap. A root descriptor is used to bind a descriptor directly in the root signature (it does not need to be in a heap). Root constants are used to bind constant values directly in the root signature. For performance, there is a limit of sixty-four DWORDs that can be put in a root signature. Descriptor tables cost one DWORD each, root descriptors cost two DWORDs each, and root constants cost one DWORD for each 32-bit constant. The hardware automatically saves a snapshot of the root arguments for each draw call. Thus, we are safe to change root arguments per draw call; however, we should also try to keep root signatures small so there is less memory to copy.

4.    Dynamic vertex buffers are used when the contents of a vertex buffer needs to be updated frequently at runtime (e.g., every frame or every 1/30th of a second). We can use an UploadBuffer to implement dynamic vertex buffers, but instead of storing an array of constant buffers, we store an array of vertices. Because we need to upload the new contents from the CPU to the wave’s dynamic vertex buffer every frame, the dynamic vertex buffer needs to be a frame resource. There is some overhead when using dynamic vertex buffers, as the new data must be transferred from CPU memory back up to GPU memory. Therefore, static vertex buffers should be preferred to dynamic vertex buffers, provided static vertex buffers will work. Recent versions of Direct3D have introduced new features to lessen the need for dynamic buffers.


1.    Modify the “Shapes” demo to use GeometryGenerator::CreateGeosphere instead of GeometryGenerator::CreateSphere. Try with 0, 1, 2, and 3 subdivision levels.

2.    Modify the “Shapes” demo to use sixteen root constants to set the per-object world matrix instead of a descriptor table.

3.    On the DVD, there is a file called Models/Skull.txt. This file contains the vertex and index lists needed to render the skull in Figure 7.11. Study the file using a text editor like Notepad, and modify the “Shapes” demo to load and render the skull mesh.


Figure 7.11.  Rendered output of Exercise 3.