GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation

by Pharr, Matt; Fernando, Randima (Series Editor)

Edition: 1st

ISBN13: 9780321335593

ISBN10: 0321335597

Format: Hardcover

Pub. Date: 2005-03-03

Publisher(s): Addison-Wesley Professional

List Price: $74.99

Summary

More techniques, tips, and tricks for harnessing the power of the new generation of GPUs.

Author Biography

Matt Pharr is a software engineer at NVIDIA. Matt is also the coauthor of the book Physically Based Rendering: From Theory to Implementation (Morgan Kaufmann, 2004).

Randima (Randy) Fernando is Manager of Developer Education at NVIDIA.

Table of Contents

Foreword

xxix

Preface

xxxi

Contributors

xxxv

PART I GEOMETRIC COMPLEXITY

(136)

Toward Photorealism in Virtual Botany

(20)

David Whatley

Scene Management

(4)

The Planting Grid

(1)

Planting Strategy

(1)

Real-Time Optimization

(1)

The Grass Layer

(6)

Simulating Alpha Transparency via Dissolve

(2)

Variation

(1)

Lighting

(2)

Wind

(1)

The Ground Clutter Layer

(1)

The Tree and Shrub Layers

(2)

Shadowing

(2)

Post-Processing

(2)

Sky Dome Blooming

(1)

Full-Scene Glow

(1)

Conclusion

(1)

References

(3)

Terrain Rendering Using GPU-Based Geometry Clipmaps

(20)

Arul Asirvatham

Hugues Hoppe

Review of Geometry Clipmaps

(3)

Overview of GPU Implementation

(2)

Data Structures

(1)

Clipmap Size

(1)

Rendering

(7)

Active Levels

(1)

Vertex and Index Buffers

(2)

View Frustum Culling

(1)

DrawPrimitive Calls

(1)

The Vertex Shader

(2)

The Pixel Shader

(1)

Update

(4)

Upsampling

(2)

Residuals

(1)

Normal Map

(1)

Results and Discussion

(1)

Summary and Improvements

(1)

Vertex Textures

(1)

Eliminating Normal Maps

(1)

Memory-Free Terrain Synthesis

(1)

References

(3)

Inside Geometry Instancing

(22)

Francesco Carucci

Why Geometry Instancing?

(1)

Definitions

(4)

Geometry Packet

(1)

Instance Attributes

(1)

Geometry Instance

(1)

Render and Texture Context

(1)

Geometry Batch

(3)

Implementation

(12)

Static Batching

(2)

Dynamic Batching

(1)

Vertex Constants Instancing

(4)

Batching with the Geometry Instancing API

(4)

Conclusion

(2)

References

(2)

Segment Buffering

(6)

Jon Olick

The Problem Space

(1)

The Solution

(1)

The Method

(1)

Segment Buffering, Step 1

(1)

Segment Buffering, Step 2

(1)

Segment Buffering, Step 3

(1)

Improving the Technique

(1)

Conclusion

(1)

References

(2)

Optimizing Resource Management with Multistreaming

(16)

Oliver Hoeller

Kurt Pelzer

Overview

(1)

Implementation

(12)

Multistreaming with DirectX 9.0

(3)

Resource Management

(2)

Processing Vertices

(6)

Conclusion

(1)

References

(1)

Hardware Occlusion Queries Made Useful

(18)

Michael Wimmer

Jiří Bittner

Introduction

(1)

For Which Scenes Are Occlusion Queries Effective?

(1)

What Is Occlusion Culling?

(1)

Hierarchical Stop-and-Wait Method

(3)

The Naive Algorithm, or Why Use Hierarchies at All?

(1)

Hierarchies to the Rescue!

(1)

Hierarchical Algorithm

(1)

Problem 1: Stalls

(1)

Problem 2: Query Overhead

(1)

Coherent Hierarchical Culling

(8)

Idea 1: Being Smart and Guessing

(1)

Idea 2: Pull Up, Pull Up

(1)

Algorithm

(1)

Implementation Details

100

(3)

Why Are There Fewer Stalls?

103

(1)

Why Are There Fewer Queries?

104

(1)

How to Traverse the Hierarchy

104

(1)

Optimizations

105

(1)

Querying with Actual Geometry

105

(1)

Z-Only Rendering Pass

105

(1)

Approximate Visibility

105

(1)

Conservative Visibility Testing

106

(1)

Conclusion

106

(2)

References

108

(1)

Adaptive Tessellation of Subdivision Surfaces with Displacement Mapping

109

(14)

Michael Bunnell

Subdivision Surfaces

109

(10)

Some Definitions

110

(1)

Catmull-Clark Subdivisions

110

(1)

Using Subdivision for Tessellation

111

(1)

Patching the Surface

112

(2)

The GPU Tessellation Algorithm

114

(4)

Watertight Tessellation

118

(1)

Displacement Mapping

119

(3)

Changing the Flatness Test

120

(1)

Shading Using Normal Mapping

120

(2)

Conclusion

122

(1)

References

122

(1)

Per-Pixel Displacement Mapping with Distance Functions

123

(14)

William Donnelly

Introduction

123

(2)

Previous Work

125

(1)

The Distance-Mapping Algorithm

126

(4)

Arbitrary Meshes

129

(1)

Computing the Distance Map

130

(1)

The Shaders

130

(2)

The Vertex Shader

130

(1)

The Fragment Shader

130

(2)

A Note on Filtering

132

(1)

Results

132

(2)

Conclusion

134

(1)

References

135

(2)

PART II SHADING, LIGHTING, AND SHADOWS

137

(170)

Deferred Shading in S.T.A.L.K.E.R.

143

(24)

Oles Shishkovtsov

Introduction

143

(2)

The Myths

145

(2)

Optimizations

147

(7)

What to Optimize

147

(1)

Lighting Optimizations

148

(3)

G-Buffer-Creation Optimizations

151

(2)

Shadowing Optimizations

153

(1)

Improving Quality

154

(4)

The Power of "Virtual Position"

155

(1)

Ambient Occlusion

156

(1)

Materials and Surface-Light Interaction

157

(1)

Antialiasing

158

(4)

Efficient Tone Mapping

161

(1)

Dealing with Transparency

162

(1)

Things We Tried but Did Not Include in the Final Code

162

(2)

Elevation Maps

163

(1)

Real-Time Global Illumination

163

(1)

Conclusion

164

(1)

References

165

(2)

Real-Time Computation of Dynamic Irradiance Environment Maps

167

(10)

Gary King

Irradiance Environment Maps

167

(3)

Spherical Harmonic Convolution

170

(2)

Mapping to the GPU

172

(3)

Spatial to Frequency Domain

172

(1)

Convolution and Back Again

173

(2)

Further Work

175

(1)

Conclusion

176

(1)

References

176

(1)

Approximate Bidirectional Texture Functions

177

(12)

Jan Kautz

Introduction

177

(2)

Acquisition

179

(2)

Setup and Acquisition

179

(1)

Assembling the Shading Map

179

(2)

Rendering

181

(3)

Detailed Algorithm

181

(1)

Real-Time Rendering

182

(2)

Results

184

(3)

Discussion

186

(1)

Conclusion

187

(1)

References

187

(2)

Tile-Based Texture Mapping

189

(12)

Li-Yi Wei

Our Approach

191

(1)

Texture Tile Construction

191

(1)

Texture Tile Packing

192

(3)

Texture Tile Mapping

195

(2)

Mipmap Issues

197

(1)

Conclusion

198

(1)

References

199

(2)

Implementing the mental images Phenomena Renderer on the GPU

201

(22)

Martin-Karl Lefrançois

Introduction

201

(1)

Shaders and Phenomena

202

(3)

Implementing Phenomena Using Cg

205

(16)

The Cg Vertex Program and the Varying Parameters

205

(2)

The main() Entry Point for Fragment Shaders

207

(1)

The General Shader Interfaces

207

(1)

Example of a Simple Shader

208

(3)

Global State Variables

211

(1)

Light Shaders

211

(4)

Texture Shaders

215

(1)

Bump Mapping

216

(1)

Environment and Volume Shaders

217

(1)

Shaders Returning Structures

218

(2)

Rendering Hair

220

(1)

Putting It All Together

220

(1)

Conclusion

221

(1)

References

222

(1)

Dynamic Ambient Occlusion and Indirect Lighting

223

(12)

Michael Bunnell

Surface Elements

223

(2)

Ambient Occlusion

225

(6)

The Multipass Shadowing Algorithm

226

(2)

Improving Performance

228

(3)

Indirect Lighting and Area Lights

231

(1)

Conclusion

232

(1)

References

233

(2)

Blueprint Rendering and "Sketchy Drawings"

235

(18)

Marc Nienhaus

Jürgen Döllner

Basic Principles

236

(2)

Intermediate Rendering Results

236

(1)

Edge Enhancement

236

(1)

Depth Sprite Rendering

237

(1)

Blueprint Rendering

238

(6)

Depth Peeling

238

(3)

Extracting Visible and Nonvisible Edges

241

(1)

Composing Blueprints

241

(1)

Depth Masking

242

(2)

Visualizing Architecture Using Blueprint Rendering

244

(1)

Sketchy Rendering

244

(7)

Edges and Color Patches

245

(1)

Applying Uncertainty

245

(2)

Adjusting Depth

247

(1)

Variations of Sketchy Drawing

247

(1)

Controlling Uncertainty

248

(2)

Reducing the Shower-Door Effect

250

(1)

Conclusion

251

(1)

References

252

(1)

Accurate Atmospheric Scattering

253

(16)

Sean O'Neil

Introduction

253

(1)

Solving the Scattering Equations

254

(4)

Rayleigh Scattering vs. Mie Scattering

255

(1)

The Phase Function

256

(1)

The Out-Scattering Equation

256

(1)

The In-Scattering Equation

257

(1)

The Surface-Scattering Equation

257

(1)

Making It Real-Time

258

(2)

Squeezing It into a Shader

260

(2)

Eliminating One Dimension

260

(1)

Eliminating the Other Dimension

261

(1)

Implementing the Scattering Shaders

262

(3)

The Vertex Shader

262

(2)

The Fragment Shader

264

(1)

Adding High-Dynamic-Range Rendering

265

(1)

Conclusion

266

(1)

References

267

(2)

Efficient Soft-Edged Shadows Using Pixel Shader Branching

269

(14)

Yury Uralsky

Current Shadowing Techniques

270

(1)

Soft Shadows with a Single Shadow Map

271

(10)

Blurring Hard-Edged Shadows

271

(3)

Improving Efficiency

274

(3)

Implementation Details

277

(4)

Conclusion

281

(1)

References

282

(1)

Using Vertex Texture Displacement for Realistic Water Rendering

283

(12)

Yuri Kryachko

Water Models

283

(1)

Implementation

284

(10)

Water Surface Model

284

(1)

Implementation Details

285

(1)

Sampling Height Maps

286

(2)

Quality Improvements and Optimizations

288

(4)

Rendering Local Perturbations

292

(2)

Conclusion

294

(1)

References

294

(1)

Generic Refraction Simulation

295

(12)

Tiago Sousa

Basic Technique

296

(1)

Refraction Mask

297

(3)

Examples

300

(5)

Water Simulation

300

(3)

Glass Simulation

303

(2)

Conclusion

305

(1)

References

305

(2)

PART III HIGH-QUALITY RENDERING

307

(144)

Fast Third-Order Texture Filtering

313

(18)

Christian Sigg

Markus Hadwiger

Higher-Order Filtering

314

(1)

Fast Recursive Cubic Convolution

315

(5)

Mipmapping

320

(4)

Derivative Reconstruction

324

(3)

Conclusion

327

(1)

References

328

(3)

High-Quality Antialiased Rasterization

331

(14)

Dan Wexler

Eric Enderton

Overview

331

(3)

Downsampling

334

(2)

Comparison to Existing Hardware and Software

334

(2)

Downsampling on the GPU

336

(1)

Padding

336

(1)

Filter Details

337

(1)

Two-Pass Separable Filtering

338

(1)

Tiling and Accumulation

339

(1)

The Code

339

(5)

The Rendering Loop

340

(1)

The Downsample Class

341

(2)

Implementation Details

343

(1)

Conclusion

344

(1)

References

344

(1)

Fast Prefiltered Lines

345

(16)

Eric Chan

Frédo Durand

Why Sharp Lines Look Bad

345

(2)

Bandlimiting the Signal

347

(2)

Prefiltering

347

(2)

The Preprocess

349

(2)

Runtime

351

(4)

Line Setup (CPU)

352

(1)

Table Lookups (GPU)

353

(2)

Implementation Issues

355

(1)

Drawing Fat Lines

355

(1)

Compositing Multiple Lines

355

(1)

Examples

356

(2)

Conclusion

358

(1)

References

359

(2)

Hair Animation and Rendering in the Nalu Demo

361

(20)

Hubert Nguyen

William Donnelly

Hair Geometry

362

(4)

Layout and Growth

362

(1)

Controlling the Hair

362

(2)

Data Flow

364

(1)

Tessellation

364

(1)

Interpolation

364

(2)

Dynamics and Collisions

366

(3)

Constraints

366

(1)

Collisions

367

(1)

Fins

368

(1)

Hair Shading

369

(9)

A Real-Time Reflectance Model for Hair

369

(6)

Real-Time Volumetric Shadows in Hair

375

(3)

Conclusion and Future Work

378

(2)

References

380

(1)

Using Lookup Tables to Accelerate Color Transformations

381

(12)

Jeremy Selan

Lookup Table Basics

381

(5)

One-Dimensional LUTs

382

(1)

Three-Dimensional LUTs

383

(2)

Interpolation

385

(1)

Implementation

386

(6)

Strategy for Mapping LUTs to the GPU

386

(1)

Cg Shader

386

(3)

System Integration

389

(1)

Extending 3D LUTs for Use with High-Dynamic-Range Imagery

390

(2)

Conclusion

392

(1)

References

392

(1)

GPU Image Processing in Apple's Motion

393

(16)

Pete Warden

Design

393

(4)

Loves and Loathings

394

(2)

Pick a Language

396

(1)

CPU Fallback

396

(1)

Implementation

397

(9)

GPU Resource Limits

397

(2)

Division by Zero

399

(1)

Loss of Vertex Components

400

(1)

Bilinear Filtering

400

(5)

High-Precision Storage

405

(1)

Debugging

406

(1)

Conclusion

407

(1)

References

408

(1)

Implementing Improved Perlin Noise

409

(8)

Simon Green

Random but Smooth

409

(1)

Storage vs. Computation

410

(1)

Implementation Details

411

(4)

Optimization

415

(1)

Conclusion

415

(1)

References

416

(1)

Advanced High-Quality Filtering

417

(20)

Justin Novosad

Implementing Filters on GPUs

417

(5)

Accessing Image Samples

418

(1)

Convolution Filters

419

(3)

The Problem of Digital Image Resampling

422

(8)

Background

423

(1)

Antialiasing

423

(4)

Image Reconstruction

427

(3)

Shock Filtering: A Method for Deblurring Images

430

(3)

Filter Implementation Tips

433

(1)

Advanced Applications

433

(1)

Time Warping

433

(1)

Motion Blur Removal

434

(1)

Adaptive Texture Filtering

434

(1)

Conclusion

434

(1)

References

435

(2)

Mipmap-Level Measurement

437

(14)

Iain Cantlay

Which Mipmap Level Is Visible?

438

(1)

GPU to the Rescue

439

(8)

Counting Pixels

439

(3)

Practical Considerations in an Engine

442

(3)

Extensions

445

(2)

Sample Results

447

(1)

Conclusion

448

(1)

References

449

(2)

PART IV GENERAL-PURPOSE COMPUTATION ON GPUS: A PRIMER

451

(140)

Streaming Architectures and Technology Trends

457

(14)

John Owens

Technology Trends

457

(4)

Core Technology Trends

458

(1)

Consequences

458

(3)

Keys to High-Performance Computing

461

(3)

Methods for Efficient Computation

462

(1)

Methods for Efficient Communication

462

(1)

Contrast to CPUs

463

(1)

Stream Computation

464

(4)

The Stream Programming Model

464

(2)

Building a Stream Processor

466

(2)

The Future and Challenges

468

(2)

Challenge: Technology Trends

468

(1)

Challenge: Power Management

468

(1)

Challenge: Supporting More Programmability and Functionality

469

(1)

Challenge: GPU Functionality Subsumed by CPU (or Vice Versa)?

470

(1)

References

470

(1)

The GeForce 6 Series GPU Architecture

471

(22)

Emmett Kilgariff

Randima Fernando

How the GPU Fits into the Overall Computer System

471

(2)

Overall System Architecture

473

(8)

Functional Block Diagram for Graphics Operations

473

(5)

Functional Block Diagram for Non-Graphics Operations

478

(3)

GPU Features

481

(7)

Fixed-Function Features

481

(2)

Shader Model 3.0 Programming Model

483

(5)

Supported Data Storage Formats

488

(1)

Performance

488

(2)

Achieving Optimal Performance

490

(1)

Use Z-Culling Aggressively

490

(1)

Exploit Texture Math When Loading Data

490

(1)

Use Branching in Fragment Programs Judiciously

490

(1)

Use fp16 Intermediate Values Wherever Possible

491

(1)

Conclusion

491

(2)

Mapping Computational Concepts to GPUs

493

(16)

Mark Harris

The Importance of Data Parallelism

493

(4)

What Kinds of Computation Map Well to GPUs?

494

(1)

Example: Simulation on a Grid

495

(1)

Stream Communication: Gather vs. Scatter

496

(1)

An Inventory of GPU Computational Resources

497

(3)

Programmable Parallel Processors

497

(3)

CPU-GPU Analogies

500

(3)

Streams: GPU Textures = CPU Arrays

500

(1)

Kernels: GPU Fragment Programs = CPU "Inner Loops"

500

(1)

Render-to-Texture = Feedback

501

(1)

Geometry Rasterization = Computation Invocation

501

(1)

Texture Coordinates = Computational Domain

501

(1)

Vertex Coordinates = Computational Range

502

(1)

Reductions

502

(1)

From Analogies to Implementation

503

(2)

Putting It All Together: A Basic GPGPU Framework

503

(2)

A Simple Example

505

(3)

Conclusion

508

(1)

References

508

(1)

Taking the Plunge into GPU Computing

509

(12)

Ian Buck

Choosing a Fast Algorithm

509

(4)

Locality, Locality, Locality

510

(1)

Letting Computation Rule

511

(1)

Considering Download and Readback

512

(1)

Understanding Floating Point

513

(2)

Address Calculation

514

(1)

Implementing Scatter

515

(3)

Converting to Gather

515

(1)

Address Sorting

516

(2)

Rendering Points

518

(1)

Conclusion

518

(1)

References

519

(2)

Implementing Efficient Parallel Data Structures on GPUs

521

(26)

Aaron Lefohn

Joe Kniss

John Owens

Programming with Streams

521

(3)

The GPU Memory Model

524

(4)

Memory Hierarchy

524

(1)

GPU Stream Types

525

(2)

GPU Kernel Memory Access

527

(1)

GPU-Based Data Structures

528

(12)

Multidimensional Arrays

528

(6)

Structures

534

(1)

Sparse Data Structures

535

(5)

Performance Considerations

540

(3)

Dependent Texture Reads

540

(1)

Computational Frequency

541

(1)

Pbuffer Survival Guide

541

(2)

Conclusion

543

(1)

References

544

(3)

GPU Flow-Control Idioms

547

(10)

Mark Harris

Ian Buck

Flow-Control Challenges

547

(2)

Basic Flow-Control Strategies

549

(5)

Predication

549

(1)

Moving Branching up the Pipeline

549

(1)

Z-Cull

550

(3)

Branching Instructions

553

(1)

Choosing a Branching Mechanism

553

(1)

Data-Dependent Looping with Occlusion Queries

554

(1)

Conclusion

555

(2)

GPU Program Optimization

557

(16)

Cliff Woolley

Data-Parallel Computing

557

(4)

Instruction-Level Parallelism

558

(2)

Data-Level Parallelism

560

(1)

Computational Frequency

561

(7)

Precomputation of Loop Invariants

563

(1)

Precomputation Using Lookup Tables

564

(2)

Avoid Inner-Loop Branching

566

(1)

The Swizzle Operator

566

(2)

Profiling and Load Balancing

568

(2)

Conclusion

570

(1)

References

570

(3)

Stream Reduction Operations for GPGPU Applications

573

(18)

Daniel Horn

Filtering Through Compaction

574

(5)

Running Sum Scan

574

(1)

Scatter Through Search/Gather

575

(4)

Filtering Performance

579

(1)

Motivation: Collision Detection

579

(4)

Filtering for Subdivision Surfaces

583

(4)

Subdivision on Streaming Architectures

584

(3)

Conclusion

587

(1)

References

587

(4)

PART V IMAGE-ORIENTED COMPUTING

591

(100)

Octree Textures on the GPU

595

(20)

Sylvain Lefebvre

Samuel Hornus

Fabrice Neyret

A GPU-Accelerated Hierarchical Structure: The N³-Tree

597

(5)

Definition

597

(1)

Implementation

598

(4)

Application 1: Painting on Meshes

602

(9)

Creating the Octree

603

(1)

Painting

604

(1)

Rendering

604

(3)

Converting the Octree Texture to a Standard 2D Texture

607

(4)

Application 2: Surface Simulation

611

(1)

Conclusion

612

(1)

References

613

(2)

High-Quality Global Illumination Rendering Using Rasterization

615

(20)

Toshiya Hachisuka

Global Illumination via Rasterization

616

(1)

Overview of Final Gathering

617

(4)

Two-Pass Methods

617

(1)

Final Gathering

618

(1)

Problems with Two-Pass Methods

619

(2)

Final Gathering via Rasterization

621

(4)

Clustering of Final Gathering Rays

621

(2)

Ray Casting as Multiple Parallel Projection

623

(2)

Implementation Details

625

(2)

Initialization

625

(1)

Depth Peeling

626

(1)

Sampling

627

(1)

Performance

627

(1)

A Global Illumination Renderer on the GPU

627

(5)

The First Pass

628

(1)

Generating Visible Points Data

628

(1)

The Second Pass

629

(1)

Additional Solutions

629

(3)

Conclusion

632

(1)

References

632

(3)

Global Illumination Using Progressive Refinement Radiosity

635

(14)

Greg Coombe

Mark Harris

Radiosity Foundations

636

(2)

Progressive Refinement

637

(1)

GPU Implementation

638

(5)

Visibility Using Hemispherical Projection

639

(2)

Form Factor Computation

641

(2)

Choosing the Next Shooter

643

(1)

Adaptive Subdivision

643

(2)

Texture Quadtree

644

(1)

Quadtree Subdivision

644

(1)

Performance

645

(1)

Conclusion

645

(2)

References

647

(2)

Computer Vision on the GPU

649

(18)

James Fung

Introduction

649

(1)

Implementation Framework

650

(1)

Application Examples

651

(13)

Using Sequences of Fragment Programs for Computer Vision

651

(4)

Summation Operations

655

(3)

Systems of Equations for Creating Image Panoramas

658

(3)

Feature Vector Computations

661

(3)

Parallel Computer Vision Processing

664

(1)

Conclusion

664

(1)

References

665

(2)

Deferred Filtering: Rendering from Difficult Data Formats

667

(10)

Joe Kniss

Aaron Lefohn

Nathaniel Fout

Introduction

667

(1)

Why Defer?

668

(1)

Deferred Filtering Algorithm

669

(4)

Why It Works

673

(1)

Conclusions: When to Defer

673

(1)

References

674

(3)

Conservative Rasterization

677

(14)

Jon Hasselgren

Tomas Akenine-Möller

Lennart Ohlsson

Problem Definition

678

(1)

Two Conservative Algorithms

679

(7)

Clip Space

681

(1)

The First Algorithm

681

(2)

The Second Algorithm

683

(3)

Robustness Issues

686

(1)

Conservative Depth

687

(2)

Results and Conclusions

689

(1)

References

690

(1)

PART VI SIMULATION AND NUMERICAL ALGORITHMS

691

(94)

GPU Computing for Protein Structure Prediction

695

(8)

Paulius Micikevicius

Introduction

695

(2)

The Floyd-Warshall Algorithm and Distance-Bound Smoothing

697

(1)

GPU Implementation

698

(3)

Dynamic Updates

698

(1)

Indexing Data Textures

698

(1)

The Triangle Approach

699

(1)

Vectorization

699

(2)

Experimental Results

701

(1)

Conclusions and Further Work

701

(1)

References

702

(1)

A GPU Framework for Solving Systems of Linear Equations

703

(16)

Jens Krüger

Rüdiger Westermann

Overview

703

(1)

Representation

704

(4)

The "Single Float" Representation

704

(1)

Vectors

704

(2)

Matrices

706

(2)

Operations

708

(6)

Vector Arithmetic

709

(1)

Vector Reduce

709

(1)

Matrix-Vector Product

710

(2)

Putting It All Together

712

(1)

Conjugate Gradient Solver

713

(1)

A Sample Partial Differential Equation

714

(4)

The Crank-Nicholson Scheme

716

(2)

Conclusion

718

(1)

References

718

(1)

Options Pricing on the GPU

719

(14)

Craig Kolb

Matt Pharr

What Are Options?

719

(2)

The Black-Scholes Model

721

(4)

Lattice Models

725

(5)

The Binomial Model

725

(1)

Pricing European Options

726

(4)

Conclusion

730

(1)

References

731

(2)

Improved GPU Sorting

733

(14)

Peter Kipfer

Rüdiger Westermann

Sorting Algorithms

733

(1)

A Simple First Approach

734

(1)

Fast Sorting

735

(3)

Implementing Odd-Even Merge Sort

737

(1)

Using All GPU Resources

738

(7)

Implementing Bitonic Merge Sort

743

(2)

Conclusion

745

(1)

References

746

(1)

Flow Simulation with Complex Boundaries

747

(18)

Wei Li

Zhe Fan

Xiaoming Wei

Arie Kaufman

Introduction

747

(1)

The Lattice Boltzmann Method

748

(1)

GPU-Based LBM

749

(4)

Algorithm Overview

749

(2)

Packing

751

(1)

Streaming

752

(1)

GPU-Based Boundary Handling

753

(6)

GPU-Based Voxelization

754

(2)

Periodic Boundaries

756

(1)

Outflow Boundaries

756

(1)

Obstacle Boundaries

757

(2)

Visualization

759

(1)

Experimental Results

760

(1)

Conclusion

761

(2)

References

763

(2)

Medical Image Reconstruction with the FFT

765

(20)

Thilaka Sumanaweera

Donald Liu

Background

765

(1)

The Fourier Transform

766

(1)

The FFT Algorithm

767

(1)

Implementation on the GPU

768

(8)

Approach 1: Mostly Loading the Fragment Processor

770

(2)

Approach 2: Loading the Vertex Processor, the Rasterizer, and the Fragment Processor

772

(3)

Load Balancing

775

(1)

Benchmarking Results

775

(1)

The FFT in Medical Imaging

776

(7)

Magnetic Resonance Imaging

776

(2)

Results in MRI

778

(2)

Ultrasonic Imaging

780

(3)

Conclusion

783

(1)

References

784

(1)

Index

785

Excerpts

The first volume of GPU Gems was conceived in the spring of 2003, soon after the arrival of the first generation of fully programmable GPUs. The resulting book was released less than a year later and quickly became a best seller, providing a snapshot of the best ideas for making the most of the capabilities of the latest programmable graphics hardware. GPU programming is a rapidly changing field, and the time is already ripe for a sequel.

In the handful of years since programmable graphics processors first became available, they have become faster and more flexible at an incredible pace. Early programmable GPUs supported programmability only at the vertex level, while today complex per-pixel programs are common. A year ago, real-time GPU programs were typically tens of instructions long, while this year's GPUs handle complex programs hundreds of instructions long and still render at interactive rates. Programmable graphics has even transcended the PC and is rapidly spreading to consoles, handheld gaming devices, and mobile phones.

Until recently, performance-conscious developers might have considered writing their GPU programs in assembly language. These days, however, high-level GPU programming languages are ubiquitous. It is extremely rare for developers to bother writing assembly for GPUs anymore, thanks both to improvements in compilers and to the rapidly increasing capabilities of GPUs. (In contrast, it took many more years before game developers switched from writing their games in CPU assembly language to using higher-level languages.)

This sort of rapid change makes a "gems"-style book a natural fit for assembling the state of the art and disseminating it to the developer community. Featuring chapters written by acknowledged experts, GPU Gems 2 provides broad coverage of the most exciting new ideas in the field.

Innovations in graphics hardware and programming environments have inspired further innovations in how to use programmability. While programmable shading has long been a staple of offline software rendering, the advent of programmability on GPUs has led to the invention of a wide variety of new techniques for programmable shading. Going far beyond procedural pattern generation and texture composition, the state of the art of using shaders on GPUs is rapidly breaking completely new ground, leading to novel techniques for animation, lighting, particle systems, and much more.

Indeed, the flexibility and speed of GPUs have fostered considerable interest in doing computations on GPUs that go beyond computer graphics: general-purpose computation on GPUs, or "GPGPU." This volume of the GPU Gems series devotes a significant number of chapters to this new topic, including an overview of GPGPU programming techniques as well as in-depth discussions of a number of representative applications and key algorithms. As GPUs continue to increase in performance more quickly than CPUs, these topics will gain in importance for more and more programmers because GPUs will provide superior results for many computationally intensive applications.

With this background, we sent out a public call for participation in GPU Gems 2. The response was overwhelming: more than 150 chapters were proposed in the short time that submissions were open, covering a variety of topics related to GPU programming. We were able to include only about a third of them in this volume; many excellent submissions could not be included purely because of constraints on the physical size of the book. It was difficult for the editors to whittle down the chapters to the 48 included here, and we would like to thank everyone who submitted proposals. The accepted chapters went through a rigorous review process in which the book's editors, the authors of other chapters in the same part of the book, and in some cases additional reviewers from NVIDIA ca
