GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation

by Matt Pharr; Randima Fernando
Edition: 1st
Format: Hardcover
Pub. Date: 2005-03-03
Publisher(s): Addison-Wesley Professional
List Price: $74.99

Summary

More useful techniques, tips, and tricks for harnessing the power of the new generation of powerful GPUs.

Author Biography

Matt Pharr is a software engineer at NVIDIA. Matt is also the coauthor of the book Physically Based Rendering: From Theory to Implementation (Morgan Kaufmann, 2004).

Randima (Randy) Fernando is Manager of Developer Education at NVIDIA.



Table of Contents

Foreword xxix
Preface xxxi
Contributors xxxv
PART I GEOMETRIC COMPLEXITY
1(136)
Toward Photorealism in Virtual Botany
7(20)
David Whatley
Scene Management
7(4)
The Planting Grid
8(1)
Planting Strategy
9(1)
Real-Time Optimization
10(1)
The Grass Layer
11(6)
Simulating Alpha Transparency via Dissolve
13(2)
Variation
15(1)
Lighting
15(2)
Wind
17(1)
The Ground Clutter Layer
17(1)
The Tree and Shrub Layers
18(2)
Shadowing
20(2)
Post-Processing
22(2)
Sky Dome Blooming
23(1)
Full-Scene Glow
24(1)
Conclusion
24(1)
References
24(3)
Terrain Rendering Using GPU-Based Geometry Clipmaps
27(20)
Arul Asirvatham
Hugues Hoppe
Review of Geometry Clipmaps
27(3)
Overview of GPU Implementation
30(2)
Data Structures
31(1)
Clipmap Size
31(1)
Rendering
32(7)
Active Levels
32(1)
Vertex and Index Buffers
33(2)
View Frustum Culling
35(1)
DrawPrimitive Calls
35(1)
The Vertex Shader
36(2)
The Pixel Shader
38(1)
Update
39(4)
Upsampling
40(2)
Residuals
42(1)
Normal Map
42(1)
Results and Discussion
43(1)
Summary and Improvements
43(1)
Vertex Textures
44(1)
Eliminating Normal Maps
44(1)
Memory-Free Terrain Synthesis
44(1)
References
44(3)
Inside Geometry Instancing
47(22)
Francesco Carucci
Why Geometry Instancing?
48(1)
Definitions
49(4)
Geometry Packet
49(1)
Instance Attributes
49(1)
Geometry Instance
50(1)
Render and Texture Context
50(1)
Geometry Batch
50(3)
Implementation
53(12)
Static Batching
54(2)
Dynamic Batching
56(1)
Vertex Constants Instancing
57(4)
Batching with the Geometry Instancing API
61(4)
Conclusion
65(2)
References
67(2)
Segment Buffering
69(6)
Jon Olick
The Problem Space
69(1)
The Solution
70(1)
The Method
71(1)
Segment Buffering, Step 1
71(1)
Segment Buffering, Step 2
71(1)
Segment Buffering, Step 3
72(1)
Improving the Technique
72(1)
Conclusion
72(1)
References
73(2)
Optimizing Resource Management with Multistreaming
75(16)
Oliver Hoeller
Kurt Pelzer
Overview
76(1)
Implementation
77(12)
Multistreaming with DirectX 9.0
78(3)
Resource Management
81(2)
Processing Vertices
83(6)
Conclusion
89(1)
References
90(1)
Hardware Occlusion Queries Made Useful
91(18)
Michael Wimmer
Jiri Bittner
Introduction
91(1)
For Which Scenes Are Occlusion Queries Effective?
92(1)
What Is Occlusion Culling?
93(1)
Hierarchical Stop-and-Wait Method
94(3)
The Naive Algorithm, or Why Use Hierarchies at All?
94(1)
Hierarchies to the Rescue!
95(1)
Hierarchical Algorithm
95(1)
Problem 1: Stalls
96(1)
Problem 2: Query Overhead
97(1)
Coherent Hierarchical Culling
97(8)
Idea 1: Being Smart and Guessing
97(1)
Idea 2: Pull Up, Pull Up
98(1)
Algorithm
99(1)
Implementation Details
100(3)
Why Are There Fewer Stalls?
103(1)
Why Are There Fewer Queries?
104(1)
How to Traverse the Hierarchy
104(1)
Optimizations
105(1)
Querying with Actual Geometry
105(1)
Z-Only Rendering Pass
105(1)
Approximate Visibility
105(1)
Conservative Visibility Testing
106(1)
Conclusion
106(2)
References
108(1)
Adaptive Tessellation of Subdivision Surfaces with Displacement Mapping
109(14)
Michael Bunnell
Subdivision Surfaces
109(10)
Some Definitions
110(1)
Catmull-Clark Subdivisions
110(1)
Using Subdivision for Tessellation
111(1)
Patching the Surface
112(2)
The GPU Tessellation Algorithm
114(4)
Watertight Tessellation
118(1)
Displacement Mapping
119(3)
Changing the Flatness Test
120(1)
Shading Using Normal Mapping
120(2)
Conclusion
122(1)
References
122(1)
Per-Pixel Displacement Mapping with Distance Functions
123(14)
William Donnelly
Introduction
123(2)
Previous Work
125(1)
The Distance-Mapping Algorithm
126(4)
Arbitrary Meshes
129(1)
Computing the Distance Map
130(1)
The Shaders
130(2)
The Vertex Shader
130(1)
The Fragment Shader
130(2)
A Note on Filtering
132(1)
Results
132(2)
Conclusion
134(1)
References
135(2)
PART II SHADING, LIGHTING, AND SHADOWS
137(170)
Deferred Shading in S.T.A.L.K.E.R.
143(24)
Oles Shishkovtsov
Introduction
143(2)
The Myths
145(2)
Optimizations
147(7)
What to Optimize
147(1)
Lighting Optimizations
148(3)
G-Buffer-Creation Optimizations
151(2)
Shadowing Optimizations
153(1)
Improving Quality
154(4)
The Power of "Virtual Position"
155(1)
Ambient Occlusion
156(1)
Materials and Surface-Light Interaction
157(1)
Antialiasing
158(4)
Efficient Tone Mapping
161(1)
Dealing with Transparency
162(1)
Things We Tried but Did Not Include in the Final Code
162(2)
Elevation Maps
163(1)
Real-Time Global Illumination
163(1)
Conclusion
164(1)
References
165(2)
Real-Time Computation of Dynamic Irradiance Environment Maps
167(10)
Gary King
Irradiance Environment Maps
167(3)
Spherical Harmonic Convolution
170(2)
Mapping to the GPU
172(3)
Spatial to Frequency Domain
172(1)
Convolution and Back Again
173(2)
Further Work
175(1)
Conclusion
176(1)
References
176(1)
Approximate Bidirectional Texture Functions
177(12)
Jan Kautz
Introduction
177(2)
Acquisition
179(2)
Setup and Acquisition
179(1)
Assembling the Shading Map
179(2)
Rendering
181(3)
Detailed Algorithm
181(1)
Real-Time Rendering
182(2)
Results
184(3)
Discussion
186(1)
Conclusion
187(1)
References
187(2)
Tile-Based Texture Mapping
189(12)
Li-Yi Wei
Our Approach
191(1)
Texture Tile Construction
191(1)
Texture Tile Packing
192(3)
Texture Tile Mapping
195(2)
Mipmap Issues
197(1)
Conclusion
198(1)
References
199(2)
Implementing the mental images Phenomena Renderer on the GPU
201(22)
Martin-Karl Lefrancois
Introduction
201(1)
Shaders and Phenomena
202(3)
Implementing Phenomena Using Cg
205(16)
The Cg Vertex Program and the Varying Parameters
205(2)
The main() Entry Point for Fragment Shaders
207(1)
The General Shader Interfaces
207(1)
Example of a Simple Shader
208(3)
Global State Variables
211(1)
Light Shaders
211(4)
Texture Shaders
215(1)
Bump Mapping
216(1)
Environment and Volume Shaders
217(1)
Shaders Returning Structures
218(2)
Rendering Hair
220(1)
Putting It All Together
220(1)
Conclusion
221(1)
References
222(1)
Dynamic Ambient Occlusion and Indirect Lighting
223(12)
Michael Bunnell
Surface Elements
223(2)
Ambient Occlusion
225(6)
The Multipass Shadowing Algorithm
226(2)
Improving Performance
228(3)
Indirect Lighting and Area Lights
231(1)
Conclusion
232(1)
References
233(2)
Blueprint Rendering and "Sketchy Drawings"
235(18)
Marc Nienhaus
Jurgen Dollner
Basic Principles
236(2)
Intermediate Rendering Results
236(1)
Edge Enhancement
236(1)
Depth Sprite Rendering
237(1)
Blueprint Rendering
238(6)
Depth Peeling
238(3)
Extracting Visible and Nonvisible Edges
241(1)
Composing Blueprints
241(1)
Depth Masking
242(2)
Visualizing Architecture Using Blueprint Rendering
244(1)
Sketchy Rendering
244(7)
Edges and Color Patches
245(1)
Applying Uncertainty
245(2)
Adjusting Depth
247(1)
Variations of Sketchy Drawing
247(1)
Controlling Uncertainty
248(2)
Reducing the Shower-Door Effect
250(1)
Conclusion
251(1)
References
252(1)
Accurate Atmospheric Scattering
253(16)
Sean O'Neil
Introduction
253(1)
Solving the Scattering Equations
254(4)
Rayleigh Scattering vs. Mie Scattering
255(1)
The Phase Function
256(1)
The Out-Scattering Equation
256(1)
The In-Scattering Equation
257(1)
The Surface-Scattering Equation
257(1)
Making It Real-Time
258(2)
Squeezing It into a Shader
260(2)
Eliminating One Dimension
260(1)
Eliminating the Other Dimension
261(1)
Implementing the Scattering Shaders
262(3)
The Vertex Shader
262(2)
The Fragment Shader
264(1)
Adding High-Dynamic-Range Rendering
265(1)
Conclusion
266(1)
References
267(2)
Efficient Soft-Edged Shadows Using Pixel Shader Branching
269(14)
Yury Uralsky
Current Shadowing Techniques
270(1)
Soft Shadows with a Single Shadow Map
271(10)
Blurring Hard-Edged Shadows
271(3)
Improving Efficiency
274(3)
Implementation Details
277(4)
Conclusion
281(1)
References
282(1)
Using Vertex Texture Displacement for Realistic Water Rendering
283(12)
Yuri Kryachko
Water Models
283(1)
Implementation
284(10)
Water Surface Model
284(1)
Implementation Details
285(1)
Sampling Height Maps
286(2)
Quality Improvements and Optimizations
288(4)
Rendering Local Perturbations
292(2)
Conclusion
294(1)
References
294(1)
Generic Refraction Simulation
295(12)
Tiago Sousa
Basic Technique
296(1)
Refraction Mask
297(3)
Examples
300(5)
Water Simulation
300(3)
Glass Simulation
303(2)
Conclusion
305(1)
References
305(2)
PART III HIGH-QUALITY RENDERING
307(144)
Fast Third-Order Texture Filtering
313(18)
Christian Sigg
Markus Hadwiger
Higher-Order Filtering
314(1)
Fast Recursive Cubic Convolution
315(5)
Mipmapping
320(4)
Derivative Reconstruction
324(3)
Conclusion
327(1)
References
328(3)
High-Quality Antialiased Rasterization
331(14)
Dan Wexler
Eric Enderton
Overview
331(3)
Downsampling
334(2)
Comparison to Existing Hardware and Software
334(2)
Downsampling on the GPU
336(1)
Padding
336(1)
Filter Details
337(1)
Two-Pass Separable Filtering
338(1)
Tiling and Accumulation
339(1)
The Code
339(5)
The Rendering Loop
340(1)
The Downsample Class
341(2)
Implementation Details
343(1)
Conclusion
344(1)
References
344(1)
Fast Prefiltered Lines
345(16)
Eric Chan
Fredo Durand
Why Sharp Lines Look Bad
345(2)
Bandlimiting the Signal
347(2)
Prefiltering
347(2)
The Preprocess
349(2)
Runtime
351(4)
Line Setup (CPU)
352(1)
Table Lookups (GPU)
353(2)
Implementation Issues
355(1)
Drawing Fat Lines
355(1)
Compositing Multiple Lines
355(1)
Examples
356(2)
Conclusion
358(1)
References
359(2)
Hair Animation and Rendering in the Nalu Demo
361(20)
Hubert Nguyen
William Donnelly
Hair Geometry
362(4)
Layout and Growth
362(1)
Controlling the Hair
362(2)
Data Flow
364(1)
Tessellation
364(1)
Interpolation
364(2)
Dynamics and Collisions
366(3)
Constraints
366(1)
Collisions
367(1)
Fins
368(1)
Hair Shading
369(9)
A Real-Time Reflectance Model for Hair
369(6)
Real-Time Volumetric Shadows in Hair
375(3)
Conclusion and Future Work
378(2)
References
380(1)
Using Lookup Tables to Accelerate Color Transformations
381(12)
Jeremy Selan
Lookup Table Basics
381(5)
One-Dimensional LUTs
382(1)
Three-Dimensional LUTs
383(2)
Interpolation
385(1)
Implementation
386(6)
Strategy for Mapping LUTs to the GPU
386(1)
Cg Shader
386(3)
System Integration
389(1)
Extending 3D LUTs for Use with High-Dynamic-Range Imagery
390(2)
Conclusion
392(1)
References
392(1)
GPU Image Processing in Apple's Motion
393(16)
Pete Warden
Design
393(4)
Loves and Loathings
394(2)
Pick a Language
396(1)
CPU Fallback
396(1)
Implementation
397(9)
GPU Resource Limits
397(2)
Division by Zero
399(1)
Loss of Vertex Components
400(1)
Bilinear Filtering
400(5)
High-Precision Storage
405(1)
Debugging
406(1)
Conclusion
407(1)
References
408(1)
Implementing Improved Perlin Noise
409(8)
Simon Green
Random but Smooth
409(1)
Storage vs. Computation
410(1)
Implementation Details
411(4)
Optimization
415(1)
Conclusion
415(1)
References
416(1)
Advanced High-Quality Filtering
417(20)
Justin Novosad
Implementing Filters on GPUs
417(5)
Accessing Image Samples
418(1)
Convolution Filters
419(3)
The Problem of Digital Image Resampling
422(8)
Background
423(1)
Antialiasing
423(4)
Image Reconstruction
427(3)
Shock Filtering: A Method for Deblurring Images
430(3)
Filter Implementation Tips
433(1)
Advanced Applications
433(1)
Time Warping
433(1)
Motion Blur Removal
434(1)
Adaptive Texture Filtering
434(1)
Conclusion
434(1)
References
435(2)
Mipmap-Level Measurement
437(14)
Iain Cantlay
Which Mipmap Level Is Visible?
438(1)
GPU to the Rescue
439(8)
Counting Pixels
439(3)
Practical Considerations in an Engine
442(3)
Extensions
445(2)
Sample Results
447(1)
Conclusion
448(1)
References
449(2)
PART IV GENERAL-PURPOSE COMPUTATION ON GPUS: A PRIMER
451(140)
Streaming Architectures and Technology Trends
457(14)
John Owens
Technology Trends
457(4)
Core Technology Trends
458(1)
Consequences
458(3)
Keys to High-Performance Computing
461(3)
Methods for Efficient Computation
462(1)
Methods for Efficient Communication
462(1)
Contrast to CPUs
463(1)
Stream Computation
464(4)
The Stream Programming Model
464(2)
Building a Stream Processor
466(2)
The Future and Challenges
468(2)
Challenge: Technology Trends
468(1)
Challenge: Power Management
468(1)
Challenge: Supporting More Programmability and Functionality
469(1)
Challenge: GPU Functionality Subsumed by CPU (or Vice Versa)?
470(1)
References
470(1)
The GeForce 6 Series GPU Architecture
471(22)
Emmett Kilgariff
Randima Fernando
How the GPU Fits into the Overall Computer System
471(2)
Overall System Architecture
473(8)
Functional Block Diagram for Graphics Operations
473(5)
Functional Block Diagram for Non-Graphics Operations
478(3)
GPU Features
481(7)
Fixed-Function Features
481(2)
Shader Model 3.0 Programming Model
483(5)
Supported Data Storage Formats
488(1)
Performance
488(2)
Achieving Optimal Performance
490(1)
Use Z-Culling Aggressively
490(1)
Exploit Texture Math When Loading Data
490(1)
Use Branching in Fragment Programs Judiciously
490(1)
Use fp16 Intermediate Values Wherever Possible
491(1)
Conclusion
491(2)
Mapping Computational Concepts to GPUs
493(16)
Mark Harris
The Importance of Data Parallelism
493(4)
What Kinds of Computation Map Well to GPUs?
494(1)
Example: Simulation on a Grid
495(1)
Stream Communication: Gather vs. Scatter
496(1)
An Inventory of GPU Computational Resources
497(3)
Programmable Parallel Processors
497(3)
CPU-GPU Analogies
500(3)
Streams: GPU Textures = CPU Arrays
500(1)
Kernels: GPU Fragment Programs = CPU "Inner Loops"
500(1)
Render-to-Texture = Feedback
501(1)
Geometry Rasterization = Computation Invocation
501(1)
Texture Coordinates = Computational Domain
501(1)
Vertex Coordinates = Computational Range
502(1)
Reductions
502(1)
From Analogies to Implementation
503(2)
Putting It All Together: A Basic GPGPU Framework
503(2)
A Simple Example
505(3)
Conclusion
508(1)
References
508(1)
Taking the Plunge into GPU Computing
509(12)
Ian Buck
Choosing a Fast Algorithm
509(4)
Locality, Locality, Locality
510(1)
Letting Computation Rule
511(1)
Considering Download and Readback
512(1)
Understanding Floating Point
513(2)
Address Calculation
514(1)
Implementing Scatter
515(3)
Converting to Gather
515(1)
Address Sorting
516(2)
Rendering Points
518(1)
Conclusion
518(1)
References
519(2)
Implementing Efficient Parallel Data Structures on GPUs
521(26)
Aaron Lefohn
Joe Kniss
John Owens
Programming with Streams
521(3)
The GPU Memory Model
524(4)
Memory Hierarchy
524(1)
GPU Stream Types
525(2)
GPU Kernel Memory Access
527(1)
GPU-Based Data Structures
528(12)
Multidimensional Arrays
528(6)
Structures
534(1)
Sparse Data Structures
535(5)
Performance Considerations
540(3)
Dependent Texture Reads
540(1)
Computational Frequency
541(1)
Pbuffer Survival Guide
541(2)
Conclusion
543(1)
References
544(3)
GPU Flow-Control Idioms
547(10)
Mark Harris
Ian Buck
Flow-Control Challenges
547(2)
Basic Flow-Control Strategies
549(5)
Predication
549(1)
Moving Branching up the Pipeline
549(1)
Z-Cull
550(3)
Branching Instructions
553(1)
Choosing a Branching Mechanism
553(1)
Data-Dependent Looping with Occlusion Queries
554(1)
Conclusion
555(2)
GPU Program Optimization
557(16)
Cliff Woolley
Data-Parallel Computing
557(4)
Instruction-Level Parallelism
558(2)
Data-Level Parallelism
560(1)
Computational Frequency
561(7)
Precomputation of Loop Invariants
563(1)
Precomputation Using Lookup Tables
564(2)
Avoid Inner-Loop Branching
566(1)
The Swizzle Operator
566(2)
Profiling and Load Balancing
568(2)
Conclusion
570(1)
References
570(3)
Stream Reduction Operations for GPGPU Applications
573(18)
Daniel Horn
Filtering Through Compaction
574(5)
Running Sum Scan
574(1)
Scatter Through Search/Gather
575(4)
Filtering Performance
579(1)
Motivation: Collision Detection
579(4)
Filtering for Subdivision Surfaces
583(4)
Subdivision on Streaming Architectures
584(3)
Conclusion
587(1)
References
587(4)
PART V IMAGE-ORIENTED COMPUTING
591(100)
Octree Textures on the GPU
595(20)
Sylvain Lefebvre
Samuel Hornus
Fabrice Neyret
A GPU-Accelerated Hierarchical Structure: The N³-Tree
597(5)
Definition
597(1)
Implementation
598(4)
Application 1: Painting on Meshes
602(9)
Creating the Octree
603(1)
Painting
604(1)
Rendering
604(3)
Converting the Octree Texture to a Standard 2D Texture
607(4)
Application 2: Surface Simulation
611(1)
Conclusion
612(1)
References
613(2)
High-Quality Global Illumination Rendering Using Rasterization
615(20)
Toshiya Hachisuka
Global Illumination via Rasterization
616(1)
Overview of Final Gathering
617(4)
Two-Pass Methods
617(1)
Final Gathering
618(1)
Problems with Two-Pass Methods
619(2)
Final Gathering via Rasterization
621(4)
Clustering of Final Gathering Rays
621(2)
Ray Casting as Multiple Parallel Projection
623(2)
Implementation Details
625(2)
Initialization
625(1)
Depth Peeling
626(1)
Sampling
627(1)
Performance
627(1)
A Global Illumination Renderer on the GPU
627(5)
The First Pass
628(1)
Generating Visible Points Data
628(1)
The Second Pass
629(1)
Additional Solutions
629(3)
Conclusion
632(1)
References
632(3)
Global Illumination Using Progressive Refinement Radiosity
635(14)
Greg Coombe
Mark Harris
Radiosity Foundations
636(2)
Progressive Refinement
637(1)
GPU Implementation
638(5)
Visibility Using Hemispherical Projection
639(2)
Form Factor Computation
641(2)
Choosing the Next Shooter
643(1)
Adaptive Subdivision
643(2)
Texture Quadtree
644(1)
Quadtree Subdivision
644(1)
Performance
645(1)
Conclusion
645(2)
References
647(2)
Computer Vision on the GPU
649(18)
James Fung
Introduction
649(1)
Implementation Framework
650(1)
Application Examples
651(13)
Using Sequences of Fragment Programs for Computer Vision
651(4)
Summation Operations
655(3)
Systems of Equations for Creating Image Panoramas
658(3)
Feature Vector Computations
661(3)
Parallel Computer Vision Processing
664(1)
Conclusion
664(1)
References
665(2)
Deferred Filtering: Rendering from Difficult Data Formats
667(10)
Joe Kniss
Aaron Lefohn
Nathaniel Fout
Introduction
667(1)
Why Defer?
668(1)
Deferred Filtering Algorithm
669(4)
Why It Works
673(1)
Conclusions: When to Defer
673(1)
References
674(3)
Conservative Rasterization
677(14)
Jon Hasselgren
Tomas Akenine-Moller
Lennart Ohlsson
Problem Definition
678(1)
Two Conservative Algorithms
679(7)
Clip Space
681(1)
The First Algorithm
681(2)
The Second Algorithm
683(3)
Robustness Issues
686(1)
Conservative Depth
687(2)
Results and Conclusions
689(1)
References
690(1)
PART VI SIMULATION AND NUMERICAL ALGORITHMS
691(94)
GPU Computing for Protein Structure Prediction
695(8)
Paulius Micikevicius
Introduction
695(2)
The Floyd-Warshall Algorithm and Distance-Bound Smoothing
697(1)
GPU Implementation
698(3)
Dynamic Updates
698(1)
Indexing Data Textures
698(1)
The Triangle Approach
699(1)
Vectorization
699(2)
Experimental Results
701(1)
Conclusions and Further Work
701(1)
References
702(1)
A GPU Framework for Solving Systems of Linear Equations
703(16)
Jens Kruger
Rudiger Westermann
Overview
703(1)
Representation
704(4)
The "Single Float" Representation
704(1)
Vectors
704(2)
Matrices
706(2)
Operations
708(6)
Vector Arithmetic
709(1)
Vector Reduce
709(1)
Matrix-Vector Product
710(2)
Putting It All Together
712(1)
Conjugate Gradient Solver
713(1)
A Sample Partial Differential Equation
714(4)
The Crank-Nicholson Scheme
716(2)
Conclusion
718(1)
References
718(1)
Options Pricing on the GPU
719(14)
Craig Kolb
Matt Pharr
What Are Options?
719(2)
The Black-Scholes Model
721(4)
Lattice Models
725(5)
The Binomial Model
725(1)
Pricing European Options
726(4)
Conclusion
730(1)
References
731(2)
Improved GPU Sorting
733(14)
Peter Kipfer
Rudiger Westermann
Sorting Algorithms
733(1)
A Simple First Approach
734(1)
Fast Sorting
735(3)
Implementing Odd-Even Merge Sort
737(1)
Using All GPU Resources
738(7)
Implementing Bitonic Merge Sort
743(2)
Conclusion
745(1)
References
746(1)
Flow Simulation with Complex Boundaries
747(18)
Wei Li
Zhe Fan
Xiaoming Wei
Arie Kaufman
Introduction
747(1)
The Lattice Boltzmann Method
748(1)
GPU-Based LBM
749(4)
Algorithm Overview
749(2)
Packing
751(1)
Streaming
752(1)
GPU-Based Boundary Handling
753(6)
GPU-Based Voxelization
754(2)
Periodic Boundaries
756(1)
Outflow Boundaries
756(1)
Obstacle Boundaries
757(2)
Visualization
759(1)
Experimental Results
760(1)
Conclusion
761(2)
References
763(2)
Medical Image Reconstruction with the FFT
765(20)
Thilaka Sumanaweera
Donald Liu
Background
765(1)
The Fourier Transform
766(1)
The FFT Algorithm
767(1)
Implementation on the GPU
768(8)
Approach 1: Mostly Loading the Fragment Processor
770(2)
Approach 2: Loading the Vertex Processor, the Rasterizer, and the Fragment Processor
772(3)
Load Balancing
775(1)
Benchmarking Results
775(1)
The FFT in Medical Imaging
776(7)
Magnetic Resonance Imaging
776(2)
Results in MRI
778(2)
Ultrasonic Imaging
780(3)
Conclusion
783(1)
References
784(1)
Index 785

Excerpts

The first volume of GPU Gems was conceived in the spring of 2003, soon after the arrival of the first generation of fully programmable GPUs. The resulting book was released less than a year later and quickly became a best seller, providing a snapshot of the best ideas for making the most of the capabilities of the latest programmable graphics hardware.

GPU programming is a rapidly changing field, and the time is already ripe for a sequel. In the handful of years since programmable graphics processors first became available, they have become faster and more flexible at an incredible pace. Early programmable GPUs supported programmability only at the vertex level, while today complex per-pixel programs are common. A year ago, real-time GPU programs were typically tens of instructions long, while this year's GPUs handle complex programs hundreds of instructions long and still render at interactive rates. Programmable graphics has even transcended the PC and is rapidly spreading to consoles, handheld gaming devices, and mobile phones.

Until recently, performance-conscious developers might have considered writing their GPU programs in assembly language. These days, however, high-level GPU programming languages are ubiquitous. It is extremely rare for developers to bother writing assembly for GPUs anymore, thanks both to improvements in compilers and to the rapidly increasing capabilities of GPUs. (In contrast, it took many more years before game developers switched from writing their games in CPU assembly language to using higher-level languages.)

This sort of rapid change makes a "gems"-style book a natural fit for assembling the state of the art and disseminating it to the developer community. Featuring chapters written by acknowledged experts, GPU Gems 2 provides broad coverage of the most exciting new ideas in the field.

Innovations in graphics hardware and programming environments have inspired further innovations in how to use programmability. While programmable shading has long been a staple of offline software rendering, the advent of programmability on GPUs has led to the invention of a wide variety of new techniques for programmable shading. Going far beyond procedural pattern generation and texture composition, the state of the art of using shaders on GPUs is rapidly breaking completely new ground, leading to novel techniques for animation, lighting, particle systems, and much more.

Indeed, the flexibility and speed of GPUs have fostered considerable interest in doing computations on GPUs that go beyond computer graphics: general-purpose computation on GPUs, or "GPGPU." This volume of the GPU Gems series devotes a significant number of chapters to this new topic, including an overview of GPGPU programming techniques as well as in-depth discussions of a number of representative applications and key algorithms. As GPUs continue to increase in performance more quickly than CPUs, these topics will gain in importance for more and more programmers because GPUs will provide superior results for many computationally intensive applications.

With this background, we sent out a public call for participation in GPU Gems 2. The response was overwhelming: more than 150 chapters were proposed in the short time that submissions were open, covering a variety of topics related to GPU programming. We were able to include only about a third of them in this volume; many excellent submissions could not be included purely because of constraints on the physical size of the book. It was difficult for the editors to whittle down the chapters to the 48 included here, and we would like to thank everyone who submitted proposals.

The accepted chapters went through a rigorous review process in which the book's editors, the authors of other chapters in the same part of the book, and in some cases additional reviewers from NVIDIA ca
