CUDA 2.2 beta with zero-copy access
CUDA 2.2 features
A brief overview of CUDA 2.2 beta features:
- Zero-copy support (see this thread for more information)
http://forums.nvidia.com/index.php?showtopic=92290 Cuda 2.2 / Zero-copy access
The CUDA 2.2 release notes describe the new zero-copy feature:
* Zero-copy access to pinned system memory
  + Allows MCP7x and GT200 and later GPUs to use system memory without copying to dedicated (video) memory for significant perf improvement.
MCP7x and GT200 can now DMA main memory directly. It looks like G9x cannot.
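As a rough illustration of what zero-copy access looks like from the runtime API, here is a minimal sketch. It assumes device 0 and a toolkit that exposes `cudaHostAlloc` with `cudaHostAllocMapped` and `cudaHostGetDevicePointer`. The kernel name `scale` and the buffer size are made up for the example, and `cudaDeviceSynchronize` is the name used by later toolkits rather than the 2.2 beta.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles each element in place.
// The pointer it receives refers to mapped (pinned) host memory.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    // Check that the GPU can map host memory at all (e.g. not G9x per the post).
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess || !prop.canMapHostMemory) {
        printf("device 0 cannot map host memory\n");
        return 0;
    }

    // Must be requested before the CUDA context is created.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    const int n = 1 << 20;  // arbitrary size for the example
    float *h = NULL;        // host view of the buffer
    float *d = NULL;        // device view of the same buffer

    // Pinned host allocation that is also mapped into the device address space.
    if (cudaHostAlloc((void **)&h, n * sizeof(float), cudaHostAllocMapped) != cudaSuccess) {
        printf("cudaHostAlloc failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // No cudaMemcpy: the kernel reads and writes system memory directly.
    cudaHostGetDevicePointer((void **)&d, h, 0);
    scale<<<(n + 255) / 256, 256>>>(d, n);

    // Wait for the kernel before the host reads the results.
    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s, h[0] = %f\n", cudaGetErrorString(err), h[0]);

    cudaFreeHost(h);
    return err == cudaSuccess ? 0 : 1;
}
```

Whether this beats an explicit copy depends on the access pattern. The release notes only claim a significant improvement on hardware that supports it.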
- Asynchronous memcpy on Vista/Server 2008
- Texturing from pitch-linear memory
- cuda-gdb for 64-bit Linux (it is pretty great)
- OGL interop performance improvements
- CUDA profiler supports a lot more counters on GT200. I think this includes memory bandwidth counters (counters for each transaction size) and instruction count. In other words, you can very easily determine if you're bandwidth limited or compute limited, which makes it far more useful than it used to be.
- CUDA profiler works on Vista
- >4GB of pinned memory in a single allocation (except in Vista, where the limit is still 256MB per allocation, but I think this is going to be raised between now and the final release)
- Blocking sync for all platforms. Whether this made it into the headers for the beta, I'm not entirely sure--I've heard conflicting reports and need to check this afternoon. Basically, it's a context creation flag where instead of spinlocking or spinlocking+yielding when a thread is waiting for the GPU, the thread will sleep and the driver will wake it up when the event has completed. It's not the default mode because you're at the mercy of the OS thread scheduler which will sometimes increase latency, but if you want to minimize CPU utilization, it's very nice.
- Officially supports Ubuntu 8.10, RHEL 5.3, Fedora 10
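For the blocking sync item, the following is only a sketch of how the flag is requested through the runtime API. It uses `cudaDeviceScheduleBlockingSync`, the name in later toolkits. The post itself is unsure whether the flag reached the 2.2 beta headers, so treat the exact spelling for that release as an assumption. The `spin` kernel is a made-up workload that gives the host thread something to wait on.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical busy kernel so the host has something to wait for.
__global__ void spin(float *out, int iters)
{
    float x = (float)threadIdx.x;
    for (int i = 0; i < iters; ++i) x = x * 1.000001f + 0.5f;
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}

int main()
{
    // Ask the driver to put waiting host threads to sleep instead of spinning.
    // This has to happen before the context is created.
    cudaError_t err = cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
    if (err != cudaSuccess) {
        printf("cudaSetDeviceFlags failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    const int blocks = 64, threads = 256;  // arbitrary launch shape
    float *d_out = NULL;
    err = cudaMalloc((void **)&d_out, blocks * threads * sizeof(float));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    spin<<<blocks, threads>>>(d_out, 1 << 20);

    // With blocking sync, this wait should sleep rather than burn a CPU core.
    err = cudaDeviceSynchronize();
    printf("kernel finished: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Watching CPU utilization of the process with and without the `cudaSetDeviceFlags` call is a simple way to see the trade-off described above: lower CPU use, possibly at the cost of some wake-up latency.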