<P> A modern x86 CPU may use more than 4 GB of memory, either through Physical Address Extension (PAE), a 36-bit addressing mode, or through the native 64-bit mode of x86-64 CPUs. In such a case, a device using DMA with a 32-bit address bus cannot address memory above the 4 GB line. The Double Address Cycle (DAC) mechanism, if implemented on both the PCI bus and the device itself, enables 64-bit DMA addressing. Otherwise, the operating system must work around the problem either by using costly double buffers (DOS/Windows nomenclature), also known as bounce buffers (FreeBSD/Linux), or by using an IOMMU to provide address translation services, if one is present. </P> <P> As an example of a DMA engine incorporated in a general-purpose CPU, newer Intel Xeon chipsets include a DMA engine called I/O Acceleration Technology (I/OAT), which can offload memory copying from the main CPU, freeing it to do other work. In 2006, Intel Linux kernel developer Andrew Grover performed benchmarks using I/OAT to offload network traffic copies and found no more than a 10% improvement in CPU utilization with receiving workloads, and no improvement when transmitting data. </P> <P> Further performance-oriented enhancements to the DMA mechanism have been introduced in Intel Xeon E5 processors with their Data Direct I/O (DDIO) feature, which allows the DMA "windows" to reside within CPU caches instead of system RAM. CPU caches thus serve as the primary source and destination for I/O, allowing network interface controllers (NICs) to talk directly to the caches of local CPUs and avoid costly fetching of the I/O data from system RAM. As a result, DDIO reduces the overall I/O processing latency, allows processing of the I/O to be performed entirely in-cache, prevents the available RAM bandwidth from becoming a performance bottleneck, and lowers power consumption by allowing RAM to remain longer in a low-power state.
</P> <P> In systems-on-a-chip and embedded systems, the typical system bus infrastructure is a complex on-chip bus such as the AMBA High-performance Bus (AHB). AMBA defines two kinds of AHB components: master and slave. A slave interface is similar to programmed I/O: through it, software running on the embedded CPU (e.g. ARM) can read and write I/O registers or, less commonly, local memory blocks inside the device. A master interface can be used by the device to perform DMA transactions to and from system memory without heavily loading the CPU. </P>
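<P> The split between the slave and master roles can be illustrated with a small software simulation. The sketch below is only a model under stated assumptions: the register names (src, dst, len, ctrl, status), the bit values, the 256-byte simulated system memory, and the function names dma_program, dma_device_step and dma_done are all hypothetical and do not correspond to any real AHB controller. The CPU side only writes registers, as it would through a slave interface, and the device side then moves the data itself, as it would through a master interface. </P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register bits; not taken from any real controller. */
#define DMA_CTRL_START  0x1u
#define DMA_STAT_DONE   0x1u
#define DMA_STAT_ERROR  0x2u

#define SYS_MEM_SIZE    256u   /* size of the simulated system memory */

/* Registers the CPU would reach through the device's slave interface. */
typedef struct {
    uint32_t src;     /* source offset into system memory      */
    uint32_t dst;     /* destination offset into system memory */
    uint32_t len;     /* number of bytes to transfer           */
    uint32_t ctrl;    /* control bits (DMA_CTRL_START)         */
    uint32_t status;  /* status bits (DMA_STAT_*)              */
} dma_regs;

/* The simulated device together with the memory it can master. */
typedef struct {
    dma_regs regs;
    uint8_t  mem[SYS_MEM_SIZE];
} dma_sim;

/* CPU side (slave interface): describe the transfer and set START.
 * The CPU writes a handful of registers and copies no data itself. */
void dma_program(dma_sim *d, uint32_t src, uint32_t dst, uint32_t len)
{
    assert(d != NULL);
    d->regs.src    = src;
    d->regs.dst    = dst;
    d->regs.len    = len;
    d->regs.status = 0;
    d->regs.ctrl   = DMA_CTRL_START;
}

/* Device side (master interface): if START is set, perform the copy
 * on system memory and report the outcome in the status register. */
void dma_device_step(dma_sim *d)
{
    assert(d != NULL);
    if ((d->regs.ctrl & DMA_CTRL_START) == 0u) {
        return;                               /* nothing requested */
    }
    d->regs.ctrl &= ~DMA_CTRL_START;          /* acknowledge START */

    uint32_t src = d->regs.src;
    uint32_t dst = d->regs.dst;
    uint32_t len = d->regs.len;

    /* Reject transfers that would run past the simulated memory. */
    if (len > SYS_MEM_SIZE ||
        src > SYS_MEM_SIZE - len ||
        dst > SYS_MEM_SIZE - len) {
        d->regs.status = DMA_STAT_ERROR;
        return;
    }

    memmove(&d->mem[dst], &d->mem[src], len); /* the DMA transfer */
    d->regs.status = DMA_STAT_DONE;
}

/* CPU side: poll the status register for completion. */
int dma_done(const dma_sim *d)
{
    assert(d != NULL);
    return (d->regs.status & DMA_STAT_DONE) != 0u;
}
```

<P> In this model a driver would call dma_program, continue with other work, and later poll dma_done, while dma_device_step stands in for the hardware acting on its own. A real controller would typically raise an interrupt on completion rather than rely on polling, and its registers would be accessed through memory-mapped I/O rather than a C struct in ordinary memory. </P>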

Design a DMA controller and then an API for it