#include <nitro/mi.h>
void MI_CpuCopy8( const void* src, void* dest, u32 size );
void MI_CpuCopy16( const void* src, void* dest, u32 size );
void MI_CpuCopy32( const void* src, void* dest, u32 size );
void MI_CpuCopyFast( const void* src, void* dest, u32 size );
src | The transfer source address. |
dest | The transfer destination address. |
size | The transfer size, in bytes. |
None.
Copies memory using the CPU.
MI_CpuCopy8() selects the most efficient copy method based on the transfer source and destination addresses, and performs the copy in 16-bit and 32-bit units as appropriate. The source and destination addresses do not need to be aligned.
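How a copy method can be selected from the two addresses is sketched below in portable C. This is an illustration under assumptions, not the SDK's implementation: the names `BulkMethod` and `choose_bulk_method` are hypothetical, and the rule shown (comparing the low address bits of source and destination) is only one common way to make this decision.

```c
#include <stdint.h>

/* Hypothetical names for illustration only; not part of the SDK. */
typedef enum {
    BULK_32BIT,   /* src and dest reach 4-byte alignment at the same offset */
    BULK_16BIT,   /* src and dest reach only 2-byte alignment together */
    BULK_SHIFTED  /* src and dest differ in parity; data must be recombined */
} BulkMethod;

/* Decide which access width the bulk of a copy could use.
 * If the low bits of both addresses match, advancing both pointers by
 * the same amount aligns them together, so wide accesses are possible. */
static BulkMethod choose_bulk_method(uintptr_t src, uintptr_t dest)
{
    uintptr_t diff = src ^ dest;

    if ((diff & 3u) == 0) {
        return BULK_32BIT;
    }
    if ((diff & 1u) == 0) {
        return BULK_16BIT;
    }
    return BULK_SHIFTED;
}
```

For example, addresses 0x1001 and 0x2005 share the same low two bits, so after one leading byte both are 4-byte aligned and the rest could be moved in 32-bit units.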
MI_CpuCopy16() copies in 16-bit units. Both the transfer source address and the transfer destination address must be 2-byte aligned.
MI_CpuCopy32() copies in 32-bit units. Both the transfer source address and the transfer destination address must be 4-byte aligned.
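The alignment requirements above can be checked by the caller before a copy. The following is a minimal sketch in portable C; `is_aligned` is a hypothetical helper, not an SDK function, and this page does not state whether the SDK performs a similar check internally.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if address p is a multiple of unit bytes.
 * unit is expected to be 2 or 4 for the functions described here. */
static bool is_aligned(const void *p, size_t unit)
{
    return ((uintptr_t)p % unit) == 0;
}
```

A caller might, for example, assert `is_aligned(src, 4) && is_aligned(dest, 4)` in a debug build before calling MI_CpuCopy32().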
MI_CpuCopyFast() copies at high speed in 32-bit units. Both the transfer source address and the transfer destination address must be 4-byte aligned. The transfer size must be an integral multiple of 4 bytes, but it does not have to be an integral multiple of 32 bytes. After transferring in 32-byte units, the function handles the remainder with the same process as MI_CpuCopy32().
Therefore, MI_CpuCopyFast() and MI_CpuCopy32() use the same transfer code when the transfer size is less than 32 bytes. However, MI_CpuCopyFast() first checks whether the remaining size is smaller than 32 bytes, and that check adds some overhead. For such small transfers MI_CpuCopy32() is slightly faster, but for large transfers MI_CpuCopyFast() is faster.
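The structure described above (whole 32-byte blocks first, then a word-by-word remainder) can be sketched in portable C as follows. This is a rough model under assumptions, not the SDK's code: `sketch_copy32` and `sketch_copy_fast` are hypothetical names, and the actual routines may be written quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a copy in 32-bit units; not the SDK routine. */
static void sketch_copy32(const uint32_t *src, uint32_t *dest, size_t size)
{
    assert(size % 4 == 0);
    for (size_t i = 0; i < size / 4; i++) {
        dest[i] = src[i];
    }
}

/* Hypothetical stand-in for a block copy: 32 bytes per iteration,
 * then the remainder (always less than 32 bytes) word by word. */
static void sketch_copy_fast(const uint32_t *src, uint32_t *dest, size_t size)
{
    assert(((uintptr_t)src & 3u) == 0);
    assert(((uintptr_t)dest & 3u) == 0);
    assert(size % 4 == 0);

    size_t blocks = size / 32;
    for (size_t b = 0; b < blocks; b++) {
        /* Eight 32-bit words make one 32-byte block. */
        for (size_t w = 0; w < 8; w++) {
            dest[w] = src[w];
        }
        src += 8;
        dest += 8;
    }

    /* For sizes below 32 bytes, only this call does any copying. */
    sketch_copy32(src, dest, size % 32);
}
```

In this model a 40-byte transfer runs one block iteration and then copies the remaining 8 bytes as two words, while a 12-byte transfer skips the block loop entirely and pays only for the loop check.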
Based on these considerations, you could implement the following code to transfer data efficiently using one function:
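static inline void myCpuCopy32( const void *src, void *dest, u32 size )
{
    /* 0x20 = 32 bytes: at or above this size, use the block copy */
    if ( size >= 0x20 )
    {
        MIi_CpuCopyFast(src, dest, size);
    }
    else
    {
        /* smaller transfers skip the block-size check */
        MIi_CpuCopy32(src, dest, size);
    }
}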
However, 32 bytes should be regarded as a rough guideline, because it is uncertain whether the size at which the speed difference appears is exactly 32 bytes. The difference depends on the cache state of the region being transferred and on the transfer addresses.
These functions use only the CPU; they do not use the DMA controller or a system call. MI_CpuCopy8() copies in units of 16 or 32 bits, so it can be used to access VRAM directly without problems.
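To illustrate how a single byte can be stored using only 16-bit accesses, here is a minimal sketch in portable C. It assumes the concern is memory that does not handle 8-bit writes correctly; `store_byte_via_u16` is a hypothetical name, and this is not taken from the SDK source.

```c
#include <stdint.h>

/* Hypothetical helper: store one byte at byte offset `offset` of a
 * halfword array using only 16-bit reads and writes.
 * Assumes the low byte of each halfword is the even-offset byte
 * (little-endian layout). */
static void store_byte_via_u16(volatile uint16_t *base, uint32_t offset,
                               uint8_t value)
{
    volatile uint16_t *p = &base[offset / 2];
    uint16_t old = *p;  /* read the whole halfword */

    if (offset & 1u) {
        /* odd offset: replace the high byte, keep the low byte */
        *p = (uint16_t)((old & 0x00FFu) | ((uint16_t)value << 8));
    } else {
        /* even offset: replace the low byte, keep the high byte */
        *p = (uint16_t)((old & 0xFF00u) | value);
    }
}
```

Each store is a read-modify-write of the containing halfword, so the neighboring byte is preserved and no 8-bit write is ever issued.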
MI_CpuFill*, MI_CpuClear*, MI_CpuSend*, MI_DmaCopy*
07/07/2005 - Added section about the speed of MI_CpuCopy32 and MI_CpuCopyFast
04/29/2004 - Added a description of MI_CpuCopy8
03/29/2004 - Described that system calls are not used
12/01/2003 - Initial version