To transfer data from the device to memory:

  • The CPU uses interrupts or polling to determine if data is available
  • The CPU transfers data from the interface to a CPU register
  • The CPU transfers data from the register to memory

This is inefficient, because the CPU has to handle every transfer itself. Instead, the CPU can set up the DMA controller (DMAC) to transfer a large block of data while it does other work.

Block-oriented devices

These are usually more complex than character-oriented devices.
They may have built-in buffers to permit synchronization.

Block Transfer

  • The device is assumed to be unidirectional and can only write
  • Data is applied by an external device and is clocked by that device
  • When data is clocked, the flip-flop is set to 1 and this 1 can be read by the bus controller when the status register is queried
  • When the data is read, the status register is cleared
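The handshake in the list above can be modelled in a few lines of C. This is only a sketch of the behaviour, not the actual circuit: the struct `block_device`, the three function names, and the choice of bit 7 for the flag (`STATUS_READY`, picked to match the 0x80 mask in the polling loop in these notes) are assumptions made for illustration.

```c
#include <stdint.h>

#define STATUS_READY 0x80u  /* assumed position of the "data available" flag */

/* Hypothetical software model of the unidirectional device interface. */
struct block_device {
    uint8_t data;    /* data register, latched from the external device */
    uint8_t status;  /* status register; bit 7 mirrors the flip-flop */
};

/* The external device applies a byte and clocks it in: the flag is set to 1. */
void device_clock_in(struct block_device *dev, uint8_t byte)
{
    dev->data = byte;
    dev->status |= STATUS_READY;
}

/* The bus controller queries the status register. */
int bus_data_ready(const struct block_device *dev)
{
    return (dev->status & STATUS_READY) != 0;
}

/* The bus controller reads the data register: reading clears the flag. */
uint8_t bus_read_data(struct block_device *dev)
{
    dev->status &= (uint8_t)~STATUS_READY;
    return dev->data;
}
```

Calling `device_clock_in` and then `bus_read_data` walks through one full set/clear cycle of the flag.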

On the circuit, we have some logic that generates a new interrupt whenever data is available

Transferring in Software

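/* Status_Register and Data_Register are assumed to be volatile,
   memory-mapped device registers; value is a 256-element byte array. */

/* Loop through byte indices from 255 down to 0. */
for (int i = 255; i >= 0; i--)
{
    /* Poll until bit 7 of the status register signals that a byte is ready. */
    while ((Status_Register & 0x80) != 0x80)
    {
        /* Do nothing: tight polling loop. */
    }

    /* Transfer the byte into the value array at index i. */
    value[i] = Data_Register;
}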

Here, we transfer 256 items.

Moving 256 units of data this way takes, in the minimum case, 3,330 CPU cycles.
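Since the cycle equation itself is not reproduced here, the following is a hypothetical cost model, not the one from the notes. It assumes a fixed setup cost plus, for each item, some number of polling iterations and one move to memory. The function name and all parameter names are made up for illustration.

```c
/* Hypothetical cost model for the polled loop; none of these parameter
 * names or values come from the source. */
unsigned long polled_transfer_cycles(unsigned long items,
                                     unsigned long setup_cycles,
                                     unsigned long cycles_per_poll,
                                     unsigned long polls_per_item,
                                     unsigned long cycles_per_move)
{
    /* Each item costs its polling iterations plus one move to memory. */
    unsigned long per_item = cycles_per_poll * polls_per_item + cycles_per_move;
    return setup_cycles + items * per_item;
}
```

For instance, `polled_transfer_cycles(256, 2, 6, 1, 7)` evaluates to 2 + 256 × 13 = 3,330. A fixed cost of 2 cycles plus 13 cycles per item is therefore one split that is consistent with the minimum-case figure; the values 6 and 7 are arbitrary, and the actual breakdown may differ. Raising `polls_per_item` shows how quickly the cost grows when the device is slower than the CPU.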

An Example

The Logical Interface

  • The MAR points to the next byte in memory to be transferred
  • The BCR is the number of bytes yet to be transferred in the block
  • The Status/Control Register
    • Mode: number of transfers per bus controllership
    • R/W: direction
    • Up/Down: how to change the MAR
    • Start: start the transfer
    • IE: enable interrupts
    • Busy: synchronization bit for processing one block of data
    • IR: interrupt pending, asserted after processing one block of data

Transfer Mode

  • Cycle Stealing: Transfer only 1 byte per bus controllership
  • Burst: Multiple transfers are permitted per bus controllership. This permits a transfer of up to the entire block, but we need to make sure we don’t starve the CPU.
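To make the logical interface concrete, here is one way the registers and control fields might be laid out in C. The notes do not give register widths, bit positions, or polarities, so everything below (the `DMA_*` masks, the `dmac_regs` struct, and the helper `dma_make_control`) is an assumed layout for illustration only.

```c
#include <stdint.h>

/* Hypothetical bit assignments for the status/control register. */
#define DMA_CTRL_START 0x01u  /* Start: begin the transfer */
#define DMA_CTRL_IE    0x02u  /* IE: enable interrupts */
#define DMA_CTRL_WRITE 0x04u  /* R/W: set = write to memory (assumed polarity) */
#define DMA_CTRL_DOWN  0x08u  /* Up/Down: set = decrement the MAR */
#define DMA_CTRL_BURST 0x10u  /* Mode: set = burst, clear = cycle stealing */
#define DMA_STAT_BUSY  0x40u  /* Busy: block still being processed */
#define DMA_STAT_IR    0x80u  /* IR: interrupt pending after a block */

/* Hypothetical register block; widths are assumptions. */
struct dmac_regs {
    uint32_t mar;  /* next memory address to be transferred */
    uint16_t bcr;  /* bytes yet to be transferred in the block */
    uint8_t  csr;  /* status/control bits defined above */
};

/* Compose the control value for one block from the individual options. */
uint8_t dma_make_control(int burst, int write_to_memory, int count_down, int irq)
{
    uint8_t csr = DMA_CTRL_START;
    if (burst)           csr |= DMA_CTRL_BURST;
    if (write_to_memory) csr |= DMA_CTRL_WRITE;
    if (count_down)      csr |= DMA_CTRL_DOWN;
    if (irq)             csr |= DMA_CTRL_IE;
    return csr;
}
```

Keeping the status bits (Busy, IR) apart from the control bits in the same register mirrors the "Status/Control" naming; a real controller may well split them into separate registers.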

DMA Data Transfer Sequence

  1. The CPU loads the DMAC with the starting memory address (in the MAR), the byte count (in the BCR), and the control values
  2. When the device has data ready or is ready for more data, it sends a DMA request
  3. The DMAC requests the bus and waits until it is granted through arbitration
  4. The DMAC provides addresses and control to make the transfer happen
  5. The DMAC increments or decrements the MAR for the next byte and decrements the byte count
  6. If we are in burst mode, and more data is available, we go to step 4
  7. Release the bus, synchronize the CPU and DMAC
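The sequence above can be sketched as a small software model. This is a simplified illustration under stated assumptions: the transfer goes from the device into memory, the MAR counts up, and bus arbitration always succeeds. The struct `dmac`, its `bus_grants` counter, and the function `dmac_service_request` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a DMAC moving bytes from a device into memory. */
struct dmac {
    uint8_t *mar;        /* step 1: next memory location to fill */
    size_t   bcr;        /* step 1: bytes left in the block */
    int      burst;      /* step 1: nonzero = burst, zero = cycle stealing */
    unsigned bus_grants; /* how many times the bus was acquired */
};

/* Service one DMA request (steps 2 to 7). `src` stands in for the device
 * and `available` for the number of bytes it currently has ready.
 * Returns the number of bytes moved during this bus controllership. */
size_t dmac_service_request(struct dmac *d, const uint8_t *src, size_t available)
{
    size_t moved = 0;

    if (d->bcr == 0 || available == 0)
        return 0;                 /* nothing to do, so no bus request */

    d->bus_grants++;              /* step 3: bus requested and granted */
    do {
        *d->mar = src[moved];     /* step 4: perform the transfer */
        d->mar++;                 /* step 5: advance the MAR (up mode assumed) */
        d->bcr--;                 /*         and decrement the byte count */
        moved++;
    } while (d->burst && moved < available && d->bcr > 0);  /* step 6 */

    return moved;                 /* step 7: bus released */
}
```

With `burst` set, a 4-byte block that is fully available moves in a single bus grant; with `burst` clear, the same block needs four separate requests, one byte each.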

From the CPU’s POV

  • Assume all global initialization has been completed
  • Assume device interface has been set up
  • Assume an integrated DMAC and polling synchronization

DMA Complete Cycle

From the CPU’s perspective

Global Initialization

  • CPU configures global aspects
  • CPU configures the unchanging aspects of the interface + device
  • CPU configures the unchanging aspects of the DMAC interrupts

Block Initialization

  • You may need to set up a device interface for each block
  • The CPU writes control values into the DMA controller registers
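From the CPU's side, block initialization followed by polling synchronization might look like the sketch below. The register block `dmac_regs`, the `DMA_START` and `DMA_BUSY` bit positions, and both function names are assumptions; on real hardware the struct would sit at a device-specific memory-mapped address, and the DMAC itself would set and clear Busy.

```c
#include <stdint.h>

#define DMA_START 0x01u  /* assumed bit position of Start */
#define DMA_BUSY  0x40u  /* assumed bit position of Busy */

/* Hypothetical DMAC register block; `volatile` because hardware changes it. */
struct dmac_regs {
    volatile uint32_t mar;  /* starting memory address */
    volatile uint16_t bcr;  /* byte count for the block */
    volatile uint8_t  csr;  /* status/control register */
};

/* Block initialization: write the per-block values, then start. */
void dma_start_block(struct dmac_regs *r, uint32_t addr,
                     uint16_t count, uint8_t mode_bits)
{
    r->mar = addr;
    r->bcr = count;
    r->csr = (uint8_t)(mode_bits | DMA_START);  /* written last: starts the transfer */
}

/* Polling synchronization: the block is finished once Busy is clear. */
int dma_block_done(const struct dmac_regs *r)
{
    return (r->csr & DMA_BUSY) == 0;
}
```

After `dma_start_block`, the CPU is free to do other work and call `dma_block_done` from time to time, rather than spinning on every byte as in the polled loop.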

This process is different from the DMAC’s perspective

Ladder Diagram

vs polling…

Dual Addresses

The DMA controller acts like a surrogate CPU…
  1, 2. DMAC requests and claims the bus
  3, 4. DMAC reads the data from the device or the memory, depending on the direction of the transfer
  5. DMAC temporarily stores the data to be written
  6, 7. DMAC writes the data to the memory or the device, depending on the direction of the transfer

  • Twice as many bus cycles as the implicit version of the transfer, but no hardware changes to the device interface

DMA cycles

Each DMA cycle requires 2 bus cycles, a read and write.
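A short sketch can make the read-then-write cost visible. It models a dual-address transfer as a copy through a temporary value and counts one bus cycle for each read and each write. The `bus_stats` struct and the `dual_address_copy` function are hypothetical and ignore arbitration overhead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters for bus activity during a transfer. */
struct bus_stats {
    unsigned long reads;
    unsigned long writes;
};

/* Dual-address transfer of n bytes: every byte is read into the DMAC's
 * temporary storage and then written to its destination. */
void dual_address_copy(uint8_t *dst, const uint8_t *src, size_t n,
                       struct bus_stats *s)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t temp = src[i];  /* bus cycle 1: read from the source */
        s->reads++;
        dst[i] = temp;          /* bus cycle 2: write to the destination */
        s->writes++;
    }
}
```

Copying n bytes this way always leaves `reads + writes` equal to 2n, which is the factor of two relative to a transfer where the device places data on the bus directly.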

When the CPU is setting up the DMAC, the DMAC acts like a peripheral. Once it is set up, it mimics the CPU and acts as a bus controller, using the information it was configured with.

What Does Performance Mean?

From the Device

  • Latency: How long is the delay between data being ready and being read
  • Transfer Rate: How quickly is data transferring out of the buffer
  • Effective Transfer Rate: What is the highest sustained rate of data transfer?

Processing Hardware Perspective

  • Latency: How long is the delay between requesting data and data being available
  • Transfer Rate: How long is the delay for subsequent transfers
  • Effective Transfer Rate: What is the highest sustained rate of data transfer?
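One simple way to relate the three quantities is to assume a one-time latency followed by transfer at a constant peak rate. This model, and the function `effective_rate` with its parameter names, are assumptions for illustration rather than definitions taken from these notes.

```c
/* Effective rate in bytes per second for a block of `bytes` bytes, assuming
 * a one-time latency (in seconds) followed by transfer at `peak_rate`
 * bytes per second. Hypothetical model; `peak_rate` must be nonzero. */
double effective_rate(double bytes, double latency_s, double peak_rate)
{
    double total_time = latency_s + bytes / peak_rate;
    return bytes / total_time;
}
```

Under this model a 1000-byte block with 1 s of latency and a peak rate of 1000 bytes/s achieves an effective rate of 500 bytes/s: the latency costs as much time as the transfer itself. Larger blocks amortize the latency and push the effective rate toward the peak rate.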