AN1801/D (Motorola Order Number) 4/1999 REV. 0
Application Note
Performance Differences between the MPC8240 and the MPC106
Top Changwatchai and Roy Jenevein
risc10@email.sps.mot.com

This paper discusses some of the major performance differences between the MPC8240's memory and PCI interfaces and those of the MPC106. It compares the MPC8240 (rev. 1.0) with the MPC106 (rev. 4.0) and contains the following:
• Part 1, "Architectural Differences," discusses architectural differences between the two parts that impact performance.
  – The MPC8240 offers faster memory and PCI buses and features PCI data-streaming capabilities.
  – The MPC106 supports an external L2 cache, but this may result in lower memory bus speed.
• Part 2, "Simulation," presents results from an analytical model used to compare various system configurations that highlight these architectural differences.
Part 1 Architectural Differences
This section discusses the differences in memory, L2 cache, and the PCI bus interface between the MPC8240 and the MPC106.
This document contains information on a new product under development by Motorola. Motorola reserves the right to change or discontinue this product without notice. © Motorola, Inc., 1999. All rights reserved.
1.1 Memory
Both the MPC8240 and the MPC106 support three types of dynamic RAM (DRAM): fast page mode (FPM), extended data out (EDO), and synchronous DRAM (SDRAM). Of the three, SDRAM typically provides the best bus bandwidth utilization. The MPC106 supports an 83-MHz SDRAM bus (66 MHz if an L2 cache is present), while the MPC8240 supports a 100-MHz SDRAM bus. Access latency can be reduced (thereby increasing bus efficiency) if more open pages of memory can be maintained. The MPC106 can maintain 2 open pages, while the MPC8240 can maintain 4 open pages at once.

DRAM devices are accessed with a multiplexed address scheme: each unit of data is accessed by first selecting its row address (also known as "opening a page") and then selecting its column address. This unit of data is then transferred on the memory bus in a "beat." Transferring data in bursts (multiple-beat transfers) can increase bus efficiency and lower data access latency. During a burst transfer, the page and column are accessed for the first beat (just as for a single-beat transfer). For subsequent beats in the burst, the page is kept open, with just the column address changing. Because the page does not have to be reopened during a burst, these subsequent beats have much lower latency. For example, a typical SDRAM may have an 8-1-1-1 access latency, which essentially means 8 bus clocks to transfer the first beat of data, and 1 bus clock to transfer each subsequent beat in the burst¹.

Allowing pages to remain open between bursts can also result in lower latency. If a memory access hits in an already open page, then its first beat will be accessed more quickly. Again, the MPC106 can maintain 2 open pages at once, while the MPC8240 can maintain 4. Typical access latencies are presented in Table 1.
Table 1. Typical SDRAM Access Latencies
                Access Closed Page (miss)    Access Open Page (hit)
"fast" SDRAM    8-1-1-1                      6-1-1-1
"slow" SDRAM    10-1-1-1                     7-1-1-1
On the other hand, access latencies may be higher due to several factors, such as closing an already open page, or handling error correction code (ECC). If the maximum number of supported open pages is already open when an access misses, then one page must first be closed before the new one is opened. The access time for the first beat will thereby be increased by about two bus cycles. Supporting more open pages, as the MPC8240 does, reduces the frequency of this occurrence.

The MPC106 includes ECC support for FPM and EDO. It does not support ECC for SDRAM, but it does allow an external device to provide this support. The MPC8240 supports ECC for FPM, EDO, and SDRAM. When ECC is enabled, access latency to memory is increased. For SDRAM ECC, the latency per beat is increased by one bus cycle.

The analytical model is conservative and does not include the effect of open pages, and it assumes that ECC is not used.
¹ Technically, these access latency numbers refer to the timing of the TS_ (transfer start) and TA_ (transfer acknowledge) signals on the 60x bus. TS_ signals the beginning of the bus transaction, and each TA_ signals that a data beat transfer completed successfully. The first number is the latency of the first data beat, and refers to the number of bus clocks (inclusive) from TS_ to the first TA_. Subsequent numbers refer to the number of bus clocks to transfer successive beats of the burst. Thus 8-1-1-1 means that if TS_ is asserted in clock 1, then TA_ is asserted in clocks 8, 9, 10, and 11.
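The latency notation used in Table 1 and in the footnote can be turned into simple arithmetic. The following is a minimal Python sketch, not part of the original note or its analytical model. The function names are hypothetical, the 8-byte beat assumes a 64-bit data bus (an assumption; the text above does not state the bus width), and the ECC option reads "one bus cycle per beat" literally.

```python
def parse_pattern(pattern):
    """Split a latency pattern such as '8-1-1-1' into per-beat clock counts."""
    return [int(n) for n in pattern.split("-")]


def ta_clocks(pattern):
    """Clock numbers in which TA_ is asserted, with TS_ asserted in clock 1."""
    clocks, elapsed = [], 0
    for beat in parse_pattern(pattern):
        elapsed += beat
        clocks.append(elapsed)
    return clocks


def burst_clocks(pattern, page_close_penalty=0, ecc_per_beat=0):
    """Total bus clocks for one burst.

    page_close_penalty: extra clocks on the first beat when a page must be
        closed first (the text gives about 2).
    ecc_per_beat: extra clocks added to every beat (the text gives 1 for
        SDRAM ECC).
    """
    beats = parse_pattern(pattern)
    beats[0] += page_close_penalty
    return sum(beat + ecc_per_beat for beat in beats)


def burst_mb_per_s(pattern, bus_mhz, bytes_per_beat=8):
    """Sustained rate for back-to-back bursts, in 10^6 bytes per second.

    bytes_per_beat=8 is an assumption (64-bit data bus).
    """
    beats = parse_pattern(pattern)
    return len(beats) * bytes_per_beat * bus_mhz / burst_clocks(pattern)


if __name__ == "__main__":
    print(ta_clocks("8-1-1-1"))  # [8, 9, 10, 11]
    for mhz in (66, 83, 100):
        print(mhz, "MHz:", round(burst_mb_per_s("8-1-1-1", mhz), 1), "MB/s")
```

Under these assumptions, the same 8-1-1-1 device delivers more data per second on a 100-MHz bus than on a 66-MHz bus simply because each of the 11 clocks is shorter.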
1.2 L2 Cache
The MPC106 provides support for an external, lookaside L2 cache; the MPC8240 does not. An L2 cache can reduce data access latency by providing faster access to cached data. However, because the L2 cache shares the processor bus with memory, adding an L2 can introduce timing constraints which may lower the supported bus speed. Moreover, the L2 tag RAM cannot run faster than 66 MHz. As mentioned earlier, an MPC106 with L2 can therefore support a 66-MHz SDRAM bus, whereas the MPC8240 (with no L2) can support a 100-MHz SDRAM bus. In addition, the speed-up due to having an L2 can be lowered by cache maintenance overhead, such as L2 castouts.

The analytical model includes the effect of an L2 cache, but it does not take the cache size as a parameter. Instead, it takes the cache miss rate directly as a parameter.
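The trade-off described above (an L2 cache shortens hits but holds the bus at 66 MHz) can be illustrated with a miss-rate-weighted average. This Python sketch is only an illustration and is not the note's analytical model. The function name, the 3-clock L2 hit latency, and the 20% miss rate are hypothetical values chosen for the example. It also assumes the lookaside lookup overlaps the memory access, so a miss costs only the memory latency, and it ignores castouts.

```python
def avg_first_beat_ns(bus_mhz, mem_clocks, l2_clocks=None, miss_rate=1.0):
    """Average first-beat latency in nanoseconds.

    bus_mhz:    processor/memory bus frequency in MHz
    mem_clocks: first-beat latency from memory, in bus clocks
    l2_clocks:  first-beat latency on an L2 hit (hypothetical value);
                None means no L2 cache is present
    miss_rate:  fraction of accesses that miss in the L2 (0.0 to 1.0)
    """
    clock_ns = 1000.0 / bus_mhz
    if l2_clocks is None:
        return mem_clocks * clock_ns
    avg_clocks = (1.0 - miss_rate) * l2_clocks + miss_rate * mem_clocks
    return avg_clocks * clock_ns


if __name__ == "__main__":
    # No L2, 100-MHz bus, 8-clock first beat (closed page, "fast" SDRAM).
    print(avg_first_beat_ns(100, 8))
    # L2 present, 66-MHz bus; hit latency and miss rate are illustrative only.
    print(avg_first_beat_ns(66, 8, l2_clocks=3, miss_rate=0.2))
```

Which configuration comes out ahead depends entirely on the miss rate and hit latency supplied, which is why the miss rate is the natural parameter to vary.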
1.3 PCI Bus Interface
Both the MPC8240 and the MPC106 implement a 32-bit PCI bus interface, which provides a bridge between the 60x processor bus, memory, and the PCI bus. The MPC8240 offers two improvements: a faster PCI bus and PCI data-streaming capabilities.
1.3.1 Bus Speed
The MPC106 supports a 33-MHz PCI bus, whereas the MPC8240 supports a 66-MHz PCI bus. Assuming the PCI devices can support this speed, PCI accesses should proceed twice as fast.
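As a rough upper bound, the peak transfer rate of a 32-bit PCI bus is four bytes per bus clock. The short Python sketch below makes the arithmetic explicit. It assumes one data phase per clock with no address phases, wait states, or arbitration, so real throughput is lower; the function name is hypothetical.

```python
PCI_BUS_BYTES = 4  # 32-bit PCI interface, as stated in Section 1.3


def pci_peak_mb_per_s(pci_mhz, bus_bytes=PCI_BUS_BYTES):
    """Peak PCI data rate in 10^6 bytes per second (idealized upper bound)."""
    return bus_bytes * pci_mhz


if __name__ == "__main__":
    for mhz in (33, 66):
        print(mhz, "MHz PCI:", pci_peak_mb_per_s(mhz), "MB/s peak")
```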
1.3.2 Data Streaming
For identical PCI bus configurations, the MPC8240's streaming capabilities allow higher PCI data throughput than the MPC106. The MPC8240 has two main streaming features not present in the MPC106: as a PCI target, it can support data bursts larger than 32 bytes; and as a PCI master, it can support transactions larger than 32 bytes through its on-chip dual-channel DMA engine. By streaming, the MPC8240 can utilize PCI bus bandwidth more efficiently than the MPC106 and so offers greater potential data throughput.
1.3.2.1 No Forced Disconnects Every 32 Bytes
When acting as a non-DMA PCI bus master, both the MPC106 and the MPC8240 limit transaction size to 32 bytes¹. When acting as a target, the MPC106 also limits transactions to 32 bytes, by issuing a PCI disconnect after up to 32 bytes are transferred. To continue transferring data, the mastering device must start a new transaction. The MPC8240, on the other hand, does not have this 32-byte limit; that is, it has the capability to transfer data as long as data can be supplied². As a target, the MPC8240 is able to avoid the 32-byte limit on transaction size because it has two internal 32-byte buffers for reads and two buffers for writes³. Thus, while one buffer is being filled from the source (for example, memory), the other buffer can be written to the destination (for example, the requesting PCI device).
¹ The PCI specification does not have this limit on transaction size. PCI transaction sizes are realistically limited by internal buffers of the PCI devices, or by other device requests and the PCI master latency timer.
² Again, in practice transaction size is limited by the buffer sizes of the PCI devices. The MPC8240 does initiate a disconnect when a transaction crosses a 4096-byte page boundary, to limit the risk of prefetching into improper memory address spaces.
³ The MPC106 also has two write buffers (only one read buffer), but the internal state machine still forces a disconnect every 32 bytes as a PCI target.
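The cost of a forced disconnect every 32 bytes can be estimated by counting bus clocks. The Python sketch below is an idealized illustration, not the note's analytical model. It assumes one 4-byte data phase per clock, charges a fixed penalty for each disconnect between transactions, and ignores the initial address phase and target latency. The function names are hypothetical, and the penalty is a parameter (this section goes on to use 5 and 8 clocks).

```python
import math


def transfer_clocks(total_bytes, max_txn_bytes, penalty_clocks, bus_bytes=4):
    """PCI clocks to move total_bytes when a disconnect is forced every
    max_txn_bytes, each disconnect costing penalty_clocks."""
    data_clocks = math.ceil(total_bytes / bus_bytes)
    transactions = math.ceil(total_bytes / max_txn_bytes)
    disconnects = transactions - 1  # no disconnect after the final one
    return data_clocks + disconnects * penalty_clocks


def bus_efficiency(total_bytes, max_txn_bytes, penalty_clocks, bus_bytes=4):
    """Fraction of clocks that actually carry data."""
    data_clocks = math.ceil(total_bytes / bus_bytes)
    return data_clocks / transfer_clocks(
        total_bytes, max_txn_bytes, penalty_clocks, bus_bytes
    )


if __name__ == "__main__":
    block = 4096  # one 4096-byte page, the streaming limit noted in footnote 2
    for penalty in (5, 8):
        limited = transfer_clocks(block, 32, penalty)
        streamed = transfer_clocks(block, block, penalty)
        print(f"penalty={penalty}: 32-byte limit {limited} clocks, "
              f"streaming {streamed} clocks")
```

In this simplified model the 32-byte limit spends a large share of bus clocks on disconnect overhead, while a streamed transfer of the same block spends none.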
Each disconnect causes a delay before the next transaction is started; these delays can reduce data throughput. This disconnect penalty includes sending the disconnect signal, transmitting the new address (since PCI multiplexes its address/data bus), allowing PCI bus turnaround and bus arbitration, and initiating the new transaction. The penalty may be increased if there is contention due to other PCI devices, or if the bus is not parked. The minimum penalty for a disconnect for a PCI write is three PCI bus clocks: two before FRAME is asserted, plus one to transmit the new address. However, this does not allow other devices to take the bus, so depending on the PCI device the actual penalty is probably higher. Also, PCI reads incur a higher penalty than writes, for two reasons: first, there must be a one-cycle turnaround since the address and data are transmitted in different directions; second, after the new address is transmitted to the MPC106, there is some latency while the address is passed on to the memory device and data is read into the MPC106. As data becomes available on the MPC106, it can then be transferred to the requesting PCI device.

Results from the analytical model are presented for a minimum disconnect penalty of 5 clocks, and for a more typical disconnect penalty of 8 clocks.

As a target, the MPC106 does support fast back-to-back transactions by the same master, in which the master starts a new transaction immediately, without an idle state. Theoretically, fast back-to-back PCI writes can incur a single clock penalty between writes (just to transmit the new address)¹. This assumes that the MPC106 does not issue a disconnect (transactions must be 32 bytes or smaller). If transactions are disconnected, then a disconnect penalty applies as discus