Part Number: PC7447A
Manufacturer: ATMEL Corporation
Features
• 3000 Dhrystone 2.1 MIPS at 1.3 GHz
• Selectable Bus Clock (30 CPU Bus Dividers up to 28x)
• Selectable MPx/60x Interface Voltage (1.8V, 2.5V)
• PD Typically 18W at 1.33 GHz at VDD = 1.3V; 8.0W at 1 GHz at VDD = 1.1V (Full Operating Conditions)
• Nap, Doze and Sleep Power Saving Modes
• Superscalar (Four Instructions Fetched Per Clock Cycle)
• 4 GB Direct Addressing Range
• Virtual Memory: 4 Petabytes (2^52 bytes)
• 64-bit Data and 36-bit Address Bus Interface
• Integrated L1: 32 KB Instruction and 32 KB Data Cache
• Integrated L2: 512 KB
• 11 Independent Execution Units and 3 Register Files
• Write-back and Write-through Operations
• fINT Max = 1.33 GHz (1.42 GHz to be Confirmed)
• fBUS Max = 133 MHz/166 MHz
PowerPC® 7447A RISC Microprocessor PC7447A Preliminary
Description
The PC7447A host processor is a high-performance, low-power, 32-bit implementation of the PowerPC Reduced Instruction Set Computer (RISC) architecture combined with a full 128-bit implementation of Freescale®’s AltiVec™ technology. This microprocessor is ideal for leading-edge embedded computing and signal processing applications. The PC7447A features 512 KB of on-chip L2 cache. The PC7447A microprocessor has no backside L3 cache, allowing for a smaller package designed as a pin-for-pin replacement for the PC7447 microprocessor. This device benefits from a silicon-on-insulator (SOI) CMOS process technology, engineered to deliver significant power savings without sacrificing speed. A low-power version of the PC7447A microprocessor is also available.

Figure 1-1 shows a block diagram of the PC7447A. The core is a high-performance superscalar design supporting a double-precision floating-point unit and a SIMD multimedia unit. The memory storage subsystem supports the MPX bus protocol and a subset of the 60x bus protocol to the main memory and other system resources.

Note that the PC7447A is a footprint-compatible, drop-in replacement in a PC7447 application if the core power supply is 1.3V.
Screening
• Full Military Temperature Range (Tj = -55°C to +125°C)
• Industrial Temperature Range (Tj = -40°C to +110°C)
GH suffix HITCE 360
Rev. 5387B–HIREL–07/05
1. Block Diagram
Figure 1-1. PC7447A Microprocessor Block Diagram
Additional features shown in the block diagram: time base counter/decrementer, clock multiplier, JTAG/COP interface, thermal/power management, performance monitor, dynamic frequency switching (DFS), and temperature diode.
Note: The castout queue and push queue share resources for a combined total of 10 entries. The castout queue itself is limited to 9 entries, ensuring that 1 entry will always be available for a push.
2. General Parameters
Table 2-1 provides a summary of the general parameters of the PC7447A.
Table 2-1. Device Parameters
Parameter | Description
Technology | 0.13 µm CMOS, nine-layer metal
Die size | 8.51 mm × 9.86 mm
Transistor count | 48.6 million
Logic design | Fully static
Packages | Surface mount, 360-ball ceramic ball grid array (HITCE)
Core power supply | 1.3V ±50 mV DC nominal
I/O power supply | 1.8V ±5% DC, or 2.5V ±5% DC
3. Features
This section summarizes features of the PC7447A implementation of the PowerPC architecture. Major features of the PC7447A are as follows:
• High-performance, superscalar microprocessor
  – Up to four instructions can be fetched from the instruction cache at a time
  – Up to 12 instructions can be in the instruction queue (IQ)
  – Up to 16 instructions can be at some stage of execution simultaneously
  – Single-cycle execution for most instructions
  – One instruction per clock cycle throughput for most instructions
  – Seven-stage pipeline control
• Eleven independent execution units and three register files
  – Branch processing unit (BPU) features static and dynamic branch prediction
    – 128-entry (32-set, four-way set-associative) branch target instruction cache (BTIC), a cache of branch instructions that have been encountered in branch/loop code sequences. If a target instruction is in the BTIC, it is fetched into the instruction queue a cycle sooner than it can be made available from the instruction cache. Typically, a fetch that hits the BTIC provides the first four instructions in the target stream.
    – 2048-entry branch history table (BHT) with two bits per entry for four levels of prediction: not taken, strongly not taken, taken, and strongly taken
    – Up to three outstanding speculative branches
    – Branch instructions that do not update the count register (CTR) or link register (LR) are often removed from the instruction stream
    – Eight-entry link register stack to predict the target address of Branch Conditional to Link Register (BCLR) instructions
  – Four integer units (IUs) that share 32 GPRs for integer operands
    – Three identical IUs (IU1a, IU1b, and IU1c) can execute all integer instructions except multiply, divide, and move to/from special-purpose register instructions.
    – IU2 executes miscellaneous instructions, including the CR logical operations, integer multiplication and division instructions, and move to/from special-purpose register instructions.
  – Five-stage FPU and a 32-entry FPR file
    – Fully IEEE 754-1985-compliant FPU for both single- and double-precision operations
    – Supports non-IEEE mode for time-critical operations
    – Hardware support for denormalized numbers
    – Thirty-two 64-bit FPRs for single- or double-precision operands
  – Four vector units and 32-entry vector register file (VRs)
    – Vector permute unit (VPU)
    – Vector integer unit 1 (VIU1) handles short-latency AltiVec™ integer instructions, such as vector add instructions (for example, vaddsbs, vaddshs, and vaddsws).
    – Vector integer unit 2 (VIU2) handles longer-latency AltiVec integer instructions, such as vector multiply add instructions (for example, vmhaddshs, vmhraddshs, and vmladduhm).
    – Vector floating-point unit (VFPU)
  – Three-stage load/store unit (LSU)
    – Supports integer, floating-point, and vector instruction load/store traffic
    – Four-entry vector touch queue (VTQ) supports all four architected AltiVec data stream operations
    – Three-cycle GPR and AltiVec load latency (byte, half word, word, vector) with one-cycle throughput
    – Four-cycle FPR load latency (single, double) with one-cycle throughput
    – No additional delay for misaligned access within double-word boundary
    – Dedicated adder calculates effective addresses (EAs)
    – Supports store gathering
    – Performs alignment, normalization, and precision conversion for floating-point data
    – Executes cache control and TLB instructions
    – Performs alignment, zero padding, and sign extension for integer data
    – Supports hits under misses (multiple outstanding misses)
    – Supports both big- and little-endian modes, including misaligned little-endian accesses
• Three issue queues, FIQ, VIQ, and GIQ, can accept as many as one, two, and three instructions, respectively, in a cycle. Instruction dispatch requires the following:
  – Instructions can only be dispatched from the three lowest IQ entries: IQ0, IQ1, and IQ2
  – A maximum of three instructions can be dispatched to the issue queues per clock cycle
  – Space must be available in the CQ for an instruction to dispatch (this includes instructions that are assigned a space in the CQ but not in an issue queue)
• Rename buffers
  – 16 GPR rename buffers
  – 16 FPR rename buffers
  – 16 VR rename buffers
• Dispatch unit
  – Decode/dispatch stage fully decodes each instruction
• Completion unit
  – The completion unit retires an instruction from the 16-entry completion queue (CQ) when all instructions ahead of it have been completed, the instruction has finished execution, and no exceptions are pending
  – Guarantees sequential programming model (precise exception model)
  – Monitors all dispatched instructions and retires them in order
  – Tracks unresolved branches and flushes instructions after a mispredicted branch
  – Retires as many as three instructions per clock cycle
• Separate on-chip L1 instruction and data caches (Harvard architecture)
  – 32-Kbyte, eight-way set-associative instruction and data caches
  – Pseudo least-recently-used (PLRU) replacement algorithm
  – 32-byte (eight-word) L1 cache block
  – Physically indexed/physical tags
  – Cache write-back or write-through operation pro