| Work per cycle | Every instruction (load, branch, ALU op, store, etc.) completes in exactly one clock cycle. | Each instruction is broken into multiple steps (IF, ID, EX, MEM, WB), with each step in its own cycle. | Also splits instructions into stages, but these stages are overlapped across instructions. |
| Clock period | Must cover the slowest instruction’s complete path (typically a load: instruction fetch, register read, ALU, data-memory access, register write-back) ⇒ very long cycle. | Shorter: the cycle only needs to cover the slowest single step (e.g. a memory access or an ALU operation) ⇒ higher clock frequency. | Roughly the same per-stage timing as multi-cycle (short cycle), plus a small pipeline-register overhead ⇒ high clock frequency. |
| Cycles per instruction (CPI) | CPI = 1 for all instructions. | CPI > 1 (e.g. 3–5 cycles with the five steps above); simple instructions take fewer cycles than the worst case. | CPI ≈ 1 (ideally 1; stalls from hazards push it above 1). |
| Throughput (instr/sec) | 1 ÷ (cycle time) | 1 ÷ (cycle time × average CPI) | ≈ 1 ÷ (cycle time) once the pipeline is full (lower when stalls occur) |
| Hardware cost | Simplest control logic (effectively combinational, with no state sequencing), but no functional unit can be reused within the single cycle, so the datapath needs duplicates (e.g. separate instruction and data memories, extra adders). | Reuses the same ALU, register file, and memory ports across steps ⇒ lower area. Control needs a multi-state FSM. | Extra pipeline registers between stages; forwarding and hazard-control logic; the most complex control logic of the three. |
| Latency per instruction | 1 cycle (but that cycle is long) | e.g. 3–5 cycles (but each cycle is short) | ≈ 5 cycles (but overlapped with other instructions) |
| Design complexity | Easiest to design and verify. | Moderate (you must sequence steps through FSM states). | Highest (must handle data/control hazards, branch penalties). |
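
The clock-period, CPI, and throughput rows can be checked with a small calculation. The sketch below is illustrative only: the stage latencies, the instruction mix, the per-class cycle counts, and the pipeline stall rate are all assumed values, not measurements from any real design. It computes the cycle time and CPI for each organization, then the average time per instruction as cycle time × CPI.

```python
# Hypothetical stage latencies in picoseconds (assumed for illustration).
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Hypothetical instruction mix:
# class -> (fraction of executed instructions, cycles on the multi-cycle design).
MIX = {
    "load": (0.25, 5),
    "store": (0.10, 4),
    "alu": (0.50, 4),
    "branch": (0.15, 3),
}

# Assumed average number of stall cycles per instruction in the pipeline.
PIPELINE_STALLS_PER_INSTR = 0.2


def single_cycle():
    """Cycle must cover every stage back to back; CPI is exactly 1."""
    return sum(STAGE_PS.values()), 1.0


def multi_cycle():
    """Cycle covers the slowest single step; CPI is the mix-weighted average."""
    cycle = max(STAGE_PS.values())
    cpi = sum(fraction * cycles for fraction, cycles in MIX.values())
    return cycle, cpi


def pipelined():
    """Same short cycle as multi-cycle; CPI is 1 plus the average stalls."""
    return max(STAGE_PS.values()), 1.0 + PIPELINE_STALLS_PER_INSTR


def time_per_instruction(cycle_ps, cpi):
    """Average time per instruction in picoseconds (inverse of throughput)."""
    return cycle_ps * cpi


if __name__ == "__main__":
    designs = [
        ("single-cycle", single_cycle),
        ("multi-cycle", multi_cycle),
        ("pipelined", pipelined),
    ]
    for name, design in designs:
        cycle, cpi = design()
        t = time_per_instruction(cycle, cpi)
        print(f"{name:12s} cycle={cycle} ps  CPI={cpi:.2f}  avg time/instr={t:.0f} ps")
```

With these made-up numbers, the single-cycle design averages 800 ps per instruction, the multi-cycle design about 820 ps, and the pipelined design 240 ps. The multi-cycle design is not automatically faster than single-cycle: its shorter clock only pays off when the instruction mix is dominated by instructions that need few steps.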