Different types of core

| Feature | Single-Cycle | Multi-Cycle | Pipelined |
| --- | --- | --- | --- |
| Work per cycle | Every instruction (load, branch, ALU op, store, etc.) completes in exactly one clock cycle. | Each instruction is broken into multiple steps (IF, ID, EX, MEM, WB), with each step in its own cycle. | Also splits instructions into stages, but the stages of successive instructions overlap, so several instructions are in flight at once. |
| Clock period | Must be as long as the slowest instruction's complete path (e.g. a load: fetch, register read, ALU, data memory, register writeback) ⇒ very long cycle. | Shorter: the cycle only needs to accommodate the slowest single step (e.g. a memory access or an ALU operation) ⇒ higher clock frequency. | Roughly the same per-stage timing as multi-cycle (short cycle, plus a small pipeline-register overhead) ⇒ high clock frequency. |
| Cycles per instruction (CPI) | CPI = 1 for all instructions. | CPI > 1 (e.g. 3–5 cycles with the five steps above); simple instructions take fewer cycles than the worst case. | CPI ≈ 1 (ideally 1, but stalls from hazards push it above 1). |
| Throughput (instr/sec) | 1 ÷ (cycle time) | 1 ÷ (cycle time × average CPI) | ≈ 1 ÷ (cycle time) once the pipeline is full |
| Hardware cost | Simplest control logic (purely combinational, no state machine), but needs duplicated datapath hardware (e.g. separate instruction and data memories, extra adders) because no unit can be used twice in one cycle. | Reuses the same ALU, register file and memory port across cycles ⇒ lower area. Control is an FSM and therefore more complex. | Extra pipeline registers between stages; forwarding and hazard-control logic; more complex control logic. |
| Latency per instruction | 1 cycle (but that cycle is long) | e.g. 3–5 cycles (but each cycle is short) | ≈ 5 cycles for a 5-stage pipeline (but overlapped with other instructions) |
| Design complexity | Easiest to design and verify. | Moderate (you must sequence the steps through FSM states). | Highest (must handle data and control hazards and branch penalties). |
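
To make the clock-period, CPI and throughput rows concrete, here is a minimal Python sketch that plugs numbers into the formulas from the table. The step delays, the instruction mix and the average stall cost are all assumed values chosen for illustration, not measurements from a real core, and the function names are hypothetical. Pipeline-register overhead is ignored for simplicity.

```python
"""Compare average time per instruction for three core styles (illustrative numbers)."""

# Assumed delay of each step in picoseconds (hypothetical values).
STEP_DELAY_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Assumed instruction mix: name -> (fraction of instructions, cycles in a multi-cycle core).
INSTRUCTION_MIX = {
    "load": (0.25, 5),
    "store": (0.10, 4),
    "alu": (0.50, 4),
    "branch": (0.15, 3),
}


def single_cycle():
    """Return (cycle time in ps, CPI): the cycle spans the whole worst-case path."""
    return sum(STEP_DELAY_PS.values()), 1.0


def multi_cycle():
    """Return (cycle time in ps, CPI): the cycle fits the slowest step, CPI is a weighted average."""
    cpi = sum(fraction * cycles for fraction, cycles in INSTRUCTION_MIX.values())
    return max(STEP_DELAY_PS.values()), cpi


def pipelined(stall_cycles_per_instruction=0.2):
    """Return (cycle time in ps, CPI): ideal CPI of 1 plus an assumed average stall cost."""
    return max(STEP_DELAY_PS.values()), 1.0 + stall_cycles_per_instruction


def time_per_instruction_ps(cycle_ps, cpi):
    """Average time per instruction = cycle time x CPI."""
    return cycle_ps * cpi


def throughput_mips(cycle_ps, cpi):
    """Throughput = 1 / (cycle time x CPI), reported in millions of instructions per second."""
    seconds_per_instruction = cycle_ps * cpi * 1e-12
    return 1.0 / seconds_per_instruction / 1e6


if __name__ == "__main__":
    designs = {
        "single-cycle": single_cycle(),
        "multi-cycle": multi_cycle(),
        "pipelined": pipelined(),
    }
    for name, (cycle_ps, cpi) in designs.items():
        print(
            f"{name:>12}: cycle={cycle_ps} ps, CPI={cpi:.2f}, "
            f"time/instr={time_per_instruction_ps(cycle_ps, cpi):.0f} ps, "
            f"throughput={throughput_mips(cycle_ps, cpi):.0f} MIPS"
        )
```

With these assumed numbers the average time per instruction works out to 800 ps for the single-cycle core, about 820 ps for the multi-cycle core and about 240 ps for the pipelined core. The multi-cycle design is therefore not automatically faster than the single-cycle one: its shorter clock is offset by its higher CPI, whereas the pipelined design keeps the short clock while holding CPI close to 1.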