Different types of core

Feature	Single-Cycle	Multi-Cycle	Pipelined
Work per cycle	Every instruction (load, branch, ALU op, store, etc.) completes in exactly one clock cycle.	Each instruction is broken into multiple steps (IF, ID, EX, MEM, WB), with each step in its own cycle.	Also splits instructions into stages, but these stages are overlapped across instructions.
Clock period	Must be as long as the slowest instruction's complete path (e.g. a load: instruction fetch, register read, ALU, data memory, write-back). ⇒ very long cycle.	Shorter: the cycle only needs to accommodate the slowest single step (e.g. a memory access or an ALU operation). ⇒ higher clock frequency.	Roughly the same per-stage timing as multi-cycle (short cycle), plus a small pipeline-register overhead ⇒ high clock frequency.
Cycles Per Instruction (CPI)	CPI = 1 for all instructions.	CPI > 1 (e.g. 4–6 cycles), but simple instructions take fewer cycles than worst case.	CPI ≈ 1 (ideally 1, but stalls/hazards can bump it to >1)
Throughput (instr/sec)	1 ÷ (cycle time), where the cycle is long	1 ÷ (cycle time × average CPI)	≈ 1 ÷ (cycle time) once the pipeline is full, reduced by any stall cycles
Hardware cost	Simplest control logic (purely combinational, no FSM), but no functional unit can be used twice in one cycle, so it needs separate instruction and data memories and dedicated adders (e.g. for PC + 4 and the branch target).	Reuses the same ALU and memory across the steps of an instruction ⇒ lower area. Needs a control FSM and extra registers to hold intermediate values between steps.	Extra pipeline registers between stages; forwarding and hazard-control logic; the most complex control.
Latency per inst.	1 cycle (but that cycle is long)	Several cycles, e.g. 4–6 (but each cycle is short)	≈ 5 cycles for a five-stage pipeline (but overlapped with other instructions)
Design complexity	Easiest to design and verify.	Moderate (you must sequence steps through FSM states).	Highest (must handle data/control hazards, branch penalties).
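
The clock-period, CPI and throughput rows can be put together in a small worked example. This is only a sketch: the stage delays, the instruction mix, the per-class step counts and the stall rate are all made-up illustrative numbers, not figures from these notes or from any real core, and pipeline-register overhead is ignored.

```python
# Hypothetical per-stage delays in picoseconds (illustrative, not measured).
STAGE_DELAY_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Hypothetical instruction mix (fractions sum to 1) and the number of
# steps each instruction class needs on the multi-cycle core (assumed).
MIX = {"load": 0.25, "store": 0.10, "alu": 0.50, "branch": 0.15}
MULTI_CYCLE_STEPS = {"load": 5, "store": 4, "alu": 4, "branch": 3}

# Assumed average stall cycles per instruction in the pipelined core.
PIPELINE_STALLS_PER_INSTR = 0.2


def single_cycle():
    """Return (cycle time in ps, CPI) for the single-cycle core."""
    # The clock must cover the slowest instruction's whole path (all stages).
    return sum(STAGE_DELAY_PS.values()), 1.0


def multi_cycle():
    """Return (cycle time in ps, average CPI) for the multi-cycle core."""
    # The clock only has to cover the slowest single step.
    cycle_ps = max(STAGE_DELAY_PS.values())
    cpi = sum(MIX[k] * MULTI_CYCLE_STEPS[k] for k in MIX)
    return cycle_ps, cpi


def pipelined():
    """Return (cycle time in ps, effective CPI) for the pipelined core."""
    cycle_ps = max(STAGE_DELAY_PS.values())
    return cycle_ps, 1.0 + PIPELINE_STALLS_PER_INSTR


def time_per_instruction_ps(cycle_ps, cpi):
    """Average time between completed instructions, in ps."""
    return cycle_ps * cpi


def throughput_mips(cycle_ps, cpi):
    """Throughput in millions of instructions per second."""
    # instr/sec = 1e12 / (cycle_ps * cpi); divide by 1e6 to get MIPS.
    return 1e6 / (cycle_ps * cpi)


if __name__ == "__main__":
    for name, core in [("single-cycle", single_cycle),
                       ("multi-cycle", multi_cycle),
                       ("pipelined", pipelined)]:
        cycle_ps, cpi = core()
        print(f"{name:12s} cycle={cycle_ps} ps  CPI={cpi:.2f}  "
              f"time/instr={time_per_instruction_ps(cycle_ps, cpi):.0f} ps  "
              f"throughput={throughput_mips(cycle_ps, cpi):.0f} MIPS")
```

With these assumed numbers the single-cycle core needs 800 ps per instruction, the multi-cycle core about 820 ps (200 ps × CPI 4.1), and the pipelined core 240 ps (200 ps × CPI 1.2). The multi-cycle core does not beat the single-cycle one here because the stages are unbalanced: every step pays for the slowest stage. The pipelined core wins on throughput even though a single instruction still takes 5 × 200 ps = 1000 ps from fetch to write-back.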

Simple RISC-V core diagram