# Viterbi Algorithm Advanced Architectures

Lecture 15 Vladimir Stojanović



#### 6.973 Communication System Design – Spring 2006 Massachusetts Institute of Technology

# Radix 2 ACS



Figure by MIT OpenCourseWare.

### Radix-2 trellis, 2-way ACS, and radix-2 ACS unit



Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].

### Radix-4 trellis



Figure by MIT OpenCourseWare.

#### 8-state radix-2 trellis, 4-state subtrellis, and 8-state radix-4 trellis

Radix-2<sup>k</sup> complexity and speed measures

| k | Ideal speedup | Complexity increase | Area efficiency |
|---|---------------|---------------------|-----------------|
| 1 | 1             | 1                   | 1               |
| 2 | 2             | 2                   | 1               |
| 3 | 3             | 4                   | 0.75            |
| 4 | 4             | 8                   | 0.5             |

Figure by MIT OpenCourseWare.
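The table rows can be reproduced with a few lines of Python. This is a minimal sketch that assumes the ideal speedup equals k, the complexity grows as 2^(k-1), and the area efficiency is their ratio; these formulas are inferred from the table values, and the function name `radix_measures` is hypothetical.

```python
def radix_measures(k):
    """Return (ideal speedup, complexity increase, area efficiency) for radix-2^k.

    Assumed laws (inferred from the table, not stated in the lecture):
    speedup = k, complexity = 2**(k-1), efficiency = speedup / complexity.
    """
    speedup = k                  # k trellis steps collapsed into one iteration
    complexity = 2 ** (k - 1)    # growth of ACS hardware relative to radix-2
    return speedup, complexity, speedup / complexity


for k in range(1, 5):
    print(k, *radix_measures(k))
```

Under these assumptions, radix-4 (k=2) is the largest radix that doubles throughput without losing area efficiency.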



### Radix-4 ACS



Panels: 4-way ACS, radix-4 ACS unit, radix-4 trellis

Figure by MIT OpenCourseWare.

# **Radix-4 ACS implementation**

- Use ripple carry adders and comparators
  - Take advantage of the ripple profile to hide the compare
  - Delay is 17% longer than that of the 2-way ACS due to
    - Increased adder fanout
    - 4:1 mux instead of 2:1 mux
  - Overall, results in a 1.7x speedup over the 2-way ACS (two trellis steps per iteration at 1.17x the delay: 2/1.17 ≈ 1.7)
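The relation between the 2-way and 4-way ACS can be sketched in software. The example below assumes a hypothetical 4-state shift-register trellis with random branch metrics (none of this comes from the Black and Meng design); it checks that one radix-4 step with a 4-way ACS gives the same state metrics as two radix-2 steps with 2-way ACS units.

```python
import random

N = 4  # states of a hypothetical 4-state (2-bit shift register) trellis


def preds(t):
    """Radix-2 predecessors of state t in a shift-register trellis."""
    return [t >> 1, (t >> 1) | (N >> 1)]


def acs(candidates):
    """Add-compare-select over (state_metric, branch_metric) pairs.

    Returns (surviving metric, index of the winning candidate).
    """
    sums = [sm + bm for sm, bm in candidates]
    best = min(range(len(sums)), key=sums.__getitem__)
    return sums[best], best


def radix2_step(metrics, bm):
    """One trellis step with a 2-way ACS; bm[(p, t)] is the metric of branch p -> t."""
    return [acs([(metrics[p], bm[(p, t)]) for p in preds(t)])[0] for t in range(N)]


def radix4_step(metrics, bm1, bm2):
    """Two trellis steps collapsed into one 4-way ACS per state."""
    out = []
    for t in range(N):
        cands = []
        for m in preds(t):        # intermediate state
            for p in preds(m):    # state two steps back
                # radix-4 branch metric = sum of the two radix-2 branch metrics
                cands.append((metrics[p], bm1[(p, m)] + bm2[(m, t)]))
        out.append(acs(cands)[0])
    return out


def random_bm():
    """Random branch metrics for every valid radix-2 transition."""
    return {(p, t): random.randint(0, 14) for t in range(N) for p in preds(t)}


random.seed(1)
metrics = [random.randint(0, 70) for _ in range(N)]
bm1, bm2 = random_bm(), random_bm()
print(radix2_step(radix2_step(metrics, bm1), bm2) == radix4_step(metrics, bm1, bm2))
```

The 4-way ACS needs four adders and a 4:1 select per state, which matches the fanout and mux costs listed above.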

Image removed due to copyright restrictions.



Figure from Black, P. J., and T. H. Meng. "A 140-Mb/s, 32-state, Radix-4 Viterbi Decoder." *IEEE Journal of Solid-State Circuits* 27 (1992): 1877-1885. Copyright 1992 IEEE. Used with permission.




# Radix-4 placement and routing

- Paths given as Hamiltonian cycles (each node in the graph is visited exactly once)

Images removed due to copyright restrictions.



# Modulo arithmetic for ACS

- The Viterbi algorithm inherently bounds the maximum dynamic range $\Delta_{max}$ of the state metrics
  - $\Delta_{max} \le \lambda_{max} \log_2 N$, where N is the number of states and $\lambda_{max}$ is the maximum branch metric of the radix-2 trellis

- Number theory
  - Given two numbers a and b such that $|a-b| < \Delta$
  - The comparison of a and b (the sign of a-b) can be evaluated from (a-b) mod 2∆ without ambiguity
- Hence state metrics can be updated and compared modulo $2\Delta_{max}$
  - Choose the state metric precision so that the modulo is implemented by simply ignoring state metric overflow
    - The required state metric range is twice the maximum dynamic range of the updated state metrics
    - The required number of bits is $\Gamma_{bits} = \lceil \log_2(\Delta_{max} + k\lambda_{max}) \rceil + 1$
      - k accounts for the branch metric addition
    - Example values (for the 32-state radix-4 decoder)
      - k=2, $\lambda_{max}=14$, $\Delta_{max}=70$, $\Gamma_{bits}=8$
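The bit-width formula and the modulo comparison can be checked numerically. The sketch below assumes the metrics are stored as unsigned integers that wrap on overflow, and that the comparison is done by looking at the sign bit of the wrapped difference; `metric_bits` and `mod_less_than` are hypothetical helper names, not part of any decoder described in the lecture.

```python
import math


def metric_bits(delta_max, k, lam_max):
    """Gamma_bits = ceil(log2(delta_max + k * lam_max)) + 1."""
    return math.ceil(math.log2(delta_max + k * lam_max)) + 1


def mod_less_than(a, b, bits):
    """Return True if a < b for metrics stored modulo 2**bits.

    Valid only when the true (unwrapped) values satisfy |a - b| < 2**(bits - 1).
    """
    diff = (a - b) & ((1 << bits) - 1)   # wrapped difference
    return diff >= (1 << (bits - 1))     # sign bit set means a - b is negative


bits = metric_bits(delta_max=70, k=2, lam_max=14)
mask = (1 << bits) - 1
a, b = 250, 260                          # true metrics; b overflows 8 bits
print(bits, mod_less_than(a & mask, b & mask, bits))  # prints: 8 True
```

Here b wraps to 4 in 8-bit storage, yet the comparison still reports a < b, so no explicit metric rescaling is needed.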



# Branch metric unit

### Example: 8-level soft input, R=1/2, K=6 (32 states)

- $\lambda(S_1S_2)=|G_1-S_1|+|G_2-S_2|$, where G is the received sample and S is the expected sample ($\lambda_{max}=14$)
  - 4 bits required for the radix-2 branch metrics
  - 5 bits for the radix-4 branch metrics (sum of two radix-2 metrics, at most 28)
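As a quick check of the bit widths, the branch metric can be evaluated over all inputs. This sketch assumes the 8 soft levels are the integers 0 to 7 and that code bits 0 and 1 map to expected samples 0 and 7; that mapping is an assumption chosen to be consistent with $\lambda_{max}=14$.

```python
from itertools import product

LEVELS = 8                         # 3-bit soft input, values 0..7
EXPECTED = {0: 0, 1: LEVELS - 1}   # assumed mapping: code bit -> expected sample


def branch_metric(g1, g2, c1, c2):
    """lambda = |G1 - S1| + |G2 - S2| for received (g1, g2) and code bits (c1, c2)."""
    return abs(g1 - EXPECTED[c1]) + abs(g2 - EXPECTED[c2])


lam_max = max(
    branch_metric(g1, g2, c1, c2)
    for g1, g2 in product(range(LEVELS), repeat=2)
    for c1, c2 in product((0, 1), repeat=2)
)
radix2_bits = lam_max.bit_length()        # bits needed for one radix-2 branch metric
radix4_bits = (2 * lam_max).bit_length()  # radix-4 metric is a sum of two radix-2 metrics
print(lam_max, radix2_bits, radix4_bits)  # prints: 14 4 5
```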

Images removed due to copyright restrictions.



# State-metric initialization

- The decoder must start from valid state metrics for the dynamic range bound to hold (and for modulo arithmetic to be valid)
- This is because the trellis structure imposes constraints on the state metric values
  - For example, state-0 and state-1 have a common ancestor state one iteration back
    - This constrains their state metrics to differ by at most $\lambda_{max}$
- Find valid initial metrics through simulation (with all-zero inputs) until steady state is reached
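The simulation step can be illustrated on a small code. The sketch below makes several assumptions that are not in the lecture: a 4-state K=3 code with generators (7, 5) instead of the 32-state decoder, soft samples 0 and 7 for code bits 0 and 1, and "all-zero inputs" read as noise-free reception of the all-zero codeword. It iterates the ACS recursion from all-zero metrics until the metrics stop changing.

```python
N = 4                   # states of a hypothetical K=3 code (not the 32-state decoder)
G1, G2 = 0b111, 0b101   # assumed generator polynomials (7, 5)
FULL_SCALE = 7          # 8-level soft input: expected samples are 0 or 7


def parity(x):
    return bin(x).count("1") & 1


def preds(t):
    """Radix-2 predecessors of state t in a shift-register trellis."""
    return [t >> 1, (t >> 1) | (N >> 1)]


def branch_metric(p, t, g1=0, g2=0):
    """Metric of branch p -> t for received samples (g1, g2)."""
    reg = (p << 1) | (t & 1)   # shift register: state bits plus the new input bit
    s1 = FULL_SCALE * parity(reg & G1)
    s2 = FULL_SCALE * parity(reg & G2)
    return abs(g1 - s1) + abs(g2 - s2)


def steady_state_metrics(max_iters=100):
    """Run the ACS recursion on all-zero received samples until it settles."""
    metrics = [0] * N
    for _ in range(max_iters):
        new = [min(metrics[p] + branch_metric(p, t) for p in preds(t)) for t in range(N)]
        if new == metrics:
            return metrics
        metrics = new
    raise RuntimeError("state metrics did not settle")


init = steady_state_metrics()
print(init, max(init) - min(init))
```

For this toy code the spread of the settled metrics stays within $\lambda_{max} \log_2 N = 28$, so they are a valid starting point for modulo arithmetic.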

Figure removed due to copyright restriction.



# Decoder block diagram



Figure by MIT OpenCourseWare.




# **Decision memory organization**

#### Survivor paths always merge L=5K steps back



Figure by MIT OpenCourseWare.



### Advanced algorithmic transformations

- Sliding block Viterbi decoder (SBVD)
  - Based on two important observations (µ+1 = constraint length of the code)
    1. Survivor paths merge L=5µ iterations back into the trellis
    2. After K=5µ steps, the state metrics are independent of their initial values
- The unknown state at time n can be decoded using only information from the block [n-K, n+L]
- Cannot store all the values in memory
  - Have to obtain them "on-the-fly"



# SBVD implementation

The shortest path can be found by running the recursion forward or backward

- At step m
  - Forward processing
    - 4 survivors
  - Backward processing
    - 4 shortest paths
  - Combined (concatenated)
    - Smallest concatenated state metric
    - Starting state for trace-back of the shortest path
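The combination at step m can be sketched as follows. The example assumes a hypothetical 4-state shift-register trellis with random branch metrics and unknown (all-zero) metrics at both block ends; the function names are illustrative and are not taken from the Black and Meng design. Forward and backward metrics are added per state, and the state with the smallest concatenated metric lies on the shortest path through the block.

```python
import random

N = 4  # states of a hypothetical 4-state trellis


def preds(t):
    """States with a branch into t."""
    return [t >> 1, (t >> 1) | (N >> 1)]


def succs(s):
    """States reachable from s in one step."""
    base = (s << 1) & (N - 1)
    return [base, base | 1]


def forward(bms):
    """fwd[n][t]: cost of the best path reaching state t after n steps."""
    fwd = [[0] * N]                       # unknown start state: all metrics zero
    for bm in bms:
        prev = fwd[-1]
        fwd.append([min(prev[p] + bm[(p, t)] for p in preds(t)) for t in range(N)])
    return fwd


def backward(bms):
    """bwd[n][s]: cost of the best path from state s at step n to the block end."""
    bwd = [[0] * N]                       # unknown end state: all metrics zero
    for bm in reversed(bms):
        nxt = bwd[0]
        bwd.insert(0, [min(bm[(s, t)] + nxt[t] for t in succs(s)) for s in range(N)])
    return bwd


def best_state_at(fwd, bwd, m):
    """State at step m with the smallest concatenated metric, and that metric."""
    combined = [f + b for f, b in zip(fwd[m], bwd[m])]
    cost = min(combined)
    return combined.index(cost), cost


random.seed(3)
bms = [{(p, t): random.randint(0, 14) for t in range(N) for p in preds(t)}
       for _ in range(12)]
fwd, bwd = forward(bms), backward(bms)
print(best_state_at(fwd, bwd, 6))
```

The smallest concatenated metric is the same at every step m and equals the cost of the shortest path through the whole block, which is why the selected state is a safe starting point for trace-back.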

Figure from Black, P. J., and T. Y. Meng. "A 1-Gb/s, Four-state, Sliding Block Viterbi Decoder." IEEE Journal of Solid-State Circuits 32 (1997): 797-805. Copyright 1992 IEEE. Used with permission.

### Continue forward and backward operation




Figure panels: Forward, Backward

### Forward vs. Forward-Backward



Figure from Black, P. J., and T. Y. Meng. "A 1-Gb/s, Four-state, Sliding Block Viterbi Decoder." *IEEE Journal of Solid-State Circuits* 32 (1997): 797-805. Copyright 1992 IEEE. Used with permission.

Can decode more than one state (M states)

#### Forward-backward processing reduces decoding delay and skew-buffer memory




## Continuous stream processing



Figure from Black, P. J., and T. Y. Meng. "A 1-Gb/s, Four-state, Sliding Block Viterbi Decoder." *IEEE Journal of Solid-State Circuits* 32 (1997): 797-805. Copyright 1992 IEEE. Used with permission.

- Cut the incoming stream into overlapping chunks
- Process the chunks in parallel
- Outputs are non-overlapping
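The chunking can be written down as index arithmetic. In this sketch the parameter names `out_len`, `warmup`, and `tail` are hypothetical: `warmup` and `tail` stand for the extra samples each block needs before and after its decoded region (the roles played by K and L in the block [n-K, n+L]), and each chunk could then be handed to an independent decoder.

```python
def overlapping_chunks(n_samples, out_len, warmup, tail):
    """Yield (in_start, in_end, out_start, out_end) index ranges, ends exclusive.

    Each chunk decodes out_len samples but reads warmup extra samples before
    and tail extra samples after them, clipped to the stream boundaries.
    """
    for out_start in range(0, n_samples, out_len):
        out_end = min(out_start + out_len, n_samples)
        in_start = max(0, out_start - warmup)
        in_end = min(n_samples, out_end + tail)
        yield in_start, in_end, out_start, out_end


for chunk in overlapping_chunks(n_samples=100, out_len=16, warmup=10, tail=10):
    print(chunk)
```

The input windows of neighbouring chunks overlap by `warmup + tail` samples, while the output ranges tile the stream exactly once.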




# Systolic SBVD architecture



Figure from Black, P. J., and T. Y. Meng. "A 1-Gb/s, Four-state, Sliding Block Viterbi Decoder." *IEEE Journal of Solid-State Circuits* 32 (1997): 797-805. Copyright 1992 IEEE. Used with permission.


# Example for L=2



Figure from Black, P. J., and T. Y. Meng. "A 1-Gb/s, Four-state, Sliding Block Viterbi Decoder." *IEEE Journal of Solid-State Circuits* 32 (1997): 797-805. Copyright 1992 IEEE. Used with permission.




# ACS units



Figures from Black, P. J., and T. Y. Meng. "A 1-Gb/s, Four-state, Sliding Block Viterbi Decoder." *IEEE Journal of Solid-State Circuits* 32 (1997): 797-805. Copyright 1992 IEEE. Used with permission.




# A 64-state example

- Not fully parallel (8 radix-4 ACS units)
  - 2 radix-4 butterflies in each cycle
  - 8 cycles to update all 64 states by one radix-4 step (i.e., two radix-2 steps)
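The cycle count follows from simple bookkeeping, assuming each radix-4 butterfly updates 4 states using four 4-way ACS units (an assumption that is consistent with the numbers above but not stated on the slide):

$$
\frac{8\ \text{ACS units}}{4\ \text{ACS units per butterfly}} = 2\ \text{butterflies per cycle}, \qquad \frac{64\ \text{states}}{4\ \text{states per butterfly} \times 2\ \text{butterflies per cycle}} = 8\ \text{cycles}
$$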

Images removed due to copyright restrictions.




# Readings

- [1] A.P. Hekstra, "An alternative to metric rescaling in Viterbi decoders," *IEEE Transactions on Communications*, vol. 37, no. 11, pp. 1220-1222, 1989.
- [2] P.J. Black and T.H. Meng, "A 140-Mb/s, 32-state, radix-4 Viterbi decoder," *IEEE Journal of Solid-State Circuits*, vol. 27, no. 12, pp. 1877-1885, 1992.
- [3] P.J. Black and T.Y. Meng, "A 1-Gb/s, four-state, sliding block Viterbi decoder," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 6, pp. 797-805, 1997.
- [4] M. Anders, S. Mathew, R. Krishnamurthy and S. Borkar, "A 64-state 2GHz 500Mbps 40mW Viterbi accelerator in 90nm CMOS," *2004 Symposium on VLSI Circuits, Digest of Technical Papers*, pp. 174-175, 2004.

