Before we start, please note that these are my solutions, not the solutions.

Chapter 1

For future reference: the chapter specifies what is available on the pad (the registers) and the available instructions; I rely on both below without reproducing the lists here.

NB: In what follows, I am a bit more liberal with the instructions. I sometimes shift $X$ instead of $A$, etc.

Problem 1.1

(a)

Yes, we can get some efficiency gains up to a certain point.

Since the computational model and the concept of time complexity are not (yet?) defined formally, my assumption is that it takes a constant amount of work to process one card, which is compatible with the instruction set/capabilities discussed in the chapter. (Let’s say processing one card takes one second for any clerk.)

Suppose there are $n$ salesmen and their cards are numbered (and filed in “locations”) $[0, \ldots, n-1]$. To illustrate the method, notice that if two clerks divide the locations equally, each finds the total California sales in his half and writes the number on some prespecified card (probably together with an extra boolean variable recording whether it has been updated already), and then either one of the two clerks or a third one adds up the two numbers from the cards, we more or less cut the time in half. (We could let the clerk who finishes his own half early, or who covers the range with the lower set of indices, do the final sum, but that makes the instructions more complicated, and since we don’t seem to be concerned with the number of clerks, I am going to be wasteful.)

We can generalize this further. For now, suppose $n = 2^{K}$ for some positive integer $K$. Then $n$ clerks can each check one card and report the total California sales from that card, $\frac{n}{2}$ clerks can aggregate the sums from the previous round, and so on until a single clerk adds up the last two partial sums. This way, we need at most $n + \frac{n}{2} + \cdots + 1 = 2n - 1$ clerks, and since each round takes a constant amount of work, the total time is proportional to the number of rounds, $\log_2 n = K$. Of course, when $n$ is not a power of 2, we round up and accept some waste.

This is possible because in this problem, we could process the subproblems independently of each other, and then aggregate the solutions of the subproblems to find the final answer.
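To make the idea concrete, here is a minimal Python sketch (the names are mine, not from the book; each entry of `cards` stands for the California sales figure on one card, and each pass through the loop stands for one round in which all clerks work simultaneously):

```python
def tree_sum(cards):
    """Sum a list of numbers in O(log n) parallel rounds.

    Each iteration of the while-loop is one "round": every pair of
    adjacent partial sums is combined by a separate clerk, so all the
    additions within a round could happen simultaneously.
    """
    sums = list(cards)
    while len(sums) > 1:
        if len(sums) % 2:
            sums.append(0)  # pad with 0 so every value has a partner
        sums = [sums[i] + sums[i + 1] for i in range(0, len(sums), 2)]
    return sums[0]

print(tree_sum([3, 1, 4, 1, 5, 9, 2, 6]))  # 31
```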

(b)

This breaks down whenever every step needs the result of the previous step.

Suppose we are on a scavenger hunt in a large library. The location of the trophy I am seeking is written in some book $k$, but to find book $k$, I have to find book $k-1$ first, so that it can tell me where to go next. (I’ll assume brute force is impossible.) Then the only way forward is to start from the initial clue, find the first book, get a hint, find the second book, and so on until the end.

(My first thought was generating the Fibonacci sequence, since its nature is also sequential but I then read that since its evolution is regular, we can essentially represent it as a system of difference equations and apply matrix operations, which allow for some parallelization. Then, I thought about the “prefix sum” problem but apparently there are ways to parallelize this as well…)
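(For the record, a quick sketch of that repeated-squaring trick, using the identity $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n = \begin{pmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{pmatrix}$:)

```python
def fib(n):
    """n-th Fibonacci number via fast exponentiation of [[1,1],[1,0]]."""
    def mat_mul(P, Q):
        return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]
    result = [[1, 0], [0, 1]]          # identity matrix
    base = [[1, 1], [1, 0]]
    while n:                            # O(log n) squarings
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result[0][1]                 # [[F(n+1), F(n)], [F(n), F(n-1)]]

print([fib(i) for i in range(8)])       # [0, 1, 1, 2, 3, 5, 8, 13]
```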

(c)

We could essentially use the same idea as above. We start with $n$ clerks who are assigned one bit each to compare. Let’s say a clerk returns 1 if there is no difference and 0 if the bits are different. So, essentially, clerk $i$ returns the truth value of $A_i = B_i$¹, if we let $A, B$ stand for the bit sequences of the two numbers. We now have a single binary sequence of length $n$, and if it contains even a single 0, we want to say that the numbers are different.

Since the time complexity seems to be the main concern, I’ll ignore the details of the implementation.

Intuitively, we can now get $\frac{n}{2}$² clerks to consolidate two of the results each (by applying $\land$ to them). Then $\frac{n}{4}$ clerks can each “merge” two results from the previous round in the same manner, and so on until we get a single number. If the final number is 0, we conclude that the numbers are different.

This way, we need $\log_2 n$ rounds, since this is how many times we need to “merge” the results to reach the final conclusion, and $2n - 1$ clerks, assuming we ask each of them to perform in only one round (or we could just use $n$ clerks and give them slightly more complicated instructions).
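The sketch from part (a) carries over almost verbatim, with $\land$ in place of $+$ (again, names of my own choosing):

```python
def tree_equal(A, B):
    """Compare two bit sequences in O(log n) parallel rounds."""
    results = [int(a == b) for a, b in zip(A, B)]  # round 0: one clerk per bit
    while len(results) > 1:
        if len(results) % 2:
            results.append(1)                      # pad with 1, the identity for AND
        results = [results[i] & results[i + 1]
                   for i in range(0, len(results), 2)]
    return results[0]                              # 1 iff the numbers are equal

print(tree_equal([1, 0, 1, 1], [1, 0, 1, 1]))      # 1
print(tree_equal([1, 0, 1, 1], [1, 0, 0, 1]))      # 0
```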

(d)

For adding two $n$-bit numbers in $O(\log n)$ time, as Feynman hints, the main challenge turned out to be managing the carry. While we can get $n$ clerks to compute the local sums $A_i + B_i$ in parallel (each producing a sum bit and a potential carry), the carries create sequential dependencies that naively take $O(n)$ time to resolve.

After playing with it a bit, I could tell that we needed to do some extra work in each step; something like reporting both “sum if no incoming carry” and “sum if there is an incoming carry”. At some point, I fell victim to my curiosity and read about the carry-lookahead adder. (I may revisit this in the future!)
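For completeness, here is the gist of what I read, as a Python sketch. Each position $i$ generates a carry if $A_i \land B_i$ and propagates an incoming carry if $A_i \lor B_i$. I write the resulting recurrence as a plain sequential loop; the point of the carry-lookahead construction is that this recurrence is associative, so it can be evaluated as a parallel prefix computation in $O(\log n)$ depth:

```python
def carries(A, B):
    """Carry bits for A + B (bit lists, least significant bit first).

    c[i+1] = g[i] OR (p[i] AND c[i]), where g = generate, p = propagate.
    """
    g = [a & b for a, b in zip(A, B)]   # position generates a carry
    p = [a | b for a, b in zip(A, B)]   # position propagates a carry
    c = [0]                             # no carry into the lowest bit
    for i in range(len(A)):
        c.append(g[i] | (p[i] & c[i]))
    return c

print(carries([1, 1, 0, 1], [1, 0, 1, 1]))  # [0, 1, 1, 1, 1]
```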

Problem 1.2

Part 1: Basic Instructions for Multiplication

Luckily, we have everything needed to implement multiplication. We can essentially use the algorithm outlined in the text itself.

To multiply $X \times Y$, we repeatedly add $X$ to itself $Y$ times, but we can optimize this significantly. If we think of $Y$ in binary, we only need to add $X \times 2^i$ for each bit position $i$ where $Y$ has a 1. The shift operation gives us the powers of 2 for free.

More precisely (using a scratch location $D$ to hold the running total³):
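  1. $D \leftarrow 0$
  2. If $Y = 0$, stop; the product is in $D$
  3. If the lowest bit of $Y$ is 1: $D \leftarrow D + X$
  4. Shift $X$ left by one and shift $Y$ right by one
  5. Jump back to step 2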

Part 2: Subroutine Implementation

This is where we get to use the JUMP command. The main issue seems to be that we need to remember where to return once we are done with the multiplication.

As suggested, we store our multiplication algorithm in memory locations $m$ through $m+k$ (which certainly needs to be written in a less hand-wavy way than I did above…)

I expect something like the following to work well:

  1. Before jumping, we save our return address: $X \leftarrow (PC)$. (Strictly, we want the address of the instruction after the jump; whether that is $(PC)$, $(PC)+1$, or $(PC)+2$ at this moment depends on when the machine increments its PC.)
  2. Jump to the multiplication routine: $PC \leftarrow m$
  3. The multiplication routine does its work using locations $m$ through $m+k-1$
  4. At location $m+k$, return to the caller: $PC \leftarrow (X)$

Alternatively, we may not want to take up the precious memory locations on the pad. We could instead note where we are before the multiplication on a card in the “filing cabinet”, and instruct the “clerk” to check there and copy the number there into its PC.
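To convince myself the linkage works, here is a toy simulation in Python. The encoding is entirely made up (none of this is from the book); `X` plays the role of the return-address register, and `add` stands in for the routine’s actual work:

```python
def run(program):
    """Toy machine: each 'instruction' is a function mutating the state dict."""
    state = {"PC": 0, "X": 0, "ACC": 0}
    while 0 <= state["PC"] < len(program):
        instr = program[state["PC"]]
        state["PC"] += 1              # PC now points past the current instruction
        instr(state)
    return state

def call(target):                     # steps 1 + 2: save return address, jump
    def instr(state):
        state["X"] = state["PC"]      # X <- (PC), the instruction after the jump
        state["PC"] = target          # PC <- m
    return instr

def ret(state):                       # step 4: PC <- (X)
    state["PC"] = state["X"]

def add(n):                           # stand-in for the routine's actual work
    def instr(state):
        state["ACC"] += n
    return instr

def halt(state):
    state["PC"] = -1

program = [add(1), call(4), add(100), halt,   # addresses 0..3: the caller
           add(10), ret]                      # addresses 4..5: the subroutine
print(run(program)["ACC"])                    # 111: we returned to address 2
```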

Chapter 2

Problem 2.1: 8 to 3 Encoder

Notice that we have 8 inputs and exactly one of them will be “on”, and we are allowed 3 binary outputs. (In the solution, I will assume that exactly one input is “on” and I will not check this.) Luckily for us, $8 \leq 2^3$; in fact $8 = 2^3$. So, we can simply number the input wires from 0 to 7 and output the number of the wire that is on.

One output can be 0 if the input is even (i.e., “the number of the input wire which is on” is even), and 1 otherwise. Another output could be 0 if the input is 0, 1, 4 or 5 and 1 otherwise (i.e., the remainder of the input from division by 4 is 0 or 1). Finally, the third output could be 0 if the input is less than or equal to 3 and 1 otherwise. Together, these three outputs pin down the exact input for every possible input.

Let’s represent the input wires with $w_0, w_1, \ldots, w_7$. We define them such that $w_i$ is 0 if that input is off and 1 otherwise. Then, we want:
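$O_0 = w_1 \lor w_3 \lor w_5 \lor w_7$

$O_1 = w_2 \lor w_3 \lor w_6 \lor w_7$

$O_2 = w_4 \lor w_5 \lor w_6 \lor w_7$

Here $O_0$ is the parity output, $O_1$ is the “remainder from division by 4” output, and $O_2$ is the “greater than 3” output described above.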

8-to-3 Binary Encoder Circuit

Circuit diagram showing how 8 input lines (w₀ through w₇) are encoded into 3 binary output lines using OR gates. Each OR gate implements the logical OR of the inputs corresponding to the bit positions in the binary representation of the input number.

Problem 2.2: Half Adder

I think what Feynman means by a “simple adder” is a half adder, meaning an adder that takes two bits and returns the sum and the carry.

Let’s say we have two input bits, $A$ and $B$. In this case, we need $A \oplus B$ for the sum and $A \land B$ for the carry. The carry is a single AND gate, but XOR is not among our gates; we are supposed to use only AND, OR, and NOT. It might make sense to get double use out of $A \land B$, since we need to compute it anyway. So, we can get $A \oplus B$ using $\neg(A \land B) \land \neg(\neg A \land \neg B)$.

Given $A$ and $B$, we compute $(A \land B)$ and $(\neg A \land \neg B)$. We then output $(A \land B)$ directly as the carry, while also negating both of these intermediate results, applying AND to the two negations, and returning the outcome as the sum.

Half Adder Circuit

Implementation of a half adder using AND, OR, and NOT gates only.
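A quick sanity check in Python, restricted to AND and NOT on bits (writing $\neg x$ as `1 - x`):

```python
def half_adder(a, b):
    """Half adder from AND/NOT only, per the construction above."""
    carry = a & b                        # A AND B
    both_off = (1 - a) & (1 - b)         # (NOT A) AND (NOT B)
    s = (1 - carry) & (1 - both_off)     # XOR built from NOT and AND
    return s, carry

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", half_adder(a, b))
# 0 0 -> (0, 0); 0 1 -> (1, 0); 1 0 -> (1, 0); 1 1 -> (0, 1)
```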

Problem 2.3: Full Adder

This time, we also have a “carry-in”. Conceptually, it doesn’t make a big difference; we are adding three 1-bit numbers instead of two. However, since our operations are defined in a binary way, we need to do a bit more work. Thanks to the commutativity and associativity of addition, we could do this in any number of ways. Since there are three inputs, though, the process will be asymmetric (at least if we try to be more efficient). I think this asymmetry could be used to increase efficiency, but we don’t have the right model for that right now.⁴

One thing that is somewhat obvious is that we can use a half adder to add any two of the inputs and get intermediate SUM and CARRY outputs. Then, if the third input is 0, we could simply return these results, whereas if it is 1, we would have to update them. I think a direct approach is cleaner and not much more work, so I’ll attempt that here.

For the sum bit, we want to return 1 if an odd number of the inputs are 1 and 0 otherwise. So, we could do something like $(A \oplus B) \oplus C$⁵. This gives us the sum. For the carry, we want it to be 1 if at least two of the inputs are 1 and 0 otherwise. If two of the inputs are 1, then their AND is 1. So, we could take the pairwise ANDs and put all of these through OR. The end result could be something like $((A \land B) \lor (A \land C)) \lor (B \land C)$⁶.

Full Adder Circuit

Implementation of a full adder that adds three 1-bit inputs (A, B, and carry-in C) to produce sum and carry outputs.
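A quick check in Python, with the two formulas above:

```python
def full_adder(a, b, c):
    """Add three 1-bit inputs; return (sum, carry)."""
    s = (a ^ b) ^ c                            # 1 iff an odd number of 1s
    carry = ((a & b) | (a & c)) | (b & c)      # 1 iff at least two 1s
    return s, carry

# exhaustive check against ordinary integer addition
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, cy = full_adder(a, b, c)
            assert 2 * cy + s == a + b + c
```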

Problem 2.4: Ripple-Carry Adder

Here, I assume we are still working with binary addition, as opposed to summing up any number of $r$-bit numbers. I’ll also abstract the 1-bit full adder away as its own black box, similar to the simpler operations like AND, OR, etc. from above.

So, we need to add two $r$-bit binary numbers. Let’s call them $A = A_{r-1}A_{r-2}\ldots A_1A_0$ and $B = B_{r-1}B_{r-2}\ldots B_1B_0$. Essentially, we have $2\times r$ input bits and $r + 1$ output bits. So we want to chain $r$ full adders together, where each full adder handles one bit position:

  1. Bit 0 (rightmost bit): Full adder takes $A_0$, $B_0$, and $C_{-1} := 0$
  2. Bit 1: Full adder takes $A_1$, $B_1$, and the carry output from bit 0 ($C_0$)
  3. Bit i: Full adder takes $A_i$, $B_i$, and the carry output from bit i-1 ($C_{i-1}$)
  4. Bit r-1 (leftmost bit): Full adder takes $A_{r-1}$, $B_{r-1}$, and the carry output from bit r-2 ($C_{r-2}$)

As usual, the carry will propagate from right to left:
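$(S_i, C_i) = \text{FullAdder}(A_i, B_i, C_{i-1}), \qquad i = 0, 1, \ldots, r-1,$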

where $C_{r-1}$ becomes the final carry output $C_{out}$.

r-bit Ripple-Carry Adder

Circuit diagram showing how r 1-bit full adders are chained together to create an r-bit adder.
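Chaining the `full_adder` from the previous problem, with bits given least-significant first:

```python
def ripple_carry_add(A, B):
    """Add two r-bit numbers given as bit lists, least significant bit first."""
    carry = 0                          # C_{-1} := 0
    S = []
    for a, b in zip(A, B):             # the carry ripples from bit 0 upward
        s, carry = full_adder(a, b, carry)
        S.append(s)
    return S + [carry]                 # r sum bits plus the final carry C_out

print(ripple_carry_add([1, 1, 0, 1], [1, 0, 1, 1]))  # [0, 0, 0, 1, 1], i.e., 24
```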

Chapter 3

Problem 3.1

I don’t understand what he expects in this problem. It seems I am not alone in not being able to follow his thinking. I can find several ways to solve it but they either require violating his model of FSM or using a different number of states than 4. Curious to know how others interpreted this.

Leslie: IDKY (I don’t know yet), FNIZ (first number is zero), 0, 1

Problem 3.2

I am not sure how to interpret this problem. Do we take two inputs at once and output both of them at the same time? If so, we don’t really need states; we could simply have a single state and, whatever the input is, return it as the output (staying in the same state, of course). However, since Feynman says that we will need 4 states, I suppose we are going to process the input sequentially. I am still not sure whether we are supposed to return the output all at once (after both of the inputs are processed) or one by one. However, if it is one by one, we again don’t need 4 states. So, I suppose what he wants to say is: “process the inputs sequentially but output the results all together”.

The issue is that, in his model, we haven’t really seen an option to not output anything while making a transition. Moreover, it is also not clear whether we are allowed to emit several outputs in one transition. However, since none of the other interpretations make sense given his hint about needing 4 states, I have the following solution. (It is entirely possible that I misinterpreted the problem…)

I constructed an FSM with 4 states: $Q = \{00, 01, 10, 11\}$. The transitions can be described as below:

Finite State Machine for Problem 3.2

State diagram showing a 2-bit shift register FSM. The machine stores the last 2 inputs in its state and outputs the input from 2 time steps ago. State AB represents storing input A from 2 steps ago and input B from 1 step ago.

Note that the first two outputs will be $0$, because that’s what we used as our initial state.
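A small simulation of this machine, with the state written as a pair (input from 2 steps ago, input from 1 step ago):

```python
def delay_two(inputs):
    """2-bit shift register FSM: outputs each input two time steps late."""
    state = (0, 0)              # initial state "00"
    outputs = []
    for x in inputs:
        old, recent = state
        outputs.append(old)     # emit the input from 2 steps ago
        state = (recent, x)     # shift the new input in
    return outputs

print(delay_two([1, 0, 1, 1, 0]))  # [0, 0, 1, 0, 1]
```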

Problem 3.3

This time, I constructed an FSM with 2 states, representing the current carry bit: $Q = \{0, 1\}$. The transitions can be described as below:

Finite State Machine for Problem 3.3

State diagram showing a serial adder. States represent the current carry bit.
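And a simulation of the serial adder, assuming the bit pairs arrive least-significant bit first:

```python
def serial_add(A, B):
    """Serial adder FSM: the state is the carry; one sum bit per input pair."""
    carry = 0                                        # initial state
    outputs = []
    for a, b in zip(A, B):
        outputs.append((a ^ b) ^ carry)              # output on the transition
        carry = (a & b) | (a & carry) | (b & carry)  # next state
    # append the leftover carry (strictly, outside the FSM's per-step output)
    return outputs + [carry]

print(serial_add([1, 1, 0, 1], [1, 0, 1, 1]))  # [0, 0, 0, 1, 1]
```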

Problem 3.4

I’ll probably need to spend some time on this problem in some airports. As with all the other problems, I am struggling with the statement of the problem.⁷


  1. We don’t really have an instruction for this, but of course we can build it up using $\neg (A_i \oplus B_i)$. We don’t have $\neg$ either, but for that, we can essentially use the following as a new instruction: $A \leftarrow A \oplus 1$. So, the entire formula would be $(A_i \oplus B_i) \oplus 1$. ↩︎

  2. I won’t bother with the ceiling, since we are dealing with asymptotics, but obviously, whenever $n$ is not an integer power of 2, we technically need it. ↩︎

  3. I use location $D$ to represent a memory address distinct from the other registers, mostly for convenience, since using $X$ for $Y$’s location would be confusing. I don’t think it makes sense to use $X$ location either because sometimes, $X$ would have to hold a number other than $X$. ↩︎

  4. If we know anything about the statistical distribution of inputs from each stream $A$, $B$, and $C$, we could use this to our advantage to be slightly more efficient. For instance, if $C$ will be 0 a lot more often than the other two, this could be taken into account to make the overall process faster on average. However, we are clearly not given any such information, nor do we have a precise way of measuring the speed of each particular approach. So I’ll ignore this for now. ↩︎

  5. I think we are allowed to use XOR in this problem. If not, we already know how to create it from the previous problem. ↩︎

  6. So far, all the operations we considered are both associative and commutative, so we technically don’t care about the order of these operations in any way. However, I am using the parentheses to indicate the diagram that I will try to draw below, since our operations are still binary. ↩︎

  7. Wikipedia has a short history of the efforts to improve the solutions to the problem but it sounds like a fun problem to work on so I wouldn’t spoil it by reading the solution. ↩︎