Monday, January 18, 2016

Inside the armv1 - decoding barrel-shifter commands

This is one of a number of posts on my work on reverse engineering the armv1 processor.  The first post in the series, along with an index of the other articles, can be found here.

Today I'm going to solve a puzzle I have been pondering for some time - how the processor implements instructions that reference 4 registers.

If we look at the data processing (DP) instruction format in more detail, we see that there are the following instruction types:

If we set Bit 25 to zero (Operand 2 in a register), and Bit 4 to one (Shift amount in a register) we get the following layout (made from copying/pasting portions of the image above):

This layout plainly shows that this single instruction references 4 registers simultaneously - Rd (the destination register), Rn (one of the ALU operands), Rm (the second ALU operand), and Rs (which gives the amount by which Rm is first shifted).
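To make the four register fields concrete, here is a minimal Python sketch that pulls them out of a 32-bit instruction word. The function name is my own, and the bit positions are those of the standard ARM Data Processing encoding (bit 25 is the immediate flag, Rn in b19..b16, Rd in b15..b12, Rs in b11..b8, Rm in b3..b0); it is an illustration of the layout, not a model of the decoder on the chip.

```python
def decode_dp_register_shift(instr):
    """Split a DP instruction with a register-specified shift into its fields.

    Assumes the standard ARM encoding: bit 25 = 0 (Operand 2 in a register)
    and bit 4 = 1 (shift amount in a register). Other forms are rejected.
    """
    operand2_is_immediate = (instr >> 25) & 1
    shift_in_register = (instr >> 4) & 1
    if operand2_is_immediate or not shift_in_register:
        raise ValueError("not a register-specified-shift DP instruction")
    return {
        "Rn": (instr >> 16) & 0xF,         # first ALU operand
        "Rd": (instr >> 12) & 0xF,         # destination register
        "Rs": (instr >> 8) & 0xF,          # register holding the shift amount
        "shift_type": (instr >> 5) & 0x3,  # 0=LSL, 1=LSR, 2=ASR, 3=ROR
        "Rm": instr & 0xF,                 # second ALU operand (shifted first)
    }


# ADD R0, R1, R2, LSL R3 assembles to 0xE0810312
fields = decode_dp_register_shift(0xE0810312)
```

All four register numbers come out of a single instruction word, which is exactly the puzzle: there are only three sets of register select logic to serve them.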

However various architecture descriptions of the arm's Data Processing instructions only refer to there being two input operands and one output register. This matches the descriptions of two data read buses (referred to as "Bus A" and "Bus B" or "read bus A" and "read bus B"), and also matches there being just 3 sets of register select logic (which was explored in detail in an earlier post). So how does the processor execute an instruction which references 4 registers?

A clue to solving this mystery was in my last post, where I analysed how the instruction decoder works. By referencing the first table in that post we see that the instruction decoder treats as a special case all Data Processing instructions where the shift amount is in a register. These instructions execute in 2 cycles, one more cycle than all other Data Processing instructions. It would be safe to bet that the first cycle extracts the shift amount from the Rs register and holds the value somewhere, and that the second cycle actually carries out the ALU operation.

Let's now move to the processor itself to verify that our guess is correct. The area of the chip that we're interested in is highlighted in red below. Ken Shirriff has already given an overview of how the barrel shifter works here:

If we zoom in a little further, we see that there are two distinct sections in this area:

The lower area generates the column drive signals to the barrel shifter itself. The shift amount and shift type arrive via signals originating in the upper "Barrel Shifter Decode Selection" logic. We won't look at the driver logic in any further detail in this post. Instead we turn to the upper section, and start by zooming in some more:

The layout of this upper section is spectacular in that all the inputs and outputs to the logic are very readily apparent, and are marked on the diagram:

  • The I-Bus inputs b11..b5 correspond to the Shift Amount (b11..b7) and Shift Type (b6..b5) that we saw in the instruction layout we looked at above. 
  • The outputs on the RHS - Shift Amount (5 bits) and Shift Type (2 bits) are the signals that feed the Barrel Shifter Driver Logic that we saw earlier. 
  • The 3x outputs from the PLA (nodes 8287, 8288, 8286) that enter the area from above correspond to columns 2, 3, 4 of the PLA output table I included in my last blog.
  • Two signals derived from the lowest two address outputs lines enter the area from above and to the right.
  • 4x signals associated with Carry processing enter/exit from the lower edge.
The main logic areas are also apparent:

We've found the 8-bit wide dynamic latch that stores the register-sourced shift-amount from the Read Bus!

The other key logic is the group of seven 6-way multiplexers, whose outputs feed via the seven drivers to the rest of the barrel-shifter logic.

The 3-to-8 decoder is driven from the 3x PLA outputs (nodes 8287, 8288, 8286) and 6 of its outputs select which of the 6-way multiplexer inputs is chosen. A further output controls the Dynamic latch, and the final 3-to-8 decoder output is not implemented.
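As a rough illustration of how the decoder outputs are divided up, here is a small Python sketch. The function names and the dictionary keys are hypothetical. Output 7 driving the latch follows the description later in this post; treating output 6 as the unimplemented one is my inference from "110" being the only code that is never discussed.

```python
def decode_3_to_8(pla_bits):
    """Return the 8 one-hot output lines for a 3-bit select code (0..7)."""
    if not 0 <= pla_bits <= 7:
        raise ValueError("select code must fit in 3 bits")
    return [int(i == pla_bits) for i in range(8)]


def control_lines(pla_bits):
    """Group the decoder outputs by what they drive (assumed assignment)."""
    lines = decode_3_to_8(pla_bits)
    return {
        "mux_select": lines[0:6],  # outputs 0..5: one per multiplexer input
        "unused": lines[6],        # assumed to be the unimplemented output
        "latch_enable": lines[7],  # output 7: load the dynamic latch
    }


latch_cycle = control_lines(0b111)
immediate_shift = control_lines(0b001)
```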

The remaining logic that is not highlighted in the diagram is complex and convoluted; its task is to ensure that the correct shift results occur, even when a shift amount greater than 32 is selected. This includes ensuring that the Carry bit is set appropriately and that the sign bit is extended in the correct manner. The rules are described on page 2-34 of the VTI arm databook (1990). I won't dwell further on this part of the circuit.

The dynamic latch circuitry and associated latch processing is straightforward:

The data on the Read Bus is latched during phase 1 of the clock only when output 7 of the 3-to-8 decoder has been selected (i.e. all 3x inputs from the PLA are high). The latched data (or zero) is then available on one of the 6 inputs to the multiplexer. Zero is selected depending on the complex logic referred to earlier.

The output driver circuitry for each of the 7 signals is just two inverters in series.

The multiplexer circuitry, and decoding circuitry, is identical in form to the read bus input multiplexer I described in my earlier blog on register selection, and won't be repeated here.

Just the following "glue logic" circuits generate additional inputs to the multiplexer:

Note how the Shift Type in I-Reg b6 and I-Reg b5 are potentially adjusted, dependent on complex logic, in a similar manner to how the shift-amount described earlier might be adjusted.

The table below summarises the multiplexer's operation, and lists what the 7x outputs to the Barrel Shift Driver Logic are for each of the 8 combinations of the 3 signals from the PLA.

If we compare these PLA values, and the barrel-shifter outputs, with the values in the PLA output table from my previous article, things start to make sense.

Let's take row 1 of the PLA table as the first example. This row decodes a Data Processing instruction where Operand 2 is a register and the shift amount is immediate (i.e. the amount is in the instruction). The PLA values fed into the table above are "001" (row 1). This selects b11..b5 of the DP instruction as the Shift Amount and Shift Type sent to the Barrel Shifter Driver Logic, exactly as we would expect.

Now let's examine an instruction that has the shift amount in a register - the situation I began this blog with. We see from the PLA table that this instruction type takes two cycles to execute (rows 2 and 3). The PLA values fed into the table above on the first cycle are "111" (row 7), which is a command to latch the content of the register which is present on the read bus. The PLA values on the second cycle are "000" (row 0), which feed the (possibly modified) values of the latch (for the shift amount) and the instruction's Shift Type to the Barrel Shifter Driver Logic.
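The two-cycle sequence can be sketched in Python as follows. This is a behavioural model under stated assumptions, not a transcription of the circuit: the class and function names are my own, the shift results for amounts of 32 or more follow the documented ARM rules for register-specified shifts, the Carry output is ignored, and I assume Rm is the value on the read bus during the second cycle.

```python
MASK32 = 0xFFFFFFFF
LSL, LSR, ASR, ROR = 0, 1, 2, 3  # Shift Type encodings from b6..b5


def barrel_shift(value, shift_type, amount):
    """32-bit shift result for a register-specified amount (carry ignored)."""
    value &= MASK32
    if amount == 0:
        return value
    if shift_type == LSL:
        return (value << amount) & MASK32 if amount < 32 else 0
    if shift_type == LSR:
        return value >> amount if amount < 32 else 0
    if shift_type == ASR:
        sign = value >> 31
        if amount >= 32:
            return MASK32 if sign else 0
        fill = (MASK32 << (32 - amount)) & MASK32 if sign else 0
        return (value >> amount) | fill
    amount %= 32  # ROR wraps around every 32 places
    return ((value >> amount) | (value << (32 - amount))) & MASK32


class ShifterControl:
    """Hypothetical model of the latch plus the "111"/"000" select codes."""

    def __init__(self):
        self.latch = 0

    def cycle(self, pla_bits, read_bus, instr):
        if pla_bits == 0b111:              # first cycle: capture Rs
            self.latch = read_bus & 0xFF   # the latch is 8 bits wide
            return None
        if pla_bits == 0b000:              # second cycle: shift by the latch
            shift_type = (instr >> 5) & 0x3
            return barrel_shift(read_bus, shift_type, self.latch)
        raise NotImplementedError("other select codes are not modelled here")


# ADD R0, R1, R2, LSL R3 with R1 = 0x100, R2 = 1, R3 = 4
regs = [0] * 16
regs[1], regs[2], regs[3] = 0x100, 0x1, 4
instr = 0xE0810312
ctl = ShifterControl()
ctl.cycle(0b111, regs[3], instr)            # cycle 1: latch the shift amount
shifted = ctl.cycle(0b000, regs[2], instr)  # cycle 2: shift Rm
regs[0] = (regs[1] + shifted) & MASK32      # ALU operation on the result
```

At no point are more than two registers read in the same cycle, which is how four register references fit onto two read buses.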

We can now deal with the remaining PLA input types:

  • "010" - this appears in a number of lines in the PLA output table where the result of the ALU is not used, and ensures the Carry bit, etc. are not inadvertently affected. It can be regarded as a no-op.
  • "011" - appears only on row 27 of the PLA output table, and corresponds to the first instruction cycle of a Branch or Branch and Link instruction. In these cases the immediate value in the instruction is a word address, and the Barrel Shifter is instructed to shift this value left by two to convert it to a correct memory address.
  • "100" - appears only on row 4 of the PLA output table, and corresponds to a DP instruction where operand 2 is immediate. In this case the value to be rotated is in the lowest 8 bits of the instruction, and is to be rotated right by twice the amount in b11..b8 of the instruction (see the instruction format at the beginning of the blog). The "glue" logic chooses a Shift Type of "11" (Rotate Right) when the shift amount is non-zero or "00" (LSL) when a shift of 0 is selected.
  • "101" - appears on row 7 and row 12 of the PLA output table. These both correspond to the last cycle of a LDR (Load Register from memory) instruction. Here, the Shift Amount is x8 the value appearing on Address lines a0, a1. These address lines are only non-zero if a byte access has occurred, so this means that the 8 bits of data that have just been read from memory are rotated into their correct position before being output from the barrel-shifter.
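The remaining select codes can be summarised in a short Python sketch. The function names are my own, and several details are assumptions rather than things read off the chip: that the "010" no-op presents LSL by 0, that the "101" byte rotate uses Rotate Right, and that it falls back to LSL for a zero amount in the same way as "100". Codes "000" and "111" involve the latch and are left out.

```python
MASK32 = 0xFFFFFFFF
LSL, ROR = 0b00, 0b11  # Shift Type encodings


def ror32(value, amount):
    """Rotate a 32-bit value right by amount places."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def shifter_control(pla_bits, instr=0, addr_low=0):
    """Return (shift_amount, shift_type) for the non-latch select codes."""
    if pla_bits == 0b001:   # DP, shift amount immediate: b11..b7 and b6..b5
        return (instr >> 7) & 0x1F, (instr >> 5) & 0x3
    if pla_bits == 0b010:   # no-op (assumed here to be LSL by 0)
        return 0, LSL
    if pla_bits == 0b011:   # branch: word offset to byte address
        return 2, LSL
    if pla_bits == 0b100:   # DP immediate operand: rotate by 2 x b11..b8
        amount = 2 * ((instr >> 8) & 0xF)
        return amount, (ROR if amount else LSL)
    if pla_bits == 0b101:   # LDR byte: rotate by 8 x (a1, a0); type assumed
        amount = 8 * (addr_low & 0x3)
        return amount, (ROR if amount else LSL)
    raise ValueError("select code not covered by this sketch")


# MOV R0, #0xFF00 assembles to 0xE3A00CFF (imm8 = 0xFF, rotate field = 12)
amount, shift_type = shifter_control(0b100, instr=0xE3A00CFF)
immediate_operand = ror32(0xE3A00CFF & 0xFF, amount)

# Byte load from an address ending in 01: the byte arrives in bits 15..8
byte_amount, _ = shifter_control(0b101, addr_low=0b01)
loaded_byte = ror32(0x0000AB00, byte_amount) & 0xFF
```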
I ignored the reserved/undocumented instruction in the analysis above. However, even though our reverse-engineering of the chip is far from complete, from the information available we already know quite a lot about variant 1 of the reserved instruction:

  • Cycle 0 (row 15): selects "111" to save a register value in the barrel-shifter latch.
  • Cycle 1 (row 16): selects "000", which shifts the number now on the read bus by the amount in the latch.
  • Cycle 2 (row 17): selects "001", which performs a further shift, by the amount and type specified in the instruction.
  • Cycle 3 (row 18): selects "101", which corresponds to the byte rotate operation associated with a LDR instruction.
From this set of steps we can make a guarded guess that this instruction is loading data from memory, and that, since it takes one more cycle than a standard LDR instruction, the address is calculated using the content of one register as a shift amount, in addition to the typical LDR address calculation. Our being able to accurately predict what the reserved instructions do will be a good test of the accuracy of the reverse-engineering!


We've now reverse-engineered the main logic associated with controlling the barrel-shifter. There remains some logic still to reverse-engineer for us to fully understand all the edge cases, but fortunately this logic is very isolated and does not detract from our wider understanding of how the barrel-shifter works and is controlled. We have also found that the barrel-shifter is put to extensive use for a variety of tasks, not just for the Data Processing (DP) instructions. We also have garnered some "teaser" information about one of the reserved instructions.

