Assembly Simulator
by Juan Pablo Vega
Problem Statement
The Assembly Simulator is... actually an assembly emulator
for an imaginary processor.
A processor is an operation crunching unit consisting of
an Arithmetic-Logic-Unit, a register to store comparisons,
and has a n-byte addressable memory.
An assembly language is low-level programming language designed
for a specific processor and with a limited instruction set.
In this problem, each instruction has a specific structure:
[label] command operand[s]
The first element,
[label]
, is an optional identifier
for the specific line of the instruction in case of a program branch.
The second element, command
, is the instruction
per se defining either an arithmetic or logical operation
on specific operands.
The third element, operand[s]
is a set of operands with
which the previous instruction must be executed.
The full set of instruction is given in the image below:

The objective is to write a code that emulates a processor able to
execute a short program based on the previous assembly language.
The input is the (useless) size for the memory S and a list of
ordered instructions I to execute.
The output is the result of any
PRINT
command. Hardware Architecture: Lists Everywhere
Let's first define the architecture we want to emulate. As we already
stated, there is a n-byte memory, a register and a program. And yes,
lists are the easiest data structure in Python so we are going to use
them to define our hardware. In addition to those three lists, we will
add a variable index that points to the next instruction to be executed,
called the program counter or PC.
memory = [0 for k in range(256)] register = [False for k in range(6)] program = list() PC = 0
The elements of
memory
are the memory content, the elements of
register
are flags corresponding to different results of the
comparison and the elements of program
are the instructions of
the program to be executed.
Fecth-Decode-Execute: A Short Loop
Let's assume for a moment that we have loaded our program into
program
. We would have that each instruction is represented
by a list of three elements: an optional label (replaced by a white space
when not used), a string naming a specific command, and a string
representing any valid set of operands.
For instance:
loop PRINT A5,AF
would be represented by
['loop', 'PRINT', 'A5,AF']
.
Now, suppose that we define each and every command as a Python function
that accepts a single list as argument. Said functions would be responsible
to decode the content of the set of operands, based on the definition
of the command they each represent, and then execute the command.
We could then link each function to its name in string format with a
hash-table to enable the command itself to be decoded.
As a result of the previous program structure, we would define the main
loop a Fetch-Decode-Execute loop like this:
while PC < len(program): instruction = program[PC] PC += 1 ISA[instruction[1]](instruction[2])
where the first line establishes the loop, the second line corresponds
to the Fetch stage, the third line is the increment of the PC,
and the fourth line implements the Decode stage for the commands.
ISA[i](j)
are calls to command functions,
which perform the Decode stage for the operands and then the
Execute stage.
The hash-table
ISA
is defined below:
ISA = { 'PRINT' :Print, \ 'MOVE' :Move, \ 'ADD' :Add, \ 'SUB' :Sub, \ 'AND' :And, \ 'OR' :Or, \ 'XOR' :Xor, \ 'COMP' :Comp, \ 'BEQ' :Beq, \ 'BNE' :Bne, \ 'BGT' :Bgt, \ 'BLT' :Blt, \ 'BGE' :Bge, \ 'BLE' :Ble }
Instruction Set Architecture: Easy Parsing
Before we travel through the meanders of each command function,
we need to define two helper functions. We need to be able to going
back and forth between string format and integer representation for
hexadecimal values. Indeed, we need to decode those values from the
operands, where they are written in string format, and use them
either as constants or as memory addresses.
Python makes it easy to implement both functions as shown below:
def hextoint(x): return int(x,16) def inttohex(x): return hex(x)[2:].upper()
We are ready to define the last big piece of the puzzle:
the command functions. We will first define
PRINT
to
introduce operands parsing and memory management, as well as the
output to our program. Then, we'll move on to MOVE
where we take operands parsing and memory management to another
level. Finally, we'll define how COMP
works and how
branching instructions like BEQ
operates. Extending
The functions described below gives a full definition of the emulator.
Output Operation: PRINT
Following the definition of the command
PRINT
, we first
need to split the set of operands in at least one hexadecimal value.
If there is only one operand, we only print the memory content at
the location specified by the operand. If two operands are present,
we print al memory content between the location specified by the
first operand and the location specified by the second one, both
included.Notice the use of
hextoint
and inttohex
and
the way memory
is addressed.def Print(x): x = x.split(',') t = inttohex(memory[hextoint(x[0])]) if len(x) > 1: for k in range(hextoint(x[0])+1, hextoint(x[1])+1): t = t + ' ' + inttohex(memory[k]) print(t) return
Sequential Operations: MOVE
In
PRINT
, the operands where direct memory addresses.
It is not the case for all commands. Following the definition of
MOVE
, we first need to split the set of operands in
two hexadecimal values. We then classify each of them in one of
three categories: constants, direct memory addresses and indirect
memory addresses. And operand belongs to the first category if its
first character is a # symbol, it belongs to the third
category if it is delimited by ( and ), and it belongs
to the second category when the value is let alone.
Notice the use of
hextoint
and inttohex
and
the way memory
is addressed.
def Move(x): global memory x = x.split(',') if (x[0][0] == '#'): d = hextoint(x[0][1:]) elif (x[0][0] == '('): d = memory[memory[hextoint(x[0][1:-1])]] else: d = memory[hextoint(x[0])] if (x[1][0] == '('): memory[memory[hextoint(x[1][1:-1])]] = d else: memory[hextoint(x[1])] = d return
Branching Operations: COMP
and BEQ
The branching behaviour is split in three steps. First we compare
two operands and store the result in our register. We then evaluate
if a particular condition has occurred to enable the jump operation.
Finally the PC is updated properly.
The first part is similar to
PRINT
and MOCE
and is defined in COMP
. Following its definition, we first
split the set of operands in two hexadecimal values. After proper parsing,
we store the result of the comparison.
Notice the meaning of each element of
register
.
def Comp(x): global register x = x.split(',') if (x[0][0] == '#'): A1 = hextoint(x[0][1:]) else: A1 = memory[hextoint(x[0])] A2 = memory[hextoint(x[1])] register = [A1 == A2, \ A1 != A2, \ A1 > A2, \ A1 < A2, \ A1 >= A2, \ A1 <= A2] return
The second part is as easy as it seems. It is a logical predication,
followed by a condition function call. The index of
register[i]
that is tested changes depending of the
condition, but jump
is always the function called.
BEQ
is one such predication, as shown below: def Beq(x): if register[0]: jump(x) return
Given that the branching destination is always expressed as a label,
we need a way to decode the operand from a specific label to a valid
memory address. To do so we recall that the first element of each
program line is the label. We then (arbitrarily) search for the last
occurrence of the label given in the program, and use
this line as the
Notice that this update is performed after the automatic update of the main loop.
program counter
update value.
Notice that this update is performed after the automatic update of the main loop.
def jump(x): global PC for k in range(len(program)): if program[k][0] == x: PC = k return
The Code
The complete code that solves the challenge is given below:
# Assembler Simulator, by Juan Pablo Vega # IEEEXtreme 8.0 (2014) def hextoint(x): return int(x,16) def inttohex(x): return hex(x)[2:].upper() def tobyte(x): return x&255 def Print(x): x = x.split(',') t = inttohex(memory[hextoint(x[0])]) if len(x) > 1: for k in range(hextoint(x[0])+1, hextoint(x[1])+1): t = t + ' ' + inttohex(memory[k]) print(t) return def Move(x): global memory x = x.split(',') if (x[0][0] == '#'): d = hextoint(x[0][1:]) elif (x[0][0] == '('): d = memory[memory[hextoint(x[0][1:-1])]] else: d = memory[hextoint(x[0])] if (x[1][0] == '('): memory[memory[hextoint(x[1][1:-1])]] = d else: memory[hextoint(x[1])] = d return def Add(x): global memory x = x.split(',') if (x[0][0] == '#'): A1 = hextoint(x[0][1:]) else: A1 = memory[hextoint(x[0])] A2 = memory[hextoint(x[1])] memory[hextoint(x[1])] = tobyte(A2 + A1) return def Sub(x): global memory x = x.split(',') if (x[0][0] == '#'): A1 = hextoint(x[0][1:]) else: A1 = memory[hextoint(x[0])] A2 = memory[hextoint(x[1])] memory[hextoint(x[1])] = tobyte(tobyte(A2) + 256 - tobyte(A1)) return def And(x): global memory x = x.split(',') if (x[0][0] == '#'): A1 = hextoint(x[0][1:]) else: A1 = memory[hextoint(x[0])] A2 = memory[hextoint(x[1])] memory[hextoint(x[1])] = tobyte(A2 & A1) return def Or(x): global memory x = x.split(',') if (x[0][0] == '#'): A1 = hextoint(x[0][1:]) else: A1 = memory[hextoint(x[0])] A2 = memory[hextoint(x[1])] memory[hextoint(x[1])] = tobyte(A2 | A1) return def Xor(x): global memory x = x.split(',') if (x[0][0] == '#'): A1 = hextoint(x[0][1:]) else: A1 = memory[hextoint(x[0])] A2 = memory[hextoint(x[1])] memory[hextoint(x[1])] = tobyte(A2 ^ A1) return def Comp(x): global register x = x.split(',') if (x[0][0] == '#'): A1 = hextoint(x[0][1:]) else: A1 = memory[hextoint(x[0])] A2 = memory[hextoint(x[1])] register = [A1 == A2, \ A1 != A2, \ A1 > A2, \ A1 < A2, \ A1 >= A2, \ A1 <= A2] return def Beq(x): if register[0]: jump(x) return def Bne(x): if register[1]: jump(x) return def Bgt(x): if register[2]: jump(x) return def Blt(x): if register[3]: jump(x) return def Bge(x): if register[4]: jump(x) return def Ble(x): if register[5]: jump(x) return def jump(x): global PC for k in range(len(program)): if program[k][0] == x: PC = k return memory = [0 for k in range(256)] register = [False for k in range(6)] program = list() PC = 0 ISA = {'PRINT':Print, 'MOVE':Move, 'ADD':Add, \ 'SUB':Sub, 'AND':And, 'OR':Or, 'XOR':Xor, \ 'COMP':Comp, 'BEQ':Beq, 'BNE':Bne, \ 'BGT':Bgt, 'BLT':Blt, 'BGE':Bge, 'BLE':Ble} input() EOF = False; while not EOF: try: inputLine = input().split() if len(inputLine) == 2: line = list(' ') line.extend(inputLine) else: line = inputLine program.append(line) except EOFError: EOF = True; while PC < len(program): instruction = program[PC] PC += 1 ISA[instruction[1]](instruction[2])
Files
This drive folder contains:
- The solution given in this article
- Test cases
- An alternative solution by Juan I Carrano