Monday, May 25, 2015

Assembly Simulator

Assembly Simulator

by Juan Pablo Vega

 

Problem Statement

 The Assembly Simulator is... actually an assembly emulator for an imaginary processor. A processor is an operation crunching unit consisting of an Arithmetic-Logic-Unit, a register to store comparisons, and has a n-byte addressable memory. An assembly language is low-level programming language designed for a specific processor and with a limited instruction set. In this problem, each instruction has a specific structure: 

[label] command operand[s]

The first element, [label], is an optional identifier for the specific line of the instruction in case of a program branch. The second element, command, is the instruction per se defining either an arithmetic or logical operation on specific operands. The third element, operand[s] is a set of operands with which the previous instruction must be executed.
The full set of instruction is given in the image below: 


The objective is to write a code that emulates a processor able to execute a short program based on the previous assembly language. The input is the (useless) size for the memory S and a list of ordered instructions I to execute. The output is the result of any PRINT command.

 

Hardware Architecture: Lists Everywhere


 Let's first define the architecture we want to emulate. As we already stated, there is a n-byte memory, a register and a program. And yes, lists are the easiest data structure in Python so we are going to use them to define our hardware. In addition to those three lists, we will add a variable index that points to the next instruction to be executed, called the program counter or PC.

memory   = [0 for k in range(256)]
register = [False for k in range(6)]
program  = list()
PC   = 0 
 
The elements of memory are the memory content, the elements of register are flags corresponding to different results of the comparison and the elements of program are the instructions of the program to be executed.

 

 

Fecth-Decode-Execute: A Short Loop


 Let's assume for a moment that we have loaded our program into program. We would have that each instruction is represented by a list of three elements: an optional label (replaced by a white space when not used), a string naming a specific command, and a string representing any valid set of operands.

 For instance:

loop PRINT A5,AF

 would be represented by

['loop', 'PRINT', 'A5,AF'].

Now, suppose that we define each and every command as a Python function that accepts a single list as argument. Said functions would be responsible to decode the content of the set of operands, based on the definition of the command they each represent, and then execute the command. We could then link each function to its name in string format with a hash-table to enable the command itself to be decoded.
As a result of the previous program structure, we would define the main loop a Fetch-Decode-Execute loop like this: 

while PC < len(program):
    instruction = program[PC]
    PC += 1
    ISA[instruction[1]](instruction[2])
 
where the first line establishes the loop, the second line corresponds to the Fetch stage, the third line is the increment of the PC, and the fourth line implements the Decode stage for the commands. ISA[i](j) are calls to command functions, which perform the Decode stage for the operands and then the Execute stage.

The hash-table ISA is defined below:
 
ISA = { 'PRINT' :Print, \
 'MOVE' :Move, \
 'ADD' :Add,  \
 'SUB' :Sub,  \
 'AND' :And,  \
 'OR' :Or,    \
 'XOR' :Xor,  \
 'COMP' :Comp,  \
 'BEQ' :Beq,  \
 'BNE' :Bne,   \
 'BGT' :Bgt,  \
 'BLT' :Blt,  \
 'BGE' :Bge,  \
 'BLE' :Ble  }

Instruction Set Architecture: Easy Parsing



 Before we travel through the meanders of each command function, we need to define two helper functions. We need to be able to going back and forth between string format and integer representation for hexadecimal values. Indeed, we need to decode those values from the operands, where they are written in string format, and use them either as constants or as memory addresses.

Python makes it easy to implement both functions as shown below:
def hextoint(x):
    return int(x,16)

def inttohex(x):
    return hex(x)[2:].upper()
 
We are ready to define the last big piece of the puzzle: the command functions. We will first define PRINT to introduce operands parsing and memory management, as well as the output to our program. Then, we'll move on to MOVE where we take operands parsing and memory management to another level. Finally, we'll define how COMP works and how branching instructions like BEQ operates. Extending The functions described below gives a full definition of the emulator.

 

Output Operation: PRINT


 Following the definition of the command PRINT, we first need to split the set of operands in at least one hexadecimal value. If there is only one operand, we only print the memory content at the location specified by the operand. If two operands are present, we print al memory content between the location specified by the first operand and the location specified by the second one, both included.

Notice the use of hextoint and inttohex and the way memory is addressed.

def Print(x):
    x = x.split(',')
 
    t = inttohex(memory[hextoint(x[0])])
    if len(x) > 1:
        for k in range(hextoint(x[0])+1, hextoint(x[1])+1):
            t = t + ' ' + inttohex(memory[k])
 
    print(t)
    return 

 

Sequential Operations: MOVE


 In PRINT, the operands where direct memory addresses. It is not the case for all commands. Following the definition of MOVE, we first need to split the set of operands in two hexadecimal values. We then classify each of them in one of three categories: constants, direct memory addresses and indirect memory addresses. And operand belongs to the first category if its first character is a # symbol, it belongs to the third category if it is delimited by ( and ), and it belongs to the second category when the value is let alone.
Notice the use of hextoint and inttohex and the way memory is addressed.

def Move(x):
    global memory
 
    x = x.split(',')
 
    if (x[0][0] == '#'):
        d = hextoint(x[0][1:])
    elif (x[0][0] == '('):
        d = memory[memory[hextoint(x[0][1:-1])]]
    else:
        d = memory[hextoint(x[0])]
    
    if (x[1][0] == '('):
        memory[memory[hextoint(x[1][1:-1])]] = d
    else:
        memory[hextoint(x[1])] = d   
    return

 

Branching Operations: COMP and BEQ


 The branching behaviour is split in three steps. First we compare two operands and store the result in our register. We then evaluate if a particular condition has occurred to enable the jump operation. Finally the PC is updated properly. 

The first part is similar to PRINT and MOCE and is defined in COMP. Following its definition, we first split the set of operands in two hexadecimal values. After proper parsing, we store the result of the comparison.
Notice the meaning of each element of register.

def Comp(x):
    global register
 
    x = x.split(',')
 
    if (x[0][0] == '#'):
        A1 = hextoint(x[0][1:])
    else:
        A1 = memory[hextoint(x[0])]
  
    A2 = memory[hextoint(x[1])]
 
    register = [A1 == A2, \
                A1 != A2, \
                A1 >  A2, \
                A1 <  A2, \
                A1 >= A2, \
                A1 <= A2]
    return 
 
The second part is as easy as it seems. It is a logical predication, followed by a condition function call. The index of register[i] that is tested changes depending of the condition, but jump is always the function called.
BEQ is one such predication, as shown below: 
def Beq(x):
    if register[0]:
        jump(x)
    return 
 
Given that the branching destination is always expressed as a label, we need a way to decode the operand from a specific label to a valid memory address. To do so we recall that the first element of each program line is the label. We then (arbitrarily) search for the last occurrence of the label given in the program, and use this line as the program counter update value.
Notice that this update is performed after the automatic update of the main loop.
def jump(x):
    global PC
 
    for k in range(len(program)):
        if program[k][0] == x:
            PC = k
    return

 

The Code


The complete code that solves the challenge is given below:
# Assembler Simulator, by Juan Pablo Vega
# IEEEXtreme 8.0 (2014)

def hextoint(x):
    return int(x,16)

def inttohex(x):
    return hex(x)[2:].upper()

def tobyte(x):
    return x&255

def Print(x):
    x = x.split(',')
 
    t = inttohex(memory[hextoint(x[0])])
    if len(x) > 1:
        for k in range(hextoint(x[0])+1, hextoint(x[1])+1):
            t = t + ' ' + inttohex(memory[k])
 
    print(t)
    return

def Move(x):
    global memory
 
    x = x.split(',')
 
    if (x[0][0] == '#'):
        d = hextoint(x[0][1:])
    elif (x[0][0] == '('):
        d = memory[memory[hextoint(x[0][1:-1])]]
    else:
        d = memory[hextoint(x[0])]
    
    if (x[1][0] == '('):
        memory[memory[hextoint(x[1][1:-1])]] = d
    else:
        memory[hextoint(x[1])] = d   
    return

def Add(x):
    global memory
    x = x.split(',')
    if (x[0][0] == '#'):
        A1 = hextoint(x[0][1:])
    else:
        A1 = memory[hextoint(x[0])]
    A2 = memory[hextoint(x[1])]
    memory[hextoint(x[1])] = tobyte(A2 + A1)
    return

def Sub(x):
    global memory
    x = x.split(',')
    if (x[0][0] == '#'):
        A1 = hextoint(x[0][1:])
    else:
        A1 = memory[hextoint(x[0])]
    A2 = memory[hextoint(x[1])]
    memory[hextoint(x[1])] = tobyte(tobyte(A2) + 256 - tobyte(A1))
    return
    
def And(x):
    global memory
    x = x.split(',')
    if (x[0][0] == '#'):
        A1 = hextoint(x[0][1:])
    else:
        A1 = memory[hextoint(x[0])]
    A2 = memory[hextoint(x[1])]
    memory[hextoint(x[1])] = tobyte(A2 & A1)
    return

def Or(x):
    global memory
    x = x.split(',')
    if (x[0][0] == '#'):
        A1 = hextoint(x[0][1:])
    else:
        A1 = memory[hextoint(x[0])]
    A2 = memory[hextoint(x[1])]
    memory[hextoint(x[1])] = tobyte(A2 | A1)
    return

def Xor(x):
    global memory
    x = x.split(',')
    if (x[0][0] == '#'):
        A1 = hextoint(x[0][1:])
    else:
        A1 = memory[hextoint(x[0])]
    A2 = memory[hextoint(x[1])]
    memory[hextoint(x[1])] = tobyte(A2 ^ A1)
    return
    
def Comp(x):
    global register
    x = x.split(',')
    if (x[0][0] == '#'):
        A1 = hextoint(x[0][1:])
    else:
        A1 = memory[hextoint(x[0])]
    A2 = memory[hextoint(x[1])]
    register = [A1 == A2, \
                A1 != A2, \
                A1 >  A2, \
                A1 <  A2, \
                A1 >= A2, \
                A1 <= A2]
    return

def Beq(x):
    if register[0]:
        jump(x)
    return
    
def Bne(x):
    if register[1]:
        jump(x)
    return

def Bgt(x):
    if register[2]:
        jump(x)
    return
    
def Blt(x):
    if register[3]:
        jump(x)
    return
    
def Bge(x):
    if register[4]:
        jump(x)
    return
    
def Ble(x):
    if register[5]:
        jump(x)
    return
    
def jump(x):
    global PC
    for k in range(len(program)):
        if program[k][0] == x:
            PC = k
    return
 
memory   = [0 for k in range(256)]
register = [False for k in range(6)]
program  = list()
PC   = 0 
ISA      = {'PRINT':Print, 'MOVE':Move, 'ADD':Add, \
            'SUB':Sub, 'AND':And, 'OR':Or, 'XOR':Xor, \
            'COMP':Comp, 'BEQ':Beq, 'BNE':Bne, \
            'BGT':Bgt, 'BLT':Blt, 'BGE':Bge, 'BLE':Ble}

input()

EOF = False; 
while not EOF:
    try:
        inputLine = input().split()
  
        if len(inputLine) == 2:
            line = list(' ')
            line.extend(inputLine)
        else:
            line = inputLine
        
        program.append(line) 
    except EOFError:
        EOF = True;

while PC < len(program):
    instruction = program[PC]
    PC += 1
    ISA[instruction[1]](instruction[2])  

 

Files


This drive folder contains:
  • The solution given in this article
  • Test cases
  • An alternative solution by Juan I Carrano

No comments:

Post a Comment