Monday, May 25, 2015

Assembly Simulator

Assembly Simulator

by Juan Pablo Vega

 

Problem Statement

 The Assembly Simulator is... actually an assembly emulator for an imaginary processor. A processor is an operation crunching unit consisting of an Arithmetic-Logic-Unit, a register to store comparisons, and has a n-byte addressable memory. An assembly language is low-level programming language designed for a specific processor and with a limited instruction set. In this problem, each instruction has a specific structure: 

[label] command operand[s]

The first element, [label], is an optional identifier for the specific line of the instruction in case of a program branch. The second element, command, is the instruction per se defining either an arithmetic or logical operation on specific operands. The third element, operand[s] is a set of operands with which the previous instruction must be executed.
The full set of instruction is given in the image below: 


The objective is to write a code that emulates a processor able to execute a short program based on the previous assembly language. The input is the (useless) size for the memory S and a list of ordered instructions I to execute. The output is the result of any PRINT command.

 

Hardware Architecture: Lists Everywhere


 Let's first define the architecture we want to emulate. As we already stated, there is a n-byte memory, a register and a program. And yes, lists are the easiest data structure in Python so we are going to use them to define our hardware. In addition to those three lists, we will add a variable index that points to the next instruction to be executed, called the program counter or PC.

memory   = [0 for k in range(256)]
register = [False for k in range(6)]
program  = list()
PC   = 0 
 
The elements of memory are the memory content, the elements of register are flags corresponding to different results of the comparison and the elements of program are the instructions of the program to be executed.

 

 

Fecth-Decode-Execute: A Short Loop


 Let's assume for a moment that we have loaded our program into program. We would have that each instruction is represented by a list of three elements: an optional label (replaced by a white space when not used), a string naming a specific command, and a string representing any valid set of operands.

 For instance:

loop PRINT A5,AF

 would be represented by

['loop', 'PRINT', 'A5,AF'].

Now, suppose that we define each and every command as a Python function that accepts a single list as argument. Said functions would be responsible to decode the content of the set of operands, based on the definition of the command they each represent, and then execute the command. We could then link each function to its name in string format with a hash-table to enable the command itself to be decoded.
As a result of the previous program structure, we would define the main loop a Fetch-Decode-Execute loop like this: 

while PC < len(program):
    instruction = program[PC]
    PC += 1
    ISA[instruction[1]](instruction[2])
 
where the first line establishes the loop, the second line corresponds to the Fetch stage, the third line is the increment of the PC, and the fourth line implements the Decode stage for the commands. ISA[i](j) are calls to command functions, which perform the Decode stage for the operands and then the Execute stage.

The hash-table ISA is defined below:
 
ISA = { 'PRINT' :Print, \
 'MOVE' :Move, \
 'ADD' :Add,  \
 'SUB' :Sub,  \
 'AND' :And,  \
 'OR' :Or,    \
 'XOR' :Xor,  \
 'COMP' :Comp,  \
 'BEQ' :Beq,  \
 'BNE' :Bne,   \
 'BGT' :Bgt,  \
 'BLT' :Blt,  \
 'BGE' :Bge,  \
 'BLE' :Ble  }

Instruction Set Architecture: Easy Parsing



 Before we travel through the meanders of each command function, we need to define two helper functions. We need to be able to going back and forth between string format and integer representation for hexadecimal values. Indeed, we need to decode those values from the operands, where they are written in string format, and use them either as constants or as memory addresses.

Python makes it easy to implement both functions as shown below:
def hextoint(x):
    return int(x,16)

def inttohex(x):
    return hex(x)[2:].upper()
 
We are ready to define the last big piece of the puzzle: the command functions. We will first define PRINT to introduce operands parsing and memory management, as well as the output to our program. Then, we'll move on to MOVE where we take operands parsing and memory management to another level. Finally, we'll define how COMP works and how branching instructions like BEQ operates. Extending The functions described below gives a full definition of the emulator.

 

Output Operation: PRINT


 Following the definition of the command PRINT, we first need to split the set of operands in at least one hexadecimal value. If there is only one operand, we only print the memory content at the location specified by the operand. If two operands are present, we print al memory content between the location specified by the first operand and the location specified by the second one, both included.

Notice the use of hextoint and inttohex and the way memory is addressed.

def Print(x):
    x = x.split(',')
 
    t = inttohex(memory[hextoint(x[0])])
    if len(x) > 1:
        for k in range(hextoint(x[0])+1, hextoint(x[1])+1):
            t = t + ' ' + inttohex(memory[k])
 
    print(t)
    return 

 

Sequential Operations: MOVE


 In PRINT, the operands where direct memory addresses. It is not the case for all commands. Following the definition of MOVE, we first need to split the set of operands in two hexadecimal values. We then classify each of them in one of three categories: constants, direct memory addresses and indirect memory addresses. And operand belongs to the first category if its first character is a # symbol, it belongs to the third category if it is delimited by ( and ), and it belongs to the second category when the value is let alone.
Notice the use of hextoint and inttohex and the way memory is addressed.

def Move(x):
    global memory
 
    x = x.split(',')
 
    if (x[0][0] == '#'):
        d = hextoint(x[0][1:])
    elif (x[0][0] == '('):
        d = memory[memory[hextoint(x[0][1:-1])]]
    else:
        d = memory[hextoint(x[0])]
    
    if (x[1][0] == '('):
        memory[memory[hextoint(x[1][1:-1])]] = d
    else:
        memory[hextoint(x[1])] = d   
    return

 

Branching Operations: COMP and BEQ


 The branching behaviour is split in three steps. First we compare two operands and store the result in our register. We then evaluate if a particular condition has occurred to enable the jump operation. Finally the PC is updated properly. 

The first part is similar to PRINT and MOCE and is defined in COMP. Following its definition, we first split the set of operands in two hexadecimal values. After proper parsing, we store the result of the comparison.
Notice the meaning of each element of register.

def Comp(x):
    global register
 
    x = x.split(',')
 
    if (x[0][0] == '#'):
        A1 = hextoint(x[0][1:])
    else:
        A1 = memory[hextoint(x[0])]
  
    A2 = memory[hextoint(x[1])]
 
    register = [A1 == A2, \
                A1 != A2, \
                A1 >  A2, \
                A1 <  A2, \
                A1 >= A2, \
                A1 <= A2]
    return 
 
The second part is as easy as it seems. It is a logical predication, followed by a condition function call. The index of register[i] that is tested changes depending of the condition, but jump is always the function called.
BEQ is one such predication, as shown below: 
def Beq(x):
    if register[0]:
        jump(x)
    return 
 
Given that the branching destination is always expressed as a label, we need a way to decode the operand from a specific label to a valid memory address. To do so we recall that the first element of each program line is the label. We then (arbitrarily) search for the last occurrence of the label given in the program, and use this line as the program counter update value.
Notice that this update is performed after the automatic update of the main loop.
def jump(x):
    global PC
 
    for k in range(len(program)):
        if program[k][0] == x:
            PC = k
    return

 

The Code


The complete code that solves the challenge is given below:
# Assembler Simulator, by Juan Pablo Vega
# IEEEXtreme 8.0 (2014)

def hextoint(x):
    return int(x,16)

def inttohex(x):
    return hex(x)[2:].upper()

def tobyte(x):
    return x&255

def Print(x):
    x = x.split(',')
 
    t = inttohex(memory[hextoint(x[0])])
    if len(x) > 1:
        for k in range(hextoint(x[0])+1, hextoint(x[1])+1):
            t = t + ' ' + inttohex(memory[k])
 
    print(t)
    return

def Move(x):
    global memory
 
    x = x.split(',')
 
    if (x[0][0] == '#'):
        d = hextoint(x[0][1:])
    elif (x[0][0] == '('):
        d = memory[memory[hextoint(x[0][1:-1])]]
    else:
        d = memory[hextoint(x[0])]
    
    if (x[1][0] == '('):
        memory[memory[hextoint(x[1][1:-1])]] = d
    else:
        memory[hextoint(x[1])] = d   
    return

def Add(x):
    global memory
    x = x.split(',')
    if (x[0][0] == '#'):
        A1 = hextoint(x[0][1:])
    else:
        A1 = memory[hextoint(x[0])]
    A2 = memory[hextoint(x[1])]
    memory[hextoint(x[1])] = tobyte(A2 + A1)
    return

def Sub(x):
    global memory
    x = x.split(',')
    if (x[0][0] == '#'):
        A1 = hextoint(x[0][1:])
    else:
        A1 = memory[hextoint(x[0])]
    A2 = memory[hextoint(x[1])]
    memory[hextoint(x[1])] = tobyte(tobyte(A2) + 256 - tobyte(A1))
    return
    
def And(x):
    global memory
    x = x.split(',')
    if (x[0][0] == '#'):
        A1 = hextoint(x[0][1:])
    else:
        A1 = memory[hextoint(x[0])]
    A2 = memory[hextoint(x[1])]
    memory[hextoint(x[1])] = tobyte(A2 & A1)
    return

def Or(x):
    global memory
    x = x.split(',')
    if (x[0][0] == '#'):
        A1 = hextoint(x[0][1:])
    else:
        A1 = memory[hextoint(x[0])]
    A2 = memory[hextoint(x[1])]
    memory[hextoint(x[1])] = tobyte(A2 | A1)
    return

def Xor(x):
    global memory
    x = x.split(',')
    if (x[0][0] == '#'):
        A1 = hextoint(x[0][1:])
    else:
        A1 = memory[hextoint(x[0])]
    A2 = memory[hextoint(x[1])]
    memory[hextoint(x[1])] = tobyte(A2 ^ A1)
    return
    
def Comp(x):
    global register
    x = x.split(',')
    if (x[0][0] == '#'):
        A1 = hextoint(x[0][1:])
    else:
        A1 = memory[hextoint(x[0])]
    A2 = memory[hextoint(x[1])]
    register = [A1 == A2, \
                A1 != A2, \
                A1 >  A2, \
                A1 <  A2, \
                A1 >= A2, \
                A1 <= A2]
    return

def Beq(x):
    if register[0]:
        jump(x)
    return
    
def Bne(x):
    if register[1]:
        jump(x)
    return

def Bgt(x):
    if register[2]:
        jump(x)
    return
    
def Blt(x):
    if register[3]:
        jump(x)
    return
    
def Bge(x):
    if register[4]:
        jump(x)
    return
    
def Ble(x):
    if register[5]:
        jump(x)
    return
    
def jump(x):
    global PC
    for k in range(len(program)):
        if program[k][0] == x:
            PC = k
    return
 
memory   = [0 for k in range(256)]
register = [False for k in range(6)]
program  = list()
PC   = 0 
ISA      = {'PRINT':Print, 'MOVE':Move, 'ADD':Add, \
            'SUB':Sub, 'AND':And, 'OR':Or, 'XOR':Xor, \
            'COMP':Comp, 'BEQ':Beq, 'BNE':Bne, \
            'BGT':Bgt, 'BLT':Blt, 'BGE':Bge, 'BLE':Ble}

input()

EOF = False; 
while not EOF:
    try:
        inputLine = input().split()
  
        if len(inputLine) == 2:
            line = list(' ')
            line.extend(inputLine)
        else:
            line = inputLine
        
        program.append(line) 
    except EOFError:
        EOF = True;

while PC < len(program):
    instruction = program[PC]
    PC += 1
    ISA[instruction[1]](instruction[2])  

 

Files


This drive folder contains:
  • The solution given in this article
  • Test cases
  • An alternative solution by Juan I Carrano

The Pipeline

The Pipeline

by Juan Pablo Vega

 

Problem Statement 


The Pipeline is searching problem. It is quite conventional. There is a grid map filled with cost values. There is a start and a goal. Nothing really complicated huh? Well, there are some particularities. First, the map is a square grid. Second, there is no one start nor one goal. The search can start from any row of the first column and finish at any row of the last column. Furthermore, one cannot move to the left, only up-down-right steps are allowed. Oh, did I mention that the map can get really huge and that you have limited time to find the optimal path? Fortunately, we are only asked for the cost of the optimal solution. 

 

 Think Twice Before You Start 


One might jump in and try to solve it with all the conventional tools one might have. BFS, DFS, A*, Dijkstra, and many more, are possible methods that could cross your mind. But implementing any of them would be in vain. Simply take a step back and think about the map. It has a known structure: it's a square. It has known dynamics: one can't go left. And most importantly, all the content (cost and constraints) are defined in a descriptive manner, and are not subject to any kind of variation in any way. 

 

Turn It Upside Down 

So, where do I start? From the beginning right? Wrong! Starting from there has many disadvantages. First, you don't know which row is the correct start point and you are going to pay the price of additional, useless computing. Second, and most important you are going to repeat many steps of your search. Ok, what if I start from some sort of optimal internal point? Well, for any internal point there will be a portion of the map that remains on the right, for which it will act as a starting point. And we are back to the disadvantages I mentioned above. Ok, ok, from the last column then. Exactly, but why? Well, if you start from the leaves, you might eliminated once and for all a dead branch. At every level (column) you choose to discard dead ending paths. And this process is incremental, meaning that your only use past pre-processed plus new non-processed information. 

 

 Compute Only Once: Dynamic Programming


That seems interesting right? It is called dynamic programming. It is a method for solving complex problems by first solving simpler sub-problems only once, and then combining all optimal sub-solutions to give the optimal global solution. For those who have never heard of it, stay put and appreciate the magic of it. 

 

 Let's Get Started


Let's assume that the grid map can be found in city and that cost is a copy of the grid map without any content. We will use cost as a structure containing the backward-incremental costs, i.e. the optimal sub-solutions. Let's cut the problem in half. The first sub-problem only permits us to move down-right, while the second, up-right. With these two new problems, the problem gets much easier. Finally, it's important to recognize that the last column of city already states the optimal sub-solutions. Indeed, for every column of the map we have reach the end, and there is no use of going up or down. Thus the last column of city can be copied to the last column of cost.

 

Solving down-right Upside Down


Let's say that we are at a given column j and that we have already found all sub-solutions for n-1, n, ..., j+1. This means that all the information we need to find the sub-solution at j can be found in the j+1th column of cost and in the jth column of city. What are the possible moves from the last row, n? Only right. We then update cost with this sub-solutions. When we analyse row n-1, we find that cost[n][j] and cost[n-1][j+1] are both optimal sub-solutions for the down-right problem. It is easy then to find the sub-solution for row n-1. Similarly, we can find the solutions for any row of column j up to the first.

 

Solving up-right Upside Down


We could follow the same procedure for this sub-problem. But we would soon find that there is a certain degree of overlap. Well, in down-right, we already considered the case where going down and going right were the optimal sub-solution. And the result was saved in cost. Now, in up-right we are considering going right again. It is of no use if the values of cost are shared among the sub-problems. That way, we should do as follows. From row 1 sequentially until row n-1 we would compare the possibility of going up against the optimal sub-solutions of the previous sub-problem.

 

From Right To Left


Repeating down-right and then up-right for every column, starting from column n-1 up until column 0 represents the process of combining optimal sub-solutions to fin the optimal global solution. The last step, once the first column have been processed, is to find the minimum accumulated cost, located somewhere in the first column of cost

 

 Sample Code


Here is a snippet of the search block of the program:
# The Pipeline, by Juan Pablo Vega | IEEEXtreme 8.0 (2014)
for j in range(n-2,-1,-1):
    for i in (range(n-1,-1,-1)):
        if i < (n-1):
            cost[i][j] = min(city[i][j]+cost[i][j+1], city[i][j]+cost[i+1][j])
        else:
            cost[i][j] = city[i][j]+cost[i][j+1]
    for i in range(n):
        if i > 0:
            cost[i][j] = min(cost[i][j], cost[i-1][j]+city[i][j])

 

Files


The solution is available at this drive folder.