Run Me - IEEE Xplore problem
The Problem
This problem is hard. Very hard. In fact, we couldn't solve it during the competition, but we didn't give up and finally we were able to solve it... one day after the competition ended (tough luck).
Problem statement
You are given a hex dump of a MS-DOS program, that is 8086 machine code. Of course, you are not told what does the program do. You must then submit your own program that does exactly the same as the 8086 code. To make things harder, whatever the 8086 program does, it does it inefficiently. Your submission must run within the alloted time.
Here is the machine code:
BF 00 04 BE C0 00 56 31 C9 B4 00 CD 16 3C 2E AA E0 F7 F7 D1 29 D2 89 CD 5B 53 FE 07 75 03 43 EB F9 BF 00 02 89 F9 89 F8 F3 AA 89 FE AC 89 C3 FE 07 80 FB 2E 75 F6 FE 0F 5E 56 89 E9 AC 89 C3 FE 0F 7C D5 E2 F7 42 5E 56 89 E9 F3 A6 75 CA 5D 92 D4 0A E8 00 00 86 C4 04 30 CD 29 C3
Pretty scaring, isn't it?
We are told that this programs accepts a string of characters terminated with a dot ("."). We are not told anything about the output, except it is terminated with a newline.
The disassembly
Before we try and solve this problem, it would be nice to have a disassembly of the above code:
;====================================== ; IEEE Xtreme 8.0 (2014) ; IEEE Xplore Run-me Problem ;====================================== code segment assume cs:code,ds:code,es:nothing,ss:code org 100h ip label near start: MOV DI,0400h MOV SI,00C0h PUSH SI XOR CX,CX L003: MOV AH,00h INT 16h CMP AL,2Eh STOSB LOOPNZ L003 NOT CX SUB DX,DX MOV BP,CX L005: POP BX PUSH BX L001: INC BYTE PTR [BX] JNZ L002 INC BX JMP L001 L002: MOV DI,0200h MOV CX,DI MOV AX,DI DS:REPZ STOSB MOV SI,DI L004: LODSB MOV BX,AX INC BYTE PTR [BX] CMP BL,2Eh JNZ L004 DEC BYTE PTR [BX] POP SI PUSH SI MOV CX,BP L006: LODSB MOV BX,AX DEC BYTE PTR [BX] JL L005 LOOP L006 INC DX POP SI PUSH SI MOV CX,BP DS:REPZ CMPSB JNZ L005 POP BP XCHG DX,AX AAM CALL L007 L007: XCHG AL,AH ADD AL,30h INT 29h RET end_code label near code ends end ip
You can download it here: Run-me.asm
Solution
The Tedious Way
After hours of reading over and over the same x86 assembly file we came out with this monolithic and self-explanatory asm files.
The first one, called "first-step.asm" is a line-by-line commented assembly file with an introduction for non assembly users:
;====================================== ; IEEE Xtreme 8.0 (2014) ; IEEE Xplore Run-me Problem ;====================================== code segment assume cs:code,ds:code,es:nothing,ss:code org 100h ip label near ; The code above is only there because its a *.com ; file (program starts at 0x0100) ; This comments are intended as a firs approach to ; understanding this code. ; A "C Style" pseudo-code will be used to explain ; at the end of each line. ; Initial considerations: ; ; CX: ; is a 16-bit register ; In this program, CX is used mainly as a counter ; ; AX: ; It's a 16-bit register used for operations. ; In this program, AX is used to read/store ; strings to from/to memory, read input from ; keyboard and print on screen data. ; ; AH is the most significant Byte of AX ; AL is the least significant Byte of AX ; ; BX: ; It's a 16-bit register used for operations. ; In this program, BX is used as a pointer ; ; BH is the most significant Byte of BX ; BL is the least significant Byte of BX ; ; DI and SI: ; DI and SI are used as pointers for writing/ ; reading strings from memory. ; SI is the Source. ; DI is the Destiny. ; ; Flags ; ZF is the Zero flag. It's turned on when ; an operation of compare, decremet, ; increment or any other operation that ; sets flags returns 0. ; ; Stack ; TOS (Top Of Stack) is the top of stack :) ; ; BP is a register used as a variable. start: MOV DI,0400h ; DI = 0x0400; MOV SI,00C0h ; SI = 0x00C0; PUSH SI ; TOS = SI; (TOS=0x00C0) XOR CX,CX ; CX ^= CX; (this sets CX=0) L003: MOV AH,00h ; AH = 0x00; INT 16h ; AL = GETCHAR(); CMP AL,2Eh ; ZF = ((AL-0x2E)==0); STOSB ; *(DI++) = AL; LOOPNZ L003 ; CX--; IF((CX!=0)&&(ZL==FALSE)) GOTO L003; NOT CX ; CX = ~CX; SUB DX,DX ; DX -= DX (sets DX=0 and ZF=1) MOV BP,CX ; BP = CX; L005: POP BX ; BX = TOS; PUSH BX ; TOS = BX; L001: INC BYTE PTR [BX] ; (*BX)++; (if (*BX) is initially 0xFF, this operation will turn ZF=1) JNZ L002 ; IF(ZF==FALSE) GOTO L002; INC BX ; BX++; JMP L001 ; GOTO L001; L002: MOV DI,0200h ; DI = 0x0200; MOV CX,DI ; CX = DI; (CX = 0x0200) MOV AX,DI ; AX = DI; (AX = 0x0200) DS:REP ; REP runs the next string operation in a loop STOSB ; DO{ CX--; *(DI++)=AL; }WHILE((CX!=0)) MOV SI,DI ; SI = DI; L004: LODSB ; AL = *(SI++); MOV BX,AX ; BX = AX; INC BYTE PTR [BX] ; *(BX)++; CMP BL,2Eh ; ZF=((BL-0x2E)==0); JNZ L004 ; IF(ZF==FALSE) GOTO L004; DEC BYTE PTR [BX] ; *(BX)--; POP SI ; SI = TOS; PUSH SI ; TOS = SI; MOV CX,BP ; CX = BP; L006: LODSB ; AL = *(SI++); MOV BX,AX ; BX = AX; DEC BYTE PTR [BX] ; *(BX)--; JL L005 ; JL means Jump if Less (in this case I'll just ; use: IF(*(BX)<0) GOTO L005; ) LOOP L006 ; CX--; if(CX!=0) GOTO L006; INC DX ; DX++; POP SI ; SI = TOS; PUSH SI ; TOS = SI; MOV CX,BP ; CX = BP; DS:REPZ ; REPZ runs the next string operation in a loop CMPSB ; DO{ CX--; ZF=((*(SI++)-*(DI++))==0); }WHILE((CX!=0)&&(ZF==TRUE)) JNZ L005 ; IF(ZF==FALSE) GOTO L005; POP BP ; BP = TOS; XCHG DX,AX ; SWAP(DX, AX); AAM ; AH = AL/10; AL = AL%10; CALL L007 ; L007(); L007: XCHG AL,AH ; SWAP(AL, AH); ADD AL,30h ; AL+=0x30; INT 29h ; PUTCHAR(AL); RET ; RETURN; end_code label near code ends end ipYou can download it here: first-step.asm
The second file, called "second-step.asm" goes one step further and groups the assembly instructions into easy-to-understand blocks:
;====================================== ; IEEE Xtreme 8.0 (2014) ; IEEE Xplore Run-me Problem ; Second Step - Identifying code groups ;====================================== ; The second step to solve this problem is finding code groups ; and identifying it's purpose. ;------------------------------------------------------------------------- ; Observations: ; The program uses POP and PUSH one after each other. This means that ; the TOS (Top Of Stack) is used as a variable for holding the value ; 0x00C0. ; For example: (lines 34 & 35) ; Initially, the TOS value is 0x00C0. ; ; POP SI ; SI = TOS; ; This means that SI = 0x00C0; and the TOS will now be the next ; value on the stack; ; ; PUSH SI ; TOS = SI; ; The stack is re-filled with the value 0x00C0. In the end, TOS ; remained unchanged. ; ; In the end, the stack remained unchanged and SI = 0x00C0 ;------------------------------------------------------------------------- ;------------------------------------------------------------------------- ; String Input: ; If we look carefully, the first 10 assembly lines are intended to ; receive a string from keyboard with a terminator character '.' ;------------------------------------------------------------------------- start: MOV DI,0400h ; DI = 0x0400; ; The destiny pointer is 0x0400 and the string ; will be stored there. MOV SI,00C0h ; SI = 0x00C0; PUSH SI ; TOS = SI; (TOS=0x00C0) ; This is not used for string storage. Its used ; to initialize TOS as 0x00C0 (see Observations above). XOR CX,CX ; CX ^= CX; (this sets CX=0) L003: MOV AH,00h ; AH = 0x00; INT 16h ; AL = GETCHAR(); CMP AL,2Eh ; ZF = ((AL-0x2E)==0); ; 0x2E is the ASCII value for character '.' STOSB ; *(DI++) = AL; ; Store AL value on 0x0400 initially and then on the ; next position and so. ; ZF is not modified by this instruction LOOPNZ L003 ; CX--; IF((CX!=0)&&(ZF==FALSE)) GOTO L003; ; ; CX--; is not an instruction. it's a representation ; of what LOOPNZ does and it does NOT modify ZF. ; ; first comparison (CX!=0): ; CX started as 0. first it's decremented and then ; it's compared to be different than 0. ; The first comparison made is 0xFFFF!=0. ; This means that a maximum of 0xFFFF characters ; can be taken as input before breaking the loop ; due to the (CX!=0) comparison. (this will never happen :)) ; ; Second comparison (ZF==FALSE): ; As mentioned before, the instruction STOSB does NOT ; modify the ZF (Zero Flag) value. ; This means that this comparison can be replaced by the ; instruction CMP AL,2Eh. ; So: ((AL-0x2E)==0)==FALSE => ((AL==0x2E))==FALSE ; => AL!=0x2E (or AL!='.') ; ; ; NOT CX ; CX = ~CX; ; CX started as 0x0000. Then, CX was decremented on each loop ; until the input was '.' ; In the end, CX = 0x0000 - Number of inputs ; so CX is MINUS (the number of characters on the string plus one ; (because of the terminator character)); ; ; On signed binary arithmetic, to turn a number to it's negative ; value, 2's complement must be applied. This can be done by ; applying a NOT to every bit and then adding one. ; We can say that: ; -A = NOT(A) + 1; ; or in this case: ; NOT(A) = -A - 1; ; ; So when we do CX = ~CX, whe are doing: ; CX = NOT(CX) = -CX - 1; ; Then: ; CX = -( -(number_of_characters_on_string + 1) ) -1 ; CX = number_of_characters_on_string +1 -1 ; CX = number_of_characters_on_string; ; This code group can be re-written as : ; ; char* p2memory = 0x0400; ; uint16 n_characters = 0; ; do{ ; AL = getchar(); ; *(p2memory++) = AL; ; n_characters++; ; }while(AL!='.'); ; n_characters--; ; ;------------------------------------------------------------------------- ; Reverse "Big Num" Counter: ; The lines below are used to create a "Big Num" counter. ; The counter goes from left to right and can grow undefined. ; Like this: ;=============================================================== ; 0x00C0 0x00C1 0x00C2 0x00C3 0x00C4 0x00C5 0x00C6 ... ;=============================================================== ; 0x00 0x00 0x00 0x00 0x00 0x00 0x00 ... ; 0x01 0x00 0x00 0x00 0x00 0x00 0x00 ... ; 0x02 0x00 0x00 0x00 0x00 0x00 0x00 ... ; . . . . . . . ; . . . . . . . ; . . . . . . . ; 0xFF 0x00 0x00 0x00 0x00 0x00 0x00 ... ; 0x00 0x01 0x00 0x00 0x00 0x00 0x00 ... ; . . . . . . . ; . . . . . . . ; . . . . . . . ; 0xFF 0xFF 0x00 0x00 0x00 0x00 0x00 ... ; 0x00 0x00 0x01 0x00 0x00 0x00 0x00 ... ; . . . . . . . ; . . . . . . . ; . . . . . . . ; 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF ... ;------------------------------------------------------------------------- SUB DX,DX ; DX -= DX (sets DX=0 and ZF=1) MOV BP,CX ; BP = CX; BP is only used to store the number of chars ; obtained in the "String Input" section. L005: POP BX ; PUSH BX ; BX = 0x00C0 (see Observations) L001: INC BYTE PTR [BX] ; (*BX)++; (if (*BX) is initially 0xFF, ; this operation will turn ZF=1) JNZ L002 ; IF(ZF==FALSE) GOTO L002; INC BX ; BX++; JMP L001 ; GOTO L001; ; ; This is a simple counter: ; - Increment *(BX) (which is 0x00C0 initially) ; - If (ZF==FALSE) continue with the rest of the code ; (didn't went from 0xFF to 0x00) ; - If (ZF==TRUE) increment next memory position. ; and loop. ; ; This code group can be re-written as : ; char *BX = 0x00C0; ; while((*BX)++ == 0xFF) ; BX++; ; ;------------------------------------------------------------------------- ; Memset: ; The lines below are used to clean the memory positions from ; 0x0200 to 0x03FF. Like a memset ( 0x0200, 0, 0x1FF ); ;+=========================================================+ ;| 0x0200 0x0201 0x0202 0x0203 0x0204 ... 0x03FF | ;+=========================================================+ ;| 0x00 0x00 0x00 0x00 0x00 0x00 0x00 | ;+---------------------------------------------------------+ ;------------------------------------------------------------------------- L002: MOV DI,0200h ; DI = 0x0200; Set the destiny pointer to 0x0200 MOV CX,DI ; CX = DI; (CX = 0x0200) count = 200 MOV AX,DI ; AX = DI; (AX = 0x0200) AL will be stored so ; this means that it will memset 0x00. ; ; The 0x02 value assigned to AH is very important ; and it will be used along the code. (keep that in mind) ; DS:REP ; REP runs the next string operation in a loop STOSB ; DO{ CX--; *(DI++)=AL; }WHILE((CX!=0)) ; ; This code group is REALLY simple so no explanation is needed. ; This code group can be re-written as : ; AH = 0x02; //(VERY IMPORTANT FOR THE REST OF THE CODE) ; memset(0x0200, 0, 0x1FF); ;------------------------------------------------------------------------- ; Character counter: ; The lines below are used to "count" the number of occurrences of ; each character in the string. ; The memory section from addresses 0x0200 to 0x03FF are used as ; variables to storage the number of times each possible character is ; present in the input string. (with the exception of character '.') ; For example: ; The position 0x0230 holds the number of times the character '0' ; (ASCII value 0x30) appears in the input string. ;------------------------------------------------------------------------- MOV SI,DI ; SI = DI; ; DI was 0x0400 from last code group and it ; was pointing to the first character of the ; string. L004: LODSB ; AL = *(SI++); ; load a character of the string. ; AL now holds the ASCII value of a character MOV BX,AX ; BX = AX; ; If you kept in mind that AH contains 0x02, you'll ; see that BX is pointing to 0x02AL being AL the ; ASCII value of a character from the string. INC BYTE PTR [BX] ; *(BX)++; ; Increment the value of the memory position 0X02AL ; (incrementing the counter for the character AL) CMP BL,2Eh ; ZF=((BL-0x2E)==0); ; BL holds the ASCII value of a character from the ; string. This comparison is checking if we reached ; the string terminator 0x2E ('.'). JNZ L004 ; IF(ZF==FALSE) GOTO L004; ; If not(end of string) loop. ; ; The following line is executed after the end of string ; is reached. Notice that after that, BX will be pointing ; to 0x022E (counter for the '.' character). ; Take into consideration that the code BX is incremented ; BEFORE the comparison to '.' so the memory position ; 0x022E will have a 1 (showing the one and only occurrence) ; of the character '.' in the string. ; Knowing all this, the next line simply erases the counter ; for the character '.' DEC BYTE PTR [BX] ; *(BX)--; ; This code group can be re-written as : ; for (int i=0 ; i<n_characters ; i++) ; *(0x0200+i)++; ;------------------------------------------------------------------------- ; Permutation Finder: ; This code section checks if the counter bytes are a permutation of the ; input string. ;------------------------------------------------------------------------- POP SI ; PUSH SI ; SI = 0x00C0 (see Observations) MOV CX,BP ; CX = BP; Load n_characters L006: LODSB ; AL = *(SI++); Load a counter byte MOV BX,AX ; BX = AX; BX = 0x02AL with AL = current counter byte DEC BYTE PTR [BX] ; *(BX)--; Decrement memory location 0x02AL ; Next line will evaluate the result of this instruction. ; ; It's necessary to remember that the memory locations ; with address 0x02CC is either 0x00 if CC is NOT a character ; present in the string; or 0xXX being XX the number of times ; the character CC is indeed present in the string. ; ; There are two possible outcomes for the next conditional ; jump: ; ; - The selected "big number" counter byte is a value in the ; string => the memory position we just decremented had the ; number of times a character appears in the string => the ; decrement operation did not trigger a "Less" flag. ; ; - The selected "big number" counter byte is NOT a value in ; the string => the memory position we just decremented had ; 0x00 before DEC operation => the decrement operation did ; trigger a "Less" flag. ; ; Take into consideration that L005 is the begin of the loop, ; just before the "big number" counter. ; JL L005 ; JL means Jump L005 if Less ; If selected "big num" counter byte is not in the string, start again ; If selected "big num" counter byte IS in the string: ; Decrement CX(CX has the number of characters to find), ; if CX !=0 (didn't compare to every character) keep comparing (L006) LOOP L006 ; CX--; if(CX!=0) GOTO L006; ; This point is reached if we already compared every character and ; each comparison was true (each "big num" counter byte is a character ; in the string). Notice that the bytes of the counter must NOT be in ; order so "big num" counter is a permutation of the input string. ; INC DX ; DX++; DX is the number of permutations found. ; ; This code group can be re-written as: ; if ("big num" counter) in permutations(input string): DX++ ; ;------------------------------------------------------------------------- ; "Big Num" counter limit: ; This code group checks the value of the counter located at 0x00C0 ; and leaves the main loop if that condition is true. ; Note: The end condition is a permutation of the input string, so ; evaluating the end condition after the permutation is okay :) ;------------------------------------------------------------------------- POP SI ; PUSH SI ; SI = 0x00C0; (See Observations) MOV CX,BP ; CX = n_characters; DS:REPZ ; REPZ runs the next string operation in a loop CMPSB ; DO{ CX--; ZF=((*(SI++)-*(DI++))==0); }WHILE((CX!=0)&&(ZF==TRUE)) ; This two lines above compare each byte of the "big num" counter ; with each character of the input string. ; ; keep looping until "big num" counter is equal to the string. JNZ L005 ; IF(ZF==FALSE) GOTO L005; ; ; ; This code group can be re-written as: ; if ("big num" counter) != (input string) ; goto L005; ; ;------------------------------------------------------------------------- ; Print Output: ; This code prints the contents of DX on the screen ;------------------------------------------------------------------------- POP BP ; BP = 0x00C0; Why??? ; XCHG DX,AX ; SWAP(DX, AX); AX = DX (load permutation counter on AX) AAM ; AH = AL/10; AL = AL%10; AAM converts AX to a printable form ; If DX = NM then ; AH = N ; and ; AL = M ; for example: ; DX = 24 -> AH = 2; AL = 4; CALL L007 ; L007(); This is a well known trick. Call a function at ; L007, and that function's code is right beneath. So that ; code will be executed twice. ; FIRST TIME CALLED: XCHG AL,AH ; SWAP(AL, AH); INT 29h prints the content of AL. ; this means we are displaying the first digit ; (which was originally in AH) ADD AL,30h ; AL+=0x30; Convert it to ASCII by adding '0' INT 29h ; PUTCHAR(AL); print it RET ; RETURN; ; SECOND TIME: L007: XCHG AL,AH ; SWAP(AL, AH); display de second digit ADD AL,30h ; AL+=0x30; INT 29h ; PUTCHAR(AL); RET ; RETURN; ; END end_code label near code ends end ip
Download it here: second-step.asm
At this point you should have an idea of what this program is doing. If you don't then you haven't spent enough hours with these files. Just kidding, if you don't understand yet what this program does please read on to the following section.