Reverse engineering my own code: translation error?

raufaser

Hello,
just for fun I reverse engineered my own code with objdump (I don't have a better word for it). In doing so I ran into something confusing (or wrong?).

So first, my code (I wanted to analyze opcodes):

mov eax,ebx
mov eax,ecx
mov eax,edx

objdump -D produced:
401000: 8b c3 mov %ebx,%eax
401002: 8b c1 mov %ecx,%eax
401004: 8b c2 mov %edx,%eax

Two things about this.
First, what do the percent signs mean?

Second, why were the source and destination operands swapped? I had written, e.g., "mov eax,ebx", which basically means: contents of ebx into eax. But the assembler (or the disassembler?) turns that into contents of eax into ebx?

Regards

raufaser

edit: using masm32 / OS: Windows 7 x64

Athar

That's perfectly fine. One is Intel syntax, the other is AT&T syntax.
Source and destination operands are indeed written the other way round there.

nachtfeuer

Athar wrote:

That's perfectly fine. One is Intel syntax, the other is AT&T syntax.
Source and destination operands are indeed written the other way round there.

Still odd: where exactly does AT&T syntax come into play here?

Besides, what's written above isn't right: the opcodes stand for mov ax,cx,
mov ax,dx and mov ax,bx, not for moves between the extended registers.

FrEEzE2046

nachtfeuer wrote:

Still odd: where exactly does AT&T syntax come into play here?

What do you mean, where does AT&T syntax come into play? objdump simply shows you the assembly code in AT&T syntax rather than Intel syntax. Which of the two you wrote your program in can hardly be told from the binaries.

AT&T:

mov %ebx, %eax

Intel:

mov eax, ebx

The bottom line is that both mean the same thing.
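To make the mapping between the two notations concrete, here is a minimal Python sketch. The function name intel_to_att and the register table are made up for illustration, and it only handles the simple register-to-register case from this thread. It shows the two visible differences: AT&T prefixes register names with % and lists the source before the destination.

```python
# Hypothetical helper for illustration only: handles "mnemonic dst,src"
# where both operands are 32-bit general-purpose registers.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def intel_to_att(line):
    """Rewrite an Intel-syntax register-to-register instruction in AT&T syntax."""
    mnemonic, operands = line.strip().split(None, 1)
    # Intel order is "destination, source".
    dst, src = (op.strip().lower() for op in operands.split(","))
    if dst not in REGISTERS or src not in REGISTERS:
        raise ValueError("only 32-bit register operands are supported here")
    # AT&T order is "source, destination", and registers carry a % prefix.
    return f"{mnemonic} %{src},%{dst}"


print(intel_to_att("mov eax,ebx"))  # mov %ebx,%eax
```

Both lines describe the same machine code; only the text notation differs.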

What does your complete code look like, by the way? You can't actually produce binaries with masm32.

nachtfeuer

FrEEzE2046 wrote:

What do you mean, where does AT&T syntax come into play? objdump simply shows you the assembly code in AT&T syntax rather than Intel syntax. Which of the two you wrote your program in can hardly be told from the binaries.

Figured it out myself:
objdump is a Unix/Linux tool. Apparently 64-bit Linux tools have to be imported for use on 64-bit Windows.

nachtfeuer

...and then I also found out that objdump (Cygwin) outputs AT&T syntax by default, but can be switched to Intel syntax output.

On Windows you can also use the object file converter (objconv) by Agner: http://www.agner.org/optimize/

raufaser

FrEEzE2046 wrote:

nachtfeuer wrote:

Still odd: where exactly does AT&T syntax come into play here?

What do you mean, where does AT&T syntax come into play? objdump simply shows you the assembly code in AT&T syntax rather than Intel syntax. Which of the two you wrote your program in can hardly be told from the binaries.

AT&T:
mov %ebx, %eax
Intel:
mov eax, ebx
The bottom line is that both mean the same thing.

What does your complete code look like, by the way? You can't actually produce binaries with masm32.

So, my complete code looks like this:

;opcode.asm

.386
.model flat, stdcall
includelib C:\masm32\lib\kernel32.lib
;includelib C:\masm32\lib\user32.lib

;MessageBoxA proto :dword, :dword, :dword, :dword
ExitProcess proto :dword
.data
.code
start:
mov eax,ebx
mov eax,ecx
mov eax,edx
nop
mov ebx,eax
mov ebx,ecx
mov ebx,edx
nop
mov ecx,eax
mov ecx,ebx
mov ecx,edx
nop
mov edx,eax
mov edx,ebx
mov edx,ecx
nop
invoke ExitProcess,0
end start

Whether this makes any sense is debatable. Then I enter the following in cmd:

ml /c /coff opcode.asm
link /Subsystem:Windows opcode
and out comes the file Opcode.exe, which also runs and has a 512-byte file header. The <.text> section then starts with 8bc3.

Actually I'm only interested in the opcodes, because I'd like to know how they are constructed/computed. Unfortunately I haven't really figured that out yet... all I know so far is that 8b stands for mov... but how the registers (or, more generally, the operands of the mnemonics) are encoded is still a mystery to me^^ (maybe someone has a link or book recommendations... I haven't found anything useful)

nachtfeuer wrote:

...and then I also found out that objdump (Cygwin) outputs AT&T syntax by default, but can be switched to Intel syntax output.

On Windows you can also use the object file converter (objconv) by Agner: http://www.agner.org/optimize/

How do you switch it? I didn't find anything in objdump --help.
I'll take a look at objconv. Thanks a lot.

Regards
raufaser

If you're interested in how instructions are encoded, there is nothing(!) better than the documentation from Intel (or AMD):
Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 2A: Instruction Set Reference, A-M (Chapter 2)
Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 2B: Instruction Set Reference, N-Z (Appendix

By the way, you can usually skip the disassembly step by having masm generate a listing:

/FlList.txt /Sn

(-> List.txt)
The code for it:

.nolist ; includes should not appear in the listing
include masm32rt.inc ;-)
.data
.code
start:
.listall
mov eax,ebx
mov eax,ecx
mov eax,edx
nop
mov ebx,eax
mov ebx,ecx
mov ebx,edx
nop
mov ecx,eax
mov ecx,ebx
mov ecx,edx
nop
mov edx,eax
mov edx,ebx
mov edx,ecx
nop
.nolist
invoke ExitProcess,0
end start

Also highly recommended: OllyDbg v1.10 (not 2.0)

nachtfeuer

First, regarding objdump:

nachtfeuer@cygwin ~
$ objdump --help

Usage: objdump <option(s)> <file(s)>
 Display information from object <file(s)>.
 At least one of the following switches must be given:
  -a, --archive-headers    Display archive header information
  -f, --file-headers       Display the contents of the overall file header
  -p, --private-headers    Display object format specific file header contents
  -h, --[section-]headers  Display the contents of the section headers
  -x, --all-headers        Display the contents of all headers
  -d, --disassemble        Display assembler contents of executable sections
  -D, --disassemble-all    Display assembler contents of all sections
  -S, --source             Intermix source code with disassembly
  -s, --full-contents      Display the full contents of all sections requested
  -g, --debugging          Display debug information in object file
  -e, --debugging-tags     Display debug information using ctags style
  -G, --stabs              Display (in raw form) any STABS info in the file
  -W[lLiaprmfFsoR] or
  --dwarf[=rawline,=decodedline,=info,=abbrev,=pubnames,=aranges,=macro,=frames,=str,=loc,=Ranges]
                           Display DWARF info in the file
  -t, --syms               Display the contents of the symbol table(s)
  -T, --dynamic-syms       Display the contents of the dynamic symbol table
  -r, --reloc              Display the relocation entries in the file
  -R, --dynamic-reloc      Display the dynamic relocation entries in the file
  @<file>                  Read options from <file>
  -v, --version            Display this program's version number
  -i, --info               List object formats and architectures supported
  -H, --help               Display this information

 The following switches are optional:
  -b, --target=BFDNAME           Specify the target object format as BFDNAME
  -m, --architecture=MACHINE     Specify the target architecture as MACHINE
  -j, --section=NAME             Only display information for section NAME
  -M, --disassembler-options=OPT Pass text OPT on to the disassembler
  -EB --endian=big               Assume big endian format when disassembling
  -EL --endian=little            Assume little endian format when disassembling
      --file-start-context       Include context from start of file (with -S)
  -I, --include=DIR              Add DIR to search list for source files
  -l, --line-numbers             Include line numbers and filenames in output
  -F, --file-offsets             Include file offsets when displaying information
  -C, --demangle[=STYLE]         Decode mangled/processed symbol names
                                  The STYLE, if specified, can be `auto', `gnu',
                                  `lucid', `arm', `hp', `edg', `gnu-v3', `java'
                                  or `gnat'
  -w, --wide                     Format output for more than 80 columns
  -z, --disassemble-zeroes       Do not skip blocks of zeroes when disassembling

      --start-address=ADDR       Only process data whose address is >= ADDR
      --stop-address=ADDR        Only process data whose address is <= ADDR
      --prefix-addresses         Print complete address alongside disassembly
      --[no-]show-raw-insn       Display hex alongside symbolic disassembly
      --adjust-vma=OFFSET        Add OFFSET to all displayed section addresses
      --special-syms             Include special symbols in symbol dumps
      --prefix=PREFIX            Add PREFIX to absolute paths for -S
      --prefix-strip=LEVEL       Strip initial directory names for -S

objdump: supported targets: pe-i386 pei-i386 elf32-i386 elf32-little elf32-big srec symbolsrec verilog tekhex binary ihex
objdump: supported architectures: i386 i386:x86-64 i8086 i386:intel i386:x86-64:intel

The following i386/x86-64 specific disassembler options are supported for use
with the -M switch (multiple options should be separated by commas):
  x86-64      Disassemble in 64bit mode
  i386        Disassemble in 32bit mode
  i8086       Disassemble in 16bit mode
  att         Display instruction in AT&T syntax
  intel       Display instruction in Intel syntax
  att-mnemonic
              Display instruction in AT&T mnemonic
  intel-mnemonic
              Display instruction in Intel mnemonic
  addr64      Assume 64bit address size
  addr32      Assume 32bit address size
  addr16      Assume 16bit address size
  data32      Assume 32bit data size
  data16      Assume 16bit data size
  suffix      Always display instruction suffix in AT&T syntax
Report bugs to <http://www.sourceware.org/bugzilla/>.

nachtfeuer@cygwin ~
$

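The output options are listed at the very bottom: the intel option of the -M switch selects Intel syntax, e.g. objdump -D -M intel Opcode.exe.

A simple way to look at opcodes is Fasm plus a hex editor: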

(Fasmw)

org 100h
mov eax,ebx
mov eax,ecx 
mov eax,edx 
nop 
mov ebx,eax 
mov ebx,ecx 
mov ebx,edx 
nop 
mov ecx,eax 
mov ecx,ebx 
mov ecx,edx 
nop 
mov edx,eax 
mov edx,ebx 
mov edx,ecx 
nop

The whole thing is assembled and then admired in a hex editor:

00000000 6689 D866 89C8 6689 D090 6689 C366 89CB f..f..f...f..f..
00000010 6689 D390 6689 C166 89D9 6689 D190 6689 f...f..f..f...f.
00000020 C266 89DA 6689 CA90                     .f..f...
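The dump above looks different from the masm output (66 89 D8 instead of 8B C3) but encodes the same instructions. Fasm emits 16-bit code by default for a flat binary like this one, so each 32-bit mov carries the operand-size prefix 66, and Fasm picks opcode 89 (MOV r/m32, r32) rather than 8B (MOV r32, r/m32). The following Python sketch decodes just these few byte patterns. The function name decode is made up, and the decoder deliberately assumes that every mov ends up with 32-bit register operands, which holds for the examples in this thread but not in general.

```python
# Register numbers as used in the ModR/M byte (32-bit operand size).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code):
    """Decode a byte string made of nop and register-to-register mov only.

    Simplification (assumption): the 66 prefix is skipped and all operands
    are treated as 32-bit registers.
    """
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0x66:                 # operand-size prefix
            i += 1
        elif op == 0x90:               # nop
            out.append("nop")
            i += 1
        elif op in (0x89, 0x8B):       # mov with a ModR/M byte
            modrm = code[i + 1]
            mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
            if mod != 3:
                raise ValueError("memory operands are not handled here")
            # 89: destination is r/m, source is reg; 8B: the other way round.
            dst, src = (rm, reg) if op == 0x89 else (reg, rm)
            out.append(f"mov {REG32[dst]},{REG32[src]}")
            i += 2
        else:
            raise ValueError(f"unsupported opcode {op:02X}")
    return out


# First bytes of the Fasm dump, then the masm encoding of the first mov.
print(decode(bytes.fromhex("6689D86689C86689D090")))
print(decode(bytes.fromhex("8BC3")))
```

Both calls yield "mov eax,ebx" for the first instruction, which shows that the two assemblers merely chose different, equivalent encodings.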

To quickly check how individual instructions behave, the debug program that ships with Windows works quite well. However, debug can only handle 16-bit code.
The debug clone debugx by Paul Vojta can test and display instructions up to 32 bit:
http://math.berkeley.edu/~vojta/

The logic behind the opcodes is described in the Intel manuals. If you don't know English, or not well enough, the book "Assembler Ge-Packt" by Joachim Rohde can help.

raufaser

Does that also apply to AMD? (I have a Phenom II X4 965)

The objdump I used came with the gcc bundled with Code::Blocks (i.e. the MinGW compiler for Windows). It seems to be a bit different, because it only knows/accepts a few switches.
I also have Immunity Debugger, but I need to spend some more time with it first.

Apart from that, thanks for the tips.

Regards
raufaser

raufaser wrote:

Does that also apply to AMD? (I have a Phenom II X4 965)

The base instruction set is the same for AMD and Intel. Differences exist only in the SIMD extensions and the system instructions.
Developer Guides & Manuals

FrEEzE2046

I assembled this code:

TITLE opcodae.asm

INCLUDE \masm32\include\masm32rt.inc

.CODE

start:
	call main
	inkey
	exit

main PROC
	mov eax,ebx
	mov eax,ecx
	mov eax,edx
	nop
	mov ebx,eax
	mov ebx,ecx
	mov ebx,edx
	nop
	mov ecx,eax
	mov ecx,ebx
	mov ecx,edx
	nop
	mov edx,eax
	mov edx,ebx
	mov edx,ecx
	nop

	ret
main ENDP

END start

as follows:

ml /c /coff /Flopcode.txt /Sn

Excerpt from the listing:

00000025			main PROC
 00000025  8B C3			mov eax,ebx
 00000027  8B C1			mov eax,ecx
 00000029  8B C2			mov eax,edx
 0000002B  90				nop
 0000002C  8B D8			mov ebx,eax
 0000002E  8B D9			mov ebx,ecx
 00000030  8B DA			mov ebx,edx
 00000032  90				nop
 00000033  8B C8			mov ecx,eax
 00000035  8B CB			mov ecx,ebx
 00000037  8B CA			mov ecx,edx
 00000039  90				nop
 0000003A  8B D0			mov edx,eax
 0000003C  8B D3			mov edx,ebx
 0000003E  8B D1			mov edx,ecx
 00000040  90				nop

 00000041  C3				ret
 00000042			main ENDP

As already mentioned, you could have learned this from Intel's instruction reference even without a test assembly:
Opcode | Instruction | Description
8B /r | MOV r32, r/m32 | Move r/m32 to r32

So when you write 32 bits from a memory location or a register into a register, the opcode is 8B, followed by a ModR/M byte (/r) that encodes both operands.
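For the register-to-register case, the second byte can be computed by hand. Here is a minimal Python sketch of that arithmetic; the function name encode_mov_r32_r32 is made up for illustration. The ModR/M byte consists of three fields: mod (2 bits, 11 for a register operand), reg (3 bits, the destination for opcode 8B) and r/m (3 bits, the source here).

```python
# Register numbers as used in the ModR/M byte (32-bit operand size).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def encode_mov_r32_r32(dst, src):
    """Encode 'mov dst,src' (Intel syntax) using opcode 8B /r."""
    mod = 0b11                      # both operands are registers
    reg = REG32.index(dst)          # 8B: the reg field holds the destination
    rm = REG32.index(src)           # the r/m field holds the source
    modrm = (mod << 6) | (reg << 3) | rm
    return bytes([0x8B, modrm])


for dst, src in [("eax", "ebx"), ("ebx", "eax"), ("edx", "ecx")]:
    print(f"mov {dst},{src} ->", encode_mov_r32_r32(dst, src).hex(" ").upper())
```

The results can be compared against the listing above: mov eax,ebx gives 8B C3, mov ebx,eax gives 8B D8 and mov edx,ecx gives 8B D1.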

masm wrote:

The base instruction set is the same for AMD and Intel.

Which we're very glad about!