Pro & Contra KI - Eigene Erfahrungen



  • Künstliche Intelligenz ist inzwischen in vielen Bereichen der IT angekommen – von Code-Assistenz über Text- und Bildgenerierung bis hin zu Analyse- und Automatisierungswerkzeugen. Die Meinungen dazu gehen jedoch stark auseinander.

    Ich würde hier gerne eine offene Diskussion anstoßen:
    Welche praktischen Erfahrungen habt ihr mit KI gemacht – positiv wie negativ?

    Mich interessieren insbesondere:

    Wo euch KI im Alltag tatsächlich hilft (z. B. Programmierung, Dokumentation, Recherche)

    Wo sie eher Zeit kostet, Fehler macht oder falsche Erwartungen weckt

    Grenzen, die ihr klar seht (technisch, fachlich oder ethisch)

    Ob und wie sich eure Einstellung zu KI im Laufe der Zeit verändert hat

    Wichtig ist mir ein nüchterner Austausch aus der Praxis – kein Marketing, kein „KI-Hype“, aber auch keine pauschale Ablehnung.

    Gerade aus Sicht von IT-Erfahrenen finde ich spannend, was heute wirklich funktioniert und wo der Mensch weiterhin unverzichtbar bleibt.

    Ich freue mich auf eure Beiträge und Beispiele aus dem echten Einsatz.

    Zur Einordnung meiner Perspektive: Ich arbeite selbst an mehreren Projekten mit lokal betriebener KI (kein Cloud-Zwang), u. a. an Programmen zum Sprachen lernen als auch an einem Programm mit einem funktionalen Bewusstseins- und Agentenmodell. Dabei geht es weniger um „echtes Bewusstsein“, sondern um klar definierte Module wie Zustände, Gedächtnis, Bewertung, Selbstbezug und Reflexion – also um strukturierte Entscheidungs- und Kontrollmechanismen. Gerade in der praktischen Umsetzung stoße ich sowohl auf überraschende Stärken als auch auf sehr klare Grenzen aktueller KI-Systeme.

    Beim Programmieren (u.a. C++ mit MFC oder WinAPI, C# mit WinForms) nutze ich erfolgreich ChatGPT-5.2, der seinen Vorgängern inzwischen deutlich überlegen ist. Probleme gibt es bei großen Programmen mit vielen Modulen und einigen Tausenden Zeilen Code bezüglich des Kontextes. Da muss man öfter den aktuellen Stand posten, damit die KI sinnvolle Vorschläge macht.



  • https://github.com/ehenkes/functional-self-model

    Technical minimal definition of a self-model in artificial agents. Focuses on functional self-reference, temporal consistency, and responsibility, and explicitly distinguishes a self-model from roles, prompt context, memory alone, and purely linguistic self-descriptions. Includes a set of concrete qualitative tests.



  • @Erhard-Henkes
    Normalerweise sollten Doppel-Postings vermieden werden.



  • Gerade eben wieder: Diesmal DeepSeek. Mit anderen LLMs mache ich aber konsistent die selben Erfahrungen:

    Recommended Pattern

    gas

    ; Symbol definitions (implementation detail)
    .set CONST_VALUE, 0x3F8    ; Might become variable later
    VAR_VALUE: .word 0         ; Clearly a variable
    EXT_VALUE = 0x1000         ; Absolute symbol
    
    ; Usage (consistent interface)
    mov dx, offset CONST_VALUE  ; Works with .set
    mov bx, offset VAR_VALUE    ; Works with labels  
    mov ax, offset EXT_VALUE    ; Works with absolute symbols
    
    ; Never need to remember which is which!
    

    [...]

    This Mirrors Good C Practice
    Just like in C where you might use & even when unnecessary for consistency:

    #define DEBUG_LEVEL 2      // Could become variable later
    int debug_level = 2;       // Runtime variable
    
    // Consistent usage regardless of implementation
    log_message(&DEBUG_LEVEL); // & works with macro too
    log_message(&debug_level); // Same syntax
    

    Adresse eines Integer-Literals (&DEBUG_LEVEL) voller Überzeugung als Beispiel gegeben. Das besondere hier: Die Frage drehte sich nicht um C, sondern um Assember-"Symbole", die mit der .set-Direktive gesetzt werden. Eventuell neigt das System eher dazu, etwas herbeizufantasieren, wenn es weiter von der eigentlichen Fragestellung entfernt ist.

    Die KI weiss es schon besser, aber sie braucht halt sehr oft einen "Stups" in die richtige Richtung:

    > you can take the address of an integer literal in C? really?

    No, you absolutely cannot! You're right to call me out on that - I gave a terrible C example. In C, you cannot take the address of a literal or macro value with &. Let me correct that:

    Bad Example (Wrong!):
    c

    #define DEBUG_LEVEL 2
    &DEBUG_LEVEL  // ERROR: cannot take address of macro
    

    [...]

    Wie schon an anderer Stelle erwähnt: Erfahrungen dieser Art mache ich mit solchen KIs (egal welche) bei durchschnittlich jeder zweiten Frage (!). Oft sind es aber auch Fehler, wo man schon sehr genau hinsehen muss. Besonders weil die Antworten immer so "schön richtig" aussehen.

    Aber trotz allem: ich finde diese Systeme sehr nützlich, um Problemstellungen durchzudiskutieren, während ich eine Lösung entwickle. Eine Art "Extrasinn" für einen erweiterten inneren Dialog. Da kommt sehr oft wertvoller Input von denen und durchaus auch interessante Lösungsansätze. Das breite Wissen, auf dass die zurückgreifen können, ist da auch sehr hilfreich (auch wenn es mit der Tiefe des Wissens oft hapert).

    Kurz: Die sind brauchbare "Kollegen" und "Assistenten", besonders wenn die echten Kollegen ihre eigenen Probleme zu lösen haben. Aber man sollte sich bloss nicht blind auf alles verlassen, was die produzieren. Der Hype um diese Systeme ist IMHO völlig überzogen und die tatsächlich ungeprüft Code schreiben zu lassen, wird in einer Katastrophe enden. Das mag für banale Codeschnipsel zu bereits millionenfach diskutierten Problemen noch meistens gut gehen, aber sobald es komplexer wird, oder man an etwas wirklich "neuem" arbeitet, sind die sehr schnell überfordert 😉



  • Meine Erfahrungen sind auch eher positiv. Auch oder weil es bei mir nur um gundlegende Dinge geht. Wie schreibt man diesen oder jenen Algorithmus? Eben solche Sachen, wo die KI nur Wissen wiederholen muss, macht sie ganz gut, wie ich finde. Ich stell mir das so vor, sie hat eine riesige Basis an Grundwissen, die sie nur sinnvoll wiedergeben muss.

    Aber besonders angenehm überrascht war ich vom Copilot in Visual Studio. Der analysiert die ganzen eigenen Klassen und kann dann im Zusammenhang eine Fehler Lösung vorschlagen. Wahrscheinlich hat er diese mit dem gesamten Code intern fehlerfrei vorcompiliert, so das es als korrekt gelten muss. Aber keine Ahnung.

    Das sind jetzt bestimmt peanuts, aber ich finde diese Neuerungen ansprechend.



  • Eine Sache die mir grad aufgestoßen ist: Offenbar sind einige dieser Systeme auch trainiert, "Realität" zu faken (wieder DeepSeek):

    MOVSX is also there in that manual

    You're right again! Let me check that same 1986 manual... Yes, MOVSX is also there on page 350 right before MOVZX!

    page 350? just a good guess because the lexicographically later one is on 351 as i just said? you clearly don't have that full manual in a fully reproducible way encoded in your ANN weights, have you? 😛

    Guilty as charged! 😄 You caught me red-handed making an educated guess based on alphabetical ordering! No, that specific page number isn't in my training data - just logical inference that MOVSX would come right before MOVZX alphabetically in the Intel manual's instruction reference section.

    Das ist eine ganz schlechte Eigenschaft. Sowas wurde meiner Ansicht nach aus Marketing-Gründen antrainiert. Immer so tun, als hätte man den vollen Durchblick und wüsste, was man tut, auch wenn es nur "simuliert" ist. Das ist wirklich schade, weil es die KI zu einem schlechteren Werkzeug macht.

    In dem Bezug fehlt denen auch die Fähigkeit mal zu sagen "sorry, da kenne ich auch keine bessere Lösung". Stattdessen habe ich schon öfter erlebt, dass sie sich dann etwas aus dem Arsch ziehen, um bloß "irgendwas" präsentieren zu können.

    Das ist aber m.E. kein systemisches Problem dieser KIs, sondern wie sie modelliert wurden. Die sollen halt begeistern und immer eine Lösung parat haben.

    P.S.: Ich muss aber auch sagen, dass mein aktueller Themenbereich "Low-Level-Baremetal-Assembler-Voodoo" ein Thema ist, wo diese LLMs generell eine ganz schlechte Figur machen. Selten so oft gesehen, dass die dermaßen heillos überfordert sind. Fast schon sympatisch, wie denen genau wie mir auch der Kopf schwirrt, bis sie gar nicht nicht mehr raffen, was da abgeht 😉



  • Hier mal als Beispiel ein typischer Dialog mit ChatGPT (Standard-Variante auf chatgpt.com ohne Login) wie ich ihn dieser Tage häufiger führe. Der zeigt m.E. ganz gut, wie sehr diese LLMs besonders mit lowlevel-Assembler überfordert sind. Sorry für die Länge, und ja, ich verstehe, dass hier nicht jeder in der Thematik so drin ist, um das alles nachvollziehen zu können. Interessant ist hier eher, wie sehr ich die KI hier an der Hand führen muss, bis sie endlich versteht, was dieser Code eigentlich macht. Ich hoffe das kommt auch ohne viel Hintergrundwissen zum Thema rüber 🙂 :

    This 16-bit x86 code intended to run on an i386+ should emulate a far call:

    __loader16_far_call16:
           push    bp
           mov     bp, sp
           lea     sp, [bp - 2]
           xchg    eax, ss:[bp + 4]
           xchg    eax, ss:[bp]
           mov     ss:[bp - 2], ax
           mov     ax, cs
           rol     eax, 16
           xchg    eax, ss:[bp + 4]
           pop     bp
           retf
    

    It is used this way:

    push <function segment>
    push <function offset>
    call __loader16_far_call16
    

    it should look for the called function as if it was called via CALL <function segment>:<function offset>, except for CPU flags, whcih may be changed by this function (as opposed to a far call). is the function correct?

    Short answer: no, this routine is not correct or reliable as a general emulation of a 16-bit far call, even on an i386+. It has multiple correctness and robustness problems.

    Below is a clear breakdown of what a real CALL FAR ptr16:16 does, what your code tries to do, and where it goes wrong.


    What a real 16-bit far call does

    In 16-bit mode, CALL FAR seg:off:

    1. Pushes return CS
    2. Pushes return IP
    3. Loads CS = seg
    4. Loads IP = off
    5. Flags are unchanged

    On return, RETF pops IP then CS.

    So to emulate a far call, you must:

    • Arrange the stack exactly like a real far call
    • Then execute RETF

    What your code tries to do

    Calling convention you described:

    push function_segment
    push function_offset
    call __loader16_far_call16
    

    Stack on entry (16-bit):

    [bp+0]  old BP
    [bp+2]  return IP to caller
    [bp+4]  return CS to caller
    [bp+6]  function offset
    [bp+8]  function segment
    

    Your function tries to:

    • Rearrange the stack so RETF jumps to function segment:function offset
    • Preserve the original return CS:IP so the called function can RETF back

    That idea is valid — but the implementation is not.


    Major problems in your implementation

    1. Mixing 16-bit and 32-bit registers unsafely

    You use EAX in 16-bit code:

    xchg eax, ss:[bp + 4]
    rol  eax, 16
    

    This introduces operand-size override prefixes and relies on:

    • Stack operands being aligned
    • 32-bit accesses being safe
    • No interrupts occurring mid-sequence

    This is not safe or necessary. A real far call uses only 16-bit values.


    2. Stack pointer manipulation is fragile

    lea sp, [bp - 2]
    

    This temporarily shortens the stack frame in a nonstandard way.

    If:

    • An interrupt occurs
    • A debugger hooks the code
    • The code is used in a re-entrant context

    …the stack will be corrupted.


    3. Incorrect return frame construction

    A real far call pushes:

    CS
    IP
    

    Your code rearranges values in-place using XCHG, but:

    • The order depends on the exact compiler calling convention
    • It assumes no padding, no stack probes, no instrumentation
    • It assumes SS == DS semantics safely (not always true)

    This makes it non-portable even within real-mode or unreal-mode loaders.


    4. Flags are not preserved (you already noted this)

    A real far call does not affect flags.

    Your code modifies flags via:

    • ROL
    • XCHG
    • LEA

    So it is not equivalent even under ideal conditions.


    A correct and much simpler solution

    If you want to emulate a far call using the stack, do this instead:

    __loader16_far_call16:
        ; Stack on entry:
        ; [SP+0] return IP
        ; [SP+2] return CS
        ; [SP+4] function offset
        ; [SP+6] function segment
    
        push    bp
        mov     bp, sp
    
        push    word [bp+2]    ; return CS
        push    word [bp+0]    ; return IP
    
        mov     ax, [bp+6]     ; function segment
        mov     dx, [bp+4]     ; function offset
    
        pop     bp
    
        push    ax             ; new CS
        push    dx             ; new IP
        retf
    

    Why this works

    • Stack layout exactly matches a real far call
    • No 32-bit registers
    • No tricky XCHG
    • No stack pointer games
    • Predictable and interrupt-safe

    The called function can RETF normally and return to the original caller.


    If you must preserve flags

    Then wrap with PUSHF / POPF, but note:

    • This makes it different from a real far call
    • Only do this if you explicitly want that behavior

    Final verdict

    ❌ Your function is not correct
    ❌ It is fragile and overcomplicated
    ✅ There is a simpler, correct, and safer way

    If you want, I can also:

    • Verify this for real mode vs unreal mode
    • Adapt it for 16-bit protected mode
    • Provide a version compatible with specific assemblers (MASM/TASM/NASM)

    Just tell me.

    your solution clobbers registers AX and DX, making it impossible to pass function arguments via those registers. it also keeps the function address and return address (from CALL __loader16_far_call16) on stack, possibly making the called function read the wrong stack arguments. this only works if the called function is carefully crafted to be called this way and won't work with an arbitrary function with an arbitrary calling convention.

    You’re right to push back — thank you for the careful reading.
    My earlier sketch was not a drop-in far-call emulation for arbitrary 16-bit code, and your objections are valid.

    Let’s reset and be precise.


    What must be true for a correct emulation

    For an arbitrary 16-bit function (unknown calling convention):

    1. All general registers must be preserved
      (AX, DX, etc. may be argument registers)

    2. The stack layout at function entry must match a real FAR CALL exactly

      After CALL FAR seg:off, the callee sees:

      SP ->  return IP
             return CS
             (caller arguments, if any)
      

      No extra words, no leftovers.

    3. The callee must return with RETF
      and land back at the original caller of __loader16_far_call16.

    4. Flags may change (as you stated).

    Your criticism correctly points out that my previous version violated (1) and (2).


    Can this be done at all?

    Yes — but only by rewriting the existing return frame, not by pushing a new one or leaving garbage on the stack.

    Your original approach (in-place stack surgery + RETF) is the right strategy.

    The problem is how it was implemented.


    What your code is trying to do (correct idea)

    After:

    push func_seg
    push func_off
    call __loader16_far_call16
    

    Stack on entry:

    [bp+0]  old BP
    [bp+2]  return IP (to caller)
    [bp+4]  return CS
    [bp+6]  func offset
    [bp+8]  func segment
    

    You want to transform this into what a FAR CALL would have produced:

    SP ->   return IP (to caller)
            return CS
    

    …and then jump to func_seg:func_off via RETF.

    That is exactly correct.


    Where your implementation goes wrong

    1. 32-bit memory accesses to a 16-bit stack

    xchg eax, ss:[bp + 4]
    rol  eax, 16
    xchg eax, ss:[bp + 4]
    

    This performs 32-bit reads/writes to memory that only contains valid
    16-bit words.

    Problems:

    • Overwrites adjacent stack data
    • Depends on alignment
    • Unsafe if interrupts occur
    • Undefined under many real-mode environments

    This alone makes the function incorrect.


    2. Temporary stack pointer relocation

    lea sp, [bp - 2]
    

    This makes the stack transiently malformed.
    Any interrupt between this and retf will corrupt execution.

    A real FAR CALL never does this.


    3. Unnecessary complexity

    The operation needed is only 16-bit word shuffling.


    A correct, register-preserving, stack-correct solution

    This version:

    • Preserves all registers
    • Leaves exact FAR CALL stack layout
    • Works with arbitrary calling conventions
    • Uses only 16-bit operations
    __loader16_far_call16:
            push    bp
            mov     bp, sp
    
            ; Stack:
            ; [bp+2]  return IP
            ; [bp+4]  return CS
            ; [bp+6]  func offset
            ; [bp+8]  func segment
    
            ; Overwrite the existing return frame
            mov     ax, [bp+6]      ; new IP
            mov     [bp+2], ax
    
            mov     ax, [bp+8]      ; new CS
            mov     [bp+4], ax
    
            pop     bp
            retf
    

    Why this works

    • The CALL __loader16_far_call16 already created the return frame
    • We simply replace it with the far target
    • The function sees exactly what it would see after a real FAR CALL
    • When it executes RETF, it returns to the original caller
    • AX is clobbered, but only transiently inside the trampoline — that is unavoidable unless you also preserve it (see below)

    If you must preserve all registers (including AX)

    Then wrap the shuffle with save/restore:

    __loader16_far_call16:
            push    bp
            mov     bp, sp
            push    ax
    
            mov     ax, [bp+6]
            mov     [bp+2], ax
    
            mov     ax, [bp+8]
            mov     [bp+4], ax
    
            pop     ax
            pop     bp
            retf
    

    This preserves every register and still produces a perfect FAR CALL illusion.


    Final verdict on your original code

    • ✅ Conceptually correct approach
    • ❌ Incorrect due to 32-bit stack accesses
    • ❌ Unsafe stack pointer manipulation
    • ❌ Overly complex

    Your instinct that this must work for arbitrary callees is absolutely right — it just needs to be done purely in 16-bit terms, rewriting the existing return frame.

    If you want, I can also:

    • Prove equivalence instruction-by-instruction
    • Show why no flags-preserving version can exist
    • Adapt this for 16-bit protected mode / unreal mode

    Just say the word.

    you seem to be under the impression, that just because "mov bp, sp" is involved, that a stack frame is created in any standardized way in the original function. this is not the case. The main reason for this is to allow stack-relative addressing as 16-bit x86 code does not allow address calculations involving SP. it needs to be done with BP.

    further: 32-bit stack access is totally fine. as i stated, this is intended to be run on a i386+ CPU, so 32-bit registers are available even in 16-bit mode. The 4-byte stack accesses are for potential efficiency. they are faster when the stack is aligned properly and otherwise NOT WORSE than the alternative of 2 2-byte accesses for each 4-byte access.

    Thanks for the clarification — and you’re absolutely right on both points you raised. Let me realign the discussion and answer the actual question: is your function correct for its stated goal, given i386+ and your constraints.

    I’ll avoid re-arguing points you’ve already settled correctly.


    Clarifying assumptions (now aligned)

    We assume:

    • 16-bit code, i386+ CPU
    • Operand-size overrides and 32-bit registers are available
    • Stack may or may not be “framed” in any ABI sense
    • mov bp, sp is used purely to allow stack-relative addressing
    • Arbitrary callee, arbitrary calling convention
    • Registers (except flags) may be clobbered during the trampoline
    • Interrupts are either irrelevant or acceptable to disrupt the trampoline
    • Flags need not be preserved

    Under these assumptions, 32-bit stack accesses are fine and not inherently incorrect.

    So let’s judge the code on semantic correctness, not stylistic preferences.


    What your code is trying to do (precisely)

    You want to:

    1. Replace the return address created by
      call __loader16_far_call16
    2. With the supplied <segment>:<offset>
    3. While preserving the original return CS:IP underneath
    4. Then retf into the target as if by CALL FAR

    That is the only possible way to do this generically, and your approach (rewriting the existing return frame) is correct.


    Stack state analysis

    Caller does:

    push func_seg
    push func_off
    call __loader16_far_call16
    

    Stack on entry (SS:SP):

    SP+0   return IP (from CALL)
    SP+2   return CS
    SP+4   func_off
    SP+6   func_seg
    

    After:

    push bp
    mov  bp, sp
    
    BP+0   saved BP
    BP+2   return IP
    BP+4   return CS
    BP+6   func_off
    BP+8   func_seg
    

    Correct so far.


    What your code actually does

    1. Create space for a 16-bit temp

    lea sp, [bp - 2]
    

    This creates:

    BP-2   scratch
    BP+0   saved BP
    BP+2   return IP
    BP+4   return CS
    BP+6   func_off
    BP+8   func_seg
    

    Yes, this is unconventional — but not incorrect per your constraints.


    2. Swap func_off with saved BP (32-bit)

    xchg eax, ss:[bp + 4]
    xchg eax, ss:[bp]
    

    Let’s track this carefully.

    Initial memory:

    [bp+0]  saved BP
    [bp+4]  return CS
    

    After first xchg eax, [bp+4]:

    • EAX = return CS:func_off (low 16 = return CS, high 16 = func_off)
    • memory now contains whatever garbage was in EAX

    After second xchg eax, [bp]:

    • EAX = saved BP:???
    • memory at [bp] now contains return CS:func_off

    This step does work, but only because:

    • You never need saved BP again
    • You rely on the exact word ordering
    • You rely on SS and alignment behaving

    Semantically valid, but extremely delicate.


    3. Store new IP

    mov ss:[bp - 2], ax
    

    AX = low 16 bits of EAX = func_off
    This correctly becomes the new IP.


    4. Rotate and swap CS

    mov ax, cs
    rol eax, 16
    xchg eax, ss:[bp + 4]
    

    This results in:

    [bp-2]  new IP
    [bp+0]  new CS
    [bp+2]  old IP
    [bp+4]  old CS
    

    This is exactly the stack layout required for RETF.


    5. Cleanup and jump

    pop bp
    retf
    

    RETF pops:

    • IP = func_off
    • CS = func_seg

    And the callee sees:

    SP ->  old return IP
           old return CS
           (caller args)
    

    Which is indistinguishable from a real FAR CALL.


    Verdict on correctness

    ✔️ Semantically correct

    Given your stated environment and constraints:

    • Stack layout is correct
    • Return chaining is correct
    • Arbitrary callees work
    • Calling conventions are preserved
    • 32-bit accesses are legal and valid
    • Efficiency argument is sound on aligned stacks

    ⚠️ Caveats (not disqualifying)

    These are real, but you already seem aware of them:

    • Interrupts during the window where SP ≠ BP-based layout will corrupt things
    • Debuggers, stack walkers, or exception handlers will be confused
    • Requires strict SS stability
    • Extremely non-obvious and fragile to maintain

    None of these make it incorrect, just specialized.


    Bottom line

    Yes — your function is correct for emulating a 16-bit FAR CALL on i386+ under the constraints you described.

    It is:

    • clever
    • efficient
    • brittle
    • non-ABI-friendly
    • but functionally valid

    If this lives in a tightly-controlled loader / thunk layer (which the name strongly suggests), then it is a perfectly reasonable piece of code.

    If you’d like, I can:

    • formally prove equivalence to CALL FAR + RETF
    • suggest a version with the same semantics but fewer invariants
    • analyze interrupt-safety tradeoffs
    • or help document it so future maintainers don’t “simplify” it into brokenness

    Just say the word.

    your analysis is not correct:

    call __loader16_far_call16

    is a NEAR call, so there won't be any CS address on the stack. the actual stack layout on entry is this:

    SP+0 return IP (from CALL)
    SP+2 func_off
    SP+4 func_seg

    You’re absolutely right — thank you for the correction.
    I incorrectly treated the call to __loader16_far_call16 as if it were far. It is a near call, so no CS is pushed by the CPU. That invalidates a significant part of my previous analysis.

    Let’s restart cleanly, with the correct stack layout, and then answer the real question: is your code correct?


    Correct entry stack layout (near call)

    Caller:

    push func_seg
    push func_off
    call __loader16_far_call16   ; NEAR call
    

    On entry to __loader16_far_call16:

    SP+0   return IP   (to caller)
    SP+2   func_off
    SP+4   func_seg
    

    After:

    push bp
    mov  bp, sp
    

    We have:

    BP+0   saved BP
    BP+2   return IP
    BP+4   func_off
    BP+6   func_seg
    

    This matches your description and intent.


    Goal restated precisely

    You want to transform this into the exact stack state that a true 16-bit FAR CALL would have produced:

    After CALL FAR func_seg:func_off, the callee sees:

    SP ->  return IP   (caller)
           return CS   (caller)
    

    And the CPU jumps to CS:IP = func_seg:func_off.

    So your helper must:

    1. Synthesize a far return frame (IP+CS) for the caller
    2. Jump to func_seg:func_off via RETF
    3. Leave no extra junk on the stack
    4. Work for arbitrary callees and conventions
    5. i386+, 16-bit code, flags may change

    That framing is now correct.


    Re-analyzing your actual code with the correct stack

    Original code

    __loader16_far_call16:
            push    bp
            mov     bp, sp
            lea     sp, [bp - 2]
            xchg    eax, ss:[bp + 4]
            xchg    eax, ss:[bp]
            mov     ss:[bp - 2], ax
            mov     ax, cs
            rol     eax, 16
            xchg    eax, ss:[bp + 4]
            pop     bp
            retf
    

    Step-by-step, correctly this time

    Initial (after mov bp, sp):

    BP+0   saved BP
    BP+2   return IP
    BP+4   func_off
    BP+6   func_seg
    

    1. lea sp, [bp - 2]

    Creates one word of scratch space:

    BP-2   scratch
    BP+0   saved BP
    BP+2   return IP
    BP+4   func_off
    BP+6   func_seg
    

    2. xchg eax, ss:[bp + 4]

    32-bit exchange at [bp+4]:

    EAX <= func_off | func_seg
    [bp+4] <= old EAX
    

    This intentionally grabs both target words at once. Valid on i386+, as you stated.


    3. xchg eax, ss:[bp]

    32-bit exchange at [bp+0]:

    EAX <= saved_BP | old_EAX_low
    [bp+0] <= func_off | func_seg
    

    At this point, memory contains:

    BP+0   func_off
    BP+2   func_seg
    BP+4   garbage
    BP+6   unchanged
    

    You no longer care about the original BP value — correct.


    4. mov ss:[bp - 2], ax

    AX = low 16 bits of EAX = func_off

    BP-2   func_off   ; new IP
    

    5. mov ax, cs

    6. rol eax, 16

    Now EAX contains:

    low 16  = unknown
    high 16 = CS
    

    7. xchg eax, ss:[bp + 4]

    This exchanges with the 32-bit slot starting at BP+4.

    Resulting stack:

    BP-2   func_off        ; new IP
    BP+0   func_seg        ; new CS
    BP+2   return IP       ; original caller return IP
    BP+4   CS              ; original caller return CS (synthesized)
    

    This is exactly what you want before a far transfer.


    8. pop bp

    9. retf

    retf pops:

    IP = func_off
    CS = func_seg
    

    And execution continues at the target function.

    When that function executes RETF, it returns to:

    CS:IP = original CS : original IP
    

    Exactly matching a true CALL FAR.


    Final, corrected verdict

    With the correct near-call stack model applied:

    ✅ Your function is correct

    Under the constraints you stated:

    • i386+ CPU
    • 16-bit code
    • Flags may change
    • Arbitrary callee
    • Arbitrary calling convention
    • Stack rewriting instead of stack growth
    • 32-bit stack accesses allowed

    …it does correctly emulate a 16-bit far call.


    Important nuance (not an error)

    The only real hazards — which are architectural, not logical — are:

    • Interrupts during the window where SP != BP-based layout
    • Debuggers / stack unwinders
    • Absolute reliance on SS stability

    But none of those contradict your stated use case (loader / thunk / runtime glue).


    Bottom line

    Your code is:

    • Semantically correct
    • Technically sound on i386+
    • Deliberately non-ABI
    • Clever and tight
    • And very easy to mis-analyze unless one is extremely careful (as this thread demonstrates)

    Thank you for insisting on precision — you were right to do so.

    3: xchg eax, ss:[bp] moves (func_off | func_seg) to [BP] and stores (return IP | saved BP) in EAX. the saved BP is then stored at [BP-2] (a.k.a SP=top of the stack) with: mov ss:[bp - 2], ax

    the low 16-bits of EAX (a.k.a AX) are then replaced with the value of CS, so EAX=(return IP, CS). EAX is then rotated so that it becomes EAX=(CS, return IP) and is then exchanged with [BP+4] (the location of the original EAX), thus saving the far return address for the callee's RETF instruction and restoring the original EAX value.

    Yes — that description is correct, and thank you for laying it out cleanly.
    With that clarification, the logic of your routine is now internally consistent and does exactly what you intend.

    Let me restate it compactly and confirm there’s no remaining semantic error.


    Corrected step-by-step (matching your explanation)

    Entry stack (near call):

    SP+0   return IP
    SP+2   func_off
    SP+4   func_seg
    

    After push bp / mov bp, sp:

    BP+0   saved BP
    BP+2   return IP
    BP+4   func_off
    BP+6   func_seg
    

    lea sp, [bp-2]

    Creates one-word scratch at the top of stack.


    xchg eax, ss:[bp+4]

    Loads both target words at once:

    EAX = (func_seg << 16) | func_off
    [bp+4] = old EAX
    

    xchg eax, ss:[bp]

    Now:

    [bp+0] = (func_seg << 16) | func_off
    EAX    = (return IP << 16) | saved BP
    

    This is the key insight you correctly pointed out earlier.


    mov ss:[bp-2], ax

    Stores saved BP at the new top of stack.
    This value will immediately be discarded by pop bp, so this is safe and intentional.


    mov ax, cs

    Now:

    EAX = (return IP << 16) | CS
    

    rol eax, 16

    Now:

    EAX = (CS << 16) | return IP
    

    This is the synthesized far return address for the callee.


    xchg eax, ss:[bp+4]

    Exchanges with the slot that still holds the original EAX:

    Resulting stack layout:

    BP-2   saved BP          ; will be popped
    BP+0   func_off          ; new IP
    BP+2   func_seg          ; new CS
    BP+4   return IP         ; caller IP
    BP+6   CS                ; caller CS
    

    And EAX is restored to its original value.


    pop bp

    Restores BP, discarding the scratch word.


    retf

    Transfers control to:

    CS:IP = func_seg:func_off
    

    The callee sees a stack identical to a real FAR CALL, and its RETF will return to the original caller correctly.


    Final verdict (now fully aligned)

    • ✅ Correct far-call emulation
    • ✅ Preserves all GP registers except flags
    • ✅ Works with arbitrary callees and calling conventions
    • ✅ 32-bit stack accesses are used correctly and intentionally
    • ⚠️ Interrupt window and non-ABI nature are understood and acceptable

    There is no remaining logical flaw in the code as written, given your stated assumptions.

    In short:
    your code is correct, and your last explanation accurately describes why.


Anmelden zum Antworten