Looking for a fast little-to-big-endian conversion

Erhard Henkes

BSWAP r32 http://faydoc.tripod.com/cpu/bswap.htm

Erhard Henkes wrote:

BSWAP r32
http://faydoc.tripod.com/cpu/bswap.htm

Is that for an 8051? I thought it could only do a swap on the accumulator (4-bit nibble swap).
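
For context: BSWAP is an x86 instruction (available from the 80486 onward), so it would not help on an 8051. On a hosted GCC or Clang toolchain it can usually be reached from C through the `__builtin_bswap32` builtin. A minimal sketch follows; the function name `bswap32_host` and the fallback branch are illustrative assumptions, not something from this thread:

```c
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value.
 * With GCC/Clang the builtin typically compiles to a single BSWAP on x86;
 * other compilers fall back to a portable shift-and-mask expression. */
static uint32_t bswap32_host(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(x);
#else
    return (x << 24)
         | ((x & 0x0000ff00u) << 8)
         | ((x >> 8) & 0x0000ff00u)
         | (x >> 24);
#endif
}
```

For example, `bswap32_host(0x11223344u)` yields `0x44332211u`.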

c_newbie

I can't say whether it's actually faster, but a shift fills the vacated
positions with zeros, so something like this should be enough.

/* x must be an unsigned 32-bit value; with a signed x, >> may sign-extend */
#define SWAP32(x) (                         \
                    ((x) << 24)             \
                   |(((x) & 0xff00) << 8)   \
                   |(((x) & 0xff0000) >> 8) \
                   |((x) >> 24)             \
                  )
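
The macro relies on shifts filling with zeros, which only holds for unsigned operands; for a negative signed `int`, `>> 24` is implementation-defined and commonly sign-extends. A minimal sketch of the same expression as a function on `uint32_t` (the name `swap32_u` is an assumption, not from the thread):

```c
#include <stdint.h>

/* Hypothetical function form of the SWAP32 expression, fixed to uint32_t
 * so that both outer shifts are guaranteed to shift in zeros. */
static uint32_t swap32_u(uint32_t x)
{
    return (x << 24)                 /* lowest byte  -> highest byte */
         | ((x & 0x0000ff00u) << 8)  /* second byte  -> third byte   */
         | ((x & 0x00ff0000u) >> 8)  /* third byte   -> second byte  */
         | (x >> 24);                /* highest byte -> lowest byte  */
}
```

For example, `swap32_u(0x11223344u)` yields `0x44332211u`, and a value with the top bit set such as `0x80000001u` yields `0x01000080u` without any sign-extension artifacts.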

pointercrash**

This one does the job, and fairly quickly:

void SwapOrder32(int *in32)
{
	char c;
	char *swap32;

	swap32 = (char*) in32;

	c = swap32[0];
	swap32[0] = swap32[3];
	swap32[3] = c;

	c = swap32[1];
	swap32[1] = swap32[2];
	swap32[2] = c;
}

So far I haven't had a processor where char wasn't 8 bits; until I do, I'll skip wrapping this in defines.
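
The "wrapping in defines" mentioned here might look roughly like the following: a compile-time check that `char` has 8 bits, plus a fixed-width type so the function doesn't depend on `int` being 32 bits (on many 8- and 16-bit targets `int` is only 16 bits). This is a sketch under those assumptions; the name `swap_order32_checked` is hypothetical:

```c
#include <limits.h>
#include <stdint.h>

/* Refuse to compile where the byte-wise indexing below would be wrong. */
#if CHAR_BIT != 8
#error "byte-wise swap assumes 8-bit chars"
#endif

/* Hypothetical variant of SwapOrder32 taking a fixed-width 32-bit value. */
static void swap_order32_checked(uint32_t *in32)
{
    unsigned char *p = (unsigned char *)in32;
    unsigned char c;

    c = p[0]; p[0] = p[3]; p[3] = c;   /* exchange outer bytes */
    c = p[1]; p[1] = p[2]; p[2] = c;   /* exchange inner bytes */
}
```

Because all four bytes are reversed in place, the result is the same on little- and big-endian hosts: a variable holding `0x11223344u` holds `0x44332211u` afterwards.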

c_newbie

This saves you one more variable:

void SwapOrder32(int *in32){

    char *swap32 = (char*) in32;

    if(swap32[0] != swap32[3]){
        swap32[0] ^= swap32[3];
        swap32[3] ^= swap32[0];
        swap32[0] ^= swap32[3];
    }

    if(swap32[1] != swap32[2]){
        swap32[1] ^= swap32[2];
        swap32[2] ^= swap32[1];
        swap32[1] ^= swap32[2];
    }
}

pointercrash**

Oh right, the ^ trick, I had almost forgotten that one ... tricky.
Although "fast" then becomes rather processor-dependent ...

I wouldn't bother with the if checks. Beyond that, when in doubt I would look at the assembler output to see whether the variant with a temporary "variable" isn't better after all. It is certainly more readable.
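
On the `if` checks: an XOR swap still produces the right result when the two values are equal. It only breaks when both operands are the same object, which cannot happen for `swap32[0]` and `swap32[3]`. A minimal sketch without the checks, using the hypothetical name `swap_order32_xor` and `uint32_t`/`unsigned char` as assumptions:

```c
#include <stdint.h>

/* Hypothetical XOR-swap variant without the equality checks. */
static void swap_order32_xor(uint32_t *in32)
{
    unsigned char *p = (unsigned char *)in32;

    /* p[0] and p[3] are distinct objects, so no guard is required;
     * equal values simply pass through 0 and come back unchanged. */
    p[0] ^= p[3];
    p[3] ^= p[0];
    p[0] ^= p[3];

    p[1] ^= p[2];
    p[2] ^= p[1];
    p[1] ^= p[2];
}
```

A value with matching outer and inner bytes, such as `0xAABBBBAAu`, comes out unchanged, which is the correct byte-reversed result.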

c_newbie

So here is my conclusion:

The good old XOR swap makes no sense in this case.
For a small memory footprint, the solution from pointercrash() is best.
For absolute performance, the suggestion from Gregor_.
My first post, which suggested dropping two of the ANDs, made no difference.

@Gregor_

 8048420:       55                      push   %ebp
 8048421:       89 e5                   mov    %esp,%ebp
 8048423:       53                      push   %ebx
 8048424:       8b 5d 08                mov    0x8(%ebp),%ebx
 8048427:       8b 0b                   mov    (%ebx),%ecx
 8048429:       89 ca                   mov    %ecx,%edx
 804842b:       89 c8                   mov    %ecx,%eax
 804842d:       81 e2 00 ff 00 00       and    $0xff00,%edx
 8048433:       25 00 00 ff 00          and    $0xff0000,%eax
 8048438:       c1 f8 08                sar    $0x8,%eax
 804843b:       c1 e2 08                shl    $0x8,%edx
 804843e:       09 c2                   or     %eax,%edx
 8048440:       89 c8                   mov    %ecx,%eax
 8048442:       c1 e0 18                shl    $0x18,%eax
 8048445:       09 c2                   or     %eax,%edx
 8048447:       c1 e9 18                shr    $0x18,%ecx
 804844a:       09 ca                   or     %ecx,%edx
 804844c:       89 13                   mov    %edx,(%ebx)
 804844e:       5b                      pop    %ebx
 804844f:       5d                      pop    %ebp
 8048450:       c3                      ret

@pointercrash()

 8048460:       55                      push   %ebp
 8048461:       89 e5                   mov    %esp,%ebp
 8048463:       8b 45 08                mov    0x8(%ebp),%eax
 8048466:       0f b6 08                movzbl (%eax),%ecx
 8048469:       0f b6 50 03             movzbl 0x3(%eax),%edx
 804846d:       88 48 03                mov    %cl,0x3(%eax)
 8048470:       0f b6 48 01             movzbl 0x1(%eax),%ecx
 8048474:       88 10                   mov    %dl,(%eax)
 8048476:       0f b6 50 02             movzbl 0x2(%eax),%edx
 804847a:       88 48 02                mov    %cl,0x2(%eax)
 804847d:       88 50 01                mov    %dl,0x1(%eax)
 8048480:       5d                      pop    %ebp
 8048481:       c3                      ret

@xor-swap

 8048490:       55                      push   %ebp
 8048491:       89 e5                   mov    %esp,%ebp
 8048493:       8b 55 08                mov    0x8(%ebp),%edx
 8048496:       53                      push   %ebx
 8048497:       0f b6 0a                movzbl (%edx),%ecx
 804849a:       0f b6 42 03             movzbl 0x3(%edx),%eax
 804849e:       38 c1                   cmp    %al,%cl
 80484a0:       74 0c                   je     80484ae <swap32_3+0x1e>
 80484a2:       31 c8                   xor    %ecx,%eax
 80484a4:       88 02                   mov    %al,(%edx)
 80484a6:       32 42 03                xor    0x3(%edx),%al
 80484a9:       30 02                   xor    %al,(%edx)
 80484ab:       88 42 03                mov    %al,0x3(%edx)
 80484ae:       0f b6 4a 01             movzbl 0x1(%edx),%ecx
 80484b2:       8d 5a 01                lea    0x1(%edx),%ebx
 80484b5:       0f b6 42 02             movzbl 0x2(%edx),%eax
 80484b9:       38 c1                   cmp    %al,%cl
 80484bb:       74 0d                   je     80484ca <swap32_3+0x3a>
 80484bd:       31 c8                   xor    %ecx,%eax
 80484bf:       88 42 01                mov    %al,0x1(%edx)
 80484c2:       32 42 02                xor    0x2(%edx),%al
 80484c5:       88 42 02                mov    %al,0x2(%edx)
 80484c8:       30 03                   xor    %al,(%ebx)
 80484ca:       5b                      pop    %ebx
 80484cb:       5d                      pop    %ebp
 80484cc:       c3                      ret

Nobuo T

c_newbie: You do realize that this is about how the code performs on an 8-bit µC? You can't compare that with the Intel monster.
Besides, IMHO all of these snippets are garbage as far as optimization goes anyway. See, for example, Erhard's remark.

pointercrash**

Nobuo T wrote:

c_newbie: You do realize that this is about how the code performs on an 8-bit µC?

I've now benchmarked this on a 16-bit part. A compiler for an 8-bit part would be ideal, but I don't have one at hand right now.
- Gregor's macro, reworked into a function, costs 148 cycles.
- c_newbie's XOR swap still needs 142 cycles.
- My version needs 70 cycles.
The gap is likely to be even more dramatic on an 8-bit part.

Nobuo T wrote:

You can't compare that with the Intel monster.

The 8051 is also a truly ghastly Intel monster. I don't know why it's so widespread; it must be something along the lines of flies and cow dung.

Nobuo T wrote:

Besides, IMHO all of these snippets are garbage as far as optimization goes anyway. See, for example, Erhard's remark.

Do you mean the compiler output or the C input?