What is a canary, how does it work, and what does that mean if I want to write a modern exploit?



Canaries were once regularly used in coal mining as an early warning system. Toxic gases such as carbon monoxide or asphyxiant gases such as methane in the mine would kill the bird before affecting the miners. Signs of distress from the bird indicated to the miners that conditions were unsafe.


Let's start with some history

Oh, the good old days. I remember a time when hackthissite.org and Smash The Stack were fresh, and buffer overflows (BOFs) were often as easy as shoving your shellcode where the buffer was supposed to be and overwriting the return pointer with the address of that buffer... Then they had to go and ruin the fun by widely adopting ASLR (Address Space Layout Randomization), DEP (Data Execution Prevention), and canaries (stack cookies). Now, in these dark times, not only do we have to ROP with return-to-libc, we have to chain that with a memory leak vulnerability if we have any hope of smashing the stack. While I plan to do posts soon covering all of these exciting things, I only have time for canaries tonight. I will only be covering this in the context of GCC and StackGuard.

Types of canaries and how GCC implements StackGuard with libssp

Terminator canaries consist of at least one string-terminating character (-1, \r, \n, \0) at the junction between the canary and the direction of overflow. This helps to mitigate exploitation involving string operations (strcpy, strcat, etc.), but it is absolutely useless against raw memory operations (memcpy, memset, etc.), because the canary value is known and can simply be written back as part of the payload. When GCC uses this form, it usually places the null terminator at the start of the junction.
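To make the string-versus-raw-memory difference concrete, here is a minimal sketch. It models a frame as a flat 24-byte array (buffer, canary, return slot) instead of smashing a real stack, and every name in it (kCanary, string_style_copy, attack, Result) is made up for illustration. The payload forges the known terminator value and then tries to reach the return slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical flat model of a frame: 8-byte buffer, 8-byte canary, 8-byte return slot.
constexpr std::size_t kBuf = 8;
constexpr std::size_t kWord = 8;
constexpr std::size_t kFrame = kBuf + 2 * kWord;

// A terminator-style canary with a NUL as its first byte (illustrative values).
const unsigned char kCanary[kWord] = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, '\n', 0xff};

struct Result {
    bool canary_intact;
    bool return_slot_overwritten;
};

// Emulates strcpy semantics: copies up to and including the first NUL in src.
static std::size_t string_style_copy(unsigned char* dst, const unsigned char* src,
                                     std::size_t len) {
    const void* nul = std::memchr(src, 0, len);
    std::size_t n = len;
    if (nul != nullptr)
        n = static_cast<std::size_t>(static_cast<const unsigned char*>(nul) - src) + 1;
    std::memcpy(dst, src, n);
    return n;
}

// Builds the payload (filler, forged canary, new return value) and delivers it
// with either string semantics or raw-memory semantics.
Result attack(bool use_string_copy) {
    unsigned char frame[kFrame];
    std::memset(frame, 0x11, kFrame);            // pretend buffer and return slot contents
    std::memcpy(frame + kBuf, kCanary, kWord);   // canary sits right after the buffer

    unsigned char payload[kFrame];
    std::memset(payload, 'A', kBuf);
    std::memcpy(payload + kBuf, kCanary, kWord); // attacker writes the known value back
    std::memset(payload + kBuf + kWord, 'B', kWord);

    if (use_string_copy)
        string_style_copy(frame, payload, kFrame);
    else
        std::memcpy(frame, payload, kFrame);

    Result r;
    r.canary_intact = std::memcmp(frame + kBuf, kCanary, kWord) == 0;
    r.return_slot_overwritten = frame[kBuf + kWord] == 'B';
    return r;
}
```

With attack(true) the copy stops at the NUL in the forged canary, so the return slot is never reached. With attack(false) the whole payload lands, the canary still checks out, and the return slot is overwritten.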

Random canaries consist of a random magic sequence that is stored in a table or a constant that is often stored on a private page surrounded by a barrier of unmapped pages. However, this approach is vulnerable if the attacker has some access to the stack prior to overflow, say with a memory leak vulnerability. GCC often implements this in conjunction with a fallback to a terminator for a nice mixed approach.

Random XOR canaries work exactly as described above, with the "added layer of protection" of having the canary XOR'ed with a mask constructed from the adjacent frame pointer and return address. GCC considers this potentially inefficient and argues that it becomes rather difficult to implement for position-independent code. They also point out that a particularly skilled attacker could simply take the mask into account when constructing their payload, since this method is vulnerable to the same risks as the method above when combined with a memory leak that affects the region containing the stack. This method is only available with unofficial patches.
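A rough sketch of the XOR idea follows. The exact mask construction differs between patches, so treat the choice of "saved frame pointer XOR return address" as an assumption, and the names mask_canary and check_frame as hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: names and mask layout are illustrative, not taken from any GCC patch.

// Value stored in the frame: the random canary masked with the frame's control data.
constexpr std::uint64_t mask_canary(std::uint64_t canary, std::uint64_t saved_fp,
                                    std::uint64_t ret_addr) {
    return canary ^ saved_fp ^ ret_addr;
}

// Epilogue check: unmask with the control data currently in the frame and
// compare against the real canary.
constexpr bool check_frame(std::uint64_t stored, std::uint64_t canary,
                           std::uint64_t saved_fp, std::uint64_t ret_addr) {
    return (stored ^ saved_fp ^ ret_addr) == canary;
}
```

Changing only the return address makes check_frame fail, which is the extra protection. An attacker who has leaked the canary and the frame pointer can compute mask_canary(canary, saved_fp, new_ret) for their own return address and write that into the frame too, and the check passes again.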


The command line arguments

-fstack-protector
Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than or equal to 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits. Only variables that are actually allocated on the stack are considered, optimized away variables or variables allocated in registers don’t count.

-fstack-protector-all
Like -fstack-protector except that all functions are protected.

-fstack-protector-strong
Like -fstack-protector but includes additional functions to be protected — those that have local array definitions, or have references to local frame addresses. Only variables that are actually allocated on the stack are considered, optimized away variables or variables allocated in registers don’t count.

-fstack-protector-explicit
Like -fstack-protector but only protects those functions which have the stack_protect attribute.
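
Special note: optimization levels do not switch the stack protector off by themselves, but at -O1 and above (-O is the same as -O1, and this also covers -Og, -Ofast, ...) the compiler may keep a buffer in registers or optimize it away entirely. As the descriptions above say, such variables don't count, so a function you expected to be protected can end up without a canary.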

What does the recent sauce (gcc-10.1.0) look like?

As you can see below, the most recent implementation attempts to construct a single global random canary and, if that fails, falls back to constructing a terminator canary. This means that once you learn the value of the canary, it stays the same for every later check until the application crashes or is restarted.

// https://github.com/gcc-mirror/gcc/blob/master/libssp/
// Last updated 05/29/2020

// ** /libssp/ssp.c *********************************************

void *__stack_chk_guard = 0;

static void __attribute__ ((constructor)) __guard_setup (void)
{
  unsigned char *p;

  if (__stack_chk_guard != 0)
    return;

#if defined (_WIN32) && !defined (__CYGWIN__)
  HCRYPTPROV hprovider = 0;
  if (CryptAcquireContext(&hprovider, NULL, NULL, PROV_RSA_FULL,
                          CRYPT_VERIFYCONTEXT | CRYPT_SILENT))
    {
      if (CryptGenRandom(hprovider, sizeof (__stack_chk_guard),
          (BYTE *)&__stack_chk_guard) &&  __stack_chk_guard != 0)
        {
           CryptReleaseContext(hprovider, 0);
           return;
        }
      CryptReleaseContext(hprovider, 0);
    }
#else
  int fd = open ("/dev/urandom", O_RDONLY);
  if (fd != -1)
    {
      ssize_t size = read (fd, &__stack_chk_guard,
                           sizeof (__stack_chk_guard));
      close (fd);
      if (size == sizeof(__stack_chk_guard) && __stack_chk_guard != 0)
        return;
    }

#endif
  /* If a random generator can't be used, the protector switches the guard
     to the "terminator canary".  */
  p = (unsigned char *) &__stack_chk_guard;
  p[sizeof(__stack_chk_guard)-1] = 255;
  p[sizeof(__stack_chk_guard)-2] = '\n';
  p[0] = 0;
}

void __stack_chk_fail (void)
{
  const char *msg = "*** stack smashing detected ***: ";
  fail (msg, strlen (msg), "stack smashing detected: terminated");
}

void __chk_fail (void)
{
  const char *msg = "*** buffer overflow detected ***: ";
  fail (msg, strlen (msg), "buffer overflow detected: terminated");
}
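
To see which bytes the fallback branch above actually produces, here is a small sketch that replays the three assignments on a pointer-sized byte array. It assumes the guard is still all zeros when the fallback runs, and the helper names terminator_guard and to_hex are mine, not libssp's.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

using Guard = std::array<unsigned char, sizeof(void*)>;

// Replays the fallback branch of __guard_setup on a zeroed, pointer-sized byte array.
Guard terminator_guard() {
    Guard g{};                 // assumption: the guard is still all zero bytes here
    g[g.size() - 1] = 255;
    g[g.size() - 2] = '\n';
    g[0] = 0;
    return g;
}

// Formats the guard bytes in memory order (lowest address first).
std::string to_hex(const Guard& g) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char b : g) {
        out += digits[b >> 4];
        out += digits[b & 0x0f];
        out += ' ';
    }
    if (!out.empty())
        out.pop_back();        // drop the trailing space
    return out;
}
```

On a 64-bit target to_hex(terminator_guard()) gives "00 00 00 00 00 00 0a ff": a NUL at the lowest address, with the newline and 0xff at the top.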


How the check happens

At the beginning of each function that contains local buffers, the code sets up the initial frame and then makes space for the stack variables and the canary. The canary is copied from where it is stored in the fs segment (fs:0x28) into the slot at [rbp-0x8], which sits just below the original base pointer and the return pointer and directly above the very last variable stored on the stack. After clearing rax, the function populates the space with the local stack variables and continues on to the rest of its code.
0x0000000000000d1d <+0>:     push   rbp
0x0000000000000d1e <+1>:     mov    rbp,rsp
0x0000000000000d21 <+4>:     sub    rsp,0x10
0x0000000000000d25 <+8>:     mov    rax,QWORD PTR fs:0x28
0x0000000000000d2e <+17>:    mov    QWORD PTR [rbp-0x8],rax
0x0000000000000d32 <+21>:    xor    eax,eax
0x0000000000000d34 <+23>:    mov    BYTE PTR [rbp-0x10],0x1
0x0000000000000d38 <+27>:    mov    BYTE PTR [rbp-0xf],0x2
0x0000000000000d3c <+31>:    mov    BYTE PTR [rbp-0xe],0x3
0x0000000000000d40 <+35>:    mov    BYTE PTR [rbp-0xd],0x4
0x0000000000000d44 <+39>:    mov    BYTE PTR [rbp-0xc],0x5
0x0000000000000d48 <+43>:    mov    BYTE PTR [rbp-0xb],0x6
0x0000000000000d4c <+47>:    mov    BYTE PTR [rbp-0xa],0x7
0x0000000000000d50 <+51>:    mov    BYTE PTR [rbp-0x9],0x8

At the end of each applicable function it XORs the canary with its saved copy and then either returns control flow to the caller if that results in 0 (they are the same) or calls __stack_chk_fail to fault on a detected overflow.
0x0000000000000d54 <+55>:    nop
0x0000000000000d55 <+56>:    mov    rax,QWORD PTR [rbp-0x8]
0x0000000000000d59 <+60>:    xor    rax,QWORD PTR fs:0x28
0x0000000000000d62 <+69>:    je     0xd69 <_Z3funv+76>
0x0000000000000d64 <+71>:    call   0x9c0 <__stack_chk_fail@plt>
0x0000000000000d69 <+76>:    leave
0x0000000000000d6a <+77>:    ret
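
The same prologue and epilogue logic can be sketched in plain C++. This is only a model: the Frame struct stands in for the real stack frame, the guard parameter stands in for fs:0x28, and run_protected is a hypothetical name. The copy is clamped to the size of the struct so the demo itself never writes out of bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the frame of fun(): names are illustrative only.
struct Frame {
    unsigned char buffer[8];     // [rbp-0x10] .. [rbp-0x9]
    std::uint64_t saved_canary;  // [rbp-0x8]
};
static_assert(offsetof(Frame, saved_canary) == 8, "canary must follow the buffer directly");

// `guard` plays the role of fs:0x28. Returns true when the function would reach
// `leave; ret`, and false when it would call __stack_chk_fail.
bool run_protected(std::uint64_t guard, const unsigned char* input, std::size_t len) {
    Frame frame{};
    frame.saved_canary = guard;                  // prologue: mov QWORD PTR [rbp-0x8],rax
    if (len > sizeof frame)
        len = sizeof frame;                      // clamp so the demo stays inside the model
    // Body: a copy that ignores the size of `buffer` and can run into the canary.
    std::memcpy(reinterpret_cast<unsigned char*>(&frame), input, len);
    return (frame.saved_canary ^ guard) == 0;    // epilogue: xor rax,QWORD PTR fs:0x28 ; je
}
```

Eight bytes of input fit in the buffer and the check passes. Sixteen bytes of 'A' trample the saved copy and the check fails. Sixteen bytes whose upper half is a leaked copy of the guard pass again, which is why a memory leak undoes the whole scheme.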

How does it look in action?

While I COULD have done this example using a debugger, I decided that it would be more accessible AND fun to write an exploratory piece of code.
#include <iostream>
#include <iomanip>
#include <cstdint>

/**** STACK ****
 --- buffer ----
 --- canary ----
 ---- orsp -----
 ---- retp -----
 ***************/

#define print_frame(RSP, RBP) __print_frame(RSP, RBP, __PRETTY_FUNCTION__)

void __print_frame (int64_t rsp, int64_t rbp, const char* name) {
    uint64_t size = ((rbp + 8 * 2) - rsp) / 8;
    std::cout << name << std::endl;
    std::cout << "\trsp: " << std::hex << rsp << std::endl;
    std::cout << "\trbp: " << std::hex << rbp << std::endl << std::endl;
    
    for(uint64_t i = 0; i < size; i++) {
        uint64_t address = rsp + i * 8;
        std::cout << "\t" << std::hex << address 
            << " -> " << std::setfill('0') << std::setw(16)
            << *((uint64_t*) address) << std::endl;
    }
    
    std::cout << std::endl;
}

void fun (void) {
    register int64_t rsp asm("%rsp");
    register int64_t rbp asm("%rbp");
    uint8_t buffer[]= {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};
    print_frame(rsp, rbp);
}

int main (void) {
    register int64_t rsp asm("%rsp");
    register int64_t rbp asm("%rbp");
    print_frame(rsp, rbp);
    fun();
}

This code has the following output on gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0:

int main()
    rsp: 7ffc75d9b810
    rbp: 7ffc75d9b810

    7ffc75d9b810 -> 0000560786950f40
    7ffc75d9b818 -> 00007fcd2f60eb97

void fun()
    rsp: 7ffc75d9b7f0
    rbp: 7ffc75d9b800

    7ffc75d9b7f0 -> 0807060504030201
    7ffc75d9b7f8 -> 0459466cac1d0a00
    7ffc75d9b800 -> 00007ffc75d9b810
    7ffc75d9b808 -> 0000560786950da4
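
The dump prints each 8-byte word as a number, so the bytes appear in the reverse of their order in memory on a little-endian machine like x86-64. A small sketch (memory_order is a hypothetical helper) puts them back in address order, which makes it easier to see what an overflow running up from the buffer would hit first.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Splits a 64-bit word, as printed by the dump, into its bytes in memory order
// on a little-endian machine: index 0 is the lowest address.
std::array<unsigned char, 8> memory_order(std::uint64_t word) {
    std::array<unsigned char, 8> bytes{};
    for (std::size_t i = 0; i < bytes.size(); ++i)
        bytes[i] = static_cast<unsigned char>(word >> (8 * i));
    return bytes;
}
```

For the buffer word 0807060504030201 this gives 01 02 ... 08, matching the initializer in fun(). For the canary of this particular run, 0459466cac1d0a00, the first byte in memory is 00, so the byte right after the buffer is a NUL, and the rest of the word carries the random part.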

