Blktap Userspace Tools + Library
================================

Andrew Warfield and Julian Chesterfield
16th June 2006

{firstname.lastname}@cl.cam.ac.uk

The blktap userspace toolkit provides a user-level disk I/O
interface. The blktap mechanism involves a kernel driver that acts
similarly to the existing Xen/Linux blkback driver, and a set of
associated user-level libraries. Using these tools, blktap allows
virtual block devices presented to VMs to be implemented in userspace
and to be backed by raw partitions, files, network, etc.

The key benefit of blktap is that it makes it easy and fast to write
arbitrary block backends, and that these user-level backends actually
perform very well. Specifically:

- Metadata disk formats such as Copy-on-Write, encrypted disks, sparse
  formats and other compression features can be easily implemented.

- Accessing file-based images from userspace avoids the dirty-page
  flushing problems present in the Linux loopback driver.
  (Specifically, doing a large number of writes to an NFS-backed
  image doesn't result in the OOM killer going berserk.)

- Per-disk handler processes enable easier userspace policing of block
  resources, and process-granularity QoS techniques (disk scheduling
  and related tools) may be trivially applied to block devices.

- It's very easy to take advantage of userspace facilities such as
  networking libraries, compression utilities, peer-to-peer
  file-sharing systems and so on to build more complex block backends.

- Crashes are contained, so incremental development and debugging are
  very fast.
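
As an illustration of the first point, the heart of a copy-on-write format is a per-block map recording whether each block has been written to a private overlay or should still be read from the backing image. The sketch below is a hypothetical, in-memory simplification (fixed 512-byte blocks, flat arrays, and invented names such as `cow_disk`); it is not the QCow on-disk format, which uses multi-level lookup tables stored in the image file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define COW_BLOCK_SIZE 512  /* hypothetical fixed block size */

/* Hypothetical in-memory CoW disk: reads fall through to the read-only
 * backing image until a block has been written once. */
struct cow_disk {
    const uint8_t *backing;   /* read-only base image */
    uint8_t *overlay;         /* private copies of written blocks */
    uint8_t *written;         /* written[i] != 0 once block i is in overlay */
    size_t nblocks;
};

/* Returns 0 on success, -1 if allocation fails. */
int cow_init(struct cow_disk *d, const uint8_t *backing, size_t nblocks)
{
    d->backing = backing;
    d->nblocks = nblocks;
    d->overlay = calloc(nblocks, COW_BLOCK_SIZE);
    d->written = calloc(nblocks, 1);
    if (!d->overlay || !d->written) {
        free(d->overlay);
        free(d->written);
        return -1;
    }
    return 0;
}

void cow_free(struct cow_disk *d)
{
    free(d->overlay);
    free(d->written);
}

/* Copy block blk into buf: from the overlay if written, else from backing. */
void cow_read(const struct cow_disk *d, size_t blk, uint8_t *buf)
{
    assert(blk < d->nblocks);
    const uint8_t *src = d->written[blk] ? d->overlay : d->backing;
    memcpy(buf, src + blk * COW_BLOCK_SIZE, COW_BLOCK_SIZE);
}

/* Whole-block writes go to the overlay; the backing image is never modified. */
void cow_write(struct cow_disk *d, size_t blk, const uint8_t *buf)
{
    assert(blk < d->nblocks);
    memcpy(d->overlay + blk * COW_BLOCK_SIZE, buf, COW_BLOCK_SIZE);
    d->written[blk] = 1;
}
```

Because this sketch only accepts whole-block writes, it never needs to copy the original block first; a format that allowed partial writes would copy the backing block into the overlay before modifying it.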

How it works (in one paragraph):

The kernel blktap driver passes all disk I/O requests from VMs to the
userspace daemon through a character device, using a shared memory
interface. Each active disk is mapped to an individual device node,
allowing per-disk processes to implement individual block devices
where desired. The userspace drivers are implemented using
asynchronous (Linux libaio), O_DIRECT-based calls to preserve the
unbuffered, batched and asynchronous request dispatch achieved with
the existing blkback code. We provide a simple, asynchronous virtual
disk interface that makes it quite easy to add new disk
implementations.
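
The general shape of such an asynchronous interface can be sketched as a table of function pointers. A driver queues reads and writes, a submit call dispatches the whole batch, and completions are reported through callbacks. Everything below (`vdisk_ops`, the `ram_*` driver, the callback signature) is a hypothetical simplification for illustration, not the actual blktap API. The RAM-backed driver completes requests synchronously inside `submit` rather than using libaio.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define MAX_PENDING 32

/* Completion callback: res is 0 on success, negative on error. */
typedef void (*vdisk_cb)(int res, uint64_t sector, int nb_sectors, void *priv);

/* Hypothetical driver interface: queue requests, then submit them as a batch. */
struct vdisk_ops {
    int (*queue_read)(void *state, uint64_t sector, int nb_sectors,
                      char *buf, vdisk_cb cb, void *priv);
    int (*queue_write)(void *state, uint64_t sector, int nb_sectors,
                       char *buf, vdisk_cb cb, void *priv);
    int (*submit)(void *state);   /* returns number of requests completed */
};

struct ram_req {
    int is_write;
    uint64_t sector;
    int nb_sectors;
    char *buf;
    vdisk_cb cb;
    void *priv;
};

/* Hypothetical RAM-backed driver state. */
struct ram_disk {
    char *mem;
    uint64_t nsectors;
    struct ram_req pending[MAX_PENDING];
    int npending;
};

static int ram_queue(struct ram_disk *rd, int is_write, uint64_t sector,
                     int nb_sectors, char *buf, vdisk_cb cb, void *priv)
{
    assert(cb != NULL);
    if (rd->npending == MAX_PENDING)
        return -1;                      /* queue full: caller should submit */
    rd->pending[rd->npending++] =
        (struct ram_req){ is_write, sector, nb_sectors, buf, cb, priv };
    return 0;
}

static int ram_queue_read(void *s, uint64_t sector, int n, char *buf,
                          vdisk_cb cb, void *priv)
{ return ram_queue(s, 0, sector, n, buf, cb, priv); }

static int ram_queue_write(void *s, uint64_t sector, int n, char *buf,
                           vdisk_cb cb, void *priv)
{ return ram_queue(s, 1, sector, n, buf, cb, priv); }

/* Service every queued request in order, firing each callback. */
static int ram_submit(void *s)
{
    struct ram_disk *rd = s;
    int done = rd->npending;
    for (int i = 0; i < rd->npending; i++) {
        struct ram_req *r = &rd->pending[i];
        int res = 0;
        if (r->nb_sectors < 0 ||
            r->sector + (uint64_t)r->nb_sectors > rd->nsectors) {
            res = -1;                   /* request falls outside the disk */
        } else {
            char *disk = rd->mem + r->sector * SECTOR_SIZE;
            size_t len = (size_t)r->nb_sectors * SECTOR_SIZE;
            if (r->is_write)
                memcpy(disk, r->buf, len);
            else
                memcpy(r->buf, disk, len);
        }
        r->cb(res, r->sector, r->nb_sectors, r->priv);
    }
    rd->npending = 0;
    return done;
}

const struct vdisk_ops ram_ops = {
    .queue_read = ram_queue_read,
    .queue_write = ram_queue_write,
    .submit = ram_submit,
};

/* Example callback: counts successful completions through priv. */
void count_ok(int res, uint64_t sector, int nb_sectors, void *priv)
{
    (void)sector;
    (void)nb_sectors;
    if (res == 0)
        ++*(int *)priv;
}
```

Separating queueing from submission lets a driver batch several requests into one dispatch, which is the property the libaio-based backends rely on for performance.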

As of June 2006 the currently supported disk formats are:

- Raw images (both on partitions and in image files)
- File-backed QCow disks
- Standalone sparse QCow disks
- Fast shareable RAM disk between VMs (requires some form of
  cluster-based filesystem support, e.g. OCFS2, in the guest kernel)
- Some VMDK images (your mileage may vary)

Raw and QCow images have asynchronous backends and so should perform
fairly well. VMDK is based directly on the qemu vmdk driver, which is
synchronous (a.k.a. slow).
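
The asynchronous backends use O_DIRECT-based calls, and O_DIRECT requires that the buffer address, file offset and transfer length all be suitably aligned. On Linux this is typically the logical block size of the underlying device (commonly 512 bytes), though the exact rule depends on the kernel version and filesystem. A minimal sketch of the checks a backend might make, assuming 512-byte alignment (`DIO_ALIGN`, `dio_alloc` and `dio_aligned` are hypothetical names):

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DIO_ALIGN 512  /* assumed alignment; the real value depends on the device */

/* Allocate a buffer suitable for O_DIRECT transfers, or NULL on failure. */
void *dio_alloc(size_t len)
{
    void *p = NULL;
    if (posix_memalign(&p, DIO_ALIGN, len) != 0)
        return NULL;
    return p;
}

/* Return 1 if buffer address, file offset and length are all aligned. */
int dio_aligned(const void *buf, uint64_t offset, size_t len)
{
    return ((uintptr_t)buf % DIO_ALIGN) == 0 &&
           (offset % DIO_ALIGN) == 0 &&
           (len % DIO_ALIGN) == 0;
}
```

Buffers from plain `malloc` carry no such alignment guarantee, which is why an aligned allocator is used here.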

Build and Installation Instructions
===================================

Make sure to configure the blktap backend driver in your dom0 kernel.
It works alongside the existing backend driver, so you can
experiment with tap disks without breaking existing VM configs.

To build the tools separately, run "make && make install" in
tools/blktap.

Using the Tools
===============

Prepare the image for booting. For qcow files, use the qcow utilities
installed earlier: qcow-create generates a blank standalone image or a
file-backed CoW image, and img2qcow takes an existing image or
partition and creates a sparse, standalone qcow-based file.
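
A converter can keep its output sparse by leaving all-zero regions of the source unallocated. The following is a hypothetical sketch of that idea in isolation, not img2qcow's actual code: it scans an image held in memory in fixed-size clusters (the 4 KiB cluster size is an assumption) and counts the clusters that contain data and would therefore need to be stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLUSTER_SIZE 4096  /* assumed cluster size */

/* Return 1 if the len bytes at p are all zero. */
static int all_zero(const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Count clusters of img that hold non-zero data and so would need
 * allocating; a trailing partial cluster counts as a whole cluster. */
size_t count_allocated_clusters(const uint8_t *img, size_t len)
{
    size_t used = 0;
    for (size_t off = 0; off < len; off += CLUSTER_SIZE) {
        size_t n = len - off < CLUSTER_SIZE ? len - off : CLUSTER_SIZE;
        if (!all_zero(img + off, n))
            used++;
    }
    return used;
}
```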

The userspace disk agent is configured to start automatically via xend
(alternatively, you can start it manually by running 'blktapctrl').

Customise the VM config file to use the 'tap' handler, followed by the
driver type. For example, for a raw image such as a file or partition:

disk = ['tap:aio:<FILENAME>,sda1,w']

For a qcow image:

disk = ['tap:qcow:<FILENAME>,sda1,w']
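
In both entries the first comma-separated field combines the 'tap' handler, the driver type and the image path. As a hypothetical illustration of how that field splits into its parts, the parser below separates `tap:<type>:<path>` into a driver type and a path. `parse_tap_spec` and `struct tap_spec` are invented names; this is not the parsing code used by xend or blktapctrl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parsed form of a "tap:<type>:<path>" disk field (hypothetical). */
struct tap_spec {
    char type[16];        /* driver type, e.g. "aio" or "qcow" */
    const char *path;     /* points into the original string */
};

/* Return 0 on success, -1 if spec is not of the form tap:<type>:<path>. */
int parse_tap_spec(const char *spec, struct tap_spec *out)
{
    static const char prefix[] = "tap:";
    size_t plen = sizeof prefix - 1;

    if (strncmp(spec, prefix, plen) != 0)
        return -1;                          /* not a tap disk */

    const char *type = spec + plen;
    const char *colon = strchr(type, ':');
    if (colon == NULL || colon == type)
        return -1;                          /* missing or empty driver type */

    size_t tlen = (size_t)(colon - type);
    if (tlen >= sizeof out->type || colon[1] == '\0')
        return -1;                          /* type too long or empty path */

    memcpy(out->type, type, tlen);
    out->type[tlen] = '\0';
    out->path = colon + 1;                  /* the path may itself contain ':' */
    return 0;
}
```

Splitting only at the first colon after the prefix keeps any later colons as part of the path.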

Mounting images in Dom0 using the blktap driver
===============================================

Tap (and blkback) disks are also mountable in Dom0 without requiring an
active VM to attach. You will need to build a xenlinux Dom0 kernel that
includes the blkfront driver (e.g. the default 'make world' or
'make kernels' build). Simply use the xm command-line tool to activate
the backend disks, and blkfront will generate a virtual block device that
can be accessed in the same way as a loop device or partition.

For example, for a raw image file <FILENAME> that would normally be
mounted using the loopback driver (such as 'mount -o loop <FILENAME>
/mnt/disk'), do the following:

xm block-attach 0 tap:aio:<FILENAME> /dev/xvda1 w 0
mount /dev/xvda1 /mnt/disk        <--- don't use the loop driver

In this way, you can use any of the userspace device-type drivers built
with the blktap userspace toolkit to open and mount disks such as qcow
or vmdk images:

xm block-attach 0 tap:qcow:<FILENAME> /dev/xvda1 w 0
mount /dev/xvda1 /mnt/disk