Blktap Userspace Tools + Library
================================

Andrew Warfield and Julian Chesterfield
16th June 2006

{firstname.lastname}@cl.cam.ac.uk

The blktap userspace toolkit provides a user-level disk I/O
interface. The blktap mechanism involves a kernel driver that acts
similarly to the existing Xen/Linux blkback driver, and a set of
associated user-level libraries. Using these tools, blktap allows
virtual block devices presented to VMs to be implemented in userspace
and to be backed by raw partitions, files, network, etc.

The key benefit of blktap is that it makes it easy and fast to write
arbitrary block backends, and that these user-level backends actually
perform very well. Specifically:

 - Metadata disk formats such as Copy-on-Write, encrypted disks, sparse
   formats and other compression features can be easily implemented.

 - Accessing file-based images from userspace avoids problems related
   to flushing dirty pages which are present in the Linux loopback
   driver. (Specifically, doing a large number of writes to an
   NFS-backed image doesn't result in the OOM killer going berserk.)

 - Per-disk handler processes enable easier userspace policing of block
   resources, and process-granularity QoS techniques (disk scheduling
   and related tools) may be trivially applied to block devices.

 - It's very easy to take advantage of userspace facilities such as
   networking libraries, compression utilities, peer-to-peer
   file-sharing systems and so on to build more complex block backends.

 - Crashes are contained -- incremental development/debugging is very
   fast.

How it works (in one paragraph):

Working in conjunction with the kernel blktap driver, all disk I/O
requests from VMs are passed to the userspace daemon (using a shared
memory interface) through a character device. Each active disk is
mapped to an individual device node, allowing per-disk processes to
implement individual block devices where desired. The userspace
drivers are implemented using asynchronous (Linux libaio),
O_DIRECT-based calls to preserve the unbuffered, batched and
asynchronous request dispatch achieved with the existing blkback
code. We provide a simple, asynchronous virtual disk interface that
makes it quite easy to add new disk implementations.

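To give a feel for what a queued, batched, callback-driven disk
interface can look like, here is a minimal conceptual sketch in
Python. It is not the toolkit's actual interface (the real drivers are
written in C on top of libaio). The class and method names (RamDisk,
queue_read, queue_write, submit) and the status codes are hypothetical
and chosen only for illustration.

```python
# Conceptual sketch only: hypothetical names, not the blktap driver API.
SECTOR_SIZE = 512


class RamDisk:
    """Toy in-memory backend that queues requests and completes them in batches."""

    def __init__(self, num_sectors):
        self.data = bytearray(num_sectors * SECTOR_SIZE)
        self.pending = []  # requests queued but not yet dispatched

    def queue_read(self, sector, nr_sectors, callback):
        # Queue the request; no I/O happens until submit() is called.
        self.pending.append(("read", sector, nr_sectors, None, callback))

    def queue_write(self, sector, buf, callback):
        if len(buf) % SECTOR_SIZE != 0:
            raise ValueError("buffer must be a whole number of sectors")
        nr_sectors = len(buf) // SECTOR_SIZE
        self.pending.append(("write", sector, nr_sectors, buf, callback))

    def submit(self):
        # Dispatch every queued request as one batch, in queue order, and
        # report each completion through its callback as (status, data).
        batch, self.pending = self.pending, []
        for op, sector, nr_sectors, buf, callback in batch:
            start = sector * SECTOR_SIZE
            end = start + nr_sectors * SECTOR_SIZE
            if sector < 0 or end > len(self.data):
                callback(-1, None)  # request falls outside the disk
            elif op == "write":
                self.data[start:end] = buf
                callback(0, None)
            else:
                callback(0, bytes(self.data[start:end]))
        return len(batch)


results = []
disk = RamDisk(num_sectors=8)
disk.queue_write(2, b"x" * SECTOR_SIZE, lambda status, data: results.append(status))
disk.queue_read(2, 1, lambda status, data: results.append((status, data)))
completed = disk.submit()  # both requests are dispatched in this one batch
```

In this toy model the caller never blocks on an individual request: it
queues work, submits a batch, and learns the outcome through the
callbacks. That is the general shape the paragraph above describes,
under the stated assumptions.
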
As of June 2006 the currently supported disk formats are:

 - Raw Images (both on partitions and in image files)
 - File-backed Qcow disks
 - Standalone sparse Qcow disks
 - Fast shareable RAM disk between VMs (requires some form of cluster-based
   filesystem support e.g. OCFS2 in the guest kernel)
 - Some VMDK images - your mileage may vary

Raw and Qcow images have asynchronous backends and so should perform
fairly well. VMDK is based directly on the qemu vmdk driver, which is
synchronous (a.k.a. slow).

Build and Installation Instructions
===================================

Make sure to configure the blktap backend driver in your dom0 kernel. It
will cooperate fine with the existing backend driver, so you can
experiment with tap disks without breaking existing VM configs.

To build the tools separately, run "make && make install" in
tools/blktap.


Using the Tools
===============

Prepare the image for booting. For qcow files use the qcow utilities
installed earlier. For example, qcow-create generates a blank standalone
image or a file-backed CoW image. img2qcow takes an existing image or
partition and creates a sparse, standalone qcow-based file.

The userspace disk agent is configured to start automatically via xend
(alternatively you can start it manually by running 'blktapctrl').

Customise the VM config file to use the 'tap' handler, followed by the
driver type. e.g. for a raw image such as a file or partition:

    disk = ['tap:aio:<FILENAME>,sda1,w']

e.g. for a qcow image:

    disk = ['tap:qcow:<FILENAME>,sda1,w']


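Because xm config files are evaluated as Python, the disk entries can
also be assembled programmatically. The sketch below only formats
strings in the pattern shown above. The helper name tap_disk and the
image paths are hypothetical and are not part of the toolkit.

```python
# Hypothetical helper: formats a 'tap' disk entry in the pattern shown above.
def tap_disk(driver, filename, device, mode="w"):
    """Return a disk entry of the form 'tap:<driver>:<filename>,<device>,<mode>'."""
    return "tap:%s:%s,%s,%s" % (driver, filename, device, mode)


# Example paths are placeholders, not files shipped with the tools.
disk = [
    tap_disk("aio", "/var/images/root.img", "sda1"),
    tap_disk("qcow", "/var/images/data.qcow", "sda2", mode="r"),
]
```

Keeping the handler and driver type in one place makes it harder to
mistype the 'tap:<driver>:' prefix when a config lists several disks.
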
Mounting images in Dom0 using the blktap driver
===============================================
Tap (and blkback) disks are also mountable in Dom0 without requiring an
active VM to attach. You will need to build a xenlinux Dom0 kernel that
includes the blkfront driver (e.g. the default 'make world' or
'make kernels' build). Simply use the xm command-line tool to activate
the backend disks, and blkfront will generate a virtual block device that
can be accessed in the same way as a loop device or partition:

e.g. for a raw image file <FILENAME> that would normally be mounted using
the loopback driver (such as 'mount -o loop <FILENAME> /mnt/disk'), do the
following:

    xm block-attach 0 tap:aio:<FILENAME> /dev/xvda1 w 0
    mount /dev/xvda1 /mnt/disk        <--- don't use loop driver

In this way, you can use any of the userspace device-type drivers built
with the blktap userspace toolkit to open and mount disks such as qcow
or vmdk images:

    xm block-attach 0 tap:qcow:<FILENAME> /dev/xvda1 w 0
    mount /dev/xvda1 /mnt/disk

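If the attach-and-mount sequence above needs to be scripted, one
possible approach is to build the two command lines in Python and run
them in order. This is only a sketch: the function names, parameter
names and image path are hypothetical, and the argument order is copied
directly from the commands shown above.

```python
import subprocess


def attach_commands(driver, image, frontend_dev, mountpoint, mode="w"):
    """Build the argv lists for the attach and mount steps shown above."""
    # Argument order mirrors the example:
    #   xm block-attach 0 tap:<driver>:<image> <frontend_dev> <mode> 0
    attach = ["xm", "block-attach", "0", "tap:%s:%s" % (driver, image),
              frontend_dev, mode, "0"]
    mount = ["mount", frontend_dev, mountpoint]
    return [attach, mount]


def run_all(commands):
    # Expected to need root in Dom0 with xend running; stops at the first failure.
    for argv in commands:
        subprocess.run(argv, check=True)


# Placeholder image path; nothing is executed until run_all() is called.
commands = attach_commands("qcow", "/var/images/data.qcow",
                           "/dev/xvda1", "/mnt/disk")
```

Passing argv lists rather than a single shell string avoids quoting
problems when an image path contains spaces.
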