=== How the Blkif Drivers Work ===
Andrew Warfield
andrew.warfield@cl.cam.ac.uk

The intent of this document is to explain, at a fairly detailed level,
how the split device drivers work in Xen 1.3 (aka 2.0beta). The
intended audience, I suppose, is anyone who intends to work with the
existing blkif interfaces and wants something to help them get up to
speed with the code in a hurry. Secondly though, I hope to break out
the general mechanisms that are used in the drivers and that are
likely to be necessary to implement other driver interfaces.

As a point of warning before starting, it is worth mentioning that I
anticipate many of the specifics described here changing in the near
future. There has been talk about making the blkif protocol a bit
more efficient than it currently is, and Keir's addition of grant
tables will change the current remapping code that is used when shared
pages are initially set up.

Also, writing other control interface types will likely need support
from Xend, which at the moment has a steep learning curve... this
should be addressed in the future.

For more information on the driver model as a whole, read the
"Reconstructing I/O" technical report
(http://www.cl.cam.ac.uk/Research/SRG/netos/papers/2004-xenngio.pdf).

==== High-level structure of a split-driver interface ====

Why would you want to write a split driver in the first place? As Xen
is a virtual machine monitor and focuses on isolation as an initial
design principle, it is generally considered unwise to share physical
access to devices across domains. The reasons for this are obvious:
when device resources are shared, misbehaving code or hardware can
result in the failure of all of the client applications. Moreover, as
virtual machines in Xen are entire OSs, standard device drivers that
they might use cannot have multiple instantiations for a single piece
of hardware. In light of all this, the general approach in Xen is to
give a single virtual machine hardware access to a device, and where
other VMs want to share the device, export a higher-level interface to
facilitate that sharing. If you don't want to share, that's fine.
There are currently Xen users actively exploring running two
completely isolated X servers on a Xen host, each with its own video
card, keyboard, and mouse. In these situations, the guests need only
be given physical access to the necessary devices and left to go on
their own. However, for devices such as disks and network interfaces,
where sharing is required, the split driver approach is a good
solution.

The structure is like this:

{{{
  +--------------------------+  +--------------------------+
  |  Domain 0 (privileged)   |  | Domain 1 (unprivileged)  |
  |                          |  |                          |
  |  Xend ( Application )    |  |                          |
  |  Blkif Backend Driver    |  |  Blkif Frontend Driver   |
  |  Physical Device Driver  |  |                          |
  +--------------------------+  +--------------------------+
  +--------------------------------------------------------+
  |                          X E N                         |
  +--------------------------------------------------------+
}}}

The blkif driver is in two parts, which we refer to as the frontend
(FE) and the backend (BE). Together, they serve to proxy device
requests between the guest operating system in an unprivileged domain
and the physical device driver in the privileged domain. An
additional benefit of this approach is that the FE driver can provide
a single interface for a whole class of physical devices. The blkif
interface mounts IDE, SCSI, and our own VBD-structured disks,
independent of the physical driver underneath. Moreover, supporting
additional OSs only requires that a new FE driver be written to
connect to the existing backend.

==== Inter-Domain Communication Mechanisms ====

===== Event Channels =====

Before getting into the specifics of the block interface driver, it is
worth discussing the mechanisms that are used to communicate between
domains. Two mechanisms are used to allow the construction of
high-performance drivers: event channels and shared-memory rings.

Event channels are an asynchronous interdomain notification
mechanism. Xen allows channels to be instantiated between two
domains, and domains can request that a virtual irq be attached to
notifications on a given channel. The result of this is that the
frontend domain can send a notification on an event channel, resulting
in an interrupt entry into the backend at a later time.

The event channel between two domains is instantiated in the Xend code
during driver startup (described later). Xend's channel.py
(tools/python/xen/xend/server/channel.py) defines the function

{{{
def eventChannel(dom1, dom2):
    return xc.evtchn_bind_interdomain(dom1=dom1, dom2=dom2)
}}}

which maps to xc_evtchn_bind_interdomain() in tools/libxc/xc_evtchn.c,
which in turn generates a hypercall to Xen to create the event channel
between the domains. Only a privileged domain can request the
creation of an event channel.

Once the event channel is created in Xend, its ends are passed to both
the front and backend domains over the control channel. The end that
is passed to a domain is just an integer "port" uniquely identifying
the event channel's local connection to that domain. An example of
this setup code is in linux-2.6.x/drivers/xen/blkfront/blkfront.c in
blkif_connect(), which receives several status change events as the
driver starts up. It is passed an event channel end in a
BLKIF_INTERFACE_STATUS_CONNECTED message, and patches it in like this:

{{{
blkif_evtchn = status->evtchn;
blkif_irq    = bind_evtchn_to_irq(blkif_evtchn);
if ( (rc = request_irq(blkif_irq, blkif_int,
                       SA_SAMPLE_RANDOM, "blkif", NULL)) )
    printk(KERN_ALERT "blkfront request_irq failed (%ld)\n", rc);
}}}

This code associates a virtual irq with the event channel and attaches
the function blkif_int() as an interrupt handler for that irq.
blkif_int() simply handles the notification and returns; it does not
need to interact with the channel at all.
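
To make the handler's role concrete, here is a simplified sketch of
the shape such a handler typically takes. This is not a verbatim copy
of blkfront.c: the response-processing loop uses the ring machinery
(MASK_BLKIF_IDX, resp_cons and friends) described in the next section,
and the actual completion work is elided.

{{{
/* Hypothetical sketch of a frontend event-channel interrupt handler.
 * It drains any responses the backend has produced, then returns; it
 * never needs to touch the event channel itself. */
static irqreturn_t blkif_int(int irq, void *dev_id, struct pt_regs *regs)
{
    BLKIF_RING_IDX i, rp;
    unsigned long flags;

    spin_lock_irqsave(&blkif_io_lock, flags);

    rp = blk_ring->resp_prod;
    rmb(); /* Ensure we see all responses queued up to 'rp'. */

    for ( i = resp_cons; i != rp; i++ )
    {
        blkif_response_t *resp = &blk_ring->ring[MASK_BLKIF_IDX(i)].resp;
        /* ... match resp->id to an outstanding request, complete it ... */
    }
    resp_cons = i; /* Private consumer index; only this end advances it. */

    spin_unlock_irqrestore(&blkif_io_lock, flags);
    return IRQ_HANDLED;
}
}}}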

An example of generating a notification can also be seen in blkfront.c:

{{{
static inline void flush_requests(void)
{
    DISABLE_SCATTERGATHER();
    wmb(); /* Ensure that the backend can see the requests. */
    blk_ring->req_prod = req_prod;
    notify_via_evtchn(blkif_evtchn);
}
}}}

notify_via_evtchn() issues a hypercall to set the event waiting flag on
the other domain's end of the channel.

===== Communication Rings =====

Event channels are strictly a notification mechanism between domains.
To move large chunks of data back and forth, Xen allows domains to
share pages of memory. We use communication rings as a means of
managing access to a shared memory page for message passing between
domains. These rings are not explicitly a mechanism of Xen, which is
only concerned with the actual sharing of the page and not with how it
is used; they are, however, worth discussing as they are used in many
places in the current code and are a useful model for communicating
across a shared page.

A shared page is set up by the frontend guest first allocating and
passing the address of a page in its own address space to the backend
driver.

Consider the following code, also from blkfront.c. Note: this code is
in blkif_disconnect(). The driver transitions from STATE_CLOSED to
STATE_DISCONNECTED before becoming CONNECTED. The state automaton is
in blkif_status().

{{{
blk_ring = (blkif_ring_t *)__get_free_page(GFP_KERNEL);
blk_ring->req_prod = blk_ring->resp_prod = resp_cons = req_prod = 0;
...
/* Construct an interface-CONNECT message for the domain controller. */
cmsg.type      = CMSG_BLKIF_FE;
cmsg.subtype   = CMSG_BLKIF_FE_INTERFACE_CONNECT;
cmsg.length    = sizeof(blkif_fe_interface_connect_t);
up.handle      = 0;
up.shmem_frame = virt_to_machine(blk_ring) >> PAGE_SHIFT;
memcpy(cmsg.msg, &up, sizeof(up));
}}}

blk_ring will be the shared page. The producer and consumer pointers
are then initialised (these will be discussed soon), and then the
machine address of the page is sent to the backend via a control
channel to Xend. This control channel itself uses the notification
and shared memory mechanisms described here, but is set up for each
domain automatically at startup.

The backend, which is a privileged domain, then takes the page address
and maps it into its own address space (in
linux26/drivers/xen/blkback/interface.c:blkif_connect()):

{{{
void blkif_connect(blkif_be_connect_t *connect)
{
    ...
    unsigned long shmem_frame = connect->shmem_frame;
    ...

    if ( (vma = get_vm_area(PAGE_SIZE, VM_IOREMAP)) == NULL )
    {
        connect->status = BLKIF_BE_STATUS_OUT_OF_MEMORY;
        return;
    }

    prot  = __pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED);
    error = direct_remap_area_pages(&init_mm, VMALLOC_VMADDR(vma->addr),
                                    shmem_frame << PAGE_SHIFT, PAGE_SIZE,
                                    prot, domid);
    ...

    blkif->blk_ring_base = (blkif_ring_t *)vma->addr;
}
}}}

The machine address of the page is passed in the shmem_frame field of
the connect message. This is then mapped into the virtual address
space of the backend domain, and saved in the blkif structure
representing this particular backend connection.

NOTE: New mechanisms will be added very shortly to allow domains to
explicitly grant access to their pages to other domains. This "grant
table" support is in the process of being added to the tree, and will
change the way a shared page is set up. In particular, it will remove
the need for the remapping domain to be privileged.

Sending data across shared rings:

Shared rings avoid the potential for write interference between
domains in a very cunning way. A ring is partitioned into a request
and a response region, and each domain only works within its own
space. This can be thought of as a double producer-consumer ring:
the ring is described by four pointers into a circular buffer of
fixed-size records. Pointers may only advance, and may not pass one
another.

{{{
                        resp_cons----+
                                     V
          +----+----+----+----+----+----+----+
          |    |    |    free(A)   |RSP1|RSP2|
          +----+----+----+----+----+----+----+
req_prod->|    |       -------->        |RSP3|
          +----+                        +----+
          |REQ8|                        |    |<-resp_prod
          +----+                        +----+
          |REQ7|                        |    |
          +----+                        +----+
          |REQ6|       <--------        |    |
          +----+----+----+----+----+----+----+
          |REQ5|REQ4|    free(B)   |    |    |
          +----+----+----+----+----+----+----+
                 req_cons---------^
}}}

By adopting the convention that every request will receive a response,
not all four pointers need be shared and flow control on the ring
becomes very easy to manage. Each domain manages its own
consumer pointer, and the two producer pointers are visible to both
(xen/include/public/io/blkif.h):

{{{
/* NB. Ring size must be small enough for sizeof(blkif_ring_t) <= PAGE_SIZE. */
#define BLKIF_RING_SIZE 64

...

/*
 * We use a special capitalised type name because it is _essential_ that all
 * arithmetic on indexes is done on an integer type of the correct size.
 */
typedef u32 BLKIF_RING_IDX;

/*
 * Ring indexes are 'free running'. That is, they are not stored modulo the
 * size of the ring buffer. The following macro converts a free-running counter
 * into a value that can directly index a ring-buffer array.
 */
#define MASK_BLKIF_IDX(_i) ((_i)&(BLKIF_RING_SIZE-1))

typedef struct {
    BLKIF_RING_IDX req_prod;  /*  0: Request producer. Updated by front-end. */
    BLKIF_RING_IDX resp_prod; /*  4: Response producer. Updated by back-end. */
    union {                   /*  8 */
        blkif_request_t  req;
        blkif_response_t resp;
    } PACKED ring[BLKIF_RING_SIZE];
} PACKED blkif_ring_t;
}}}

As shown in the diagram above, the rules for using a shared memory
ring are simple:

1. A ring is full when a domain's producer and consumer pointers are
   equal (e.g. req_prod == resp_cons). In this situation, the
   consumer pointer must be advanced. Furthermore, if the consumer
   pointer is equal to the other domain's producer pointer
   (e.g. resp_cons == resp_prod), then the other domain has all the
   buffers.

2. Producer pointers point to the next buffer that will be written to.
   (So blk_ring[MASK_BLKIF_IDX(req_prod)] should not be consumed.)

3. Consumer pointers point to a valid message, so long as they are not
   equal to the associated producer pointer.

4. A domain should only ever write to the message pointed to by its
   producer index, and read from the message at its consumer. More
   generally, the domain may be thought of as having exclusive access
   to the messages between its consumer and producer, and should
   absolutely not read or write outside this region.

Thus the frontend has exclusive access to the free(A) region in the
figure above, and the backend driver has exclusive access to the
free(B) region.

In general, drivers keep a private copy of their producer pointer and
then set the shared version when they are ready for the other end to
process a set of messages. Additionally, it is worth paying attention
to the use of memory barriers (rmb/wmb) in the code, to ensure that
rings that are shared across processors behave as expected.
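
As a concrete illustration of these conventions, the frontend's
request-queueing path might look roughly like the sketch below. The
helper name queue_request() and the exact fields filled in are
assumptions for illustration; MASK_BLKIF_IDX, req_prod and
flush_requests() are the mechanisms quoted earlier.

{{{
/* Hypothetical sketch: enqueue one request on the shared ring. The
 * slot named by the private producer index lies in the frontend's
 * exclusive region, so it can be filled without racing the backend. */
static void queue_request(unsigned long id, int operation)
{
    blkif_request_t *req = &blk_ring->ring[MASK_BLKIF_IDX(req_prod)].req;

    req->id        = id;        /* Echoed back in the matching response. */
    req->operation = operation; /* e.g. a read or a write. */
    /* ... fill in device, sector number and data segments ... */

    req_prod++; /* Private index only; invisible to the backend so far. */

    /* flush_requests() (quoted earlier) publishes req_prod to the
     * shared page after a wmb(), then notifies the event channel. */
    flush_requests();
}
}}}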

==== Structure of the Blkif Drivers ====

Now that the communications primitives have been discussed, I'll
quickly cover the general structure of the blkif driver. This is
intended to give a high-level idea of what is going on, in an effort
to make reading the code a more approachable task.

There are three key software components involved in the blkif drivers
(not counting Xen itself): the frontend driver, the backend driver,
and Xend, which coordinates their initial connection. Xend may also
be involved in control-channel signalling in some cases after startup,
for instance to manage reconnection if the backend is restarted.

===== Frontend Driver Structure =====

The frontend domain uses a single event channel and a shared memory
ring to trade control messages with the backend. These are both set
up during domain startup, which will be discussed shortly. The shared
memory ring is called blkif_ring, and the private ring indexes are
resp_cons and req_prod. The ring is protected by blkif_io_lock.
Additionally, the frontend keeps a list of outstanding requests in
rec_ring[]. These are uniquely identified by a guest-local id number,
which is associated with each request sent to the backend and returned
with the matching responses. Information about the actual disks is
stored in major_info[], of which only the first nr_vbds entries are
valid. Finally, the global flag 'recovery' indicates that the
connection between the backend and frontend drivers has been broken
(possibly due to a backend driver crash) and that the frontend is in
recovery mode, in which case it will attempt to reconnect and reissue
outstanding requests.

The frontend driver is single-threaded and, after setup, is entered
only through three points: (1) read/write requests from the XenLinux
guest that it is a part of, (2) interrupts from the backend driver on
its event channel (blkif_int()), and (3) control messages from Xend
(blkif_ctrlif_rx()).
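
One plausible shape for the outstanding-request bookkeeping is
sketched below; the exact layout of rec_ring[] and the free-list
helper are assumptions, but the id round trip is as described above.

{{{
/* Hypothetical sketch of the outstanding-request list: an in-flight
 * request is remembered under its guest-local id so that the matching
 * response can be paired with it, and so that requests can be
 * reissued from rec_ring[] when 'recovery' is set after a backend
 * restart. */
static blkif_request_t rec_ring[BLKIF_RING_SIZE]; /* assumed layout */

static void remember_request(blkif_request_t *req)
{
    unsigned long id = get_id_from_freelist(); /* assumed helper */
    req->id = id;
    memcpy(&rec_ring[id], req, sizeof(*req));  /* kept for reissue */
}
}}}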

===== Backend Driver Structure =====

The backend driver is slightly more complex, as it must manage any
number of concurrent frontend connections. For each domain it
manages, the backend driver maintains a blkif structure, which
describes all the connection and disk information associated with that
particular domain. This structure is associated with the interrupt
registration, and allows the backend driver to have immediate context
when it takes a notification from some domain.

All of the blkif structures are stored in a hash table (blkif_hash),
which is indexed by a hash of the domain id and a "handle", really a
per-domain blkif identifier, in case a domain wants to have multiple
connections.
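
A hypothetical sketch of the lookup follows; the hash macro and the
chaining field are assumptions about the shape, but the (domid,
handle) key is as described above.

{{{
/* Hedged sketch: find the blkif_t for a (domid, handle) pair by
 * hashing the domain id into blkif_hash and walking the chain. */
static blkif_t *blkif_find(domid_t domid, unsigned int handle)
{
    blkif_t *blkif = blkif_hash[BLKIF_HASH(domid)]; /* assumed macro */

    while ( blkif != NULL )
    {
        if ( (blkif->domid == domid) && (blkif->handle == handle) )
            return blkif;
        blkif = blkif->hash_next; /* assumed chaining field */
    }
    return NULL;
}
}}}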

The per-connection blkif structure is of type blkif_t. It contains
all of the communication details (event channel, irq, shared memory
ring and indexes) and blk_ring_lock, which is the backend mutex on
the shared ring. The structure also contains vbd_rb, a red-black
tree containing an entry for each device/partition that is assigned
to that domain. This structure is filled by xend passing disk
information to the backend at startup, and is protected by vbd_lock.
Finally, the blkif struct contains a status field, which describes
the state of the connection.

The backend driver spawns a kernel thread at startup
(blkio_schedule()), which handles requests to and from the actual disk
device drivers. This scheduler thread maintains a list of blkif
structures that have pending requests, and services them round-robin
with a maximum per-round request limit. blkifs are added to the list
in the interrupt handler (blkif_be_int()) using
add_to_blkdev_list_tail(), and removed in the scheduler loop after
calling do_block_io_op(), which processes a batch of requests. The
scheduler thread is explicitly activated at several points in the code
using maybe_trigger_blkio_schedule().
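
The loop's overall shape is roughly as sketched below; apart from the
names mentioned above, the wait-queue and list helpers here are
assumptions for illustration.

{{{
/* Hedged sketch of the scheduler thread: round-robin over interfaces
 * with pending work, with a per-round request cap. */
static int blkio_schedule(void *arg)
{
    blkif_t *blkif;

    for ( ; ; )
    {
        /* Sleep until some blkif has pending requests. */
        wait_event_interruptible(blkio_schedule_wait,        /* assumed */
                                 !list_empty(&blkio_schedule_list));

        while ( (blkif = next_pending_blkif()) != NULL )     /* assumed */
        {
            /* Service at most one batch, then move on to the next
             * interface; re-queue it if it still has work left. */
            if ( do_block_io_op(blkif, MAX_REQS_PER_ROUND) ) /* assumed cap */
                add_to_blkdev_list_tail(blkif);
        }
    }
    return 0; /* never reached */
}
}}}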

Pending requests between the backend driver and the physical device
drivers use another ring, pending_ring. Requests are placed in this
ring in the scheduler thread and issued to the device. A completion
callback, end_block_io_op(), indicates that requests have been
serviced and generates a response on the appropriate blkif ring.
pending_reqs[] stores a list of outstanding requests with the physical
drivers.

So, the control entries to the backend are: (1) the blkio scheduler
thread, which sends requests to the real device drivers; (2)
end_block_io_op(), which is called as serviced requests complete; (3)
blkif_be_int(), which handles notifications from the frontend drivers
in other domains; and (4) blkif_ctrlif_rx(), which handles control
messages from xend.

==== Driver Startup ====

Prior to starting a new guest using the frontend driver, the backend
will have been started in a privileged domain. The backend
initialisation code initialises all of its data structures, such as
the blkif hash table, and starts the scheduler thread as a kernel
thread. It then sends a driver-status "up" message to let xend know
it is ready to take frontend connections.

When a new domain that uses the blkif frontend driver is started,
there is a series of interactions between it, xend, and the specified
backend driver. These interactions are as follows:

The domain configuration given to xend will specify the backend domain
and the disks that the new guest is to use. Prior to actually running
the domain, xend and the backend driver interact to set up the initial
blkif record in the backend.

(1) Xend sends a BLKIF_BE_CREATE message to the backend.

    The backend does blkif_create(), having been passed the FE domid
    and handle. It creates and initialises a new blkif struct, and
    puts it in the hash table. It then returns a STATUS_OK response
    to xend.

(2) Xend sends a BLKIF_BE_VBD_CREATE message to the backend.

    The backend adds a vbd entry in the red-black tree for the
    specified (dom, handle) blkif entry, and sends a STATUS_OK
    response.

(3) Xend sends a BLKIF_BE_VBD_GROW message to the backend.

    The backend takes the physical device information passed in the
    message and assigns it to the newly created vbd struct.

(2) and (3) repeat as any additional devices are added to the domain.

At this point, the backend has enough state to allow the frontend
domain to start. The domain is run, and eventually gets to the
frontend driver initialisation code. After setting up the frontend
data structures, this code continues the communications with xend and
the backend to negotiate a connection:

(4) The frontend sends Xend a BLKIF_FE_DRIVER_STATUS_CHANGED message.

    This message tells xend that the driver is up. The init function
    now spin-waits until driver setup is complete, in order to prevent
    Linux from attempting to boot before the disks are connected.

(5) Xend sends the frontend an INTERFACE_STATUS_CHANGED message.

    This message specifies that the interface is now disconnected
    (instead of closed). The domain updates its state, and allocates
    the shared blk_ring page.

(6) The frontend sends Xend a BLKIF_INTERFACE_CONNECT message.

    This message specifies the domain and handle, and includes the
    machine address of the newly created page.

(7) Xend sends the backend a BLKIF_BE_CONNECT message.

    The backend fills in the blkif connection information, maps the
    shared page, and binds an irq to the event channel.

(8) Xend sends the frontend an INTERFACE_STATUS_CHANGED message.

    This message takes the frontend driver to a CONNECTED state, at
    which point it binds an irq to the event channel and calls
    xlvbd_init() to initialise the individual block devices.

The frontend Linux is still spin-waiting at this point, until all of
the disks have been probed. Messaging now happens directly between the
front and backend domains, using the new shared ring and event channel.

(9) The frontend sends a BLKIF_OP_PROBE directly to the backend.

    This message includes a reference to an additional page that the
    backend can use for its reply. The backend responds with an array
    of the domain's disks (as vdisk_t structs) on the provided page.

The frontend now initialises each disk, calling xlvbd_init_device()
for each one.
---|