Xen Performance Monitor
-----------------------

The xenmon tools make use of the existing Xen tracing feature to provide
fine-grained reporting of various domain-related metrics. It should be
stressed that the xenmon.py script included here is just an example of the
data that may be displayed. The xenbaked daemon keeps a large amount of
history in a shared memory area that may be accessed by tools such as xenmon.

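As a rough illustration of how another tool might consume that history, the
sketch below maps a shared-memory file and unpacks fixed-size per-domain
records. The file name, record format, and field meanings here are all
assumptions made for illustration; the authoritative layout is defined in
the xenbaked source and must be matched against it.

    import mmap
    import struct

    # Hypothetical layout -- check the xenbaked source for the real one.
    SHM_FILE = "/var/run/xenmon-shm"   # assumed path, not authoritative
    REC_FMT = "<QQQ"                   # e.g. cpu_ns, waited_ns, blocked_ns
    REC_SIZE = struct.calcsize(REC_FMT)
    NDOMAINS = 32                      # the default compiled into xenbaked

    with open(SHM_FILE, "rb") as f:
        shm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        for dom in range(NDOMAINS):
            cpu, waited, blocked = struct.unpack_from(
                REC_FMT, shm, dom * REC_SIZE)
            if cpu or waited or blocked:
                print(f"dom{dom}: cpu={cpu} waited={waited} blocked={blocked}")
    finally:
        shm.close()
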
For each domain, xenmon reports various metrics. One part of the display is a
group of metrics that have been accumulated over the last second, while
another part of the display shows data measured over 10 seconds. Other
measurement intervals are possible, but we have just chosen 1s and 10s as an
example; a sketch of how two such windows might be maintained follows.

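One simple way to keep both intervals up to date is a pair of rolling windows
fed by the same per-second samples. This is a minimal sketch; none of these
names come from xenmon.py, and the data source is a stand-in.

    import random
    import time
    from collections import deque

    def read_sample():
        # Stand-in for a real per-second measurement (e.g. a CPU-time
        # delta from xenbaked); fabricated so the sketch runs on its own.
        return random.randint(0, 1_000_000_000)

    class Window:
        """Rolling sum over the last N one-second samples."""
        def __init__(self, seconds):
            self.samples = deque(maxlen=seconds)
        def push(self, value):
            self.samples.append(value)
        def total(self):
            return sum(self.samples)

    one_sec, ten_sec = Window(1), Window(10)
    for _ in range(30):
        s = read_sample()
        one_sec.push(s)
        ten_sec.push(s)
        print("last 1s:", one_sec.total(), " last 10s:", ten_sec.total())
        time.sleep(1)

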
Execution Count
---------------
o The number of times that a domain was scheduled to run (i.e., dispatched)
  over the measurement interval


CPU usage
---------
o Total time used over the measurement interval
o Usage expressed as a percentage of the measurement interval
o Average CPU time used during each execution of the domain

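The percentage and per-execution figures are straightforward ratios of the
raw counters, and the same arithmetic applies to the waiting-time and
blocked-time percentages below. A worked sketch with made-up numbers:

    interval_ns = 1_000_000_000     # 1s measurement interval
    cpu_time_ns = 250_000_000       # total CPU time used by the domain
    executions = 50                 # execution count over the same interval

    usage_pct = 100.0 * cpu_time_ns / interval_ns     # -> 25.0
    avg_per_exec_ns = cpu_time_ns / executions        # -> 5,000,000 ns
    print(f"{usage_pct:.1f}% CPU, {avg_per_exec_ns:,.0f} ns per execution")

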
Waiting time
------------
This is how much time the domain spent waiting to run, or put another way,
the amount of time the domain spent in the "runnable" state (or on the run
queue) but not actually running. Xenmon displays:

o Total time waiting over the measurement interval
o Wait time expressed as a percentage of the measurement interval
o Average waiting time for each execution of the domain

Blocked time
------------
This is how much time the domain spent blocked (or sleeping); put another
way, the amount of time the domain spent not needing/wanting the CPU because
it was waiting for some event (i.e., I/O). Xenmon reports:

o Total time blocked over the measurement interval
o Blocked time expressed as a percentage of the measurement interval
o Blocked time per I/O (see I/O count below)

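Blocked time per I/O is simply the blocked total divided by the I/O count for
the same interval, guarding against a zero count. The numbers below are
invented for illustration:

    blocked_ns = 400_000_000   # total blocked time over the interval
    io_count = 2_000           # page flips counted over the same interval

    blocked_per_io_ns = blocked_ns / io_count if io_count else 0.0
    print(f"{blocked_per_io_ns:.0f} ns blocked per I/O")   # -> 200000
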
Allocation time
---------------
This is how much CPU time was allocated to the domain by the scheduler; this
is distinct from CPU usage, since the "time slice" given to a domain is
frequently cut short for one reason or another, e.g., the domain requests
I/O and blocks. Xenmon reports:

o Average allocation time per execution (i.e., time slice)
o Min and Max allocation times

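Tracking the average, min, and max time slice requires only a running count,
a running sum, and two extremes. A small sketch with invented slice lengths
(none of this code is taken from xenbaked):

    class AllocStats:
        """Running average/min/max of per-execution allocations."""
        def __init__(self):
            self.count = 0
            self.total_ns = 0
            self.min_ns = None
            self.max_ns = None

        def record(self, ns):
            self.count += 1
            self.total_ns += ns
            self.min_ns = ns if self.min_ns is None else min(self.min_ns, ns)
            self.max_ns = ns if self.max_ns is None else max(self.max_ns, ns)

        def average(self):
            return self.total_ns / self.count if self.count else 0.0

    stats = AllocStats()
    for slice_ns in (30_000_000, 5_000_000, 12_000_000):  # made-up slices
        stats.record(slice_ns)
    print(stats.average(), stats.min_ns, stats.max_ns)
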
I/O Count
---------
This is a rough measure of I/O requested by the domain. The number of page
exchanges (or page "flips") between the domain and dom0 is counted. The
number of pages exchanged may not accurately reflect the number of bytes
transferred to/from a domain, due to partial pages being used by the network
protocols, etc., but it does give a good sense of the magnitude of I/O being
requested by a domain. Xenmon reports:

o Total number of page exchanges during the measurement interval
o Average number of page exchanges per execution of the domain

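Conceptually, this counter is a per-domain tally of page-flip trace records.
The event names and record shape below are invented for illustration and are
not the real Xen trace format:

    from collections import Counter

    # Invented (event, domain_id) records standing in for real trace data.
    records = [("page_flip", 1), ("dispatch", 1),
               ("page_flip", 2), ("page_flip", 1)]

    flips = Counter(dom for event, dom in records if event == "page_flip")
    print(flips)   # Counter({1: 2, 2: 1})

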
Usage Notes and Issues
----------------------
- Start xenmon by simply running xenmon.py; the xenbaked daemon is started
  and stopped automatically by xenmon.
- To see the various options for xenmon, run xenmon.py -h. Ditto for
  xenbaked.
- xenmon also has an option (-n) to output log data to a file instead of
  the curses interface.
- NDOMAINS is defined to be 32, but can be changed by recompiling xenbaked.
- xenmon.py appears to create 1-2% CPU overhead; part of this is just the
  overhead of the Python interpreter, and part of it may be the number of
  trace records being generated. The number of trace records generated can
  be limited by setting the trace mask (with a dom0 op), which controls
  which events cause a trace record to be emitted.
- To exit xenmon, type 'q'.
- To cycle the display to other physical CPUs, type 'c'.
- The first time xenmon is run, it attempts to allocate Xen trace buffers
  using a default size. If you wish to use a non-default value for the
  trace buffer size, run the 'setsize' program (located in tools/xentrace)
  and specify the number of memory pages as a parameter. The default is 20.
- Not well tested with domains using more than 1 virtual CPU.
- If you create a lot of domains, or repeatedly kill a domain and restart
  it, and the domain IDs get to be bigger than NDOMAINS, then xenmon
  behaves badly. This is a bug due to xenbaked's treatment of domain IDs
  vs. domain indices in a data array; it will be fixed in a future release.
  Workaround: increase NDOMAINS in xenbaked and rebuild.

Future Work
-----------
o RPC interface to allow external entities to programmatically access
  processed data
o I/O count batching to reduce the number of trace records generated

Case Study
----------
We have written a case study which demonstrates some of the usefulness of
this tool and the metrics reported. It is available at:
http://www.hpl.hp.com/techreports/2005/HPL-2005-187.html

Authors
-------
Diwaker Gupta <diwaker.gupta@hp.com>
Rob Gardner <rob.gardner@hp.com>
Lucy Cherkasova <lucy.cherkasova@hp.com>