Xen Performance Monitor
-----------------------

The xenmon tools make use of the existing Xen tracing feature to provide
fine-grained reporting of various domain-related metrics. It should be
stressed that the xenmon.py script included here is just an example of the
data that may be displayed. The xenbaked daemon keeps a large amount of
history in a shared memory area that may be accessed by tools such as xenmon.

For each domain, xenmon reports various metrics. One part of the display is a
group of metrics that have been accumulated over the last second, while
another part of the display shows data measured over 10 seconds. Other
measurement intervals are possible; 1s and 10s were simply chosen as an
example.

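The sketch below is purely illustrative of the idea of keeping two reporting
intervals from one stream of samples; it is not xenbaked's actual bookkeeping,
and all names in it are hypothetical.

    # Illustrative only: keep 1s and 10s rolling totals of an accumulating metric.
    from collections import deque

    class RollingTotal:
        def __init__(self, window_s, resolution_s=0.1):
            n = int(window_s / resolution_s)
            self.buckets = deque([0.0] * n, maxlen=n)   # fixed-size ring of buckets

        def tick(self):
            self.buckets.append(0.0)    # start a new bucket, dropping the oldest

        def add(self, value):
            self.buckets[-1] += value   # accumulate into the newest bucket

        def total(self):
            return sum(self.buckets)

    # One accumulator per reporting interval, fed from the same event stream:
    cpu_1s, cpu_10s = RollingTotal(1.0), RollingTotal(10.0)
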
Execution Count
---------------
o The number of times that a domain was scheduled to run (i.e., dispatched)
  over the measurement interval

CPU usage
---------
o Total time used over the measurement interval
o Usage expressed as a percentage of the measurement interval
o Average CPU time used during each execution of the domain

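The percentage and per-execution figures are simple derivations from the
accumulated totals. A minimal sketch of the arithmetic follows (the function
and variable names are hypothetical, not xenmon's, with times in nanoseconds
for illustration); the same derivation applies to the waiting and blocked
time metrics below.

    # Hypothetical derivation of the displayed CPU figures; illustrative only.
    def cpu_metrics(cpu_time_ns, exec_count, interval_ns):
        pct = 100.0 * cpu_time_ns / interval_ns          # usage as % of the interval
        per_exec = cpu_time_ns / exec_count if exec_count else 0.0
        return pct, per_exec

    # e.g. 250 ms of CPU over 500 dispatches in a 1 s interval:
    print(cpu_metrics(250_000_000, 500, 1_000_000_000))  # -> (25.0, 500000.0)
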
Waiting time
------------
This is how much time the domain spent waiting to run, or put another way, the
amount of time the domain spent in the "runnable" state (or on the run queue)
but not actually running. Xenmon displays:

o Total time waiting over the measurement interval
o Wait time expressed as a percentage of the measurement interval
o Average waiting time for each execution of the domain

Blocked time
------------
This is how much time the domain spent blocked (or sleeping); put another way,
the amount of time the domain spent not needing/wanting the CPU because it was
waiting for some event (e.g., I/O). Xenmon reports:

o Total time blocked over the measurement interval
o Blocked time expressed as a percentage of the measurement interval
o Blocked time per I/O (see I/O Count below)

Allocation time
---------------
This is how much CPU time was allocated to the domain by the scheduler. It is
distinct from CPU usage since the "time slice" given to a domain is frequently
cut short for one reason or another, e.g., the domain requests I/O and blocks.
Xenmon reports:

o Average allocation time per execution (i.e., time slice)
o Min and max allocation times

I/O Count
---------
This is a rough measure of I/O requested by the domain. The number of page
exchanges (or page "flips") between the domain and dom0 is counted. The
number of pages exchanged may not accurately reflect the number of bytes
transferred to/from a domain, due to partial pages being used by the network
protocols, etc., but it does give a good sense of the magnitude of I/O being
requested by a domain. Xenmon reports:

o Total number of page exchanges during the measurement interval
o Average number of page exchanges per execution of the domain

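To make the relationship between these counters concrete, here is a minimal,
hypothetical sketch of the derived I/O figures. The 4 KiB page size is an
assumption, and page flips multiplied by page size is only an upper bound on
bytes, for the partial-page reasons noted above.

    # Hypothetical derivations from the raw counters; not code from xenmon/xenbaked.
    PAGE_SIZE = 4096  # assumed page size; flips * PAGE_SIZE is only an upper bound

    def io_metrics(page_flips, exec_count, blocked_time_ns):
        flips_per_exec = page_flips / exec_count if exec_count else 0.0
        blocked_per_io = blocked_time_ns / page_flips if page_flips else 0.0
        max_bytes = page_flips * PAGE_SIZE   # real byte count may be lower
        return flips_per_exec, blocked_per_io, max_bytes

    # e.g. 2000 page flips over 400 dispatches while blocked for 300 ms:
    print(io_metrics(2000, 400, 300_000_000))  # -> (5.0, 150000.0, 8192000)
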
Usage Notes and Issues
----------------------
- Start xenmon by simply running xenmon.py; the xenbaked daemon is started and
  stopped automatically by xenmon.
- To see the various options for xenmon, run xenmon.py -h. Ditto for xenbaked.
- xenmon also has an option (-n) to output log data to a file instead of the
  curses interface.
- NDOMAINS is defined to be 32, but can be changed by recompiling xenbaked.
- xenmon.py appears to create 1-2% CPU overhead. Part of this is just the
  overhead of the Python interpreter; part of it may be the number of trace
  records being generated. The number of trace records generated can be
  limited by setting the trace mask (with a dom0 op), which controls which
  events cause a trace record to be emitted.
- To exit xenmon, type 'q'.
- To cycle the display to other physical CPUs, type 'c'.
- The first time xenmon is run, it attempts to allocate Xen trace buffers
  using a default size. If you wish to use a non-default value for the
  trace buffer size, run the 'setsize' program (located in tools/xentrace)
  and specify the number of memory pages as a parameter. The default is 20.
- Not well tested with domains using more than 1 virtual CPU.
- If you create a lot of domains, or repeatedly kill a domain and restart it,
  and the domain IDs get to be bigger than NDOMAINS, then xenmon behaves
  badly. This is a bug due to xenbaked's treatment of domain IDs vs. domain
  indices in a data array; it will be fixed in a future release. Workaround:
  increase NDOMAINS in xenbaked and rebuild.

Future Work
-----------
o RPC interface to allow external entities to programmatically access
  processed data
o I/O count batching to reduce the number of trace records generated

Case Study
----------
We have written a case study which demonstrates some of the usefulness of
this tool and the metrics reported. It is available at:
http://www.hpl.hp.com/techreports/2005/HPL-2005-187.html

Authors
-------
Diwaker Gupta <diwaker.gupta@hp.com>
Rob Gardner <rob.gardner@hp.com>
Lucy Cherkasova <lucy.cherkasova@hp.com>