1 00:00:10,670 --> 00:00:10,670 Well, I will have an interesting 2 00:00:10,670 --> 00:00:12,470 So, let's begin with the agenda 4 00:00:12,670 --> 00:00:14,970 I will talk a little bit about 5 00:00:16,970 --> 00:00:18,970 the history, 6 00:00:18,970 --> 00:00:20,970 what makes Munin 7 00:00:20,970 --> 00:00:22,970 very unique, 8 00:00:22,970 --> 00:00:24,220 we'll see 10 00:00:24,970 --> 00:00:26,970 what makes it also really 11 00:00:26,970 --> 00:00:29,960 unique but not in a very good way 12 00:00:29,960 --> 00:00:31,960 what makes 13 00:00:31,960 --> 00:00:33,960 unique in 14 00:00:33,960 --> 00:00:35,960 2.0 version, what 15 00:00:35,960 --> 00:00:37,960 we got in Wheezy, 16 00:00:37,960 --> 00:00:39,960 it's very interesting. 17 00:00:42,930 --> 00:00:45,960 we will also see 18 00:00:47,960 --> 00:00:49,960 according to the new features 19 00:00:49,960 --> 00:00:51,960 of 2.0 you can scale 20 00:00:51,960 --> 00:00:53,960 much more a Munin 21 00:00:53,960 --> 00:00:55,960 install 22 00:00:55,960 --> 00:00:57,960 install from 23 00:00:57,960 --> 00:00:59,960 1.4 package 24 00:01:01,960 --> 00:01:03,960 and we will see also a limitation 25 00:01:03,960 --> 00:01:05,960 of 2.0 since 26 00:01:07,960 --> 00:01:09,960 now you can scale quite well 27 00:01:09,960 --> 00:01:11,960 I mean, 28 00:01:11,960 --> 00:01:13,960 theoretically, you can scale really well, 29 00:01:13,960 --> 00:01:15,960 practically, well, 30 00:01:15,960 --> 00:01:17,960 we will see.
We still have some 31 00:01:17,960 --> 00:01:19,960 big big issues 32 00:01:19,960 --> 00:01:21,960 very different from 33 00:01:21,960 --> 00:01:23,960 the ones in 1.4, but 34 00:01:23,960 --> 00:01:25,960 still, and I will 35 00:01:25,960 --> 00:01:27,960 present rapidly 36 00:01:27,960 --> 00:01:29,960 the roadmap for 37 00:01:29,960 --> 00:01:31,960 2.2. 38 00:01:31,960 --> 00:01:33,960 Hopefully, it's released 39 00:01:33,960 --> 00:01:35,960 this year. Well, 40 00:01:35,960 --> 00:01:37,960 that's a challenge. 41 00:01:39,960 --> 00:01:41,960 Thing is, if you have 42 00:01:41,960 --> 00:01:43,960 any questions, I will 43 00:01:43,960 --> 00:01:45,960 stop at 10:00, and 44 00:01:45,960 --> 00:01:47,960 we will have 15 minutes for 45 00:01:47,960 --> 00:01:49,960 questions just after. 46 00:01:51,960 --> 00:01:53,960 So, brief history, 47 00:01:55,220 --> 00:02:00,490 Munin was born in 2002, and was named LRRD 48 00:02:01,930 --> 00:02:04,820 I didn't know that fact before 49 00:02:04,820 --> 00:02:06,820 and I just know it because I researched 50 00:02:06,820 --> 00:02:08,820 for the presentation 51 00:02:10,820 --> 00:02:12,820 and it's not a well-known fact but 52 00:02:12,820 --> 00:02:14,820 some code 53 00:02:14,820 --> 00:02:16,820 I mean, most code 54 00:02:16,820 --> 00:02:18,820 still dates from 55 00:02:18,820 --> 00:02:20,820 that day. So it's quite 56 00:02:20,820 --> 00:02:22,820 important to see that 57 00:02:22,820 --> 00:02:24,820 issues 58 00:02:24,820 --> 00:02:26,820 when changing code 59 00:02:26,820 --> 00:02:28,820 it's more like geology. 60 00:02:28,820 --> 00:02:30,820 You have every 61 00:02:30,820 --> 00:02:32,820 every layers 62 00:02:32,820 --> 00:02:34,820 If you want to add a functionality 63 00:02:34,820 --> 00:02:36,820 one layer newer functionality 64 00:02:36,820 --> 00:02:38,820 one layer... Well, 65 00:02:38,820 --> 00:02:40,820 you all know this.
[laughter] 66 00:02:40,820 --> 00:02:42,820 So, I hacked zooming 67 00:02:42,820 --> 00:02:44,820 for 1.2 in 2007 68 00:02:44,820 --> 00:02:46,820 I mean 69 00:02:46,820 --> 00:02:48,820 to 1.2 70 00:02:48,820 --> 00:02:50,820 was very static 71 00:02:50,820 --> 00:02:52,820 And well, 72 00:02:52,820 --> 00:02:54,820 I maintained it in my 73 00:02:54,820 --> 00:02:56,820 own private 74 00:02:56,820 --> 00:02:58,820 place 75 00:02:58,820 --> 00:03:00,820 And in 2009 76 00:03:00,820 --> 00:03:02,820 1.4 77 00:03:02,820 --> 00:03:04,820 came out, and I asked 78 00:03:04,820 --> 00:03:06,820 if I could send my 79 00:03:06,820 --> 00:03:08,820 patch to Munin 80 00:03:08,820 --> 00:03:10,820 and, well, they got 81 00:03:10,820 --> 00:03:12,820 accepted, and from 82 00:03:14,820 --> 00:03:16,820 2009 until 83 00:03:16,820 --> 00:03:18,820 2011, so, I was 84 00:03:18,820 --> 00:03:20,820 slowly 85 00:03:20,820 --> 00:03:22,820 gaining ground in the 86 00:03:22,820 --> 00:03:24,820 Munin community until now 87 00:03:24,820 --> 00:03:26,820 where, well, 88 00:03:26,820 --> 00:03:28,820 I've just 89 00:03:28,820 --> 00:03:30,820 took over the leadership 90 00:03:30,820 --> 00:03:32,820 from the previous team 91 00:03:32,820 --> 00:03:34,820 and it didn't 92 00:03:34,820 --> 00:03:36,820 happen officially, but just 93 00:03:36,820 --> 00:03:38,820 it's just the way it is. 94 00:03:40,820 --> 00:03:42,820 so in 2012 I released 95 00:03:42,820 --> 00:03:44,820 2.0 96 00:03:44,820 --> 00:03:46,820 thanks to Holger 97 00:03:46,820 --> 00:03:48,820 who said, "hey, 98 00:03:48,820 --> 00:03:50,820 you have to release now, otherwise 99 00:03:50,820 --> 00:03:52,820 you will release in ten years". 
100 00:03:54,820 --> 00:03:56,820 So, thanks to him, things were 101 00:03:56,820 --> 00:03:58,820 very very 102 00:03:58,820 --> 00:04:00,820 hectic at the early days 103 00:04:00,820 --> 00:04:02,820 of 2.0 104 00:04:02,820 --> 00:04:04,820 because I realized that 105 00:04:04,820 --> 00:04:06,820 the biggest 106 00:04:06,820 --> 00:04:08,820 point was 107 00:04:08,820 --> 00:04:10,820 since it wasn't released 108 00:04:10,820 --> 00:04:12,820 we didn't have many testers 109 00:04:12,820 --> 00:04:14,820 and since we 110 00:04:14,820 --> 00:04:16,820 didn't have many testers, I didn't want to 111 00:04:16,820 --> 00:04:18,820 release it, since I still have some 112 00:04:18,820 --> 00:04:20,820 bugs that 113 00:04:20,820 --> 00:04:22,820 came out and so on. 114 00:04:22,820 --> 00:04:24,820 So, thanks to Holger we broke 115 00:04:24,820 --> 00:04:26,820 this cycle. 116 00:04:26,820 --> 00:04:28,820 And we released 117 00:04:28,820 --> 00:04:30,820 in 2012 118 00:04:30,820 --> 00:04:32,820 it's interesting since it's 119 00:04:32,820 --> 00:04:34,820 ten years since 120 00:04:34,820 --> 00:04:36,820 it's born 121 00:04:36,820 --> 00:04:38,820 and someone said that 122 00:04:38,820 --> 00:04:40,820 every software gets good 123 00:04:40,820 --> 00:04:42,820 after ten years, maybe, 124 00:04:42,820 --> 00:04:44,820 and it's in 125 00:04:44,820 --> 00:04:46,820 Wheezy since September 126 00:04:46,820 --> 00:04:48,820 2012 127 00:04:48,820 --> 00:04:50,820 and 128 00:04:50,820 --> 00:04:52,820 it's in Stable 129 00:04:52,820 --> 00:04:54,820 since Wheezy got out. 
130 00:04:54,820 --> 00:04:56,820 So, in 131 00:04:56,820 --> 00:04:58,820 2013, 132 00:04:58,820 --> 00:05:00,820 I released 133 00:05:00,820 --> 00:05:02,820 2.1 134 00:05:02,820 --> 00:05:04,820 it's an unstable 135 00:05:04,820 --> 00:05:06,820 branch, because 136 00:05:08,820 --> 00:05:10,820 I didn't want to have the 137 00:05:10,820 --> 00:05:12,820 same problems as with 138 00:05:12,820 --> 00:05:14,820 2.0, its 139 00:05:14,820 --> 00:05:16,820 lack of testers, so I just packaged something, 140 00:05:16,820 --> 00:05:18,820 I packaged the development branch 141 00:05:18,820 --> 00:05:20,820 and released it. It's unstable. 142 00:05:20,820 --> 00:05:22,820 Normally it works 143 00:05:22,820 --> 00:05:24,820 but well, it's unstable. 144 00:05:24,820 --> 00:05:26,820 You know what 145 00:05:26,820 --> 00:05:28,820 unstable means. 146 00:05:28,820 --> 00:05:30,820 And the biggest thing 147 00:05:30,820 --> 00:05:32,820 is the Internet will 148 00:05:32,820 --> 00:05:34,820 change in 149 00:05:34,820 --> 00:05:36,820 ...in the 150 00:05:36,820 --> 00:05:38,820 2.1 lifeline 151 00:05:40,820 --> 00:05:42,820 And I said, October 152 00:05:42,820 --> 00:05:44,820 2013 is the target for 153 00:05:44,820 --> 00:05:46,820 release 2.2 154 00:05:46,820 --> 00:05:48,820 But time will tell. 155 00:05:50,820 --> 00:05:52,820 I mean, if you don't fix timelines and deadlines 156 00:05:52,820 --> 00:05:54,820 you will never release things. 157 00:05:54,820 --> 00:05:56,820 So, better be late than never. 158 00:05:58,820 --> 00:06:00,820 Ok, so 159 00:06:00,820 --> 00:06:02,820 a very simple 160 00:06:02,820 --> 00:06:04,820 design principle of Munin is 161 00:06:06,820 --> 00:06:08,820 I really love this quote from 162 00:06:08,820 --> 00:06:10,820 Alan Kay, it's 163 00:06:10,820 --> 00:06:12,820 "Simple things should be simple, 164 00:06:12,820 --> 00:06:14,820 complex things should be possible." 165 00:06:14,820 --> 00:06:16,820 That's exactly the motto of Munin.
166 00:06:16,820 --> 00:06:18,820 Munin makes 167 00:06:18,820 --> 00:06:20,620 simple things simple, and 169 00:06:20,820 --> 00:06:22,820 complex things 170 00:06:22,820 --> 00:06:24,820 possible 171 00:06:26,820 --> 00:06:28,820 It's very easy to use, it has 172 00:06:28,820 --> 00:06:30,820 the same out-of-the-box 173 00:06:30,820 --> 00:06:32,820 behavior, because when you 174 00:06:32,820 --> 00:06:34,820 install it on a server, it 175 00:06:34,820 --> 00:06:36,820 automatically starts monitoring. 176 00:06:36,820 --> 00:06:38,820 And if not, it's a bug. 177 00:06:38,820 --> 00:06:40,820 Please report it. 178 00:06:40,820 --> 00:06:42,820 And, it 179 00:06:42,820 --> 00:06:44,820 has a complete plug-and-play infrastructure 180 00:06:44,820 --> 00:06:46,820 compared to 181 00:06:46,820 --> 00:06:48,820 others, you... 182 00:06:48,820 --> 00:06:50,820 The only thing you need 183 00:06:50,820 --> 00:06:52,820 is to declare 184 00:06:52,820 --> 00:06:54,820 the node, because, well, 185 00:06:54,820 --> 00:06:56,820 broadcasting on 186 00:06:56,820 --> 00:06:58,820 the local network 187 00:06:58,820 --> 00:07:00,820 is not very practical 188 00:07:00,820 --> 00:07:02,820 from my point of view, so it's the only thing 189 00:07:02,820 --> 00:07:04,820 you need to say. 190 00:07:04,820 --> 00:07:06,820 You have to poll this node, and 191 00:07:06,820 --> 00:07:08,820 the node will just 192 00:07:08,820 --> 00:07:10,820 hand all the config 193 00:07:10,820 --> 00:07:12,820 to the master 194 00:07:12,820 --> 00:07:14,820 so the graphs 195 00:07:14,820 --> 00:07:16,820 are drawn.
196 00:07:16,820 --> 00:07:18,820 Thing is, 197 00:07:18,820 --> 00:07:20,820 our user 198 00:07:20,820 --> 00:07:22,820 The vast majority of 199 00:07:22,820 --> 00:07:24,820 our users just have one server 200 00:07:24,820 --> 00:07:26,820 to monitor, and 201 00:07:26,820 --> 00:07:28,820 it's the same that 202 00:07:28,820 --> 00:07:30,820 the Munin install is on it. 203 00:07:30,820 --> 00:07:32,820 I mean, that's 204 00:07:32,820 --> 00:07:34,820 that's why 205 00:07:34,820 --> 00:07:36,820 the default Munin 206 00:07:36,820 --> 00:07:38,820 is always targeted at this user. 207 00:07:38,820 --> 00:07:40,820 But if you have a bigger install 208 00:07:40,820 --> 00:07:42,820 Well, you already know 209 00:07:42,820 --> 00:07:44,820 how to change 210 00:07:44,820 --> 00:07:46,820 config files, usually. 211 00:07:48,820 --> 00:07:50,820 And, as I said, 212 00:07:50,820 --> 00:07:52,820 some are running bigger 213 00:07:52,820 --> 00:07:54,820 installs, and 214 00:07:54,820 --> 00:07:56,820 these are 215 00:07:56,820 --> 00:07:58,820 the ones that interest me 216 00:07:58,820 --> 00:08:00,820 very much in 2.2, since 217 00:08:00,820 --> 00:08:02,820 well, we do 218 00:08:02,820 --> 00:08:04,820 address really well the 219 00:08:04,820 --> 00:08:06,820 one-node install type 220 00:08:06,820 --> 00:08:08,820 but 221 00:08:08,820 --> 00:08:10,820 for bigger installs 222 00:08:10,820 --> 00:08:12,820 we 223 00:08:12,820 --> 00:08:14,820 we have many problems. 224 00:08:14,820 --> 00:08:16,820 We improved 225 00:08:16,820 --> 00:08:18,820 very much from 226 00:08:18,820 --> 00:08:20,820 1.4 to 227 00:08:20,820 --> 00:08:22,820 2.0, but now we 228 00:08:22,820 --> 00:08:24,820 hit other limits that we will discuss 229 00:08:24,820 --> 00:08:26,820 just after. 230 00:08:28,820 --> 00:08:30,820 Ok, so, new features: 231 00:08:32,820 --> 00:08:34,820 We really have now full 232 00:08:34,820 --> 00:08:36,820 CGI implementation!
233 00:08:36,820 --> 00:08:38,820 I mean, the one in 234 00:08:38,820 --> 00:08:40,820 1.5 you should not 235 00:08:40,820 --> 00:08:42,820 use it, I mean, it works, 236 00:08:42,820 --> 00:08:44,820 sometimes, and it's 237 00:08:44,820 --> 00:08:46,820 bugged every time. 238 00:08:46,820 --> 00:08:48,820 And, so 239 00:08:48,820 --> 00:08:50,820 it has also a full FastCGI 240 00:08:50,820 --> 00:08:52,820 implementation 241 00:08:52,820 --> 00:08:54,820 This is very important to 242 00:08:54,820 --> 00:08:56,820 have adequate performance 243 00:08:56,820 --> 00:08:58,820 so you don't reload everything 244 00:09:00,820 --> 00:09:02,820 The biggest selling point 245 00:09:02,820 --> 00:09:04,820 is, it has complete integration 246 00:09:04,820 --> 00:09:06,820 with RRDcacheD 247 00:09:06,820 --> 00:09:08,820 We will talk more about this 248 00:09:08,820 --> 00:09:10,820 later, but this 249 00:09:10,820 --> 00:09:12,820 is *the* 250 00:09:12,820 --> 00:09:14,820 main issue 251 00:09:14,820 --> 00:09:16,820 when scaling. 252 00:09:16,820 --> 00:09:18,820 Because our RRD is 253 00:09:18,820 --> 00:09:20,820 very nice, but doesn't scale 254 00:09:20,820 --> 00:09:22,820 very well in its native 255 00:09:22,820 --> 00:09:24,820 without RRDcacheD. 256 00:09:24,820 --> 00:09:26,820 Thing is, when you use 257 00:09:26,820 --> 00:09:28,820 RRDcacheD with some guidelines 258 00:09:28,820 --> 00:09:30,820 that I will describe 259 00:09:30,820 --> 00:09:32,820 later, and 260 00:09:32,820 --> 00:09:34,820 you should not do 261 00:09:34,820 --> 00:09:36,820 what you... 262 00:09:36,820 --> 00:09:38,820 It has native ssh transport 263 00:09:40,820 --> 00:09:42,820 Well, it's 264 00:09:42,820 --> 00:09:44,820 before you used 265 00:09:44,820 --> 00:09:46,820 plain TCP for 266 00:09:46,820 --> 00:09:48,820 the connection 267 00:09:48,820 --> 00:09:50,820 port 4949 268 00:09:50,820 --> 00:09:52,820 You could use TLS 269 00:09:52,820 --> 00:09:54,820 But most people didn't. 
270 00:09:56,820 --> 00:09:58,820 And, with native ssh, 271 00:09:58,820 --> 00:10:00,820 usually people also 272 00:10:00,820 --> 00:10:02,820 already use ssh on their 273 00:10:02,820 --> 00:10:04,820 installs, so setting 274 00:10:06,820 --> 00:10:08,820 ssh transport for them is quite easy 275 00:10:08,820 --> 00:10:10,820 whereas having a TLS 276 00:10:10,820 --> 00:10:12,820 thing is, you have to have a 277 00:10:12,820 --> 00:10:14,820 certificate and so on and so on, it's... 278 00:10:14,820 --> 00:10:16,820 Quite much more complicated. 279 00:10:18,820 --> 00:10:20,820 And it's another open new port, as I said, 280 00:10:20,820 --> 00:10:22,820 and it's secure 281 00:10:22,820 --> 00:10:24,820 it's usually more integrated 282 00:10:24,820 --> 00:10:26,820 than 283 00:10:28,820 --> 00:10:30,820 in setups. 284 00:10:30,820 --> 00:10:32,820 The other big feature is 285 00:10:32,820 --> 00:10:34,820 async proxy. It's something 286 00:10:34,820 --> 00:10:36,820 that sits on 287 00:10:36,820 --> 00:10:38,820 the node, 288 00:10:38,820 --> 00:10:40,820 that polls the node autonomously, 289 00:10:40,820 --> 00:10:42,820 and stores 290 00:10:42,820 --> 00:10:44,820 locally on the node, 291 00:10:44,820 --> 00:10:46,820 the 292 00:10:48,820 --> 00:10:50,820 Munin update 293 00:10:50,820 --> 00:10:52,820 will then connect to the 294 00:10:52,820 --> 00:10:54,820 async client part 295 00:10:54,820 --> 00:10:56,820 and just replay the spool that it 296 00:10:56,820 --> 00:10:58,820 spooled just 297 00:10:58,820 --> 00:11:00,820 before. So it has very 298 00:11:00,820 --> 00:11:02,820 interesting 299 00:11:02,820 --> 00:11:04,820 features.
If you have 300 00:11:04,820 --> 00:11:06,820 some nodes that have 301 00:11:06,820 --> 00:11:08,820 loose connection, for example, you 302 00:11:08,820 --> 00:11:10,820 want to monitor a remote 303 00:11:10,820 --> 00:11:12,820 location that 304 00:11:12,820 --> 00:11:14,820 has sometimes 305 00:11:14,820 --> 00:11:16,820 no network or whatever 306 00:11:16,820 --> 00:11:18,820 since it's locally spooling 307 00:11:18,820 --> 00:11:20,820 when you connect you will 308 00:11:20,820 --> 00:11:22,820 recover everything 309 00:11:22,820 --> 00:11:24,820 that was 310 00:11:24,820 --> 00:11:26,820 collected meanwhile you didn't 311 00:11:26,820 --> 00:11:28,820 connect. 312 00:11:30,820 --> 00:11:32,820 So, those little white bars 313 00:11:32,820 --> 00:11:34,820 you were accustomed to are 314 00:11:34,820 --> 00:11:36,820 gone. 315 00:11:36,820 --> 00:11:38,820 It also speeds up poll. 316 00:11:38,820 --> 00:11:40,820 Even for a local network 317 00:11:40,820 --> 00:11:42,820 since 318 00:11:42,820 --> 00:11:44,820 it delegates 319 00:11:44,820 --> 00:11:46,820 all the polling and the waiting 320 00:11:46,820 --> 00:11:48,820 for plugins 321 00:11:48,820 --> 00:11:50,820 to the async proxy 322 00:11:50,820 --> 00:11:52,820 the data collection 323 00:11:52,820 --> 00:11:54,820 the Munin update 324 00:11:54,820 --> 00:11:56,820 goes *really* faster 325 00:11:56,820 --> 00:11:58,820 It only replays 326 00:11:58,820 --> 00:12:00,820 a logged text file 327 00:12:00,820 --> 00:12:02,820 So, the thing is that, 328 00:12:02,820 --> 00:12:04,820 when you have a big cluster 329 00:12:04,820 --> 00:12:06,820 it sometimes makes sense to 330 00:12:06,820 --> 00:12:08,820 use async, since 331 00:12:08,820 --> 00:12:10,820 well 332 00:12:12,820 --> 00:12:14,820 the fixed five minutes 333 00:12:14,820 --> 00:12:16,820 for Munin updates is 334 00:12:16,820 --> 00:12:18,820 still a hard one, and 335 00:12:18,820 --> 00:12:20,820 you cannot go further. 
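The spool-and-replay behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `AsyncSpool` and its methods are hypothetical, and the spool is kept in memory for brevity, whereas the talk describes a text file logged on the node.

```python
from collections import deque


class AsyncSpool:
    """Hypothetical node-side spool: collect locally, replay on connect."""

    def __init__(self):
        self._pending = deque()

    def collect(self, timestamp, plugin, value):
        # Called by the local poller whether or not the master is reachable.
        self._pending.append((timestamp, plugin, value))

    def replay(self):
        # Called when the master connects: hand over everything spooled
        # since the last connection, oldest first, then forget it.
        batch = list(self._pending)
        self._pending.clear()
        return batch


spool = AsyncSpool()
# The link is down for three polling intervals; samples still pile up.
for ts in (0, 300, 600):
    spool.collect(ts, "load", 0.5)
recovered = spool.replay()  # the master reconnects and gets all three samples
```

Because nothing is dropped while the master is away, the graph has no gap for that period, which is the "white bars are gone" effect mentioned in the talk.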
336 00:12:22,820 --> 00:12:24,820 And one 337 00:12:24,820 --> 00:12:26,820 last known thing about async proxy 338 00:12:26,820 --> 00:12:28,820 is, it can 339 00:12:28,820 --> 00:12:30,820 poll at various 340 00:12:30,820 --> 00:12:32,820 update rates. 341 00:12:32,820 --> 00:12:34,820 If you have one plugin 342 00:12:34,820 --> 00:12:36,820 that says, "I want to be polled 343 00:12:36,820 --> 00:12:38,820 every hour", 344 00:12:38,820 --> 00:12:40,820 async will only poll it 345 00:12:40,820 --> 00:12:42,820 at one hour, 346 00:12:42,820 --> 00:12:44,820 and the most interesting part, 347 00:12:44,820 --> 00:12:46,620 if you have a plugin that says 349 00:12:46,820 --> 00:12:49,120 "I want to be polled every ten seconds" 350 00:12:49,120 --> 00:12:51,120 it will poll it every ten seconds, and still 351 00:12:51,120 --> 00:12:53,120 send every 352 00:12:53,120 --> 00:12:55,120 five minutes all of the data back 353 00:12:55,120 --> 00:12:57,120 to the Munin update. 354 00:12:57,120 --> 00:12:59,120 So, you won't have real-time 355 00:12:59,120 --> 00:13:01,120 information 356 00:13:01,120 --> 00:13:03,120 but you have very precise 357 00:13:03,120 --> 00:13:05,120 information. 358 00:13:09,120 --> 00:13:11,120 So now, we go 359 00:13:11,120 --> 00:13:13,120 to scalability 360 00:13:13,120 --> 00:13:15,120 That's the 361 00:13:15,120 --> 00:13:17,120 biggest focus 362 00:13:17,120 --> 00:13:19,120 on 2.0 363 00:13:19,120 --> 00:13:21,120 because the first one was to 364 00:13:21,120 --> 00:13:23,120 the zooming part 365 00:13:23,120 --> 00:13:25,120 and zooming just 366 00:13:25,120 --> 00:13:27,120 showed 367 00:13:27,120 --> 00:13:29,120 that, well, 368 00:13:29,120 --> 00:13:31,120 you can have huge data files 369 00:13:31,120 --> 00:13:33,120 since 370 00:13:33,120 --> 00:13:35,120 it doesn't...
It's not very useful 371 00:13:35,120 --> 00:13:37,120 to zoom on one year 372 00:13:37,120 --> 00:13:39,120 history, if you 373 00:13:39,120 --> 00:13:41,120 don't keep the 374 00:13:41,120 --> 00:13:43,120 finer granularity in RRD 375 00:13:43,120 --> 00:13:45,120 one year back 376 00:13:45,120 --> 00:13:47,120 So 377 00:13:47,120 --> 00:13:49,120 that will be scaling 378 00:13:49,120 --> 00:13:51,120 data at the end. Really, what 379 00:13:51,120 --> 00:13:53,120 people want is adding more nodes 380 00:13:53,120 --> 00:13:55,120 That's the most common 381 00:13:55,120 --> 00:13:57,120 scaling issue 382 00:13:57,120 --> 00:13:59,120 that we get. 383 00:13:59,120 --> 00:14:01,120 Inside the node, you can 384 00:14:01,120 --> 00:14:03,120 also have a 385 00:14:03,120 --> 00:14:05,120 huge number of plugins 386 00:14:05,120 --> 00:14:07,120 some have very very large installations, 387 00:14:07,120 --> 00:14:09,120 specially when you start to 388 00:14:09,120 --> 00:14:11,120 use SNMP 389 00:14:11,120 --> 00:14:13,120 because SNMP is done 390 00:14:13,120 --> 00:14:15,120 by one host 391 00:14:15,120 --> 00:14:17,120 to monitor many many 392 00:14:17,120 --> 00:14:19,120 remote 393 00:14:19,120 --> 00:14:21,120 routers, or 394 00:14:21,120 --> 00:14:23,120 SNMP agents 395 00:14:23,120 --> 00:14:25,120 and the thing is, some 396 00:14:25,120 --> 00:14:27,120 also have slow plugins. 
397 00:14:29,120 --> 00:14:31,120 We already 398 00:14:31,120 --> 00:14:33,120 discussed about 399 00:14:33,120 --> 00:14:35,120 Munin update should take less than 400 00:14:35,120 --> 00:14:37,120 five minutes, otherwise, well, 401 00:14:37,120 --> 00:14:39,120 bad things happen 402 00:14:39,120 --> 00:14:41,120 that's one point 403 00:14:41,120 --> 00:14:43,120 and that's a hard rule 404 00:14:43,120 --> 00:14:45,120 If your Munin updates 405 00:14:45,120 --> 00:14:47,120 takes more than five minutes 406 00:14:47,120 --> 00:14:49,120 really bad things happen 407 00:14:49,120 --> 00:14:51,120 mostly white bars. 408 00:14:51,120 --> 00:14:53,120 And so, if you have 409 00:14:53,120 --> 00:14:55,120 many many plugins, and 410 00:14:55,120 --> 00:14:57,120 many many plugins take 411 00:14:57,120 --> 00:14:59,120 quite a long time to 412 00:14:59,120 --> 00:15:01,120 poll, since it's all synchronous, 413 00:15:01,120 --> 00:15:03,120 the fact is, 414 00:15:03,120 --> 00:15:05,120 well, 415 00:15:05,120 --> 00:15:07,120 it can go, even if you 416 00:15:07,120 --> 00:15:09,120 parallelize 417 00:15:09,120 --> 00:15:11,120 very much, it sometimes 418 00:15:11,120 --> 00:15:13,120 still goes 419 00:15:13,120 --> 00:15:15,120 quite slowly, and 420 00:15:15,120 --> 00:15:17,120 if you 421 00:15:17,120 --> 00:15:19,120 multiply the number of plugins 422 00:15:19,120 --> 00:15:21,120 with long response times, 423 00:15:21,120 --> 00:15:23,120 many nodes, 424 00:15:23,120 --> 00:15:25,120 you have, you usually 425 00:15:25,120 --> 00:15:27,120 pass the five minutes bar. 
426 00:15:27,120 --> 00:15:29,120 and it's scaling 427 00:15:29,120 --> 00:15:31,120 data usually what's 428 00:15:31,120 --> 00:15:33,120 with the zooming part 429 00:15:33,120 --> 00:15:35,120 usually many many people 430 00:15:35,120 --> 00:15:37,120 ask for, "well, 431 00:15:37,120 --> 00:15:39,120 I can zoom one year ago 432 00:15:39,120 --> 00:15:41,120 but now all I have 433 00:15:41,120 --> 00:15:43,120 is one bar per day", I mean, 434 00:15:43,120 --> 00:15:45,120 I don't care about the average 435 00:15:45,120 --> 00:15:47,120 for one day 436 00:15:47,120 --> 00:15:49,120 So here is 437 00:15:49,120 --> 00:15:51,120 you can natively have 438 00:15:51,120 --> 00:15:53,120 much 439 00:15:53,120 --> 00:15:55,120 more data inside 440 00:15:55,120 --> 00:15:57,120 we will see more about it later. 441 00:15:57,120 --> 00:15:59,120 So, scaling the master 442 00:15:59,120 --> 00:16:01,120 To have 443 00:16:01,120 --> 00:16:03,120 a big install 444 00:16:03,120 --> 00:16:05,120 the first thing is 445 00:16:05,120 --> 00:16:07,120 use FastCGI. 446 00:16:07,120 --> 00:16:09,120 Default is cron-based, remember 447 00:16:09,120 --> 00:16:11,120 default is for the typical 448 00:16:11,120 --> 00:16:13,120 user that has only one node and 449 00:16:13,120 --> 00:16:15,120 one server, 450 00:16:15,120 --> 00:16:17,120 anyone that has 451 00:16:17,120 --> 00:16:19,120 more than, let's say, five 452 00:16:19,120 --> 00:16:21,120 nodes should really go 453 00:16:21,120 --> 00:16:23,120 the CGI road 454 00:16:23,120 --> 00:16:25,120 and not really CGI, but FastCGI 455 00:16:25,120 --> 00:16:27,120 Because 456 00:16:29,120 --> 00:16:31,120 The cron road 457 00:16:31,120 --> 00:16:33,120 is, you generate every 458 00:16:33,120 --> 00:16:35,120 kind of graphic, and it's 459 00:16:35,120 --> 00:16:37,120 just pointless. I mean, 460 00:16:37,120 --> 00:16:39,120 it's very simple, but it's pointless.
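A rough worked example of why rendering everything from cron is wasteful compared with drawing graphs on demand. The node, plugin and viewer counts are hypothetical, and the set of four graph periods per plugin is assumed here as the usual default; this is only a back-of-the-envelope sketch, not a measurement.

```python
# Assumed set of graph periods rendered per plugin.
PERIODS = ("day", "week", "month", "year")


def graphs_per_cron_run(nodes, plugins_per_node):
    # Cron-based rendering redraws every graph on every run,
    # whether or not anybody looks at it.
    return nodes * plugins_per_node * len(PERIODS)


def graphs_on_demand(graphs_viewed):
    # CGI or FastCGI rendering only draws what a browser actually requests.
    return graphs_viewed


eager = graphs_per_cron_run(nodes=50, plugins_per_node=40)  # hypothetical install
lazy = graphs_on_demand(graphs_viewed=12)                   # hypothetical page view
```

With these made-up numbers the cron road draws 8000 graphs per run while the on-demand road draws 12.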
461 00:16:41,120 --> 00:16:43,120 As I said, 462 00:16:43,120 --> 00:16:45,120 You have to use 463 00:16:45,120 --> 00:16:47,120 RRDcacheD, because, the thing is 464 00:16:47,120 --> 00:16:49,120 RRD 465 00:16:49,120 --> 00:16:51,120 is very, very nice, it's a 466 00:16:51,120 --> 00:16:53,120 very nice piece of software, but 467 00:16:53,120 --> 00:16:55,120 it has only one main problem 468 00:16:55,120 --> 00:16:57,120 it is, it's so efficient 469 00:16:57,120 --> 00:16:59,120 that it 470 00:16:59,120 --> 00:17:01,120 writes only 471 00:17:01,120 --> 00:17:03,120 the very little 472 00:17:03,120 --> 00:17:05,120 part of the file 473 00:17:05,120 --> 00:17:07,120 and to the underlying 474 00:17:07,120 --> 00:17:09,120 I/O subsystem 475 00:17:11,120 --> 00:17:13,120 RRD updates, when you have a big one, 476 00:17:13,120 --> 00:17:15,120 it feels just like 477 00:17:15,120 --> 00:17:17,120 random I/O. And when I say 478 00:17:17,120 --> 00:17:19,120 random I/O, it's real random I/O. 479 00:17:19,120 --> 00:17:21,120 I mean, 480 00:17:21,120 --> 00:17:23,120 almost cryptographically 481 00:17:23,120 --> 00:17:25,120 secure. 482 00:17:25,120 --> 00:17:27,120 When I ask about 483 00:17:27,120 --> 00:17:29,120 some 484 00:17:29,120 --> 00:17:31,120 storage vendor, 485 00:17:31,120 --> 00:17:33,120 he says, "random I/O, we can do that!" 486 00:17:33,120 --> 00:17:35,120 I plugged Munin 487 00:17:35,120 --> 00:17:37,120 with a big install on it 488 00:17:37,120 --> 00:17:39,120 and he said, "what's that?" 489 00:17:39,120 --> 00:17:41,120 Yeah, random I/O 490 00:17:41,120 --> 00:17:43,120 "Well, not that random usually!" 
491 00:17:45,120 --> 00:17:47,120 The people of 492 00:17:47,120 --> 00:17:49,120 RRD are well well aware of it 493 00:17:49,120 --> 00:17:51,120 and even designed 494 00:17:51,120 --> 00:17:53,120 RRDcacheD, that is specially 495 00:17:53,120 --> 00:17:55,120 designed to make this 496 00:17:55,120 --> 00:17:57,120 random I/O buffered 497 00:17:57,120 --> 00:17:59,120 and to make it like normal 498 00:17:59,120 --> 00:18:01,120 random I/O 499 00:18:01,120 --> 00:18:03,120 And it's called, 500 00:18:03,120 --> 00:18:05,120 there is a slide 501 00:18:05,120 --> 00:18:07,120 you can Google it, 502 00:18:07,120 --> 00:18:09,120 "RRDcacheD: 503 00:18:09,120 --> 00:18:11,120 to escape the I/O hell" 504 00:18:11,120 --> 00:18:13,120 it's really well described, and 505 00:18:13,120 --> 00:18:15,120 to understand 506 00:18:15,120 --> 00:18:17,120 what's behind 507 00:18:17,120 --> 00:18:19,120 RRDcacheD 508 00:18:19,120 --> 00:18:21,120 And it even works 509 00:18:21,120 --> 00:18:23,120 on SSD, because usually 510 00:18:23,120 --> 00:18:25,120 random I/O 511 00:18:25,120 --> 00:18:27,120 "OK, no problem, just use SSD" 512 00:18:27,120 --> 00:18:29,120 fast storage vendors 513 00:18:29,120 --> 00:18:31,120 said, "well, no problem, we just put an SSD" 514 00:18:31,120 --> 00:18:33,120 The thing is, 515 00:18:33,120 --> 00:18:35,120 after 516 00:18:35,120 --> 00:18:37,120 in my test, after four hours 517 00:18:37,120 --> 00:18:39,120 with, yes, big installs, 518 00:18:39,120 --> 00:18:41,120 all of the SSDs 519 00:18:41,120 --> 00:18:43,120 were just 520 00:18:43,120 --> 00:18:45,120 offline, because 521 00:18:45,120 --> 00:18:47,120 too many I/Os 522 00:18:47,120 --> 00:18:49,120 Because it writes, writes, writes, writes a lot. 523 00:18:53,120 --> 00:18:55,120 So, SSD is 524 00:18:55,120 --> 00:18:57,120 interesting, but not 525 00:18:57,120 --> 00:18:59,120 only for us. 
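The buffering idea behind RRDcacheD can be illustrated with a small Python sketch. It assumes nothing about the real daemon's internals: `WriteCoalescer`, its methods and the file names are hypothetical, and "writing" is only counted, not performed.

```python
from collections import defaultdict


class WriteCoalescer:
    """Hypothetical illustration of buffering updates per RRD file."""

    def __init__(self):
        self._buffer = defaultdict(list)
        self.disk_writes = 0  # number of times a file was actually touched

    def update(self, path, timestamp, value):
        # Keep the update in RAM instead of writing it right away.
        self._buffer[path].append((timestamp, value))

    def flush(self, path):
        # Write all buffered updates for one file in a single pass.
        pending = self._buffer.pop(path, [])
        if pending:
            self.disk_writes += 1
        return pending

    def flush_all(self):
        for path in list(self._buffer):
            self.flush(path)


cache = WriteCoalescer()
for step in range(12):  # one hour of five-minute updates
    for path in ("load.rrd", "cpu.rrd"):
        cache.update(path, step * 300, 1.0)
cache.flush_all()
```

In this sketch 24 updates end up as 2 file writes. The longer updates can stay in memory, the fewer small scattered writes reach the disk.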
526 00:18:59,120 --> 00:19:01,120 Thing is, 527 00:19:01,120 --> 00:19:03,120 RRDcacheD 528 00:19:03,120 --> 00:19:05,120 has only one 529 00:19:05,120 --> 00:19:07,120 very big drawback, which is, you should never 530 00:19:07,120 --> 00:19:09,120 ever read 531 00:19:09,120 --> 00:19:11,120 from the RRD file 532 00:19:11,120 --> 00:19:13,120 specially in cron, because 533 00:19:13,120 --> 00:19:15,120 if you read on demand, it's perfect 534 00:19:15,120 --> 00:19:17,120 it's only flushed 535 00:19:17,120 --> 00:19:19,120 the file you are reading 536 00:19:19,120 --> 00:19:21,120 but if you read it 537 00:19:21,120 --> 00:19:23,120 in cron, by default, 538 00:19:23,120 --> 00:19:25,120 you will read 539 00:19:25,120 --> 00:19:27,120 the whoooole install 540 00:19:27,120 --> 00:19:29,120 and that's exactly the same as 541 00:19:29,120 --> 00:19:31,120 not using RRDcacheD, so... 542 00:19:31,120 --> 00:19:33,120 So use it! 543 00:19:33,120 --> 00:19:35,120 Thing is, for Munin, 544 00:19:35,120 --> 00:19:37,120 you need lots of RAM. 545 00:19:37,120 --> 00:19:39,120 Because, as I said, 546 00:19:39,120 --> 00:19:41,120 we have RRDcacheD 547 00:19:43,120 --> 00:19:45,120 but the more RAM 548 00:19:45,120 --> 00:19:47,120 you put at RRDcacheD, 549 00:19:47,120 --> 00:19:49,120 the longer you can keep 550 00:19:49,120 --> 00:19:51,120 the spool 551 00:19:51,120 --> 00:19:53,120 and so, the thing is, 552 00:19:53,120 --> 00:19:55,120 it can 553 00:19:55,120 --> 00:19:57,120 it writes very less often 554 00:19:57,120 --> 00:19:59,120 and that's 555 00:19:59,120 --> 00:20:01,120 very interesting. 556 00:20:01,120 --> 00:20:03,120 If you have lots of RAM, you can 557 00:20:03,120 --> 00:20:05,120 multiply the number of workers. 558 00:20:05,120 --> 00:20:07,120 It means, you 559 00:20:07,120 --> 00:20:09,120 obviously, if you have... 
560 00:20:09,120 --> 00:20:11,120 Since Munin is very very 561 00:20:11,120 --> 00:20:13,120 very much I/O bound 562 00:20:13,120 --> 00:20:15,120 either, so for waiting for NAS 563 00:20:15,120 --> 00:20:17,120 or for waiting for the I/O subsystem 564 00:20:17,120 --> 00:20:19,120 If you have 565 00:20:19,120 --> 00:20:21,120 many workers, usually it 566 00:20:21,120 --> 00:20:23,120 it helps 567 00:20:23,120 --> 00:20:25,120 a lot, because every worker is 568 00:20:25,120 --> 00:20:27,120 single-threaded. 569 00:20:29,120 --> 00:20:31,120 Thing is, 570 00:20:31,120 --> 00:20:33,120 but, do never 571 00:20:33,120 --> 00:20:35,120 ever swap. 572 00:20:35,120 --> 00:20:37,120 That's obvious. The thing is, 573 00:20:39,120 --> 00:20:41,120 Munin is designed to 574 00:20:41,120 --> 00:20:43,120 use all the memory 575 00:20:43,120 --> 00:20:45,120 of its workers. So if 576 00:20:45,120 --> 00:20:47,120 you only swap a little thing, 577 00:20:47,120 --> 00:20:49,120 then there is no, how to say, 578 00:20:49,120 --> 00:20:51,120 there is no lost memory. 579 00:20:51,120 --> 00:20:53,120 You cannot swap... 580 00:20:53,120 --> 00:20:55,120 For people that know the "swappiness" setting 581 00:20:55,120 --> 00:20:57,120 it means swapping 582 00:20:57,120 --> 00:20:59,120 before trading 583 00:20:59,120 --> 00:21:01,120 some application memory 584 00:21:01,120 --> 00:21:03,120 to file 585 00:21:03,120 --> 00:21:05,120 cache memory 586 00:21:05,120 --> 00:21:07,120 that's not a good idea 587 00:21:07,120 --> 00:21:09,120 The application memory 588 00:21:09,120 --> 00:21:11,120 is useful 589 00:21:11,120 --> 00:21:13,120 at one point. 590 00:21:15,120 --> 00:21:17,120 [... oops... That's OK ...] 591 00:21:21,120 --> 00:21:23,120 On the master, you have really 592 00:21:23,120 --> 00:21:25,120 to watch out for 593 00:21:25,120 --> 00:21:27,120 shared hardware. 
594 00:21:27,120 --> 00:21:29,120 Because Munin is 595 00:21:29,120 --> 00:21:31,120 very nice 596 00:21:31,120 --> 00:21:33,120 and it loves 597 00:21:33,120 --> 00:21:35,120 to annihilate any hardware 598 00:21:35,120 --> 00:21:37,120 you put on it, because 599 00:21:37,120 --> 00:21:39,120 ...well... 600 00:21:39,120 --> 00:21:41,120 It has 601 00:21:41,120 --> 00:21:43,120 It is designed to be 602 00:21:43,120 --> 00:21:45,120 very very scalable, and you can 603 00:21:45,120 --> 00:21:47,120 launch as many processes as you 604 00:21:47,120 --> 00:21:49,120 want, we will see 605 00:21:49,120 --> 00:21:51,120 some kinds, 606 00:21:51,120 --> 00:21:53,120 some limitations just after, but 607 00:21:53,120 --> 00:21:55,120 it's designed to be very scalable 608 00:21:55,120 --> 00:21:57,120 but the [?] arrives 609 00:21:57,120 --> 00:21:59,120 not in a very efficient manner 610 00:21:59,120 --> 00:22:01,120 I mean, it's not very clever 611 00:22:01,120 --> 00:22:03,120 it just uses and goes 612 00:22:03,120 --> 00:22:05,120 on your system. 613 00:22:05,120 --> 00:22:07,120 So, for the record, 614 00:22:07,120 --> 00:22:09,120 I have the 615 00:22:09,120 --> 00:22:11,120 storage vendor 616 00:22:11,120 --> 00:22:13,120 but was 617 00:22:13,120 --> 00:22:15,120 [?] with all 618 00:22:15,120 --> 00:22:17,120 of the 619 00:22:17,120 --> 00:22:19,120 application of the 620 00:22:19,120 --> 00:22:21,120 of the thing, and when we 621 00:22:21,120 --> 00:22:23,120 wrote to it, I mean 622 00:22:23,120 --> 00:22:25,120 99% 623 00:22:25,120 --> 00:22:27,120 of the I/O ops were 624 00:22:27,120 --> 00:22:29,120 delivered to the Munin server, so 625 00:22:29,120 --> 00:22:31,120 let's imagine what 626 00:22:31,120 --> 00:22:33,120 stays for the others. 627 00:22:33,120 --> 00:22:35,120 Not much. 
628 00:22:37,120 --> 00:22:39,120 So, we put it on dedicated hardware, 629 00:22:39,120 --> 00:22:41,120 it goes slower, but 630 00:22:41,120 --> 00:22:43,120 well, other applications are happier. 631 00:22:47,120 --> 00:22:49,120 The thing I said before, 632 00:22:49,120 --> 00:22:51,120 is, use the async proxy. 633 00:22:51,120 --> 00:22:53,120 Even if you don't have a special 634 00:22:53,120 --> 00:22:55,120 need for it, 635 00:22:55,120 --> 00:22:57,120 thing is, it will 636 00:22:59,120 --> 00:23:01,120 enable a very fast collection 637 00:23:01,120 --> 00:23:03,120 As all the 638 00:23:03,120 --> 00:23:05,120 I/O time or the wait 639 00:23:05,120 --> 00:23:07,120 time is absorbed directly by 640 00:23:07,120 --> 00:23:09,120 the async daemon, 641 00:23:09,120 --> 00:23:11,120 your Munin 642 00:23:11,120 --> 00:23:13,120 update almost 643 00:23:13,120 --> 00:23:15,120 doesn't wait at all. 644 00:23:15,120 --> 00:23:17,120 It only connects, reads a file 645 00:23:17,120 --> 00:23:19,120 on the server 646 00:23:19,120 --> 00:23:21,120 and disconnects. So, 647 00:23:21,120 --> 00:23:23,120 while a typical 648 00:23:23,120 --> 00:23:25,120 polling time is about 649 00:23:25,120 --> 00:23:27,120 ten to 650 00:23:27,120 --> 00:23:29,120 fifteen seconds 651 00:23:29,120 --> 00:23:31,120 with Munin async, typical 652 00:23:31,120 --> 00:23:33,120 time is about one second 653 00:23:33,120 --> 00:23:35,120 to mostly two seconds. 654 00:23:35,120 --> 00:23:37,120 Depends on 655 00:23:37,120 --> 00:23:39,120 You have a 656 00:23:39,120 --> 00:23:41,120 factor of 10, that's 657 00:23:41,120 --> 00:23:43,120 very interesting when you want to scale.
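[editor's note] The factor of ten mentioned here can be put in numbers. What follows is only a rough sketch, assuming each update worker polls its nodes one after another and has to finish inside the five-minute (300 s) cycle. The function name and the 1000-node figure are illustrative and are not taken from Munin.

```python
import math

CYCLE_SECONDS = 300  # the five-minute update cycle discussed in the talk

def workers_needed(nodes: int, seconds_per_node: float) -> int:
    """Workers required if every worker polls its nodes sequentially."""
    # How many nodes one worker can finish inside a single cycle.
    nodes_per_worker = math.floor(CYCLE_SECONDS / seconds_per_node)
    return math.ceil(nodes / nodes_per_worker)

# Hypothetical install of 1000 nodes, using the slow end of each range.
direct = workers_needed(1000, 15.0)   # direct polling, 10 to 15 s per node
proxied = workers_needed(1000, 2.0)   # through the async proxy, 1 to 2 s per node
print(direct, proxied)
```

Under these assumptions the direct case needs 50 workers and the proxied case 7, which is why shorter polling times translate directly into fewer update workers.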
658 00:23:47,120 --> 00:23:49,120 Because it lowers the 659 00:23:49,120 --> 00:23:51,120 number of update workers needed, 660 00:23:51,120 --> 00:23:53,120 I said, Munin uses 661 00:23:53,120 --> 00:23:55,120 lots of RAM, OK, but 662 00:23:55,120 --> 00:23:57,120 usually you don't want to use RAM 663 00:23:57,120 --> 00:23:59,120 for Munin update, you 664 00:23:59,120 --> 00:24:01,120 prefer to use RAM for 665 00:24:01,120 --> 00:24:03,120 the restitution part, 666 00:24:03,120 --> 00:24:05,120 for graphs, for 667 00:24:05,120 --> 00:24:07,120 the XML, of which we will speak 668 00:24:07,120 --> 00:24:09,120 later, and 669 00:24:09,120 --> 00:24:11,120 Munin update you just want 670 00:24:11,120 --> 00:24:13,120 it to be very quick and 671 00:24:15,120 --> 00:24:17,120 if you don't have... 672 00:24:17,120 --> 00:24:19,120 if it is 673 00:24:19,120 --> 00:24:21,120 not I/O-bound from the network anymore, 674 00:24:21,120 --> 00:24:23,120 it's only CPU-bound, and you 675 00:24:23,120 --> 00:24:25,120 don't want to have more than the 676 00:24:25,120 --> 00:24:27,120 CPU on your hardware 677 00:24:27,120 --> 00:24:29,120 since over this is useless anyway. 678 00:24:31,120 --> 00:24:33,120 And the thing that 679 00:24:33,120 --> 00:24:35,120 a side effect but is very nice 680 00:24:35,120 --> 00:24:37,120 it's, *if* your 681 00:24:37,120 --> 00:24:39,120 Munin update is very slow, 682 00:24:39,120 --> 00:24:41,120 it happens, 683 00:24:41,120 --> 00:24:43,120 and we speak about the five 684 00:24:43,120 --> 00:24:45,120 minutes hard limit, 685 00:24:45,120 --> 00:24:47,120 all the async-enabled 686 00:24:47,120 --> 00:24:49,120 nodes will 687 00:24:49,120 --> 00:24:51,120 not have any data loss.
688 00:24:51,120 --> 00:24:53,120 You will have delays in integrating 689 00:24:53,120 --> 00:24:55,120 the 690 00:24:55,120 --> 00:24:57,120 the data, but you won't have 691 00:24:57,120 --> 00:24:59,120 these infamous 692 00:24:59,120 --> 00:25:01,120 white bars that most of you 693 00:25:01,120 --> 00:25:03,120 already experience, 694 00:25:03,120 --> 00:25:05,120 as these ones. 695 00:25:07,120 --> 00:25:09,120 That was for 696 00:25:09,120 --> 00:25:11,120 the server. For the node, 697 00:25:11,120 --> 00:25:13,120 as I said, you 698 00:25:13,120 --> 00:25:15,120 have, some have 699 00:25:15,120 --> 00:25:17,120 a huge number of plugins. 700 00:25:17,120 --> 00:25:19,120 The biggest install I 701 00:25:19,120 --> 00:25:21,120 saw is about 702 00:25:21,120 --> 00:25:23,120 one thousand plugins. 703 00:25:23,120 --> 00:25:25,120 Wow. 704 00:25:25,120 --> 00:25:27,120 It's 705 00:25:27,120 --> 00:25:29,120 very interesting also 706 00:25:29,120 --> 00:25:31,120 as async, because 707 00:25:31,120 --> 00:25:33,120 it has the fork option, and async 708 00:25:33,120 --> 00:25:35,120 knows it will just 709 00:25:35,120 --> 00:25:37,120 prior to 710 00:25:37,120 --> 00:25:39,120 to async 711 00:25:39,120 --> 00:25:41,120 Munin update was 712 00:25:41,120 --> 00:25:43,120 doing it very sequentially 713 00:25:43,120 --> 00:25:45,120 and 1000 plugins 714 00:25:45,120 --> 00:25:47,120 well, to have it 715 00:25:47,120 --> 00:25:49,120 in less than 716 00:25:49,120 --> 00:25:51,120 five minutes, it has to be quite fast. 
717 00:25:51,120 --> 00:25:53,120 Since it's not the only load 718 00:25:53,120 --> 00:25:55,120 that is polled, 719 00:25:55,120 --> 00:25:57,120 in async, with 720 00:25:57,120 --> 00:25:59,120 the fork option, 721 00:25:59,120 --> 00:26:01,120 each plugin will be 722 00:26:01,120 --> 00:26:03,120 asked in its 723 00:26:03,120 --> 00:26:05,120 own process, so 724 00:26:05,120 --> 00:26:07,120 if you have 725 00:26:07,120 --> 00:26:09,120 long running plugins, 726 00:26:11,120 --> 00:26:13,120 as just after, 727 00:26:13,120 --> 00:26:15,120 you also can use 728 00:26:15,120 --> 00:26:17,120 the fork option, 729 00:26:17,120 --> 00:26:19,120 before the plugin can 730 00:26:19,120 --> 00:26:21,120 usually they poll themselves 731 00:26:21,120 --> 00:26:23,120 either in cron or they just read the status back 732 00:26:25,120 --> 00:26:27,120 that was the official way of doing it 733 00:26:27,120 --> 00:26:29,120 in 1.4 734 00:26:29,120 --> 00:26:31,120 but since async does exactly 735 00:26:31,120 --> 00:26:33,120 that, in 2.0 736 00:26:33,120 --> 00:26:35,120 just use async, I mean, it's 737 00:26:35,120 --> 00:26:37,120 standard, and 738 00:26:37,120 --> 00:26:39,120 it just makes using your 739 00:26:39,120 --> 00:26:41,120 whatever you 740 00:26:41,120 --> 00:26:43,120 use. 741 00:26:43,120 --> 00:26:45,120 That's for the node. 742 00:26:45,120 --> 00:26:47,120 Usually, the 743 00:26:47,120 --> 00:26:49,120 only problem that 744 00:26:49,120 --> 00:26:51,120 the node has when you have many many plugins 745 00:26:51,120 --> 00:26:53,120 is the starting of the node 746 00:26:53,120 --> 00:26:55,120 is typically serialized. 747 00:26:55,120 --> 00:26:57,120 That's 748 00:26:57,120 --> 00:26:59,120 When you have 1000 plugins, it's 749 00:26:59,120 --> 00:27:01,120 a big problem. 750 00:27:01,120 --> 00:27:03,120 Ok 751 00:27:03,120 --> 00:27:05,120 Now, we are scaling the data. 
752 00:27:05,120 --> 00:27:07,120 As I said before, 753 00:27:09,120 --> 00:27:11,120 zooming 754 00:27:11,120 --> 00:27:13,120 brought the need of having precise 755 00:27:13,120 --> 00:27:15,120 data very 756 00:27:15,120 --> 00:27:17,120 far away in time 757 00:27:17,120 --> 00:27:19,120 and 758 00:27:19,120 --> 00:27:21,120 to keep more data in RRD 759 00:27:21,120 --> 00:27:23,120 it's very very easy 760 00:27:23,120 --> 00:27:25,120 in 2.0 761 00:27:25,120 --> 00:27:27,120 you have a new option 762 00:27:27,120 --> 00:27:29,120 it's graph_data_size 763 00:27:29,120 --> 00:27:31,120 you already 764 00:27:31,120 --> 00:27:33,120 had it in 765 00:27:33,120 --> 00:27:35,120 1.4 766 00:27:35,120 --> 00:27:37,120 but it was global 767 00:27:37,120 --> 00:27:39,120 now it is per plugin 768 00:27:39,120 --> 00:27:41,120 it's also global, but you can 769 00:27:41,120 --> 00:27:43,120 specify it per plugin 770 00:27:43,120 --> 00:27:45,120 and 771 00:27:47,120 --> 00:27:49,120 actually, it's designed to be 772 00:27:49,120 --> 00:27:51,120 per field, but it 773 00:27:51,120 --> 00:27:53,120 doesn't work, it... 774 00:27:53,120 --> 00:27:55,120 it's bugged, and 775 00:27:55,120 --> 00:27:57,120 mostly works per plugin, that's where 776 00:27:57,120 --> 00:27:59,120 it works well, but it 777 00:27:59,120 --> 00:28:01,120 only works on RRD create, so 778 00:28:01,120 --> 00:28:03,120 there is an external tool to 779 00:28:03,120 --> 00:28:05,120 move it 780 00:28:09,120 --> 00:28:11,120 rota tool, that is called 781 00:28:11,120 --> 00:28:13,120 RRD copy, to move from 782 00:28:13,120 --> 00:28:15,120 some data, from 783 00:28:15,120 --> 00:28:17,120 a small RRD, to a bigger 784 00:28:17,120 --> 00:28:19,120 RRD. 785 00:28:19,120 --> 00:28:21,120 But that's not part of 786 00:28:21,120 --> 00:28:23,120 core Munin.
787 00:28:23,120 --> 00:28:25,120 And when you create it 788 00:28:29,120 --> 00:28:31,120 Its graphing is handled automatically by RRD 789 00:28:39,120 --> 00:28:41,120 And RRD, as I said, is very very very 790 00:28:41,120 --> 00:28:43,120 efficient, but 791 00:28:43,120 --> 00:28:45,120 beware. It can use very 792 00:28:45,120 --> 00:28:47,120 much space. I mean, I had 793 00:28:47,120 --> 00:28:49,120 one person 794 00:28:49,120 --> 00:28:51,120 who wanted to have 795 00:28:51,120 --> 00:28:53,120 a ten-seconds precision for two years 796 00:28:53,120 --> 00:28:55,120 ...Wow. 797 00:28:55,120 --> 00:28:57,120 It's about 500 megabytes 798 00:28:57,120 --> 00:28:59,120 per RRD 799 00:28:59,120 --> 00:29:01,120 so, per line 800 00:29:01,120 --> 00:29:03,120 in Munin 801 00:29:03,120 --> 00:29:05,120 Big data. 802 00:29:07,120 --> 00:29:09,120 You can also increase RRD precision 803 00:29:11,120 --> 00:29:13,120 it's called supersampling, 804 00:29:13,120 --> 00:29:15,120 that works 805 00:29:15,120 --> 00:29:17,120 without munin-async 806 00:29:17,120 --> 00:29:19,120 if you put 807 00:29:19,120 --> 00:29:21,120 munin-async, it will do 808 00:29:21,120 --> 00:29:23,120 the job for you. 809 00:29:23,120 --> 00:29:25,120 I will just go a little faster 810 00:29:25,120 --> 00:29:27,120 since my time is almost up. 811 00:29:35,120 --> 00:29:37,120 Bigger thing is 812 00:29:37,120 --> 00:29:39,120 If you modify the 813 00:29:39,120 --> 00:29:41,120 RRD size 814 00:29:41,120 --> 00:29:43,120 always have 815 00:29:43,120 --> 00:29:45,120 the RRA increased 816 00:29:45,120 --> 00:29:47,120 because when you want 817 00:29:47,120 --> 00:29:49,120 to have the graphs 818 00:29:49,120 --> 00:29:51,120 if you 819 00:29:51,120 --> 00:29:53,120 take huge, for example 820 00:29:53,120 --> 00:29:55,120 there is a setting that is "huge", 821 00:29:55,120 --> 00:29:57,120 it's not very... 
822 00:29:57,120 --> 00:29:59,120 These settings aren't very nice, because 823 00:29:59,120 --> 00:30:01,120 it only has the 824 00:30:01,120 --> 00:30:03,120 maximum precision for 825 00:30:03,120 --> 00:30:05,120 two years, but 826 00:30:05,120 --> 00:30:07,120 it doesn't have any RRA, and the RRA 827 00:30:07,120 --> 00:30:09,120 are a part of Munin's 828 00:30:09,120 --> 00:30:11,120 ability to reply 829 00:30:11,120 --> 00:30:13,120 very fast on a yearly graph 830 00:30:13,120 --> 00:30:15,120 for example. So if you 831 00:30:15,120 --> 00:30:17,120 it's pre-consolidation 832 00:30:17,120 --> 00:30:19,120 for yearly values. 833 00:30:19,120 --> 00:30:21,120 So, the ideal way 834 00:30:21,120 --> 00:30:23,120 you know the size of the graph, 835 00:30:23,120 --> 00:30:25,120 in your templates, 836 00:30:25,120 --> 00:30:27,120 and if you have one RRA 837 00:30:27,120 --> 00:30:29,120 per pixel in the 838 00:30:29,120 --> 00:30:31,120 graph outputted, it goes 839 00:30:31,120 --> 00:30:33,120 fastest, since it doesn't 840 00:30:33,120 --> 00:30:35,120 even have to interpolate the data. 841 00:30:39,120 --> 00:30:41,120 So... Now, 842 00:30:41,120 --> 00:30:43,120 the limitations of 2.0 843 00:30:45,120 --> 00:30:47,120 The CGI of HTML 844 00:30:47,120 --> 00:30:49,120 is very, very 845 00:30:49,120 --> 00:30:51,120 very ugly. 846 00:30:51,120 --> 00:30:53,120 I don't know if many of you 847 00:30:53,120 --> 00:30:55,120 tried with big installs 848 00:30:55,120 --> 00:30:57,120 but the practical limit is about 849 00:30:59,120 --> 00:31:01,120 between 150 850 00:31:01,120 --> 00:31:03,120 and 200 nodes. 851 00:31:03,120 --> 00:31:05,120 After that, it's 852 00:31:05,120 --> 00:31:07,120 very, very slow.
853 00:31:07,120 --> 00:31:09,120 And it's slow 854 00:31:09,120 --> 00:31:11,120 on reload, because 855 00:31:11,120 --> 00:31:13,120 the whole configuration 856 00:31:13,120 --> 00:31:15,120 is stored in a big 857 00:31:15,120 --> 00:31:17,120 Storable file 858 00:31:17,120 --> 00:31:19,120 that is 859 00:31:21,120 --> 00:31:23,120 that is reloaded 860 00:31:23,120 --> 00:31:25,120 and most of the time is 861 00:31:25,120 --> 00:31:27,120 taken by 862 00:31:27,120 --> 00:31:29,120 storable.reload 863 00:31:29,120 --> 00:31:31,120 so I cannot do much about it. 864 00:31:31,120 --> 00:31:33,120 We'll see how 865 00:31:33,120 --> 00:31:35,120 I plan to do it. 866 00:31:37,120 --> 00:31:39,120 The UI itself does not 867 00:31:39,120 --> 00:31:41,120 ... It is not very scalable. 868 00:31:41,120 --> 00:31:43,120 I mean, 869 00:31:43,120 --> 00:31:45,120 you all know the default 870 00:31:45,120 --> 00:31:47,120 UI, so now 871 00:31:47,120 --> 00:31:49,120 you have your cluster, just imagine 872 00:31:49,120 --> 00:31:51,120 one thousand nodes inside 873 00:31:51,120 --> 00:31:53,120 it's, well, 874 00:31:53,120 --> 00:31:55,120 it's a little bit flat, and 875 00:31:55,120 --> 00:31:57,120 not very... 876 00:31:57,120 --> 00:31:59,120 All the nodes 877 00:31:59,120 --> 00:32:01,120 are essentially 878 00:32:01,120 --> 00:32:03,120 on the overview and 879 00:32:03,120 --> 00:32:05,120 it's very static, and it's not 880 00:32:05,120 --> 00:32:07,120 what one 881 00:32:07,120 --> 00:32:09,120 expects in 2013. 882 00:32:11,120 --> 00:32:13,120 Because we all have 883 00:32:13,120 --> 00:32:15,120 this web app, 884 00:32:15,120 --> 00:32:17,120 and this 885 00:32:17,120 --> 00:32:19,120 phone shining with 886 00:32:19,120 --> 00:32:21,420 very dynamic stuff 887 00:32:21,420 --> 00:32:23,420 And ours is not very dynamic, I agree. 888 00:32:25,420 --> 00:32:27,420 The same is if you 889 00:32:27,420 --> 00:32:29,420 you know the comparison page?
890 00:32:29,420 --> 00:32:31,420 I mean, if every node 891 00:32:31,420 --> 00:32:33,420 of a group and every graph, just imagine 892 00:32:33,420 --> 00:32:35,420 that on one thousand nodes 893 00:32:35,420 --> 00:32:37,420 one thousand plugins 894 00:32:37,420 --> 00:32:39,420 Your Firefox 895 00:32:39,420 --> 00:32:41,420 won't have any memory anymore. 896 00:32:43,420 --> 00:32:45,420 And the last thing is 897 00:32:45,420 --> 00:32:47,420 it lacks proper ACL. 898 00:32:47,420 --> 00:32:49,420 For a bigger install, 899 00:32:49,420 --> 00:32:51,420 usually you want to delegate 900 00:32:51,420 --> 00:32:53,420 monitoring to 901 00:32:53,420 --> 00:32:55,420 subsystems, and you don't 902 00:32:55,420 --> 00:32:57,420 want people to see everything, because 903 00:32:57,420 --> 00:32:59,420 they will be overwhelmed by 904 00:32:59,420 --> 00:33:01,420 the information, and 905 00:33:01,420 --> 00:33:03,420 well, that's 906 00:33:05,420 --> 00:33:07,420 that's a problem. 907 00:33:07,420 --> 00:33:09,420 So, I'll just go 908 00:33:09,420 --> 00:33:11,420 very fast, that's my last slide 909 00:33:13,420 --> 00:33:15,420 So, for 2.2 we will be 910 00:33:15,420 --> 00:33:17,420 integrating into 911 00:33:17,420 --> 00:33:19,420 2.1 and 912 00:33:19,420 --> 00:33:21,420 when it is stable it will become 2.2 913 00:33:21,420 --> 00:33:23,420 it's moving from 914 00:33:23,420 --> 00:33:25,420 the whole Storable thing to 915 00:33:25,420 --> 00:33:27,420 SQL-based, and 916 00:33:27,420 --> 00:33:29,420 the SQL-based will be 917 00:33:29,420 --> 00:33:31,420 DBI-based, because we are still in Perl 918 00:33:31,420 --> 00:33:33,420 and will be SQLite 919 00:33:33,420 --> 00:33:35,420 by default, because 920 00:33:35,420 --> 00:33:37,420 we really want the 921 00:33:37,420 --> 00:33:39,420 nice out-of-the-box 922 00:33:39,420 --> 00:33:41,420 install, remember our users 923 00:33:41,420 --> 00:33:43,420 [many] of them 924 00:33:43,420 --> 00:33:45,420 are the one-node type, 925
00:33:45,420 --> 00:33:47,420 and if you want, you can 926 00:33:47,420 --> 00:33:49,420 do PostgreSQL 927 00:33:49,420 --> 00:33:51,420 and if you want, you can do 928 00:33:51,420 --> 00:33:53,420 whatever DBI supports 929 00:33:53,420 --> 00:33:55,420 it's just up to you. 930 00:33:55,420 --> 00:33:57,420 It will enable dynamic HTML 931 00:33:57,420 --> 00:33:59,420 because 932 00:33:59,420 --> 00:34:01,420 well, we are 933 00:34:01,420 --> 00:34:03,420 not in 2001 anymore 934 00:34:05,420 --> 00:34:07,420 But that will require 935 00:34:07,420 --> 00:34:09,420 a deep rewrite of the code 936 00:34:09,420 --> 00:34:11,420 As I said before, 937 00:34:11,420 --> 00:34:13,420 when you have many many accessors 938 00:34:13,420 --> 00:34:15,420 to Storable inside the core 939 00:34:15,420 --> 00:34:17,420 but 940 00:34:17,420 --> 00:34:19,420 since it was a big Storable, 941 00:34:19,420 --> 00:34:21,420 it was a native Perl data structure 942 00:34:21,420 --> 00:34:23,420 so for 943 00:34:23,420 --> 00:34:25,420 whatever reason 944 00:34:25,420 --> 00:34:27,420 a lot of code does not use 945 00:34:27,420 --> 00:34:29,420 accessors, it uses the structure 946 00:34:29,420 --> 00:34:31,420 in a typical Perl way 947 00:34:31,420 --> 00:34:33,420 and that makes it very difficult 948 00:34:33,420 --> 00:34:35,420 to translate to SQL. 949 00:34:35,420 --> 00:34:37,420 So that's a challenge. 950 00:34:39,420 --> 00:34:41,420 And just to be completely crystal-clear, 951 00:34:43,420 --> 00:34:45,420 the data will stay, 952 00:34:45,420 --> 00:34:47,420 the data that is in RRD will stay 953 00:34:47,420 --> 00:34:49,420 in RRD. I don't want 954 00:34:49,420 --> 00:34:51,420 to put the timestamped 955 00:34:51,420 --> 00:34:53,420 value inside SQL. 956 00:34:53,420 --> 00:34:55,420 That's not the point.
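[editor's note] The split described here, configuration metadata in SQL and timestamped values left in RRD, could look roughly like the sketch below. It uses Python's built-in sqlite3 module purely for illustration, since the talk says the real work will be Perl and DBI. Every table name, column name, host name and file path is hypothetical and not taken from Munin's schema.

```python
import sqlite3

# Hypothetical schema: only configuration metadata lives in SQL.
# The timestamped samples stay in RRD files, referenced here by path.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node   (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE plugin (id INTEGER PRIMARY KEY,
                     node_id INTEGER NOT NULL REFERENCES node(id),
                     name TEXT NOT NULL);
CREATE TABLE field  (id INTEGER PRIMARY KEY,
                     plugin_id INTEGER NOT NULL REFERENCES plugin(id),
                     name TEXT NOT NULL,
                     rrd_path TEXT NOT NULL);
""")
conn.execute("INSERT INTO node (id, name) VALUES (1, 'web1.example.com')")
conn.execute("INSERT INTO plugin (id, node_id, name) VALUES (1, 1, 'cpu')")
conn.executemany(
    "INSERT INTO field (plugin_id, name, rrd_path) VALUES (?, ?, ?)",
    [(1, "user", "/tmp/example/cpu-user.rrd"),
     (1, "system", "/tmp/example/cpu-system.rrd")],
)

def fields_for_node(node_name):
    """Return (plugin, field, rrd_path) rows for one node, sorted by field."""
    rows = conn.execute(
        """SELECT plugin.name, field.name, field.rrd_path
           FROM field
           JOIN plugin ON plugin.id = field.plugin_id
           JOIN node ON node.id = plugin.node_id
           WHERE node.name = ?
           ORDER BY field.name""",
        (node_name,),
    )
    return rows.fetchall()

result = fields_for_node("web1.example.com")
```

The point of such a layout is that a page for one node becomes a small indexed query instead of a reload of the whole configuration, while the RRD files remain the only store for the measurements.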
957 00:35:07,420 --> 00:35:09,420 We will have a complete 958 00:35:09,420 --> 00:35:11,420 node push 959 00:35:11,420 --> 00:35:13,420 feature 960 00:35:13,420 --> 00:35:15,420 The node can push 961 00:35:15,420 --> 00:35:17,420 on the master 962 00:35:17,420 --> 00:35:19,420 in order 963 00:35:19,420 --> 00:35:21,420 well, to have 964 00:35:21,420 --> 00:35:23,420 to break this five minutes pause 965 00:35:23,420 --> 00:35:25,420 standard, so you can put 966 00:35:25,420 --> 00:35:27,420 whenever you want, at every second if you want 967 00:35:27,420 --> 00:35:29,420 and 968 00:35:31,420 --> 00:35:33,420 this will enable 969 00:35:33,420 --> 00:35:35,420 very fine 970 00:35:37,420 --> 00:35:39,420 precision 971 00:35:39,420 --> 00:35:41,420 and my goal is to be as good as 972 00:35:41,420 --> 00:35:43,420 collectd. 973 00:35:43,420 --> 00:35:45,420 If... 974 00:35:45,420 --> 00:35:47,420 And well, if you have 975 00:35:47,420 --> 00:35:49,420 that little 976 00:35:49,420 --> 00:35:51,420 blurb on the new 977 00:35:51,420 --> 00:35:53,420 HTML5 UI 978 00:36:01,420 --> 00:36:03,420 I sped up the end, so you have 979 00:36:03,420 --> 00:36:05,420 a little time for questions. 980 00:36:05,420 --> 00:36:07,420 If you want. 981 00:36:17,420 --> 00:36:19,420 [question1] With this... 982 00:36:23,420 --> 00:36:25,420 Is it possible, with this new 983 00:36:25,420 --> 00:36:27,420 architecture 984 00:36:27,420 --> 00:36:29,420 to 985 00:36:33,420 --> 00:36:35,420 (sorry, I just missed my question) 986 00:36:37,420 --> 00:36:39,420 [presenter] The SQL one, you mean? Or 987 00:36:39,420 --> 00:36:41,420 [question] No, I had just written it down... 988 00:36:41,420 --> 00:36:43,420 So, I'm sorry for... 989 00:36:43,420 --> 00:36:45,420 [laughter] 990 00:36:47,420 --> 00:36:49,420 [presenter] The async? Or... 991 00:36:49,420 --> 00:36:51,420 [question1] Yes. Do you fork 992 00:36:51,420 --> 00:36:53,420 the plugins?
993 00:36:53,420 --> 00:36:55,420 The architecture still 994 00:36:55,420 --> 00:36:57,420 forks the plugins 995 00:36:57,420 --> 00:36:59,420 every time, or 996 00:36:59,420 --> 00:37:01,420 is it possible to run 997 00:37:01,420 --> 00:37:03,420 the plugin and keep it all running 998 00:37:03,420 --> 00:37:05,420 and just feedback some values 999 00:37:05,420 --> 00:37:07,420 as you will 1000 00:37:07,420 --> 00:37:09,420 you mentioned collectd, which 1001 00:37:09,420 --> 00:37:11,420 builds on this architecture. 1002 00:37:15,420 --> 00:37:17,420 [presenter]: I designed the 1003 00:37:17,420 --> 00:37:19,420 ...a new extension 1004 00:37:19,420 --> 00:37:21,420 a new verb for plugins 1005 00:37:21,420 --> 00:37:23,420 it's called "stream" 1006 00:37:23,420 --> 00:37:25,420 and this is, you just 1007 00:37:25,420 --> 00:37:27,420 launch the plugin, you ask 1008 00:37:27,420 --> 00:37:29,420 for a config 1009 00:37:29,420 --> 00:37:31,420 and then you ask for the stream, and 1010 00:37:31,420 --> 00:37:33,420 when the plugin quits 1011 00:37:33,420 --> 00:37:35,420 it means, just sends 1012 00:37:35,420 --> 00:37:37,420 periodically values 1013 00:37:37,420 --> 00:37:39,420 at the rate he wants to 1014 00:37:39,420 --> 00:37:41,420 so it's very 1015 00:37:41,420 --> 00:37:43,420 ...it is designed to capture, for example 1016 00:37:43,420 --> 00:37:45,420 the output of 1017 00:37:45,420 --> 00:37:47,420 vmstat 1018 00:37:47,420 --> 00:37:49,420 you can do 1019 00:37:49,420 --> 00:37:51,420 cat vmstat | awk 1020 00:37:51,420 --> 00:37:53,420 and, well, that's your 1021 00:37:53,420 --> 00:37:55,420 plugin output. It will stay 1022 00:37:55,420 --> 00:37:57,420 in memory. And 1023 00:37:57,420 --> 00:37:59,420 the plugin will kill himself 1024 00:37:59,420 --> 00:38:01,420 when the configuration changes. 1025 00:38:01,420 --> 00:38:03,420 That's 1026 00:38:03,420 --> 00:38:05,420 the design. 
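[editor's note] The "stream" verb is described only as a design, and the talk does not give its wire format. The sketch below therefore shows just the consuming side under stated assumptions: a long-running command prints whitespace-separated numeric columns, one sample per line, in the style of vmstat. The function name, the column names and the sample lines are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

def stream_values(lines: Iterable[str],
                  columns: List[str]) -> Iterator[Dict[str, float]]:
    """Yield one sample per numeric line of a long-running command's output."""
    for line in lines:
        parts = line.split()
        if len(parts) != len(columns):
            continue  # blank or malformed line
        try:
            values = [float(p) for p in parts]
        except ValueError:
            continue  # header line repeated by the command
        yield dict(zip(columns, values))

# Stand-in for the output of a process that stays in memory and keeps printing.
sample_output = [
    "us sy id",
    "3 1 96",
    "5 2 93",
]
samples = list(stream_values(sample_output, ["us", "sy", "id"]))
```

Because the function is a generator, it would work the same way on the standard output of a live process, emitting each sample as soon as its line arrives instead of forking the plugin for every poll.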
But the promise 1027 00:38:05,420 --> 00:38:07,420 I didn't put it in 2.2 1028 00:38:07,420 --> 00:38:09,420 because I won't have time to do it. 1029 00:38:11,420 --> 00:38:13,420 But that's the way 1030 00:38:13,420 --> 00:38:15,420 it is done. But basically 1031 00:38:15,420 --> 00:38:17,420 the 1032 00:38:17,420 --> 00:38:19,420 architecture of forking/exec 1033 00:38:19,420 --> 00:38:21,420 a plugin 1034 00:38:21,420 --> 00:38:23,420 is, or will be 1035 00:38:23,420 --> 00:38:25,420 at the core of Munin. It won't be 1036 00:38:25,420 --> 00:38:27,420 for example 1037 00:38:27,420 --> 00:38:29,420 a DO .eso [?] or 1038 00:38:31,420 --> 00:38:33,420 that you will chart in, or .pm 1039 00:38:33,420 --> 00:38:35,420 that you will charge in 1040 00:38:35,420 --> 00:38:37,420 Munin memoryspace 1041 00:38:37,420 --> 00:38:39,420 That's 1042 00:38:39,420 --> 00:38:41,420 That's not something I want to. 1043 00:38:47,420 --> 00:38:49,420 [questioner1]: This was the 1044 00:38:49,420 --> 00:38:51,420 thing which 1045 00:38:51,420 --> 00:38:53,420 I really liked 1046 00:38:53,420 --> 00:38:55,420 Munin, and used it 1047 00:38:55,420 --> 00:38:57,420 in 1.2 or whatever, but 1048 00:38:57,420 --> 00:38:59,420 it had scaling problems 1049 00:38:59,420 --> 00:39:01,420 with regards of 1050 00:39:01,420 --> 00:39:03,420 work. So that was one of the reasons 1051 00:39:03,420 --> 00:39:05,420 I have to 1052 00:39:05,420 --> 00:39:07,420 change to another system. 1053 00:39:17,420 --> 00:39:19,420 [questioner2]: Hi. 1054 00:39:19,420 --> 00:39:21,420 So, I was a happy Munin user 1055 00:39:21,420 --> 00:39:23,420 And then, suddenly I 1056 00:39:23,420 --> 00:39:25,420 well, 1057 00:39:25,420 --> 00:39:27,420 because of the scaling issues, 1058 00:39:27,420 --> 00:39:29,420 I moved to a 1059 00:39:29,420 --> 00:39:31,420 pnp4nagios 1060 00:39:31,420 --> 00:39:33,420 and that's one 1061 00:39:33,420 --> 00:39:35,420 question I want to ask, is about how... 
1062 00:39:35,420 --> 00:39:37,420 Because, with all this data 1063 00:39:37,420 --> 00:39:39,420 that is a great thing in Munin 1064 00:39:39,420 --> 00:39:41,420 then you can do proxy 1065 00:39:43,420 --> 00:39:43,670 monitoring 1066 00:39:43,670 --> 00:39:45,420 that is, sending OLS if [?] monitoring 1067 00:39:45,420 --> 00:39:53,520 that is, sending OLS if [?] 1068 00:39:53,520 --> 00:39:55,520 Do you plan on 1069 00:39:55,520 --> 00:39:57,520 having better integration with 1070 00:39:57,520 --> 00:39:59,520 [?] systems 1071 00:39:59,520 --> 00:40:01,520 than you have currently? 1072 00:40:01,520 --> 00:40:03,520 [presenter]: Actually 1073 00:40:03,520 --> 00:40:05,520 the point of 1074 00:40:05,520 --> 00:40:07,520 Nagios, I mean, we 1075 00:40:09,520 --> 00:40:11,520 We have a lot of problems with 1076 00:40:11,520 --> 00:40:13,520 because of 1077 00:40:13,520 --> 00:40:15,520 nscachanger 1078 00:40:15,520 --> 00:40:17,520 its interface, lately, 1079 00:40:19,520 --> 00:40:21,520 The thing is, we have 1080 00:40:21,520 --> 00:40:23,520 something called 1081 00:40:23,520 --> 00:40:25,520 munin-limits, and 1082 00:40:25,520 --> 00:40:27,520 it sends 1083 00:40:27,520 --> 00:40:29,520 [?] and so on, but it does not 1084 00:40:29,520 --> 00:40:31,520 do it very well. 1085 00:40:31,520 --> 00:40:33,520 So, 1086 00:40:33,520 --> 00:40:35,520 the integration with other systems 1087 00:40:35,520 --> 00:40:37,520 such as Nagios 1088 00:40:37,520 --> 00:40:39,520 Icinga or whatever 1089 00:40:39,520 --> 00:40:41,520 is very very high on my list 1090 00:40:41,520 --> 00:40:43,520 because I don't want to reimplement Nagios.
1091 00:40:43,520 --> 00:40:45,520 I mean, it is 1092 00:40:45,520 --> 00:40:47,520 I want to focus 1093 00:40:47,520 --> 00:40:49,520 on data gathering and 1094 00:40:49,520 --> 00:40:51,520 data keeping, I mean 1095 00:40:51,520 --> 00:40:53,520 I'm more interested in 1096 00:40:53,520 --> 00:40:55,520 replacing something like pnp4nagios 1097 00:40:55,520 --> 00:40:57,520 than Nagios itself. 1098 00:40:57,520 --> 00:40:59,520 [question2]: Because the 1099 00:40:59,520 --> 00:41:01,520 munin-limits, for example 1100 00:41:01,520 --> 00:41:03,520 it only has 1101 00:41:03,520 --> 00:41:05,520 threshold, like, if 1102 00:41:05,520 --> 00:41:07,520 it is a set of values, then 1 1103 00:41:07,520 --> 00:41:09,520 whereas I'm also interested in 1104 00:41:09,520 --> 00:41:11,520 questions like, OK, 1105 00:41:11,520 --> 00:41:13,520 usually this filesystem 1106 00:41:13,520 --> 00:41:15,520 is growing at 1% rate every 1107 00:41:15,520 --> 00:41:17,520 day, and suddenly it grew out like 1108 00:41:17,520 --> 00:41:19,520 50%. I want a warning there. 1109 00:41:19,520 --> 00:41:21,520 That's, you know... 1110 00:41:21,520 --> 00:41:23,520 [presenter]: Exactly, that's something 1111 00:41:23,520 --> 00:41:25,520 that is even offered by 1112 00:41:25,520 --> 00:41:27,520 RRD right now 1113 00:41:27,520 --> 00:41:29,520 and I also 1114 00:41:29,520 --> 00:41:31,520 have it on my future 1115 00:41:31,520 --> 00:41:33,520 roadmap, but, well, 1116 00:41:35,520 --> 00:41:37,520 I'm taking the problem 1117 00:41:37,520 --> 00:41:39,520 for user-facing 1118 00:41:39,520 --> 00:41:41,520 right now, but 1119 00:41:41,520 --> 00:41:43,520 everyone is welcome to help. 1120 00:41:43,520 --> 00:41:45,520 [question2]: I'm looking forward to it.
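[editor's note] The kind of warning asked for in this exchange, steady growth of about 1% a day followed by a sudden jump, can be illustrated with a very small check. This is a plain comparison of the latest daily change against the earlier average. It is neither what munin-limits does nor the detection that the presenter says RRD offers, and the function name and the factor of 5 are assumptions made for the example.

```python
from statistics import mean

def growth_alert(daily_usage_pct, factor=5.0):
    """True if the latest daily growth exceeds `factor` times the earlier average."""
    if len(daily_usage_pct) < 3:
        return False  # not enough history to form a baseline
    # Day-over-day changes in percentage points.
    deltas = [b - a for a, b in zip(daily_usage_pct, daily_usage_pct[1:])]
    baseline = mean(deltas[:-1])
    latest = deltas[-1]
    return baseline > 0 and latest > factor * baseline

steady = [10.0, 11.0, 12.0, 13.0, 14.0]   # about one point per day
jump = [10.0, 11.0, 12.0, 13.0, 63.0]     # fifty points on the last day

print(growth_alert(steady), growth_alert(jump))
```

A fixed threshold would stay silent in the second case until the filesystem is nearly full, whereas a rate-based check reacts on the day the growth pattern changes.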
1121 00:41:49,520 --> 00:41:51,520 [presenter]: Yes, OK, so 1122 00:41:51,520 --> 00:41:53,520 time is up, so I guess you have to ask your questions 1123 00:41:55,520 --> 00:41:57,520 after the talk, and 1124 00:41:57,520 --> 00:41:59,520 thank you 1125 00:41:59,520 --> 00:42:01,520 [audience]: Just, there is a 1126 00:42:01,520 --> 00:42:03,520 BoF session 1127 00:42:03,520 --> 00:42:05,520 this afternoon, if you have some questions, 1128 00:42:05,520 --> 00:42:07,520 or anything specific, just come, and 1129 00:42:07,520 --> 00:42:09,520 I'll be glad to answer. 1130 00:42:09,520 --> 00:42:11,520 [presenter]: Thanks 1131 00:42:11,520 --> 00:42:13,520 [applause]