Today while I was talking in #techlahoma, the topic of chatbots came up. Being something of a chat bot connoisseur, I (with @amandaharlin's blessing) have set out to tell you how to get your very own hubot up and running on IRC.
First, let's talk briefly about my chat bot history. I've written a Python chat bot from scratch to connect to an XMPP server. His name was kittenbot, and he was awesome. As a result, I got very familiar with XMPP's XEP-0045, the specification for multi-user chat. Fortunately, the good folks over at @github have given us a lot more to work with than I had back in the kittenbot days.
Awesome. So let's get cracking.
First thing you should do is go grab an ice-cold beer. I've gone with Prairie Wine Barrel Noir. It's particularly delicious and well suited to making hubots.
The second thing you need to do is install the latest version of Node. I built it from source on Ubuntu, which is an amazingly simple process. Instructions can be found over at Joyent.
Then you need to grab the required npm packages for hubot:
npm install -g hubot coffee-script
After you have that installed, you'll need to create your new hubot. This is done simply with:
hubot --create techlahoma-bot
This will create a new techlahoma-bot directory in whatever directory you specified. I'll be putting this code up at https://github.com/cthos/techlahoma-bot. Next step is to install the new hubot's brain! This runs on Redis by default, which is fine. I'm not going to be super picky, so I'll just install the redis-server package provided by my host. Once that's installed, it runs on the proper port by default, so that's all we need to do.
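If you're on Ubuntu like I am, that boils down to something like this (redis-server is the stock Ubuntu package name; adjust for your distro):

sudo apt-get install redis-server
redis-cli ping   # should answer PONG if it's listening on the default port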
After that, we need to make some minor modifications to the package.json file. For some reason, the one generated by hubot --create does not include the correct version for hubot-scripts. We need to change the line:
"hubot-scripts": ">= 3.5.0 < 3.0.0",
To
"hubot-scripts": "2.5.14",
Great! Now hubot is actually able to run. Next, we get to pick and choose some awesome capabilities for our bot from the Hubot script catalog by editing hubot-scripts.json:
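As a rough example, mine might end up looking something like this (the picks here are purely illustrative; redis-brain.coffee is the one that hooks up the brain we just installed):

["redis-brain.coffee", "math.coffee", "pugme.coffee"]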
I recently decided to get an IRC server up and running to see how complicated it would be compared to installing ejabberd.
After looking through some of the packages available on Ubuntu through the standard repositories, I decided I wanted to have NickServ and ChanServ running, so I was hoping to find something with services included. Originally I tried ircd2, but I didn't find a decent services package which worked with it. Then I was going to try dancer-ircd, because it had a dancer-services package. It turns out dancer-services was removed from the package repos due to a bug that was never corrected.
Thus, after a bit of poking I decided to go with ircd-ratbox and ratbox-services.
Installing Ratbox
Ratbox is pretty easy to install, and it has a very large (and very snarky) sample configuration file for you to work off of. It recommends that you read the entire thing before starting to configure your real ircd.conf file. It then forces the point, because there's a configuration variable which will shut down the server if you haven't been reading the entire thing (I'll give you a hint: it's called "havent_read_conf", and it's in the general block). Okay, fine. After a lot of head banging and some finagling I managed to get ircd-ratbox functional. That is all well and good, but unfortunately getting ratbox-services functional was a different story.
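For reference, the offending bit lives in the general block of ircd.conf; a minimal sketch (the real block has dozens of other options, omitted here, and the exact stock value may differ):

general {
	/* the sample config ships with this set so the server refuses to
	 * start until you've actually read the file; flip it to get going */
	havent_read_conf = no;
};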
Wrestling with ratbox-services
First thing you need to do is get a backend for ratbox-services to run off of. I originally went with the sqlite backend, but there was a problem I was unable to resolve where it was unable to write to the db file. So I switched to mysql. The problem there was that nothing installs the database schema for you, so I had to go hunting for the schema file. It was not in an intuitive place (/usr/share/ratbox-services/lib/schema-mysql.sql).
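Once you've tracked it down, loading it is the easy part; a sketch, assuming a database named ratbox_services (the name is my choice, not anything the package creates for you):

# create a database for services, then load the schema into it
mysqladmin -u root -p create ratbox_services
mysql -u root -p ratbox_services < /usr/share/ratbox-services/lib/schema-mysql.sql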
The other undocumented problem was that NickServ is disabled by default (why? Why was that turned off?).
All in all it was not super difficult to get running after a careful reading of the configuration files, but there were some gotchas which took me quite a long time to figure out.
So, I was working on a view which rattles off a list of categories, where each category links to another view with a Contextual Filter on said category field, listing all of the content items with a matching value in that field. Similar to taxonomy terms, but linked from a select dropdown. I was using the output-as-link option, with a replacement filter to put the value of the item into the URL. Everything was pretty straightforward, except for one little snag: one of my categories had a forward slash in it. Now, normally I would expect this / to be url-encoded to %2F, but that wasn't happening.
I did some digging, by utilizing hook_views_pre_render to modify what it was sending to the display, and I found that the output was in fact running through drupal_encode_path. Here's the full text of that method:
return str_replace('%2F', '/', rawurlencode($path));
As you can see, it just url-encodes the string you send it and then decodes the slashes back. That means that no matter what I do in hook_views_pre_render, I am totally unable to get that / encoded properly for the URL. Bummer.
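To make the behavior concrete, here's a quick standalone illustration with drupal_encode_path's body inlined:

<?php
// drupal_encode_path() in essence: encode everything, then un-encode slashes.
function encode_path($path) {
  return str_replace('%2F', '/', rawurlencode($path));
}

echo rawurlencode('foo/bar'), "\n";  // foo%2Fbar   - what I wanted
echo encode_path('foo/bar'), "\n";   // foo/bar     - what Drupal produces
echo encode_path('foo%2Fbar'), "\n"; // foo%252Fbar - pre-encoding just gets double-encoded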
Eventually I hacked around it by making a separate view for the offending item (there was only one), but for now it looks like my best option is to not allow slashes in that particular drop-down.
Just a very quick note, as I had to do a bit of hunting to find where the heck Aegir was generating my virtualhosts for all of the domains I was managing through it.
By default it is located in /var/aegir/config/server_master/apache/vhost.d/, where the server_master part might change depending on which host you're modifying (if you're in a complex hosting situation, for example).
This comes in handy if you want to install SSL certificates yourself instead of going through Aegir to handle the certificates.
Yesterday, during a conversation with my coworkers on how Dependency Injection works with the Service Locator pattern, we talked through what would happen with cyclical dependencies. Basically, what would happen if you told a Service Locator to load a dependency for one class, and that class had a dependency on the class that was originally calling the Service Locator? This would cause an infinite loop, and in PHP, a memory-exhaustion fatal error pretty rapidly.
Class A attempts to resolve a dependency for class B; class B then tries to get a dependency for class A, and so on. There is a way to avoid this, it turns out, if you're willing to make some concessions.
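To put the loop in code, the shape of the problem looks something like this (the class names and the get() method are placeholders for illustration):

<?php
class A
{
    public $b;
    public function __construct($sl)
    {
        $this->b = $sl->get('B'); // kicks off construction of B...
    }
}

class B
{
    public $a;
    public function __construct($sl)
    {
        $this->a = $sl->get('A'); // ...which asks for A, which builds A, which asks for B...
    }
}

$a = new A($sl); // with a naive locator, this recurses until PHP runs out of memory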
This is where my PsySl comes into play. It's basically a small experiment to see if I could deal with the cyclical dependency loop. Here's the meat of the ServiceLocator class:
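What follows is a sketch reconstructed from the description below rather than PsySl's verbatim source (names and signatures are my assumptions; see the repo linked at the end for the real implementation):

<?php
class ServiceLocator
{
    public static $resolved = array(); // finished singletons, shared with StubResolver
    protected $registry = array();     // service name => class name
    protected $open = array();         // names currently mid-resolution

    public function register($name, $class)
    {
        $this->registry[$name] = $class;
    }

    public function get($name)
    {
        if (isset(self::$resolved[$name])) {
            return self::$resolved[$name];
        }

        // The requested service is still mid-resolution: we've hit a cycle,
        // so hand back a stub that will resolve itself lazily later.
        if (in_array($name, $this->open)) {
            $stub = new StubResolver();
            $stub->setDependency($name);
            return $stub;
        }

        // If the calling class is itself something we're registered to
        // resolve, mark it open so a circular request for it gets stubbed.
        $trace = debug_backtrace();
        if (isset($trace[1]['class'])) {
            $caller = array_search($trace[1]['class'], $this->registry);
            if ($caller !== false && !in_array($caller, $this->open)) {
                $this->open[] = $caller;
            }
        }

        // Resolve the dependency we were actually asked for.
        $this->open[] = $name;
        $class = $this->registry[$name];
        $instance = new $class($this); // the constructor may re-enter get()
        array_pop($this->open);        // close our own open entry

        // Keep a static copy so StubResolver can find it later.
        return self::$resolved[$name] = $instance;
    }
}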
There's a lot going on in this method. First, it checks to see whether the requested dependency is among its open dependencies (things that have entered the resolution stack but have yet to be completely resolved). If it is, it creates a new StubResolver and sets the dependency the stub is a mock for. Next, it checks the stack trace to see if the calling class is one of the things it is registered to resolve (and really, it should probably also check that the call came from the constructor, since otherwise it wouldn't cause a loop). If this is the case, it appends that class to the list of open dependencies and continues on. The last bit is that it resolves the dependency it was originally asked for, closes its open dependency, and continues along. Additionally, it keeps a copy of each dependency it creates in a static variable so the StubResolver can later resolve its dependency without entering an infinite loop (which consequently means the dependencies have to be singletons, a problem I'm not sure it's possible to get around).
That's about where the magic happens. In our new A($sl) example, A tries to get a copy of B, and in the process A gets registered as an open dependency. B then tries to resolve A, and is handed a StubResolver instance for its "A" dependency. B finishes resolving, and everyone is happy.
But wait, now B has a StubResolver in place of its $a dependency. How does it fully resolve to A? Well, StubResolver has a magic __call method which does the following:
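Again, a sketch rather than the verbatim source:

<?php
class StubResolver
{
    protected $dependency; // name of the service this stub stands in for
    protected $real;       // the resolved instance, once we fetch it

    public function setDependency($name)
    {
        $this->dependency = $name;
    }

    // Lazily swap in the real dependency the first time any method is called.
    public function __call($method, $args)
    {
        if ($this->real === null) {
            // By the time anything calls a method on the stub, the cycle has
            // finished resolving, so the singleton is sitting in the static.
            $this->real = ServiceLocator::$resolved[$this->dependency];
        }
        return call_user_func_array(array($this->real, $method), $args);
    }
}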
So, as you can see, the dependency is lazily loaded the first time you call a method on it. To be thorough, you would also need to implement __get and __set to do the same thing, forwarding property access along to the dependency as well.
All of this is incredibly impractical and breaks all kinds of conventions, but it was a fun exercise to dig a bit into the associated patterns and how PHP can be manipulated for evil (or good). You can view the full source over here: https://github.com/cthos/PsySl
My last post was about using ab and jmeter to get some performance benchmarks on a website. Now, I figure I should probably mention how to profile your application to look for memory leaks, bad calls, and which portions of the code are taking the most time.
First thing to do is get xdebug installed. There are several ways to go about this, the easiest being using a package manager to do the hard work for you. For example, if you're running Ubuntu with the default php from apt-get, you would do:
sudo apt-get install php5-xdebug
Failing that, the xdebug site has a full set of instructions on how to install and configure it: http://xdebug.org/docs/install
Okay! Now that it's installed (and presumably working), jump into php.ini (usually /etc/php.ini on Linux systems; I hear it's in /usr/local/etc on FreeBSD, for example).
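A quick way to confirm the "presumably working" part before you go flipping settings:

php -m | grep -i xdebug   # prints "xdebug" if the module loaded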
There are a couple of configuration settings you can set in order to turn on profiling. Here's how they work (they're all documented here as well: http://xdebug.org/docs/profiler); a combined php.ini example follows the list:
xdebug.profiler_enable=1 - This will turn on profiling for all requests, which is not a great idea as it will generate a lot of cachegrind files quickly.
xdebug.profiler_enable_trigger=1 - This turns on profiling if XDEBUG_PROFILE=1 is present in the GET/POST, or if you have a cookie set with that name (content doesn't matter)
xdebug.profiler_output_dir=/tmp - Sets the profiler's output directory!
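Putting those together, the php.ini block I usually end up with looks like this (on Debian/Ubuntu the settings may live in a separate conf.d file instead):

; leave blanket profiling off; trigger it per-request instead
xdebug.profiler_enable = 0
xdebug.profiler_enable_trigger = 1
; where the cachegrind.out.* files land
xdebug.profiler_output_dir = /tmp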
So, you usually want to turn on profiler_enable_trigger, pick out a page you want to profile (maybe one that is fairly slow), and add ?XDEBUG_PROFILE=1 to the end of the URL. Then go check the directory you set in profiler_output_dir for the cachegrind files it generated.
What the heck do you do with cachegrind files?
If you're on Windows, you'll want to dig out WinCacheGrind. On Linux, KCachegrind. On Mac, I've tried out MacCallGrind; it's a paid app, but the trial allows you to open files up to 3MB in size (which will generally be enough on smaller page loads, but if you have a serious call stack problem it won't cut it).
So, fire up the cachegrind parser of your choice and load up a cachegrind.out file. You will be presented with a window that looks like this (WinCacheGrind shown).
Learning to read the output isn't too difficult: the left side of the window is the entire call stack, starting at the outermost call and allowing you to drill down into all of the subsequent calls. You can also do that in the main panel. The different columns of the main window show the amount of time each call took to execute. Self is the amount of time that particular invocation took by itself; in the screenshot, my main file (but nothing it included) took 1ms. The Cum. column is the cumulative amount of time, including all of the child calls made inside the main method (aka 22 seconds).
Just stepping through the entire application can be useful, but more useful is being able to sort through the most expensive calls that are happening in your application. Simply click on the Overall tab, and you will be presented with a screen like the following.
As you can see, there are several more columns which provide you with more information. **Avg. Self** is the average amount of time that individual call took across all of the invocations. **Avg. Cum.** is the average amount of time that the call took including all of the child calls that method invoked. **Total Self** is the total amount of time the individual call took across all invocations. **Total Cum.**, not surprisingly, is the total amount of time the call took including child calls. **Calls** is the total number of times that method was called during script execution.
That Total Cum. column is the one you will want to look at in order to determine where large chunks of time are being spent. As you can see, I've sorted on that field, and PHP's call_user_func_array is taking the largest amount of time. That's not particularly useful by itself, but you can drill down into the call stack and get to the meat of the problem by double-clicking (from the line-by-line section) until you find a more concrete culprit. How do I know it's not actually call_user_func_array's fault? For one, its Total Self is 18ms, which is almost nothing. For two, it doesn't do anything except make another call.
Drilling down, I find that ultimately the longest call is split between view->build and view->render, indicating I have a problem with a Drupal view I'm including on a bunch of pages. Going even deeper, the majority of the time in those calls comes down to a mysql call, indicating a query problem is the bottleneck.
Optimizing MySQL is beyond the scope of this article, but it's next in my series of site optimization articles, so stay tuned!