Submitted by Bryce on Wed, 11/24/2010 - 10:52
A while ago I came across an article that talked about using Python with CouchDB. Since I'm interested in learning about the whole NoSQL movement, and am trying to understand where this new style of database server fits in the schema (sorry, I couldn't resist) of things, I decided to spend a little time getting to know CouchDB better.
The system that I built this script on, tested it with, and designed it for is Ubuntu (I'm running 10.10, if anyone cares to know). Although I installed CouchDB through apt-get, I installed couchdb-python through easy_install. I did this because the couchdb-python documentation on the website is written for the most recent version, and the apt-get version is a little out of date.
So after you've installed CouchDB and couchdb-python, you can run the script below:
#!/usr/bin/python
'''
Simple script to start playing with the couchdb python package.
'''

import os
import subprocess
import re
import sys
import couchdb
from couchdb.mapping import TextField, ListField, DictField

class System(couchdb.Document):
    _id = TextField()
    _rev = TextField()
    uname = ListField(TextField())
    packages = DictField()

def get_packages():
    '''
    Gather all of the packages on the system, and return a dictionary
    with the package name as the key, and the version as the value.
    '''
    # stderr must be piped too, otherwise communicate() always returns None for it
    process = subprocess.Popen("dpkg-query -W -f='${Package} ${Version}\n'",
                               stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                               shell=True, universal_newlines=True)
    (proc, error) = process.communicate()
    if process.returncode != 0:
        print(error)
        sys.exit(1)
    proc_dict = {}
    for x in proc.splitlines():
        m = re.search(r'(?P<package>\S+)\W(?P<version>\S+)', x)
        # skip lines with no version instead of crashing on a failed match
        if m:
            proc_dict[m.group('package')] = m.group('version')
    return proc_dict

def uname_list():
    '''
    Generate a list based on the data from uname, excluding the system's name.
    '''
    l = []
    l.append(os.uname()[0])
    for x in os.uname()[2:]:
        l.append(x)
    return l

if __name__ == "__main__":
    box_name = os.uname()[1]
    # When not using any arguments, defaults to localhost
    server = couchdb.client.Server()
    # Test to see if the db exists, create if it doesn't
    try:
        db = server['sys_info']
    except couchdb.http.ResourceNotFound:
        db = server.create('sys_info')
    # test to see if the computer already exists in the db
    try:
        rev = db[box_name].rev
        db.update([System(_id=box_name, _rev=rev,
                          uname=uname_list(), packages=get_packages())])
    except couchdb.http.ResourceNotFound:
        db.save(System(_id=box_name, uname=uname_list(),
                       packages=get_packages()))
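get_packages() does its parsing inline, right after calling dpkg-query, so it can only be exercised on a Debian/Ubuntu box. A minimal sketch of the same parsing split into a pure function, so it can be tried on any captured output. The `parse_packages` name and the sample lines are made up for illustration. The sketch assumes the `${Package} ${Version}` format used in the script, and it assumes that dpkg-query may list packages that have no version. Those lines are skipped.

```python
import re

# Hypothetical helper: same job as the loop in get_packages(), minus the subprocess.
PACKAGE_LINE = re.compile(r'^(?P<package>\S+)\s+(?P<version>\S+)$')


def parse_packages(text):
    """Return {package: version} from "${Package} ${Version}" lines.

    Lines that carry no version are ignored rather than raising an
    AttributeError on a failed match.
    """
    packages = {}
    for line in text.splitlines():
        m = PACKAGE_LINE.match(line.strip())
        if m:
            packages[m.group('package')] = m.group('version')
    return packages


# Made-up sample output; "gamma" has no version and is skipped.
sample = "alpha 1.0-1\nbeta 2:3.4-5\ngamma \n"
print(parse_packages(sample))  # {'alpha': '1.0-1', 'beta': '2:3.4-5'}
```

Keeping the parsing separate from the dpkg-query call is a design choice. It lets the parsing be tested without dpkg, and get_packages() shrinks to "run command, hand stdout to the parser".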
If the script ran without errors, you should be able to visit this URL and see your computer's name along with a rev value of 1-“some string of characters”. If you click on your computer's name, you'll see all the information the script inserted into the database: an “_id” field containing the computer's name, a “_rev” field holding the document's current revision, all of the packages installed on the system with their versions (in no particular order), and finally the output of uname, minus the computer name.
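That revision is also why the script reads `db[box_name].rev` before calling `db.update()`. CouchDB only accepts a change to an existing document if the change carries the document's current `_rev`. It answers a stale or missing one with a conflict (HTTP 409). The sketch below makes that rule concrete without a running server. It is a toy in-memory model, not how CouchDB is implemented. `ToyStore` and `Conflict` are hypothetical names, and the random hex suffix only stands in for whatever CouchDB actually puts after the "N-".

```python
import uuid


class Conflict(Exception):
    """Raised when an update does not quote the document's current _rev."""


class ToyStore:
    """Hypothetical stand-in for a CouchDB database (revision rule only)."""

    def __init__(self):
        self.docs = {}

    def save(self, doc):
        doc = dict(doc)  # store a copy, like a server would
        current = self.docs.get(doc['_id'])
        # Updating an existing document requires its current _rev.
        if current is not None and doc.get('_rev') != current['_rev']:
            raise Conflict(doc['_id'])
        generation = 1 if current is None else int(current['_rev'].split('-')[0]) + 1
        doc['_rev'] = '%d-%s' % (generation, uuid.uuid4().hex)
        self.docs[doc['_id']] = doc
        return doc['_rev']


store = ToyStore()
first = store.save({'_id': 'mybox', 'uname': ['Linux']})
second = store.save({'_id': 'mybox', '_rev': first, 'uname': ['Linux']})
print(first.split('-')[0], second.split('-')[0])  # 1 2
try:
    store.save({'_id': 'mybox', 'uname': []})  # no _rev, so it is rejected
except Conflict:
    print('conflict')
```

Running the script a second time takes the update path. There, fetching the current rev first is what keeps CouchDB from rejecting the write.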
If the script did return errors, please make sure that the arguments to couchdb.client.Server() are correct. couchdb-python uses its defaults when no arguments are given (http://localhost:5984/), so in this case the module connects to localhost and uses the admin account, which has no password. Yes, I know this is really unsafe, but I'm just playing with things right now.
One thing worth pointing out in the System class: to create a ListField for CouchDB you must specify what the ListField will contain. In this case (the uname field) I am filling the ListField with TextFields, or in Python speak, filling the list with strings.
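For comparison, here is a slicing-based sketch of the same uname_list() logic in plain Python, with no couchdb needed. It keeps every uname field except the nodename at index 1, and every element comes back as a string, which fits a list of TextFields. This is an alternative formulation, not the script's code. `os.uname()` only exists on Unix-like systems.

```python
import os


def uname_list():
    """Return the uname fields minus the nodename (index 1), as strings."""
    fields = list(os.uname())  # sysname, nodename, release, version, machine
    return fields[:1] + fields[2:]


print(uname_list())
```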
I'm surprisingly fascinated with CouchDB... though I'm really hard pressed to say why. At this moment I want to modify my tweet_dump project to use CouchDB. So expect to see that series continued in a little while.
One last thing, for those of you who celebrate it, Happy Thanksgiving!
Submitted by Bryce on Thu, 05/06/2010 - 18:38
Even though Poisonbit's solution to the original problem is the fastest (thanks again, Poisonbit!), I decided to use the original code as an excuse to learn multithreaded programming in Perl and see if it might improve the performance of the original code. One of those "for shits and giggles" moments.
First off, here is the new and improved code:
#!/usr/bin/perl
#######################################################################
# Created By: Bryce Verdier
# on 4/14/10
#
# Function: grab all installed packages, using threads
#           find their exact versions, and display them
# NOTE: FOR USE ON DEBIAN BASED MACHINES
#######################################################################
use strict;
use warnings;
use threads;
use threads::shared;    # required for the :shared attribute to really share data

my %pack_hash :shared;
my $thread_count = 0;
my $pack_count;
my @return = `dpkg --get-selections`;

sub get_package_ver {
    my %args = @_;
    my $returned_ver = `dpkg -s $args{package}`;
    $returned_ver =~ m/^Version: (.+)$/m;
    my $temp_ver = $1;
    lock($args{hash});    # lock follows the reference to %pack_hash
    $args{hash}{$args{package}} = $temp_ver;
}

# Reduce each "name<whitespace>state" line to just the package name
foreach (@return) {
    $_ =~ m/^(\S+)[ \t].*/;
    $_ = $1;
}
$pack_count = scalar @return;

# Look up two packages at a time while at least two remain
while ($pack_count - $thread_count >= 2) {
    my $th1 = threads->create(\&get_package_ver, hash => \%pack_hash, package => $return[$thread_count]);
    my $th2 = threads->create(\&get_package_ver, hash => \%pack_hash, package => $return[$thread_count+1]);
    $th1->join();
    $th2->join();
    $thread_count = $thread_count + 2;
}

# Get the odd package, if there is one
if ($pack_count - $thread_count == 1) {
    my $th1 = threads->create(\&get_package_ver, hash => \%pack_hash, package => $return[$thread_count]);
    $th1->join();
    $thread_count++;
}

while ((my $key, my $value) = each(%pack_hash)) {
    print "$key $value\n";
}
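The bookkeeping in that loop is ordinary chunking: two packages per pass while at least two remain, then one more for an odd count. The loop has to stop once fewer than two packages are left. Otherwise, the second thread in the last pass is handed an index past the end of @return. Here is just that index arithmetic as a small Python sketch. The `batches` helper is hypothetical and not part of the Perl script.

```python
def batches(items, size=2):
    """Split items into consecutive groups of `size`; the last may be shorter.

    With size=2 this mirrors the Perl loop: full pairs first, then a final
    single-item group for the odd package out, if there is one.
    """
    return [items[i:i + size] for i in range(0, len(items), size)]


print(batches(['a', 'b', 'c', 'd', 'e']))  # [['a', 'b'], ['c', 'd'], ['e']]
```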
For all the number crunchers out there, splitting the work across two threads reduced the program's execution time by about a minute. For a program that took two and a half minutes to complete, a reduction to one and a half minutes is pretty significant. Well, in my book anyway.
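To put those numbers in perspective, here is the worked arithmetic. It takes the single-threaded run as $T_1 \approx 2.5$ minutes (the figure from the earlier post) and the two-thread run as $T_2 \approx 1.5$ minutes. The second step assumes Amdahl's law, a simplified model that treats a fraction $p$ of the work as perfectly parallel and the rest as serial. The result is an estimate under that assumption, not something the timings prove.

$$
S = \frac{T_1}{T_2} \approx \frac{2.5}{1.5} \approx 1.67,
\qquad
E = \frac{S}{2} \approx 0.83
$$

$$
\frac{1}{S} = (1 - p) + \frac{p}{2}
\;\Longrightarrow\;
p = 2\left(1 - \frac{1}{S}\right) = 2\left(1 - \frac{1.5}{2.5}\right) = 0.8
$$

So under that model, roughly 80% of the run overlaps across the two threads. A perfect 2x speedup would have brought the run down to about 1.25 minutes.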
After first following Sam's suggestions for the regexes (thanks again, Sam!), I pulled the version-checking code out into its own function so that each thread would have one very specific job. After that I realized I needed to clean up the package names being sent to the threads, so I added the foreach loop that reduces each line of dpkg --get-selections output to just the package name. Within that foreach loop I tried something I didn't expect to work, namely the line:
$_ = $1;
There isn't a reason for the line above not to work: inside a foreach loop, "$_" is an alias for the current array element, so assigning to it rewrites that element of @return in place. But in the Perl code I'd seen, "$_" was only ever read from, never written to, and I guess that is why I did not expect it to work. But then, that's why I'm doing this: to learn things. :-D
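Python's for loop behaves differently, which makes a useful contrast. Its loop variable is just a name bound to each element, so `for line in lines: line = ...` leaves the list untouched. A minimal sketch of the same cleanup step done in place with slice assignment. The `clean_in_place` name and the sample lines (shaped like dpkg --get-selections output) are made up for illustration.

```python
import re

NAME = re.compile(r'^(\S+)[ \t]')


def clean_in_place(lines):
    """Reduce each "name<whitespace>state" entry to just the package name.

    Slice assignment replaces the contents of the same list object, the
    closest Python analogue to assigning through Perl's aliased $_.
    """
    lines[:] = [NAME.match(line).group(1) for line in lines]


selections = ["bash\t\t\t\t\tinstall", "coreutils\t\t\t\tinstall"]
alias = selections  # a second name for the same list object
clean_in_place(selections)
print(alias)  # ['bash', 'coreutils']
```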
After these changes and adding the threads code, I was almost done. For some reason, though, the data from each thread wasn't getting stored into the hash. It wasn't until I looked a little deeper at the examples on perldoc that I saw what I needed to do. In the section "Shared And Unshared Data" I noticed I needed to mark %pack_hash as shared (with threads::shared loaded) so all the threads could access it. I did that like so:
my %pack_hash :shared;
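Python threads start from the opposite default. Every thread already sees the same objects, so nothing has to be marked shared, but a lock is still the simple way to coordinate writers. Below is a minimal sketch with the same structure as get_package_ver and the pairwise thread loop. `lookup_version` is a hypothetical stub that stands in for the `dpkg -s` call, so the example runs on any machine.

```python
import threading


def lookup_version(package):
    # Hypothetical stand-in for running `dpkg -s package` and parsing "Version:".
    return "1.0-%d" % len(package)


def get_package_ver(package, results, lock):
    version = lookup_version(package)  # the slow part runs without the lock
    with lock:                         # only the shared write is serialized
        results[package] = version


def collect(packages, workers=2):
    results = {}                       # visible to every thread as-is
    lock = threading.Lock()
    for start in range(0, len(packages), workers):
        threads = [threading.Thread(target=get_package_ver,
                                    args=(pkg, results, lock))
                   for pkg in packages[start:start + workers]]
        for t in threads:
            t.start()
        for t in threads:
            t.join()                   # finish this batch before the next one
    return results


print(collect(["bash", "coreutils", "zlib1g"]))
```

In CPython, a single dict assignment happens to be safe under the GIL. The lock here mainly mirrors Perl's lock() and keeps the write safe if it ever grows into more than one step.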
All in all, my first multithreaded coding experience in Perl wasn't bad. Granted, the program isn't complicated, but this was truly new territory for me. I haven't tried any kind of multi-process or multithreaded programming since my operating systems class almost three years ago, so there were some battles to fight in my head over how to modify things to work with multiple threads. But again, it was a good experience. And I'm going to reiterate this so everyone remembers: don't use this code in production. Poisonbit's solution is MUCH faster than mine. Like me, use this code for learning.
Submitted by Bryce on Wed, 04/21/2010 - 08:57
AAAHHH work! Everyone has those horror stories where your boss, or the client, comes and asks for some horrible feature that will require an entire rewrite of the program. Fortunately, that hasn't happened to me (yet), and that is not what this blog entry is about. It's about the (I believe) even less likely scenario where someone wants a feature they think will be difficult to create, and after some research you implement it in a small amount of time. It's a rare event, but a great confidence booster when it does happen.
A co-worker wanted a feature added to a project. He thought it might take a while to complete, so he talked to me about it first to “plant the seed” and get my brain started on a solution, not expecting a quick turnaround. Of course, as the title hints, the problem was to find all the packages installed on our Debian boxes along with their version numbers. After a little bit of Googling and man page reading I had a basic algorithm to build on. Twenty minutes of coding and testing later, I had this script:
#!/usr/bin/perl
#######################################################################
# Created By: Bryce Verdier
# on 4/14/10
#
# Function: grab all installed packages, find their exact
#           versions, and display them
# NOTE: FOR USE ON DEBIAN BASED MACHINES
#######################################################################
use strict;
use warnings;

my $temp_pack;
my $temp_ver;
my $returned_version;
my %pack_hash;
my @return = `dpkg --get-selections`;

# For each selection line, grab the package name and ask dpkg for its version
foreach (@return) {
    $_ =~ m/(\S*)[ \t].*/i;
    $temp_pack = $1;
    $returned_version = `dpkg -s $temp_pack`;
    $returned_version =~ m/Version: (.*)/i;
    $temp_ver = $1;
    $pack_hash{$temp_pack} = $temp_ver;
}

while ((my $key, my $value) = each(%pack_hash)) {
    print "$key $value\n";
}
I will admit that it is a little slow (on my desktop it takes around two and a half minutes to complete) and could probably benefit from some parallelization. However, that might be over-engineering for such a simple task. I'll code that feature up next, grab a stopwatch, and test it just to find out. Anybody gonna place bets one way or another? In the meantime, I'm proud of my use of regexes in this script: capturing with "\S" instead of ".*" keeps the trailing whitespace out of the package names, which should speed things up a bit because I don't have to call chomp on each package name. Also, using a hash for storage simplifies the data management and should make it easier to port the code into a larger script later.
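To see what the "\S" buys, here is the same pair of capture patterns in Python, run on a made-up line shaped like dpkg --get-selections output. A greedy `.*` only backtracks as far as the last tab, so its capture keeps the other separating tabs. `\S*` stops at the first whitespace character and captures just the name.

```python
import re

# Made-up sample line: package name, separating tabs, selection state.
line = "bash\t\t\tinstall"

greedy = re.match(r'(.*)[ \t].*', line).group(1)
non_space = re.match(r'(\S*)[ \t].*', line).group(1)

print(repr(greedy))     # 'bash\t\t'
print(repr(non_space))  # 'bash'
```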