Archive for August, 2008

Awesome

There is nothing else I can say:

http://develop-one.net/blog/2008/08/27/HugADeveloper.aspx


More Grid: Elastic Block Store (EBS)

Amazon's EC2 now has persistent storage, named Elastic Block Store (EBS), that should be much easier to use from EC2 instances.  It appears to be as easy as using fstab to mount the EBS volume.  Pricing is along the same lines as all the other Amazon services, a pay-as-you-go model, at a rate of $0.10 per allocated GB per month and $0.10 per 1 million I/O requests made to a volume.  They also allow you to take snapshots of a volume, store those snapshots on S3, and then start a new volume from a snapshot.  This looks like the missing piece that makes EC2 a valid option for many people; I know a lack of easily usable persistence was preventing us from using it previously.

Now just need to do a cost analysis...


Useful commands I must never forget

On Windows, to sever all network CIFS/SMB connections:

net use * /delete

On Linux using bash, the ultra-useful history recall:

history | grep whatever_you_want_to_recall

then rerun the match you want by its history number with a bang (!), e.g.:

$ history | grep restart

  431  /etc/init.d/apache2 restart
  483  /etc/init.d/sshd restart
  501  history | grep restart

$ !431

Simple tricks that somehow I always forget. I don't know how, these should be akin to typing by this point.  Maybe by posting them I will remember.


Microsoft.VisualBasic

One of my recent projects involved copying large files (10 MB or more) over a network connection, usually a VPN connection over the internet.  I wanted a way to display progress but didn't want to have to spend too long writing the function.  Then I remembered shell32.dll and the Explorer dialogs it creates and how those would be perfect for what I was doing.  I immediately googled for the extern signatures necessary to summon the file copy dialog, and in the course of that research discovered that this functionality was already wrapped in the Microsoft.VisualBasic assembly!

Scott Hanselman recently wrote about Single Instance WinForms and how you could essentially recreate the old Win32/VB6 single app instance to handle all requests - so multiple attempts at opening a document result in the same application servicing them.  This is very useful functionality and it's already nicely wrapped up in the Microsoft.VisualBasic assembly.

I spent a good two hours digging into the assembly to see just what old favorites of mine were still around.  Beep is in there and Screen, another old favorite, among others I recall with fond memories.  There's also some new functionality I either was never aware of or have just simply forgotten.

So here's a brief overview of what I found digging around in there.  Some of this functionality you could reproduce in C# with some effort, but a lot of it is very useful as it stands.

Microsoft.VisualBasic

MyGroupCollectionAttribute - here is one of those very interesting "supports Visual Basic and not intended to be used by your code" types.  After trying to get this guy to work, reading up on the VB.NET language reference, and generally Googling it, I get the idea.  Basically it uses strings to make a "group type family" of classes and acts like a compile-time template to add the necessary methods to subclasses of the class marked with this attribute, and then gives you some syntactic sugar where you don't even need to invoke your singleton GetInstance (or equivalent method); instead something that was once:

var x = MyClass.GetInstance(); x.DoSomething();

becomes:

MyClass.DoSomething();

without MyClass or DoSomething being declared as static (or Shared in VB).  It also allows you to alias these instances into a collection, a la My.Classes.MyClass and My.Classes.AnotherClass (assuming both MyClass and AnotherClass share the same ancestor and that ancestor is decorated with this attribute properly).  All in all a pretty neat idea, but I couldn't get it to work.  The documentation was lacking, and although I tried every combination of C#, VB.NET, and C# with VB.NET, I was unable to devise the correct one.

Microsoft.VisualBasic.FileSystem

Kill - who doesn't like a function named Kill that does exactly what you'd expect it to?  (Assuming you'd expect it to delete a file and not terminate a process that is.  Or not kill a person.)

Microsoft.VisualBasic.Financial

* - all of these functions look very interesting if you are a financial programmer.  They don't have a whole lot of use for me, but still, a nice addition.

Microsoft.VisualBasic.Interaction

InputBox - a throwback to the old Windows 3.1 UI design days, an input box is a modal dialog (a #32770 IIRC) with a prompt, a text area, and buttons to confirm or dismiss the input box.  Need a quick and easy way to get a value while testing your UI?  The InputBox is effectively a simple - albeit perhaps not pretty - way of doing a Console.ReadLine from within a non-console WinForms application.  The return value of the InputBox method is a string, and an empty (not null) string indicates that the cancel button was clicked, so a cancel cannot be told apart from the user confirming an empty or blanked-out text box.  Also a notable throwback - the x and y positional arguments are in twips not pixels.

Partition - now this one is interesting and it has nothing to do with your file system.  It's basically a function that tells you which range a single value belongs to, given the bounds of the complete set and the width of each interval.  Microsoft provides a good example of what this does, and why the return value is an easily parsable string.
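To make that concrete, here is a minimal C# sketch of calling Partition and pulling the two bounds back out of the string.  The PartitionDemo class and Bucket method are names made up for illustration; only Interaction.Partition comes from the assembly.

```csharp
using System;
using Microsoft.VisualBasic;

public static class PartitionDemo
{
    // Returns the inclusive bounds of the interval that `value` falls into,
    // for intervals of width `interval` covering start..stop.
    // Hypothetical helper: values outside start..stop produce a range with one
    // side blank, which this sketch does not handle (int.Parse would throw).
    public static (int Low, int High) Bucket(long value, long start, long stop, long interval)
    {
        // Partition returns "low:high", padded with spaces so the labels line up.
        string range = Interaction.Partition(value, start, stop, interval);
        string[] parts = range.Split(':');
        return (int.Parse(parts[0].Trim()), int.Parse(parts[1].Trim()));
    }
}
```

Calling PartitionDemo.Bucket(23, 0, 99, 10) should yield the bounds 20 and 29, since 23 lands in the third interval of width 10.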

Switch - memories!  You provide an even number of items in an object[] with each odd numbered item a bool value or expression that evaluates to bool, and the even numbered items are the associated return values; the value paired with the first expression that evaluates to true is returned.  Basically a dynamic switch/case statement though certainly nowhere near as efficient.  Could be useful at some point as long as I remember it's there.
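A small sketch of what that looks like from C#, assuming a made-up grading rule; SwitchDemo and Grade are illustrative names, not part of the assembly.

```csharp
using System;
using Microsoft.VisualBasic;

public static class SwitchDemo
{
    // Maps a numeric score to a letter grade using Interaction.Switch.
    public static string Grade(int score)
    {
        // Arguments come in (condition, value) pairs; the value paired with
        // the first true condition is returned. Every argument is evaluated
        // before the call, so avoid expressions with side effects.
        object result = Interaction.Switch(
            score >= 90, "A",
            score >= 80, "B",
            score >= 70, "C",
            true, "F"); // catch-all pair so the result is never null

        return (string)result;
    }
}
```

Without the final catch-all pair, Switch returns null when no condition is true, so the cast result would need a null check.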

Microsoft.VisualBasic.Strings

StrReverse - returns a string with the character positions reversed such that index 0 is now the last index and the last index is now index 0.

Microsoft.VisualBasic.Devices.ComputerInfo

* - contains some of the general level stuff (like physical memory available, OS version) that people sometimes ask for without resorting to WMI or Windows API calls.

Microsoft.VisualBasic.Devices.Keyboard

SendKeys - this gives you some level of easy automation with other applications in that it allows you to send key presses to an application as if the keyboard were being pressed.  You'll want to pair this with *.Interaction.AppActivate(string) which allows you to activate a window by caption.  Tip: if you don't want to make it look like your app is ghost writing when you are sending a lot of text, just preserve the clipboard, copy the text to the clipboard, then send keys "^v" which is your Control+v key.  Use this as a reference for all of the key command strings.

Microsoft.VisualBasic.Devices.Network

DownloadFile/UploadFile - allows you to specify URIs for remote addresses, local paths, and voilà! instant download or upload of a file.  This seems to be restricted to ftp, http, and https so other later installed protocols likely don't work.

Ping - does what you think, with a boolean return value indicating if the ping request was successful.

IsAvailable - a boolean property that tells you whether a network connection is available and working.

NetworkAvailabilityChanged - an event you can subscribe to to be notified when the network availability changes without having to perform any complex Win API P/Invokes.

Microsoft.VisualBasic.FileIO.FileSystem

* - CopyFile, MoveFile, and DeleteFile all use the Windows shell (shell32.dll) functionality to automatically add support for progress and user controllable long operations (such as allowing the user to cancel a copy operation).  The DeleteFile methods even allow you to specify recycling options so you don't just unlink files, but send them to the Recycle Bin instead.

FindInFiles - the basic "find in files" implementation, which I believe does not make use of the indexing service.

GetTempFileName - a method which returns a temporary file name in the temporary directory for the current context (user's temporary directory for a user and one can assume system temporary directory for a service) and also creates a 0 byte file on disk so you can just open it up immediately without first creating it.  I'm sure we've all dug into the API to get environment variables like the temporary directory and then had a temporary naming convention we implemented to do this in the past.
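As a rough sketch of how these FileIO.FileSystem calls look from C#: the ShellFileOps wrapper and its method names are hypothetical, and the two dialog-showing methods assume a Windows desktop session.

```csharp
using Microsoft.VisualBasic.FileIO;

public static class ShellFileOps
{
    // Copies a file while showing the standard Windows progress dialog.
    // With UICancelOption.DoNothing, a user cancel returns quietly
    // instead of throwing OperationCanceledException.
    public static void CopyWithProgress(string source, string destination)
    {
        FileSystem.CopyFile(source, destination,
            UIOption.AllDialogs, UICancelOption.DoNothing);
    }

    // Sends a file to the Recycle Bin rather than deleting it permanently.
    public static void Recycle(string path)
    {
        FileSystem.DeleteFile(path,
            UIOption.OnlyErrorDialogs, RecycleOption.SendToRecycleBin);
    }

    // Creates an empty, uniquely named file in the temp directory and
    // returns its full path, ready to be opened for writing.
    public static string CreateScratchFile()
    {
        return FileSystem.GetTempFileName();
    }
}
```

For the large-file-over-VPN case described at the top of this post, CopyWithProgress is the one-liner that replaces the P/Invoke signatures I had gone looking for.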

Microsoft.VisualBasic.FileIO.TextFieldParser

This is a great class which allows you to read all manner of text files, whether they are comma delimited, tab delimited, or fixed width record files.  Yes, read, unfortunately not write.  It has support for pretty much every text file situation you may run into - qualifiers, separators, and even comment characters.  A useful quick, down and dirty class for upconverting old systems to a new format.
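Here is a minimal sketch of reading comma-delimited text with it, assuming quoted fields and '#' as the comment marker.  DelimitedReader and ReadRows are illustrative names of my own.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class DelimitedReader
{
    // Parses comma-delimited text into rows of fields, honoring quoted
    // fields and skipping lines that start with '#'.
    public static List<string[]> ReadRows(TextReader input)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            parser.CommentTokens = new[] { "#" };

            while (!parser.EndOfData)
            {
                // ReadFields throws MalformedLineException on a line
                // it cannot parse with the current settings.
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Passing a StreamReader over a file works the same way as the StringReader you might use in a test; for fixed width files you would set TextFieldType to FieldType.FixedWidth and call SetFieldWidths instead of SetDelimiters.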


How Caching Can Really Improve Performance

We're always looking for ways to improve performance without expanding horizontally by adding new hardware.  Recently there's been a lot of buzz about caching, specifically memcached and Microsoft's Velocity.  These are distributed caching technologies which are ideally suited to server clusters.  This post isn't about them.  Instead it's about the plain ol' regular caching you would implement as you move up from a small web-based project to a medium one, before a distributed cache is something you'd use.  The techniques here are incredibly simple to implement but the results are astounding.

Our system handles tens of thousands of connections per day, with each client creating a transaction that involves querying a database, retrieving data, and then packaging that data up to transmit to the client.  The whole thing is fairly processor intensive during the peak operating hours.  Caching was "low hanging fruit" to quote John, and we knew that implementing it would drastically improve performance.  Here's a shot of what our database processor usage was at just before implementing caching, as generated by PAL:

[Image: Before Caching]

Yikes!  You may not be able to tell from the picture but those are the four cores sampled in 10 minute intervals peaking at 100%.  There were some additional optimizations to be made at the OS and software level so this is an extreme case; however, none of those optimizations account for what we got out of caching.

The problem: You have a data set that is retrieved by numerous clients regularly, or you have large chunks of data that are retrieved by numerous clients.  The data is not volatile - you can certainly cache volatile data, but that involves more extensive work in synchronizing the data store and cache.  This even works with smaller chunks of data - especially those read only lookup tables you use for UI labels like state, gender, part number descriptions, whatever - but the gain grows with the size of your data set.  Our particular problem involves large chunks of data.

The solution: When data is retrieved from the database, cache that data in the application server's memory so the database doesn't have to be queried for the data again. Store the data in a structure that allows it to be easily retrieved.

For this example, let's assume you are caching a statistical report which contains oodles of data, images, etc (like a PAL report).  Your clients connect to the server periodically and retrieve one or more of these reports, at least once, if not several times during the day.  The shape of the query may change - for example one user may request 3 reports, then another may request 2 of those reports and 2 other reports not yet cached.  In any case we are still executing queries against the database to figure out which reports to return, but instead of returning all that data we are simply returning IDs, and we cache our reports in a hashtable on the application server by ID.

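using System;
using System.Threading;
using System.Web.Caching;

public class CacheService
{
   private static readonly object _sync = new object();
   private static CacheService _instance;
   private readonly Cache _cache;
   private int _hitCount;
   private int _total;
 
   private CacheService(Cache httpCache)
   {
      _cache = httpCache;
   }
 
   // Returns the item cached for (T, id); on a miss, asks the
   // fulfiller for it and caches the result for one hour.
   public T Get<T>(int id, ICacheFulfiller<T> fulfiller)
      where T : new()
   {
      var ttl = new TimeSpan(1, 0, 0);
      T ret;
      var key = CreateKey(typeof(T), id);
      var obj = _cache.Get(key);
      if (obj == null)
      {
         ret = fulfiller.Get(id);
         // Cache.Insert rejects null values, so skip caching those
         if (ret != null)
         {
            _cache.Insert(key, ret, null,
              DateTime.UtcNow.Add(ttl),
              Cache.NoSlidingExpiration);
         }
      }
      else
      {
         ret = (T)obj;
         Interlocked.Increment(ref _hitCount);
      }
      Interlocked.Increment(ref _total);
 
      return ret;
   }
 
   // Counters are updated atomically because requests run concurrently
   public int HitCount { get { return _hitCount; } }
   public int Total { get { return _total; } }
 
   public static CacheService GetInstance(Cache httpCache)
   {
      // Lock so that concurrent first calls create only one instance
      lock (_sync)
      {
         if (_instance == null)
            _instance = new CacheService(httpCache);
         return _instance;
      }
   }
 
   // The key is the type's full name plus the ID
   private static string CreateKey(Type t, int id)
   {
      return t + "_" + id;
   }
}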

The sample CacheService above is implemented as a singleton and stores cached items in the ASP.NET Cache, with each key being the type name of the item to be cached plus its unique integer identifier (for the sake of this example let's assume we're using autoincrementing primary integer IDs in the database and the queries I talked about above respond with zero or more IDs when executed).  The HitCount and Total are used simply for gathering performance statistics; the places that benefit most from caching will end up with a 99% ratio of HitCount (number of actual cache item hits) to Total (number of total requests) over time.  We also use an absolute (non-sliding) expiration of 1 hour; it's a good idea to expire your cached content because it could change, and when a change is not critical enough to need recaching the moment it's made you don't want to have to recycle the app pool to pick it up.

So how are cache misses - those items that have not yet made it into the cache - handled?  We want to handle them internally as part of the cache so you don't have to check to see if an item is cached, and if not, retrieve it, then cache it.  We also don't want to entangle retrieval logic for all cacheable items with our cache, so what we end up with is an ICacheFulfiller<T>:

public interface ICacheFulfiller<T> where T : new()
{
   T Get(int id);
}

And what, exactly, is an ICacheFulfiller?  Essentially any service level class that you use to retrieve your data can become an ICacheFulfiller.  Going back to our fictional report retrieval example (assume we have a report class which stores all of the data about a report including its images):

public class ReportService : ICacheFulfiller<Report>
{
   public Report Get(int id)
   {
      Report r = null;
      // TODO get the report from the database
      // return a Report instance mapped to the
      // data
      return r;
   }
}

You can actually nix the entire ICacheFulfiller<T> idea completely and rely on Func<T> as seen in Steve Smith's more concise Cache Access Pattern Revised article.  I like the contractfulness of interfaces, though the usefulness of Func<T> is not lost on me.
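For comparison, here is a rough sketch of what the delegate-based shape could look like.  This is my own illustration, not Steve Smith's code: FuncCache is a made-up name, it takes a Func<int, T> so the ID can be passed through, and a plain in-memory dictionary with an absolute expiration stands in for the ASP.NET Cache so the sample runs outside a web application.

```csharp
using System;
using System.Collections.Generic;

// A sketch only: a dictionary stands in for the ASP.NET Cache.
public class FuncCache
{
    private readonly Dictionary<string, (object Value, DateTime ExpiresUtc)> _items
        = new Dictionary<string, (object Value, DateTime ExpiresUtc)>();
    private readonly object _sync = new object();
    private readonly TimeSpan _ttl;

    public FuncCache(TimeSpan ttl)
    {
        _ttl = ttl;
    }

    public int HitCount { get; private set; }
    public int Total { get; private set; }

    // Returns the item cached for (T, id); on a miss, calls `load`
    // and caches a non-null result until the absolute expiration.
    public T Get<T>(int id, Func<int, T> load)
    {
        var key = typeof(T) + "_" + id;
        lock (_sync)
        {
            Total++;
            if (_items.TryGetValue(key, out var entry)
                && entry.ExpiresUtc > DateTime.UtcNow)
            {
                HitCount++;
                return (T)entry.Value;
            }

            // Loading inside the lock keeps the sketch simple,
            // at the cost of serializing cache misses.
            T value = load(id);
            if (value != null)
                _items[key] = (value, DateTime.UtcNow.Add(_ttl));
            return value;
        }
    }
}
```

A call then reads something like cache.Get<Report>(42, id => reportService.Get(id)), where reportService is whatever service class already knows how to load a report.  The trade-off is the one mentioned above: the lambda is less ceremony, while the interface documents which services are expected to feed the cache.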

So by implementing this relatively easy pattern our tens of thousands of transactions were resulting in a near 99% cache hit ratio which ended up taking the CPU performance of the database server from nearly 100% to...

Those are once again ten minute intervals, same snapshot of time (peak operating hours), and that's right the utilization is staying at around 10%.  The anomaly at the end is a backup job or report generation, whatever it is, it didn't have to compete with our primary operation to do its work.

So what about processor and memory utilization on the application (web) server? Aren't we just shifting some of the workload? Well the processor usage actually didn't go up (sat around 10%, or more informatively pre-cache and post-cache performance was the same) and memory utilization increased by about 100 MB for the cache, but memory utilization also decreased by about 100 MB (or more) due to less data mapping having to occur.

In our case we're still executing pretty complex queries against the database for every single transaction that's being made (a transaction being between a client and the server) but what the results say is that returning just integer IDs instead of all the columns in joined tables as part of a single table result set is faster - not by a bit, but by orders of magnitude.  My point isn't that a recordset of integers is less intensive, it's that the actual data retrieval is extremely intensive, especially in bulk, and caching, when viable, alleviates that.

In the code above you'll note that I'm using the ASP.NET Cache (System.Web.Caching.Cache) from the HttpContext.  The reason for this is the already implemented expiration (among other things), but also as an additional treat there is the AspAlliance CacheManager which, while written some years ago, works perfectly fine today.  Just keep in mind that if you are using IIS7 on Vista or Server 2008 you're not adding to the httpHandlers section, but to the system.webServer/handlers section instead:

<system.webServer>
   <handlers>
      <add name="CacheManager" verb="*"
         path="CacheManager.axd"
         type="AspAlliance.CacheManager.CacheManagerPageFactory, AspAlliance.CacheManager" />
   </handlers>
</system.webServer>