Archive for 2008

Useful commands I must never forget

On Windows, to sever all network CIFS/SMB connections:

net use * /delete

On Linux using bash, the ultra useful repeat history:

history | grep whatever_you_want_to_recall

then bang-number from history, e.g.:

$ history | grep restart

  431  /etc/init.d/apache2 restart
  483  /etc/init.d/sshd restart
  501  history | grep restart

$ !431

Simple tricks that somehow I always forget. I don't know how, these should be akin to typing by this point.  Maybe by posting them I will remember.


Microsoft.VisualBasic

One of my recent projects involved copying large files (10 MB or more) over a network connection, usually a VPN connection over the internet.  I wanted a way to display progress but didn't want to spend too long writing the function.  Then I remembered shell32.dll and the Explorer dialogs it creates, and how those would be perfect for what I was doing.  I was googling for the extern signatures necessary to summon the file copy dialog when I discovered that this functionality was already wrapped in the Microsoft.VisualBasic assembly!

Scott Hanselman recently wrote about Single Instance WinForms and how you could essentially recreate the old Win32/VB6 single app instance to handle all requests - so multiple attempts at opening a document result in the same application servicing them.  This is very useful functionality and it's already nicely wrapped up in the Microsoft.VisualBasic assembly.

I spent a good two hours digging into the assembly to see just what old favorites of mine were still around.  Beep is in there and Screen, another old favorite, among others I recall with fond memories.  There's also some new functionality I either was never aware of or have just simply forgotten.

So here's a brief overview of what I found digging around in there.  Some of it you could do in C# with some effort, but a lot of this functionality is very useful to have ready-made.

Microsoft.VisualBasic

MyGroupCollectionAttribute - here is one of those very interesting "supports Visual Basic and not intended to be used by your code."  After trying to get this guy to work, reading up on the VB.NET language reference, and generally Googling it, I get the idea.  Basically it uses strings to make a "group type family" of classes and acts like a compile time template to add the necessary methods to subclasses of the class marked with this attribute, and then gives you some syntactic sugar where you don't even need to invoke your singleton GetInstance (or equivalent method); instead something that was once:

var x = MyClass.GetInstance(); x.DoSomething();

becomes:

MyClass.DoSomething();

without MyClass or DoSomething being declared as static (or Shared in VB).  It also allows you to alias these instances into a collection, a la My.Classes.MyClass and My.Classes.AnotherClass (assuming both MyClass and AnotherClass share the same ancestor and that ancestor is decorated with this attribute properly).  All in all a pretty neat idea, but I couldn't get it to work.  The documentation was lacking, and although I tried C# alone, VB.NET alone, and C# and VB.NET together, I was unable to devise the correct combination.

Microsoft.VisualBasic.FileSystem

Kill - who doesn't like a function named Kill that does exactly what you'd expect it to?  (Assuming you'd expect it to delete a file and not terminate a process that is.  Or not kill a person.)

Microsoft.VisualBasic.Financial

* - all of these functions look very interesting if you are a financial programmer.  They don't have a whole lot of use for me, but still, a nice addition.

Microsoft.VisualBasic.Interaction

InputBox - a throwback to the old Windows 3.1 UI design days, an input box is a modal dialog (a #32770 IIRC) with a prompt, a text area, and buttons to confirm or dismiss the input box.  Need a quick and easy way to get a value while testing your UI?  The InputBox is effectively a simple - albeit perhaps not pretty - way of doing a Console.ReadLine from within a non-console WinForms application.  Because the return value of the "InputBox" method is a string, an empty (not null) string indicates that the cancel button was clicked which can be easily confused with the user entering or blanking the answer text box.  Also a notable throwback - the x and y positional arguments are in twips not pixels.

Partition - now this one is interesting and it has nothing to do with your file system.  It's basically a function that tells you which range a single value belongs to, given the bounds of the complete set and the size of the range intervals.  Microsoft provides a good example of what this does, and why the return value is an easily parsable string.
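To make that concrete, here is a minimal sketch of calling Partition from C# and turning the returned label back into numbers.  The PartitionDemo class and its Bucket helper are names I made up for illustration; the sketch assumes the value falls inside the start..stop range, since values outside it produce a label with one side left blank.

```csharp
using System;
using Microsoft.VisualBasic;

public static class PartitionDemo
{
    // Hypothetical helper: returns { lower, upper } for the bucket that
    // 'value' falls into, with buckets of width 'interval' over start..stop.
    // Assumes start <= value <= stop.
    public static long[] Bucket(long value, long start, long stop, long interval)
    {
        // Partition returns a "lower:upper" label, padded with spaces so
        // that every label has the same width; split on ':' and trim.
        string label = Interaction.Partition(value, start, stop, interval);
        string[] parts = label.Split(':');
        return new[] { long.Parse(parts[0].Trim()), long.Parse(parts[1].Trim()) };
    }
}
```

For example, PartitionDemo.Bucket(37, 0, 99, 10) places 37 in the 30 to 39 bucket.  The padding is what makes the raw labels sort correctly as strings, which is handy for grouping in reports.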

Switch - memories!  You provide an even number of items in an object[], with each odd numbered item a bool value or expression that evaluates to bool, and each even numbered item the return value associated with the expression before it.  The value returned is the one paired with the first expression that evaluates to true.  Basically a dynamic switch/case statement, though certainly nowhere near as efficient.  Could be useful at some point as long as I remember it's there.
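A small sketch of what that looks like from C#, assuming a made-up SwitchDemo class.  Keep in mind that every argument is evaluated before the call, so unlike a real switch statement nothing is short-circuited.

```csharp
using System;
using Microsoft.VisualBasic;

public static class SwitchDemo
{
    // Hypothetical helper that classifies an integer by sign.
    public static string Describe(int n)
    {
        // Arguments alternate: condition, value, condition, value, ...
        // Switch returns the value paired with the first true condition,
        // or null when no condition is true.
        object result = Interaction.Switch(
            n < 0, "negative",
            n == 0, "zero",
            n > 0, "positive");
        return (string)result;
    }
}
```

Calling SwitchDemo.Describe(0) walks the pairs in order and returns the value paired with the first condition that holds.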

Microsoft.VisualBasic.Strings

StrReverse - returns a string with the character positions reversed such that index 0 is now the last index and the last index is now index 0.

Microsoft.VisualBasic.Devices.ComputerInfo

* - contains some of the general level stuff (like physical memory available, OS version) that people sometimes ask for without resorting to WMI or Windows API calls.

Microsoft.VisualBasic.Devices.Keyboard

SendKeys - this gives you some level of easy automation with other applications in that it allows you to send key presses to an application as if the keyboard were being pressed.  You'll want to pair this with *.Interaction.AppActivate(string) which allows you to activate a window by caption.  Tip: if you don't want to make it look like your app is ghost writing when you are sending a lot of text, just preserve the clipboard, copy the text to the clipboard, then send keys "^v" which is your Control+v key.  Use this as a reference for all of the key command strings.

Microsoft.VisualBasic.Devices.Network

DownloadFile/UploadFile - allows you to specify URIs for remote addresses and local paths, and voilà! instant download or upload of a file.  This seems to be restricted to ftp, http, and https, so other later installed protocols likely don't work.

Ping - does what you think, with a boolean return value indicating if the ping request was successful.

IsAvailable - a boolean property that tells you whether a network connection is available and working.

NetworkAvailabilityChanged - an event you can subscribe to in order to be notified when the network availability changes, without having to perform any complex Win API P/Invokes.

Microsoft.VisualBasic.FileIO.FileSystem

* - CopyFile, MoveFile, and DeleteFile all have overloads that use the shell32.dll functionality, so you automatically get progress dialogs and user-controllable long operations (such as allowing the user to cancel a copy operation).  The DeleteFile methods even allow you to specify recycling options so you don't just unlink files, but send them to the Recycle Bin instead.

FindInFiles - a basic "find in files" implementation, which I believe does not make use of the indexing service.

GetTempFileName - a method which returns a temporary file name in the temporary directory for the current context (user's temporary directory for a user and one can assume system temporary directory for a service) and also creates a 0 byte file on disk so you can just open it up immediately without first creating it.  I'm sure we've all dug into the API to get environment variables like the temporary directory and then had a temporary naming convention we implemented to do this in the past.
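As a rough sketch of how that plays out in C# (TempFileDemo and WriteScratch are names I invented for the example), the returned path can be written to right away because the empty file is already on disk:

```csharp
using System.IO;
using VBFileSystem = Microsoft.VisualBasic.FileIO.FileSystem;

public static class TempFileDemo
{
    // Hypothetical helper: writes 'text' to a fresh temporary file and
    // returns its path. The caller is responsible for deleting the file.
    public static string WriteScratch(string text)
    {
        // GetTempFileName creates a uniquely named 0 byte file in the
        // current temporary directory and returns its full path.
        string path = VBFileSystem.GetTempFileName();
        File.WriteAllText(path, text);
        return path;
    }
}
```

The alias on the using line is only there to avoid confusion with the older Microsoft.VisualBasic.FileSystem module mentioned earlier.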

Microsoft.VisualBasic.FileIO.TextFieldParser

This is a great class which allows you to read all manner of text files, whether they are comma delimited, tab delimited, or fixed width record files.  Yes, read, unfortunately not write.  It has support for pretty much every text file situation you may run into - qualifiers, separators, and even comment characters.  A useful quick, down and dirty class for upconverting old systems to a new format.
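Here is a minimal sketch of reading comma delimited text with quoted fields and comment lines.  CsvDemo and ParseCsv are hypothetical names, and I'm feeding the parser from a string so the example is self-contained; in practice you would more likely hand it a file path.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvDemo
{
    // Hypothetical helper: parses delimited text into one string[] per row.
    public static List<string[]> ParseCsv(string text)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(text)))
        {
            parser.TextFieldType = FieldType.Delimited;  // vs. FieldType.FixedWidth
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;     // "Smith, John" stays one field
            parser.CommentTokens = new[] { "#" };        // lines starting with # are skipped
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Given a header line, a line starting with #, and a line containing "Smith, John",Boston, this yields two rows, and the quoted name comes back as a single field with the comma intact.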


How Caching Can Really Improve Performance

We're always looking for ways to improve performance without expanding horizontally by adding new hardware.  Recently there's been a lot of buzz about caching, specifically memcached and Microsoft's Velocity.  These are distributed caching technologies which are ideally suited to server clusters.  This post isn't about them.  Instead it's about the plain ol' regular caching you would implement as you move up from a small web-based project to a medium one, before a distributed cache is something you'd use.  The techniques here are incredibly simple to implement but the results are astounding.

Our system handles tens of thousands of connections per day, with each client creating a transaction that involves querying a database, retrieving data, and then packaging that data up to transmit to the client.  The whole thing is fairly processor intensive during the peak operating hours.  Caching was "low hanging fruit" to quote John, and we knew that implementing it would drastically improve performance.  Here's a shot of what our database processor usage was at just before implementing caching, as generated by PAL:

[Image: database server processor usage before caching]

Yikes!  You may not be able to tell from the picture but those are the four cores sampled in 10 minute intervals peaking at 100%.  There were some additional optimizations to be made at the OS and software level so this is an extreme case, however none of those optimizations account for what we got out of caching.

The problem: You have a data set that is retrieved by numerous clients regularly, or you have large chunks of data that are retrieved by numerous clients.  The data is not volatile - you can certainly cache volatile data, but that involves more extensive work in synchronizing the data store and cache.  This even works with smaller chunks of data - especially those read only lookup tables you use for UI labels like state, gender, part number descriptions, whatever - but the larger your data set, the bigger the gain.  Our particular problem involves large chunks of data.

The solution: When data is retrieved from the database, cache that data in the application server's memory so the database doesn't have to be queried for the data again. Store the data in a structure that allows it to be easily retrieved.

For this example, let's assume you are caching a statistical report which contains oodles of data, images, etc (like a PAL report).  Your clients connect to the server periodically and retrieve one or more of these reports, at least once, if not several times during the day.  The shape of the query may change - for example one user may request 3 reports, then another may request 2 of those reports and 2 other reports not yet cached.  In any case we are still executing queries against the database to figure out which reports to return, but instead of returning all that data we are simply returning IDs, and we cache our reports in a hashtable on the application server by ID.

using System;
using System.Web.Caching;

public class CacheService
{
   private readonly Cache _cache;
   private static CacheService _instance;
 
   private CacheService(Cache httpCache)
   {
      _cache = httpCache;
      HitCount = 0;
   }
 
   public T Get<T>(int id, ICacheFulfiller<T> fulfiller)
      where T : new()
   {
      var ttl =  new TimeSpan(1, 0, 0);
      var ret = default(T);
      var key = CreateKey(typeof(T), id);
      var obj = _cache.Get(key);
      if (obj == null)
      {
         ret = fulfiller.Get(id);
         // Cache.Insert throws on a null value, so only cache real results
         if (ret != null)
            _cache.Insert(key, ret, null,
              DateTime.UtcNow.Add(ttl),
              Cache.NoSlidingExpiration);
      }
      else
      {
         ret = (T)obj;
         HitCount++;
      }
      Total++;
 
      return ret;
   }
 
   public int HitCount { get; private set; }
   public int Total { get; private set; }
 
   public static CacheService GetInstance(Cache httpCache)
   {
      if (_instance == null)
         _instance = new CacheService(httpCache);
      return _instance;
   }
 
   private static string CreateKey(Type t, int id)
   {
      return t + "_" + id;
   }
}

The sample CacheService above is implemented as a singleton and stores cached items in the ASP.NET Cache, with each key built from the type name of the item to be cached plus its unique integer identifier (for the sake of this example let's assume we're using autoincrementing primary integer IDs in the database and the queries I talked about above respond with zero or more IDs when executed).  The HitCount and Total are used simply for gathering performance statistics; the places that benefit most from caching will end up with a 99% ratio of HitCount (number of actual cache item hits) to Total (number of total requests) over time.  We also have an absolute (non-sliding) expiration of 1 hour; it's a good idea to expire your cached content because it could change, and you don't want to have to recycle the app pool just to pick up a change that isn't critical to recache the moment it's made.

So how are cache misses - those items that have not yet made it into the cache - handled?  We want to handle them internally as part of the cache so you don't have to check to see if an item is cached, and if not, retrieve it, then cache it.  We also don't want to entangle retrieval logic for all cacheable items with our cache, so what we end up with is an ICacheFulfiller<T>:

public interface ICacheFulfiller<T> where T : new()
{
   T Get(int id);
}

And what, exactly, is an ICacheFulfiller?  Essentially any service level class that you use to retrieve your data can become an ICacheFulfiller.  Going back to our fictional report retrieval example (assume we have a report class which stores all of the data about a report including its images):

public class ReportService : ICacheFulfiller<Report>
{
   public Report Get(int id)
   {
      Report r = null;
      // TODO get the report from the database
      // return a Report instance mapped to the
      // data
      return r;
   }
}

You can actually nix the entire ICacheFulfiller<T> idea completely and rely on Func<T> as seen in Steve Smith's more concise Cache Access Pattern Revised article.  I like the contractfulness of interfaces, though the usefulness of Func<T> is not lost on me.
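For comparison, here is a rough sketch of what the delegate-based variation might look like.  This is my own simplification and not the code from that article: FuncCache is a hypothetical class, it takes a Func<int, T> so the ID can be passed through, and it uses a plain in-memory Dictionary as a stand-in for the ASP.NET Cache so it runs anywhere.  It has no expiration at all, which you would want in real use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the ASP.NET Cache; no expiration.
public class FuncCache
{
    private readonly Dictionary<string, object> _items =
        new Dictionary<string, object>();
    private readonly object _lock = new object();

    public int HitCount { get; private set; }
    public int Total { get; private set; }

    // 'fulfill' is only invoked on a cache miss.
    public T Get<T>(int id, Func<int, T> fulfill) where T : class
    {
        string key = typeof(T).FullName + "_" + id;
        // Holding the lock while fulfilling keeps the sketch simple, at the
        // cost of serializing loads across all keys.
        lock (_lock)
        {
            Total++;
            object cached;
            if (_items.TryGetValue(key, out cached))
            {
                HitCount++;
                return (T)cached;
            }
            T value = fulfill(id);
            if (value != null)
                _items[key] = value;
            return value;
        }
    }
}
```

Usage is then a one-liner such as cache.Get<Report>(42, id => reportService.Get(id)), where the lambda plays the role the ICacheFulfiller<T> plays above.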

So by implementing this relatively easy pattern our tens of thousands of transactions were resulting in a near 99% cache hit ratio which ended up taking the CPU performance of the database server from nearly 100% to...

Those are once again ten minute intervals, same snapshot of time (peak operating hours), and that's right the utilization is staying at around 10%.  The anomaly at the end is a backup job or report generation, whatever it is, it didn't have to compete with our primary operation to do its work.

So what about processor and memory utilization on the application (web) server? Aren't we just shifting some of the workload? Well the processor usage actually didn't go up (sat around 10%, or more informatively pre-cache and post-cache performance was the same) and memory utilization increased by about 100 MB for the cache, but memory utilization also decreased by about 100 MB (or more) due to less data mapping having to occur.

In our case we're still executing pretty complex queries against the database for every single transaction that's being made (a transaction being between a client and the server), but what the results say is that returning just integer IDs instead of all the columns in joined tables as part of a single table result set is faster - not by a bit, but by orders of magnitude.  My point isn't that a recordset of integers is less intensive; it's that the actual data retrieval is extremely intensive, especially so in bulk, and caching, when viable, alleviates that.

In the code above you'll note that I'm using the HttpCache from the HttpContext.  The reason for this is the already implemented expiration (among other things), but also as an additional treat there is the AspAlliance CacheManager which, while written some years ago, works perfectly fine today.  Just keep in mind that if you are using IIS7 on Vista or Server 2008 you're not adding to the httpHandlers section, but to the webServer/handlers section instead:

<system.webServer>
   <handlers>
      <add name="CacheManager" verb="*"
         path="CacheManager.axd"
         type="AspAlliance.CacheManager.CacheManagerPageFactory, AspAlliance.CacheManager" />
   </handlers>
</system.webServer>

Windows Server Performance Monitoring and PAL (Part 1)

An area where I have had little experience is server monitoring on Windows.  I've done a fair amount with Linux but never really had a need to do it on Windows.  I did some research and looked into Nagios and some commercial software solutions (all starting at around $20,000 with annual licensing fees) but I knew I wanted something smaller and easier to deal with - we're talking about a pair of servers that'll probably grow to a dozen or so over the next year.

I knew about Windows performance counters and I knew I could log them, so I went and had a look to see if anyone had written an Excel spreadsheet to make the pretty pictures everyone enjoys.  When I resurfaced I came back with Performance Analysis of Logs (PAL), which is an incredible piece of VBScript that saved me tons of time and/or money (depending entirely on which path a parallel me chose in an alternate reality).  Essentially what PAL does is crunch the numbers in your binary or text Performance Monitor logs, process them with profiles defined in XML files, and produce a great HTML report with graphics, links that have further explanation, and most importantly alerts.  PAL includes a GUI tool which you can use to point to a log file, choose a profile, answer some questions (four) about your server hardware, do some additional configuration, and generate a report.

PAL is only part of the solution.  If you're like me, you'll want to retain those log files and reports for performance comparison in the future so you can see if your performance changes have the desired effect.  You'll need some place to store the logs (zip archiving them is best) and a sane way to manage the *.htm reports it generates.  My suggestion (for the manual approach): use MSIE to save your PAL report in *.mht format.  Once saved in HTML Archive Format (*.mht) you can delete the directory and *.htm file PAL creates and you have your report in a single file.  If you use Firefox, that's okay; install IE Tab and direct all "file:///C:\*" to view in MSIE (you may also find this useful for pointing to your Outlook Web Access URL if your organization uses Exchange).  Finally, the *.mht files are viewable from within OS X: simply give them a *.eml extension and OS X's Mail.app will display them with no issues.  (It should also be noted that Firefox has its own web archive-like format you could use just as well, but in part 2 I'll talk about automating this process and include some source code that does this, and uses CDO to make *.mht files.)

To get started with PAL you'll need a few things: PAL, Microsoft's Log Parser, and Office 2003 (or later) Web Components.  Fortunately all of these items are free.

PAL (the betas have been working for me using SQL Server 2005 and IIS profiles, and... NICE 2008 HyperV support in beta 8!)

Microsoft Log Parser 2.2

Microsoft Office 2003 Web Components (later versions should work as well)

So great, enough information to be dangerous now.  But what should you record?  I record the following performance counters on our servers:

  • Memory\*
  • Paging File\*
  • Physical Disk\*
  • Processor(*)\*

PAL may not analyze every single counter but it does hit all the big ones and pretty much covers anything that may be important to you, excluding custom performance counters you implement in your own code (more on that in part 3).  There are a number of other performance counters you can track depending on what purpose your server serves.  If it's an IIS web or application host you can track (and display) IIS performance counters; likewise for SQL Server, Exchange, or a host of other applications.  PAL comes with many *.xml profile files that contain information on where and how to display the performance data in the reports it generates.

In terms of overhead, I've read a lot of people claim there is little to none, and that it's a greater risk to not track performance (and set up alerts) at all.  I agree with these statements and I have so far not found performance tracking to adversely impact the performance of our production servers.

What about scheduling PAL to run nightly?  While you can use perfmon.msc to configure the performance logs and the counters they will track, you cannot schedule recurring jobs with it.  Fortunately logman, a utility included in Windows, allows you to do just that.

logman update logname -b 07/14/2008 23:00:00 -e 07/15/2008 09:00:00 -r

This command will cause the performance log named "logname" to start recording at 11 PM server time on 7/14 and end recording at 9 AM server time on 7/15.  The important switch here, -r, will cause the schedule to be recurring, so the following evening at 11 PM on 7/15 it will begin recording anew and end at 9 AM on 7/16.


SharePoint, WebDAV, and http (not https)

Our company loves to co-locate services.  Source control, production servers, HR, you name it, if we can get it out of the building, we do it.  So it comes as no surprise that our document management/repository exists out on the internet hosted by a service provider using Microsoft SharePoint.

A very irksome issue with SharePoint and Vista is that for many people SharePoint shares aren't accessible as mappable network drives or through the common dialog for opening/saving files in Office 2007.  People upgrading from Windows XP find they suddenly just cannot access their previously mapped and usable SharePoint shares.  This problem occurs when you use Vista and attempt to map to a share across http because Vista, coming configured for a much more stringent security model, doesn't do WebDAV with basic authentication over http, only https.  If you are having issues with a Windows Server 2008 machine connecting to a SharePoint (or WebDAV) share over http I would imagine the solution below will work for you as well.  (And yes, SharePoint or WebDAV over http is probably a silly idea in the first place... obviously if it's within your power to https-ify the share that's likely the best solution - even if it's via a self-signed cert).  (Update: Windows Server 2003 instructions appear below.)

(I ran across a lot of people having this problem, some solutions to similar problems, but the most informative source was a post I found in Robert McMurray's blog.)

  1. Fire up regedit.exe (if UAC is enabled you'll have to approve it of course)
  2. Navigate to the key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WebClient\Parameters
  3. Change the setting BasicAuthLevel from its current value (default 1) to 2 (0 means disabled, 1 is https only, 2 is both http and https)
  4. Optional. Change the setting ServerNotFoundCacheLifeTimeInSec from its current value (default 60) to 0 (change from hexadecimal to decimal) (I made this change just because I had been hammering on a box trying to fix the issue, I didn't want a cache issue to be my undoing)
  5. Right-click a shortcut to cmd.exe and choose Run As Administrator (or just fire up cmd.exe if you have disabled UAC)
  6. Enter the following commands:
    1. net stop webclient
    2. net start webclient
    3. net use x: "http://yoursharepointprovider.tld/somesharename" /user:"whateverdomain\yourusername" /persistent:yes
    4. Enter your password when prompted

The net use command is optional; you can instead map using the Windows Map Network Drive functionality or even open the share using the common file dialog in Office.

Note that this solution does not fix the fact that with most shared SharePoint hosting providers you cannot map to the root share (that is, http://yoursharepointprovider.tld); you must choose a share.  There are ways around this if you have access to the IIS instance hosting SharePoint but since that wasn't my problem I did not explore them.

And yes, what this means is that if you are automating uploading files to SharePoint you don't have to use the SharePoint web service API or the SharePoint SDK to perform such a simple task - you can just use File.Copy and away you go!
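A minimal sketch of that idea, assuming the share has already been mapped (for example to the x: drive from the net use command above).  ShareUpload and CopyToShare are names I made up, and the share root you pass in is whatever drive or folder you mapped.

```csharp
using System.IO;

public static class ShareUpload
{
    // Hypothetical helper: copies a local file into a folder on the mapped
    // share (e.g. @"X:\" or a folder beneath it) and returns the target path.
    public static string CopyToShare(string localPath, string shareRoot)
    {
        string target = Path.Combine(shareRoot, Path.GetFileName(localPath));
        // true = overwrite an existing file with the same name
        File.Copy(localPath, target, true);
        return target;
    }
}
```

Since the WebClient service makes the share look like any other path, nothing here is SharePoint specific; the same call works against a local folder, which also makes it easy to test.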

Update:

For Windows Server 2003, add a DWORD value named UseBasicAuth under the Parameters key above and set it to 1.  You'll also want to change the AcceptOfficeAndTahoeServers value from 0 to 1.  You cannot just net stop/start webclient; instead you must net stop mrxdav and then net start webclient.  Finally, and this is important if you are using things such as "%20" to represent spaces in your folder names - don't; instead just use spaces or the equivalent character.  I found this out from KB841215.