Manning Hadoop in Action Chapter 1 Example

It took me a little while of digging to find the baseline source code for chapter 1 of Manning’s Hadoop in Action (2010).

You can find WordCount.java here. Here's the 1.0.0 version I used:

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString(), " \t\n\r\f,.:;?![]'*");
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken().toLowerCase());
        context.write(word, one);
      }
    }
  }

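  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      // Only emit words that occur more than four times
      if (sum > 4) {
        // Set the result before writing it, otherwise a stale value is emitted
        result.set(sum);
        context.write(key, result);
      }
    }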
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
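    // Caution: IntSumReducer drops sums of 4 or fewer, so running it as a combiner
    // can discard partial counts before the reduce phase and undercount words.
    // Remove the next line if you need exact totals.
    job.setCombinerClass(IntSumReducer.class);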
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
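
To see what the job computes without a cluster, here is a minimal plain-Java sketch (no Hadoop involved) that uses the same delimiters, lowercasing and `sum > 4` filter as the code above. The class name `LocalWordCount`, its helper methods and the sample strings are made up for illustration. The second method is an assumption-laden simulation of the filtering reducer also running as a combiner, once per input split; real Hadoop may run a combiner zero or more times, so treat it only as a picture of why partial counts can go missing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Same delimiter set the mapper passes to StringTokenizer
    static final String DELIMS = " \t\n\r\f,.:;?![]'*";
    // The reducer only emits words whose sum is greater than this
    static final int THRESHOLD = 4;

    // Map and sum in one pass: lowercase every token and tally it
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text, DELIMS);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken().toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    // The reducer's filter: keep only words counted more than THRESHOLD times
    static Map<String, Integer> keepFrequent(Map<String, Integer> counts) {
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > THRESHOLD) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // Hypothetical simulation: the filter is applied to each split first
    // (as a combiner would), then the survivors are summed and filtered again
    static Map<String, Integer> countWithFilteringCombiner(List<String> splits) {
        Map<String, Integer> merged = new TreeMap<>();
        for (String split : splits) {
            for (Map.Entry<String, Integer> e : keepFrequent(count(split)).entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return keepFrequent(merged);
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("the cat. The the", "THE dog; the");
        String whole = String.join(" ", splits);
        System.out.println(keepFrequent(count(whole)));         // prints {the=5}
        System.out.println(countWithFilteringCombiner(splits)); // prints {}
    }
}
```

In this toy input "the" appears five times overall, but only three and two times in the individual splits, so filtering per split loses it entirely.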

Compilation with Java 6 (1.6) is a bit more involved. I wrote a simple shell script; the important thing is to get the classpath flag right. Java veterans will of course have no problem with this, but it took me a few minutes to sort out.

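#!/bin/bash
# Hadoop refuses to start the job if the output directory already exists
rm -rf output/
# javac on Java 6 will not create the -d target directory for you
mkdir -p classes
javac -classpath "../share/hadoop/lib/commons-cli-1.2.jar:../share/hadoop/hadoop-core-1.0.0.jar" -d classes src/WordCount.java
jar -cvf wordcount.jar -C classes/ .
../bin/hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount input output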

ASP.NET MVC 2 Areas and Spark View Engine

I did a little looking around at alternative view engines for ASP.NET MVC 2 and gave Spark a go. My biggest problem is that it’s not Haml. I was moving some *.aspx views over to *.spark, and it’s pretty much an exercise in replacing <% with { with some XMLish <if> blocks thrown in. (Didn’t we already determine that XML ifs are bad?)

Anyway, I decided not to use it, but I did sort out a problem with 1.1.0.0 (as of November 2010) where ASP.NET MVC 2 Areas simply did not work with the view engine. I suspect this is the same reason NHaml doesn’t work with Areas either.

In the Spark source, in the DefaultDescriptorBuilder class of the Spark.Web.Mvc assembly (around line 183), I replaced the methods PotentialViewLocations, PotentialMasterLocations and PotentialDefaultMasterLocations with the following code:

protected virtual IEnumerable<string> PotentialViewLocations(string controllerName, string viewName, IDictionary<string,object> extra)
{
    if (extra.ContainsKey("area"))
    {
        return ApplyFilters(new[]
        {
            controllerName + "\\" + viewName + ".spark",
            "Shared\\" + viewName + ".spark",
            "~\\Areas\\" + extra["area"] + "\\Views\\Shared\\" + viewName + ".spark",
            "~\\Areas\\" + extra["area"] + "\\Views\\" + controllerName + "\\" + viewName + ".spark"
        }, extra);
    }

    return ApplyFilters(new[]
    {
        controllerName + "\\" + viewName + ".spark",
        "Shared\\" + viewName + ".spark",
    }, extra);
}

protected virtual IEnumerable<string> PotentialMasterLocations(string masterName, IDictionary<string, object> extra)
{
    if (extra.ContainsKey("area"))
    {
        return ApplyFilters(new[]
        {
            "~\\Areas\\" + extra["area"] + "\\Views\\Layouts\\" + masterName + ".spark",
            "~\\Areas\\" + extra["area"] + "\\Views\\Shared\\" + masterName + ".spark",
            "Layouts\\" + masterName + ".spark",
            "Shared\\" + masterName + ".spark"
        }, extra);
    }

    return ApplyFilters(new[]
    {
        "Layouts\\" + masterName + ".spark",
        "Shared\\" + masterName + ".spark"
    }, extra);
}

protected virtual IEnumerable<string> PotentialDefaultMasterLocations(string controllerName, IDictionary<string, object> extra)
{
    if (extra.ContainsKey("area"))
    {
        return ApplyFilters(new[]
        {
            "~\\Areas\\" + extra["area"] + "\\Views\\Layouts\\" + controllerName + ".spark",
            "~\\Areas\\" + extra["area"] + "\\Views\\Shared\\" + controllerName + ".spark",
            "Layouts\\Application.spark",
            "Shared\\Application.spark"
        }, extra);
    }

    return ApplyFilters(new[]
    {
        "Layouts\\" + controllerName + ".spark",
        "Shared\\" + controllerName + ".spark",
        "Layouts\\Application.spark",
        "Shared\\Application.spark"
    }, extra);
}

This doesn’t work with Hanselman’s Mobile Device Capabilities ViewEngine out of the box; you’ll have to do more work for that.

If you don’t want to go through the legwork of patching Spark 1.1.0.0, I’ve zipped up a Release build of the relevant files from my patch.

I had to dig pretty deep for the original solution.


Successful Lisp: How to Understand and Use Common Lisp

Just discovered this little gem while on a hunt for the Lisp code responsible for FORMAT’s ~R. I am over halfway through Practical Common Lisp now, and Successful Lisp looks like a good complement to it. (And I have been reading Paul Graham’s ANSI Common LISP in parallel.)


It’s 2010. I shouldn’t have to write code like this.

_webRequest is an HttpWebRequest instance, and _headers is a WebHeaderCollection from the original HttpWebRequest. I am relaying headers from one request to another, as required by a third-party application. Why do I have to write such terrible code?

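// HttpWebRequest treats several headers as restricted: they have to be set
// through dedicated properties, and Headers.Add throws for them.
DateTime dt;
foreach (var key in _headers.AllKeys)
{
    var value = _headers[key];
    Log.Debug(key + ": " + value);
    switch (key.ToUpper())
    {
        case "ACCEPT": _webRequest.Accept = value; break;
        case "REFERER": _webRequest.Referer = value; break;
        case "USER-AGENT": _webRequest.UserAgent = value; break;
        case "TRANSFER-ENCODING":
            // TransferEncoding requires SendChunked and rejects the literal value "chunked"
            _webRequest.SendChunked = true;
            if (!string.Equals(value, "chunked", StringComparison.OrdinalIgnoreCase))
                _webRequest.TransferEncoding = value;
            break;
        case "DATE":
            if (DateTime.TryParse(value, out dt))
                _webRequest.Date = dt;
            break;
        case "IF-MODIFIED-SINCE":
            if (DateTime.TryParse(value, out dt))
                _webRequest.IfModifiedSince = dt;
            break;
        case "CONTENT-LENGTH":
            int cl;
            if (Int32.TryParse(value, out cl))
                _webRequest.ContentLength = cl;
            break;
        // Other restricted headers (e.g. Content-Type, Connection, Expect, Range)
        // would need their own cases here too.
        default:
            _webRequest.Headers.Add(key, value);
            break;
    }
}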

It should be what I originally tried:

//_webRequest.Headers.Add(_headers);

This is precisely the sort of code that creates a great dissatisfaction within me when working with the .NET Framework.


UTF-8 and char 65279 (byte order mark)

Should you find yourself catting files together to make a megafile (like turning many individual sproc files into a single file fit for execution) and your program (let’s say SQL Management Studio) says there’s a problem with the file, it’s because a Unicode BOM (byte order mark, char value 65279) that appeared at the start of one of the concatenated files made it into your megafile. We solved this by opening the offending file in Visual Studio and using Save As… to overwrite it as Unicode (Codepage 1200) instead of UTF-8.
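
If you would rather not re-save files by hand, one alternative (not what we did) is to strip the mark while concatenating. Below is a minimal Java sketch under the assumption that the inputs are UTF-8, where char 65279 (U+FEFF) is encoded as the three bytes EF BB BF. The class name `ConcatWithoutBom` and the command-line layout (output file first, then inputs) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConcatWithoutBom {
    // Returns the data without a leading UTF-8 BOM (EF BB BF), if one is present
    static byte[] stripBom(byte[] data) {
        if (data.length >= 3
                && data[0] == (byte) 0xEF
                && data[1] == (byte) 0xBB
                && data[2] == (byte) 0xBF) {
            return Arrays.copyOfRange(data, 3, data.length);
        }
        return data;
    }

    // Concatenates the file contents in order, dropping each file's leading BOM
    static byte[] concat(List<byte[]> files) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] file : files) {
            byte[] clean = stripBom(file);
            out.write(clean, 0, clean.length);
        }
        return out.toByteArray();
    }

    // Hypothetical usage: java ConcatWithoutBom megafile.sql a.sql b.sql ...
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: ConcatWithoutBom <output> <input>...");
            return;
        }
        List<byte[]> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(Files.readAllBytes(Paths.get(args[i])));
        }
        Files.write(Paths.get(args[0]), concat(inputs));
    }
}
```

This sketch drops the mark from every input, including the first, so the megafile comes out as BOM-less UTF-8; keep the first file's mark if your tool expects one.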