
GoGrid: No ssh for you!

So an odd thing happened around the 29th/30th of August that turned our production system upside down for a short time: the GoGrid machine we'd had running for a while with no problems suddenly mounted the root file system as read-only and stopped accepting incoming ssh connections.

Naturally we tried to resolve the problem through their tech support, but all we got were uninformative replies like "you must have upgraded your kernel" and "I can't get the machine to get an address through DHCP".  Of course we hadn't upgraded the kernel or any such thing.  At one point the first tech could connect but said the kernel panicked during the boot process.  Mmm great, I knew I should have backed up our config.

So we went into disaster recovery mode and tried to stand up another GoGrid instance using CentOS 32-bit.  No dice: the machine would boot but we couldn't ssh to it (another one trapped in a kernel panic?).  We tried the same with a RHEL 5 64-bit instance, and that one we could ssh to.  Then we tried a RHEL 4 32-bit instance, which booted but gave us no ssh, and finally another RHEL 4 32-bit instance, assigned from the bottom of the IP pool, that we could ssh to.  Very hit or miss, so it was too risky to proceed.

We ended up moving our Linux/Apache/PHP5 system to a Windows 2008/IIS7/PHP5 system we had sitting spare (as a hot spare of our production system actually) and configured FastCGI and had things chugging along in about 4 hours.

Losing a production system is a tough problem to deal with.  The day was spent sorting out problems and fixing bad data (a read-only file system combined with file-based caching can make some really, really bad data), and was essentially lost.  This is the risk you take and sometimes the price you pay for hosting on a beta platform.
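
In hindsight, a cheap guard in front of the file-based cache would have been to check whether the root file system had gone read-only before trusting any cache writes.  Here is a minimal sketch in Python, assuming a Linux box that exposes its mount table at /proc/mounts.  The function names are hypothetical, and this is an illustration rather than anything we actually ran:

```python
def is_read_only(mounts_text, mount_point="/"):
    """Return True/False for the mount point's read-only state, or None if not listed.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    status = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        if fields[1] == mount_point:
            # A mount point can be listed more than once (e.g. rootfs, then
            # the real device); the last entry is the one currently on top.
            status = "ro" in fields[3].split(",")
    return status


def root_is_read_only(mounts_path="/proc/mounts"):
    """Convenience wrapper that reads the live mount table (Linux only)."""
    with open(mounts_path) as handle:
        return is_read_only(handle.read(), "/")
```

A cache layer could call `root_is_read_only()` at startup, or before each write, and skip file caching entirely when it returns True, rather than carrying on with stale files.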

Too bad, we were planning on moving our development, demo, and test servers to GoGrid because it would be cheaper, minus these sorts of events of course.


  • http://www.jangro.com Scott Jangro

    Bad news Chuck.

    I tried setting up some gogrid servers last night and ran into the no ssh problem as well. I tried a CentOS and a RH, both would start up but I couldn’t connect.

    The support guy in chat, after going quiet for what seemed like 20 minutes, told me that it was a known problem (I don’t know for how long) where the “public interface does not get configured due to a VLAN issue”.

    All I know is I couldn’t ssh or even ping the server.

    Ultimately we left it unresolved with an escalated trouble ticket.

    I will say that after using the gogrid interface, I’m pretty excited about its potential. I too was getting ready to set up development and test servers.

  • http://www.deploymentzone.com Chuck

    We’ve left a few machines up that exhibited the problem.  I wish the original box, where the file system was mounted read-only and ssh was refused (but FTP and HTTP still worked), was still in that state to help them troubleshoot, because I think (I hope) that is an unrelated problem that would be important to solve before they move to production.

  • http://www.skytap.com/ Alex

    Chuck,
    I just read your blog and wanted to point you in the direction of Skytap. You mentioned issues on production as the reason for not moving your dev, demo, and test servers. Skytap provides virtual environments with 32- and 64-bit Windows, Linux, and Solaris images. You also get storage, and the ability to snapshot entire VMs and environments in a couple of clicks. It makes it easy to revert back to past builds and lets you turn on copies when others die. Skytap is the leader in VMs via a browser, and with a low price point and month-to-month services it’s easy to start and see value for a low amount of dollars.

    Send me a note and I can send over some whitepapers I have.