
March 2008

March 26, 2008

Intermittent Blip Seen By Some FogBugz On Demand Customers

At approximately 4:25 AM, one of our load-balanced On Demand web servers went into an error state. Our monitoring system caught it, and the problem was corrected by 4:30 AM.

We will work to understand the cause of this error.

These kinds of failures will happen again, as they do in the world of web apps, so we are working to implement a more robust load balancer that can detect this sort of problem and take the 'damaged' server out of the pool automatically.
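
To illustrate what "taking a damaged server out of the pool automatically" can look like, here is a minimal Python sketch. It is not our load balancer configuration; the ServerPool class, the check callable, and the max_failures threshold are all hypothetical names chosen for the example. The idea is simply that a server which fails several consecutive health checks stops receiving traffic until it passes again.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServerPool:
    """Round-robin pool that skips servers failing a health check (illustrative only)."""
    servers: List[str]
    check: Callable[[str], bool]   # hypothetical probe: True means the server looks healthy
    max_failures: int = 2          # assumed threshold of consecutive failures before removal
    _failures: Dict[str, int] = field(default_factory=dict)
    _next: int = 0

    def run_health_checks(self) -> None:
        """Probe every server once and update its consecutive-failure count."""
        for server in self.servers:
            if self.check(server):
                self._failures[server] = 0
            else:
                self._failures[server] = self._failures.get(server, 0) + 1

    def healthy(self) -> List[str]:
        """Servers that have not yet reached the failure threshold."""
        return [s for s in self.servers
                if self._failures.get(s, 0) < self.max_failures]

    def pick(self) -> str:
        """Return the next healthy server in round-robin order."""
        candidates = self.healthy()
        if not candidates:
            raise RuntimeError("no healthy servers in pool")
        server = candidates[self._next % len(candidates)]
        self._next += 1
        return server
```

Requiring more than one consecutive failure is a design choice in this sketch: it avoids dropping a server because of a single slow response, at the cost of reacting a little later to a real failure.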

March 24, 2008

Weekend Problems Accepting Payments

Due to intermittent service problems with our third-party payment processor, we were unable to accept credit card payments at various times this weekend. To complicate matters further, our own monitoring system did not notify us of the problems as it should have.

I have corrected the bug in the monitoring system, and we are in touch with our payment processor to further understand the problem.

March 19, 2008

Copilot and FogBugz On Demand Sign-Up Outage

Both Copilot and FogBugz On Demand were unable to accept credit card orders between approximately 3:30 PM and 4:30 PM EST. Our processing service was being monitored, but not at a deep enough level to detect this specific problem. I will be installing a more thorough monitor shortly.
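
The difference between a shallow and a deep check can be sketched in Python as follows. This is an illustration of the general idea only, not the monitor we run: the is_reachable and submit_test_order callables and the "APPROVED" response string are hypothetical stand-ins, and the example does not describe what the original monitor actually missed.

```python
from typing import Callable, NamedTuple


class CheckResult(NamedTuple):
    ok: bool
    detail: str


def shallow_check(is_reachable: Callable[[], bool]) -> CheckResult:
    """Passes as soon as the service answers at all."""
    if is_reachable():
        return CheckResult(True, "service reachable")
    return CheckResult(False, "service unreachable")


def deep_check(is_reachable: Callable[[], bool],
               submit_test_order: Callable[[], str]) -> CheckResult:
    """Passes only if a full test order goes through end to end."""
    if not is_reachable():
        return CheckResult(False, "service unreachable")
    try:
        response = submit_test_order()   # hypothetical end-to-end test transaction
    except Exception as exc:
        return CheckResult(False, f"test order raised: {exc}")
    if response != "APPROVED":           # assumed success marker, for illustration
        return CheckResult(False, f"unexpected response: {response!r}")
    return CheckResult(True, "test order approved")
```

A service that is up but rejecting every order passes shallow_check and fails deep_check, which is the kind of gap a deeper monitor is meant to close.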

March 18, 2008

Non-Service Impacting: Data Center Power Generator Test on March 19th, 2008 @ 07:00 AM EST

Just a note to everyone: our NYC colo will be performing a power generator test @ 7:00 AM EST on Wednesday, March 19th, 2008.  This is a routine test that ensures that the power system is functioning properly, and we are not expecting any trouble.

March 17, 2008

Payment Processor Response

My paper letter to the CEO of First Data resulted in two phone calls from them, and a very good phone call from the director of their online payment processor. He shared with me the reasons for the three outages we experienced (an upgrade to a new version of their Oracle database, a disk that ran out of space because it wasn't being monitored, and redundant DNS server failures). I told him that the reasons for the failures were important (when they are down, we do not make any money), but that the communication was even more important. Knowing what is going on lessens the frustration from a customer's point of view, which is exactly why we started this blog. Sharing our outages, the reasons for our mistakes, and how we're going to fix them in the future lets our customers know that we try to be perfect, but that we'll admit our mistakes when we aren't.

Hopefully the message got through.  Thanks for the phone call, First Data!

March 05, 2008

Payment Processor Outage

UPDATE: It's back up.  I've sent a letter to First Data's CEO.  We'll see if I get a response ;)

Our payment processor's domain name is not resolving, so we cannot reach it right now. (Neither can other machines out on the internet.) Their status message says the gateway is down.

March 03, 2008

SECURITY FIX: All users requested to upgrade to 6.1.13

Everyone who is currently running FogBugz 6.x is asked to upgrade to 6.1.13 immediately. This release is a security fix and is supplied to all customers. You can download the latest version from https://shop.fogcreek.com with your order number and email address.

http://fogcreek.com/FogBugz/KB/releaseNotes/WhatsNewInFogBugz6.1.13.html

  • Security Fix, applies to FogBugz versions 6.0.0 through 6.1.11
    A security vulnerability in FogBugz API versions 3 and 4 allowed registered users to view and edit some information that they did not have permission to access. It is highly recommended that all customers using FogBugz 6.0 upgrade to the latest version.
  • Improved error handling and error recovery in the FogBugz Maintenance service (heartbeat)
  • Better install support for Gentoo and Solaris
  • Fixed a bug that was causing the screenshot tool to fail for PHP FogBugz
  • Fixed intermittent SQL errors when using filters with sorted columns
  • Sped up link creation in wiki
  • Fixed timeout bug during FogBugz upgrade of MySQL databases