June 26, 2008

Update on the On Demand Errors

update: the tunnel changes are in and have been verified. We expect the intermittent error pages when viewing your account information to be no more.

Alright - we've tracked down the intermittent errors (previously mentioned here) to our IPSec configuration that connects our various data centers together. We have worked out an improved configuration, tested it in the lab, and will implement it tonight.

Customers should not expect an interruption in service outside of intermittent errors when viewing the account and user info pages. As soon as our changes are in place, I will update this post and your pain should be no more.

I apologize to any who were affected. This was an unexpected side effect of the upgrade that I performed this weekend, and although it will bring greater stability in the long run, it is unfortunate to have these sorts of problems up front.

Intermittent Errors With FogBugz On Demand

We are getting reports of intermittent errors when users are visiting account and user information pages on FogBugz On Demand.  We believe this to be a side effect of the maintenance this weekend, and are investigating right now.

June 25, 2008

Copilot Survey Duplicate Mail Update

First, allow me to apologize to all of the customers who received multiple copies of our email offer that went out yesterday at approximately 3:00pm. We are seriously embarrassed that this happened.

There were several technical issues in our mass-mailing process that ultimately led to a subset of our customers receiving several copies of the message. The proximate cause of the problem was that FogBugz incorrectly interpreted an error message from our SMTP server, and instead of stopping, it tried to resend the message several times. Luckily, our awesome system administration team noticed the unusual traffic and shut down the mail server, preventing an even bigger problem.

The error that the mail server was throwing at FogBugz said that there were too many recipients for a single message. Postfix has a default recipient cap of 1000 addresses per message. Since each of the offer emails was destined for over 1000 recipients, we were in trouble. Ultimately, this configuration issue and the bug in how FogBugz handled this particular error condition worked together to confound us and send unnecessary extra emails to you.
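For illustration only, here is a minimal sketch of one way to stay under a per-message recipient cap: split the recipient list into batches before handing anything to the mail server. This is not the process we actually use. The function name, the batch size of 500, and the de-duplication step are all assumptions made for the example.

```python
from typing import Iterator, List, Sequence

# Assumption: a batch size chosen to sit comfortably below the server's
# per-message recipient cap (the cap described above is 1000 addresses).
MAX_RECIPIENTS_PER_MESSAGE = 500


def batch_recipients(
    recipients: Sequence[str],
    batch_size: int = MAX_RECIPIENTS_PER_MESSAGE,
) -> Iterator[List[str]]:
    """Yield de-duplicated recipient lists of at most batch_size addresses."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    seen = set()
    unique: List[str] = []
    for address in recipients:
        cleaned = address.strip()
        key = cleaned.lower()  # compare addresses case-insensitively
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)

    # Hand out the unique addresses in fixed-size slices.
    for start in range(0, len(unique), batch_size):
        yield unique[start:start + batch_size]
```

Each batch would then go out as its own message, so no single message exceeds the cap, and an address that appears twice in the input is only mailed once.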

We have begun the process of triaging the emails that did go out, and rest assured that those folks who received a message from us will not receive another copy. We will, starting immediately, be using a different process for sending this kind of email. The bug that we found in FogBugz has already been fixed, and that fix is being tested as I write this.

Please accept our sincere apology for this inconvenience, and if there is anything we can do for those affected, don't hesitate to contact us at support@copilot.com.

June 24, 2008

Fog Creek Copilot Survey duplicate mail

Earlier today our mail server started sending out multiple copies of mail messages for a survey we are running.

We're investigating -- but first I'd like to offer my apologies for the duplicate mails you may have received for our Fog Creek Copilot survey.  We're still not sure what happened, but we'll update you as soon as we know.

We're embarrassed, that's for sure.  If there's anything we can do to make it up to you personally, email us and let us know.

June 23, 2008

UPDATE: FogBugz On Demand Slowdown

Some customers may be experiencing a slowdown in service at our LA data center.  We are currently investigating.  We will post updates as soon as we have some info.

UPDATE: As of 12:40PM EST, the slowdown should be fixed.  If you are having any other issues, let us know immediately.

June 18, 2008

FogBugz On Demand Maintenance This Weekend (Sunday, 6/22)

A notice to our FogBugz On Demand customers:

This weekend, on Sunday, June 22, from 00:01 to 05:00 EST (GMT -0500), we'll be taking one of our On Demand data centers offline to perform some upgrades.  We are moving the equipment to a larger area so that we can grow without trouble. We will also be implementing a much more resilient network configuration with redundant paths for all hosts, and installing several other devices that will improve our behind-the-scenes management of On Demand.

During this window, your On Demand accounts may be unavailable.

If you have any questions, please feel free to contact us at customer-service@fogcreek.com or by calling us at 1-866-FOG-CREEK (1-866-364-2733). Those that are outside North America can reach us here: +1-212-279-2335.

June 17, 2008

Copilot Slowness

update: We have fixed the bug in the process, and are investigating avenues of isolating the Copilot application further from other Fog Creek applications.

Due to some runaway processes, we are currently experiencing some Copilot slowness.  We have isolated the issue, and I will post more once things are under control.

June 09, 2008

Copilot Free Weekends

Over the weekend the Copilot website was not honoring the free weekends offer.  It seems that there was a regression in the new website that we missed.  Everyone who purchased a day pass over the weekend will be refunded and given a free day pass to use any time.  This post will be updated when we have a better idea of why this problem occurred.

UPDATE: We have determined that the error affected 49 people, all of whom have been refunded.

Why did the Copilot website not offer free passes this weekend?
As you may have noticed, we launched a new website design this past week, along with many small improvements.  As it happened, free weekends were added to the old website while the new site was under development.  Because of a bad merge, the free weekend code was not properly ported to the new site, and the developer doing the merge overlooked the omission.

Why?
The developer doing the merge (ok, I admit, it was me) did not have this particular merge code reviewed by a peer.

Why?
Until recently we did not have an effective tool for tracking code reviews.  Such a tool has been developed by the Copilot developers, and will be put into use in the coming weeks.  That said, the error should have been caught in testing, and was not.

Why?
Because it is a difficult feature to test.

Why?
Because it requires a QA analyst to do regression testing on a weekend to be absolutely certain that the feature works, since we can't just go around changing dates on production servers to make our lives more convenient.

In the future, our new code review tool will help identify omissions before they make it to QA.  Additionally, internal servers will be set to weekend dates for testing this particular feature.  Finally, a QA analyst will log in Saturday mornings after new builds have been pushed to ensure that the free passes are being honored.
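As a rough sketch of why date-dependent features like this are easier to test when the date is passed in rather than read from the server clock, consider the following. It is not the Copilot code. The function names, the Saturday/Sunday definition of "weekend", and the prices used are assumptions for the example.

```python
from datetime import date
from typing import Callable


def is_free_weekend(today: date) -> bool:
    """Return True on Saturday or Sunday (assumed definition of 'weekend')."""
    # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    return today.weekday() >= 5


def day_pass_price(base_price: float, clock: Callable[[], date] = date.today) -> float:
    """Price of a day pass; free when the injected clock reports a weekend."""
    return 0.0 if is_free_weekend(clock()) else base_price


# A test can pin the "current" date without touching any server clock.
saturday = date(2008, 6, 7)
monday = date(2008, 6, 9)
weekend_price = day_pass_price(5.0, clock=lambda: saturday)
weekday_price = day_pass_price(5.0, clock=lambda: monday)
```

With the clock injected this way, a regression test for the free weekend offer can run on any day of the week. It would complement, not replace, the Saturday morning check described above.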

June 03, 2008

Update: Copilot Issues for Internet Explorer Users

We've tracked down the problem and have a fix out for it.  IE users should now be able to download the Copilot clients once again.

The problem was due to a known issue in which Internet Explorer fails to download files over SSL if the Cache-Control header is set to no-store or no-cache.  More information can be found from Microsoft:

http://support.microsoft.com/kb/323308
http://support.microsoft.com/kb/812935

We did not catch this during our testing because our test servers were not running SSL.  In the future we will add this check to our tests.
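A check along these lines could look something like the sketch below. It is only an illustration, not our actual test: the function name is made up, and it assumes the response headers for a download served over SSL are available as a simple name-to-value mapping.

```python
from typing import Mapping

# Cache-Control directives named above as breaking IE downloads over SSL.
BLOCKING_DIRECTIVES = {"no-store", "no-cache"}


def blocks_ie_ssl_download(headers: Mapping[str, str]) -> bool:
    """Return True if the Cache-Control header contains a blocking directive."""
    for name, value in headers.items():
        if name.lower() != "cache-control":  # header names are case-insensitive
            continue
        # Split "private, max-age=0" into bare directive names.
        directives = {
            part.split("=", 1)[0].strip().lower()
            for part in value.split(",")
        }
        if directives & BLOCKING_DIRECTIVES:
            return True
    return False
```

A test suite could call this on the headers returned for each client download URL over SSL and fail the build if it ever returns True.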

Copilot Issues for Internet Explorer Users

We are getting reports (and have been able to verify) that Copilot users who are running Internet Explorer 7 are unable to download our helper application.  We are working to resolve this right now.

Please note that all other browsers are working properly.  This is only an issue with Internet Explorer.