Not Too Proud To Say “We’re Sorry”

Ross | June 5th, 2009

We just went through one of the most humbling experiences for our company since we launched in May 2008. We experienced serious site performance problems. While our team (and numerous consultants on several continents) worked around the clock, it took us some time to identify the root causes of the problems and to fix them.

We realize that the performance issues made it very difficult to use the site. We did our very best to communicate about our efforts to solve the problems (in this blog, via email and on Twitter).

Yesterday, we believe we turned the corner and got a handle on the problems. Today, we sent the following email to our community of creatives:

Next week, I’ll have a detailed post (or series) explaining the problems and our efforts to solve them. We want others to learn from this experience and we’re happy to share.

We believe in transparency. The response from our community to that email has been truly wonderful.

We love our community. And we take every opportunity to let our community know this.

  • angrypuppy

    I thank you for addressing this. I have had difficulties since the beginning. But I am sad to say, I am still waiting almost one minute per click on the site. I’m sure you are just as frustrated as the users of the site. I will keep checking back to see if it improves. I do like the site and would love to use it. Just not this way.

  • Kinlochbervie

    LOVE that you guys are open with communication! While some issues may be irritating, I think we all know how fickle technology can be. Thanks for your candor!

  • http://www.crowdspring.com Ross

    @angrypuppy – We made a really really stupid mistake today. When we sent our email to tens of thousands of users, we didn’t think they’d visit right away. At once. Our servers were completely overloaded for about four hours. We know that our existing software can’t handle that much traffic at once (that’s why we’re refactoring 100% of it). All looks good at the moment and until we push the new software, we won’t be emailing everyone at once. Sorry about that.

    @Kinlochbervie Thanks so much for those kind words, as we wipe the embarrassment off our faces (see note above in response to @angrypuppy)
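
One way to avoid emailing everyone at once is to split the recipient list into batches and space the sends out. The following is only a minimal sketch of that idea in Python, not what crowdSPRING actually ran; the function name `staggered_batches`, the batch size, and the one-hour gap are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterator, List, Sequence, Tuple


def staggered_batches(
    recipients: Sequence[str],
    batch_size: int,
    gap: timedelta,
    start: datetime,
) -> Iterator[Tuple[datetime, List[str]]]:
    """Yield (send_time, batch) pairs, each batch `gap` after the previous one."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for batch_index, offset in enumerate(range(0, len(recipients), batch_size)):
        send_time = start + batch_index * gap
        yield send_time, list(recipients[offset:offset + batch_size])


if __name__ == "__main__":
    # Hypothetical recipient list; a real one would come from the user database.
    users = [f"user{i}@example.com" for i in range(10)]
    schedule = staggered_batches(users, 4, timedelta(hours=1), datetime(2009, 6, 5, 9, 0))
    for when, batch in schedule:
        print(when.isoformat(), len(batch))
```

Spreading the batches over several hours trades delivery speed for a flatter traffic curve, so the first wave of visitors has finished browsing before the next wave arrives.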

  • Ntelekt

    I was one of the thousands who received your email yesterday and jumped right on the site to check it out. I located a project that I wanted to participate in and was able to get all the details — although doing so reminded me of the good old days of connecting to Compuserve with a 300bps dial-up modem. I then spent the next 2-1/2 hours designing and trying to upload my bid to the site. I finally gave up and for the cherry on top, I missed the bid because the bidding window closed.

    I would call that an unfavorable result for your stress test.
    However, I would encourage you NOT to stop sending out the email notices when you think you have things under control. Rather, you should continue to do so, but as an invitation to stress the system to find its next weakest link…not an announcement of all is well and good.

    As a further suggestion, perhaps you could look into adding a snippet of server code that can detect when the service begins to logjam and could automatically extend bidding windows so that projects do not close out during a period where the system is incapable of functioning normally.

    It would seem prudent to me that if you are going to perform a stress test – which you should – then you could at least take some precautions to ensure that projects are not adversely affected because of it.

    That’s my 2c – otherwise, keep up the good work!

    –MikeN
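
The automatic-extension idea suggested above could look roughly like the following. This is a sketch under stated assumptions, not crowdSPRING's code: it assumes some monitoring system already records the periods when the site was degraded, and the names `DegradedPeriod` and `extended_deadline` are hypothetical. It pushes a project's deadline back by however much degraded time fell inside its bidding window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class DegradedPeriod:
    """A span of time during which the site was too slow to use (assumed input)."""
    start: datetime
    end: datetime


def extended_deadline(
    opened: datetime,
    deadline: datetime,
    degraded: Iterable[DegradedPeriod],
) -> datetime:
    """Return the deadline plus the degraded time that overlapped the bidding window.

    Assumes the degraded periods do not overlap each other.
    """
    lost = timedelta(0)
    for period in degraded:
        overlap_start = max(period.start, opened)
        overlap_end = min(period.end, deadline)
        if overlap_end > overlap_start:
            lost += overlap_end - overlap_start
    return deadline + lost


if __name__ == "__main__":
    # A four-hour overload inside a one-week bidding window.
    outage = DegradedPeriod(datetime(2009, 6, 5, 12, 0), datetime(2009, 6, 5, 16, 0))
    new_deadline = extended_deadline(datetime(2009, 6, 1), datetime(2009, 6, 8), [outage])
    print(new_deadline.isoformat())  # 2009-06-08T04:00:00
```

The sketch makes a single pass, so it does not account for further degraded time that falls inside the extension itself; a production version would need to decide how to handle that.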

  • mudmaven

    I, too, continue to experience great difficulties with the site. This afternoon I have been unable to access mySpring and cannot submit materials. I really love crowdSpring but using the site is excruciatingly frustrating.

    I know you are working on it and hope you are able to resolve the difficulties soon.

  • http://www.crowdspring.com Ross

    @Ntelekt – yes, the days of the 300bps modem. Mighty Mite for the Commodore 64 for me. I am very sorry about the site issues. It indeed was an unfavorable result for the stress test. We truly thought we turned the corner, but we had not. In fact, a bunch of us put in a very long day yesterday to continue stabilizing things and again today – at the moment, the site is running nicely. We’re finding problem areas and fixing them, but the bottom line is that the core of our software – the content management system we are currently using – is severely broken and must be replaced. We’re doing our best to keep it on life support.

    Great idea about automatic extensions. I don’t think we will do that with our existing code (we’re pushing 100% new code in 4-6 weeks), but it is definitely something to think about with that code.

    @mudmaven We are working hard. I had to re-introduce myself to my kids this morning – before sitting down at the computer for much of the day. mySPRING still presents some issues and we need to fix a few things in there on Monday. This morning, the site worked really well. This afternoon, for a few hours, it sucked. And now, after a few minor fixes, it’s working really well again. We know how frustrating this is for everyone (for us too) and we truly want to get back to the great experience we know you all love and deserve.

  • N.Jonathan

    Perhaps identifying the source of the problems will help you avoid the issues that you have faced.
    I hope that you can make up for all that was lost.
    -Jonathan
    http://www.p2w2.com/jonathan

  • ranflyer

    still problematic — the difficulty is in wrapping up a project and without that, a designer is not getting paid. Will keep trying.

  • randyjames@itperformanceexperts.com

    Please have your system admins look at the capacity of your disk subsystems (I/O).

    With the size of the files you are moving around, the storage subsystem has to be adequate: the number of disks, sector sizes, RAID type (I sure hope you are using RAID 19 and NOT RAID 5), controller cache size and settings (read/write percentages), the amount of bandwidth from storage to the database server, the HBA settings, the bandwidth and TCP/IP tuning to and from the web servers, the number of threads enabled on the database server, the number of connections enabled on each web server and database server, etc.

    If you don’t have the database spread over at least 20 15K drives, and using large stripe and sector sizes, any other attempt to “tune” the application will fail. You are moving huge files around (images), and this takes HUGE bandwidth between storage and database and application servers.

    You may want to investigate the use of solid state disk as a temporary solution – I can hook you up with demo hardware to test.

    I agree that the performance has been horrific – I’m just about finished as a customer.

    Most developers, DBAs, and system admins know nothing about performance optimization – and worse, are afraid to ask for help.

    As to the comment “we all know how fickle technology can be” – that is an excuse by IT people who don’t know the answers. It can be better – it has to be better, or this business will fail.

  • randyjames@itperformanceexperts.com

    Should be RAID 10 (striped and mirrored), not 19 (a typo). 99% of sites use the wrong RAID settings – RAID 5 causes a lot more writes than RAID 10. This is a significant issue in database apps.
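
The claim that RAID 5 causes more writes than RAID 10 can be made concrete with the commonly cited write penalty: a small random write costs roughly 4 disk operations on RAID 5 (read data, read parity, write data, write parity) versus 2 on RAID 10 (one write per mirror). A rough sketch of the arithmetic follows. The 180 IOPS per 15K drive and the 50% write mix are illustrative assumptions, not measurements from this site, and controller caches or full-stripe writes change the picture in practice.

```python
# Disk operations needed per small random write (rule-of-thumb values).
WRITE_PENALTY = {"raid10": 2, "raid5": 4}


def effective_iops(disks: int, iops_per_disk: float, write_fraction: float, level: str) -> float:
    """Estimate host-visible IOPS for an array, given a read/write mix.

    Each read costs one disk operation; each write costs WRITE_PENALTY[level].
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw_iops = disks * iops_per_disk
    cost_per_request = (1.0 - write_fraction) + write_fraction * WRITE_PENALTY[level]
    return raw_iops / cost_per_request


if __name__ == "__main__":
    # 20 drives as suggested in the comment above; 180 IOPS per drive is assumed.
    for level in ("raid10", "raid5"):
        print(level, round(effective_iops(20, 180, 0.5, level)))
    # raid10 2400
    # raid5 1440
```

With a write fraction of zero the two levels come out equal in this model, which is why the penalty matters mainly for write-heavy database workloads.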

  • mariarti

    That’s what I get when I try to open any crowdSPRING page:

    Fatal error: Undefined class name ‘ezuserloginhandler’ in /var/www/html/cms/extension/cookieuser/sso_handler/ezcookiessohandler.php on line 105
    Fatal error: eZ publish did not finish its request

    The execution of eZ publish was abruptly ended, the debug output is present below.

    ugh…

  • angrypuppy

    How is it going for everyone else? I am still having problems.

    I’m going through design withdrawals!

  • http://www.crowdspring.com Ross

    @N.Jonathan We’re doing our best to find the root cause of each of the issues. The underlying problem is that the content management system we’re using simply cannot handle the load. Pure and simple.

    @ranflyer I am sorry about that. I assure you we keep working to fix things but we realize that doesn’t make it any less frustrating for our community.

    @randyjames Thanks very much for those thoughts. I/O is no longer an issue. (It was a big issue 4 weeks ago, but we solved it pretty quickly; the I/O problems had masked issues in other areas.) We run RAID 10 for the database (because of writes) and have done so from the month we launched (we upgraded 3 weeks after we launched in May 2008), and RAID 5 for the file server – in many applications, RAID 5 is as fast as RAID 10 for reads, if not faster. Plenty of bandwidth, everything is tuned (I’ve personally reviewed every setting on every box 5 times and have worked with top experts at our host – Rackspace – to optimize them). We’ve ruled out all hardware- and bandwidth-related issues. Our images aren’t in our database, so that isn’t the issue, and we don’t move the images back to the application servers – we serve them straight up from the file server.

    Part of the problem is that our CMS (which is the core of our application) doesn’t let us split reads and writes. This makes it tough to properly distribute the load.
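
For readers unfamiliar with the term, splitting reads and writes usually means sending writes to a primary database and spreading read queries across replicas. Below is a minimal sketch in Python under that assumption. The class name `ReadWriteRouter`, the server labels, and the "starts with SELECT" rule are hypothetical and unrelated to the CMS discussed here; a real router would also have to handle transactions and replication lag.

```python
import itertools
from typing import Iterator, Optional, Sequence


class ReadWriteRouter:
    """Pick a database server for each statement: replicas for reads, primary for writes."""

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        self.primary = primary
        # Round-robin over the replicas; None means "no replicas configured".
        self._replicas: Optional[Iterator[str]] = (
            itertools.cycle(replicas) if replicas else None
        )

    def route(self, sql: str) -> str:
        """Return the server label that should run this SQL statement."""
        words = sql.split(None, 1)
        is_read = bool(words) and words[0].upper() == "SELECT"
        if is_read and self._replicas is not None:
            return next(self._replicas)
        # Writes, unknown statements, and reads without replicas go to the primary.
        return self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
    print(router.route("SELECT * FROM projects"))           # replica-1
    print(router.route("UPDATE projects SET title = 'x'"))  # primary
```

An application that cannot be told which connection to use for which statement has no place to plug in a router like this, which is the kind of limitation the paragraph above describes.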

    Thanks for the offer of solid state. Something that we might look at down the line but the timing doesn’t help at the moment, unfortunately (and our DB is in memory, so the impact to the DB would not amount to much).

    I agree that technology can be better and has to be better or we’ll fail. Pure and simple.

    @mariarti Sorry about that. Very frustrating errors. The site is working much better between about 6 pm and about 9:30 am CDT, and we’re continuing to work on improving it during the day.

    @angrypuppy Still up and down. And we’re bummed that we’re spending all our time fixing things rather than helping all of you get great projects and build your client lists. Real bummed.

  • cloud168

    Fatal error: Undefined class name ‘ezuserloginhandler’ in /var/www/html/cms/extension/cookieuser/sso_handler/ezcookiessohandler.php on line 105
    Fatal error: eZ publish did not finish its request

    The execution of eZ publish was abruptly ended, the debug output is present below.

  • http://www.crowdspring.com Ross

    @cloud168 Please delete your crowdspring.com cookie and log in again. All should be well.

  • johnjamesjacoby

    Good luck guys. I know how frustrating this can be. If it makes you feel better, my absenteeism isn’t your fault directly. I actually find myself having so much work lately that I just don’t have time to stop by anymore. Don’t worry though, I’ll be back from time to time. I like the service and I like the way you run your site, and that’s what keeps me coming back.

  • http://www.crowdspring.com Ross

    @johnjamesjacoby Great to hear you’re keeping busy with work – always music to our ears. Thanks so much for the kind words. It’s frustrating for sure, but we’re getting through it and, along the way, improving how things are set up, which will help us once we roll out the new code.

  • nathank

    The site is working much better now fellas. Good job on clearing up your issues.

  • http://www.crowdspring.com Ross

    @nathank We still have issues during the day (Chicago time) from about 9:30 am to about 6:30 pm, but we’re continuing to work to improve things. Outside those hours, the site seems to be nice and quick. We made further changes yesterday, so we’ll see how things hold up today.

  • Susan

    This is the error that I get. I can’t get any further than this page.
    I can’t even view your projects without getting this message.
    Please help. 6-10-09
    Fatal error: Undefined class name ‘ezuserloginhandler’ in /var/www/html/cms/extension/cookieuser/sso_handler/ezcookiessohandler.php on line 105
    Fatal error: eZ publish did not finish its request
    The execution of eZ publish was abruptly ended, the debug output is present below.

  • http://www.crowdspring.com Ross

    @Susan I’m sorry about that. There’s a note at the top of our site – you’ll need to delete your cookie for crowdspring.com – Here’s a link that will explain how to do it: http://forums.crowdspring.com/showthread.php?p=8742#post8742

  • Bryan

    I thought the problem was solved. Unfortunately, the site is still slow for me.

  • cloud168

    Hello Ross,

    Is “filter your search” on the browsing page temporarily disabled or gone forever?
    Thanks for the info. :)

  • defunktees

    @defunktees I would love to post a project, but how can I commit to something designers are walking away from and/or unable to submit a comp to?

  • DevCloud

    It is all well and good sending an apology to the designers, but what about the people who have had projects live during this period? They may not have had the entries against their projects that they could have expected had the site been running fine. This group has not been sent anything, and yet it is they who are currently providing the revenues but not necessarily seeing the value…

  • http://www.crowdspring.com Ross

    @Bryan Not sure why your experience is slow. Site has been pretty quick last 2 days straight. Can you please email me at: bryanfromblog@crowdspring.com and let’s look into this further…

    @cloud168 Temporarily disabled to help with site performance. Will be back in 4-6 weeks when we push new code (and we’ll improve it too)

    @defunktees Good point. I assure you our community is strong and growing. We’ve been really fortunate to have great support from the designers and I think you’ll be proud of the work you’ll get on crowdSPRING (our recent troubles notwithstanding).

    @DevCloud Good point – and apologies for the site issues. We’re working with buyers in all affected projects. I am seeing some nice work in your project, and if you think you need us to extend by a few days as a result of the problems the first couple of days after you posted your project, we can do that. Just click “contact us” and our customer service team will be happy to help.

  • espeters

    I have not been able to get on the site all day – it is a bit frustrating.
    Thanks

  • http://www.crowdspring.com Ross

    @espeters Sorry about that. If the problem is with logging in, you might need to delete your crowdspring.com cookie. Here’s how to do it: http://forums.crowdspring.com/showthread.php?p=8742#post8742

    If it’s something else, please write to me at: espeters@crowdspring.com and I’ll be happy to help you.

  • defunktees

    The site is down again. I posted a project but now I cannot review it. If this continues I would expect an extension of the number of days I continue to have issues. Gaining access every other day does not cut it for paying customers…

  • http://www.crowdspring.com Ross

    @defunktees In the past three days, the site has been down for 10 minutes (yesterday around lunch Chicago time for maintenance). Otherwise, it’s been up and working well. Here’s a link to your project:

    http://www.crowdspring.com/projects/graphic_design/logo/tee_shirt_company_logo

    Are you having issues logging in or seeing your project? I see that it was posted yesterday right after we did our maintenance outage – so that should not have affected the project.

  • melissamax

    Hello! Can’t get on! Working on the problem? Have contests that I am in and cannot check on them or enter any more until issues are solved! Really like this site, but hope the down problems are over soon.

  • http://www.crowdspring.com Ross

    @melissamax – we might have found the user login problem. We just pushed a fix so please clear your browser’s cache, close/reopen your browser and try again…

  • ruiponce

    If you subtract functionality from the site (“filter”) to fix it, you are not really fixing it. Promising to bring back that functionality does not fix it either.

    You guys have a great idea and a great vibe. It would be truly astonishing if you burned through this hard-to-get stuff just because you can’t sort out the easy stuff.

    It is absurd to pretend that server / database / virtualization issues are complex. In fact, it is borderline suicidal to broadcast to the community at large that you have major issues with basic scaling up.

    With respect, admiration and love from an old-timer who has seen many come and go. Time to wake up and stop buying snake oil (I want to believe you are not selling it).

  • http://www.crowdspring.com Ross

    @ruiponce Thanks for contributing to the discussion. You are right – it is highly likely that features like the filter aren’t core contributors to the site performance problems. But in our testing, they put a heavy load on the database and we wanted to be a bit conservative and remove them for now (we’ve debated bringing the filters back now, but much prefer to focus on completing the new code and avoiding any performance issues on the current site). They’re working well in our refactored code and will be back shortly.

    As for broadcasting to the community scaling issues – those issues are real. The complexity isn’t in the solutions – it’s in the fact that the core problems are with code that we cannot easily revise at the moment – which means most of the solutions we want to implement are not possible. That’s been perhaps the most frustrating part of responding to the performance issues. I’ll be writing more about that in the blog in the coming weeks.
