Why Is The Site So Slow? My Eyes Are Glazing Over!
Ross | May 20th, 2009

As I wrote in my post on May 7 – “Fessing Up To Our Mistakes” – we ran into problems earlier this month with scaling our application.

That post has a great deal of detail about those problems, our early efforts to solve them, and the lessons we learned. We know that especially during the day in the Western Hemisphere, our site has been very slow.

From the moment we ran into problems with scaling, we’ve focused 100% on trying to resolve them as quickly as we can. I’ve been personally involved with those efforts and our entire team has been involved either in helping to fix the problems or responding to our users.

I wanted to give you an update on what we’ve done and what’s left for us to do. I am happy to answer specific questions if the update doesn’t answer them – feel free to leave them in the comments to this post.

1. Our priority has been to find a global way to deliver data more quickly to every user, especially during periods of heavy traffic (7 am to 7 pm in the Central Time zone). The problem for us isn’t the number of servers, the data center, or bandwidth. We host with one of the top providers in the world (for which we pay an obscene amount of money every month). As I wrote in the earlier post – our problem is with our existing application and the way that data is stored in and queried from the database. We can’t easily solve that problem with our existing application, so we’ve looked for a temporary solution to work around it. We’ve brought in additional developers and system administrators to assist, and we’re close to a solution. We’ll be doing further testing tomorrow on a test site before we roll out the solution to our production site.

For those who want slightly more technical details – we’ve tried to implement Squid (without success) and Varnish (with some success, and we hope to finalize Varnish tomorrow) to cache much of our information so that we can serve huge numbers of people without the delays you’ve been seeing. In the very short windows where we’ve tested on the production site, this has worked very well. Unfortunately, we’ve run into problems that have required us to roll back. That’s why, last week, some of you saw pages that wouldn’t load – and why we posted a note at the top of our site telling you that some pages might not load.
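
For readers curious about why a cache in front of the application helps, here is a minimal Python sketch of the general idea. It is an illustration only, not how Squid or Varnish are implemented or configured: the `TTLCache` class, the `slow_backend` function, the `/projects` URL, and the 60-second lifetime are all hypothetical.

```python
import time


class TTLCache:
    """Tiny in-memory cache: serve a stored page until it expires."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self.fetch = fetch        # hypothetical function that hits the slow backend
        self.ttl = ttl_seconds    # how long a stored page stays fresh
        self.clock = clock        # injectable clock, handy for testing
        self.store = {}           # url -> (expires_at, body)
        self.backend_hits = 0

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]       # still fresh: the backend is not touched
        self.backend_hits += 1    # missing or expired: ask the backend once
        body = self.fetch(url)
        self.store[url] = (now + self.ttl, body)
        return body


def slow_backend(url):
    # Stand-in for the application plus its database queries (assumption).
    return "page for " + url


cache = TTLCache(slow_backend, ttl_seconds=60)
for _ in range(1000):
    cache.get("/projects")
print(cache.backend_hits)  # 1000 page views, but only 1 trip to the backend
```

The trade-off is freshness: a cached page can be up to one lifetime out of date, which is why pages that change on every request are harder to cache.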

We debated whether we should “test” in our production environment but concluded that unless we subjected the proxies to real load, we could not be sure whether they would work. In fact, they worked beautifully in our test environment and then failed horribly when placed under real load. Many of last week’s problems were the result of those live tests.
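
To show why a light test can look fine while real load does not, here is a small Python sketch of a concurrent load test against a simulated backend. Everything in it is an assumption made for illustration: `run_load` and `fake_page_request` are hypothetical names, and the semaphore of 2 slots and the 10 ms sleep stand in for a shared resource such as a database.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(request, clients, requests_per_client):
    """Call `request` from `clients` threads at once; return all latencies in seconds."""
    def worker(client_id):
        latencies = []
        for _ in range(requests_per_client):
            start = time.perf_counter()
            request()
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=clients) as pool:
        per_client = list(pool.map(worker, range(clients)))
    return [latency for client in per_client for latency in client]


# Simulated shared resource: only 2 requests can be served at a time (assumption).
db_slots = threading.Semaphore(2)


def fake_page_request():
    with db_slots:
        time.sleep(0.01)  # pretend each page needs 10 ms of database time


light = run_load(fake_page_request, clients=1, requests_per_client=5)
heavy = run_load(fake_page_request, clients=20, requests_per_client=5)
print(f"1 client:   worst latency {max(light) * 1000:.0f} ms")
print(f"20 clients: worst latency {max(heavy) * 1000:.0f} ms")
```

With one client, each request only pays for its own work. With twenty, most of the time is spent waiting in line for the shared resource, which is the kind of behavior a lightly loaded test environment hides.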

We are very proud of our customer service team for making sure our community was well informed about the problems, and for dealing with buyers and creatives who were having all sorts of problems on the site. By Friday of last week, we resolved virtually all of the outstanding problems except for the overall sluggish site performance during the day.

2. Our second priority (which we’re pursuing in parallel with what I just discussed) is to audit all of our server configurations, identify all errors and areas where we can improve performance, then test and implement those solutions. This has been an ongoing process and we’ve made numerous improvements that have significantly helped in the off hours, but have had only a marginal impact during the day. We continue to make tweaks, looking for ways to make sure that our servers are performing at 100%.

3. Our third priority is to explore the addition of more servers. If we could fix all problems by deploying more servers, we would.

This is not as simple a solution as it appears because more servers can actually hurt our performance – by putting more demands on the database and our file server. We learned this when we deployed the 2 additional servers that we added to our server farm last week. That’s why at the moment, this is not our highest priority, but we have continued to evaluate this option to make sure that once we find a way to add more servers without incurring the negative impacts, we can do so.
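
A toy calculation makes the point concrete. This Python sketch assumes a deliberately simple model in which the database runs a fixed number of queries at full speed and, past that point, shares its time evenly among them. The function name and every number below are hypothetical and are not measurements from our site.

```python
def query_time_ms(app_servers, workers_per_server, db_capacity, base_ms):
    """Toy model: past its capacity, the database shares time evenly among queries."""
    concurrent = app_servers * workers_per_server    # queries in flight at peak
    slowdown = max(1.0, concurrent / db_capacity)    # no penalty below capacity
    return base_ms * slowdown


# Hypothetical figures: 50 workers per server, a database that handles
# 200 concurrent queries at full speed, and 20 ms per query when idle.
for servers in (4, 6, 8):
    ms = query_time_ms(servers, workers_per_server=50, db_capacity=200, base_ms=20)
    print(servers, "servers ->", ms, "ms per query")
```

In this model, going from 4 to 8 web servers doubles the time each query takes, so every page gets slower even though there is more hardware serving it.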

Once we have stabilized the site and returned performance to levels that don’t embarrass us (and believe us, we are embarrassed about the performance of our site over the last few weeks), we’ll refocus on completing our refactoring efforts promptly, thoroughly test the new code, and deploy it at the earliest opportunity. We are confident that the new code will resolve virtually all of these issues and, more importantly, will allow us to scale cleanly and efficiently.

Please feel free to ask questions. I am happy to get into more technical discussions in the comments if you’re interested or if it’ll help you avoid making some of the same mistakes we’ve made.

Thanks to our entire community for your patience with us as we deal with the real problems of scaling. We continue to be humbled by your confidence in our ability to promptly get past these issues.

Photo credit: law_keven

  • chiz

    Hi Ross,

    “This is not as simple a solution as it appears because more servers can actually hurt our performance – by putting more demands on the database and our file server.”

    If you’re considering adding more servers, I suggest you look into “load balancing”. This is a commonly used technique to resolve server load problems.

    link: http://en.wikipedia.org/wiki/Load_balancing_(computing)

    Regarding testing, you could also do automated load testing, which usually just requires software specifically designed for this purpose. An example is “The Grinder” (http://grinder.sourceforge.net/), which runs on Java.

    I’ve no idea which framework the cS application runs on, but I’m pretty sure there’s equivalent load-testing software designed for that platform.

  • PattyMarler

    Ross,
    I’m very pleased that you’ve given us an update on the progress on the site issues,
    and mostly with the amount of work you and the crew have put into resolving them.

    Thank You!
    Patty Anne :0)

  • Randy James

    I have a lot of experience solving these types of issues, and I offer my help at no cost.

    Randy James
    312-835-4742

  • http://www.crowdspring.com Ross

    @chiz – Thanks so much for the suggestion. We’ve run a hardware load balancer from the day we launched. At the moment, the load balancer is balancing load across all of our servers equally. We’ve experimented with different methods of balancing the traffic but none address some of the core problems we’re trying to solve.

    The automated load testing works well and we’re going to do that before we deploy the new code for sure.

    @PattyMarler Thanks so much – we’re really proud of how our team has come together to work through these issues.

    @Randy Thanks so much for your very generous offer. If we can’t figure things out, we just might take you up on it.

  • lindaeckel

    You give us hope! The transparency of the cS team goes a long way in making us feel like a real community. Thanks!

    But now for another big problem with the page loading: I am on EST on the coast and day is the best time for quick loading, but at night the vampires are sucking the blood out of your servers. I am a night owl and have often just quit waiting for an upload. I come back and check after 15 min, 30 min, etc. Sometimes it ends up working and sometimes I wake up in the morning and it is still sputtering. Once I reload in the AM all is well. Work on that, too, pretty please.

    Linda

  • http://www.crowdspring.com Ross

    @lindaeckel Night presents an entirely different issue. We run a number of important business intelligence processes at night and those are absolutely killing our servers. We run them against a database backup server, but the repercussions are so great that everything seems to be impacted. I spent half the day today working on this and at the end of the day, we made schedule changes to see if we could reduce the night load. I am a night owl too – so I realize how important it is for us to fix this. Tonight is the first night for these schedule changes, so let’s see if this helps.

    The reason things work so well in the evening and in the morning is that traffic is lighter and we run none of those processes during those times. Our goal is to get the entire site working well 24 hours per day. We won’t stop until we reach that goal.

  • lindaeckel

    Sorry, Ross, it didn’t help!
    Last night was the worst mess of all. My entry never did load… it was still trying in the morning, long after the project closed. It is OK though, since the buyer didn’t like the style of my previous entries. Thanks for trying! But keep trying!

  • http://www.crowdspring.com Ross

    @lindaeckel – We fixed the issues during the day but have not yet worked on the night problems. Those have continued. I’m going to make a few more changes today that should address the night issues temporarily as we look for a better solution. I am truly sorry that you were unable to upload the entry in time.

  • tilmonb

    As for the site being slow… Please understand that I love your site… But…

    I think it is time to bring in some heavy hitters. You need to hire some experts at this sort of thing. As a paying customer, I find it has gotten beyond the point of tolerance. I can’t even view my project and its submissions by the artists. I have been patient because I think you have a great concept, but there has got to be a way to fix this thing. If it is in fact a database scalability issue, then you have to go to the biggest, baddest database guys out there and let them figure it out. Give them some benchmarks and then let THEIR experts develop the fix. Tell them you’ll pay a premium if they work out a fix or something. It won’t be long before you are gonna start losing scores of customers AND a competitor is gonna step up and eat your lunch… Sorry, sometimes the truth hurts.

  • thejanitor

    This is all very well, but I am incredulous that you are not putting this issue at the very top of your list of priorities. I am completely unable to use this site in any meaningful way, because of my inability to move from page to page at will. I am in need of some extra work at the moment, and your service would be very useful to me. Please canvass your creative colleagues for a solution and find one SOON!

    I’ve got an idea – put it out as a creative brief!

  • http://www.crowdspring.com Ross

    @tilmonb We’ve had experts working with us to help fix these problems. The inherent problems aren’t related to hardware. The underlying code is simply having difficulty handling the volume (users, entries, etc.). We’ve been working to address this core problem (and looking at everything else that could possibly contribute to it).

    @thejanitor This issue is at the top. Period. Nothing else is higher priority at the moment (or for the past 4 weeks). You can be sure that we’re getting little sleep (and our experts are working around the clock). We know that a good working site is important and that currently, the site is not particularly useful. We are working to change that. And we’ve had a great response from our community, with many people offering to help – very generous offers and we truly appreciate them.

  • thejanitor

    Hi Ross,
    I think I touched a raw nerve – I apologise :o)

  • thejanitor

    Ross, it might be worth asking the folks at Coroflot how they handle traffic: http://www.coroflot.com.

    Rgds,
    Adrian

  • Cliff

    September 22, 2011 Crowdspring is painfully slow and not worth the constant frustration of going nowhere!

  • Jlh

    Stalling and crashing
