By Lambert Strether of Corrente.
Yves and I have spent an inordinate amount of time over the last year trying to find a satisfactory web host. Yves estimates the time she’s spent on this pleasant duty would be equivalent to at least half, perhaps as much as 2/3, of the time it would take to write a book, so the opportunity cost has been significant. And that leaves aside the stress, not just of the search, but of the reason we have to make the search: the site keeps going down at unpredictable times for unpredictable intervals. (Actually, that’s not quite correct: I can recall at least two outages right before Links were about to go live! Perhaps the site senses the stress of the operator…)
So, first, I’m going to go over our experiences so you understand what we’re going through; then I’m going to look at the bundle of services involved in web hosting, and solicit your views on the bundling approach and your suggestions for possible vendors.
We examined a large number of hosts seriously and got into extended due diligence and negotiations with four, to the point of scheduling data transfers that were aborted with two, and completed with one (see this post on WP Engine to give you an idea of how time-consuming and surreal — in a word, devolved — our experiences with web hosting companies have been).
Our current tech dude, who both administers WordPress (WP) and runs the server, initially seemed to be a step in the right direction, since he tuned the site to improve performance, which had been an issue, and was generally quick to act when we had spambot attacks, by updating the server’s blacklist (though it’s not clear why CloudFlare, which we use, needed the assistance). In fact, we’d been hopeful that any problems could be worked through methodically; until the last, very painful, multi-hour outage.
Our previous tech dude bundled web hosting functionality in a different way: He not only administered WP but handled plug-ins and design, and also set us up with a self-managed server at a perfectly reliable big iron shop. Then, however, when the site went down, the big iron guys would say they saw no activity spike and no reason for the site to have crashed on their end. They’d restart the server, and it would fall over again. So Yves would have to get the tech dude, who did know WP, to flush out whatever the cause of the crash was, and fix it. The problem: He was not 24/7, and so if the site went down when he was unavailable, NC could be out for a few hours, which would make Yves crazy.
But the root of all the problems seems to be WordPress. If Microsoft built a blogging platform, it would be WordPress: feature-bloated, breakage prone, and demanding constant updates. (The weakest part of WordPress, where it notoriously scales badly, is its database design. And because Naked Capitalism updates frequently, thanks to our active comments section and our refusal to turn that functionality over to the justly hated Disqus, we run the database very hard.)
Now, readers might logically suggest a hosting service that markets itself to the WordPress community as a solution, particularly since the big ones, like WP Engine (we tried them), offer 24/7 service. However, based on our experience, we are now increasingly of the view that the WordPress hosting model might be a ruse to generate large upcharges for service that is not substantively very different.
For instance, WP Engine makes great claims about being optimized for performance, as in speed, and while that is nice to have, our big concern is uptime. But if the site goes down due to some weird seize-up in WordPress, our assumption is that 80% of the time it is probably something that isn’t complicated (as in there are 3-5 obvious things to check) and that a not-hugely-skilled WordPress person could bring http back up, or reboot the database, and clear out all the caches, run database integrity checks, etc. Our expectation was that someone who marketed themselves as a WP host would run those basic checks if a site fell over and implement those simple correctives.
But that isn’t what WP Engine offers. They told us that (basically) they guaranteed 100% uptime from the LAMP stack*, but if WP seized up, that was our problem. The most they would do is walk us through some troubleshooting. So WP Engine’s value-add was where, exactly? Anyhow, the site can fall over when Yves and I are not on duty, and we’d expect a host that markets itself as a WP host to be willing to do basic correctives if we gave them permission to intervene. But n-o-o-o-o-o!
So now the solicitation part:
As you can see, we’ve tried bundling — unbundling, debundling, rebundling — the following tasks in several ways across several vendors over time:
[1] WordPress Plugins, Design, and Maintenance. Here I think we’ve come out of the stormy seas into the safe harbor [touch wood]. The site looks classy, we have a ticketing system, and so on.
[2] Server Hosting (the big iron). In essence, the LAMP stack: building it, keeping it current, restarting it on demand. This is a 24/7 task in that a warm body must be available to handle a restart.
[3] WordPress Tuning. By tuning, I mean making sure the various aspects of the LAMP stack are tuned to our load, speed, and reliability requirements, in addition to making sure that settings on the WordPress dashboard are as we would wish. Our current tech dude seems to have addressed this effectively, but it’s hard to know how much is due to WordPress tuning specifically, and how much is due to CloudFlare or other server administration factors.
[4] Server Administration. Administering the entire LAMP stack for load, speed, and reliability requirements beyond those of WordPress; for example, security, preventing spam attacks, and so on. This is a 24/7 task that requires a level-headed and skilled technician, who must also understand the interactions between generic LAMP stack optimizations and the WordPress Tuning at [3]. It would be possible to unbundle this task as follows:
[4a] Routine Service Call: 24/7 crash solution: Bring http back up, and/or reboot the database, and/or clear out all the caches, run database integrity checks. I (lambert) do this sort of thing at my own blog, but I’m not 24/7 and in any case I should probably be blogging.
[4b] Technical Service Call: Given a crash, why did it happen? What, in tasks [2], [3], or [4], must be adjusted to prevent a crash from happening again?
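To make [4a] concrete, here is a rough Python sketch of a scripted "routine service call". Everything in it is an assumption:

- the service names, which vary by distribution;
- the WordPress path;
- the use of WP-CLI's `wp cache flush`, which clears only the object cache, not every caching plugin's page cache;
- `mysqlcheck`, which needs database credentials.

It defaults to a dry run that only lists what it would do.

```python
import subprocess
from typing import List, Sequence

# Hypothetical [4a] runbook. Debian/Ubuntu name the services "apache2" and
# "mysql"; other systems use "httpd" or "mariadb". The path is an assumption.
RECOVERY_STEPS: List[Sequence[str]] = [
    ["systemctl", "restart", "mysql"],                 # reboot the database
    ["systemctl", "restart", "apache2"],               # bring http back up
    ["wp", "cache", "flush", "--path=/var/www/html"],  # WP-CLI object-cache flush
    ["mysqlcheck", "--all-databases", "--check"],      # read-only integrity check
]


def run_recovery(steps: List[Sequence[str]], dry_run: bool = True) -> List[str]:
    """Run each step in order, stopping at the first failure so a human
    (the [4b] technician) can look before anything else is touched."""
    log: List[str] = []
    for cmd in steps:
        line = " ".join(cmd)
        if dry_run:
            log.append(f"DRY RUN: {line}")
            continue
        result = subprocess.run(list(cmd), capture_output=True, text=True)
        if result.returncode == 0:
            log.append(f"{line}: ok")
        else:
            log.append(f"{line}: FAILED ({result.returncode})")
            break  # don't keep restarting things blindly
    return log


if __name__ == "__main__":
    for entry in run_recovery(RECOVERY_STEPS):  # dry run by default
        print(entry)
```

Stopping at the first failure is deliberate. If the database won't come back, restarting the web server on top of it just hides the real problem from whoever handles [4b].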
We could, I suppose, find one vendor that bundled tasks [2] through [4] (since [1] is covered, for now): a managed server with 24/7 support whose tech people had decent WordPress tuning chops. However, the word “saga” wouldn’t be in the headline if we’d managed to find that one vendor, and we’ve tried, as you see.
Or we could create our own bundle from several vendors. For example, we could find our own Big Iron at [2], call in a WordPress Tuner at [3] for a consulting fee, rather as one would call in a piano tuner, and then find a 24/7 Administration solution at [4]. That would have to be at least two people: perhaps one tech lead to handle [4b], and then others for the evening and night shifts at [4a], depending on time zones. The downside, of course, is that we have more co-ordination to do.
Readers, does this breakdown of task bundles seem reasonable? What would you add? Or subtract? And crucially, can you suggest any vendors for the tasks, bundled whichever way?
NOTE * LAMP Stack (adapted from): They call it a stack because each layer builds on the layer beneath it. Your operating system, Linux, is the base layer. Then Apache, your web server, sits on top of Linux. Then your database, MySQL (or MariaDB or MongoDB…), stores all the data served by Apache, and PHP (or Perl or Python…) is used to drive and display all the data as web pages, and handle user interactions that build new pages.
NOTE ** The trade-off with CloudFlare is speed vs. freshness: Pages are cached up there in the Cloud somewhere, which makes delivering them to you faster, but comments may take a few minutes to make it into the cache.
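A toy model of that trade-off, in Python. This is not how CloudFlare works internally. It only assumes a cache that keeps serving a stored copy of a page until the copy is `ttl` seconds old, which is enough to show why a new comment can lag.

```python
import time
from typing import Callable, Dict, Tuple


class TTLPageCache:
    """Toy edge cache: serves a stored copy of a page until it is `ttl` seconds old."""

    def __init__(self, render: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.render = render  # slow path: ask WordPress/MySQL for the page
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = self.clock()
        hit = self._store.get(path)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]           # fast, but possibly stale
        page = self.render(path)    # expired or missing: regenerate from origin
        self._store[path] = (now, page)
        return page
```

With `ttl=120`, a comment posted 60 seconds after a page was cached stays invisible until the cached copy expires. Shortening the TTL makes comments show up sooner, at the cost of more trips to the origin server.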
NOTE This is not the post to raise concerns about site issues other than those already raised. It’s most definitely not the post to comment on typefaces, the way the site looks on this or that machine or platform, or other look and feel issues.
Also, I’m deliberately not including site specs. The “You give me the specs, I give you the quote” dance has been a massive #FAIL throughout this saga. Reliable 24/7 support, as in tasks [2] and [4], is the key here. Probably anybody whose business is adequately resourced to handle them can handle our throughput.
It’s not surprising that you are having problems.
I’ve been trying to get my social security old-age benefits for 10 years, and the government won’t even respond to my e-mails. Such is “democracy” in the U.S. of A.
Arnold Lockshin, political exile from the U.S. living in Moscow, Russia
Try to send them snail-mail, registered. There will most likely be a procedure for handling snail-mail that nobody thought to cancel yet, forcing them to create a case from the mail.
I have some experience with this. The Danish government, who has embraced IT to such a degree that the administration looks like a hand-puppet (in kinky latex, from Amsterdam, but without lube on it) will routinely ignore emails sent to them – while expecting immediate responses to their “e-mails” (A.K.A. PDF files only visible after mandatory registration and cumbersome login). The Danes will try to leverage bureaucracy to screw expats out of their pensions too, there must be a handbook: “best practices in new public management” …
My suggestion is: don’t worry about this. The site is fine. Just be grateful to those whose comments are provided gratis and make NC something to look forward to. When it crashes we just turn to something else for a few minutes and sneak back in hope of a recovery.
Seems so to me too. It works well enough in general.
Agreed. I’ve never tried accessing the site and not had it load up immediately. My recommendation is find out what the logs say when the site ‘goes down’. If they don’t tell you anything then you haven’t enabled the right kind of logging in the various tools you use (Sql Server, Web Server, Php Module, etc). It should all be in the logs.
Also be aware that we are entering a new era in the kinds and quantities of web attacks and mischief. The fact that your site goes down now and then doesn’t surprise me. Look at your server logs at the time the crash occurred to see if there are any strange server requests happening.
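A minimal sketch of that log check in Python. It assumes Apache or nginx writes the standard "combined" access-log format. It counts requests per IP and per URL in the window before the crash.

```python
import re
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# "combined" format: ip ident user [time] "METHOD path PROTO" status ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3})')


def top_talkers(lines: Iterable[str], crash_time: datetime,
                window_minutes: int = 15, n: int = 10
                ) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Count requests per IP and per path in the minutes before a crash.
    crash_time must be timezone-aware, like the log timestamps."""
    start = crash_time - timedelta(minutes=window_minutes)
    by_ip: Counter = Counter()
    by_path: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines written in some other format
        ip, stamp, _method, path, _status = m.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        if start <= when <= crash_time:
            by_ip[ip] += 1
            by_path[path] += 1
    return by_ip.most_common(n), by_path.most_common(n)
```

Usage might look like `with open("/var/log/apache2/access.log") as f: print(top_talkers(f, crash_time))`, where `crash_time` is a timezone-aware `datetime`. The log path is an assumption; it varies by distribution and host.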
I just read about one type of request which seemed innocuous enough (it was an HTTP GET request asking WordPress to perform a search). However, this specific request makes the server work as hard as possible; multiply it by several hundred identical requests coming in at the same time, and it will take the server down. This was a layer 7 (application-layer) denial of service attack.
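One way to spot that pattern in parsed log entries, sketched in Python: count WordPress search requests (`?s=...`) per IP within a sliding time window. The 60-second window and the threshold of 30 are arbitrary assumptions. Because it counts per IP, a truly distributed attack would need grouping by the search string instead.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Set, Tuple
from urllib.parse import parse_qs, urlsplit


def is_wp_search(path: str) -> bool:
    """WordPress front-end searches look like /?s=term (even an empty term)."""
    return "s" in parse_qs(urlsplit(path).query, keep_blank_values=True)


def flag_search_floods(events: Iterable[Tuple[float, str, str]],
                       window_s: float = 60.0, threshold: int = 30) -> Set[str]:
    """events are (unix_time, ip, path) tuples in time order. Returns the IPs
    that sent at least `threshold` searches inside some `window_s` window."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    flagged: Set[str] = set()
    for ts, ip, path in events:
        if not is_wp_search(path):
            continue  # ordinary page views are cheap; ignore them
        q = recent[ip]
        q.append(ts)
        while ts - q[0] > window_s:  # drop searches older than the window
            q.popleft()
        if len(q) >= threshold:
            flagged.add(ip)
    return flagged
```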
If your site is already compromised then you’d have to start with eradication. Companies like Sucuri specialize in this kind of stuff. There are a lot of web attacks these days, lots of different ways criminals try to monetize their exploits. Wish I could be more help, but I don’t run a large site (just a personal blog with not much traffic).
One of my hobbies is watching hit reports on the Econ Organizing site I work for and tracking down ip addresses of any “access denied” reports. The other day I was doing this and happened to notice that the site was getting a lot of traffic to a lot of very random, old pages. Curious, I checked out the IP on one hit and traced it to Germany, registered to “Server Block”. Then I checked out the next hit: same IP. And the next and the next and the next. 38 pages of hit reports, over the course of about 40 minutes, all from one anonymous, foreign IP address. So far as I could tell, no one else accessed the site during those forty minutes, which would be unusual.
Since one of the IP addresses from an “access denied” report that I looked at last week was registered to Black Lotus Comm., a DDOS protection service, my immediate thought was: is this what a DDOS attack looks like? Was that a test-run? Am I being paranoid or what?
I don’t know, could be a ddos attack. It seems like there are different kinds of ddos these days. Here is one from very recently which apparently set a new record for data rates.
I rather think that daylong outages would be the norm if Yves wasn’t so dedicated to the site. Do show a little appreciation for her perspiration, won’t you?
PS. I find it instructive to set up a parallel WordPress site using the same underlying operating system, in a virtual machine running on your desktop computer (I use VirtualBox). You can even import your entire database in, to simulate reality as much as possible. You can’t simulate the traffic, but you can test various plugins and server settings in a controlled environment, and you can do it all without jeopardizing your live website.
Some low-cost ways to monitor your system for anomalous activities:
First check to see what ports are open. There shouldn’t be anything ‘listening’ which is not intended by you.
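A quick sketch of that check in Python. It assumes you only care about TCP on the IPv4 loopback address. It tries to connect to each port and reports the ones that answer, so it misses UDP and anything bound only to a public address. On Linux, `ss -tlnp` gives the fuller picture, including which process owns each listening socket.

```python
import socket
from typing import Iterable, List


def open_local_ports(ports: Iterable[int], host: str = "127.0.0.1",
                     timeout: float = 0.2) -> List[int]:
    """Return the ports on `host` that accept a TCP connection."""
    found: List[int] = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:  # 0 means the connect succeeded
                found.append(port)
    return found
```

For a LAMP box, something like `open_local_ports(range(1, 1025))` should show little beyond 22 (ssh), 80/443 (the web server) and perhaps 3306 (MySQL). Anything else deserves an explanation.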
Here is another simple way to see if anything running on your system is calling out to malicious domains, following the instructions outlined here (I haven’t tried it but it looks very simple to use).
You can use this to see what files were changed during some period of time, and check this periodically for unauthorized changes.
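The linked tool isn't shown, but a rough stdlib-only equivalent of `find /var/www -mtime -1` looks like the Python below. It trusts file modification times, which an intruder can fake. Storing a list of file hashes and comparing against it is the stronger version of the same idea.

```python
import os
import time
from typing import List, Optional


def recently_modified(root: str, hours: float = 24,
                      now: Optional[float] = None) -> List[str]:
    """List files under `root` whose modification time is within `hours`."""
    cutoff = (time.time() if now is None else now) - hours * 3600
    changed: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) >= cutoff:
                    changed.append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(changed)
```

On a WordPress install, the interesting hits are PHP files that changed when nobody ran an update.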
Finally, in computer security there’s this notion of business continuity and failover systems (possibly RAID enclosures or secondary servers). I don’t know what your configuration looks like, but you might want to consider some type of load balancing server. Hope this helps.
One other thing to consider is the type of hosting you have. When I set mine up I ordered the cheapest thing, which was an OpenVZ container. After setting everything up I learned that I couldn’t update the system kernel, because OpenVZ virtualizes at the operating-system level: in effect I was sharing a Linux kernel with someone I didn’t know. That could potentially be a source of instability if that other user is doing stupid or potentially dangerous things with his system. So I stopped that service and restarted with a KVM virtual server (a little more expensive, but not bad). KVM operates much more like the more familiar virtualization programs such as VirtualBox and VMware, in that it virtualizes the physical hardware and presents that to the operating system. If there are multiple KVM guests running on a single physical machine, each one is truly isolated from the others. If nothing else, this allows me to update and upgrade all parts of my operating system (including the kernel) any time I want, and in all likelihood it gives me a much more stable system.
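If you want to know which kind of hosting you've got before setting everything up, here is a heuristic sketch in Python. It relies on two usual markers:

- OpenVZ exposes `/proc/user_beancounters` and `/proc/vz`.
- x86 guests under a hardware hypervisor such as KVM usually show a `hypervisor` flag in `/proc/cpuinfo`.

Where it is installed, `systemd-detect-virt` does this job properly.

```python
import os


def guess_virtualization(root: str = "/") -> str:
    """Heuristic only: 'openvz', 'hardware-vm', or 'unknown'.
    `root` is a parameter so the check can be tested against a fake /proc."""
    proc = os.path.join(root, "proc")
    # OpenVZ markers: a shared-kernel container, so no kernel upgrades for you.
    if (os.path.exists(os.path.join(proc, "user_beancounters"))
            or os.path.isdir(os.path.join(proc, "vz"))):
        return "openvz"
    try:
        with open(os.path.join(proc, "cpuinfo")) as f:
            for line in f:
                # x86 "flags" line; "hypervisor" means we run under a hypervisor.
                if line.startswith("flags") and "hypervisor" in line.split():
                    return "hardware-vm"
    except OSError:
        pass  # no /proc/cpuinfo (non-Linux, or a restricted environment)
    return "unknown"
```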
Can’t help Lambert though would welcome news of solutions. When I did something similar a few years back I ended up wanting to shoot the university techies allocated and went with a mate of a mate. The solution was a bespoke website on the mate of a mate’s server. I’m still at a loss as to why a university with 30,000 students could not do this, but then the advice from its data protection officer was the helpful ‘you won’t be liable for prosecution if you don’t put any data on your site’!
Golem’s blog works rather well through Smoothriders.
Whenever I get the chance I like to recommend Red Acorn hosting, it’s progressive, green and owned by Emma McReary who is a helpful and caring person and could direct you to the right solution. WP has issues when it is filled with plug-ins–I never liked banging my head into WP–in the end I came to the conclusion that, in the long run, it would be better to just program my own blog in PHP from scratch–but I’m not in that business anymore.
Yves has a good relationship with Barry Ritholtz, is that correct? His site seems to operate okay. It doesn’t exactly map over to NC, but I’d think there was enough functionality in common that a chat with his IT folks might be illuminating. Just sayin’
He moderates all comments and posts fewer per post than we do (5-20 is his norm) so despite having much more traffic than we do, he does not update the DB as often as we do, which is WordPress’ weakest point. I’ve heard he’s been consistently dissatisfied but perhaps that has changed. I can check back in with him.
I have been using Hostway for my mail and website needs (not NEARLY as big as yours) for 15 years or more and have always found them to be a pleasure to deal with, through 9/11 and beyond. They seem to have nice pricing pages where you can customize your package and see what things cost, and they advertise competence with WordPress. You have probably already tried them but just in case: http://www.hostway.com/wordpress/. Love your site; hope this helps.
I’m at OpEdNews.com. We don’t use WordPress at OpEdNews; we have our own CMS, but we have put most of the problems you describe behind us. We see 600-2500k pageviews per month, and have our own commenting system.
Some things that help:
Third party server management
Drop me an email and we can talk
fwiw, from a friend of mine who works at the e-commerce site Threadless..
“I think rackspace is still one of the best in terms of service … They are like the Rolls Royce of hosting, but it comes with a price. We just migrated all of our hosting from rackspace over to amazon web services and our monthly hosting bill has gone from $90k down to $3k. But we do have to do a fair amount of the maintenance ourselves, but only really in the software side since it’s cloud based.”
I second this. Rackspace is expensive but worth it. Their ‘managed hosting’ product means there’s a server admin on the phone whenever you need one. And a lot of the problems you have could preemptively be solved by placing monitoring scripts which could, in the event wordpress crashes, restart apache, mysql, etc.
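A bare-bones sketch of such a monitoring script in Python. The health check is just an HTTP request to the home page. The restart action is a function you pass in, for example one that runs `systemctl restart apache2` (an assumed service name). Waiting for three consecutive failures before restarting is an arbitrary choice, meant to avoid a restart on a single slow response.

```python
import urllib.error
import urllib.request
from typing import Callable


def site_is_up(url: str, timeout: float = 10.0) -> bool:
    """True if the page answers with a 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        return False  # refused, timed out, 4xx/5xx (HTTPError), or bad URL


class Watchdog:
    """Calls `restart` after `failures_before_restart` failed checks in a row."""

    def __init__(self, check: Callable[[], bool], restart: Callable[[], None],
                 failures_before_restart: int = 3):
        self.check = check
        self.restart = restart
        self.failures_before_restart = failures_before_restart
        self.consecutive = 0

    def tick(self) -> bool:
        """Run one check; return True if a restart was triggered."""
        if self.check():
            self.consecutive = 0
            return False
        self.consecutive += 1
        if self.consecutive >= self.failures_before_restart:
            self.restart()
            self.consecutive = 0
            return True
        return False
```

In use, a long-running loop would call `tick()` once a minute. Every triggered restart should also be logged or emailed, so the [4b]-style "why did it happen?" question still gets asked.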
Rackspace is the answer. Expensive, but solid. You can get a smart, knowledgeable person on the phone in two rings at 2:00 am. Nobody else comes close. If their managed solutions are too expensive, find a local NYC kid or two to do the IT.
Thanks but the problem is Jane Hamsher (Firedoglake) had a 36 hour outage on Rackspace where she was tweeting her enormous frustration with the Rackspace techs. Worse, their story kept changing and if I recall correctly, they started fessing up at around the 24 hour point that it was a hardware problem. That episode has made me VERY leery of Rackspace. Jane won’t talk to me directly about it (apparently because Dave Dayen is writing here, she does not take well to that sort of thing….).
See details here:
Rackspace is, by far and away, the very best at what it does. I’ve been a client there for over seven years. Whatever Hamsher’s telling you is, simply, not even close to realities at the VERY HEART of Rackspace’s basic client agreements, which GUARANTEE no more than four-hour downtime on server outages.
Furthermore, Rackspace has so many fail safes built into their co-lo’ed and leased servers–when a hard drive even begins to indicate it MIGHT fail, alarms go off and procedures go into effect to resolve these matters to the point where problems are virtually unknown to end-users–including STANDARD dual/quad processors in all their “boxes” and disk arrays, all redundant to circumvent these very issues that Hamsher’s claiming. (Hardware gets swapped-out in virtually real time; all at NO expense to its clients.)
On top of all of this, anytime there’ve been issues with my hardware, I’ve found Rackspace to be, BY FAR AND AWAY, the very most transparent operation with whom I’ve ever dealt; and I’ve worked with well north of two dozen ISPs and managed services providers in my career.
So, what I’m saying is that, essentially, I find Hamsher’s claims about Rackspace to fly in the face of virtually EVERYTHING I’ve known and experienced about the company. (And, I’m being very polite about this.)
Linux will rebuild an array just fine in the background, IF you don’t intend to unplug the drive as soon as it’s done. I seem to remember her finding out later, or at least receiving strong suggestions, that “the RAID is rebuilding” included a mirror copy for some three-letter agency that wanted one, but naturally I can’t find a reference, so I’ll just leave the salt shaker here.
My suggestion would be to talk to somebody who knows virtualization (VMWare is by far the best). Run two or more copies of the web site in virtual machines (on the same or separate physical hosts) with server load balancing directing traffic to the replicated WP/LAMP stacks on the various VMs. My guess is the WP gremlins wouldn’t take down all the WP instances at the same time (if so then it’s likely a WP bug you’ve stumbled on – not impossible but unlikely to be the major culprit of your downtime). If one WP instance locks up/crashes the SLB should re-direct to the remaining WP on other virtual machines.
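The failover part of what a load balancer does can be sketched in a few lines of Python: hand out backends in turn and skip any that fail a health check. The backend names are hypothetical. Real balancers do this with active probes and timeouts. If all the copies share one MySQL database, that database remains a single point of failure.

```python
from typing import Callable, List


class RoundRobinBalancer:
    """Hands out backends in turn, skipping any whose health check fails."""

    def __init__(self, backends: List[str], is_healthy: Callable[[str], bool]):
        self.backends = list(backends)
        self.is_healthy = is_healthy
        self._next = 0  # index to try first on the next pick

    def pick(self) -> str:
        n = len(self.backends)
        for offset in range(n):
            idx = (self._next + offset) % n
            candidate = self.backends[idx]
            if self.is_healthy(candidate):
                self._next = (idx + 1) % n  # continue rotation after this one
                return candidate
        raise RuntimeError("no healthy backends")  # every WP copy is down
```

With three WordPress VMs and one locked up, traffic simply alternates between the other two until the health check passes again.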
This isn’t rocket science (which, btw, makes the Healthcare.gov website fiasco all the more inexcusable). The large e-commerce vendors have mastered these details a decade or more ago, driving vendors like Cisco and VMWare to add many features for just this purpose. Email if you need more details.
I agree with this. Run the website on mirror images for graceful fail-over and improved up-time.
It’s hard to say if this will be a definitive solution — but it’s worth exploring.
I think the caveat is that you won’t be able to replicate or otherwise distribute your database without adding significant complexity to your stack. And if database integrity is causing WP to go down, then both of your instances might go down at once.
 I say this mostly because you said that your host would restart the server only to have it go down again.
I generally agree – although I would use Amazon EC2 (Xen hypervisor) or Google Compute.
And then yeah, multiple VMs. Both can crank up new VMs fast. You have much more control. And you can split the VMs out, so that MySQL and WordPress are on their own servers. Replication/caching takes a bit of work.
Also, in our experience, Apache sucks. It’s a memory hog, better off using lighttpd or nginx.
As for WordPress – there are a number of ways to optimize this. I just run perf tests on specific processes or the whole VM and target the problem. Worked wonders for squeezing every ounce of performance out of Amazon EC2 micro appliances.
We have some experience in just this sort of thing.
Re: CloudFlare. Are you guys only using CloudFlare Security, for the block list? If so, you could just put the blocklist in an ipset and drop things fast and early with iptables.
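For the ipset route, a small Python helper can turn a blocklist into input for `ipset restore`, plus the matching iptables rule. The set name `cf_block` is hypothetical, and the helper handles IPv4 only. One caveat: traffic proxied through CloudFlare arrives from CloudFlare's own addresses. So this only drops visitors who reach the server directly.

```python
import ipaddress
from typing import Iterable, List


def ipset_restore_script(set_name: str, entries: Iterable[str]) -> str:
    """Build `ipset restore` input from IPs/CIDRs, skipping comments, junk,
    IPv6 entries and duplicates."""
    lines = [f"create {set_name} hash:net family inet"]
    seen = set()
    for raw in entries:
        raw = raw.strip()
        if not raw or raw.startswith("#"):
            continue
        try:
            net = ipaddress.ip_network(raw, strict=False)  # 1.2.3.9/24 -> 1.2.3.0/24
        except ValueError:
            continue  # not an address; ignore
        if net.version != 4 or net in seen:
            continue
        seen.add(net)
        lines.append(f"add {set_name} {net}")
    return "\n".join(lines) + "\n"


def iptables_drop_rule(set_name: str) -> List[str]:
    """Argument list for an iptables rule dropping any source in the set."""
    return ["iptables", "-I", "INPUT", "-m", "set",
            "--match-set", set_name, "src", "-j", "DROP"]
```

You would write the script to a file, load it with `ipset restore -exist < file`, and insert the iptables rule once. After that, updating the blocklist only means reloading the set.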
I very much like this site, and would much rather have you working on more important things. If anything, I (and a couple of other guys) can usually be available to help; I can point you to an IRC channel where you can talk to us, if you want.
If you wanted to go really redundant you could use two or more cloud hosting providers. Amazon EC2 and Rackspace.
http://hosting.brownrice.com … I’m not kidding.
Well, dang! I thought “LAMP Stack” stood for LAMbert P. STrether.
From a guy who spent 30+ years in IT at a major university [150,000 students plus faculty and staff]
At least 2 tech dudes needed here: one for design and one for maintenance, working in concert.
Multiple servers running in fail over mode with RAID Striping and Mirroring at the least. All servers in shared DASD configuration.
Load balancing with dynamic DNS updates will solve about 80% of this problem, along with dynamic security monitoring and updating. Cisco Systems has some pretty good hardware/software that handles this, but it needs a good security guru to oversee it.
At least 1 tech dude for this, possibly 2.
The above is obviously WAY beyond your resources, and therefore an ideal. But that is what you are looking at if you want high availability.
No, for #1 we actually have the rare WP person who is good at both, and knows a fair bit of coding (as in can write widgets/plugins and handle other more technical tasks as well as do design). So it is not 2 people. But you are confirming our instincts, that we are in the dead zone re third party service, and to do this well we need more dedicated or close to dedicated people.
Thinking more along the lines of exhaustion. :-)
But I hear ya. Point being it’s not just the app but the underlying hardware/software that needs a heavy re-think. I know the current situation is about to drive you round the bend, however a long term plan needs to be looked at.
Server redundancy, DASD redundancy, onsite, offsite, etc.
Even before I left and retired, the Univ. ran internal IP with NAT to the outside. Made redundancy a whole lot easier to implement. Dynamic NAT updates etc.
First, get off WordPress and run something that works and doesn’t have exponentially-degrading pain points. There are a number of options, although I’m partial to my own (built from the ground up.)
Second, decide whether to buy colocation space and bandwidth and run your own server, or to find a managed option that comports with #1 (and get out your wallet). Hint on the first route: you either need redundancy in the cabinet and someone within reasonable distance, or you need TWO, because two is one and one is none. And this feeds back to the first point, in that if you need two, the software has to be designed to work that way.
I’ve done this sort of thing for north of two decades, which is why I don’t have these problems. There are a lot of charlatans in this game and getting sold a bill of goods happens all the time. The lightly-loaded small guy doesn’t run into problems, but as your load and demands on that infrastructure rise…….
Feel free to contact me.
Karl Denninger’s Market Ticker Blog is one of the best run sites on the Web.
And, as he says he “built it from the ground up” so he knows about all the infrastructure needed to run a successful blog.
I suggest you contact him, as he offered in his comment.
Moving from WP would be a major conversion effort; NC has tens of thousands of posts. We probably don’t have the money or time. But thanks!
The conversion effort is not as great as you think. And the time, effort, and expense is not correlated to the number of posts and comments / size of the database.
You really need Drupal. Others have converted. You can get a firm that has done that conversion and/or you can get help from the Drupal community.
I run Drupal myself. It’s not a panacea, and it means a new learning curve for Yves. Drupal is also a resource hog.
I’m sorry, this is time intensive managerially. We just spent a lot of time doing a WP redesign. I do not have the time to find out how to do this and execute it. It’s a much bigger task than finding a new host. And we don’t have the money for this.
WP makes it easy, despite its many faults, to control a lot of users and comments in the backstage. I’d have to learn new software. I don’t have a smartphone or iPad because I don’t have the time to learn how to use them.
This is simply NOT happening. And the $ is another obstacle. The extra $ would come at the expense of travel, FOIAs and important special projects we are pursuing, and my vacations. I’m not about to give any of those up.
The Real News Network has a nifty option of texting donations.
Takes a minute and nothing to fill out. Good luck!
So you are going to continue to waste half of your time dealing with WP rather than spend any resources resolving the underlying problem of working with a solution that doesn’t scale cheaply?
You picked the right tool for NC years ago when it was small and manageable because WP was designed for that. It was not designed for the scale you need nor the amount of resources you are able to throw at it.
You need to provide number of unique visitors, concurrent user targets, and a bunch of other metrics in addition to the plugins you use. If you don’t provide this information you won’t get a properly sized/tuned solution.
One last thing: you mention CloudFlare as something you looked into. Did you look into your own front end cache? An example would be Varnish Cache and to purge the cache. It’s not simple plug and play, but it can surely be done if you aren’t already.
Mangled that last sentence. You can run a front end cache and have a WP plugin that is capable of purging the cache.
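A sketch of the purge side in Python. It assumes Varnish sits in front of WordPress and that its VCL has been set up to accept the conventional `PURGE` method from the server. Varnish does not honour `PURGE` out of the box. A WordPress plugin would send the same request whenever a post or comment changes. The host names here are placeholders.

```python
import http.client


def purge(varnish_host: str, site_host: str, path: str,
          port: int = 80, timeout: float = 5.0) -> int:
    """Ask the cache to drop its copy of one URL; returns the HTTP status."""
    conn = http.client.HTTPConnection(varnish_host, port, timeout=timeout)
    try:
        # The Host header selects which site's cached object is purged.
        conn.request("PURGE", path, headers={"Host": site_host})
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status
    finally:
        conn.close()
```

Hooked up to a "new comment" event, something like `purge("127.0.0.1", "www.example.com", "/2014/02/some-post/")` would let pages be cached aggressively while new comments still show up promptly.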
HuffPo runs WP as does FDL with a lot more authors and a ton more traffic.
And I don’t have the time. I am not about to go into physical collapse to manage a conversion. I don’t have the financial and managerial bandwidth to execute your solution. I am not Superwoman and we do not have the money. Period.
I think that the learning curve for Drupal is pretty steep and Joomla is much easier to work with. However, one has to move to the most recent version, 3.x (which might be “bleeding edge”), to get rid of the common problem: MySQL!
IMHO: Only upgrade/change if it is possible to get rid of MySQL entirely.
MySQL is not a very good database, there are far too many options for tuning, configuring, and replicating MySQL, these options interact, and almost none of the el-cheapo hosting businesses understands fully the “gotcha’s” involved or have time to do a full Monte-Carlo parameter optimization (the only way IMO). Most do not care either – I made truly *insane* money on “fixing”, or rather, kicking it to get it to run a bit longer, MySQL replication when working as a “techie” while the economy “recovered”. Why kill the goose that layeth them golden …. e.t.c.?
Moving over those files is not a very difficult task, especially when it’s coming from WP.
I’ve done three upgrades on my own Drupal content, and each time there have been issues, especially with comments. If Drupal to Drupal isn’t pain free, I have to wonder what WP to Drupal is like.
When First Look was first announced, one of the offerings looked like it might be a publishing offering for independent journalists, but that’s not the case at all (via Jay Rosen): The First Look tech company is for profit, developing media tools for First Look properties, and other markets. The profits earned by the tech will be used to support the mission of First Look (“independent public interest journalism”).
Well, that’s how it looks on paper, anyway. Perhaps First Look will come up with an interesting product at a low entry price. We’ll see as it unfolds.
In general, remember that issues scaling the hosting are because NC is a success!
Too many cooks spoil the soup, and WordPress can’t boil water. Generally speaking, profit optimization is only weakly correlated with technical excellence.
NC is far from the only site with these problems, which suggests there could be an unmet demand for reliable hosting and ancillary services for relatively small sites. A non-profit entrepreneurial opportunity.
One small tweak: get rid of the “A” (“Apache”) in “LAMP” and replace it with nginx (pronounced “engine X”) – yielding “LEMP”.
The webserver nginx is more lean-and-mean than Apache.
It’s rock-solid and has a small but steadily growing market share of webservers worldwide.
I’ve done several WordPress sites using nginx (actually never used Apache).
While you’re at it, you should consider adding PHP-FPM (PHP’s FastCGI Process Manager), since nginx does not run PHP itself but hands PHP requests to a FastCGI backend.
This will involve hunting around online to find a suitable nginx.conf file to work with PHP-FPM and WordPress.
Here is a typical one:
There are also plenty of complete recipes published online for using PHP + PHP-FPM + nginx + WordPress:
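The core of the usual recipe is short. Below is a hedged sketch of a minimal nginx server block for WordPress with PHP-FPM, rendered by a small Python function so the variable parts stand out. The document root and the PHP-FPM socket path are assumptions; distributions name the socket differently, e.g. with a PHP version in it. A production config would add TLS, static-file caching headers, and so on.

```python
def wordpress_server_block(server_name: str,
                           root: str = "/var/www/wordpress",
                           fpm_socket: str = "unix:/run/php/php-fpm.sock") -> str:
    """Render a minimal nginx server block for WordPress behind PHP-FPM."""
    return f"""server {{
    listen 80;
    server_name {server_name};
    root {root};
    index index.php;

    location / {{
        # Pretty permalinks: unknown paths fall through to WordPress.
        try_files $uri $uri/ /index.php?$args;
    }}

    location ~ \\.php$ {{
        # Refuse to hand non-existent .php paths to PHP-FPM.
        try_files $uri =404;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass {fpm_socket};
    }}
}}
"""
```

The `try_files $uri =404;` line in the PHP location is the usual guard against PHP-FPM being asked to execute files that don't exist.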
If you don’t have the tech skills to do this configuring yourself, it should be a simple matter to find a web designer who can do it for you. It’s a fairly common skillset – and by avoiding Apache-oriented web designers, you’ll likely get a slightly higher calibre of geek.
I share your frustration with WordPress. It is bloated and buggy. And since (in the opinion of certain programmers, including myself) PHP is a relatively poorly-designed language, it tends to attract non-programmers or amateur programmers, who end up making buggy plugins for WordPress – or plugins which conflict with other plugins.
I have hunted for years for a decent alternative to WordPress – but keep coming back to it, because it kinda is the “Windows” of the blogging world: it’s the most popular blogging platform, so when you need a plugin, there usually already is one for WordPress that at least claims to do what you want. (But see above: the plugin is not guaranteed to actually do what you want – either due to inherent bugs, or conflicts with other plugins.)
The propensity to bugs and conflicts in WordPress is (in my opinion) due in part to the design of PHP, as an “un-typed”, “interpreted” language, which means that many bugs are caught too late – at runtime (rather than at compile-time, which is when typed languages are able to catch many bugs). It’s also probably due in part to the architecture of WordPress itself – which is way bigger now than it was when it was initially designed, so probably many of its functionalities are add-ons or kludges.
In the world of blogging software, there are tradeoffs. There certainly are blogging platforms and content management systems out there which are based on languages much better than PHP and architectures better-thought-out than WordPress. However, PHP-based blogging platforms such as WordPress were “first out of the gate”, and therefore enjoy an immense early-adopter advantage (namely: bigger community, greater number of plugins), so many people feel stuck with PHP + WordPress.
For myself, as a programmer, I have made a decision never to use PHP again on my server(s), if I can help it. If I need a static website, I’ll use Twitter Bootstrap plus some carefully chosen libraries. If I need a blog or CMS (content-management system), I’ll look at some of the many non-PHP alternatives out there.
Disclosure: I am primarily a curmudgeonly theoretical computer scientist only comfortable using relatively esoteric languages.
I am quite unimpressed with the current state of programming. My favorite blog post about the dismal situation of software development (which perhaps annoyingly – but also quite correctly – points out that “most software ships broken“) can be found here:
Most “normal” programmers would hate that blog post – but I think someone like Lambert would get a big kick out of it.
I agree with ScottA – as I say much the same above – drop Apache. It’s probably not the most important problem, but it’s easy to do.
Got your start on OS/360? Just kidding. IBM, whose OS ships broken, then ships patches that break it even more.
It’s now z/OS under z/Architecture hardware, where I can run thousands of z/Linux instances in a zEC12 box and not miss a beat. Apache needs to be replaced first and foremost as it can’t scale to save your arse.
Maybe also add a “Varnish” cache in front of the Web-server. Probably, the author of Varnish Poul-Henning Kamp would agree with your blog post ;-)
I feel your pain, Lambert! This high-tech stuff isn’t all it’s cracked up to be. Here is my two cents’ worth.
The first thing that I would look at is the filesystem (fs) load sharing (if any) that you are using in the hosting environment. I have had problems with NTFS in the past when it becomes “saturated” with heavy user write requests and then corrupts data (either the cache or data written to the disk block). This can also happen to ext3/ext4 filesystems, but at higher usage levels.
When the fs write requests are split among different filesystems the read and write requests are more likely to be completed successfully under heavy user load. You may accomplish fs load balancing by mounting a separate disk (make sure the disk is not shared and not simply a partition or virtual disk) to a heavily used directory.
LAMP may use /var/www for http and /var/db for the database. I had to fire up CentOS 6.4 in VirtualBox (shameless plugs for great software) to check this one out. Remember that virtual machines use shared hardware (disks) and may not be suitable for running a high-performance database (MySQL).
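Whether a heavily used directory really sits on its own disk can be checked from a script. A minimal Python sketch, assuming the /var/www and /var/db paths from the comment (on many systems MySQL actually keeps its data in /var/lib/mysql): it compares the device IDs that os.stat reports. Different device IDs only show that the paths are on different filesystems; two partitions of the same physical disk would also pass, which is exactly the case the comment warns against.

```python
import os


def on_same_device(path_a: str, path_b: str) -> bool:
    """Return True if both paths live on the same mounted filesystem."""
    # st_dev identifies the device holding each path; equal values mean
    # reads and writes to both paths hit the same filesystem.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Paths taken from the comment; adjust to where your stack really keeps them.
    web_root, db_dir = "/var/www", "/var/db"
    missing = [p for p in (web_root, db_dir) if not os.path.exists(p)]
    if missing:
        print("Not found on this machine:", ", ".join(missing))
    elif on_same_device(web_root, db_dir):
        print("Web root and database share a filesystem; their writes are not split.")
    else:
        print("Web root and database are on separate filesystems.")
```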
A note about RAID redundancy levels. RAID 1 (mirroring) is somewhat faster than the more secure RAID 5 (striping with distributed parity), especially under a heavy filesystem write load. For this reason I would install the database on a RAID 1 array rather than the slower RAID 5 array. There is no substitute for a complete backup of your all-important data.
If the database is (mostly) the culprit, you may consider using newer SSD technology to store and serve your back-end db requests. Mount the SSD at /var/db or wherever the database resides. This will speed up db service requests at least fivefold over conventional hard disk technology.
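The speed difference for a write-heavy database can be put in rough numbers using the common rule-of-thumb write penalties: each logical write costs about 2 disk I/Os on a mirror (one per copy) and about 4 on RAID 5 (read old data, read old parity, write new data, write new parity). A small Python sketch; the disk count and per-disk IOPS figures are hypothetical, and a controller write cache can change the picture considerably.

```python
# Rule-of-thumb random-write penalties: physical disk I/Os per logical write.
WRITE_PENALTY = {
    "RAID 1/10": 2,  # every write goes to both halves of a mirror
    "RAID 5": 4,     # read data + read parity + write data + write parity
}


def effective_write_iops(level: str, disks: int, iops_per_disk: float) -> float:
    """Approximate random-write IOPS an array can sustain, ignoring caches."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


if __name__ == "__main__":
    # Hypothetical array: four disks rated at 150 random IOPS each.
    for level in WRITE_PENALTY:
        print(f"{level}: ~{effective_write_iops(level, 4, 150):.0f} write IOPS")
```

With the same four hypothetical disks, the mirrored layout sustains roughly twice the random writes (about 300 versus 150), which is the reasoning behind putting the database there.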
Next, I would entertain running a server farm (two or more servers) where session requests are automatically arbitrated to the least busy system in the farm. A farm will also load balance the fs that reside on each server.
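Sending each session to the least busy machine is usually called least-connections balancing (nginx offers it as the least_conn upstream option, HAProxy as balance leastconn). A minimal Python sketch of the selection rule only; the server names are hypothetical, and a real farm also needs health checks and a shared or replicated database behind the web servers.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Server:
    name: str
    active: int = 0  # requests currently in flight on this server


class LeastConnectionsBalancer:
    """Route each new session to the server with the fewest active requests."""

    def __init__(self, servers: Iterable[Server]):
        self.servers = list(servers)

    def acquire(self) -> Server:
        # min() returns the first server with the lowest count, so ties
        # are broken by list order.
        server = min(self.servers, key=lambda s: s.active)
        server.active += 1
        return server

    def release(self, server: Server) -> None:
        # Call when the request finishes so the count reflects real load.
        server.active -= 1


if __name__ == "__main__":
    farm = LeastConnectionsBalancer([Server("web1"), Server("web2")])
    first, second = farm.acquire(), farm.acquire()
    farm.release(first)
    print(first.name, second.name, farm.acquire().name)
```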
BTW, this is one of the few sites I comment on because it doesn’t use a third party commenting system.
Hope this helps.
Wow, I have to admit, I’m a little surprised at the WordPress hate. I’ve run WordPress on a number of servers in varying configurations with virtually no problems. Also, WordPress has a number of high-profile sites that get reasonably high traffic who seem to do just fine running it, like CNN, Techcrunch, and Dow Jones.
I suggest you check out WordPress VIP hosting. http://vip.wordpress.com/
WordPress VIP hosting is beyond our price point.
I would echo the comment that your goal of 100% uptime is unreasonable, and since you don’t get paid much to do this, chill about OUR expectations of you.
We will live through the occasional outage.
That said, many here have provided valuable insight and suggestions for hosting and management alternatives. Having designed, built, and managed 24/7 distribution management systems for decades, and not liking call-outs at odd hours, I’d suggest finding someone knowledgeable who can build scripts that monitor and reset all the parts as necessary. Short outages under this sort of management are, IMO, preferable to the big ones.
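A monitor-and-reset script of the kind described can be quite small. The Python sketch below polls a URL and runs a restart command only after several consecutive failures, so a single slow response does not trigger a restart. The URL, the systemd unit names (php-fpm, nginx) and the thresholds are assumptions for illustration; a real script would also log what it saw and alert a human if restarts keep recurring.

```python
import http.client
import subprocess
import time
import urllib.request
from typing import Callable, Optional


def site_is_up(url: str, timeout: float = 10.0) -> bool:
    """True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts and HTTP 4xx/5xx (HTTPError) land here.
        return False


def restart_services(units=("php-fpm", "nginx")) -> None:
    """Restart the given systemd units (hypothetical names; needs root)."""
    subprocess.run(["systemctl", "restart", *units], check=False)


def run_watchdog(check: Callable[[], bool], restart: Callable[[], None],
                 interval: float = 60.0, failures_before_restart: int = 3,
                 max_checks: Optional[int] = None) -> int:
    """Poll check(); call restart() after N consecutive failures.

    Runs forever unless max_checks is given; returns the number of restarts.
    """
    failures = restarts = checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if check():
            failures = 0
        else:
            failures += 1
            if failures >= failures_before_restart:
                restart()
                restarts += 1
                failures = 0  # give the restarted services a fresh start
        time.sleep(interval)
    return restarts


# Example wiring (not executed here; the URL is a placeholder):
# run_watchdog(lambda: site_is_up("https://example.com/"), restart_services)
```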
Good luck with your efforts and know that nothing is perfect, especially WP.
Short outages actually stress Yves more than long ones, because they get in the way of her nightly workflow.
Moreover, since things tend to degrade from whatever level of “acceptable failure” one chooses, it’s generally – not always – better to keep an attitude of solving problems rather than accepting them.
Obviously you have not been at MY house when NC is not available – rants of NSA conspiracy, the need for tin foil hats, and big government / big business / China overlords abound. My wife threatens to leave me, my friends abandon me, and love is not to be found.
Keep up the search – never stop, never falter – in your quest to keep the curmudgeons of the world well fed and happy.
Every time a simple task explodes into a Kafkaesque, time-sucking maze of tech support calls, software updates, Google searches, new user account sign-ups, multiple reboots, etc., I just keep repeating my mantra:
technology makes my life easier
technology makes my life easier
technology makes my life easier
Another option I haven’t seen discussed: First, it sounds like there are some recurring technical issues with the site that will take significant skill and effort to resolve. In the meantime, dealing with those issues requires 24/7 on-call assistance to handle a couple of very basic things (restart http, check for the small list of minor things, etc.). Since the fundraiser results mean money is somewhat limited, I suggest enlisting an additional trusted volunteer or two. Give them brief, monkey-level training to do the 4-5 minor things and coordinate their schedules to handle after-hours flare-ups. Obviously this requires giving those volunteers some level of access to the site. But to be honest, I’m not sure why you haven’t done this already when it comes to comment moderation. I’m sure you would be able to find better candidates who know more about WordPress, but I have a technical background and would be willing to help out. I’m sure this is equally true of other longtime commenters.
Managing volunteers takes a lot of time and is not easy, alas. I’ve done it!
It sounds like you might be able to live with the usual server/site problems if WP and the db were working smoothly. So some variant of the suggestions above regarding virtual machines might work, assuming you can automate the switch-over upon failure. Knowing little about WordPress, I wonder if you could get away with simply mirroring the database and having the switch-over occur almost seamlessly at the db connection level. I’m not sure what would trigger it (what would “know” that WP or a WP instance had crashed?). And assuming that was solved, you could refresh the faulty db and bring it back online at more convenient times. Otherwise, if you have to have multiple VMs to mirror both the db AND WP, you may have a fairly complex (but well-understood) problem in terms of managing fail-over (I assume WP doesn’t handle it).
But assuming you really want to stick with WordPress for whatever reasons, something along those lines that could push off manual intervention to suitable times seems like it might be worth the investment in getting someone to automate it.
Of course it’s very unlikely that one could simply fail over to another db. More than likely, something, such as a race condition, is happening between the db and WordPress that hangs up the db AND the WordPress instance, and subsequently the site.
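The connection-level switch-over floated above can be sketched in Python over any DB-API connection: try the primary, probe it with a trivial query, and fall back to the mirror. SQLite stands in for MySQL here, and the connector functions are hypothetical. As the follow-up comment notes, this only helps when the primary is unreachable; it does nothing about a race that hangs WordPress and the database together, keeping the mirror in sync is a separate problem, and a hung (rather than refused) server also needs a timeout on the connect and the probe.

```python
import sqlite3
from typing import Any, Callable, Sequence


def connect_with_failover(connectors: Sequence[Callable[[], Any]]) -> Any:
    """Return the first DB-API connection that answers a trivial query.

    Connectors are tried in order: primary first, then mirrors.
    """
    errors = []
    for connect in connectors:
        conn = None
        try:
            conn = connect()
            cur = conn.cursor()
            cur.execute("SELECT 1")  # cheap liveness probe
            cur.fetchone()
            return conn
        except Exception as exc:  # any failure means "try the next one"
            errors.append(exc)
            if conn is not None:
                conn.close()
    raise ConnectionError(f"all {len(connectors)} databases failed: {errors!r}")


def broken_primary() -> sqlite3.Connection:
    # Simulated outage for the demo.
    raise sqlite3.OperationalError("primary unreachable")


def mirror() -> sqlite3.Connection:
    # Hypothetical mirror; an in-memory SQLite database for the demo.
    return sqlite3.connect(":memory:")


if __name__ == "__main__":
    conn = connect_with_failover([broken_primary, mirror])
    print("serving from mirror:", conn)
```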
Just some words of wisdom from my 15 years in the hosting business.
Truthfully most of this is simply finding reliable people. Forget the hardware – that’s easy.
Call the hosts you are asking about and ask for referrals to folks who have massive sites using their service. If the comments are awesome there is your answer. No one can predict the future.
Focus on working with good people who have good intentions (and less corporate greed). It’s not that hard to find an incredible host. Just don’t follow the hype and “affiliate links.”
Follow your gut. Talk to the smaller hosts who have nothing but love in their hearts and talk with passion and less about dollar signs.
It seems you’ve hit the limit of the big-name vanilla CMSes. WordPress or Drupal (or others) will hit a brick wall past a certain number of users/connections/comments.
Unfortunately, at some point you need a custom-tailored solution designed by a proper programming team and managed by a proper tech team. Yes, you can outsource it, especially the latter.
But at some point you need to have someone design a solution (software) fitting YOUR needs and not the other guy’s blog. And yes they will easily export from WordPress.
And even with managed services for your hardware, you will need one person from the team that built the software on call to fix half of the issues, which will be software related. An Ops team always includes software engineers, sysadmins, and DBAs.
So I would outsource new CMS software, maybe host it at Rackspace using their managed services, and arrange (from the start!) with the team that will create your shiny new software to have someone on call (and not a latecomer junior) whom Rackspace can call on.
And yes, it’s my day job and has been for a while. I’m available for further insights, but sadly not for much more.
Nationally, tech worker co-operatives have organized into a network, which means that over a dozen expert groups, covering a range of digital issues, are providing services out of a desire to help their clients and not simply their bottom line, or egos! They may be helpful. You can find them here =>
give it a rest until Mercury goes direct 2/28
You’se guys are always doing all this stuff when Mercury is in a very inharmonious position of retrograde.
We are going back to an old problem and doing research, all of which my astrology mavens say are perfectly fine Mercury retro tasks (I have spent enough time in Santa Fe to know people who have serious astrology habits). The mavens ALSO tell me that this “do nothing on a Merc retro” is incorrect, that there are such things as “good electional times” when all the Merc retro means is what you don’t know won’t hurt you.
As a former teacher of astrology, practitioner and historian, I find that Mercury retrograde often does INTERRUPT forward momentum, but it is often a great opportunity to use that interruption for PLANTING for what will happen when Mercury goes direct. I’ve learned to go with the flow over the past 40 years. I can tangibly feel when Mercury goes retro (Mercury is exactly on my midheaven), and simply stop trying to push forward. I do “cleanup” work and structural reorganization. (Saturn also is important, as well as the aspects with other planets.)
For what it’s worth.
Really? I was being really productive and have a major task to do–which I was really excited about and should be perfectly capable of doing–and I… “just can’t.” For the past week or so, all I want to do is read.
Who knew? I don’t think it will last till the 28th, though (I kind of hope not).
First, go here.
I use EC2 to run a non-stop service (as close as is practical).
I’m the original architect and development lead and I still do a lot of the system administration.
If you really want to go non-stop your biggest issue will be db replication. I’m using MongoDB and doing this using a replica set.
Amazon elastic load balancer will do primary/secondary failover for your incoming requests.
Put the primary service in one EC2 datacenter and secondary in another.
You don’t need to worry about grody things like RAID, striping/mirroring and the like because Elastic Block Store will be doing that for you. (you will still need to do backups but S3 will help you) You also don’t need to worry about the server having a hardware failure and having to get remote hands to fix it, at most you will be unwedging/restarting your VM.
Start with virtual private cloud. It’s worth it just to get the ability to reassign your service’s IP addresses to different VMs — this can be done without VPC but it’s more problematic.
Put a couple of system administrators on retainer for time and materials support.
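The idea behind a replica set (the primary records every write in an operation log that secondaries replay, and a secondary is promoted if the primary fails) can be shown with a toy Python model. MongoDB replica sets do use an oplog, but this sketch is not how MongoDB implements it: real systems add elections, write acknowledgement and rollback. The keys and values are made up.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Op = Tuple[str, str, Any]  # (operation, key, value)


@dataclass
class Replica:
    """A copy of the data plus a cursor into the primary's operation log."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries already replayed

    def catch_up(self, oplog: List[Op]) -> None:
        # Replay, in order, every operation this node has not yet seen.
        for op, key, value in oplog[self.applied:]:
            if op == "set":
                self.data[key] = value
            elif op == "delete":
                self.data.pop(key, None)
        self.applied = len(oplog)


@dataclass
class Primary(Replica):
    oplog: List[Op] = field(default_factory=list)

    def write(self, op: str, key: str, value: Any = None) -> None:
        # All writes go through the primary: append to the log, then apply.
        self.oplog.append((op, key, value))
        self.catch_up(self.oplog)


def promote(replicas: List[Replica]) -> Replica:
    """On primary failure, pick the secondary that has replayed the most."""
    return max(replicas, key=lambda r: r.applied)


if __name__ == "__main__":
    primary, s1, s2 = Primary(), Replica(), Replica()
    primary.write("set", "post:1", "Links")
    s1.catch_up(primary.oplog)
    primary.write("set", "post:2", "Water Cooler")
    s2.catch_up(primary.oplog)
    print(promote([s1, s2]).data)  # s2 is further along
```

In this toy, anything written after the promoted secondary last caught up is simply lost, which is why replication lag matters when failing over.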
From the sound of it, you may want to consider running two separate programs for content management and comments, each with an independent database. That would allow you to tune the comments database for speed and maintain uptime for the site content. Today, your comments and content share the same database. When the database fails due to high writes with comments your whole site goes down. Separating them should contain the crash.
Despite the WP hate here, WordPress is the best content management system (CMS) for its cost; however, its commenting system leaves a lot to be desired, which is why so many sites use Disqus.
I don’t know if there are open source commenting programs out there (I don’t doubt there are) but you can ask your tech dude to check out the field and see what may work best. It may even be worth forking WordPress to remove everything but the commenting system. That would allow you to run the CMS and comments on separate databases. This is such a pain point for others that it gets support from big companies. I believe you can export all comments from WordPress to other programs, but it may be easiest to shut off commenting on old posts and use the new commenting system going forward.
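A minimal Python sketch of the separation, with SQLite standing in for two independent MySQL databases and a made-up schema (stock WordPress keeps posts and comments as tables in a single database). If the comments store fails, the post still renders, just without its comments.

```python
import sqlite3


class SiteStores:
    """Posts and comments kept in two independent databases (hypothetical schema)."""

    def __init__(self, posts_path: str = ":memory:", comments_path: str = ":memory:"):
        self.posts = sqlite3.connect(posts_path)
        self.comments = sqlite3.connect(comments_path)
        self.posts.execute(
            "CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
        self.comments.execute(
            "CREATE TABLE IF NOT EXISTS comments "
            "(id INTEGER PRIMARY KEY, post_id INTEGER, author TEXT, body TEXT)")

    def add_post(self, title: str, body: str) -> int:
        with self.posts:  # commits on success
            cur = self.posts.execute(
                "INSERT INTO posts (title, body) VALUES (?, ?)", (title, body))
        return cur.lastrowid

    def add_comment(self, post_id: int, author: str, body: str) -> None:
        # Heavy comment writes only ever touch the comments database.
        with self.comments:
            self.comments.execute(
                "INSERT INTO comments (post_id, author, body) VALUES (?, ?, ?)",
                (post_id, author, body))

    def render(self, post_id: int) -> dict:
        title, body = self.posts.execute(
            "SELECT title, body FROM posts WHERE id = ?", (post_id,)).fetchone()
        try:
            rows = self.comments.execute(
                "SELECT author, body FROM comments WHERE post_id = ? ORDER BY id",
                (post_id,)).fetchall()
        except sqlite3.Error:
            rows = None  # comments store is down: still serve the post
        return {"title": title, "body": body, "comments": rows}
```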
Blog comment pages seem to be the next or second step up from “hello world” in database programming tutorials I’ve seen lately. That said, separation of concerns, by way of replacing WordPress comments with something form-fit precisely to purpose, would be my chosen tack as well. One can already drop in Disqus (or Livefyre, or other) comments; the same principle of including or framing a standalone comments page hosted on another box leaves options wide open. One of those possibilities is serving the blog from one Raspberry Pi (aggressively cached) and putting the comments on bigger iron.
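Aggressive caching of the blog side amounts to serving a stored copy of each rendered page until it expires, so the database is touched at most once per page per interval. A minimal Python sketch of a time-to-live page cache; the render function and the 60-second TTL are assumptions, and in production this job is usually done by nginx’s fastcgi_cache, Varnish, or a WordPress caching plugin rather than hand-written code.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Serve rendered pages from memory, re-rendering each at most once per TTL."""

    def __init__(self, render: Callable[[str], str], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.render = render  # the expensive step (PHP and MySQL in real life)
        self.ttl = ttl        # seconds a cached copy stays fresh
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = self.clock()
        cached = self._store.get(path)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # fresh: no database work at all
        html = self.render(path)
        self._store[path] = (now, html)
        return html
```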
Thanks for not going to Disgus, guys. That accursed and intrusive system cut my overall blog commentating by about 80%.
I’m a co-founder at Pogoapp, and we’d be happy to offer you some pro-bono advice on the sysadmin side of things. A lot of the comments here are on track – you basically want to eliminate any possible single point of failure, and there are a lot of them. Assume anything that can fail will fail.
Dear Yves and Lambert
Andrea and Stephen here. We saw your recent post seeking new webhosting and development. We are from Glocal & May First / Peoples Link. We as Glocal are active members of the May First / Peoples Link Web Hosting Collective. This collective includes many different organizational members who actively critique and organize for change in the political-economy. Some of the active members include the Left Forum, Brecht Forum, numerous labor unions, and great economic pranksters like the YES MEN.
May First is the perfect home for Naked Capitalism, ideologically and practically. May First has a support team that reliably and promptly responds to ticket issues at; all of our servers at May First run a LAMP stack and are routinely maintained by our support team, which documents server crashes and responses.
We want to emphasize one thing: May First / Peoples Link is a collective, we seek to provide the space for our members voices and actively support them. Some of our active members like Sahara Reporters receive tremendous threats of DDoS attacks which we efficiently respond to minimizing the impact on their servers. We also are working on developing alternatives to the likes of Google Docs through our OwnCloud instance.
As the Glocal cooperative, we work on design, development, and server support. We have experience using Drupal as a CMS, which may be better suited to your needs than WordPress. We also develop on WordPress if that’s the desired path. For the development of your site specifically, we at Glocal also have a ticketing system to post and respond to issues that may arise.
We are huge fans of the work you do at Naked Capitalism and want to support it any way we can. We’d love to talk with you further about this. Please email Stephen at email@example.com and Andrea at firstname.lastname@example.org.
Stephen and Andrea
Lambert/Yves: ^^^^ THIS!!! ^^^^ ;-)
I’ve seen lots of site issues with CloudFlare, where you can’t get to the sites and CloudFlare claims it’s not their problem. I believe one of the issues is that a lot of porn sites go through CloudFlare, which means the traffic is high and peaks at certain times of the day, taking down servers.
Just moving off of CloudFlare may be a big help.
Too many bells and whistles? I once worked for a firm which embraced a policy of stay well back of the cutting edge. Let someone else find the bugs.
We are most decidedly NOT cutting edge tech wise. We just run the WP database hard and that is WP’s Achilles heel.
I reread the comments and IMO the best potentials are
1. Separate the comments from the postings somehow in DB land. Calculated Risk has “custom” comment software that may be available at a reasonable price and make the separation easier. Others have suggested commenting alternatives or there should be ways to split the DB logically and physically.
2. May First sounds like a hosting provider to evaluate seriously because they want to see you succeed where others are more profit driven.
3. Apache can be a memory problem and so using lighttpd or nginx sounds like something to look into.
4. Virtualization with the proper scripting for failover as suggested in various comments is another good idea that should be explored.
Dear Lambert and Yves,
I would not assume that you are not having problems with the NSA. They are far more aggressive on the cyber warfare front against activists than most people realize. The NSA is constantly attacking activist websites here in the US. I encourage you to view my site freeyourselffrommicrosoftandthensa.org for more information about the scale of NSA crimes.
In my day job, I have spent years teaching web design and development courses. I encourage you to read my article on the drawbacks of WordPress Websites on one of my websites, buildyourownbusinesswebsite.org.
WordPress is literally loaded with problems. It is not that difficult to move away from WordPress to another better structured CMS. There are automated tools for doing this. You can also keep the current WordPress database in an archive and build a new database on a better structured platform.
As for web hosts, I have written an article on the need for small business owners to move their websites out of the US as soon as possible. You can also read this at buildyourownbusinesswebsite.org.
I agree with the comment that if you are on a limited budget, moving to a server with SSD might help reduce crashing without increasing hosting costs. Also a low cost way to simulate a distributed server is by going with a web host that uses Cloud Linux.
I am in the process of moving all of my websites out of the US and have therefore researched many web hosts in Europe and Canada. Most are sadly not very good. A good web host in Canada that has SSD servers with Cloud Linux – and reasonable rates is Crocweb.com.
My partner, Elizabeth Hanson, runs the Occupythemoneysystem Facebook page. She loves your website and asked me to try to send you some helpful advice. I hope you take the time to read the articles above and I wish you well. You are providing a great service in trying to wake people up. Feel free to email me back if you would like more advice about non-US web hosts.
Regards, David Spring M. Ed.
Thanks for your concern, but our 6 hour outage had absolutely nothing to do with outside forces. And our spambot attacks, while annoying and frustrating, would probably be deterred with better perimeter defenses (which our current host does NOT have). Put it another way: if this were the NSA, we’d be having much worse symptoms.
In addition to what folks have said above, here’s some other advice that you might find useful:
1) Ask your webmaster for details about the outages. All of the issues with WP I’ve seen in my career have had to do with either old versions of WP with security vulnerabilities or misbehaving plugins that cause WP to be unstable. For example, two plugins that work fine in isolation but interfere with one another when installed in tandem. An astute webmaster should be able to inspect WP after a crash and determine whether or not a plugin caused it. If that’s the case, it might be worth disabling the plugin or looking for an alternative. Sadly, not all WP plugins are high quality. I don’t believe there’s anything intrinsic about WP that should render it prone to crashes.
2) If you’re absolutely certain that your database technology/hosting provider is struggling under load, it might be worth leveraging the cloud. Google’s App Engine provides hosting for WordPress. It’s worth noting that App Engine is also the technology that runs services like Snapchat, so I’m sure it will be able to handle any load you happen to throw at it. I’m told this option can get expensive, though, so it’s worth looking into how much it will cost at your scale.
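The post-crash inspection described in point 1 often starts with the PHP error log (wherever PHP’s error_log setting points, or wp-content/debug.log when WP_DEBUG_LOG is enabled). A rough Python sketch that counts which plugin directories appear in fatal or parse errors; the sample log lines are made up, and a plugin showing up in an error is a lead to investigate, not proof that it caused the crash.

```python
import re
from collections import Counter
from typing import Iterable

# WordPress installs plugins under wp-content/plugins/<plugin-directory>/
PLUGIN_DIR = re.compile(r"wp-content/plugins/([^/\s]+)/")
FATAL = re.compile(r"PHP (?:Fatal error|Parse error)")

# Made-up lines in the usual PHP error-log shape, for illustration only.
SAMPLE_LOG = [
    "[10-Feb-2014 03:12:44 UTC] PHP Fatal error:  Allowed memory size of "
    "268435456 bytes exhausted in /var/www/wp-content/plugins/example-cache/cache.php on line 88",
    "[10-Feb-2014 03:12:50 UTC] PHP Warning:  Division by zero in "
    "/var/www/wp-content/plugins/example-widgets/widgets.php on line 12",
]


def plugins_in_fatal_errors(lines: Iterable[str]) -> Counter:
    """Count fatal/parse-error lines that mention each plugin directory."""
    counts: Counter = Counter()
    for line in lines:
        if FATAL.search(line):
            # set(): count a plugin once per line even if its path repeats
            counts.update(set(PLUGIN_DIR.findall(line)))
    return counts


if __name__ == "__main__":
    for plugin, n in plugins_in_fatal_errors(SAMPLE_LOG).most_common():
        print(f"{plugin}: {n}")
```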
Hope this helps, and good luck!
I won’t bore you with the details, but the last failure (the 6 hour one) was totally managerial.