ECC RAM is standard for servers, so it’s not especially hard to get. Fixing bit errors outside the memory (e.g. in the CPU) is harder; I imagine something like Tandem Computers (http://en.wikipedia.org/wiki/Tandem_Computers), essentially running two computers in parallel and checking them against one another, would work. But all of this drives the cost up, which, as you note, is already a problem.
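To make the "run two copies and check them against one another" idea concrete, here is a rough software sketch. It is only an analogy for what redundant hardware would do, not a description of how Tandem's machines actually worked, and the names `run_checked`, `make_flaky`, and `MismatchError` are made up for illustration. Note that two copies can only detect a disagreement; this sketch recovers by retrying, whereas picking a winner outright would take a third copy and a majority vote.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class MismatchError(RuntimeError):
    """Raised when the two replicas never agree within the retry budget."""


def run_checked(replica_a: Callable[[], T],
                replica_b: Callable[[], T],
                max_retries: int = 3) -> T:
    """Run both replicas and return a result only when they agree.

    A disagreement means at least one replica hit an error, but with only
    two copies we cannot tell which one, so we simply retry both.
    """
    for _ in range(max_retries):
        a = replica_a()
        b = replica_b()
        if a == b:
            return a
    raise MismatchError(f"replicas disagreed {max_retries} times in a row")


def make_flaky(value: int, bad_calls: int) -> Callable[[], int]:
    """Hypothetical fault injector: flips bit 0 of `value` on the first
    `bad_calls` calls, then returns the correct value."""
    state = {"calls": 0}

    def replica() -> int:
        state["calls"] += 1
        if state["calls"] <= bad_calls:
            return value ^ 1  # simulate a single bit flip
        return value

    return replica


if __name__ == "__main__":
    good = lambda: 42
    flaky = make_flaky(42, bad_calls=1)
    # First attempt disagrees (42 vs 43), the retry agrees.
    print(run_checked(good, flaky))
```

The cost shows up here too: every computation is done at least twice, which is the software version of paying for a second computer.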
There are other clever things you can do, like including redundant hardware and error-checking within the CPU, but they all increase the die area. Some of this might actually drive cost down by increasing manufacturing yield, but in general it will probably be more expensive.