Gekko Science
Overclocking the Block Erupter Cube

Build by Kittan           kittan |at| gekkoscience.com

-----------------------------------------------------------------------

If this interests you, feel free to ask about parts availability.

Also, this writeup is fairly math-heavy. But does include... PICTURES!

Also check out the Blade Erupter V2 Overclock writeup

-----------------------------------------------------------------------

What is a Block Erupter Cube?

Some time ago, ASICMiner started making millions of integrated circuits designed specifically to process a particular hashing function called SHA256. This algorithm is used heavily in Bitcoin mining.
As far as I know, they were the first entity with any ASIC (Application-Specific Integrated Circuit) bitcoin mining hardware. Not the first to sell to end users though, that came later. But when they did start selling, they made a huge dent in the market.
One of the first major products was the Block Erupter USB, which runs off a single BE100 ASIC and operates, at stock, around 336MH (MegaHashes - or millions of hashes per second) - comparable to a mid-range GPU, except consuming only 2.5W through the USB bus. They can, of course, be overclocked quite a bit; it's not difficult to get them running 450MH (comparable to a Radeon HD 5870).
The same BE100 chip is used in a large array on the Block Erupter Blade, which exists in two versions, both using 32 ASICs to achieve rated speeds of 10.7GH. The original had two oscillators built-in, so the speed was selectable from the config page (10.7GH or 12.8GH) but required manually adjusting the core voltage to operate at the higher speed.
The second version Blade was a lot more compact than the original, but also was not adjustable - they were built for a fixed speed of 10.7GH and a fixed core voltage. That didn't keep us from figuring out how to get almost 15GH out of them though.

The next big product from ASICMiner is the Cube, basically a black aluminum monolith that functions as a standalone unit. The USB sticks required a host machine to operate; the Blade could function standalone but required power wiring and was basically a bare circuit board. V2 Blades could be mounted in a backplane that took care of power delivery, but still required external cooling and benefited from an external framework for mounting. The Cube is a self-contained box that uses standard PCIe power connectors and an ethernet jack, is stackable and has an internal fan.

Cube Internals

The Cube is controlled by a base mainboard driving six riser cards, each containing 16 BE100 chips. The Cubes are dual-speed, rated for 32.2GH on low clock and 38.5GH on high. The core voltage changes are implemented automatically when speeds are selected in software.
BE100 speed calculations on a per-chip basis are actually fairly easy. On "low" speed, the Cube runs a 12MHz oscillator to the ASIC's clock input. I'm not sure if it's an internal PLL or just parallel operations, but the chip runs 28 hashes per clock cycle. The USB BE also has a 12MHz oscillator, which is where stock 336MH/s comes in - 12MHz * 28H/Hz = 336MH. The same speed oscillator is used by default in the Blade (32 chips * 336MH = 10.752GH). The Cube contains 96 BE100 chips total, so at stock it sees 12MHz * 28H/Hz * 96= 32.256GH. The "high" clock is a 14.31818MHz oscillator, so 14.31818 * 28 * 96 = 38.487GH.
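The per-chip arithmetic above is simple enough to sketch in a few lines of Python. This is only an illustration of the clock * 28 * chip-count rule from the text; the function and constant names are made up for the example.

```python
# Theoretical BE100 hashrate: clock (MHz) * 28 hashes per clock * number of chips.
HASHES_PER_CLOCK = 28  # per-chip figure quoted in the text


def hashrate_mh(clock_mhz, chips=1):
    """Return the theoretical hashrate in MH/s for `chips` BE100s at `clock_mhz`."""
    return clock_mhz * HASHES_PER_CLOCK * chips


if __name__ == "__main__":
    # Chip counts from the text: USB stick = 1, Blade = 32, Cube = 96.
    for name, chips in (("USB", 1), ("Blade", 32), ("Cube", 96)):
        for clock in (12.0, 14.31818, 16.0):
            gh = hashrate_mh(clock, chips) / 1000.0
            print(f"{name:5s} @ {clock:8.5f} MHz: {gh:7.3f} GH")
```

These are theoretical ceilings; real per-chip numbers come in lower once heat and hardware errors get involved.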
Each Cube riser card has two VRMs (Voltage Regulator Modules) on it, each one providing power for 8 of the BE100 ASICs. The chips typically need 1.05V to operate at 12MHz, but require around 1.15V to run at 14.3MHz and 1.25V to run at 16MHz - where the per-chip hashrate is 448MH, for a full-Cube rate of 43GH.
The mechanism for providing selectable voltages to the chips in the Cube is actually quite clever, and will be documented further down.

Riser Card VRMs

The guts of the Cube's low-voltage high-current power is the Texas Instruments TPS53355 30A Synchronous Buck Converter. This guy is a single-chip solution for high-current power applications, integrating the high-side switching transistor and rectifier as well as feedback voltage sampling. The stock configuration for the Cube's VRMs is to output 1.05V for low clock and 1.15V for high clock. This change is actually done by modifying the feedback divider that samples the output voltage.

As each card has two identical VRMs, I'll focus analysis on the one nearest the card's backplane connector and then we can copy the results to the second circuit on the card. The elements significant for understanding how the output voltage regulation works are R7, R8, R9, and Q3. First let's look at the case where Q3 (a standard logic-level NFET in a SOT23 package) is turned off.
For the stock Cube boards, R7 = 7500ohm, R8 = 8200ohm and R9 = 1800ohm. These act as a voltage divider sampling the filtered output from the buck inductor, which the controller measures on its FB pin. The 53355 has an internal 0.6V reference which it tries to match with the output - this is to say, the voltage sampled from the divider will always measure at 0.6V if the regulator is functioning properly. I'll go ahead and derive the equation for the output voltage given that Vfb = 0.6.

i = Vo/(R7+R8+R9)

i*(R8+R9) = 0.6V

[Vo/(R7+R8+R9)] * (R8+R9) = 0.6V

Vo * (R8+R9) = 0.6V * (R7+R8+R9)

Vo = [0.6V * (R7+R8+R9)]/(R8+R9)

Vo = [0.6V * R7]/(R8+R9) + [0.6V * (R8+R9)/(R8+R9)]

Vo = 0.6V * [R7/(R8+R9)] + 0.6V    (generic output voltage equation)

Vo = 0.6V * [7500/(8200+1800)] + 0.6V

Vo = 0.6V * 0.75 + 0.6V

Vo = 0.45V + 0.6V = 1.05V

For the case of Q3 off/open, the VRM is set to output 1.05V - this agrees with what we already know is the operating point for the BE100 at 12MHz clock frequency. The same logic signal that triggers the higher clock also enables Q3 through gate resistor R25, causing it to short around R9, effectively removing it from the divider string. As the Rds(on) for FETs is typically less than 1ohm, the parallel combination of (1ohm)||(R9) ends up three to four orders of magnitude smaller than every other element in the divider. For simplicity we'll approximate it as zero, and introduce at most 0.1% error - likely an order of magnitude less error than the tolerance of the resistors themselves.

Vo = 0.6V * [R7/(R8+(Q3||R9))] + 0.6V

Vo = 0.6V * [R7/(R8+0)] + 0.6V

Vo = 0.6V * (R7/R8) + 0.6V

Vo = 0.6V * (7500/8200) + 0.6V

Vo = 0.6V * .9146 + 0.6V

Vo = 0.549V + 0.6V = 1.149V

So when Q3 is turned on, the output voltage is 1.15V, which we know is the operating point for the BE100 at the "high" clock of 14.318MHz.
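Both divider cases can be checked with a short Python sketch. It uses the stock resistor values and the 0.6V reference from the text; the function names and the 1ohm Rds(on) figure are assumptions for illustration only (the text says only "typically less than 1ohm"), and it also shows how little the "treat Q3 as a dead short" approximation costs.

```python
V_REF = 0.6  # TPS53355 feedback reference in volts, as given in the text


def parallel(a, b):
    """Equivalent resistance of two resistors in parallel."""
    return a * b / (a + b)


def vout(r7, r8, r9, q3_on=False, rds_on=0.0):
    """Regulated output voltage for the R7 / (R8 + R9) feedback divider.

    With Q3 on, R9 is bypassed by the FET's on-resistance `rds_on`
    (0.0 models the ideal short used in the hand calculation).
    """
    lower = r8 + (parallel(rds_on, r9) if q3_on else r9)
    return V_REF * (r7 / lower) + V_REF


if __name__ == "__main__":
    R7, R8, R9 = 7500.0, 8200.0, 1800.0  # stock values from the text
    low = vout(R7, R8, R9)
    high_ideal = vout(R7, R8, R9, q3_on=True)
    high_real = vout(R7, R8, R9, q3_on=True, rds_on=1.0)  # assumed 1ohm FET
    print(f"Q3 off:              {low:.4f} V")
    print(f"Q3 on (ideal short): {high_ideal:.4f} V")
    print(f"Q3 on (1ohm Rds):    {high_real:.4f} V")
    print(f"approximation error: {100 * abs(high_ideal - high_real) / high_real:.4f} %")
```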

Making it go faster

What I wanted to do was modify the Cube to run 14.318MHz (38.5GH) on "low" clock and 16MHz (43GH) on "high" clock. I know BE100 chips will, with adequate cooling, operate at 16MHz without any problems at all. Working the voltage to the chips might be a bit tricky. They require 1.20 to 1.25V in order to achieve a 16MHz operating point stably. I first looked at what it would take to replace R8 and R9 such that you'd get 1.15V and 1.25V at "low" and "high" respectively, but this turned into an algebraic nightmare.
Instead of documenting all that crap, how about I go straight to finding a replacement for R7 that would get us the proper voltages? Using the generic output voltage equation and turning it around a little...

Vo = 0.6V * [R7/(R8+R9)] + 0.6V

Vo - 0.6V = 0.6V * [R7/(R8+R9)]

(Vo - 0.6V)/0.6V = R7/(R8+R9)

(Vo - 0.6V)*(R8+R9) / 0.6V = R7    (generic equation for R7 given Vo, R8, R9)

So we want an R7 that gets us Vo = 1.15V on low clock. Let's plug it in.

R7 = (Vo - 0.6V)*(R8+R9) / 0.6V

R7 = (1.15V - 0.6V) * (8200+1800) / 0.6V

R7 = (0.55V)*(10,000)/0.6V

R7 = (55/60)*10K = 9167ohm

We also want an R7 that gets us Vo = 1.25V on high clock, when Q3 is on.

R7 = (Vo - 0.6V)*(R8+R9) / 0.6V

R7 = (1.25V - 0.6V) * (8200+0) / 0.6V

R7 = (0.65V)*(8200)/0.6V

R7 = (65/60)*8200 = 8883ohm

Well, R7 for both values didn't come out the same, but the difference is relatively small - about 3%. If we average we should find a value that comes close to satisfying both voltage requirements.
(8883+9167)/2 = 9025, so a 9K resistor should work. I used a 9.1K, a slightly more common value (9.1 is a standard E24 value). Plugging 9.1K into our equations, we can see how well it satisfies our voltage needs.
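For anyone who wants to redo the R7 math with different target voltages, here is a minimal Python sketch of the rearranged equation. It assumes the stock R8 and R9 values and treats Q3 as an ideal short on high clock; the function name is invented for the example.

```python
V_REF = 0.6  # TPS53355 feedback reference in volts


def r7_for(vo, lower_ohms):
    """Solve the divider equation for R7, given the target output voltage
    and the total resistance below the FB tap (R8+R9, or just R8 with Q3 on)."""
    return (vo - V_REF) * lower_ohms / V_REF


R8, R9 = 8200.0, 1800.0

r7_low = r7_for(1.15, R8 + R9)  # low clock, Q3 off
r7_high = r7_for(1.25, R8)      # high clock, Q3 on, R9 shorted out
midpoint = (r7_low + r7_high) / 2
spread_pct = 100 * (r7_low - r7_high) / r7_low

if __name__ == "__main__":
    print(f"R7 for 1.15V (Q3 off): {r7_low:.0f} ohm")
    print(f"R7 for 1.25V (Q3 on):  {r7_high:.0f} ohm")
    print(f"midpoint:              {midpoint:.0f} ohm")
    print(f"spread:                {spread_pct:.1f} %")
```

Averaging is just a quick way to split the error between the two operating points; it does not guarantee either target is hit exactly.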

Low Clock, Q3 is Off, Target is 1.15V

Vo = 0.6V * [R7/(R8+R9)] + 0.6V

Vo = 0.6V * [9100/(8200+1800)] + 0.6V

Vo = 0.6V * 0.91 + 0.6V

Vo = 0.546V + 0.6V = 1.146V    (0.35% difference from target)

High Clock, Q3 is On, Target is 1.25V

Vo = 0.6V * [R7/(R8+R9)] + 0.6V

Vo = 0.6V * [9100/(8200+0)] + 0.6V

Vo = 0.6V * 1.11 + 0.6V

Vo = 0.666V + 0.6V = 1.266V    (1.27% difference from target)
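The same check can be run for any candidate R7. The sketch below is a hypothetical helper, not part of the original build notes: it compares a 9.0K and a 9.1K resistor against the two targets, assuming stock R8/R9 and an ideal short across R9 on high clock.

```python
V_REF = 0.6              # TPS53355 feedback reference in volts
R8, R9 = 8200.0, 1800.0  # stock lower-divider values from the text

# (label, target voltage, resistance below the FB tap)
CASES = (
    ("low clock, Q3 off", 1.15, R8 + R9),
    ("high clock, Q3 on", 1.25, R8),
)


def check(r7):
    """Return a list of (label, Vo, percent deviation from target) for one R7."""
    results = []
    for label, target, lower in CASES:
        vo = V_REF * r7 / lower + V_REF
        results.append((label, vo, 100 * (vo - target) / target))
    return results


if __name__ == "__main__":
    for r7 in (9000.0, 9100.0):
        print(f"R7 = {r7:.0f} ohm")
        for label, vo, pct in check(r7):
            print(f"  {label}: {vo:.3f} V ({pct:+.2f}% from target)")
```

Real resistors have their own tolerance on top of this, so the computed deviations are only the nominal part of the error.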

What to Replace

Well we've determined that R7 needs to be replaced with a 9.1K resistor. This will require a surface-mount 0805 package. The second VRM on the card will require a swap as well; its R7 counterpart is R16. With this change implemented, both VRMs will be outputting the proper voltages for the given clock speeds.

Replacing the oscillators themselves is no trouble at all. Since our desired speeds are 14.318 and 16.0, and the board already has 12.0 and 14.318 mounted, we only need to source one new oscillator per board. Removing the 12MHz (Q1) and placing the 14.318MHz (Q2) in its place will give us the 38.5GH "Low" operating frequency; installing the 16MHz oscillator on the now-empty Q2 pads will give us the "High" clock.
You'll need to find an oscillator - not a crystal - that operates on 3.3V, in a standard 7mmx5mm package. It doesn't have to be exactly 7x5; about a fourth of a millimeter either way won't keep it from working.

Issues and Hindrances

These Cubes are pretty good at keeping themselves cool. Well, keeping most of themselves cool. The fan covers the middle 3 cards with pretty good airflow, but the rightmost card and two leftmost cards (looking from the fan end) are definitely suboptimally located for adequate airflow. These cards have very little heatsink cross-section actually in the airflow of the fan, and as such tend to run hot even without overclocking. Do a quick statistical analysis of the average per-chip speeds for each card and you'll see what I mean. Here's a sample output from one of mine; cards 2, 3 and 4 are overclocked to 16MHz while the rest are running 14.318MHz.
M_01-16: 366 368 382 362 372 367 371 382 377 378 368 367 367 379 380 381
M_17-32: 418 428 424 425 426 441 424 427 424 431 431 424 445 432 433 431
M_33-48: 429 430 433 441 444 431 432 443 434 439 442 430 430 436 446 438
M_49-64: 431 440 431 443 441 432 430 432 414 436 428 421 434 426 435 421
M_65-80: 378 392 381 395 385 381 382 392 369 373 367 378 372 367 373 375
M_81-96: 357 379 359 363 367 000 350 349 348 344 353 344 341 328 336 330
Card 1 Average: 372.94MH; Target 401MH; Target Delta -28.06MH; Total Delta -449MH
Card 2 Average: 429.00MH; Target 448MH; Target Delta -19.00MH; Total Delta -304MH
Card 3 Average: 436.13MH; Target 448MH; Target Delta -11.88MH; Total Delta -190MH
Card 4 Average: 430.94MH; Target 448MH; Target Delta -17.06MH; Total Delta -273MH
Card 5 Average: 378.75MH; Target 401MH; Target Delta -22.25MH; Total Delta -356MH
Card 6 Average: 349.87MH; Target 401MH; Target Delta -51.13MH; Total Delta -767MH
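The "quick statistical analysis" can be reproduced with a few lines of Python. This sketch pastes in the six M_ lines from the sample above and uses the same per-card targets; the names are made up for the example. It skips chips reporting 000, which is an inference on my part: that is the only way the Card 6 figures above work out (15 live chips).

```python
SAMPLE = """\
M_01-16: 366 368 382 362 372 367 371 382 377 378 368 367 367 379 380 381
M_17-32: 418 428 424 425 426 441 424 427 424 431 431 424 445 432 433 431
M_33-48: 429 430 433 441 444 431 432 443 434 439 442 430 430 436 446 438
M_49-64: 431 440 431 443 441 432 430 432 414 436 428 421 434 426 435 421
M_65-80: 378 392 381 395 385 381 382 392 369 373 367 378 372 367 373 375
M_81-96: 357 379 359 363 367 000 350 349 348 344 353 344 341 328 336 330
"""

LINES = SAMPLE.strip().splitlines()
# Per-chip targets in MH: cards 2-4 at 16MHz (448), the rest at 14.318MHz (401).
TARGETS = [401, 448, 448, 448, 401, 401]


def card_stats(line, target):
    """Return (average, per-chip delta, total delta) for one status line.

    Chips reporting 0 are treated as dead and left out of the average
    (assumption inferred from the Card 6 numbers).
    """
    speeds = [int(tok) for tok in line.split(":", 1)[1].split()]
    live = [s for s in speeds if s > 0]
    avg = sum(live) / len(live)
    delta = avg - target
    return avg, delta, delta * len(live)


if __name__ == "__main__":
    for n, (line, target) in enumerate(zip(LINES, TARGETS), start=1):
        avg, delta, total = card_stats(line, target)
        print(f"Card {n} Average: {avg:.2f}MH; Target {target}MH; "
              f"Target Delta {delta:+.2f}MH; Total Delta {total:+.0f}MH")
```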

From that you can easily tell which cards are underperforming because of heat. It is recommended to remove the hardware from the tidy aluminum case and improve airflow to the outermost cards to get everything working at optimum efficiency. It's probably possible to cut a hole in the side panel and mount another fan there, to get more airflow over the outermost card (card 6, leftmost looking from the fan) - this one is in dire need of added cooling, as its heatsink has almost no fan cross-section and is up against the side panel of the housing. If I get more of these Cubes I'll probably work on mounting them all together in a combined case with better airflow and see what kind of speeds I can get out of them. Adding more voltage and adding more speed means adding more heat, so to get them running much faster will definitely require more cooling.

Here's a fan-end view of one of my Cubes. You can see that the outermost cards have very little of the fan's airflow interacting with them. You can also see a hint of the prototype power supply board that's running these guys, and should be available for consumer purchase within a few weeks. Stay tuned.

If you like what you saw and you want to contribute to future awesome projects, feel free to donate some BTC: 1GekkosciLeaey8Na9siC8oD5HcMtLnWwd