OK, before I begin, I don't want to sound like a guru, but I would like to share my experience working with machines of different "bitness".

Firstly, I wanna define "bitness". If someone tells me that a particular machine is 32-Bit, then that could mean one of two things:
1) The CPU's registers are 32 bits wide.
2) Memory is addressed in chunks of 32 bits at a time.

For the purpose of this post, let's just assume that bitness means both of these, i.e. a 32-Bit machine has 32-Bit registers and also 32-Bit chunks of memory.

In my most recent year of college, I did an embedded systems project. I had to make a portable electronic Connect4 game. I chose an 8-Bit PIC microcontroller (made by Microchip). With this particular chip, if I wanted to add two 8-Bit numbers, it could be done in one instruction:

ADD a,b

If I wanted to play around with 16-Bit numbers, then I would have to use two memory chunks and perform two instructions, with the second one adding in the carry from the first so the overflow from the low byte isn't lost:

ADD  a[0],b[0]
ADDC a[1],b[1]

So in this respect, you can say that on this chip, 8-Bit arithmetic is twice as fast as 16-Bit arithmetic. (In fact it's actually a little more than twice as fast when it comes to doing comparisons).
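
If it helps to see it in C rather than assembly, here's roughly what those two instructions are doing. This little function is just something I've made up for illustration, it's not from my Connect4 code:

#include <stdint.h>

/* Add two 16-Bit numbers the way an 8-Bit CPU has to: low bytes first,
   then high bytes plus the carry out of the low bytes. */
uint16_t add16_the_8bit_way(uint8_t a_lo, uint8_t a_hi,
                            uint8_t b_lo, uint8_t b_hi)
{
    uint8_t lo    = (uint8_t)(a_lo + b_lo);         /* first ADD */
    uint8_t carry = (uint8_t)(lo < a_lo);           /* did the low byte overflow? */
    uint8_t hi    = (uint8_t)(a_hi + b_hi + carry); /* second ADD, with carry */

    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

A 16-Bit CPU gets the same result with a single native ADD.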

In my Connect4 game however, the largest number I ever had to deal with was 48, so I never needed to use 16-Bit arithmetic. If I had chosen a 16-Bit CPU of identical specs, my Connect4 game wouldn't run any quicker because it would still be executing the same number of instructions.

Now, let's say I drank some of that "squishy 100% syrup" stuff and decided to make "Connect 4000". On my 8-Bit CPU, two instructions would have to be executed every time I want to add, subtract, whatever. However, if I had a 16-Bit CPU, only one instruction would be needed. So in that case, the 16-Bit CPU would be faster.

Today, most CPUs are 32-Bit. If you ask a 32-Bit CPU to add two 64-Bit numbers together, it will perform two instructions on two chunks of memory. However, if you ask a 64-Bit CPU to do the same thing, it'll only perform one instruction.
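
You can see this for yourself with a plain C function like the one below (the name is just mine for illustration). Compile it for a 32-Bit target and the compiler emits an add followed by an add-with-carry; compile it for a 64-Bit target and it's a single add:

#include <stdint.h>

/* One line of C, but the instruction count depends on the "bitness":
   two instructions on a 32-Bit CPU, one on a 64-Bit CPU. */
uint64_t add64(uint64_t a, uint64_t b)
{
    return a + b;
}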

Now the thing about 32-Bit is that you can represent a pretty damn big number with 32 bits, the maximum value of an unsigned 32-Bit integer being 4,294,967,295, or just under 4.3 billion. Now just think to yourself... how often do you use numbers greater than 4.3 billion?

So there's the first benefit of 64-Bit over 32-Bit:
If you're dealing with numbers bigger than 4.3 billion, then you're executing half the number of instructions that a 32-Bit machine would.
There's another thing though. Let's say you have 2 megabytes of memory that you want to copy to another part of memory. A 32-Bit machine will do this 32 bits at a time. A 64-Bit machine will do this 64 bits at a time.
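
In real code you'd just call memcpy, which does this sort of thing internally, but if you were to write it by hand it might look like the sketch below (the function name is made up). The word type picks itself: 32 bits wide on a 32-Bit machine, 64 bits wide on a 64-Bit machine, so the 64-Bit machine does half the loop iterations:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a buffer one machine word at a time.
   Assumes both buffers are suitably aligned for word-sized access. */
void copy_words(void *dst, const void *src, size_t nbytes)
{
    uintptr_t *d = dst;              /* a machine-word-sized integer type */
    const uintptr_t *s = src;
    size_t nwords = nbytes / sizeof(uintptr_t);

    for (size_t i = 0; i < nwords; i++)
        d[i] = s[i];

    /* mop up any leftover bytes that don't fill a whole word */
    memcpy((char *)dst + nwords * sizeof(uintptr_t),
           (const char *)src + nwords * sizeof(uintptr_t),
           nbytes % sizeof(uintptr_t));
}

(Strictly speaking you'd have to mind C's aliasing and alignment rules here, but it shows the idea.)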

So there's another benefit of 64-Bit over 32-Bit:
For operations on large sections of memory, you'll get things done twice as fast.

If you have a program that doesn't deal with numbers greater than 4.3 billion, and your program doesn't do operations on large sections of memory, then I don't think you'll see a difference between a 32-Bit CPU and a 64-Bit CPU.

For instance, the password generator I wrote recently wouldn't run any better on a 64-Bit CPU because it dealt with small numbers and didn't do any operations on large sections of memory. However, there are quite a few algorithms that benefit from having a "bigger bitness" CPU.

Remember when I was talking about the Big Number library for C called "GMP"? Well if you use that to deal with ridiculously big numbers, then a 64-Bit CPU is way better than a 32-Bit CPU. They have some info about it on their own website http://gmplib.org/32vs64.html (for anyone who's hesitant to click on links, it's actually a nice light interesting read).
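
Just to give a flavour of it, a trivial GMP addition looks something like this (the numbers are made up; link with -lgmp). GMP stores each big number as an array of machine words, which is exactly why a 64-Bit machine gets through an addition in roughly half the operations a 32-Bit one needs:

#include <gmp.h>

int main(void)
{
    mpz_t a, b, sum;

    /* Numbers far too big for any native integer type. */
    mpz_init_set_str(a, "123456789012345678901234567890", 10);
    mpz_init_set_str(b, "987654321098765432109876543210", 10);
    mpz_init(sum);

    mpz_add(sum, a, b);         /* sum = a + b, word by word, with carries */
    gmp_printf("%Zd\n", sum);

    mpz_clear(a);
    mpz_clear(b);
    mpz_clear(sum);
    return 0;
}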

And just one final thing... the whole thing about "porting" a program from 32-Bit to 64-Bit. When I got into programming, it wasn't long before I took a liking to portable programming. If a program is written properly with portability in mind, then you won't need to port it. For instance, when I'm writing a program in C, you'll see me use integer types such as "uint_fast64_t". What this does is ask the compiler for the fastest unsigned integer type that's at least 64 bits wide. If my program is compiled for a 64-Bit machine, that arithmetic maps straight onto native 64-Bit registers; on a 32-Bit machine the compiler quietly does the two-instruction dance behind the scenes instead. Either way there's no need to play around changing the code for the "next generation".
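
The exact widths the compiler picks for these "fast" types vary from platform to platform, and you can see what you got with a couple of lines of C:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* The "fast" types promise *at least* the width in the name;
       the compiler is free to pick something wider if that's quicker. */
    printf("uint_fast16_t: %zu bytes\n", sizeof(uint_fast16_t));
    printf("uint_fast32_t: %zu bytes\n", sizeof(uint_fast32_t));
    printf("uint_fast64_t: %zu bytes\n", sizeof(uint_fast64_t));
    return 0;
}

On common 64-Bit Linux systems, for example, all three of those come out as 8 bytes.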

But alas, you have some really bad programmers out there who unnecessarily take the flexibility and portability out of their code. Microsoft are particularly bad at this. I almost vomit any time I have to take a look through <windows.h>.

When I want to write a portable algorithm, I think about the following:
1) Will it run on a machine that has a 9-Bit byte? (They have existed)
2) Will it run on a super computer where a byte is 64 bits?
3) Will it run on a machine where there's "padding" in the integer types? (Some supercomputers don't do integer arithmetic natively; instead they do floating point arithmetic and just pad any integer values with zeroes after the decimal point.) There's a little sketch after this list showing what coding around points 1 and 3 looks like.
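
For example, here's a little made-up function that sidesteps points 1 and 3: it uses CHAR_BIT from <limits.h> instead of assuming a byte is 8 bits, and it works on unsigned char, which the C standard guarantees has no padding bits:

#include <limits.h>
#include <stddef.h>

/* Count the set bits in a buffer without assuming anything about
   how many bits this particular machine keeps in a byte. */
size_t count_set_bits(const unsigned char *buf, size_t len)
{
    size_t total = 0;

    for (size_t i = 0; i < len; i++)
        for (int bit = 0; bit < CHAR_BIT; bit++)
            total += (buf[i] >> bit) & 1u;

    return total;
}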

So anyway, at the end of it all, the more bits the better (assuming the CPU can still run at the same speed). I'm reminded of cars a little bit here. If someone said to me, "Would you prefer a larger capacity engine or more efficient combustion?", I'd say "Both!". Similarly, if I had a 32-Bit CPU and someone asked me, "Would you like a 64-Bit CPU or would you like a 32-Bit dual-core?", then I'd have to say "Both!" as well!