Thought I'd post a little about CFLAGS as I've been messing with them and learnt a few things.
What are CFLAGS?
CFLAGS are arguments you pass to your C compiler to control how it builds your programs, most often to optimize them. You can add these flags to your Makefiles, or pass them to tools like configure so they end up in the generated Makefile. Your C compiler then picks up these flags and compiles your software the way you have specified. CFLAGS are for the C compiler and CXXFLAGS are for the C++ compiler. There are also LDFLAGS (for the linker), CPPFLAGS (for the C preprocessor) etc., but we won't be going over them here.
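To make that a bit more concrete, here is a rough sketch of what ends up happening with the flags. The file names hello.c and hello.o are made up for illustration, and the flags are just placeholders, not a recommendation.

```shell
# Two usual ways of handing CFLAGS to a build (shown as comments only,
# since they need a real source tree to run in):
#   make CFLAGS="-O2 -pipe"
#   ./configure CFLAGS="-O2 -pipe"

# Roughly what make then does for each C file. Its built-in rule has the
# form: $(CC) $(CPPFLAGS) $(CFLAGS) -c
CC=gcc
CFLAGS="-O2 -pipe"
compile_cmd="$CC $CFLAGS -c hello.c -o hello.o"

# Print the command instead of running it, so no compiler is needed here.
echo "$compile_cmd"
```

The point is simply that CFLAGS is pasted into every compile command, so whatever you put in it applies to every C file in the build.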
So why use CFLAGS?
CFLAGS are used for two things: optimizing and debugging software. We are going to focus on optimizing your software with CFLAGS. You can tell the compiler how hard to optimize and which features of your processor it is allowed to use. This can speed up the runtime of the program and takes advantage of what your box has to offer.
Safe CFLAGS.
Safe CFLAGS are flags like "-march=", which specifies your processor architecture and is the main safe CFLAG to use globally. We will talk about global flags later in this tutorial. There are lots of CFLAGS, but what works for one program can break another, so use them with caution. Also be aware that if you plan on sharing your compiled binaries with others and you specify something that the other person's box does not support, they will not be able to run the program. However, you can still specify a base and an optimal setting by using the -march and -mtune flags. For example, if I wanted to optimize my program for my Intel Centrino Duo but wanted to share the compiled program with a few mates with various newish computers, I would use these flags:
Code:
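-march=i686 -mtune=prescott
-march tells the C compiler the oldest processor the program must still run on, here a standard i686, so it will not use instructions that my mates' boxes might lack. -mtune tells it to tune the code for my own processor while still sticking to that baseline. If you are just compiling for yourself, just use -march with your own CPU (it implies the matching -mtune). If you are sharing the binary, set -march to the baseline and -mtune to your own CPU as well.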
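If you are not sure whether a box supports a given feature before running a binary built for it, one way to check on Linux is to look at the flags line in /proc/cpuinfo. This is only a sketch: has_cpu_flag is a helper name I made up, and it assumes an x86 Linux box, where SSE3 is listed as "pni".

```shell
# has_cpu_flag FLAG "FLAG LIST": succeeds if FLAG is in the space-separated list.
has_cpu_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read the flags of the first CPU if /proc/cpuinfo is readable (Linux only).
cpu_flags=""
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2) || cpu_flags=""
fi

# /proc/cpuinfo reports SSE3 under the name "pni".
if has_cpu_flag pni "$cpu_flags"; then
    echo "SSE3 reported: code built with -msse3 should be able to run here"
else
    echo "SSE3 not reported: avoid -msse3 for binaries meant for this box"
fi
```

Run it on the box that will run the program, not on the box that compiles it.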
-pipe
The -pipe option saves compile time by telling your C compiler not to write temporary files between the stages of compilation and to pipe the output of one stage straight into the next instead. This option uses more RAM, so if you only have a little to spare don't use this flag.
The -O Flag
-O sets the optimization level for the C compiler. You can use it to compile programs with the most optimizations ("-O3") or the least ("-O"). Remember that the more optimizations you enable, the bigger your compiled program tends to be, so don't go setting "-O3" on something as big as GCC or you may quickly run out of HDD space. Here are the four main optimization settings.
-O
Is the least amount of optimization you can set (it is the same as -O1). It has the least impact on the size of the compiled program, but the program will run slower than with the higher levels.
-O2
Turns on all -O optimizations plus the other optimizations that don't greatly increase the program's size or interfere with debugging. This is the most commonly used flag in the Linux world and is probably the best one to use as a global CFLAG.
-O3
Turns on all -O2 optimizations and even more. This is the highest optimization setting you can use, but it's not necessarily the best. It can make your programs bulky and may not be the fastest flag to use. Debugging is also affected, and the job becomes near on impossible.
-Os
Uses the -O2 optimizations that don't increase code size, plus further tricks to make your programs smaller. If you are compiling for older machines, PDAs etc. this may be a good option to use, as long as it doesn't cause any compile errors. If it does, use -O2.
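If you want to see what the four levels do to binary size on your own box, something along these lines should work. It is only a sketch: demo.c is a throwaway test program I made up, and the script assumes gcc (or whatever $CC points at) is installed, skipping the comparison if it is not.

```shell
# Optimization levels discussed above.
levels="-O -O2 -O3 -Os"
CC="${CC:-gcc}"

if command -v "$CC" >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    # Throwaway test program: a small loop so the optimizer has something to do.
    cat > "$workdir/demo.c" <<'EOF'
#include <stdio.h>
int main(void) {
    unsigned long i, sum = 0;
    for (i = 0; i < 1000; i++)
        sum += i * i;
    printf("%lu\n", sum);
    return 0;
}
EOF
    # Build once per level and print the size of each resulting binary.
    for level in $levels; do
        "$CC" $level -o "$workdir/demo$level" "$workdir/demo.c" &&
            printf '%s\t%s bytes\n' "$level" "$(wc -c < "$workdir/demo$level")"
    done
    rm -rf "$workdir"
else
    echo "$CC not found, skipping size comparison"
fi
```

On a toy program like this the differences will be small. Point it at a real source file of your own to get numbers that mean something.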
Making these flags global.
If you would like to use these flags every time you compile your programs, you can add them to your /etc/profile bash script. Open it up in kwrite or something and add something like this at the bottom.
Code:
CFLAGS="-O2 -fomit-frame-pointer -pipe -ffast-math -malign-double -march=prescott -msseregparm -msse3 -minline-all-stringops -fgcse-lm -fgcse-sm -fforce-addr"
CXXFLAGS="${CFLAGS}"
export CFLAGS CXXFLAGS
and save. Remember to pick your own flags and not just copy mine.
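After logging in again you may want to check that the flags really are exported, since a build only sees them if child processes inherit them. A minimal sketch, using placeholder flags rather than the ones above:

```shell
# Placeholder flags for the check; use your own.
CFLAGS="-O2 -pipe"
CXXFLAGS="${CFLAGS}"
export CFLAGS CXXFLAGS

# A child shell stands in for make or configure here: if it prints the
# flags, exported variables are reaching child processes.
sh -c 'echo "CFLAGS=$CFLAGS"; echo "CXXFLAGS=$CXXFLAGS"'
```

If the child shell prints empty values, the export line is missing or /etc/profile has not been re-read yet.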
Here's a little list to help you with choosing the right arch for your cpu.
i386 = i386
i486 = i486
487 = i486
Pentium = pentium
Pentium-MMX = pentium-mmx
Pentium Pro = pentiumpro
Pentium II = pentium2
Celeron = pentium2
Pentium III = pentium3
Pentium 4 = pentium4
Via C3 = c3
Winchip 2 = winchip2
Winchip C6-2 = winchip-c6
AMD K5 = i586
AMD K6 = k6
AMD K6 II = k6-2
AMD K6 III = k6-3
AMD Athlon = athlon
AMD Athlon 4 = athlon
AMD Athlon XP/MP = athlon
AMD Duron = athlon
AMD Tbird = athlon-tbird
Centrino Duo = prescott
And benchmarks on my laptop...
Benchmark 1 without CFLAGS...
Dhrystone Bench
BYTE UNIX Benchmarks (Version 3.11)
System -- Linux drgr33n 2.6.29.1 #1 SMP Sat Apr 4 13:53:11 GMT 2009 i686 Intel(R) Core(TM)2 CPU T5200 @ 1.60GHz GenuineIntel GNU/Linux
Start Benchmark Run: Mon Apr 6 19:29:56 GMT 2009
5 interactive users.
Dhrystone 2 without register variables 4131510.6 lps (10 secs, 6 samples)
Dhrystone 2 using register variables 4045697.1 lps (10 secs, 6 samples)
TEST BASELINE RESULT INDEX
Dhrystone 2 without register variables 22366.3 4131510.6 184.7
=========
SUM of 1 items 184.7
AVERAGE 184.7
Benchmark with "-pipe -O3 -fomit-frame-pointer -ffast-math -malign-double -march=prescott -mtune=prescott -msseregparm -msse3 -minline-all-stringops -fgcse-lm -fgcse-sm -fforce-addr" flags set.
BYTE UNIX Benchmarks (Version 3.11)
System -- Linux drgr33n 2.6.29.1 #1 SMP Sat Apr 4 13:53:11 GMT 2009 i686 Intel(R) Core(TM)2 CPU T5200 @ 1.60GHz GenuineIntel GNU/Linux
Start Benchmark Run: Mon Apr 6 19:51:50 GMT 2009
5 interactive users.
Dhrystone 2 without register variables 5755118.4 lps (10 secs, 6 samples)
Dhrystone 2 using register variables 5327306.5 lps (10 secs, 6 samples)
TEST BASELINE RESULT INDEX
Dhrystone 2 without register variables 22366.3 5755118.4 257.3
=========
SUM of 1 items 257.3
AVERAGE 257.3
As you can see, by using custom CFLAGS I'm getting about a 40% increase in the Dhrystone index (184.7 up to 257.3). You can use the BYTE UNIX Benchmarks to test your own CFLAGS. Happy tinkering.
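For anyone wanting to work out the gain for their own runs, the sum is just (new - base) / base * 100. Here is a small sketch using the two index scores above. pct_gain is a helper name I made up, and it assumes awk is available.

```shell
# pct_gain BASE NEW: percentage increase from BASE to NEW, one decimal place.
pct_gain() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Index without CFLAGS vs. index with the custom CFLAGS.
pct_gain 184.7 257.3    # prints 39.3
```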
ADDED:
With all the above CFLAGS and the CPU flagged as native.
Dhrystone 2 without register variables 22366.3 5101827.8 228.1
=========
SUM of 1 items 228.1
AVERAGE 228.1
With all the above CFLAGS and the CPU flagged as prescott, using -O2 optimizations.
Dhrystone 2 without register variables 22366.3 5376992.4 240.4
=========
SUM of 1 items 240.4
AVERAGE 240.4
These speeds can be affected by what my system is doing at the time of the benchmark, so I am running each one twice and posting the larger result.
References:
Gentoo WIKI: http://en.gentoo-wiki.com/wiki/Safe_Cflags