I'm migrating my blog from Typepad to my own product (Sampa).
Please update your blog feeds:
Address: http://marcelo.sampasite.com/brave-tech-world/default.htm
June 06, 2006 at 11:21 AM in Blogging, Development, Products & Companies, SampaSite, Startup Business, Web 2.0 | Permalink | Comments (0) | TrackBack (0)
UPDATE: This blog has moved to http://marcelo.sampasite.com/brave-tech-world/default.htm. Please update your subscription. Click to subscribe on Bloglines.
Guy Kawasaki, a venture capitalist with technology roots (Apple), writes about the top ten lies of engineers. He is sooooo wrong!
His list doesn't even come close to the worst lies of engineers. It looks like he never talked directly to an engineer, only to the engineer's manager, who passed the message along.
Here is what I think are the worst ten lies:
I could go on and on with this list. I've worked at Microsoft for enough years, with enough different types of engineers, to have seen all of these lies. And don't get me wrong: most engineers don't even know they are lying; they truly believe the things they say.
April 28, 2006 at 01:41 PM in Development | Permalink | Comments (3) | TrackBack (6)
Yesterday I wrote a post about a bizarre IE rendering bug that appeared when you scrolled the document, which led to a comment by David Woods hinting at the usage of background images. And he was right: this bug started showing up when I added an 800x600 background image to my page.
Searching around, I found a blog post where Mike Golding had a similar bug and knew the root cause. The summary is that if you have a background image on an element sitting on top of a transparent part of the page, IE might display it incorrectly when you scroll.
In my case the image was on the body, and there is nothing behind the body! But actually, there is one thing behind a body background image: the body background color.
So, when I changed my CSS from something like:

body { background-image: url(background.jpg); }

to something like:

body { background-image: url(background.jpg); background-color: #ffffff; }

(that is, giving the body an explicit background color behind the background image), the problem went away!
April 28, 2006 at 10:54 AM in Development | Permalink | Comments (0) | TrackBack (0)
Talking about bizarre problems with the IE rendering engine, take a look at this...
This is what the screen should look like:
This is what happens if I use the vertical scrollbar:
I don't even know how to describe this issue or where to start investigating it.
April 27, 2006 at 10:28 AM in Development | Permalink | Comments (1) | TrackBack (0)
UPDATE: This blog has moved to http://marcelo.sampasite.com/brave-tech-world/default.htm. Please update your subscription. Click to subscribe on Bloglines.
No, I'm not talking about Web 2.0. I'm talking about the .NET Framework 2.0. I was ready to migrate all 170,000 lines of code from Visual Studio 2003 to Visual Studio 2005. Mostly, I want the benefits of the IDE and the possibility of building a few parts of my system in .NET 2.0 while keeping others in 1.1. That turned out to be a problem, according to this MSDN article:
"... Visual Studio 2005 does not allow you to choose to support version 1.0 or version 1.1 of the .NET Framework. You can only create projects that support version 2.0. ..."
My problem is that I have a small client application called Sampa Uploader that helps users quickly and easily upload pictures to the site. It is built on .NET 1.1, and I didn't want to force users to install 2.0. Less than 50% of my Windows users have .NET 1.1, which already makes installing Sampa Uploader a pain for them, and probably only 1% have 2.0. Who wants to download 30MB to install a 500KB app?
The only reason I built Sampa Uploader in .NET 1.1 was that it took me less than 2 days to do it. The entire app has less than 1,000 lines. I had already decided to move to C++ and remove any dependency on external components, but that is going to take about 5 days and I can't fit it into the schedule.
Now, why we don't have a standard protocol that websites/services and client applications can use to upload pictures is beyond me. We have MetaWeblog, RSS and OPML, but nothing for pictures. So, you need one app for ShutterFly, one for Flickr, one for SampaSite, ...
March 30, 2006 at 08:06 AM in Development | Permalink | Comments (0) | TrackBack (0)
This is a continuation of my tips "Using StringBuilder the right way" and "Pre-allocate your collections". In both tips I mention the importance of helping your application by giving it estimates of how big a string or collection will end up being. Pre-allocating the right amount will give you optimal performance in both size and speed.
But a lot of times that is hard to do. If you are building a piece of text on the fly based on many conditions, like an HTML snippet, it becomes hard (or impossible) to calculate deterministically how much space will be needed. In those cases, you can resort to a trick that I call MyStringBuilder.
The idea is to implement a class that encapsulates a StringBuilder. Besides exposing exactly the same methods as StringBuilder, this class also stores the initial allocation of the StringBuilder, and the only extra work it really does is in ToString(): when the app calls it, the class computes the difference between what was initially allocated, how much was actually used, and what the final allocation size ended up being.
I implemented this class in one of my projects to see whether it was worth using or not. Oh boy, I learned so much about my code doing that. Basically, in the ToString method I had two thresholds.
First, if the initial allocation was more than the current size, I would report when there was too much unused space in the string. For example, if the initial allocation was for 1,000 characters but I only used 100 characters, I would report that. Second, if the initial allocation was less than the current size, I would report it.
When I say "report it", I mean that I would save the information to a file (CSV). Run the application, collect the CSV, use Excel to find the most outrageous discrepancies, fix the code, and repeat, until I felt there was very little left to fix in terms of StringBuilder allocations. Then I did a search-and-replace on the code, turning all instances of "MyStringBuilder" back into "StringBuilder".
There are just two tricks you need to get great results from this method. First, collect the initial allocation and the caller of the MyStringBuilder constructor. Second, remember to use a buffered file stream, otherwise you'll have an I/O operation for each call to ToString().
using System.Diagnostics;
using System.Text;

public class MyStringBuilder
{
private StringBuilder _sb;
private int _Initial;
private string _Caller; // file, method and line where this instance was created
public MyStringBuilder()
{
_sb = new StringBuilder();
_Initial = _sb.Capacity;
// Capture where this MyStringBuilder was constructed (skip this constructor's frame).
StackTrace st = new StackTrace(1, true);
StackFrame sf = st.GetFrame(0);
_Caller = sf.GetFileName() + " - "
+ sf.GetMethod().Name + " #"
+ sf.GetFileLineNumber();
}
public MyStringBuilder(int capacity)
{
_sb = new StringBuilder(capacity);
_Initial = _sb.Capacity;
// Capture where this MyStringBuilder was constructed (skip this constructor's frame).
StackTrace st = new StackTrace(1, true);
StackFrame sf = st.GetFrame(0);
_Caller = sf.GetFileName() + " - "
+ sf.GetMethod().Name + " #"
+ sf.GetFileLineNumber();
}
public MyStringBuilder Append(char c)
{
_sb.Append(c);
return this;
}
... // Override every method and property on StringBuilder
public override string ToString()
{
double usage = (double)_sb.Length / (double)_sb.Capacity;
// Report only strings big enough to matter that were either over-allocated
// (less than half of the capacity used) or that outgrew the initial allocation.
if(_sb.Length > 128 && (usage < 0.5 || _Initial < _sb.Length))
{
// Write to file: _Caller, _Initial, _sb.Length, _sb.Capacity, usage
}
return _sb.ToString();
}
}
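Here is a sketch of the "report it" part, which the class above leaves as a comment (the s_report field and Report method names are made up for illustration, and it assumes a using System.IO; directive at the top of the file). A single shared StreamWriter, which buffers writes internally, keeps the logging in ToString() from turning into one disk I/O per call:

// Hypothetical reporting helper to add to MyStringBuilder.
// StreamWriter buffers internally, so Report() doesn't hit the disk on every call.
private static StreamWriter s_report = new StreamWriter("stringbuilder-report.csv", true);

private void Report(double usage)
{
    lock(s_report)   // ToString() might be called from multiple threads
    {
        s_report.WriteLine(_Caller + "," + _Initial + ","
            + _sb.Length + "," + _sb.Capacity + "," + usage);
    }
}

Remember to flush or close the writer when the application exits, otherwise the tail of the CSV never makes it to disk.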
You can do the same thing for ArrayList, Hashtable, Queue, Stack, etc.; however, most applications spend a lot more time manipulating strings than other collections.
Found an error on this article: Post a comment.
Have additional data relevant to this article: Post a comment.
Have a tip for C#: E-mail me.
March 22, 2006 at 12:44 PM in Development | Permalink | Comments (2) | TrackBack (0)
This tip is very similar to my previous tip on StringBuilder performance.
The basic idea is that when you are adding items to a collection (ArrayList, Hashtable, SortedList, Stack, etc.), you should pre-allocate the collection with an estimated amount of data that it will hold. A simple example is this:
public ArrayList MergeArrays(ArrayList al1, ArrayList al2)
{
ArrayList alMerged = new ArrayList(al1);
alMerged.AddRange(al2);
return alMerged;
}
The code above is about the minimum you can write for this merge function, but it completely fails to predict the final size of the merged array, potentially causing a huge performance hit.
Imagine that the first ArrayList (al1) contains 16,000 elements, while the second one (al2) contains only 10. The final ArrayList will have 16,010 elements, but since you didn't tell the ArrayList constructor that, it will allocate 16,000 slots, copy the contents of the first array, then re-allocate (to at least 16,010 slots; in practice ArrayList grows by doubling, so even more), copy the original 16,000 elements and add the remaining 10. That is just plain stupid.
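You can actually watch this happen by printing Capacity before and after the AddRange call (a quick sketch; the exact capacity after the re-allocation depends on the runtime's growth policy, which doubles the current capacity):

ArrayList al1 = new ArrayList(16000);
for(int i = 0; i < 16000; i++)
    al1.Add(i);
ArrayList al2 = new ArrayList(10);
for(int i = 0; i < 10; i++)
    al2.Add(i);

ArrayList alMerged = new ArrayList(al1);   // capacity: 16,000 (al1.Count)
Console.WriteLine(alMerged.Capacity);
alMerged.AddRange(al2);                    // forces a re-allocation and a full copy
Console.WriteLine(alMerged.Capacity);      // well past 16,010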
One of the problems with high-level languages is that they abstract away the low-level complexities involved in certain operations. When I began using C++ (before that, I used C), I was obsessed with figuring out the final assembly. I was always afraid that C++ would do something to my loops that I didn't know about and that it would affect my code. Well, C# is orders of magnitude more complex in that sense: it first compiles to IL, and then to assembly. On top of that you have the whole .NET Framework doing a lot of work for you (for C++ developers, think of how STL made you uncomfortable at first). My point is that while C/C++ keeps a pretty reasonable 1-line-of-code = x-assembly-instructions relationship, in C# you should worry about having too few lines of code doing complex operations. In C#, sometimes less is less! :)
Here is a more efficient version:
public ArrayList MergeArrays(ArrayList al1, ArrayList al2)
{
ArrayList alMerged = new ArrayList(al1.Count + al2.Count);
alMerged.AddRange(al1);
alMerged.AddRange(al2);
return alMerged;
}
Your solution: search your code for any occurrence of "new ArrayList();", "new Hashtable();", "new SortedList();", etc. For each line that contains those strings, ask yourself what the final size of the collection is going to be. Sometimes it is a static value, like 20 elements; sometimes it is the sum of some variables; and sometimes you won't be able to predict it exactly, but you can still make an estimate (either an upper bound or a 90th-percentile value).
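As a small sketch of that habit (the BuildIndex method below is made up for illustration), passing even a rough count to the constructor avoids the intermediate re-allocations:

// Hypothetical example: pre-sizing a Hashtable when the final count is known.
// When it isn't, pass an estimate (an upper bound or a 90th-percentile value).
public Hashtable BuildIndex(string[] keys, string[] values)
{
    Hashtable index = new Hashtable(keys.Length);
    for(int i = 0; i < keys.Length; i++)
        index[keys[i]] = values[i];
    return index;
}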
Found an error on this article: Post a comment.
Have additional data relevant to this article: Post a comment.
Have a tip for C#: E-mail me.
March 20, 2006 at 01:14 PM in Development | Permalink | Comments (0) | TrackBack (0)
Most people who write JavaScript (my random assumption) are not seasoned JavaScript developers. They write it because they need something to work on a webpage, but their primary language is C++, C# or Java. Since the syntax of JavaScript and those languages is so similar, they assume the behavior is very similar as well. Most of the time that is true, but not always.
I'm not going to explain the nitty-gritty of all the differences, but here is one very important one: variable scope.
For the longest time, C/C++ had what could be considered a bug (it has been the source of a lot of debate), which is this...
for(int i = 0; i < 10; i++)
{
// do something
}
printf("%d", i);
The code above compiles and works fine on older C/C++ compilers (Visual C++ 6, for example, scoped the loop variable outside the for; the C++ standard and C99 later restricted i to the loop). However, in Java (I think) and C#, using the variable i outside the for loop is invalid -- not to mention that there is no such thing as printf. :)
In this sense JavaScript is like old C/C++: the variable is still valid outside the for loop.
So far, so good. Now, look at this JavaScript example:
function Test()
{
var i = 11;
while(true)
{
var i = 84;
break;
}
alert(i);
}
What do you think is going to happen? A message box with the value 11 or 84? Intuitively, most developers expect 11, but that is not how JavaScript works. There are basically only two scopes: global and function-level. Every variable declaration lands in one of those two scopes. So, in the example above, the second "var i = 84;" does not create a new variable, because i is already declared in the function's scope; the line behaves just like "i = 84;", assigning 84 to the existing i. The alert shows 84.
There is *SO* much bad JavaScript with this error out there that it is mind-boggling: MSN Virtual Earth, Google Maps, Windows Live, eBay, Amazon, etc.
The recommendation is to declare all the variables a function will use at the beginning of the function (remember the old days of the original C compilers?).
There are other typical JavaScript errors out there as well.
I think what the development community really lacks is a good JavaScript editor and debugger: one that works like a compiler and tells you up front about problems in the code.
March 17, 2006 at 05:45 PM in Development | Permalink | Comments (0) | TrackBack (0)
Today I'll explore a very common mistake when building strings in C#. You can actually find a lot of examples on MSDN that have this mistake. And I'm not talking about your traditional string concatenation.
Before I start, check out my C# tip #1, Using High-precision CPU measurements, to understand how I'm measuring the cost of the tests I'm going to run.
You should know that the string class (System.String) is immutable. This means that the content pointed to by a string will never change. And you were taught to use StringBuilder when building a string in a loop, for example.
StringBuilder is fairly flexible: it frees you from worrying about allocation (it grows as needed) and provides some simple Replace methods. Here is a typical usage of StringBuilder:
public string ConcatArray(string[] values, string separator)
{
StringBuilder sb = new StringBuilder();
foreach(string val in values)
{
if(sb.Length > 0)
sb.Append(separator);
sb.Append(val);
}
return sb.ToString();
}
I'm sure you can find a few ways to optimize this function, but the biggest mistake (IMHO) is not pre-allocating the StringBuilder with an estimated length. The problem with any kind of library (.NET, STL, MFC, etc.) is that it does a lot of things behind the scenes, and you don't feel the pain (ah, the good ol' days of C, where you had to write everything from scratch).
Here is the deal: by default, StringBuilder allocates space for 16 characters (32 bytes). As you build your string, if it grows beyond those initial 16 characters, StringBuilder allocates double the current size, copies the existing string and appends the new value.
StringBuilder sb = new StringBuilder(); // 16 chars allocated
sb.Append("abcdefghijklmnop"); // append 16 chars
sb.Append("1"); // Re-allocate, copy and append 1 char!
If you know up front what the final size will be, you can instantiate the StringBuilder and tell it what initial capacity to use, as in:
StringBuilder sb = new StringBuilder(17); // rounded to 32
sb.Append("abcdefghijklmnop"); // append 16 chars
sb.Append("1"); // Append 1 char (no re-alloc!)
Using the high-precision CPU measurements on the two functions above, pre-allocating the StringBuilder resulted in a 29% improvement in performance.
So, our initial ConcatArray function can be re-written this way:
public string ConcatArray2(string[] values, string separator)
{
int init = 0;
foreach(string val in values)
init += val.Length + separator.Length;
StringBuilder sb = new StringBuilder(init);
foreach(string val in values)
{
if(sb.Length > 0)
sb.Append(separator);
sb.Append(val);
}
return sb.ToString();
}
Most people would never do that, because going through the string array twice seems like the opposite of writing good performance code. There is only one way to find out: measure it!
The table below compares ConcatArray and ConcatArray2 for different numbers of elements and element sizes in the array. The values are milliseconds to run the routine 100,000 times (best of 10); the separator is a 2-char string:

| Test case | ConcatArray | ConcatArray2 |
|---|---|---|
| 2 elements, 15 chars each | 35.5 | 30.7 |
| 2 elements, 80 chars each | 56.0 | 43.7 |
| 10 elements, 15 chars each | 183.9 | 143.0 |
| 10 elements, 80 chars each | 385.0 | 193.1 |
| 80 elements, 15 chars each | 1331.1 | 1097.6 |
| 80 elements, 80 chars each | 2842.5 | 1506.4 |
Needless to say, even going through the array twice, it is more efficient to pre-allocate the StringBuilder than to let .NET do its magic for you. If you look closely at the table above, you will see that the biggest gains come with longer strings.
However, you can't always know the size of a string up front. For example, if you are parsing a file or receiving data over the network and you can't say for sure what size the string will be, what do you do? Well, the easy (and wrong) answer is to use the average. Another wrong answer is to use the maximum size allowed.
If you use the average, half of the allocations (in theory) will need at least one re-allocation. If you use the maximum size allowed for your needs, you will have no re-allocations, but a lot of allocated space will go unused.
The real answer is that it depends on the data distribution. Do you expect the resulting string to be between 50-75 characters, or between 50-7500 characters? And what is the most common case? Notice that for 50-75 the average would be about 63 characters (.NET will round to 64), and allocating 64 or 80 characters won't make much of a difference. However, if the expected range is 50-7500, the average is about 3775. Again, it depends on your application, but unless values above 3775 are the exception (< 10% of the time), you would be better off pre-allocating something like 6000 to cover 90% of the cases.
Don't be so paranoid as to allocate exactly the string size. Always add some extra room for error, like 16, 64 or 128 characters: whatever makes sense in your case.
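If you have collected real length samples from your application, a rough percentile calculation is enough to pick a pre-allocation size (a sketch only; the PickCapacity name, the 0.9 percentile and the 128 characters of slack are arbitrary choices, not anything .NET provides):

// Pick a pre-allocation size from observed final string lengths:
// a high percentile of the samples, plus a little headroom.
public static int PickCapacity(int[] observedLengths, double percentile, int slack)
{
    int[] sorted = (int[])observedLengths.Clone();
    Array.Sort(sorted);
    int index = (int)(percentile * (sorted.Length - 1));
    return sorted[index] + slack;
}

// Usage, assuming 'lengths' holds the measured sizes:
// StringBuilder sb = new StringBuilder(PickCapacity(lengths, 0.9, 128));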
Warning: be extra careful when allocating very long strings. Above a certain size (16KB, I think), .NET immediately treats the allocation as belonging to the 2nd generation of the garbage collector, which could bring your application to a crawl. So, just when you thought you were safe allocating 4,000 chars (8KB), you add one more char, StringBuilder doubles the allocation to 8,000 chars (16KB), and that gets marked as a 2nd-generation allocation. D'oh! And all you needed was 4,001 chars.
To wrap up, here is the ConcatArray function with some optimizations to make it even faster (it special-cases 1, 2 or 3 elements):
public string ConcatArray(string[] values, string separator)
{
if(values.Length==0)
return String.Empty;
if(values.Length==1)
return values[0];
if(values.Length==2)
return values[0] + separator + values[1];
if(values.Length==3)
return values[0] + separator + values[1] + separator + values[2];
int init = 0;
foreach(string val in values)
init += val.Length + separator.Length;
StringBuilder sb = new StringBuilder(init);
sb.Append(values[0]);
for(int i = 1; i < values.Length; i++)
{
sb.Append(separator);
sb.Append(values[i]);
}
return sb.ToString();
}
By the way, this technique applies not only to StringBuilder, but also to ArrayList, Hashtable, Stack, etc.
Found an error on this article: Post a comment.
Have additional data relevant to this article: Post a comment.
Have a tip for C#: E-mail me.
March 15, 2006 at 11:46 AM in Development | Permalink | Comments (2) | TrackBack (0)
Over the past year or so, I've been collecting C# tips and thinking that some day I might write a book. Since it could take me 15-20 years to do that, and we'd probably be using another language/platform by then, I decided to publish all my tips on this blog.
I hope to write 2-3 tips per week, and today is the first one. I hope you enjoy them.
High-precision performance counters are easily available through the Platform SDK on Windows, but most people prefer to use DateTime to measure how long an operation takes. That is a perfectly fine way of doing things, as long as you only need tens of milliseconds of precision. This is the typical code to measure performance:
DateTime start = DateTime.Now;
DoTask();
DateTime end = DateTime.Now;
TimeSpan elapsed = end - start;
double milliseconds = elapsed.TotalMilliseconds;
The code above will give you a precision of roughly 15 ms (a limitation of the OS timer, I think). But you know a higher-precision method: you'll just use DateTime.Now.Ticks, right? Wrong. That has exactly the same precision as the method above.
The only way (that I know of) to get higher-precision timing is to call the Windows API directly through the functions QueryPerformanceCounter and QueryPerformanceFrequency. Check out MSDN for more info on those functions, but here is a quick intro:
QueryPerformanceCounter: Number of clock ticks on the CPU since the system started.
QueryPerformanceFrequency: Number of clock ticks per second on the current system.
As a bonus, on my machine the number of clock ticks per second returned by QueryPerformanceFrequency matches the speed of the CPU: on my single-proc 2.8 GHz Hyper-Threading Intel P4, the value returned is 2,793,070,000 (Intel, where are my 6,930,000 missing cycles?)
Here is how you access those functions:
using System.Runtime.InteropServices;

// P/Invoke declarations for the Win32 high-resolution timer functions.
class HiPre
{
[DllImport("Kernel32.dll")]
public static extern bool QueryPerformanceCounter(out long value);
[DllImport("Kernel32.dll")]
public static extern bool QueryPerformanceFrequency(out long value);
}
And this is how you measure your routine using the high-precision tick-counters:
long start, end, freq;
HiPre.QueryPerformanceCounter(out start);
DoTask();
HiPre.QueryPerformanceCounter(out end);
HiPre.QueryPerformanceFrequency(out freq);
double milliseconds = ((double)(end - start) / (double)freq) * 1000.0;
There is a big problem with the method I just described: it assumes that the only thing running on your PC is the DoTask() routine, and that is not true. Not only do you have many applications and services running, but the OS and the hardware itself are doing things. So you should still put your routine inside a loop, run it a few (tens? hundreds of?) thousand times and take the average. This will minimize external noise. Don't touch the mouse, don't print anything, don't write to the console, disable IM, e-mail, IIS, SQL, etc. This will give you much higher precision and make it easier to compare two different methods.
You should only spend time measuring the efficiency of routines that you know will be used thousands or millions of times per second. It is not worth worrying about the best way to load a config file when the application starts, or anything related to the user typing on the keyboard or moving the mouse.
The method above is not good for measuring network performance, file access time, graphics rendering, etc. Those things are far more complicated to measure than a simple routine like this and involve many factors that are not CPU-related.
I like to "warm up" the application before I start measuring. That is, run your routine at least once outside of the measurement loop. This helps the OS load all the code pages and data pages (if any), and it warms up the CPU cache as well.
To summarize, use this code:
long start, end, freq;
int runs = 1000;
HiPre.QueryPerformanceFrequency(out freq);
DoTask(); // warm-up
HiPre.QueryPerformanceCounter(out start); // start
for(int i = 0; i < runs; i++)
DoTask();
HiPre.QueryPerformanceCounter(out end); // end
double msPerTask = (((double)(end - start) / (double)freq) * 1000.00)
/ (double)runs;
Just be aware of a couple of other caveats: first, the for loop itself adds to the measured time; and second, calling the routine as a function also costs a function call (duh!), which may or may not be what you want to measure.
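One way to compensate for the first caveat is to time an empty loop of the same length and subtract it (a sketch, not part of the original tip; also note that the JIT may optimize an empty loop down to almost nothing, so treat the correction as rough):

long start, end, freq;
int runs = 1000;
HiPre.QueryPerformanceFrequency(out freq);

// Measure the loop overhead by itself (empty body).
HiPre.QueryPerformanceCounter(out start);
for(int i = 0; i < runs; i++)
{
}
HiPre.QueryPerformanceCounter(out end);
double msLoopOnly = ((double)(end - start) / (double)freq) * 1000.0;

// Measure the loop with the routine, then subtract the loop overhead.
HiPre.QueryPerformanceCounter(out start);
for(int i = 0; i < runs; i++)
    DoTask();
HiPre.QueryPerformanceCounter(out end);
double msWithTask = ((double)(end - start) / (double)freq) * 1000.0;

double msPerTask = (msWithTask - msLoopOnly) / (double)runs;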
Found an error on this article: Post a comment.
Have additional data relevant to this article: Post a comment.
Have a tip for C#: E-mail me.
March 14, 2006 at 01:31 PM in Development | Permalink | Comments (1) | TrackBack (0)