November 18, 2005



"You don't count MSN as a large site?"

I don't count him as having worked on it. Anyone can say on their blog that they did anything; that means nothing. Look at MSN: they don't do the completely retarded things Marcelo lists.

"I stand by my statement: modern browsers probably render XHTML slower than HTML, so switching to XHTML because it is faster for browsers to parse is currently a waste of time."

Why do you stand by your statement if you have no idea what you are talking about? HTML is an application of SGML. SGML has been around for a long time and is very well known. It's also very well known that parsing SGML takes longer than parsing XML, because SGML is less strict, allows more possibilities, and permits more complex and ambiguous nesting (omitted end tags, for example). More options to look for means more time spent parsing; it's just common sense. Add to that the fact that browsers can't even just use an SGML parser: they have to add a bunch of extra workarounds for the mistakes people make, and it gets even slower.

You are correct, however, that it is a waste of time to switch to XHTML to improve browser parsing speed. Although it will speed up parsing, parsing is the smallest, least noticeable part of the process, so speeding it up doesn't matter. It's downloading and rendering that take all the time and benefit from being sped up. The advantage of using XHTML is that I can parse my documents from any language, quickly and easily, and transform them in very powerful ways. This is exceedingly difficult, slow, and unreliable with tag soup.
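To illustrate the "parse it from any language" point, here is a minimal Python sketch that reads a well-formed XHTML document with the standard library's generic XML parser (no tag-soup recovery at all) and pulls out every link target. The `extract_links` helper and the sample document are hypothetical; the sketch assumes the input is well-formed XHTML in the standard XHTML namespace.

```python
import xml.etree.ElementTree as ET

# The XHTML namespace; elements of a conforming XHTML document live in it.
XHTML_NS = "http://www.w3.org/1999/xhtml"

def extract_links(xhtml_source: str) -> list[str]:
    """Return the href of every <a> element in a well-formed XHTML document."""
    root = ET.fromstring(xhtml_source)
    # iter() walks the whole tree; tag names are qualified as {namespace}local.
    return [a.get("href") for a in root.iter(f"{{{XHTML_NS}}}a") if a.get("href")]

sample = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p>See <a href="/about">about</a> and <a href="/contact">contact</a>.</p>
  </body>
</html>"""

print(extract_links(sample))  # ['/about', '/contact']
```

The same content written as tag soup (unclosed `<p>`, unquoted attributes) would simply be rejected by this parser, which is the trade-off being argued about. Also note that this sketch does not load the XHTML DTD, so a document using HTML named entities such as `&nbsp;` would fail with an "undefined entity" error.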


"Personal Blog of Marcelo Calbucci, founder & CEO of Sampa Corp. (www.sampa.com), and former Microsoftee."

Microsoft. That explains everything. No wonder his advice sucks.


Are you seriously proposing to use HTML instead of XHTML because <br> is shorter than <br/>? Not using quotes for attributes leads to invalid HTML and invalid XML. It might work, but it won't validate. That's a high price to pay for saving two measly bytes for every attribute.

... because [br] is shorter than [br/]...

Simon Willison

OK, I'll re-work my performance argument to the following:

The speed difference between parsing HTML and parsing XHTML is small enough to be irrelevant. XHTML does not allow incremental rendering (in current implementations) but HTML does. Hence for any large page and/or slow connection HTML will provide better performance.

So the argument that XHTML is faster to parse than HTML is not a useful one.


You should start by stripping whitespace from your site :-)

Seriously, some points are valuable, but the rest is really only worthwhile for high-traffic websites.


Amazing. Sad and amazing how many wannabe XHTML advocates don't know basic things about HTML and the very XHTML they advocate...

"Not using quotes for attributes leads to invalid HTML and invalid XML."

Seriously, read some specs, you will be surprised. :(


There is one very bad recommendation here: the DNS one.

Web browsers will typically only open two connections to a host at once. Your user will be downloading content two pieces at a time. Their bandwidth will be inefficiently utilized (especially if you have many small files) because a significant percentage of their time will be waiting on roundtrips, not downloading content.

Moving some content to different hostnames increases the number of files they can download at once. If the user is not bandwidth-limited they will achieve a very significant improvement in their page-load time. Even if the user IS bandwidth-limited (for example, an overseas modem user) they will still see a performance benefit because their internet connection will be working closer to their maximum speed. Their download rate while waiting on a response to a GET is zero - that's bandwidth the other simultaneous connections should be using.
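The arithmetic behind this argument can be sketched with a deliberately crude model. It assumes requests go out in "waves" (one request per open connection, no pipelining), each wave pays one full round trip of idle waiting, the browser opens two connections per hostname, and the user's total bandwidth is shared evenly among active connections. The function name and all input numbers are made up for illustration; they are not measurements.

```python
import math

def estimated_load_time(num_files: int, file_size: float, rtt: float,
                        bandwidth: float, hosts: int, conns_per_host: int = 2) -> float:
    """Crude page-load estimate in seconds (hypothetical model).

    Ignores TCP slow start, keep-alive effects and extra DNS lookups.
    """
    connections = hosts * conns_per_host
    waves = math.ceil(num_files / connections)
    # Every wave spends one round trip waiting with nothing downloading.
    waiting = waves * rtt
    # Total bytes over total bandwidth: this term does not depend on the
    # number of connections, since they all share the same pipe.
    transferring = num_files * file_size / bandwidth
    return waiting + transferring

# 40 files of 2,000 bytes, 200 ms round trip, 50,000 bytes/s of bandwidth.
for hosts in (1, 2, 4):
    print(hosts, round(estimated_load_time(40, 2_000, 0.2, 50_000, hosts), 2))
# 1 5.6
# 2 3.6
# 4 2.6
```

Only the waiting term shrinks as hostnames are added; the transfer term stays fixed, which matches the point above that extra connections recover idle round-trip time rather than creating bandwidth. Each added hostname also costs a DNS lookup, which this model leaves out.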

Splitting your site into several smaller clusters also allows you to scale more efficiently (a static image server will scale much differently than your PHP servers).
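One common way to put static content on separate hostnames is to map each asset path to a host deterministically, so a given image is always requested from the same hostname and stays cacheable. Here is a minimal sketch, assuming hypothetical hostnames `static1.example.com` and `static2.example.com`; the checksum-based scheme is just one reasonable choice, not something the comment above prescribes.

```python
import zlib
from collections.abc import Sequence

# Hypothetical static-content hostnames; substitute your own.
STATIC_HOSTS = ("static1.example.com", "static2.example.com")

def static_url(path: str, hosts: Sequence[str] = STATIC_HOSTS) -> str:
    """Map an asset path to one of several hostnames.

    Uses a stable checksum (CRC-32) rather than Python's built-in hash(),
    which is randomized per process for strings. The same path must always
    map to the same host, or browsers would cache duplicate copies.
    """
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return f"http://{hosts[index]}{path}"

for p in ["/img/logo.png", "/img/header.jpg", "/css/site.css"]:
    print(static_url(p))
```

Keeping the mapping stable matters more than spreading files perfectly evenly: a file that bounces between hostnames is downloaded and cached once per hostname.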



“By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa. […]

In certain cases, authors may specify the value of an attribute without any quotation marks. The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII decimal 95), and colons (ASCII decimal 58). We recommend using quotation marks even when it is possible to eliminate them.”

This is for HTML 4.01. I remembered reading somewhere that only numeric values could be left unquoted, but I was wrong about that.
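The HTML 4.01 rule quoted above is simple enough to check mechanically. Below is a small Python sketch of that check; the `may_omit_quotes` name is hypothetical, and it only encodes the character rule from the quoted passage, not the rest of the SGML attribute syntax.

```python
import re

# Characters the quoted HTML 4.01 text allows in an unquoted attribute value:
# letters, digits, hyphens, periods, underscores and colons.
_UNQUOTED_OK = re.compile(r"[A-Za-z0-9\-._:]+")

def may_omit_quotes(value: str) -> bool:
    """True if HTML 4.01 permits this attribute value without quotation marks."""
    return _UNQUOTED_OK.fullmatch(value) is not None

print(may_omit_quotes("left"))                 # True
print(may_omit_quotes("100"))                  # True
print(may_omit_quotes("50%"))                  # False
print(may_omit_quotes("http://example.com/"))  # False
```

A URL like `http://example.com/` fails because of its slashes, which is one practical reason to follow the spec's own advice and quote everything anyway.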

Now for XML:

“Literal data is any quoted string not containing the quotation mark used as a delimiter for that string. Literals are used for specifying the content of internal entities (EntityValue), the values of attributes (AttValue), and external identifiers (SystemLiteral). Note that a SystemLiteral can be parsed without scanning for markup.”

I.e. single or double quotation marks MUST be used in XML.
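Any conforming XML parser enforces this rule. For example, Python's standard-library parser accepts both quoting styles and rejects an unquoted attribute as not well-formed. The `is_well_formed` helper is a hypothetical name used only for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the standard-library XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed('<p class="note">quoted</p>'))  # True
print(is_well_formed("<p class='note'>quoted</p>"))  # True
print(is_well_formed('<p class=note>unquoted</p>'))  # False
```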


3 seconds?? Since when does Google take 3 seconds to perform a search??

For me it takes a tenth of a second, and I have my preferences set to 100 results per page (Firefox).


"Avoid using tables"? Wot the! Wot else do you do? <div>s all over the place??


