Adam Howitt's Blog

Jan 18
2005

My theory on Google and CSS ignorance

I am currently running an experiment to test whether using CSS-styled unordered list elements for page navigation prevents Google from discovering the pages the navigation links to.

I launched two sites a month ago. One was the cflunch.com site, which uses regular inline hrefs to link to the other pages in the site. The other was Pittman Guitar Repair, which has no inline hrefs on the front page; instead, site navigation and sub-navigation are provided by unordered lists whose list items contain the hrefs.

Since launch, Google has happily discovered the front page of each site, but for the guitar site it has stopped indexing at the front page, as if it doesn't recognize the hrefs embedded in the unordered list. The CFLunch site has been fully indexed by Google during the same period, so I have a clear benchmark.

Here is the list in question:

<ul id="navtext">
<li><a href="index.cfm?mode=entry&amp;entry=100C098D-9E1D-2003-099BBC48283D5826">HOME</a></li>
<li><a href="index.cfm?mode=entry&amp;entry=0FF1DA5D-029D-BE39-CCB5D59ED056BAE2">ABOUT US</a></li>
<li><a href="index.cfm?mode=entry&amp;entry=BEAE0EAD-D610-A6A8-65B5137E3A88990B">PHOTO GALLERY</a></li>
<li><a href="index.cfm?mode=entry&amp;entry=0FF7E695-B3B8-C5B4-46DFD11113F433AB">TESTIMONIALS</a></li>
<li><a href="index.cfm?mode=entry&amp;entry=0FF9695D-95BD-6065-DA4B9670F449E7A2">REPAIR TIPS</a></li>
<li><a href="index.cfm?mode=entry&amp;entry=F2482C7E-07BC-2494-5D2E849583584719">CONTACT US</a></li>
<li>&nbsp;</li>
</ul>

Two days ago I decided to switch the guitar site away from the unordered list of hrefs, and I am now eagerly awaiting the return of Google.

Here is the new format:

<div id="newNav">
<a href="index.cfm?mode=entry&entry=100C098D-9E1D-2003-099BBC48283D5826">&nbsp;HOME</a><br>
<a href="index.cfm?mode=entry&entry=0FF1DA5D-029D-BE39-CCB5D59ED056BAE2">&nbsp;ABOUT US</a><br>
<a href="index.cfm?mode=entry&entry=BEAE0EAD-D610-A6A8-65B5137E3A88990B">&nbsp;PHOTO GALLERY</a><br>
<a href="index.cfm?mode=entry&entry=0FF7E695-B3B8-C5B4-46DFD11113F433AB">&nbsp;TESTIMONIALS</a><br>
<a href="index.cfm?mode=entry&entry=0FF9695D-95BD-6065-DA4B9670F449E7A2">&nbsp;REPAIR TIPS</a><br>
<a href="index.cfm?mode=entry&entry=F2482C7E-07BC-2494-5D2E849583584719">&nbsp;CONTACT US</a><br>
</div>
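
As a rough sanity check on the two formats, here is a minimal sketch in Python that runs both kinds of markup through the standard library's HTML parser and collects every href it sees. The page names (page1.cfm, page2.cfm) and the helper names are stand-ins chosen for illustration, and this says nothing about how Google's spider actually parses pages; it only shows that a generic parser treats an anchor the same way whether it sits inside a list item or directly inside a div.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, whatever its parent element is."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag just opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(markup):
    """Return all hrefs found in the given HTML fragment, in document order."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


# Shortened stand-ins for the two navigation formats (hypothetical page names).
LIST_NAV = """
<ul id="navtext">
<li><a href="page1.cfm">HOME</a></li>
<li><a href="page2.cfm">ABOUT US</a></li>
<li>&nbsp;</li>
</ul>
"""

DIV_NAV = """
<div id="newNav">
<a href="page1.cfm">&nbsp;HOME</a><br>
<a href="page2.cfm">&nbsp;ABOUT US</a><br>
</div>
"""

list_links = extract_links(LIST_NAV)
div_links = extract_links(DIV_NAV)
print(list_links == div_links)
```

If a crawler behaved like this parser, both navigation blocks would yield the same set of links, so any difference in indexing would have to come from somewhere other than the list markup itself.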

To check the progress, take a peek at this Google Search and compare it to this Google Search.

We have examples of the unordered list being used successfully elsewhere here at Duo Consulting, but I believe that in those cases the spiders are finding regular inline hrefs to make their way through the site. I'll write a follow-up when I have some evidence either way.

Comments
  1. Boy, that really doesn't make much sense. I can't believe Google went through the extra trouble of programming their spider to ignore links that are embedded in lists.

    While odd that only the first page of the guitar site was indexed, it just doesn't make sense that a list would prevent the indexing.

    Even if your newly formatted site becomes more fully indexed I think a larger test sample - or a comment from google - would be in order to fully prove the correlation between lists and indexing.

    I'm curious - is the site structure of cflunch and the guitar site similar? For example, I notice on the guitar site that every page is actually index.cfm and then differentiated by URL parameters. Perhaps (and I don't know, I'm just playing devil's advocate) it is the parameters, or their formatting, that is hindering the indexing and not the list.

  2. Very interesting. I wonder why Google would care whether the href is in a list or not. I'm anxious to see your followup posts.

    Christian

  3. This is my first experiment. The next test will be to do a more rudimentary example with the help of Jeff, the CSS guru here at Duo, with one entry page of list links and one regular href in the same page and see which links are indexed. Obviously I'll share that too.

  4. Be very interested to see how this one goes.

    In my own experience, I have found that Google favours clean URLs over ones with query strings appended. I have a number of examples of sites where I have seen this to be the case, one of them being a site where Google picks up new content within 5 minutes of posting.

  5. Just an observation, but the links themselves seem complicated, and different from one to the other. Specifically, the top uses &amp; and the bottom uses an ampersand all alone.

    Is it the complexity/difference that causes the problem (spider can't figure out what to index)?

    What about a css styled list with just page1.cfm, page2.cfm, page3.cfm (just to test if it's the li/css combo that's problematic).

  6. Whoops, that was the blog editing tool which added the first batch of ampersand escape codes. They were actually ampersands in the original.

    As for the complexity of the URLs (ref: Alan Williamson) - I have had no issues with my blog getting indexed and it is built on the same engine/ url scheme. I think the next test will be more telling...

  7. I think you're comparing apples with oranges here. Googlebot may have spent more time indexing the whole of the cflunch site because of a higher number of inbound links. The guitar site currently claims to have had 178 visitors, so it may not yet be popular enough for Googlebot. PageRank is all a bit of a grey area, and the reason why SEO people are paid so much!

    Also, small nitpick, but using unordered lists isn't really CSS. It's good old fashioned HTML: http://www.w3.org/TR/REC-html40/struct/lists.html#h-10.2

  8. I would imagine that he is using styled unordered lists (which is pretty popular nowadays) to give structure to what was otherwise just a bunch of unrelated text bits flowing one after another on a page.

    In the last year or so, thanks to the increasing CSS support found in browsers, more and more navigation systems are being developed using unordered lists and CSS in conjunction to create visually interesting navigation menus, while also providing nice structure to the document.

    I think most people who visit his site know that an unordered list, in and of itself, is plain old HTML - but I would also put money on the fact that his guitar site was using CSS to make the list more visually interesting - just as he is probably doing with the DIV and the anchors now.

  9. Google ignores query strings, so if you build a site using fusebox or other such methodology Google won’t index the site. I have heard that Google discriminates against dynamic sites; it definitely likes html pages better than cfm or other pages. You can test this by checking the ranking of various sites.

  10. I wholeheartedly disagree with the suggestion that "Google ignores query strings so if you build a site using fusebox or other such methodology Google won’t index the site". I have a considerable number of examples to the contrary with and without SES Url conversion:

    1. Fusebox 2: http://www.lcfpd.org
    2. Fusebox 3: http://www.myphotopia.com
    3. Fusebox 4 (with SES): http://www.duoconsulting.com

    I will concede that Google prefers well-formed HTML to non-validating HTML, but I would suggest that the ranking of the sites is independent of the implementation and depends specifically on inbound link popularity and on the quantity and quality of the actual content in terms of best match.

  11. I used to work at a company that spent a lot of time trying to get a higher ranking on Google, and html pages seem to get a higher ranking while query strings get ignored. There are a bunch of people that study this stuff, and it's not just well-formed html.

    I don’t see what you are proving with the sites you list. Most of the pages don’t have any rankings?

  12. Again, I disagree about fusebox and url parameters. The following google search clearly proves that google has indexed 408 pages of a fusebox 3 site with query string parameters which you claimed were ignored.

    http://www.google.com/search?hl=en&lr=&client=firefox-a&rls=org.mozilla:en-US:official&q=+site:www.myphotopia.com+adam+howitt%27s+photos

    As for html pages ranking higher than cfm pages, search google for "adam howitt". The second match is a .cfm page. Scan down the list and result 11 (for me at least) is the first html page listed. I don't know how you could draw any other conclusion unless you built a .cfm and an .html page with identical content, linked to the cfm page first, and were still able to show a better ranking for the html page.

  13. eli - this was never about rankings, it was about Google's spiders actually finding the pages. Whether or not they rank highly isn't a requirement.

  14. my mistake

  15. I will be posting a follow-up later today with some results. The spiders have been, and I'm waiting for the results to enter the listings...

  16. Adam, what's the update?

    Eli, here are a few sites with fuseactions in the URL that google indexed: http://www.google.com/search?q=allinurl%3Afuseaction%3D

  17. Here's an update: I've posted my findings today at http://www.webdevref.com/blog/index.cfm?t=Google_Update&mode=entry&entry=31BBD890-D610-A6A8-64B37A2CF75C59AF and a new experiment. If anyone else has ideas as to why this site seems to be getting ignored by google, I'm all ears!
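
On the ampersand question raised in comments 5 and 6: in HTML, `&amp;` inside an attribute value is the character reference for a plain `&`, so the escaped and unescaped forms of the navigation links should resolve to the same URL. The following is a minimal sketch in Python, using only the standard library, that parses one link written both ways and splits out its query string. The helper names are illustrative, and this shows how a generic parser reads the markup, not how Google's spider does.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags; the parser decodes character references."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def first_href(markup):
    """Return the first href found in the given HTML fragment."""
    parser = HrefCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs[0]


# The HOME link from the post, once with the escaped ampersand and once bare.
ESCAPED = '<a href="index.cfm?mode=entry&amp;entry=100C098D-9E1D-2003-099BBC48283D5826">HOME</a>'
BARE = '<a href="index.cfm?mode=entry&entry=100C098D-9E1D-2003-099BBC48283D5826">HOME</a>'

escaped_url = first_href(ESCAPED)
bare_url = first_href(BARE)
same_url = escaped_url == bare_url

# Split the query string into its parameters: every page on the site is
# index.cfm, and only the "entry" value tells one page from another.
params = parse_qs(urlsplit(escaped_url).query)
print(same_url, params)
```

Under these assumptions the two spellings are interchangeable, which leaves the query-string structure itself (one script name, pages distinguished only by the `entry` parameter) as the more interesting variable to test.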
