by Lyle Scott, III

Getting Front-End Routed URLs with AJAX Content Indexed & Crawler Friendly

THIS DOCUMENT IS NOW OBSOLETE. It was half obsolete when I originally published it, but oh well... :)

My scenario: Over at TideNugget.com, I have a Google Map with "markers" that you can click to bring up a modal that loads location-specific tide and weather information. Since that information only exists inside a modal whose content is built dynamically via AJAX, rather than on a page of its own, it isn't readily available to search engine crawlers that just hit the site.

The remedy is well known to people who deal with SEO, but it took me a few reads of how Google wants you to do it to get it right.

Front End Routing

Step one is to have some front-end routing in place so that actions are attached to the URL's hash. Where a 'click' handler on a link used to load the content via AJAX directly, it now makes a simple window.location.hash change, and a routing library handles the request from there. In the end, this gives http://yoursite/#/some_state some meaning: some_state gets hooked into your routing library, which calls the things that were once fired by manually triggering the event to load the AJAX content.

An example link is http://tidenugget.com/#!/bookmark/tampa-bay-st-petersburg

There are lots of libraries that offer this capability. Google will have more answers on some of the varieties, but I tend to use Finch because it's dead simple and lightweight.

A quick example (CoffeeScript) of how I use it in the above scenario, for those who are curious:

$ ->

    Finch.route '!/bookmark/:placeSlug', ({placeSlug}) ->
        marker = window.markers[placeSlug]
        if not marker
            return

        window.current_marker = marker
        $('#marker-details-modal').modal({show: true})

    Finch.listen()
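
Stripped of any particular library, the routing step amounts to matching the hash against a pattern and calling a handler with the captured pieces. Here is a rough sketch of that idea in Python. It is not how Finch is implemented; compile_route and match_hash are names made up for this illustration, and only the ':name' placeholder style from the example above is supported.

```python
import re


def compile_route(pattern):
    """Turn a pattern like '!/bookmark/:placeSlug' into a compiled regex.

    Each ':name' segment becomes a named group matching one path segment.
    """
    parts = []
    for segment in pattern.split("/"):
        if segment.startswith(":"):
            parts.append("(?P<{}>[^/]+)".format(segment[1:]))
        else:
            parts.append(re.escape(segment))
    return re.compile("^" + "/".join(parts) + "$")


def match_hash(url_hash, routes):
    """Call the handler of the first route that matches the hash, if any."""
    fragment = url_hash.lstrip("#")
    for regex, handler in routes:
        match = regex.match(fragment)
        if match:
            return handler(**match.groupdict())
    return None  # no route matched


# Hypothetical handler: just report which place would be opened.
routes = [
    (compile_route("!/bookmark/:placeSlug"),
     lambda placeSlug: "open modal for " + placeSlug),
]
result = match_hash("#!/bookmark/tampa-bay-st-petersburg", routes)
```

A real router would also listen for hash changes in the browser; the sketch only covers the matching part.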

Whichever library you choose is irrelevant to the crawler. You just want a link you can tell the crawler about, one where what would normally be loaded via AJAX is instead served by a normal request with a static response that has SEO attributes unique to that resource. This way, each link will look like a different resource and search engines will be less likely to tag it as a duplicate of another page (and therefore not index it).

Crawlable URLs

Step two is to make these links available to search engines so that they know which links on the site point directly at AJAX content. This works by the search engines' crawlers seeing a specific marker in the URL (an exclamation mark after the '#' in the hash, in this case) that signifies the link loads AJAX content. The crawler then requests the page with a GET argument that we can handle on the server side: a flag indicating that the request comes from a crawler and that static, SEO-optimized content unique to that resource should be returned instead.

So http://yoursite/#myhash becomes http://yoursite/#!myhash.

Now, the crawler will inspect this link. When it comes time to crawl it, Google and friends convert the URL http://yoursite/#!myhash to http://yoursite/?_escaped_fragment_=myhash: the hash is dropped and everything after the #! becomes the value of the _escaped_fragment_ GET argument (if you already had GET arguments, it is just appended to them like normal). This gives developers a parameter to use as a flag to know when to generate a static HTML page that is a more crawler-friendly version of the page.
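
To see what the crawler ends up requesting, here is a small Python sketch of the rewrite: everything after #! moves into the _escaped_fragment_ query parameter and the hash itself disappears. The function name to_crawler_url is made up for this example, and the escaping is a simplification (urllib's quote escapes more characters than strictly necessary, which is harmless because they decode to the same value).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_crawler_url(pretty_url):
    """Rewrite a '#!' URL into its '?_escaped_fragment_=' form."""
    parts = urlsplit(pretty_url)
    if not parts.fragment.startswith("!"):
        return pretty_url  # not a hash-bang URL, nothing to rewrite

    # Simplified escaping: keep '/' and '=' readable, percent-encode the rest.
    escaped = quote(parts.fragment[1:], safe="/=")
    param = "_escaped_fragment_=" + escaped

    # Append to any query string that is already there.
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", query, ""))


print(to_crawler_url("http://yoursite/#!myhash"))
print(to_crawler_url("http://tidenugget.com/#!/bookmark/tampa-bay-st-petersburg"))
```

This is handy for testing your server-side handling by hand: paste the rewritten URL into a browser and you should see what the crawler sees.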

For example, I intercept this parameter in my Django view and render a template that is more appropriate for a search engine and not meant to be visited by humans. I try to update as many things as I can that will aid SEO and help my content get indexed correctly.

  • page title is updated
  • meta keywords are updated
  • meta description is updated
  • h1 element is present with a more descriptive title
  • text I want to be indexed is thrown in a p element
  • limited JavaScript and CSS is generated to get response times down
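
The server-side decision can be sketched without any framework. This is not my actual Django view; crawler_snapshot, the places dictionary, and its 'name' and 'summary' keys are all hypothetical, chosen only to show the shape of the logic: look for the flag, pull the slug out of it, and build the SEO fields listed above.

```python
def crawler_snapshot(get_params, places):
    """Return SEO fields for a crawler request, or None for a normal visitor.

    get_params: dict of GET arguments for the request.
    places: hypothetical dict mapping a slug to {'name': ..., 'summary': ...}.
    """
    fragment = get_params.get("_escaped_fragment_")
    if fragment is None:
        return None  # regular visitor: serve the normal JavaScript page

    prefix = "/bookmark/"
    if not fragment.startswith(prefix):
        return None  # a fragment we have no static version for

    slug = fragment[len(prefix):].strip("/")
    place = places.get(slug)
    if place is None:
        return None  # unknown place

    # Fields a static template could render for the crawler.
    return {
        "title": "{} tides and weather".format(place["name"]),
        "meta_description": place["summary"],
        "h1": place["name"],
        "body_text": place["summary"],
    }
```

In a Django view, get_params would be request.GET, and a non-None result would be passed to the crawler-only template.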

Get Indexed

You can of course submit all of the links in the "webmaster tools" section of your favorite search engines, but this is a lot of work and you still won't be listed on a lot of the smaller search engines.

It's far easier to list them directly in the sitemap for your site so the crawlers will pick them up automatically.

<urlset>  
  ...
  <url>
    <loc>http://tidenugget.com/#!/bookmark/adak-island-adak-bight</loc>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://tidenugget.com/#!/bookmark/adak-island-adak-island</loc>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
  ...
</urlset>  
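
With thousands of links you will want to generate that file rather than type it. A minimal Python sketch, assuming you already have a list of slugs; build_sitemap and its base argument are names invented for this example, and the xmlns value is the standard sitemap namespace.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(slugs, base="http://tidenugget.com/#!/bookmark/"):
    """Build a sitemap XML string with one <url> entry per slug."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for slug in slugs:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base + slug
        ET.SubElement(url, "changefreq").text = "weekly"
        ET.SubElement(url, "priority").text = "0.5"
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "adak-island-adak-bight",
    "adak-island-adak-island",
])
print(sitemap_xml)
```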

Google took about a week to index all 4,280 links that I submitted this way. They are slowly showing up on other search engines.

Python and the Underscore Prefix

Underscore prefixes in Python provide a way to protect functions, methods, and variables...kinda. In Python, any notion of private variables simply does not exist. There are, though, some pythonic ways to declare that a variable, function, or method shouldn't be consumed outside of where it is being directly used.

Single Underscore

When you prefix a name with a single underscore, you are politely asking developers who interact with the code not to use that name directly from outside the scope in which it was defined. If you see it in third-party code, it means you should not use or depend on it in any way.

Note, though, that a name with a single underscore prefix can still be used exactly as if it had no underscore, so the prefix is only symbolic: a piece of advice that people will hopefully follow.

For example, 

class FooBar(object):
    foo = 'abc123'
    _bar = 'qwerty'

    def foofunc(self):
        print('foofunc!')

    def _barfunc(self):
        print('barfunc!')

The internal representation of the class looks like you think it would, listing the names of the defined variables and methods exactly as they were defined.

print(dir(FooBar))
[..., '_bar', '_barfunc', ..., 'foo', 'foofunc', ...]

Abuse is easy, though. You can still use single-underscore-prefixed names just like you would any other variable or method. See why I said it was based on the honor system?

foobar = FooBar()
foobar.foofunc()    # foofunc!
foobar._barfunc()   # barfunc!
print(foobar._bar)  # qwerty
foobar._bar = 'hello'
print(foobar._bar)  # hello

This example used a class's variables and methods. The same convention applies to a variable or function defined in a module's scope.
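
At module scope the underscore carries one small piece of real behavior: a wildcard import skips names that start with an underscore (when the module defines no __all__), while direct access still works. The sketch below builds a throwaway module in memory so it can run as a single file; the module name underscore_demo and its two variables are made up for the demonstration.

```python
import sys
import types

# Build a tiny module in memory instead of writing a file to disk.
demo = types.ModuleType("underscore_demo")
exec("public_value = 1\n_internal_value = 2\n", demo.__dict__)
sys.modules["underscore_demo"] = demo

# Run a wildcard import in a fresh namespace and see what arrives.
namespace = {}
exec("from underscore_demo import *", namespace)
star_imported = sorted(name for name in namespace if not name.startswith("__"))

print(star_imported)         # only the public name was imported
print(demo._internal_value)  # direct access still works: honor system again
```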

Double Underscore

When you prefix a name with a double underscore, you are sternly telling developers who interact with the code that the name should absolutely, positively not be used directly from outside the scope in which it was defined.

A name with a double underscore prefix gets mangled, meaning that the class's variable or method is renamed internally to protect it from being used directly. As with a single underscore prefix, this protection is only symbolic and still based on the honor system, though it does make the variable or method harder to use.

Abuse is still possible, especially given that the result of mangling always follows the same pattern: _TheClassName is prepended to the attribute's name.

For example,

class FooBar(object):
    foo = 'abc123'
    __bar = 'qwerty'

    def foofunc(self):
        print('foofunc!')

    def __barfunc(self):
        print('barfunc!')

Both __bar and __barfunc are mangled internally.

print(dir(FooBar))
[
 ...,
 '_FooBar__bar',
 '_FooBar__barfunc',
 ...,
 'foo',
 'foofunc',
 ...,
]

As you can see, __bar and __barfunc are mangled using the FooBar classname. This makes direct access more difficult and deliberate.

foobar = FooBar()
foobar.__barfunc()         # AttributeError: 'FooBar' object has no attribute '__barfunc'
foobar._FooBar__barfunc()  # barfunc!

Though access is possible, it is far from good practice!
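
One practical effect of the mangling is that a subclass can reuse the same double-underscore name without clobbering the parent's attribute, because each class's name gets baked in. A small sketch with made-up Base and Child classes:

```python
class Base(object):
    def __init__(self):
        self.__token = "base"   # stored on the instance as _Base__token

    def base_token(self):
        return self.__token     # always reads _Base__token


class Child(Base):
    def __init__(self):
        super(Child, self).__init__()
        self.__token = "child"  # stored as _Child__token, no clash

    def child_token(self):
        return self.__token     # always reads _Child__token


child = Child()
print(child.base_token())   # base
print(child.child_token())  # child
print(sorted(vars(child)))  # ['_Base__token', '_Child__token']
```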

Python Class vs Instance Variable

There are a million articles written on this, so I'm not going to delve into the gory details. Instead, just check out the snippet below. I think it says it all.

A class variable is a variable that is shared between all instances of a class. You access it by using the class's name in the dotted reference rather than self (in a class method, the cls argument can be used instead). For example, if Car is a class with a variable number_tires and Honda, Jaguar, and Volkswagen are all instances, then changing Car.number_tires changes the value seen from the Honda, Jaguar, and Volkswagen instances alike.

An instance variable is scoped to a single instance of a class. Meaning, if Car is a class with a variable number_tires and Honda, Jaguar, and Volkswagen are all instances, then setting number_tires on the Honda instance is ONLY reflected in the Honda instance; Jaguar and Volkswagen keep whatever value they had.

Reading a class variable through the self (instance) reference just looks it up on the class. Assigning to it through self, including with +=, creates a separate instance variable of the same name in the instance's scope and leaves the class variable alone.
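
The Car scenario is quick to try for yourself. In this sketch the starting value of 4 and the later values are arbitrary; the point is only where each assignment lands.

```python
class Car(object):
    number_tires = 4  # class variable, shared by all instances


honda = Car()
jaguar = Car()

# Changing it on the class is visible through every instance.
Car.number_tires = 3
print(honda.number_tires, jaguar.number_tires)  # 3 3

# Assigning through one instance creates an instance variable on it alone.
honda.number_tires = 6
print(honda.number_tires)   # 6
print(jaguar.number_tires)  # 3, still reading the class variable
print(Car.number_tires)     # 3, the class variable was not touched

# The instance's own namespace shows where the value lives.
print(vars(honda))   # {'number_tires': 6}
print(vars(jaguar))  # {}
```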

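class FooBar(object):
    foo = 0
    bar = 0

    def __init__(self):
        FooBar.foo += 1  # rebinds the class variable, shared by every instance
        self.bar += 1    # reads FooBar.bar (0), then stores 1 on this instance
        self.foo += 1    # reads FooBar.foo, then stores that + 1 on this instance

    def __str__(self):
        return '\n'.join((
            '--------',
            'FooBar.foo \t {} \t (class variable)'.format(FooBar.foo),
            'self.bar \t {} \t (instance variable)'.format(self.bar),
            'self.foo \t {} \t (instance variable)'.format(self.foo),
        ))

# Assumed driver (not shown originally): create and print three instances in turn.
for _ in range(3):
    print(FooBar())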

As you can see,

  • the class variable was incremented across all instances
  • the instance variable was only incremented for the single instance that it was being used in
  • (bonus) a class variable assigned to through self becomes a separate instance variable in the instance's scope, and altering it does not affect the class version

--------
FooBar.foo       1       (class variable)
self.bar         1       (instance variable)
self.foo         2       (instance variable)
--------
FooBar.foo       2       (class variable)
self.bar         1       (instance variable)
self.foo         3       (instance variable)
--------
FooBar.foo       3       (class variable)
self.bar         1       (instance variable)
self.foo         4       (instance variable)