How Google's New Sitemap Protocol Sets a New Standard for Webpage Submissions
Get more of your site indexed faster, easier, and updated automatically! — by Ian Cook
Courtesy of SearchEngineNews.com
Ya know how we've been saying for the past several years that submitting your webpages to Google is a virtual waste of time? ...how Google likes to find links to your site's pages and discover your new pages on its own? Well, to paraphrase Emily Litella—Never mind, sort of.
It's all of a sudden very different now that Google
is beta-testing their new Sitemaps submission
protocol. The sun is rising on a fresh new way to
tell Google all about your new and updated webpages.
Google says...
Google
Sitemaps is an easy way for you to help improve
your coverage in the Google index. It's a collaborative
crawling system that enables you to communicate
directly with Google to keep us informed of all
your web pages, and when you make changes to these
pages.
With Google Sitemaps you get:
- Better crawl coverage to help people find
more of your web pages
- Fresher search results
- A smarter crawl because you can provide specific
information about all your web pages, such as
when a page was last modified or how frequently
a page changes
Creating your Sitemaps is easy
Use the Sitemap Generator to create an XML Sitemap or submit a simple text file with all your
URLs.
Get started today — it's free
Send us your sitemap today and help increase the
visibility of your web pages.
A More Efficient Way to Submit Your Webpages
...could even become 'The Standard' for all search
engines.
Up until now, getting new pages or sites indexed
by Google depended on external links from pages
that Google already knew about. Google's spider
typically revisits pages that are already indexed
and discovers new links that point to new
pages. The advent of Sitemaps (Beta), however, means that telling Google about new
or updated content can be as straightforward
as presenting a specifically formatted list
directly to them. In short, the submission process
is back—reincarnated in an altered form that outsources
some of the heavy lifting to webmasters and site
managers.
Google Sitemaps invites you to place a specially
formatted site map file on your web server.
Then, whenever you notify them of new sites, pages,
or updated content, their spider will crawl your
pages. You're even invited to prioritize your
pages and inform Google (they call it a hint) of your update frequencies.
Imagine that! ...the only catch so far is, there's
no actual guarantee that your pages will get indexed
although we strongly suspect that most pages will.
Getting Started with the Sitemap Protocol
To get started you'll first need to sign up (registered Google Accounts users can skip the sign-up). After logging in, you'll see a welcome page.
Once you get past the technical terms, it's
actually not so hard.
Important Note: All Sitemap files, whether plain text or XML, must use the UTF-8 character set. This is not the default character set in most common text editors, so check your Save As dialog for an encoding option. If you do not use UTF-8, you will see a parsing error when you submit an otherwise valid Sitemap.
Also important to note: all URLs must be XML-encoded. This means that, among other things, all ampersands (&) in URLs must be replaced with their equivalent entity—i.e., &amp;. See the following w3.org document for more in-depth technical details about XML character encoding: http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2
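In Python, for example (the same language as Google's open source generator script mentioned later in this article), the standard library can handle this escaping for you; the URL here is a made-up example:

from xml.sax.saxutils import escape

# Hypothetical dynamic URL containing an ampersand
url = "http://www.domain.com/products.php?product_id=983&cat=10"

# escape() replaces & with &amp; (and < with &lt;, > with &gt;)
print(escape(url))
# prints: http://www.domain.com/products.php?product_id=983&amp;cat=10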
Rest easy: Google accepts site maps in
either plain text or XML to accommodate the
cross-section of webmaster expertise. Now, if
you're a bit unsure about XML, relax—we'll simplify
what you need to know in a minute. Highly technical
site managers will, however, recognize how XML
allows them to create a fully automated XML
feed—a script on your web server that monitors
site changes in order to automatically regenerate
and resubmit your feed to Google.
Even so, for the less technically inclined webmaster,
a simple list of website URLs can be submitted
and resubmitted manually whenever there is new
site content for Google to crawl and index. In
other words, Google offers the best of both worlds—a
simple submit process for the non-technical webmaster and a more advanced option that allows
for useful automation and maintenance of the submission
process for the technologically enabled.
Here's how it works
The least technical way to submit your site's
webpages is to create a simple sitemap.txt
file with a list of URLs, one per line, as such...
http://www.domain.com/
http://www.domain.com/products.html
http://www.domain.com/products.php?product_id=292
http://www.domain.com/products.php?product_id=983&cat=10
...and while the example above is a perfectly
valid way to submit, it doesn't fully utilize
the more advanced options Google Sitemaps offers.
To harness the full power being offered by Sitemaps,
you'll need to use the following sitemap.xml
format (which we will simplify for you in a moment, when we'll also introduce tools that will auto-generate the file).
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84
http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">
<url>
<loc>http://www.domain.com</loc>
<lastmod>2005-06-03T04:20:36Z</lastmod>
<changefreq>always</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>http://www.domain.com/products.html</loc>
<lastmod>2005-06-02T20:20:36Z</lastmod>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>
</urlset>
As you can see, the XML file contains extra
data (known as metadata—data
that describes other data) not found in
our simple sitemap.txt example file. The XML tags defined by
Google that make up a Sitemap file are
very specific and must be used precisely. Some
tags are optional and some are required.
Here's the precise breakdown regarding
the purpose, meaning and requirement (or
not) of each of the XML tags used in the
example above:
- <urlset>
[Required]
Indicates the beginning and end of a set of
URLs to be crawled.
- <url>
[Required]
Specifies the start and finish of an individual
URL (webpage)
entry.
- <loc>
[Required]
The full URL of the web page you wish to submit,
including the domain name and path just as it
would be entered in a web browser's address
bar. You're limited to 2048 characters (which
would be an unbelievably long URL, anyway).
- <lastmod>
[Optional]
<lastmod> [...] </lastmod>
The date and time the document was last modified.
This date and time must be specified using the
ISO 8601 standard.
- <changefreq>
[Optional]
<changefreq> [...] </changefreq>
Here's where you can suggest how often Google
should revisit this URL. Bear in mind
it's not a command, but rather a hint.
The value can be set to any one of
the following: always,
hourly, daily, weekly, monthly, yearly, and never.
- <priority>
[Optional]
<priority> [...] </priority>
The relative priority of this
URL compared to other URLs on your own site.
Here's where you can assign a crawl preference
to your more important pages. The scale is from
0.0 to 1.0,
in increments of 0.1. For example, 0.3, 0.5,
1.0 would
be priorities listed from lowest to highest.
This has no direct effect on your actual search
engine ranking and its actual importance is
thus far untested. We can only speculate that
one of the usages would be to help Google decide
which pages to crawl in the event a spider is
unwilling to crawl them all.
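Pulling these tags together, here's a minimal sketch of how such a file might be generated programmatically. To be clear, this is our own illustration in Python (the language of Google's official generator script), not Google's code, and the page list and metadata values are hypothetical examples:

from xml.sax.saxutils import escape

# Example data: (URL, last-modified in ISO 8601, change frequency, priority)
PAGES = [
    ("http://www.domain.com/", "2005-06-03T04:20:36Z", "always", "1.0"),
    ("http://www.domain.com/products.html", "2005-06-02T20:20:36Z", "daily", "0.8"),
]

def build_sitemap(pages):
    out = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">']
    for loc, lastmod, changefreq, priority in pages:
        out.append("  <url>")
        out.append(f"    <loc>{escape(loc)}</loc>")  # XML-encode the URL
        out.append(f"    <lastmod>{lastmod}</lastmod>")
        out.append(f"    <changefreq>{changefreq}</changefreq>")
        out.append(f"    <priority>{priority}</priority>")
        out.append("  </url>")
    out.append("</urlset>")
    return "\n".join(out)

# Write the file as UTF-8, as the protocol requires
with open("sitemap.xml", "wb") as f:
    f.write(build_sitemap(PAGES).encode("utf-8"))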
Google provides complete technical specifications
of the Sitemaps protocol at: https://www.google.com/webmasters/sitemaps/docs/en/protocol.html
Sitemap Generators
Now you can take a deep breath and relax. The
good news is that you don't actually have to (and
ideally you should not) manage this
Sitemap XML file manually.
When Google announced the beta availability of
their new Sitemap protocol, they also provided
an open source Python script that
can generate a Sitemap for you. And, since the
script's release, a number of third-party developers
have also created a variety of tools that enable
you to do the job more easily.
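Part of what these tools automate is notifying Google after the file is regenerated. Here's a minimal sketch of that step in Python; the sitemap address is a made-up example, and you should verify the ping endpoint against Google's current documentation before relying on it:

import urllib.parse
import urllib.request

# Hypothetical location of your uploaded sitemap
sitemap_url = "http://www.domain.com/sitemap.xml"

# Ping endpoint for the beta Sitemaps service (verify against current docs)
ping = ("http://www.google.com/webmasters/sitemaps/ping?sitemap="
        + urllib.parse.quote(sitemap_url, safe=""))

# An HTTP 200 response means the notification was received
with urllib.request.urlopen(ping) as resp:
    print(resp.status)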
Working within Content Management Systems
Site managers who use a Content Management
System (CMS) to manage their website should check with
the developers of their CMS software for a plug-in
that will generate a site map file automatically.
A number of such plug-ins have already been released, most of them developed by third parties. If you prefer an 'official' version, contact the creator of your particular CMS directly.
Server-Based Scripts
A Note on Generators: The generators listed are third-party software programs and are not guaranteed to work with all websites. In our testing, a given generator that worked for one of our websites did not seem to entirely crawl another of our sites, even when hosted on the same server. If you encounter a problem with the first generator you try, please try another one. If necessary, you may wish to use Google's official sitemap generator, or hire someone to operate it for you.
Also available are scripts that you can run directly
on your web server. These scripts will crawl the
actual file structure of your web server
to develop the site map file.
It's important to understand that this method
will find files that are not necessarily linked
from other pages and that you may not want indexed
in a search engine. Therefore...
...be sure to examine the
generated site map file carefully to ensure
that it contains only URLs you want indexed.
Generally speaking, these scripts require an intermediate to advanced level of technical ability.
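To make the idea concrete, here's a bare-bones sketch of such a script (our illustration, not one of the real generators). The document root and domain are hypothetical, and as stressed above, its output must be reviewed before submission:

import os
import time
from xml.sax.saxutils import escape

DOCROOT = "/var/www/html"        # hypothetical document root
BASE = "http://www.domain.com"   # hypothetical site address

entries = []
for dirpath, dirnames, filenames in os.walk(DOCROOT):
    for name in filenames:
        if not name.endswith((".html", ".php")):
            continue  # skip non-page files
        path = os.path.join(dirpath, name)
        # Map the file system path to a URL; use the file's
        # modification time for <lastmod>
        url = BASE + path[len(DOCROOT):].replace(os.sep, "/")
        lastmod = time.strftime("%Y-%m-%dT%H:%M:%SZ",
                                time.gmtime(os.path.getmtime(path)))
        entries.append("  <url>\n"
                       f"    <loc>{escape(url)}</loc>\n"
                       f"    <lastmod>{lastmod}</lastmod>\n"
                       "  </url>")

sitemap = ('<?xml version="1.0" encoding="UTF-8"?>\n'
           '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">\n'
           + "\n".join(entries) + "\n</urlset>")

# Review the result before submitting: the walk picks up files
# you may not want indexed
with open("sitemap.xml", "wb") as f:
    f.write(sitemap.encode("utf-8"))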
Web-based Generators
You can also use a web-based generator to create
a site map file for you. Web-based generators
will only locate pages on your website that are
linked from the initial page you specify.
A web-based generator will crawl your site in
much the same way that a search engine would.
So, if the generator can find it, chances are
the engine already knows about it. The exception
to this would be for new sites not yet indexed.
Windows-based Generators
A Windows-based generator is a software program
you install on your Windows-based desktop computer.
The best software program we've found so far is
the freeware Gsitemap,
by VIGOS Software.
This program will connect to your web site and
generate a sitemap based on the criteria you specify.
You can also import log files or URL lists to
generate your sitemap. Once created, Gsitemap
supports uploading the sitemap file and automatically
notifying Google.
Submitting your Sitemap
Once you've created your site map, upload it
to a publicly accessible location on your web
server. Unlike a robots.txt file, your site map
file doesn't have to be located in the root web
directory of your web server. Google will accept
all URLs under the directory where you post the
Sitemap file.
For example, if you post a Sitemap at www.domain.com/dir/sitemap.xml, they'll assume that
you have permission to submit information about
URLs that begin with www.domain.com/dir/ since, obviously, server access
is required for someone to post a file at that
location.
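One quick way to express that scoping rule is with a small helper function (hypothetical, for illustration only):

def in_sitemap_scope(sitemap_url, page_url):
    """True if page_url falls under the directory where the sitemap is posted."""
    scope = sitemap_url.rsplit("/", 1)[0] + "/"  # drop the filename, keep the dir
    return page_url.startswith(scope)

# URLs under /dir/ are covered; URLs elsewhere on the site are not
print(in_sitemap_scope("http://www.domain.com/dir/sitemap.xml",
                       "http://www.domain.com/dir/page.html"))    # True
print(in_sitemap_scope("http://www.domain.com/dir/sitemap.xml",
                       "http://www.domain.com/other/page.html"))  # False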
To submit a site map to Google, you must first
create a Google Account. If you're already
using Gmail or another of Google's services that
require a login (other than AdWords or AdSense), you already have a
Google account.
If you don't yet have an account, you can create one for free at the Google Sitemaps homepage.
Once you've logged into your account and arrived
at the Sitemaps homepage, click the Add a Sitemap link and paste in the URL of your site map. Within a few hours of submitting your
site map, Google will download it and let you
know whether or not it encountered any errors.
Be sure to check back with Google within a day
of submitting to verify that everything got processed
without a hitch.
Remember, the service is in testing (beta)
mode so, for now, it's reasonable to expect a
few bumps in the road.
Benefits of Using a Site Map
More efficient crawling
Providing the search engines with a personalized
map to the pages within your site is your best
strategy for getting your site categorized and
indexed exactly how you'd like it to be. With
a metadata-supported sitemap, you're providing
a ready-to-use map of your pages that’s
prioritized and tagged with hints about update
frequencies and last-modified dates.
You're basically taking all the guesswork out
of crawling your site. Over the long run we believe
this will facilitate more frequent updates of
your important content within Google's index.
No waiting lines
Since you can resubmit your site map at any time,
you don't have to wait until the spiders come
crawling for the engines to pick up your new pages.
An alternative to Yahoo's paid inclusion?!?
In the past we've recommended Yahoo's paid
inclusion as an alternative to getting hard-to-index
pages found by Google. Now Google is effectively
offering a free channel into their index.
They're also making the entire protocol available
for use by any other search engine which means
the Sitemaps protocol is likely to become an industry
standard for webpage submission and update notification.
Not a replacement for 'regular' SEO
Google states that using a Sitemap feed doesn't
automatically mean better listings in the organic
search engine results pages. Your pages will still
be subject to the same ranking algorithms as sites
that don't use a Sitemap feed. Sitemaps are intended to complement, not replace, their regular crawling of the web to index pages. Google's hope, however, is that the hints
being offered through the use of server-based
Sitemap XML files will help them do a better job
than the regular crawl while saving bandwidth,
resources and, ultimately, money.
The advantages for the webmaster include getting
more of your hard-to-crawl pages listed in the
index than a site that doesn't use the Sitemap
protocol—and sometimes that's all that's standing
between you and your competition.
A word to the wise, don't spam this format...
Over the past five years, Google has considerably
increased their ability to detect and eliminate
search engine spam. Our opinion is that it would
be foolish to list pages that use objectionable
techniques in a Sitemap XML feed—something akin
to raising one's hand in a police lineup.
Clearly it would be too easy to get caught. Bear
in mind that Google can choose at any time to
use their own nuclear option—banning a
site for life. So, be smart. Forgo the temptation
to push the envelope for short term gains and
play it straight. We're sure you'll be better
off in the long run for having done so.
For instance, bear in mind that the priority tag suggests relative priority.
So, if you happen to set every page to 1.0 (defined
as the highest priority) it'll literally
mean that all of your pages are equal just
as if you had set no priority at all.
If you exaggerate update frequency or
fudge on last update tags, Google can easily
figure that out and flag your domain as one that
provides unreliable metadata. Remember, these
tags aren't commands—they're hints. Google
is under no obligation to follow your hints but
you can bet they are taking notes, making
a list, and checking it twice. There's every reason
for you to play it straight and no clear benefit
to gain by cheating. Ultimately, you'll want your
site(s) listed in their white-hat database, not
their black-hat one.
Also bear in mind that Google cannot guarantee
they'll crawl or index all of your URLs. Their
primary goal is to gain a relational understanding
of the data in the hope of getting more of it into
their crawls and, ultimately, into their indices.
Spamming them at this stage would be like painting
a bull's-eye over the heart of your business. Not
smart!
Learn how now, or at least soon...
We strongly suspect the Sitemap XML protocol
will become the submission standard of the not-so-distant
future. Therefore, we recommend that you budget
in the time it takes to negotiate the learning
curve—or at least assign the task to someone within
your company.
While there's no doubt that Google will maintain
their standard crawler method of finding
and indexing pages in the near term, there are
so many incentives for them to shift the emphasis
to the Sitemap XML protocol in the long term.
For those of us who make the adjustment now (or at least soon), the search-marketing-scape
will be all-that-much more comprehensible when
the protocol becomes the de facto standard for
getting websites indexed.
In other words, Google has begun training us
to do it their way. And we don't really
see that there's much choice because the advantages
to the engines of the Sitemap XML protocol are
just too numerous and compelling for them to pass
up.
Are we having fun yet?
Ian Cook – CTO & Technical Analyst
SearchEngineNews.com
Dynamic Lab: Dynamic Mail Communicator
Manage your e-mail easily with folders
In Dynamic Mail Communicator you can now store messages
in folders under each Mail Account.
For each Mail Account, your email is delivered to
the Inbox folder. You can create your own folders
to further organize and track received messages.
Create a new folder
» Right-click in the Mail Box section
» Select the "New Folder" option
Move email into a folder
» Select a message (to select multiple messages, hold Ctrl)
» Hold Shift and drag the message(s) into the appropriate folder
Note: Registered clients of Dynamic Mail Communicator v2.0 can get this feature simply by downloading an updated version of the program from the Dynamic Software Upgrade Centre.
Tips, Tricks: Dynamic Software
Success Story
Featured Client: SaronicNet Promotions
SaronicNet Promotions offers full website
design services including creation, maintenance, search engine
positioning and internet promotional services. Kelsey Edwards,
the owner of the company, is sharing her success story with
us this month …
I must admit that I am somewhat sceptical and jaded, having spent 5 years attempting to get my websites up on the search engines and having been stung by numerous companies promising the world and not delivering. I also attempted to manually add my sites to as many engines as I could find. Both approaches met with limited success.
Solution
Then by accident I discovered Dynamic Submission
software and to be honest with you it has radically
changed not only my sites' positions on the search engines
but has saved me hours and hours of time and effort, as well
as reducing my internet phone bill, since I purchased it.
I can honestly say that it's the best bit of software I have
ever had.
I can't recommend it highly enough.
It's easy to use, reliable, thorough and I would feel as
though my hand had been removed if I no longer had it on my
PC. Also worth noting is their technical support response
which on the one occasion I had to use it was prompt, helpful
and provided a quick resolution to my installation error (my
fault not theirs).
Having slogged away for 5 years, I appreciate it all the
more and wish I had known of it sooner - but then perhaps
I wouldn't fully appreciate its impact on my work if I hadn't
put in so many hours of struggle before!
Results
I can now do in 18 minutes what it would have taken
98 hours to do (but never did of course) in the past.
I think it is excellent value for money.
And best of all I can confirm that it works! I use the submission
software to get my client's sites onto the internet. My clients
tend to be small operations such as hotels, restaurants, artists,
etc. based on a tiny Greek island called Hydra. I am not so much interested in the number of hits my client sites get as in the number of unique visitors that convert into real bookings and reservations. If it helps you at all to make
a decision, I can confirm that my clients used to be getting
1 booking per week but since using Dynamic Submission,
all of them have gone up to at least 3 enquiries with at least
1 converting to an actual booking PER DAY. I am thrilled!
For a small community of only 2300 residents, this is a huge
increase in business.
I also found that the help text and tutorial really helped
me to improve my websites and I revised a lot of my keywords
and the structure of the text as a result of setting up Dynamic
Submission and following their suggestions.
I manage about 60 client sites plus 3 of my own and so am
very busy with updates. Unfortunately this means that I probably
only utilise about 40% of the capacity of Dynamic Submission
(there's loads I haven't investigated fully yet), but even
so the results are terrific.
Making the right choice
In my opinion the software is excellent and I have absolutely
no hesitation recommending it to you. Buy it now, read the
instructions thoroughly, incorporate the suggestions from
the help and tutorial text, submit your site pages regularly
and I can say with confidence that you will be seeing very
positive results within three months.
Good luck.
Warm regards
Kelsey
Kelsey Edwards
SaronicNet Promotions
Hydra, Greece, 180-40
www.SaronicNet.com
Learn
how to turn a non-productive website into a
powerful revenue-generating website with Dynamic Submission.