Q. How to generate an XML sitemap (Google sitemap)?

19 posts, 7 voices

Dec 9, 2007 19:36
382 posts

Follow these steps:

Create a layout named Xml with content-type application/xml and body: <?php echo $this->content(); ?>

Create a snippet named xml_sitemap with the filter set to none and the following body:

<?php
// Recursively emits one <url> entry for every page below $parent.
function snippet_xml_sitemap($parent)
{
    $out = '';
    $childs = $parent->children();
    if (count($childs) > 0)
    {
        foreach ($childs as $child)
        {
            $out .= "  <url>\n";
            $out .= "   <loc>".$child->url()."</loc>\n";
            $out .= "   <lastmod>".$child->date('%Y-%m-%d', 'updated')."</lastmod>\n";
            $out .= "   <changefreq>".($child->hasContent('changefreq') ? $child->content('changefreq'): 'weekly')."</changefreq>\n";
            $out .= "  </url>\n";
            $out .= snippet_xml_sitemap($child);
        }
    }
    return $out;
}
?>
<?php echo '<?'; ?>xml version="1.0" encoding="UTF-8" <?php echo '?>'; ?> 
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<?php echo snippet_xml_sitemap($this->find('/')); ?>
 </urlset>

Create your sitemap page and set its status to hidden. In the body, write: <?php $this->includeSnippet('xml_sitemap'); ?> Set the filter to none, then set the layout to Xml.

Et voilà!
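
One caveat with the snippet above: the sitemap protocol expects URLs to be entity-escaped, so a URL containing & (a query string, for example) would produce invalid XML if printed as-is. Here is a minimal sketch of an escaping helper, assuming plain PHP. The names sitemap_xml_escape and sitemap_url_entry are made up for illustration and are not part of the CMS.

```php
<?php
// Hypothetical helpers for illustration; not part of the CMS API.

// Escapes &, <, >, " and ' so the text is safe inside an XML element.
function sitemap_xml_escape($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Builds one <url> entry in the same layout as the snippet above.
function sitemap_url_entry($loc, $lastmod, $changefreq = 'weekly')
{
    return "  <url>\n"
         . "   <loc>" . sitemap_xml_escape($loc) . "</loc>\n"
         . "   <lastmod>" . sitemap_xml_escape($lastmod) . "</lastmod>\n"
         . "   <changefreq>" . sitemap_xml_escape($changefreq) . "</changefreq>\n"
         . "  </url>\n";
}
```

In the snippet itself, the smallest change would probably be to wrap only the URL: "<loc>".sitemap_xml_escape($child->url())."</loc>".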

 
Dec 9, 2007 21:08
382 posts

I forgot to mention that if you want to set the change frequency manually, you need to add a changefreq content part to your page.

 
Dec 9, 2007 21:51
56 posts

Since it's not immediately obvious what values are useful as changefreq:

bq. How frequently the page is likely to change. This value provides general information to search engines and may not correlate exactly to how often they crawl the page. Valid values are: bq. * always bq. * hourly bq. * daily bq. * weekly bq. * monthly bq. * yearly bq. * never bq. The value "always" should be used to describe documents that change each time they are accessed. The value "never" should be used to describe archived URLs. bq. Please note that the value of this tag is considered a hint and not a command. Even though search engine crawlers may consider this information when making decisions, they may crawl pages marked "hourly" less frequently than that, and they may crawl pages marked "yearly" more frequently than that. Crawlers may periodically crawl pages marked "never" so that they can handle unexpected changes to those pages.
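
Since the changefreq part is free text typed by whoever edits the page, it may be worth checking it against the list above before printing it. Here is a minimal sketch, assuming plain PHP. sitemap_changefreq is a hypothetical helper name, and the 'weekly' fallback simply mirrors the default used in the snippet from the first post.

```php
<?php
// Hypothetical helper for illustration; not part of the CMS API.
// Returns $value if it is one of the changefreq values allowed by the
// sitemap protocol, otherwise returns $default.
function sitemap_changefreq($value, $default = 'weekly')
{
    $valid = array('always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never');

    // Page parts often carry stray whitespace or capitals, so normalize first.
    $value = strtolower(trim($value));

    return in_array($value, $valid, true) ? $value : $default;
}
```

With this, a part containing " Daily" would come out as daily, and a typo such as "sometimes" would fall back to weekly.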

And if you use a frequency higher than "daily" (that is, "hourly" or "always"), you will have to change the <lastmod> line, as it only has a granularity of one day. (It gives the date of the last update, which most search engines will assume is in UTC. So beware of time zones if you're far from UTC…)

This really hideous line did it for me:

        $out .= "   <lastmod>".substr_replace($child->date('%Y-%m-%dT%T%z', 'updated'), ":", -2, 0)."</lastmod>\n";
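
To spell out what that line does: %z yields an offset such as +0100, while the W3C datetime format used by sitemaps wants +01:00, so substr_replace() inserts a colon two characters from the end. Below is a standalone sketch of the same trick, plus a UTC variant that sidesteps the offset altogether. Both function names are hypothetical, and the code assumes plain PHP with a Unix timestamp at hand.

```php
<?php
// Hypothetical helpers for illustration; not part of the CMS API.

// Inserts a colon into a trailing numeric offset: +0100 becomes +01:00.
// A length of 0 makes substr_replace() insert rather than overwrite.
function sitemap_fix_offset($datetime)
{
    return substr_replace($datetime, ':', -2, 0);
}

// Formats a Unix timestamp as a W3C datetime in UTC.
// gmdate() always works in UTC, so the offset can be written literally.
function sitemap_lastmod_utc($timestamp)
{
    return gmdate('Y-m-d\TH:i:s', $timestamp) . '+00:00';
}
```

For example, sitemap_fix_offset('2007-12-09T21:51:00+0100') gives 2007-12-09T21:51:00+01:00.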
 
Dec 9, 2007 21:52
56 posts

sighs

And Markdown struck again! :(

 
Dec 9, 2007 22:52
382 posts

bq. is not a Markdown tag, it is a Textile tag :P

For Markdown you need to use > instead.

 
Dec 9, 2007 22:55
56 posts

Heh. I'll try to keep those straight. ;) Dear readers, just pretend that there's a new line at every bq. in my post. ;)

 
Dec 10, 2007 06:59
818 posts

[I just want to see if this works! ... and read the formatted post! :) ]

> How frequently the page is likely to change. This value provides > general information to search engines and may not correlate exactly > to how often they crawl the page. Valid values are: > * always > * hourly > * daily > * weekly > * monthly > * yearly > * never > The value "always" should be used to describe documents that change > each time they are accessed. The value "never" should be used to describe > archived URLs. > > Please note that the value of this tag is considered a hint and not a > command. Even though search engine crawlers may consider this > information when making decisions, they may crawl pages marked > "hourly" less frequently than that, and they may crawl pages marked > "yearly" more frequently than that. Crawlers may periodically crawl > pages marked "never" so that they can handle unexpected changes > to those pages.

 
Dec 10, 2007 07:00
818 posts

OK. Markdown definitely working funny -- you can see all the > marks, and Markdown just ignored them!

???

 
Dec 10, 2007 07:56
382 posts

OK, OK, I know it is not the best implementation I've done ;) ... This is due to the replacement of > with &gt;. I think I will drop Markdown here, use some BBCode, and add an edit feature ;)

 
May 22, 2008 15:26
8 posts

I am unable to use this sitemap in Google Webmaster Tools, as Google won't accept a submission unless it ends in .xml (i.e. sitemap.xml). However, mine is in one of two formats: http://www.mydomain.com/?sitemap or http://www.mydomain.com/?sitemap.html.

Is there any way to change the extension to .xml?

 
May 22, 2008 16:34
142 posts

You can just add the .xml extension to the slug and the URL will end in .xml :)

 
Jul 4, 2008 16:12
77 posts

Thanks for the explanation on how to create the sitemap for Google/Yahoo/MSN, and for the tip on how to name the file with the xml extension. I was a little confused with the beginning sentence on how to create the xml file, but then understood that I needed to create a new layout called "xml".

 
Jul 4, 2008 16:30
276 posts

I think you have to use mod_rewrite.

 
Jul 4, 2008 19:00
77 posts

Not sure I understand, but perhaps I'm just missing something.

What changes are you recommending for mod_rewrite? I added the "xml" extension to the slug, and was able to display the .xml file without any issues.

 
Jul 5, 2008 06:00
276 posts

That's not the problem. The generated URL is http://yoursite.com/?sitemap.extension. I think the ? is not allowed to be part of the file name, and Google interprets the sitemap's name as ?sitemap.extension. Now do you understand? Try using mod_rewrite, so that the URL to your sitemap becomes http://yoursite.com/sitemap.extension

 
Jul 5, 2008 06:10
818 posts

(Hey Bd! I think you're confusing Jamesh in post #10 from last May, with redcrew in post #12 from yesterday!)

 
Jul 5, 2008 08:12
77 posts

Yes, I believe there was some confusion with Jamesh's post #10 in this topic. My sitemap.xml file doesn't have the ? in the file name. I modified mod_rewrite earlier to remove the ? in file names. Thanks David for the clarification.

 
Jul 20, 2008 23:16
77 posts

Has anyone encountered issues with submitting their sitemap.xml file to Google Webmaster Tools? I've tried three times to submit, and Google responds with the following error message:

General HTTP error: 404 not found
We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit.

I can view the sitemap.xml file without any issues at: http://www.probationandparoleconsulting.com/sitemap.xml

 
Jul 21, 2008 03:30
818 posts

Hi redcrew -- that's an odd one! I set up the Muffins xml-sitemap almost letter-for-letter as Philippe described in post #1 in this thread. Google Webmaster Tools found it (and is using it!) without any difficulties.

The only difference I can see is that I have not used a .xml termination, and you have. I don't know -- perhaps worth trying without the extension?? Not sure what else to suggest!

 
Jul 21, 2008 06:45
77 posts

Hi David,

Thanks for the feedback. Not sure why, but submitting the sitemap file this morning worked fine.