URL-Lists for Google sitemaps
Google Sitemaps needs a list of URLs to optimize crawling. Usually, this is no problem, since Google supplies a script you can run on your server to build that list.
But that fails if the content of your site is stored not in HTML files but in TXT files, as is the case with DokuWiki. So here is what I did to build that list of URLs:
Find them
We do all of this in the directory where DokuWiki stores the pages:
cd /home/hdocs/beta.linuxbasics.org/data
find ./ -iname "*.txt"
gives us
./wiki/syntax.txt
./wiki/dokuwiki.txt
./wiki/playground.txt
./start.txt
./tutorials/pre/start.txt
...
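which is almost the URL, except that the leading “./” has to be replaced with the base-URL and the “.txt” at the end has to be removed.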
SED
The editor ‘sed’ can help us with those replacements. It is the source of Perl’s s///-command, so if you know Perl, this will be familiar:
sed -e 's#^\./#http://LinuxBasics.org/#g ; s/\.txt$//g'
- “sed -e” will execute the given commands on every line of the standard-input and print the result to the standard-output.
- “s#^\./#http://LinuxBasics.org/#g” will replace the “./” at the beginning of the line (’^’) with the base-URL. The backslash makes the dot match a literal “.” instead of any character.
This uses ‘#’ as a delimiter instead of ‘/’. Why? Because it looks much better than the version with slashes: “s/^\.\//http:\/\/LinuxBasics.org\//g”
- “s/\.txt$//g” will replace the “.txt” at the end of the line (’$’) with “”, which removes “.txt”.
- “;” separates the two commands.
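To try the two substitutions without touching the wiki, you can feed them a few sample paths by hand. This is only a sketch: the function name to_url is made up for illustration, the sample paths are taken from the find output above, and the two commands are passed as separate “-e” options instead of being joined with “;”.

```shell
#!/bin/sh
# to_url is a hypothetical helper name, not part of sed or DokuWiki.
# It reads paths on standard input and writes URLs to standard output.
to_url() {
    # First command: replace the leading "./" with the base URL.
    # Second command: strip the trailing ".txt".
    sed -e 's#^\./#http://LinuxBasics.org/#' -e 's/\.txt$//'
}

# Feed two sample paths through the filter.
printf '%s\n' './wiki/syntax.txt' './start.txt' | to_url
```

This prints http://LinuxBasics.org/wiki/syntax and http://LinuxBasics.org/start, one per line.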
Putting it together
find ./ -iname "*.txt" | sed -e 's#^\./#http://LinuxBasics.org/#g ; s/\.txt$//g'
gives us what we want:
http://LinuxBasics.org/wiki/syntax
http://LinuxBasics.org/wiki/dokuwiki
http://LinuxBasics.org/wiki/playground
http://LinuxBasics.org/start
http://LinuxBasics.org/tutorials/pre/start
http://LinuxBasics.org/tutorials/pre/md5sum
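If you want to rerun this later, the pipeline can be wrapped in a small shell function. The following is a sketch under assumptions: the function name build_url_list, its two arguments and the output file name urllist.txt are chosen here for illustration and are not part of the original setup. It also adds “-type f” to skip directories and “sort” to get a stable order, and it assumes the base-URL contains no “#” or “&”, since those are special inside the sed command.

```shell
#!/bin/sh
# build_url_list is a hypothetical helper; name and arguments are assumptions.
#   $1: directory where DokuWiki stores the pages
#   $2: base URL without a trailing slash (must not contain '#' or '&')
build_url_list() {
    # Run in a subshell so the caller's working directory stays unchanged.
    (
        cd "$1" || exit 1
        find . -type f -iname "*.txt" |
            sed -e "s#^\./#$2/#" -e 's/\.txt$//' |
            sort
    )
}

# Example call (commented out; urllist.txt is an assumed file name):
# build_url_list /home/hdocs/beta.linuxbasics.org/data http://LinuxBasics.org > urllist.txt
```

Keeping the directory and the base-URL as arguments means the same function can be tried on a scratch directory first, before it is pointed at the real wiki.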
Copyright (c) by the authors.
Prior to editing, authors agreed to license their contributions by the terms of the GPL.
See our licensing page for details.
Linux® is a registered trademark of Linus Torvalds.
tutorials/advanced/realworld/url-lists_for_google.txt · Last modified: 2008/07/20 19:08