Lab 12 - Our final lab.
Part I. MAKING PAGES THAT SEARCH ENGINES FIND.
Background: How do search engines work? They work by sending out programs, based on what is really "hacker" technology, that crawl or burrow through the millions upon millions of web pages. These programs go by names such as crawlers and "worms". Although based on hacker technologies, they are benign: they do not attempt to hurt your machine. The worms work by burrowing through the Internet, moving from page to page, and accumulating information about each page as they go. So not all the visitors to your website are human! The worms can read certain parts of a page, which helps them describe the page when they "report" back to the search engine.
What do worms read on your page? One thing they can read is the so-called META tags. In the late ’90s and early ’00s, at least, META tags were really important because they could increase the likelihood that a search engine would rank your site highly for appropriate searches, which led to increased site traffic (very important!).
So, how do you use these tags to make it more likely that your site is found by a search engine?
Read here for more:
http://searchenginewatch.com/webmasters/article.php/2167931
also read this for a little more information about why Meta-tags aren’t quite as essential as you might have thought:
http://spider-food.net/meta-tags.html
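To make the idea of a worm "reading" META tags concrete, here is a minimal sketch in Python of how a program might pull them out of a page. It is only an illustration, not how any real search engine works; the class name MetaTagReader, the function read_meta_tags, and the sample page are all made up for this lab.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names already lowercased.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            self.meta[name.lower()] = content


def read_meta_tags(html_text):
    """Return a dict of META tag names mapped to their content strings."""
    reader = MetaTagReader()
    reader.feed(html_text)
    return reader.meta


# A made-up page, for illustration only.
SAMPLE_PAGE = """
<html>
<head>
<title>Lab 12 sample page</title>
<meta name="description" content="A sample page for practicing META tags.">
<meta name="keywords" content="lab, meta tags, search engines">
</head>
<body><p>Hello, crawler!</p></body>
</html>
"""

if __name__ == "__main__":
    for name, content in read_meta_tags(SAMPLE_PAGE).items():
        print(f"{name}: {content}")
```

The two tags in the sample page, "description" and "keywords", are the ones the readings above spend the most time on.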
Please try out adding META tags to a page. This guide shows how to do it in Dreamweaver:
http://www.umbc.edu/oit/sans/helpdesk/Macromedia/Dreamweaver/HOWTO_Meta_Tags.html
Please actually play around with adding them.
PART II. How do you know who visits your website?
Web servers log all the visitors that come to your site. What exactly do they log? The answer depends on which kind of web server you are running. Apache, one of the most commonly used web servers, has a set of logs, including the Access Log. Here is more information on Apache logs:
http://httpd.apache.org/docs/1.3/logs.html#accesslog
Outputs from logs usually look like this:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
What does this mean? The first item gives the IP address or name of the host that requested the file (here, 127.0.0.1). Remember that a hostname is the name of a computer, not the name of a person. The hostname often carries information about where the person came from, since hostnames often reflect a domain (e.g. “Colorado.EDU”), which helps localize where the request came from. The “dash” after the host means that the identity of the client (the identd check) was not logged. Don’t worry about it; that information is almost never captured. The next item is the name of the user IF AND ONLY IF there was a login mechanism for the user to access the page (a username/password system). In almost all cases you won’t get a username, and even when you do, the username may not reflect the identity of the real user (obviously). The next item, in square brackets, is the date and time when the server finished processing the request. The next item, in quotes, is what the client requested. The last two items (200 and 2326) are the status code and the size in bytes of the returned item.
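The sample entry above can be picked apart by a short program. The following is a rough sketch in Python, assuming the log uses Apache's Common Log Format exactly as in the sample; the names LOG_PATTERN and parse_log_line are invented for this example, and a server configured with a different log format would need a different pattern.

```python
import re

# Common Log Format: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Split one access-log entry into named fields, or return None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # Apache writes "-" for the size when no content was returned.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


# The entry from the lab text, joined onto one line.
SAMPLE_LINE = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)

if __name__ == "__main__":
    for field, value in parse_log_line(SAMPLE_LINE).items():
        print(f"{field}: {value}")
```

Each named group in the pattern corresponds to one item in the description above, in the same left-to-right order.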
Weekly logs can often be megabytes in size, representing millions of individual hits. How do you summarize all this information? With log summarizing programs, of course!
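To give a feel for what a log summarizing program does, here is a toy version in Python. It is a sketch only, nowhere near what a real summarizer offers. The function name summarize is invented, and apart from the first entry (taken from the lab text) the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Pulls host, requested path, status and size out of a Common Log Format line.
REQUEST_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines):
    """Count hits per page, per host and per status code, and total the bytes sent."""
    hits_by_path = Counter()
    hits_by_host = Counter()
    hits_by_status = Counter()
    total_bytes = 0
    for line in lines:
        match = REQUEST_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        hits_by_path[match["path"]] += 1
        hits_by_host[match["host"]] += 1
        hits_by_status[int(match["status"])] += 1
        if match["size"] != "-":
            total_bytes += int(match["size"])
    return {
        "hits_by_path": hits_by_path,
        "hits_by_host": hits_by_host,
        "hits_by_status": hits_by_status,
        "total_bytes": total_bytes,
    }


# First line is from the lab text; the rest are invented sample data.
SAMPLE_LOG = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 1043',
    '192.0.2.7 - - [10/Oct/2000:14:02:17 -0700] "GET /index.html HTTP/1.0" 200 1043',
    '192.0.2.7 - - [10/Oct/2000:14:02:40 -0700] "GET /missing.html HTTP/1.0" 404 -',
]

if __name__ == "__main__":
    summary = summarize(SAMPLE_LOG)
    print("Most requested pages:", summary["hits_by_path"].most_common(3))
    print("Hits by status code:", dict(summary["hits_by_status"]))
    print("Total bytes sent:", summary["total_bytes"])
```

Real summarizing programs add things like per-day breakdowns and charts, but the core idea is the same kind of counting.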
Programs that summarize your weblogs for you:
Log summaries are very important in giving you some feedback about what is popular on your site and a very rough guide to the demographics of site users. More sophisticated information requires some form of further tracking.
Here is more information on web-analytics software for determining what your users are doing: http://en.wikipedia.org/wiki/Web_analytics
Part III. Cookies.
The web suffers from one serious problem. It has no short-term or long-term memory. When you surf the web, the page doesn’t remember that you have been there before! Each time you visit, the page thinks you are there for the first time. But it is important for the page to remember who you are, since if it can, the page can tailor itself to what it knows about you.
There are a couple ways to make a web page remember you. One is to get you to fill out a form and then store information about you in a database, like what you buy and what you looked at. This is how sites like Amazon work in part.
Another way, which is slightly more insidious, is to use cookies. What are cookies? Cookies are very small pieces of information that a web page can save on your computer. You read correctly: web sites can put information on your computer. These very tiny pieces of information include things like what you clicked on the page and what information you put into a form. The next time you visit, the page that wrote the cookies can look at the cookies on your machine and tailor information based on what it reads.
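Under the hood, a cookie is just a short line of text that the server sends to the browser and the browser sends back on later visits. The sketch below uses Python's standard http.cookies module to show both halves of that exchange. The function names and the cookie names (last_visited, theme) are made up for illustration, and a real site would normally let its web framework handle this.

```python
from http.cookies import SimpleCookie


def build_set_cookie_value(name, value, max_age_seconds):
    """Build the text a server would send after 'Set-Cookie: ' (hypothetical helper)."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = max_age_seconds  # how long the browser should keep it
    cookie[name]["path"] = "/"                 # send it back for every page on the site
    return cookie[name].OutputString()


def read_cookie_header(header_value):
    """Turn the text of a browser's 'Cookie:' header into a plain dict."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


if __name__ == "__main__":
    # First visit: the server asks the browser to remember something.
    print("Set-Cookie:", build_set_cookie_value("last_visited", "lab12", 3600))

    # Later visit: the browser sends the stored cookies back.
    returned = read_cookie_header("last_visited=lab12; theme=dark")
    print("The server now knows:", returned)
```

Note that the server never reaches into your machine directly; the browser stores the cookie and chooses to send it back with each request to that site.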
Read more about cookies: http://www.cookiecentral.com/faq/#1.1
To learn how to use cookies as a web developer: http://www.peachpit.com/articles/article.aspx?p=31661&seqNum=6
AND FINALLY --- We didn’t have time to delve into installing web servers. Let me know if you have questions about that, and I can help steer you to some resources…
PROJECT TIME!