The first principles of web search: a primer on search engine optimization
Nicholas Schiller, Washington State University
Schiller taught SEO both as a way to understand how web search can be “gamed” and, incidentally, as a way to help students understand the underlying principles of web design & searching.
Consider the DIALOG blue sheet: the old-school way of making sure your expensive database search would work before you ran it and were charged by the minute. The sheets showed which fields you could search, and so on. They honed librarians’ search skills.
The principles of SEO are a blue sheet for the Web. Not as controlled, because the Web itself isn’t a controlled environment, but they still give us a listing of fields to search by, a description of the language to use, and an understanding of how the tool we’re interacting with works.
First principle: Google isn’t magic. Google & web search are often denigrated as the quick, easy option: type in a search, get subpar results. We want users to be empowered and engaged users of the tool, and we should remember this too. The best way to show it is to reveal the architecture underneath.
Second principle: bits & atoms behave differently. Digital objects obey different rules than physical objects. Libraries are designed around the rules of physical objects.
Third principle: Google tries hard to impersonate a database. In fact a very heterogeneous collection of items online; opposite of a database. Google tries to emulate the things that let us search a database with precision and clarity.
Fourth principle: Google leverages the capabilities of digital objects. Not forced to reduce relationships between digital items, can do many things with digital that don’t work with print.
A short video by Matt Cutts about how Google search works helps dispel some of the “Google is magic” mindset.
The Anatomy of a Large-Scale Hypertextual Web Search Engine, by Sergey Brin & Larry Page, explains the steps that Google uses to search.
Users often want “Google-type” searches, so we create keyword searching options. But in fact keyword-matching search isn’t very effective (and libraries have been able to do it for a very long time). Google isn’t just matching words; it’s ranking results. What users want isn’t matching but ranking, and Google does results ranking very well.
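The matching-versus-ranking distinction can be made concrete. A minimal sketch, with an invented three-document collection: keyword matching alone returns a set in no useful order, while even a crude score (here, raw term frequency) turns that set into a ranked list.

```python
# Toy contrast between keyword matching and ranked retrieval.
# The documents and query are invented for illustration.
docs = {
    "a": "library search tools and library catalogs",
    "b": "a brief history of search",
    "c": "search engines rank search results by relevance",
}
query = "search"

# Keyword matching: every document containing the term, in no useful order.
matches = [doc_id for doc_id, text in docs.items() if query in text.split()]

# Ranking: order those same matches, here by a crude term-frequency score.
scores = {doc_id: docs[doc_id].split().count(query) for doc_id in matches}
ranked = sorted(matches, key=lambda d: scores[d], reverse=True)
# Document "c" mentions "search" twice, so it rises to the top.
```

Real engines use far richer signals than term frequency, but the shape of the problem is the same: retrieval finds candidates, ranking decides what you actually see.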
1. They create externalized metadata, because they can’t get good metadata off most Websites.
2. They also do page ranking, inspired by academic citation analysis: deciding importance based on how many others cite a work. Incoming links become significant data.
3. Tags and text! Anchor text is the description text for a link (the clickable words themselves, usually blue and underlined on the linking page). Anchor text is basically volunteer cataloging that describes the nature of the thing being linked to. (This is why Google bombing works.) Information in a URL is weighted more heavily than the same information elsewhere in a page. Title text is also vital information that’s weighted highly. So is the meta description tag: the text in the <meta name="description"> field is what is displayed in the Google search results. It basically serves as an abstract, which makes it more important than text in the body of the page. Heading tags (<h1>, <h2>, etc.) also imply importance.
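The citation-inspired link analysis in item 2 can be sketched in a few lines. This is a minimal PageRank-style power iteration on an invented three-page link graph; the damping factor of 0.85 is the commonly cited value, and everything else here is an illustrative assumption, not Google’s actual implementation.

```python
# Minimal PageRank-style sketch: pages gain importance from incoming
# links, weighted by the importance of the pages linking to them.
# The link graph is invented for illustration.
links = {            # page -> pages it links out to
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
damping = 0.85
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}  # start with equal rank

for _ in range(50):  # power iteration until ranks settle
    new_rank = {p: (1 - damping) / len(pages) for p in pages}
    for page, outgoing in links.items():
        share = rank[page] / len(outgoing)  # split rank across out-links
        for target in outgoing:
            new_rank[target] += damping * share
    rank = new_rank
# Page "c" collects the most incoming link weight, so it ranks highest.
```

The key property, and the reason incoming links are such significant data, is that rank flows along links: a link from an important page is worth more than a link from an obscure one.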
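The “externalized metadata” and tag-weighting ideas above can be demonstrated by pulling those heavily weighted fields out of a page’s own HTML. A sketch using Python’s standard-library parser, run on an invented sample page:

```python
# Extract the fields search engines weight highly (title, meta
# description, headings, anchor text) from a page's HTML.
from html.parser import HTMLParser

class FieldExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "description": "", "h1": [], "anchors": []}
        self._current = None  # which tag's text we are collecting

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "description":
            self.fields["description"] = attrs.get("content", "")
        elif tag in ("title", "h1", "a"):
            self._current = tag

    def handle_data(self, data):
        if self._current == "title":
            self.fields["title"] += data
        elif self._current == "h1":
            self.fields["h1"].append(data)
        elif self._current == "a":
            self.fields["anchors"].append(data)

    def handle_endtag(self, tag):
        if tag in ("title", "h1", "a"):
            self._current = None

# Invented sample page for illustration.
page = """<html><head><title>Blue Sheets</title>
<meta name="description" content="A primer on search fields.">
</head><body><h1>Fields</h1>
<a href="/seo">volunteer cataloging</a></body></html>"""

parser = FieldExtractor()
parser.feed(page)
```

Notice that the anchor text lives on the *linking* page, which is exactly why it works as volunteer cataloging for the page being linked to.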
Understanding these things helps explain why search results are arrayed/ranked as they are. Google isn’t just retrieving everything on a topic; it’s giving specific hits based on predictable rules.
Using search operators like site: and allinurl: helps reveal the underpinnings too.
There’s more and this is great, but my laptop battery is dyyyyiiiiiinnnnnggggg…