Before You Do Research on the Web...
Remember that any yoyo can make a web page. So who are the dumbest, most ignorant people you know, or what are the stupidest organizations you can think of? Would you trust them, just because they put up some web pages that are about something of interest to you?
The really good stuff is rare, so it's almost impossible to find when you just use search engines that can't tell the difference between Harvard and Hoboken. Why can't they? Hey, you know--they're just dumb machines, so they just look everywhere that's open: in your old high school, in Krusty the Clown pages, in the organic gardening pages, and so on.
They don't look in places that require passwords or proprietary access, and that includes the databases you can reach through the library's page, like EBSCOhost. EBSCOhost invested a lot of money in making those databases, so they're not going to let Google, Metacrawler, AOL Search, or Yahoo rummage around there so you can get in for free.
How do you know the stuff you find out there is any good, when you don't know who put it there?
First of all, you can always find out by exploring the web site and seeing just who did put it there and what that person's background is, or what that organization is all about in general. Start by just thinking about the URL you're looking at, beginning with the end of the first part (the site's name, which comes before the first single slash):
- .gov means it's a US government site (e.g., http://www.nasa.gov for NASA)
- .edu means it's a college or university; so maybe a professor put up the page you're looking at, or maybe a freshman who knows much less, so you should look carefully at who put it up (e.g., http://www.wvstateu.edu)
- .com or .net means it's a commercial site; someone paid to have the page posted, and you have to look even more carefully at who put it up (e.g., http://www.amazon.com)
- something like .uk or .de means it's a site in the United Kingdom or in Germany, or some other country
Then, look at what's before the .uk, .de, .fr, .cn, .lv, .ru, .az, or whatever:
- something like k12.wv.us means public schools (grades K-12) in West Virginia, which is in the United States (e.g., http://kcs.kana.k12.wv.us/ for Kanawha County Schools)
- something like .co.uk means it's a business in the United Kingdom (e.g., http://amazon.co.uk)
- some countries may not use the .co part, like France (e.g., http://www.amazon.fr), so you'll just have to use your native wit
- can you figure out what http://www.univ-nantes.fr means? How about http://www.uni-heidelberg.de?
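The URL-reading steps above can be sketched as a short Python function. This is only a rough illustration: the name describe_host and the two lookup tables are made up for this example, and they cover just the endings mentioned in the list. It can't tell you whether a site is any good; you still have to look at who put it up.

```python
from urllib.parse import urlparse

# Hypothetical lookup tables for illustration only. They hold just the
# endings mentioned in the list above, not every ending in use.
GENERIC_ENDINGS = {
    "gov": "US government",
    "edu": "college or university",
    "com": "commercial",
    "net": "commercial",
}
COUNTRY_ENDINGS = {
    "uk": "United Kingdom",
    "de": "Germany",
    "fr": "France",
    "us": "United States",
}


def describe_host(url):
    """Return a rough guess about who runs a site, based only on its host name."""
    host = urlparse(url).hostname or ""
    labels = host.lower().split(".")
    ending = labels[-1]
    if ending in GENERIC_ENDINGS:
        return GENERIC_ENDINGS[ending] + " site"
    if ending in COUNTRY_ENDINGS:
        country = COUNTRY_ENDINGS[ending]
        # Look at what comes just before the country code.
        if len(labels) >= 2 and labels[-2] == "co":
            return "business in " + country
        if len(labels) >= 3 and labels[-3] == "k12":
            return "public schools (K-12) in " + country
        return "site in " + country
    return "unknown; look at the site itself"


print(describe_host("http://www.nasa.gov"))         # US government site
print(describe_host("http://amazon.co.uk"))         # business in United Kingdom
print(describe_host("http://kcs.kana.k12.wv.us/"))  # public schools (K-12) in United States
```

Notice that the function has nothing special to say about http://www.univ-nantes.fr beyond "site in France." Spotting that "univ" probably means a university is exactly the kind of native wit a lookup table doesn't have.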
Then, hit the library and see how well the information holds up against known experts' conclusions.
If you're not sure at all, then check out the web evaluation site at Berkeley, which has a really good walk-through.
When you're done, think about all the info you may already have gotten off the net, and consider just how good it may really be. Most of what is out there is fun, but it's junk--not high-quality material good for research papers. Don't trust us? OK, just check out someone else's proof: the collection of Useless Web Pages.
Guides to Citations
Some of the sites below may well be changed soon, but could still be of value now, since the MLA has finally put up some of its style guidelines, including how to cite web pages.
An oft-cited guide to MLA style is the OWL at Purdue.
Advice can also be found at The Write Source, Berkeley, and even the University of Illinois at Urbana-Champaign.
One person's advice for a sample article looks like the following:
Besserman, Lawrence. "Chaucer and Dickens Use Luke 23.34." Chaucer Review 41 (2006): 99-104. AN 21276079. Academic Search Premier. EBSCOhost. Drain-Jordan Library, West Virginia State University. 20 February 2007 <http://search.ebscohost.com>. PDF.
What does that mean, and why is it MLA but not quite? Most of that is canonical MLA style for an article in a journal: author, title of article, journal title, volume, date, and page run. Since this was not seen in a paper copy but online through a database, most of the rest is standard MLA style for that: the name of the database, the name of the hosting service, the library that provided access (since it is not a free service), the date this was seen (because things electronic have a habit of changing, sometimes quite suddenly), and the URL, in this case the last short, static URL before the URLs become dynamic and incredibly long.
What is non-canonical is the "AN 21276079" in the middle. That number is the "Accession Number" and is unique to that article, so it's very useful. If you made a mistake in the page references or something, instead of re-running the search for "Charles Dickens" and scrolling through the 1,218 hits (on Feb. 20th, 2007), you can put "21276079" in the "Find" box, then pull down the box to its right from "Select a Field (optional)" to "AN Accession Number," hit the Search button, and get 1 (one) hit: this very article. The MLA evidently hasn't figured this very useful feature out, but since it's there it would be very helpful to include it.
Finally, what the MLA doesn't quite cover is the "PDF" at the end. The last item in any MLA citation is always "Supplementary information or annotation." With EBSCOhost and other online databases like it, sometimes you find just the citation (author, title, publication information), sometimes that plus an abstract (a summary of the work), sometimes the full text in HTML format, and sometimes a link to a PDF.
The full text in HTML often means someone scanned the original paper article and ran the scan through OCR software. Sometimes the OCR software misinterprets things, because it's just guessing; for example, it could misread "go" as "90." But a PDF is usually the scanned article itself, just a picture of the pages, and so it's much more reliable.
So depending on exactly what you see and have available, it's wise to specify: Abstract (meaning just the citation and an abstract), HTML (meaning you read the OCRed version, and which also means you can't know what page anything was actually on), or PDF (you read the scan and know exactly what page you got various bits of information from).
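The order of the pieces in the sample citation above can be sketched in Python. This is a minimal sketch of one person's advice, not official MLA style: the function name database_article_citation and its parameter names are made up for this example, and the accession number and format are optional extras, as discussed above.

```python
def database_article_citation(author, article_title, journal, volume, year,
                              pages, database, service, library, date_seen,
                              url, accession_number=None, format_seen=None):
    """Assemble a citation for a journal article read through a database.

    Hypothetical helper: the field order follows the sample citation in
    this guide, which is MLA "but not quite."
    """
    parts = [
        f"{author}.",
        f'"{article_title}."',
        f"{journal} {volume} ({year}): {pages}.",
    ]
    # Non-canonical extra: the database's accession number, if known.
    if accession_number:
        parts.append(f"AN {accession_number}.")
    parts += [
        f"{database}.",
        f"{service}.",
        f"{library}.",
        f"{date_seen} <{url}>.",
    ]
    # Supplementary information: Abstract, HTML, or PDF.
    if format_seen:
        parts.append(f"{format_seen}.")
    return " ".join(parts)


citation = database_article_citation(
    author="Besserman, Lawrence",
    article_title="Chaucer and Dickens Use Luke 23.34",
    journal="Chaucer Review",
    volume=41,
    year=2006,
    pages="99-104",
    database="Academic Search Premier",
    service="EBSCOhost",
    library="Drain-Jordan Library, West Virginia State University",
    date_seen="20 February 2007",
    url="http://search.ebscohost.com",
    accession_number="21276079",
    format_seen="PDF",
)
print(citation)
```

Run with the values shown, this prints the same text as the sample citation. Leaving out accession_number and format_seen drops those two pieces and leaves the rest in the same order.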
