WebHogs FAQ. Your comments are welcome. I may be reached at jamesjkeene@gmail.com.

Q: What's new with WebHogs?

A:
Feb. 5, 2003: Filters added to remove some false positives. With this type of program, one generally avoids filters that might remove true positives from reports. Your ideas for more filters are welcome.
Jan. 26, 2003: A memory (RAM) leak was fixed. This benefits very long runs of WebHogs.
July 14, 2002: More bugs fixed. Since July 10, more strings have been added to the tag searches (see below).

Also, "?" within html tags is now scaled by content length, as the "onmouse" references already were. This reduces the impact of the "?" count on the Pig Score. Thus, Pig Scores calculated before July 10 may be "inflated" by the excessive use of "?" references (which suggest info might be sent to the web server). As time goes on, the weighting of components in the Pig Score might be refined further.

There was some interest in seeing the cookies delivered by web servers in the http header. These are now displayed if Other is checked.

Q: Is a high Pig Score bad?

A: Not necessarily. The concept was to itemize and score http header and web page content in both the <head> and <body> fields for items that may raise questions re security, use of system resources, etc. In many of the html tags and other items analyzed, there are known exploits. Except in a few cases where the content is most likely malicious, WebHogs does not tell you that particular items are definitely "dangerous," only that such items could be abusive. It may be ironic that techniques used by bandits and vandals are used so widely with only an implied "trust us" seal attached.

Q: What does the magnitude of the Pig Score mean?

A: The amount of "piggy" content. This could all be "innocent." It may be evident that the magnitude of the Pig Score lends itself to various interpretations depending on the views of the user and specific servers tested. In other words, a big Pig Score may or may not be a bad thing. So what is it? An indication of the amount of content of the types described.

For security questions, the specific indicators are probably more important than the numerical value of the Pig Score. If just one virus or questionable item is found, that may be all that matters. Thus, one could screen the cases (*.lst) by a simple sort in a spreadsheet program to develop a list of sites with Danger (the second component) greater than zero. Many other variations are possible. WebHogs helps you get and organize information, but requires your active input regarding your specific interests.
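The same screening can be sketched in a few lines of Python instead of a spreadsheet. This is only an illustration: the layout of the *.lst files is not described in this FAQ, so the records below are assumed to be already parsed into hypothetical (host, components) pairs. The only detail taken from the text above is that Danger is the second component.

```python
# Hypothetical, already-parsed records: (host, pig_score_components).
# The real *.lst layout is not described in this FAQ; the only detail
# taken from it is that Danger is the second component.
def sites_with_danger(records):
    """Return records whose Danger component is > 0, highest Danger first."""
    flagged = [r for r in records if r[1][1] > 0]
    return sorted(flagged, key=lambda r: r[1][1], reverse=True)

# Made-up sample data using documentation-only addresses.
sample = [
    ("192.0.2.10", (12, 0, 3)),
    ("192.0.2.20", (40, 2, 7)),
    ("192.0.2.30", (5, 1, 0)),
]

for host, components in sites_with_danger(sample):
    print(host, components)
```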

Q: Will WebHogs protect me from virus or malicious content in my web browsing?

A: Good question. WebHogs is not specifically an anti-virus program designed to detect all malicious items or code. It is an independent stand-alone program, not a plug-in, and unrelated to your browser activity. Of course, one could get the Pig Scores for some of one's favorite sites.

The issue it raises, if you will, is the amount of certain types of content. The security relevance may be as follows. Let us say that 1 in 100 people are untrustworthy. This indicates the amount of risk of meeting misfortune (a pickpocket or other thief) when one goes out in public. The more hundreds of random people that cross your path, the more risk you have of misfortune.
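To put a number on that analogy, here is a small sketch. It assumes each encounter is independent and uses the 1-in-100 figure from the paragraph above; the function name is made up for illustration. Under that assumption, the chance of meeting at least one untrustworthy person among n people is 1 - 0.99^n, which is already about 63% for n = 100.

```python
def risk_of_misfortune(people_met, p_untrustworthy=0.01):
    """Probability of meeting at least one untrustworthy person,
    assuming every encounter is independent (an assumption of this sketch)."""
    return 1.0 - (1.0 - p_untrustworthy) ** people_met

# The risk grows with the number of people (or pieces of content) encountered.
for n in (10, 100, 500):
    print(n, round(risk_of_misfortune(n), 3))
```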

Even "trusted" web pages may become infected with a virus or corrupted, by an outsider or even by a disgruntled employee. Maj. Hog does not want to create fear in its users, just more awareness of the possibilities in this enormous phenomenon of web page activity.

Beyond the security question, there is the larger issue of the use of certain techniques in web pages and the impact this may have on your own personal computer. The Pig Score gives a rough indication of how much such content a page contains. In addition, higher Pig Scores are correlated with greater use of your system resources.

It is your computer and you are the boss. So your answer counts. Is your computer a PC (public computer) or a PC (personal computer)? If the former, then anyone who writes a web page that you visit may be able to run any program of their choosing on your P(ublic) C(omputer). If you have the "silly" notion that you should decide what programs to run, then disable all scripting of any kind in your browser.

Q: Why do so many web pages use techniques which earn Pig Points?

A: Probably some combination of ignorance on the part of the coders (they may just be pointing and clicking and not know themselves what is there), an effort to impress you or make your "experience" thrilling, and economics (it's cheaper to use your resources).

Of course, WebHogs users can also send Maj. Hog's transcript and Pig Score to the webmaster! Call it customer or user feedback :-). Maybe they will (1) get tired of looking piggy, (2) want to look different from bandits or (3) display a "Certified Script-free by Maj. Hog" seal at the top of their web pages. Who knows?

Q: Can I get a specific page at a web site?

A: Yes. If your copy is dated July 5 or later, you may enter a specific page in the Input Box. Examples:
/index.html
www.somewhere.com/something.html
/something.html
If you include a host IP or name (second example above), then one host is tested. If you put only "/index.html", this qualifies the GET request used in the session. If no host name or IP is specified in the Input Box, you are either using an input file (do not use "/" notation) or are scanning a range of IPs, sequentially or randomly. In those latter cases, if you ask specifically for a page using the "/" notation, you may get many error responses from servers saying they do not have that page. On the other hand, if you are looking specifically for that kind of page (e.g., /default.asp), that is what your GET requests will be asking for.
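The Input Box rules above might be sketched as follows. This is not WebHogs' actual code; parse_input_box and request_line are hypothetical names, and the HTTP version in the request line is an assumption, since this FAQ does not say which one is sent.

```python
def parse_input_box(entry):
    """Split an Input Box entry into (host, path).

    host is None when the entry starts with "/", meaning only a page
    was given and the hosts come from a range scan.
    """
    entry = entry.strip()
    if entry.startswith("/"):
        return None, entry
    host, slash, rest = entry.partition("/")
    return host, ("/" + rest) if slash else "/"

def request_line(path):
    # Assumption: HTTP/1.0 is used only for illustration here.
    return f"GET {path} HTTP/1.0"

for example in ("/index.html", "www.somewhere.com/something.html"):
    host, path = parse_input_box(example)
    print(host, request_line(path))
```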

Q: Does the "Stealth" method mean that WebHogs is "up to no good?"

A: Quite to the contrary. First, some background.

"Stealth" refers to methods of port scanning which may be less likely to be detected by the "target" computer systems. WebHogs, which uses the RandScan engine, was designed to use stealth-type methods because both programs were created as research tools, which should not be intrusive or disruptive.

What kind of research? Most people are familiar with survey research which is used in serious academic work and in opinion polls often reported in the media. The key concept here is "random sampling" so that an estimation of the population parameter can be reliably stated (please see a statistics book).

Now. What is the "good?" The random sampling was implemented for research purposes. It has the added result that use of the programs will almost certainly not be intrusive since the odds of any one host being "interviewed" twice are extremely low.
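A rough sketch of why repeat visits are so unlikely, assuming addresses are drawn uniformly and independently from the chosen range (the FAQ does not spell out how RandScan draws them, so that is an assumption, and both function names are made up). With k samples from a range of N addresses, the expected number of pairs of samples that land on the same address is k(k-1)/(2N).

```python
import ipaddress
import random

def random_host(first, last, rng=random):
    """Pick one IPv4 address uniformly from the inclusive range first..last."""
    lo = int(ipaddress.IPv4Address(first))
    hi = int(ipaddress.IPv4Address(last))
    return str(ipaddress.IPv4Address(rng.randint(lo, hi)))

def expected_repeat_pairs(samples, range_size):
    """Expected number of sample pairs hitting the same address when
    sampling uniformly with replacement (an assumption of this sketch)."""
    return samples * (samples - 1) / (2 * range_size)

print(random_host("10.0.0.0", "10.255.255.255"))
# 10,000 samples over the whole IPv4 space: far fewer than one repeat expected.
print(expected_repeat_pairs(10_000, 2**32))
```

The same formula shows why a very narrow range is a different matter: with only 256 addresses, 10,000 samples would revisit every host many times.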

The wonderful internet infrastructure was created in great part through research and development. Now that it is such a huge phenomenon, both technologically and socially, it will become the object of even more study. That is a purpose for releasing WebHogs for research and education.

Q: Can the use of WebHogs (or RandScan) get me in trouble?

A: Most probably not. The author has used the program extensively with no problem or complaint whatsoever. Basically, the program requests web pages using the same method as a browser and is not noteworthy in that respect. Check your own sources, including your internet service provider, if in doubt. Whether it is WebHogs or your regular browser, most web servers will log your IP address in their records. Generally, if you use any program at all in a questionable manner, you can most probably be identified easily.

The good news is that ordinary usage of WebHogs will probably create less traffic for both your local service provider and the remote web servers than if you were doing ordinary browsing or downloading lengthy files. For example, a browser user may "sit" on a page for news or stock quotes or something and continuously refresh that page. That kind of browser usage creates more traffic for the server than WebHogs would, since the program only needs one snapshot.

With that said, I will say what might be considered abusive use of any scanner, so you will know to avoid it and/or have little excuse if you yourself choose to be abusive. Abuse would be to make a very narrow "From" – "To" address range and hammer the servers there with requests for pages at a high rate. Indeed, with virtual IP address technology, you may think your requests are distributed across many machines in your selected range, and, um, it might be one computer handling all of this. This is bad, because you yourself are being a pig. You are asking to use the resources of someone else to an unreasonable degree. I would not be surprised if you were reported for abuse or worse (Denial of Service).

How to avoid this? Use the wide address range that is shown when the program starts or at least a fairly wide range. The author takes no responsibility for misuse of this software. It is beyond the scope of this document to explain in a comprehensive manner the social issues raised by this sort of program.

Q: What does WebHogs look at to create the Pig Score?

A: In the http header, the following are logged: the server http version and response code, the server software description, the usage of WWW- and ETAG entries, the stated Content-Length and the number of Set-Cookie lines. Some servers will attempt to set more than half a dozen cookies! The Content-Type and Content-Disposition are also considered, and the presence of a ".exe" reference is noted.
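As an illustration of the header items just listed, here is a minimal sketch that pulls them out of a raw http header. It is not WebHogs' own code, summarize_header is a hypothetical name, and the sample header is made up.

```python
def summarize_header(raw_header):
    """Collect the status line parts, the Set-Cookie count and other fields."""
    lines = raw_header.replace("\r\n", "\n").split("\n")
    status = lines[0].split(None, 2)  # e.g. ["HTTP/1.1", "200", "OK"]
    summary = {
        "http_version": status[0] if status else "",
        "response_code": status[1] if len(status) > 1 else "",
        "set_cookie_count": 0,
        "fields": {},
    }
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or malformed entry
        name = name.strip().lower()
        if name == "set-cookie":
            summary["set_cookie_count"] += 1
        else:
            summary["fields"][name] = value.strip()
    return summary

# Made-up header for illustration only.
sample_header = (
    "HTTP/1.1 200 OK\r\n"
    "Server: ExampleServer/1.0\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 1234\r\n"
    "Set-Cookie: a=1\r\n"
    "Set-Cookie: b=2\r\n"
    "\r\n"
)
print(summarize_header(sample_header))
```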

In the Head and Body fields of the html document itself, several tags get special attention: SCRIPT, A HREF, OBJECT, APPLET and XSL. Within these tags, the following strings are displayed and logged:

FILE://, ABOUT://, TELNET://, GOPHER://, RES://, APPLICATION/, .EXE, .BAT, .LNK, .PIF, .SCF, .SCR, .HTA, .URL, .VB, .MSI, .MHT, .REG, .SCT, .WSC, .WSF, .WSH, C:\, D:\, .EML, .NWS, ..\, SCRIPTS, \WINNT, \WINDOWS

Within SCRIPT tags, the following are also noted: (1) the kind of script -- JAVA or VBS (local scripting host), (2) downloaded .JS programs, (3) FUNCTION definitions, (4) DOCUMENT creation and usage, and (5) references to "GETOBJECT" or "CREATEOBJECT".

In the overall document, WebHogs also looks for references to <EMBED, ACTIVEXOBJECT, .SHOWHELP, ONMOUSE and LOGIN, plus the EXTRA items you enter in the Search Box. None of the searches is case-sensitive.
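The tag and document searches described above could look roughly like this. It is a simplified sketch, not the program's real scanner: it assumes the strings are looked for inside the opening tag text only, it uses just a handful of the listed strings, and the names scan_tags and scan_document are invented for the example.

```python
import re

TAG_NAMES = ("SCRIPT", "A", "OBJECT", "APPLET", "XSL")
# A small subset of the strings listed above, for illustration.
FLAGGED = ("FILE://", ".EXE", ".VB", "C:\\", "..\\")
DOCUMENT_WIDE = ("<EMBED", "ACTIVEXOBJECT", ".SHOWHELP", "ONMOUSE", "LOGIN")

def scan_tags(html):
    """Return (tag_name, flagged_string) pairs found inside opening tags.
    Assumption: only the opening tag's own text is examined."""
    hits = []
    for match in re.finditer(r"<\s*([A-Za-z:]+)([^>]*)>", html):
        name = match.group(1).upper().split(":")[0]
        if name not in TAG_NAMES:
            continue
        attrs = match.group(2).upper()  # upper-casing makes the search case-insensitive
        hits.extend((name, s) for s in FLAGGED if s in attrs)
    return hits

def scan_document(html):
    """Count the document-wide strings, ignoring case."""
    upper = html.upper()
    return {s: upper.count(s) for s in DOCUMENT_WIDE if s in upper}

page = '<a href="file://c:\\boot.ini">x</a><img src="a.exe"><BODY onmouseover="x()">'
print(scan_tags(page))
print(scan_document(page))
```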

Q: How can I use the search box for security or investigation?

A: Briefly, you enter one or more words (or fragments of words or sequences of characters) separated by single space characters. Like the Bin Laden EXTRA feature, these searches have no effect on the Pig Score and the references could be on a page with zero Pig Score. This list of words can include many different topics and the search box scrolls to allow you to enter or paste a lengthy list, if you want.

Example: "sex" would flag all pages with words like sex, heterosexual, sexual, sexuality, etc. "sex child young" might find pages run by sex offenders. "bomb kill flight" or whatever might find pages where those things are mentioned. Html tags can also be found and flagged. "XXX", where XXX is the latest "signature" of a new virus infecting web pages, might help you find infected web pages or check whether your own pages are infected, even before any remedy has been posted. In that sense, Maj. Hog is part of an incident response team, and system administrators should have their input file ready at all times to feed their servers into WebHogs at a moment's notice to check for content on their web pages.
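The Search Box matching described above might be sketched like this, assuming a plain case-insensitive substring test for each space-separated entry (search_box_hits is a hypothetical name, not part of WebHogs):

```python
def search_box_hits(search_box, page_text):
    """Return the Search Box entries found anywhere in page_text.

    Entries are separated by single spaces and matched as case-insensitive
    substrings, so fragments of words match too.
    """
    terms = [t for t in search_box.split(" ") if t]
    haystack = page_text.upper()
    return [t for t in terms if t.upper() in haystack]

print(search_box_hits("login passw telnet", "Please enter your Password to log in"))
```

Because the match is on substrings, "passw" is found in "Password", while "login" is not found in "log in".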

It is a potentially interesting and useful technique since, in the Random mode, WebHogs will find an enormous number of web servers that do not have DNS entries and are likely somehow private (perhaps for use within an organization or even just a personal web page). Investigators can use WebHogs to develop lists of such servers categorized by their custom key words entered into the WebHogs Search Box. Obviously, Maj. Hog is a good friend of law enforcement, anti-terror task forces and so forth!

Q: Is it true that Maj. Hog works completely in the nude?

A: That is not true. There was one time when Maj. Hog obtained life-saving intelligence by stripping naked and going unarmed to a terrorist camp and simply pretending to sleep next to the tent of the terrorist leader. Upon return with valuable intelligence, he swore to never work naked again. In the office, Maj. Hog prefers his military uniform showing the decorations for his many exploits.

Thanks for reading! Hope this helps. Warm regards, Doc

Copyright 2002 Global Services
