
Filtering Online Information
by Guest Blogger, 3/12/2002
Information and communications technologies have helped nonprofits better develop, gather, disseminate, and share information with their constituencies, other organizations, the public, and lawmakers. These tools have helped organizations and staff monitor and respond to pressing public policy issues, access and analyze data, and process and distribute increasingly voluminous amounts of e-mail messages, web pages, and other content in a timely manner. The timeliness of information, however, is only one piece of its value to organizations.
Other considerations include the currency, relevancy, accuracy, accessibility, and durability (or shelf life) of information. These considerations exist alongside other factors, including an organization's capacity to process and coordinate the information it receives and generates, the types of information with which it works, how it handles the task of data management, and to what ends or uses it puts the information it processes.
Filtering is one method employed by nonprofits to better manage the information flowing into and out of their organizations. Filters are also one of the most controversial elements of Internet technology. At best, the filtering tools help users reduce the amount of irrelevant content among the many millions of Web pages, discussion list postings, and e-mail messages available. At worst, filters can potentially block too much content, thereby limiting access to potentially useful, interesting, and informative material. In general, there are at least three types of filtering tools commonly employed today.
First, there are e-mail programs or add-on software that allow users (or system administrators) to block unwanted or offensive messages based on defined sender addresses and/or selected words and phrases in the subject or body of a message. These types of tools sometimes allow messages to be automatically categorized depending upon certain characteristics, such as the priority attached to a message. Hotmail, Yahoo, and a number of web-based free e-mail services have this capability.
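To make the mechanics concrete, here is a minimal sketch in Python (with hypothetical addresses and keywords, not drawn from any particular product) of how such a rule-based message filter might sort incoming mail into blocked, flagged, and inbox piles.

```python
# A minimal sketch of rule-based e-mail filtering: block by sender address,
# flag by keywords in the subject or body, and route the rest to the inbox.
# The addresses and keywords below are hypothetical examples.

BLOCKED_SENDERS = {"spam@example.com", "offers@bulkmail.example"}
FLAGGED_KEYWORDS = {"free money", "adults only", "act now"}

def classify_message(sender, subject, body):
    """Return 'blocked', 'flagged', or 'inbox' for a single message."""
    if sender.lower() in BLOCKED_SENDERS:
        return "blocked"
    text = (subject + " " + body).lower()
    if any(keyword in text for keyword in FLAGGED_KEYWORDS):
        return "flagged"
    return "inbox"

# Example: sort a small batch of messages.
messages = [
    ("spam@example.com", "Act now", "Limited offer..."),
    ("director@nonprofit.example", "Board meeting", "Agenda attached."),
]
for sender, subject, body in messages:
    print(sender, "->", classify_message(sender, subject, body))
```

Real e-mail clients and add-ons layer priority categories and per-user rules on top of this basic pattern, but the core decision is the same lookup against sender and keyword lists.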
Next, there are the filtering software tools that work alongside web browsers, e-mail clients, and other online tools to block out unwanted points of interest such as chat rooms, adult online services, and bulletin boards. Usually these programs come with a pre-defined set of URLs and Internet services considered questionable for younger audiences, and they either refuse users access without a password or block access completely.
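As a rough sketch of that core mechanism, the Python fragment below (with an invented blocklist and placeholder password) checks a requested URL against a pre-defined list and allows an override only with the administrator password; commercial products wrap far more around this basic check.

```python
# Sketch of a URL blocklist check with an optional password override.
# The blocklist entries and password are hypothetical placeholders.
from urllib.parse import urlparse

BLOCKED_HOSTS = {"chat.example.com", "adult-site.example"}
ADMIN_PASSWORD = "change-me"  # placeholder only

def allow_access(url, password=None):
    """Allow the request unless its host is blocklisted and no valid override is given."""
    host = urlparse(url).hostname or ""
    if host in BLOCKED_HOSTS:
        return password == ADMIN_PASSWORD
    return True

print(allow_access("http://chat.example.com/room1"))                # False: blocked
print(allow_access("http://chat.example.com/room1", "change-me"))   # True: password override
print(allow_access("http://library.example.org/catalog"))           # True: not on the list
```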
Last, the major commercial search engines and web portals, those online guideposts responsible for directing a large portion of Web users to website content, also use unique filtering methods to help users narrow down the range of content available. In general, search engines use these filtering rules or instructions to match users' queries against words or word combinations stored in proprietary databases.
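In greatly simplified terms, that matching step resembles an index lookup with an optional filter applied to the results. The Python sketch below uses an invented miniature index and an "adult" tag purely to illustrate the principle; it does not reflect any engine's actual ranking or filtering rules.

```python
# Toy illustration of query matching against a word index, with an optional
# "safe search" setting that drops pages tagged as adult content.
# The index contents and tags are invented for illustration.

INDEX = {
    "filtering": [{"url": "http://npo.example/filters", "adult": False},
                  {"url": "http://adult.example/page", "adult": True}],
    "policy":    [{"url": "http://npo.example/policy", "adult": False}],
}

def search(query, safe=True):
    results = []
    for word in query.lower().split():
        for page in INDEX.get(word, []):
            if safe and page["adult"]:
                continue  # the filter silently drops flagged pages
            results.append(page["url"])
    return results

print(search("filtering policy", safe=True))
print(search("filtering policy", safe=False))
```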
Filters have raised many legal headaches in the past four years, since they became an important tool for blocking content offensive to younger Internet users in schools and libraries. Things started getting especially problematic at the point where those efforts unfairly put constraints on both free expression and adults' access to content online. The argument is that the same control and protection parents should exercise over their children at home should extend to other places where children could potentially be exposed to harm and exploitation. The difficulty arises when those other places involve (a) the potential for Internet access by adults, and (b) the question of how sites are determined to be potentially harmful or detrimental to younger audiences. If you are a parent, it is easy to argue on behalf of voluntarily weeding out content for children and the family. If you manage a nonprofit or corporate organization, you can also argue the necessity of making certain that appropriate, even crucial, information is accessible to your employees or affiliates via e-mail or the Web.
But just try to create a filter that blocks what you want without blocking harmless references. One runs the risk of limiting the range of information accessible via the Internet, yet it ultimately becomes difficult to monitor each bit of information added daily without broadly excluding entire swaths of information. Active viewing, one might argue, lies at the heart of the Web, and as such, you don't know what's out there to offend or inform until you (or someone you trust) can find it.
An interesting argument made in favor of filtering is that it is akin to the judgment exercised by libraries in choosing not to shelve certain materials. The counterargument, however, is that librarians are aware of the things to which they choose not to provide access. Applying this on a broader scale can raise serious issues. This year alone, there has been legislative activity to mandate that filtering technology be made available through Internet service providers for voluntary use by subscribers. There have also been efforts to mandate their use on library and school computers. How effective are these tools, though, in screening out the "bad" from the "useful"? Who can say that the useful does not, from time to time, make reference to "the bad stuff"?
Search Engine Filters
On a purely unscientific level, we at NPTalk decided to sample some of the online filtering tools employed by a few of the largest, most popular search engines. Each of the filtering features is turned off and on by use of cookies that log individual user preferences. In some instances, filters can be turned on with respect to certain types of web pages or specific keywords, and can also be set as password-protected preferences. As our test search query, we used the offensive language cited in the 1978 Supreme Court decision, Federal Communications Commission v. Pacifica [438 U.S. 726 (1978)]. This was the case that centered on the broadcast of George Carlin's (in)famous "Seven Words You Can Never Say on Television" monologue. Needless to say, decency forbids us from repeating them here. We used the actual words themselves to see, both with the filters on and off, whether the words would point to content relevant to the Supreme Court decision (as opposed to pornography-related sites) in some way, via the top ten results for each engine.
First we checked out Google. Google uses a tool called SafeSearch that users can elect to turn on or off before a search is conducted. With SafeSearch on, Google delivered some 983 results for the seven dirty words, with none of the top ten relevant to the actual Supreme Court decision. Without the filter, we received some 66 results, with about three relevant to the decision.
Next, we used AltaVista. AltaVista has a feature called AV Family Filter that not only allows users to turn on a filter before a search, but also lets them password-protect it, so that anyone using the same machine cannot turn the filter off later without knowing the password. With the filter on, we received 16 results, none of the top ten relevant. With the filter off, we received 1,056,321 results, with six relevant results in the top ten.
Last, we tried Fast Search & Transfer (FAST). With the filter on, we received 347 results, not one of the top ten results relevant to the decision. With the filter off, we received 12,594,545 results with one relevant result among the top ten. We should note that this is the only one of the three that had a statement to the effect that the filter may not be 100% accurate, and that inoffensive content may very well be weeded out.
To be fair, each of these online tools is designed with a different set of filtering rules and is only as good as the sites and content that human catalogers and reviewers can evaluate. The makers of filtering software do tout the fact that human reviewers literally pore over all sites flagged by their search tools to ensure that groups are not weeded out unfairly, but there is always room for error, given the sheer volume of e-mail and Web pages produced, as well as the number of newsgroups and chat rooms active at any time. For the most part, however, users don't know what specific words or sites are blocked by the filters. That's because this information is treated as privileged corporate information, vital to the successful operation and distinctiveness of the individual filtering tools.
As a side note, consider the case of This.com. CNET's Courtney Macavinta detailed in July 1999 the story of this "child-friendly" Internet access service that inadvertently provided easy access to pornographic content online. Unlike online services and portals that provide an optional filtering service, This.com positioned itself as an Internet Service Provider with built-in filtering and review of all Web sites for roughly US$20 a month. It turns out that the engine behind This.com came courtesy of the GoTo.com search engine, which let users search for web pages and returned the results inside a frame embedded within the overall This.com site. While this could be convenient for a number of users, it seems that GoTo.com also let users search for pornographic content, which could then be framed within a window on the This.com site. This small oversight was compounded, according to Macavinta, by the fact that the service had board members such as former Reagan Administration education secretary William Bennett, former Christian Coalition head Ralph Reed, and Rabbi Abraham Cooper of the Simon Wiesenthal Center. To be fair, visitors were presented with a pop-up window warning that they would not receive the benefits of filtering unless This.com was their ISP. What Macavinta points out, however, is that GoTo.com operates by giving priority placement in its search results to companies willing to pay a premium. Add to this the fact that online pornography is the biggest source of query results and online advertising for search engines in general, and you might have some problems ahead.
Off-the-Shelf Solutions
Some off-the-shelf filtering software, such as CyberPatrol, uses special search agents (or robots) that continually monitor the Web in order to return sites that might link to offensive information. These sites are then reviewed by living, breathing human beings, who categorize content that is deemed offensive. Users can elect to block categories of information, and in many cases can add entries of their own. They will not know, however, what specifically is being blocked and why, because the lists of offensive websites and Internet sources form the core of online filtering tools and are treated as copyrighted content and trade secrets by the companies. This can prove somewhat problematic, especially if your organization's content is getting blocked.
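That category-based approach might be sketched roughly as follows in Python, with invented sites and category names standing in for whatever the vendors actually use: human reviewers assign each flagged site a category, and a user's settings determine which categories get blocked.

```python
# Rough sketch of category-based blocking: a crawler flags candidate sites,
# human reviewers assign each one a category, and users choose which
# categories to block. Sites and category names are invented.

REVIEWED_SITES = {
    "gambling.example": "gambling",
    "chat.example":     "chat",
    "news.example":     "news",
}

def is_blocked(host, blocked_categories):
    category = REVIEWED_SITES.get(host)      # category assigned by a human reviewer
    return category in blocked_categories

user_blocks = {"gambling", "chat"}           # categories this user chose to block
for host in ("gambling.example", "news.example", "unknown.example"):
    print(host, "blocked" if is_blocked(host, user_blocks) else "allowed")
```

Note that the entire decision hinges on the contents of the reviewed-sites list, which is exactly the part users never get to see.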
In the Fall of 1999, the Center for Media Education released a study, "Youth Access to Alcohol and Tobacco Web Marketing: The Filtering and Rating Debate" that examined whether some of the most popular screening and filtering software blocked youth from online alcohol and tobacco marketing sites. The report states that filtering software and online ratings services are not doing a good job of protecting children from alcohol and tobacco ads on the Internet. Of the six common off-the-shelf filtering software programs available, only one was able to effectively filter over half of the ads, and one did not filter out any test ads at all. The report recommends that software developers come up with a better system for systematically identifying offending websites. Interestingly, it also recommends that developers actually make their proprietary selection and filtering criteria open for review.
If all this were not bad enough news for filtering software manufacturers, there was another flap during the spring of 2000 regarding the alleged ideological bias of popular filtering tools. Flaws in America Online's family-oriented filters were identified in CNET News.com pieces this past spring and summer. The “Kids Only” setting allowed content from the Republican National Committee and Libertarian Party, but not from the Democratic National Committee, Green Party, or Reform Party. AOL's “Young Teens” filter did not block gun advertisements or content from the National Rifle Association, but did block content from the Coalition to Stop Gun Violence and the Million Mom March. None of the blocked sites featured any content resembling depictions of pornography. It also turned out that if an offending site were stored in the AOL browser's cache, it could still be accessed, either through the browser's list of past URLs or through the plain-text “*.arl” files that store the list of recently visited websites. The browser flaws were reportedly fixed in July. By sheer coincidence, the filtering rules for AOL's service were developed by The Learning Company, a unit of Mattel Interactive, which also has responsibility for CyberPatrol.
The same charges of bias were leveled against other popular filtering software this summer. It seems that, when turned on, the tools allowed people to view content critical of homosexuality on the websites of conservative nonprofits like Concerned Women for America, the Family Research Council, and Focus on the Family. When the same content was placed on the websites of individuals, however, according to Bennett Haselton's Peacefire site, the filtering tools blocked those websites completely, labeling them as “hate speech.” Interestingly, conservative groups have long advocated filtering tools as a means of curbing minors' access to offensive material.
The problem is that inoffensive content from nonprofit groups, hobby and recreational sites, journalism and media organizations, and educational institutions, as well as material in other languages, is often blocked by the filters. Such content will not show up in search engine results, or is improperly identified as an offensive content source, regardless of its relevancy, instructional value, or ideological bent. CyberPatrol blocked, for example, every student organization at Carnegie Mellon University, Usenet discussion groups relating to journalism, and a random smattering of other inoffensive content, according to Declan McCullagh's report in the 3/16/2000 edition of Wired Magazine.
Filtering the Offensive or Offensive Filtering
Of course, anything that's privileged corporate information is ripe for attack by hackers. For at least four years, Bennett Haselton has developed algorithms to crack the encrypted lists of blocked sites crucial to the operation of filtering tools. Haselton has posted evaluations of filtering tools (or "censorware," as he dubs them), along with the lists of Internet sources blocked by each product. One tool, Symantec's iGear, used by a large number of public schools in New York City, was a visible target earlier this year. Haselton posted a list of the sites blocked by iGear on his site and distributed a link to it via the Web and newsgroups. The company had its lawyers send a letter to Haselton's Internet service provider, stating that the list and the links to the tools infringed on the company's copyrights. Again, the rationale behind the inclusion of sites on the list was not provided, but the range of organizations and companies included created an uproar among some nonprofits and respected companies.
Whenever those lists run the risk of exposure, the companies are poised to take legal action against any violators, increasingly under the Digital Millennium Copyright Act (DMCA). This 1998 federal law makes illegal any technology ("cracking devices") that can break the codes behind copyright-protection measures, and it protects online content in a variety of formats. The DMCA does allow cracking and the "reverse engineering" of encryption technologies to aid in the testing of security systems and in general encryption research.
CyberPatrol was at the center of a landmark filtering dispute early in 2000. As mentioned above, CyberPatrol belongs to toymaker giant Mattel, under a unit called Microsystems. Two programmers, in Canada and Sweden, developed a program that exposed the list of some 100,000 websites and other Internet-related information sources considered off-limits by the CyberPatrol tool. The program at issue, CPHACK, was a reverse engineering of the CyberPatrol encryption algorithm. Links to the tool and its source code were posted on the programmers' websites. Claiming that the tool would make the filtering software worthless in blocking offensive content, Microsystems filed a lawsuit in U.S. District Court in March 2000 against the programmers and the Internet service providers that hosted their sites. As is usual in these types of cases, widespread distribution of CPHACK and the creation of numerous mirror sites took place as soon as the suit was announced, and a restraining order was issued. A no-cash settlement was eventually reached with the programmers. One of the mirror sites, set up through Peacefire.org, was ordered by a federal judge to remove the CPHACK content or face the threat of prosecution (meaning fines and jail time) under the DMCA.
CyberPatrol, coincidentally, is also the engine behind the Anti-Defamation League's (ADL) HateFilter service. ADL monitors and reports on anti-Semitic and racist activity around the world. HateFilter is a tool developed to filter "hate speech" on the Internet. HateFilter is downloaded onto your computer and uses a constantly updated list of reviewed sites to check for offensive content. If a user attempts to access a page deemed offensive, a page appears with the ADL logo, noting that access to the page is restricted. The logo is linked to the ADL's website, where users can get more information on categories of hate speech ranging from white supremacy to general Internet hate. Filtering criteria can be adjusted by parents and system administrators, and access to offensive sites can also be blocked based on content, the time of day access is attempted, and specific sets of users. ADL HateFilter can be downloaded for a free trial period of seven days. The software costs US$29.95, including three months of free filter updates, with an annual charge of US$29.95 for further updates. The page of categories is also available without the HateFilter service.

If the lists of offensive sites that power filters are not publicly available, organizations can be singled out for the use of a word or phrase, often out of context, or for sharing information that an outside entity does not like, often without the organizations being aware of it. This is less of a problem if you are filtering for a select group of users, like staff or a family.
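As a sketch of what such adjustable criteria might look like in code, the Python fragment below combines a content category, a time-of-day window, and a user group into a single allow/deny decision; the categories, schedule, and groups are hypothetical and do not reflect HateFilter's actual internals.

```python
# Hypothetical sketch of adjustable filtering criteria combining content
# category, time of day, and user group. Invented for illustration only.
from datetime import time

POLICY = {
    "kids":  {"blocked_categories": {"hate", "adult"}, "allowed_hours": (time(7), time(21))},
    "staff": {"blocked_categories": {"hate"},          "allowed_hours": (time(0), time(23, 59))},
}

def allowed(user_group, category, now):
    """Return True if this user group may view this category at this time."""
    rules = POLICY[user_group]
    start, end = rules["allowed_hours"]
    if not (start <= now <= end):
        return False                      # outside this group's permitted hours
    return category not in rules["blocked_categories"]

print(allowed("kids", "hate", time(15)))   # False: blocked category
print(allowed("kids", "news", time(22)))   # False: outside permitted hours
print(allowed("staff", "news", time(22)))  # True
```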
Say, however, that you operate a portal site or a national organization whose website is the starting point for a large number of users. One frequent criticism from groups that advocate for sexual education in the schools, for example, is that sexual and pornographic content is not consistently differentiated. Words like "sex" and "sexual" may trigger filters, causing links to pages that advocate abstinence to be blocked. The flip side is that the filters may block the pages of those who oppose sex education as well.
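A naive keyword rule makes the problem easy to see. In the Python toy below (with invented page text), a filter keyed on the word "sex" blocks an abstinence-education page just as readily as the material it was meant to catch.

```python
# Toy demonstration of keyword overblocking: a rule that blocks any page
# containing the word "sex" also blocks sex-education and abstinence pages.
# The page texts are invented examples.

BLOCKED_WORDS = {"sex"}

def blocked(page_text):
    words = page_text.lower().split()
    return any(word.strip(".,") in BLOCKED_WORDS for word in words)

pages = {
    "abstinence-ed": "Talking to teens about delaying sex.",
    "gardening":     "Planting tomatoes in early spring.",
}
for name, text in pages.items():
    print(name, "->", "blocked" if blocked(text) else "allowed")
```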
With respect to ADL's HateFilter, charges have been raised that it does not draw a distinction between the opinions, however offensive, that individuals may hold and share across the Internet and those efforts which have the most potential for harm. Shawn L. Twing, in his review of HateFilter in the Jan/Feb 1999 edition of the Washington Report on Middle East Affairs, notes that while speech considered racist by the ADL was blocked, content that could be considered threatening and malicious toward Arabs and Palestinians was not. (The ADL does state that two of the more controversial pro-Zionist sites, the Jewish Defense League and Kahane.org, are blocked.) HateFilter, interestingly enough, does not appear to be the first filter produced by a non-commercial entity. The Church of Scientology, according to a December 1995 Wired magazine article, apparently developed filtering tools to prevent its members from accessing content critical of the church through Usenet discussion newsgroups.
The Law on Filtering
As for filtering in the legislative arena, the biggest attempts at the federal level have involved:
- efforts to mandate that schools and libraries receiving federal funds install filtering software on computers to prevent children's access to obscene or harmful material online; and
- efforts, such as the recently overturned Child Online Protection Act (COPA), to restrict minors' access to material deemed harmful on commercial websites.
On June 22, 2000, the 3rd U.S. Circuit Court of Appeals in Philadelphia, Pennsylvania, in upholding a lower court ruling against COPA, said that the law was so broad that it would also wind up applying to websites that were not pornographic. The three-judge panel's decision devotes a lengthy discussion to one of the trickiest aspects of COPA, namely that the U.S. Supreme Court's definition of "community standards" is not easily transferable to the Web and online content, mostly because the same content, whether allowable in one community and not in another, is potentially still available to anyone online. Interestingly, according to Wired Magazine's Declan McCullagh, writing in a 6/22/00 article, the problem for the judges lay in trying to convert "physical community standards into virtual ones."
COPA is, in essence, a second attempt by Congress to pass a federal law that would shield children from harmful online content. The first, known as the Communications Decency Act (CDA), was struck down by the U.S. Supreme Court in 1997. The case addressed a portion of the CDA that prohibited "indecent" or "patently offensive" content directed toward minors online, even though adults had a First Amendment right to the same content, based on "community standards" and other tests. The Court, in a 7-2 decision, found that Internet speech ranks among the speech afforded the highest First Amendment protection, that Congress' ends did not justify the restrictive means employed, and that community standards did not hold up in the wake of the dissolved geographic boundaries of the Internet. More telling, the Court rejected the notion that the CDA could override the authority of parents to choose what their children should or should not have access to, and the notion that the producers of objectionable content should face numerous confusing (and sometimes conflicting) mechanisms in distributing their content.
On May 22, 2000, the U.S. Supreme Court issued its decision in United States v. Playboy Entertainment Group. At issue was whether Section 505 of the 1996 Telecommunications Act violated the First Amendment by restricting (but not banning) adult-oriented content to "safe harbor" (or late) hours (10:00 p.m. to 6:00 a.m.) if the cable operators carrying said content could not fully scramble access to it. If a channel were successfully scrambled, there would be only normal static on screen. The problem was that cable operators sometimes had "signal bleed" creeping through on their channels (for you cable subscribers out there, this is when you can't see anything but fuzziness or blurred images, but can hear all of the sound on channels to which you are not subscribed). In a very close ruling (5 to 4) in favor of Playboy, the Court held not only that Congress could neither limit content to certain hours nor dictate the methods through which content would be blocked, but also that any attempt to address content harmful to minors needed to employ the "least restrictive means" test. This is a standard employed by the Court in First Amendment cases: when the government, in performing functions it is allowed to perform, has a range of options that can effectively achieve the stated end, it must select the option that least interferes with the rights of expression under the First Amendment. In this instance, Congress was held to have a legitimate interest in blocking children's access to pornography, but not to the extent that it blocked Playboy from exercising its First Amendment right to show it. Since individual subscribers could simply ask their cable operators to block the channel, that least restrictive means would be another way of accomplishing the same goal. Regardless of the importance of the material, the Court held, the First Amendment still applies, especially when the government cannot demonstrate firm evidence showing that the content in question produces specific harm to minors.
