Mediapolis, inc


April 12, 2001

OUT ON A LIMB
Napster's Song-Filtering System Likely to Be Deemed Inadequate

By LEWIS PERDUE
Special to WSJ.COM

Napster Inc. says it is doing its best to filter out pirated songs on its file-sharing service, but a court-ordered technical review may discover what a top search-engine expert and engineer calls a "lame attempt" to prevent copyright violations.

The expert, Alan Emtage, is credited with building the Internet's first search engine, Archie, in 1989. He says that the file-based filtering system developed by Napster is doomed to failure. "It's not in their interest to filter things effectively," says Mr. Emtage, who is chief technology officer of Web engineering company Mediapolis.

U.S. District Court Judge Marilyn Hall Patel on Tuesday called Napster's efforts to date "disgraceful" and suggested that the company may deserve to be shut down, depending on what the court's technical expert, A.J. Nichols, finds.

Judge Patel expressed outrage that every one of the 212 songs listed in the original complaint is still available through Napster (www.napster.com), as are more than 5,000 of the 6,000 specific songs the recording industry asked to have blocked.

For its part, Napster has issued a series of news releases saying that it "will work with any technology that fits within the parameters of the court's order." The company has also said that "effective blocking is an ongoing and iterative process that we take very seriously." Napster declined to comment for this column.

Interestingly, though, while Napster has been very good at helping users find often-pirated songs on the service, the company has done a poor job of using filters to keep those tracks from being listed in its online catalog in the first place. For example, the Napster search engine lets users find music by entering either an artist name or a song title (www.napster.com/help/win/gettingstarted/2). But in order to filter songs, the company says, its system requires that both the artist and the title be matched up -- and spelled correctly -- or else the material doesn't get filtered.
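The gap between the two matching rules can be shown with a minimal sketch. It is not Napster's code; the blocklist contents and the function names pair_filter_blocks and either_field_blocks are assumptions made up for illustration. The first function blocks only on an exact artist/title pair, as the company describes its filter; the second blocks when either field matches, which mirrors how the search side works.

```python
# Hypothetical blocklist of (artist, title) pairs; not real Napster data.
BLOCKED_PAIRS = {("metallica", "enter sandman")}


def pair_filter_blocks(artist: str, title: str) -> bool:
    """Block only when BOTH fields exactly match a listed pair."""
    return (artist.lower(), title.lower()) in BLOCKED_PAIRS


def either_field_blocks(artist: str, title: str) -> bool:
    """Block when EITHER the artist or the title matches a listed entry."""
    a, t = artist.lower(), title.lower()
    return any(a == blocked_artist or t == blocked_title
               for blocked_artist, blocked_title in BLOCKED_PAIRS)


# A one-letter misspelling of the artist defeats the pair rule
# but not the either-field rule.
print(pair_filter_blocks("Metalica", "Enter Sandman"))
print(either_field_blocks("Metalica", "Enter Sandman"))
```

The looser rule has a cost: it would also block an unrelated song that happens to share a title with a listed track, so a real filter would need some way to handle such collisions.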

Napster also says its efforts to root out pirated songs have been limited by accidental and intentional misspellings or alternate spellings of song titles and artists. For example, Metallica -- which has long fought against Napster -- has a track called "Anthem." A search on the Napster system for anthem.mp3 turns up no results. But a search for either "metallica and anthem" (see screenshot) or just "Metallica" (see screenshot) pulls up a list of that group's songs.

In addition, typing in "Metalica" (missing one "l") gives yet another list of songs from a group that has repeatedly said it wants its songs off Napster. This illustrates that the service allows users to find titles even if the user misspells the name. This feature is built into Napster's own search engine, yet Napster hasn't put a similar capability to work in filtering copyrighted material.
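Misspelling-tolerant matching of this kind can be approximated with Python's standard difflib module. The sketch below is an illustration only: the blocklist, the 0.85 similarity cutoff, and the function name is_blocked_artist are assumptions, and nothing here reflects how Napster's search engine is actually built.

```python
import difflib

# Hypothetical list of artists to block; not real Napster data.
BLOCKED_ARTISTS = ["metallica", "jimi hendrix"]


def is_blocked_artist(name: str, cutoff: float = 0.85) -> bool:
    """Return True if `name` is close enough to any blocked artist.

    difflib.get_close_matches scores candidates with a similarity
    ratio between 0 and 1; the cutoff is an assumed tuning value.
    """
    matches = difflib.get_close_matches(
        name.lower(), BLOCKED_ARTISTS, n=1, cutoff=cutoff
    )
    return bool(matches)


print(is_blocked_artist("Metalica"))   # one "l" missing
print(is_blocked_artist("Madonna"))    # unrelated name
```

A lower cutoff catches more misspellings but raises the chance of blocking artists whose names merely look similar.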

Indeed, as the Metallica search shows, it is not even necessary to misspell an artist's name to get lists of prohibited materials. This runs counter to Napster's Q&A page, which describes its efforts to filter prohibited materials.

That page offers the following:

"Q: Has Napster tried to improve its filtering capabilities?

A: Since the March 27 filing, Napster has added a third filter to its system. Whereas the first and second filters prevent files with a specific word combination from being indexed when a user logs on, the third filter blocks searches for certain artist/title pairs. For example, searches for the exact artist/title pair 'Metallica' and 'Enter Sandman,' or any previously identified variant, now return no results."

But a search for "Sandman" (see screenshot) found the Metallica cut on the first try.

As a result, many of the copyrighted songs and musical acts the industry has pegged for blocking are still readily available on the site -- and remain easy to find.

Napster also asserts that its efforts to block pirated songs have been limited by accidental and intentional misspellings or alternate spellings of song titles and artists.

But Mr. Emtage says that a skillful programmer could easily write algorithms that would filter both obvious and not-so-obvious variations. "You could easily filter for adding a dash, an asterisk or other character in the name or for variations on spelling."
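The kind of filtering Mr. Emtage describes can be sketched as a normalization step applied before any blocklist lookup. This is a minimal illustration under stated assumptions, not his algorithm or Napster's: the normalize and is_blocked names and the blocklist entry are hypothetical, and treating every non-letter character as a separator is a simplification that would mishandle titles containing meaningful digits.

```python
import re

# Hypothetical blocklist, stored in normalized form.
BLOCKED_KEYS = {"metallica enter sandman"}


def normalize(filename: str) -> str:
    """Reduce a file name to lowercase words separated by single spaces."""
    name = filename.lower()
    name = re.sub(r"\.mp3$", "", name)     # drop the file extension
    name = re.sub(r"[^a-z]+", " ", name)   # dashes, asterisks, digits -> space
    return " ".join(name.split())          # collapse repeated spaces


def is_blocked(filename: str) -> bool:
    """Check the normalized name against the normalized blocklist."""
    return normalize(filename) in BLOCKED_KEYS


print(normalize("Metallica--Enter*Sandman.mp3"))
print(is_blocked("1Metallica_Enter-Sandman.mp3"))
```

Normalizing both the blocklist and the incoming file names to the same form means one blocklist entry covers many decorated variants, without a separate database entry for each.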

In its compliance Q&A, Napster claims it has done this: "Napster has improved its filtering technology to account for common filename variations. For instance, numbers in front of or following a word ('1Jimi 1Hendrix - 1Purple 1Haze') are now treated as separate words, so that the keyword filter will properly identify and block the file name."

Still, a search for the properly spelled name "Hendrix" easily locates scores of prohibited cuts (see screenshot).

It's not necessary, Mr. Emtage agrees, to acquire large databases of file-name variations as Napster has done. That much can be accomplished by making search-engine algorithms smarter in some very obvious ways. But even that, Mr. Emtage says, likely won't be the long-term solution. He says text-based filtering is a "brain dead strategy" because file names can so easily be changed, but adds that "Napster hasn't tried very hard" to make even that tactic work. "Perhaps they have grudgingly managed to comply with the letter of the court's order, but certainly not with its spirit."

Significantly, Napster doesn't necessarily need the file names to filter songs. It has long used a fingerprint of each file -- a digital checksum known as an MD5 hash -- to keep track of duplicate files on its system and to help users find songs for completing interrupted downloads. Napster hasn't provided filtering based on checksums, even though it already uses them to locate songs for download. (A checksum is a short value computed from a file's contents; identical files produce identical checksums.)
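Checksum-based filtering of the sort described above can be sketched with Python's standard hashlib module. The blocklist and the helper names md5_of, block and is_blocked are assumptions for illustration; the sample bytes stand in for real audio data, and nothing here reflects Napster's actual implementation.

```python
import hashlib

# Hypothetical set of MD5 fingerprints for files that must be blocked.
BLOCKED_HASHES = set()


def md5_of(data: bytes) -> str:
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(data).hexdigest()


def block(data: bytes) -> None:
    """Register a file's fingerprint on the blocklist."""
    BLOCKED_HASHES.add(md5_of(data))


def is_blocked(data: bytes) -> bool:
    """Check a file's contents, not its name, against the blocklist."""
    return md5_of(data) in BLOCKED_HASHES


song = b"stand-in bytes for an audio file"
block(song)

print(is_blocked(song))             # same contents under any file name
print(is_blocked(song + b"\x00"))   # one extra byte changes the hash
```

Because the hash is computed over the contents, renaming a file has no effect on the match. The second check shows the method's limit: any change to the bytes yields a different fingerprint.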

A number of digital-rights protection companies such as Seattle's Cantametrix and San Jose's BayTSP counter that even this method can be circumvented because it's possible to alter a sound file in a way that changes its digital checksum. However, BayTSP and Cantametrix said their systems' technology would still detect prohibited files. (To be sure, many digital-rights management firms would dearly like to add Napster as a client, given its current troubles.)

Napster's statement that it is doing all it can doesn't sit well with some observers, including Mark Ishikawa, chief executive of BayTSP. He says he offered to license his firm's technology to Napster last November, but never got any response to numerous calls and e-mails. Cantametrix didn't return calls.

Matthew Oppenheim, senior vice president for legal affairs of the music trade group Recording Industry Association of America, says he was aware of "numerous" attempts by companies such as Cantametrix to offer Napster filtering technology as long ago as May 2000.

In business about 10 months, BayTSP says it has identified and forced the removal of more than 105,000 copyright violations -- mostly images -- resulting, it says, in a 99.8% compliance rate.
