| KMTool A global community for knowledge management professionals |
ABOUT KMTOOL |
HOME |
COMMUNITIES |
CONTACT |
TAXONOMY
What Is It?
Topics On The Page: Automated Classification Tools - Commercial Products; Standards: MARC, Dublin Core, XML; Manual Classification
Related Topic: Search Engines
AUTOMATED CLASSIFICATION TOOLS - COMMERCIAL PRODUCTS

Aeneid developed an XML vocabulary and Document Type Definitions that allow information providers to integrate Internet content with proprietary data. As part of the platform, Aeneid develops business information catalogs for four vertical industries: high-tech, finance, legal/tax/accounting and healthcare. The Aeneid Aggregation Platform provides an industry-specific catalog of rich information sources, access to a business-specific search cluster crawled at the frequency and depth required by business information consumers, and a toolkit for navigating information and displaying search results. Corporate customer: Gale Group.

Autonomy automates the categorization, tagging, hyperlinking and personalization of large volumes of unstructured content for new media publishers and large corporate enterprises. It uses pattern-matching algorithms, contextual analysis, and concept extraction to automate categorization. Autonomy claims their tool can "extract a documents digital essence and determine the characteristics that give the text meaning." The tool creates "concept agents" that analyze new data and classify them according to the dynamic rules that they "learn". Autonomy describes an easy-to-navigate visual search interface, the Visualizer module, which shows a unified view of disparate sources, but does not explain whether this is a hierarchical view or a classification tree structure. The Knowledge Explorer Module does an excellent job of showing "islands" of knowledge by concept, so the user can navigate through concepts until she drills down to the content she is looking for.

IBM Intelligent Text Miner (to be reviewed)

Infoseek Content Classification Engine (to be reviewed)

Knowledge Discovery Concept Server uses filtering and query building tools for better information retrieval.
It prompts the user with contextually related terms drawn from sample content, and the user clicks on them to build the queries most likely to succeed. The system offers a simple alternative to expensive and tedious manual categorization and cataloging of documents, and it provides an efficient means to organize content across many collections, in effect creating "virtual categories". You can try it for free at greatsearches.com or create categories for your research or business needs on a subscription basis. You may also license the software for use on your own corporate website or intranet.

Opentext (to be reviewed)

Plumtree Server (to be reviewed)

Semio Taxonomy, released this year (1999), automatically categorizes structured and unstructured data to create what Semio calls "browsable directories" of intranets or web portals. Users can browse or search the directories to find the document they are looking for. Semio claims it is a truly automated process without human intervention. In fact, the owner sets the top level of the taxonomy, and the remaining categories are created from the source text and matched to the 2000 or so lower level categories that Semio provides in its Semio Topic Library. A beta continues through mid-June 1999 if you want to try this product out.

STANDARDS: MARC, DUBLIN CORE, XML

MARC (Machine Readable Cataloging) - Originally developed at the U.S. Library of Congress. "The USMARC Format for Bibliographic Data is designed to be a carrier for bibliographic information about printed and manuscript textual materials, computer files, maps, music, serials, visual materials, and mixed materials. Bibliographic data commonly includes titles, names, subjects, notes, publication data, and information about the physical description of an item." [Source]

Dublin Core: Began in 1995 when a group of
academics, librarians, computer scientists and others came together to create a
core set of elements that could be applied to a resource, e.g. subject, author,
title, publisher, date, source and object type. Today there are a total
of 15: title, author or creator, subject and keywords
(controlled vocabularies are expected and encouraged here), description,
publisher, other contributor, date, resource type (see Berkeley for current discussion regarding
types), format (i.e. software required to view it), resource identifier (e.g.
URL, ISBN), source (e.g. a PDF might have an ISBN), language, relation (i.e. a way to
show whether other images or documents are related to it), coverage (spatial or
temporal aspects), and rights management (e.g. copyright).

W3C (World Wide Web Consortium): An international organization of 250 companies, government agencies and universities that sets the technical blueprints for many of the Internet's advances. Its work on RDF (Resource Description Framework) focuses on the need for metadata that is easily understood by machines. (Harvest metadata from pages, then perhaps have people augment it?) RDF will be helpful in resource discovery, to provide better search engine capabilities; in cataloging, for describing the content and content relationships available at a particular Web site, page, or digital library; for intelligent software agents, to facilitate knowledge sharing and exchange; in content rating; in describing collections of pages that represent a single logical "document"; for describing intellectual property rights of Web pages; and for expressing the privacy preferences of a user as well as the privacy policies of a Web site. [Source] For example, an RDF schema designed for indexing books might provide for "title" and "author" properties, while a schema for e-commerce might provide a "payment type" property that can be set to "credit card" or "invoice." [Source]

XML: A site sponsored by Seybold Publications and O'Reilly & Associates, Inc. The mission of the site is to help you discover XML and learn how this new Internet technology can solve real-world problems in information management and electronic commerce.

MANUAL CLASSIFICATION PROCESS - Human Efforts - and Shared Thesauri

The American Society of Indexers: This association provides a list of thesauri online. Many of these are browse or search only and cannot be downloaded.

Information Resource Center: The Aslib thesaurus collection is the largest open access collection of thesauri in the UK and covers a wide range of subjects, from Aeronautics to Zoology. The thesauri are not available electronically from this site, but publisher information is provided.
Netscape Open Directory Project: Recognizing that the cost of paying staff to index the web manually is very high, this project was created to recruit a pool of volunteers, each of whom takes responsibility for a specific area and maintains that section of this taxonomy of the web. The software uses RDF. Also of interest, the project lets you download the hierarchy and use it under a free use license. Anyone may submit a site to a volunteer editor for inclusion, and anyone may become an editor as well. The site makes its point with the slogan "Humans Do It Better."

Web Thesaurus Compendium: Barbara Lutes of Darmstadt University has a nice site with links to electronically available thesauri, broken out alphabetically as well as by subject area. She notes which of them may be downloaded as files for use. This site is worth consulting before you build your own subject area taxonomy, to see whether you can re-use some of the hard work already done in your field.
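
To make the Dublin Core and RDF entries under STANDARDS more concrete, here is a minimal Python sketch of a metadata record restricted to the 15 Dublin Core elements, flattened into (resource, property, value) statements of the kind RDF descriptions are built from. The function names, the short element keys, the sample values and the example.org address are all assumptions made for illustration; none of them come from the tools or sites reviewed on this page.

```python
# Short keys for the 15 Dublin Core elements listed in the text
# (e.g. "creator" for "author or creator", "rights" for "rights management").
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_record(**fields):
    """Build a record, rejecting any key that is not one of the 15 elements."""
    unknown = sorted(set(fields) - set(DUBLIN_CORE_ELEMENTS))
    if unknown:
        raise ValueError("not a Dublin Core element: " + ", ".join(unknown))
    return dict(fields)


def to_statements(resource, record):
    """Flatten a record into (resource, property, value) statements,
    emitted in the fixed element order defined above."""
    return [
        (resource, name, record[name])
        for name in DUBLIN_CORE_ELEMENTS
        if name in record
    ]


# Hypothetical sample values, for illustration only.
record = make_record(
    title="Taxonomy Tools Overview",
    creator="Example Author",
    language="en",
)

for statement in to_statements("http://example.org/taxonomy", record):
    print(statement)
```

Keeping the element names in one fixed tuple means a mistyped or non-standard field is rejected when the record is created, which is the kind of consistency a shared element set is meant to give cataloguers and search tools.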