Introduction to IT Projects (Electronic Dictionary Research: EDR)


Keywords Used For WWW Search:
Electronic dictionary, Machine tractable dictionary, General-purpose dictionary, Large-scale dictionary, Natural language processing, Knowledge information processing, Linguistic data, Word dictionary, Bilingual dictionary, Concept dictionary, Co-occurrence dictionary, Corpus, Text base, Headword, Part of speech, Syntactic tree, Conjugation, Surface case, Idiom, Synonymy, Concept, Concept relation, Concept hierarchy, Co-occurrence relation, Semantic frame, Frequency, Example sentence

Project Name:
Experimental research towards the development of an electronic dictionary for natural language processing

Country of Origin and Current Location:
Japan Electronic Dictionary Research Institute, Ltd. (EDR)
Project Coordinator:

Brief overall project summary:
The aim of this project (the EDR project) is to research and develop a large-scale, advanced machine tractable dictionary (an electronic dictionary), which is indispensable for establishing next-generation natural language processing and knowledge information processing technology. The EDR Electronic Dictionary is therefore designed as a new, genuinely machine tractable dictionary that incorporates all the information computers need to understand and generate natural language, in forms suited to computer processing. It is intended to be a universal, general-purpose computer dictionary that is not tied to any specific application system.

The Dictionary is organized into two kinds of dictionaries: one that handles surface-level information related to grammatical characteristics, and another that handles deep-level information associated with meaning. Surface-level information, which depends heavily on the language, is kept in the Word Dictionary; conceptual information, which is in principle independent of the language, is kept in the Concept Dictionary as an independent knowledge base. In addition, the Dictionary is developed under a policy of describing, on the basis of a large amount of text, only the information carried by words and the concepts they represent, and of excluding information that depends on specific grammatical rules or algorithms. The characteristics of the EDR Electronic Dictionary can be summarized as follows.

  1. A large-scale dictionary covering the vocabulary used in general text.
  2. A general-purpose dictionary that does not depend on specific applications or algorithms.
  3. A dictionary furnished with the knowledge base necessary for genuine semantic processing.
  4. A highly objective dictionary based on a large volume of text.
  5. A basic dictionary that can be readily extended to other languages, fields and so on.
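
The separation between language-dependent word information and language-independent concept information described above can be pictured with a small data model. The following is only a minimal sketch under stated assumptions: the class names, field names, concept identifiers and glosses are all hypothetical and do not reproduce the actual EDR record format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WordEntry:
    """Surface-level, language-dependent information (hypothetical fields)."""
    headword: str
    language: str          # e.g. "ja" or "en"
    part_of_speech: str
    concept_ids: tuple     # links into the concept dictionary


@dataclass
class ConceptDictionary:
    """Deep-level, language-independent information (hypothetical layout)."""
    descriptions: dict = field(default_factory=dict)  # concept id -> gloss
    parents: dict = field(default_factory=dict)       # concept id -> parent id

    def ancestors(self, concept_id):
        """Walk the concept hierarchy upward from concept_id."""
        chain = []
        current = self.parents.get(concept_id)
        while current is not None:
            chain.append(current)
            current = self.parents.get(current)
        return chain


# Made-up identifiers and glosses, for illustration only.
concepts = ConceptDictionary(
    descriptions={"c001": "animal", "c002": "dog (domestic animal)"},
    parents={"c002": "c001"},
)

word_dictionary = [
    WordEntry("犬", "ja", "noun", ("c002",)),
    WordEntry("dog", "en", "noun", ("c002",)),
]


def concepts_for(headword, language):
    """Return the concept ids attached to a headword in one language."""
    return [
        cid
        for entry in word_dictionary
        if entry.headword == headword and entry.language == language
        for cid in entry.concept_ids
    ]
```

In this sketch the Japanese and English entries carry their own grammatical information but point at the same concept identifier, so anything stored about the concept (such as its position in the hierarchy) is written once and shared across languages.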



Figure 3. Relations among Major Dictionaries
Project Objectives:
The purpose of the EDR project is to develop a dictionary designed for language processing by computers. The background includes the particular circumstances of the Japanese language, together with the following factors: the expectation that natural language processing (machine translation in particular) will help overcome language barriers; the demand for support systems that advance language technology (documentation technology, language tutoring technology and others) in Japan; and the fact that natural language processing is coming to occupy the core of information processing technology.

As information processing technology advances from knowledge information processing toward artificial intelligence, natural language processing can be regarded as a basic, common technology rather than merely an application technology, and it is attracting demand from various fields. The project aims to meet this demand, and its results are expected to serve as a basic technology for various advanced application systems in the future.

Expected/Actual Results:
The results of the project are made available externally as the EDR Electronic Dictionary on CD-ROM. The details are listed below.

Table 1. The EDR Electronic Dictionary on CD-ROM

| Name of the Dictionary | Contents | CD-ROM No. |
|---|---|---|
| Japanese Word Dictionary | General Vocabulary (250,000 words) | JWD-V015 |
| English Word Dictionary | General Vocabulary (190,000 words) | EWD-V015 |
| Concept Dictionary | Concepts for General Vocabulary Dictionary (400,000 concepts) in the forms of the Concept Classification Dictionary and the Concept Description Dictionary | CPD-V015 |
| Japanese-English Bilingual Dictionary | General Vocabulary (230,000 words) | JEB-V015 |
| English-Japanese Bilingual Dictionary | General Vocabulary (160,000 words) | EJB-V015 |
| Japanese Co-occurrence Dictionary | General Vocabulary (900,000 phrases), including an assisting dictionary of co-occurrence patterns for Japanese verbs. Supplement: Japanese Corpus (220,000 sentences) | JCC-V015, JCO-V015E, JCO-V015S |
| English Co-occurrence Dictionary | General Vocabulary (460,000 phrases). Supplement: English Corpus (160,000 sentences) | ECC-V015, ECO-V015E, ECO-V015S |
| Technical Terms Dictionary (Information Processing) | Japanese Word (120,000 words), English Word (80,000 words). Concept Classification, Bilingual, and Co-occurrence Dictionaries | TED-V015 |




Target group:

Partners/Actors in the Project
Japan Key Technology Center
Fujitsu Limited
NEC Corporation
Hitachi, Ltd.
Sharp Corporation
Toshiba Corporation
Oki Electric Industry Co., Ltd.
Mitsubishi Electric Corporation
Matsushita Electric Industrial Co., Ltd.

Use of Information/Communication Technologies:
As already stated in the paragraph on the research and development system, the project adopted a distributed research system and built a network so that the research could proceed effectively and smoothly. This network enabled us to exchange research information daily.

Benefits to the Information Society:
Computers understand "language," communicate in it, translate it and speak it. The EDR Electronic Dictionary is the heart of the information systems that deal with such "language." Dictionaries are a vital part even of word processors and machine translation systems. To draw out the potential of a dictionary, grammatical rules and programs for syntactic analysis and syntactic generation are also needed; when the dictionary is used together with these, a variety of applications such as the following can be achieved.

  1. Intelligent word processors: Improve the accuracy of kana-kanji conversion.
  2. Next-generation machine translation: Allow semantic processing at the concept level.
  3. Intelligent information retrieval: Identify the necessary information through inference from the given information.
  4. Document summarization: Summarize the essential points based on an understanding of the document's contents.
  5. Language tutoring CAI: Review and correct grammatical errors, such as errors in the use of prepositions and case postpositions.
  6. Software CAD: Understand specification description languages.
  7. Expert systems: Acquire knowledge from text.
  8. Speech dialogue systems: Understand the content of speech, resolving its ambiguity, and generate spoken responses.
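
As one illustration of the machine translation item above, concept-level processing can be pictured as routing a word in one language through shared concept identifiers to words in another. This is a rough sketch only, not the EDR method: the tables, the concept identifiers and the function name are invented for illustration, and a real system would also need the grammatical rules and analysis programs mentioned earlier to choose among candidates.

```python
# Hypothetical word-to-concept tables; identifiers and entries are invented.
JA_TO_CONCEPTS = {
    "銀行": ["c_bank_finance"],
    "土手": ["c_bank_river"],
}
CONCEPT_TO_EN = {
    "c_bank_finance": ["bank"],
    "c_bank_river": ["bank", "embankment"],
}


def translate_via_concepts(japanese_word):
    """Map a Japanese headword to English candidates through concept ids."""
    candidates = []
    for concept_id in JA_TO_CONCEPTS.get(japanese_word, []):
        for english_word in CONCEPT_TO_EN.get(concept_id, []):
            # Keep first-seen order and drop duplicates across concepts.
            if english_word not in candidates:
                candidates.append(english_word)
    return candidates
```

Because the two Japanese words map to different concepts, the English word "bank" is reached by two distinct routes, which is the kind of distinction that word-to-word lookup alone cannot express.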

In vocabulary size and lexical knowledge, the EDR Electronic Dictionary has reached the highest level in the world. It is expected to be used to improve conventional natural language processing systems and also to be employed in many new areas in the coming knowledge information society. Beyond these applications, the Electronic Dictionary, as a major component of natural language processing technology, is expected to contribute to the development and advancement of the technology itself, and it has become an extremely valuable object of study for researchers.

Expected Cost:
14,305 million yen (actual results).

Date Information was Collected:
November 8, 1995

Date of Last Update:
October 23, 1998

Information provided by:
Japan Electronic Dictionary Research Institute, Ltd. (EDR)
  • Name: Sayoshi Sakai
  • Position: Senior Vice President
  • Mailing address: Daini-Abe Bldg., 78-1, Kanda-sakumagashi, Chiyoda-ku, Tokyo 101-0026
  • Phone: +81-3-3851-5521
  • Fax: +81-3-3851-5840
  • E-mail : thoth@edr.co.jp

Other information :
For the latest news on the project please go to
URL: http://www.iijnet.or.jp/edr/J_index.html

