IBM Develops Full-text Digitization System for National Diet Library of Japan

Posted In: Information Tech

By IBM

Monday, August 1, 2011



Tokyo, Japan - 01 Aug 2011: IBM (NYSE: IBM) today announced that it is helping the National Diet Library of Japan -- the country's only national library -- digitize its literary artifacts on a massive scale to make them widely available and searchable online by all information seekers.

The prototype technology, created by IBM Research, speeds up full-text digitization of Japanese literature in two ways: it recognizes a wide range of Japanese characters, and it lets users collaboratively review and correct characters, script and structure. Additionally, the full-text digitization system is designed to promote future international collaboration and standardization among libraries around the world.

“Nearly two decades ago in his book Digital Library, Dr. Makoto Nagao, the director of the National Diet Library, shared his vision that digitized and structured electronic books will dramatically change the role of libraries and the way knowledge will be shared and reused in our society,” said Dr. Hironobu Takagi, who led the development of the prototype technology at IBM Research – Tokyo. “Until now, the breadth of the characters and expressions within the Japanese language had posed a series of challenges to massive digitization. In order to enable this transfer of knowledge from print to online, we realized the need for both machine and human intelligence to understand information in every form.”

Compared to languages that rely on just a few dozen alphabetic characters, Japanese is extremely diverse in terms of script. Besides the two syllabaries, hiragana and katakana, it includes about 10,000 kanji characters (including old characters, variants and the 2,136 commonly used characters). Texts also contain ruby (small syllabary characters printed right next to a kanji as a reading aid) and can mix vertical and horizontal layouts.
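To make the script mix concrete, here is a minimal Python sketch that sorts the characters of a string into hiragana, katakana and kanji by Unicode block. It is an illustration only and not part of the IBM system; the function names are hypothetical, and the ranges cover just the basic Hiragana, Katakana and CJK Unified Ideographs blocks (plus Extension A), so rarer variants fall into "other".

```python
from collections import Counter


def classify_char(ch: str) -> str:
    """Return a coarse script label for one character (illustrative helper)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:  # Unicode Hiragana block
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:  # Unicode Katakana block
        return "katakana"
    # CJK Unified Ideographs and Extension A; other extensions are ignored here
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:
        return "kanji"
    return "other"


def script_profile(text: str) -> Counter:
    """Count characters per script label, skipping whitespace."""
    return Counter(classify_char(ch) for ch in text if not ch.isspace())


if __name__ == "__main__":
    # "Digitization of the National Diet Library" mixes all three scripts.
    print(script_profile("国立国会図書館のデジタル化"))
```

Even a short phrase such as this one combines kanji, hiragana and katakana, which suggests why a recognizer built for a small alphabet does not carry over directly.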

Aside from ensuring quality recognition of Japanese characters, IBM researchers aimed to reduce the time needed to review and verify the accuracy of the digitized texts. By introducing collaborative tools based on crowdsourcing, the technology allows many users to quickly pore over the texts and make corrections at a much higher rate of productivity and efficiency.

“Through collaboration technology and user tools, we now have the potential to populate a global collection of literature and information,” said Dr. Takagi. “From small community libraries to national institutions, people everywhere can leverage this standardized system to help preserve and share their cultural works for years to come.”

The architecture of the full-text digitization prototype system provides the following two key collaborative features:

The full-text digitization prototype system was built on two streams of technology. IBM researchers in Tokyo applied an innovative approach called Social Accessibility, which allows large groups of reviewers to work collaboratively via Web browsers regardless of location. Also, the COoperative eNgine for Correction of ExtRacted Text (CONCERT) technology -- developed by IBM researchers in Haifa, Israel -- was leveraged to significantly improve productivity through the repetition of simple operations.
