From simplified Nepali typing... to an OS
In the Himalayan country of Nepal, a large section of the population is deprived of the use of computers because of the language barrier: English is the language in which computers communicate. One institution there, an archive and library, faced challenges in cataloguing its books and ran into hurdles with ‘sort’ and ‘find and replace’ requirements. It undertook a font standardisation project, which grew far beyond what was expected. An interesting story by Bal Krishna Bal.
Somebody rightly said, “Necessity is the mother of invention”. Some four years ago, Madan Puraskar Pustakalaya (MPP) felt the need to catalogue its collection of books electronically, which was not possible with the existing Nepali fonts such as Preeti and Kanchan. Had it not been for that need, it is doubtful that MPP, a principal archiving house, would ever have become involved in developing software in Nepali.
Besides lacking data processing facilities like “Sorting” and “Find and Replace”, the existing Nepali fonts also lacked uniformity in the keyboard mapping of Nepali characters, which made Nepali typing difficult for the general public. The need for fonts that enabled data processing for Nepali, with a correspondingly simplified keyboard mapping, was clearly felt.
It was against this backdrop that Madan Puraskar Pustakalaya undertook the Font Standardization Project in March 2002, with assistance from the Ministry of Science and Technology and the United Nations Development Programme (UNDP). The Project led to the introduction of Unicode in Nepal. Unicode is an encoding scheme that assigns a unique code to every character of the standard writing scripts of the world. Under the Project, Unicode-compatible fonts like Kalimati, Kanjirowa, Thakwa Robinson were developed, along with two keyboard drivers, namely the Nepali Unicode Keyboard Romanized and the Nepali Unicode Keyboard Traditional.
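To make the idea of a unique code per character concrete, here is a minimal sketch in Python. It is only an illustration and not the software developed under the Project: the function names and sample words are made up for this example. It lists the Unicode code points behind a Nepali word, then shows the kind of “Sorting” and “Find and Replace” operations that become straightforward once text is stored as Unicode.

```python
import unicodedata


def code_points(text):
    """Return (code point, official Unicode name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def sort_words(words):
    """Sort words by raw code point order.

    Assumption: this is only a rough stand-in for Nepali dictionary order;
    proper ordering needs language-specific collation rules.
    """
    return sorted(words)


def find_and_replace(text, old, new):
    """Replace every occurrence of `old` with `new` in Unicode text."""
    return text.replace(old, new)


if __name__ == "__main__":
    # Each character of the word has its own code in the Devanagari block.
    for cp, name in code_points("नेपाल"):
        print(cp, name)

    # Illustrative sample data, not taken from the MPP catalogue.
    print(sort_words(["पुस्तकालय", "नेपाल", "काठमाडौं"]))
    print(find_and_replace("नेपाली फन्ट", "फन्ट", "युनिकोड"))
```

Sorting by raw code point happens to give a sensible order for these three words, but it should not be taken as correct Nepali collation in general.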
With the development of the keyboard drivers (software), Nepali typing has become much simpler to learn, and the Unicode-compatible fonts have made data processing possible for the Nepali language. This satisfied MPP’s immediate need, and its involvement in software development could have ended there. But it was just the beginning.
Because a large part of the Nepali population is deprived of the use of computers by the language barrier, English being the language in which computers communicate, MPP then set itself the objective of developing an operating system and localized software applications in Nepali. It therefore undertook the 30-month PAN Localization Project, starting in January 2004 and ending in June 2006. The Project is supported by the International Development Research Centre (IDRC), Canada, and administered by the National University of Computer and Emerging Sciences (NUCES), Lahore, Pakistan.
The chief objective of the Project is to increase computing capacity in the local languages of South and South East Asia. The multinational localization project is being run simultaneously in seven countries, viz. Afghanistan, Bangladesh, Bhutan, Cambodia, Laos, Nepal and Sri Lanka. As one of its outcomes, MPP, as the Nepal Component, released the localized operating system NepaLinux 1.0 on the 22nd of December 2005.
NepaLinux 1.0 is a Debian and Morphix based GNU/Linux distribution focused on desktop use for Nepali language computing. Apart from the operating system in Nepali, the CD package comprises the localized GNOME desktop environment, the OpenOffice.org suite and the Mozilla suite.
Other utilities available on the CD include the Nepali Spell Checker, the Thesaurus and Nepali Unicode support. In the extended phase of the PAN Localization Project, which runs till March 2007, MPP is planning to release NepaLinux 1.1 and then NepaLinux 2.0, and to develop a prototype of a Grammar Checker for Nepali. Besides, the current Nepali Spell Checker and Thesaurus in NepaLinux 1.0 will also be enhanced and made more robust.
With the operating system in Nepali developed, MPP is aiming to focus more on advanced language processing and mobile computing applications in future. MPP’s work has increasingly attracted interest and support at both the national and international levels. In 2005, MPP, in partnership with the Central Department of Linguistics, Tribhuvan University, undertook the Project Bhasha Sanchar, which is supported by the Asia IT & C Program of the European Commission and led by the Open University, UK.
The other project partners include the European Language Resources Association (ELRA), France; Goteborg University, Sweden; Lancaster University, UK; Limerick University, Ireland; and OutsideEcho, UK. The chief objectives of the Project are to serve the ICT needs of local communities and citizens, and to provide an input into sustainable development, by developing and deploying software technologies that work in Nepali.
As for the Project activities, work is underway on a digitized Text Corpus and a transcribed Spoken Corpus in the Nepali language, which are believed to be important resources for advanced natural language processing applications for Nepali. Similarly, a digitized contemporary Nepali dictionary is being developed, with the text corpus as its base resource. We are also developing a Nepali text-to-speech (TTS) system. Currently the TTS team here at MPP is working in close association with TTS experts from India, Europe and the US.
Asia, and especially South East Asia, lags behind in Information and Communication Technologies largely because it lacks basic infrastructure and because information on the technologies is not available in local languages that people understand.
Machine translation software is believed to offer immediate help in this direction. Keeping this in mind, MPP, as a partner organization, and the Computer Science and Engineering Department, Kathmandu University, jointly undertook the Dobhase English to Nepali Machine Translation Project, an 18-month Project supported by the PAN ICT R&D. Dobhase is a web-based machine translation project scheduled to be released in July 2006. It is expected to be of much help in providing gist translations to people who do not have much knowledge of English.
The PAN Localization Project, which is the major software localization project at MPP, has been extended up to January 2010. During the extended tenure, we aim to work on the development of a Nepali OCR system. Besides, the major work of deploying NepaLinux and other localized software will be carried out in this phase. We will also focus on end-user training, seminars and awareness campaigns to promote the use of Free Open Source Software (FOSS) applications.
Since 2002, MPP has made a giant leap in the field of software development. Today, its image among the general public is no longer just that of an archive house for printed materials in Nepali, but also that of an established institution that has been working continuously in the field of software development. The MPP project team started with three members and today comprises 41, which is living evidence of the scale of work going on at MPP.
The expertise developed over time, the research that has been conducted or is under way, and the strong team that has been formed have surely laid the groundwork for establishing MPP as a research center for Nepali language processing. For now, we are very optimistic about this and believe that it could be a major milestone for our country and for the field of software localization.
The writer is the Project Manager of the PAN Localization Project at Madan Puraskar Pustakalaya, Patan Dhoka, Nepal.