Databases and Portals for Knowledge Management
Soraya Abad-Mota
Universidad Simón Bolívar, Departamento de Computación y Tecnología de la Información
Caracas, Venezuela
tel: (58-2)906-3266, fax: (58-2)906-3232, abadmota@usb.ve
Abstract: The current availability, quantity and variability of Information Technology (IT) is overwhelming. The Internet and the powerful set of tools loosely called "the web" have dramatically changed the way people use this technology. We can take advantage of these new ways to try to capture an organization's knowledge and manage it effectively. Two key elements of IT that serve this purpose are databases and portals. In this chapter we explore the database and portal technologies, their interaction, and their potential for knowledge management. We also discuss briefly how databases and portals provide a good development opportunity for Latin America.
Introduction
The number of online sources that discuss knowledge management is overwhelming. Martin White (2000) says: "Putting the term knowledge management into any of the search engines is rather depressing, as the number of hits is invariably well in excess of 100,000." A common agreement among IT professionals is that the knowledge of an organization is concentrated in its documents and in its people. Within this abundant subject lies the need to define knowledge and knowledge management. We will use the practical definitions found in Barclay & Murray (2000) because of their convenience for our purposes. According to them there are two kinds of knowledge: "explicit knowledge (sometimes referred to as formal knowledge), which can be articulated in language and transmitted among individuals", and "tacit knowledge (also, informal knowledge), personal knowledge rooted in individual experience and involving personal belief, perspective, and values."
Barclay & Murray (2000) also provide a good definition of knowledge management.
Knowledge management is a business activity with two primary aspects:
"Treating the knowledge component of business activities as an explicit concern of business reflected in strategy, policy, and practice at all levels of the organization."
"Making a direct connection between an organization's intellectual assets - both explicit [recorded] and tacit [personal know-how] - and positive business results."
Implementations of knowledge management "may range from technology-driven methods of accessing, controlling, and delivering information to massive efforts to change corporate culture." "Knowledge and information have become the medium in which business problems occur. As a result, managing knowledge represents the primary opportunity for achieving substantial savings, significant improvements in human performance, and competitive advantage."
It is also worth quoting what they had to say about the technological environment in which knowledge workers perform today.
"The nature of business has changed in at least two important ways:
Knowledge work is fundamentally different in character from physical labor.
The knowledge worker is almost completely immersed in a computing environment. This new reality dramatically alters the methods by which we must manage, learn, represent knowledge, interact, solve problems, and act.
...the computerized business environment provides opportunities and new methods for representing 'knowledge' and leveraging its value."
Technology has reached a point where it can ease the management of an organization's knowledge. In this chapter we explore two fundamental technologies that can contribute significantly to this management: portals and databases.
A portal is an overloaded term used to describe, among other things, a bundle of services provided electronically, through the web, to a set of users. A portal is a powerful notion. It allows the integration of many functions within a single interface. Sarah Roberts-Witt (1999) provides a good list of what can be included in a portal: "under this portal umbrella sits business intelligence, content and document management, enterprise resource planning systems, data warehouses, data-management applications, search and retrieval and any other application". Roberts-Witt also adds, "the ultimate portal would also provide the Holy Grail in terms of organizational knowledge - true data aggregation and information integration coupled with knowledge worker collaboration."
The notion of dynamically acquiring content and making it available on the web is not new and has become an obvious goal for any web site. But the level to which this goal can be attained is not a simple matter. The technology which plays a central role in this purpose is database technology. Databases are created to structure and describe large quantities of data and to provide a mechanism for querying these data. The main difficulty of building databases lies precisely in structuring the data, which usually involves a complex process of designing the structures, implementing them using a database management system (DBMS), loading the data, and writing the applications which will query and update the data. Research in databases has extended the scope to include data coming from varied sources, also called multimedia data, which in particular includes documents. On the other hand, a web browser provides a very powerful and universal interface which allows easy access to data regardless of its geographical location, as long as it is made available on the web. As such, a web browser is a very appropriate interface for a database.
Databases and portals are two key elements of IT with the potential to provide appropriate access, in a dynamic and organized manner, to the vast amounts of online data that exist in an organization today. In this chapter we explore the relationships between portals and databases and how the interaction between them can be an effective vehicle for knowledge management. In section 2, we define the notion of a database and review the process of making it available to users. Section 3 covers the notion of a portal, some of its different flavors, and the tools and options for building one. In section 4 we present our conclusions and our vision of the future of portals, especially as a development opportunity for Latin America.
Databases and their applications
A well-accepted definition of a database is a collection of interrelated data stored in a computer system. This definition is too general, but it can be made more specific when one considers that the data has some meaning for a specific organization or enterprise, that there will be a diverse group of users accessing the data, and that the database is built to satisfy a particular set of information needs within the organization.
In today's database systems the data and the definition of the data, also called metadata (as described in Chapters 4 & 5), coexist to provide high-level access to the users of these data while assuring that the data is well kept and well secured. An application is a software system which accesses the database to satisfy some users' information needs. Application programs and queries contain data access commands, which are handled by a very complex piece of software, the database management system (DBMS).
The process of building a database system involves the design and implementation of the database using a DBMS and the design and implementation of the applications that will use the database. The main activity of database design is to model the data in an abstract representation. The model is then implemented in the DBMS to be used. The representation of a specific database in the data model used by a DBMS is called the database schema. The schema holds the definition of the data and is written in the data definition language (DDL) of the DBMS used. The schema is compiled to generate the empty structures where the actual data will be loaded. Embedded in the schema are some instructions for enforcing integrity constraints. These constraints guarantee that the data is correct with respect to the domain. Additional constraints can be specified in several manners and must also be enforced.
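As a minimal sketch of these ideas, the following Python fragment uses the standard sqlite3 module to compile a small schema written in SQL DDL and to show the DBMS rejecting data that violates the declared integrity constraints. The tables, columns, and the helper try_insert are hypothetical and chosen only for illustration; any relational DBMS could play the same role.

```python
import sqlite3

# Hypothetical schema for illustration: table and column names are assumptions.
# The constraints (NOT NULL, UNIQUE, CHECK, REFERENCES) are part of the schema.
SCHEMA = """
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL NOT NULL CHECK (salary >= 0),
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
"""

def create_database():
    """Compile the schema into empty structures, ready for loading."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
    conn.executescript(SCHEMA)
    return conn

def try_insert(conn, sql, params):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn = create_database()
INSERT_EMP = "INSERT INTO employee (name, salary, dept_id) VALUES (?, ?, ?)"

ok_dept = try_insert(conn, "INSERT INTO department (dept_id, name) VALUES (?, ?)", (1, "Research"))
ok_emp = try_insert(conn, INSERT_EMP, ("Ana", 1200.0, 1))
bad_salary = try_insert(conn, INSERT_EMP, ("Luis", -5.0, 1))  # violates the CHECK constraint
bad_dept = try_insert(conn, INSERT_EMP, ("Eva", 900.0, 99))   # refers to a missing department
```

In this sketch the application never validates the salary or the department itself; the rules live in the schema and the DBMS enforces them for every program that loads or updates the data.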
After the data is loaded, the database is ready to be used. The users access the database through applications or directly by issuing commands in the now standard database language SQL (Structured Query Language). This way, the data and the programs that use it are kept separate. The DBMS allows access to the data and enforces controls to keep it correct and secure. The applications can then concentrate on specifying which data they need for their processing and not worry about how the data is obtained.
The advantages of following the database approach are manifold. This approach allows data independence, i.e. changes to the physical structures where the data is stored do not affect the operative applications; it increases data sharing; it facilitates the establishment of standards; and it reduces programming time by leaving most of the data validation to the DBMS.
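Data independence can be illustrated with a small sketch, again using Python's sqlite3 module. The table, the index name, and the function employees_in are assumptions made for the example. The application states which data it needs in SQL; a later change to the physical structures (here, adding an index) leaves the application code and its results untouched.

```python
import sqlite3

def employees_in(conn, dept):
    """Application code: says WHICH data it needs, not how the DBMS finds it."""
    rows = conn.execute(
        "SELECT name FROM employee WHERE dept = ? ORDER BY name", (dept,)
    )
    return [name for (name,) in rows]

# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT NOT NULL, dept TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ana", "Research"), ("Luis", "Sales"), ("Eva", "Research")],
)
conn.commit()

before = employees_in(conn, "Research")

# A physical change: the DBMS may now use an index to answer the query.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

after = employees_in(conn, "Research")  # same call, same answer
```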
In 40 years of database technology history and after the successful implementation of the relational model, there have not been many changes in the way the data is structured or in the data models used. During the 1960s and 1970s, the hierarchical and network models were prevalent. In the 1980s, the relational model became competitive and started taking over the market. Currently, there has been some penetration of the object-oriented model, although it still holds a very small portion of the market. But there have been dramatic changes and rapid evolution in the interfaces to a database and in the heterogeneity and needs of the applications which use it. Web browser interfaces are very common today and very convenient. They are platform independent and allow simple and uniform access to databases. With the evolution of intranets and web browsers it is natural to integrate all the data and services of a business through a web site. We talk about this integration in the next section.
In the early days, a database was built for a single application. Today's technology and practices exploit the database approach by building several applications around one, normally large and complex, database. The nature of these applications has also changed: what used to be a single application might be part of a more complex and sophisticated application today. For example, a human resources system includes recruitment, promotions, payroll, training and development, and termination. Each of these components could have been an application before and is now part of a large system. Additionally, with an integrated view of an enterprise, all the applications must interact and their data must be combined and analyzed to provide consolidated information to the strategic levels of the organization.
The database contains the data and the rules which define data correctness. The applications implement the dynamics of how the data is updated and used to fulfill information and knowledge requirements.
Portals and their development options
A portal is essentially an integration of services and content in a web site. It is the next evolutionary step in the use of web browsers. In 1994 it was Mosaic, then came Netscape, and now every computer has some web browser. A browser, according to the Special Libraries Association (2000), is "software that displays web pages and helps facilitate a user interacting with the information on the web pages." But with the tremendous growth in the number of web pages available, a web browser is not enough. We need a tool that works at a higher level of abstraction, where the user is not required to specify too many details to find the information that she needs, or where, given the profile of the user, the system can "guess" the user's needs or even correct a typo in the user's specification, just as a word processor does when the typist misspells "much" as "mcuh". In short, users need more functionality and intelligence in their browsers.
Even though there is consensus on the general definition of a portal, the major confusion begins when defining specific kinds of portals or when trying to build a taxonomy of portals. In addition, authors have been very creative in the introduction of new terms for every particular type of portal.
We want to focus on Enterprise Portals, also known as enterprise information portals (EIP), intranet portals, corporate portals, and cortals. According to the Special Libraries Association (2000) enterprise portals "represent current evolution of corporate intranet and function as a starting point for employees; EIPs tie together multiple, heterogeneous internal repositories and applications as well as external content sources and services into a single browser-based view that is individualized to a particular user's task or role; can deliver more relevant content in context than a broad (internet) portal." Roberts-Witt wrote in 1999: "The idea of a corporate portal is less than a year old and already companies are feeling a sense of urgency in getting them going."
The Delphi Group conducted a survey in February 1999 of 300 corporate managers to find out what they thought a portal could do for their companies. Roberts-Witt analyzes the top three responses to this survey; we reproduce excerpts from this analysis below. According to the survey, the top three responses were:
"Sharing information and work methods, which seems to speak directly to the knowledge management notion of making tacit knowledge explicit."
"Business process support, or workflow, indicating that companies see a huge upside to exchanging electronic files rather than moving hard copy from desk to desk in the business process."
"Customer service, mirroring the growing business interest in managing customer relationships."
Roberts-Witt (1999) also clusters corporate portals into three general classes. These classes are: data, information and collaborative portals. We use this classification because it is adequate for our approach. Below are the definitions of her portal types:
Data Portals "deal primarily with structured data ... that tend to populate corporate databases." The main products in this category try to make "reports widely available to users who need them." Data portals allow dynamic and low-overhead reporting capabilities.
Information Portals deal "more directly with unstructured data, such as email, text and other documents. The products typically have indexing and cataloging capabilities, as well as some robust search and retrieval functionality. In essence, they organize and deliver information."
Collaborative Portals "focussed on tying more group interactive functionality."
There is a wide range of options for a portal, from the simplest form, defined by McCallum et al. (2000) as "an information gateway that often includes a search engine plus additional organization and content", to more sophisticated forms. Sophisticated examples include Yahoo and Altavista (horizontal portals) and high-level university campus portals such as those described in Eisler (2000) (vertical portals). The services provided in a portal vary widely with its purpose, but some that are frequently found are: member registration, personalization, a search engine, email and discussion boards, and organization and indexing of content from internal and/or external sources. Normally, the users of a portal have to register in it and provide a name and password each time they use it. This allows the system to personalize the services and contents for the specific user. The portal constitutes a single point of entry and a single logon to the services provided. For a thorough coverage of the subject of personalization see the special issue of the Communications of the ACM (Riecken, 2000).
The task of developing a portal is a large and complex one. There are several alternatives for building a portal. We list some of these alternatives here.
In-house development. An organization can choose to build its portal by hiring a development team and acquiring the right tools. There is a myriad of tools to choose from, but none of them is complete enough to provide all the functionality desired in a portal. The main distinction is between proprietary tools and open source software. The proprietary tools for building portals are, in general, expensive, and it is not uncommon to find software bundled with consulting services in this area. Some examples of tools and services of this kind are: Microsoft Site Server, Vignette and Broadvision (see Ante, 2000), Viador E-Portal Framework, Iona Technologies iPortal Suite, and Hummingbird Enterprise Information Portal (see Whiting, 2000, for the last three). Hummingbird started offering free copies of its EIP Development Edition. There are many separate tools that are open source software, but very few provide an integrated solution. One of these few is MasonHQ (www.masonhq.com), which uses Perl and Apache.
Outsourcing. The whole effort of constructing the portal might be delegated to an outside company. This is a reasonable option if the organization that needs the portal does not have a development team. But in any case, after the portal is developed and becomes operative, a maintenance team should be in place so that the portal is available 24 hours a day, 7 days a week. This situation leads to the third alternative.
Portal in a box. This alternative is available today from many vendors. It provides a basic skeleton for the portal with a fixed set of services from which the user selects the ones she likes. The basic skeleton can be customized to some level, but this option is very limited and not very flexible. Many of the vendors in this category offer this option free of charge, but the downside is that usually a significant amount of advertising is pushed through the portal. Another characteristic of this alternative is that the registration data of the users of the portal and some of the content is hosted at the vendor's site, and the "owner" of the portal has no control over the use and maintenance of these data. In fact, the portal is really owned by two parties: the organization that wants the portal and the vendor that offers the portal-in-a-box solution.
Automatic construction. There are some attempts to build a portal, or at least its main content, automatically. The most recent example of this is the application described by McCallum et al. (2000) for automatically building an information portal, specifically in the context of scientific and technology libraries. It uses sophisticated tools based on techniques from artificial intelligence to try to automate mostly the content acquisition aspect. This type of portal falls in the category of information portals described in the classification presented earlier in this section.
Some major issues which the builder of a portal needs to be aware of are: permanent availability, up to date content of the portal, intellectual property of the content of the portal (Thurow, 1999), security, privacy, different user levels with access to different functionality, and how to apply the security and privacy constraints at each level.
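One of these issues, different user levels with access to different functionality, can be sketched in a few lines of Python. The level names, the service names, and the table REQUIRED_LEVEL are assumptions made for illustration; a real portal would typically keep this mapping in its database and combine it with its security and privacy policies.

```python
from enum import IntEnum

class Level(IntEnum):
    """Hypothetical user levels, ordered from least to most privileged."""
    GUEST = 0
    MEMBER = 1
    ADMIN = 2

# Hypothetical mapping from each portal service to the minimum level it requires.
REQUIRED_LEVEL = {
    "search": Level.GUEST,
    "discussion_board": Level.MEMBER,
    "edit_content": Level.ADMIN,
}

def can_use(level, service):
    """A user may use a service only if it is known and the level is high enough."""
    required = REQUIRED_LEVEL.get(service)
    return required is not None and level >= required

def allowed_services(level):
    """List, in alphabetical order, the services available at a given level."""
    return sorted(s for s in REQUIRED_LEVEL if can_use(level, s))
```

With this table, allowed_services(Level.MEMBER) contains the search and discussion board services but not content editing, and an unknown service is denied at every level.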
A portal could be designed from scratch to provide services and data that are specifically constructed for it. Or it can be conceived as a platform for integrating existing systems and data sources. In the latter case, some major issues in the portal construction complicate the task considerably. These issues are heterogeneity and interoperability, or more generally, system integration. But on the other hand, the effort of building a portal is a facilitator of system integration. For a recent review on system integration in some major applications, see Hasselbring (2000).
How can a portal and a database work together? The first use of a database within a portal is to maintain the user profiles and to support the registration and authentication services. But portals and databases can interact beyond this simple use. A database can be used as the main "content provider" for the portal. Furthermore, other techniques that add value to the data can be used jointly with the database to provide additional services. Some examples of these other techniques are: data mining, text mining, and information extraction. Database technology is at the core of keeping the content of a portal correct and up to date. In turn, the portal provides a convenient interface to the database. It constitutes the modern way of building a database application: a web-based application that can be integrated with other services and data sources.
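The two uses of a database described above, supporting registration and authentication and acting as content provider, can be sketched as follows with Python's standard sqlite3, hashlib, hmac and os modules. The tables user_profile and content, the single interest column, and all function names are hypothetical; the sketch leaves out the web layer, sessions, and everything else a production portal would need.

```python
import hashlib
import hmac
import os
import sqlite3

def _hash(password, salt):
    """Derive a password hash; the plain password is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def open_portal_db():
    """Create the (hypothetical) profile and content tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user_profile (
            username TEXT PRIMARY KEY,
            salt     BLOB NOT NULL,
            pw_hash  BLOB NOT NULL,
            interest TEXT NOT NULL
        );
        CREATE TABLE content (
            title TEXT NOT NULL,
            topic TEXT NOT NULL
        );
    """)
    return conn

def register(conn, username, password, interest):
    """Registration service: returns False if the user name is already taken."""
    salt = os.urandom(16)
    try:
        with conn:
            conn.execute(
                "INSERT INTO user_profile VALUES (?, ?, ?, ?)",
                (username, salt, _hash(password, salt), interest),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def authenticate(conn, username, password):
    """Authentication service: compare the stored hash with a fresh one."""
    row = conn.execute(
        "SELECT salt, pw_hash FROM user_profile WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, pw_hash = row
    return hmac.compare_digest(pw_hash, _hash(password, salt))

def home_page(conn, username):
    """Content provider: titles that match the interest in the user's profile."""
    rows = conn.execute(
        "SELECT c.title FROM content c "
        "JOIN user_profile u ON u.interest = c.topic "
        "WHERE u.username = ? ORDER BY c.title",
        (username,),
    )
    return [title for (title,) in rows]

conn = open_portal_db()
conn.executemany(
    "INSERT INTO content VALUES (?, ?)",
    [("Intro to data mining", "databases"),
     ("Portal design notes", "portals"),
     ("SQL constraints", "databases")],
)
conn.commit()
registered = register(conn, "maria", "s3cret", "databases")
duplicate = register(conn, "maria", "other", "portals")
```

Here the personalized page for the hypothetical user "maria" is just a query that joins her profile with the content table, which is the sense in which the database serves as the content provider for the portal.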
Conclusion
According to a recent Forrester Research survey (Ante, 2000), "the top three most important software purchases that technology execs will make in 2000 are customer service, software for managing Web pages, and programs that improve personalization". A database of user profiles is the essential component needed for personalization, and a database of customers is the foundation of a customer service system. A portal used in conjunction with such databases can address all three of these needs within a single system and interface. Portals are not only interfaces or software tools; they are the basis for the next-generation systems of any business, enterprise and organization. A portal is a notion which brings together useful information technologies and powerful concepts.
It is not easier or less expensive to build a portal than it is to build other information systems, but the impact of having a portal is much more powerful. A portal can reach a wider audience, can be available 24 hours per day without the need for human operators, and can grow with the user requirements. One service that is very attractive in today's global world is e-commerce (see Chapter 2). This service is naturally embedded in the definition of a portal. Therefore, the implementation of an e-commerce function can be facilitated by the establishment of a portal.
There is a well-known lack of consolidated information about Latin America. Industries, government agencies, and private capital corporations worldwide are interested in learning more about Latin America. But this lack of information is an obstacle to investment in the countries that comprise the region and to their development. Building portals in Latin America can help the region overcome some of its informational problems. A portal can facilitate data gathering, analysis and querying. A portal can increase the exposure of an organization and the dissemination of information. With a function of e-commerce included in the portal, a Latin American organization can have an incentive to gather data that can be sold. An illustrative example of a portal development effort in Latin America follows.
The Ibero-American Science and Technology Education Consortium (ISTEC) is a non-profit organization comprised of educational, research, and industrial institutions throughout the Americas and the Iberian Peninsula. The Consortium has been established to foster scientific, engineering, and technology education and joint international research and development efforts among its members, and to provide a cost-effective vehicle for the application and transfer of technology. The objectives of the Consortium are to conceive, plan, and carry out activities of higher education, research and development, and technology transfer, for the purpose of facilitating scientific and technical progress of the Ibero-American countries. ISTEC is investing substantial resources in the development of a Science and Technology Education Portal. This portal will concentrate all the services that the consortium provides to its member institutions and will allow the offering of new services. The main purpose of this portal is to provide ISTEC with a multilingual (English, Spanish and Portuguese) real-time forum on Information Technology. ISTEC academic member institutions will be provided with the necessary infrastructure in terms of hardware, software and protocols, which will allow them to run this system, to access all sources of data, information and knowledge available to the consortium, and to contribute significantly to the enlargement and improvement of these collections. The system will provide mechanisms for data gathering, information retrieval, reporting and advanced data analysis for exploration and query answering. The data gathered through the portal will be integrated with ISTEC's Distributed Database of expertise and activities. The industrial members of ISTEC will also benefit from the portal: it will give them access to the collections and an audience for their own products and services.
Currently there are several portals in Latin America. Some of these have been developed with local resources; others were developed elsewhere for Latin American audiences. With the exception of some institutional portals, most of these portals were built for commercial purposes. ISTEC's portal, in contrast, is the kind of not-for-profit collaborative effort with the potential of providing many benefits to its academic member institutions.
Some areas where a portal would be very useful for Latin America are digital libraries, university curricula for undergraduate and graduate programs, and demographic information. Since portal development is expensive and requires resources of a varied nature, it is fundamental that the international organizations interested in the development of Latin America contribute to this effort by helping interested institutions locate the appropriate funding.
References
Ante, S. E. (2000, June 19). The second coming of Software. Information Technology Annual Report - Software. Business Week.
Barclay, R.O., & Murray, P.C. (2000). What is knowledge management? Knowledge Praxis, 7/31/00. [Online at]: www.media-access.com/whatis.html.
Eisler, D. L. (2000, September). The Portal's Progress. Syllabus. [Online at]: www.syllabus.co
Hasselbring, W. (2000). Information system integration. Communications of the ACM, 43(1), 33-35.
McCallum, A. K., Nigam, K., Rennie, J., & Seymore, K. (2000). Automating the construction of Internet Portals with Machine Learning. Information Retrieval, 3, 127-163.
Roberts-Witt, S. L. (1999). Making Sense of Portal Pandemonium. Knowledge Management, July 1999.
Riecken, D. (2000, August). Guest editor of special issue on Personalization. Communications of the ACM, 43(8).
Special Libraries Association. (2000, April). Exploring the possibilities of Information Portals. Video Conference organized by the Special Libraries Association. [Online at]: www.sla.org/sla-learning/portals.htm
Thurow, L.C. (1999). Building wealth: The new rules for individuals, companies and nations in a knowledge-based economy. HarperCollins Publishers.
White, M. (2000, April 13). Knowledge Management. Free Pint, Issue 60. [Online at]: www.freepint.co.uk.
Whiting, R. (2000, March). Vendors add power to portals; Iona, Viador, and Hummingbird offer more development and content management capabilities. Information Week, March 2000, 83. [Online at]: www.informationweek.com/778/portal.htm