Teach students the virtues of metadata. More specifically, they learn how to formally represent the requirements of a Web service and then build a computer program to generate the computer programs that implement that service.
Organizations have complex requirements for their information systems. They also have insane schedules and demand that you build their system within weeks. Finally, they are fickle and have no compunction about changing the requirements mid-stream.
Corporations all have knowledge management systems even though generally they may not have any knowledge. Universities claim to have knowledge and yet none have knowledge management systems. You are going to build a knowledge management system for your university's computer science department. In order to ensure that you can get a job after this course is over, you must refer to your final product as "a KM system".
Another issue is a perennial side-show in the world-wide computer programming circus: the spectacle of nerds arguing over programming tools. The data model can't represent the information that the users need, the application doesn't do what what the users need it to do, and instead of writing code, the "engineers" are arguing about Java® versus Lisp versus Perl versus Tcl. If you want to know why computer programmers get paid less than medical doctors, consider the situation of two trauma surgeons arriving at an accident scene. The patient is bleeding profusely. If surgeons were like programmers, they'd leave the patient to bleed out in order to have a really satisfying argument over the merits of two different kinds of tourniquet.
If you're programming one Web page at a time, you can switch to the Language du Jour in search of higher productivity. But you won't achieve significant gains unless you switch from writing code for one page. You need to think about ways to write down a formal description of the application and user experience, then let the computer generate the application automatically.
Knowledge is text, authored by a user of the community. The user may attach a document, photograph, or spreadsheet to this text. Other users can comment on the knowledge, submitting text and optional attachments of their own. What distinguishes a knowledge management system from the standard /bboard module of the ACS? The following:
Business people like to talk about "objects" rather than "rows in tables". This doesn't mean that the MBA curriculum includes inheritance and method combination in the Common Lisp Object System. What seems to have happened is that
When a business person talks about an "object", what he or she generally means is "a row in a relational database table".
In order to make your system comprehensible to the CIO/CTO types who will be adopting it, we'll use their vocabulary. A table row is an "object" and each column is an "element" of that object.
We need a way to represent the kinds of objects that our system will represent first. Let's assume that we'll have at least the following object types:
To say that "John McCarthy developed the Lisp programming language", the user would create two objects: one of type language and one of type person. Why not link to the users table instead? John McCarthy might not be a registered user of the system. Some of the people you'll be referencing, e.g., John Von Neumann, are dead. Characteristics of the Lisp language would be stored as elements of the language object.
For an example of what a completed system of this nature might look like to the casual reader, visit Paul Black's Dictionary of Algorithms and Data Structures.
For each object type we'll be creating an Oracle table. For each Oracle table we create, we store one row in the metadata table:
create table km_metadata_object_types (
table_name varchar(21) primary key,
pretty_name varchar(100) not null,
We need to store information about what elements will be kept for each type of object. Note that some elements are common across object types:
You won't be writing code to implement fancy permissioning so not all of this information will be useful in this problem set. However, it is good to have it in your data model even if you aren't going to build .tcl pages to update or query it. Notice that you may need to define extra tables to support some of this many-to-one information, e.g., the access control list for an object.
For elements that are unique to an object type (i.e., elements other than the standard ones listed above), we need to insert one row in a metadata table per element:
create table km_metadata_elements (
metadata_id integer primary key,
table_name not null references km_metadata_object_types,
column_name varchar(30) not null,
pretty_name varchar(100) not null,
abstract_data_type varchar(30) not null, -- ie. "text" or "shorttext" "boolean" "user"
-- this one is not null except when abstract_data_type is "user"
oracle_data_type varchar(30), -- "varchar(4000)"
-- e.g., "not null" or "check foobar in ('christof', 'patrick')"
-- values are 'text', 'textarea', 'select', 'radio',
-- 'selectmultiple', 'checkbox', 'checkboxmultiple', 'selectsql'
presentation_type varchar(100) not null,
-- e.g., for textarea, this would be "rows=6 cols=60", for select, Tcl list,
-- for selectsql, an SQL query that returns N district values
-- for email addresses mailto:
-- pretty_name is going to be the short prompt,
-- e.g., for an update page, but we also need something
-- longer if we have to walk the user through a long form
-- if they click for yet more help
-- note that this does NOT translate into a "not null" constraint in Oracle
-- if we did this, it would prevent users from creating rows incrementaly
mandatory_p char(1) check (mandatory_p in ('t','f')),
-- ordering in Oracle table creation, 0 would be on top, 1 underneath, etc.
-- ordering within a form, lower number = higher on page
-- if there are N forms, starting with 0, to define this object,
-- on which does this go? (relevant for very complex objects where
-- you need more than one page to submit)
-- for full text index
include_in_ctx_index_p char(1) check (include_in_ctx_index_p in ('t','f')),
-- add forms should be prefilled with the default value
check ((abstract_data_type not in ('user') and oracle_data_type is not null)
(abstract_data_type in ('user'))),
Notice that the km_metadata_elements table contains specifications for 1) generating the CREATE TABLE sql commands that you'll need to build the database data structures for storing knowledge and 2) building user interfaces to manipulate data in those tables.
If you have trouble feeding the above table definition to SQL*Plus running under an Emacs shell, cut and paste it into a file called "foo.sql". Then run
sqlplus student23/thepassword < foo.sql
from the shell.
Create a directory /admin/km/ under your Web server page root. Go to a Unix® shell and type
> cd /web/yourservername/www/admin/ > chmod a+w km
So that the /admin/km/ directory will be writable by the Web server. Now use the prototype builder (available on your own server at /admin/prototype/) to generate admin pages for the
km_metadata_elements and km_metadata_object_typestables.
Fill your meta data tables with some info. Add one entry to the objects table for each of the object types listed above. For each object type, fill in some elements. Remember that each object will have the default fields specified above so you don't need things like name or overview. Here are some examples:
Write a script called /admin/km/generate-ddl.tcl that will generate CREATE TABLE statements from the meta data tables. It can simply output this to the Web browser with a MIME type of text/plain (then you can save it to your local file system as km-generated.sql) or write it back into the Unix® file system as /admin/km/km-generated.sql (in which case you'll have to be sure that the /admin/km/ directory is writable by the Web server user, which is generally "nsadmin").
Each object table should have an object_id column. Use a single Oracle sequence to generate keys for this column.
In addition to the metadata-driven object table definitions we'll define a generalized mapping table to support links between knowledge objects:
create table km_object_object_map (
table_name_a not null references km_metadata_object_types,
-- Assume all object type tables have integer primary keys.
-- For a generalized mapping table we can't put an integrity
-- constraint on this field.
table_id_a integer not null,
table_name_b not null references km_metadata_object_types,
table_id_b integer not null,
-- User-entered reason for relating two objects, e.g.
-- to distinguish between John McCarthy the developer of
-- Lisp and Gerry Sussman and Guy Steele, who added lexical scoping
-- in the Scheme dialect
creation_user not null references users,
creation_date date default sysdate not null,
primary key (table_name_a, table_id_a, table_name_b, table_id_b)
Notice that this table allows the users to map an object to any other object in the system, regardless of type.
For simplicity, assume that associations are bidirectional. If a user associates the Huffman encoding algorithm (used in virtually every compression scheme, including JPEG) with the person David A. Huffman (MIT Course VI grad student and then faculty member), we should also interpret that to mean that the person David A. Huffman is associated with the algorithm for Huffman encoding. This is why the columns in km_object_object_map have names like "table_name_a" instead of "from_table".
The primary key constraint above has the side effect of creating an index that makes it fast to ask the question "is object A related to object B?". For efficiency in querying "is object B related to object A?", create a concatenated index on the columns in the other order. (The trees chapter of SQL for Web Nerds, gives some examples of concatenated indices. Also read the composite indices section of the Oracle Tuning manual. See also the Oracle SQL Reference section.)
Write a script called /admin/km/generate-drop-tables.tcl that will generate DROP TABLE statements from the meta data tables. You probably won't get your data model right the first time so you might as well be ready to clear out Oracle and start over.
Create a directory /km under the Web server page root. The index.tcl page should display an unordered list of object types and, next to each type, options to "browse" or "create". You don't have any information in the database, so you should build /km/object-create.tcl first. This page will query the metadata tables to build a data entry form to create a single object of a particular type. Build /km/object-create-2.tcl to process the results of this form. You may find util_prepare_update from /tcl/00-ad-utilities.tcl useful in building object-create-2.tcl.
When object-create-2.tcl is done inserting the row into the database, it should ns_returnredirect to object-display.tcl. This page should have small hyperlinks to edit single fields at a time (all linking to object-edit-element.tcl with different arguments). This page should show all the currently linked objects and have an "add link" hyperlink to object-add-link.tcl.
The page returned by object-add-link.tcl will look virtually identical to /km/index.tcl and will in fact link to the same URL: object-browse-one-type.tcl. When called with only table_name, this page will display a table of object names with dimensional controls at the top. The dimensions should be "mine|everyone's" and "creation date". The user ought to be able to click on a table header and sort by that column.
When called with extra arguments, object-browse-one-type.tcl will pass those arguments through to object-view-one.tcl and, if the user clicks a confirmation button, will eventually result in object-add-link-2.tcl being invoked. The extra arguments should be table_name_a and table_id_a, which will ultimately end up in the corresponding columns of a row in km_object_object_map.
You want to know when people are looking at and reusing knowledge. Create a table to hold object views:
-- we will be updating the reuse_p column of views so it
-- will be easier to have a primary key
create sequence km_object_view_id;
create table km_object_views (
object_view_id integer primary key,
-- which user
user_id not null references users,
-- two columns to specify which object
object_id integer not null,
table_name varchar(21) not null,
view_date date not null,
reuse_p char(1) default 'f' check(reuse_p in ('t','f'))
Modify object-view-one.tcl so that you explicitly close the TCP connection to the user (using ns_conn close). This will stop the Netscape icon to stop spinning but the AOLserver thread will remain alive so that you can log.
After the ns_conn close, insert a row into the km_object_views table iff there isn't already a log row for this user/object pair within 24 hours. You could do this with a
However, you can also do this with a single ns_db dml statement. Here's an example of an INSERT statement that only has an effect if there isn't already a row in the table.
insert into msg_id_generator (last_msg_id) select ('000000') from dual where 0 = (select count(last_msg_id) from msg_id_generator);
Apply this idea to the problem of thread-safe logging if and only if there isn't an identical row logged within the last 24 hours.
Modify object-view-one.tcl to add a "I reused this knowledge" button. This should link to object-mark-reused.tcl, a page that updates the reuse_p flag of the most recent relevant row in km_object_views. The page should raise an error if it can't find a row to update.
Explain the concurrency problem in Exercise 7 and talk about ways to address it.
Create an Oracle index on km_object_views that will make the code in exercises 6 and 7 go fast.
Build /admin/km/statistics.tcl to show, by day, the number of objects viewed and reused. This report should be broken down by object type and all the statistics should be links to "drill-down" pages where the underlying data are exposed, e.g., which actual users viewed or reused knowledge and when.
Build an index of the content in the KM system objects.
Create a file /admin/km/generate-sws-triggers.tcl to read the meta data tables and
Add some new objects and verify that they are being automatically copied by the triggers into the index table.
Create a page /km/search.tcl that displays a form letting the user search for a phrase either through all objects or only in one type of object. For /km/search-2.tcl you'll either want to use Intermedia or the DBMS_LOB functions and the pseudo_contains source code.
Show your system to a business school professor.
This problem set was written by Philip Greenspun in October 1999. It is copyright 1999 by him but may be reused provided credit is given to the original author.