Adding Structure to Unstructured Information
One of the main reasons we created askSam was to help people organize unstructured information... the kind of data that does not neatly fit into traditional databases (reports, research, resumes, email, notes, etc.). And although we tout the fact that askSam requires "no structure" and "no fields", adding structure to your information has benefits. You have more flexibility in searching, sorting, and creating reports.
We've used a variety of tools to help massage unstructured data into a more usable form - word processors, data manipulation tools, text editors, and more. With the right tools, it's amazing how a pile of unstructured documents can be turned into a useful information store.
Using Structure Inherent in the Information
Before covering tools that add structure to documents, I want to briefly talk about using structure that may exist in documents. Many documents contain an inherent structure: memos, email messages, meeting notes, reports, RFPs. Very often these specific types of documents contain words that structure the information.
In the case of email, the words "To:", "From:", "Date:", "Subject:" provide structure to the information. Similarly, memos, faxes, and reports often contain such structure. When we encounter such information, we use askSam's Auto Field Recognition command to identify the structure and use these words as fields in the askSam database.
The Auto Field Recognition command displays a list of words in your database that you can use as fields. From this list, you select your fields. You can then search, sort, and create reports using these field names.

Figure: Selecting field names with the Auto Field Recognition command
The Auto Field Recognition command works well when your information contains words that provide structure. If your information contains no such structure, or if the structure is only partially present, you'll need other tools to help structure the information.
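To make the idea of "words that provide structure" concrete, here is a minimal Python sketch. It is not how askSam's Auto Field Recognition command works internally; it only illustrates the general approach of scanning documents for leading words followed by a colon, offering them as candidate field names, and then reading values for the names you pick. The sample messages and the function names (`candidate_fields`, `extract_fields`) are made up for this example.

```python
import re
from collections import Counter

# Hypothetical sample documents; real input would be the records in your database.
DOCUMENTS = [
    "To: Ann\nFrom: Bob\nDate: 3 May\nSubject: Budget\n\nPlease review the numbers.",
    "To: Carl\nFrom: Ann\nDate: 4 May\nSubject: Re: Budget\n\nLooks fine to me.",
]

# Assumption: a candidate field name is a capitalized word at the start
# of a line, immediately followed by a colon.
FIELD_PATTERN = re.compile(r"^([A-Z][A-Za-z]*):", re.MULTILINE)


def candidate_fields(documents):
    """Count how many documents contain each candidate field name."""
    counts = Counter()
    for text in documents:
        # set() so a name is counted once per document
        counts.update(set(FIELD_PATTERN.findall(text)))
    return counts


def extract_fields(text, field_names):
    """Return {field: value} for the chosen field names found in one document."""
    record = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        # Keep only the first occurrence of each chosen field.
        if sep and name in field_names and name not in record:
            record[name] = value.strip()
    return record


counts = candidate_fields(DOCUMENTS)
# Here we "select" every name that appears in all documents.
chosen = {name for name, n in counts.items() if n == len(DOCUMENTS)}
records = [extract_fields(doc, chosen) for doc in DOCUMENTS]
```

Once each document is reduced to a dictionary like this, searching, sorting, and reporting by field become straightforward.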
Tools to Structure Information
TextPipe
My favorite tool for manipulating text and HTML files is TextPipe. TextPipe is a powerful data manipulation utility that can perform multiple search and replace operations in your files. We've integrated TextPipe with askSam 5.1. You can save scripts and filters and run these filters as you import files into askSam 5.1.

Figure: TextPipe's Filter Wizard
We've used TextPipe to:
- Pull specific information out of downloaded Web pages and bring this information into an askSam database.
- Clean up and add "fields" to a large number of text files before importing them into askSam.
- Create a filter to chop up an HTML email newsletter and bring each item into a record in askSam (the newsletter contained listings of government purchases).
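The last item, chopping a newsletter into records, can be sketched as a chain of search-and-replace steps followed by a split. The Python below is only an illustration of that technique, not a TextPipe filter: the newsletter markup, the `@@RECORD@@` delimiter, and the "Item:" field label are all assumptions made for the example.

```python
import re

# Hypothetical newsletter fragment; the markup and item layout are assumptions.
NEWSLETTER = (
    "<h3>Office chairs</h3><p>Agency: Parks Dept<br>Amount: $4,200</p>"
    "<h3>Laptops</h3><p>Agency: Library<br>Amount: $18,000</p>"
)

# Ordered search-and-replace steps, applied one after another.
FILTERS = [
    (r"<h3>", "\n@@RECORD@@\nItem: "),  # start a new record and label the title
    (r"</h3>|<br>|</p>", "\n"),          # turn these tags into line breaks
    (r"<[^>]+>", ""),                    # drop any remaining tags
    (r"\n{2,}", "\n"),                   # collapse runs of blank lines
]


def apply_filters(text, filters):
    """Run each (pattern, replacement) pair over the text in order."""
    for pattern, replacement in filters:
        text = re.sub(pattern, replacement, text)
    return text


def split_records(text, delimiter="@@RECORD@@"):
    """Split the filtered text into one string per record, skipping empties."""
    return [chunk.strip() for chunk in text.split(delimiter) if chunk.strip()]


records = split_records(apply_filters(NEWSLETTER, FILTERS))
for record in records:
    print(record)
    print("---")
```

The order of the steps matters: the record delimiter is inserted before the remaining tags are stripped, because the `<h3>` tag is the only marker of where each item begins.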
Don't Forget Your Word Processor / Editor
Word and WordPerfect both contain powerful macro and scripting capabilities. Similarly, text editors like NoteTab Pro offer powerful macros. We've written macros to insert fields, delimiters, and other structure into word processing documents. The ability to search for specific formats (such as styles) and insert a delimiter or field can be very useful.
For one project, we used the scripting provided by a word processor (in this case we used a DOS version of XyWrite!) to insert fields, convert formats, and insert HTML codes in over 10,000 word processing documents.
Structuring Email with PocoMail
Email headers contain structure that can be used as fields ("To:", "From:", "Date:", "Subject:"), but this doesn't help you structure the information in the body of the email.
For example, we receive registration information via email. We've written a script using the PocoMail Email client that takes the information in the email message and reformats it to correspond to the fields we require in our registration database.
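Our actual script is written in PocoMail's own scripting language and is not reproduced here. As a rough sketch of the same idea in Python, the example below picks labeled lines out of a message body and rewrites them under the field names a database expects. The message layout, the labels, the target field names, and the "Field: value" output format are all hypothetical.

```python
# Hypothetical registration message body; the labels and layout are assumptions.
EMAIL_BODY = """\
Thank you for registering.

Name = Jane Doe
Company = Acme Corp
Serial Number = 12345-678
"""

# Hypothetical mapping from labels in the email to field names in the database.
FIELD_MAP = {
    "Name": "Customer",
    "Company": "Organization",
    "Serial Number": "Serial",
}


def reformat_registration(body, field_map):
    """Rewrite 'Label = value' lines as 'Field: value' lines, dropping the rest."""
    lines = []
    for raw in body.splitlines():
        label, sep, value = raw.partition("=")
        label = label.strip()
        # Lines without '=' or with an unknown label are ignored.
        if sep and label in field_map:
            lines.append(f"{field_map[label]}: {value.strip()}")
    return "\n".join(lines)


result = reformat_registration(EMAIL_BODY, FIELD_MAP)
print(result)
```

Keeping the label-to-field mapping in one table means that a change in the email layout or the database fields only requires editing `FIELD_MAP`, not the reformatting logic.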
Making Information Useful
"Making Information Useful" -- so much of what I do involves helping people manage their information better, so they can be more productive. Adding structure to unstructured data is one way we've been able to accomplish this goal. We're always looking for new ways and new tools to make the job easier. If you have any favorites, please pass them our way.
Phil
Related Links:
TextPipe - version that works with askSam 5.1
There is a special version of TextPipe that works with askSam 5.1, which will be available later this month. For more information and a free trial version see:
http://www.asksam.com/four/surf25/textpipe.asp
TextPipe Home Page
http://www.search-replace-textpipe.com
NoteTab Pro
http://www.fookes.com/notetab/index.html
PocoMail
http://www.pocomail.com/