Surf Report 25

Adding Structure to Unstructured Information




One of the main reasons we created askSam was to help people organize unstructured information... the kind of data that does not neatly fit into traditional databases (reports, research, resumes, email, notes, etc.). And although we tout the fact that askSam requires "no structure" and "no fields", adding structure to your information has benefits. You have more flexibility in searching, sorting, and creating reports.


We've used a variety of tools to help massage unstructured data into a more usable form: word processors, data manipulation tools, text editors, and more. With the right tools, it's amazing how a pile of unstructured documents can be turned into a useful information store.



Using Structure Inherent in the Information


Before covering tools that add structure to documents, I want to briefly talk about using the structure that may already exist in them. Many documents have an inherent structure: memos, email messages, meeting notes, reports, RFPs. These types of documents very often contain words that act as labels and give the information its structure.


In the case of email, the words "To:", "From:", "Date:", "Subject:" provide structure to the information. Similarly, memos, faxes, and reports often contain such structure. When we encounter such information, we use askSam's Auto Field Recognition command to identify the structure and use these words as fields in the askSam database.


The Auto Field Recognition command displays a list of words in your database that you can use as fields. From this list, you select your fields. You can then search, sort, and create reports using these field names.
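askSam doesn't describe how Auto Field Recognition picks its candidates, so the following is only a rough, hypothetical approximation in Python: it treats a short word or phrase at the start of a line, followed by a colon, as a possible field name, and lists the labels that appear in at least two documents. The function name `candidate_fields` and the `min_count` threshold are illustrative assumptions, not part of askSam.

```python
import re
from collections import Counter

# Hypothetical rule: a candidate field is a short label at the start of a
# line followed by a colon, such as "From:" or "Subject:".
FIELD_LABEL = re.compile(r"^([A-Za-z][A-Za-z ]{0,20}):", re.MULTILINE)


def candidate_fields(documents, min_count=2):
    """Return labels found in at least min_count documents, most common first."""
    counts = Counter()
    for doc in documents:
        # Count each label once per document so one long memo can't dominate.
        counts.update(set(FIELD_LABEL.findall(doc)))
    return [label for label, n in counts.most_common() if n >= min_count]


emails = [
    "To: Ann\nFrom: Bob\nSubject: Budget\n\nSee the attached figures.",
    "To: Bob\nFrom: Ann\nSubject: Re: Budget\n\nThanks, looks good.",
]
print(candidate_fields(emails))  # To, From, and Subject (order may vary)
```

As with the real command, choosing which candidates actually become fields is still up to you.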



Selecting field names with the Auto Field Recognition Command


The Auto Field Recognition command works wonderfully when your information contains words that provide structure. If your information has no such structure, or the structure is only partially present, you'll need other tools to help structure the information.



Tools to Structure Information


TextPipe

My favorite tool for manipulating text and HTML files is TextPipe. TextPipe is a powerful data manipulation utility that can perform multiple search-and-replace operations on your files. We've integrated TextPipe with askSam 5.1: you can save TextPipe scripts and filters and run those filters as you import files into askSam 5.1.
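TextPipe filters are built in TextPipe's own interface, and the sketch below does not reproduce its filter or script format. It is a hypothetical Python analogue of the underlying idea: a filter is an ordered list of regular-expression search-and-replace steps, and each step runs on the output of the previous one, so order matters. The particular steps in `FILTER` are only examples.

```python
import re

# Hypothetical filter: ordered (pattern, replacement) pairs. Each step runs on
# the output of the previous step, like a chain of search-and-replace operations.
FILTER = [
    (r"\r\n?", "\n"),      # normalize Windows/Mac line endings
    (r"<[^>]+>", ""),      # strip HTML tags
    (r"&nbsp;", " "),      # replace non-breaking space entities
    (r"[ \t]+", " "),      # collapse runs of spaces and tabs
    (r"\n{3,}", "\n\n"),   # allow at most one blank line between paragraphs
]


def run_filter(text, steps=FILTER):
    """Apply each search-and-replace step in order and trim the result."""
    for pattern, replacement in steps:
        text = re.sub(pattern, replacement, text)
    return text.strip()


# prints "Item one", a blank line, then "Item two"
print(run_filter("<p>Item&nbsp;one</p>\r\n\r\n\r\n<p>Item   two</p>"))
```

The whitespace steps come after tag stripping on purpose: removing tags can leave stray spaces and blank lines behind, which the later steps then tidy up.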



TextPipe's Filter Wizard


We've used TextPipe to:


•  Pull specific information out of downloaded Web pages and bring this information into an askSam database.

•  Clean up and add "fields" to a large number of text files before importing them into askSam.

•  Create a filter to chop up an HTML email newsletter and bring each item into a record in askSam (the newsletter contained listings of government purchases).
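The newsletter job in the last item comes down to splitting one document into many records and pulling labeled values out of each piece. Here is a minimal Python sketch of that idea. It assumes, hypothetically, that listings are separated by `<hr>` tags and that each listing holds lines of the form `Label: value`; a real newsletter would need patterns matched to its own markup, and the output here is a list of dictionaries rather than an askSam import file.

```python
import re


def split_into_records(newsletter_html):
    """Split a newsletter into one record (dict of fields) per listing.

    Assumes listings are separated by <hr> tags and contain "Label: value" lines.
    """
    records = []
    for chunk in re.split(r"<hr\s*/?>", newsletter_html, flags=re.IGNORECASE):
        text = re.sub(r"<[^>]+>", "\n", chunk)  # turn remaining tags into line breaks
        fields = {}
        for line in text.splitlines():
            label, sep, value = line.partition(":")
            if sep and label.strip() and value.strip():
                fields[label.strip()] = value.strip()
        if fields:  # skip chunks with no labeled lines (mastheads, footers)
            records.append(fields)
    return records


html = (
    "<h1>Weekly Bids</h1><hr>"
    "<p>Agency: Navy</p><p>Item: Laptops</p><hr/>"
    "<p>Agency: GSA</p><p>Item: Desks</p>"
)
for record in split_into_records(html):
    print(record)
```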



Don't Forget Your Word Processor / Editor

Word and WordPerfect both contain powerful macro and scripting capabilities. Similarly, text editors like NoteTab Pro offer powerful macros. We've written macros to insert fields, delimiters, and other structure into word processing documents. The ability to search for specific formats (such as styles) and insert a delimiter or field can be very useful.
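Word processor macros can search on real formatting such as styles. Plain text has no styles, so this Python sketch uses a stand-in: a line written entirely in capital letters is treated as a heading. It places a delimiter line before each heading and turns the heading itself into a `Section:` field. The heading rule, the `#####` delimiter, and the `Section:` label are all hypothetical choices, not askSam requirements.

```python
import re

# Stand-in for a "heading style": a line of two or more characters made of
# capital letters, digits, spaces, and a little punctuation.
HEADING = re.compile(r"^[A-Z][A-Z0-9 ,&'-]+$")


def mark_headings(text, delimiter="#####", field="Section:"):
    """Replace each heading line with a delimiter line plus a field line."""
    out = []
    for line in text.splitlines():
        if HEADING.match(line.strip()):
            out.append(delimiter)
            out.append(f"{field} {line.strip()}")
        else:
            out.append(line)
    return "\n".join(out)


print(mark_headings("INTRODUCTION\nSome text.\nRESULTS\nMore text."))
```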


For one project, we used the scripting provided by a word processor (in this case, a DOS version of XyWrite!) to insert fields, convert formats, and insert HTML codes in over 10,000 word processing documents.
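At that scale, the important part is running the same edit over every file without touching the originals. A minimal Python sketch, assuming the documents are plain-text files in one folder (XyWrite's own scripting language is not shown here): the hypothetical `process_folder` function applies any text-to-text function, such as a search-and-replace routine, to each file and writes the result to a separate output folder.

```python
from pathlib import Path


def process_folder(src, dst, transform, pattern="*.txt"):
    """Apply transform() to every matching file in src and write the result to dst.

    The originals are left untouched, so a faulty macro can be fixed and rerun.
    Returns the number of files processed.
    """
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        (dst / path.name).write_text(transform(text), encoding="utf-8")
        count += 1
    return count
```

For example, `process_folder("docs", "docs_out", str.upper)` would write upper-cased copies of every `.txt` file in a `docs` folder; the folder names are placeholders.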



Structuring Email with PocoMail

Email headers contain structure that can be used as fields ("To:", "From:", "Date:", "Subject:"), but this doesn't help you structure the information in the body of the email.


For example, we receive registration information via email. We've written a script for the PocoMail email client that takes the information in the email message and reformats it to match the fields we require in our registration database.
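PocoMail scripts are written in PocoMail's own scripting language, which isn't reproduced here. The Python sketch below shows the same idea using the standard `email` module: read the header fields, scan the body for `Label: value` lines, and map the labels people type onto the field names the database expects. The labels in `FIELD_MAP` and the output field names are hypothetical, and the code assumes a simple, single-part plain-text message.

```python
from email import message_from_string

# Hypothetical mapping from labels typed in a registration email (lowercased)
# to the field names used in the registration database.
FIELD_MAP = {
    "name": "Name",
    "company": "Company",
    "email": "Email",
    "e-mail": "Email",
    "product": "Product",
}


def registration_record(raw_message):
    """Build a registration record from the headers and body of one message."""
    msg = message_from_string(raw_message)
    record = {"Date": msg.get("Date", ""), "From": msg.get("From", "")}
    body = msg.get_payload()  # a str for a single-part message
    for line in body.splitlines():
        label, sep, value = line.partition(":")
        key = FIELD_MAP.get(label.strip().lower())
        if sep and key:  # unmapped labels (e.g. "Comments") are skipped
            record[key] = value.strip()
    return record


def to_text(record):
    """Format the record as one "Field: value" line per field."""
    return "\n".join(f"{field}: {value}" for field, value in record.items())


raw = (
    "From: ann@example.com\n"
    "Date: Mon, 3 Mar 2003 10:00:00 -0500\n"
    "Subject: Registration\n"
    "\n"
    "Name: Ann Lee\n"
    "Company: Acme\n"
    "E-mail: ann@example.com\n"
    "Comments: none\n"
)
print(to_text(registration_record(raw)))
```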



Making Information Useful


"Making Information Useful" -- so much of what I do involves helping people manage their information better, so they can be more productive.  Adding structure to unstructured data is one way we've been able to accomplish this goal. We're always looking for new ways and new tools to make the job easier. If you have any favorites, please pass them our way.


Phil



Related Links:


TextPipe - version that works with askSam 5.1

A special version of TextPipe that works with askSam 5.1 will be available later this month. For more information and a free trial version, see:

http://www.asksam.com/four/surf25/textpipe.asp


TextPipe Home Page

http://www.search-replace-textpipe.com


NoteTab Pro

http://www.fookes.com/notetab/index.html


PocoMail

http://www.pocomail.com/




Seaside Software Inc. DBA askSam Systems, 121 S Jefferson Street, Perry FL 32347
Telephone: 800-800-1997 / 850-584-6590   •   Email: info@askSam.com   •   Support: http://www.askSam.com/central.asp
© Copyright 1985-2012   •   Privacy Statement