Abstracts

Format Festival

by David Maddox

FORMATS provide an instruction or template that SAS uses to write data values. They can be used to control how values appear in output or, in some cases, to group data values for analysis. INFORMATS, conversely, provide an instruction or template for reading data values. Both constructs add powerful capabilities to SAS, and these capabilities are available to novice and advanced SAS programmers alike. This presentation will focus on:

  • The various types of FORMATS and INFORMATS
  • Uses of FORMATS and INFORMATS
  • Creating your own FORMATS and INFORMATS

In addition, we will look at some less obvious uses, such as table lookups and traffic lighting, and at enhancements in SAS Version 9.


SAS Programs for Extracting Data from LexisNexis Documents

by Robert Matthews

Text files often contain data for multiple variables, each distinguished from the others by consistently used text strings that label and organize the data. Devising a programmatic solution to extract these data from text files can require a significant time investment, which may be justified only when the number of files is quite large. The approach we used to extract address information from over 100,000 text files may serve as an instructive, time-saving example for SAS users who want a faster and more reliable alternative to manual data extraction.

To develop residential histories for subjects in a retrospective follow-up study of cancer incidence, we sought address information for each subject from the LexisNexis National Group Files database. We downloaded the search results from the LexisNexis website as Rich Text Format (RTF) files. The file for each subject contained one or more documents of several types, each type with its own characteristic set of data-delimiting text strings. This paper describes two programs that (1) read an entire RTF file into a single character string and (2) extract the data for variables identified by these text strings into a SAS dataset.
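The second step, pulling out the values that follow known label strings, can be illustrated with a short sketch. This is not the author's SAS program: it is written in Python, assumes the whole file has already been read into one string, and uses a hypothetical function `extract_fields` with hypothetical labels (`NAME:`, `ADDRESS:`, `PHONE:`). Real LexisNexis documents use their own delimiting strings, which differ by document type.

```python
import re


def extract_fields(text: str, labels: list[str]) -> dict[str, str]:
    """Return the text following each label, up to the next label or end of line.

    Hypothetical illustration only: the labels passed in stand for whatever
    data-delimiting strings a given document type actually uses.
    """
    # One alternation of all labels, so any label can terminate the previous value.
    any_label = "|".join(re.escape(label) for label in labels)
    record = {}
    for label in labels:
        # Lazily capture after the label until the next label, a newline, or end of text.
        match = re.search(
            re.escape(label) + r"\s*(.*?)\s*(?=" + any_label + r"|\n|$)",
            text,
        )
        if match:
            record[label] = match.group(1)
    return record


if __name__ == "__main__":
    sample = "NAME: JOHN DOE ADDRESS: 123 MAIN ST, ANYTOWN\nPHONE: 555-0100\n"
    print(extract_fields(sample, ["NAME:", "ADDRESS:", "PHONE:"]))
    # {'NAME:': 'JOHN DOE', 'ADDRESS:': '123 MAIN ST, ANYTOWN', 'PHONE:': '555-0100'}
```

Building a single lookahead from all the labels lets each value end wherever the next label begins, which reflects the idea of variables identified by consistently used text strings. Labels that do not appear in a document are simply left out of the result.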


Bios

David Maddox

David Maddox is a Business Services Reporting Analyst for Regions Bank and has been a SAS user for approximately twenty years. He has been active in SAS user groups for over ten years, including the Birmingham Users Group for SAS (BUGS) and the SouthEast SAS Users Group (SESUG), and served as conference co-chair for SESUG 2002 in Savannah, GA.

Robert Matthews

Robert Matthews has 25 years of experience in database and network management, in programming for the analysis of occupational epidemiology studies, and in providing other technical support to departmental faculty. He is an expert in SAS, OCMAP, and other software packages. He has worked with Drs. Delzell, Sathiakumar, Beall, and Cheng, and with support staff, on most of the occupational epidemiology projects conducted at UAB in the past. He has extensive experience with personal computers, Microsoft and NetWare servers, and IBM mainframes, and with transferring electronic data files among these systems. He has presented papers on technical programming issues at local, regional, and international SAS Users Group conferences. He is also a coauthor of papers on exposure to butadiene and other synthetic rubber industry chemicals, on exposure assessment and mortality among semiconductor industry workers, and on mortality among hourly and salaried motor vehicle workers.