FIND.BAS Documentation

Julie Anne Watko
SSDOO/ADC
December 1, 1995

FIND.BAS is a QBASIC program developed to help users access data from the larger catalogs on the ADC CD-ROM. Many catalogs are too large to view with a text editor. FIND.BAS helps users create smaller files containing only the records of interest.

FIND.BAS searches a specified range of records (lines) in the catalog of the user's choice. In each record, the program compares the contents of a particular range of bytes (columns) to a set of strings or numbers supplied by the user. Users can enter these strings directly while running FIND.BAS or supply them in a text file with one string on each line. Whenever the program finds one of the user's strings in the byte range in question, it prints the record to an output file. After all the specified records have been searched or the end of the catalog file is reached, a date and time marker may be written to the output. Users can write the output of several searches to the same output file by running the program once and selecting the option to search again.

Below are instructions for using FIND.BAS.

1. Users can begin by copying FIND.BAS to a convenient directory on their disk.

2. Users then browse the CD-ROM directory to determine the path and file name of the desired catalog.

3. The next step is to open the ReadMe file for this catalog and make note of
   a) The number of records in the catalog.
   b) The starting and ending bytes of the field to search.
   c) Other information about this field (codes, position of decimal points, etc.).

4. The CD-ROM should be left in its current directory position.

5. The user then enters the directory on the local disk where the output will be written. Users can choose a name for the output file, making sure that it does not correspond to the name of an existing file. If a file with the desired name already exists, users can rename the old file or, if the file is not valuable, delete it.
IF YOU USE A FILE NAME THAT ALREADY BELONGS TO A NON-EMPTY FILE, THEN FIND.BAS WILL WRITE OVER THE INFORMATION IN THAT FILE.

If users want to input search strings by file instead of entering them interactively, they can create a text file with one string on each line. A series of numbers need not be aligned when it is input. FIND.BAS will automatically right-justify strings shorter than the byte range in question. To left-justify a short string, the line should contain the string typed left-justified, followed by enough spaces to fill the search byte range, then any non-space character. Case is not important when inputting alphabetical characters. FIND.BAS converts all strings to lowercase. Users inputting more than 255 search strings in this manner should make note of approximately how many strings are in the file.

6. The user then types QBASIC [return] or the command appropriate to the machine to start BASIC.

7. Next, users open and run FIND.BAS.

8. Users then type the file path and name of the catalog to search, followed by [return]. An example of a path and file from the ADC CD-ROM is d:\catalogs\8031\38MHz.dat if "d" is the user's CD-ROM drive. If the user has left the CD-ROM in the d:\catalogs\8031 directory, then it is only necessary to type d:38MHz.dat [return]. The whole path can also be typed. Most users find that they search a few catalogs often. Each user may create a default input table in the program that allows a favorite file to be selected by typing a single character and [return].

9. Users next type the file path and name of the output file they want to create, followed by [return]. Less typing is required if users use the QBASIC working directory. Again, once users have created a default output table, they will need to type only one character and [return].

NOTE: FIND.BAS WILL OVERWRITE INFORMATION ALREADY STORED UNDER THIS FILE NAME.

10. The next step is to enter the number of records to skip.
Skipping records is faster than searching them. If users know the desired records are not near the beginning of the file, then skipping records will save time in their search. They should enter a number and press [return]. If users are interested in records near the top or in searching the entire catalog, then they can simply press [return] without entering a number.

11. The next step is to enter the number of records to search. If users know their desired records are not at the end of the file, then limiting the number of records to search will cut the search time. The default is 1,000 records, which is a relatively small number. Users should be aware of this default. To scan an entire catalog, they should enter a value equal to or greater than the number of records indicated in the catalog's ReadMe file.

12. The user next types the byte number of the first column of interest, then [return].

13. The user types the byte number of the last column for the search, then presses [return]. Users who are searching one byte for a single character need not enter a value.

FIND.BAS will ask whether the user wants to read strings from a file. If yes, the user should enter "y", then the file path and name when asked, and an estimate of how many strings are in the file. It is better to overestimate than to underestimate, because an underestimate may cause the program to miss some of the strings near the end of a longer file.

A FILE FOR STRING INPUT SHOULD BE A TEXT FILE WITH ONE STRING ON EACH LINE.

FIND.BAS will adjust the strings to the search byte range by right-justifying all shorter strings and by truncating the right of any longer strings. The search is case-insensitive.

If the user does not want to read strings from a file, each string to be found must be entered individually. The program will inform the user of the maximum number of strings to enter.
The software comes with this maximum set to 8, but the user may edit the program to allow up to 16 by changing the line "SIZE% = 8" to "SIZE% = 16" or whatever number is convenient. The search is case-insensitive, and shorter strings will be right-justified in the field. When users have entered all the strings they want to find, they press [return] without entering a value.

14. Entering the maximum number of hits to find may save time when searching for a limited number of individual stars. For example, users who search the ID field for eight strings do not expect to find more than eight records (eight stars). In this case, limiting the number of hits to eight will cause FIND.BAS to stop searching once all eight records have been printed to the output file. This may save time when searching entire catalogs. If, however, users are looking for all the stars with a certain spectral type (Sp), then they should not enter a value to limit the number of hits. In this case, they probably will not know how many stars they are looking for, and they will not want to miss any records near the end of the catalog. Simply pressing [return] leaves the number of hits unlimited.

15. Entering "y" for a marker with the date and time written beneath the results of the search may help keep track of which search gave the results the user is viewing. If an existing output file is accidentally overwritten, the marker helps identify where the most recent search ended. For output files that will themselves be searched, users will need to edit out these marker lines. For convenience, they may instead type "n" [return] to deactivate this feature. The default is y.

16. At this point, FIND.BAS will display the search ranges and criteria entered by the user, and the user can review the displayed information. The search strings are displayed between vertical lines that mark the byte range in question.
Users should pay special attention to how shorter or longer strings were interpreted by the computer. Wherever blank spaces occur between vertical lines in this display, the computer will be looking for blanks in the catalog. A user who wishes to change any information or has made a mistake in entering choices enters "n" for "no" and will be prompted to enter all of the above information again. The default response is no. If the information is correct, then "y" can be entered for "yes." The program will begin searching only after the user has confirmed by typing "y" [return].

17. The program will then perform the search. The screen will display messages saying what the computer is doing. Users should be aware that searching large sections of catalogs may take some time. For example, searching 230,000 records for four strings may take about 15 minutes. The search time depends upon the speed of the computer, the number of records to skip and to search, and the number of strings to find. As each hit is found, the record will be displayed on the screen along with the sequence number. The sequence number tells how many records have already been searched. This will help monitor the computer's progress through the catalog. When the computer stops searching, it displays a message to indicate why it stopped. Messages and their meanings are below.

"Search complete": The computer has searched the number of records requested. The file may still contain unsearched records.

"Search stopped": The computer has found the maximum number of hits indicated.

"End of file reached": The computer has skipped or searched every record in the catalog.

18. Pressing [return] at this point will search again; typing "1" [return] will stop the program. Choosing to perform an additional search in this way allows the user to write more output to the same output file without overwriting what was already found.
The user will be prompted for all the same information provided before with the exception of the output file name, which stays the same. To perform another search with output written to a different file, a user can stop the program and run it again.

STOPPING FIND.BAS AT OTHER POINTS IN THE PROGRAM MAY CAUSE FILES TO BE LEFT OPEN. Open files may interfere with the user's examination of output files with a text editor. Should this become a problem, the user should enter QBASIC's IMMEDIATE mode and type CLOSE [return].

EXAMPLE 1: A user wants a listing of all stars with 1950 Right Ascension between 2 hours 20 minutes and 2 hours 30 minutes in the Revised Source List for the Rees 38-MHz Survey.

Exploring the CD-ROM directory structure, the user finds that the catalog is in directory d:\catalogs\8031, where d is the CD-ROM drive. The directory contains the file ReadMe, which the user can read by opening a text editor. From this document the user learns that the file desired as input is called 38MHz.dat and has 5,859 records. RAh appears in bytes 1-2 and RAm appears in bytes 4-5. The format code I2 indicates that each of these fields is in two-digit integer format. The ReadMe does not say how the stars are ordered in the catalog, so the user decides to search the entire catalog. The user closes the ReadMe file, switches to a directory on the local drive that includes a copy of FIND.BAS, and starts QBASIC.

The user opens FIND.BAS and runs it. Because the CD-ROM drive has been left in the proper directory, the user enters d:38MHz.dat as the input file and the name of an output file, FILE1.OUT, which the user knows does not already exist. If this file is to be created in the current directory, then the user enters c:FILE1.OUT; otherwise, a path is necessary: c:\directory\subdirectory\FILE1.OUT.
Since the user does not know where in the file the records might be located, the user enters 0 records to skip and 6,000 records to search, entering 1 as the starting byte and 4 as the ending byte of the searched field. Since this user needs to search for only one string, it is not efficient to use a file to input the string. The user types "n" when asked. When prompted for the first string, the user types "02 2" followed by [return]. The user will be prompted for another string, but since there is no need to find another, the user presses [return] again. The computer will not ask for any more strings.

As the user does not know how many stars to expect to find, a maximum number of hits is not entered. This user expects to search the output file later, so the user types "n" for no, telling the computer not to write a date and time marker to the file.

The computer screen displays the search criteria it will use. The user looks it over, decides it is correct, and types "y" to confirm. The computer tells what it is doing during each step of the search process, so if anything goes wrong, the user knows what the computer was doing when it happened. Messages about opening files, skipping records, then scanning records will be displayed. When the computer finds a record that matches the string in the byte range indicated, it prints that record to the screen. The records stop printing at sequence number 481. The computer keeps searching until it reaches the end of the catalog, at which time an "End of file reached" message appears along with the number of records in the catalog. This is good because it assures users that they have searched all the records in the catalog.

This user does not want the results of another search written to this file and types "1" [return] to exit the program. Now the user has a text file called FILE1.OUT that contains 35 records. This file is small enough to examine with a text editor.
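The matching rules walked through in EXAMPLE 1 can be illustrated with a short Python sketch. This is not the FIND.BAS source; the function names, the sample records, and the choice of message when the requested range ends exactly at the last record are all assumptions made for illustration. It only mirrors the behavior described above: strings are lowercased, right-justified or truncated to the byte range, and compared against a 1-based, inclusive range of bytes in each record.

```python
def fit_to_field(text, width):
    """Lowercase a search string and fit it to the field width.

    Shorter strings are right-justified; longer ones lose their right end.
    """
    text = text.lower()
    if len(text) > width:
        return text[:width]
    return text.rjust(width)


def search(records, first_byte, last_byte, strings,
           skip=0, limit=1000, max_hits=None):
    """Return (hits, reason) for a FIND.BAS-style search over a list of lines."""
    width = last_byte - first_byte + 1
    targets = {fit_to_field(s, width) for s in strings}
    hits = []
    # Byte numbers in the text are 1-based and inclusive.
    for record in records[skip:skip + limit]:
        field = record[first_byte - 1:last_byte].lower()
        if field in targets:
            hits.append(record)
            if max_hits is not None and len(hits) >= max_hits:
                return hits, "Search stopped"
    # Assumption: if the requested range ends exactly at the last record,
    # this sketch reports end of file; the real program may differ.
    if skip + limit < len(records):
        return hits, "Search complete"
    return hits, "End of file reached"


# Invented records that only mimic the byte positions used in EXAMPLE 1.
sample = [
    "02 21 30 +75 10",
    "02 35 00 +10 00",
    "14 02 20 -05 00",
    "02 29 59 +77 30",
]
hits, reason = search(sample, 1, 4, ["02 2"], skip=0, limit=6000)
print(len(hits), reason)  # prints: 2 End of file reached
```

Because longer strings are cut on the right, the left-justification trick from step 5 falls out of the same rule: a line such as "ab   x" fitted to a four-byte field becomes "ab" followed by two blanks.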
EXAMPLE 2: A user wants a listing of all stars with 1950 Right Ascension between 2 hours 20 minutes and 2 hours 30 minutes and 1950 Declination between 74 degrees and 77 degrees in the Revised Source List for the Rees 38-MHz Survey.

This is more challenging because the user needs to search two separate byte fields and, in the second field, for more than one string. The first step is to perform the search as in EXAMPLE 1. Once the user has stopped FIND.BAS, the output file FILE1.OUT is left. The user examines the ReadMe document again. The format of FILE1.OUT is the same as the format of 38MHz.dat, so Declination degrees appear in bytes 11-12 of FILE1.OUT.

The user runs FIND.BAS again. This time, FILE1.OUT is entered as the input file and FILE2.OUT as the output file. Pressing [return] selects skipping no records; the user enters 35 records to search and 11 and 12 for the byte range. Since the user did not create a file of strings to find, pressing [return] will indicate no. The prompt asks for the first string. The user types "74" [return], "75" [return], "76" [return]. (The order in which these strings are entered does not affect the results.) The user wishes to find no more and presses [return] again to stop entering strings.

Again, the user does not know how many stars will be found and presses [return] to search without a hit limit. This time the user plans to inspect FILE2.OUT with an editor and expects this to be the last search with FIND.BAS, so "y" is selected for a time and date marker to help keep track of the file. The user looks over the criteria printed on the screen, decides they are correct, and types "y" [return] to confirm and begin the search.

The computer is soon finished searching. It indicates that it has found five records that include the strings in the specified byte range. It also indicates that the end of FILE1.OUT has been reached and that the entire file has been searched.
The user now decides to find out whether any stars in this RA range have Declination between 77 degrees and 78 degrees. The user presses [return] to search again and re-enters the same criteria as above, except that this time the computer does not ask for an output file name (FILE2.OUT is still open) and the search string is 77. This search yields one hit. The user types "1" [return] to stop the program, then exits QBASIC.

When the user opens a text editor and looks at FILE2.OUT, the screen shows the first five hits followed by a record containing the date and time and the number of hits for that search. The next record is the hit from the last search, followed by its date, time, and number of hits.
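The two-stage approach of EXAMPLE 2, in which the output of one search becomes the input of the next and several searches share one output file, can be sketched in Python as follows. This is an illustration under assumptions, not the FIND.BAS code: the helper names, the sample records, and the layout of the marker line are invented, and only the byte positions (1-4 for the Right Ascension prefix, 11-12 for Declination degrees) come from the text.

```python
from datetime import datetime


def field(record, first_byte, last_byte):
    """Return the 1-based, inclusive byte range of a record."""
    return record[first_byte - 1:last_byte]


def select(records, first_byte, last_byte, wanted):
    """Keep the records whose byte range equals one of the wanted strings."""
    return [r for r in records if field(r, first_byte, last_byte) in wanted]


def marker(hit_count, now=None):
    """Build a date/time marker line; this layout is hypothetical."""
    now = now or datetime.now()
    return f"{now:%m-%d-%Y %H:%M:%S}  hits: {hit_count}"


# Invented records that only mimic the byte positions named in the examples.
catalog = [
    "02 21 30 +75 10",
    "02 25 00 +40 00",
    "02 29 59 +77 30",
    "03 10 00 +74 00",
]

# First pass (EXAMPLE 1): Right Ascension prefix in bytes 1-4.
file1 = select(catalog, 1, 4, {"02 2"})

# Second pass (EXAMPLE 2): Declination degrees in bytes 11-12 of the first output.
first_search = select(file1, 11, 12, {"74", "75", "76"})
second_search = select(file1, 11, 12, {"77"})

# Both searches go to the same output, each followed by its own marker line.
file2 = (first_search + [marker(len(first_search))]
         + second_search + [marker(len(second_search))])
for line in file2:
    print(line)
```

Searching the small intermediate file instead of the full catalog is what keeps the second and third searches fast; the marker lines would have to be removed before the combined output could itself be searched cleanly.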