LFE - file managing commands (to change file size)

LFE location on the UT cluster: /gpfs/gvgpfs/gvhome/toomasha/SOFT/LFE64
All others please DOWNLOAD LFE HERE


FUNCTION SHORT DESCRIPTION
APPENDFILE Appends one file to the other.
BREAKROW Breaks rows into fragments and organizes the fragments in columns. This is opposite to ROWSTOTABLE.
COMBITABLE Adds columns to a table by combining the existing columns (difference, sum, multiplication, division).
DIGITIZE Turns a list of continuous values into (digital) groups based on quantiles.
EQUAL_SUBGR_N Compares 2 files containing multiple parameters and deletes entries from either file as needed to equalize their parameter abundances.
EXTRACT Extracts user-specified columns and rows from a file. Can read gzip'd files.
EXTRACT2 Extracts columns and rows from a file based on customizable patterns (e.g. every odd number, every 15th etc.).
EXTRACTBYCOLTYPE Extracts rows based on data type in the column. Can treat NA in two different ways.
EXTRACTBYHEADER Extracts number-containing entries from files that have sub-headers throughout the file.
EXTRACTE Extracts rows based on the values in specified columns. Uses scientific notation; no number size limit.
EXTRACTID Extracts rows based on text in specified columns. Full match or partial match as well as include/exclude/split are options. Can read gzip'd files.
EXTRACTIF Extracts rows based on the values in specified columns. Allows two simultaneous comparisons using AND/OR.
EXTRACTCOLUMNS Extracts (or removes) columns based on the specified ranges. Unlimited number of ranges allowed.
EXTRACTROWS Extracts rows based on the specified ranges. Unlimited number of ranges allowed.
FLATTEN Converts table type from "same ID in multiple rows" to "one ID per row".
FRAGMENTROWS Automatically fragments a file by the rows and creates a new file for each chunk. Any chunk size allowed.
GROUPMEMBERSHIP Scans table entries against predefined groups and returns statistics on group assignments.
HORIZAPPEND Horizontally grows a file by adding new columns from the second file.
IF2COLUMNSARESAME Extracts or removes rows that have the same entry in two columns.
ITEMIZE Converts values into categories based on given ranges. Ranges can also be coded as 0/1 and so that each item is in a separate column.
LISTAPPEND Sequentially appends (merges) files together according to a list.
LISTSELECTION Extracts rows based on IDs stored in a list file. It can either include or exclude the rows with the specified parameters.
LISTSELECTION2 Extracts rows based on two IDs stored in a list file. It can either include or exclude the rows with the specified parameters using AND/OR selection options.
MACROBLINDMERGE Merges tabular files column-wise based on directions stored in a reference file.
MAKETABLE Merges lines with the same first-column ID into one row.
MERGETABLES Merges up to three tables into one by the column.
MULTIFILTER Filtering (row extraction) based on multiple parameters and logical operators.
ONECOLUMN Extracts or removes one column from a table based on column number or header ID.
ORGANIZECOLUMNS Extracts, removes, multiplies and creates new columns and lets you change the column order.
PREFLATTEN Converts table where one of the columns contains more than one data entry into the type where every column contains just one entry and several lines may have the same ID (this is the FLATTEN input file).
ROWSTOTABLE Converts vertical lists into two dimensional tables. This is opposite to BREAKROW.
SIMMATRIX Converts an entry vs. observation style table into a similarity matrix.
SMARTMERGE Merges tabular files column-wise by matching the IDs. Use Smartmerge2 for large data sets that contain numbers as part of the unique ID.
SMARTMERGE2 Faster and more versatile than Smartmerge but reports less in log file, requires each input file to have constant number of columns.
SNAP Performs tasks on columns and nested columns: extract, remove, split, change order, duplicate, fill in missing etc.
SPLITBYID Sorts the rows of a file into new files by the ID in a specified column.
TEMPLATEFILTER Extracts rows from file1 if a user-given reference value is smaller/equal/larger than the value in a specified column of file2.
RANGEMERGE Merges two files by ID much like SMARTMERGE2, but does it loosely (allows non-exact matches with a specified range) based on values in specified columns. Also allows to do it only on rows that have a certain ID in the specified column.

APPENDFILE

This function appends one file to the other: newfile = file1 + file2. Two files are merged back to back by copying all their rows into the new file. Attn! Another way to append files is to use the UNIX command "cat". If you need to merge more than two files please refer to the LFE function LISTAPPEND.

./LFE32 -M appendfile -file1 -file2 -out

-file1  = first input file name. REQUIRED.
-file2  = second input file name. REQUIRED.
-out  = output file name. Default = file2 name + ".out".

Example:
./LFE32 -M appendfile -file1 firstfile.txt -file2 secondfile.txt -out thirdfile.txt
One file (secondfile.txt) is appended to the end of the other file (firstfile.txt). The new file (thirdfile.txt) contains all rows from both input files.


BREAKROW

This function fragments rows according to the user specifications and puts the fragments in columns. It can be used to turn tabular files into lists.

./LFE32 -M breakrow -file -interval -delim -out

-file  = input file name. REQUIRED.
-interval  = specifies the fragment size (in delimiter units). Options: any positive integer. Default = 1.
-delim  = file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "slash", "bslash", "dash", "quote", "squote", b) any symbol or word without spaces. Default = "tab".
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M breakrow -interval 2 -file mydata.txt -delim space -out results.txt
Here a space-delimited file is converted into a list consisting of at most two space-delimited columns. If a row contains an odd number of space-delimited entries, the last fragment of that row will have only one entry.
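
The fragmenting described above can be sketched in Python. This is only an illustration of the documented behavior, not LFE's own code; the helper name break_row is hypothetical, and how LFE treats empty fields or trailing delimiters is not covered here.

```python
def break_row(line, interval=1, delim="\t"):
    """Split one row into fragments of `interval` delimited entries each."""
    fields = line.split(delim)
    return [delim.join(fields[i:i + interval])
            for i in range(0, len(fields), interval)]


# Two space-delimited rows, fragment size 2 (like '-interval 2 -delim space').
rows = ["a b c d e", "f g"]
fragments = [frag for row in rows for frag in break_row(row, 2, " ")]
print("\n".join(fragments))
```

The first row has five entries, so its last fragment holds a single entry.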


COMBITABLE

This function adds new columns to a table by using the existing columns. The new columns are created by dividing, multiplying, adding, or subtracting columns from one another. The combinations are created in a non-overlapping manner (i.e. only one half of the pairwise combination matrix is created, excluding the diagonal). If the table has a horizontal header, the header is used to generate headers for the new columns. If the header does not exist, it is created and it consists of integers and their combinations.

./LFE32 -M combitable -file -columns -function -header -delim -missing -out

-file  = input file name. REQUIRED.
-columns  = columns used for creating new columns. REQUIRED. Options: any valid column numbers or ranges without spaces (Ex: 1,3-5,7 means that columns 1,3,4,5,7 will be used in a combinatorial manner).
-function  = formula for creating new columns. Options: "divide", "multiply", "add", "subtract". Default = "divide".
-header  = input file header. Options: "yes", "no". Default = "no".
-delim  = file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", b) any symbol or word without spaces. Default = "tab".
-missing  = what is used to indicate the missing values. Options: a) any symbol or word without spaces, b) "none". Default = "NA". Note: "none" means that the missing entry fields are empty.
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M combitable -file mytable.txt -columns 1,5-8 -function divide -header yes -delim space -missing novalue -out results.txt
Columns 1,5,6,7,8 are used in a combinatorial way to generate new columns containing ratios. The header for the new columns is generated by using the original header names. The missing value indicator is "novalue", the delimiter is space.
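
The "half of the pairwise matrix" idea can be illustrated with a short Python sketch. It is not LFE's implementation: the function name combine_columns is hypothetical, and both the "A/B" header format and the choice to write the missing indicator on division by zero are assumptions made for the example.

```python
from itertools import combinations


def combine_columns(header, rows, columns, missing="NA"):
    """Append one ratio column per unordered pair of the selected 1-based columns."""
    pairs = list(combinations(columns, 2))  # each pair once, no diagonal
    # Assumed header format for the new columns: "left/right".
    new_header = header + [f"{header[a - 1]}/{header[b - 1]}" for a, b in pairs]
    new_rows = []
    for row in rows:
        extra = []
        for a, b in pairs:
            x, y = row[a - 1], row[b - 1]
            if missing in (x, y) or float(y) == 0:
                extra.append(missing)  # assumption: undefined ratios become missing
            else:
                extra.append(str(float(x) / float(y)))
        new_rows.append(row + extra)
    return new_header, new_rows


new_header, new_rows = combine_columns(
    ["A", "B", "C"], [["2", "4", "8"], ["1", "NA", "5"]], [1, 2, 3])
```

Three selected columns give three new columns (A/B, A/C, B/C) rather than the six or nine a full matrix would produce.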


DIGITIZE

This function converts a list of continuous values into digital groups (quantiles). The input file should contain just one column (header optional). Results are presented either in one column (where quantiles are represented by consecutive integers) or several columns (one column for each quantile, with "1" showing that the value belongs to the corresponding quantile and "0" indicating that it belongs to another quantile). The output is tab-delimited. Note: you can use the LFE function EXTRACTCOLUMNS to prepare the correct input file, CHANGEDELIM to change the output delimiter, and EXTRACTCOLUMNS and MACROBLINDMERGE to extract and append the resulting columns. Attn: Statistics appear on the screen, not in the output file. Use " > filename.log" to direct this info into a file. This function is related to function ITEMIZE.

./LFE32 -M digitize -file -groups -missing-in -missing-out -header -style -out

-file  = input file name. REQUIRED. Note: The input file must consist of one column of numbers.
-groups  = the number of groups (quantiles) used. REQUIRED.
-missing-in  = the word or symbol indicating missing values in the input file. Default: not used by default.
-missing-out  = the word or symbol indicating missing values in the output file. Default: not used by default.
-header  = input file header. Options: "yes", "no". Default = "no".
-style  = output style. Options: "onecolumn", "flatten". Default = "onecolumn". Note: "onecolumn" shows the groups (quantiles) in one column, "flatten" shows the groups (quantiles) in multiple columns (one column per quantile, results coded using 1 and 0).
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M digitize -file myfile -groups 3 -missing-in NA -missing-out 0 -header yes -style flatten -out results.txt
In this example three quantiles are used, results are presented using the flattened format. The initial missing value is "NA", this is replaced with "0" in the output file.
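
The two output styles can be sketched in Python as follows. This is a rough illustration only: the rank-based binning rule and the handling of ties are assumptions (the exact quantile boundaries LFE uses are not documented here), and the function names are hypothetical.

```python
from bisect import bisect_left


def digitize(values, groups, missing_in=None, missing_out=None):
    """Assign each value a 1-based group by rank ("onecolumn" style)."""
    present = sorted(float(v) for v in values if v != missing_in)
    n = len(present)
    result = []
    for v in values:
        if v == missing_in:
            result.append(missing_out)  # missing entries pass through
            continue
        rank = bisect_left(present, float(v))  # count of values strictly below v
        result.append(rank * groups // n + 1)
    return result


def flatten_groups(assigned, groups, missing_out=None):
    """One 0/1 indicator column per group ("flatten" style)."""
    return [[missing_out] * groups if g == missing_out
            else [1 if g == k else 0 for k in range(1, groups + 1)]
            for g in assigned]


values = ["5", "1", "NA", "9", "3", "7", "2"]
one_column = digitize(values, 3, missing_in="NA", missing_out="0")
flat = flatten_groups(one_column, 3, missing_out="0")
```

With six non-missing values and three groups, each group receives two values, and the "NA" entry is written as "0" in both styles.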


EQUAL_SUBGR_N

This function compares and edits 2 files (such as cases and controls) that contain subjects in the rows and parameters in the columns. It removes entries (rows) from either file as needed to achieve the same mean abundance for each parameter (located in a specific parameter column). The rows could be, for example, individuals and a specific column in that file would list the parameter identities. If some parameter abundances become zero during the editing process (in either file), the rows corresponding to that particular parameter are removed from both files. Output files are named input file name + ".out".

./LFE32 -M equal_subgr_n -file1 -file2 -object-column -id-column -delim -header -header-size

-file1  = input file1 name. REQUIRED.
-file2  = input file2 name. REQUIRED.
-object-column  = column where the main object IDs are. Options: any valid integer. Default = 1.
-id-column  = column where the parameter IDs are that are being adjusted between the two files. Options: any valid integer. Default = 2.
-delim  = original file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Default = "tab".
-header  = input file header. Options: "yes", "no". Default = "no".
-header-size  = input file header size. Options: any integer. Default = 1 if header defined by '-header', 0 if header not defined.

Example:
./LFE32 -M equal_subgr_n -file1 cases.txt -file2 controls.txt -header yes -object-column 2 -id-column 8 -delim comma
Two files are comma-delimited and have a header of size 1. The main ID's are in column 2 and the parameter ID's are in column 8. Two new files are generated: cases.txt.out and controls.txt.out. Standard out lists the number of each parameter detected in each file prior to editing.


EXTRACT

This function extracts individual columns or rows or their ranges (both continuous and non-continuous) from a tabular text file. It replaces the LFE functions EXTRACTCOLUMNS and EXTRACTROWS as it contains all of their functionalities and more. If the user-defined range is longer than the file, the limit is automatically set to file size. Can use gzip'd files. If you want to change column order use ORGANIZECOLUMNS or SNAP. The number of columns may differ from row to row. If the searched entry is not found, a warning message is written to standard output. EXTRACT can also change the file delimiter.

./LFE32 -M extract -file -columns -rows -gzip -delim -newdelim -header -header-size -out

-file  = input file name. REQUIRED. If file name ends with '.gz' then the file is treated as gzipped file.
-columns  = columns to be extracted. REQUIRED. Notes: columns should be separated by commas and ranges by dashes (see the example below); there should be no spaces. If all columns are extracted use '-columns all'.
-rows  = rows to be extracted. REQUIRED. Notes: rows should be separated by commas and ranges by dashes (see the example below); there should be no spaces. Header (if your file has one) is considered a normal row (row=1). If all rows are extracted use '-rows all'.
-gzip  = force the file to be treated as gzip'd file. If your file does not end with '.gz' but it is gzip'd then use '-gzip yes' to force it.
-delim  = original file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Default = "tab".
-newdelim  = new file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Default = "tab". Do not use this flag if you don't want to change file delimiter.
-header  = input file header. Options: "yes", "no". Default = "no".
-header-size  = input file header size. Options: any integer. Default = 1 if header defined by '-header', 0 if header not defined.
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M extract -file myfile.gz -columns 1,7-9 -rows all -delim space -newdelim comma -out results.txt
Columns 1,7,8,9 and all rows are extracted from myfile.gz. The rows and columns retain their relative order regardless of how they are defined (e.g. 1,2,3 gives the same order as 3,1,2). The new file delimiter is comma.
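
A minimal Python sketch of how a range specification such as "1,7-9" can be expanded, clipped to the file size, and applied in file order. It only mirrors the behavior described above and is not LFE's code; parse_ranges and extract are hypothetical names, and gzip handling and headers are left out.

```python
def parse_ranges(spec, limit):
    """Expand "1,7-9" into sorted 1-based indices, clipped to `limit`."""
    if spec == "all":
        return list(range(1, limit + 1))
    picked = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        start = int(lo)
        end = int(hi) if hi else start
        picked.update(range(start, min(end, limit) + 1))
    return sorted(picked)  # file order, regardless of how the spec was written


def extract(lines, col_spec, row_spec, delim="\t", newdelim=None):
    """Keep the selected rows and columns; rows may differ in column count."""
    out = []
    for r in parse_ranges(row_spec, len(lines)):
        fields = lines[r - 1].split(delim)
        keep = parse_ranges(col_spec, len(fields))
        out.append((newdelim or delim).join(fields[c - 1] for c in keep))
    return out


lines = ["a b c d e f g h i", "1 2 3"]
result = extract(lines, "1,7-9", "all", delim=" ", newdelim=",")
```

The second row has only three columns, so the range 7-9 selects nothing from it and only column 1 is kept.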


EXTRACT2

This function extracts columns and rows from a file based on user-specified patterns. First the user needs to specify a block size (e.g. 4) and then the rows that are selected from that block. This pattern is then recycled until the end of the file is reached. Example: "-rows 4:1,3" means that rows are considered in blocks of 4 and the 1st and 3rd rows are extracted from each block. If the file is 8 rows long, these rows are extracted: 1 (block 1), 3 (block 1), 5 (block 2), 7 (block 2).

./LFE32 -M extract2 -file -columns -rows -delim -out

-file  = input file name. REQUIRED.
-columns  = columns to be extracted. REQUIRED. Syntax: block size, "colon", column1, "comma", column2 ... etc. Example: 5:1,2,4
-rows  = rows to be extracted. REQUIRED. Syntax: block size, "colon", row1, "comma", row2 ... etc. Example: 3:2,3
-delim  = file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Default = "tab".
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M extract2 -file myfile.txt -columns 4:1,2 -rows 2:2 -delim space -out results.txt
Here even-numbered rows are extracted and (starting from the first column) two neighboring columns separated by two non-selected columns are extracted.
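
The recycled block pattern can be illustrated with a few lines of Python. This is a sketch of the selection rule as described above, not LFE's implementation; the function names are hypothetical.

```python
def parse_pattern(spec):
    """Split "4:1,3" into (block size 4, positions {1, 3})."""
    size, _, picks = spec.partition(":")
    return int(size), {int(p) for p in picks.split(",")}


def select(count, spec):
    """Return the 1-based indices in 1..count chosen by the recycled pattern."""
    size, picks = parse_pattern(spec)
    # (i - 1) % size + 1 is the position of index i inside its block.
    return [i for i in range(1, count + 1) if (i - 1) % size + 1 in picks]


rows_a = select(8, "4:1,3")   # the 8-row example from the description
rows_b = select(8, "2:2")     # every even-numbered row
cols = select(8, "4:1,2")     # two neighboring columns, then two skipped
```

The same rule applies to rows and columns; only the count (number of rows, or number of columns in a row) differs.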


EXTRACTBYCOLTYPE

Extracts rows based on data type in the column. Designed to extract only numbers or only text. Can treat NA in two different ways (as numbers or text).

./LFE32 -M extractbycoltype -file -column -columns -treat-missing-as -in-or-out -delim -header -missing -out

-file  = input file name. REQUIRED.
-column  = column to be considered. REQUIRED. Syntax: any legal integer. Note: Either '-column' or '-columns' needs to be specified; if both are set then '-columns' prevails.
-columns  = columns to be considered. REQUIRED. Example: 1,3-5 (this selects columns 1,3,4,5). Note: '-column 0' will select all columns.
-treat-missing-as  = how to treat the missing value (see below). Options: "text", "number". Default: "text". Note: if the missing value is treated as "text" it is considered truly missing (status=NO); if it is treated as "number" its status is YES.
-in-or-out  = if defined columns are extracted or removed. Options: "in", "out". Default = "in". Note: "in" means that numbers are left in and text is removed; if missing was defined as a number, it is left in, otherwise it is taken out.
-delim  = file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Note: As always, if a symbol used here interferes with the command line, use it with a backslash (e.g. \/s). Default = "tab".
-header  = input file header. Options: "yes", "no". Default = "no".
-missing = used to indicate the missing values. Options: a) any symbol or word without spaces, b) "none". Default = "NA".
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M extractbycoltype -file f.txt -columns 1,3-5 -treat-missing-as text -in-or-out in -delim space -header yes -missing na -out results.txt
Here columns 1,3,4,5 are considered. If they (all of them) contain numbers and not text or missing (none of them contains text or missing) they are extracted.
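
The row test described above might look roughly like this in Python. It is a sketch under stated assumptions, not LFE's code: what counts as a "number" is approximated with Python's float() (which also accepts strings such as "nan" and "inf"), and the meaning of "out" as the exact opposite of "in" is an assumption.

```python
def is_number(text):
    """Approximate numeric check; Python's float() is an assumption here."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def keep_row(fields, columns=None, missing="NA",
             treat_missing_as="text", direction="in"):
    """True if the row should be written to the output."""
    cells = fields if not columns else [fields[c - 1] for c in columns]

    def numeric(value):
        if value == missing:
            return treat_missing_as == "number"
        return is_number(value)

    all_numeric = all(numeric(c) for c in cells)
    # Assumption: "out" simply inverts the "in" decision.
    return all_numeric if direction == "in" else not all_numeric


row = ["1", "label", "NA", "3", "4e2"]
as_text = keep_row(row, [1, 3, 4, 5], treat_missing_as="text")
as_number = keep_row(row, [1, 3, 4, 5], treat_missing_as="number")
```

Column 2 holds text but is not among the considered columns, so only the "NA" in column 3 decides the outcome.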


EXTRACTBYHEADER

This function extracts numerical entries based on sub-headers in the file. The sub-headers are treated as top level parameters that must have a certain value/structure in order to consider anything under them for selection/extraction.

./LFE32 -M extractbyheader -file -column -delim -range -id -iddelim -idcolumn -headertag -headertagcolumn -out

-file  = input file name. REQUIRED.
-column  = column where the main values are. Default: 1.
-delim  = delimiter of the part of the file where the main values are. Options: a) "tab", "space", "semicolon", "colon", "comma", b) any symbol or word without spaces. Default = "tab".
-range  = range of values to select (column is specified by '-column'). REQUIRED. Format: value1-value2. Default: 0-0.
-id  = an ID located in the sub-header that must be present for any selection to take place. REQUIRED.
-iddelim  = the sub-header delimiter. Options: a) "tab", "space", "semicolon", "colon", "comma", b) any symbol or word without spaces. Default = "space".
-idcolumn  = the column number of the sub-header that contains the id that needs to be present for any selections. Default: 2.
-headertag  = text used to recognize what rows correspond to subheaders. REQUIRED.
-headertagcolumn  = sub-header's column number where headertag is located. REQUIRED. Default: 1.
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M extractbyheader -file extractbyheader_ex.txt -column 1 -delim tab -range 10010-10030 -id chrom=2 -iddelim space -idcolumn 2 -headertag sample -headertagcolumn 1 -out results.txt
Sub-headers are recognized by "sample" (-headertag sample) in the first column of the sub-header (-headertagcolumn 1). Only the text following the sub-headers that contain "chrom=2" (-id chrom=2) in the 2nd column (-idcolumn 2) of the sub-header (which is space delimited (-iddelim space)) is considered for selection. Finally only the rows that have values of the first column (-column 1) between 10010 and 10030 (-range 10010-10030) are selected for output.

Here's the sample file used:
sample chrom=1 span=5
10001 1
10011 1
10021 1
10031 1
10041 1
sample chrom=2 span=5
10001 2
10011 2
10021 2
10031 2
10041 2
sample chrom=3 span=5
10001 3
10011 3
10021 3
10031 3
10041 3

The output file is:
10011 2
10021 2
10031 2


EXTRACTE

This function extracts rows from a tabular file based on a threshold value. Only the rows whose value is smaller than, larger than, or equal to the threshold (depending on '-direction') are extracted. The threshold value should be represented using the e-notation (scientific notation). This function also works with non-e threshold values but then it has no advantage over the other row extraction functions which may execute faster and be more flexible (EXTRACTIF). It is important that all entries in the selected column are numbers. This function is not limited by the magnitude of the number; it can handle very small and very large values alike. If you need to extract rows with missing values (either include or exclude), consider using the LFE function LISTSELECTION.

./LFE32 -M extractE -file -missing -header -e -direction -column -delim -out

-file  = input file name. REQUIRED.
-e  = threshold value using the e-notation. REQUIRED. Notes: also non-e values are accepted.
-missing  = used to indicate the missing values. Options: a) any symbol or word without spaces, b) "none". Default = "NA". Note: "none" means that the missing entry fields are empty. The rows with missing values in the selected column are never extracted.
-header  = input file header. Options: "yes", "no". Default = "no".
-direction  = whether the extracted rows contain a value <=, >= or equal to the threshold value. Options: "smaller", "larger", "equal", "equalE". Notes: choosing "equalE" will extract the rows whose value has the same power (the part after the e) as the reference and the same sign as the reference. Default = "equal".
-column  = column name. Options: valid column number. Default = "1".
-delim  = delimiter. Options: a) "tab", "space", b) any symbol or word without spaces. Default = "tab".
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M extractE -file myfile.txt -header yes -missing NOVALUE -e -2.3e-16 -direction smaller -column 2 -delim semicolon -out results.txt
The rows that contain a value smaller than -2.3e-16 in the second column are extracted and written into results.txt. The delimiter of the input file is semi-colon and the file has a header. Missing value is defined as "NOVALUE".
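
The four '-direction' options can be sketched in Python with the decimal module, which compares numbers of any magnitude exactly. This is an illustration only, not LFE's code: the function name keep is hypothetical, "smaller" and "larger" are taken as <= and >= following the '-direction' description, and "equalE" is assumed to compare the normalized scientific exponent rather than the literal text after the e.

```python
from decimal import Decimal


def keep(value, threshold, direction="equal"):
    """Decide whether a row's value passes the threshold test."""
    v, t = Decimal(value), Decimal(threshold)
    if direction == "smaller":
        return v <= t
    if direction == "larger":
        return v >= t
    if direction == "equalE":
        # Same power of ten (normalized) and same sign.
        return v.adjusted() == t.adjusted() and v.is_signed() == t.is_signed()
    return v == t


more_negative = keep("-5e-16", "-2.3e-16", "smaller")
closer_to_zero = keep("-1e-300", "-2.3e-16", "smaller")
same_power = keep("-9.9e-16", "-2.3e-16", "equalE")
```

Note that -1e-300 is larger than -2.3e-16 because it is closer to zero, so it does not pass the "smaller" test.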


EXTRACTID

This function selects or deselects rows from a file that contain a certain keyword or certain entry in specified columns (or any of the columns). The user can also split the input file (divide it into two). This means that the selected rows will go into one output file (-out) while the rest of the rows will go into another (-out2). Optionally stops after finding the first entry that qualifies. Can use gzip'd files. See SPLITBYID if you need to sort rows into separate files based on the ID.

./LFE32 -M extractid -file -gzip -column -header -header-size -is -contains -dir -delim -newdelim -out -out2 -find-only-first

-file  = input file name. REQUIRED.
-gzip  = force the file to be treated as gzip'd file. If your file does not end with '.gz' but it is gzip'd then use '-gzip yes' to force it.
-column  = column name. Options: valid column number, 0 (means all columns) Default = "0".
-header  = input file header. Options: "yes", "no". Default = "no".
-header-size  = input file header size. Options: any integer. Default = 1 if header defined by '-header', 0 if header not defined.
-is  = what exact match text is searched for. Options: any text.
-contains  = what partial match text is searched for. Options: any text. Note: when this switch is used together with '-is', the '-is' switch takes priority.
-dir  = whether to include or exclude rows that were found to match by the '-is' or '-contains' switches, or to split the file based on the keyword. Options: include/in (default), exclude/out, split/divide.
-delim  = file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Note: As always, if a symbol used here interferes with the command line, use it with a backslash (e.g. \/s). Default = "tab".
-newdelim  = new file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Default = "tab". Do not use this flag if you don't want to change file delimiter.
-out  = output file name. Default = input file name + ".out".
-out2  = output2 file name (used only if the split option is selected). Default = output file name + ".out2".
-find-only-first  = a way to stop search after the first matching entry is found. Options: "yes", "no". Default: "no". Note: The behavior is obvious with the '-dir include' option but may be undesirable with the other options. Just remember that it always terminates the search after the first row is written into the '-out' file.

Example:
./LFE32 -M extractid -file myfile.gzipped -gzip yes -column 0 -header yes -contains auto -dir in -delim space -out results.txt
Here all rows that contain the consecutive letters "auto" (case sensitive) in any of the columns are written into the output file together with the header. The file is gzipped.
Example:
./LFE32 -M extractid -file myfile.txt -column 1 -header no -is car -dir split -delim space -newdelim tab -out results.txt -out2 results2.txt
Here all rows that have "car" in the first column are put into results.txt while the other rows are put into results2.txt. The new delimiter is tab.
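
The difference between exact match, partial match, and the three '-dir' modes can be sketched in Python. This mirrors the description above and is not LFE's implementation; the function and parameter names are hypothetical, and headers, gzip input and '-find-only-first' are left out.

```python
def matches(fields, column=0, is_text=None, contains=None):
    """True if the chosen column (0 = any column) matches; exact match wins."""
    cells = fields if column == 0 else fields[column - 1:column]
    if is_text is not None:
        return any(cell == is_text for cell in cells)
    if contains is not None:
        return any(contains in cell for cell in cells)
    return False


def extract_id(lines, delim="\t", direction="include", **criteria):
    """Return (rows for -out, rows for -out2); -out2 is only used by split."""
    hits, rest = [], []
    for line in lines:
        (hits if matches(line.split(delim), **criteria) else rest).append(line)
    if direction == "include":
        return hits, []
    if direction == "exclude":
        return rest, []
    return hits, rest  # split


lines = ["car red", "cart blue", "bus red"]
exact = extract_id(lines, " ", "split", column=1, is_text="car")
partial = extract_id(lines, " ", "include", column=0, contains="car")
```

An exact match on "car" leaves "cart blue" in the second output, while a partial match on "car" selects both rows.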


EXTRACTIF

This function extracts rows from a tabular file. You can use two IF statements to narrow down the options for extracting. All entries must be valid numbers except for the missing value, which can be specified and is excluded from the output. The header can be any size; one can also exclude rows before the header. This is a general function. If you have very small or very large scientific notation style numbers, you should use the LFE function EXTRACTE. If you need to extract rows with missing values (either include or exclude), consider using the LFE function LISTSELECTION or LISTSELECTION2. ATTN: This function is not yet available for the Windows version of LFE.

./LFE32 -M extractif -file -if1 -if2 -mode -missing -header -header-size -skip-first-rows -delim -out

-file  = input file name. REQUIRED.
-if1  = the first logical expression to set extraction limit. REQUIRED. Options: Each expression must contain: a) number (#) or column number (col#), b) operator "E" (equal), "L" (larger than), "S" (smaller than), c) number (#) or column number (col#); there must be NO SPACES. Examples: "col4L5" (value in column 4 must be larger than 5), "5Ecol7" (value in col 7 must equal 5), "col1Scol2" (the value in column 1 must be smaller than the value in column2). Default = no default.
-if2  = the second logical expression to set extraction limit. OPTIONAL. Options: Same as '-if1'. Notes: -if2 can be defined only if -if1 has been defined. Examples: "-if1 col1E3 -if2 col3Lcol4" (whether these two if statements are joined by AND or OR is defined by the -mode switch). Default = no default.
-mode  = select AND or OR statement to join -if1 and -if2. Options: "and", "or". Default = "and". Note: If the "or" option is selected, only one of the 'if' statements has to hold in order to extract a given row.
-missing  = used to indicate the missing values. Options: a) any symbol or word without spaces, b) "none". Default = "NA". Note: "none" means that the missing entry fields are empty.
-header  = input file header. Options: "yes", "no". Default = "no".
-header-size  = file header size (# of rows); use only if header is larger than 1 row; all header rows appear in the output file. Options: any integer. Default = 1.
-skip-first-rows  = allows to skip first rows. Options: any legal integer. Default = 0
-delim  = delimiter. Options: a) "tab", "space", "semicolon", "comma", b) any symbol or word without spaces. Default = "tab".
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M extractif -file myfile.txt -if1 col3E9 -mode or -if2 col7Scol8 -missing REJECT -header yes -skip-first-rows 3 -delim comma -out results.txt
Here two independent comparisons are made. The row is extracted if the value in column 3 equals 9 or when the value in column 7 is smaller than that of column 8. Nothing is selected when the missing value of "REJECT" is encountered by the expression that needs to return true for the extraction to take place. First 3 rows (those before the header) are ignored and not used for anything.
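
The expression syntax ("col4L5", "5Ecol7", "col1Scol2") and the AND/OR joining can be sketched in Python. This is only an illustration of the documented rules, not LFE's parser: the names are hypothetical, only plain decimal numbers are recognized, and treating a missing operand as a failed comparison is an assumption based on the example above.

```python
import re

# operand = "col<N>" or a plain number; operator = E, L or S; no spaces.
OPERAND = r"(col\d+|-?\d+(?:\.\d+)?)"
EXPR = re.compile(rf"^{OPERAND}([ELS]){OPERAND}$")
OPS = {"E": lambda a, b: a == b,
       "L": lambda a, b: a > b,
       "S": lambda a, b: a < b}


def evaluate(expr, fields, missing="NA"):
    """Evaluate one expression against a row; a missing operand gives False."""
    m = EXPR.match(expr)
    if m is None:
        raise ValueError(f"bad expression: {expr}")
    left, op, right = m.groups()
    operands = []
    for token in (left, right):
        value = fields[int(token[3:]) - 1] if token.startswith("col") else token
        if value == missing:
            return False
        operands.append(float(value))
    return OPS[op](*operands)


def keep_row(fields, if1, if2=None, mode="and", missing="NA"):
    """Join one or two expressions with AND or OR."""
    first = evaluate(if1, fields, missing)
    if if2 is None:
        return first
    second = evaluate(if2, fields, missing)
    return (first and second) if mode == "and" else (first or second)


row = ["0", "0", "1", "0", "0", "0", "2", "3"]
by_or = keep_row(row, "col3E9", "col7Scol8", mode="or")
by_and = keep_row(row, "col3E9", "col7Scol8", mode="and")
```

For this row only the second comparison holds (2 is smaller than 3), so the row is kept in "or" mode and dropped in "and" mode.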


EXTRACTCOLUMNS

This function extracts (or removes) columns from a tabular file based on the ranges specified. Unlimited number of ranges can be specified. The order of columns in the output file is the same as in the input file, regardless of how the ranges were specified. The new LFE function EXTRACT includes all EXTRACTCOLUMNS functions. If you want to change column order or multiply columns use ORGANIZECOLUMNS or SNAP as these are more powerful functions.

./LFE32 -M extractcolumns -file -columns -delim -skip-first-rows -in-or-out -out

-file  = input file name. REQUIRED.
-columns  = columns to be extracted. REQUIRED. Notes: columns should be separated by commas and ranges by dashes (see the example below); there should be no spaces.
-delim  = delimiter. Options: a) "tab", "space", "semicolon", b) any symbol or word without spaces. Default = "tab".
-skip-first-rows  = allows to skip first rows. Options: any legal integer. Default = 0
-in-or-out  = if defined columns are extracted or removed. Options: "in", "out". Default = "in".
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M extractcolumns -file myfile.txt -columns 9,4-6,1 -delim space -out results.txt
Columns 1, 4, 5, 6, and 9 are extracted from myfile.txt.


EXTRACTROWS

This function extracts rows from a tabular file based on the ranges specified. Unlimited number of ranges can be specified. The order of rows in the output file is the same as in the input file, regardless of how the ranges were specified. The new LFE function EXTRACT includes all EXTRACTROWS functions.

./LFE32 -M extractrows -file -rows -indexes -indexdelim -out

-file  = input file name. REQUIRED.
-rows  = rows to be extracted. REQUIRED. Notes: rows should be separated by commas and ranges by dashes (see the example below); there should be no spaces. Header (if your file has one) is considered a normal row (row=1).
-indexes  = should the extracted rows be numbered? Options: a) "yes", "no". Notes: choosing "yes" will print a row number in front of each row. Default = "no".
-indexdelim  = delimiter between the index and the row. Options: a) "tab", "space", b) any symbol or word without spaces. Default = "tab".
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M extractrows -file myfile.txt -rows 14,7-9,2 -indexes yes -indexdelim space -out results.txt
Rows 2, 7, 8, 9, and 14 are extracted from myfile.txt. Row numbers are printed before each row (and separated by space).


FLATTEN

This function converts table from the type "same ID in multiple rows" to "one ID per row". It also counts all parameters and reports them in a separate file. Additional info in a separate column can be used as tags that determine whether any given ID should be included or excluded. In other words: also subsets of files (only the selected rows) can be converted with this function (see the examples below). Please note: The IDs have to be sorted so that the same IDs are sequentially below each other in the ID column. You can use the LFE function ROWSORT to sort your table like that if your IDs are in the first column. (You can use the LFE function SWAPCOLUMNS to put the IDs in the first column.) If your input file format is not suitable for this function, you may want to check out the LFE function PREFLATTEN.

Input file (the first column is called "ID", the second column is called "parameter"):
car green
car green
car small
bus green
bus large

Output file1, showing the counts for all parameters found:
green 3
large 1
small 1

Output file2, the flattened version of the input file:
ID   green  large  small
car  2      0      1
bus  1      1      0

./LFE32 -M flatten -file -id-column -parameter-column -if-column -if -header -delim -stats-only -stats-show -out

-file  = input file name. REQUIRED.
-id-column  = column where the IDs are. REQUIRED. Notes: IDs are going to be the rows in the output file.
-parameter-column  = column where the parameters are. REQUIRED. Notes: parameters are going to be the columns in the output file.
-if-column  = column where the restriction tags are. Notes: these tags specify whether the particular row is going to be used by the program or ignored. The tag value is specified with the '-if' tag (see below).
-if  = tag value for the '-if-column'. Notes: any particular row in the input file is going to be used by the program if its tag is the same as specified by '-if'.
-header  = input file header. Options: "yes", "no". Default = "no".
-delim  = file delimiter. Options: a) "tab", "space", b) any symbol or word without spaces. Default = "tab".
-stats-only  = determines if only the parameter statistics file is produced or also the flattened file is produced. Options: "yes", "no". Default = "no". Notes: If "yes" is used, the flattened file is not produced. This option is useful because the flattened files tend to be very large and they should not be produced if only the parameter statistics are of interest.
-stats-show  = determines whether statistics are shown for all entries or only for the non-zero entries (compact presentation). Options: "all", "nonzero". Default = "nonzero". Notes: The option "all" should be chosen if the user also wants to see the parameters that occurred 0 times. This is useful when all reports have to be the same length. The zero entries can only be generated if some data are excluded from analysis by using the '-if-column/-if' switches.
-out  = output file name. Default = input file name + ".out". Notes: ".stats" is appended to the statistics file name, ".table" is appended to the flattened file name.

Example:
./LFE32 -M flatten -file myfile.txt -id-column 1 -parameter-column 2 -if-column 3 -if experiment -header yes -delim space -stats-show all -out results
In this example three columns are used from myfile.txt. Column 3 contains tags, and only the rows that contain the word "experiment" are used. The input file has a header (which is not used in the output). The stats file will also contain statistics for the rows that did not have "experiment" in the third column (-stats-show all).
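
The conversion shown in the car/bus tables above can be sketched in Python as follows. This is only an illustration of the idea, not LFE's implementation; the function name and the in-memory list of (ID, parameter) pairs are assumptions. Like FLATTEN, the sketch relies on identical IDs being adjacent, because it groups consecutive rows.

```python
from collections import Counter
from itertools import groupby


def flatten(rows):
    """rows: list of (id, parameter) pairs, with identical IDs adjacent."""
    # Statistics file: how often each parameter occurs overall.
    stats = Counter(param for _, param in rows)
    params = sorted(stats)
    # Flattened table: one row per ID, one column per parameter.
    table = [["ID"] + params]
    for row_id, group in groupby(rows, key=lambda r: r[0]):
        counts = Counter(param for _, param in group)
        table.append([row_id] + [counts[p] for p in params])
    return stats, table
```

If the IDs were not sorted, groupby would start a new group each time the ID changes, so the same ID would appear in several output rows. This is why sorting is required beforehand.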


FRAGMENTROWS

This function automatically divides a file into smaller parts containing a user-selected number of rows. The last chunk may be smaller than the selected chunk size if the original file size is not a multiple of the selected size. The new files are automatically given sequential index-containing names. All resulting files can have the same header (of unlimited rows) as the parent file. Also check out the LFE functions 'EXTRACTROWS' and 'EXTRACT' if you want to manually specify the rows extracted.

./LFE32 -M fragmentrows -file -fragmentsize -headersize -style -out

-file  = input file name. REQUIRED.
-fragmentsize  = the number of rows in each new file (excluding the header). Options: any positive integer. Default = 1.
-headersize  = the number of header rows (they will appear the same in each new file). Options: any non-negative integer. Default = 0.
-style  = how the new files are named. Options: "prepend", "append". Default = "prepend". Note: This determines whether the file indexes are at the beginning or the end of the file name.
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M fragmentrows -file myfile.txt -fragmentsize 1000 -headersize 5 -style append -out results.txt
The file myfile.txt is fragmented into chunks containing 1000 rows; the first 5 rows are considered a header and they appear in every file made. The numerical index in the file name indicating the chunk order is at the end of the file name.
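
The chunking logic can be sketched in a few lines of Python. This is an illustration under simple assumptions (the file is already read into a list of rows, and the hypothetical function returns the chunks instead of writing indexed files); it is not LFE's code.

```python
def fragment_rows(lines, fragmentsize=1, headersize=0):
    """Split rows into chunks of fragmentsize rows, repeating the header in each chunk."""
    header, body = lines[:headersize], lines[headersize:]
    # The last chunk is shorter when len(body) is not a multiple of fragmentsize.
    return [header + body[i:i + fragmentsize]
            for i in range(0, len(body), fragmentsize)]
```

For a file with 2 header rows and 5 data rows, a fragment size of 2 gives three chunks with 2, 2 and 1 data rows, each starting with the same 2 header rows.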


GROUPMEMBERSHIP (specialized)

This specialized function uses a file that has associations between rows and columns (e.g. individuals' height vs. individuals' membership in the various weight categories) and compares this with a reference value (for example a group ID) stored in another column. The reference column should contain IDs recoded as 0,1,2...n. The function returns statistics on how well the values fit with the reference column values. Please contact me for more info.

./LFE32 -M groupmembership -file -scan-col -ref-col -header -delim -missing -out
-file  = input file name. REQUIRED.
-scan-col  = columns where the main data are. REQUIRED. Options: any valid column numbers or ranges without spaces (Ex: 1,3-5,7 means that columns 1,3,4,5,7 contain data for scanning).
-ref-col  = column that contains the reference (Ex: group number) values. REQUIRED. Options: any valid column number.
-header  = input file header. Options: "yes", "no". Default = "no".
-delim  = file delimiter. Options: a) "tab", "space", b) any symbol or word without spaces. Default = "tab".
-missing  = what is used to indicate the missing values. Options: a) any symbol or word without spaces, b) "none". Default = "NA". Note: "none" means that the missing entry fields are empty.
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M groupmembership -file mydata.txt -scan-col 2-3 -ref-col 4 -header yes -missing .. -out summary.txt
Data of columns 2 and 3 are scanned against column 4. The summary file lists how many entries in columns 2 and 3 match with what value in column 4. The output is presented in 5 columns. Column 1 shows the reference column ID (group ID), column 2 shows the counts obtained from columns 2 and 3 (assuming integers), column 3 shows the maximal counts possible, column 4 is the % value (column 2 / column 3 * 100), column 5 shows the average value for each group (assuming floating point values).


HORIZAPPEND

This function appends columns from a second file to a file. It is related to MACROBLINDMERGE but does not use a separate reference file. Any columns can be selected from the first file, but only one or all columns from the second file. The two files must have the same number of rows (not checked beforehand). The column numbers defined for file 1 must not exceed its number of columns. Delimiters can be different for the two files and will be changed to the same one (as defined by the user).

./LFE32 -M horizappend -file1 -file2 -delim1 -delim2 -newdelim -allfrom1 -allfrom2 -columns1 -columns2 -out

-file1  = input file1 name. REQUIRED.
-file2  = input file2 name. REQUIRED.
-delim1  = file1 delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Note: As always, if a symbol used here interferes with the command line, escape it with a backslash (e.g. \/s). Default = "tab".
-delim2  = file2 delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Note: As always, if a symbol used here interferes with the command line, escape it with a backslash (e.g. \/s). Default = "tab".
-newdelim  = new file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Note: As always, if a symbol used here interferes with the command line, escape it with a backslash (e.g. \/s). Default = "tab".
-allfrom1  = use if all columns from file 1 are included; overrides '-columns1'. Options: "yes", "no". Default = "yes".
-allfrom2  = use if all columns from file 2 are included; overrides '-columns2'. Options: "yes", "no". Default = "yes".
-columns1  = what columns from file1 are used. Options: ranges, for example 1,7,9-13 (must be legal ranges). No default.
-columns2  = what column from file2 is used. Options: legal integer, or omit if '-allfrom2' is used. No default.
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M horizappend -file1 f1.txt -file2 f2.txt -delim1 space -delim2 _ -newdelim \. -columns1 1,4,7-9 -allfrom2 yes -out results.txt
All columns from f2.txt are appended to columns 1,4,7,8,9 from f1.txt. New delimiter is ".".
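
The column-wise append can be sketched in Python as follows. The function name and its arguments are hypothetical, and the rows are assumed to be already split into cells. Unlike LFE (which does not check the row counts beforehand), this sketch raises an error when the two inputs differ in length.

```python
def horiz_append(rows1, rows2, columns1=None, column2=None, newdelim="\t"):
    """rows1/rows2: lists of already-split rows; column numbers are 1-based.

    columns1=None means all columns of file 1; column2=None means all columns of file 2.
    """
    if len(rows1) != len(rows2):
        raise ValueError("both files must have the same number of rows")
    merged = []
    for left, right in zip(rows1, rows2):
        left_part = left if columns1 is None else [left[c - 1] for c in columns1]
        right_part = right if column2 is None else [right[column2 - 1]]
        # Both inputs are written with the same new delimiter.
        merged.append(newdelim.join(left_part + right_part))
    return merged
```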


IF2COLUMNSARESAME

This function extracts or removes rows from the file if the entries in two given columns are the same. NA is handled as a real entry.

./LFE32 -M if2columnsaresame -file -column1 -column2 -in-or-out -delim -header -out

-file  = input file name. REQUIRED.
-column1  = column 1 number. REQUIRED. Options: any legal integer. No default.
-column2  = column 2 number. REQUIRED. Options: any legal integer. No default.
-in-or-out  = include or exclude rows that meet the condition. Options: "in", "out". Default = "in". Notes: "in" includes rows, "out" excludes rows.
-delim  = file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Note: As always, if a symbol used here interferes with the command line, escape it with a backslash (e.g. \/s). Default = "tab".
-header  = input file header. Options: "yes", "no". Default = "no".
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M if2columnsaresame -file myfile.txt -column1 3 -column2 5 -in-or-out out -delim space -header yes -out results.txt
All rows that contain the same value in columns 3 and 5 are removed.
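
As a minimal Python sketch of the same test (hypothetical function name, rows already split into cells, entries compared as plain text so that "NA" behaves like any other entry):

```python
def if_2_columns_are_same(rows, column1, column2, in_or_out="in"):
    """Keep ("in") or drop ("out") rows whose two columns hold the same entry."""
    keep_same = in_or_out == "in"
    return [row for row in rows
            if (row[column1 - 1] == row[column2 - 1]) == keep_same]
```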


ITEMIZE

Converts values into categories (sequential integers) based on ranges given by the user. It can also code the categories as 0/1 in such a way that each category is in a separate column (one column per range item). Example: If values are 0.5 and 1.5 and ranges are 0-1 and 1.01-2 then the first value becomes 0 and the second one becomes 1. Option one will simply replace the original values. Option two will put 1 in the first column if the value is between 0 and 1 (otherwise the value is 0); and it will put 1 in the second column if the value is between 1.01 and 2 (otherwise the value is 0). Optionally type indices are created as the top row (showing the main category number - this number increments with each column as the column is broken into categories). Header handling is smart: the category borders are appended to the header entries. It is possible to write only the columns that were categorized (itemized) into the output file or alternatively all columns can be written into the new file. This function was created for itemset creation for association rule mining but it has many more uses. See also DIGITIZE.

./LFE32 -M itemize -file -columns -ranges -out -style -delim -missing -header -header-size -showheader -typeindex -showallcolumns

-file  = input file name. REQUIRED.
-columns  = columns to be considered. REQUIRED. Syntax: any legal integer range. Ranges are indicated by dashes, ranges are separated by commas. Example: 3,5-7,11-12 means that the following columns are selected: 3,5,6,7,11,12. Note: Illegal ranges are not allowed; values less than 1 or larger than maxcolumn are replaced by 1 and maxcolumn, respectively.
-ranges  = ranges (for categories) to be considered. REQUIRED. Syntax: any legal numeric range. Ranges are indicated by dashes, ranges are separated by commas. Example: 0-1,1.01-2 means that two ranges are selected.
-out  = output file name. Default = input file name + ".out".
-style  = how to create the categories. Options: a) "onecolumns" or "0", b) "flatten" or "1". Default = 0. Explanation: Option one will replace original values with categories (sequential integers starting with 0). Option two will create as many categories (new columns) for each entry as there were ranges defined. In this case coding is 0/1 ("0" if the entry did not belong to the given range and "1" if it did).
-delim  = file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Note: As always, if a symbol used here interferes with the command line, use with backlash (e.g. \/s). Default = "tab".
-missing = used to indicate the missing values. Options: a) any symbol or word without spaces, b) "none". Default = "NA". Note: Missing value is put everywhere where numbers cannot go, either due to missing info or formatting issues (such as encounter of non-numbers).
-header  = file header. Options: "yes", "no". Default = "no".
-header-size  = file header size (# of rows); use only if header is larger than 1 row; only the last row is used in the output file. Options: any integer. Default = 1.
-showheader  = if or not the header is written into output (if it exists). Options: "yes", "no". Default = "yes". Note: If '-style 1' is used the header will also show the categories.
-typeindex  = if or not type index row is created (after the header) and before the data. Options: "yes", "no". Default = "no". Note: This is useful only with the '-style 1' option. It shows the original column number (from among the columns that were selected with '-columns').
-showallcolumns  = if only selected (itemized) columns are written into the new file or also the untouched columns (in their original state). Options: "yes", "no". Default = "yes".

Example:
./LFE32 -M itemize -file input.txt -columns 2,4-7 -ranges 1.05-2.0,2.01-3.0 -out results.txt -style 1 -delim space -missing na -header yes -header-size 10 -showheader yes -typeindex yes -showallcolumns no
Here the itemization is done using the 0/1 coding. Only some columns are selected and the non-itemized columns are not shown. A total of two ranges are selected. The header is 10 rows long but only the 10th row is carried over to the output file. The type index row is generated after the header row (and before the data).
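
The two coding styles can be sketched in Python for a single value. This is an illustration, not LFE's implementation: the function name is hypothetical, the range borders are assumed to be inclusive, and a value that falls outside every range is assumed to be reported as missing in style 0.

```python
def itemize(value, ranges, style=0, missing="NA"):
    """ranges: list of (low, high) pairs, assumed inclusive on both ends."""
    try:
        number = float(value)
    except ValueError:
        number = None  # non-numbers are reported as missing
    hits = [number is not None and low <= number <= high for low, high in ranges]
    if style == 0:
        # One column: the index of the first matching range (0, 1, 2, ...).
        return hits.index(True) if True in hits else missing
    # Flattened: one 0/1 column per range.
    if number is None:
        return [missing] * len(ranges)
    return [int(h) for h in hits]
```

With the ranges 0-1 and 1.01-2, the value 0.5 becomes 0 in style 0 and [1, 0] in style 1, as in the description above.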


LISTAPPEND

This function merges an unlimited number of files together. It sequentially appends one file to the next according to a given list. The header is appended only once (and it can be fragmented from the original header block as defined by all the header flags).

./LFE32 -M listappend -list -header -header-size -header-1-size -out

-list  = list file name. REQUIRED. Note: This file should contain one valid file name per row. This list determines the order of how the files are merged together.
-header  = input file header. Options: "yes", "no". Default = "no". Note: If the "yes" option is selected, the header of the first file is written into the output, but the headers of the following files are simply removed.
-header-size  = how many rows is the total header. Options: any valid integer. Default = 1 (if header set to yes, otherwise 0). Note: don't use this flag if you don't have a header or header size is 1.
-header-1-size  = how many rows from the bottom of the total header block is the header that is going to be used as the new header in the out file. Options: any valid integer. Default = 1 (if header set to yes, otherwise 0). Note: don't use this flag if you don't have a header or header size is 1.
-out  = output file name. Default = list file name + ".out".

Example:
./LFE32 -M listappend -list list.txt -header yes -header-size 10 -header-1-size 1 -out mergedfile.txt
This example shows how to merge files (row-by-row) by providing a list of file names (list.txt). The header of the first file is retained but the headers of the subsequent files are removed before appending. Before merging, the first 10 rows are removed from each file and row number 10 (the bottom row of the header block) of the first file is used as the new header at the top of the new merged file.
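
The header handling can be sketched in Python as follows. The function name is hypothetical and the files are assumed to be already read into lists of rows; this is a sketch of the described behavior, not LFE's code.

```python
def list_append(files, header_size=0, header_1_size=0):
    """files: list of files (each a list of rows) in the order given by the list file."""
    merged = []
    if files and header_1_size:
        # Keep the bottom header_1_size rows of the first file's header block.
        merged.extend(files[0][header_size - header_1_size:header_size])
    for rows in files:
        # Drop the whole header block from every file before appending.
        merged.extend(rows[header_size:])
    return merged
```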


LISTSELECTION

This function extracts rows or IDs from a tabular file based on a list of IDs stored in a separate reference file (list). It can either include or exclude the rows/IDs that are specified in the list file. If you want to select rows based on two IDs, please use LFE function LISTSELECTION2.

./LFE32 -M listselection -file -list -header -column -delim -in-or-out -show -case -out

-file  = input file name. REQUIRED.
-list  = list file name. REQUIRED. Note: This file should contain one ID per row.
-header  = input file header. Options: "yes", "no". Default = "no".
-column  = column number. Options: a) valid column number, b) "all" (searches all columns). Default = "all".
-delim  = file delimiter. Options: a) "tab", "space", b) any symbol or word without spaces. Default = "tab".
-in-or-out  = include or exclude rows that contain a match in the reference list (-list). Options: "in", "out". Default = "in". Notes: "in" includes rows, "out" excludes rows.
-show  = whether to show the entire row or only the ID when a positive hit is found. Options: a) "row", b) "id". Default = "row". Notes: "row" is used to show the rows of the positive hits, "id" is used to show the IDs of the positive hits.
-case  = defines case sensitivity. Options: a) "yes", b) "no". Default = "yes". Notes: "yes" means case sensitive, "no" means case insensitive.
-out  = output file name. Default = input file name + ".out".

Example (maximal):
./LFE32 -M listselection -file myinput.txt -header yes -column all -delim space -list IDlist.txt -in-or-out out -show id -case no -out results.txt
This example selects rows from myinput.txt (which has a header) based on the IDs listed in IDlist.txt in a case-insensitive manner. The rows that contain any of the listed IDs in any of the columns are excluded from the selection (results.txt). The selection (results.txt) will contain only IDs, not rows.

Example (minimal):
./LFE32 -M listselection -file myinput.txt -list IDlist.txt
This example uses all the default options. The rows are included in a case-sensitive manner and reported in myinput.txt.out. There is no header, all columns are searched, the file is considered tab-delimited.
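
The selection logic can be sketched in Python as follows. The function name is hypothetical, rows are assumed to be already split into cells, and a hit is assumed to require a full match of a cell with a listed ID. The '-show' switch is left out to keep the sketch short.

```python
def list_selection(rows, ids, column="all", in_or_out="in", case=True):
    """Keep ("in") or drop ("out") rows that contain one of the listed IDs."""
    norm = (lambda s: s) if case else str.lower
    wanted = {norm(i) for i in ids}
    selected = []
    for row in rows:
        # Search one column (1-based) or all columns.
        cells = row if column == "all" else [row[column - 1]]
        hit = any(norm(cell) in wanted for cell in cells)
        if hit == (in_or_out == "in"):
            selected.append(row)
    return selected
```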


LISTSELECTION2

This function extracts rows or IDs from a tabular file based on a list of two IDs stored in a separate reference file (list). It can either include or exclude the rows/IDs that are specified in the list file. The compliance of each entry can be specified with AND/OR statements. Example: extract rows that contain ID1 or ID2. These IDs can be located in specific columns or any column. This function requires the presence of two IDs for each comparison. If you have simple list of IDs, please use the LFE function LISTSELECTION.

./LFE32 -M listselection2 -file -list -header -column1 -column2 -mode -delim -in-or-out -show -case -out

-file  = input file name. REQUIRED.
-list  = list file name. REQUIRED. Note: This file should contain two IDs per row (delimiter specified by the '-delim' switch, see below).
-header  = input file header. Options: "yes", "no". Default = "no".
-column1  = column number for the first ID of the list file. Options: a) valid column number, b) "all" (searches all columns). Default = "all".
-column2  = column number for the second ID of the list file. Options: a) valid column number, b) "all" (searches all columns). Default = "all".
-mode  = whether to treat the ID comparisons using the AND or OR statement. Options: "and", "or". Default = "and". Note: if "and" is used, you need both IDs to match for the "yes" answer; when "or" is used, you need only one. Whether the matches with the "yes" answer are going to be included or excluded is determined by the '-in-or-out' switch.
-delim  = file delimiter. Options: a) "tab", "space", b) any symbol or word without spaces. Default = "tab". Note: The delimiter of the input file and of the list file must be the same.
-in-or-out  = include or exclude rows that received the "yes" answer by the '-mode' switch (see above). Options: "in", "out". Default = "in". Notes: "in" includes rows, "out" excludes rows that received the "yes" answer.
-show  = whether to show the entire row or only the ID when a positive hit is found. Options: a) "row", b) "id". Default = "row". Notes: "row" is used to show the rows of the positive hits, "id" is used to show the IDs of the positive hits.
-case  = defines case sensitivity. Options: a) "yes", b) "no". Default = "yes". Notes: "yes" means case sensitive, "no" means case insensitive.
-out  = output file name. Default = input file name + ".out".

Example (maximal):
./LFE32 -M listselection2 -file myinput.txt -header yes -column1 3 -column2 all -mode or -delim space -list IDlist.txt -in-or-out out -show row -case no -out results.txt
This example selects rows from myinput.txt (which has a header) based on the two columns of IDs listed in IDlist.txt in a case-insensitive manner. The delimiter of myinput.txt and IDlist.txt is space. First each row is tested against the entries of the IDlist.txt. If the entry (ID) of the first column (of IDlist.txt) matches that of the third column of myinput.txt, or the entry of the second column (of IDlist.txt) matches the entry of any column of myinput.txt the verdict for the row under consideration is "yes". If the answer is "yes" the row is excluded because the '-in-or-out' switch is defined as "out". If the answer is "no", the entry is not excluded (it is included in the results.txt).
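
The "yes"/"no" verdict described above can be sketched in Python. The function names are hypothetical, matching is assumed to be case-sensitive and on whole cells, and a row is assumed to receive "yes" if any pair in the list file gives "yes".

```python
def pair_matches(row, pair, column1="all", column2="all", mode="and"):
    """Return True ("yes") if the row matches the ID pair under AND/OR logic."""
    def hit(value, column):
        cells = row if column == "all" else [row[column - 1]]
        return value in cells

    first, second = hit(pair[0], column1), hit(pair[1], column2)
    return (first and second) if mode == "and" else (first or second)


def list_selection2(rows, pairs, column1="all", column2="all",
                    mode="and", in_or_out="in"):
    keep_on_yes = in_or_out == "in"
    return [row for row in rows
            if any(pair_matches(row, p, column1, column2, mode)
                   for p in pairs) == keep_on_yes]
```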


MACROBLINDMERGE

This function merges tabular files vertically (column-wise) based on directions stored in a reference file. It does not check that the lengths of the files match or that the column IDs match. This function is useful when the quality of input files is tightly controlled. A similar function is HORIZAPPEND (does not use a reference file). For merging files by the ID numbers, please refer to SMARTMERGE.

./LFE32 -M macroblindmerge -ref -delim

-ref  = reference file name. REQUIRED. Notes: Must contain three filenames per row, separated by spaces: input file1, input file2, merged file.
-delim  = delimiter for the merged file. Options: a) "tab", "space", b) any symbol or word without spaces. Default = "tab".

Example to show how to automatically merge 4 files (file1-4.txt) vertically by using a reference file. This is achieved by creating temporary files. The final result is named "result.txt".

./LFE32 -M macroblindmerge -ref myreference.txt -delim space

myreference.txt:
file1.txt   file2.txt   temp1.txt
temp1.txt   file3.txt   temp2.txt
temp2.txt   file4.txt   result.txt
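
How the reference file chains the merges through temporary files can be sketched in Python. The function name is hypothetical and a dictionary of in-memory tables stands in for the files on disk. Like the "blind" merge, the sketch does not compare file lengths; zip simply stops at the shorter input.

```python
def macro_blind_merge(reference, tables, delim="\t"):
    """reference: list of (input1, input2, merged) names, processed top to bottom.

    tables: dict mapping a file name to its list of row strings.
    """
    for first, second, merged in reference:
        # Each merged result becomes available as an input for later rows.
        tables[merged] = [a + delim + b
                          for a, b in zip(tables[first], tables[second])]
    return tables
```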


MAKETABLE

This function merges subsequent rows into one if they have the same ID in the first column. Note that same ID's must be located sequentially in the file. Please also see 'rowstotable'.

./LFE32 -M maketable -file -out -delim -header

-file  = input file name. REQUIRED.
-out  = output file name. Default = input file name + ".out".
-delim  = file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Default = "tab".
-header  = input file header. Options: "yes", "no". Default = "no".

Example:
./LFE32 -M maketable -file mydata.txt -out results.txt -delim space -header no
In this example mydata.txt is converted into table where each ID of the first column occurs only once.


MERGETABLES

This function merges 2 or 3 tables into one by putting their columns with the same index next to one another in the new file. It also modifies the header names to make it obvious how the columns were merged.

./LFE32 -M mergetables -file1 -file2 -file3 -out -delim -header1 -header2 -header3

-file1 -file2 -file3  = input file names. At least two REQUIRED.
-out  = output file name. Default = input file name + ".out".
-delim  = file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Default = "tab".
-header1 -header2 -header3  = input file headers. Options: "yes", "no". Default = "no". Note: If all headers are present or missing, you can use "-headers yes" or "-headers no", respectively, to switch all headers on or off.

Example:
./LFE32 -M mergetables -file1 mydata1.txt -file2 mydata2.txt -file3 mydata4.txt -header1 yes -header2 yes -header3 yes -out results.
In this example three files are merged column-wise.


MULTIFILTER

This function allows filtering based on multiple (unlimited) parameters, using both the AND and OR logical operators. It can also deal with missing values by either considering them just as values or ignoring them.

./LFE32 -M multifilter -file -out -delim -missing -ignore-missing -logic -dir -header -filter

-file = file name. REQUIRED.
-out = output file name. Default = input file name + ".out".
-delim = file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", b) any symbol or word without spaces. Default = "tab".
-missing = used to indicate the missing values. Options: a) any symbol or word without spaces, b) "none". Default = "NA".
-ignore-missing = whether missing values (defined by '-missing') are ignored (always treated as OK) or not (treated as defined by the user, see below). Options: "yes", "no". Default: "no".
-logic = defines logical operator used for filtering. Options: "and", "or". Default: "and".
-dir = whether to include or exclude entries that met the criteria defined by '-logic' and '-filter'. Options: "include", "exclude". Default: "include".
-header = input file header. Options: "yes", "no". Default = "yes".
-filter = the main parameter that defines how to filter. The argument is composed of values and letters separated by ":" and ",". Each filter has the following structure: "name:parameter:value"; different filters are separated by ",". Name = what column to consider (this is an integer when header is not present or name when header is present). If the absolute value of the parameter is used, the name is preceded with "abs:" (see example below). Parameter = how to treat the values (options: "s" = smaller than, "se" or "es" = smaller than or equal to, "l" = larger than, "le" or "el" = larger than or equal to, "e" = equal, "ne" or "en" = not equal). Parameter values are case-insensitive. Value = the reference value against which all entries are compared.

Examples:
./LFE32 -M multifilter -file mydata.txt -out results.txt -delim tab -missing NA -ignore-missing yes -logic and -dir include -header yes -filter abs:Slope:l:0.01,P:se:5e-8
Data file has a header with column names. "NA" is the missing value and it is ignored (it passes all tests). All entries are included that have their absolute "Slope" value larger than 0.01 AND their "P" value smaller than or equal to 5e-8.

./LFE32 -M multifilter -file mydata.txt -out results.txt -delim tab -missing NA -ignore-missing yes -logic or -dir exclude -header yes -filter SEslope:le:1,P:l:5e-8
Data file has a header with column names. "NA" is the missing value and it is ignored (it passes all tests). All entries are excluded that have their SEslope value larger than or equal to 1 OR their "P" value larger than 5e-8.

./LFE32 -M multifilter -file mydata.txt -out results.txt -delim tab -missing NA -ignore-missing yes -logic or -dir include -header yes -filter Slope:l:0.5,Slope:s:-0.5
Data file has a header with column names. "NA" is the missing value and it is ignored (it passes all tests). All entries are included that have their slope value over 0.5 OR under -0.5. Note that each column can have more than one filter set upon it.

./LFE32 -M multifilter -file mydata.txt -out results.txt -delim tab -missing NA -ignore-missing yes -logic and -dir exclude -header no -filter abs:1:e:2,2:ne:5,2:l:2
Data table does not have a header, hence the column numbers must be used as identifiers instead of their names. "NA" is the missing value and it is ignored (it passes all tests). All entries are excluded that have their absolute value of column 1 equal to 2 AND the value in column 2 not equal to 5 AND the value in column 2 larger than 2.

./LFE32 -M multifilter -file mydata.txt -out results.txt -delim tab -missing NA -ignore-missing no -logic or -dir include -header no -filter 2:l:3,2:ne:NA
Data table does not have a header hence the column numbers must be used as identifiers instead of their names. "NA" is not ignored, it is treated as any other value. Here only the entries are included that have a value larger than 3 in column 2 OR don't have "NA" in column 2.
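
The '-filter' syntax can be illustrated with a Python sketch. This is not LFE's parser: the function names are hypothetical, a row is assumed to be a dictionary from column name (or column number as text) to a cell, and non-numeric cells are assumed to support only the "e"/"ne" comparisons (all other comparisons fail for them).

```python
import operator

# Comparison codes from the '-filter' description, mapped to Python operators.
OPS = {"s": operator.lt, "se": operator.le, "es": operator.le,
       "l": operator.gt, "le": operator.ge, "el": operator.ge,
       "e": operator.eq, "ne": operator.ne, "en": operator.ne}


def parse_filters(spec):
    """Parse e.g. "abs:Slope:l:0.01,P:se:5e-8" into filter tuples."""
    filters = []
    for item in spec.split(","):
        parts = item.split(":")
        use_abs = parts[0] == "abs"
        if use_abs:
            parts = parts[1:]
        name, code, value = parts
        filters.append((use_abs, name, OPS[code.lower()], value))
    return filters


def passes(row, filters, logic="and", missing="NA", ignore_missing=False):
    """Return True if the row meets the filters combined with AND or OR."""
    results = []
    for use_abs, name, op, value in filters:
        cell = row[name]
        if cell == missing and ignore_missing:
            results.append(True)  # ignored missing values pass every test
            continue
        try:
            left, right = float(cell), float(value)
        except ValueError:
            # Text (including a non-ignored "NA"): only equality tests apply.
            is_equality = op in (operator.eq, operator.ne)
            results.append(op(cell, value) if is_equality else False)
            continue
        results.append(op(abs(left) if use_abs else left, right))
    return all(results) if logic == "and" else any(results)
```

The '-dir' switch would then decide whether rows for which passes() is true are written to the output (include) or left out (exclude).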


ONECOLUMN

This function extracts or removes one column from a tabular file. The column can be specified by a column number or header name (ID). If both header ID and column number are specified, the header name takes priority. For extracting more columns, please refer to EXTRACT, EXTRACTCOLUMNS.

./LFE32 -M onecolumn -file -out -delim -column -headerID -dir

-file  = input file name. REQUIRED.
-out  = output file name. Default = input file name + ".out".
-delim  = file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Default = "tab".
-column  = column number. Options: valid column number. Default = 1.
-headerID  = header ID of the column. Options: valid ID. Note: This takes priority over the "-column" tag.
-dir  = direction of your action (extract or remove). Options: "yes" (extract), "in" (extract), "no" (remove), "out" (remove). Default = "in"/"yes".

Example:
./LFE32 -M onecolumn -file mydata.txt -out results.txt -delim space -headerID year -dir out
In this example the column that has the header "year" is removed from the file.


ORGANIZECOLUMNS

This function manipulates columns. You can define what columns you want to see, in what order and how many times in the new file. With this you can also create new columns and fill them with text or numbers.

./LFE32 -M organizecolumns -file -columns -delim -out

-file  = input file name. REQUIRED.
-columns  = array of numbers or symbols corresponding to new column order. REQUIRED. Notes: columns should be separated by commas and ranges by dashes (see the example below); there should be no spaces. All column numbers must be legal. If you want a new text column, simply add that text (no spaces). If you want several copies of each new text column use this syntax: "text:start-end". Example: cat:1-3, this will make three columns filled with "cat" (Attn! Even if you want just one copy of "cat" you still need to define a range, e.g. "cat:1-1").
-delim  = file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Default = "tab".
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M organizecolumns -file myfile.txt -columns 2,1,1,3-5,dog:1-2,3:1-1 -delim space -out results.txt
In the new file the new columns appear in this order: col2, col1, col1, col3, col4, col5, "dog", "dog", "3".
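
The '-columns' syntax can be sketched in Python for one row. The function name is hypothetical; the sketch assumes that every new text column is written as "text:start-end" and that every other item is a column number or a column range.

```python
def organize_columns(row, spec):
    """row: list of cells; spec: e.g. "2,1,1,3-5,dog:1-2,3:1-1"."""
    out = []
    for part in spec.split(","):
        if ":" in part:
            # New text column(s): "text:start-end" gives end-start+1 copies.
            text, copies = part.split(":")
            start, end = (int(n) for n in copies.split("-"))
            out.extend([text] * (end - start + 1))
        elif "-" in part:
            # Range of existing columns (1-based, inclusive).
            start, end = (int(n) for n in part.split("-"))
            out.extend(row[start - 1:end])
        else:
            out.append(row[int(part) - 1])
    return out
```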


PREFLATTEN

This function converts a file that has several entries in one of the columns into the type that has exactly one entry in that column. This is done at the expense of adding more rows to the table. This function was originally designed to convert a compressed style table into the input file for the LFE function FLATTEN. The examples will clarify this more.

Input file (the second column contains comma-separated parameters):
car green,fast,small peter
bus red,large james

Output file (every column contains just one parameter):
car green peter
car fast peter
car small peter
bus red james
bus large james

./LFE32 -M preflatten -file -column -delim1 -delim2 -header -out

-file  = input file name. REQUIRED.
-column  = column where the compressed parameters are. REQUIRED.
-delim1  = input file main delimiter. Options: a) "tab", "space", "semicolon", b) any symbol or word without spaces. Default = "tab".
-delim2  = input file compressed column delimiter. Options: a) "tab", "space", "semicolon", b) any symbol or word without spaces. Default = "tab".
-header  = input file header. Options: "yes", "no". Default = "no".
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M preflatten -file myfile.txt -column 3 -delim1 space -delim2 semicolon -header no -out results.txt
In this example the compressed data are located in column 3 and the delimiter inside this column is semicolon (while the main file delimiter is space). Note that results.txt is input for the LFE function FLATTEN; however, it needs to be checked that all the identical IDs are located sequentially below each other (see FLATTEN).
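
The expansion shown in the car/bus tables above can be sketched in Python. The function name is hypothetical and the rows are assumed to be already split by the main delimiter.

```python
def preflatten(rows, column, delim2=","):
    """Give each entry of the compressed column (1-based) its own row."""
    out = []
    for row in rows:
        for value in row[column - 1].split(delim2):
            # Copy the row, replacing the compressed cell with one entry.
            out.append(row[:column - 1] + [value] + row[column:])
    return out
```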


ROWSTOTABLE

This function converts a file of one column into a table with a specified number of columns. Rows are converted into N columns blockwise: the first row becomes the first column, the N-th row becomes the N-th column, the (N+1)-th row becomes the first column again (starting a new row of the table), and so on.

./LFE32 -M rowstotable -file -rowblock -delim -header -out

-file  = input file name. REQUIRED.
-delim  = delimiter of the input file. Options: a) "tab", "space", "comma", "semicolon", "colon", "slash", "bslash", "dash", "quote", "squote" b) any symbol or word without spaces. Default = "tab".
-header  = file header. Options: "yes", "no". Default = "no".
-rowblock  = how many rows will form the columns of the future table; the rows are converted into columns blockwise. Options: any integer. Default = no default.
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M rowstotable -file input.txt -rowblock 10 -delim space -header no -out results.txt
If input.txt contains 20 rows the new table is space delimited and contains 10 columns and 2 rows.
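
The blockwise conversion can be sketched in a few lines of Python. This is an illustration only, not LFE code; the function name rows_to_table is hypothetical, and keeping a shorter last row when the row count is not a multiple of the block size is an assumption.

```python
def rows_to_table(values, rowblock, delim=" "):
    """Join every `rowblock` consecutive input rows into one table row.

    Assumption: an incomplete final block is kept as a shorter row.
    """
    return [
        delim.join(values[i:i + rowblock])
        for i in range(0, len(values), rowblock)
    ]


values = [str(n) for n in range(1, 21)]  # 20 single-column rows
for line in rows_to_table(values, rowblock=10):
    print(line)
```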


SIMMATRIX

This function converts a file where entries are in the rows and observations are in the columns into a symmetrical matrix. Both symmetrical parts are calculated separately to reduce the memory usage (please note that this is achieved at the expense of the computational speed).

./LFE32 -M simmatrix -file -delim -col-header -row-header -missing -missing-replace -method -out

-file  = input file name. REQUIRED.
-delim  = delimiter of the input file. Options: a) "tab", "space", b) any symbol or word without spaces. Default = "tab".
-col-header  = input file columns header. Options: "yes", "no". Default = "no". Note: These are the observation IDs.
-row-header  = input file row header. Options: "yes", "no". Default = "no". Note: These are the entry IDs (e.g. individual IDs). These IDs are used as both the row and column IDs in the final similarity matrix.
-missing  = what is used to indicate the missing values. Options: a) any symbol or word without spaces, b) "none". Default = "NA". Note: "none" means that the missing entry fields are empty.
-missing-replace  = what number (0, -9 etc.) is used in the output file for missing value. Options: any valid number. Default = "0.0". Note: Only valid numbers are accepted.
-method  = metric used to measure differences. Options: "linear", "square". Default = "linear".
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M simmatrix -file mydata.txt -delim space -col-header yes -row-header yes -missing ABSENT -missing-replace -99 -method square -out table.txt
The input data file has individual IDs in separate rows and observations in the columns. It is converted into a similarity matrix so that -99 is used when missing data fields are encountered. The input file has both headers (both the rows and columns have identifiers in the first positions). The distance/similarity metric is the square of the differences.


SMARTMERGE (slow, unless "-match once" is used)

This function merges tabular files vertically (column-wise) by matching the IDs located in the first columns of the files. This means that the order of rows is retained. IDs of the first column are used for matching, the results file will contain only one copy of IDs (the IDs of the second file are not included in the outcome). Summary is automatically written into log files indicating what rows were not used and what rows were used more than once. This function can fill in missing data according to user specification. This function can also be used to add or remove columns or to sort rows (based on template). See the example box below for more info, or contact me to learn all the options of this function.

./LFE32 -M smartmerge -file1 -file2 -header1 -header2 -columns1 -columns2 -missing -delim -show -match -out

-file1  = input file 1 name. REQUIRED.
-file2  = input file 2 name. REQUIRED.
-header1  = input file 1 header. Options: "yes", "no". Default = "no".
-header2  = input file 2 header. Options: "yes", "no". Default = "no".
-columns1  = number of columns in file 1 including the ID column. Options: any positive integer above 0. Note: if this parameter is not defined, the column number is found automatically based on the first row. If the predefined column number is smaller than the real column number, file 1 is vertically truncated; if it is larger, the missing columns are filled with the '-missing' text.
-columns2  = number of columns in file 2 including the ID column. Options: any positive integer above 0. Note: if this parameter is not defined, the column number is found automatically based on the first row. If the predefined column number is smaller than the real column number, file 2 is vertically truncated; if it is larger, the missing columns are filled with the '-missing' text.
-missing  = what to write instead of missing values. Options: a) any symbol or word without spaces, b) "none". Default = "NA". Note: "none" means that the missing entries (empty entries) are going to remain empty.
-delim  = delimiter of the input files. Options: a) "tab", "space", b) any symbol or word without spaces. Default = "tab". Note: both input files should have the same delimiter.
-show  = whether to show rows that did not match. Options: "all", "matches". Default = "all". Notes: "all" prints all rows in the results file, "matches" prints only matched rows in the results file.
-match  = whether the file 2 IDs are recycled or only used once when matched with the file 1 IDs. Options: a) "always", b) "once". Note: "always" means that file 1 and file 2 IDs are matched whenever they are identical; "once" means that each file 2 ID can only be used once (it is not recycled once it has been used). ATTN: If you can use "once" (for example, you know beforehand that you have the same number of ID's in both files), this option can make matching much faster!
-out  = output file name. Default = input file 2 name + ".out".

Example:
./LFE32 -M smartmerge -file1 firstfile.txt -file2 secondfile.txt -missing ABSENT -show all -match always -out mergedfile.txt
Here all IDs are shown in the results file, both matched and unmatched. Whenever a missing value is encountered (no match), the word "ABSENT" is written. The file 2 IDs are always matched (they are never used up). The input files do not have headers or they are not specified; file column numbers are determined automatically based on the first row of the files.

firstfile.txt:
car red
bus yellow
boat green
secondfile.txt:
boat pink
bike black
car blue
mergedfile.txt:
car red blue
bus yellow ABSENT
boat green pink
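
The merge shown in the example files above can be reproduced with a minimal Python sketch. It is not the LFE implementation: the function name smartmerge and its parameters are hypothetical, it only covers '-match always' with space-delimited input, it writes no log files, and using the first occurrence of a repeated file 2 ID is an assumption.

```python
def smartmerge(rows1, rows2, missing="NA", show="all"):
    """Append file 2 columns to file 1 rows that share the same first-column ID.

    The row order of file 1 is kept; file 2 IDs are not copied to the output.
    """
    lookup = {}
    for row in rows2:
        key, *rest = row.split()
        lookup.setdefault(key, rest)  # assumption: first occurrence wins
    # Number of non-ID columns in file 2, taken from its first row.
    width = len(rows2[0].split()) - 1 if rows2 else 0

    merged = []
    for row in rows1:
        fields = row.split()
        extra = lookup.get(fields[0])
        if extra is None:
            if show == "matches":
                continue  # unmatched rows are skipped
            extra = [missing] * width
        merged.append(" ".join(fields + extra))
    return merged


first = ["car red", "bus yellow", "boat green"]
second = ["boat pink", "bike black", "car blue"]
for line in smartmerge(first, second, missing="ABSENT"):
    print(line)
```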


SMARTMERGE2 (fast for number-containing ID's)

This function does the same as Smartmerge. The differences are: a) it does not use the '-columns1', '-columns2' flags as the column number is found automatically, b) it introduces five new flags (see below), c) it requires that each file has a uniform number of columns throughout (and does not check that this is true!), d) it is up to 10x faster than Smartmerge if unique IDs contain numbers, e) the log file no longer reports how many times each ID was used during merging, f) the log2 file is only produced when matchonce==TRUE, g) the unique ID can now be located in any predefined column, h) either or both unique IDs can be removed from the results file during merging.

./LFE32 -M smartmerge2 -file1 -file2 -header1 -header2 -missing -delim -show -match -out -forkloc -id1 -id2 -first-id -second-id

Below are listed only the flags that are used in addition to Smartmerge (see above):
-forkloc  = position in the unique ID that contains numbers. Options: any integer. Default = 1. Note: This number should point to the most variable numeric position in the unique ID. It can point to a non-number location or be larger than the length of the name of the unique ID but then there is no speed gain relative to Smartmerge. Example: If unique ID is of the type "data934" and the position 6 is most variable among the IDs, it is a good idea to set forkloc to 6 ('-forkloc 6').
-id1  = column number of the first file that contains the unique IDs. Options: any integer. Default = 1.
-id2  = column number of the second file that contains the unique IDs. Options: any integer. Default = 1.
-first-id  = whether to show the unique IDs of the first file in the final merged file. Options: "show/yes","dontshow/no". Default = "show/yes".
-second-id  = whether to show the unique IDs of the second file in the final merged file. Options: "show/yes","dontshow/no". Default = "show/yes".

Example:
./LFE32 -M smartmerge2 -file1 first.txt -file2 second.txt -header1 yes -header2 no -missing na -delim space -show all -match always -out merged.txt -forkloc 5 -id1 2 -id2 3 -first-id yes -second-id no
Here only the unique IDs of the first file are shown in the results file. The first file has unique IDs in the 2nd column, the second file has them in the 3rd column. The most variable numeric position of the unique IDs is 5.


SNAP

This function can operate on normal or nested columns (columns that are internally using another delimiter than the main columns). You can: extract columns, delete columns, change column order, duplicate or multiply columns, split columns, fill in missing values. Note that EXTRACT or EXTRACTCOLUMNS cannot multiply columns or change their order. While less powerful the ORGANIZECOLUMNS function can also duplicate columns and change their order (and may be more intuitive to use in most cases). Please see examples below that demonstrate SNAP usage.

./LFE32 -M snap -file -column -snap-columns -delim -snap-delim1 -snap-delim2 -snap -missing -out

-file  = input file name. REQUIRED.
-column  = main column number. REQUIRED. Notes: Any valid integer.
-snap-columns  = column numbers (of the nested column) to define actions. REQUIRED. Notes: Any valid integers; columns should be separated by commas and ranges by dashes (see the example below); there should be no spaces.
-delim  = file main delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", "dot" b) any symbol or word without spaces. Default = "tab".
-snap-delim1  = nested column input delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", "dot" b) any symbol or word without spaces. Default = "tab".
-snap-delim2  = nested column output delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", "dot" b) any symbol or word without spaces. Default = "tab".
-snap  = how to treat columns that do not exist. Options: 0 (default; missing columns that are requested by -snap-columns are filled in with the missing value), 1 (missing columns are not displayed regardless of whether they are requested or not by -snap-columns).
-missing  = what is used to indicate the missing values. Options: any symbol or word without spaces. Default = "NA".
-out  = output file name. Default = input file name + ".out".

Example (reorganize nested columns):
./LFE32 -M snap -file myfile.txt -column 2 -snap-columns 2,1,3-6 -delim tab -snap-delim1 space -snap-delim2 space -snap 1 -out results.txt
The main table is tab-delimited, column 2 is the nested column. Sub-columns 1 and 2 of the 2nd column are swapped while the following columns (3-6) remain as they were (same order). This line "cat dog mouse rat" becomes "dog cat mouse rat". Note: Columns that are missing, are not created. If you were to use "-snap 0" then the missing columns would have been created and filled in with NA. The above line would become "dog cat mouse rat NA NA".
Example (duplicate and select, fill in missing):
./LFE32 -M snap -file myfile.txt -column 2 -snap-columns 7,7,100 -delim tab -snap-delim1 space -snap-delim2 space -snap 0 -out results.txt
The main table is tab-delimited, column 2 is the nested column. Sub-column 7 is duplicated and one copy of column 100 is added. The other sub-columns are deleted. Columns that are missing are filled in with the missing value NA.
Example (break nested column into the main table):
./LFE32 -M snap -file myfile.txt -column 2 -snap-columns 1-50 -delim tab -snap-delim1 space -snap-delim2 tab -snap 0 -out results.txt
The main table is tab-delimited, column 2 is the nested column and is space-delimited. The first 50 sub-columns of the main column become part of the main table (their delimiter becomes tab, just as in the main table). Columns that are chosen but are missing are filled in with the missing value NA.
NOTE:
If you want to work with the main columns and not nested columns all you need to do is set the main file delimiter to something that is not present in your table and work with column 1:
Example: -delim colon -column 1.
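
The "reorganize nested columns" example above can be illustrated with a small Python sketch. This is not LFE code; the function names parse_columns and snap and the parameter names are hypothetical stand-ins for the '-snap-columns', '-snap-delim1', '-snap-delim2', '-snap' and '-missing' flags.

```python
def parse_columns(spec):
    """Turn a spec such as '2,1,3-6' into the list [2, 1, 3, 4, 5, 6]."""
    cols = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cols.extend(range(int(lo), int(hi) + 1))
        else:
            cols.append(int(part))
    return cols


def snap(line, column, snap_columns, delim="\t", delim1=" ", delim2=" ",
         mode=0, missing="NA"):
    """Rebuild the nested column `column` (1-based) of one table row.

    mode 0: requested sub-columns that do not exist are filled with `missing`.
    mode 1: requested sub-columns that do not exist are left out.
    """
    fields = line.split(delim)
    nested = fields[column - 1].split(delim1)
    picked = []
    for idx in parse_columns(snap_columns):
        if idx <= len(nested):
            picked.append(nested[idx - 1])
        elif mode == 0:
            picked.append(missing)
    fields[column - 1] = delim2.join(picked)
    return delim.join(fields)


row = "id1\tcat dog mouse rat"
print(snap(row, 2, "2,1,3-6", mode=1))
print(snap(row, 2, "2,1,3-6", mode=0))
```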


SPLITBYID

This function sorts the rows of an input file into new files based on ID's in the specified column. The new file names are the same as the ID's that are searched for in the column. The ID list (-list) is given by the user. If an ID is found that is not specified in the ID list (-list), the corresponding row goes into the overflow file (-overflow). If only one ID is searched for, the EXTRACTID function might be an easier solution to use. To make the list file for this function, refer to DUPLICATES (the first columns of its *.summary file can be used as an all-inclusive list).

./LFE32 -M sortbyid -file -list -column -overflow -append -header -header-size -delim -rowchunk

-file  = input file name. REQUIRED.
-list  = list file name; contains the list of all ID's that are used for extraction; these become the new file names. REQUIRED.
-column  = column number where the ID is. Options: valid column number. Default = no default. REQUIRED.
-overflow  = overflow file name; this is the file where the rows that are not matched will go. Options: any file name; "no" (means that the overflow file is not created; nonmatched rows are simply ignored and lost). Default = input file name + ".overflow".
-append  = append text to file names (that would otherwise be simply the ID's). Options: any text. Default = no default.
-header  = input file header. Options: "yes", "no". Default = "no".
-header-size  = number of rows in the input file header. Options: any valid integer. Default = 1.
-delim  = file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Note: As always, if a symbol used here interferes with the command line, use it with a backslash (e.g. \/s). Default = "tab".
-rowchunk  = how many rows are processed before they are written into output files (larger values make the script faster). Options: any positive integer (limited by computer memory). Default = 10000.

Example:
./LFE32 -M sortbyid -file myfile -list mylist -column 1 -overflow overflow -append .txt -header yes -header-size 10 -delim dash
Here rows from "myfile" are sorted into new files by the ID in the first column, as defined by "mylist". The header is 10 rows long and is incorporated in each new file. Rows with ID's that are not matched are put in the file "overflow". The text ".txt" is appended to each file name.
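
The row-sorting logic can be sketched in Python as follows. This is an illustration, not LFE code: the function name split_by_id is hypothetical, the "files" are kept in memory as a dictionary instead of being written to disk, and header handling and row chunking are left out.

```python
from collections import defaultdict


def split_by_id(rows, ids, column, delim="\t", append="", overflow="overflow"):
    """Group rows by the ID found in `column` (1-based).

    Returns a dict mapping output file name -> list of rows.
    Rows whose ID is not in `ids` go to the overflow name.
    """
    wanted = set(ids)
    files = defaultdict(list)
    for row in rows:
        key = row.split(delim)[column - 1]
        name = key + append if key in wanted else overflow
        files[name].append(row)
    return dict(files)


rows = ["a-1", "b-2", "a-3", "c-4"]
for name, content in split_by_id(rows, ["a", "b"], column=1,
                                 delim="-", append=".txt").items():
    print(name, content)
```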


TEMPLATEFILTER

This function extracts rows from file1 if a user-given reference value is smaller/equal/larger than the value in a specified column of file2. Attn: This script is easy to modify to use a value from file1 instead of the reference value.

./LFE32 -M templatefilter -file -template -template-column -value -direction -column -delim1 -delim2 -delims -header1 -header2 -headers -out

-file  = input file name. REQUIRED.
-template  = template file name. REQUIRED.
-template-column  = column number of template file. REQUIRED.
-value  = the reference value against which the '-template-column' values are compared. REQUIRED.
-direction  = filtering direction. Options: "S" (smaller than), "E" (equal to), "L" (larger than). Default = "no".
-column  = main column number. REQUIRED. Notes: Any valid integer. Attn! It is not used at all in the current version but you need to include it as a dummy (e.g. '-column 1').
-delim1  = file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Note: As always, if a symbol used here interferes with the command line, use it with a backslash (e.g. \/s). Default = "tab".
-delim2  = template file delimiter. Options: a) "tab", "space", "comma", "semicolon", "colon", "bslash", "slash", "dash", "quote", "squote", b) any symbol or word without spaces. Note: As always, if a symbol used here interferes with the command line, use it with a backslash (e.g. \/s). Default = "tab".
-delims  = can be set the same way as '-delim1' or '-delim2' but sets both delimiters at once; overrides the other delimiter flags.
-header1  = input file header. Options: "yes", "no". Default = "no".
-header2  = template file header. Options: "yes", "no". Default = "no".
-headers  = can be set the same way as '-header1' or '-header2' but sets both headers at once; overrides the other header flags.
-out  = output file name. Default = input file name + ".out".

Example:
./LFE32 -M templatefilter -file f1.txt -template f2.txt -template-column 2 -value 0.5 -direction L -header1 yes -delims space -column 1
If the value in column '-template-column' of f2.txt is larger than 0.5, then the corresponding row is extracted from f1.txt. The flag '-column 1' is a dummy but it is required.
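
The filtering rule of the example above can be sketched in Python. This is not LFE code: the function name template_filter is hypothetical, headers are ignored, and two points are assumptions: that "corresponding row" means the row at the same position in both files, and that the direction is read as in the example (template value smaller than, equal to, or larger than the reference value).

```python
def template_filter(rows, template_rows, template_column, value, direction,
                    delim=" "):
    """Keep rows of file 1 whose same-position template row passes the test.

    direction: "S" (template value < value), "E" (== value), "L" (> value).
    `template_column` is 1-based.
    """
    tests = {
        "S": lambda x: x < value,
        "E": lambda x: x == value,
        "L": lambda x: x > value,
    }
    passes = tests[direction]
    kept = []
    for row, template_row in zip(rows, template_rows):
        if passes(float(template_row.split(delim)[template_column - 1])):
            kept.append(row)
    return kept


data = ["row1 x", "row2 y", "row3 z"]
template = ["a 0.2", "b 0.7", "c 0.5"]
print(template_filter(data, template, template_column=2, value=0.5,
                      direction="L"))
```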


RANGEMERGE (fast for number-containing ID's)

Merges two files by ID much like SMARTMERGE2, but does it loosely (allows non-exact matches within a specified range) based on values in specified columns. Merging can also be restricted to rows that have a certain ID in a specified column. SMARTMERGE2 flags apply (please see above, info not duplicated here) but there are several additional ones.

./LFE32 -M rangemerge -file1 -file2 -range -anchorcol1 -anchorcol2 -header1 -header2 -headers -missing -delim -show -match -forkloc -id1 -id2 -first-id -second-id -out

Below are listed only the flags that are used in addition to Smartmerge2 (see above):
-range  = what is the maximal difference in values for the two rows to merge. Options: any integer. Default = 0.
-anchorcol1  = file1 column that is used as an anchor (merging only done if matches the file2 column value as defined by '-anchorcol2'). Options: any legal integer, OPTIONAL. Default NA.
-anchorcol2  = file2 column that is used as an anchor (merging only done if matches the file1 column value as defined by '-anchorcol1'). Options: any legal integer, OPTIONAL. Default NA.
-headers  = can be set the same way as '-header1' or '-header2' but sets both headers at once; overrides the other header flags.

Example:
./LFE32 -M rangemerge -file1 f1.txt -file2 f2.txt -range 100 -anchorcol1 4 -anchorcol2 7 -headers yes -missing na -delim space -show all -match always -out merged.txt -forkloc 5 -id1 2 -id2 3 -first-id yes -second-id no
Files are matched using columns 2 and 3 of the respective files if their values differ by no more than 100 and both files have the same entries in columns 4 and 7, respectively.
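
The matching rule for a single pair of rows can be sketched in Python. This is an illustration only, not LFE code: the function name rows_match and its parameters are hypothetical, and treating '-range' as an inclusive maximum follows the flag description above but is still an assumption about the boundary case.

```python
def rows_match(row1, row2, id1, id2, max_diff, anchor1=None, anchor2=None,
               delim=" "):
    """Decide whether two rows would be merged under a range rule.

    id1/id2: 1-based columns holding the numeric values to compare.
    anchor1/anchor2: optional 1-based columns that must be identical.
    """
    f1, f2 = row1.split(delim), row2.split(delim)
    if anchor1 is not None and anchor2 is not None:
        if f1[anchor1 - 1] != f2[anchor2 - 1]:
            return False
    # Assumption: a difference exactly equal to max_diff still matches.
    return abs(float(f1[id1 - 1]) - float(f2[id2 - 1])) <= max_diff


a = "x 1000 y chr1"
b = "p q 1050 r s t chr1"
print(rows_match(a, b, id1=2, id2=3, max_diff=100, anchor1=4, anchor2=7))
```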

ToomasHaller.com 2017