Info
I often use the following arguments to perl:
- -e Executes the code given on the command line instead of reading a script file
- -n Wraps your code in a loop over the input lines, read via the diamond operator (the files named on the command line, or stdin). Each line is in $_ and nothing is printed unless you print it
- -p Like -n, but also prints $_ at the end of each iteration
Perl One-Liners
One-Liner: Count the number of times a specific character appears in each line
- This counts the number of quotation marks in each line and prints it
perl -ne '$cnt = tr/"//;print "$cnt\n"' inputFileName.txt
One-Liner: Add string to beginning of each line
- Adds string to each line, followed by tab
perl -pe 's/(.*)/string\t$1/' inFile > outFile
One-Liner: Add newline to end of each line
- Appends an extra newline to each line, i.e. double-spaces the file
perl -pe 's/$/\n/' all.sent.classOnly > all.sent.classOnly.sep
One-Liner: Print only some columns of a file
- Columns separated by a space
cut -d" " -f 1-4 fileWithLotsOfColumns.txt > fileWithOnlyFirst4Cols.txt
One-Liner: Print all columns except the first
cut -d" " -f 1 --complement filename > filename.
One-Liner: Replace a pattern with another one inside the file with backup
- Replace all occurrences of pattern1 (e.g. [0-9]) with pattern2
perl -p -i.bak -w -e 's/pattern1/pattern2/g' inputFile
One-Liner: Print only lines without uppercase letters
- Goes through the file and prints only the lines that contain no uppercase letters. With one word per line, this keeps only the all-lowercase words.
perl -ne 'print unless m/[A-Z]/' allWords.txt > allWordsOnlyLowercase.txt
One-Liner: Print one word per line
- Go through file, split line at each space and print words one per line.
perl -lne 'print join("\n", split(/ /, $_))' someText.txt > wordsPerLine.txt
One-Liner: Kill all screen sessions (no remorse)
- Since there's no screen command that would kill all screen sessions regardless of what they're doing, here's a perl one-liner that really kills ALL screen sessions without remorse.
screen -ls | perl -ne 'print "$1\n" if /^\s*(\d+)\./' | xargs kill -9
- The killall command may also do the job...
One-Liner: Return all unique words in a text document (divided by spaces), sorted by their counts (how often they appear)
- assuming no punctuation marks:
perl -ne 'print join("\n", split(/\s+/,$_));print("\n")' documents.txt > wordsOnePerLine.txt
cat wordsOnePerLine.txt | sort | uniq -c | sort -n > wordCountsSorted.txt
One-Liner: Delete all special characters
- or in other words, delete every character that is not a letter, white space or line end (replace with nothing)
perl -pe 's/[^a-zA-Z\s]+//g' text_withSpecial.txt > text_lettersOnly.txt
One-Liner: Lower case everything
perl -pe 'tr/A-Z/a-z/' textWithUpperCase.txt > textwithoutuppercase.txt
One-Liner Combination: Combine lower-casing with word counting and sorting
perl -pe 'tr/A-Z/a-z/' sentences.txt | perl -lne 'print join("\n", split(/ /, $_))' | sort | uniq -c | sort -n
One-Liner: Print only one column
- Print only the second column of the data when tabs are used as the separator
perl -lne '@F = split("\t", $_); print $F[1]' columnFileWithTabs.txt > justSecondColumn.txt
One-Liner: Print only text between tags
perl -ne 'while (m{<a>(.*?)</a>}g) { print "$1\n" }' textFile
- The same as a script, extended to multiline input:
- Extracts multiple multiline patterns between a start and an end tag
- Here, we want to extract everything between <parse> and </parse>.
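#!/usr/bin/perl -w
local $/;    # undefine the record separator so the whole file is read at once
open(DAT, "yourFile.xml") || die("Could not open file!");
my $content = <DAT>;
close(DAT);
# /s lets . match newlines, /g finds every match
while ($content =~ m/<parse>(.*?)<\/parse>/sg) {
    print "$1\n";
}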
One-Liner: Sort lines by their length
perl -e 'print sort {length $a <=> length $b} <>' textFile
One-Liner: Print second column, unless it contains a number
perl -lane 'print $F[1] unless $F[1] =~ m/[0-9]/' wordCounts.txt
One-Liner: Trim/collapse whitespace and replace newlines with something else
echo "The cat sat on the mat
asd sad das " | perl -ne 's/\n/ /; print $_; print(";")' | perl -ne 's/\s+/ /g; print $_'
One-Liner: Get the average of one column from certain lines
grep "another criterion" thisDataFile.txt | perl -ne '@F = split(",", $_); print "$F[29]\n";' | awk '{sum+=$1} END { print "Average = ",sum/NR}'
One-Liner: How to sort a file by a column
- Columns are separated by a space; we sort numerically (-n) by the 10th column only (-k10,10)
- The sort utility does the job here, no perl needed ;)
sort -t' ' -n -k10,10 eSet1_both.txt
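A small sketch of the same idea on made-up two-column data. Writing the key as -k2,2 restricts the comparison to the second column; a bare -k2 would extend the key to the end of the line.

```shell
# Invented data; sort numerically on the second column only
out=$(printf 'b 10\nc 2\na 9\n' | sort -t' ' -n -k2,2)
printf '%s\n' "$out"
```

Without -n the comparison would be textual, and 10 would sort before 2.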
One-Liner: Replace specific space but also copy a group of matches
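- Matches a group of digits at the beginning of a line, followed by a space and a quotation mark; the space is replaced by a tab and the digits are copied back via $1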
perl -p -i.bak -w -e 's/^([0-9]+) "/$1\t"/g' someFile.txt
More info
Sending multiple commands to screen session on different machines
#!/usr/bin/perl -w
# This script creates screen sessions, ssh's to machines and executes code on these machines.
# parameters: -s (start) -r (run) -q (quit)
# HowTo:
# 1) change the executed code, property folder and prefix to your values
# 2) select your machines
# 3) on the machine that should host the screen sessions, start them: ./clusterSubmitJobs.pl -s
# 4) send the code to the running sessions: ./clusterSubmitJobs.pl -r
# 5) once you're done and want to quit all your sessions: ./clusterSubmitJobs.pl -q
# author: richard socher.org
use strict;
use Getopt::Std;
use List::Util qw[min max];

my %options = ();
getopts("srq", \%options);
#------------------
# files to be considered
my $folder = '/folderWithInputFiles';
my $prefix = 'tests_';
my $ext = '.config';
# code to run with files
my $code = './runMyScript.sh -configFile ';
# deprecated by mstat
my @freemachines = ('machine1.yourPlace.edu', 'machine2.yourPlace.edu');
#-------------------
# glob pattern for the input files, e.g. /folderWithInputFiles/tests_*.config
my $full = $folder . '/' . $prefix . '*' . $ext;
print "Using files: $full \n";
my @files = glob("$full*");
my $numMachines = @freemachines;
my $numFiles = @files;
# one file per machine, so only as many jobs as the smaller of the two counts
my $minNum = min($numMachines, $numFiles);
for (my $i = 0; $i < $minNum; $i++) {
    if ($options{s}) {
        print "Creating screen session: $freemachines[$i] for \t $files[$i] \n";
        # start a detached session named after the machine, then ssh into it (\015 = Enter)
        system("screen -d -m -S $freemachines[$i]");
        system("screen -S $freemachines[$i] -p 0 -X stuff \"ssh $freemachines[$i]\015\"");
    }
    if ($options{r}) {
        print "run: screen -S $freemachines[$i] -p 0 -X stuff \"$code $files[$i]\"\n";
        system("screen -S $freemachines[$i] -p 0 -X stuff \"$code $files[$i]\015\"");
    }
    if ($options{q}) {
        print "screen -S $freemachines[$i] -p 0 -X stuff \"exit\"\n";
        # first exit leaves the ssh connection, second exit closes the screen session
        system("screen -S $freemachines[$i] -p 0 -X stuff \"exit\015\"");
        system("screen -S $freemachines[$i] -p 0 -X stuff \"exit\015\"");
    }
}
How to install new CPAN modules?
perl -MCPAN -e shell # go to CPAN install mode
install Bundle::CPAN # update CPAN
reload cpan
install Set::Scalar