Shell and Vim on Speech and Language Processing

UNIX Shell Cmds

Basic operations
.., ., ~ .. means the parent directory, . means the current directory, ~ means the home directory
ls . = ls; ls is list, which shows the names of the files in the current directory
ls ~ list the home directory
ls -l list the files and subfolders of this folder in detail (permissions, owner, size, modification time)
ll similar to ls -l (ll is an alias on many systems, not a standard command)
ls -l Documents/*.pdf list in detail all the pdf files in the Documents directory
ls -al show details of all files in the directory, including hidden ones (in the first column a dash marks a file and d marks a folder)
pwd print working directory: prints the full path of the directory you are in at the moment
cd, cd ~ go to the home directory (cd ./ stays in the current directory)
cd /…/…/…/… cd is change directory, which goes to the given directory path
cd /…/…/…/; ls goes to the given directory and then lists all the file names in it
cd - returns to the previous directory
cd .. go to the parent directory
cd ../../../../.. go back up n parent directories, one per ..
q quit
clear clear the terminal screen
↑ ↓ step back and forth through previous commands, so you can easily reuse repetitive commands
history view the command history, which persists even after shutting down the machine
echo print its arguments, like the PRINT operation in other languages
wc -l fish count the number of lines in the file named fish
wc -c fish count the number of bytes in the file named fish
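A small sketch of what the two wc commands above actually count, next to grep -c for counting only the lines that mention fish. The file content here is made up for illustration, and everything runs in a temporary directory.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create a three-line file named "fish" (hypothetical sample content).
printf 'one fish\ntwo fish\nred herring\n' > fish

# Reading from stdin (<) keeps the file name out of wc's output;
# tr strips the padding some wc versions add.
line_count=$(wc -l < fish | tr -d ' ')
byte_count=$(wc -c < fish | tr -d ' ')

# grep -c counts only the lines that contain the word "fish".
match_count=$(grep -c 'fish' fish)

echo "lines=$line_count bytes=$byte_count lines_with_fish=$match_count"
```

So wc describes the file called fish, while grep is the tool that looks for fish inside a file.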
File Operations
touch A create a new file named A
ls -l A view the details of file A
mv A B change the name of A to B
mv …/A.txt Documents/Books move the A text file from somewhere to the Books folder under Documents
mv …/*.txt Documents/Books move all text files from a location to the Books folder under Documents (leave the * unquoted so the shell expands it)
rm B delete file B permanently, bypassing the trash
rm -i B Ask before deleting B (recommended)
cat A.txt concatenate: prints the contents of one or more files one after another; here it prints out the txt file
more A.txt print out the contents of the A text file page by page; type "/pattern" to search forward for the text you are looking for
less A.txt like more, but you can use the up and down arrows to scroll through the text, or space to page forward (q quits)
source .bash_aliases run the commands in the .bash_aliases file in the current shell
nano A.txt open document A in the nano editor; after making changes, save with Ctrl + O (recent versions also accept Ctrl + S) and exit with Ctrl + X
find / -name "A" search the whole file system for files named "A"
find / -name "A" 2>/dev/null the same search, but error messages (e.g. permission denied) are discarded, so only the valid results are shown
grep E print the lines of the input that contain E (recommended)
grep E /A/B/C regular-expression search for E in the file at the specified location
grep 'user$' $ anchors the end of a line, so this matches all lines ending with user (keep the quotes: an unquoted $USER would be expanded by the shell to your user name)
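As a rough illustration of the end-of-line anchor and of 2>/dev/null, the sketch below searches a made-up names.txt in a temporary directory. The file name and its contents are assumptions for the example only.

```shell
demo_dir=$(mktemp -d)
printf 'root\nsuperuser\nuser\nusers\n' > "$demo_dir/names.txt"

# 'user$' matches lines that END with "user"; the single quotes stop the
# shell from treating $ as the start of a variable name.
ending_with_user=$(grep 'user$' "$demo_dir/names.txt")
echo "$ending_with_user"

# Search by name inside the demo directory only; 2>/dev/null throws away
# anything find writes to stderr (e.g. "Permission denied").
found=$(find "$demo_dir" -name 'names.txt' 2>/dev/null)
echo "$found"
```

Here "users" and "root" are not printed, because neither line ends in user.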
folder operations
mkdir A create a new folder A
mkdir A/C create subfolder C inside folder A (A must already exist; mkdir -p A/C creates both)
mv A B rename folder A to B
rmdir B remove folder B, which works only if B is empty; it is deleted permanently, bypassing the trash (not recommended)
rm -ir B delete folder B and everything in it, asking before each deletion (recommended)
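The difference between rmdir and rm -r is easiest to see on a folder that is not empty. This is a minimal sketch in a temporary directory; the names A, B, C follow the list above and notes.txt is made up.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

mkdir -p A/C          # -p also creates the parent A if it is missing
mv A B                # rename folder A to B (C moves with it)
touch B/C/notes.txt   # put a file inside so B/C is not empty

# rmdir refuses to remove a non-empty folder, so this fails harmlessly.
if rmdir B/C 2>/dev/null; then rmdir_result="removed"; else rmdir_result="refused"; fi

# rm -r removes the folder together with its contents (no trash, no undo).
rm -r B
if [ -e B ]; then rm_result="still there"; else rm_result="gone"; fi

echo "rmdir: $rmdir_result, rm -r: $rm_result"
```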
network operations
curl 'http://xiaos.site' curl = see URL; downloads the source code of the web page (often fails, e.g. when the site redirects)
curl -L 'http://xiaos.site' follow redirects, then download the source code of the web page (recommended)
curl -o robertzhangxiao.html -L 'http://xiaos.site' download the html from this site and save it directly to the named file
curl -L 'http://xiaos.site' | grep fish pipe (the vertical line) the downloaded page into grep to look for fish
variables
numbers='XXX' define a variable; no spaces around the equals sign
echo $numbers output the variable
echo $LINES x $COLUMNS output the terminal size: the row and column variables
echo $PATH output the PATH environment variable, the list of directories that are searched for programs
Shell scripts files whose names end in .sh
bin the directory that holds binary (executable) files
ls bin list all binary files; assume it outputs magic
bin/magic run the binary file called magic
PATH=$PATH:/Users/student/bin append bin to PATH; after this, typing just magic does the same
Note: not every sh file that runs on a Linux system will also run on macOS or Windows.
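To make the PATH step concrete, here is a sketch that builds a stand-in for magic in a temporary bin folder. The script body is invented for the example, and it assumes no other program called magic is already on your PATH.

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/bin"

# A tiny script standing in for the "magic" program (its body is made up).
printf '#!/bin/sh\necho "abracadabra"\n' > "$demo_dir/bin/magic"
chmod +x "$demo_dir/bin/magic"   # a script needs execute permission to run by name

# Run it by path, as with bin/magic.
by_path=$("$demo_dir/bin/magic")

# Append the directory to PATH; now the bare name is enough.
PATH=$PATH:$demo_dir/bin
by_name=$(magic)

echo "$by_path / $by_name"
```

The change to PATH lasts only for the current shell session unless it is added to a startup file.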
console
PS1='$' set the prompt to just $, which removes the header (user and host name)
alias ll='ls -la' make a long command shorter; afterwards just type ll
alias view all defined aliases
cp -r S0252/S0252_mic/* ./S0150/S0150_mic/

Copy all the data from the “S0252/S0252_mic/” directory to the “./S0150/S0150_mic/” directory. “-r” means recursive: subdirectories and their contents are copied as well.
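A minimal sketch of a recursive copy, using a made-up src and dest tree in a temporary directory rather than the real S0252 and S0150 folders:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical source tree: one file at the top and one in a subfolder.
mkdir -p src/sub dest
echo 'top' > src/a.txt
echo 'nested' > src/sub/b.txt

# -r makes cp descend into src/sub; without it, cp skips directories.
cp -r src/* ./dest/

copied=$(find dest -type f | sort)
echo "$copied"
```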

head ...txt

Show just the first few lines of the text file (10 by default).

wc -l ...txt

Count how many lines the txt file has.

du -sh

Show the total size of the current directory in human-readable units.

du -h --max-depth=1 ./

Show the size of each directory directly under the current directory (--max-depth is a GNU option; on BSD/macOS du use -d 1).

cat ...txt | tr '[:upper:]' '[:lower:]'

Translate the upper case letters in that file into lower case.

cat ...txt | tr '[:upper:]' '[:lower:]' | grep -o "[a-z]"

Print the document letter by letter, one letter per line.

a
d
c
b

cat ...txt | tr '[:upper:]' '[:lower:]' | grep -o "[a-z]" | sort

Print the document letter by letter and sort the letters.

a
b
c
d

cat ...txt | tr '[:upper:]' '[:lower:]' | grep -o "[a-z]" | sort | uniq -c

Print how many times each letter occurs.

100 a
125 b
31 c
22 d

cat ...txt | tr '[:upper:]' '[:lower:]' | grep -o "[a-z]" | sort | uniq -c | sort -nr  # -n sorts numerically; the "r" in "nr" reverses the order, so the largest counts come first

Print how many times each letter occurs, ordered by frequency.

125 b
100 a
31 c
22 d
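The whole letter-frequency pipeline can be tried on a short string instead of a file. This is only a sketch: the sample text is made up, and printf stands in for cat ...txt.

```shell
# Hypothetical sample text; any text file could be used instead.
text='Banana Cab'

letter_counts=$(
  printf '%s\n' "$text" |
    tr '[:upper:]' '[:lower:]' |   # fold everything to lower case
    grep -o '[a-z]' |              # one letter per line
    sort |                         # put identical letters next to each other
    uniq -c |                      # count each run of identical letters
    sort -nr                       # numeric sort, largest count first
)

echo "$letter_counts"
```

The sort before uniq -c matters, because uniq only merges lines that are adjacent.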

Using egrep to read a column:

Suppose there is a .lab speech file, which is labelled as well:

Here the first column is the timing, the second is the frequency, and the third is the labelled data.

0.1213 123 y
0.1232 111 uw
0.2113 110 eh
.............

To read everything in the third column, we use egrep (equivalent to grep -E):

egrep -h -o "[a-z]{1,2}$" *.lab  # match one or two lower case letters; $ means they occur at the end of the line

This will print:

y
uw
eh

egrep -h -o "[a-z]{1,2}$" *.lab | sort | uniq -c | sort -nr

This prints each phone with its frequency, most frequent first:

121 y
120 uw
110 eh
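To try the phone count without real data, the sketch below writes a made-up sample.lab in the same three-column layout and runs the pipeline on it. It uses grep -E, which behaves like egrep; the numbers and labels are invented.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# A made-up label file: timing, frequency, phone label.
printf '0.1213 123 y\n0.1232 111 uw\n0.2113 110 eh\n0.2500 109 uw\n' > sample.lab

# -h drops file names, -o prints only the matched text,
# and $ pins the match to the end of each line.
phone_counts=$(grep -E -h -o '[a-z]{1,2}$' *.lab | sort | uniq -c | sort -nr)

echo "$phone_counts"
```

If a label could be longer than two letters, the {1,2} bound would cut it short, so it is worth checking the label set first.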

ls | wc -l

Count how many files and folders are in the current directory.

rm -rf ./*

Delete everything in the current directory, recursively. No warning will occur. (rm refuses to remove ./ itself, and ./* leaves hidden files in place.)

cat ./.../*.txt

Print the contents of all the .txt files in that directory.

cat ./.../*.txt > ./text

Write the contents of all the .txt files into the file named text, overwriting whatever it held before.
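A short sketch of the redirect, with a made-up notes folder in a temporary directory. It runs the command twice to show that > replaces the target file rather than appending to it.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

mkdir notes
echo 'first' > notes/a.txt
echo 'second' > notes/b.txt

# The shell expands *.txt in sorted order, so a.txt comes before b.txt.
cat ./notes/*.txt > ./text

# > truncates the target first: running it again replaces, not appends.
cat ./notes/*.txt > ./text

merged_lines=$(wc -l < ./text | tr -d ' ')
echo "$merged_lines"
```

To append instead of overwrite, >> can be used in place of >.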

python3 ./.../..py > ./text

Write what the .py script prints (its standard output) into the file named text.

file .wav:
Identify the type of the wav file (for audio this typically includes the encoding, channels, and sample rate).

Use mv to change the file name:

mv ./../../.py ./../../.py

We can use move (mv) to change the file’s name.

which ...

Check where … is, i.e. the full path of the executable that would run.

ll -lh

Check the sizes of all the files in human-readable units (this relies on the ll alias for ls -la).

If there is a space at the beginning of a file’s name, we can delete it:

sed 's|^ ||'

Adding a “_” in the middle of the file name, e.g. turning SPKID 09912 into SPKID_09912 (g means globally, so every space on the line is replaced):

sed 's| |_|g'

Or, replacing only the space that follows SPKID:

sed 's|SPKID |SPKID_|'
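sed only rewrites text, so to rename an actual file the new name has to be handed to mv. A minimal sketch, assuming a hypothetical file called " SPKID 09912" in a temporary directory:

```shell
demo_dir=$(mktemp -d)

# Hypothetical file name with a leading space and a space in the middle.
name=' SPKID 09912'
touch "$demo_dir/$name"

trimmed=$(printf '%s\n' "$name" | sed 's|^ ||')      # drop the leading space
joined=$(printf '%s\n' "$trimmed" | sed 's| |_|g')   # every remaining space becomes _

# Quote both names so the spaces are not split into separate arguments.
mv "$demo_dir/$name" "$demo_dir/$joined"

echo "$joined"
```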

Align two files line by line, joining each pair of lines with a space:

paste -d ' ' wav.scp wav_id > tmp.txt
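What paste produces is easier to see on two tiny files. The contents below are invented, since the real wav.scp and wav_id are not shown here.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical contents for the two files.
printf 'utt1 /path/a.wav\nutt2 /path/b.wav\n' > wav.scp
printf 'spk1\nspk2\n' > wav_id

# Line 1 of wav.scp is joined with line 1 of wav_id, and so on.
paste -d ' ' wav.scp wav_id > tmp.txt

first_line=$(head -n 1 tmp.txt)
echo "$first_line"
```

paste pairs lines purely by position, so both files need to be in the same order.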

Delete every line that contains a particular string with grep -v:

pip freeze | grep -v "@ the things you want to remove" > requirements.txt
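The sketch below imitates this with a hard-coded stand-in for the pip freeze output, so it runs without pip. The package lines are invented, and " @ " is just one possible string to filter on.

```shell
# Stand-in for `pip freeze` output (package names and versions are made up).
frozen='numpy==1.26.4
mypkg @ file:///tmp/build/mypkg
scipy==1.11.4'

# -v inverts the match: keep only the lines that do NOT contain " @ ".
kept=$(printf '%s\n' "$frozen" | grep -v ' @ ')

echo "$kept"
```

Note that grep -v drops whole lines, not just the matching words within them.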

For a more systematic, scripted way of preparing those files, we can do:

mkdir -p data/voxceleb1_train

# get all the .wav file paths, e.g. /data/voxceleb1/dev/id1231/...wav
# (the pattern is quoted so that find, not the shell, expands it)
find /data/voxceleb1/dev -name '*.wav' > data/voxceleb1_train/temp.lst

# generate the wav.scp, e.g. id1231 data/voxceleb1/dev/id1231/...wav
# 1st: split the whole path on "/" into the array a
# 2nd: split a[8] (the file name) on "." into the array b
# the indices 6, 7 and 8 depend on how deep the files sit below /
awk '{split($0, a, "/"); split(a[8], b, "."); print a[6]"-"a[7]"-"b[1], $1}' data/voxceleb1_train/temp.lst > data/voxceleb1_train/wav.scp
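To check which array index holds which path component, the awk step can be run on a single path. The path below is hypothetical and was chosen to have the depth that the indices 6, 7 and 8 assume; the speaker and session names are made up.

```shell
# Hypothetical path with the same depth the awk indices assume.
wav_path='/data/voxceleb1/dev/wav/id1231/abc123/00001.wav'

# split($0, a, "/") gives a[1]="" (the text before the leading /),
# a[2]="data", ... so here a[6] is the speaker, a[7] the session
# and a[8] the file name.
scp_line=$(printf '%s\n' "$wav_path" |
  awk '{split($0, a, "/"); split(a[8], b, "."); print a[6] "-" a[7] "-" b[1], $1}')

echo "$scp_line"
```

If the data lives at a different depth, the three indices shift by the same amount.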

# 1. Delete every ".wav"
# 2. If the first field is "1", print the second field; otherwise print the second and third
sed 's/\.wav//g' /data/the_text_we_need_to_handle.txt | awk '{if ($1 == "1") {print $2} else {print $2, $3}}' > processed.txt
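A quick way to sanity-check the sed and awk combination is to feed it two made-up lines, one whose first field is 1 and one whose first field is not. The three-column input is an assumption; the real file's layout is not shown here.

```shell
# Made-up three-column input.
input='1 a.wav x
2 b.wav y'

processed=$(printf '%s\n' "$input" |
  sed 's/\.wav//g' |                                             # strip every ".wav"
  awk '{ if ($1 == "1") { print $2 } else { print $2, $3 } }')   # == compares; a single = would assign

echo "$processed"
```

The first line yields only its second field, and the second line yields its second and third fields.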

Vim:

To the top:

gg

To the bottom:

G

vim name+tab:

auto-completes the file name

sort the lines of the file:

:sort

show line numbers (the number of the last line tells you how many lines there are):

:set number

delete one line:

dd

search for the “keyword” (type it without quotes, which would otherwise be matched literally):

/keyword

check the difference between two different files:

vimdiff A.txt B.txt

http://xiaos.site/2022/07/11/Shell-and-Vim-on-Speech-and-Language-Processing/
Author: Xiao Zhang
Posted on: July 11, 2022