When batch-copying files from S3 or counting files from the command line, we often want to exclude the placeholder files on S3 whose names end with _$folder$. How should the regular expression for that be written?
Shell Implementation
The following command counts the files under an S3 location, excluding the placeholder files whose names end with _$folder$:
aws s3 ls --recursive s3://my-s3-location/ | grep -v '.*_\$folder\$' | wc -l
Filtering with grep is relatively simple because grep has the -v (--invert-match) option: it inverts the match, so every line that matches the pattern is filtered out.
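To try the filter without touching a real bucket, the sketch below feeds grep a few made-up lines shaped like `aws s3 ls --recursive` output and counts what survives. The keys, sizes and timestamps are invented for illustration, and the pattern is anchored with a trailing $ so that only names ending in _$folder$ are dropped.

```shell
# Sample lines imitating `aws s3 ls --recursive` output (hypothetical keys).
# Single quotes keep the shell from expanding $folder.
listing='2023-12-05 10:00:00          0 data/part1_$folder$
2023-12-05 10:00:01       1024 data/part1/file-a.parquet
2023-12-05 10:00:02          0 data/part2_$folder$
2023-12-05 10:00:03       2048 data/part2/file-b.parquet'

# -v drops every line that matches. \$ is a literal dollar sign and the
# final unescaped $ anchors the match to the end of the line.
# tr strips the padding that some wc implementations add.
count=$(printf '%s\n' "$listing" | grep -v '_\$folder\$$' | wc -l | tr -d ' ')
echo "$count"
```

Only the two .parquet lines remain, so the count reflects real files rather than placeholders.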
Java implementation
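A Java program is a little harder, because the Java regex API has no "invert match" setting. The exclusion has to be expressed inside the pattern itself with a negative lookahead: ^(?!.*[_]\$folder\$$).*$ Here (?!...) rejects any string in which .*[_]\$folder\$$ could match, that is, any string ending in _$folder$, and the trailing .*$ matches everything else.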
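The lookahead can be sanity-checked from the shell before handing it to a Java tool. This is only a sketch: it assumes GNU grep built with PCRE support (the -P option), and it relies on PCRE and java.util.regex treating this particular pattern the same way. The sample keys are invented.

```shell
# The pattern from the text, single-quoted so the shell leaves $ and \ alone.
pattern='^(?!.*[_]\$folder\$$).*$'

# Hypothetical object keys, two of them placeholders.
keys='data/part1_$folder$
data/part1/file-a.parquet
data/part2_$folder$
data/part2/file-b.parquet'

# grep -P (PCRE) is a GNU grep feature and may be missing on other systems,
# so probe for it first.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # No -v here: the negative lookahead itself rejects the placeholders.
    kept=$(printf '%s\n' "$keys" | grep -P "$pattern")
    printf '%s\n' "$kept"
else
    echo "grep -P is not available here" >&2
fi
```

Note that no -v is involved: the pattern does the inverting on its own, which is the behavior a Java regex parameter needs.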
Take the s3-dist-cp command as an example. Its --srcPattern parameter is a Java regular expression that selects the files to copy. To exclude those annoying S3 placeholder files ending in _$folder$ from the copy, it should be written like this:
nohup s3-dist-cp \
    -=599 \
    --src=s3://my-hbase-snapshots/usertable-20231205 \
    --dest=hdfs://${SINK_CLUSTER_NAMENODES}:8020/user/hbase/ \
    --srcPattern='^(?!.*[_]\$folder\$$).*$' \
    --multipartUploadChunkSize=1024 &> & tail -f
Supplement:
Regular expression text filtering
grep text filter
By default, grep matches and prints its input line by line.
By default, a line matches as long as it contains the pattern anywhere.
grep -w matches whole words only, unlike the default substring match.
Digits, letters, and underscores all count as part of a word; any other character is a word separator.
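The difference between the default substring match and -w is easiest to see on a small sample. The lines below are made up for illustration.

```shell
# Five sample lines that all contain the substring "root".
text='root
root_user
chroot
the root account
root1'

# Plain grep: any line containing the substring "root" matches.
substring_hits=$(printf '%s\n' "$text" | grep -c 'root')

# grep -w: "root" must be bounded by non-word characters or line edges.
# Letters, digits and underscores are word characters, so root_user,
# chroot and root1 do not contain the word "root".
word_hits=$(printf '%s\n' "$text" | grep -cw 'root')

echo "$substring_hits $word_hits"
```

The first count covers all five lines, while -w keeps only the two lines where root stands alone as a word.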
grep -n PATTERN /etc/passwd
Prints the line number in front of each matching line (PATTERN stands for the search pattern).
AND and OR relationships with grep
1. AND: grep root /etc/passwd | grep shutdown
2. OR: grep -e root -e shutdown /etc/passwd
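The two forms can be compared on sample input instead of /etc/passwd. The lines below are invented so that the AND and OR results differ visibly.

```shell
# Made-up log-like lines for illustration.
lines='root logged in
shutdown requested by root
shutdown scheduled
daemon started'

# AND: chain two greps, so a line must contain both words.
and_hits=$(printf '%s\n' "$lines" | grep 'root' | grep -c 'shutdown')

# OR: several -e patterns, so a line needs to contain at least one of them.
or_hits=$(printf '%s\n' "$lines" | grep -c -e 'root' -e 'shutdown')

echo "$and_hits $or_hits"
```

Only one line contains both words, while three lines contain at least one of them.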
Regular expressions
1. Character Matching
. matches any single character. Inside [], . stands for a literal dot.
2. Number of matches
Number of occurrences of a character
* : the character preceding * may occur any number of times, including zero.
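A short sketch of the dot, the bracketed dot, and the star, using four invented sample words. Each pattern is anchored with ^ and $ so that the whole line has to fit.

```shell
# Sample words for illustration.
words='a.c
abc
ac
abbbc'

# . matches any single character, so both a.c and abc match.
any_char=$(printf '%s\n' "$words" | grep -c '^a.c$')

# Inside brackets the dot is literal, so only a.c matches.
literal_dot=$(printf '%s\n' "$words" | grep -c '^a[.]c$')

# b* means "zero or more b", so ac, abc and abbbc all match.
star=$(printf '%s\n' "$words" | grep -c '^ab*c$')

echo "$any_char $literal_dot $star"
```

Note that ac matches the starred pattern: zero occurrences of b are allowed.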
3. Location anchoring
Beginning of line: ^ matches only at the start of a line, not at the start of a string in the middle of the line.
End of line: $ matches only at the end of a line, not at the end of a string in the middle of the line.
Beginning of word: \<root matches root only at the left edge of a word.
End of word: root\> matches root only at the right edge of a word.
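The four anchors can be compared side by side on a few invented lines. The word anchors \< and \> are assumed to be available here; they are supported by GNU grep but are not part of every grep implementation.

```shell
# Sample lines for illustration.
lines='root at the start
ends with root
chroot in the middle
rooted device'

# ^root: the line itself must begin with root.
line_start=$(printf '%s\n' "$lines" | grep -c '^root')

# root$: the line itself must end with root.
line_end=$(printf '%s\n' "$lines" | grep -c 'root$')

# \<root: root must begin a word (chroot does not qualify).
word_start=$(printf '%s\n' "$lines" | grep -c '\<root')

# root\>: root must end a word (rooted does not qualify).
word_end=$(printf '%s\n' "$lines" | grep -c 'root\>')

echo "$line_start $line_end $word_start $word_end"
```

Combining both word anchors as \<root\> gives the same effect as grep -w root.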
4. Grouping
1. echo wangwangwangggww | grep "\(wang\)\{3\}"
2. Back-references: \1 refers to the text matched by the first \( \) group.
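Why the group matters can be seen by running the same input with and without the parentheses. The sketch uses grep -o, which prints only the matched part; -o is assumed to be available (it is in GNU and BSD grep, but it is not required by POSIX).

```shell
input='wangwangwangggww'

# With the group, \{3\} repeats the whole word "wang" three times.
grouped=$(echo "$input" | grep -o '\(wang\)\{3\}')

# Without the group, \{3\} applies only to the preceding g: "wan" + "ggg".
ungrouped=$(echo "$input" | grep -o 'wang\{3\}')

# A back-reference repeats whatever the group matched: "wang" then "wang".
if echo "$input" | grep -q '\(wang\)\1'; then backref=match; else backref=nomatch; fi

echo "$grouped $ungrouped $backref"
```

The grouped pattern picks out wangwangwang, while the ungrouped one picks out wanggg from the same input.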
The Difference Between Regular Expressions and Wildcards
A regular expression matches the contents of a file or a string on standard output, whereas a wildcard matches file names. The two operate on different objects.
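The sketch below contrasts the two in a throwaway directory. The file names and contents are invented; the point is that the same * character means different things to the shell and to grep.

```shell
# Work in a temporary directory so nothing real is touched.
dir=$(mktemp -d)
touch "$dir/ab.txt" "$dir/b.log"
printf 'a.txt\naaa\n' > "$dir/a.txt"

# Wildcard: the shell expands a*.txt against file NAMES before ls runs.
# Here * means "any run of characters".
name_matches=$(cd "$dir" && ls a*.txt | wc -l | tr -d ' ')

# Regular expression: grep tests ^a*$ against file CONTENTS, where *
# means "zero or more of the preceding character".
content_matches=$(grep -c '^a*$' "$dir/a.txt")

echo "$name_matches $content_matches"
rm -rf "$dir"
```

Two file names fit the wildcard, but only the line aaa fits the regular expression.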
Matching String Problems
When a command's output is piped to grep, the regular expression sees each output line as a whole string, including invisible whitespace characters.
Some commands pad their output with one or more spaces, others do not.
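A small sketch of how padding breaks an anchored pattern. The padded value is made up; it imitates the kind of space-padded count that some `wc -l` implementations print.

```shell
# A count with leading spaces, as some tools print it.
padded='       3'

# An anchored pattern that ignores the padding does not match.
if printf '%s\n' "$padded" | grep -q '^3$'; then strict=match; else strict=nomatch; fi

# Allowing optional leading spaces makes the same line match.
if printf '%s\n' "$padded" | grep -q '^ *3$'; then loose=match; else loose=nomatch; fi

echo "$strict $loose"
```

When an anchored pattern unexpectedly fails, leading or trailing whitespace in the input is worth checking first.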
1. In a basic regular expression, parentheses and braces must be escaped with a backslash: \( \) and \{ \}.
grep "^\(.*\):.*\1$" /etc/passwd
2. By default a pattern is matched scanning from the start of the line; if the pattern is anchored with $, the match is tied to the end of the line instead.
1. Start searching from the end
2. Start searching from the head
3. Examples of groupings
The first subgroup captures the string 7; the trailing [0-9]*\1 means the match must end in that same 7, which may be preceded by any number of digits.
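Both back-reference ideas above can be tried on sample input. The passwd-style lines are illustrative rather than read from a real system, and the pattern \(7\)[0-9]*\1$ is a hypothetical stand-in for the grouping example, chosen so that the first subgroup captures 7.

```shell
# Lines shaped like /etc/passwd entries (sample data).
passwd_like='root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
halt:x:7:0:halt:/sbin:/sbin/halt'

# ^\(.*\):.*\1$ keeps lines where some text before a colon is repeated
# at the very end of the line (sync ... /bin/sync, halt ... /sbin/halt).
same_ends=$(printf '%s\n' "$passwd_like" | grep -c '^\(.*\):.*\1$')

# Hypothetical grouping example: the group captures 7, then any digits,
# then \1 demands another 7 right at the end of the line.
numbers='172737
1234
7
70007'
seven_hits=$(printf '%s\n' "$numbers" | grep -c '\(7\)[0-9]*\1$')

echo "$same_ends $seven_hits"
```

A lone 7 does not match the second pattern, because the group and the back-reference each need their own 7.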
Difference between basic and extended regular expressions
1. Basic regular expressions: parentheses and curly braces must be preceded by \ to act as grouping and repetition operators.
grep -w "[0-9]\{2,3\}" /etc/passwd
2. Extended regular expressions: parentheses and curly braces are written without the escape character (grep -E, or the equivalent egrep).
grep -Ew "[0-9]{2,3}" /etc/passwd
egrep -w "[0-9]{2,3}" /etc/passwd
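The basic and extended spellings should select the same lines. The sketch below checks that on invented sample input rather than /etc/passwd.

```shell
# Sample lines with numbers of different lengths.
sample='uid 7
uid 42
uid 123
uid 1000'

# Basic syntax: the braces are escaped.
bre=$(printf '%s\n' "$sample" | grep -cw '[0-9]\{2,3\}')

# Extended syntax: the same interval without backslashes.
ere=$(printf '%s\n' "$sample" | grep -cEw '[0-9]{2,3}')

echo "$bre $ere"
```

With -w, only 42 and 123 qualify: 7 is too short, and no two or three digit slice of 1000 stands as a word on its own.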