Core Function Find

From Sputnik Wiki

(Difference between revisions)

Revision as of 09:01, 5 August 2014

Find( <string>, <needle>, <offset>, <plain> )

 . --- (a dot) represents all characters. 
%a --- all letters. 
%c --- all control characters. 
%d --- all digits. 
%l --- all lowercase letters. 
%p --- all punctuation characters. 
%s --- all space characters. 
%u --- all uppercase letters. 
%w --- all alphanumeric characters. 
%x --- all hexadecimal digits. 
%z --- the character with hex representation 0x00 (null). 
%% --- a single '%' character.
%1 --- captured pattern 1.
%2 --- captured pattern 2 (and so on).

Important! - the uppercase versions of the above represent the complement of the class. eg. %U represents everything except uppercase letters, %D represents everything except digits.
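
For readers who know conventional regular expressions better, the classes above map roughly onto bracket expressions. The Python sketch below is only an illustration under that assumption: `CLASS_EQUIVALENTS` and `class_matches` are hypothetical names, not part of Sputnik, and the ASCII ranges are an approximation of what each class covers.

```python
import re
import string

# Approximate ASCII equivalents of the pattern classes (an assumption made
# for illustration; the real engine may treat some characters differently).
CLASS_EQUIVALENTS = {
    "a": "A-Za-z",
    "c": r"\x00-\x1f\x7f",
    "d": "0-9",
    "l": "a-z",
    "p": re.escape(string.punctuation),
    "s": r" \t\n\r\f\v",
    "u": "A-Z",
    "w": "A-Za-z0-9",
    "x": "0-9A-Fa-f",
}

def class_matches(letter, char):
    """Return True if the single character `char` belongs to class %<letter>.

    An uppercase letter selects the complement of the class, e.g. "D"
    means "anything except a digit".
    """
    body = CLASS_EQUIVALENTS[letter.lower()]
    negate = "^" if letter.isupper() else ""
    return re.fullmatch("[" + negate + body + "]", char) is not None

print(class_matches("d", "7"))  # True
print(class_matches("D", "7"))  # False
print(class_matches("U", "a"))  # True
```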

There are some "magic characters" (such as %) that have special meanings. These are:

^ $ ( ) % . [ ] * + - ?

If you want to use those in a pattern (as themselves) you must precede them by a % symbol.

eg. %% would match a single %

As with normal regular expressions you can build your own pattern classes by using square brackets, eg.

[abc] ---> matches a, b or c
[a-z] ---> matches lowercase letters (same as %l)
[^abc] ---> matches anything except a, b or c
[%a%d] ---> matches all letters and digits
[%a%d_] ---> matches all letters, digits and underscore
[%[%]] ---> matches square brackets (had to escape them with %)

The repetition characters are:

+  ---> 1 or more repetitions (greedy)
*  ---> 0 or more repetitions (greedy)
-  ---> 0 or more repetitions (non greedy)
?  ---> 0 or 1 repetition only
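
The difference between the greedy * and the non-greedy - is easiest to see side by side. The sketch below does not run Sputnik patterns; it uses `.*` and `.*?` from Python's `re` module as stand-ins, on the assumption that they behave analogously.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: ".*" grabs as much as possible, so the match runs to the last ">".
greedy = re.search(r"<.*>", text).group(0)

# Non-greedy: ".*?" stops at the first ">" that lets the match succeed,
# which is the role "-" plays in the patterns described above.
lazy = re.search(r"<.*?>", text).group(0)

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```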

The standard "anchor" characters apply:

^  ---> anchor to start of subject string
$  ---> anchor to end of subject string

You can also use round brackets to specify "captures", similar to normal regular expressions:

You see (.*) here

Here, whatever matches (.*) becomes the first capture.

You can also refer to matched substrings (captures) later on in an expression:

printr find ("You see dogs and dogs", "You see (.*) and %1"); // 0    20    dogs
printr find ("You see dogs and cats", "You see (.*) and %1"); // NULL

This example shows how you can look for a repetition of a word matched earlier, whatever that word was ("dogs" in this case).
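
The same back-reference idea can be tried out in Python, where `\1` plays the part of %1. This is a rough sketch, not Sputnik's implementation: `find_like` is a hypothetical helper that reports a 0-based start and an inclusive end, the convention used by the Array output in the Example section of this page.

```python
import re

def find_like(subject, regex):
    """Return (start, inclusive_end, *captures), or None when nothing matches."""
    m = re.search(regex, subject)
    if m is None:
        return None
    # m.end() is exclusive, so subtract 1 to get the index of the last character.
    return (m.start(), m.end() - 1, *m.groups())

print(find_like("You see dogs and dogs", r"You see (.*) and \1"))  # (0, 20, 'dogs')
print(find_like("You see dogs and cats", r"You see (.*) and \1"))  # None
```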

As a special case, an empty capture, written (), captures its own position in the string instead of any text, e.g.

printr find ("You see dogs and cats", "You .* ()dogs .*"); // 1    21    9

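What this is saying is that the word "dogs" starts at column 9. The values in this comment follow Lua's numbering, which counts from 1; counting from 0, "dogs" starts at index 8.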

Finally you can look for nested "balanced" things (such as parentheses) by using %b, like this:

printr find ("I see a (big fish (swimming) in the pond) here", "%b()"); // 8    40

After %b you put 2 characters, which indicate the start and end of the balanced pair. If it finds a nested version it keeps processing until we are back at the top level. In this case the matching string was "(big fish (swimming) in the pond)".
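
To make the "keeps processing until we are back at the top level" rule concrete, here is a small Python sketch of a balanced-pair scan. It is an illustration of the idea only, not Sputnik's code: `find_balanced` is a hypothetical name, and it reports 0-based positions with an inclusive end.

```python
def find_balanced(subject, open_ch, close_ch, offset=0):
    """Return (start, end) of the first balanced open_ch...close_ch run, or None.

    Positions are 0-based and `end` is inclusive.
    """
    start = subject.find(open_ch, offset)
    while start != -1:
        depth = 1  # the opener at `start` is already counted
        for i in range(start + 1, len(subject)):
            ch = subject[i]
            if ch == close_ch:
                depth -= 1
                if depth == 0:
                    return (start, i)  # back at the top level
            elif ch == open_ch:
                depth += 1  # nested pair: go one level deeper
        # No matching closer for this opener; try the next opener.
        start = subject.find(open_ch, start + 1)
    return None

text = "I see a (big fish (swimming) in the pond) here"
span = find_balanced(text, "(", ")")
print(span)                          # (8, 40)
print(text[span[0]:span[1] + 1])     # (big fish (swimming) in the pond)
```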

Example

Search for raw text in a string (No patterns)

my $Test = "Hello cat world!";
printr Find($Test, "cat", 0, true);
// Prints
// Array
// (
//     [0] => 6
//     [1] => 8
// )

Search for raw text in a string but handle the result manually

my $Test = "the quick brown fox";
// Find() returns the start and end positions, not the matched text
my List ($Pos, $PosEnd) = Find($Test, "brown", 0, true);
say "Position: $Pos";
say "EndPosition: $PosEnd";
// Prints
// Position: 10
// EndPosition: 14

Use a pattern to find it. Note that only the start and end positions are returned when the pattern contains no capture groups.

my $Test = "Hello cat world!";
printr Find($Test, "cat");
// Prints
// Array
// (
//     [0] => 6
//     [1] => 8
// )

Another pattern match with no groups

my $Test = "the quick brown fox";
printr Find($Test, "quick");
// Prints
// Array
// (
//     [0] => 4
//     [1] => 8
// )

A group capture pattern this time

my $Test = "the quick brown fox";
printr Find($Test, "(%a+)");
// Prints
// Array
// (
//     [0] => 0
//     [1] => 2
//     [2] => the
// )

Another group capture pattern, this time starting the search at offset 10

my $Test = "the quick brown fox";
printr Find($Test, "(%a+)", 10);
// Prints
// Array
// (
//     [0] => 10
//     [1] => 14
//     [2] => brown
// )

Another pattern, but this time we extract the matched text ourselves from the returned positions

my $Test = "the quick brown fox";
my List ($Pos, $PosEnd) = Find($Test, "(%a+)", 10);
say "Position: $Pos";
say "PosEnd: $PosEnd";
// The end position is inclusive, so the length is $PosEnd - $Pos + 1
say "String: " . substr($Test, $Pos, $PosEnd - $Pos + 1);
// Prints
// Position: 10
// PosEnd: 14
// String: brown
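
Because the end position is inclusive, turning a start/end pair into a substring needs a length of end - start + 1. The arithmetic is shown below in Python as a neutral sketch; `slice_match` is a hypothetical helper, not a Sputnik function.

```python
def slice_match(subject, start, end):
    """Return the text between 0-based `start` and inclusive `end`."""
    length = end - start + 1
    return subject[start:start + length]

print(slice_match("the quick brown fox", 10, 14))  # brown
```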

What happens when no match is found in pattern mode

my $Test = "the quick brown fox";
printr vardump(Find($Test, "fruit"));
// Prints
// NULL

What happens when no match is found in plain-text mode

my $Test = "the quick brown fox";
printr vardump(Find($Test, "fruit", 0, true));
// Prints
// NULL

More examples

my $Test = "You see dogs and dogs";
printr Find($Test, "You see (.*)");
// Prints
// Array
// (
//     [0] => 0
//     [1] => 20
//     [2] => dogs and dogs
// )

You can also refer to matched substrings (captures) later on in an expression:

my $Test = "You see dogs and dogs";
printr Find($Test, "You see (.*) and %1");
// Prints
// Array
// (
//     [0] => 0
//     [1] => 20
//     [2] => dogs
// )

As shown here, NULL is returned when the text captured by (.*) does not appear again where %1 expects it

my $Test = "You see dogs and cats";
printr vardump(Find($Test, "You see (.*) and %1"));
// Prints
// NULL

Another example of referring to matched substrings (captures) later on in an expression:

my $Test = "You sir see dogs and dogs = sir";
printr Find($Test, "You (.*) see (.*) and %2 = %1");
// Prints
// Array
// (
//     [0] => 0
//     [1] => 30
//     [2] => sir
//     [3] => dogs
// )

@@ Line 1: / Line 1: @@
 <pre>
-Find( <string>, <needle>, <offset>, <plain>, <ignoreCase> )
+Find( <string>, <needle>, <offset>, <plain> )
 </pre>
 === Description ===
-Search for a match in a string and return the match with its found position and length.
+Find the first occurrence of the pattern in the string passed.
 === Parameters ===
@@ Line 12: / Line 12: @@
 The string to evaluate.
+==== needle ====
+The needle to search for.
+See Remarks because the needle can be a complex pattern
 ==== offset ====
@@ Line 19: / Line 25: @@
 Default: 0
-==== needle ====
+==== plain ====
-The needle to search for.
+Optional; Flag to indicate if the operations should use patterns or not.
+<pre>
+true = treat the needle as plain text
+false = treat the needle as a pattern
+</pre>
+Default: false
+=== Return Value ===
+Success: Returns the start and end positions of the match, followed by any captures.
+Failure: Returns NULL.
+=== Remarks ===
+This function is pretty much the same as the Lua string.find(); however, this one returns a start position counted from 0 (Lua's find() counts from 1) and lowers the end position by 1.
-(Can be a regular expression pattern)
+This is because in Sputnik chars in a string start at 0 not 1.
-If using a regular expression pattern it works pretty much like all regular expressions in Sputnik but there are some exceptions and some additional features.
+==== Patterns ====
-You should read the regular expression page [[Core Function Regex Match|here]] to learn more about what patterns you can use in them.
+The standard patterns you can search for are:
-Additional character classes for Find()
 <pre>
+ . --- (a dot) represents all characters.
 %a --- all letters.
 %c --- all control characters.
@@ Line 48: / Line 71: @@
 Important! - the uppercase versions of the above represent the complement of the class. eg. %U represents everything except uppercase letters, %D represents everything except digits.
-==== plain ====
+There are some "magic characters" (such as %) that have special meanings. These are:
-Optional; Flag to indicate if the operations should use regular expressions or not.
+^ $ ( ) % . [ ] * + - ?
-<pre>
+If you want to use those in a pattern (as themselves) you must precede them by a % symbol.
-true = treat the needle as a regular expression
-false = treat the needle as plain text
-</pre>
-Default: true
+eg. %% would match a single %
+As with normal regular expressions you can build your own pattern classes by using square brackets, eg.
-==== ignoreCase ====
+[abc] ---> matches a, b or c
+[a-z] ---> matches lowercase letters (same as %l)
+[^abc] ---> matches anything except a, b or c
+[%a%d] ---> matches all letters and digits
+[%a%d_] ---> matches all letters, digits and underscore
+[%[%]] ---> matches square brackets (had to escape them with %)
-Optional; Flag to indicate if the operations should use regular expressions or not.
+The repetition characters are:
 <pre>
-true = use case insensitive search
++  ---> 1 or more repetitions (greedy)
-false = use case sensitive search
+*  ---> 0 or more repetitions (greedy)
+-  ---> 0 or more repetitions (non greedy)
+?  ---> 0 or 1 repetition only
 </pre>
-Default: false
+The standard "anchor" characters apply:
-=== Return Value ===
+<pre>
+^  ---> anchor to start of subject string
+$  ---> anchor to end of subject string
+</pre>
-Success: Returns A array of the single match or an array of all arrayed captured.
+You can also use round brackets to specify "captures", similar to normal regular expressions:
-Failure: Returns NULL.
+You see (.*) here
-=== Remarks ===
+Here, whatever matches (.*) becomes the first pattern.
-Will seek to return single captures as a single array.
+You can also refer to matched substrings (captures) later on in an expression:
-=== Example ===
+<syntaxhighlight lang="sputnik">
+printr find ("You see dogs and dogs", "You see (.*) and %1"); // 1    21    dogs
+printr find ("You see dogs and cats", "You see (.*) and %1"); // NULL
+</syntaxhighlight>
-Search for raw text in a string (No patterns)
+This example shows how you can look for a repetition of a word matched earlier, whatever that word was ("dogs" in this case).
+As a special case, an empty capture string returns as the captured pattern, the position of itself in the string. eg.
 <syntaxhighlight lang="sputnik">
-my $Test = "Hello cat world!";
+printr find ("You see dogs and cats", "You .* ()dogs .*"); // 1    21    9
-printr Find($Test, "cat", 0, true);
-// Prints
-// Array
-// (
-//     [0] => cat
-//     [1] => 6
-//     [2] => 3
-// )
 </syntaxhighlight>
-Search for raw text in a string (No patterns) case insensitively
+What this is saying is that the word "dogs" starts at column 9.
+Finally you can look for nested "balanced" things (such as parentheses) by using %b, like this:
+<syntaxhighlight lang="sputnik">
+printr find ("I see a (big fish (swimming) in the pond) here", "%b()"); // 9    41
+</syntaxhighlight>
+After %b you put 2 characters, which indicate the start and end of the balanced pair. If it finds a nested version it keeps processing until we are back at the top level. In this case the matching string was "(big fish (swimming) in the pond)".
+=== Example ===
+Search for raw text in a string (No patterns)
 <syntaxhighlight lang="sputnik">
 my $Test = "Hello cat world!";
-printr Find($Test, "CAT", 0, true, true);
+printr Find($Test, "cat", 0, true);
 // Prints
 // Array
 // (
-//     [0] => cat
+//     [0] => 6
-//     [1] => 6
+//     [1] => 8
-//     [2] => 3
 // )
 </syntaxhighlight>
@@ Line 113: / Line 154: @@
 <syntaxhighlight lang="sputnik">
 my $Test = "the quick brown fox";
-my List ($Text, $Pos, $Len) = Find($Test, "brown", 0, true);
+my List ($Pos, $Len) = Find($Test, "brown", 0, true);
 say "Position: $Pos";
-say "Length: $Len";
+say "EndPosition: $Len";
 say "String: $Text";
 // Prints
 // Position: 10
-// Length: 5
+// EndPosition: 14
-// String: brown
 </syntaxhighlight>
-Use regular expressions to find it note it only returns the index and size when it cant find any group matches
+Use a pattern to find it note it only returns the index and size when it cant find any group matches
 <syntaxhighlight lang="sputnik">
@@ Line 132: / Line 172: @@
 // (
 //     [0] => 6
-//     [1] => 3
+//     [1] => 8
 // )
 </syntaxhighlight>
-Another regular expression match with no groups
+Another pattern match with no groups
 <syntaxhighlight lang="sputnik">
 my $Test = "the quick brown fox";
@@ Line 144: / Line 184: @@
 // (
 //     [0] => 4
-//     [1] => 5
+//     [1] => 8
 // )
 </syntaxhighlight>
-A group capture regular expression this time
+A group capture pattern this time
 <syntaxhighlight lang="sputnik">
 my $Test = "the quick brown fox";
@@ Line 155: / Line 195: @@
 // Array
 // (
-//     [0] => the
+//     [0] => 0
-//     [1] => 0
+//     [1] => 2
-//     [2] => 3
+//     [2] => the
 // )
 </syntaxhighlight>
-Another group capture regular expression this time
+Another group capture pattern this time
 <syntaxhighlight lang="sputnik">
 my $Test = "the quick brown fox";
@@ Line 168: / Line 208: @@
 // Array
 // (
-//     [0] => brown
+//     [0] => 10
-//     [1] => 10
+//     [1] => 14
-//     [2] => 5
+//     [2] => brown
 // )
 </syntaxhighlight>
-Another regular expression this time but we will handle the capture ourself
+Another pattern this time but we will handle the capture ourself
 <syntaxhighlight lang="sputnik">
 my $Test = "the quick brown fox";
-my List ($Pos, $Len) = Find($Test, "%a+", 10);
+my List ($Pos, $PosEnd) = Find($Test, "(%a+)", 10);
 say "Position: $Pos";
-say "Length: $Len";
+say "PosEnd: $PosEnd";
-say "String: " . substr($Test, $Pos, $Len);
+say "String: " . substr($Test, $Pos, strlen($Test) - $PosEnd);
 // Prints
 // Position: 10
-// Length: 5
+// PosEnd: 14
 // String: brown
 </syntaxhighlight>
@@ Line 211: / Line 251: @@
 // Array
 // (
-//     [0] => dogs
+//     [0] => 0
-//     [1] => 8
+//     [1] => 20
-//     [2] => 4
+//     [2] => dogs and dogs
 // )
 </syntaxhighlight>
@@ Line 225: / Line 265: @@
 // Array
 // (
-//     [0] => dogs
+//     [0] => 0
-//     [1] => 8
+//     [1] => 20
-//     [2] => 4
+//     [2] => dogs
 // )
 </syntaxhighlight>
@@ Line 248: / Line 288: @@
 // Array
 // (
-//     [1] => Array
+//     [0] => 0
-//         (
+//     [1] => 30
-//             [0] => sir
+//     [2] => sir
-//             [1] => 4
+//     [3] => dogs
-//             [2] => 3
-//         )
-//     [2] => Array
-//         (
-//             [0] => dogs
-//             [1] => 12
-//             [2] => 4
-//         )
 // )
 </syntaxhighlight>
 [[Category:Core Function]]