Core Function Scanf

From Sputnik Wiki

Revision as of 19:26, 22 January 2013

Scanf( <expression>, <def> )

Description

Parses input from a string according to a format.

Parameters

expression

The string to evaluate.

def

The format string that defines how to parse the string.

extra ...

Optionally pass in variables by reference that will contain the parsed values.

Return Value

If using the extra params

Success: Returns the number of matches and fills in the extra variables (a variable is set to 0 if no match was found for it).

Failure: Returns 0.

If NOT using the extra params

Success: Returns array of all captured objects from the parsed string.

Failure: Returns empty array.

Remarks

If only two parameters were passed to this function, the values parsed will be returned as an array. Otherwise, if optional parameters are passed, the function will return the number of assigned values.

If the format expects more substrings than are available in the input string, the unmatched specifiers are ignored and you get only the values that did match.

C and C++ developers have used the scanf() family of functions (scanf(), sscanf(), fscanf(), etc.) as a quick and easy way to parse well-structured input. The basic idea is to be able to specify the format of an input string in a way that allows the function to extract fields from that string.

For example, if your input string is "X123 Y456", you could specify a format string of "X%d Y%d" to extract the two numeric values, 123 and 456. The "%d" tells the function to extract a decimal value at the current location. The parser reads the decimal value until a non-digit character is encountered. The values are then assigned to variables and returned to the caller. In this example, the X and Y are character literals. These are characters in the input string expected to match the same characters in the format string.
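
As a rough illustration of the same idea in C, where the scanf() family comes from, the sketch below applies the format "X%d Y%d" with the standard sscanf(). The helper name parse_xy is hypothetical, and it is an assumption that Sputnik's Scanf() treats this format the same way.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: extracts the two numbers from input such as
   "X123 Y456". Returns the number of fields assigned (0, 1 or 2). */
int parse_xy(const char *input, int *x, int *y)
{
    assert(input != NULL && x != NULL && y != NULL);

    /* 'X' and 'Y' are literals that must match the input;
       each %d reads a decimal value until a non-digit is met. */
    int n = sscanf(input, "X%d Y%d", x, y);

    /* sscanf reports EOF when the input ends before any field is read. */
    return n == EOF ? 0 : n;
}
```

Calling parse_xy("X123 Y456", &x, &y) fills x with 123 and y with 456.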

Processing stops either when the end of the format string is reached, or when characters in the input string cannot be processed according to the format string.
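
The stopping rule can be observed with C's sscanf(), which returns how many fields were assigned before parsing stopped. This is only a sketch: count_matched is a hypothetical name, and whether Sputnik's Scanf() reports partial matches with exactly these counts is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reports how many of the two fields in
   "X%d Y%d" were filled in before parsing stopped. */
int count_matched(const char *input)
{
    int x = 0, y = 0;
    assert(input != NULL);

    int n = sscanf(input, "X%d Y%d", &x, &y);

    /* EOF means the input ran out before the first field was read. */
    return n == EOF ? 0 : n;
}
```

With this helper, "X123 Y456" yields 2, "X123 Z456" yields 1 because the literal Y does not match, and "Q123 Y456" yields 0 because the very first literal fails.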

To be sure, there are limits to this approach. For example, you wouldn't use scanf() to parse source code. It works best with well-structured input that can readily be divided into fields. Those who prefer regular expressions can use them to parse well-structured text. However, for cases where scanf() works, or for developers who are accustomed to using scanf(), it provides a simple and convenient way to parse many types of text.

The scanf() Format String

The scanf() format string provides a flexible way to describe the fields in the input string. Although there are standards, different C compilers have slightly different rules about the meaning of some parts of the format string. The following definition is for format strings used by this Scanf() function.


Characters and their descriptions:

Whitespace

Any whitespace character in the format string causes the position to advance to the next non-whitespace character in the input string. Whitespace characters include spaces, tabs and new lines.

Non-whitespace except percent (%)

Any character that is not a whitespace character or part of a format specifier (which begins with a % character) must match the same character in the input string, and the position advances past it.

Format specifier

A sequence that begins with a percent sign (%) to signify a format specifier, or field, that will be parsed and stored in a variable. A format specifier has the following form:

%[*][width][modifiers]type

Items within square brackets ([]) are optional. The following table describes elements within the format specifier.
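
For orientation, here is how the optional * and width elements, and whitespace in the format, behave in standard C sscanf(). This is a sketch under the assumption that Sputnik's Scanf() follows the C meanings, which this page does not confirm; parse_stamp and its input layout are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads a YYYYMMDD date, skips one word, then
   reads a number, e.g. "20130122   rev 7".
   %4d, %2d : the width limits how many characters a field may consume.
   %*s      : the asterisk parses a word but does not store it.
   ' '      : a space in the format matches any run of whitespace. */
int parse_stamp(const char *input, int *year, int *month, int *day, int *rev)
{
    assert(input != NULL && year != NULL && month != NULL);
    assert(day != NULL && rev != NULL);

    int n = sscanf(input, "%4d%2d%2d %*s %d", year, month, day, rev);

    /* The suppressed %*s field is not counted in the result. */
    return n == EOF ? 0 : n;
}
```

For "20130122   rev 7" the helper returns 4 and stores 2013, 1, 22 and 7; the same holds when the separators are tabs or new lines.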

Example

my $RET = Scanf("X123 Y456", "X%d Y%d");
printr($RET);
 
my $RET = Scanf("Copyright 2009-2011 CompanyName (Multi-Word Message)", "Copyright %d-%d %s (%[^)]");
printr($RET);

Not using optional parameters

// getting the serial number
list($serial) = Scanf("SN/2350001", "SN/%d");
// and the date of manufacturing
$mandate = "January 01 2000";
list($month, $day, $year) = Scanf($mandate, "%s %d %d");
println("Item $serial was manufactured on: $year-" . substr($month, 0, 3) . "-$day");

Using optional parameters

If optional parameters are passed, the function will return the number of assigned values.

// get author info and generate DocBook entry
$auth = "24\tLewis Carroll";
$n = Scanf($auth, "%d\t%s %s", $id, $first, $last);
print("<author id='$id'>
    <firstname>$first</firstname>
    <surname>$last</surname>
</author>\n");

Example of how to parse a file name so that the . is not captured by the first field, using %[^.] in place of %s

$out = Scanf('file_name.gif', 'file_%[^.].%s', $fpart1, $fpart2);
println("Name '$fpart1' Ext '$fpart2'");
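
The same negated set can be tried with C's sscanf(), where %[^.] reads every character up to, but not including, the first dot. This is a sketch only: split_file_name, has_extension and the buffer sizes are hypothetical choices, and the width limits are added because C needs them to avoid overflowing the buffers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits "file_<name>.<ext>" into its two parts.
   Returns 1 when both parts were found, 0 otherwise. */
int split_file_name(const char *input, char name[64], char ext[16])
{
    assert(input != NULL && name != NULL && ext != NULL);
    name[0] = '\0';
    ext[0] = '\0';

    /* %63[^.] reads up to 63 characters that are not '.', leaving the
       dot for the literal '.' that follows; %15s reads the extension. */
    return sscanf(input, "file_%63[^.].%15s", name, ext) == 2;
}

/* Hypothetical helper built on the split above. */
int has_extension(const char *input, const char *wanted)
{
    char name[64], ext[16];
    return split_file_name(input, name, ext) && strcmp(ext, wanted) == 0;
}
```

For 'file_name.gif' the two parts come out as "name" and "gif". A plain %s in the first position would read up to the next whitespace and so take the dot and extension with it.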

Example of using [] to specify a character set

$date = 'january-2008';
// notice it is scanning for all characters a-z and uppercase A-Z
// so it will match any case of the month name
Scanf($date, '%[a-zA-Z]-%d', $month, $year);
println("Parsed values: '$month', '$year'");
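
A comparable C sketch of the character-set example follows. Note that the C standard leaves the meaning of ranges such as a-z inside [] to the implementation, although common C libraries accept them; parse_month_year and the 16-byte buffer are hypothetical, and equivalence with Sputnik's Scanf() is assumed rather than confirmed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parses "<month>-<year>", e.g. "january-2008".
   Returns the number of fields assigned (0, 1 or 2). */
int parse_month_year(const char *input, char month[16], int *year)
{
    assert(input != NULL && month != NULL && year != NULL);
    month[0] = '\0';

    /* %15[a-zA-Z] reads up to 15 letters of either case and stops at
       the first other character, here the '-' literal. */
    int n = sscanf(input, "%15[a-zA-Z]-%d", month, year);

    return n == EOF ? 0 : n;
}
```

An input that starts with a digit, such as "2008-january", assigns nothing because the first character is outside the set.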