Core Function Scanf

From Sputnik Wiki
Scanf( <expression>, <def>, <extra>... )

Description

Parses input from a string according to a format.

Parameters

expression

The string to evaluate.

def

The format string that defines how to parse the input string.

extra ...

Optionally pass in variables by reference that will contain the parsed values.

Return Value

If using the extra params

Success: Returns the number of matches and fills in the extra variables (a variable for which no match was found is set to 0).

Failure: Returns 0.

If NOT using the extra params

Success: Returns array of all captured objects from the parsed string.

Failure: Returns empty array.

Remarks

If only two parameters were passed to this function, the values parsed will be returned as an array. Otherwise, if optional parameters are passed, the function will return the number of assigned values.

If the format string expects more fields than the input string contains, the unmatched fields are ignored and you get the values that did match.

C and C++ developers have used the scanf() family of functions (scanf(), sscanf(), fscanf(), etc.) as a quick and easy way to parse well-structured input. The basic idea is to be able to specify the format of an input string in a way that allows the function to extract fields from that string.

For example, if your input string is "X123 Y456", you could specify a format string of "X%d Y%d" to extract the two numeric values, 123 and 456. The "%d" tells the function to extract a decimal value at the current location. The parser reads the decimal value until a non-digit character is encountered. The values are then assigned to variables and returned to the caller. In this example, the X and Y are character literals. These are characters in the input string expected to match the same characters in the format string.
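The same example can be sketched with C's sscanf(), the function family this page refers to. This is a minimal illustration only: the helper name parse_xy is hypothetical, and Sputnik's Scanf() may differ from C in its details.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse "X<int> Y<int>" and return the number
   of fields that were assigned. */
int parse_xy(const char *input, int *x, int *y)
{
    assert(input != NULL && x != NULL && y != NULL);
    /* 'X' and 'Y' are literals that must match the input exactly.
       Each %d reads a decimal integer and stops at the first
       character that cannot be part of the number. */
    return sscanf(input, "X%d Y%d", x, y);
}
```

Calling parse_xy("X123 Y456", &x, &y) assigns both variables and returns 2, which corresponds to the "number of matches" return value described above.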

Processing stops either when the end of the format string is reached, or when characters in the input string cannot be processed according to the format string.
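The stopping rule can be observed through the return value. The sketch below uses C's sscanf() as a stand-in; read_triple is a hypothetical helper, and the behavior of Sputnik's Scanf() in the same situations is assumed to be similar rather than confirmed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read up to three comma-separated integers.
   Returns how many were assigned before parsing stopped. */
int read_triple(const char *input, int *a, int *b, int *c)
{
    assert(input != NULL && a != NULL && b != NULL && c != NULL);
    /* Parsing stops at the first input character that does not fit
       the format; variables after that point are left untouched. */
    return sscanf(input, "%d,%d,%d", a, b, c);
}
```

With the input "10,20,oops" the call returns 2 and the third variable keeps its previous value. With "10;20" the literal comma fails to match after the first field, so the call returns 1. Note that C returns EOF rather than 0 when the input ends before the first field is read, which is one place where it differs from the return values documented on this page.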

To be sure, there are limits to this approach. For example, you wouldn't use scanf() to parse source code. It works best with well-structured input that can readily be divided into fields. Those who prefer regular expressions can use them to parse well-structured text. However, for cases where scanf() works, or for developers who are accustomed to using scanf(), it provides a simple and convenient way to parse many types of text.

The scanf() Format String

The scanf() format string provides a flexible way to describe the fields in the input string. Although there are standards, different C compilers seem to have slightly different rules about the meaning of some parts of the format string. The following definition is for format strings used by this Scanf().

Characters 				Description
Whitespace 				Any whitespace character in the format string causes
					the position to advance to the next non-whitespace character
					in the input string. Whitespace characters include spaces,
					tabs and new lines.

Non-Whitespace except percent (%) 	Any character that is not a whitespace character or part of a
					format specifier (which begins with a % character) advances
					past the same matching character in the input string.

Format specifier 			A sequence that begins with a percent sign (%) to signify a
					format specifier, or field, that will be parsed and stored
					in a variable. A format specifier has the following form.

%[*][width][modifiers]type

Items within square brackets ([]) are optional. The following table describes elements within the format specifier.

Element 		Meaning
* 			Indicates that this field is parsed normally but not stored in a variable.
width 			Specifies the maximum number of characters to be read for this field.
modifiers 		If supplied, modifies the size of the data type where the field is stored.
			If not supplied, the default size is used. Supported modifiers are listed
			below.
			hh: For integer fields, the result is stored in an 8-bit variable.
			Ignored for floating point fields.
			h: For integer fields, the result is stored in a 16-bit variable.
			Ignored for floating point fields.
			l: For integer fields, the result is stored in a 64-bit variable. Floating
			point fields are stored in a double.
			ll: Same effect as the l modifier.
type 			Specifies the field type as described in the following table.
Type 			Meaning
c 			Reads a single character. If a width > 1 is specified, an array of
			characters is read.
d, i 			Reads a decimal integer. Number may begin with 0 (octal),
			0x (hexadecimal) or a + or - sign.
e, E, f, g, G 		Reads a floating point variable. Number may begin with a + or - sign,
			and may be written using exponential notation.
o 			Reads an unsigned octal integer
s 			Reads a string of characters up to the end of the input string, the
			next whitespace character, or until the number of characters specified
			for the width has been read.
u 			Reads an unsigned decimal integer. Number may begin with 0 (octal),
			0x (hexadecimal) or a + sign.
x, X 			Reads an unsigned hexadecimal integer.
[] 			Reads a string of characters that are included within square brackets.
			For example, "[abc]" will read all characters that are either a, b, or c.
			Use "[^abc]" to read all characters that are not a, b, or c. If the
			first character after "[" or after "[^" is "]", the closing square bracket
			is considered to be one of the characters rather than the end of the scanset.
			Ranges are supported: [a-z] reads any letter from a to z, so to read only
			hexadecimal characters you could use [a-fA-F0-9].
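The width, * and [] elements described in the tables above can be sketched with C's sscanf(). All helper names below are hypothetical, and the code illustrates the C behavior, which Sputnik's Scanf() is assumed to resemble. Range syntax such as 0-9 inside a scanset is implementation-defined in standard C, although common C libraries support it.

```c
#include <assert.h>
#include <stdio.h>

/* Width: split a compact "YYYYMMDD" date into three integers by
   limiting how many characters each %d may consume. */
int parse_compact_date(const char *input, int *year, int *month, int *day)
{
    assert(input != NULL && year != NULL && month != NULL && day != NULL);
    return sscanf(input, "%4d%2d%2d", year, month, day);
}

/* Suppression and scanset: %*s parses the first word but does not
   store it, %31[^=] reads up to 31 characters that are not '=',
   then a literal '=' must match before %d reads the value. */
int parse_setting(const char *input, char key[32], int *value)
{
    assert(input != NULL && key != NULL && value != NULL);
    return sscanf(input, "%*s %31[^=]=%d", key, value);
}

/* Scanset with ranges: read a leading run of at most 16 hexadecimal
   digits and stop at the first character outside the set. */
int parse_hex_digits(const char *input, char digits[17])
{
    assert(input != NULL && digits != NULL);
    return sscanf(input, "%16[0-9a-fA-F]", digits);
}
```

For "set timeout=30", parse_setting() returns 2 rather than 3 because the suppressed field is not counted. The explicit widths on the string fields also keep the reads within the destination buffers, which matters in C even though a Sputnik string has no fixed size.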

Example

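// No extra parameters: the parsed values (123 and 456) are returned as an array
my $RET = Scanf("X123 Y456", "X%d Y%d");
printr($RET);

// %[^)] reads everything up to, but not including, the closing parenthesis
my $RET = Scanf("Copyright 2009-2011 CompanyName (Multi-Word Message)", "Copyright %d-%d %s (%[^)]");
printr($RET);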

Not using optional parameters

// getting the serial number
list($serial) = Scanf("SN/2350001", "SN/%d");
// and the date of manufacturing
$mandate = "January 01 2000";
list($month, $day, $year) = Scanf($mandate, "%s %d %d");
println("Item $serial was manufactured on: $year-" . substr($month, 0, 3) . "-$day");

Using optional parameters

If optional parameters are passed, the function will return the number of assigned values.

// get author info and generate DocBook entry
$auth = "24\tLewis Carroll";
$n = Scanf($auth, "%d\t%s %s", $id, $first, $last);
print("<author id='$id'>
    <firstname>$first</firstname>
    <surname>$last</surname>
</author>\n");

Example of how to parse a file name without getting the . trapped inside the first %s

$out = Scanf('file_name.gif', 'file_%[^.].%s', $fpart1, $fpart2);
println("Name '$fpart1' Ext '$fpart2'");
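To see why the scanset is needed here, the following sketch compares the two formats using C's sscanf(). The helper names are hypothetical, and Sputnik's Scanf() is assumed, not confirmed, to treat %s the same way.

```c
#include <assert.h>
#include <stdio.h>

/* %s reads up to the next whitespace, so it consumes the '.' and the
   extension as well; the literal '.' then has nothing left to match. */
int split_with_s(const char *input, char name[64], char ext[16])
{
    assert(input != NULL && name != NULL && ext != NULL);
    return sscanf(input, "file_%63s.%15s", name, ext);
}

/* %[^.] stops in front of the first '.', leaving it for the literal
   '.' in the format, so the extension lands in its own field. */
int split_with_scanset(const char *input, char name[64], char ext[16])
{
    assert(input != NULL && name != NULL && ext != NULL);
    return sscanf(input, "file_%63[^.].%15s", name, ext);
}
```

For "file_name.gif", split_with_s() returns 1 and stores "name.gif" in the first field, while split_with_scanset() returns 2 and stores "name" and "gif".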

Example of using [] to define a character set

$date = 'january-2008';
// notice it is scanning for all characters a-z and uppercase A-Z
// so it will match any case of the month name
Scanf($date, '%[a-zA-Z]-%d', $month, $year);
println("Parsed values: '$month', '$year'");