Method
GLib.MatchInfo.fetch_pos
since: 2.14
Declaration
gboolean
g_match_info_fetch_pos (
  const GMatchInfo* match_info,
  gint match_num,
  gint* start_pos,
  gint* end_pos
)
Description
Returns the start and end positions (in bytes) of a successfully matching capture parenthesis.
Valid values for match_num are 0 for the full text of the match,
1 for the first paren set, 2 for the second, and so on.
As end_pos is set to the byte after the final byte of the match (on success),
the length of the match can be calculated as end_pos - start_pos.
As a best practice, initialize start_pos and end_pos to identifiable
values, such as G_MAXINT, so that you can test if
g_match_info_fetch_pos() actually changed the value for a given
capture parenthesis.
The parameter match_num corresponds to a matched capture parenthesis. The
actual value you use for match_num depends on the method used to generate
match_info. The following sections describe those methods.
Methods Using Non-deterministic Finite Automata Matching
The methods g_regex_match() and g_regex_match_full() return a GMatchInfo
using traditional (greedy) pattern matching, also known as
Non-deterministic Finite Automaton (NFA) matching. You pass the returned
GMatchInfo from these methods to
g_match_info_fetch_pos() to determine the start and end positions
of capture parentheses. The values for match_num correspond to the capture
parentheses in order, with 0 corresponding to the entire matched string.
match_num can refer to a capture parenthesis with no match. For example,
the string b matches against the pattern (a)?b, but the capture parenthesis (a) has no match. In this case, g_match_info_fetch_pos()
returns true and sets start_pos and end_pos to -1 when called with
match_num as 1 (for (a)).
For an expanded example, consider the regex pattern (a)?(.*?)the (.*)
and the candidate string glib regexes are the best. In this scenario
there are four capture parentheses numbered 0–3: an implicit one
for the entire string, and three explicitly declared in the regex pattern.
Given this example, the following table describes the return values from
g_match_info_fetch_pos() for various values of match_num.
| match_num | Contents | Return value | Returned start_pos | Returned end_pos |
|---|---|---|---|---|
| 0 | Matches entire string | True | 0 | 25 |
| 1 | Does not match first character | True | -1 | -1 |
| 2 | All text before the | True | 0 | 17 |
| 3 | All text after the | True | 21 | 25 |
| 4 | Capture paren out of range | False | Unchanged | Unchanged |
The following code sample implements this example, and its output follows.
#include <glib.h>

int
main (int argc, char *argv[])
{
  g_autoptr(GError) local_error = NULL;
  const char *regex_pattern = "(a)?(.*?)the (.*)";
  const char *test_string = "glib regexes are the best";
  g_autoptr(GRegex) regex = NULL;

  regex = g_regex_new (regex_pattern,
                       G_REGEX_DEFAULT,
                       G_REGEX_MATCH_DEFAULT,
                       &local_error);
  if (regex == NULL)
    {
      g_printerr ("Error creating regex: %s\n", local_error->message);
      return 1;
    }

  g_autoptr(GMatchInfo) match_info = NULL;
  g_regex_match (regex, test_string, G_REGEX_MATCH_DEFAULT, &match_info);

  int n_matched_strings = g_match_info_get_match_count (match_info);

  // Print header line
  g_print ("match_num Contents                  Return value returned start_pos returned end_pos\n");

  // Iterate over each capture paren, including one that is out of range as a demonstration.
  for (int match_num = 0; match_num <= n_matched_strings; match_num++)
    {
      gboolean found_match;
      g_autofree char *paren_string = NULL;
      int start_pos = G_MAXINT;
      int end_pos = G_MAXINT;

      found_match = g_match_info_fetch_pos (match_info,
                                            match_num,
                                            &start_pos,
                                            &end_pos);

      // If no match, display N/A as the found string.
      if (start_pos == G_MAXINT || start_pos == -1)
        paren_string = g_strdup ("N/A");
      else
        paren_string = g_strndup (test_string + start_pos, end_pos - start_pos);

      g_print ("%-9d %-25s %-12d %-18d %d\n", match_num, paren_string, found_match, start_pos, end_pos);
    }

  return 0;
}

match_num Contents                  Return value returned start_pos returned end_pos
0         glib regexes are the best 1            0                  25
1         N/A                       1            -1                 -1
2         glib regexes are          1            0                  17
3         best                      1            21                 25
4         N/A                       0            2147483647         2147483647
Methods Using Deterministic Finite Automata Matching
The methods g_regex_match_all() and g_regex_match_all_full() return a
GMatchInfo using Deterministic Finite Automaton (DFA) pattern matching.
This algorithm detects overlapping matches. You pass
the returned GMatchInfo from these methods to g_match_info_fetch_pos()
to determine the start and end positions of each overlapping match. Use the
method g_match_info_get_match_count() to determine the number
of overlapping matches.
For example, consider the regex pattern <.*> and the candidate string
<a> <b> <c>. In this scenario there are three implicit capture
parentheses: one for the entire string, one for <a> <b>, and one for <a>.
Given this example, the following table describes the return values from
g_match_info_fetch_pos() for various values of match_num.
| match_num | Contents | Return value | Returned start_pos | Returned end_pos |
|---|---|---|---|---|
| 0 | Matches entire string | True | 0 | 11 |
| 1 | Matches <a> <b> | True | 0 | 7 |
| 2 | Matches <a> | True | 0 | 3 |
| 3 | Capture paren out of range | False | Unchanged | Unchanged |
The following code sample implements this example, and its output follows.
#include <glib.h>

int
main (int argc, char *argv[])
{
  g_autoptr(GError) local_error = NULL;
  const char *regex_pattern = "<.*>";
  const char *test_string = "<a> <b> <c>";
  g_autoptr(GRegex) regex = NULL;

  regex = g_regex_new (regex_pattern,
                       G_REGEX_DEFAULT,
                       G_REGEX_MATCH_DEFAULT,
                       &local_error);
  if (regex == NULL)
    {
      g_printerr ("Error creating regex: %s\n", local_error->message);
      return -1;
    }

  g_autoptr(GMatchInfo) match_info = NULL;
  g_regex_match_all (regex, test_string, G_REGEX_MATCH_DEFAULT, &match_info);

  int n_matched_strings = g_match_info_get_match_count (match_info);

  // Print header line
  g_print ("match_num Contents                  Return value returned start_pos returned end_pos\n");

  // Iterate over each capture paren, including one that is out of range as a demonstration.
  for (int match_num = 0; match_num <= n_matched_strings; match_num++)
    {
      gboolean found_match;
      g_autofree char *paren_string = NULL;
      int start_pos = G_MAXINT;
      int end_pos = G_MAXINT;

      found_match = g_match_info_fetch_pos (match_info, match_num, &start_pos, &end_pos);

      // If no match, display N/A as the found string.
      if (start_pos == G_MAXINT || start_pos == -1)
        paren_string = g_strdup ("N/A");
      else
        paren_string = g_strndup (test_string + start_pos, end_pos - start_pos);

      g_print ("%-9d %-25s %-12d %-18d %d\n", match_num, paren_string, found_match, start_pos, end_pos);
    }

  return 0;
}

match_num Contents                  Return value returned start_pos returned end_pos
0         <a> <b> <c>               1            0                  11
1         <a> <b>                   1            0                  7
2         <a>                       1            0                  3
3         N/A                       0            2147483647         2147483647
Available since: 2.14
Parameters
match_num
Type: gint
Number of the capture parenthesis.
start_pos
Type: gint*
Pointer to location where to store the start position, or NULL.
The argument will be set by the function.
The argument can be NULL.
end_pos
Type: gint*
Pointer to location where to store the end position (the byte after the final byte of the match), or NULL.
The argument will be set by the function.
The argument can be NULL.
Return value
Type: gboolean
True if match_num is within range, false otherwise. If
the capture paren has a match, start_pos and end_pos contain the
start and end positions (in bytes) of the matching substring. If the
capture paren has no match, start_pos and end_pos are -1. If
match_num is out of range, start_pos and end_pos are left unchanged.