
%Net.URLParser

class %Net.URLParser extends %Library.RegisteredObject

Parses a URL into its component parts.


Methods

final classmethod Compose(ByRef Components As %Library.String) as %Library.String
Composes a URL from its Components array as defined by the Decompose() method.
This method uses Components("netloc"), if defined, instead of building the network location from Components("username"), Components("password"), Components("host"), and Components("port").
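The precedence rule above can be sketched in Python. This is an illustrative analogue, not the ObjectScript implementation: a plain dict stands in for the Components array, and the function name `compose` and the exact assembly order are assumptions modeled on the component names that Decompose() produces.

```python
def compose(components: dict) -> str:
    """Rebuild a URL from a dict keyed like the Components array (hypothetical analogue)."""
    url = ""
    if "scheme" in components:
        url += components["scheme"] + ":"
    # Prefer a ready-made netloc; otherwise assemble it from host, port, and userinfo.
    if "netloc" in components:
        netloc = components["netloc"]
    else:
        netloc = None
        if "host" in components:
            netloc = components["host"]
            if "port" in components:
                netloc += ":" + str(components["port"])
            if "username" in components:
                userinfo = components["username"]
                if "password" in components:
                    userinfo += ":" + components["password"]
                netloc = userinfo + "@" + netloc
    if netloc is not None:
        url += "//" + netloc
    url += components.get("path", "")
    if "query" in components:
        url += "?" + components["query"]
    if "fragment" in components:
        url += "#" + components["fragment"]
    return url
```

Because the netloc branch is checked first, a caller can override the individual parts simply by setting "netloc".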
final classmethod Decode(component As %String) as %String
classmethod DecodePath(scheme As %String, path As %String) as %String
Decodes the path in a scheme-specific way. The path is fully decoded if the scheme is one of the following: tel, mailto, jdbc, urn, or ldap. Otherwise, the path is returned unchanged. Users can support more schemes by overriding this class method. RFC 3986 states that "URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent". Therefore, this method should return the path as it is unless the scheme permits decoding.
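The scheme-gated decoding described above can be sketched in Python with `urllib.parse.unquote`. This is an analogue for illustration only; the function name `decode_path` and the case-insensitive scheme comparison are assumptions, not documented behavior of DecodePath().

```python
from urllib.parse import unquote

# Schemes whose paths the documentation says are fully decoded.
DECODABLE_SCHEMES = {"tel", "mailto", "jdbc", "urn", "ldap"}


def decode_path(scheme: str, path: str) -> str:
    """Percent-decode the path only when the scheme allows it (hypothetical analogue)."""
    # Scheme names are compared case-insensitively here (an assumption).
    if scheme.lower() in DECODABLE_SCHEMES:
        return unquote(path)
    # Otherwise keep encoded reserved characters as they are, per RFC 3986.
    return path
```

In this sketch, supporting another scheme just means adding it to the set; in the class itself, the documented route is to override DecodePath() in a subclass.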
final classmethod DecodeQueryComponent(component As %String) as %String
final classmethod Decompose(Url As %Library.String, Output Components As %Library.String)
Parses a URL reference into its constituent components.

The input parameter 'Url' is the string to parse.

The output parameter 'Components' will contain an array subscripted by the name of the component part with the parsed value as the data.

do ##class(%Net.URLParser).Decompose("http://user:pass@www.intersystems.com:80/path/?name=John#id",.Components)
Components("scheme")="http"
Components("netloc")="user:pass@www.intersystems.com:80"
Components("userinfo")="user:pass"
Components("username")="user"
Components("password")="pass"
Components("host")="www.intersystems.com"
Components("port")=80
Components("path")="/path/"
Components("query")="name=John"
Components("query",1,"key")="name"
Components("query",1,"value")="John"
Components("fragment")="id"

All of the components may be undefined, except for path, which is always defined even when it is empty. A URL reference is considered a URL only if it includes a scheme. Otherwise, it is a relative reference (e.g., "../foo/bar"), typically used to express a location relative to another URL.
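The decomposition above can be approximated in Python with the standard library's `urllib.parse.urlsplit`. This is a sketch for illustration, not the %Net.URLParser implementation: a dict stands in for the Components array, tuple keys such as ("query", 1, "key") stand in for multi-level subscripts, and the function name `decompose` is hypothetical. Splitting the query on "&" and "=" without percent-decoding is also an assumption. Note that urllib lower-cases the scheme and host, which %Net.URLParser may not do.

```python
from urllib.parse import urlsplit


def decompose(url: str) -> dict:
    """Split a URL into a dict keyed like Decompose()'s Components array (hypothetical analogue)."""
    parts = urlsplit(url)
    components = {"path": parts.path}  # path is always defined, even when empty
    if parts.scheme:
        components["scheme"] = parts.scheme  # urlsplit lower-cases the scheme
    if parts.netloc:
        components["netloc"] = parts.netloc
        userinfo, at, _ = parts.netloc.rpartition("@")
        if at:
            components["userinfo"] = userinfo
        if parts.username is not None:
            components["username"] = parts.username
        if parts.password is not None:
            components["password"] = parts.password
        if parts.hostname is not None:
            components["host"] = parts.hostname  # lower-cased by urllib
        if parts.port is not None:
            components["port"] = parts.port
    if parts.query:
        components["query"] = parts.query
        # Number the key/value pairs from 1, without percent-decoding (assumption).
        for i, pair in enumerate(parts.query.split("&"), start=1):
            key, eq, value = pair.partition("=")
            components[("query", i, "key")] = key
            if eq:
                components[("query", i, "value")] = value
    if parts.fragment:
        components["fragment"] = parts.fragment
    return components
```

For a relative reference such as "../foo/bar", the sketch returns only {"path": "../foo/bar"}, in line with the rule that path is the one component that is always defined.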

URLs are described briefly below; see RFC 3986 for a full description.

  • A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.
  • A scheme name consists of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus ("+"), period ("."), and hyphen ("-"). For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").
  • While the syntax for the rest of the URL may vary depending on the particular scheme selected, URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the scheme-specific data:
    //<user>:<password>@<host>:<port>/<path>
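The scheme-name rules can be checked with a regular expression. This Python sketch follows the RFC 3986 grammar (scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )), which requires the first character to be a letter; the helper names `is_valid_scheme` and `same_scheme` are hypothetical.

```python
import re

# RFC 3986 section 3.1: a letter, then letters, digits, "+", ".", or "-".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(name: str) -> bool:
    """Return True if name is a syntactically valid scheme name."""
    return SCHEME_RE.fullmatch(name) is not None


def same_scheme(a: str, b: str) -> bool:
    """Compare scheme names case-insensitively, so "HTTP" matches "http"."""
    return a.lower() == b.lower()
```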

The different components obey the following rules:

  • username - Some schemes (e.g., ftp) allow the specification of a user name.
  • password - If present, it follows the user name, separated from it by a colon ":". The user name (and password), if present, are followed by a commercial at-sign "@". Within the user and password fields, any ":", "@", or "/" must be encoded.
    Note that an empty user name or password is different from no user name or password; there is no way to specify a password without specifying a user name. E.g., <URL:ftp://@host.com/> has an empty user name and no password, <URL:ftp://host.com/> has no user name, while <URL:ftp://foo:@host.com/> has a user name of "foo" and an empty password.
  • host - The fully qualified domain name of a network host, or its IP address as a set of four decimal digit groups separated by ".". Fully qualified domain names take the form described in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123 [5]: a sequence of domain labels separated by ".", each label starting and ending with an alphanumeric character and possibly also containing "-" characters. The rightmost domain label never starts with a digit, which syntactically distinguishes all domain names from IP addresses.
  • port - The port number to connect to. Most schemes designate protocols that have a default port number. Another port number may optionally be supplied, in decimal, separated from the host by a colon. If the port is omitted, the colon is as well.
  • netloc (a.k.a. authority) - The portion of the URL containing the username, password, host, and port. If present, it follows the scheme, separated from it by a double slash "//". It is provided for callers that need this part of the URL unparsed. It is normally the same as <user>:<password>@<host>:<port>, but not always.
  • path - The path supplies hierarchical details of how the specified resource can be accessed: a sequence of path segments separated by a slash ("/") character.
  • query - If present, it follows path, separated by "?". It supplies non-hierarchical details typically as a sequence of key-value pairs separated by "&". Values, if present, follow their keys, separated by "=".
  • fragment - If present, it follows path and query, separated by "#".
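Python's `urllib.parse` draws the same distinction between an empty and a missing user name or password, so it is a convenient way to see the three ftp cases from the password rule above. This sketch uses only the standard library and is not the %Net.URLParser API; the helper names are hypothetical, and `quote(..., safe="")` is shown as one way to percent-encode ":", "@", and "/" inside the user and password fields.

```python
from urllib.parse import quote, urlsplit


def userinfo_of(url: str):
    """Return (username, password): None means absent, "" means present but empty."""
    parts = urlsplit(url)
    return parts.username, parts.password


def build_ftp_url(user: str, password: str, host: str) -> str:
    """Build an ftp URL, percent-encoding reserved characters in the credentials."""
    return f"ftp://{quote(user, safe='')}:{quote(password, safe='')}@{host}/"
```

Here userinfo_of("ftp://@host.com/") gives ("", None), userinfo_of("ftp://host.com/") gives (None, None), and userinfo_of("ftp://foo:@host.com/") gives ("foo", "").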
final classmethod DecomposeQuery(Query As %Library.String, Output Components As %Library.String)
final classmethod Encode(component As %String) as %String
deprecated final classmethod Parse(Url As %Library.String, ByRef Components As %Library.String)
WARNING: This method has been deprecated in favor of Decompose().

Parses a URL into its constituent components.

The output parameter 'Components' will contain an array subscripted by the name of the component part, with the parsed value as the data. For example, for a URL such as

http://www.intersystems.com

the Components array will contain the scheme as Components("scheme")="http".
URLs are described briefly below; see RFC 1738 for a full description.

A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.

Scheme names consist of a sequence of characters. The lower case letters "a" through "z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").

While the syntax for the rest of the URL may vary depending on the particular scheme selected, URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the scheme-specific data:
//<user>:<password>@<host>:<port><url-path>?<query>#<fragment>

Some or all of the parts "<user>:<password>@", ":<password>", ":<port>", "?<query>", and "#<fragment>" may be excluded. The "<url-path>" may be empty but must be included and must begin with "/" if anything before it is included.

The scheme-specific data start with a double slash "//" to indicate that they comply with the common Internet scheme syntax. The different components obey the following rules:

  • user
    An optional user name. Some schemes (e.g., ftp) allow the specification of a user name.
  • password
    An optional password. If present, it follows the user name, separated from it by a colon. The user name (and password), if present, are followed by a commercial at-sign "@". Within the user and password fields, any ":", "@", or "/" must be encoded.

    Note that an empty user name or password is different from no user name or password; there is no way to specify a password without specifying a user name. E.g., <URL:ftp://@host.com/> has an empty user name and no password, <URL:ftp://host.com/> has no user name, while <URL:ftp://foo:@host.com/> has a user name of "foo" and an empty password.

  • host
    The fully qualified domain name of a network host, or its IP address as a set of four decimal digit groups separated by ".". Fully qualified domain names take the form described in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123 [5]: a sequence of domain labels separated by ".", each label starting and ending with an alphanumeric character and possibly also containing "-" characters. The rightmost domain label never starts with a digit, which syntactically distinguishes all domain names from IP addresses.
  • port
    The port number to connect to. Most schemes designate protocols that have a default port number. Another port number may optionally be supplied, in decimal, separated from the host by a colon. If the port is omitted, the colon is as well.
  • path
    The rest of the locator consists of data specific to the scheme, and is known as the "path". It supplies the details of how the specified resource can be accessed. Note that the "/" between the host (or port) and the path is NOT part of the path. The path syntax depends on the scheme being used, as does the manner in which it is interpreted.
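The host rules above can be sketched as a small classifier that uses the rightmost-label test to tell a dotted-quad IPv4 address from a domain name. This Python function, `classify_host`, is hypothetical and only illustrates the RFC 1738 rules quoted here. It does not handle IPv6 literals or internationalized domain names.

```python
import re

# One domain label: starts and ends with an alphanumeric, may contain "-" inside.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def classify_host(host: str) -> str:
    """Return "ipv4", "domain", or "invalid" according to the RFC 1738 host rules."""
    labels = host.split(".")
    if not all(LABEL_RE.fullmatch(label) for label in labels):
        return "invalid"
    # A rightmost label starting with a digit means an IP address, not a domain name.
    if labels[-1][0].isdigit():
        if len(labels) == 4 and all(label.isdigit() and int(label) <= 255 for label in labels):
            return "ipv4"
        return "invalid"
    return "domain"
```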
final classmethod UnwrapIPv6(host As %String) as %String
final classmethod WrapIPv6(host As %String) as %String
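Neither WrapIPv6() nor UnwrapIPv6() is described on this page. Judging only from their names, they presumably add or remove the square brackets that RFC 3986 requires around an IPv6 literal in a URL host (as in http://[::1]:8080/), which keep its colons from being read as a port separator. The Python sketch below illustrates that bracket convention under this assumption. The functions `wrap_ipv6` and `unwrap_ipv6` are hypothetical and are not a description of the ObjectScript methods, whose behavior may differ.

```python
import ipaddress


def wrap_ipv6(host: str) -> str:
    """Bracket a bare IPv6 address for use in a URL; leave other hosts unchanged."""
    try:
        ipaddress.IPv6Address(host)
    except ValueError:
        return host  # domain name, IPv4 address, or already bracketed
    return f"[{host}]"


def unwrap_ipv6(host: str) -> str:
    """Strip the brackets from a bracketed host; leave other hosts unchanged."""
    if host.startswith("[") and host.endswith("]"):
        return host[1:-1]
    return host
```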

