
%Net.URLParser

class %Net.URLParser extends %Library.RegisteredObject

Parses a URL into its component parts.

Methods

final classmethod Compose(ByRef Components As %Library.String) as %Library.String
Composes a URL from its Components array as defined by the Decompose() method.
This method uses Components("netloc") if it is defined. To compose the URL from Components("username"), Components("password"), Components("host"), and Components("port") instead,
set Components("netloc") to "".
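
The following is a minimal sketch of the netloc rule described above, not a definitive recipe. It decomposes the sample URL used on this page, replaces the host, clears Components("netloc") so that Compose() works from the individual parts, and recomposes the URL. The replacement host name is hypothetical, and the exact string returned by Compose() is not asserted here.

```objectscript
    // Start from a clean array, then parse the sample URL
    Kill Components
    Do ##class(%Net.URLParser).Decompose("http://user:pass@www.intersystems.com:80/path/", .Components)

    // Hypothetical replacement host, for illustration only
    Set Components("host") = "www.example.com"

    // Clear netloc so that Compose() uses username, password, host, and port
    Set Components("netloc") = ""

    Set Url = ##class(%Net.URLParser).Compose(.Components)
    Write Url, !
```

If Components("netloc") were left as parsed, the description above implies that the original host would still appear in the composed URL.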
final classmethod Decompose(Url As %Library.String, ByRef Components As %Library.String)
Parses a URL into its constituent components.

The input parameter 'Url' is the string to parse. Some or all of the parts "<user>:<password>@", ":<password>", ":<port>", and "/<path>" may be omitted.

The output parameter 'Components' will contain an array subscripted by the name of the component part with the parsed value as the data.

do ##class(%Net.URLParser).Decompose("http://user:pass@www.intersystems.com:80/path/",.Components)
Components("host")="www.intersystems.com"
Components("netloc")="user:pass@www.intersystems.com:80"
Components("password")="pass"
Components("path")="/path/"
Components("port")=80
Components("scheme")="http"
Components("username")="user"
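
Because parts of the URL may be omitted, code that reads the array should probably not assume every subscript exists. The sketch below is written under that assumption: this page does not state whether a missing part is left undefined or set to an empty string, so $Get is used as a guard either way. The default port of 80 and the local variable names are illustrative choices.

```objectscript
    // Clear any previous contents before parsing (defensive; assumed good practice)
    Kill Components
    Do ##class(%Net.URLParser).Decompose("http://www.intersystems.com/path/", .Components)

    // $Get returns the supplied default when the subscript is undefined
    Set port = $Get(Components("port"), 80)
    Set user = $Get(Components("username"))

    Write "host: ", $Get(Components("host")), !
    Write "port: ", port, !

    // Dump the whole array to see exactly which subscripts were populated
    ZWrite Components
```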

URLs are described briefly below; see RFC 1738 for a full description.

  • A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.
  • Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").
  • While the syntax for the rest of the URL may vary depending on the particular scheme selected, URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the scheme-specific data:
    //<user>:<password>@<host>:<port>/<path>

The scheme specific data start with a double slash "//" to indicate that it complies with the common Internet scheme syntax. The different components obey the following rules:

  • user - An optional user name. Some schemes (e.g., ftp) allow the specification of a user name.
  • password - An optional password. If present, it follows the user name separated from it by a colon. The user name (and password), if present, are followed by a commercial at-sign "@". Within the user and password field, any ":", "@", or "/" must be encoded.
    Note that an empty user name or password is different than no user name or password; there is no way to specify a password without specifying a user name. E.g., <URL:ftp://@host.com/> has an empty user name and no password, <URL:ftp://host.com/> has no user name, while <URL:ftp://foo:@host.com/> has a user name of "foo" and an empty password.
  • host - The fully qualified domain name of a network host, or its IP address as a set of four decimal digit groups separated by ".". Fully qualified domain names take the form as described in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123 [5]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumerical character and possibly also containing "-" characters. The rightmost domain label will never start with a digit, though, which syntactically distinguishes all domain names from the IP addresses.
  • port - The port number to connect to. Most schemes designate protocols that have a default port number. Another port number may optionally be supplied, in decimal, separated from the host by a colon. If the port is omitted, the colon is as well.
  • path - The rest of the locator consists of data specific to the scheme, and is known as the "path". It supplies the details of how the specified resource can be accessed.
    The path syntax depends on the scheme being used, as does the manner in which it is interpreted.
  • netloc - The portion of the URL containing the username, password, host, and port. It is provided for callers that need this portion of the URL unparsed. It is normally the same as <user>:<password>@<host>:<port>, but not always.
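
As a hedged illustration of the rules above, the sketch below lowercases the scheme before comparing it (following the advice to treat "HTTP" and "http" as equivalent) and then walks whatever subscripts the parser populated. Whether Decompose() itself normalizes the scheme's case is not stated on this page, so the conversion is done explicitly.

```objectscript
    Kill Components
    Do ##class(%Net.URLParser).Decompose("HTTP://user:pass@www.intersystems.com:80/path/", .Components)

    // Compare scheme names case-insensitively
    Set scheme = $ZConvert($Get(Components("scheme")), "L")
    If scheme = "http" {
        Write "HTTP URL", !
    }

    // Iterate over the component names in collation order
    Set key = ""
    For {
        Set key = $Order(Components(key))
        Quit:key=""
        Write key, " = ", $Get(Components(key)), !
    }
```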
deprecated final classmethod Parse(Url As %Library.String, ByRef Components As %Library.String)
WARNING: This method has been deprecated in favor of Decompose().

Parses a URL into its constituent components.

The output parameter 'Components' will contain an array subscripted by the name of the component part with the parsed value as the data. For example, for a URL such as

http://www.intersystems.com

the Components array will contain the scheme in this form: Components("scheme")="http"

URLs are described briefly below; see RFC 1738 for a full description.

A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.

Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").

While the syntax for the rest of the URL may vary depending on the particular scheme selected, URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the scheme-specific data:
//<user>:<password>@<host>:<port>/<url-path>

Some or all of the parts "<user>:<password>@", ":<password>", ":<port>", and "/<url-path>" may be excluded.

The scheme specific data start with a double slash "//" to indicate that it complies with the common Internet scheme syntax. The different components obey the following rules:

  • user
    An optional user name. Some schemes (e.g., ftp) allow the specification of a user name.
  • password
    An optional password. If present, it follows the user name separated from it by a colon. The user name (and password), if present, are followed by a commercial at-sign "@". Within the user and password field, any ":", "@", or "/" must be encoded.

    Note that an empty user name or password is different than no user name or password; there is no way to specify a password without specifying a user name. E.g., <URL:ftp://@host.com/> has an empty user name and no password, <URL:ftp://host.com/> has no user name, while <URL:ftp://foo:@host.com/> has a user name of "foo" and an empty password.

  • host
    The fully qualified domain name of a network host, or its IP address as a set of four decimal digit groups separated by ".". Fully qualified domain names take the form as described in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123 [5]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumerical character and possibly also containing "-" characters. The rightmost domain label will never start with a digit, though, which syntactically distinguishes all domain names from the IP addresses.
  • port
    The port number to connect to. Most schemes designate protocols that have a default port number. Another port number may optionally be supplied, in decimal, separated from the host by a colon. If the port is omitted, the colon is as well.
  • path
    The rest of the locator consists of data specific to the scheme, and is known as the "path". It supplies the details of how the specified resource can be accessed. Note that the "/" between the host (or port) and the path is NOT part of the path. The path syntax depends on the scheme being used, as does the manner in which it is interpreted.
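
When migrating from Parse() to Decompose(), the path component deserves a check: the description above says the "/" before the path is not part of the path for Parse(), while the Decompose() example on this page shows Components("path")="/path/". The sketch below prints both results side by side so the difference can be confirmed on a given system. The variable names are illustrative, and no output is asserted.

```objectscript
    Set url = "http://www.intersystems.com/path/"
    Kill legacy, current

    Do ##class(%Net.URLParser).Parse(url, .legacy)        // deprecated method
    Do ##class(%Net.URLParser).Decompose(url, .current)   // preferred replacement

    Write "Parse path:     ", $Get(legacy("path")), !
    Write "Decompose path: ", $Get(current("path")), !
```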

Inherited Members

Inherited Methods
