%iKnow.Source.Converter.Html

class %iKnow.Source.Converter.Html extends %iKnow.Source.Converter

This is a sample implementation for %iKnow.Source.Converter, designed to weed out HTML tags from text input. Data is first buffered into a process-private global (PPG) and then stripped of HTML in the Convert() call.

Converter parameters:

  1. Unescape As %Boolean: set to 1 to unescape HTML special characters, for example converting "&amp;" to "&" (default = 1)
  2. SkipTags As %String: comma-separated list of tags whose content (text nested between the start and end tag) is to be left out (default = "script,style")
  3. BreakLines As %Boolean: whether to insert double line breaks at non-inline tags (such as p, br, td, ...) so that the iKnow engine splits sentences at those positions (default = 1)
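The Python sketch below approximates the combined effect of the three parameters. It is not the InterSystems implementation, which is written in ObjectScript. It is built on Python's standard html.parser module, and the names strip_html, _TagStripper and BLOCK_TAGS are hypothetical. The set of tags treated as non-inline is an assumed subset based on the examples listed above.

```python
from html.parser import HTMLParser

# Assumed subset of non-inline tags; the class documentation only names p, br, td, ...
BLOCK_TAGS = {"p", "br", "div", "td", "th", "tr", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TagStripper(HTMLParser):
    def __init__(self, unescape: bool, skip_tags: set, break_lines: bool):
        # convert_charrefs=True makes the parser decode &amp; etc. before handle_data
        super().__init__(convert_charrefs=unescape)
        self.skip_tags = skip_tags
        self.break_lines = break_lines
        self.skip_depth = 0  # > 0 while inside a skipped element such as <script>
        self.parts = []

    def _maybe_break(self, tag: str) -> None:
        # Emit a double line break for non-inline tags outside skipped content
        if self.break_lines and tag in BLOCK_TAGS and self.skip_depth == 0:
            self.parts.append("\n\n")

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.skip_depth += 1
        else:
            self._maybe_break(tag)

    def handle_endtag(self, tag):
        if tag in self.skip_tags:
            self.skip_depth = max(0, self.skip_depth - 1)
        else:
            self._maybe_break(tag)

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

    # The two handlers below are only called when convert_charrefs is False:
    # they keep character references verbatim instead of decoding them.
    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.parts.append(f"&#{name};")


def strip_html(text: str, unescape: bool = True, skip_tags: str = "script,style",
               break_lines: bool = True) -> str:
    # Tag names from the parser are lowercase, so normalize the skip list too
    skip = {t.strip().lower() for t in skip_tags.split(",") if t.strip()}
    parser = _TagStripper(unescape, skip, break_lines)
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


print(repr(strip_html("<p>Fish &amp; chips</p><script>var x = 1;</script>")))
```

With the defaults, the paragraph text is decoded and surrounded by double line breaks, and the script body is dropped. Passing unescape=False keeps "&amp;" as written.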

Properties

property BreakLines as %Boolean [ InitialExpression = 1 ];
Property methods: BreakLinesDisplayToLogical(), BreakLinesGet(), BreakLinesIsValid(), BreakLinesLogicalToDisplay(), BreakLinesNormalize(), BreakLinesSet()
property SkipTags as %String [ InitialExpression = ",script,style," ];
Property methods: SkipTagsDisplayToLogical(), SkipTagsGet(), SkipTagsIsValid(), SkipTagsLogicalToDisplay(), SkipTagsLogicalToOdbc(), SkipTagsNormalize(), SkipTagsSet()
property Unescape as %Boolean [ InitialExpression = 1 ];
Property methods: UnescapeDisplayToLogical(), UnescapeGet(), UnescapeIsValid(), UnescapeLogicalToDisplay(), UnescapeNormalize(), UnescapeSet()
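The SkipTags property stores its default with a leading and trailing comma (",script,style,"), while the parameter default is written "script,style". A plausible reason, assumed here and not documented, is that padding the list with commas lets a plain substring test match whole tag names only. The hypothetical Python helpers below illustrate that idea. Lowercasing the tag names is also an assumption.

```python
def normalize_skip_tags(skip_tags: str) -> str:
    """Turn "script, Style" into ",script,style," (hypothetical helper)."""
    names = [t.strip().lower() for t in skip_tags.split(",") if t.strip()]
    return "," + ",".join(names) + ","


def is_skipped(tag: str, padded: str) -> bool:
    # Wrapping the tag in commas restricts the test to whole list entries,
    # so "b" does not match inside ",tbody," and "scr" does not match ",script,".
    return "," + tag.lower() + "," in padded


padded = normalize_skip_tags("script, Style")
print(padded)
print(is_skipped("style", padded), is_skipped("scr", padded))
```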

Methods

method BufferString(data As %String) as %Status [ Language = objectscript ]
Buffers the supplied data in the process-private global (PPG).
method Convert() as %Status [ Language = objectscript ]

Loops through the buffered data and strips off HTML tags. At the end, resets the pointer in the root PPG node so that NextConvertedPart() knows where to start.

method NextConvertedPart() as %String [ Language = objectscript ]
Loops through the PPG again and returns the processed strings.
method SetParams(params As %String) as %Status [ Language = objectscript ]

Utility method called by the %iKnow.Source.Processor and %iKnow.Source.Loader logic to register any new or changed parameter values.

classmethod StripHTML(ByRef pText As %String, pUnescape As %Boolean = 1, pSkipTags As %String = "script,style", pBreakLines As %Boolean = 1, Output pSC As %Status) as %String [ Language = objectscript ]
Utility method to strip HTML tags from the supplied string. See the class documentation for more details on the available parameters.
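To illustrate the buffering cycle described for BufferString(), Convert() and NextConvertedPart(), here is a toy Python model. A dict stands in for the process-private global, and a counter plays the role of the pointer kept in the root PPG node. The class and method names, the per-chunk conversion, and the use of an empty string to signal the end of the data are assumptions, not the actual %iKnow.Source.Converter contract. The regular-expression stripper is only a placeholder for real HTML handling.

```python
import re
from typing import Callable, Dict


class BufferedConverter:
    """Toy model of the BufferString / Convert / NextConvertedPart cycle (hypothetical)."""

    def __init__(self, convert_fn: Callable[[str], str]):
        self.convert_fn = convert_fn
        self.buffer: Dict[int, str] = {}  # stands in for the PPG subscripts
        self.pointer = 0                  # stands in for the root PPG node

    def buffer_string(self, data: str) -> None:
        # Append each chunk under the next integer subscript.
        self.pointer += 1
        self.buffer[self.pointer] = data

    def convert(self) -> None:
        # Assumption: each chunk is converted on its own, so this toy version
        # expects tags not to be split across chunks.
        for key in sorted(self.buffer):
            self.buffer[key] = self.convert_fn(self.buffer[key])
        # Reset the pointer so next_converted_part() starts from the first chunk.
        self.pointer = 0

    def next_converted_part(self) -> str:
        # Assumed convention: "" means there are no more parts.
        self.pointer += 1
        return self.buffer.get(self.pointer, "")


def crude_strip(text: str) -> str:
    # Placeholder stripper: drops anything that looks like a tag.
    return re.sub(r"<[^>]*>", "", text)


conv = BufferedConverter(crude_strip)
conv.buffer_string("<p>Hello ")
conv.buffer_string("<b>world</b></p>")
conv.convert()

parts = []
while (part := conv.next_converted_part()) != "":
    parts.append(part)
print(parts)
```

Because an empty string doubles as the end marker in this sketch, a chunk that converts to nothing (for example a lone "<br>") would end the loop early. A real implementation needs an unambiguous end-of-data signal.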
