This is documentation for Caché & Ensemble. See the InterSystems IRIS version of this content.

For information on migrating to InterSystems IRIS, see Why Migrate to InterSystems IRIS?

News.DeepSee.NewsCube

class News.DeepSee.NewsCube extends %DeepSee.CubeDefinition

This class defines a DeepSee cube on top of the news articles in News.DeepSee.NewsArticle. In addition to straightforward dimensions on the publication date and the news agency, the cube defines an iKnow measure for dealing with the textual content of the article itself and an entity dimension on top of that measure.

Rather than reading the text to be indexed from a string or stream column, this demo cube illustrates how more advanced logic can be invoked through a sourceExpression. Because the NewsArticle table only has a column containing the URL of the actual article, the expression code (encapsulated in class method GetArticleText()) first issues an HTTP request to fetch the full article and then strips the HTML tags from the retrieved content to get to the actual text.

Please note this is a demo to illustrate the concept, not necessarily a recommended implementation for cubes on news articles. As a separate HTTP request will be issued for each individual fact row and each such request can easily take a second, the cube build time will be significantly longer than in cases where the article content has already been fetched upfront.
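To put the build-time remark in numbers, the following is a rough back-of-the-envelope sketch in Python (not ObjectScript, and not part of the demo). The function name estimate_fetch_overhead and the one-second default are illustrative assumptions taken from the "can easily take a second" figure above. It only counts the sequential HTTP requests, one per fact row, and ignores all other build work.

```python
from datetime import timedelta


def estimate_fetch_overhead(fact_rows: int, seconds_per_request: float = 1.0) -> timedelta:
    """Time spent on HTTP requests alone, assuming one sequential request per fact row.

    Both the helper and the 1-second default are assumptions for illustration.
    """
    if fact_rows < 0 or seconds_per_request < 0:
        raise ValueError("fact_rows and seconds_per_request must be non-negative")
    return timedelta(seconds=fact_rows * seconds_per_request)


if __name__ == "__main__":
    # Print the estimated request overhead for a few hypothetical cube sizes.
    for rows in (100, 10_000, 100_000):
        print(f"{rows:>7} fact rows -> {estimate_fetch_overhead(rows)}")
```

Under these assumptions, 10,000 articles already add roughly 2 hours 47 minutes of request time to the build, which is why fetching the content upfront is usually preferable outside a demo.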

Methods

classmethod GetArticleText(pLink As %String) as %String
This class method fetches the actual article content at pLink and strips HTML tags from the raw text. See also GetRawTextFromLink() and StripHTML().
GetRawTextFromLink() derives the server name and URL from pLink and then uses these to target a %Net.HttpRequest at the article, returning its raw content as a string.
classmethod StripHTML(pRawText As %String, Output pSC As %Status) as %String
Strips HTML tags from pRawText and returns the remaining text.
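The flow described by these methods (split the link into a server and a path, fetch the raw page, strip the markup) can be sketched as follows. This is a language-neutral illustration in Python, not the ObjectScript source of the class. The names split_link, strip_html and get_article_text, and the injected fetch callable that stands in for the %Net.HttpRequest call, are all hypothetical. Errors are raised as exceptions here rather than returned through an output status such as pSC.

```python
from html.parser import HTMLParser
from typing import Callable, List, Tuple
from urllib.parse import urlsplit


def split_link(link: str) -> Tuple[str, str]:
    """Split an article link into (server, path including any query string)."""
    parts = urlsplit(link)
    if not parts.netloc:
        raise ValueError(f"link has no server name: {link!r}")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.netloc, path


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops the contents of <script> and <style>."""

    _SKIPPED = {"script", "style"}
    # Block-level tags get a space so adjacent paragraphs do not run together.
    _BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIPPED:
            self._skip_depth += 1
        elif tag in self._BLOCK:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in self._SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self._BLOCK:
            self._chunks.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse the whitespace left behind by removed markup.
        return " ".join("".join(self._chunks).split())


def strip_html(raw_text: str) -> str:
    """Return raw_text with tags removed and character references decoded."""
    parser = _TextExtractor()
    parser.feed(raw_text)
    parser.close()
    return parser.text()


def get_article_text(link: str, fetch: Callable[[str, str], str]) -> str:
    """Fetch the page behind link via the supplied fetch(server, path) and strip its HTML."""
    server, path = split_link(link)
    return strip_html(fetch(server, path))


if __name__ == "__main__":
    # A stub fetcher keeps the example runnable without network access.
    def fake_fetch(server: str, path: str) -> str:
        return "<html><body><h1>Title</h1><p>Body &amp; more</p></body></html>"

    print(get_article_text("http://www.example.com/news/story.html", fake_fetch))
```

Passing the fetcher in as a parameter keeps the parsing logic testable on its own. In a real cube the fetch step is where the per-row HTTP cost arises.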

