This is documentation for Caché & Ensemble.

For information on converting to InterSystems IRIS, see the InterSystems IRIS Adoption Guide and the InterSystems IRIS In-Place Conversion Guide, both available on the WRC Distributions page (login required).

%iKnow.Classification.Optimizer

class %iKnow.Classification.Optimizer extends %Library.RegisteredObject

This class automates selecting "appropriate" terms for a %iKnow.Classification.Builder. After pointing an Optimizer instance to the Builder object that needs optimization, use the LoadTermsArray() and LoadTermsSQL() methods to queue a large number of potentially interesting terms the Optimizer should test. Then invoke its Optimize() method to let the Optimizer loop through the list of suggested terms automatically and add those terms having the highest positive impact on model accuracy (as measured according to ScoreMetric), removing terms that were already added to the model but turn out to have no significant positive impact on the model's accuracy.

See the individual property descriptions for their impact on the optimization process.

Properties (Including Private)

property AddCount as %Integer (MINVAL = 1) [ InitialExpression = 1 ];
The number of terms to add during an AddTerms() cycle. The top results according to RankScores() will be added, as selected from the AddWindowSize terms tested in the cycle.
Property methods: AddCountDisplayToLogical(), AddCountGet(), AddCountIsValid(), AddCountLogicalToDisplay(), AddCountNormalize(), AddCountSet()
property AddWindowSize as %Integer (MINVAL = 0) [ InitialExpression = 0 ];
The number of terms to test in each round. If left at 0, this defaults to the number of cores the system has available, which should be most efficient.
Property methods: AddWindowSizeDisplayToLogical(), AddWindowSizeGet(), AddWindowSizeIsValid(), AddWindowSizeLogicalToDisplay(), AddWindowSizeNormalize(), AddWindowSizeSet()
property Builder as %iKnow.Classification.Builder;
The builder object to be optimized.
Property methods: BuilderGet(), BuilderGetSwizzled(), BuilderIsValid(), BuilderNewObject(), BuilderSet()
property CategoryWeights [ MultiDimensional ];

If ScoreMetric is set to a 'Weighted*' value, the weights for each category are retrieved from this array, indexed by category name. If no category weight is set, it is assumed to be 0.

Note: Weights don't need to add up to 1.

Property methods: CategoryWeightsDisplayToLogical(), CategoryWeightsGet(), CategoryWeightsIsValid(), CategoryWeightsLogicalToDisplay(), CategoryWeightsLogicalToOdbc(), CategoryWeightsNormalize(), CategoryWeightsSet()
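As an illustration of how per-category weights might feed a 'Weighted*' metric, the following Python sketch computes a weighted average of per-category values. It is not the InterSystems implementation: the function name weighted_score and the normalization by the sum of the weights are assumptions, chosen to be consistent with the statements that missing weights count as 0 and that weights need not add up to 1.

```python
def weighted_score(per_category, weights):
    """Weighted average of per-category metric values (illustrative only).

    Categories without an entry in `weights` get weight 0, mirroring the
    documented default. Dividing by the sum of the weights is an assumption;
    it is what lets the weights be on any scale.
    """
    total = sum(weights.get(cat, 0) for cat in per_category)
    if total == 0:
        return 0.0
    weighted_sum = sum(value * weights.get(cat, 0)
                       for cat, value in per_category.items())
    return weighted_sum / total


# Hypothetical per-category precision values and weights.
precision = {"sports": 0.9, "politics": 0.6, "weather": 0.3}
weights = {"sports": 3, "politics": 1}  # "weather" has no weight, so it counts as 0

print(weighted_score(precision, weights))  # (0.9*3 + 0.6*1) / (3 + 1)
```

Under these assumptions, a category with no weight has no influence on the score at all, so every category that matters should be given an explicit weight.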
property CurrentClassifier as %String [ ReadOnly ];
The class name of the current "best" classifier. This value is set during Optimize(), or as part of the AddTerms() and RemoveTerms() methods.
Property methods: CurrentClassifierDisplayToLogical(), CurrentClassifierGet(), CurrentClassifierIsValid(), CurrentClassifierLogicalToDisplay(), CurrentClassifierLogicalToOdbc(), CurrentClassifierNormalize()
property CurrentScore as %Double [ ReadOnly ];
The score of the current classifier. This value is updated by AddTerms() and RemoveTerms().
Property methods: CurrentScoreDisplayToLogical(), CurrentScoreGet(), CurrentScoreIsValid(), CurrentScoreLogicalToDisplay(), CurrentScoreNormalize(), CurrentScoreOdbcToLogical()
property CurrentTestId as %Integer [ ReadOnly ];
The key to %DeepSee.PMML.Utils.TempResult for the test results of CurrentClassifier.
Property methods: CurrentTestIdDisplayToLogical(), CurrentTestIdGet(), CurrentTestIdIsValid(), CurrentTestIdLogicalToDisplay(), CurrentTestIdNormalize()
property DomainId as %Integer;
The domain in which the categorization model is being trained and tested. If not set explicitly, this takes the value of the Builder's DomainId property when an IKnowBuilder instance is registered as Builder.
Property methods: DomainIdDisplayToLogical(), DomainIdGet(), DomainIdIsValid(), DomainIdLogicalToDisplay(), DomainIdNormalize(), DomainIdSet()
property MaximalScoreDecrease as %Double (MAXVAL = 100, MINVAL = -100) [ InitialExpression = 0.05 ];
The maximal decrease in performance the optimizer should accept when trying to remove terms. If removing a term would imply a decrease larger than this figure, it will not be removed. A value of 1 means the maximal score decrease is 1%.
Property methods: MaximalScoreDecreaseDisplayToLogical(), MaximalScoreDecreaseGet(), MaximalScoreDecreaseIsValid(), MaximalScoreDecreaseLogicalToDisplay(), MaximalScoreDecreaseNormalize(), MaximalScoreDecreaseOdbcToLogical(), MaximalScoreDecreaseSet()
property MetadataField as %String;
The metadata field containing the actual category values to compare predictions against. This assumes the value of the Builder's MetadataField property when registering an IKnowBuilder instance as Builder, if not set explicitly.
Property methods: MetadataFieldDisplayToLogical(), MetadataFieldGet(), MetadataFieldIsValid(), MetadataFieldLogicalToDisplay(), MetadataFieldLogicalToOdbc(), MetadataFieldNormalize(), MetadataFieldSet()
property MinimalScoreIncrease as %Double (MAXVAL = 100, MINVAL = -100) [ InitialExpression = 0.1 ];
The minimal score increase, in percent, that a term must achieve to be retained for further testing. If a term does not increase the score by at least this figure, it is discarded from the list of terms to test. A value of 1 means the minimal score increase should be 1%.
Property methods: MinimalScoreIncreaseDisplayToLogical(), MinimalScoreIncreaseGet(), MinimalScoreIncreaseIsValid(), MinimalScoreIncreaseLogicalToDisplay(), MinimalScoreIncreaseNormalize(), MinimalScoreIncreaseOdbcToLogical(), MinimalScoreIncreaseSet()
property RemoveCount as %Integer (MINVAL = 1) [ InitialExpression = 3 ];
The number of terms to remove in a "remove" cycle. Setting this value > 1 assumes the terms deemed irrelevant (and scheduled to be removed) don't influence one another much and removing more in a single cycle will not worsen performance much more than the individual performance changes of each term removal alone.
Property methods: RemoveCountDisplayToLogical(), RemoveCountGet(), RemoveCountIsValid(), RemoveCountLogicalToDisplay(), RemoveCountNormalize(), RemoveCountSet()
property RemoveStepRatio as %Double (MAXVAL = 1, MINVAL = 0) [ InitialExpression = 0.1 ];

The ratio of RemoveTerms() cycles vs AddTerms() cycles. This should be a value between 0 and 1 (inclusive).

Note: Remove cycles take significantly longer than add cycles.

Property methods: RemoveStepRatioDisplayToLogical(), RemoveStepRatioGet(), RemoveStepRatioIsValid(), RemoveStepRatioLogicalToDisplay(), RemoveStepRatioNormalize(), RemoveStepRatioOdbcToLogical(), RemoveStepRatioSet()
property ScoreMetric as %String (VALUELIST = ",MacroFmeasure,MacroPrecision,MacroRecall,MicroFmeasure,MicroPrecision,MicroRecall,WeightedPrecision,WeightedRecall,WeightedFmeasure") [ InitialExpression = "MacroFmeasure" ];
The default accuracy metric to use for evaluating test results, as used by RankScores(). If set to a 'Weighted*' value, the weights are retrieved from CategoryWeights.
Property methods: ScoreMetricDisplayToLogical(), ScoreMetricGet(), ScoreMetricIsValid(), ScoreMetricLogicalToDisplay(), ScoreMetricLogicalToOdbc(), ScoreMetricNormalize(), ScoreMetricSet()
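The difference between the 'Macro*' and 'Micro*' metric names can be sketched in Python as follows. This uses one common textbook definition (macro: average the per-category F-measure; micro: pool the counts across categories first). Whether the product computes these values in exactly this way is an assumption, and the function names and the sample counts are hypothetical.

```python
def f_measure(tp, fp, fn):
    """F-measure (harmonic mean of precision and recall) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f(counts):
    """Average the per-category F-measure: every category counts equally."""
    return sum(f_measure(*c) for c in counts.values()) / len(counts)


def micro_f(counts):
    """Pool the counts over all categories, then compute one F-measure."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f_measure(tp, fp, fn)


# Hypothetical (true positives, false positives, false negatives) per category.
counts = {"sports": (8, 2, 2), "weather": (1, 1, 3)}

print(macro_f(counts))  # pulled down by the weak "weather" category
print(micro_f(counts))  # dominated by the larger "sports" category
```

A macro metric tends to reward terms that help small categories, while a micro metric favors overall hit counts.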
property TestSet as %iKnow.Filters.Filter;
The test set to validate model accuracy increases/decreases against.
Property methods: TestSetGet(), TestSetGetSwizzled(), TestSetIsValid(), TestSetNewObject(), TestSetSet()
property Verbose as %String [ InitialExpression = 0 ];
If set to a boolean value, defines whether or not to write output to the current device during the Optimize() method. If set to a string, it is treated as a global reference to which output needs to be written.
Property methods: VerboseDisplayToLogical(), VerboseGet(), VerboseIsValid(), VerboseLogicalToDisplay(), VerboseLogicalToOdbc(), VerboseNormalize(), VerboseSet()

Methods (Including Private)

private method %OnClose() as %Status
Inherited description: This callback method is invoked by the %Close() method to provide notification that the current object is being closed.

The return value of this method is ignored.

private method %OnNew(pTaskId As %Integer = 0, pMasterObject As %Boolean = 1) as %Status
Inherited description: This callback method is invoked by the %New() method to provide notification that a new instance of an object is being created.

If this method returns an error then the object will not be created.

It is passed the arguments provided in the %New call. When customizing this method, override the arguments with whatever variables and types you expect to receive from %New(). For example, if you're going to call %New, passing 2 arguments, %OnNew's signature could be:

Method %OnNew(dob as %Date = "", name as %Name = "") as %Status

If instead of returning a %Status code this returns an oref and this oref is a subclass of the current class then this oref will be the one returned to the caller of %New method.

method AddTerms(pCount As %Integer = -1, Output pAtEnd As %Boolean = 0) as %Status

This method does one round of processing, testing AddWindowSize candidate terms and adding the best pCount terms according to RankScores(), except for terms that do not meet the MinimalScoreIncrease threshold.

If pCount < 0, it defaults to AddCount.
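The selection step of an add cycle can be pictured with the following Python sketch. It is a simplified model rather than the actual implementation: the function select_terms_to_add is hypothetical, and it assumes that MinimalScoreIncrease is a percentage relative to the current score.

```python
def select_terms_to_add(current_score, candidate_scores, count, min_increase_pct):
    """Pick up to `count` candidate terms, best first, that improve the score enough.

    candidate_scores maps each tested term to the score obtained with that
    term added. Treating the threshold as relative to current_score is an
    assumption of this sketch.
    """
    threshold = current_score * (1 + min_increase_pct / 100)
    ranked = sorted(candidate_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, score in ranked if score >= threshold][:count]


# One hypothetical test window of three candidate terms.
window = {"goal": 0.56, "rain": 0.5002, "vote": 0.53}

print(select_terms_to_add(0.50, window, count=1, min_increase_pct=0.1))
print(select_terms_to_add(0.50, window, count=2, min_increase_pct=0.1))
```

In this model, "rain" is never selected because its gain stays below the threshold, regardless of how many terms the cycle is allowed to add.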

method Cleanup() as %Status
This method clears the temporary artifacts the optimizer has created while optimizing, such as the CurrentClassifier class and CurrentTestId test results.
private method ClearTestInfo(pJobNumber As %Integer, pDropTestResults As %Boolean = 1, pDropTestClass As %Boolean = 1) as %Status
Clears internal and generated artifacts for one particular test.
method Initialize() as %Status
Initializes this Optimizer instance. This method is called automatically as part of Optimize().
method LoadTermsArray(ByRef pTerms, pListIndex As %Integer = 0) as %Status
Loads all terms from the supplied array. If pListIndex is non-zero, the term info is read from that index at each array position. If the term info itself is a list structure as well, it is interpreted as follows: pTerms(n) = $lb(term, type, negationpolicy, matchpolicy)
method LoadTermsSQL(pSQL As %String) as %Status
Loads a list of candidate terms based on a SQL query. The query should return a column named "term" containing the term's value and may return columns named "type", "negation" and "match" to configure the type, negation and count policy for each term being retrieved, respectively.
private method Log(pMessage As %String, pNewLines=1)
method Optimize(pMaxSteps As %Integer = 20) as %Status

In at most pMaxSteps steps, the current Builder will be optimized by testing, one at a time, the terms added through LoadTermsSQL() and LoadTermsArray(), judging which term works best for each test window by the results of RankScores() (see also AddTerms()). Every (1/RemoveStepRatio) rounds, all terms in the dictionary so far will be tested for their contribution to the current model score and the lowest RemoveCount terms will be removed (see also RemoveTerms()).

At the end of the optimization process, in addition to Builder being updated, CurrentClassifier will contain the class name of the last test class used to achieve the best result and CurrentTestId will point to the test results for that class.
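To make the interplay between pMaxSteps and RemoveStepRatio more concrete, the Python sketch below lists which cycles would run at each step. It is only a model of the description above: the function plan_cycles is hypothetical, and it assumes that a remove cycle follows the add cycle of every (1/RemoveStepRatio)-th step and that a ratio of 0 disables remove cycles.

```python
def plan_cycles(max_steps, remove_step_ratio):
    """Return, for each step, the list of cycle types run in that step."""
    if not 0 <= remove_step_ratio <= 1:
        raise ValueError("remove_step_ratio must be between 0 and 1 (inclusive)")
    # Assumption: a ratio of 0 means remove cycles never run.
    interval = round(1 / remove_step_ratio) if remove_step_ratio > 0 else None
    plan = []
    for step in range(1, max_steps + 1):
        cycles = ["add"]
        if interval and step % interval == 0:
            cycles.append("remove")
        plan.append(cycles)
    return plan


# With the documented defaults (20 steps, ratio 0.1), this model schedules
# a remove cycle after steps 10 and 20.
for step, cycles in enumerate(plan_cycles(20, 0.1), start=1):
    if "remove" in cycles:
        print(step, cycles)
```

Since remove cycles are noted to take significantly longer than add cycles, a small ratio keeps most of the run time in add cycles.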

private method RankScores(ByRef pJobInfo, Output pRanked, Output pNoScore) as %Status

This method ranks the test results in pJobInfo according to the desired "score". By default, it will just look at the value of the metric identified by ScoreMetric, but this method can be overridden to calculate in more detail. When this method returns, pRanked is an ordered array containing the job IDs and score in ASCENDING order (pRanked(1) is the worst job):

pRanked([position]) = $lb([jobID], [score])

pJobInfo should contain the following information:
pJobInfo([jobID], "scores", [metric]) = [value]
pJobInfo([jobID], "testid") = [test ID] (key for %DeepSee.PMML.Utils.TempResult)
pJobInfo([jobID], "term") = [term ID] (not for initial evaluation)

See also GetScore()
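The shape of the ranking can be sketched in Python with nested dictionaries standing in for the multidimensional arrays. This is an illustrative analogue, not the method's source: the function rank_scores is hypothetical, and treating pNoScore as the list of jobs that lack a value for the chosen metric is an assumption.

```python
def rank_scores(job_info, metric="MacroFmeasure"):
    """Rank jobs by one metric in ascending order (worst job first).

    job_info[job_id]["scores"][metric] plays the role of
    pJobInfo([jobID], "scores", [metric]).
    Returns (ranked, no_score): ranked is a list of (job_id, score) pairs;
    no_score lists jobs without a value for the metric (an assumption).
    """
    ranked, no_score = [], []
    for job_id, info in job_info.items():
        score = info.get("scores", {}).get(metric)
        if score is None:
            no_score.append(job_id)
        else:
            ranked.append((job_id, score))
    ranked.sort(key=lambda pair: pair[1])
    return ranked, no_score


# Hypothetical results for three test jobs.
job_info = {
    1: {"scores": {"MacroFmeasure": 0.61}, "testid": 101, "term": 7},
    2: {"scores": {"MacroFmeasure": 0.58}, "testid": 102, "term": 9},
    3: {"scores": {}, "testid": 103, "term": 12},
}

ranked, no_score = rank_scores(job_info)
print(ranked)    # worst job first, best job last
print(no_score)  # jobs that could not be scored
```

Because the order is ascending, the best candidate is the last entry of the ranking, not the first.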

method RemoveTerms(pCount As %Integer = -1) as %Status

Tests the impact of removing each term in the current model's TermDictionary individually. The pCount terms whose removal still yields the best scores according to RankScores() (which suggests their contribution was minimal) are removed from the TermDictionary, unless the resulting decrease in performance exceeds MaximalScoreDecrease.

If pCount < 0, it defaults to RemoveCount.
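A remove cycle can be modeled along the same lines. In the Python sketch below, the function select_terms_to_remove is hypothetical, MaximalScoreDecrease is assumed to be a percentage relative to the current score, and the check is applied to each term on its own, which simplifies whatever the real method does when several terms are removed together.

```python
def select_terms_to_remove(current_score, scores_without_term, count, max_decrease_pct):
    """Pick up to `count` terms whose removal costs the least.

    scores_without_term maps each term to the model score obtained with only
    that term removed. A term is kept if removing it would push the score
    below the allowed floor (relative threshold: an assumption).
    """
    floor = current_score * (1 - max_decrease_pct / 100)
    ranked = sorted(scores_without_term.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, score in ranked[:count] if score >= floor]


# Hypothetical scores after removing each term from a model scoring 0.80.
without = {"the": 0.80, "goal": 0.60, "and": 0.7999, "very": 0.79}

print(select_terms_to_remove(0.80, without, count=3, max_decrease_pct=0.05))
```

In this example, "very" is among the three least harmful removals but is still kept, because dropping it would cost more than the allowed decrease.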

method SaveClassifier(pClassName As %String, pOverwrite As %Boolean = 0) as %Status
Saves the CurrentClassifier class to the desired pClassName, so it will not be removed after this Optimizer instance is dropped. If CurrentClassifier is not set or if the class no longer exists for other reasons, the current builder object will create a classifier class based on its current state.
