This class automates selecting "appropriate" terms for a %iKnow.Classification.Builder.
After pointing an Optimizer instance to the Builder object that needs optimization, use the
LoadTermsArray() and LoadTermsSQL() methods to queue a large
number of potentially interesting terms the Optimizer should test. Then invoke its
Optimize() method to let the Optimizer loop through the list of suggested terms
automatically and add those terms having the highest positive impact on model accuracy (as
measured according to ScoreMetric), removing terms that were already
added to the model but turn out to have no significant positive impact on the model's accuracy.
See the individual property descriptions for their impact on the optimization process.
The domain used to train and test the categorization model.
If not set explicitly, this takes the value of the Builder's DomainId property when an
IKnowBuilder instance is registered as Builder.
The maximal decrease in performance the optimizer should accept when trying to remove terms.
If removing a term would cause a decrease larger than this figure, the term will not be removed.
A value of 1 means the maximal score decrease is 1%.
The metadata field containing the actual category values to compare predictions against.
If not set explicitly, this takes the value of the Builder's MetadataField property when an
IKnowBuilder instance is registered as Builder.
The minimal score increase (in %) a term must produce to be retained for further testing. If the
score does not increase by at least this figure, the term is discarded from the list of terms
to test. A value of 1 means the minimal score increase should be 1%.
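The retention rule above can be illustrated with a minimal Python sketch. It is not the class's implementation: the function name `retain_candidates` and the sample terms are hypothetical, and it assumes scores are expressed as percentages and the threshold is read as percentage points, which the description does not state explicitly.

```python
def retain_candidates(baseline, candidate_scores, min_score_increase=1.0):
    """Keep only terms whose score gain over the baseline meets the threshold.

    Assumption: scores and the threshold are both in percentage points.
    """
    retained = {}
    for term, score in candidate_scores.items():
        gain = score - baseline
        if gain >= min_score_increase:
            retained[term] = gain  # term stays in the list of terms to test
    return retained


# Hypothetical scores obtained by adding each candidate term to the model.
kept = retain_candidates(80.0, {"fever": 82.5, "cough": 80.4, "rash": 79.0})
print(kept)
```

With a threshold of 1, only "fever" gains enough to stay in the candidate list; the other two terms would be dropped from further testing.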
The number of terms to remove in a "remove" cycle. Setting this value higher than 1 assumes that
the terms deemed irrelevant (and scheduled to be removed) do not influence one another much, so
that removing several in a single cycle will not worsen performance much more than the sum of
the individual performance changes of each term removal.
property Verbose as %String [ InitialExpression = 0 ];
If set to a boolean value, defines whether or not to write output to the current device during
the Optimize() method. If set to a string, it is treated as a global reference
to which output needs to be written.
Inherited description: This callback method is invoked by the %New() method to
provide notification that a new instance of an object is being created.
If this method returns an error then the object will not be created.
It is passed the arguments provided in the %New call.
When customizing this method, override the arguments with whatever variables and types you expect to receive from %New().
For example, if you're going to call %New, passing 2 arguments, %OnNew's signature could be:
Method %OnNew(dob as %Date = "", name as %Name = "") as %Status
If this method returns an oref instead of a %Status code, and that oref is an instance of a
subclass of the current class, then that oref is the one returned to the caller of the %New method.
Loads all terms from the supplied array.
If pListIndex is non-zero, the term info is read from that index at each array position.
If the term info itself is a list structure as well, it is interpreted as follows:
pTerms(n) = $lb(term, type, negationpolicy, matchpolicy)
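As an illustration of how such an array entry might be interpreted, here is a small Python sketch. Python lists stand in for $lb() structures, `read_term_info` is a hypothetical helper, and the policy values used in the example are placeholders. Using None for an omitted field is an assumption; the defaults actually applied by the class are not described here.

```python
def read_term_info(entry, list_index=0):
    """Interpret one array entry as (term, type, negation policy, match policy).

    list_index is 1-based, mirroring $list positions; 0 means the entry
    itself is the term info.
    """
    info = entry[list_index - 1] if list_index else entry
    if isinstance(info, (list, tuple)):
        # Pad missing trailing fields with None (assumed to mean "not specified").
        fields = list(info) + [None] * (4 - len(info))
        term, term_type, negation, match = fields[:4]
    else:
        # A plain value is taken to be the term itself.
        term, term_type, negation, match = info, None, None, None
    return {"term": term, "type": term_type, "negation": negation, "match": match}


plain = read_term_info("aspirin")
listed = read_term_info(["aspirin", "entity"])
nested = read_term_info([42, ["aspirin", "entity", "neg", "exact"]], list_index=2)
```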
Loads a list of candidate terms based on a SQL query. The query should return a column named
"term" containing the term's value and may return columns named "type", "negation" and "match"
to configure the type, negation and count policy for each term being retrieved, respectively.
private method Log(pMessage As %String, pNewLines=1)
At the end of the optimization process, in addition to Builder being
updated, CurrentClassifier will contain the class name of the last
test class used to achieve the best result and pTestId will point to the test
results for that class.
private method RankScores(ByRef pJobInfo, Output pRanked, Output pNoScore) as %Status
This method ranks the test results in pJobInfo according to the desired "score".
By default, it will just look at the value of the metric identified by
ScoreMetric, but this method can be overridden to calculate in more detail.
When this method returns, pRanked is an ordered array containing the job IDs and score in
ASCENDING order (pRanked(1) is the worst job):
pRanked([position]) = $lb([jobID], [score])
pJobInfo should contain the following information:
pJobInfo([jobID], "scores", [metric]) = [value]
pJobInfo([jobID], "testid") = [test ID] (key for %DeepSee.PMML.Utils.TempResults)
pJobInfo([jobID], "term") = [term ID] (not for initial evaluation)
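The default ranking behaviour described above can be sketched in Python as follows. This is a model of the logic only, not the ObjectScript method: a nested dict stands in for the pJobInfo array, the metric name "f1" is a placeholder, and treating pNoScore as the list of jobs lacking a value for the metric is an assumption.

```python
def rank_scores(job_info, score_metric):
    """Return (ranked, no_score); ranked holds (job_id, score) pairs, worst first."""
    ranked, no_score = [], []
    for job_id, info in job_info.items():
        score = info.get("scores", {}).get(score_metric)
        if score is None:
            no_score.append(job_id)  # assumed meaning of pNoScore
        else:
            ranked.append((job_id, score))
    ranked.sort(key=lambda pair: pair[1])  # ascending: first entry is the worst job
    return ranked, no_score


job_info = {
    1: {"scores": {"f1": 0.71}, "testid": 11, "term": 101},
    2: {"scores": {"f1": 0.64}, "testid": 12, "term": 102},
    3: {"scores": {}, "testid": 13, "term": 103},
}
ranked, no_score = rank_scores(job_info, "f1")
```

An override that combines several metrics would only need to change how `score` is computed; the ascending order of the result would stay the same.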
Tests the impact of removing each term in the current model's TermDictionary individually.
The pCount terms whose removal still leaves RankScores() returning the best score (which
suggests their contribution was minimal) will be removed from the TermDictionary, unless the
resulting decrease in performance surpasses the maximal score decrease the optimizer accepts.
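The selection step can be sketched in Python as follows. It takes a list ranked in ascending order, as RankScores() produces, picks the best-scoring removals first, and stops once a removal would cost more than the accepted decrease. The names `select_removals` and `term_by_job` are hypothetical. The sketch also assumes that scores are percentages and that the cut-off is the maximal score decrease described among the optimizer's properties.

```python
def select_removals(baseline, ranked, term_by_job, count=1, max_score_decrease=1.0):
    """Pick up to `count` terms whose removal hurts the score least.

    ranked: (job_id, score) pairs in ascending order (worst first).
    """
    selected = []
    for job_id, score in reversed(ranked):  # walk from best to worst
        if len(selected) == count:
            break
        if baseline - score > max_score_decrease:
            break  # every remaining job scores even lower
        selected.append(term_by_job[job_id])
    return selected


ranked = [(3, 70.0), (1, 79.5), (2, 80.2)]
term_by_job = {1: "cough", 2: "rash", 3: "fever"}
two = select_removals(80.0, ranked, term_by_job, count=2)
three = select_removals(80.0, ranked, term_by_job, count=3)
```

Asking for three removals still yields only two terms here, because removing the third would lower the score by 10 points, well beyond the accepted decrease.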
Saves the CurrentClassifier class to the desired pClassName,
so it will not be removed after this Optimizer instance is dropped. If CurrentClassifier
is not set or if the class no longer exists for other reasons, the current builder object will
create a classifier class based on its current state.