-
Notifications
You must be signed in to change notification settings - Fork 396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataCutter-related fixes for multiclass #263
DataCutter-related fixes for multiclass #263
Conversation
Codecov Report
@@ Coverage Diff @@
## master #263 +/- ##
==========================================
- Coverage 86.61% 84.75% -1.86%
==========================================
Files 315 315
Lines 10345 10369 +24
Branches 346 563 +217
==========================================
- Hits 8960 8788 -172
- Misses 1385 1581 +196
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #263 +/- ##
==========================================
- Coverage 86.24% 82.47% -3.78%
==========================================
Files 320 321 +1
Lines 10471 10539 +68
Branches 352 558 +206
==========================================
- Hits 9031 8692 -339
- Misses 1440 1847 +407
Continue to review full report at Codecov.
|
@gerashegalov can you elaborate more on the proposed fixes? thanks. |
8333743
to
618bd92
Compare
@tovbinm there are multiple issues. One is the sort stability when there are multiple top labels with identical cardinality. We can solve this by adding the label as a secondary sort key. Similar issue with filtering labels for the minFraction prior to sorting. And #251 caused the data cutter omission to begin with |
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/selector/ModelSelector.scala
Outdated
Show resolved
Hide resolved
@gerashegalov please let us know when you want the PR reviewed. |
6abd993
to
7571463
Compare
@tovbinm it's ready |
core/src/main/scala/com/salesforce/op/stages/impl/selector/ModelSelector.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a new test case explicitly for label trimming issue?
not ready for review yet, checkpointing WIP, sorting out test issues. |
843f8cd
to
a43409a
Compare
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
b7c72de
to
5a5b19d
Compare
marking as ready for review since all unit testing are passing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just remove the parts that were in there for testing and LGTM
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
test table num_vals
27992db
to
17c53fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!!!!
🎆 🍾 |
Related issues
Fixes #245
DataCutter.estimate
whereasfilter
beforesort
is good for performance it may not drop labels non-deterministically unlike filtering over a stable order.Describe the proposed solution
Remove non-determinism