Use compact and compressed model json by default #375
Codecov Report
| Coverage Diff | master | #375 | +/- |
|---|---|---|---|
| Coverage | 86.77% | 86.79% | +0.01% |
| Files | 336 | 336 | |
| Lines | 10921 | 10922 | +1 |
| Branches | 342 | 577 | +235 |
| Hits | 9477 | 9480 | +3 |
| Misses | 1444 | 1442 | -2 |
Don’t you need to mention the gzip codec when reading the model?
It's handled transparently by Hadoop's `TextInputFormat`, based on the filename extension. On the write path, the filename extension is appended based on the compression codec. On the read path, you can even mix files with different compression codecs/extensions and uncompressed files in the same directory.
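To illustrate the read-path behavior described in this reply, here is a minimal Python sketch of extension-based codec selection. It is only an analogy, not Hadoop's code: the helper names `open_by_extension` and `read_dir_lines` and the part-file names are hypothetical, and only the `.gz` extension is handled.

```python
import gzip
import os
import tempfile


def open_by_extension(path):
    """Open a text file, picking a decompressor from the filename extension.

    Hypothetical helper: only '.gz' is recognized here; anything else is
    treated as uncompressed text.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def read_dir_lines(directory):
    """Read all lines from every file in a directory, compressed or not."""
    lines = []
    for name in sorted(os.listdir(directory)):
        with open_by_extension(os.path.join(directory, name)) as f:
            lines.extend(line.rstrip("\n") for line in f)
    return lines


def demo():
    """Write one plain and one gzipped part file, then read both back."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "part-00000"), "w", encoding="utf-8") as f:
            f.write('{"a":1}\n')
        with gzip.open(os.path.join(d, "part-00001.gz"), "wt", encoding="utf-8") as f:
            f.write('{"b":2}\n')
        return read_dir_lines(d)
```

The caller never names a codec: the extension alone decides how each file is opened, which is why mixed directories can be read in one pass.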
adec34d to 527b3a1
Bug fixes:
- Ensure correct metrics despite model failures on some CV folds [#404](#404)
- Fix flaky `ModelInsight` tests [#395](#395)
- Avoid creating `SparseVector`s for LOCO [#377](#377)

New features / updates:
- Model combiner [#385](#399)
- Added new sample for HousingPrices [#365](#365)
- Test to verify that custom metrics appear in model insight metrics [#387](#387)
- Add `FeatureDistribution` to `SerializationFormat`s [#383](#383)
- Add metadata to `OpStandadrdScaler` to allow for descaling [#378](#378)
- Improve json serde error in `evalMetFromJson` [#380](#380)
- Track mean & standard deviation as metrics for numeric features and for text length of text features [#354](#354)
- Making model selectors robust to failing models [#372](#372)
- Use compact and compressed model json by default [#375](#375)
- Descale feature contribution for Linear Regression & Logistic Regression [#345](#345)

Dependency updates:
- Update tika version [#382](#382)
Related issues
Fixes #374
Describe the proposed solution
Use compact JSON serialization for the model and compress the output with gzip.
Describe alternatives you've considered
Make this behavior configurable.
Additional context
In a test scenario, the output size is reduced from 1.6M to 188K.
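As a rough illustration of why compact serialization plus gzip shrinks the output, the following Python sketch compares pretty-printed, compact, and gzipped compact JSON. It is a sketch under stated assumptions, not this PR's implementation: the `model` dictionary and all function names are hypothetical, and the sizes it prints are unrelated to the figures above.

```python
import gzip
import json


def serialize_pretty(model):
    """Pretty-printed JSON with indentation, encoded as UTF-8 bytes."""
    return json.dumps(model, indent=2).encode("utf-8")


def serialize_compact(model):
    """Compact JSON: no indentation and no spaces after separators."""
    return json.dumps(model, separators=(",", ":")).encode("utf-8")


def serialize_compact_gzip(model):
    """Compact JSON followed by gzip compression."""
    return gzip.compress(serialize_compact(model))


def load_compact_gzip(data):
    """Inverse of serialize_compact_gzip."""
    return json.loads(gzip.decompress(data).decode("utf-8"))


# Hypothetical stand-in for a model description with many similar stages.
model = {
    "stages": [
        {"uid": f"stage_{i}", "params": {"alpha": 0.1, "maxIter": 50}}
        for i in range(200)
    ]
}

pretty = serialize_pretty(model)
compact = serialize_compact(model)
compressed = serialize_compact_gzip(model)

print(len(pretty), len(compact), len(compressed))
```

Repetitive structures such as lists of similar stages tend to compress well, so most of the saving in this sketch comes from gzip rather than from dropping whitespace.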