SKompiler: Translate trained SKLearn models to executable code in other languages
================================================================================
The package provides a tool for transforming trained SKLearn models into other forms, such as SQL queries, Excel formulas or Sympy expressions (which, in turn, can be translated to code in a variety of languages, such as C, Javascript, Rust, Julia, etc).
Installation
------------
The simplest way to install the package is via `pip`:
$ pip install SKompiler[full]
Note that the `[full]` option installs the optional dependencies `sympy`, `sqlalchemy` and `astor`. These are needed if you plan to convert `SKompiler`'s expressions to SymPy expressions (which can, in turn, be compiled to many other languages), to SQLAlchemy expressions (which can be further translated to different SQL dialects), or to Python source code. If you do not need this functionality (say, you only need the raw `SKompiler` expressions, or only the SQL conversions without the SymPy ones), you may skip the installation of the optional dependencies by simply writing
$ pip install SKompiler
(you can, of course, install any of the extra dependencies later via separate calls to `pip install`)
Usage
-----
### Introductory example
Let us start by walking through a simple example. We begin by training a model on a simple dataset, e.g.:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
X, y = load_iris(return_X_y=True)
m = RandomForestClassifier(n_estimators=3, max_depth=3).fit(X, y)
Suppose we need to express the logic of `m.predict` in SQLite. Here is how we can achieve that:
from skompiler import skompile
expr = skompile(m.predict)
sql = expr.to('sqlalchemy/sqlite')
Voila, the value of the `sql` variable is a super-long expression which looks like
CASE WHEN ((CASE WHEN (x3 <= 2.449999988079071) THEN 1.0 ELSE CASE WHEN
... 100 lines or so ...
THEN 1 ELSE 2 END as y
It corresponds to the `m.predict` computation. Let us check how we can use it in a query.
We import the data into an in-memory SQLite database:
import sqlalchemy as sa
import pandas as pd
conn = sa.create_engine('sqlite://').connect()
df = pd.DataFrame(X, columns=['x1', 'x2', 'x3', 'x4']).reset_index()
df.to_sql('data', conn)
query the data using the generated expression:
results = pd.read_sql('select {0} from data'.format(sql), conn)
and verify that the results match:
assert (results.values.ravel() == m.predict(X).ravel()).all()
Note that the generated SQL expression uses names `x1`, `x2`, `x3` and `x4` to refer to the input variables. You may choose different input variable names by writing:
skompile(m.predict, ['a', 'b', 'c', 'd']).to('sqlalchemy/sqlite')
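If your table really does use those column names, the renamed expression can be queried in exactly the same way as before. Here is a minimal sketch reusing the connection from above (the table name `data_abcd` is made up purely for this illustration):

    # Hypothetical table with columns named a, b, c, d instead of x1..x4
    df_abcd = pd.DataFrame(X, columns=['a', 'b', 'c', 'd']).reset_index()
    df_abcd.to_sql('data_abcd', conn)
    sql_abcd = skompile(m.predict, ['a', 'b', 'c', 'd']).to('sqlalchemy/sqlite')
    results_abcd = pd.read_sql('select {0} from data_abcd'.format(sql_abcd), conn)
    assert (results_abcd.values.ravel() == m.predict(X).ravel()).all()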
### Multiple outputs
Now let us try to generate code for `m.predict_proba`:
expr = skompile(m.predict_proba)
expr.to('sqlalchemy/sqlite')
The generated query is different from the previous one. Most notably, it is of the form
... as y1, ... as y2, ... as y3
That is because `m.predict_proba` produces three values, one probability per class, and this is reflected in the SQL. You may, of course, provide different names for the outputs instead of `y1`, `y2`, `y3`:
expr.to('sqlalchemy/sqlite', assign_to=['a','b','c'])
You may also obtain a list of three separate expressions, without the `as ...` parts:
expr.to('sqlalchemy/sqlite', assign_to=None)
or request only the probability of the second class (component index 1) as a single `... as y2` expression:
expr.to('sqlalchemy/sqlite', component=1, assign_to='y2')
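As a sanity check, the three-column output can be verified against `m.predict_proba` just like before. A quick sketch, assuming the in-memory `data` table from the introductory example is still around:

    import numpy as np

    probs_sql = expr.to('sqlalchemy/sqlite')   # "... as y1, ... as y2, ... as y3"
    probs = pd.read_sql('select {0} from data'.format(probs_sql), conn)
    # The generated columns y1, y2, y3 should match predict_proba
    # up to floating-point noise
    assert np.allclose(probs[['y1', 'y2', 'y3']].values, m.predict_proba(X))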
### Multi-stage code
You might have noted that the SQL code for `predict` was significantly longer than the code for `predict_proba`. Why so? Because
predict(x) = argmax(predict_proba(x))
There is, however, no built-in `argmax` function in SQL, hence it has to be emulated using approximately the following logic:
predict(x) = if predict_proba(x)[0] == max(predict_proba(x)) then 0
else if predict_proba(x)[1] == max(predict_proba(x)) then 1
else 2
Note that the values of `predict_proba` in this expression must be expanded (and thus the computation repeated) multiple times.
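In Python terms, the single-expression translation effectively inlines something like the following (purely illustrative; the real output is a SQL expression, and `predict_proba_one` here is just a stand-in for its inlined formula):

    import numpy as np

    def predict_proba_one(x):
        # stand-in for the full (already long) inlined predict_proba expression
        return m.predict_proba(np.atleast_2d(x))[0]

    def predict_inlined(x):
        # note how predict_proba_one(x) is written out, and hence recomputed,
        # once per comparison: this is what inflates the single SQL expression
        if predict_proba_one(x)[0] == max(predict_proba_one(x)):
            return 0
        elif predict_proba_one(x)[1] == max(predict_proba_one(x)):
            return 1
        return 2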
This problem could be overcome by performing the computation in several steps, saving and reusing intermediate values, rather than doing everything within a single expression. In SQL this can be done with the help of `with` expressions:
with proba as (
select [predict_proba computation] from data
),
max as (
select [max computation] from proba
),
argmax as (
select [argmax computation] from ...
)
To generate this type of SQL, specify `multistage=True`:
expr.to('sqlalchemy/sqlite', multistage=True,
multistage_key_column='index',
multistage_from_obj='data')
Note that in single-expression mode you only get a column expression, which you still need to wrap in an appropriate `SELECT ... FROM ...` statement. In multistage mode, the whole query is generated for you, which is why you need to provide the name of the source table as well as the key column.
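For example, reusing the in-memory database from above (the `data` table has an `index` column thanks to the earlier `reset_index()` call), the generated query can be executed directly. Treat this as a sketch, since the exact shape of the result may vary:

    full_query = expr.to('sqlalchemy/sqlite', multistage=True,
                         multistage_key_column='index',
                         multistage_from_obj='data')
    multistage_probs = pd.read_sql(full_query, conn)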
The effect on the query size can be quite significant:
len(expr.to('sqlalchemy/sqlite'))
> 15558
len(expr.to('sqlalchemy/sqlite', multistage=True))
> 2574
The multi-stage translation is especially important if you need to generate Excel code, because Excel does not support formulas longer than 8192 characters. If you need to port complex models, splitting the computation into stages is therefore the only way. An example of how to deploy a large random forest model to Excel is available in [this video](https://www.youtube.com/watch?v=7vUfa7W0NpY).
### Other formats
By changing the first parameter of the `.to()` call you may produce output in a variety of other formats besides SQLite:
* `sqlalchemy`: raw SQLAlchemy expression (which is a dialect-independent way of representing SQL). Jokes aside, SQL is sometimes a totally valid choice for deploying models into production.
Note that generated SQL may (depending on the chosen method) include functions `exp` and `log`. If you work with SQLite, bear in mind that these functions are not supported out of the box and need to be [added separately](https://stackoverflow.com/a/2108921/318964) via `create_function`. You can find an example of how this can be done in `tests/evaluators.py` in the package source code.
* `sqlalchemy/<dialect>`: SQL string in any of the SQLAlchemy-supported dialects (`firebird`, `mssql`, `mysql`, `oracle`, `postgresql`, `sqlite`, `sybase`). This is a convenience feature for those who are too lazy to figure out how to compile raw SQLAlchemy to actual SQL.
* `excel`: Excel formula. Ever tried dragging a random forest equation down along the table? Fun! Due to the 8192-character limit on formula length, however, Excel will not handle forests larger than `n_estimators=30` with `max_depth=5` or so, unfortunately.
* `sympy`: A SymPy expression. Ever wanted to take a derivative of your model symbolically?
* `sympy/<lang>`: Code in the language `<lang>`, generated via SymPy. Supported values for `<lang>` are `c`, `cxx`, `rust`, `fortran`, `js`, `r`, `julia`, `mathematica`, `octave`. Note that the quality of the generated code varies depending on the model, the language and the value of the `assign_to` parameter. Again, this is just a convenience feature; you will get more control by dealing with `sympy` code printers [manually](https://www.sympy.org/scipy-2017-codegen-tutorial/).
* `python`: Python syntax tree (the same you'd get via `ast.parse`). This option (and the following three) is mostly useful for debugging and testing.
* `python/code`: Python source code. The generated code will contain references to custom functions, such as `__argmax__`, `__sigmoid__`, etc. To execute the code you will need to provide implementations of these functions.