# Apache Spark
Spark is a fast and general cluster computing system for Big Data. It provides
high-level APIs in Scala, Java, Python, and R, and an optimized engine that
supports general computation graphs for data analysis. Unlike Hadoop
MapReduce, Spark can keep intermediate job results in memory rather than
writing them back to HDFS, which makes it well suited to iterative workloads
such as data mining and machine learning. It also supports a rich set of
higher-level tools including Spark SQL for SQL and DataFrames, MLlib for
machine learning, GraphX for graph processing, and Spark Streaming for
stream processing.
<http://spark.apache.org/>
## Online Documentation
You can find the latest Spark documentation, including a programming
guide, on the [project web page](http://spark.apache.org/documentation.html).
This README file only contains basic setup instructions.
## Building Spark
Spark is built using [Apache Maven](http://maven.apache.org/).
To build Spark and its example programs, run:

    build/mvn -DskipTests clean package

(You do not need to do this if you downloaded a pre-built package.)
You can build Spark using more than one thread by using the `-T` option with Maven; see ["Parallel builds in Maven 3"](https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3).
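For example, a parallel build on four threads might look like the following (the thread count is illustrative; tune it to your machine):

    build/mvn -T 4 -DskipTests clean package
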
More detailed documentation is available from the project site, at
["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).
For general development tips, including info on developing Spark using an IDE, see ["Useful Developer Tools"](http://spark.apache.org/developer-tools.html).
## Interactive Scala Shell
The easiest way to start using Spark is through the Scala shell:

    ./bin/spark-shell

Try the following command, which should return 1000:

    scala> sc.parallelize(1 to 1000).count()

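The same `sc` handle supports the usual RDD transformations. As a minimal sketch (the values are illustrative), the following doubles the even numbers in the range and sums them, which should return 501000:

    scala> val rdd = sc.parallelize(1 to 1000)
    scala> rdd.filter(_ % 2 == 0).map(_ * 2).reduce(_ + _)
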
## Interactive Python Shell
Alternatively, if you prefer Python, you can use the Python shell:

    ./bin/pyspark

And run the following command, which should also return 1000:

    >>> sc.parallelize(range(1000)).count()

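The RDD API looks much the same from Python. For instance, this sketch counts the even numbers in the range and should return 500:

    >>> rdd = sc.parallelize(range(1000))
    >>> rdd.filter(lambda x: x % 2 == 0).count()
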
## Example Programs
Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./bin/run-example <class> [params]`. For example:

    ./bin/run-example SparkPi

will run the Pi example locally.
You can set the MASTER environment variable when running examples to submit
examples to a cluster. This can be a mesos:// or spark:// URL, "yarn" to run
on YARN, "local" to run locally with one thread, or "local[N]" to run locally
with N threads. You can also use an abbreviated class name if the class is
in the `examples` package. For instance:

    MASTER=spark://host:7077 ./bin/run-example SparkPi

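To run the same example locally on, say, four threads instead (the thread count is illustrative; quoting keeps the shell from treating the brackets as a glob):

    MASTER="local[4]" ./bin/run-example SparkPi
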
Many of the example programs print usage help if no params are given.
## Running Tests
Testing first requires [building Spark](#building-spark). Once Spark is built, tests
can be run using:

    ./dev/run-tests

Please see the guidance on how to
[run tests for a module, or individual tests](http://spark.apache.org/developer-tools.html#individual-tests).
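As one hedged illustration of scoping a test run (the authoritative invocations are on the page linked above; the module name `core` follows the source-tree layout), Maven's `-pl` flag restricts a goal to a single module, and you may also need `-am` to build that module's dependencies first:

    build/mvn -pl core test
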
There is also a Kubernetes integration test; see
`resource-managers/kubernetes/integration-tests/README.md`.
## A Note About Hadoop Versions
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.
Please refer to the build documentation at
["Specifying the Hadoop Version and Enabling YARN"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn)
for detailed guidance on building for a particular distribution of Hadoop, including
building for particular Hive and Hive Thriftserver distributions.
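As a sketch, a build against a Hadoop 2.7 cluster with YARN support enabled might look like the following (the version number is illustrative; match it to the version your cluster actually runs):

    build/mvn -Pyarn -Dhadoop.version=2.7.3 -DskipTests clean package
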
## Configuration
Please refer to the [Configuration Guide](http://spark.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.
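Most settings can be placed in `conf/spark-defaults.conf` or set programmatically before the context is created. A minimal Scala sketch (the property values here are illustrative, not recommendations):

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.executor.memory is a standard Spark property; "1g" is illustrative.
    val conf = new SparkConf()
      .setMaster("local[2]")          // run locally with two threads
      .setAppName("ConfigExample")    // name shown in the web UI
      .set("spark.executor.memory", "1g")
    val sc = new SparkContext(conf)
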
## Contributing
Please review the [Contribution to Spark guide](http://spark.apache.org/contributing.html)
for information on how to get started contributing to the project.