# text-detection-ctpn
Text detection mainly based on CTPN (Connectionist Text Proposal Network), implemented in TensorFlow. I use ID card detection as an example to demonstrate the results, but note that this model can be used in almost every horizontal scene text detection task. The original paper can be found [here](https://arxiv.org/abs/1609.03605), and the original Caffe repo can be found [here](https://github.com/tianzhi0549/CTPN). For more detail about the paper and code, see this [blog](http://slade-ruan.me/2017/10/22/text-detection-ctpn/).
***
# setup
- requirements: tensorflow 1.3, cython 0.24, opencv-python, easydict (installing Anaconda is recommended)
- if you do not have a gpu device, follow the instructions [here](https://github.com/eragonruan/text-detection-ctpn/issues/43) to set up
- if you have a gpu device, build the library with
```shell
cd lib/utils
chmod +x make.sh
./make.sh
```
***
# parameters
There are some parameters you may need to modify according to your requirements. You can find them in ctpn/text.yml:
- USE_GPU_NMS # whether to use nms implemented in cuda or not
- DETECT_MODE # H represents horizontal mode, O represents oriented mode, default is H
- checkpoints_path # the model I provide is in checkpoints/; if you train the model yourself, it will be saved in output/
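
USE_GPU_NMS only selects which implementation of non-maximum suppression (NMS) is used, so it may help to see what NMS does. The following is a minimal pure-Python sketch of greedy NMS, not the cython or cuda code in this repo. The function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are assumptions made for illustration, and the sketch ignores the "+1 pixel" area convention that some detection codebases use.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return the indices of the kept boxes, highest score first."""
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is dropped
```

The cython and cuda versions exist because this quadratic loop becomes slow when there are thousands of proposals per image.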
***
# demo
- put your images in data/demo, then run the demo from the repository root; the results will be saved in data/results
```shell
python ./ctpn/demo.py
```
***
# training
## prepare data
- First, download the pre-trained VGG net model and put it in data/pretrain/VGG_imagenet.npy. You can download it from [google drive](https://drive.google.com/open?id=0B_WmJoEtfQhDRl82b1dJTjB2ZGc) or [baidu yun](https://pan.baidu.com/s/1kUNTl1l).
- Second, prepare the training data as described in the paper, download the data I prepared from the links above, or prepare your own data with the following steps.
- Modify the path and gt_path in prepare_training_data/split_label.py according to your dataset, then run
```shell
cd prepare_training_data
python split_label.py
```
- it will generate the prepared data in the current folder. Then run
```shell
python ToVoc.py
```
- to convert the prepared training data into voc format. It will generate a folder named TEXTVOC. Move this folder to data/ and then run
```shell
cd ../data
ln -s TEXTVOC VOCdevkit2007
```
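
The label-splitting step above reflects a design choice in the CTPN paper: a text line is detected as a sequence of fixed-width (16 pixel) proposals, so ground-truth boxes need to be cut into strips of that width. The sketch below illustrates the idea for a horizontal box. It is an assumption-laden illustration, not the contents of split_label.py: the function name `split_into_strips` is hypothetical, coordinates are treated as inclusive pixel indices, and the real script may clip the first and last strip to the box edges or handle resizing differently.

```python
def split_into_strips(x1, y1, x2, y2, stride=16):
    """Split a horizontal box into strips aligned to a `stride`-pixel grid.

    Returns a list of (left, top, right, bottom) tuples with inclusive
    pixel coordinates. Strips are snapped to the grid, so the first and
    last strip may extend slightly beyond the original box.
    """
    strips = []
    # Snap the start to the grid column that contains x1.
    start = (int(x1) // stride) * stride
    for left in range(start, int(x2) + 1, stride):
        right = left + stride - 1
        strips.append((left, y1, right, y2))
    return strips


strips = split_into_strips(20, 5, 60, 25)
```

For the box above, spanning x = 20 to x = 60, this yields three strips covering grid columns 16-31, 32-47, and 48-63.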
## train
Simply run
```shell
python ./ctpn/train_net.py
```
- you can modify some hyperparameters in ctpn/text.yml, or just use the parameters I set.
- The model I provide in checkpoints was trained on a GTX1070 for 50k iters.
- If you are using cuda nms, it takes about 0.2s per iter, so 50k iterations take roughly 2.5 to 3 hours.
***
# roadmap
- [x] cython nms
- [x] cuda nms
- [x] python2/python3 compatibility
- [x] tensorflow1.3
- [x] delete useless code
- [x] loss function as referred in paper
- [x] oriented text connector
- [x] BLSTM
- [ ] side refinement
***
# some results
`NOTICE:` all the photos used below are collected from the internet. If it affects you, please contact me to delete them.
<img src="data/oriented_results/001.jpg" width=320 height=240 /><img src="data/oriented_results/002.jpg" width=320 height=240 />
<img src="data/oriented_results/003.jpg" width=320 height=240 /><img src="data/oriented_results/004.jpg" width=320 height=240 />
<img src="data/oriented_results/009.jpg" width=320 height=480 /><img src="data/oriented_results/010.png" width=320 height=320 />
***
## oriented text connector
- the oriented text connector has been implemented. It works, but still needs further improvement.
- left figure is the result for DETECT_MODE H, right figure for DETECT_MODE O
<img src="data/oriented_results/007.jpg" width=320 height=240 /><img src="data/oriented_results/007.jpg" width=320 height=240 />
<img src="data/oriented_results/008.jpg" width=320 height=480 /><img src="data/oriented_results/008.jpg" width=320 height=480 />
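
One way to picture the difference between the two modes: in H mode the proposals that belong to one text line are enclosed in an axis-aligned rectangle, while in O mode a line can be fitted through the proposals so the output follows a slanted text line. The sketch below shows that idea with a least-squares fit through the top and bottom edges. It is a simplified illustration under stated assumptions, not the connector code in this repo: the function names, the `(x1, y1, x2, y2)` proposal layout, and the choice of least squares are all hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:  # all points share one x: treat the line as flat
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def connect_horizontal(proposals):
    """H mode sketch: axis-aligned box (x1, y1, x2, y2) around all proposals."""
    return (min(p[0] for p in proposals), min(p[1] for p in proposals),
            max(p[2] for p in proposals), max(p[3] for p in proposals))


def connect_oriented(proposals):
    """O mode sketch: corners in the order top-left, top-right,
    bottom-right, bottom-left, each as an (x, y) pair."""
    centers = [(p[0] + p[2]) / 2 for p in proposals]
    top = fit_line(centers, [p[1] for p in proposals])
    bottom = fit_line(centers, [p[3] for p in proposals])
    x_left = min(p[0] for p in proposals)
    x_right = max(p[2] for p in proposals)

    def y_at(line, x):
        return line[0] * x + line[1]

    return [(x_left, y_at(top, x_left)), (x_right, y_at(top, x_right)),
            (x_right, y_at(bottom, x_right)), (x_left, y_at(bottom, x_left))]


# Three 16-pixel-wide proposals along a slightly slanted text line.
proposals = [(0, 0, 15, 10), (16, 2, 31, 12), (32, 4, 47, 14)]
box = connect_horizontal(proposals)
quad = connect_oriented(proposals)
```

For slanted text the horizontal box includes extra background above and below the line, which is the effect visible when comparing the left and right figures.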
***
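This resource contains: thesis document, task statement, proposal report, literature review, translation of foreign literature, software user manual, and source code with dataset.

The pipeline has two parts: text detection and text recognition.
- Text detection finds the text regions in an image and separates them from the original image.
- Text recognition recognizes the text in the separated image regions.

Text recognition flow:
1. Preprocessing: denoising (filtering algorithms), image enhancement, and scaling. The goal is to remove the background or noise, highlight the text, and scale the image to a size suitable for processing.
2. Feature extraction: common features include edge, stroke, structural, and texture features.
3. Recognition: classifiers such as random forests, SVM, and neural networks (NN, CNN).

The environment used for this design:
- Software: Ubuntu 16.04, tensorflow1.3.0-gpu, python2.7
- Hardware: CPU Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, GPU TITAN X (Pascal)

For a detailed introduction, see https://biyezuopin.blog.csdn.net/article/details/125342848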
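
The preprocessing step of the recognition flow (denoising by filtering, then scaling) can be sketched in a few lines. The example below is a dependency-free illustration on a grayscale image stored as a list of rows. It is not the code shipped with this project, and the function names `median_filter` and `resize_nearest` are hypothetical. In practice one would more likely use opencv-python, which is already in the requirements and provides `cv2.medianBlur` and `cv2.resize`.

```python
from statistics import median_low


def median_filter(img):
    """3x3 median filter; border pixels use only their in-bounds neighbours."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            # median_low always returns a value that occurs in the window.
            row.append(median_low(window))
        out.append(row)
    return out


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour scaling to new_h rows and new_w columns."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


# A flat patch with one bright noise pixel in the middle.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
clean = median_filter(noisy)
scaled = resize_nearest([[1, 2], [3, 4]], 4, 4)
```

The median filter removes the isolated bright pixel without blurring the rest of the patch, which is why it is a common choice for salt-and-pepper noise before recognition.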