Forked from [https://github.com/eragonruan/text-detection-ctpn.git](https://github.com/eragonruan/text-detection-ctpn.git)
This fork is adapted for CPU. If you want to run on a GPU, clone the original repository: `git clone https://github.com/eragonruan/text-detection-ctpn.git`
# text-detection-ctpn
Text detection mainly based on CTPN (Connectionist Text Proposal Network), implemented in TensorFlow. I use ID card detection as an example to demonstrate the results, but note that this model can be used in almost any horizontal scene text detection task. The original paper can be found [here](https://arxiv.org/abs/1609.03605). The original Caffe repo can be found [here](https://github.com/tianzhi0549/CTPN). This repo is mainly based on the Faster R-CNN framework, so a lot of unused code remains; I'm still working on it. For more detail about the paper and code, see this [blog](http://slade-ruan.me/2017/10/22/text-detection-ctpn/).
***
# setup
- requirements: tensorflow 1.3, cython 0.24, opencv-python, easydict (installing Anaconda is recommended)
- build the library
```shell
cd lib/utils
chmod +x make.sh
./make.sh
```
***
# parameters
Some parameters may need to be modified to suit your requirements; you can find them in ctpn/text.yml
- USE_GPU_NMS # whether to use the NMS implemented in CUDA; if you do not have a GPU device, follow the [setup](https://github.com/eragonruan/text-detection-ctpn/issues/43) instructions
- DETECT_MODE # H represents horizontal mode, O represents oriented mode; the default is H
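
For readers unfamiliar with NMS (non-maximum suppression), the step that USE_GPU_NMS switches between implementations of, here is a minimal pure-Python sketch of the standard greedy algorithm. It is an illustration only, not the Cython or CUDA code in this repo; the function names `iou` and `nms` and the 0.5 threshold are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box overlaps the first and is suppressed
```

The Cython and CUDA variants compute the same kind of result; they differ in speed, which is why the GPU version matters mostly for training time.
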
***
# demo
Put your images in data/demo, then run the demo from the repository root; the results will be saved in data/results.
```shell
python ./ctpn/demo.py
```
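
Before running the demo, it can help to confirm that data/demo actually contains images. The following is a small optional pre-flight sketch; `list_demo_images` is a hypothetical helper, and the set of accepted extensions is an assumption rather than something taken from demo.py.

```python
from pathlib import Path

# Assumption: these are the image types of interest; demo.py may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_demo_images(folder="data/demo"):
    """Return the sorted file names of images found directly in `folder`."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    images = list_demo_images()
    print(f"{len(images)} image(s) found in data/demo")
```
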
***
# training
## prepare data
- First, download the pre-trained VGG net model and put it in data/pretrain/VGG_imagenet.npy. You can download it from [google drive](https://drive.google.com/open?id=0B_WmJoEtfQhDRl82b1dJTjB2ZGc) or [baidu yun](https://pan.baidu.com/s/1kUNTl1l).
- Second, prepare the training data as described in the paper, or download the data I prepared from the links above. Alternatively, prepare your own data with the following steps.
- Modify path and gt_path in prepare_training_data/split_label.py to match your dataset, then run
```shell
cd prepare_training_data
python split_label.py
```
- This generates the prepared data in the current folder. Then run
```shell
python ToVoc.py
```
- to convert the prepared training data into VOC format. This generates a folder named TEXTVOC. Move this folder to data/ and then run
```shell
cd ../data
ln -s TEXTVOC VOCdevkit2007
```
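
As background for the data preparation step: the CTPN paper trains on fixed-width (16-pixel) text proposals rather than whole text lines, so each ground-truth line box is cut into narrow vertical strips. The sketch below illustrates that idea only. It is not the code in split_label.py; `split_into_strips` is a hypothetical helper, and whether the real script clips the edge strips to the box is not covered here.

```python
PROPOSAL_WIDTH = 16  # fixed proposal width in pixels, as described in the CTPN paper


def split_into_strips(xmin, ymin, xmax, ymax, width=PROPOSAL_WIDTH):
    """Cut one text-line box (inclusive pixel coordinates) into grid-aligned strips.

    Each strip is `width` pixels wide and starts on a multiple of `width`,
    so the first and last strips may extend slightly beyond the original box.
    """
    start = (xmin // width) * width
    return [(x, ymin, x + width - 1, ymax) for x in range(start, xmax + 1, width)]


for strip in split_into_strips(20, 5, 50, 25):
    print(strip)
```
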
## train
Simply run
```shell
python ./ctpn/train_net.py
```
- You can modify some hyperparameters in ctpn/text.yml, or just use the parameters I set.
- The model I provided in checkpoints was trained on a GTX 1070 for 50k iterations.
- If you are using CUDA NMS, it takes about 0.2 s per iteration, so 50k iterations take roughly 2.5 to 3 hours.
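
As a rough check of the training-time figure, the arithmetic can be sketched as below. The 0.2 s per iteration is itself approximate, so the result is only an estimate; `estimate_training_hours` is a hypothetical helper name.

```python
def estimate_training_hours(iterations, seconds_per_iteration):
    """Total wall-clock training time in hours, ignoring startup and checkpointing."""
    return iterations * seconds_per_iteration / 3600.0


# 50,000 iterations at about 0.2 s each is 10,000 s in total.
print(round(estimate_training_hours(50_000, 0.2), 2))  # 2.78
```

This lands in the same range as the estimate above; slower NMS implementations or hardware scale the result linearly with the per-iteration time.
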
***
# roadmap
- [x] cython nms
- [x] cuda nms
- [x] python2/python3 compatibility
- [x] tensorflow1.3
- [x] delete useless code
- [x] loss function as referred in paper
- [x] oriented text connector
- [ ] side refinement
- [ ] model optimization
***
# some results
`NOTICE:` all the photos used below were collected from the internet. If this affects you, please contact me to have them deleted.
<img src="/data/results/002.jpg" width=320 height=240 /><img src="/data/results/003.jpg" width=320 height=240 />
<img src="/data/results/009.jpg" width=320 height=480 /><img src="/data/results/010.png" width=320 height=320 />
<img src="/data/results/IMG_0708.png" width=320 height=480 /><img src="/data/results/CgREFFmZV8uAde7ZAABbuGILFDY720.jpg" width=320 height=480 />
<img src="/data/results/car.jpg" width=320 height=480 /><img src="/data/results/CgREFFmX_VWAXK-sAAGOUUyUl5Q448.jpg" width=320 height=480 />
***
# comparison of horizontal and oriented text connector
- The oriented text connector has been implemented. It works, but still needs further improvement.
- The left figure is the result for DETECT_MODE H, the right figure for DETECT_MODE O
<img src="/data/results/007.jpg" width=320 height=240 /><img src="/data/oriented_results/007.jpg" width=320 height=240 />
<img src="/data/results/008.jpg" width=320 height=480 /><img src="/data/oriented_results/008.jpg" width=320 height=480 />
***