Facebook
- London
- https://shamangary.github.io/
Stars
A replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
[ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"
Grandmaster-Level Chess Without Search
Fast and memory-efficient exact attention
(MAF-YOLOv2) with high parameter utilization and high precision
LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.
Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
Open source Claude Artifacts – built with Llama 3.1 405B
Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
A Comprehensive Benchmark for Document Parsing and Evaluation
Long Context Transfer from Language to Vision
[ArXiv] PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
A faster LayoutReader model based on LayoutLMv3 that sorts OCR bounding boxes into reading order.
YOLO models trained on DocLayNet: power your document intelligence with layout analysis
Implementation of Nougat: Neural Optical Understanding for Academic Documents
Everything about the SmolLM2 and SmolVLM family of models
A simple screen-parsing tool toward a pure vision-based GUI agent