LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports understanding of images, high-resolution images, and videos.

Python 300 13 Updated Jan 13, 2025

google-deepmind / alphadev

Python 701 74 Updated Jun 20, 2023

wanghao9610 / OV-DINO

Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

Python 287 17 Updated Jan 17, 2025

DS4SD / docling

Get your documents ready for gen AI

Python 19,133 1,011 Updated Jan 26, 2025

apple / ml-cross-entropy

Python 309 24 Updated Dec 31, 2024

virattt / ai-hedge-fund

An AI Hedge Fund Team

Python 7,122 1,375 Updated Jan 25, 2025

Nutlope / llamacoder

Open source Claude Artifacts – built with Llama 3.1 405B

TypeScript 5,328 1,130 Updated Jan 22, 2025

zed-industries / zed

Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.

Rust 53,479 3,414 Updated Jan 26, 2025

voideditor / void

TypeScript 9,507 522 Updated Jan 26, 2025

opendatalab / OmniDocBench

A Comprehensive Benchmark for Document Parsing and Evaluation

Python 210 20 Updated Jan 17, 2025

EvolvingLMMs-Lab / LongVA

Long Context Transfer from Language to Vision

Python 357 19 Updated Nov 20, 2024

yh-hust / PDF-Wukong

【ArXiv】PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

109 4 Updated Oct 18, 2024

ppaanngggg / layoutreader

A faster LayoutReader model based on LayoutLMv3 that sorts OCR bboxes into reading order.

Python 160 11 Updated May 23, 2024

ppaanngggg / yolo-doclaynet

YOLO models trained on DocLayNet - power your Document Intelligence with Layout Analysis

Python 83 16 Updated Jan 6, 2025

facebookresearch / nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python 9,166 587 Updated Apr 16, 2024

philschmid / deep-learning-pytorch-huggingface

Jupyter Notebook 793 184 Updated Jan 23, 2025

LLaVA-VL / LLaVA-NeXT

Python 3,317 297 Updated Oct 16, 2024

huggingface / smollm

Everything about the SmolLM2 and SmolVLM family of models

Python 1,619 85 Updated Jan 24, 2025

microsoft / OmniParser

A simple screen parsing tool towards pure vision based GUI agent

Jupyter Notebook 5,585 442 Updated Jan 5, 2025

Tsun-Yi Yang shamangary

Stars

hkust-nlp / simpleRL-reason

WooSunghyeon / paca

deepseek-ai / DeepSeek-R1

apple / ml-depth-pro

OSU-NLP-Group / TravelPlanner

vsubramaniam851 / multiagent-ft

google-deepmind / searchless_chess

MiniMax-AI / MiniMax-01

Dao-AILab / flash-attention

yang-0201 / MHAF-YOLO

Y-Sui / Table-meets-LLM

ictnlp / LLaVA-Mini