New DICModel enhances ICT image captioning with multi-modal LLMs

By PulseAugur Editorial · 1 source · 2026-05-08 04:00

Researchers have developed a Domain-specific Image Captioning Model (DICModel) for the information and communications technology (ICT) industry, trained with a multi-stage progressive strategy. The strategy combines synthesized image-text pairs with expert annotations to strengthen the model's understanding of domain-specific visual information. Despite having only 7 billion parameters, DICModel is reported to outperform larger state-of-the-art models, with substantial gains in BLEU scores and in accuracy on domain-specific questions.
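The summary reports BLEU gains but does not say which BLEU variant the authors used (n-gram order, sentence- vs corpus-level, smoothing, tokenization). For readers unfamiliar with the metric, the following is a minimal sketch of standard unsmoothed sentence-level BLEU-4 with uniform weights: clipped n-gram precisions combined by a geometric mean and scaled by a brevity penalty. It is an illustration under those assumptions, not the paper's evaluation code; published results are usually computed with established tools such as sacreBLEU. The function names `ngram_counts` and `sentence_bleu` here are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Hypothetical sketch: unsmoothed sentence-level BLEU with uniform weights.

    candidate: list of tokens; references: list of token lists.
    Returns 0.0 when any n-gram order has no clipped matches (no smoothing).
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty against the reference length closest to the candidate length
    # (ties go to the shorter reference).
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


if __name__ == "__main__":
    # Illustrative captions only, not data from the paper.
    cand = "the switch forwards packets to the core router".split()
    ref = "the switch forwards packets to the edge router".split()
    print(round(sentence_bleu(cand, [ref]), 4))  # 0.7071
```

In the example, the clipped precisions are 7/8, 5/7, 4/6 and 3/5, and the lengths match so the brevity penalty is 1; the score is therefore (0.25)^(1/4) ≈ 0.7071. A single differing word lowers every n-gram order that spans it, which is why BLEU is sensitive to domain terminology in captions.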

IMPACT This research could improve the extraction of visual information in specialized domains, potentially advancing multimodal AI capabilities.

RANK_REASON This is a research paper detailing a new model and training methodology.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Lianying Chao, Kai Zhang, Haoran Cai, Sijie Wu, Xubin Li, Xin Chen · 2026-05-08 04:00

Multi-Modal LLM based Image Captioning in ICT: Bridging the Gap Between General and Industry Domain

arXiv:2601.09298v2 · Announce Type: replace

Abstract: In the information and communications technology (ICT) industry, training a domain-specific large language model (LLM) or constructing a retrieval-augmented generation system requires a substantial amount of high-value domain kn…