BERT
PulseAugur coverage of BERT: every cluster mentioning BERT across labs, papers, and developer communities, ranked by signal.
- used by Roberta 90%
- instance of natural language processing 90%
- instance of DagsHub 90%
- instance of Roberta 70%
- instance of T5 Text To Text Transfer Transformer 70%
- used by ModernBERT 70%
- competes with T5 Text To Text Transfer Transformer 70%
- used by natural language processing 60%
- authored by Eugene Yan 50%
19 day(s) with sentiment data
-
Post-training LLMs offer complex, in-demand alternative to benchmarking
A Reddit user proposes post-training large language models as a more intellectually engaging alternative to simply benchmarking downloaded models. The user, who has four years of experience in supervised fine-tuning (SF…
-
BERT Overfitting Addressed by SBERT Solution
This article discusses the issue of overfitting when fine-tuning pre-trained language models like BERT on smaller, domain-specific datasets. It proposes SBERT as a solution, leveraging a geometrical perspective to addre…
-
Guide to selecting embedding models for AI projects
This article discusses how to choose embedding models for AI projects. It explains that embeddings represent abstract data as numerical vectors, where similar values indicate semantic and mathematical closeness, making …
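The idea of "closeness" between embedding vectors can be illustrated with cosine similarity, one common way to compare them. The sketch below is a minimal illustration, not the method of the linked article: the three-dimensional vectors are hand-made toy values (real embedding models produce hundreds of dimensions), and the names `cat`, `kitten`, and `invoice` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy, hand-made "embeddings" (assumption: not produced by any real model).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Vectors pointing in similar directions score close to 1.
similar_score = cosine_similarity(cat, kitten)
# Vectors pointing in different directions score close to 0.
dissimilar_score = cosine_similarity(cat, invoice)
```

With these toy values, the related pair scores far higher than the unrelated pair, which is the property an embedding model is selected for.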
-
New metric quantifies AI model vulnerability to hardware faults
Researchers have developed a new metric called Parameter Vulnerability Factor (PVF) to quantify the susceptibility of AI models to hardware faults, specifically silent data corruptions (SDCs). This metric aims to standa…
-
LLM GatorTron-3.9B outperforms BERT in predicting heart failure risk for cancer patients
Researchers from the University of Florida Health have conducted a study using large language models (LLMs) to predict heart failure risk in cancer patients. The study utilized electronic health records from over 12,000…
-
BERT models outperform Llama 4 Maverick in climate news framing analysis
A new research paper compares two methods for detecting threat and solution framing in German climate news: fine-tuned BERT models and few-shot prompting with Llama 4 Maverick. The study found that fine-tuned BERT class…
-
MARBERT model enhances Arabic tweet analysis for STC customer service
Researchers have developed a new method for sentiment and spam detection in Arabic tweets using the MARBERT model. This approach aims to improve customer service for Saudi Telecom Company (STC) by analyzing feedback on …
-
CoLA framework enhances multimodal AI adaptation with dual-path LoRA
Researchers have introduced CoLA (Cross-Modal Low-rank Adaptation), a novel framework designed to efficiently adapt foundation models for multimodal tasks. Unlike existing methods that adapt each modality in isolation, …
-
AI fine-tuning: Dataset quality overshadows technical parameters
This article emphasizes the critical importance of high-quality datasets for fine-tuning AI models, arguing that dataset construction is often overlooked in favor of technical parameters like learning rate and quantizat…
-
Hugging Face uses local models for free OpenClaw repo triage
Hugging Face demonstrated how to use local, open-weight models for triaging issues and pull requests in the OpenClaw repository. This approach leverages models like Gemma and Qwen within an agent harness, offering a cos…
-
ChatGPT's advanced capabilities stem from internal state, not just autocomplete
Large language models like ChatGPT are more than simple autocomplete tools, despite predicting text one token at a time. The process involves a complex internal state that interprets the input context, topic, and tone, …
-
Many-shot ICL matches BERT performance in NER tasks
A new research paper explores the effectiveness of many-shot in-context learning (ICL) for Named Entity Recognition (NER) using large language models (LLMs). The study found that by scaling ICL to hundreds of examples, …
-
New method pre-trains Tsetlin Machines with language model clusters for interpretability
Researchers have developed a novel framework to enhance the interpretability of Tsetlin Machines (TMs) by integrating knowledge from pre-trained language models like BERT. This method groups text samples into semantic c…
-
New framework improves implicit hate speech detection generalization
Researchers have developed ImpSH, a new framework designed to improve the generalizability of implicit hate speech detection models. This triplet-based approach aligns posts with their implied statements and uses contex…
-
New technique improves SPLADE retrieval models with larger encoders
Researchers have identified a performance degradation issue when using larger, more powerful pretrained encoders with SPLADE, a neural sparse retrieval model. This problem, termed a "scale mismatch" in the MLM head, can…
-
New EHR models leverage ICD code hierarchy for improved predictions
Researchers have developed new methods for electronic health record (EHR) foundation models to better utilize the hierarchical structure of ICD diagnosis codes. Current models treat these codes as flat tokens, ignoring …
-
New framework enables zero-shot captioning of Indonesian traditional clothing
Researchers have developed Custom ZeroCLIP, a novel retrieval-augmented vision-language framework designed for the zero-shot captioning of traditional Indonesian clothing. This system utilizes a combination of CLIP and …
-
Hugging Face Transformers library simplifies AI model integration
The Hugging Face Transformers library has become a cornerstone for AI development, simplifying the process of loading and utilizing pre-trained models. Initially a chatbot startup, Hugging Face pivoted to open-source to…
-
New RePAIR architecture learns chess concepts via self-supervised learning
Researchers have developed a new self-supervised learning architecture called RePAIR, which combines elements of MAE, JEPA, and BERT. This architecture is designed to encode sequential data, such as chess positions, int…
-
DeepSeek-R1-8B fine-tuned for financial NER with LoRA and NEFTune
Researchers have fine-tuned the DeepSeek-R1-8B language model for financial named-entity recognition (NER) tasks. By employing Low-Rank Adaptation (LoRA) and Noisy Embedding Fine-Tuning (NEFTune), the adapted model achi…
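For readers unfamiliar with LoRA, the core idea is to keep the pretrained weight matrix W frozen and learn a low-rank update B·A that is added to it, scaled by alpha / r. The sketch below shows only that forward computation on tiny hand-written matrices. It is a generic illustration under stated assumptions, not the paper's training setup: the function names `matvec` and `lora_forward` and all numeric values are hypothetical, and NEFTune is not shown.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x).

    W: frozen pretrained weights, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    base = matvec(W, x)               # frozen path
    update = matvec(B, matvec(A, x))  # low-rank path through rank r
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy values (assumption: chosen by hand for illustration only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]          # rank r = 1
B_zero = [[0.0], [0.0]]   # LoRA typically starts with B = 0
B_trained = [[1.0], [0.0]]
x = [2.0, 3.0]

# With B = 0 the adapted layer matches the frozen layer.
initial_output = lora_forward(W, A, B_zero, x, alpha=2.0, r=1)
# After B is updated, only the low-rank path changes the output.
adapted_output = lora_forward(W, A, B_trained, x, alpha=2.0, r=1)
```

Because only A and B are trained, the number of trainable parameters scales with r rather than with the full size of W.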