PulseAugur / Brief
Brief

last 24h
[50/436] 186 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models

    Researchers have developed an automated system to classify psychiatric diagnoses using Natural Language Processing and Machine Learning techniques, mapping free-text clinical descriptions to the International Classification of Diseases (ICD). The study evaluated various text representation methods on a dataset of over 145,000 Spanish psychiatric descriptions. Results showed that transformer-based models, particularly the e5_large model fine-tuned for the task, significantly outperformed traditional methods, achieving a micro F1 score of 0.866. AI

    IMPACT Demonstrates LLM potential in specialized clinical domains, potentially reducing administrative burden and improving diagnostic consistency.
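
    The micro F1 cited in this item pools true positives, false positives, and false negatives over all labels before computing a single F1 score. A minimal sketch, assuming each description may carry a set of ICD codes; the function name `micro_f1` and the toy codes are illustrative and are not taken from the study:

```python
from typing import Iterable, Set


def micro_f1(gold: Iterable[Set[str]], pred: Iterable[Set[str]]) -> float:
    """Micro-averaged F1: pool counts over all samples and labels first."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # codes predicted and correct
        fp += len(p - g)   # codes predicted but not in the gold set
        fn += len(g - p)   # gold codes the prediction missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): gold and predicted ICD-10 codes for four descriptions.
gold = [{"F32"}, {"F41"}, {"F20"}, {"F32", "F41"}]
pred = [{"F32"}, {"F32"}, {"F20"}, {"F32"}]
score = micro_f1(gold, pred)
```

    In the single-label case, where every description has exactly one gold code and one predicted code, this quantity reduces to plain accuracy.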

  2. Smarter edits? Post-editing with error highlights and translation suggestions

    A new research paper explores the effectiveness of AI-driven error highlighting and correction suggestions for professional translators. The study found that while these tools did not improve productivity or translation quality compared to standard post-editing, the AI-generated error highlights were better received than those derived from quality estimation. Furthermore, the inclusion of correction suggestions enhanced the overall user experience for translators. AI

    IMPACT AI-driven suggestions can improve translator experience, though current productivity gains are limited.

  3. Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving

    Researchers have introduced CoPhy, a novel cognitive-physical reinforcement learning framework designed to enhance autonomous driving capabilities. This framework integrates knowledge from large vision-language models into a Bird's-Eye View encoder to provide cognitive understanding without increased inference cost. It also features an auto-regressive world model that predicts future semantic maps based on potential actions, creating a sandbox for deriving safety metrics. CoPhy utilizes a dual-reward mechanism to optimize driving policies, ensuring both safety compliance and adherence to user-defined language instructions, and has demonstrated state-of-the-art performance on driving benchmarks. AI

    IMPACT Introduces a new framework for autonomous driving that aims to improve safety and intent compliance through advanced RL techniques.

  4. SurgOnAir: Hierarchy-Aware Real-Time Surgical Video Commentary

    Researchers have developed SurgOnAir, a novel streaming vision-language model designed for real-time surgical video commentary. Unlike previous offline methods, SurgOnAir processes video frames sequentially to generate narration tokens as visual input becomes available, enabling immediate responsiveness to surgical dynamics. The model is trained on the SurgOnAir-11k dataset, which includes hierarchical supervision for action, step, and phase levels, allowing it to produce multi-level, hierarchy-aware textual responses and explicitly mark key workflow transitions. AI

    IMPACT Enables real-time AI assistance in surgery by providing immediate, context-aware commentary on surgical procedures.

  5. Google addressed over 200 internal Chrome vulnerabilities from March to May 2026, a surge coinciding with its adoption of AI security tools.

    Google has seen a significant increase in internal Chrome vulnerability reports, with over 200 identified between March and May 2026. This surge appears to coincide with the company's integration of AI-powered security tools into its development process. The adoption of these AI tools may be contributing to the higher detection rate of security flaws within the Chrome browser. AI

    IMPACT Increased AI adoption in security tools may lead to faster vulnerability detection and patching in software development.

  6. LoCar: Localization-Aware Evaluation of In-Vehicle Assistants through Fine-Grained Sociolinguistic Control

    Researchers have developed a new evaluation framework called LoCar to assess in-vehicle AI assistants, specifically focusing on Korean language localization. The study found that current large language models struggle with consistent control of Korean honorifics and show weaker performance in strategic conversational aspects like clarification and proactivity. These findings highlight the need for automotive AI to prioritize precise linguistic tailoring and safety-oriented interaction management over general competence. AI

    IMPACT Introduces a specialized evaluation framework to improve the linguistic precision and safety of in-vehicle AI assistants.

  7. AIMBio-Mat: An AI-Native FAIR Platform for Closed-Loop Materials Discovery and Biomedical Translation

    Researchers have introduced AIMBio-Mat, a conceptual framework designed to integrate materials discovery with biomedical translation. This AI-native platform aims to link material properties, processing, and biological responses with safety and governance considerations. The framework proposes a blueprint for transforming disparate data into actionable discovery workflows, with a minimum viable prototype for AI-guided nanomaterials in drug delivery. AI

    IMPACT Provides a blueprint for integrating AI into materials discovery and biomedical translation, potentially accelerating the development of new therapies and materials.

  8. TextSculptor: Training and Benchmarking Scene Text Editing

    Researchers have introduced TextSculptor, a new framework designed to improve scene text editing in images. This framework includes an automated data construction pipeline that generates a large dataset of 3.2 million samples for text-to-image synthesis and text editing tasks. Additionally, TextSculptor provides a benchmark suite covering four core editing functions: addition, replacement, removal, and hybrid editing, aiming to enhance the performance of open-source models in this domain. AI

    IMPACT Enhances open-source capabilities for precise text manipulation in images, potentially improving applications like content creation and accessibility tools.

  9. GradeLegal: Automated Grading for German Legal Cases

    Researchers have developed a system called GradeLegal to automate the grading of German legal exam solutions using large language models. The study evaluated 27 different LLMs and various prompting strategies, finding that reasoning-oriented models can achieve high agreement with expert graders in public law, reaching a quadratic weighted kappa of 0.91. However, performance in criminal law was lower, indicating a more challenging task. Ensembling multiple models further improved grading accuracy, offering a potential alternative to top-tier proprietary models. AI

    IMPACT Automated grading systems could streamline feedback for legal students and reduce bottlenecks for educators.
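
    Quadratic weighted kappa, the agreement measure cited in this item, penalizes disagreements by the squared distance between the two grades and corrects for chance agreement. A minimal sketch, assuming integer grades from 0 to `num_grades - 1`; the function name and the toy grades are illustrative, and the grading scale used in the study is not stated here:

```python
from typing import Sequence


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int], num_grades: int) -> float:
    """Agreement between two raters on an ordinal scale 0..num_grades-1."""
    if num_grades < 2 or len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty rating lists and >= 2 grades")
    n = len(a)
    # observed[i][j] counts items graded i by rater a and j by rater b.
    observed = [[0] * num_grades for _ in range(num_grades)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_grades)) for j in range(num_grades)]
    num = den = 0.0
    for i in range(num_grades):
        for j in range(num_grades):
            weight = (i - j) ** 2 / (num_grades - 1) ** 2
            num += weight * observed[i][j]               # observed disagreement
            den += weight * hist_a[i] * hist_b[j] / n    # disagreement expected by chance
    # den is zero only when both raters used one identical grade throughout.
    return 1.0 - num / den if den else 1.0


# Toy data (hypothetical): expert grades versus two sets of model grades.
expert = [0, 1, 2, 3]
same = quadratic_weighted_kappa(expert, [0, 1, 2, 3], num_grades=4)
reversed_kappa = quadratic_weighted_kappa(expert, [3, 2, 1, 0], num_grades=4)
```

    A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.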

  10. Blog Update: Google's Object-Oriented Programming Specialized Code Editor "Antigravity" Has Evolved into a Standalone App, No Longer VSCode-Based, So I Decided to Immediately Try Making "Something Like Daytona USA" https://kanoayu.cloudfree.jp/2026/05/21/%ef%bd%b8%ef%b

    According to the linked blog post, Google's AI-powered code editor Antigravity has moved from a VSCode-based platform to a standalone application. The author plans to try the updated editor right away by building a game reminiscent of Daytona USA. AI

    IMPACT Standalone AI code editor enhances developer tools and workflows.

  11. Fine-grained Claim-level RAG Benchmark for Law

    Researchers have developed ClaimRAG-LAW, a new benchmark dataset designed to evaluate retrieval-augmented generation (RAG) systems in the legal domain. This dataset supports both French and English, catering to both legal experts and non-experts with diverse question types. Initial evaluations using ClaimRAG-LAW revealed limitations in the retrieval and generation capabilities of current state-of-the-art legal RAG systems. AI

    IMPACT This new benchmark aims to improve the accuracy and reliability of AI systems in the legal field, potentially leading to more trustworthy legal AI applications.

  12. Other World Computing Announces OWC Stack AI™, the World's First* Thunderbolt™ 5 Compatible AI Accelerator and Storage Hub, Offering a New Choice: "AI at Your Fingertips" https://www.yayafa.com/2805173/

    Other World Computing (OWC) has launched the OWC Stack AI, a combined storage hub and AI accelerator. OWC bills it as the world's first Thunderbolt 5 compatible device of this kind, a claim the announcement qualifies with an asterisk. It aims to bring AI capabilities directly to users' workstations. AI

    IMPACT Provides localized AI acceleration and storage for workstations, potentially improving performance for AI tasks on personal machines.

  13. Should I Buy Cursor Pro Plan?

    Users of Cursor, an AI-powered code editor, are weighing whether its Pro plan holds up under heavy use. They ask whether performance is sustained over time or whether limits and errors appear after extended use. The discussion centers on the plan's value for people who spend many hours a day coding. AI

    IMPACT Users are discussing the practical performance and potential limitations of an AI-powered coding tool, impacting developer workflow.

  14. Towards Physically Consistent 4D Scene Reconstruction for Closed-loop Autonomous Driving Simulation

    Researchers have developed a new method called Orthogonal Projected Gradient (OPG) to improve 4D scene reconstruction for autonomous driving simulations. Existing methods struggle to accurately model both novel-view synthesis and time-varying information simultaneously. OPG addresses this by first ensuring the integrity of spatial representations and then restricting temporal updates to the spatial null space, preventing divergence in parameter estimation. A temporal regularization strategy further refines the scene by enforcing smoothness based on physical appearance evolution, ensuring reconstructed scenes are physically consistent. AI

    IMPACT Improves the fidelity of simulations used to train autonomous driving systems, potentially accelerating development and safety validation.
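
    Restricting an update to a null space, as this item describes, amounts to removing the component of the update that would disturb what is being protected. A minimal sketch of that general idea, assuming the protected spatial subspace is spanned by a single direction; the function `project_out` and the toy vectors are hypothetical and are not the paper's OPG implementation:

```python
from typing import List, Sequence


def project_out(update: Sequence[float], protected: Sequence[float]) -> List[float]:
    """Remove from `update` its component along the `protected` direction."""
    norm_sq = sum(p * p for p in protected)
    if norm_sq == 0.0:
        return list(update)  # nothing to protect, keep the update unchanged
    scale = sum(u * p for u, p in zip(update, protected)) / norm_sq
    return [u - scale * p for u, p in zip(update, protected)]


# Toy vectors (hypothetical): one spatial direction and a raw temporal update.
spatial_dir = [1.0, 0.0, 0.0]
temporal_update = [0.5, -0.2, 0.3]
safe_update = project_out(temporal_update, spatial_dir)
# The projected update is orthogonal to the protected direction.
overlap = sum(s * d for s, d in zip(safe_update, spatial_dir))
```

    With several protected directions, the same step would use an orthonormal basis of the protected subspace and subtract the component along each basis vector.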

  15. Building a Custom Taxonomy of AI Skills and Tasks from the Ground Up with Job Postings

    Researchers have developed a blueprint called TaxonomyBuilder to systematically construct taxonomies of AI skills from job postings. Their study, using two large job posting corpora, found that filtering the input data before clustering and LLM-enhanced labeling yields better domain-specific coverage than using unfiltered data. This approach aims to efficiently map complex domains like AI skills in the workplace. AI

    IMPACT Provides a structured method for understanding and categorizing AI skills, potentially aiding in workforce development and talent acquisition.

  16. Beyond Text-to-SQL: An Agentic LLM System for Governed Enterprise Analytics APIs

    Researchers have developed Analytic Agent, an LLM-based system designed to securely interact with enterprise analytics APIs using natural language. This system addresses the limitations of Text-to-SQL by enabling non-technical users to access complex, governed data through APIs rather than raw databases. Analytic Agent translates user intents into API calls, validates permissions, and generates compliant visualizations, demonstrating reliability on 90 real-world enterprise use cases. AI

    IMPACT Enables non-technical users to securely access governed enterprise data through natural language, potentially improving business intelligence workflows.

  17. LiteViLNet: Lightweight Vision-LiDAR Fusion Network for Efficient Road Segmentation

    Researchers have developed LiteViLNet, a new lightweight neural network designed for efficient road segmentation in autonomous driving systems. This network fuses RGB camera data with LiDAR geometric information, using a dual-stream lightweight encoder and depth-wise separable convolutions. LiteViLNet achieves a competitive MaxF score of 96.36% with only 14.04 million parameters, outperforming many heavier models in inference speed and demonstrating its suitability for resource-constrained edge devices. AI

    IMPACT Enables more efficient and accurate road segmentation for autonomous systems on edge devices.

  18. Hybrid Machine Learning Model for Forest Height Estimation from TanDEM-X and Landsat Data

    Researchers have developed a hybrid machine learning model that integrates optical Landsat data with existing TanDEM-X interferometric measurements to improve forest height estimation. This enhanced model addresses ambiguities in previous methods by incorporating complementary information about forest type and structure. Validation against airborne LiDAR data showed a significant reduction in error, confirming the benefit of using multispectral inputs for more accurate remote sensing of forest parameters. AI

    IMPACT Enhances remote sensing capabilities for environmental monitoring and resource management.

  19. I guess my prompt is too heavy 😳

    A Reddit user reported that the Cursor IDE consumed an unexpectedly large amount of memory, displaying a message indicating it was using gigabytes of RAM. The user expressed surprise at the high memory usage, noting that only three windows were open at the time. AI

    IMPACT Indicates potential performance issues or resource management challenges in AI-powered development tools.

  20. DrawMotion: Generating 3D Human Motions by Freehand Drawing

    Researchers have developed DrawMotion, a diffusion-based framework for generating 3D human motions that incorporates both text and hand-drawn sketches as input conditions. This dual-condition approach allows for more precise control over motion generation, with the hand-drawn element providing spatial guidance. Experiments show that using freehand drawings can reduce the time required for motion generation by nearly half compared to text-only methods. AI

    IMPACT Enables more intuitive and efficient creation of 3D animations by combining text and visual input.

  21. 3D Reconstruction and Knowledge Distillation to Improve Multi-View Image Models to Explore Spike Volume Estimation in Wheat

    Researchers have developed a novel hybrid approach to estimate wheat spike volume using a combination of 3D reconstruction and knowledge distillation techniques. This method aims to overcome the challenges of traditional measurement methods, which are either computationally expensive or sensitive to environmental conditions. By distilling knowledge from a 3D model into a 2D image-based Transformer, the system achieves a significant reduction in mean absolute error and inference time, making it suitable for high-throughput field phenotyping. AI

    IMPACT Enables more efficient and accurate crop yield analysis through advanced AI-driven image processing.

  22. PaintCopilot: Modeling Painting as Autonomous Artistic Continuation

    Researchers have introduced PaintCopilot, a novel AI system designed to assist in artistic painting by modeling the creative process as an autonomous continuation of prior artistic actions. Unlike methods that aim to reconstruct a target image, PaintCopilot generates future brushstrokes based on learned artistic dynamics and the evolving state of the canvas. The system comprises three models that predict artist intent, generate temporally coherent strokes, and synthesize localized sequences, enabling fluid co-creative workflows where artists and AI alternate control. AI

    IMPACT Introduces a new AI paradigm for creative tools, potentially enabling more intuitive human-AI co-creation in visual arts.

  23. With aluminum prices up 20%, recycling startups bet on AI to cash in https://techcrunch.com/2026/05/21/with-aluminum-prices-up-20-recycling-startups-bet-on-ai-t

    Aluminum recycling startups are increasingly leveraging artificial intelligence to improve their operations and capitalize on rising aluminum prices. These companies are integrating AI technologies to enhance sorting accuracy, optimize processing efficiency, and ultimately increase the yield of recycled aluminum. This strategic adoption of AI aims to make recycling more economically viable and environmentally sustainable. AI

    IMPACT AI integration in recycling can improve resource efficiency and sustainability, potentially lowering costs for manufacturers.

  24. Bridging Structure and Language: Graph-Based Visual Reasoning for Autonomous Road Understanding

    Researchers have developed a new framework called the Combined Road Substrate (CRS) to improve visual reasoning for autonomous driving. CRS integrates geometric road structure with open-vocabulary semantics, allowing for more precise road understanding than current vision-language models. Training smaller models with CRS-enriched scenes significantly enhances their compositional reasoning abilities, shifting failure modes from relational understanding to attribute recognition, indicating that structured supervision is key rather than just model scale. AI

    IMPACT Enhances AI's ability to perform complex reasoning for autonomous driving by providing structured supervision.

  25. New York City Mayor Zohran Mamdani is launching a Twitch show

    New York City Mayor Zohran Mamdani is launching a new Twitch show called "Talk with the People," set to premiere on May 21st. The show aims to engage with constituents by answering questions directly from the live chat about local issues. Mamdani plans to stream the series across multiple platforms, including YouTube and Facebook, to maximize reach. AI

    IMPACT This initiative by a city mayor to engage constituents via a Twitch show has minimal direct impact on AI operators or the broader AI industry.

  26. GenAI-Driven Threat Detection with Microsoft Security Copilot

    Microsoft has developed a Dynamic Threat Detection Agent (DTDA) integrated into its Security Copilot, designed to autonomously investigate security incidents and generate new detection logic. This agent utilizes a unified timeline of security data, LLM prompt contracts, and a planner-executor loop to identify hidden threats. In evaluations, DTDA achieved 80.1% precision and generated novel alerts for about 15% of investigated incidents, demonstrating its capability to find missed malicious activity at scale. AI

    IMPACT Autonomous AI agents can now identify missed malicious activity at production scale, improving cybersecurity.

  27. VISTA: Technical Report for the Ego4D Short-Term Object Interaction Anticipation at EgoVis 2026

    Researchers have developed VISTA, a system designed to anticipate human-object interactions in egocentric videos. VISTA combines spatial object detection with temporal context from video clips to predict future interactions, including object location, action categories, and timing. The system achieved first place in the EgoVis 2026 Ego4D Short-Term Object Interaction Anticipation Challenge. AI

    IMPACT This research advances egocentric video understanding and interaction prediction, potentially improving applications in robotics and augmented reality.

  28. Governance by Construction for Generalist Agents

    Researchers have developed a policy system called CUGA designed to provide governance for generalist AI agents operating in enterprise environments. This system acts as a modular, policy-as-code layer that integrates with existing LLM agents without requiring model fine-tuning. CUGA enforces governance through five checkpoints: intent guarding, steering reasoning via playbooks, enforcing tool usage, human-in-the-loop approvals for risky actions, and output formatting. The system aims to ensure predictable, auditable, and compliance-aware behavior in complex workflows, as demonstrated in a healthcare scenario. AI

    IMPACT Introduces a novel policy-as-code framework to enhance safety and compliance for enterprise AI agents without model retraining.

  29. Google ruined Antigravity quotas. Thinking about moving to Cursor Pro, but how are the limits?

    A web developer is seeking alternatives to Google's Antigravity IDE after recent changes to its AI model quotas made it unusable for their workflow. The developer previously relied on a Google AI Pro subscription for unlimited access to Gemini 3 Flash, which boosted productivity by keeping both API and front-end code in context at once. Now, with drastically reduced quotas, they are asking about the usage limits and reliability of Cursor Pro for similar tasks. AI

    IMPACT Developers are evaluating AI tool usability and cost-effectiveness based on changing quota structures.

  30. AMD says its $4K Ryzen AI Halo workstation practically pays for itself

    AMD has launched its Ryzen AI Halo workstation, priced at $4,000, which the company claims can pay for itself through efficiency gains. The workstation is designed for AI-intensive tasks and aims to provide a cost-effective solution for professionals. This release highlights AMD's strategy to integrate AI capabilities directly into their hardware offerings. AI

    IMPACT Offers a dedicated hardware solution for AI tasks, potentially improving efficiency for professionals using AI tools.

  31. GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval

    Researchers evaluated the GraphRAG pipeline for retrieving information from Electronic Health Record (EHR) schemas using open-source large language models deployed on consumer hardware. The study benchmarked models like Llama 3.1, Mistral, Qwen 2.5, and Phi-4-mini on a single GPU, assessing indexing efficiency, knowledge graph construction, latency, and answer quality. Results indicated that models below approximately 7 billion parameters are prone to structured-output errors, and that local retrieval generally outperformed global summarization in both speed and factual accuracy. AI

    IMPACT Demonstrates the feasibility of using smaller, locally deployed LLMs for complex tasks like EHR schema retrieval, potentially improving privacy and reducing costs in healthcare.

  32. Spatial Gram Alignment for Ultra-High-Resolution Image Synthesis

    Researchers have introduced Spatial Gram Alignment (SGA), a new framework designed to improve ultra-high-resolution image synthesis using large-scale pre-trained Latent Diffusion Models (LDMs). Traditional methods struggle with extreme resolutions due to a conflict between learnability and fidelity, where direct feature distillation can degrade generation quality. SGA addresses this by aligning self-similarities of generative features with foundation model priors, preserving microscopic pixel-level fidelity while ensuring macroscopic structural coherence. AI

    IMPACT Enables more detailed and structurally coherent ultra-high-resolution image generation, potentially improving applications in digital art and media.

  33. Decomposing Subject-Driven Image Generation via Intermediate Structural Prediction

    Researchers have developed a new two-stage framework for subject-driven text-to-image generation that first predicts a structural map (like a Canny edge map) and then renders the final image using both appearance and structure. This approach aims to better preserve high-frequency details such as logos, patterns, and text, which are often degraded in existing methods. To enhance text handling, they also created a large dataset of 100,000 image pairs with textual consistency, and evaluations using GPT-4.1 showed significant improvements over baseline methods. AI

    IMPACT This research offers a novel approach to improving the fidelity of text-to-image generation, particularly for preserving fine details and text.

  34. Google Confirms 2 Critical New Flaws—How To Jump The Update Queue

    Google has confirmed two critical security vulnerabilities in its Chrome browser, identified as CVE-2026-9111 and CVE-2026-9110. These flaws affect WebRTC and the Chrome user interface, respectively. While Google is rolling out an automatic update over the coming days and weeks, users can manually initiate the update by navigating to Help > About Google Chrome within the browser. AI

    IMPACT Minimal direct impact on AI operations; focuses on web browser security.

  35. torchtune: PyTorch native post-training library

    A new PyTorch-native library called torchtune has been introduced to simplify the post-training phase for large language models. This library focuses on modularity and direct access to PyTorch components, aiming to facilitate efficient fine-tuning, experimentation, and deployment. Torchtune is designed to be highly flexible for research iteration and has demonstrated competitive performance and memory efficiency compared to existing frameworks like Axolotl and Unsloth. AI

    IMPACT Provides a flexible, PyTorch-native framework for LLM fine-tuning, potentially accelerating research and reproducible LLM development.

  36. roto 2.0: The Robot Tactile Olympiad

    Researchers have introduced roto 2.0, a new benchmark for tactile-based reinforcement learning in robotics. This benchmark utilizes GPU parallelism and focuses on end-to-end "blind" manipulation tasks across four different robotic morphologies. The team demonstrated a significant performance improvement, with their agents achieving 13 Baoding ball rotations in 10 seconds, which is substantially faster than existing methods. By open-sourcing the environments and baseline models, they aim to lower the entry barrier for researchers in this field. AI

    IMPACT Introduces a standardized benchmark to accelerate research and development in tactile-based robotic manipulation.

  37. Ordering Matters: Rank-Aware Selective Fusion for Blended Emotion Recognition

    Researchers have developed a novel framework for recognizing blended emotions by selectively fusing information from multiple pre-extracted video and audio encoders. This rank-aware approach uses an attention-based gating module to identify and combine the most informative encoders, improving accuracy in distinguishing subtle and overlapping multimodal cues. The system also incorporates unsupervised domain adaptation to enhance robustness and was recognized with a second-place ranking in the BlEmoRE challenge. AI

    IMPACT Introduces a novel method for improving the accuracy and robustness of AI systems designed for nuanced emotion recognition.

  38. AttriStory: Fine-grained Attribute Realization for Visual Storytelling with Diffusion Models

    Researchers have introduced AttriStory, a new benchmark and method for improving fine-grained attribute realization in visual storytelling generated by diffusion models. The system addresses the challenge of ensuring specific attributes like clothing color and textures are accurately depicted across narrative scenes. AttriStory utilizes a plug-and-play latent optimization module and a novel AttriLoss objective to guide the diffusion model during the early stages of image generation, enhancing attribute control without altering existing story generation pipelines. AI

    IMPACT Enhances control over specific visual details in AI-generated narratives, moving towards more precise attribute-driven storytelling.

  39. iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance

    Researchers have introduced iTryOn, a new framework designed to enhance interactive virtual try-on experiences in videos. This system addresses the limitations of current methods by enabling subjects to actively interact with their clothing, a feature previously overlooked. iTryOn utilizes a video diffusion Transformer with a multi-level interaction injection mechanism, incorporating a 3D hand prior for spatial guidance and global/action captions for semantic understanding. AI

    IMPACT Enables more dynamic and controllable virtual try-on experiences by allowing active garment interaction.

  40. AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing

    Researchers have developed AIGaitor, a novel system for motion analysis that operates entirely on a smartphone, eliminating the need for cloud processing. This approach addresses key barriers in clinical motion capture, such as cost, complexity, and privacy concerns, as identified by rehabilitation clinicians. AIGaitor utilizes on-device neural accelerators to perform markerless monocular motion capture and deep-learning analysis, achieving processing times comparable to cloud-based systems. AI

    IMPACT Enables accessible, private, and low-cost motion analysis for clinical and personal use via consumer smartphones.

  41. HiRes: Inspectable Precedent Memory for Reaction Condition Recommendation

    Researchers have developed HiRes, a new system for recommending chemical reaction conditions that integrates learned representations with a k-NN retrieval layer. This approach provides both accurate predictions and the specific chemical precedents that justify them. HiRes achieves state-of-the-art performance on the USPTO-Condition dataset for catalyst, solvent, and reagent selection, outperforming previous models and demonstrating statistically significant gains over purely parametric methods. AI

    IMPACT Enhances AI's utility in chemical synthesis planning by providing interpretable and accurate reaction condition recommendations.

  42. Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work

    Researchers have developed QuestBench, a new benchmark designed to teach students how to evaluate AI systems by having them construct verification tasks. This approach exposes students to the complexities of AI-era knowledge work, encouraging them to define what constitutes a trustworthy AI-generated answer. Evaluations on QuestBench, which covers 14 humanities and social science domains, revealed significant failure rates for current AI systems, with even the top performer, GPT-5.5, achieving only a 57.58% pass rate on student-designed questions.

    IMPACT Highlights the limitations of current AI in nuanced knowledge domains, suggesting a need for improved evaluation methods beyond simple task completion.

  43. VBFDD-Agent for Electric Vehicle Battery Fault Detection and Diagnosis: Descriptive Text Modeling of Battery Digital Signals

    Researchers have developed VBFDD-Agent, a novel system designed for detecting and diagnosing faults in electric vehicle batteries. This agent utilizes a descriptive text modeling approach, transforming raw battery data into natural language descriptions to create a specialized corpus. By integrating this corpus with maintenance manuals and large language model reasoning, VBFDD-Agent provides structured diagnostic results and actionable maintenance recommendations, enhancing human-AI collaboration in battery health management.

    IMPACT Introduces a new method for AI-driven diagnostics in electric vehicles, potentially improving safety and maintenance efficiency.

  44. SpineContextResUNet: A Computationally Efficient Residual UNet for Spine CT Segmentation

    Researchers have developed SpineContextResUNet, a new 3D Residual U-Net architecture designed for efficient segmentation of spinal CT scans. This model addresses the high computational demands of existing methods by using a lightweight Context Block with parallel multi-dilated convolutions, avoiding the need for resource-intensive Transformers or RNNs. SpineContextResUNet achieves high accuracy on public benchmarks and demonstrates viable inference performance on commodity hardware, making it suitable for point-of-care diagnostics and edge devices.

    IMPACT Enables more accessible AI-driven medical diagnostics on low-resource hardware.

  45. PACD-Net: Pseudo-Augmented Contrastive Distillation for Glycemic Control Estimation from SMBG

    Researchers have developed PACD-Net, a novel self-supervised framework designed to estimate glycemic control metrics from sparse self-monitoring of blood glucose (SMBG) data. This approach uses pseudo-SMBG samples as teacher signals and contrastive learning to ensure consistent representations across different sampling patterns. The model, which employs a hybrid Swin Transformer-CNN backbone, demonstrates superior accuracy and stability compared to existing methods for estimating Time Above Range, Time in Range, and Time Below Range from real-world SMBG data, particularly under extremely sparse conditions.

    IMPACT Offers a practical tool for interpreting clinical SMBG data and a generalizable method for learning from sparse sensor data.

  46. Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training

    Researchers have developed AutoScale, a novel closed-loop system designed to optimize the mixture of real and synthetic data for training autonomous driving models. This system dynamically adjusts the data mixture based on performance feedback, addressing the challenges of scene bias and inefficient data utilization in current co-training methods. AutoScale employs a Graph Regularized AutoEncoder for scene representation and Cluster-aware Gradient Ascent for reweighting, demonstrating improved performance with fewer synthetic samples under budget constraints.

    IMPACT Could make training of autonomous driving systems more efficient and effective by optimizing data usage.

  47. Draw2Think: Harnessing Geometry Reasoning through Constraint Engine Interaction

    Researchers have developed Draw2Think, a new framework that enhances geometric reasoning in vision-language models by interacting with the GeoGebra constraint engine. This system uses a Propose-Draw-Verify loop to externalize hypotheses onto an executable canvas, ensuring geometric accuracy and allowing for auditable checks on both model construction and engine measurements. Draw2Think significantly improves the accuracy of geometric problem-solving and rendering scores on various benchmarks.

    IMPACT Improves geometric reasoning capabilities in vision-language models, potentially leading to more accurate AI systems for tasks involving spatial understanding.

  48. A Non-Reference Diffusion-Based Restoration Framework for Landsat 7 ETM+ SLC-off Imagery in Antarctica

    Researchers have developed DiffGF, a novel framework designed to restore corrupted Landsat 7 satellite imagery from Antarctica. This method utilizes a diffusion-based approach in latent and pixel spaces, eliminating the need for external reference data, which is often unavailable or unreliable for the rapidly changing Antarctic landscape. A new dataset, SLCANT, was created to train and evaluate DiffGF, demonstrating its effectiveness in high-fidelity image restoration and its utility in downstream applications like crevasse segmentation.

    IMPACT Enables better utilization of historical satellite data for environmental monitoring and research in challenging regions.

  49. Sketch2MinSurf: Vision-Language Guided Generation of Editable Minimal Surfaces from Hand-Drawn Sketches

    Researchers have developed Sketch2MinSurf, a novel framework for generating editable 3D minimal surfaces from hand-drawn sketches. This approach combines vision-language guidance with geometric optimization, addressing the challenges of non-Euclidean surface representation and topological consistency. The system utilizes a spatial-topological encoding and a specialized loss function to ensure both accurate reconstruction and coherent topology, producing artifact-free, editable manifolds suitable for design workflows.

    IMPACT Enables more intuitive and direct creation of complex 3D models for design and art applications.

  50. MTR-Suite: A Framework for Evaluating and Synthesizing Conversational Retrieval Benchmarks

    Researchers have developed MTR-Suite, a new framework designed to improve the evaluation and creation of conversational retrieval benchmarks. This suite includes MTR-Eval, an LLM-based tool for identifying alignment gaps in existing benchmarks, and MTR-Pipeline, a multi-agent system that generates high-fidelity dialogues at a significantly reduced cost. The framework also introduces MTR-Bench, a comprehensive benchmark that simulates real-world conversational challenges like topic switching and verbosity, offering enhanced discriminative power for retrieval-augmented generation systems.

    IMPACT Gives retrieval-augmented generation systems more discriminating benchmarks, potentially leading to more accurate and robust AI assistants.