2024/11

Inference Optimal VLMs Need Only One Visual Token but Larger Models

https://arxiv.org/pdf/2411.03312
https://github.com/locuslab/llava-token-compression
Vision Language Models (VLMs) have demonstrated strong capabilities across various visual understanding and reasoning tasks. However, their real-world deployment is often constrained by high latency during inference due to substantial compute required to ..
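A minimal sketch of the core idea behind the excerpt, compressing a long sequence of visual tokens down to very few before they reach the LLM, assuming simple adaptive average pooling; the pooling operator, tensor shapes, and the `compress_visual_tokens` helper are illustrative assumptions, not the paper's exact compression method.

```python
# Minimal sketch of visual token compression via average pooling.
# The pooling approach and tensor shapes are illustrative assumptions,
# not the exact method from the paper.
import torch
import torch.nn.functional as F

def compress_visual_tokens(visual_tokens: torch.Tensor, keep: int = 1) -> torch.Tensor:
    """Reduce a sequence of visual tokens to `keep` tokens by adaptive average pooling.

    visual_tokens: (batch, num_tokens, hidden_dim), e.g. 576 patch tokens.
    Returns: (batch, keep, hidden_dim).
    """
    # Pool over the token dimension: (B, N, D) -> (B, D, N) -> (B, D, keep) -> (B, keep, D)
    pooled = F.adaptive_avg_pool1d(visual_tokens.transpose(1, 2), keep)
    return pooled.transpose(1, 2)

# Example: 576 visual tokens compressed to a single token before the LLM.
tokens = torch.randn(2, 576, 4096)
print(compress_visual_tokens(tokens, keep=1).shape)  # torch.Size([2, 1, 4096])
```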

Uncategorized 2024.11.11

DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution

https://arxiv.org/pdf/2411.02359
https://github.com/yueyang130/DeeR-VLA
Multimodal Large Language Models (MLLMs) have demonstrated remarkable comprehension and reasoning capabilities with complex language and visual data. These advances have spurred the vision of establishing a generalist robotic MLLM proficient in understanding complex huma..
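A minimal sketch of the dynamic-inference idea the title refers to, assuming a simple early-exit criterion: run transformer layers one at a time and stop once an intermediate prediction is confident enough. The per-layer heads, the confidence threshold, and the `EarlyExitStack` module are hypothetical stand-ins, not DeeR-VLA's actual exit criterion.

```python
# Minimal sketch of early-exit ("dynamic") inference: stop running transformer
# layers once an intermediate prediction is confident enough. The threshold and
# the per-layer heads are illustrative assumptions.
import torch
import torch.nn as nn

class EarlyExitStack(nn.Module):
    def __init__(self, num_layers: int = 12, dim: int = 256, num_actions: int = 7):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True) for _ in range(num_layers)]
        )
        # One lightweight prediction head per layer so we can exit at any depth.
        self.heads = nn.ModuleList([nn.Linear(dim, num_actions) for _ in range(num_layers)])

    def forward(self, x: torch.Tensor, threshold: float = 0.9):
        for depth, (layer, head) in enumerate(zip(self.layers, self.heads)):
            x = layer(x)
            probs = head(x.mean(dim=1)).softmax(-1)  # pooled intermediate prediction
            if probs.max().item() >= threshold:      # confident enough: exit early
                return probs, depth + 1
        return probs, len(self.layers)

model = EarlyExitStack()
probs, layers_used = model(torch.randn(1, 16, 256))
print(f"exited after {layers_used} layers")
```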

Uncategorized 2024.11.11

Nearest Neighbor Normalization Improves Multimodal Retrieval (Paper Review)

https://arxiv.org/pdf/2410.24114
https://github.com/multimodal-interpretability/nnn
Multimodal models leverage large-scale pretraining to achieve strong but still imper..
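A minimal sketch of what nearest-neighbor normalization of retrieval scores can look like, assuming each gallery item's bias is estimated from its k most similar queries in a held-out reference query set; the exact weighting in the paper may differ, and `nnn_scores`, `k`, and the reference set are illustrative choices.

```python
# Minimal sketch of nearest-neighbor normalization for retrieval scores:
# subtract from each gallery item's score a bias estimated from its k most
# similar queries in a reference query set. Details are assumptions.
import numpy as np

def nnn_scores(query_emb, gallery_emb, reference_query_emb, k=16):
    """query_emb: (D,), gallery_emb: (M, D), reference_query_emb: (R, D), all L2-normalized."""
    raw = gallery_emb @ query_emb                      # (M,) cosine scores for this query
    ref_sims = gallery_emb @ reference_query_emb.T     # (M, R) similarity to reference queries
    # Per-item bias: mean similarity to its k nearest reference queries.
    topk = np.sort(ref_sims, axis=1)[:, -k:]
    bias = topk.mean(axis=1)                           # (M,)
    return raw - bias                                  # debiased retrieval scores

# Example with random, normalized embeddings.
rng = np.random.default_rng(0)
g = rng.normal(size=(100, 64)); g /= np.linalg.norm(g, axis=1, keepdims=True)
q = rng.normal(size=64); q /= np.linalg.norm(q)
ref = rng.normal(size=(50, 64)); ref /= np.linalg.norm(ref, axis=1, keepdims=True)
print(np.argsort(-nnn_scores(q, g, ref))[:5])  # top-5 gallery indices after normalization
```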

Uncategorized 2024.11.05

HelpSteer2: Open-source dataset for training top-performing reward models

https://arxiv.org/pdf/2406.08673
High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human preferences. As LLMs become stronger and better aligned, permissively licensed preference datasets, such as Open Assistant, HH-RLHF, and HelpSteer need to be updated to remain effe..
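A short sketch of inspecting the dataset with the Hugging Face `datasets` library; the dataset id `nvidia/HelpSteer2` and the attribute columns listed below reflect my understanding of the public release and should be verified against the dataset card.

```python
# Minimal sketch: load HelpSteer2 and look at one example and its attribute
# ratings. Dataset id and column names are assumptions to verify.
from datasets import load_dataset

ds = load_dataset("nvidia/HelpSteer2", split="train")
print(ds.column_names)  # expected: prompt, response, helpfulness, correctness, coherence, complexity, verbosity
example = ds[0]
print(example["prompt"][:100])
print({k: example[k] for k in ("helpfulness", "correctness", "coherence", "complexity", "verbosity")})
```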

Uncategorized 2024.11.04

Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking (Paper Review)

The release of ChatGPT in November 2022 sparked an explosion of interest in post-training and an avalanche of new preference optimization (PO) methods. These methods claim superior alignment by virtue of better correspondence with human pairwise preferences, often measured by LLM judges. In this work, we attempt to answer the following question – do LLM-judge preferences translate to progress on o..

Uncategorized 2024.11.03

RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style (Paper Review)

https://huggingface.co/papers/2410.16184
https://arxiv.org/pdf/2410.16184
Reward models are critical in techniques like Reinforcement Learning from Human Feedback (RLHF) and Inference Scaling Laws, where they guide language model alignment and select optimal responses. Despite their importance, existing reward model benchmarks often evaluate models by asking them to distinguish between responses g..
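A minimal sketch of the pairwise-accuracy style of evaluation such a benchmark implies: score a chosen and a rejected response for the same prompt and count how often the reward model prefers the chosen one. The `pairwise_accuracy` helper and the toy length-based reward are placeholders, not the RM-Bench harness.

```python
# Minimal sketch of pairwise reward-model evaluation. The scoring function is
# a placeholder; a real run would call an actual reward model.
from typing import Callable, List, Tuple

def pairwise_accuracy(
    reward_fn: Callable[[str, str], float],
    pairs: List[Tuple[str, str, str]],  # (prompt, chosen, rejected)
) -> float:
    correct = 0
    for prompt, chosen, rejected in pairs:
        if reward_fn(prompt, chosen) > reward_fn(prompt, rejected):
            correct += 1
    return correct / len(pairs)

# Toy reward (placeholder): longer responses score higher, illustrating exactly
# the kind of style bias such benchmarks probe.
toy_reward = lambda prompt, response: float(len(response))
pairs = [("Explain RLHF.", "RLHF fine-tunes a model against a learned reward.", "idk")]
print(pairwise_accuracy(toy_reward, pairs))  # 1.0
```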

Uncategorized 2024.11.03

BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling

https://arxiv.org/pdf/2406.00832
This paper concerns the problem of aligning samples from large language models to human preferences using best-of-n sampling, where we draw n samples, rank them, and return the best one. We consider two fundamental problems. First: what is the relationship between best-of-n and approaches to alignment that train LLMs to output samples with a high expected reward ..
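A minimal sketch of best-of-n sampling as described in the excerpt: draw n samples, score them with a reward model, and return the highest-scoring one. The `generate` and `reward` callables are placeholders, not the paper's models.

```python
# Minimal sketch of best-of-n sampling: draw n samples, rank by reward, return
# the best one. Model and reward function are placeholders.
import random
from typing import Callable, List

def best_of_n(
    generate: Callable[[str], str],       # draws one sample for a prompt
    reward: Callable[[str, str], float],  # scores (prompt, response)
    prompt: str,
    n: int = 8,
) -> str:
    samples: List[str] = [generate(prompt) for _ in range(n)]
    return max(samples, key=lambda s: reward(prompt, s))

# Toy example with a random "model" and a toy reward.
random.seed(0)
toy_generate = lambda p: p + " " + " ".join(random.choice(["good", "great", "ok"]) for _ in range(3))
toy_reward = lambda p, s: s.count("great")
print(best_of_n(toy_generate, toy_reward, "Answer:"))
```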

Uncategorized 2024.11.01