Recent Highlights:
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
ABSTRACT:
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally develops numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-...
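The R1 paper describes driving this RL stage with simple rule-based rewards rather than a learned reward model. Below is a minimal sketch of that idea; the tag format, function name, and exact reward shape are our assumptions for illustration, not the authors' code:

```python
import re

def reasoning_reward(completion: str, reference_answer: str) -> float:
    """Rule-based reward sketch for R1-Zero-style RL (illustrative).
    Combines a format reward (reasoning must appear inside <think>...</think>
    followed by <answer>...</answer>) with an accuracy reward (the extracted
    answer must match the reference)."""
    format_ok = re.fullmatch(
        r"(?s)<think>.*?</think>\s*<answer>.*?</answer>\s*", completion
    ) is not None
    format_reward = 1.0 if format_ok else 0.0

    match = re.search(r"(?s)<answer>(.*?)</answer>", completion)
    answer = match.group(1).strip() if match else ""
    accuracy_reward = 1.0 if answer == reference_answer.strip() else 0.0

    return format_reward + accuracy_reward

# A well-formatted, correct completion earns the maximum reward.
demo = "<think>2+2 equals 4.</think><answer>4</answer>"
print(reasoning_reward(demo, "4"))  # 2.0
```

Because both signals are checkable rules, the reward cannot be gamed the way a learned reward model can, which is part of why pure RL without SFT is feasible here.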
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
ABSTRACT:
Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses a significant challenge. Sparse attention offers a promising direction for improving efficiency while maintaining model capabilities. We present NSA, a Natively trainable Sparse Attention mechanism that integrates algorithmic innovations with hardware-aligned optimizations to achieve efficient long-context modeling. NSA employs a dynamic hierarchical sparse strategy, combining coar...
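To make the "hierarchical sparse strategy" concrete, here is a rough sketch of its coarse branch: keys and values are pooled into block summaries so a query attends to T/block entries instead of T tokens. All names are hypothetical and this is not NSA's actual kernel, which is hardware-aligned and combines several branches:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compressed_attention(q, K, V, block=4):
    """Coarse branch of a hierarchical sparse-attention scheme (sketch).
    Keys/values are mean-pooled into blocks, so one query attends to
    T/block summaries rather than T raw tokens."""
    T, d = K.shape
    nb = T // block
    Kc = K[: nb * block].reshape(nb, block, d).mean(axis=1)  # (nb, d) block summaries
    Vc = V[: nb * block].reshape(nb, block, d).mean(axis=1)
    scores = softmax(q @ Kc.T / np.sqrt(d))                  # attend over nb blocks
    return scores @ Vc

# A fine-grained branch would then re-attend token by token only inside the
# top-scoring blocks, with a gate combining the branches.
rng = np.random.default_rng(0)
T, d = 64, 16
q, K, V = rng.normal(size=d), rng.normal(size=(T, d)), rng.normal(size=(T, d))
print(compressed_attention(q, K, V).shape)  # (16,)
```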
Scaling In-the-Wild Training for Diffusion-based Illumination Harmonization and Editing by Imposing Consistent Light Transport
ABSTRACT:
Diffusion-based image generators are becoming unique methods for illumination harmonization and editing. The current bottleneck in scaling up the training of diffusion-based illumination editing models lies mainly in the difficulty of preserving the underlying image details and keeping intrinsic properties, such as albedos, unchanged. Without appropriate constraints, directly training the latest large image models with complex, varied, or in-the-wild data is likely to produce a structure-guided random image generator, rather than achieving ...
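The "consistent light transport" constraint in the title plausibly refers to the physical linearity of illumination: an object's appearance under merged light sources equals the sum of its appearances under each source alone. A hedged formalization (notation ours, not the paper's):

$$
I(L_1 \cup L_2) = I(L_1) + I(L_2),
$$

which suggests a consistency penalty of the form $\big\| f_\theta(x, L_1) + f_\theta(x, L_2) - f_\theta(x, L_1 \cup L_2) \big\|^2$ on the editing model $f_\theta$, constraining it to change lighting without touching albedo or structure.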
GPT-4 Technical Report
ABSTRACT:
We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures o...
Efficient Rectification of Neuro-Symbolic Reasoning Inconsistencies by Abductive Reflection
ABSTRACT:
Neuro-Symbolic (NeSy) AI could be regarded as an analogy to human dual-process cognition, modeling the intuitive System 1 with neural networks and the algorithmic System 2 with symbolic reasoning. However, for complex learning targets, NeSy systems often generate outputs inconsistent with domain knowledge and it is challenging to rectify them. Inspired by human Cognitive Reflection, which promptly detects errors in our intuitive response and revises them by invoking System 2 reasoning, we propose to improve NeSy systems by introduci...
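The abductive step can be pictured as follows: keep the neural (System 1) readout when it satisfies the symbolic rules, and otherwise search for the most probable relabeling that does. This is only a brute-force sketch of the general idea; the names, data shapes, and search strategy are our assumptions, not the paper's algorithm:

```python
from itertools import product

def abductive_rectify(probs, consistent):
    """Abduction sketch for a NeSy pipeline. `probs` holds each symbol's
    neural score distribution; if the argmax labels violate the symbolic
    rule `consistent`, return the highest-scoring joint labeling that
    satisfies it (raises ValueError if none exists)."""
    intuitive = tuple(max(p, key=p.get) for p in probs)       # System-1 guess
    if consistent(intuitive):
        return intuitive
    candidates = product(*(p.keys() for p in probs))          # System-2 search
    feasible = (c for c in candidates if consistent(c))
    return max(feasible, key=lambda c: sum(probs[i][s] for i, s in enumerate(c)))

# Toy rule: the three digits must sum to 10. The intuitive readout
# (4, 4, 3) sums to 11, so abduction revises the least-confident slot.
probs = [{4: 0.9, 3: 0.1}, {4: 0.8, 3: 0.2}, {3: 0.6, 2: 0.4}]
print(abductive_rectify(probs, lambda c: sum(c) == 10))  # (4, 4, 2)
```

A practical system would replace the exhaustive product with a constraint solver or the paper's more efficient reflection mechanism, since the joint label space grows exponentially.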
Every Bit Helps: Achieving the Optimal Distortion with a Few Queries
ABSTRACT:
A fundamental task in multi-agent systems is to match n agents to n alternatives (e.g., resources or tasks). Often, this is accomplished by eliciting agents' ordinal rankings over the alternatives instead of their exact numerical utilities. While this simplifies elicitation, the incomplete information leads to inefficiency, captured by a worst-case measure called distortion. A recent line of work shows that making just a few queries to each agent regarding their cardinal utility for an alternative can significantly improve the disto...
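For readers unfamiliar with the metric, distortion is standardly defined as the worst-case ratio between the optimal social welfare and the welfare achieved by a mechanism that sees only the ordinal information (notation ours; the paper's exact definition may differ in details):

$$
\mathrm{dist}(f) \;=\; \sup_{u} \; \frac{\max_{M} \sum_{i} u_i\big(M(i)\big)}{\sum_{i} u_i\big(f(\sigma(u))(i)\big)},
$$

where $u_i$ is agent $i$'s cardinal utility, $\sigma(u)$ denotes the ordinal rankings induced by $u$, and the maximum ranges over all perfect matchings $M$ of agents to alternatives. Each cardinal query reveals one value $u_i(a)$ exactly, shrinking the set of utility profiles the adversarial supremum can range over.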