Multimodal fusion for natural human-computer interaction involves complex intelligent architectures that must cope with unexpected user errors. These architectures must react to events that occur simultaneously, and possibly redundantly, on different input media. In this paper, generic intelligent agent-based architectures for multimedia multimodal dialog protocols are proposed. Global agents are decomposed into their relevant components, each component is modeled separately, and the elementary models are then linked together to obtain the full architecture. The generic components of the application are monitored by an agent-based expert system, which can dynamically reconfigure, adapt, and evolve the system at the architectural level. For validation, the proposed multiagent architectures and their dynamic reconfiguration are applied to practical examples, including a W3C application.
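
To make the notion of reacting to simultaneous and possibly redundant events concrete, the following is a minimal illustrative sketch and not the architecture proposed in this paper. It assumes a simple time-window rule: events whose timestamps fall within a fixed window of the first event in a group are fused, and an event that repeats a meaning already present in the group is treated as redundant. The names `InputEvent` and `fuse`, the modality labels, and the one-second window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputEvent:
    modality: str     # hypothetical label, e.g. "speech" or "gesture"
    meaning: str      # interpreted content of the event
    timestamp: float  # arrival time in seconds


def fuse(events, window=1.0):
    """Group events by time window and drop redundant meanings.

    Assumption: an event belongs to the current group if it arrives
    within `window` seconds of the group's first event.
    """
    groups = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if groups and ev.timestamp - groups[-1][0].timestamp <= window:
            # Keep the event only if no event in the group already
            # carries the same meaning (redundant input is skipped).
            if all(ev.meaning != other.meaning for other in groups[-1]):
                groups[-1].append(ev)
        else:
            # Too late for the current group: start a new one.
            groups.append([ev])
    return [[e.meaning for e in group] for group in groups]


events = [
    InputEvent("speech", "delete", 0.0),
    InputEvent("gesture", "file_3", 0.2),
    InputEvent("keyboard", "delete", 0.3),  # redundant with the speech event
    InputEvent("speech", "open", 5.0),
]
fused = fuse(events)
```

Here the spoken and typed "delete" commands collapse into one, the pointing gesture is fused with them, and the later "open" command forms a separate group. A real fusion agent would also need to handle conflicting inputs and user errors, which this sketch does not attempt.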