MexSWIN represents a novel architecture designed specifically for generating images from text descriptions. This innovative system leverages the power of transformers to bridge the gap between textual input and visual output. By employing a unique combination of encoding strategies, MexSWIN achieves remarkable results in creating diverse and cohere