
6 Acknowledgements and References
Special thanks to Dustin Podell, Vikram Voleti, Andreas Blattmann, and Robin Rombach for technical assistance fine-tuning Stable Diffusion XL to support our unCLIP use case. Thanks to the MedARC Discord community for being the public forum from which this research was developed, and particularly to Connor Lane, Alex Nguyen, Atmadeep Banerjee, Amir Refaee, and Mohammed Baharoon for their helpful discussions. Thanks to Alessandro Gifford and Connor Lane for providing useful feedback on drafts of the manuscript. Thank you to Richard Vencu for help navigating the Stability AI HPC. Thanks to Stability AI for their support of open neuroAI research and for providing the computational resources necessary to develop MindEye2. Collection of the Natural Scenes Dataset was supported by NSF IIS-1822683 and NSF IIS-1822929.
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning Transferable Visual Models From Natural Language Supervision, February 2021. URL http://arxiv.org/abs/2103.00020. arXiv:2103.00020 [cs].
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-Resolution Image Synthesis with Latent Diffusion Models, April 2022. URL http://arxiv.org/abs/2112.10752. arXiv:2112.10752 [cs].
Emily J. Allen, Ghislain St-Yves, Yihan Wu, Jesse L. Breedlove, Jacob S. Prince, Logan T. Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, J. Benjamin Hutchinson, Thomas Naselaris, and Kendrick Kay. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience, 25(1):116–126, January 2022. ISSN 1097-6256, 1546-1726. doi: 10.1038/s41593-021-00962-x. URL https://www.nature.com/articles/s41593-021-00962-x.
Yu Takagi and Shinji Nishimoto. High-resolution image reconstruction with latent diffusion models from human brain activity. preprint, Neuroscience, November 2022. URL http://biorxiv.org/lookup/doi/10.1101/2022.11.18.517004.
Yu Takagi and Shinji Nishimoto. Improving visual image reconstruction from human brain activity using latent diffusion models via multiple decoded inputs, 2023.
Furkan Ozcelik, Bhavin Choksi, Milad Mozafari, Leila Reddy, and Rufin VanRullen. Reconstruction of Perceived Images from fMRI Patterns and Semantic Brain Exploration using Instance-Conditioned GANs, February 2022. URL http://arxiv.org/abs/2202.12692. arXiv:2202.12692 [cs, eess, q-bio].
Furkan Ozcelik and Rufin VanRullen. Brain-Diffuser: Natural scene reconstruction from fMRI signals using generative latent diffusion, March 2023. URL http://arxiv.org/abs/2303.05334. arXiv:2303.05334 [cs, q-bio].
Guy Gaziv, Roman Beliy, Niv Granot, Assaf Hoogi, Francesca Strappini, Tal Golan, and Michal Irani. Self-supervised Natural Image Reconstruction and Large-scale Semantic Classification from Brain Activity. NeuroImage, 254:119121, July 2022. ISSN 10538119. doi: 10.1016/j.neuroimage.2022.119121. URL https://linkinghub.elsevier.com/retrieve/pii/S105381192200249X.
Paul Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Ethan Cohen, Aidan Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth Norman, and Tanishq Abraham. Reconstructing the Mind’s Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors. Advances in Neural Information Processing Systems, 36:24705–24728, December 2023. URL https://proceedings.neurips.cc/paper_files/paper/2023/hash/4ddab70bf41ffe5d423840644d3357f4-Abstract-Conference.html.
Reese Kneeland, Jordyn Ojeda, Ghislain St-Yves, and Thomas Naselaris. Reconstructing seen images from human brain activity via guided stochastic search. Conference on Cognitive Computational Neuroscience, 2023a. doi: 10.32470/CCN.2023.1672-0.
Reese Kneeland, Jordyn Ojeda, Ghislain St-Yves, and Thomas Naselaris. Second Sight: Using brain-optimized encoding models to align image distributions with human brain activity, June 2023b. URL http://arxiv.org/abs/2306.00927. arXiv:2306.00927 [cs, q-bio].
Reese Kneeland, Jordyn Ojeda, Ghislain St-Yves, and Thomas Naselaris. Brain-optimized inference improves reconstructions of fMRI brain activity, December 2023c. URL http://arxiv.org/abs/2312.07705. arXiv:2312.07705 [cs, q-bio].
Matteo Ferrante, Tommaso Boccato, and Nicola Toschi. Through their eyes: multi-subject Brain Decoding with simple alignment techniques, August 2023a. URL http://arxiv.org/abs/2309.00627. arXiv:2309.00627 [cs, q-bio].
Alexis Thual, Yohann Benchetrit, Felix Geilert, Jérémy Rapin, Iurii Makarov, Hubert Banville, and Jean-Rémi King. Aligning brain functions boosts the decoding of visual semantics in novel subjects, December 2023. URL http://arxiv.org/abs/2312.06467. arXiv:2312.06467 [cs, eess, q-bio].
Zijiao Chen, Jiaxin Qing, and Juan Helen Zhou. Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity, May 2023a. URL http://arxiv.org/abs/2305.11675. arXiv:2305.11675 [cs].
Zijiao Chen, Jiaxin Qing, Tiange Xiang, Wan Lin Yue, and Juan Helen Zhou. Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding, March 2023b. URL http://arxiv.org/abs/2211.06956. arXiv:2211.06956 [cs].
Jingyuan Sun, Mingxiao Li, Zijiao Chen, Yunhao Zhang, Shaonan Wang, and Marie-Francine Moens. Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities, December 2023. URL http://arxiv.org/abs/2305.17214. arXiv:2305.17214 [cs].
Weijian Mai and Zhijun Zhang. UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity, August 2023. URL http://arxiv.org/abs/2308.07428. arXiv:2308.07428 [cs].
Weihao Xia, Raoul de Charette, Cengiz Öztireli, and Jing-Hao Xue. DREAM: Visual Decoding from Reversing Human Visual System, October 2023. URL http://arxiv.org/abs/2310.02265. arXiv:2310.02265 [cs, eess, q-bio].
Roman Beliy, Guy Gaziv, Assaf Hoogi, Francesca Strappini, Tal Golan, and Michal Irani. From voxels to pixels and back: Self-supervision in natural-image reconstruction from fMRI, July 2019. URL http://arxiv.org/abs/1907.02431. arXiv:1907.02431 [cs, eess, q-bio, stat].
Guohua Shen, Tomoyasu Horikawa, Kei Majima, and Yukiyasu Kamitani. Deep image reconstruction from human brain activity. PLOS Computational Biology, 15(1):e1006633, January 2019a. ISSN 1553-7358. doi: 10.1371/journal.pcbi.1006633. URL https://dx.plos.org/10.1371/journal.pcbi.1006633.
Guohua Shen, Kshitij Dwivedi, Kei Majima, Tomoyasu Horikawa, and Yukiyasu Kamitani. End-to-End Deep Image Reconstruction From Human Brain Activity. Frontiers in Computational Neuroscience, 13, 2019b. ISSN 1662-5188. URL https://www.frontiersin.org/articles/10.3389/fncom.2019.00021.
K. Seeliger, U. Güçlü, L. Ambrogioni, Y. Güçlütürk, and M.A.J. van Gerven. Generative adversarial networks for reconstructing natural images from brain activity. NeuroImage, 181:775–785, November 2018. ISSN 10538119. doi: 10.1016/j.neuroimage.2018.07.043. URL https://linkinghub.elsevier.com/retrieve/pii/S105381191830658X.
Yunfeng Lin, Jiangbei Li, and Hanjing Wang. DCNNGAN: Reconstructing Realistic Image from fMRI, January 2019. URL http://arxiv.org/abs/1901.07368. arXiv:1901.07368 [cs, eess].
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical Text-Conditional Image Generation with CLIP Latents, April 2022. URL http://arxiv.org/abs/2204.06125. arXiv:2204.06125 [cs].
Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. LAION-5B: An open large-scale dataset for training next generation image-text models, October 2022. URL http://arxiv.org/abs/2210.08402. arXiv:2210.08402 [cs].
Gabriel Ilharco, Mitchell Wortsman, Ross Wightman, Cade Gordon, Nicholas Carlini, Rohan Taori, Achal Dave, Vaishaal Shankar, Hongseok Namkoong, John Miller, Hannaneh Hajishirzi, Ali Farhadi, and Ludwig Schmidt. OpenCLIP, July 2021. URL https://doi.org/10.5281/zenodo.5143773.
Sylvain Gugger, Lysandre Debut, Thomas Wolf, Philipp Schmid, Zachary Mueller, Sourab Mangrulkar, Marc Sun, and Benjamin Bossan. Accelerate: Training and inference at scale made simple, efficient and adaptable, 2022. URL https://github.com/huggingface/accelerate.
Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, May 2020. URL http://arxiv.org/abs/1910.02054. arXiv:1910.02054 [cs, stat].
J. Talairach and P. Tournoux. Co-planar stereotaxic atlas of the human brain. 3-Dimensional proportional system: an approach to cerebral imaging. The Journal of Laryngology & Otology, 104(1):72–72, January 1990. ISSN 1748-5460, 0022-2151. doi: 10.1017/S0022215100111879. URL https://www.cambridge.org/core/journals/journal-of-laryngology-and-otology/article/abs/co-planar-stereotaxic-atlas-of-the-human-brain-3-dimensional-proportional-system-an-appro46C98B7A1D9ABB728CB5A5709C09AF89. Publisher: Cambridge University Press.
J Mazziotta, A Toga, A Evans, P Fox, J Lancaster, K Zilles, R Woods, T Paus, G Simpson, B Pike, C Holmes, L Collins, P Thompson, D MacDonald, M Iacoboni, T Schormann, K Amunts, N Palomero-Gallagher, S Geyer, L Parsons, K Narr, N Kabani, G Le Goualher, D Boomsma, T Cannon, R Kawashima, and B Mazoyer. A probabilistic atlas and reference system for the human brain: International Consortium for Brain Mapping (ICBM). Philosophical Transactions of the Royal Society of London. Series B, 356(1412):1293–1322, August 2001. ISSN 0962-8436. doi: 10.1098/rstb.2001.0915. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1088516/.
James V. Haxby, J. Swaroop Guntupalli, Andrew C. Connolly, Yaroslav O. Halchenko, Bryan R. Conroy, M. Ida Gobbini, Michael Hanke, and Peter J. Ramadge. A Common, High-Dimensional Model of the Representational Space in Human Ventral Temporal Cortex. Neuron, 72(2):404–416, October 2011. ISSN 08966273. doi: 10.1016/j.neuron.2011.08.026. URL https://linkinghub.elsevier.com/retrieve/pii/S0896627311007811.
Xiaohua Zhai, Xiao Wang, Basil Mustafa, Andreas Steiner, Daniel Keysers, Alexander Kolesnikov, and Lucas Beyer. LiT: Zero-Shot Transfer with Locked-image text Tuning, June 2022. URL http://arxiv.org/abs/2111.07991. arXiv:2111.07991 [cs].
Sungnyun Kim, Gihun Lee, Sangmin Bae, and Seyoung Yun. Mixco: Mix-up contrastive learning for visual representation. ArXiv, abs/2010.06300, 2020.
Adrien Bardes, Jean Ponce, and Yann LeCun. Vicregl: Self-supervised learning of local visual features. ArXiv, abs/2210.01571, 2022.
Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, and Lijuan Wang. GIT: A Generative Image-to-text Transformer for Vision and Language, December 2022. URL http://arxiv.org/abs/2205.14100. arXiv:2205.14100 [cs].
Matteo Ferrante, Furkan Ozcelik, Tommaso Boccato, Rufin VanRullen, and Nicola Toschi. Brain Captioning: Decoding human brain activity into images and text, May 2023b. URL http://arxiv.org/abs/2305.11560. arXiv:2305.11560 [cs].
Xingqian Xu, Zhangyang Wang, Eric Zhang, Kai Wang, and Humphrey Shi. Versatile Diffusion: Text, Images and Variations All in One Diffusion Model, March 2023. URL http://arxiv.org/abs/2211.08332. arXiv:2211.08332 [cs].
Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, and Wei Yang. IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models, August 2023. URL http://arxiv.org/abs/2308.06721. arXiv:2308.06721 [cs].
Justin Pinkney. Lambda Diffusers, 2022. URL https://github.com/LambdaLabsML/lambda-diffusers. GitHub repository.
Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations, January 2022. URL http://arxiv.org/abs/2108.01073. arXiv:2108.01073 [cs].
Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis, July 2023. URL http://arxiv.org/abs/2307.01952. arXiv:2307.01952 [cs].
Shanchuan Lin, Bingchen Liu, Jiashi Li, and Xiao Yang. Common Diffusion Noise Schedules and Sample Steps are Flawed, January 2024. URL http://arxiv.org/abs/2305.08891. arXiv:2305.08891 [cs].
Nicholas Guttenberg. Diffusion with Offset Noise, 2023. URL https://www.crosslabs.org//blog/diffusion-with-offset-noise.
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common Objects in Context. In David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars, editors, Computer Vision – ECCV 2014, Lecture Notes in Computer Science, pages 740–755, Cham, 2014. Springer International Publishing. ISBN 978-3-319-10602-1. doi: 10.1007/978-3-319-10602-1_48.
Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, April 2004. ISSN 1941-0042. doi: 10.1109/TIP.2003.819861.
Mingxing Tan and Quoc V. Le. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, September 2020. URL http://arxiv.org/abs/1905.11946. arXiv:1905.11946 [cs, stat].
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments, January 2021. URL http://arxiv.org/abs/2006.09882. arXiv:2006.09882 [cs].
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language Models are Unsupervised Multitask Learners. 2019.
Chin-Yew Lin. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain, July 2004. Association for Computational Linguistics. URL https://aclanthology.org/W04-1013.
Satanjeev Banerjee and Alon Lavie. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In Jade Goldstein, Alon Lavie, Chin-Yew Lin, and Clare Voss, editors, Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics. URL https://aclanthology.org/W05-0909.
Nils Reimers and Iryna Gurevych. Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation, October 2020. URL http://arxiv.org/abs/2004.09813. arXiv:2004.09813 [cs].
Sikun Lin, Thomas Sprague, and Ambuj K. Singh. Mind Reader: Reconstructing complex images from brain activities, September 2022. URL http://arxiv.org/abs/2210.01769. arXiv:2210.01769 [cs, eess, q-bio].
Thomas Naselaris, Kendrick N. Kay, Shinji Nishimoto, and Jack L. Gallant. Encoding and decoding in fMRI. NeuroImage, 56(2), 2011. doi: 10.1016/j.neuroimage.2010.07.073.
Ghislain St-Yves, Emily J. Allen, Yihan Wu, Kendrick Kay, and Thomas Naselaris. Brain-optimized neural networks learn nonhierarchical models of representation in human visual cortex. bioRxiv, 2022. doi: 10.1101/2022.01.21.477293.
Po-Hsuan (Cameron) Chen, Janice Chen, Yaara Yeshurun, Uri Hasson, James Haxby, and Peter J Ramadge. A Reduced-Dimension fMRI Shared Response Model. In Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015. URL https://papers.nips.cc/paper_files/paper/2015/hash/b3967a0e938dc2a6340e258630febd5a-Abstract.html.
Jessie Huang, Erica L. Busch, Tom Wallenstein, Michal Gerasimiuk, Andrew Benz, Guillaume Lajoie, Guy Wolf, Nicholas B. Turk-Browne, and Smita Krishnaswamy. Learning shared neural manifolds from multi-subject FMRI data, December 2021. URL http://arxiv.org/abs/2201.00622. arXiv:2201.00622 [cs, eess, q-bio].
Samuel A Nastase, Valeria Gazzola, Uri Hasson, and Christian Keysers. Measuring shared responses across subjects using intersubject correlation. Social Cognitive and Affective Neuroscience, page nsz037, May 2019. ISSN 1749-5016, 1749-5024. doi: 10.1093/scan/nsz037. URL https://academic.oup.com/scan/advance-article/doi/10.1093/scan/nsz037/5489905.
Erica L. Busch, Lukas Slipski, Ma Feilong, J. Swaroop Guntupalli, Matteo Visconti di Oleggio Castello, Jeremy F. Huckins, Samuel A. Nastase, M. Ida Gobbini, Tor D. Wager, and James V. Haxby. Hybrid hyperalignment: A single high-dimensional model of shared information embedded in cortical patterns of response and functional connectivity. NeuroImage, 233:117975, June 2021. ISSN 10538119. doi: 10.1016/j.neuroimage.2021.117975. URL https://linkinghub.elsevier.com/retrieve/pii/S1053811921002524.
Steffen Schneider, Jin Hwa Lee, and Mackenzie Weygandt Mathis. Learnable latent embeddings for joint behavioural and neural analysis. Nature, 617(7960):360–368, May 2023. ISSN 1476-4687. doi: 10.1038/s41586-023-06031-6. URL https://www.nature.com/articles/s41586-023-06031-6. Publisher: Nature Publishing Group.
Alexandre Défossez, Charlotte Caucheteux, Jérémy Rapin, Ori Kabeli, and Jean-Rémi King. Decoding speech from non-invasive brain recordings, August 2022. URL http://arxiv.org/abs/2208.12266. arXiv:2208.12266 [cs, eess, q-bio].
Yohann Benchetrit, Hubert Banville, and Jean-Rémi King. Brain decoding: toward real-time reconstruction of visual perception. October 2023. URL https://openreview.net/forum?id=3y1K6buO8c.
Huzheng Yang, James Gee, and Jianbo Shi. Memory Encoding Model, August 2023. URL http://arxiv.org/abs/2308.01175. arXiv:2308.01175 [cs].
Connor Lane and Gregory Kiar. A Parameter-efficient Multi-subject Model for Predicting fMRI Activity, August 2023. URL https://arxiv.org/abs/2308.02351v1.
Jerry Tang, Amanda LeBel, Shailee Jain, and Alexander G. Huth. Semantic reconstruction of continuous language from non-invasive brain recordings. Nature Neuroscience, pages 1–9, May 2023. ISSN 1546-1726. doi: 10.1038/s41593-023-01304-9. URL https://www.nature.com/articles/s41593-023-01304-9. Publisher: Nature Publishing Group.
Martin M. Monti, Audrey Vanhaudenhuyse, Martin R. Coleman, Melanie Boly, John D. Pickard, Luaba Tshibanda, Adrian M. Owen, and Steven Laureys. Willful Modulation of Brain Activity in Disorders of Consciousness. New England Journal of Medicine, 362(7):579–589, February 2010. ISSN 0028-4793. doi: 10.1056/NEJMoa0905370. URL https://doi.org/10.1056/NEJMoa0905370. Publisher: Massachusetts Medical Society.
Grant Wallace, Stephen Polcyn, Paula P. Brooks, Anne C. Mennen, Ke Zhao, Paul S. Scotti, Sebastian Michelmann, Kai Li, Nicholas B. Turk-Browne, Jonathan D. Cohen, and Kenneth A. Norman. RT-Cloud: A cloud-based software framework to simplify and standardize real-time fMRI. NeuroImage, 257:119295, August 2022. ISSN 10538119. doi: 10.1016/j.neuroimage.2022.119295. URL https://linkinghub.elsevier.com/retrieve/pii/S1053811922004141.
Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid Loss for Language Image Pre-Training, September 2023. URL http://arxiv.org/abs/2303.15343. arXiv:2303.15343 [cs].
Jacob S Prince, Ian Charest, Jan W Kurzawski, John A Pyles, Michael J Tarr, and Kendrick N Kay. Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife, 11:e77599, November 2022. ISSN 2050-084X. doi: 10.7554/eLife.77599. URL https://doi.org/10.7554/eLife.77599. Publisher: eLife Sciences Publications, Ltd.
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, January 2018. URL http://arxiv.org/abs/1706.08500. arXiv:1706.08500 [cs, stat].
Romain Beaumont. Clip Retrieval: Easily compute clip embeddings and build a clip retrieval system with them, 2022. URL https://github.com/rom1504/clip-retrieval. GitHub repository.
Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. The Faiss library, 2024. arXiv:2401.08281.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012. URL https://proceedings.neurips.cc/paper_files/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html.
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the Inception Architecture for Computer Vision. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2818–2826, June 2016. doi: 10.1109/CVPR.2016.308. ISSN: 1063-6919.
Leland McInnes, John Healy, and James Melville. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, September 2020. URL http://arxiv.org/abs/1802.03426. arXiv:1802.03426 [cs, stat].
Weixin Liang, Yuhui Zhang, Yongchan Kwon, Serena Yeung, and James Zou. Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning, October 2022. URL http://arxiv.org/abs/2203.02053. arXiv:2203.02053 [cs].
Emily J. Allen, Ghislain St-Yves, Yihan Wu, Jesse L. Breedlove, Logan T. Dowdle, Brad Caron, Franco Pestilli, Ian Charest, J. Benjamin Hutchinson, Thomas Naselaris, and Kendrick Kay. A massive 7T fMRI dataset to bridge cognitive and computational neuroscience. preprint, Neuroscience, February 2021. URL http://biorxiv.org/lookup/doi/10.1101/2021.02.22.432340.
Gabriel H. Sarch, Michael J. Tarr, Katerina Fragkiadaki, and Leila Wehbe. Brain Dissection: fMRI-trained Networks Reveal Spatial Selectivity in the Processing of Natural Images, May 2023. URL https://www.biorxiv.org/content/10.1101/2023.05.29.542635v1.
Andrew Luo, Margaret Marie Henderson, Michael J. Tarr, and Leila Wehbe. BrainSCUBA: Fine-Grained Natural Language Captions of Visual Cortex Selectivity. October 2023a. URL https://openreview.net/forum?id=mQYHXUUTkU&referrer=%5Bthe%20profile%20of%20Leila%20Wehbe%5D(%2Fprofile%3Fid%3D~Leila_Wehbe1).
Andrew F. Luo, Margaret M. Henderson, Leila Wehbe, and Michael J. Tarr. Brain Diffusion for Visual Exploration: Cortical Discovery using Large Scale Generative Models, November 2023b. URL http://arxiv.org/abs/2306.03089. arXiv:2306.03089 [cs].
This paper is available on arXiv under the CC BY 4.0 DEED license.
Authors:
(1) Paul S. Scotti, Stability AI and Medical AI Research Center (MedARC);
(2) Mihir Tripathy, Medical AI Research Center (MedARC), core contribution;
(3) Cesar Kadir Torrico Villanueva, Medical AI Research Center (MedARC), core contribution;
(4) Reese Kneeland, University of Minnesota, core contribution;
(5) Tong Chen, The University of Sydney and Medical AI Research Center (MedARC);
(6) Ashutosh Narang, Medical AI Research Center (MedARC);
(7) Charan Santhirasegaran, Medical AI Research Center (MedARC);
(8) Jonathan Xu, University of Waterloo and Medical AI Research Center (MedARC);
(9) Thomas Naselaris, University of Minnesota;
(10) Kenneth A. Norman, Princeton Neuroscience Institute;
(11) Tanishq Mathew Abraham, Stability AI and Medical AI Research Center (MedARC).