AI research

CityDreamer creates unlimited 3D cities

CityDreamer, a generative AI model, creates unbounded 3D cities by separating the generation of building instances from other background objects. This separation lets the model better handle the diverse appearance of buildings in urban environments, which is one of the main challenges compared with generating natural environments, as methods such as GANCraft do. To enhance the …

ChatGPT does years of student research in a fraction of an hour

Summary A team of researchers at UC Berkeley has successfully used ChatGPT to generate large datasets to study metal-organic frameworks (MOFs) useful in combating climate change. According to a recent study published in the Journal of the American Chemical Society, the use of ChatGPT enabled the rapid collection of data on MOFs, accelerating research. MOFs …

BioCoder is a benchmark for AI-generated bioinformatics code

Summary BioCoder is a benchmark designed to support the development of AI models for bioinformatics. Researchers at Yale University and Google DeepMind introduce BioCoder, a benchmark for testing the ability of AI models to generate bioinformatics-specific code. As the capabilities of ChatGPT or specialized code models grow, the models will be used for increasingly complex …

MVDream creates impressive 3D renderings from text

Summary MVDream uses Stable Diffusion and NeRFs to generate some of the best 3D renderings yet from text prompts. Researchers at ByteDance present MVDream (Multi-view Diffusion for 3D Generation), a diffusion model capable of generating high-quality 3D renderings from text prompts. Similar models already exist, but MVDream achieves comparatively high quality and avoids two core …

New computer vision method teaches AI to say ‘no’

Summary CLIPN teaches CLIP the “semantics of negations”. This should help computer vision to recognize classes that were not part of the training data. Computer vision models recognize objects in the images on which they were trained. In real-world applications, however, these models often encounter unknown objects outside their training data, leading to poor results. …

Meta’s foundational model for computer vision is now open source

Summary Meta releases DINOv2 as open source under the Apache 2.0 license. Meta also introduces FACET (FAirness in Computer Vision EvaluaTion), a benchmark for bias in computer vision models. Update, August 31, 2023: Meta releases its computer vision model DINOv2 under the Apache 2.0 license to give developers and researchers more flexibility for downstream tasks. …

Meta’s latest AI model makes scientific PDFs machine-readable

Summary Meta’s Nougat is an AI text recognition model that can reliably convert scientific PDFs to text. Researchers at Meta have unveiled Nougat (Neural Optical Understanding for Academic Documents), an AI model that converts PDF images of scientific articles into structured, machine-readable text. Nougat aims to bridge the gap between human-readable PDF documents and machine-readable …

AI gets much better at reading text in images

Summary BLIVA is a vision language model that excels at reading text in images, making it useful in real-world scenarios and applications in many industries. Researchers at UC San Diego have developed BLIVA, a vision language model designed to better handle images that contain text. Vision language models (VLMs) extend large language models (LLMs) by …

Fine-tuned Meta Code Llama outperforms GPT-4 in key benchmark

Summary Shortly after the release of Meta’s Code Llama code model, the open-source community tries to fine-tune it – and immediately achieves a new top score, surpassing OpenAI’s GPT-4. Phind, an AI co-programming startup, has announced that it has achieved a new high score on the HumanEval benchmark, an important evaluation test for AI programming …

StableVideo lets you edit video with Stable Diffusion

Summary StableVideo brings some video editing capabilities to Stable Diffusion, such as allowing style transitions or changing backgrounds. Generating realistic and temporally coherent videos from text prompts remains a challenge for AI systems, with even state-of-the-art systems such as those from RunwayML still showing significant inconsistencies. While there is still much work to be done …
