Open-sourcing 5,000hrs of self-driving dataset
(huggingface.co)
To unlock the potential for robotics AI, Yaak teamed up with the LeRobot team at 🤗 and is excited to announce Learning to Drive (L2D) to the robotics AI community. L2D is the world’s largest multimodal dataset aimed at building an open-sourced spatial intelligence for the automotive domain with first class support for 🤗’s LeRobot training pipeline and models.
To unlock the potential for robotics AI, Yaak teamed up with the LeRobot team at 🤗 and is excited to announce Learning to Drive (L2D) to the robotics AI community. L2D is the world’s largest multimodal dataset aimed at building an open-sourced spatial intelligence for the automotive domain with first class support for 🤗’s LeRobot training pipeline and models.
Hugging Face datasets and models for cybersecurity/sofwtare vulnerabilities
(huggingface.co)
CIRCL is the CERT (Computer Emergency Response Team/Computer Security Incident Response Team) for the private sector, communes and non-governmental entities in Luxembourg.
CIRCL is the CERT (Computer Emergency Response Team/Computer Security Incident Response Team) for the private sector, communes and non-governmental entities in Luxembourg.
Some critical issues with the SWE-bench dataset
(arxiv.org)
To facilitate a rigorous evaluation of LLMs in practical coding contexts, Carlos et al. introduced the SWE-bench dataset, which comprises 2,294 real-world GitHub issues and their corresponding pull requests, collected from 12 widely used Python repositories.
To facilitate a rigorous evaluation of LLMs in practical coding contexts, Carlos et al. introduced the SWE-bench dataset, which comprises 2,294 real-world GitHub issues and their corresponding pull requests, collected from 12 widely used Python repositories.
Harvard Is Releasing a Free AI Training Dataset
(wired.com)
Harvard University announced Thursday it’s releasing a high-quality dataset of nearly 1 million public-domain books that could be used by anyone to train large language models and other AI tools.
Harvard University announced Thursday it’s releasing a high-quality dataset of nearly 1 million public-domain books that could be used by anyone to train large language models and other AI tools.
Datasaurus dozen – Different datasets with the same descriptive statistics
(wikipedia.org)
The Datasaurus dozen comprises thirteen data sets that have nearly identical simple descriptive statistics to two decimal places, yet have very different distributions and appear very different when graphed.[1] It was inspired by the smaller Anscombe's quartet that was created in
The Datasaurus dozen comprises thirteen data sets that have nearly identical simple descriptive statistics to two decimal places, yet have very different distributions and appear very different when graphed.[1] It was inspired by the smaller Anscombe's quartet that was created in
Foursquare Open Source Places: A new foundational dataset
(simonwillison.net)
Foursquare Open Source Places: A new foundational dataset for the geospatial community (via) I did not expect this!
Foursquare Open Source Places: A new foundational dataset for the geospatial community (via) I did not expect this!
Seeing faces in things: A model and dataset for pareidolia
(mhamilton.net)
The human visual system is well-tuned to detect faces of all shapes and sizes. While this brings obvious survival advantages, such as a better chance of spotting unknown predators in the bush, it also leads to spurious face detections. "Face pareidolia" describes the perception of face-like structure among otherwise random stimuli: seeing faces in coffee stains or clouds in the sky. In this paper, we study face pareidolia from a computer vision perspective.
The human visual system is well-tuned to detect faces of all shapes and sizes. While this brings obvious survival advantages, such as a better chance of spotting unknown predators in the bush, it also leads to spurious face detections. "Face pareidolia" describes the perception of face-like structure among otherwise random stimuli: seeing faces in coffee stains or clouds in the sky. In this paper, we study face pareidolia from a computer vision perspective.