Radiology Today Magazine
Women’s Imaging: Collective Effort

Lessons Learned From Compiling a 1M DBT Dataset
By Rebekah Moan
Vol. 27 No. 2 P. 6
April 1, 2026

Little by little, things that once seemed unimaginable can become feasible—including preparing a dataset of more than one million digital breast tomosynthesis (DBT) studies with paired histology outcomes for an AI developer. “You have to start somewhere,” says Luke Bideaux, president and CEO of Vega Imaging Informatics. “Take those first steps, and it keeps getting better with each step along the way.”

Vega has a data program that allows it to acquire information from various sites in three regions of the United States. The exact number of sites and their specific locations are confidential. When a site joins the program, Vega can interact with that site’s protected health information as a business associate and create a deidentified dataset.

Participating health care facilities join the program and deploy the data processing server on their premises. Vega has to integrate with the facilities’ PACS so it can extract copies of the studies. Through the use of specialized tools and processes, Vega completes the end-to-end data processing on behalf of the imaging facilities. This allows these facilities to leverage Vega’s data specialists to lead the data projects, while sharing in the revenue that’s generated from them.

When Vega received a request from an AI developer to compile a large multimodal, deidentified DBT dataset, Vega contracted with sites within its data program to create it. The company performed a large-scale extraction across the sites, including DICOM images and associated clinical data from various information systems. The final dataset included deidentified images, demographic data elements, BI-RADS information, radiology reports, and other important attributes about the imaging exams. Crucially, it also included biopsy outcome information for more than 22,000 patients, among them more than 7,000 cancer cases.

“We’ve done more complicated data projects than this because they were about rare diseases or hard-to-find patient populations,” Bideaux says. “But this dataset is significant because of its size, as it is intended to represent a large distribution of demographics, breast densities, BI-RADS scores, cancer types, and more, all based on real-world occurrences.”

Training and testing new AI models on such a wide swath of the patient population should make the resulting AI solutions more effective in practice. “So many times, we see AI developers working with limited sources. That results in AI solutions working in a lab, but when they try to deploy those solutions in the real world, they fail,” he says. “With this contribution, I believe huge strides will be made in the effectiveness of the AI solutions being developed in the area of breast imaging.”

Big Data
The downside to such a broad representation of patients is that immense amounts of data must be processed, validated, and distributed. Vega acknowledges that these challenges need to be addressed on a case-by-case basis with each participating facility. Every site Vega works with has a PACS that performs differently, so a tailored approach to data processing was crucial. Numerous techniques can maximize data migration performance, such as multithreading, move delays, and lossless compression, which keep the source systems from being overwhelmed while maintaining a consistent data transfer pace.

“We had to find the perfect harmony of configurations to keep the extractions humming, without noticeably impacting the operational performance of the PACS,” Bideaux says. “It takes a lot of fine tuning, and we did continue to optimize our configurations as the project went on.”
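As a rough illustration of how multithreading, move delays, and lossless compression can work together, here is a minimal Python sketch. It is not Vega's tooling: `extract_studies`, `fetch_study`, and the default values for `max_workers` and `move_delay_s` are hypothetical names and settings chosen for the example, and a real extraction would talk to the PACS rather than to an in-memory dictionary.

```python
import time
import zlib
from concurrent.futures import ThreadPoolExecutor


def extract_studies(study_ids, fetch_study, max_workers=4, move_delay_s=0.5):
    """Pull studies with a bounded worker pool and a pause after each move.

    fetch_study is a hypothetical callable that returns the raw bytes of one
    study; in practice it would wrap a request to the site's PACS.
    """
    def worker(study_id):
        raw = fetch_study(study_id)
        # zlib is lossless: decompressing returns the original bytes exactly.
        packed = zlib.compress(raw, level=6)
        # The move delay gives the source system room between pulls.
        time.sleep(move_delay_s)
        return study_id, packed

    # max_workers caps how many requests reach the PACS at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(worker, study_ids))


if __name__ == "__main__":
    # Stand-in for a PACS: two fake "studies" held in memory.
    fake_pacs = {"study-1": b"\x00" * 4096, "study-2": b"\x01" * 4096}
    packed = extract_studies(
        list(fake_pacs), fake_pacs.__getitem__, max_workers=2, move_delay_s=0.01
    )
    assert zlib.decompress(packed["study-1"]) == fake_pacs["study-1"]
```

In this sketch, the worker count and the delay are the two knobs that trade extraction speed against load on the source system, which is the kind of per-site tuning described above.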

After extracting the data, Vega deidentified the protected health information at the metadata and pixel levels of every DBT image. It also managed nonimage objects, such as scanned documents, text-based reports, DICOM SR objects, presentation state objects, and more to ensure that the final version of each study was fully deidentified.
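Metadata-level deidentification can be sketched in a few lines of Python. This is an assumption-laden example rather than Vega's process: the metadata is modeled as a plain dictionary keyed by DICOM attribute keywords, `PHI_KEYWORDS` is a small illustrative subset of what a real deidentification profile would cover, and `deidentify_metadata` and its keyed-hash pseudonym scheme are hypothetical. Pixel-level deidentification (burned-in text) is not shown.

```python
import hashlib
import hmac

# Illustrative subset only; a real profile covers far more attributes.
PHI_KEYWORDS = {
    "PatientName",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
}


def deidentify_metadata(metadata, secret_key):
    """Return a copy with direct identifiers dropped and PatientID pseudonymized.

    secret_key is a bytes value held by the data processor (an assumption of
    this sketch); without it the pseudonym cannot be recomputed from the ID.
    """
    cleaned = {k: v for k, v in metadata.items() if k not in PHI_KEYWORDS}
    if "PatientID" in cleaned:
        digest = hmac.new(
            secret_key, str(cleaned["PatientID"]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        # The same patient always maps to the same pseudonym, so images can
        # still be linked to that patient's other deidentified records.
        cleaned["PatientID"] = digest[:16]
    return cleaned
```

A stable pseudonym is one way to keep images linked to outcome data after the identifiers are gone; whether Vega links records this way is not stated in the article.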

“There was an extensive amount of testing and validation completed upfront, and then once the data processing was finished, there was a full validation procedure that was done according to our process to validate that the deidentification was successful,” Bideaux says.
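One simple form such a validation pass could take is a scan of the output for identifiers that were known before deidentification. The sketch below assumes records are dictionaries of text fields and that a list of known identifiers is available; `find_phi_leaks` is a hypothetical helper, and the article does not describe Vega's actual validation procedure.

```python
def find_phi_leaks(records, known_identifiers):
    """Return (record_index, field, identifier) for each known identifier
    that still appears in any field of the deidentified records."""
    needles = [s.lower() for s in known_identifiers if s]
    leaks = []
    for i, record in enumerate(records):
        for field, value in record.items():
            text = str(value).lower()
            for needle in needles:
                # Case-insensitive substring match; a real check would also
                # handle spelling variants, dates, and burned-in pixel text.
                if needle in text:
                    leaks.append((i, field, needle))
    return leaks
```

An empty result means none of the listed identifiers survived in the scanned fields; it does not prove the absence of identifiers that were never on the list.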

From there, Vega moved into the distribution phase, using transfer appliances offered by one of the major cloud service vendors to import the data at scale into secure cloud buckets. The overall size of the dataset, nearly 1 petabyte, is what made the project particularly challenging for Vega. It had to use offline transfer appliances because uploading that much data over facilities’ WAN circuits would have thrown day-to-day operations at those sites into disarray.
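A back-of-the-envelope calculation shows why network upload is unattractive at this scale. The link speed and utilization below are assumptions for illustration, not figures from the project, and the roughly 1 petabyte total was spread across multiple sites, so each site's share would be smaller.

```python
def transfer_days(size_bytes, link_bits_per_s, utilization=1.0):
    """Days needed to move size_bytes over a link at the given utilization."""
    seconds = size_bytes * 8 / (link_bits_per_s * utilization)
    return seconds / 86_400  # seconds per day


PETABYTE = 10**15  # decimal petabyte, in bytes

# Assumed 1 Gbps circuit, fully dedicated to the upload.
print(round(transfer_days(PETABYTE, 10**9), 1))        # 92.6
# Same circuit with only half its capacity given to the upload.
print(round(transfer_days(PETABYTE, 10**9, 0.5), 1))   # 185.2
```

Even under these generous assumptions, the transfer would occupy a circuit for months, which is consistent with the decision to ship data on offline appliances instead.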

Once the data was in the cloud, the AI developer was able to access the deidentified data and complete its acceptance review. The entire process, from contracting to conclusion, took about a year, according to Bideaux. His advice to others working in the imaging data field: “Don’t be afraid to try something that has a chance of failure. That’s the only way you’re going to be able to achieve something of great significance.”

These sorts of AI projects are not possible without the cooperation of health care facilities. “I think the message is pretty clear from the health care providers that they need AI that actually works in the real world, and the only way that’s going to happen is if we get more health care providers contributing data for the advancement of medical imaging AI,” he says. “Health care organizations can’t have it both ways where they keep their data siloed off but, at the same time, reap the benefits from what other organizations have contributed. If everyone took that stance, we’d never get anywhere.”

— Rebekah Moan is a freelance journalist and ghostwriter based in Oakland, California. Her specialties are health care and profiles.
