Accessible and Preservable File Formats
In information technology, lossy compression or irreversible compression is the class of data compression methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data size for storing, handling, and transmitting content.
From Wikipedia, the free encyclopedia
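The lossy/lossless distinction can be shown with a small Python sketch that uses only the standard library. zlib stands in for lossless compression. Rounding to two decimal places is a deliberately simple stand-in for a lossy step; real lossy codecs such as JPEG work very differently, but the outcome is similar: the discarded detail cannot be recovered.

```python
import zlib

# Some measured values, written out as text the way they would appear in a CSV file
values = [0.123456, 2.718281, 3.141592, 1.414213]
original = ",".join(str(v) for v in values).encode("utf-8")

# Lossless: compress, decompress, and compare byte for byte
restored = zlib.decompress(zlib.compress(original))
print("lossless round trip exact:", restored == original)

# Lossy stand-in (an assumption for illustration): keep only two decimal places
approx = [round(v, 2) for v in values]
print("lossy values:", approx)
print("lossy round trip exact:", approx == values)
```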
Consistency
Mo_2mTorr_100W_index76_area_histogram.csv
Mo_3mTorr_100W_index187_area_histogram.csv
Mo_5mTorr_100W_index3_area_histogram.csv
Mo_2mTorr_500W_20240323_Index24_AreaHistogram.xlsx
Mo_3mTorr_500W_20240323_Index63_AreaHistogram.xlsx
Mo_5mTorr_500W_20240324_Index5_AreaHistogram.xlsx
Mo_2mTorr_750W_20240325_Index72_AreaHistogram.xlsx
Mo_750W_3mTorr_index16_area_histogram.csv
Mo_750W_5mTorr_index52_area_histogram.csv
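import os
import pandas

folder = 'data'  # placeholder: path to the folder holding the files

# Collect the Excel area-histogram files
filelist = [os.path.join(folder, i) for i in os.listdir(folder) if 'AreaHistogram.xlsx' in i]
for fpath in filelist:
    df = pandas.read_excel(fpath)  # reading .xlsx requires the openpyxl package
    # Write the same table to an open, plain-text CSV file (index=False: no extra index column)
    df.to_csv(fpath.replace('.xlsx', '.csv'), index=False)
    os.remove(fpath)  # delete the original .xlsx once the CSV exists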
Mo_2mTorr_100W_index76_area_histogram.csv
Mo_3mTorr_100W_index187_area_histogram.csv
Mo_5mTorr_100W_index3_area_histogram.csv
Mo_2mTorr_500W_20240323_Index24_AreaHistogram.csv
Mo_3mTorr_500W_20240323_Index63_AreaHistogram.csv
Mo_5mTorr_500W_20240324_Index5_AreaHistogram.csv
Mo_2mTorr_750W_20240325_Index72_AreaHistogram.csv
Mo_750W_3mTorr_index16_area_histogram.csv
Mo_750W_5mTorr_index52_area_histogram.csv
- All start with the material, Mo (molybdenum)
- All include a pressure value (e.g., 2mTorr)
- All include a power value (e.g., 500W)
- All include an index number (e.g., index63)
- All include a variant of “area histogram”
- Some, but not all, include the date
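Because every name carries the same properties, a program can read them back. The sketch below does this with a single regular expression. The pattern is an assumption built only from the nine example names above (material first, pressure and power in either order, an optional eight-digit date, "index" in either case, and either spelling of "area histogram"), so other names may need a looser pattern. `parse_name` and `NAME_RE` are hypothetical helpers, not part of the original workflow.

```python
import re

# Assumed pattern, derived only from the example file names in this section
NAME_RE = re.compile(
    r"^(?P<material>[A-Za-z]+)_"
    r"(?:(?P<p1>\d+mTorr)_(?P<w1>\d+W)|(?P<w2>\d+W)_(?P<p2>\d+mTorr))_"  # either order
    r"(?:(?P<date>\d{8})_)?"                                             # optional date
    r"[Ii]ndex(?P<index>\d+)_"
    r"(?:area_histogram|AreaHistogram)\.(?P<ext>csv|xlsx)$"
)

def parse_name(filename):
    """Return the properties encoded in a file name, or None if it does not match."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    return {
        "material": m["material"],
        "pressure": m["p1"] or m["p2"],
        "power": m["w1"] or m["w2"],
        "index": int(m["index"]),
        "date": m["date"],  # None when the name carries no date
        "ext": m["ext"],
    }

print(parse_name("Mo_2mTorr_500W_20240323_Index24_AreaHistogram.xlsx"))
print(parse_name("Mo_750W_3mTorr_index16_area_histogram.csv"))
```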
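import os

folder = 'data'  # placeholder: path to the folder holding the CSV files

# Match both naming variants of the area-histogram files
filelist = [i for i in os.listdir(folder) if i.endswith(('area_histogram.csv', 'AreaHistogram.csv'))]
for filename in filelist:
    fileprops = filename.split('_')
    # Pick out each property wherever it appears in the name
    pr = [i for i in fileprops if i.endswith('mTorr')][0]  # pressure, e.g. 2mTorr
    po = [i for i in fileprops if i.endswith('W')][0]      # power, e.g. 500W
    index_num = [i.lower() for i in fileprops if i.lower().startswith('index')][0]  # e.g. index63
    # Rebuild the name in one fixed order; the date, where present, is dropped
    newfilename = f'Mo_{pr}_{po}_{index_num}_area_histogram.csv'
    if newfilename != filename:
        os.rename(os.path.join(folder, filename), os.path.join(folder, newfilename))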
Mo_2mTorr_100W_index76_area_histogram.csv
Mo_3mTorr_100W_index187_area_histogram.csv
Mo_5mTorr_100W_index3_area_histogram.csv
Mo_2mTorr_500W_index24_area_histogram.csv
Mo_3mTorr_500W_index63_area_histogram.csv
Mo_5mTorr_500W_index5_area_histogram.csv
Mo_2mTorr_750W_index72_area_histogram.csv
Mo_3mTorr_750W_index16_area_histogram.csv
Mo_5mTorr_750W_index52_area_histogram.csv
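Renaming files in place is hard to undo, so it can help to compute the full old-to-new mapping first and check it before anything on disk changes. This sketch applies the naming rule from the renaming loop inside hypothetical helpers (`canonical_name`, `plan_renames`, `apply_renames`). The collision check guards against two files receiving the same name, which `os.rename` would otherwise resolve on POSIX systems by silently overwriting one of them.

```python
import os

def canonical_name(filename):
    """Naming rule from the renaming loop: Mo_<pressure>_<power>_<index>_area_histogram.csv."""
    parts = filename.split('_')
    pr = [p for p in parts if p.endswith('mTorr')][0]
    po = [p for p in parts if p.endswith('W')][0]
    index_num = [p.lower() for p in parts if p.lower().startswith('index')][0]
    return f'Mo_{pr}_{po}_{index_num}_area_histogram.csv'

def plan_renames(filenames):
    """Map old -> new names without touching the disk; refuse if two files would collide."""
    plan = {}
    claimed = {}  # new name -> the old name that produced it
    for name in filenames:
        new = canonical_name(name)
        if new in claimed:
            raise ValueError(f'{name} and {claimed[new]} would both become {new}')
        claimed[new] = name
        plan[name] = new
    return plan

def apply_renames(folder, plan):
    """Carry out a plan produced by plan_renames, skipping names that are already correct."""
    for old, new in plan.items():
        if old != new:
            os.rename(os.path.join(folder, old), os.path.join(folder, new))
```

Pass `plan_renames` the same filtered `filelist` used by the renaming loop, review the returned mapping, and only then call `apply_renames(folder, plan)`.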
Images
- Resolution
- Pixel dimensions
- Scaling (colorbar, axis range, etc.)
- Label types (Axes, titles, units, scaling, colormaps, etc.)
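Pixel dimensions can be checked without opening each image by hand. Assuming the images are PNG files, width and height sit at a fixed position in the file header, so the Python standard library is enough; the helper names below are hypothetical. Resolution in the DPI sense is stored in an optional PNG chunk (`pHYs`) and is not checked here. For other formats, an imaging library such as Pillow is the more general route.

```python
import struct

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

def png_size_from_bytes(header):
    """Return (width, height) in pixels from the first 24 bytes of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, b'IHDR', then width and height
    # as big-endian unsigned 32-bit integers.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b'IHDR':
        raise ValueError('not a PNG header')
    return struct.unpack('>II', header[16:24])

def png_size(path):
    """Read just enough of a PNG file to get its pixel dimensions."""
    with open(path, 'rb') as f:
        return png_size_from_bytes(f.read(24))

def group_by_size(paths):
    """Map each (width, height) to the files that have it; a single key means consistent."""
    groups = {}
    for path in paths:
        groups.setdefault(png_size(path), []).append(path)
    return groups
```

If `group_by_size` returns more than one key, the listed files show which images differ.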
CSV/Tabular Data
- Column/Row names match exactly
- Column/Row names in the same order
- Consistent column-to-data type mapping
- Consistent missing data representation
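Several of these checks can be automated. The sketch below uses Python's `csv` module to flag three problems: header rows that do not match the first file exactly (including column order), columns that mix numbers and text, and more than one spelling of "missing" across files. The set of missing-value spellings and the helper names are assumptions for illustration. Load each file with `read_rows` and pass a `{file name: rows}` dict to `consistency_problems`.

```python
import csv

# Spellings often used for "no value"; an assumed list, extend it for your data
MISSING_TOKENS = {'', 'NA', 'N/A', 'NaN', 'nan', 'null', 'None', '-'}

def read_rows(path):
    """Read a CSV file into a list of rows (each a list of strings)."""
    with open(path, newline='') as f:
        return list(csv.reader(f))

def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False

def consistency_problems(tables):
    """tables maps a file name to its rows; the first row of each table is its header."""
    problems = []
    names = list(tables)
    reference = tables[names[0]][0]
    markers = set()  # every missing-value spelling seen anywhere
    for name in names:
        header, body = tables[name][0], tables[name][1:]
        # Exact list comparison: renamed, missing, or reordered columns are all flagged
        if header != reference:
            problems.append(f'{name}: header {header} differs from {reference}')
        for i, column in enumerate(header):
            cells = [row[i].strip() for row in body if i < len(row)]
            markers.update(c for c in cells if c in MISSING_TOKENS)
            # A column should be all numeric or all text once missing values are set aside
            kinds = {is_number(c) for c in cells if c not in MISSING_TOKENS}
            if len(kinds) > 1:
                problems.append(f'{name}: column {column!r} mixes numbers and text')
    # More than one way of writing "missing" across the dataset is itself inconsistent
    if len(markers) > 1:
        problems.append(f'mixed missing-value markers: {sorted(markers)}')
    return problems
```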
User Friendly Organization
- Write Directions: Provide ample explanation and direction for your dataset.
- Folder Structures: Organize files into folders and subfolders in a way that makes data easy to find.
- Classification: Sort by importance, sensitivity, and more.
- External Reviewers: When in doubt, ask someone else, preferably someone not involved in your project, to review it.
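A folder structure can also be built from the consistent names themselves. This sketch assumes every file already follows the `Mo_<pressure>_<power>_<index>_area_histogram.csv` pattern produced by the renaming step, and it groups files into one subfolder per power value. That grouping, and the helper names `plan_layout` and `apply_layout`, are hypothetical choices rather than part of the original workflow. As with renaming, the plan is computed first so it can be reviewed before any file moves.

```python
import os
import shutil

def plan_layout(filenames):
    """Map each canonically named file to <power>/<file name>; other files are skipped."""
    plan = {}
    for name in filenames:
        if not name.endswith('_area_histogram.csv'):
            continue  # leave files that do not follow the naming convention alone
        power = name.split('_')[2]  # Mo_<pressure>_<power>_..., so the power is third
        plan[name] = os.path.join(power, name)
    return plan

def apply_layout(folder, plan):
    """Move each file into its planned subfolder, creating folders as needed."""
    for name, relative in plan.items():
        destination = os.path.join(folder, relative)
        os.makedirs(os.path.dirname(destination), exist_ok=True)
        shutil.move(os.path.join(folder, name), destination)
```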
References
- Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg et al. “The FAIR Guiding Principles for scientific data management and stewardship.” Scientific Data 3, no. 1 (2016): 1-9.
- File formats
- Lossless/Lossy reference