PADS: A Language and System for Automatic Tool Generation from Ad Hoc Data Sources

Published September 6, 2016, 18:09
An ad hoc data source is any semistructured data source for which useful data analysis and transformation tools are not readily available. Systems administrators, computational biologists, financial analysts and many others must regularly query, transform and display such data. PADS is a domain-specific language extension for C and O'Caml that lets programmers specify the format of an ad hoc data source as a set of type declarations. From these declarations, the PADS compiler generates a collection of useful tools, including a parser, printer, data validator, formatter, error profiler, XML converter and query engine.

Programmers can write a PADS description by hand or ask the system to infer one directly from example data. The multi-phase inference algorithm first infers a candidate format and then optimizes it relative to an information-theoretic scoring function. Inferred descriptions can be pushed automatically through the PADS compiler to generate fully functional tools with no human intervention. The entire process takes just seconds on 1K of example data and has the potential to greatly improve the productivity of data analysis.

This ongoing research is a collaboration between AT&T Research and Princeton University. It involves Mary Fernandez, Kathleen Fisher, Yitzhak Mandelbaum, David Walker, Qian Xi, and Kenny Zhu. More information, software and research papers are available at www.padsproj.org.
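The central idea, one declarative format description from which several tools are derived, can be illustrated with a small sketch. This is not PADS syntax or the PADS compiler. It is a hypothetical Python mini-language, assuming a made-up record format `id|name|amount`. The primitives `Uint`, `StrTo`, `Lit` and `Record`, and the `validate` helper, are illustrative names invented here. A single description value drives a parser that collects errors instead of stopping at the first bad byte, a printer, and a validator.

```python
import re
from dataclasses import dataclass

_DIGITS = re.compile(r"\d+")


@dataclass
class Uint:
    """Hypothetical primitive: an unsigned decimal integer."""

    def parse(self, text, pos):
        m = _DIGITS.match(text, pos)
        if m is None:
            # Record the error and keep going rather than aborting.
            return None, pos, [f"expected integer at offset {pos}"]
        return int(m.group()), m.end(), []

    def to_text(self, value):
        return str(value)


@dataclass
class StrTo:
    """Hypothetical primitive: characters up to (not including) a stop char."""
    stop: str

    def parse(self, text, pos):
        end = text.find(self.stop, pos)
        if end == -1:
            end = len(text)
        return text[pos:end], end, []

    def to_text(self, value):
        return value


@dataclass
class Lit:
    """Hypothetical primitive: a fixed literal such as a separator."""
    lit: str

    def parse(self, text, pos):
        if text.startswith(self.lit, pos):
            return None, pos + len(self.lit), []
        return None, pos, [f"expected {self.lit!r} at offset {pos}"]

    def to_text(self, value=None):
        return self.lit


@dataclass
class Record:
    """A sequence of literals and named (name, description) fields."""
    items: list

    def parse(self, text, pos=0):
        value, errors = {}, []
        for item in self.items:
            if isinstance(item, Lit):
                _, pos, errs = item.parse(text, pos)
            else:
                name, desc = item
                value[name], pos, errs = desc.parse(text, pos)
            errors.extend(errs)
        return value, pos, errors

    def to_text(self, value):
        # The printer is derived from the same description as the parser.
        parts = []
        for item in self.items:
            if isinstance(item, Lit):
                parts.append(item.to_text())
            else:
                name, desc = item
                parts.append(desc.to_text(value[name]))
        return "".join(parts)


def validate(desc, line):
    """Derived validator: parse errors plus any unconsumed trailing input."""
    _, pos, errors = desc.parse(line)
    if pos != len(line):
        errors.append(f"trailing characters at offset {pos}")
    return errors


# One declaration of the hypothetical id|name|amount format.
ENTRY = Record([
    ("id", Uint()), Lit("|"),
    ("name", StrTo("|")), Lit("|"),
    ("amount", Uint()),
])

if __name__ == "__main__":
    value, _, _ = ENTRY.parse("42|alice|300")
    print(value)                       # {'id': 42, 'name': 'alice', 'amount': 300}
    print(ENTRY.to_text(value))        # 42|alice|300
    print(validate(ENTRY, "42|bob|x7"))
```

For the malformed line `42|bob|x7`, the validator reports both the bad integer at offset 7 and the unconsumed trailing input rather than failing outright. This mirrors, in a very simplified way, the abstract's point that parsing, printing and validation all follow from a single format description.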