Although unique expected energy models can be generated for a given photovoltaic (PV) site, a standardized model is also needed to facilitate performance comparisons across fleets. Current standardized expected energy models for PV work well with sparse data, but they have been shown to significantly over-estimate energy production, which hinders accurate diagnosis of field operations and maintenance (O&M) issues. This research addresses this issue by using machine learning to develop a data-driven expected energy model that more accurately estimates the energy production of PV systems. Irradiance and system capacity data from 172 sites across the United States were used to train a series of models using Lasso linear regression. The trained models generally perform better than the commonly used expected energy model from the international standard IEC 61724-1. The two highest-performing models range in complexity from a third-order polynomial with 10 parameters (adjusted R² = 0.994) to a simpler second-order polynomial with 4 parameters (adjusted R² = 0.993); the latter is subject to further evaluation. As a result, the trained models provide a more robust basis for identifying potential energy anomalies for O&M activities and for informing planning-related financial assessments. We conclude with directions for future research, such as using splines to improve model continuity and to better capture low-capacity systems (≤1000 kW DC).
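The abstract does not detail the fitting procedure, so the following is only a minimal sketch, assuming each model regresses energy on polynomial terms of irradiance and DC capacity with an L1 (Lasso) penalty. The helpers `poly_features`, `lasso_fit`, and `adjusted_r2` are hypothetical names. The small coordinate-descent solver stands in for a library implementation such as scikit-learn's `Lasso`. The data are synthetic and do not come from the 172-site dataset. With `degree=3`, `poly_features` produces 9 terms, or 10 parameters once the intercept is counted.

```python
import numpy as np

# Minimal sketch: hypothetical helpers and synthetic data, not the study's pipeline.

def poly_features(irr, cap, degree):
    """Every monomial irr**i * cap**j with 1 <= i + j <= degree (intercept excluded)."""
    cols = []
    for total in range(1, degree + 1):
        for i in range(total, -1, -1):
            cols.append(irr**i * cap**(total - i))
    return np.column_stack(cols)

def lasso_fit(X, y, alpha, n_iter=1000):
    """Minimize (1/2n)||y - Xw - b||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd                  # standardize so one alpha treats all terms alike
    y_mean = y.mean()
    r = y - y_mean                     # residual while all weights are zero
    n, p = Z.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r += Z[:, j] * w[j]        # remove feature j's current contribution
            rho = Z[:, j] @ r / n      # standardized columns satisfy z_j.z_j / n == 1
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0)  # soft-threshold
            r -= Z[:, j] * w[j]
    coef = w / sd                      # map weights back to the original feature scale
    return coef, y_mean - mu @ coef

def adjusted_r2(y, y_hat, n_predictors):
    """R^2 penalized for the number of predictors (intercept not counted)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n = len(y)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

rng = np.random.default_rng(0)
irr = rng.uniform(0.05, 1.0, 500)        # plane-of-array irradiance, kW/m^2 (synthetic)
cap = rng.uniform(100.0, 5000.0, 500)    # DC capacity, kW (synthetic)
energy = 0.85 * cap * irr + rng.normal(0.0, 20.0, 500)  # synthetic hourly energy, kWh

X = poly_features(irr, cap, degree=3)
coef, intercept = lasso_fit(X, energy, alpha=1.0)
pred = X @ coef + intercept
n_active = int(np.count_nonzero(coef))   # Lasso can zero out some polynomial terms
print(n_active, round(adjusted_r2(energy, pred, n_active), 4))
```

The L1 penalty can drive coefficients exactly to zero. This is how a Lasso fit can retain fewer parameters than the full polynomial, and the adjusted R² then charges a cost for each predictor that remains.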
PV system reliability analyses often depend on production data to evaluate the system state. However, relying on this information alone leads to incomplete assessments because it lacks context about potential sources of data quality issues (e.g., whether missing data stems from offline communications or offline production). This paper introduces pvOps, a new Python package for fusing production data with readily available text-based maintenance information to improve reliability assessments. In addition to describing the package development process, we present a case study demonstrating how its general capabilities can yield actionable insights from field data. These findings highlight significant potential for continued advancements in operational assessments.
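The fusion idea can be illustrated with a minimal sketch that does not use pvOps itself. The `Ticket` record, the `label_gaps` helper, and the keyword list are hypothetical, and a crude keyword match stands in for whatever text processing the package actually performs. Each missing production reading is matched to any maintenance ticket whose time window covers it. The ticket text then decides whether the gap looks like a communications outage or a production outage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

@dataclass
class Ticket:
    """Hypothetical maintenance record: an outage window plus free-text notes."""
    start: datetime
    end: datetime
    text: str

def label_gaps(production: Dict[datetime, Optional[float]],
               tickets: List[Ticket],
               comm_keywords=("comm", "datalogger", "network")) -> Dict[datetime, str]:
    """Label each missing production reading using overlapping ticket text (keyword match)."""
    labels = {}
    for ts, kwh in sorted(production.items()):
        if kwh is not None:
            continue  # reading present; nothing to explain
        overlapping = [t for t in tickets if t.start <= ts < t.end]
        if not overlapping:
            labels[ts] = "unexplained"
        elif any(k in t.text.lower() for t in overlapping for k in comm_keywords):
            labels[ts] = "communication outage"  # data gap; the plant may still be producing
        else:
            labels[ts] = "production outage"
    return labels

# Synthetic example: hourly readings with three gaps and two tickets.
t0 = datetime(2021, 6, 1, 10)
hours = [t0 + timedelta(hours=h) for h in range(6)]
production = dict(zip(hours, [410.0, None, None, 395.0, None, 402.0]))
tickets = [
    Ticket(hours[1], hours[2], "Lost comms with datalogger"),
    Ticket(hours[2], hours[3], "Inverter tripped on ground fault"),
]
labels = label_gaps(production, tickets)
```

In this example the first gap is attributed to communications, the second to an inverter outage, and the third remains unexplained because no ticket covers it.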
Sampling is an important step in the machine learning process because it prioritizes the samples that best help the model learn the concepts required for the task at hand. How to determine the best sampling method has rarely been studied in the context of graph neural networks. In this paper, we evaluate ascending and descending sampling methods based on different definitions of centrality (VoteRank, PageRank, and degree) to examine how their performance relates to network topology. We find that no sampling method is superior across all network topologies. Additionally, we identify situations in which ascending sampling yields better classification scores, illustrating the strength of weak ties. We then develop two strategies to predict the best sampling method: one that considers the homogeneous connectivity of the nodes and one that considers the network topology. Both strategies consistently identify the best sampling direction.
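The following is a minimal sketch of centrality-based sampling, assuming the goal is to choose which nodes are labeled for training. The hypothetical helpers below score nodes by degree centrality, then take the k lowest-scoring (ascending) or highest-scoring (descending) nodes. Degree is used only because it is easy to compute without dependencies. NetworkX provides `networkx.pagerank`, which returns a node-to-score dictionary that could be passed to `sample_by_centrality` directly. It also provides `networkx.voterank`, which returns an ordered list of nodes rather than scores and would need adapting.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Degree centrality normalized by (n - 1) for an undirected edge list (hypothetical helper)."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(deg)
    return {node: d / (n - 1) for node, d in deg.items()}

def sample_by_centrality(scores, k, ascending):
    """Pick k training nodes: lowest scores first (ascending) or highest first (descending).

    Ties are broken by node id so the result is deterministic.
    """
    sign = 1 if ascending else -1
    return sorted(scores, key=lambda node: (sign * scores[node], node))[:k]

# A small star-like graph: node 0 is a hub, and node 4 hangs off node 3.
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
scores = degree_centrality(edges)
top = sample_by_centrality(scores, 2, ascending=False)  # well-connected nodes
low = sample_by_centrality(scores, 2, ascending=True)   # peripheral "weak tie" nodes
```

Swapping in a different centrality measure changes only the `scores` dictionary. The sampling direction remains a single flag, which keeps ascending and descending runs directly comparable.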
Principal component analysis (PCA) reduces dimensionality by generating uncorrelated variables and can improve the interpretability of the sample space. This analysis assesses the value of PCA for improving the accuracy of failure classification from current-voltage (IV) traces. Our results show that combining PCA with random forests improves classification accuracy by only ~1%, from a baseline of >98% for random forests alone (without PCA) to >99%. However, PCA does provide a useful way to represent all of the features in a single two-dimensional feature space. A visualization of the first two principal components (which resembles an IV profile, but rotated) shows that including a current differential feature notably separates failure modes because of their different effects on the curve's slope. This work contributes to the ongoing discussion of how to extract more information from the IV curve, which can aid failure classification, especially for failures that cause only marginal changes in the IV profile.
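The following is a minimal sketch of the PCA step. It assumes each sample is a current trace measured on a shared voltage grid and that the current differential feature is the numerical derivative dI/dV appended to the trace. The helpers are hypothetical, as are the two synthetic "failure modes", which differ only in how sharp the knee is. Neither reproduces the study's data or feature set. PCA is computed here with NumPy's SVD rather than a library such as scikit-learn.

```python
import numpy as np

def iv_features(voltage, currents):
    """Stack each current trace with its numerical derivative dI/dV (hypothetical layout)."""
    didv = np.gradient(currents, voltage, axis=1)
    return np.hstack([currents, didv])

def pca_2d(X):
    """Project rows of X onto the first two principal components via SVD."""
    Xc = X - X.mean(axis=0)                 # PCA requires centered features
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:2].T                  # coordinates in the PC1/PC2 plane
    ratio = s**2 / np.sum(s**2)             # explained-variance ratio per component
    return scores, ratio[:2]

rng = np.random.default_rng(1)
voltage = np.linspace(0.0, 40.0, 60)
curves = []
for a in [1.5] * 20 + [4.0] * 20:           # two synthetic modes with different knee sharpness
    isc = 9.0 + rng.normal(0.0, 0.1)
    i = isc * (1.0 - np.exp((voltage - 38.0) / a))
    curves.append(np.clip(i, 0.0, None))    # no negative current past Voc
currents = np.array(curves)

X = iv_features(voltage, currents)
scores, ratio = pca_2d(X)
```

The current and derivative blocks have different scales, so standardizing the columns before the SVD may be worth considering. A scatter plot of `scores`, colored by mode, gives the kind of two-dimensional view described above.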
Accurate diagnosis of failures is critical for meeting photovoltaic (PV) performance objectives and avoiding safety concerns. This analysis focuses on classifying field-collected string-level current-voltage (IV) curves representing baseline conditions and two failure modes: partial soiling and cracked cells. Specifically, multiple neural network architectures (including convolutional and long short-term memory networks) are evaluated using domain-informed parameters across different portions of the IV curve and a range of irradiance thresholds. The analysis identified two models that classify the relatively small dataset (400 samples) with high accuracy (99%+). The findings also indicate optimal irradiance thresholds and show that focusing on specific portions of the IV curve offers opportunities to improve classification. Such advancements are critical for extending accurate classification to more PV faults, especially those with low power loss (e.g., cracked cells) or visually similar IV curve profiles.
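The neural networks themselves are not sketched here. The preprocessing implied by the abstract (an irradiance threshold plus a chosen portion of the IV curve) can be, however. This minimal sketch assumes each curve is a pair of ascending voltage and current arrays. It approximates Voc by the maximum measured voltage and Isc by the current at the lowest voltage, then resamples a normalized voltage window onto a fixed grid. The function name, the 400 W/m² default threshold, and the window convention are hypothetical and are not values from the study. The output shape `(samples, points, 1)` matches the channels-last layout commonly fed to 1-D convolutional and LSTM layers.

```python
import numpy as np

def prepare_iv_inputs(curves, irradiance, threshold=400.0,
                      v_window=(0.0, 1.0), n_points=64):
    """Filter IV curves by irradiance, normalize them, and resample a voltage window.

    Hypothetical helper: the threshold default (W/m^2) and window are illustrative only.
    Returns an array of shape (n_kept, n_points, 1) and the indices of kept curves.
    """
    grid = np.linspace(v_window[0], v_window[1], n_points)  # normalized-voltage grid
    rows, kept = [], []
    for idx, ((v, i), g) in enumerate(zip(curves, irradiance)):
        if g < threshold:
            continue                      # drop low-irradiance curves
        v_norm = v / v.max()              # assumption: Voc ~ max measured voltage
        i_norm = i / i[0]                 # assumption: Isc ~ current at the lowest voltage
        rows.append(np.interp(grid, v_norm, i_norm))
        kept.append(idx)
    X = np.asarray(rows, dtype=float).reshape(len(rows), n_points, 1)
    return X, kept

# Synthetic curves whose current scales with irradiance.
v = np.linspace(0.0, 38.0, 100)
curves = [(v, s * 9.0 * (1.0 - np.exp((v - 38.0) / 2.0))) for s in (0.2, 0.8, 1.0)]
X, kept = prepare_iv_inputs(curves, irradiance=[200.0, 800.0, 1000.0],
                            threshold=400.0, v_window=(0.6, 1.0))
```

Normalizing by Isc and Voc removes most of the irradiance scaling, so the two kept synthetic curves become identical. The window then isolates the knee-to-Voc region, where slope differences between failure modes tend to appear.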