Background
Smart grid overview
Smart grid architecture
Smart grid opportunities
-
Added value for utilities To help utilities better manage the grid and make the right decisions at the right time, smart grids rely on several optimization, control and monitoring systems that give utilities detailed, real-time knowledge of the grid. From the utilities' viewpoint, the benefits of smart grids are numerous and can be summarized as follows: (i) improving the overall management of the production, transmission and distribution system; (ii) enhancing energy independence through the integration of renewable energies; (iii) optimizing the management and modelling of available energy production capacity according to real and/or spontaneous demand; (iv) maintaining network balance by managing under-voltage and over-voltage in real time; (v) improving the security of electricity grids and reducing fraud; and (vi) improving the quality of services and customer service.
-
Added value for customers Smart grids offer many options to customers through interactive and scalable models of the power grid and of energy demand. Customers are the users (consumers) of electricity, both residential and industrial. Increasingly, customers also produce electrical energy themselves using alternative generation methods (solar energy, biomass, wind, etc.). Real-time communication with smart grid control and monitoring systems makes it possible to measure and optimize the energy value of customers on the grid. In addition, with the help of smart meters and other smart grid equipment, consumers can control their consumption in real time and avoid peak loads thanks to price incentives. They can run their washing machines, dryers and dishwashers at off-peak times, when energy prices are low. As a result, customers not only save money but also reduce the generation capacity the grid must provide.
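The off-peak saving argument can be made concrete with a small back-of-the-envelope calculation. The tariffs and appliance loads below are invented for illustration, not taken from the text:

```python
# Hypothetical illustration of savings from shifting appliance use off-peak.
# Prices and per-cycle consumptions are assumed values, not real tariffs.
PEAK_PRICE = 0.30      # $/kWh during peak hours (assumed)
OFF_PEAK_PRICE = 0.10  # $/kWh at night (assumed)

appliances_kwh = {"washing machine": 1.0, "dryer": 3.0, "dishwasher": 1.5}

daily_load = sum(appliances_kwh.values())                 # 5.5 kWh shifted per day
daily_saving = daily_load * (PEAK_PRICE - OFF_PEAK_PRICE) # per-day price difference
yearly_saving = round(daily_saving * 365, 2)
print(yearly_saving)   # 401.5
```

Even modest loads shifted daily add up over a year, which is the price-incentive mechanism the paragraph describes.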
Smart grid systems
Communication systems
-
Wireless technologies The smart grid is composed of a large number of devices of various types, and most of them can only communicate over wireless channels [6]. Wireless technologies face challenging issues in terms of bandwidth, scalability and distance requirements, especially when transmitting large volumes of data. Wireless connections are therefore used at the multi-service layer to collect data. Technologies such as IEEE 802.11, IEEE 802.15.4, Bluetooth, Infrared, ZigBee and radio frequency are applicable to smart grid applications. IEEE 802.15.4 and ZigBee are the most successful, because the other air interfaces struggle to meet node-specific requirements such as energy consumption, bandwidth demand, throughput and latency. For example, radio frequency lacks standardized protocols, broadcasts its signal and provides no security mechanism [7], while Bluetooth supports only star topologies, operates with few nodes and requires low node density. To optimize wireless network capacity, Multiple-Input Multiple-Output (MIMO) and Orthogonal Frequency-Division Multiplexing (OFDM) technologies have recently been proposed [8].
-
Wired technologies Wireline networks are more efficient than wireless ones, because they offer higher capacity, lower communication delay and wide coverage, but on the other hand they require extra investment for cable deployment. Wireline communication relies on many technologies, including fiber optics, IP-based Wavelength Division Multiplexing (WDM) networks and SONET/SDH [9]. Optical technologies let wireline networks support data rates between 155 Mbps and 160 Gbps [10]. Power Line Communication (PLC) is another kind of wired technology, used by electrical companies to transmit data over existing power cables. This technology helps utilities reduce costs, because it runs over the traditional electric power grid. PLC presents some challenges, such as limited data rates due to attenuation, delay and phase distortion. Recently, broadband PLC and narrow-band PLC have helped utilities overcome this limitation, with data rates now exceeding 200 Mbps [11].
Information systems
Supervisory control and data acquisition
Advanced metering infrastructure
Outage management system
Geographic information system
Customer information system
Demand response management system
Data management issues in smart grid
Standards and interoperability
Management of massive data volume
Security and data privacy
Big Data for smart grid
Big Data life cycle
Data sources
Data integration
-
Service Oriented Architecture (SOA) Enterprise systems combine a great number of software components, each with its own way of providing services to users, so the problem is how to manage and maintain all these systems. As a solution, SOA lets these software components communicate with each other through a single approach, which makes data integration easier and more flexible [19]. In smart grids, SOA is used essentially in demand-side systems.
-
Enterprise Service Bus (ESB) combines a number of approaches to manage communication between different kinds of systems, such as GIS, OMS and CIS. ESB brings many benefits, reducing the cost and time of management, monitoring and integration across heterogeneous systems [20]. In the smart grid, ESB technologies are strongly related to SOA, since the ESB makes SOA more robust and flexible.
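The decoupling an ESB provides can be sketched as a tiny publish/subscribe bus. The system names (GIS, OMS, CIS) come from the text; the `ServiceBus` class and its API are invented for illustration, not a real ESB product:

```python
# Minimal sketch of ESB-style message routing between smart grid systems.
# The bus decouples producers from consumers: the publisher of an outage
# event does not need to know which systems react to it.
class ServiceBus:
    def __init__(self):
        self.handlers = {}          # topic -> list of subscriber callbacks

    def subscribe(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        for handler in self.handlers.get(topic, []):
            handler(message)

bus = ServiceBus()
log = []
bus.subscribe("outage", lambda msg: log.append(("GIS", msg)))  # map the fault
bus.subscribe("outage", lambda msg: log.append(("CIS", msg)))  # notify customers
bus.publish("outage", {"feeder": "F12", "status": "down"})
print(log)
```

One published event reaches both subscribers, which is the integration pattern that lets new systems be added without changing existing ones.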
-
Common Information Models (CIM) are used for smart grid persistence and for the integrated data architecture, and they are critical to the success or failure of data management. CIM refers to UML models for the electric power industry. It plays a very important role in energy management systems in terms of data integration, time and cost. In general, CIM helps to exchange data with the technical grid infrastructure. CIM becomes essential in power systems to guarantee data interoperability when different applications are implemented together. CIM operates at the data transformation level and is used with an ESB for the normalization and standardization of data exchanged between smart grid systems.
-
Messaging represents communication systems based on exchanging messages. These messages carry data and other information from different applications and are managed by a messaging server [21].
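A minimal producer/consumer sketch illustrates the idea; the in-process queue below stands in for a real messaging server, and the message content is invented:

```python
# Sketch of message-based communication: a producer application publishes
# a data-carrying message, and a consumer pulls it from the broker's queue.
from queue import Queue

broker = Queue()                     # stand-in for the messaging server

# Producer: an application publishes a message carrying data
broker.put({"source": "smart_meter", "kwh": 4.2})

# Consumer: another application retrieves and handles the message
message = broker.get()
print(message["source"])             # smart_meter
```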
Data storage
-
Distributed File System (DFS) is a file system that allows multiple users on multiple machines to share files and storage resources. It is based on a client/server storage mechanism, and it permits every user to keep a local copy of the stored data. A great number of solutions use DFS, for example Google's GFS, Quantcast File System, HDFS, Ceph, Lustre, GlusterFS and PVFS.
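The client/server mechanism with local copies can be sketched as follows; the in-memory "server" dict and the file path are invented placeholders for a real DFS name/data node:

```python
# Toy sketch of the DFS idea above: a client fetches a file from a
# (simulated) server and keeps a local cached copy for later reads.
server_files = {"/grid/meters/2024.csv": b"meter_id,kwh\n1,5.5\n"}  # assumed data

class DFSClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}                     # local copies, as the text describes

    def read(self, path):
        if path not in self.cache:          # first access: fetch from the server
            self.cache[path] = self.server[path]
        return self.cache[path]             # later accesses hit the local copy

client = DFSClient(server_files)
data = client.read("/grid/meters/2024.csv")
print(data.decode().splitlines()[0])        # meter_id,kwh
```

Real DFS implementations add replication, consistency and failure handling on top of this basic fetch-and-cache pattern.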
-
NoSQL databases are a new database approach that overcomes the limitations of traditional relational SQL databases in the case of massive data. These databases follow three main architectures: key-value solutions such as Dynamo and Voldemort, column-oriented solutions such as Cassandra and HBase, and document-oriented solutions such as MongoDB and CouchDB.
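The three architectures can be contrasted by storing the same smart meter reading in each shape. The data layouts below are illustrative Python stand-ins, not the actual Dynamo, Cassandra or MongoDB APIs:

```python
# Sketch of how the three NoSQL families might represent one meter reading.
import json

reading = {"meter_id": "M-42", "ts": "2024-01-01T00:00", "kwh": 1.25}

# Key-value (Dynamo/Voldemort style): opaque value under a composite key
kv_store = {f"{reading['meter_id']}#{reading['ts']}": reading["kwh"]}

# Column-oriented (Cassandra/HBase style): row key -> (family, column) cells
column_store = {reading["meter_id"]: {("readings", reading["ts"]): reading["kwh"]}}

# Document-oriented (MongoDB/CouchDB style): the whole record as one document
document_store = [json.dumps(reading)]

print(kv_store["M-42#2024-01-01T00:00"])   # 1.25
```

Which shape fits best depends on the query pattern: point lookups favor key-value, per-meter time series favor columns, and flexible records favor documents.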
Data analytics
Data visualization
Data transmission
Criteria for choosing Big Data technologies
| Technical perspective | Criteria |
|---|---|
| Availability and fault tolerance | Redundancy and resilience in networks, servers, physical storage, etc. |
| Scalability and flexibility | Tools must be evolutionary and scalable |
| Performance (latency) | Data processing time (single transaction, query request) |
| Computational complexity | Extension of computation tools (data mining, business intelligence) |
| Distributed storage capacity and configurations | Storage system parameters, such as the storage nodes needed in terms of availability, periodic basis, etc. |
| Data processing modes | Batch, real-time and hybrid processing |
| Data security | Security compliance according to the platform requirements |
Big Data resources requirements
Big Data hardware requirements
| Framework | Hadoop | Storm | Spark | Flink |
|---|---|---|---|---|
| Operating systems | Red Hat Enterprise Linux (RHEL) v5.x or 6.x (64-bit), CentOS v5.x or 6.x (64-bit), SUSE Linux Enterprise Server 11 SP1 (64-bit) | CentOS Linux, Windows | Windows XP/7/8, Mac OS X 10.7–9, Linux | Linux, Mac OS X, Windows (Cygwin) |
| RAM | 64 GB at least | 8 GB at least | 8 GB at least | 8 GB at least |
| CPU | 2 cores at least | 8 cores at least | 8 cores at least | 8 cores at least |
| Network | 10 Gigabit at least | 10 Gigabit at least | 10 Gigabit at least | 10 Gigabit at least |
| Hard disk | 12–24 disks of 1 TB at least per node | 6 disks of 1 TB at least per node | 4–8 disks of 1 TB at least per node | 12–24 disks of 1 TB at least per node |
Cloud computing frameworks
Big Data implementation in smart grid: the case of customer data analytics
Added value of customer data analytics
Big Data tools for customer data analytics
-
Batch processing tools Big Data analytics offers a great number of methods to process data, starting with batch processing. Hadoop [27] is a suitable choice for batch analytics in the smart grid. Since smart grid systems are geographically distributed, distributed file systems are very useful for them. Hadoop provides HBase as a database system, the Hadoop Distributed File System (HDFS) as a storage system, and MapReduce as a processing engine. However, Hadoop cannot meet the data velocity, scalability and machine learning requirements of modern Information Technology (IT) systems [28].
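The MapReduce model Hadoop uses can be sketched in plain Python, with no cluster involved; the meter readings below are invented sample data:

```python
# Minimal pure-Python sketch of the map -> shuffle -> reduce phases that
# Hadoop's MapReduce engine distributes across a cluster.
from itertools import groupby
from operator import itemgetter

readings = [("M-1", 2.0), ("M-2", 1.5), ("M-1", 3.0), ("M-2", 0.5)]

# Map phase: emit (key, value) pairs -- here, (meter_id, kwh)
mapped = [(meter, kwh) for meter, kwh in readings]

# Shuffle phase: bring all values for the same key together
mapped.sort(key=itemgetter(0))
grouped = groupby(mapped, key=itemgetter(0))

# Reduce phase: aggregate each key's values -- total kWh per meter
totals = {meter: sum(kwh for _, kwh in group) for meter, group in grouped}
print(totals)   # {'M-1': 5.0, 'M-2': 2.0}
```

In a real Hadoop job the map and reduce functions run on many nodes over HDFS blocks, but the per-key aggregation logic is the same.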
-
Real time processing tools Real time processing executes faster than batch processing, because it handles data with high velocity requirements using stream processing or complex event processing systems. Real time processing can be implemented with several solutions such as S4, Splunk and Storm. Storm [29] is the most appropriate real time processing solution for smart grids, because it is open source, distributed and fault-tolerant, and it offers many capabilities as a real time processing system, including reliable message handling, parallel computation and a simple programming model. Storm can be used with Kafka for data integration and HBase for data storage.
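The difference from batch is that events are handled one at a time as they arrive, with only small running state. A generator-based sketch of that pattern (the readings are invented; in Storm this logic would live in a bolt consuming tuples from a spout):

```python
# Sketch of stream processing: update a running aggregate per event
# instead of scanning a complete, stored dataset.
def stream_average(events):
    """Yield the running average load after each incoming reading."""
    total, count = 0.0, 0
    for kwh in events:              # each event is processed on arrival
        total += kwh
        count += 1
        yield total / count         # result is available immediately

incoming = [1.0, 3.0, 2.0]          # simulated readings from smart meters
print([round(avg, 2) for avg in stream_average(incoming)])  # [1.0, 2.0, 2.0]
```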
-
Hybrid processing tools Hybrid processing can handle both batch and real time processing. Spark [30] is a framework used for batch processing, but it also provides a real time processing solution with Spark Streaming. Spark handles large-scale data processing and includes useful tools such as Spark SQL, Spark Streaming, a machine learning library (MLlib) and GraphX. All this makes Spark meet Big Data requirements in the smart grid. Spark Streaming uses a real time complex event processing engine to handle velocity issues. When using Spark, data storage can be done with HDFS or even HBase [31]. Apache Flink [32] is another framework able to process data in both batch and stream modes. Flink offers rich APIs, including transformation functions (map, reduce, group, etc.), that make it scalable, easy to deploy, fault-tolerant and fast in execution. Flink is efficient in machine learning because it provides its own machine learning library, FlinkML. Flink already has libraries to access HDFS, so it can easily use HDFS to store data.
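The key idea behind hybrid frameworks is that one transformation pipeline serves both modes. A pure-Python sketch (the conversion logic and sample readings are invented; Spark and Flink express the same pattern with their own distributed APIs):

```python
# Sketch of hybrid processing: the same filter/map pipeline runs over a
# finite batch of historical data and over an incrementally consumed stream.
def pipeline(records):
    """Drop zero readings and convert kWh to Wh (illustrative logic)."""
    return (int(kwh * 1000) for kwh in records if kwh > 0)

batch = [0.0, 1.2, 0.8]                    # historical readings (batch mode)
batch_result = list(pipeline(batch))
print(batch_result)                        # [1200, 800]

def live_feed():                           # simulated unbounded stream
    yield from [0.5, 0.0, 2.0]

stream_result = []
for wh in pipeline(live_feed()):           # same code, streaming mode:
    stream_result.append(wh)               # each result handled as it arrives
```

Because the generator is lazy, the streaming run never needs the whole input in memory, which is what lets one codebase cover both execution modes.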