aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-07-16net: mvpp2: debugfs: add classifier hit countersMaxime Chevallier
The classification operations that are used for RSS make use of several lookup tables. Having hit counters for these tables is really helpful to determine what flows were matched by ingress traffic, and see the path of packets among all the classifier tables. This commit adds hit counters for the 3 tables used at the moment : - The decoding table (also called lookup_id table), that links flows identified by the Header Parser to the flow table. There's one entry per flow, located at : .../mvpp2/<controller>/flows/XX/dec_hits Note that there are 21 flows in the decoding table, whereas there are 52 flows in the Header Parser. That's because there are several kind of traffic that will match a given flow. Reading the hit counter from one sub-flow will clear all hit counter that have the same flow_id. This also applies to the flow_hits. - The flow table, that contains all the different lookups to be performed by the classifier for each packet of a given flow. The match is done on the first entry of the flow sequence. - The C2 engine entries, that are used to assign the default rx queue, and enable or disable RSS for a given port. There's one entry per flow, located at: .../mvpp2/<controller>/flows/XX/flow_hits There is one C2 entry per port, so the c2 hit counter is located at : .../mvpp2/<controller>/ethX/c2_hits All hit counter values are 16-bits clear-on-read values. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16net: mvpp2: debugfs: add entries for classifier flowsMaxime Chevallier
The classifier configuration for RSS is quite complex, with several lookup tables being used. This commit adds useful info in debugfs to see how the different tables are configured : Added 2 new entries in the per-port directory : - .../eth0/default_rxq : The default rx queue on that port - .../eth0/rss_enable : Indicates if RSS is enabled in the C2 entry Added the 'flows' directory : It contains one entry per sub-flow. a 'sub-flow' is a unique path from Header Parser to the flow table. Multiple sub-flows can point to the same 'flow' (each flow has an id from 8 to 29, which is its index in the Lookup Id table) : - .../flows/00/... /01/... ... /51/id : The flow id. There are 21 unique flows. There's one flow per combination of the following parameters : - L4 protocol (TCP, UDP, none) - L3 protocol (IPv4, IPv6) - L3 parameters (Fragmented or not) - L2 parameters (Vlan tag presence or not) .../type : The flow type. This is an even higher level flow, that we manipulate with ethtool. It can be : "udp4" "tcp4" "udp6" "tcp6" "ipv4" "ipv6" "other". .../eth0/... .../eth1/engine : The hash generation engine used for this flow on the given port .../hash_opts : The hash generation options indicating on what data we base the hash (vlan tag, src IP, src port, etc.) Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16net: mvpp2: debugfs: add hit counter stats for Header Parser entriesMaxime Chevallier
One helpful feature to help debug the Header Parser TCAM filter in PPv2 is to be able to see if the entries did match something when a packet comes in. This can be done by using the built-in hit counter for TCAM entries. This commit implements reading the counter, and exposing its value on debugfs for each filter entry. The counter is a 16-bits clear-on-read value, located at: .../mvpp2/<controller>/parser/XXX/hits Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16net: mvpp2: add a debugfs interface for the Header ParserMaxime Chevallier
Marvell PPv2 Packer Header Parser has a TCAM based filter, that is not trivial to configure and debug. Being able to dump TCAM entries from userspace can be really helpful to help development of new features and debug existing ones. This commit adds a basic debugfs interface for the PPv2 driver, focusing on TCAM related features. <mnt>/mvpp2/ --- f2000000.ethernet \- f4000000.ethernet --- parser --- 000 ... | \- 001 | \- ... | \- 255 --- ai | \- header_data | \- lookup_id | \- sram | \- valid \- eth1 ... \- eth2 --- mac_filter \- parser_entries \- vid_filter There's one directory per PPv2 instance, named after pdev->name to make sure names are uniques. In each of these directories, there's : - one directory per interface on the controller, each containing : - "mac_filter", which lists all filtered addresses for this port (based on TCAM, not on the kernel's uc / mc lists) - "parser_entries", which lists the indices of all valid TCAM entries that have this port in their port map - "vid_filter", which lists the vids allowed on this port, based on TCAM - one "parser" directory (the parser is common to all ports), containing : - one directory per TCAM entry (256 of them, from 0 to 255), each containing : - "ai" : Contains the 1 byte Additional Info field from TCAM, and - "header_data" : Contains the 8 bytes Header Data extracted from the packet - "lookup_id" : Contains the 4 bits LU_ID - "sram" : contains the raw SRAM data, which is the result of the TCAM lookup. This readonly at the moment. - "valid" : Indicates if the entry is valid of not. All entries are read-only, and everything is output in hex form. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16net: mvpp2: switch to SPDX identifiersAntoine Tenart
Use the appropriate SPDX license identifiers and drop the license text. This patch is only cosmetic. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-07-15 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Various different arm32 JIT improvements in order to optimize code emission and make the JIT code itself more robust, from Russell. 2) Support simultaneous driver and offloaded XDP in order to allow for advanced use-cases where some work is offloaded to the NIC and some to the host. Also add ability for bpftool to load programs and maps beyond just the cgroup case, from Jakub. 3) Add BPF JIT support in nfp for multiplication as well as division. For the latter in particular, it uses the reciprocal algorithm to emulate it, from Jiong. 4) Add BTF pretty print functionality to bpftool in plain and JSON output format, from Okash. 5) Add build and installation to the BPF helper man page into bpftool, from Quentin. 6) Add a TCP BPF callback for listening sockets which is triggered right after the socket transitions to TCP_LISTEN state, from Andrey. 7) Add a new cgroup tree command to bpftool which iterates over the whole cgroup tree and prints all attached programs, from Roman. 8) Improve xdp_redirect_cpu sample to support parsing of double VLAN tagged packets, from Jesper. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-15Merge branch 'bpf-tcp-listen-cb'Daniel Borkmann
Andrey Ignatov says: ==================== This patchset adds TCP-BPF callback for listening sockets. Patch 0001 provides more details and is the main patch in the set. Patch 0006 adds selftest for the new callback. Other patches are bug fixes and improvements in TCP-BPF selftest to make it easier to extend in 0006. ==================== Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-15selftests/bpf: Test case for BPF_SOCK_OPS_TCP_LISTEN_CBAndrey Ignatov
Cover new TCP-BPF callback in test_tcpbpf: when listen() is called on socket, set BPF_SOCK_OPS_STATE_CB_FLAG so that BPF_SOCK_OPS_STATE_CB callback can be called on future state transition, and when such a transition happens (TCP_LISTEN -> TCP_CLOSE), track it in the map and verify it in user space later. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-15selftests/bpf: Better verification in test_tcpbpfAndrey Ignatov
Reduce amount of copy/paste for debug info when result is verified in the test and keep that info together with values being checked so that they won't get out of sync. It also improves debug experience: instead of checking manually what doesn't match in debug output for all fields, only unexpected field is printed. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-15selftests/bpf: Switch test_tcpbpf_user to cgroup_helpersAndrey Ignatov
Switch to cgroup_helpers to simplify the code and fix cgroup cleanup: before cgroup was not cleaned up after the test. It also removes SYSTEM macro, that only printed error, but didn't terminate the test. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-15selftests/bpf: Fix const'ness in cgroup_helpersAndrey Ignatov
Lack of const in cgroup helpers signatures forces to write ugly client code. Fix it. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-15bpf: Sync bpf.h to tools/Andrey Ignatov
Sync BPF_SOCK_OPS_TCP_LISTEN_CB related UAPI changes to tools/. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-15bpf: Add BPF_SOCK_OPS_TCP_LISTEN_CBAndrey Ignatov
Add new TCP-BPF callback that is called on listen(2) right after socket transition to TCP_LISTEN state. It fills the gap for listening sockets in TCP-BPF. For example BPF program can set BPF_SOCK_OPS_STATE_CB_FLAG when socket becomes listening and track later transition from TCP_LISTEN to TCP_CLOSE with BPF_SOCK_OPS_STATE_CB callback. Before there was no way to do it with TCP-BPF and other options were much harder to work with. E.g. socket state tracking can be done with tracepoints (either raw or regular) but they can't be attached to cgroup and their lifetime has to be managed separately. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-14Merge branch 'mlxsw-VRRP'David S. Miller
Ido Schimmel says: ==================== mlxsw: Add VRRP support When a router that is acting as the default gateway of a host stops functioning, the host will encounter packet loss until the router starts functioning again. To increase the reliability of the default gateway without performing reconfiguration on the host, a host can use a Virtual Router Redundancy Protocol (VRRP) Router. This virtual router is composed from several routers where only one is actually forwarding packets from the host (the master router) while the other routers act as backup routers. The election of the master router is determined by the VRRP protocol [1]. Packets addressed to the virtual router are always sent to the virtual router MAC address (IPv4: 00-00-5E-00-01-XX, IPv6: 00-00-5E-00-02-XX). Such packets can only be accepted by the master router and must be discarded by the backup routers. In Linux, VRRP is usually implemented by configuring a macvlan with the virtual router MAC on top of the router interface that is connected to the host / LAN. The macvlan on the master router is assigned the virtual IP (VIP) that the host uses as its gateway. In order to support VRRP in mlxsw, we first need to enable macvlan upper devices on top of mlxsw netdevs and their uppers. This is done by the first patch, which also takes care of sanitizing macvlan configurations that are not currently supported by the driver. The second patch directs packets with destination MAC addresses as the macvlans to the router so that they will undergo an L3 lookup. This is consistent with the kernel's behavior where the macvlan's Rx handler will re-inject such packets to the Rx path so that they will be picked up by the IPvX protocol handlers and undergo an L3 lookup. Note that the driver prevents the macvlans from being enslaved to other devices, to ensure the packets will be picked up by the protocol handler and not by another Rx handler. The third patch adds packet traps for VRRP control packets for both IPv4 and IPv6. Finally, the last patch optimizes the reception of VRRP MACs by potentially skipping one L2 lookup for them. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-14mlxsw: spectrum_router: Optimize processing of VRRP MACsIdo Schimmel
Hosts using a VRRP router send their packets with a destination MAC of the VRRP router which is of the following form [1]: IPv4 - 00-00-5E-00-01-{VRID} IPv6 - 00-00-5E-00-02-{VRID} Where VRID is the ID of the virtual router. Such packets are directed to the router block in the ASIC by an FDB entry that was added in the previous patch. However, in certain cases it is possible to skip this FDB lookup and send such packets directly to the router. This is accomplished by adding these special MAC addresses to the RIF cache. If the cache is hit, the packet will skip the L2 lookup and ingress the router with the RIF specified in the cache entry. 1. https://tools.ietf.org/html/rfc5798#section-7.3 Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-14mlxsw: spectrum: Add VRRP trapsIdo Schimmel
Virtual Router Redundancy Protocol packets are used to communicate the state of the Master router associated with the virtual router ID (VRID). These are link-local multicast packets sent with IP protocol 112 that are trapped in the router block in the ASIC. Add a trap for these packets and mark the trapped packets to prevent them from potentially being re-flooded by the bridge driver. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-14mlxsw: spectrum_router: Direct macvlans' MACs to routerIdo Schimmel
An IP packet received on a netdev with a macvlan upper whose MAC matches the packet's destination MAC will be re-injected to the Rx path as if it was received by the macvlan, and perform an L3 lookup. Reflect this functionality to the ASIC by programming FDB entries that will direct MACs of macvlan uppers to the router. In a similar fashion to router interfaces (RIFs) that are programmed upon the addition of the first IP address on an interface and destroyed upon the removal of the last IP address, the FDB entries for the macvlan are added and destroyed based on the addition of the first and removal of the last IP address on the macvlan. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-14mlxsw: spectrum: Enable macvlan upper devicesIdo Schimmel
In order to allow more unicast MAC addresses (e.g., VRRP virtual MAC) to be directed to the router we need to enable macvlan uppers on top of mlxsw netdevs. Allow macvlan upper devices on top of mlxsw netdevs and sanitize configurations that can't work. For example, a macvlan can't be enslaved to a bridge as without ACLs the device doesn't take the destination MAC into account when classifying a packet to a bridge instance (i.e., a FID). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-14tcp: remove redundant rcv_nxt updateYafang Shao
tcp_rcv_nxt_update() is already executed in tcp_data_queue(). This line is redundant. See bellow, tcp_queue_rcv tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq); tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq); <<<< redundant Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-14bpf: btf: print map dump and lookup with btf infoOkash Khawaja
This patch augments the output of bpftool's map dump and map lookup commands to print data along side btf info, if the correspondin btf info is available. The outputs for each of map dump and map lookup commands are augmented in two ways: 1. when neither of -j and -p are supplied, btf-ful map data is printed whose aim is human readability. This means no commitments for json- or backward- compatibility. 2. when either -j or -p are supplied, a new json object named "formatted" is added for each key-value pair. This object contains the same data as the key-value pair, but with btf info. "formatted" object promises json- and backward- compatibility. Below is a sample output. $ bpftool map dump -p id 8 [{ "key": ["0x0f","0x00","0x00","0x00" ], "value": ["0x03", "0x00", "0x00", "0x00", ... ], "formatted": { "key": 15, "value": { "int_field": 3, ... } } } ] This patch calls btf_dumper introduced in previous patch to accomplish the above. Indeed, btf-ful info is only displayed if btf data for the given map is available. Otherwise existing output is displayed as-is. Signed-off-by: Okash Khawaja <osk@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-14bpf: btf: add btf print functionalityOkash Khawaja
This consumes functionality exported in the previous patch. It does the main job of printing with BTF data. This is used in the following patch to provide a more readable output of a map's dump. It relies on json_writer to do json printing. Below is sample output where map keys are ints and values are of type struct A: typedef int int_type; enum E { E0, E1, }; struct B { int x; int y; }; struct A { int m; unsigned long long n; char o; int p[8]; int q[4][8]; enum E r; void *s; struct B t; const int u; int_type v; unsigned int w1: 3; unsigned int w2: 3; }; $ sudo bpftool map dump id 14 [{ "key": 0, "value": { "m": 1, "n": 2, "o": "c", "p": [15,16,17,18,15,16,17,18 ], "q": [[25,26,27,28,25,26,27,28 ],[35,36,37,38,35,36,37,38 ],[45,46,47,48,45,46,47,48 ],[55,56,57,58,55,56,57,58 ] ], "r": 1, "s": 0x7ffd80531cf8, "t": { "x": 5, "y": 10 }, "u": 100, "v": 20, "w1": 0x7, "w2": 0x3 } } ] This patch uses json's {} and [] to imply struct/union and array. More explicit information can be added later. For example, a command line option can be introduced to print whether a key or value is struct or union, name of a struct etc. This will however come at the expense of duplicating info when, for example, printing an array of structs. enums are printed as ints without their names. Signed-off-by: Okash Khawaja <osk@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-14bpf: btf: export btf types and name by offset from libOkash Khawaja
This patch introduces btf__resolve_type() function and exports two existing functions from libbpf. btf__resolve_type follows modifier types like const and typedef until it hits a type which actually takes up memory, and then returns it. This function follows similar pattern to btf__resolve_size but instead of computing size, it just returns the type. These functions will be used in the followig patch which parses information inside array of `struct btf_type *`. btf_name_by_offset is used for printing variable names. Signed-off-by: Okash Khawaja <osk@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-14tools: include reallocarray feature test in FEATURE_TESTS_BASICJakub Kicinski
perf propagates its feature check results to libbpf. This means features for which perf probes must be a superset of libbpf's required features. perf depends on FEATURE_TESTS_BASIC for its list of features. commit 531b014e7a2f ("tools: bpf: make use of reallocarray") added reallocarray use to libbpf, make perf also perform the reallocarray feature check. Fixes: 531b014e7a2f ("tools: bpf: make use of reallocarray") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13net: mvpp2: mvpp2_cls_flow_get() can be statickbuild test robot
Fixes: f9358e12a0af ("net: mvpp2: split ingress traffic into multiple flows") Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-13of: mdio: Support fixed links in of_phy_get_and_connect()Linus Walleij
By a simple extension of of_phy_get_and_connect() drivers that have a fixed link on e.g. RGMII can support also fixed links, so in addition to: ethernet-port { phy-mode = "rgmii"; phy-handle = <&foo>; }; This setup with a fixed-link node and no phy-handle will now also work just fine: ethernet-port { phy-mode = "rgmii"; fixed-link { speed = <1000>; full-duplex; pause; }; }; This is very helpful for connecting random ethernet ports to e.g. DSA switches that typically reside on fixed links. The phy-mode is still there as the fixes link in this case is still an RGMII link. Tested on the Cortina Gemini driver with the Vitesse DSA router chip on a fixed 1Gbit link. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-13net: sched: refactor flower walk to iterate over idrVlad Buslov
Extend struct tcf_walker with additional 'cookie' field. It is intended to be used by classifier walk implementations to continue iteration directly from particular filter, instead of iterating 'skip' number of times. Change flower walk implementation to save filter handle in 'cookie'. Each time flower walk is called, it looks up filter with saved handle directly with idr, instead of iterating over filter linked list 'skip' number of times. This change improves complexity of dumping flower classifier from quadratic to linearithmic. (assuming idr lookup has logarithmic complexity) Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reported-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-14samples/bpf: xdp_redirect_cpu handle parsing of double VLAN tagged packetsJesper Dangaard Brouer
People noticed that the code match on IEEE 802.1ad (ETH_P_8021AD) ethertype, and this implies Q-in-Q or double tagged VLANs. Thus, we better parse the next VLAN header too. It is even marked as a TODO. This is relevant for real world use-cases, as XDP cpumap redirect can be used when the NIC RSS hashing is broken. E.g. the ixgbe driver HW cannot handle double tagged VLAN packets, and places everything into a single RX queue. Using cpumap redirect, users can redistribute traffic across CPUs to solve this, which is faster than the network stacks RPS solution. It is left as an exerise how to distribute the packets across CPUs. It would be convenient to use the RX hash, but that is not _yet_ exposed to XDP programs. For now, users can code their own hash, as I've demonstrated in the Suricata code (where Q-in-Q is handled correctly). Reported-by: Florian Maury <florian.maury-cv@x-cli.eu> Reported-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13net: ipmr: add support for passing full packet on wrong vifNikolay Aleksandrov
This patch adds support for IGMPMSG_WRVIFWHOLE which is used to pass full packet and real vif id when the incoming interface is wrong. While the RP and FHR are setting up state we need to be sending the registers encapsulated with all the data inside otherwise we lose it. The RP then decapsulates it and forwards it to the interested parties. Currently with WRONGVIF we can only be sending empty register packets and will lose that data. This behaviour can be enabled by using MRT_PIM with val == IGMPMSG_WRVIFWHOLE. This doesn't prevent IGMPMSG_WRONGVIF from happening, it happens in addition to it, also it is controlled by the same throttling parameters as WRONGVIF (i.e. 1 packet per 3 seconds currently). Both messages are generated to keep backwards compatibily and avoid breaking someone who was enabling MRT_PIM with val == 4, since any positive val is accepted and treated the same. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-13Merge branch 'bpf-xdp-driver-and-hw'Daniel Borkmann
Jakub Kicinski says: ==================== This set is adding support for loading driver and offload XDP at the same time. This enables advanced use cases where some of the work is offloaded to the NIC and some is done by the host. Separate netlink attributes are added for each mode of operation. Driver callbacks for offload are cleaned up a little, including removal of .prog_attached flag. ==================== Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13nfp: add support for simultaneous driver and hw XDPJakub Kicinski
Split handling of offloaded and driver programs completely. Since offloaded programs always come with XDP_FLAGS_HW_MODE set in reality there could be no sharing, anyway, programs would only be installed in driver or in hardware. Splitting the handling allows us to install programs in HW and in driver at the same time. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13selftests/bpf: add test for multiple programsJakub Kicinski
Add tests for having an XDP program attached in the driver and another one attached in HW simultaneously. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13netdevsim: add support for simultaneous driver and hw XDPJakub Kicinski
Allow netdevsim to accept driver and offload attachment of XDP BPF programs at the same time. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13xdp: support simultaneous driver and hw XDP attachmentJakub Kicinski
Split the query of HW-attached program from the software one. Introduce new .ndo_bpf command to query HW-attached program. This will allow drivers to install different programs in HW and SW at the same time. Netlink can now also carry multiple programs on dump (in which case mode will be set to XDP_ATTACHED_MULTI and user has to check per-attachment point attributes, IFLA_XDP_PROG_ID will not be present). We reuse IFLA_XDP_PROG_ID skb space for second mode, so rtnl_xdp_size() doesn't need to be updated. Note that the installation side is still not there, since all drivers currently reject installing more than one program at the time. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13xdp: factor out common program/flags handling from driversJakub Kicinski
Basic operations drivers perform during xdp setup and query can be moved to helpers in the core. Encapsulate program and flags into a structure and add helpers. Note that the structure is intended as the "main" program information source in the driver. Most drivers will additionally place the program pointer in their fast path or ring structures. The helpers don't have a huge impact now, but they will decrease the code duplication when programs can be installed in HW and driver at the same time. Encapsulating the basic operations in helpers will hopefully also reduce the number of changes to drivers which adopt them. Helpers could really be static inline, but they depend on definition of struct netdev_bpf which means they'd have to be placed in netdevice.h, an already 4500 line header. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13xdp: don't make drivers report attachment modeJakub Kicinski
prog_attached of struct netdev_bpf should have been superseded by simply setting prog_id long time ago, but we kept it around to allow offloading drivers to communicate attachment mode (drv vs hw). Subsequently drivers were also allowed to report back attachment flags (prog_flags), and since nowadays only programs attached will XDP_FLAGS_HW_MODE can get offloaded, we can tell the attachment mode from the flags driver reports. Remove prog_attached member. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13xdp: add per mode attributes for attached programsJakub Kicinski
In preparation for support of simultaneous driver and hardware XDP support add per-mode attributes. The catch-all IFLA_XDP_PROG_ID will still be reported, but user space can now also access the program ID in a new IFLA_XDP_<mode>_PROG_ID attribute. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13Merge branch 'bpf-arm-jit-improvements'Daniel Borkmann
Russell King says: ==================== Four further jit compiler improves for 32-bit ARM. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13ARM: net: bpf: improve 64-bit ALU implementationRussell King
Improbe the 64-bit ALU implementation from: movw r8, #65532 movt r8, #65535 movw r9, #65535 movt r9, #65535 ldr r7, [fp, #-44] adds r7, r7, r8 str r7, [fp, #-44] ldr r7, [fp, #-40] adc r7, r7, r9 str r7, [fp, #-40] to: movw r8, #65532 movt r8, #65535 movw r9, #65535 movt r9, #65535 ldrd r6, [fp, #-44] adds r6, r6, r8 adc r7, r7, r9 strd r6, [fp, #-44] Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13ARM: net: bpf: improve 64-bit store implementationRussell King
Improve the 64-bit store implementation from: ldr r6, [fp, #-8] str r8, [r6] ldr r6, [fp, #-8] mov r7, #4 add r7, r6, r7 str r9, [r7] to: ldr r6, [fp, #-8] str r8, [r6] str r9, [r6, #4] We leave the store as two separate STR instructions rather than using STRD as the store may not be aligned, and STR can handle misalignment. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13ARM: net: bpf: improve 64-bit sign-extended immediate loadRussell King
Improve the 64-bit sign-extended immediate from: mov r6, #1 str r6, [fp, #-52] ; 0xffffffcc mov r6, #0 str r6, [fp, #-48] ; 0xffffffd0 to: mov r6, #1 mov r7, #0 strd r6, [fp, #-52] ; 0xffffffcc Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13ARM: net: bpf: improve 64-bit load immediate implementationRussell King
Rather than writing each 32-bit half of the 64-bit immediate value separately when the register is on the stack: movw r6, #45056 ; 0xb000 movt r6, #60979 ; 0xee33 str r6, [fp, #-44] ; 0xffffffd4 mov r6, #0 str r6, [fp, #-40] ; 0xffffffd8 arrange to use the double-word store when available instead: movw r6, #45056 ; 0xb000 movt r6, #60979 ; 0xee33 mov r7, #0 strd r6, [fp, #-44] ; 0xffffffd4 Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12net: gemini: Indicate that we can handle jumboframesLinus Walleij
The hardware supposedly handles frames up to 10236 bytes and implements .ndo_change_mtu() so accept 10236 minus the ethernet header for a VLAN tagged frame on the netdevices. Use ETH_MIN_MTU as minimum MTU. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12net: gemini: Move main init to portLinus Walleij
The initialization sequence for the ethernet, setting up interrupt routing and such things, need to be done after both the ports are clocked and reset. Before this the config will not "take". Move the initialization to the port probe function and keep track of init status in the state. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12net: gemini: Allow multiple ports to instantiateLinus Walleij
The code was not tested with two ports actually in use at the same time. (I blame this on lack of actual hardware using that feature.) Now after locating a system using both ports, add necessary fix to make both ports come up. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12net: gemini: Improve connection printsLinus Walleij
Switch over to using a module parameter and debug prints that can be controlled by this or ethtool like everyone else. Depromote all other prints to debug messages. The phy_print_status() was already in place, albeit never really used because the debuglevel hiding it had to be set up using ethtool. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12net: gemini: Look up L3 maxlen from tableLinus Walleij
The code to calculate the hardware register enumerator for the maximum L3 length isn't entirely simple to read. Use the existing defines and rewrite the function into a table look-up. Acked-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12Merge branch 'devlink-Add-support-for-region-access'David S. Miller
Alex Vesker says: ==================== devlink: Add support for region access This is a proposal which will allow access to driver defined address regions using devlink. Each device can create its supported address regions and register them. A device which exposes a region will allow access to it using devlink. The suggested implementation will allow exposing regions to the user, reading and dumping snapshots taken from different regions. A snapshot represents a memory image of a region taken by the driver. If a device collects a snapshot of an address region it can be later exposed using devlink region read or dump commands. This functionality allows for future analyses on the snapshots to be done. The major benefit of this support is not only to provide access to internal address regions which were inaccessible to the user but also to provide an additional way to debug complex error states using the region snapshots. Implemented commands: $ devlink region help $ devlink region show [ DEV/REGION ] $ devlink region del DEV/REGION snapshot SNAPSHOT_ID $ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ] $ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ] address ADDRESS length length Show all of the exposed regions with region sizes: $ devlink region show pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2] pci/0000:00:05.0/fw-health: size 64 snapshot [1 2] Delete a snapshot using: $ devlink region del pci/0000:00:05.0/cr-space snapshot 1 Dump a snapshot: $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8 0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc 0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5 Read a specific part of a snapshot: $ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 For more information you can check devlink-region.8 man page Future: There is a plan to extend the support to include a write command as well as performing read and dump live region v1->v2: -Add a parameter to enable devlink region snapshot -Allocate snapshot memory using kvmalloc -Introduce destructor function devlink_snapshot_data_dest_t to avoid double allocation v2->v3: -Fix incorrect comment in devlink.h for DEVLINK_ATTR_REGION_SIZE from u32 to u64 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12net/mlx4_core: Use devlink region_snapshot parameterAlex Vesker
This parameter enables capturing region snapshot of the crspace during critical errors. The default value of this parameter is disabled, it can be enabled using devlink param commands. It is possible to configure during runtime and also driver init. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12devlink: Add generic parameters region_snapshotAlex Vesker
region_snapshot - When set enables capturing region snapshots Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12net/mlx4_core: Add Crdump FW snapshot supportAlex Vesker
Crdump allows the driver to create a snapshot of the FW PCI crspace and health buffer during a critical FW issue. In case of a FW command timeout, FW getting stuck or a non zero value on the catastrophic buffer, a snapshot will be taken. The snapshot is exposed using devlink, cr-space, fw-health address regions are registered on init and snapshots are attached once a new snapshot is collected by the driver. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>